
Topic: How should we go from Stack Exchange Q/A to publishable PDF with the least hassle?

10.02% popularity

Over on another site we're talking about taking some of our content (on a particular theme) and re-packaging it as a printable PDF. (The primary use case is paper.) This wouldn't be a straight dump of the original posts; sometimes you want to edit some for a different audience, links don't work, and so on. We're currently thinking about using meta posts to facilitate this editing (so we can crowd-source that part of the work).

My question is: what's the best way to get from those posts to the final product, preserving as much formatting as possible so we don't have to re-do it? One could work with the Markdown (are there translators for that to other formats?), or with the generated HTML (the actual web page). Or one could cut/paste into one's favorite document-creation tool, which sounds like an unfortunate choice because it's labor-intensive and the formatting wouldn't follow. An additional consideration is that some of our content is in Hebrew (so non-ASCII).

I realize that I'm treading dangerously close to "too localized", but it seems like the same techniques that are used for wikis and blogs might apply here too.



More posts by @Hamaas631

2 Comments


10% popularity

If you only want to convert a handful of pages into PDF, you can do that in Microsoft Word and you will probably be OK.

If you want to convert a large number of web pages into PDFs, keep them editable, and strip out unnecessary information, I strongly suggest the following:

Export the webpage with the source information as HTML. Open the saved page in Adobe Dreamweaver (or similar) and make all the changes to the text and page layout in HTML and then save the new content again as HTML. When all final changes have been made, then create the PDF from the HTML in Adobe Acrobat (or similar).

Why this and not Word? Two reasons.

One, I find that Word tends to get mucked up when you've cut and pasted from the web. Things tend to flow incorrectly, and un-mucking it tends to be quite frustrating. Your experience may vary.

Two, if you wish to be forward thinking and eventually want to create an ePub, a Kindle book, or what have you, I have found that you get better results when you create your e-book from HTML than from MS Word or even PDFs.

If you want to be forward thinking, it's better to handle your product once through HTML editing than to handle it twice through Word.
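As a rough illustration of the HTML-first workflow described in this answer, here is a minimal Python sketch that gathers saved post bodies into one standalone HTML page, which could then be edited and converted to PDF with whatever tool you prefer. The function name `build_document`, the headings, and the output file name are all hypothetical, and the sketch assumes you already have each post's HTML fragment as a string. It declares UTF-8 explicitly so that non-ASCII content such as Hebrew is more likely to survive the round trip.

```python
from html import escape


def build_document(title: str, posts: list[tuple[str, str]]) -> str:
    """Wrap (heading, html_fragment) pairs in one standalone HTML page.

    Hypothetical helper: headings are plain text, fragments are HTML
    that was saved from the original posts.
    """
    sections = []
    for heading, body_html in posts:
        # Headings are plain text, so escape them; bodies are already HTML.
        sections.append(
            f"<section>\n<h2>{escape(heading)}</h2>\n{body_html}\n</section>"
        )
    return (
        "<!DOCTYPE html>\n"
        '<html lang="en">\n<head>\n'
        # Declare the encoding so non-ASCII text (e.g. Hebrew) is read correctly.
        '<meta charset="utf-8">\n'
        f"<title>{escape(title)}</title>\n"
        "</head>\n<body>\n"
        f"<h1>{escape(title)}</h1>\n"
        + "\n".join(sections)
        + "\n</body>\n</html>\n"
    )


if __name__ == "__main__":
    page = build_document(
        "Collected posts",
        [("Example post", '<p dir="rtl">שלום</p>')],
    )
    # Write with an explicit encoding that matches the <meta charset> above.
    with open("collection.html", "w", encoding="utf-8") as f:
        f.write(page)
```

Keeping everything in one HTML file means the editing pass happens once, and the same file can later feed an e-book conversion as well as the PDF.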



10% popularity

We have learned through experimentation that a new-enough version of Microsoft Word (we tested with 2010) supports format-preserving cut-and-paste from Stack Exchange posts. We drew up some formatting guidelines to get the content into shape (e.g. de-linkifying, since this is for paper). This still involves manually cutting and pasting from the browser into some other program (e.g. Word), but it turns out we don't need to extract the source HTML or Markdown after all, so for a small project we can live with that.
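For anyone who does end up working from saved HTML rather than pasting into Word, the de-linkifying step could in principle be automated. The following is only a sketch using Python's standard `html.parser` module; the class name `Delinkifier`, the function `delinkify`, and the choice to print the URL in parentheses after the link text are assumptions for illustration, not part of the guidelines we drew up.

```python
from html import escape
from html.parser import HTMLParser


class Delinkifier(HTMLParser):
    """Re-emit HTML, replacing <a href="URL">text</a> with "text (URL)"."""

    def __init__(self):
        # Keep character references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.hrefs = []  # stack of hrefs for currently open <a> tags

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.append(dict(attrs).get("href"))
        else:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> pass through unchanged.
        if tag != "a":
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "a":
            href = self.hrefs.pop() if self.hrefs else None
            if href:
                # Print the target after the link text so it survives on paper.
                self.out.append(f" ({escape(href, quote=False)})")
        else:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def delinkify(html_text: str) -> str:
    parser = Delinkifier()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    print(delinkify('<p>See <a href="https://example.com/q/1">this post</a>.</p>'))
```

This sketch drops comments and does nothing special for links whose text is already the bare URL, so its output would still need a human pass before printing.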

