printing an electronic booklet from DTPO

twicks · December 31, 2009, 4:52am

[attempting to re-create my earlier post that the forum software hosed because it thought I hadn’t logged in.]

As many regulars know, one of my major projects with DTPO has been scanning, OCRing, and uploading the text of a small magazine that I published in the 80s to a web version.

Some trouble-making visitor recently suggested that I collate all the scanned images and accompanying plain text into some kind of electronic form like a PDF that wouldn’t need DTPO for things like search. Of course this person had no suggestions on how to do this or what means could be used.

One main problem is that each issue's scans take up about 50 MB before they've been ingested and processed by DTPO. I have some 70 issues, so the unprocessed scanned images alone could conceivably total about 3.5 GB.
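For anyone who wants to check that estimate or plug in their own numbers, here is a quick back-of-the-envelope sketch. The per-issue size and issue count are the rough figures from above; the function name and the use of decimal units (1 GB = 1000 MB) are just assumptions for illustration.

```python
# Rough size estimate for the raw scans, using the approximate figures from the post.
# Assumption: decimal units, 1 GB = 1000 MB.
MB_PER_ISSUE = 50   # approximate size of one issue's scans
ISSUE_COUNT = 70    # approximate number of issues
MB_PER_GB = 1000


def total_scan_size_gb(issues=ISSUE_COUNT, mb_per_issue=MB_PER_ISSUE):
    """Return the estimated total size of the unprocessed scans in GB."""
    return issues * mb_per_issue / MB_PER_GB


if __name__ == "__main__":
    print(f"Estimated total: {total_scan_size_gb():.1f} GB")
```

The figure covers only the raw images; any text layer added by OCR, or any compression applied when building a PDF, would change the final file size.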

Has anyone taken their DTPO data and created a standalone searchable PDF for distribution?

Any suggestions on how that might work?

Greg_Jones · December 31, 2009, 11:34am

I have not done what you are contemplating, but I will share what I have done for my own use and that may help inform your thinking. I too have some scanned books and magazines that are similar in size to what you have. I found that the resources required to OCR, store, and search these documents outweighed the benefits of having the entire document searchable.

What I did was re-create the table of contents, and in some cases I re-created the index and/or added my own annotations of the document. I then edited the original PDF document, replacing the non-searchable pages with the ones I created. Now I can search for the big-picture concepts in the documents without all the overhead. I also made the table of contents entries links to the specific pages in the document, but naturally these links would not work outside of DEVONthink.