Ability to OCR existing PDFs

erico · December 25, 2006, 4:06am

Hello Christian,

Over the holidays I’m testing DTPro Office, and what I’d really like is a simple way to OCR existing PDF files that don’t have an OCR/text layer. I download these all the time for my research and would rather not have to save them out to the Finder and OCR them in a separate program. This seems like exactly the kind of thing DTPro Office should do. If there could be a script command (“ocr record”), I would be especially grateful; I seriously do this ten times a day.

best,

Erico

Bob_Sprague · December 26, 2006, 5:52pm

I have the same situation with inter-library loan material that only arrives as scanned images (most academic databases I use seem to include the text in the PDF already). I download the files to a folder in the Finder, then use File>Import>Images (OCR). This command performs the OCR and places the files in DTPO. After importing, I trash the original files in the Finder.

One issue you will find, though, is that file size increases dramatically. I hope this improves over time.

If you keep your PDFs in Bookends (with the attachment folder indexed in DTPO), you have the additional step of exporting the files out of DTPO. I’m sure there is a better way… if there is, someone will let us know!

-bob