OCR requests

Okay, I’ve already paid for my DTP Office upgrade (even though I spent $99 for IRIS-11 less than six months ago, but c’est la vie).

I’m thrilled that you’re embracing OCR, but I have some suggestions and requests:

  1. It should be easier to OCR scanned PDFs that are already in a DTP database. As it is, I have to export the document to the desktop and re-import it, resulting in duplicate copies inside DTPO. That seems awkward and unnecessary… why not allow OCR inside the program? (Or does that violate your IRIS license?)

  2. OCR-ing must re-rasterize the document, causing it to balloon in size. Given a sample document treated as above, the size went from 176K to 1800K… a tenfold increase! This will be good for external hard disk retailers, but no one else… Any way to revert to the original resolution once OCR is done?

  3. Please please please… post a script or a utility or something to do “batch mode” OCR-ing in the background. It’s CPU-intensive (even on a Core Duo MacBook Pro) and I have hundreds of documents to process. I want DTPO to harvest idle CPU cycles as a background task, turning all those scanned PDFs into PDF+Texts, but I don’t want to babysit it in the process.

Thanks for adding the core functionality, and I hope to see some tweaks like this in the final version…

Quick and short answers:

  1. Still a work in progress, as it depends on some external factors. Also, we have a 50-page limit for OCR, so it may not even work on bigger documents.

  2. There is nothing we can do about it at the moment; we hope to improve this in a future release.

  3. You can batch these: just go to Preferences -> OCR and deselect “Set PDF Attributes” (see also the online documentation).

Yes, having used DEVONthink Office for the past week, I also thought it would be much easier if you could just import your basic scanned PDFs from the ScanSnap S500M into DT Office automatically, and have it OCR the documents during the machine’s idle time, say overnight. (Doing it during the day just means the machine can’t do anything else.)

As it is, I’ve decided the best solution at the moment is to skip the OCR in DT Office: I scan to a set folder on my HD, batch convert in Acrobat Professional overnight (which can also compress the documents even further once the OCR process is done), and then import into DT Office the following working day.

My ideal would be for all of this to be done in DT Office, as mentioned above. All I can hope for is that some of these user suggestions may possibly be included in the final application (please, please, please!!!)

Your overnight solution makes sense for a desktop setup.

It doesn’t work for me, though, since I use a laptop, and it’s either (1) running, with me sitting in front of it and unwilling to share CPU cycles, or (2) asleep, tucked into my briefcase.

I’m crossing my fingers to see what Devon comes up with for the release version. (But, as stated up-thread, I’ve already paid for my DTP Office upgrade, and I hope others here do the same!)