DT 3.0beta4 - search pdf OCR engine

I OCR PDFs before they get into DT, so I’ve only used the built-in OCR engine a couple of times, for processing PDFs that have slipped through my workflow. To OCR, I use Acrobat Pro XI. Recently, an Acrobat Action I had customised left the text in a PDF hard to read, even though the pre-OCR PDF was clearly legible. I looked at the Optimization settings for the Action and read online about not selecting Apply Adaptive Compression and about the difference between Lossy and Lossless.

That got me wondering whether, if I were to start using DT’s OCR engine more often, I would be able to set it to Lossless and deselect Apply Adaptive Compression, or the equivalent. Does DT’s OCR engine use fixed default settings, or is it user-customisable?

It’s only possible to disable the compression; see Preferences > OCR.

Noted, thank you. As I write this, I am using DTPO to OCR 141 pages. I attach a screenshot of my Preferences. Could my Preferences be improved upon?

These are the recommended default settings of DEVONthink Pro Office 2. In version 3 there’s a new option to disable the compression; everything else (quality, DPI) is handled automatically.

Thank you.

The resulting OCR’d PDF has a few blanked-out pages and some inverted diagrams, but otherwise the result is better than what I got using Acrobat Pro, and less tiring on my eyes too.

I am now using DT3.0beta4 to OCR the same pre-OCR PDF. Nothing blanked out and the diagrams are OK. Wow! A vast improvement in engine speed, or at least it seems so: just 9 minutes. As I didn’t make a note of how long it took using DTPO the first time, I am doing it again using DTPO: start 15:32, finish? Must stop now, but 9 minutes have elapsed and Activity shows it recognising page 59 (of 141 pages), so presumably more than twice as fast.
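
For anyone who wants to put a rough number on that comparison, here is a minimal Python sketch using only the timings quoted above (141 pages in 9 minutes in DT3.0beta4; page 59 of 141 reached after 9 minutes in DTPO). The function name and its parameters are made up for illustration, and it assumes DTPO keeps a constant pace for the rest of the run, which may well not hold.

```python
def estimated_speedup(total_pages, fast_minutes, slow_pages_done, slow_minutes):
    """Return (speed-up factor, projected total minutes for the slower run).

    Assumes the slower run continues at the same pages-per-minute rate
    it has shown so far; this is an estimate, not a measurement.
    """
    fast_rate = total_pages / fast_minutes        # pages per minute, finished run
    slow_rate = slow_pages_done / slow_minutes    # pages per minute, partial run
    projected_slow_minutes = total_pages / slow_rate
    return fast_rate / slow_rate, projected_slow_minutes


# Figures taken from the post: DT3.0beta4 vs. DTPO on the same 141-page PDF.
speedup, projected = estimated_speedup(141, 9, 59, 9)
print(f"DT3 looks about {speedup:.1f}x faster; DTPO would need roughly {projected:.0f} minutes")
```

On these figures the estimate comes out a little above two, so "about twice as fast" is on the cautious side.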

What kind of compression is being referred to here? Re-JPEGing, or something else? I checked the manual and it didn’t give me enough information to determine whether I would want this on or off.

This only applies when adding metadata or transferring annotations from an original pdf after OCR. In this case we need to re-save the changed file and this preference option refers to whether we apply a compression to the saved file or not.

I see the new options in DTP3, but what are the “automatic” compression settings, and are they not adjustable anywhere?

It was great being able to reduce incoming scans to 150 dpi after OCR—I would have my ScanSnap save a higher-dpi version in an unsorted folder in case I ever lost my DTP database, then send it to DTPO2 for OCR and compression.

Can I really not do that anymore?

That is correct. There are no user-defined settings for the OCR now, outside those shown…