Complex PDF with multi-columnar text and images: DevonThink OCR outperforms all other OCR apps including the best rated

rufus123 · March 25, 2024, 7:12am

The basic problem is that when the OCR of a searchable text is not adequate, it's almost impossible to annotate it.
I had a non-searchable, complex PDF with text in columns and images, which I could not convert to a decent-looking, readable PDF. I tried the top-rated ABBYY FineReader PDF app and many other apps, including OWL OCR, PDF Expert, PDFpen, FinePrint and others. In every case, the OCR was unable to separate the individual columns of text across the pages, although the quality of the results varied between apps.
With DevonThink's OCR, the conversion was perfect, including the columnar structure of the text.

chrillek · March 25, 2024, 7:14am

Isn’t DT using ABBYY’s engine internally?

rufus123 · March 25, 2024, 7:22am

I was wondering what type of OCR DevonThink is using.
Concerning ABBYY, I tried the following:

- OCR with Fujiscan’s ABBYY app → the app refuses to run OCR because it only works when a hard copy is scanned.
- I downloaded ABBYY FineReader PDF.app from the App Store, one of the best-rated apps, with an expensive subscription → the OCR of my text was suboptimal in terms of columns.
- Before trying DevonThink, I even printed the text → scanned it with the Fuji scanner → converted it to a readable document with the scanner’s ABBYY app.

Only DevonThink gave good results in the end.

Question: I thought that DevonThink had an “Import as searchable PDF” menu item, but I can’t find it. Do you know where it is? Thank you.

chrillek · March 25, 2024, 8:18am

AFAIK there’s no such menu. But there’s a smart rule workflow which has been discussed here often.

cgrunenberg · March 25, 2024, 8:23am

It does; see the About dialog.

cgrunenberg · March 25, 2024, 8:24am

Maybe you’re looking for File > Import > Images (with OCR)…?

rufus123 · March 25, 2024, 10:21am

Yes, thank you @cgrunenberg