Problem with OCR - creating searchable PDF

HerbertHoerner · October 13, 2022, 7:54pm

When I try to create a searchable PDF (the source is a scanned PDF), I always receive these messages:

13.10.22, 21:50:00: The OCR process failed.	Creation of the PDF file failed.
13.10.22, 21:50:00: Creation of the PDF file failed.

The problem occurs on different machines (an old installation that has worked for months without changes, as well as a new machine with a fresh installation).
Databases can be read and written to. ABBYY FineReader is installed on all machines.
Are there any hints for solving this issue?

HerbertHoerner · October 13, 2022, 8:32pm

I would like to add: when I start OCR, it runs through the document (I can see this in the Activity window). But when it starts storing the PDF, it hangs.

chrillek · October 14, 2022, 7:57am

Does this happen with all PDFs (even single page ones) or only with a particular one? In the latter case, I guess you should open a support ticket.

Also, it would be helpful to describe in which way you “try to create a searchable PDF”. Smart rule, context menu, any other steps?

HerbertHoerner · October 14, 2022, 10:57am

It happens with single-page as well as multi-page PDF files.
The problem first arose with smart rules.
While investigating, I found that the same happens when I manually convert a specific file to a “searchable PDF”.
Next, I tested whether this is related to network resources (I use a NAS drive as the source for the Inbox).
When I OCR to a searchable PDF and the file is on the NAS drive, OCR runs through the pages. When saving the file, the process hangs, and the save activity is shown until I cancel the process manually.
When I OCR the same document from within the database (manually copied from the NAS to the database's Inbox), OCR runs through and the process does not hang. But no searchable PDF is generated either.
This happens the same way in different databases.

cgrunenberg · October 14, 2022, 10:58am

How is the drive connected/mounted?

HerbertHoerner · October 14, 2022, 11:00am

Via SMB. But see my previous post: it also occurs with local files.

HerbertHoerner · October 14, 2022, 11:07am

A first finding from testing:
When I first convert the PDF from the scanner into a paginated PDF and then OCR the document in a second step, it works.
The scanner was not changed or updated.

cgrunenberg · October 14, 2022, 11:07am

What kind of Mac (Apple or Intel chip) and scanner do you use?

HerbertHoerner · October 14, 2022, 11:55am

One MacBook Pro M1 and one MacBook Air M2.
The scanner is a Brother ADS-2800W.

It worked fine for more than a year … but some update seems to have broken the process.

HerbertHoerner · October 14, 2022, 12:14pm

When I convert the PDF from the scanner into a paginated PDF as a separate additional step, OCR works.
After conversion to a paginated PDF, the visible difference is a small icon that looks like a square smiley. What does this icon mean?

cgrunenberg · October 14, 2022, 12:20pm

A screenshot of the icon would be useful, thanks!

chrillek · October 14, 2022, 1:03pm

Cf. “Iconography” in the manual.

BLUEFROG · October 14, 2022, 1:56pm

A squared smiley is a miniature Finder icon.
This property icon indicates an indexed file.

aedwards · October 14, 2022, 4:40pm

Are you using the Brother scanner software to generate the PDF? If you are, try using either the scanner software in DEVONthink or Apple's Image Capture. Does a PDF scanned with either of these OCR correctly?

clearsky · October 16, 2022, 6:19pm

Hello folks, I’m new here. I have exactly the same problem. My Brother ADS-2800W scans via SMB to a Synology DiskStation; the files are synced with Synology Drive to a local folder on my MacBook Pro M1, and DEVONthink monitors this folder. Files have not been OCRed for a few days. I think the problem started when DEVONthink reloaded the latest ABBYY FineReader plugin.

clearsky · October 16, 2022, 6:29pm

The log shows “Creation of PDF file failed” (translated from German, so the exact English message may differ).

cgrunenberg · October 17, 2022, 7:30am

Does it work using the Image Capture application or scanning via DEVONthink?

BLUEFROG · October 17, 2022, 11:19am

Welcome @clearsky

This is an issue with the output from the Brother scanner not conforming to the data expected by the OCR engine.

Try opening and re-saving the scanned file in Preview, then try the OCR again.

clearsky · October 18, 2022, 1:59pm

The Brother scans to a folder on a Synology NAS, which is synced to my Mac. The import into DEVONthink runs via AppleScript and folder monitoring. It worked fine for over 2 years.

clearsky · October 18, 2022, 2:00pm

According to the log, the OCR itself works fine. The problem occurs when DT tries to save the file.