OCR issues persisting

I have another, perhaps more complex issue open. But in the meantime, DT3 OCR is not working for incoming scans.

I use ScanSnap to scan, then send to DT3, with “Convert Incoming Scans” set to “to searchable PDF” in settings. But while the incoming scan is listed as “PDF+text”, the text layer is not selectable or searchable. When I manually OCR the scan after import, the text layer appears and works. This happens regardless of whether OCR is selected in ScanSnap settings.

I’m including three files:

  1. The file from ScanSnap, saved to a folder (since I had OCR on in ScanSnap settings, it has a text layer). In Preview, the “content creator” is identified as “ScanSnap Home #iX500”.
  2. The file after being sent to DT3. As you will see, somehow the text layer has been “lost” despite having the “Convert to searchable PDF” setting on for incoming scans. In Preview, the “content creator” is identified as “ABBYY FineReader Engine 12”.
  3. The file after being manually OCR’d in DT3 (by choosing OCR to searchable PDF). As you will see, the text layer is now present and usable. The file is also about three times the size it was when imported.
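For anyone who wants to compare the three files without opening each one in Preview, here is a rough Python sketch that pulls the `/Creator` and `/Producer` entries out of a PDF's raw bytes. It is a heuristic, not a real PDF parser: the function name `pdf_info_fields` is made up for illustration, and it only sees metadata stored as plain literal strings, so it will miss entries that sit in compressed object streams, hex strings, or encrypted files.

```python
import re

# Heuristic only: matches literal-string values such as "/Creator (Some App)".
# Metadata in compressed object streams, hex strings, or encrypted files is not found,
# and unescaped nested parentheses inside a value are not handled.
_INFO_FIELD = re.compile(rb"/(Creator|Producer)\s*\(((?:\\.|[^\\()])*)\)")


def pdf_info_fields(data: bytes) -> dict[str, str]:
    """Return the last /Creator and /Producer literal strings found in raw PDF bytes."""
    fields: dict[str, str] = {}
    for match in _INFO_FIELD.finditer(data):
        key = match.group(1).decode("ascii")
        raw = match.group(2)
        # Undo the backslash escapes for parentheses and the backslash itself.
        value = re.sub(rb"\\([()\\])", rb"\1", raw)
        fields[key] = value.decode("latin-1")
    return fields
```

Something like `pdf_info_fields(open("2 DT3 import.pdf", "rb").read())` would then show which application last wrote each file, assuming the metadata is stored uncompressed.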

One further note: the Log is completely empty—nothing has registered on it.

Everything was working fine, and the only change I made was to update DT3 (currently 3.5.1).

Any idea what is going on here?

Thanks kindly,

William

1 ScanSnap.pdf (728.8 KB) 2 DT3 import.pdf (691.7 KB) 3 DT3 manual OCR.pdf (1.9 MB)

Because I am using the same setup without any trouble at all (the same, other than that I have not set ScanSnap to OCR), here are my ScanSnap Settings; are yours the same?




As a workaround, you might try setting “Convert incoming Scans” to no action and then use a smart rule:


That smart rule would only trigger if you have no text layer at all, otherwise it would not perform OCR. In my list of smart rules, this one is at the top, ensuring it runs on all documents.

Yes, same settings and version of ScanSnap—it made no difference if I selected OCR or not in ScanSnap, but I thought it was particularly interesting and possibly relevant that DT3 seemed to REMOVE a text layer that had been added by ScanSnap.

One question: when you scan a file using the above settings, does something show up in your DT3 Log? I feel like mine used to show activity on incoming scans but no longer does…

Thanks in advance!

No, I have no entries in my DT3 log; until recently I also used “Convert Incoming Scans” “to searchable PDF”, which worked and also created no entries in the log. I switched to using the smart rule so that the same thing happens to documents regardless of where they originated (scanner, drag & drop, Scanner Pro etc.)

Try the smart rule - I don’t like workarounds, but seeing as I have no idea what the source of your problem is, and the smart rule would have the desired effect (the document is OCR’d), it would be worth a try.

Nope! Created an identical smart rule and it doesn’t work—exact same problem. If OCR is turned off in ScanSnap, DT3 does not create a selectable text layer using that rule. Something is wonky.

Presumably this persists after rebooting the Mac? I wonder whether it’s possible that the ABBYY add-on did not install properly? Presumably if you look at DEVONthink 3/Install Add-Ons… ABBYY FineReader OCR is shown as installed?

EDIT: Sorry, just seen your other open thread, so it’s obvious it persists after reboot; also you’ve installed 3.5.1 since the problem first began, and FineReader will have been downloaded/updated and reinstalled at that time

Yes, it’s been happening for a while now—many reboots.

And yes, ABBYY is grayed out (meaning installed).

The odd thing is when I manually do it (right click file, OCR, convert to searchable PDF), it works fine. But it almost triples the size of the file.

Another oddity—if I open the file in PDF Pro, I can select the text. But I cannot select it in either DT3 or Preview—so it seems like DT3 must be OCRing the file, but not in a way that DT3 or Preview can use.

yeah, that’s bizarre - it suggests that DT handles OCR triggered by rule and by hand differently (which might help DT follow this up; it doesn’t help me help you, unfortunately)

Because this seems to affect you, but nobody else (or, at least, I haven’t seen any other reports) I wonder whether it would continue happening after a clean install of macOS (which OS are you on btw? I’m using Catalina), or on a “clean” user account. I don’t know how much time and effort you want to put into this, but my next step would probably be to do a clean install, and set up DT and ScanSnap early in the process. Then I’d be watching for the same error after installing each piece of additional software.

Maybe somebody has a better idea, though (although it’s noteworthy that there have been no solutions posted on your other thread - suggests a pretty stumped community to me, as it’s pretty lively otherwise)

Don’t think I’m up for that—I just set up this machine less than two months ago, so I can’t believe that’s necessary here. But I really appreciate your help!

I unfortunately haven’t got a clue of the inner workings of the PDF Framework used by macOS - and whether or not parts of that could have been replaced when a different piece of software was installed (I remember relevant bits of Windows being replaced willy nilly by installers, not something I have knowingly experienced in macOS)

I’m quite sure it’s not necessary - I just don’t know the more direct solution :see_no_evil: Jim @BLUEFROG help me out here :exploding_head:

(which OS are you on?)

I’m on 10.15.5 (19F101)

And further confirmation: I just opened my MacBook (synced via iCloud), and my DT3 install there succeeds and fails at selecting text in exactly the same documents as on the desktop.

I presume you have PDF Pro on your MacBook too?

Have you tried OCRing documents on your MacBook (you could use the smart rule, and then just drag and drop a file which you have scanned (but not OCRd) on your desktop, if you haven’t got the scanner connected to the MB)?

Yes, but it’s PDF Expert, not PDF Pro—my error.

And I will try scanning with my MacBook—it’s a pain right now because the wireless isn’t set up and the USB cable is routed through my desk. But I’ll do it tonight some time.

:exploding_head: blast - I’ve got that on my devices too, so that’s not going to help us…

As I said, you don’t need to scan with the MB - set ScanSnap to not OCR, and save as a file rather than to DT; then just airdrop the file to your MB, and drag and drop into the inbox (after setting up a smart rule)

Okay, but I deleted that smart rule, because for some reason it was OCRing everything in my Inbox—is there a way to create it so it won’t do that?

sure, make it a simpler rule: set the trigger to “on import” and leave out “after sync” - then it should only touch PDFs arriving in your inbox without a text layer. Obviously you also don’t need to use the Change Creation Date etc. although you might leave in the change name section, just to prove something has actually happened to the document (as we don’t really know at which stage things are going wrong)
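To make the logic of that simplified rule explicit, here is a small Python sketch of the decision it is meant to encode. This is not DEVONthink's API or scripting interface: the `Record` class, the event names, and the use of a zero word count as the test for "no text layer" are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical stand-in for a document arriving in the inbox."""
    name: str
    kind: str        # e.g. "pdf" or "markdown" (illustrative values)
    word_count: int  # assumption: 0 means the document has no text layer


def should_ocr(record: Record, event: str) -> bool:
    """Mimic the simplified rule: act on import only, on PDFs with no text layer."""
    if event != "on_import":  # "after sync" is deliberately left out
        return False
    if record.kind != "pdf":
        return False
    return record.word_count == 0


# A freshly imported scan with no text is picked up; synced or already-OCR'd items are not.
print(should_ocr(Record("scan", "pdf", 0), "on_import"))    # True
print(should_ocr(Record("scan", "pdf", 0), "after_sync"))   # False
print(should_ocr(Record("scan", "pdf", 120), "on_import"))  # False
```

Dropping the sync trigger is what should stop the rule from re-processing everything already in the Inbox. Note that in this thread the imported scan is already listed as “PDF+text”, so a check like this may consider it to have a text layer even though the text cannot be selected.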

Okay, will try that—thanks again!