I have another, perhaps more complex issue open. But in the meantime, DT3 OCR is not working for incoming scans.
I use ScanSnap to scan, then send to DT3, with “Convert Incoming Scans To searchable PDF” in settings. But while the incoming scan is listed as “PDF+text”, the text layer is not selectable or searchable. When I manually OCR the scan after import, the text layer appears and works. This happens regardless of whether OCR is selected in the ScanSnap settings.
I’m including three files:
The file from ScanSnap, saved to a folder (since I had OCR on in ScanSnap settings, it has a text layer). In Preview, the “content creator” is identified as “ScanSnap Home #iX500”.
The file after being sent to DT3. As you will see, somehow the text layer has been “lost” despite having the “Convert to searchable PDF” setting on for incoming scans. In Preview, the “content creator” is identified as “ABBYY FineReader Engine 12”.
The file after being manually OCR’d in DT3 (by choosing OCR to searchable PDF). As you will see, the text layer is now present and usable. The file is also about three times the size it was when imported.
One further note: the Log is completely empty—nothing has registered on it.
Everything was working fine, and the only change I made was to update DT3 (currently 3.5.1).
Yes, same settings and version of ScanSnap—it made no difference if I selected OCR or not in ScanSnap, but I thought it was particularly interesting and possibly relevant that DT3 seemed to REMOVE a text layer that had been added by ScanSnap.
One question: when you scan a file using the above settings, does something show up in your DT3 Log? I feel like mine used to show activity on incoming scans but no longer does…
No, I have no entries in my DT3 log. Until recently I also used “Convert Incoming Scans” “to searchable PDF”, which worked and also created no entries in the log. I switched to using the smart rule so that the same thing happens to documents regardless of where they originated (scanner, drag & drop, Scanner Pro, etc.).
Try the smart rule - I don’t like workarounds, but seeing as I have no idea what the source of your problem is, and the smart rule would have the desired effect (the document is OCR’d), it would be worth a try.
Presumably this persists after rebooting the Mac? I wonder whether it’s possible that the ABBYY add-on did not install properly? Presumably if you look at DEVONthink 3/Install Add-Ons… ABBYY FineReader OCR is shown as installed?
EDIT: Sorry, just seen your other open thread, so it’s obvious it persists after reboot; also you’ve installed 3.5.1 since the problem first began, and FineReader will have been downloaded/updated and reinstalled at that time
Another oddity—if I open the file in PDF Pro, I can select the text. But I cannot select it in either DT3 or Preview—so it seems like DT3 must be OCRing the file, but not in a way that DT3 or Preview can use.
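One way to take the viewers out of the equation is to look inside the PDF itself. Below is a rough sketch in Python (standard library only) that counts text-showing operators (`Tj`/`TJ`) in a file's Flate-compressed content streams. The function name `count_text_operators` is made up for this example, and it is only a heuristic: it ignores object streams, encryption and filters other than Flate, and a non-zero count only says that text operators exist, not that Preview or DT3 can actually select them.

```python
import re
import zlib

# Raw bytes between the "stream" and "endstream" keywords of each PDF object.
# The lookbehind stops the tail of "endstream" from being read as "stream".
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)\r?\nendstream", re.DOTALL)

# A string ")", array "]" or hex string ">" followed by Tj or TJ.
TEXT_OP_RE = re.compile(rb"[)\]>]\s*T[jJ]\b")


def count_text_operators(pdf_bytes: bytes) -> int:
    """Heuristically count Tj/TJ operators in a PDF's content streams."""
    total = 0
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # decompressobj tolerates slightly truncated or padded data
            data = zlib.decompressobj().decompress(raw)
        except zlib.error:
            data = raw  # not Flate-compressed; scan the bytes as they are
        total += len(TEXT_OP_RE.findall(data))
    return total
```

To use it, read each of the three files as bytes (for example with `pathlib.Path("scan.pdf").read_bytes()`, where `scan.pdf` is a placeholder name) and compare the counts. If the file imported into DT3 gives a non-zero count, that would fit with PDF Pro being able to select the text even though DT3 and Preview cannot.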
Yeah, that’s bizarre. It suggests that DT handles OCR triggered by rule and by hand differently (which might help DT follow this up; it doesn’t help me help you, unfortunately).
Because this seems to affect you but nobody else (or, at least, I haven’t seen any other reports), I wonder whether it would continue happening after a clean install of macOS (which OS are you on, btw? I’m using Catalina) or on a “clean” user account. I don’t know how much time and effort you want to put into this, but my next step would probably be to do a clean install and set up DT and ScanSnap early in the process. Then I’d watch for the same error after installing each piece of additional software.
Maybe somebody has a better idea, though (although it’s noteworthy that there have been no solutions posted on your other thread, which suggests a pretty stumped community to me, as it’s pretty lively otherwise).
Unfortunately, I haven’t got a clue about the inner workings of the PDF framework used by macOS, or whether parts of it could have been replaced when a different piece of software was installed (I remember relevant bits of Windows being replaced willy-nilly by installers, which is not something I have knowingly experienced in macOS).
Have you tried OCRing documents on your MacBook? If you haven’t got the scanner connected to the MB, you could use the smart rule and just drag and drop a file that you have scanned (but not OCR’d) on your desktop.
As I said, you don’t need to scan with the MB: set ScanSnap to not OCR, and save as a file rather than to DT; then just AirDrop the file to your MB, and drag and drop it into the inbox (after setting up a smart rule).
Sure, make it a simpler rule: set the trigger to “on import” and leave out “after sync”; then it should only touch PDFs arriving in your inbox without a text layer. Obviously you also don’t need to use the Change Creation Date etc., although you might leave in the change name section, just to prove something has actually happened to the document (as we don’t really know at which stage things are going wrong).
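To spell out what the simplified rule is meant to match, here is a tiny Python model of its logic. It is not DEVONthink scripting and none of these names come from DT: `Item`, `rule_matches` and the use of a zero word count as a stand-in for “no text layer” are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Item:
    kind: str        # e.g. "PDF" or "image" (hypothetical labels)
    location: str    # e.g. "inbox"
    word_count: int  # 0 is assumed here to mean "no text layer"


def rule_matches(item: Item, event: str) -> bool:
    """Return True when the simplified rule should OCR (and rename) the item."""
    return (
        event == "on import"          # the only trigger; "after sync" is left out
        and item.location == "inbox"  # only documents arriving in the inbox
        and item.kind == "PDF"
        and item.word_count == 0      # skip PDFs that already have text
    )
```

With this model, a freshly imported image-only PDF in the inbox matches, while the same PDF seen “after sync”, or a PDF that already has a text layer, is left alone. Keeping the rename step in the real rule gives a visible sign that the rule actually fired.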