I need some help with my OCR workflow. I am using a ScanSnap scanner that saves scans into an indexed folder, Scansnap, on my Mac. I want DEVONthink to perform OCR on these files because its engine is superior, but I want the final searchable file to stay in (or return to) that same indexed Finder folder.
I am experiencing the following issues:
Missing Option: In my Smart Rule actions, when I select “OCR to searchable PDF” or “Apply”, I do not see the option to “Replace Original”. It is simply not available in the dropdown menu.
Duplicates: Because I cannot “Replace Original”, the rule creates a new file. This triggers the rule again (on “Import”), leading to 3 or 4 duplicates of the same file.
My current setup:
Trigger: On Import
Conditions: Kind is PDF/PS; Word count is 0
Action: OCR apply, move to trash
Goal: OCR the file, replace the non-searchable original in the indexed Finder folder, and keep only one file.
How can I get the “Replace Original” option to show up, or what is the best way to handle OCR for indexed folders to avoid duplicates?
Additionally, I would like to know which global settings are required under Preferences > Import and Preferences > OCR to support this workflow correctly. For example, should ‘Incoming Scans’ be set to ‘Convert to searchable PDF’ globally (I have now set to no action), or should I leave that to the Smart Rule only? And where should the ‘Output’ (Save results in) be directed when working with indexed folders?
What version of DEVONthink and macOS are you running?
I do not see the option to “Replace Original”. It is simply not available in the dropdown menu.
I’m not sure why you’re looking for this as there is no such documented smart action or command.
For example, should ‘Incoming Scans’ be set to ‘Convert to searchable PDF’ globally (I have now set to no action)
“Incoming Scans” refers to documents received from certain scanners using a “Send to” (or similar) function with DEVONthink. If you’re scanning to a Finder folder you’re indexing, this does not apply.
And where should the ‘Output’ (Save results in) be directed when working with indexed folders?
Please be more clear on the details.
Assuming you are referring to the ScanSnap scanning profile, point it to the indexed Finder folder. See below.
Do not point the “Send to” setting at any application. You’re not doing the same process a ScanSnap scan normally does relative to DEVONthink.
Action: OCR apply, move to trash
That is incorrect. You’re deleting your OCR’d document.
This is far less complicated than you’re imagining.
Scans In Finder folder, indexed into your database. This is ideally not in a cloud-synced folder. It is known and documented that cloud-synced folders don’t always provide notifications about filesystem changes, so they may not reliably trigger On Import. This is a temporary location, which is perfectly fine. Back in the day, this was known as a watched folder.
Scans Out Finder folder, indexed into a database. This is okay if it’s in a cloud-synced folder as we’ll just be pushing files around. However, don’t use a cloud-synced folder if it’s not needed.
Only matches documents without an OCR done tag. A filtering method.
Thinking about ways to keep from reprocessing the same files is a good habit to be in.
Allows for images or PDF documents with no text layer
The rule triggers manually or On Import, that is, when the Finder folder receives the scan and the Finder notifies DEVONthink of it.
OCR > Apply does OCR and produces no intermediate document so there’s nothing to clean up.
Add the OCR done tag.
Move the document to the indexed Scans Out Finder folder.
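For anyone who finds it easier to read the rule as logic, here is a rough Python sketch of the conditions and actions listed above. It is only an in-memory model, not DEVONthink's scripting API: `Record`, `matches`, `run_rule`, and `fake_ocr` are hypothetical names made up for illustration, and the OCR step is a stand-in that merely pretends a text layer was added.

```python
from dataclasses import dataclass, field
from pathlib import Path

OCR_DONE_TAG = "OCR done"  # the tag used as the filter in the rule


@dataclass
class Record:
    """Hypothetical stand-in for an indexed document (not a DEVONthink type)."""
    path: Path
    kind: str                      # "pdf" or "image"
    word_count: int = 0            # 0 means no text layer
    tags: set = field(default_factory=set)


def matches(record: Record) -> bool:
    """Conditions: no 'OCR done' tag, and an image or a PDF with no text layer."""
    if OCR_DONE_TAG in record.tags:
        return False
    return record.kind == "image" or (record.kind == "pdf" and record.word_count == 0)


def fake_ocr(record: Record) -> None:
    """Placeholder for OCR > Apply: pretends the same document gained a text layer."""
    record.word_count = 100


def run_rule(records, scans_out: Path, ocr=fake_ocr):
    """Actions, in order: OCR in place, add the tag, move to Scans Out."""
    processed = []
    for record in records:
        if not matches(record):
            continue
        ocr(record)                                    # no intermediate document
        record.tags.add(OCR_DONE_TAG)                  # prevents reprocessing
        record.path = scans_out / record.path.name     # move to the output folder
        processed.append(record)
    return processed
```

Running `run_rule` a second time over the same records returns an empty list, because every processed record now carries the tag. That is the point of the filtering condition: the rule cannot retrigger on its own output.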
Notice the groups are indexed and the Scans Out documents are indexed. Exactly as requested.
Dear Jim,
Thank you!
Your smart rule is working! It was indeed a conceptual error on my part regarding the ScanSnap settings: I was sending the scans to DEVONthink.
I use DT 4.2.2 and Sequoia 15.7.4.
Thank you again for pointing me in the right direction and for the great support!