Strategy to OCR overnight (if necessary)

I would like to scan tens of PDFs and OCR them overnight. In experimenting with DEVONthink Pro 3 beta 3, I was trying to create a smart rule that would run overnight, OCR-ing all PDFs with a zero word count. However, I see hourly and daily settings for performing the actions, but I don’t see a setting to run at a particular time each day (e.g. 3am).

Also, the OCR does happen when I run the smart rule, but it appears that a reference to the non-OCR’d version of the PDF remains.

Maybe this is all unnecessary if the scanning of PDFs can be continuous and DT3 can be used normally while the OCR tasks are also happening. I just don’t want to wait for DT (or my ScanSnap software) to finish OCR before I can efficiently scan the next PDFs.

  1. There is currently no option to set particular times for events to trigger. This is the first request I’ve seen.
  2. The original is only deleted from DEVONthink if you have enabled Preferences > OCR > Original Document: Move to Trash.
  3. OCR is an asynchronous process, so DEVONthink isn’t waiting for it to be finished before it can be used again.
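
Since there is no built-in way to pick a trigger time, one possible workaround is to handle the timing outside DEVONthink. The following is only a minimal sketch of the timing part, written in Python as an assumption; the function name seconds_until is hypothetical, and how the smart rule or OCR would actually be started once the wait is over is not covered in this thread.

```python
from datetime import datetime, time, timedelta


def seconds_until(target: time, now: datetime) -> float:
    """Return the seconds from `now` until the next occurrence of `target`.

    Uses naive local datetimes, so daylight-saving shifts are ignored.
    """
    candidate = datetime.combine(now.date(), target)
    # If the target time has already passed today (or is exactly now),
    # aim for the same time tomorrow instead.
    if candidate <= now:
        candidate += timedelta(days=1)
    return (candidate - now).total_seconds()


if __name__ == "__main__":
    # Example: how long to wait from the current moment until 3am.
    delay = seconds_until(time(3, 0), datetime.now())
    print(f"Next 3am run is in {delay / 3600:.2f} hours")
```

A small script could sleep for the returned number of seconds and then kick off whatever starts the OCR batch; that last step is left open here because it depends on how DEVONthink is being driven.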

I’ve worked with scanning in a few more documents. Here is what I’ve seen: if the OCR action is set in preferences, then it will typically take 10 or more seconds per one-page document (at least on my “long-in-the-tooth”, poorly optimized 2012 Mac Mini) before I have access to it to change the title or set its metadata.

So, for speed’s sake, it seems that scanning the documents, getting them into their groups, and setting their metadata is step 1. OCR-ing later is step 2.

Are there any efficiencies I’m missing?

None that I’m seeing offhand.