Import PDF, OCR and rename using Automator.

dorich · September 21, 2016, 5:57am

As the subject line suggests, I want to use Automator to import a PDF into DTP, OCR the imported document, and then move it to a new group under a new name. However, I’m stuck on a couple of issues.
I should also note that I want to do this without using AppleScript.

Looking at DTP’s OCR action in Automator, I couldn’t see any way to apply it to a document that is already inside DTP, so I took the following approach. I indexed a folder external to DTP, intending to apply the OCR action to the file as selected in the Finder. To test this, I created an Automator Service with the following steps:

Get Selected Finder Items
OCR Items (The DTP function)
Rename PDF documents

This ‘sorta’ works but with the following problems:

I end up with two PDF documents, both of which have a text layer.
The resulting PDFs are placed in whichever group currently has focus in DTP. I’ve tried controlling this by adding Automator actions that open a specific database and make a particular group active, but that doesn’t work and the PDFs still go to the last highlighted group in DTP.
The renaming action doesn’t work, perhaps because it’s a Finder action and the new file is inside DTP.

I searched the forum for help on this, but most of the solutions seem to use AppleScript. I couldn’t find any documentation on an Automator approach.

Ultimately, I want to use Hazel to detect downloaded files and move them to a folder from which they can be imported into DTP, processed, and stored in a specific group.

I would really appreciate some guidance on how to apply the OCR function using Automator and any other wisdom that anyone can impart on this subject.

Thanks

korm · September 21, 2016, 9:55am

Is there a reason why you do not want to use AppleScript?

Is there a reason you want an Automator service for this? Hazel works fine with AppleScript: you can embed a script in the Hazel rule itself, have Hazel run an external script, or have Hazel put the file into a folder that has a folder action attached. If you expect the file to be imported and OCR’d as soon as Hazel puts it into a folder, you should be using a folder action, not a service. The relevant folder action for what you want to do is “DEVONthink - Import, OCR & Delete.scpt”, which is in ~/Library/Scripts/Folder Action Scripts.

By default, when DEVONthink imports a document, it places it in the destination you’ve configured in Preferences > Import > Destination. Out of the box, the folder action mentioned above will place the file into that destination. Several examples of a customized folder action that places the file into a specific database > group instead have been posted in the forum.

If you still must use Automator then you should be aware that:

when DEVONthink OCRs a PDF, it does not delete the non-OCR’d copy; the folder action mentioned above takes care of this for you
if you OCR an item that is already “PDF+Text”, you’ll end up with two PDFs with text layers
the “Rename PDF Documents” Automator action is not a DEVONthink action; it is a Finder action, as you pointed out, and won’t work for the reason you gave
you do need to set the database > destination in Automator, by using these actions:

set current group
add items to current group
ocr items
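
For anyone who wants to prototype the Hazel hand-off from the original question before building a rule, here is a minimal Python sketch of just that step: pick up PDFs from a downloads folder, give each a new name, and move it into the folder that the folder action watches. It does not talk to DEVONthink at all; the import and OCR are still left to the folder action. The function name hand_off_pdfs, the "scan" prefix, and the date-based naming scheme are all made up for illustration and are not part of Hazel or DEVONthink.

```python
import shutil
from datetime import date
from pathlib import Path


def hand_off_pdfs(downloads: Path, watched: Path, prefix: str = "scan") -> list[Path]:
    """Move every PDF in `downloads` into `watched` under a dated name.

    Hypothetical helper: `watched` stands in for the folder that has the
    import/OCR folder action attached. Nothing here calls DEVONthink.
    """
    watched.mkdir(parents=True, exist_ok=True)
    stamp = date.today().strftime("%Y-%m-%d")
    moved = []
    for item in sorted(downloads.iterdir()):
        # Only regular files with a .pdf extension (any letter case).
        if not item.is_file() or item.suffix.lower() != ".pdf":
            continue
        target = watched / f"{prefix}-{stamp}-{item.name}"
        # Never overwrite a file that is still waiting to be imported.
        counter = 1
        while target.exists():
            target = watched / f"{prefix}-{stamp}-{counter}-{item.name}"
            counter += 1
        shutil.move(str(item), str(target))
        moved.append(target)
    return moved
```

Calling hand_off_pdfs(Path.home() / "Downloads", some_watched_folder) would move the PDFs and return their new paths, where some_watched_folder is whichever folder you attached the folder action to. Renaming before the move avoids the problem noted above, where a Finder-level rename cannot reach a file once it is inside DEVONthink.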