Import PDF, OCR and rename using Automator.

As the subject line suggests, I want to use Automator to move a PDF into DTP, OCR the resulting import, and then move the document to a new group with a new name. However, I’m stuck on a couple of issues.
I should also note that I want to do this without using AppleScript.

Looking at the DTP OCR action in Automator, I couldn’t see any way to apply it from within DTP, so I took the following approach: I indexed a folder external to DTP, intending to apply the OCR action to the file as seen from the Finder. To test this I created an Automator Service with the following steps.

  1. Get Selected Finder Items
  2. OCR Items (The DTP function)
  3. Rename PDF documents

This ‘sorta’ works but with the following problems:

  1. I end up with two PDF documents, both of which have a text layer.
  2. The resulting PDFs are moved to whichever group currently has focus within DTP. I’ve tried controlling this by adding Automator actions that open a specific database and make a particular group active, but that doesn’t work and the PDFs still go to the last highlighted group in DTP.
  3. The renaming action doesn’t work, perhaps because it’s a Finder action and the new file is inside DTP.

I searched around the forum for help on this, but it seems that a lot of solutions use AppleScript. I couldn’t find any documentation on an Automator approach.

Ultimately I want to use Hazel to detect downloaded files and move them to a folder where they can be imported into DTP, processed and stored in a specific group.

I would really appreciate some guidance on how to apply the OCR function using Automator and any other wisdom that anyone can impart on this subject.


Is there a reason why you do not want to use AppleScript?

Is there a reason you want an Automator service for this? Hazel works fine with AppleScript: you can have a script that is incorporated into Hazel itself, have Hazel run an external script, or have Hazel put the file into a folder with a folder action attached. If you expect the file to be imported and OCR’d when Hazel puts it into a folder, then you should be using a folder action and not a service. The folder action relevant to what you want to do is “DEVONthink - Import, OCR & Delete.scpt”, which is in ~/Library/Scripts/Folder Action Scripts.

By default, when DEVONthink imports it places the document in the destination you’ve configured in Preferences > Import > Destination. Out of the box, the folder action mentioned above will place the file into that destination. There are several examples posted of a customized folder action in the forum that will place the file into a specific database > group instead.

If you still must use Automator then you should be aware that:

  • when DEVONthink OCRs PDFs it does not delete the non-OCR’d copy – of course, that folder action mentioned above takes care of this for you
  • if you OCR an item that is already “PDF+Text” then you’ll end up with two PDFs with text layers
  • the “Rename PDF Documents” Automator action is not a DEVONthink action; it is a Finder action, as you pointed out, and won’t work for the reason you gave
  • you do need to set database > destination in Automator by using these actions:

      set current group
      add items to current group
      ocr items