Automate OCR on a folder

Is there any way to have OCR run automatically on any image or PDF that is dropped into a specific folder?

Yes, this is possible in DEVONthink 3 via smart rules.

And yes, ideally the rule should target a specific group for a more controlled experience.

Here I have created a target group in the Inbox of a database, then control-clicked it and chosen New Smart Rule.

Here is the smart rule…

This detects images or PDFs with no Word Count that were imported to this group Today. It then runs OCR and files them in an OCRd PDFs group in the root of the database. This keeps the drop folder clean for subsequent drops.

Also, to keep things even cleaner, I created a secondary smart rule that refiles PDFs that have a Word Count greater than 1.
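
The matching logic of these two rules can be sketched in plain Python. This is only an illustration of the conditions, not DEVONthink's scripting interface: the `Item` class, its field names, and the `route` function are hypothetical, and the returned strings merely label which rule would fire.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Item:
    # Hypothetical stand-in for a document in the target group (not a real API).
    name: str
    kind: str        # "pdf", "image", or anything else
    word_count: int  # 0 means there is no text layer yet
    added: date


def needs_ocr(item: Item, today: date) -> bool:
    """First rule: an image or PDF with no word count, added today."""
    return (
        item.kind in ("pdf", "image")
        and item.word_count == 0
        and item.added == today
    )


def already_searchable(item: Item) -> bool:
    """Second rule: a PDF whose word count is greater than 1."""
    return item.kind == "pdf" and item.word_count > 1


def route(item: Item, today: date) -> str:
    """Return a label for the rule that would act on the item."""
    if needs_ocr(item, today):
        return "ocr"      # run OCR, then file into the OCRd PDFs group
    if already_searchable(item):
        return "refile"   # no OCR needed, just move it out of the drop folder
    return "skip"         # neither rule matches
```

Keeping the two conditions separate mirrors the two smart rules: one decides what needs OCR, the other only tidies up documents that are already searchable.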

So there are two simple but powerful things you can do with smart rules.

(And obviously, they can be changed or extended, as needed.)


Thanks for your help. I’m happy with a simpler formula, but it doesn’t seem to be working when I drop a PDF into the folder. What is incorrect about my Smart Rule?

Your smart rule has no relevant action at the end, merely Cancel.

Stephen


Got it working, thanks all!

I would suggest you don’t go for the “simpler rule” as there’s no point in doing OCR on documents that don’t need it. Your call, but I advise being more specific in your targeting.

What is the difference between “File into” and “Move to”? For example, I used your rule but wanted to file the OCR’d PDF or image back into the Inbox of the database. I couldn’t figure out a way to do this: I was always creating subfolders. It worked only when I used “Move to:”. Thanks.

Move moves items into chosen existing groups.
File allows you to specify groups via their location, even creating locations on-the-fly.

File to /Processed Files would use or create a Processed Files group in the root of the database.

From the built-in Help > Documentation > Appendix > Smart Rule Events and Actions
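
As a rough mental model of that difference, here is a small Python sketch that treats a database as nested dictionaries. It is not how DEVONthink is implemented or scripted; `move_to`, `file_to`, and the dictionary layout are hypothetical and only illustrate "existing group" versus "location created on the fly".

```python
def _parts(path: str) -> list[str]:
    """Split a location such as '/Processed Files' into group names."""
    return [p for p in path.split("/") if p]


def move_to(database: dict, path: str) -> dict:
    """Move-style lookup: every group on the path must already exist."""
    group = database
    for name in _parts(path):
        if name not in group:
            raise KeyError(f"group does not exist: {name}")
        group = group[name]
    return group


def file_to(database: dict, path: str) -> dict:
    """File-style lookup: missing groups along the path are created."""
    group = database
    for name in _parts(path):
        group = group.setdefault(name, {})
    return group


# Hypothetical database containing only an Inbox group.
db = {"Inbox": {}}
file_to(db, "/Processed Files")  # uses or creates the group in the root
move_to(db, "/Inbox")            # works because the Inbox already exists
```

In this model, filing to a path that does not match an existing group quietly creates a new one, which would explain ending up with unexpected subgroups when the intent was simply to land in the Inbox.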

OK, I understand about creating new groups from “File to:”, and that’s great; I had read the documentation before writing. Maybe I don’t understand Inboxes. Are they a separate group at the root level of a database, or something else entirely? I just wanted to file the OCR’d PDF or image into the inbox for that database, as I clean out Inboxes for my databases weekly.

Are they a separate group at the root level of a database, or something else entirely?

The Inbox is a special group in the database, typically shown in the Globals section of the Navigate sidebar. It is still part of the database it belongs to though (except the Global Inbox, which is its own database).

I just wanted to file the OCR’d PDF or image into the inbox for that database, as I clean out Inboxes for my databases weekly.

Then the easiest option is to pick the Inbox of the desired database in a Move action (though you haven’t specified what location you’re targeting in the smart rule).
This is a simple example targeting the Global Inbox and moving an item to the Inbox of the Data database…

That’s what I ended up doing. I specified the Inbox of the database. “File to” seems to have two things going for it that “Move to” doesn’t: it can create subfolders (within the current group only?) and move a file there, and it can use a placeholder to define a location to move the file to. I will have to read up on placeholders to see what options that might give me.

That is correct.
Here’s an example using placeholders to corral clipped items in my Global Inbox.

Note:

  • I am being very specific in my targeting of the Global Inbox, types I want to filter, and a timeframe they were captured in.
  • This is an example of using the File action and static text with dynamic placeholders.
  • Another side benefit of this construction is it generates a new group daily.
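
To show why static text plus a date placeholder yields a new group each day, here is a minimal Python sketch. It uses Python's own `strftime` codes rather than DEVONthink's placeholder syntax, and the `/Clippings` prefix and `daily_group_path` function are hypothetical names chosen for illustration.

```python
from datetime import date


def daily_group_path(prefix: str, day: date) -> str:
    """Combine a static prefix with a date-based group name."""
    return f"{prefix}/{day.strftime('%Y-%m-%d')}"


# Two consecutive days resolve to two different destination groups.
monday = daily_group_path("/Clippings", date(2024, 3, 4))
tuesday = daily_group_path("/Clippings", date(2024, 3, 5))
```

Because the resolved location changes with the date, a File-style action that creates missing groups would start a fresh group on the first match of each day.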

PS: Move is also easier for many people to use, i.e., it’s a bit less nerdy than using File. 🙂


Well, File gets my Inner Nerd going, so… 😊

Thanks, as always, for your help.

You’re welcome. Nerd away… 😛