OCR & Hazel Workflow on Finder before importing into DT

Hello everyone,

I’d like to set up a Finder-based workflow that takes care of OCR and renaming before files are sent into DEVONthink. Hazel is pretty neat here and already helps me a lot.

I would like to set up a folder that is checked for whether the PDF files in it have already been OCRed. If so, those files are passed on to the Hazel workflows. If not, I’d like to OCR them first and send them into the Hazel workflows afterwards. So far, so easy.

I am looking for a decent and effective way to OCR those files. The best thing would be to make use of the great OCR tool that comes with DEVONthink. Is this somehow possible? Or are there any other bright solutions you can share with me?

Many thanks and all the best.

The OCR engine of DEVONthink is scriptable; see the OCR commands suite in its scripting dictionary.

However, DEVONthink’s smart rules might also be able to handle the workflow.

Thanks.
Are the smart rules inside of DT able to extract date and names out of a document and put them into a filename?

I need to create a lot of different tasks. Since there seems to be no way inside DT to sort the rules into subfolders, things get messy and very hard to keep an overview of. That is one of the reasons why I’d like to keep this outside DT.

Since it is scriptable, I could start with that. Can anyone give me a hint towards the correct script (since I am not that good with scripting :D)?

Many thanks.

You’d have to test this on a case-by-case basis, but dates can be extracted from documents.
Names can be too, but you’d have to use a regular expression to scrape them from the contents.
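To illustrate the regular expression idea outside of smart rules, here is a minimal JavaScript sketch. It assumes the OCR’d text is already available as a string; the function names, the DD.MM.YYYY date format and the "From:" line are all made up for the example and would need adapting to your documents.

```javascript
// Hypothetical helpers: build a file name from OCR'd text.
// The patterns are assumptions for illustration, not anything DEVONthink or Hazel prescribes.

function extractDate(text) {
  // Matches a date written as DD.MM.YYYY and returns it as YYYY-MM-DD.
  const m = text.match(/\b(\d{1,2})\.(\d{1,2})\.(\d{4})\b/);
  if (!m) return null;
  const [, day, month, year] = m;
  return `${year}-${month.padStart(2, '0')}-${day.padStart(2, '0')}`;
}

function extractSender(text) {
  // Assumes the document contains a line such as "From: Example Corp".
  const m = text.match(/^From:\s*(.+)$/m);
  return m ? m[1].trim() : null;
}

function buildFileName(text, fallback) {
  const date = extractDate(text);
  const sender = extractSender(text);
  // Keep the fallback name when either piece cannot be found.
  if (!date || !sender) return fallback;
  // Replace characters that are awkward in file names.
  const safeSender = sender.replace(/[\/:]/g, '-');
  return `${date} ${safeSender}.pdf`;
}

const sample = 'From: Example Corp\nInvoice date: 3.11.2020\nTotal: 42.00';
console.log(buildFileName(sample, 'unnamed.pdf')); // 2020-11-03 Example Corp.pdf
```

Falling back to a fixed name when nothing matches keeps unrecognised documents from being renamed to something misleading.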

I think it is quite a common workflow for users (me included) to run Hazel rules for various purposes before sending a file to DEVONthink. This is discussed in an older thread here,

You could use some other tools to OCR your files in an automated way in Finder, then OCR again when you ask Hazel to send to DEVONthink via a Folder action like, … Devonthink - Import, OCR and Delete.scpt

I would definitely recommend using smart rules + indexing instead of Hazel and/or folder actions if possible, as many workflows can now be automated without external tools or scripts. It’s easier to set up and maintain, and less likely to break. Hazel still has its advantages in some cases, though, and suggestions & requests are always welcome.

Both methods ocr and convert image take an “object” as their first parameter. What exactly is this object – a DT record? a file or an alias to a file or …?

This might be a glitch of the script suite’s description, the really relevant attributes are record and file.

OK, I’ll try that one out. Nevertheless, I suggest reworking these descriptions to make them easier to understand (and correct, too ;-). There was a thread in 2016 about that already.

  • convert image returns “a record or an image record” – very confusing. I guess it always returns a record (what is an “image record” even?)
  • [type: …/"Word"] - really, I can convert an image to a Word file?
  • ocr… “file: Text: An image file” should be the path to/of an image file

@BLUEFROG ?

Please fix the scripting library documentation on this!

The documentation for at least ocr is broken. The only way I could get this to work was by using
ocr(theFile, {file: theFile}) (JavaScript syntax)
which is counterintuitive and can by no means be deduced from what the scripting dictionary says. Leaving out either the first parameter or the {file: ...} part resulted in the error message “missing parameter” (which is a masterpiece of obscurity).
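For anyone who wants to try this, here is a small sketch of that call shape. It only wraps what I described above; the helper name, the example path and the stand-in object are made up so the snippet can be run without DEVONthink, and nothing here is taken from the scripting dictionary.

```javascript
// Sketch of the call shape reported above: ocr(theFile, {file: theFile}).
// `app` is expected to be the DEVONthink application object,
// e.g. Application('DEVONthink 3') when run as JavaScript for Automation.
function ocrIntoInbox(app, path) {
  // The path is passed twice on purpose: once as the first argument
  // and once as the `file` attribute.
  return app.ocr(path, { file: path });
}

// Stand-in object (NOT the real API) that records how it was called.
const fakeApp = {
  calls: [],
  ocr(first, attributes) {
    this.calls.push({ first, attributes });
    return { name: 'fake record' };
  },
};

// Hypothetical path, for illustration only.
ocrIntoInbox(fakeApp, '/Users/me/Scans/scan.tiff');
console.log(JSON.stringify(fakeApp.calls[0]));

// In Script Editor or via `osascript -l JavaScript`, the real call would be:
// const dt = Application('DEVONthink 3');
// ocrIntoInbox(dt, '/Users/me/Scans/scan.tiff');
```

Passing the application object in as a parameter is just a convenience for trying out the wrapper; in a real script you would call dt.ocr directly.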

Without using the to parameter, the doc says the record would be placed in the global inbox (correct in my experience) “or bring up a group selector”. What does this “or” part even mean – when will the group selector appear?

Then there’s the attributes parameter which is supposedly a Record (uppercase first letter)? I do not think so. It might be a record (lower case first letter) in the sense of an AppleScript record, but a Record here would be a DT record, which makes very little sense in this context.

Also, the type parameter seems to be a bit overstuffed: will the ocr method really convert an image to an “annotate document” or a “comment document”, and what are those beasts? A “word” document?

Since the irritation with this method goes back to at least 2013, it might be time to fix the text :wink: And maybe, while you’re at it, kill the “pass me the same parameter twice” logic.

You can perform all the OCR actions that are available within DT 3, so:
Annotate Document - will place the OCR’d text in the Annotation field of a record
Comment Document - will place the OCR’d text in the Comment field of a record
Word Document - unsurprisingly, will generate a Word document

Thanks for clarifying. I actually never saw the “Word” option in the convert menu, because I do not touch MS Office with a ten-foot pole. Therefore, I assumed that the option to OCR to a Word file in the scripting dictionary was erroneous.

On annotation/comment: I’m not sure that I fully understand what they’re supposed to do. If I OCR a (for example) TIFF file (which is outside of DT at that moment) to a “PDF document”, I get a PDF document as a new record in my global inbox (not surprisingly). However, what do I get if I OCR an external file to “annotation document”? A new, empty record with only its annotation set to the OCR result?

In DT itself (or with the convert image method), the OCR result would then go to the annotation of the OCR’d record, I guess. But with the ocr method, there is no record yet.

In the first case, the TIFF file will be imported as an image record with its annotation or comment field containing the OCR’d text. With convert, you are correct that it would add the text to that record.

Actually, I don’t work on the scripting dictionary. :smiley:

Hello everyone,
I have to be honest: I am lost :slight_smile:

I do not know how to set up a folder action with AppleScript and the OCR engine of DEVONthink. Would anyone be able to help me with this script?

Many many thanks.

Where do you find this?

Script Editor (which is part of macOS): File > Open Dictionary, then select DEVONthink.

Hello, I’d like to bring this up again.

I’d like to OCR my documents before importing them into DEVONthink, so I can use my Hazel rules for renaming in Finder. Is it possible to use the OCR engine outside of DEVONthink and script it into a Hazel rule?

Yes. Use

const dt = Application('DEVONthink 3');
dt.ocr(...)

for JavaScript. AppleScript is similar, using a tell block.

I do it this way.