Scans -> OCR -> Renamed images workflow

Stike · June 8, 2019, 11:48pm

This is a pretty specific workflow I want to achieve using these tools:

DEVONthink Pro Office (Version 2, yes)
Keyboard Maestro
Hazel
AppleScript

Additional tools may come in if necessary.

Workflow description:

Incoming are sheets of actual paper, each with a serial number on a printed sticker in the format “XY-1234-19”. Each sheet also contains a drawing, which is always in the same place on every sheet.
Sheets get scanned (either as JPG or as PDF with OCR).
The script would then
a - the JPG case: run OCR on each scan, find the serial number via regex, crop the JPG to show only the drawing, and rename the JPG to the serial number…
OR
b - the PDF case: find the serial number via regex, crop the PDF (?), create a JPG from the drawing, and rename it with the serial number.
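
The regex and rename step is the same in both cases once the OCR text is available. Here is a minimal sketch in Python, assuming the serial is always two capital letters, four digits and two digits, as in “XY-1234-19”. The function names and the example path are hypothetical, and getting the OCR text out of the scanner or DTPO is not covered here.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed serial format: two capital letters, four digits, two digits ("XY-1234-19").
SERIAL_RE = re.compile(r"\b[A-Z]{2}-\d{4}-\d{2}\b")


def find_serial(ocr_text: str) -> Optional[str]:
    """Return the first serial number found in the OCR text, or None."""
    match = SERIAL_RE.search(ocr_text)
    return match.group(0) if match else None


def target_path(scan_path: str, ocr_text: str) -> Optional[Path]:
    """Build the new JPG path next to the scan, named after the serial.

    Returns None when no serial is found, so the caller can leave the
    scan untouched for manual review.
    """
    serial = find_serial(ocr_text)
    if serial is None:
        return None
    return Path(scan_path).with_name(serial + ".jpg")


if __name__ == "__main__":
    # Hypothetical scan path and OCR text, for illustration only.
    print(target_path("/scans/scan_0001.pdf", "Sheet XY-1234-19 rev. A"))
```

If the OCR tends to misread the sticker (for example a different dash character), the pattern would need to be loosened accordingly.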

So I can let the scanner handle the OCR or tell DTPO to do it. Which case is easier to script?
It seems Preview.app isn’t very scriptable in Mojave, so cropping would be a task for Keyboard Maestro. Also, cropping a PDF is technically not as easy as cropping a picture, which would favor case a).
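
Since the drawing is always in the same place, the crop rectangle for case a) could be defined once as fractions of the page and scaled to whatever pixel size the scan has. This is only a sketch: the fractions in `DRAWING_AREA` are made-up placeholders that would have to be measured on a real sheet, and `crop_box` is a hypothetical helper name.

```python
from typing import Tuple

# Hypothetical position of the drawing as fractions of the page:
# (left, top, right, bottom). Measure these once on a real scan.
DRAWING_AREA = (0.10, 0.20, 0.90, 0.70)


def crop_box(
    width_px: int,
    height_px: int,
    area: Tuple[float, float, float, float] = DRAWING_AREA,
) -> Tuple[int, int, int, int]:
    """Convert the fractional area into pixel coordinates for one scan."""
    left, top, right, bottom = area
    return (
        round(left * width_px),
        round(top * height_px),
        round(right * width_px),
        round(bottom * height_px),
    )


if __name__ == "__main__":
    # Example: an A4 page scanned at 300 dpi is roughly 2480 x 3508 pixels.
    print(crop_box(2480, 3508))
```

The resulting (left, upper, right, lower) tuple is the order that image libraries such as Pillow expect for cropping, so the same numbers work regardless of scan resolution.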

Any ideas or suggestions on how to do all that?

cgrunenberg · June 10, 2019, 9:21am

This depends on the scanning software used. But if the scanner is able to perform OCR and create PDF documents, avoiding JPG would simplify the workflow.

The script support that Preview finally gained in Mojave (as far as I remember) is just the standard script support of macOS. There are no PDF-specific commands or properties at all, and the Automator actions don’t support cropping either. Therefore a third-party app like PDFPen Pro or PDF Expert might be necessary.