@Doyanole Here's a short description of my workflow, using a Verizon Wireless bill as the example. ExactScan scans the document, gives it a temporary name starting with VZW so it can be identified later for processing, and puts it into a Hazel watched folder. A Hazel rule recognizes that it needs OCR, adds the tag “NeedsOCR”, and moves it to the Devonthink Inbox. A Devonthink smart rule performs OCR, replaces that tag with “OCR by DT3” (so Hazel won’t send the file into an infinite loop), and exports it back to the same Hazel folder. Now Hazel can read the text layer and rename the file to include the document date (in this case, the year/month of the bill). The renamed file is returned to Devonthink’s Inbox, where a smart rule moves it to its final group.
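The tag swap is what keeps the file from bouncing between the two apps forever. Below is a rough Python model of that hand-off. It is a sketch under stated assumptions, not how Hazel or Devonthink work internally: the locations and tag names come from the workflow above, but the functions `hazel_rule`, `dt_smart_rule` and `route` are hypothetical, and Hazel's "needs OCR" check is simplified to "the file lacks the OCR by DT3 tag". The renaming step is left out.

```python
from __future__ import annotations

# Hypothetical model of the Hazel <-> Devonthink hand-off.
# Location and tag names follow the post; the logic is an assumption.
HAZEL, DT_INBOX, FINAL_GROUP = "Hazel folder", "DT3 Inbox", "final group"


def hazel_rule(tags: frozenset[str]) -> tuple[str, frozenset[str]]:
    """Simplified Hazel watched-folder rule."""
    if "OCR by DT3" not in tags:
        # Assumed OCR check: no DT3 tag yet, so flag it and send it to DT3.
        return DT_INBOX, tags | {"NeedsOCR"}
    # Text layer present: Hazel would rename the file here (omitted).
    return DT_INBOX, tags


def dt_smart_rule(tags: frozenset[str], swap_tag: bool) -> tuple[str, frozenset[str]]:
    """Simplified Devonthink smart rule."""
    if "NeedsOCR" in tags:
        # OCR the file, optionally swap the tag, export back to Hazel.
        if swap_tag:
            tags = (tags - {"NeedsOCR"}) | {"OCR by DT3"}
        return HAZEL, tags
    # Already OCRed and renamed: move it to its final group.
    return FINAL_GROUP, tags


def route(swap_tag: bool = True, max_hops: int = 10) -> list[str]:
    """Follow one scanned file until it is filed, or fail if it loops."""
    location, tags, hops = HAZEL, frozenset(), [HAZEL]
    while location != FINAL_GROUP:
        if len(hops) > max_hops:
            raise RuntimeError("file is looping between Hazel and DT3")
        if location == HAZEL:
            location, tags = hazel_rule(tags)
        else:
            location, tags = dt_smart_rule(tags, swap_tag)
        hops.append(location)
    return hops


print(route())
# ['Hazel folder', 'DT3 Inbox', 'Hazel folder', 'DT3 Inbox', 'final group']
```

With `route(swap_tag=False)` the file keeps its “NeedsOCR” tag, so in this model Hazel keeps sending it back for OCR and the hop guard raises `RuntimeError`. That is the loop the tag replacement avoids.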
The Hazel rule that reads the date looks for a line that appears on the printed copy as
Billing Period Aug 16, 2019 to Sep 15, 2019
but, because of how the OCR handles the layout, reads as
Billing period Account number Invoice number
Aug 16, 2019 to Sep 15, 2019
(The Billing period, Account number and Invoice number labels sit in one column, with their values in the next column. The OCR reads down the first column, then moves over and reads down the next.)
I don’t need the start of the billing period; all I’m after is the closing date, which I use as the bill date. So my rule looks for “period”, followed by anything, up to a date of the form “to Mmm dd, yyyy”.
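For anyone who wants to test the same idea outside Hazel, here is roughly equivalent logic as a Python regular expression. This is an illustrative sketch, not Hazel's own pattern syntax. The name `closing_date_text` is hypothetical, and case-insensitive matching is an assumption so that both “period” and “Period” are found.

```python
import re
from typing import Optional

# Illustrative regex, not Hazel's pattern syntax: "period", then any text
# (newlines included, as little as possible), then "to " and a date
# shaped like "Sep 15, 2019". Only the closing date is captured.
CLOSING_DATE = re.compile(
    r"period.*?to ([A-Z][a-z]{2} \d{1,2}, \d{4})",
    re.IGNORECASE | re.DOTALL,
)


def closing_date_text(ocr_text: str) -> Optional[str]:
    """Return the billing period's closing date as text, or None."""
    match = CLOSING_DATE.search(ocr_text)
    return match.group(1) if match else None


# The text layer as the OCR reads it (labels down one column, values in
# the next), and the printed layout on a single line.
ocr_layer = "Billing period Account number Invoice number\nAug 16, 2019 to Sep 15, 2019"
printed = "Billing Period Aug 16, 2019 to Sep 15, 2019"
print(closing_date_text(ocr_layer))  # Sep 15, 2019
print(closing_date_text(printed))    # Sep 15, 2019
```

The lazy `.*?` together with `DOTALL` is what lets the match skip the “Account number Invoice number” labels and the line break, however the OCR happens to order the columns.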
The field after Contents > Contain Match expands as shown here:
and the Date attribute looks like this:
Finally, the renaming action uses the “Date” attribute but changes its format:
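Outside Hazel, the date handling above comes down to two steps: parse the captured “Mmm dd, yyyy” text as a date, then write it back out as year/month for the file name. The following Python sketch shows those two steps. The `bill_filename` naming scheme (“label yyyy-mm.pdf”) is a hypothetical example, not the format used in the actual rule.

```python
from datetime import datetime


def bill_month(date_text: str) -> str:
    """Turn a closing date like "Sep 15, 2019" into "2019-09"."""
    # %b expects English month abbreviations (default C locale).
    closing = datetime.strptime(date_text, "%b %d, %Y")
    return closing.strftime("%Y-%m")


def bill_filename(label: str, date_text: str, ext: str = "pdf") -> str:
    """Hypothetical naming scheme: "<label> <yyyy-mm>.<ext>"."""
    return f"{label} {bill_month(date_text)}.{ext}"


print(bill_filename("Verizon Wireless", "Sep 15, 2019"))
# Verizon Wireless 2019-09.pdf
```

Parsing into a real date first, rather than slicing the text, means a different output format (for example year only) is just a change to the `strftime` pattern.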
Note the Preview button in the first screenshot. It shows whether each condition matches, and when you are matching on contents it lets you see what the text layer looks like, so you can spot errors or anomalies such as the data reading down columns instead of across.
Sometimes this can be quite exacting, but once it’s set up it works well (until the vendor changes the bill format, or you get a bad OCR for some reason). The latter is why I run the file into DT3 for OCR, export it for Hazel to do its work, and then send it back to DT3: the ABBYY engine in DT3 is more consistent than the one in ExactScan.