Oh, Hazel can extract data from an OCRed PDF and rename the file and/or move it to an indexed folder. I used to have a complex system set up with Hazel that parsed bills and filed them away in the appropriate folders. Have a look here… (click on the link, don’t just read what’s displayed below)
Hazel certainly can extract text from the OCR layer, as pvonk states. I use that ability to find my bank account number, which lets me identify PDFs that are bank statements, rename them with the date extracted from the OCR layer, and file each PDF in the appropriate folder. This feature has been available for some time.
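For anyone who wants to see the idea outside Hazel, here is a minimal Python sketch of the "does this text contain my account number?" test. It is not how Hazel does it internally (Hazel handles this through its rule editor); the account number and the function name `is_bank_statement` are made up for illustration, and the OCR text is assumed to be available already as a plain string.

```python
import re

# Hypothetical account number, purely for illustration.
ACCOUNT_NUMBER = "12345678"

def is_bank_statement(ocr_text: str, account_number: str = ACCOUNT_NUMBER) -> bool:
    """Return True if the account number appears in the OCR text."""
    # OCR sometimes inserts a stray space inside a run of digits,
    # so allow one optional whitespace character between digits.
    digits = r"\s?".join(re.escape(d) for d in account_number)
    # The lookarounds stop a longer number that merely contains
    # the same digits from counting as a match.
    pattern = r"(?<!\d)" + digits + r"(?!\d)"
    return re.search(pattern, ocr_text) is not None
```

A real rule would normally check for a second identifier as well, since an account number on its own can also turn up on other kinds of document.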
I was guilty of being a bit too terse in what I wrote. Hazel watches my downloads folder and processes various things that land in it. For example, PDFs are automatically checked to see whether they need OCR. If they do, they are sent to PDFpen Pro to have that done (I find it gives the best OCR results of the various options available to me). If they don’t, they are scanned for data that identifies what the PDF is: a bill, a receipt, a bank statement, etc. A bank statement will contain my account number and the bank’s sort code (I’m in the UK, so UK terminology), plus various other identifiers that tell Hazel that it is not, for example, a receipt that happens to have my account number on it. The PDFs are then renamed, beginning with the ISO date so that they sort properly. The date of the item is extracted automatically from the text in the PDF. Hazel can do some pattern matching and manipulation of data, so a date in standard UK/European format (day first) in the text of the PDF can very easily be turned into an ISO date for renaming.
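To make the date step concrete, here is a rough Python sketch of the same idea: find a day-first date in the extracted text and turn it into an ISO date at the front of the new filename. This only illustrates the technique and is not Hazel's own mechanism. The function names, the `label` argument and the filename layout are assumptions, and it only handles numeric dates such as 03/02/2019, not ones with the month written out.

```python
import re
from datetime import datetime
from typing import Optional

# Day, month, four-digit year, separated by "/", "." or "-".
UK_DATE = re.compile(r"\b(\d{1,2})[/.-](\d{1,2})[/.-](\d{4})\b")

def iso_date_from_text(text: str) -> Optional[str]:
    """Return the first valid day-first date in text as YYYY-MM-DD, or None."""
    for match in UK_DATE.finditer(text):
        day, month, year = (int(part) for part in match.groups())
        try:
            return datetime(year, month, day).strftime("%Y-%m-%d")
        except ValueError:
            continue  # skip impossible dates such as 31/02/2019
    return None

def renamed(original_name: str, text: str, label: str) -> str:
    """Build a filename starting with the ISO date; keep the old name if no date is found."""
    iso = iso_date_from_text(text)
    if iso is None:
        return original_name
    return f"{iso} {label}.pdf"
```

For example, `renamed("scan.pdf", "Statement date: 03/02/2019", "Bank Statement")` gives `2019-02-03 Bank Statement.pdf`, so the files sort chronologically by name.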
These are not new techniques. MacSparky has been using them for years and has written about them extensively in his field guides.
Right. I think my link above (Revisiting Hazel, click on the link, don’t just read what you see above) goes into many details. The post in that thread by Greg_Jones gives screenshots of rules and of ways to pick up text in the PDF, assign it to a variable, and later use it for renaming the file.
Yeah, and I see that thread is from 2012, so Hazel has been capable of reading text in a PDF for at least the past seven years. It pays to check what a program will actually do! (Not that I always do …)
" It pays to check what a program will actually do! (Not that I always do …)"
I hear you. I have many apps that I’ve used for years. Once I’ve settled on a workflow that works for me in a given app, I can easily miss new features that are added later. For example, DTP 3 is one I’ll really have to focus on. I see a few new features (esp. metadata) that I find intriguing, but there will be others that slide in under the radar. That’s why I regularly follow the forums for my important apps to learn new tricks. Even then, I’ll run into a post that opens my eyes.