Workflow with Scansnap, Hazel, Dropbox and DTPO

huxbnw · April 5, 2011, 12:35pm

I am trying to set up a single workflow for all my PDFs, whether they are scanned at home or at work, downloaded, or received as email attachments. No matter how a PDF arrives, I would like it to be OCR'd, named properly, and sent to DTPO (DEVONthink Pro Office).

The thread from drspk on DEVONthink + Hazel + Dropbox is great, especially on how to get PDFs from work into your workflow, but I'm not sure it gives me, at least, quite the workflow I'm after.

Ideally, I would like a system where no matter how I receive a PDF (scanning, downloading, emailing), I can use Hazel rules to: (a) OCR the document, (b) name the document, possibly using a mix of Hazel/TextExpander, (c) send it to DTPO for processing and (d) send a duplicate to an archive folder.
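As a rough illustration of steps (a) through (d), here is a minimal Python sketch that only plans the actions for one incoming PDF rather than performing them. The folder paths, the `IncomingPDF` fields, and the choice to skip OCR when a text layer already exists are all assumptions for illustration, not how Hazel or DTPO actually work.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

# Hypothetical folder locations; substitute your own.
DTPO_INBOX = PurePosixPath("~/DTPO Inbox")
ARCHIVE = PurePosixPath("~/PDF Archive")


@dataclass
class IncomingPDF:
    path: PurePosixPath
    has_text_layer: bool  # assumed to be detected elsewhere
    suggested_name: str   # assumed to come from a naming rule


def plan_steps(pdf: IncomingPDF) -> list[str]:
    """Return the ordered actions for one PDF without performing them."""
    steps = []
    if not pdf.has_text_layer:
        steps.append(f"ocr {pdf.path}")                    # (a) OCR only when needed
    new_name = f"{pdf.suggested_name}.pdf"
    steps.append(f"rename {pdf.path.name} -> {new_name}")  # (b) name the document
    steps.append(f"move {new_name} -> {DTPO_INBOX}")       # (c) send to DTPO
    steps.append(f"copy {new_name} -> {ARCHIVE}")          # (d) duplicate to archive
    return steps
```

A scan without text gets four steps starting with OCR; a downloaded PDF that already has text gets three.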

Some inspiration for this came from Katie Floyd's talk at Macworld 2011.

Has anyone implemented such a workflow? Any recommendations?

Thanks in advance for any advice/comments.

alanshutko · April 14, 2011, 2:17am

So far, I don’t have a universal workflow. It’s somewhat automated, but PDFs from different sources are handled differently.

When I scan, PDFs go into my DTPO inbox straight from the ScanSnap, and DTPO OCRs them. Later, when I process them, I rename them and sort them into the appropriate databases. I scan many different kinds of documents, so I don't have an automated way of handling them. Magazines get renamed to the magazine name and the volume number. Bills get renamed to the utility and the billing date (not the date of the scan). Recipes and the like get renamed according to what they are.

I have some automation for PDF bills that I regularly download (see the attached Hazel rule export). For these, it was worth writing Hazel rules that identify which bill it is (usually by checking the source URL, sometimes the text), scan the text for the bill date (the pattern depends on the utility; I haven't found a generic way yet), and rename the file accordingly. Downloaded PDFs rarely need OCR, so the rules don't include it.
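The bill-handling logic described above could be sketched in Python roughly as follows. This is not the attached Hazel export; the utility names, URLs, text markers, and date patterns are all hypothetical placeholders. Each utility gets its own date pattern because, as noted, there is no generic one, and the file is named by the billing date found in the text rather than the download date.

```python
import re
from datetime import datetime
from typing import Optional

# Hypothetical rules: each utility is recognised by a fragment of its download
# URL or by a phrase in the text, and has its own bill-date pattern and format.
RULES = {
    "Electric": {
        "url": "example-power.com",
        "text": "Example Power Co",
        "date_re": r"Statement Date:\s*(\d{2}/\d{2}/\d{4})",
        "date_fmt": "%m/%d/%Y",
    },
    "Water": {
        "url": "example-water.org",
        "text": "Example Water District",
        "date_re": r"Bill Date\s+(\w+ \d{1,2}, \d{4})",
        "date_fmt": "%B %d, %Y",
    },
}


def identify(source_url: str, text: str) -> Optional[str]:
    """Return the utility name, checking the source URL first, then the text."""
    for name, rule in RULES.items():
        if rule["url"] in source_url:
            return name
    for name, rule in RULES.items():
        if rule["text"] in text:
            return name
    return None


def bill_filename(source_url: str, text: str) -> Optional[str]:
    """Build 'Utility YYYY-MM-DD.pdf' from the billing date, or None if unsure."""
    utility = identify(source_url, text)
    if utility is None:
        return None
    rule = RULES[utility]
    match = re.search(rule["date_re"], text)
    if match is None:
        return None  # recognised the bill but could not find its date
    billed = datetime.strptime(match.group(1), rule["date_fmt"])
    return f"{utility} {billed:%Y-%m-%d}.pdf"
```

Returning `None` when either the utility or the date can't be found leaves the file untouched rather than giving it a wrong name.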

Other PDFs I download are renamed on an ad-hoc basis or left alone. I download a lot of product manuals, books, and roleplaying materials, and for the most part I keep their original filenames.

I also create PDFs in a number of ways, and they generally go straight into DTPO. For example, when capturing an article from the web these days, I like to switch to Reader mode in Safari, then print using "PDF to DTPO". Those articles need to be renamed, but not OCRed.

When thinking of all the different cases I’ve got, it seems like putting together automation that would correctly handle all the exceptions would be a lot of work. Putting together automation which blindly OCRed and renamed files would cause a lot of damage I’d have to undo later, so it doesn’t seem like a win to me.
Downloads.zip (6.15 KB)
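One way to express the cautious approach above is to rename a file only when exactly one rule matches and to leave it alone otherwise, so exceptions fall through to manual processing instead of being damaged. A minimal Python sketch, assuming each rule is a hypothetical (predicate, namer) pair of functions over the document's text:

```python
from typing import Callable, Optional

# A hypothetical rule: the predicate decides whether the rule applies to a
# document's text, and the namer proposes a new filename for it.
Rule = tuple[Callable[[str], bool], Callable[[str], str]]


def propose_name(text: str, rules: list[Rule]) -> Optional[str]:
    """Return a new name only when exactly one rule matches; otherwise None,
    meaning 'leave the file alone for manual processing'."""
    matches = [namer for applies, namer in rules if applies(text)]
    if len(matches) != 1:
        return None  # no rule, or conflicting rules: don't guess
    return matches[0](text)
```

For example, a recipe that happens to mention "Electric" would match two rules and be left for manual review rather than renamed as a bill.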