I love DEVONthink! I’ve been using it for more than five years now. Here is the system I’ve evolved in that time. My workflow is as follows:
Get mail from the mailbox, and receipts from my wallet. They go into a physical inbox sitting on my desk. I scan every piece of paper that enters my life, whether it needs to be scanned or not. You just never know!
Every 1-2 weeks, I take out scissors, pen, a letter opener and a staple remover. I use these to turn the stack of mail/receipts into a stack of scan-ready paper.
I scan them all with a Fujitsu ScanSnap into an Inbox directory within an encrypted sparse bundle. I’ve scanned nearly 10,000 pages with this scanner by now!
After scanning, I run Acrobat Pro and OCR all the documents. I turn them into PDF/A files using Searchable (Exact), so that the original scan image is left untouched. The file is bigger, but more accurate.
After OCR’ing, I drag-and-drop these files into the Inbox of a DEVONthink database on that same encrypted sparsebundle. This database is 10.4G now, 8.3G of which are PDFs!
Every 1-3 months, I use the Group & Classify pane in DEVONthink to sort the new Inbox items into groups. I love how accurate this is, now that I have so many classified documents (just over 6,100). I only have to manually classify about 1/3 of the time.
I don’t use tags (yet), but I do make use of multiple groups and replicants. I love that in the Group & Classify pane, I can click on multiple suggested groups and DEVONthink replicates the document into all of them. For example, any time I receive a tax document, I replicate it both to the bank’s group (Banking/Foo/Checking), and also to a group named after the tax year (Taxes/2011).
When I sat down to do my taxes this week, everything I needed was all in one place. When TurboTax asked if I had participated in an HDHP health program (or something like that), a simple DEVONthink query revealed that my paystub had indeed been making such a contribution all year. I didn’t know that, but DEVONthink did!
Many thanks to DEVONtechnologies for their wonderful software. Still loving it after five years of heavy use…
Does your scanner stop working after a bit of scanning? I love my Fujitsu scanner when it works, but it’s the most frustrating thing when it begins to jam midway through, or it stops scanning and says “no paper in chute” while my receipt is stuck midway.
I’m nowhere near the number of documents you’ve scanned, so I’m just curious if you’ve experienced this…
No, it never stops working. What it does do, after getting this well used:
It often feeds two sheets instead of one, forcing me to abort the scan of those sheets and manually feed them both. I think the rubber on the paper-grabbing roller is just showing its age (now exactly 5 years old).
I noticed that a lot of my scans are tilted 1-2 degrees to the right. Doesn’t really interfere with readability, but it tweaks my OCD something bad!
How often do you clean your scanner? I can get through about 12-15 separate pages before my document begins to hang up and won’t scan. I must wait a few hours and begin scanning again. No matter if I go from a paper document to a receipt, or if it’s half a page or a full page, it will scan and stop midway through and give me a “No paper in chute” error. It’s very frustrating. I’m using the Fujitsu ScanSnap S1500M and it’s only about 2 years old. I just recently replaced my pad/pick rollers and I’m still having issues.
Sorry to get off the subject of your original thread, but I would love to have a system like yours where I scan every 2-3 months, but I know my scanner will crap out.
However, I would like to point out that without this thread, I never would have used Group & Classify and now I’m loving it. Thanks for that!
“Accurate” in terms of image quality perhaps, but perhaps not in terms of OCR, depending on your version of Acrobat Pro. My Acrobat Pro 8 that came with my ScanSnap doesn’t do white text on dark backgrounds, & its dark text on white backgrounds isn’t as good as Abbyy’s OCR.
You sound like you have it nailed, and I was wondering if I could ask you some questions, as where you are at is kind of where I would like to be.
I do pretty much the same thing as you in my workflow except I scan straight to DT and let it OCR for me.
Do you rename each document then and there? I have been renaming and also adding tags which obviously slows down the scanning process as you basically have to do it 1 document at a time.
I dream of being able to just press that blue button and have the files automatically renamed and filed if they are repetitious (i.e. utility bills, monthly statements etc) leaving me just the obscure ones like a unique letter or a pamphlet to deal with manually. I reckon 2/3 of what I scan is repetitious.
If you just let the scanner default name the files, do you go back and rename later?
Do you have a naming convention for your documents or do you rely simply on their contents rather than their names for classifying and retrieval/searches?
So does this mean you have a single database and keep everything in it? Won’t it eventually grow too large to be useful? What about info that has a use by date - like a kids’ sports roster. Do you have a way to weed out stuff you no longer need and either delete or else archive it off?
So far I just make a new database for each financial year (which for me is 1 July-30 June) but of course then I’m going to end up with a whole bunch of little databases…
Also dealing with multiple databases means I have a global inbox and then have to face a double sort as I first move documents from the global inbox to the inbox of each database before distributing them to their relevant group.
I run 4 databases:
- Household info which is not date specific (but will grow ridiculous unless I find a culling method), for recipes, pamphlets, kids’ schoolwork etc
- Personal finances for financial year xxxx (and then make a new one each year)
- Family company finances for financial year xxxx (separate legal entity, also make a new one each year)
- Finance studies (I am a student and want to keep my study material here, although currently it’s in a set of nested folders on Dropbox for easy access, as I haven’t mastered the structure or the remote access well enough yet)
Wow! How many documents did you need before Group & Classify would work for you? I have over 12 months of identical monthly phone bills and it still can’t recognise a new one as belonging with the others.
How long does it take you to manually file 3 months of documents? Did it take a long time in the beginning and is now shorter? Like you, I think only about 1/3 of my documents need help from me but I cannot get Group & Classify to work. Not sure what I am doing wrong here. How do you group your documents? Maybe this is what I do wrong. Mostly I want to group by statement type - all the phone bills together, all the water bills together, all the bank statements for the same account together etc. I name each one with the yyyy-mm-dd xyz statement.xxx format and give each one a tag of xyz statement and it still cannot do it.
I think I must be doing something wrong but not sure what?
This is what I need to learn - did you first create the groups or did you first add the documents when you started your database? At the moment I’ve been making a new database for each financial year. If you can create a replicant group for a specific year, can you then extract and archive that year’s documents when they get too old? Where I live, tax documents have to be retained for 5 years so it would be awesome to extract them from the database once they get to be 5 years old and archive them elsewhere (or even destroy them).
I apologise that this is so long. I have combed the user manual and read Joe Kissell’s book and still can’t seem to get a coherent automatic paperless workflow using DT.
I don’t rename until I actually need to reconcile something with the bank. It’s only when I need to USE a document that I will go through and rename everything in that group, as a way of “getting started”. Otherwise, they are just named YYYYMMDDhhmmss.pdf.
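For anyone who does want to batch-rename later, the default timestamp name already carries enough to build the "yyyy-mm-dd xyz statement" style of name mentioned earlier in the thread. Here is a minimal Python sketch, not anything built into DEVONthink or the ScanSnap software. The function name and the label are made up for illustration, and note that the date it produces is the scan date, which may not match the statement date.

```python
from datetime import datetime
from pathlib import Path


def dated_name(filename: str, label: str) -> str:
    """Turn 'YYYYMMDDhhmmss.pdf' into 'yyyy-mm-dd <label>.pdf'.

    Raises ValueError if the stem is not a 14-digit timestamp.
    The date is when the page was scanned, not the statement date.
    """
    path = Path(filename)
    scanned = datetime.strptime(path.stem, "%Y%m%d%H%M%S")
    return f"{scanned:%Y-%m-%d} {label}{path.suffix}"


# Hypothetical example file name and label.
print(dated_name("20120315101500.pdf", "phone statement"))
```

Files that do not match the timestamp pattern raise an error rather than being silently renamed, which seems the safer default for a one-off cleanup.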
I just don’t remember anymore.
Maybe about 2 hours I’d say. I often do it when I’m on a trip, or at a cafe, trying to avoid doing something else.
I create the groups as needed. I have a hierarchy which separates by category, then sub-category, then the specific institution/payee/whatnot.
Not naming the files manually seems to be a good way to cut down the processing time. I’ve been working (well thinking) on this all day. I went back to reread Joe Kissell’s book and found that there is a new edition so I have been poring over this.
I also worked out that Hazel can rename files in the global inbox so I am thinking of setting up naming and/or tagging rules for all the standard docs here with perhaps a colour label and then a catch all rule that labels all the files that require manual naming red.
Then I can sort the global inbox by label and do the manual renames.
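The rule idea can be sketched outside Hazel too. The Python below is only an illustration of the logic (keyword rules for the standard documents, plus a catch-all that flags everything else red for manual naming). It is not Hazel's rule format, and the payee keywords, labels and colours are invented for the example.

```python
from typing import List, Optional, Tuple

# (keyword to look for in the OCR text, name label, colour label)
# All three values per rule are hypothetical examples.
RULES: List[Tuple[str, str, str]] = [
    ("Acme Telecom", "phone statement", "green"),
    ("City Water", "water statement", "green"),
]


def match_rule(ocr_text: str) -> Tuple[Optional[str], str]:
    """Return (label, colour) for the first rule whose keyword appears.

    Falls through to (None, 'red') so unmatched documents stand out
    when the inbox is sorted by label colour.
    """
    lowered = ocr_text.lower()
    for keyword, label, colour in RULES:
        if keyword.lower() in lowered:
            return label, colour
    return None, "red"


print(match_rule("ACME TELECOM tax invoice"))
print(match_rule("School fete pamphlet"))
```

Matching on the text only works after OCR, so rules like this have to run on the searchable PDF rather than on the raw scan.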
I’m still trying to come up with a way of getting things to either group & classify or be moved automatically to the correct groups…
It occurs to me to include sub-groups by financial year which can then be moved to an ‘archive database’ either at the end of the year or at the end of the 5 years for which they are required to be kept. This would give me just 2 databases instead of 1 for each year.
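To make the year sub-groups concrete, here is a small Python sketch of the date arithmetic, assuming a 1 July-30 June financial year and a 5-year retention period as described above. Labelling a financial year by the calendar year in which it ends is my own assumption, as are the function names, and the real retention rules should be checked locally.

```python
from datetime import date


def financial_year(d: date) -> int:
    """Financial year (1 July-30 June), labelled by the year it ends in."""
    return d.year + 1 if d.month >= 7 else d.year


def is_archivable(doc_date: date, today: date, keep_years: int = 5) -> bool:
    """True once keep_years have passed since the end of the document's
    financial year. The retention rule itself is an assumption."""
    cutoff = date(financial_year(doc_date) + keep_years, 6, 30)
    return today > cutoff


print(financial_year(date(2011, 7, 1)))
print(is_archivable(date(2011, 8, 1), date(2017, 7, 1)))
```

With this convention a document dated 1 August 2011 belongs to the sub-group for the year ending 30 June 2012 and becomes a candidate for the archive database from 1 July 2017.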
Thanks for your feedback
ETA: it turns out I was mistaken, and Hazel rules will not run on the global inbox before the documents get sucked into DT. This is a problem I am no closer to solving, other than to scan to another folder, run the Hazel rules and then run one of the import-to-DT scripts, but it’s messy because Hazel works better when it runs after the OCR.
It’s worth remembering that you can run your OCR at any point between the ScanSnap and DTPO. You can have a chain of folders, with Hazel managing the transfers between them, and folder actions or another application triggering the OCR. I sat down with a sheet of paper and planned a workflow: there were several ways to do it, but Hazel was the key. I had to remember to build in delays (with Hazel), so that files were not snatched by Hazel or any other application before they were properly OCR’d.
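The delay idea is simple enough to show in a few lines. This Python sketch is a stand-in for what a Hazel rule would do, not Hazel itself: it only moves PDFs that have not been modified for a while, on the assumption that a file still being written or OCR'd keeps getting a fresh modification time. The folder names and the 60-second quiet period are made up.

```python
import shutil
import time
from pathlib import Path
from typing import List


def is_settled(path: Path, quiet_seconds: float) -> bool:
    """True if the file has not been modified for quiet_seconds."""
    return time.time() - path.stat().st_mtime >= quiet_seconds


def move_settled(src: Path, dst: Path, quiet_seconds: float = 60.0) -> List[Path]:
    """Move settled PDFs from src to dst and return their new paths.

    Files modified more recently than quiet_seconds are left alone so
    the next pass can pick them up once scanning/OCR has finished.
    """
    dst.mkdir(parents=True, exist_ok=True)
    moved = []
    for pdf in sorted(src.glob("*.pdf")):
        if is_settled(pdf, quiet_seconds):
            target = dst / pdf.name
            shutil.move(str(pdf), str(target))
            moved.append(target)
    return moved


# Hypothetical folder names for one hop in the chain:
# move_settled(Path("Scans/ocr_done"), Path("Scans/ready_for_import"))
```

One hop like this sits between each pair of folders in the chain. The quiet period needs to be longer than the slowest step writing into the source folder.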
This website is useful if you haven’t seen it, especially some of the workflows submitted by third parties or contained in the tutorials: