I have had a ScanSnap for about 4 years now and have been scanning everything. I originally had a PC, but am now all Mac. I just keep all the PDFs in an organized folder hierarchy, but that is getting hard to manage and I am finding it more and more difficult to find the files I need. It seems like DT might be right for me.
I do have some questions.
Most of my documents are image-only PDFs, not PDF+Text documents. Is it possible to OCR all documents in the database that don't have any text? Or OCR a whole folder? This would make searching them much easier. It seems that when I OCR a PDF, DT creates another PDF (with a -1 suffix). Is there a way to "OCR in place", maybe via some kind of scripting? Ideally I'd like to keep the creation/modification dates the same so I can keep using that metadata for searching. Otherwise basically every document will end up with the same modification date…
So, can you OCR entire folders of image-only PDF files?
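Here is a rough sketch of the kind of scripting I mean, assuming the OCR'd copies end up as ordinary files sitting next to the originals with the -1 suffix (I haven't confirmed where DT actually writes them). It swaps each `name-1.pdf` over `name.pdf` and restores the original's access and modification times. The function names are my own, and note that `os.utime` cannot restore the creation date, so this only preserves the modification date.

```python
import os
from pathlib import Path


def replace_with_ocr_copy(original: Path) -> bool:
    """Overwrite `original` with its OCR'd "-1" sibling (hypothetical naming),
    keeping the original's access/modification times.
    Returns True if a copy was found and swapped in."""
    ocr_copy = original.with_name(f"{original.stem}-1{original.suffix}")
    if not ocr_copy.exists():
        return False
    st = original.stat()  # remember the original timestamps before replacing
    os.replace(ocr_copy, original)  # move the OCR'd copy over the original
    # Restore atime/mtime only; the creation (birth) time is NOT restored here.
    os.utime(original, ns=(st.st_atime_ns, st.st_mtime_ns))
    return True


def process_folder(folder: Path) -> int:
    """Walk `folder` recursively and swap in every OCR'd copy found.
    Matches lowercase ".pdf" only. Returns the number of files replaced."""
    replaced = 0
    # sorted() builds the full list first, so renaming files mid-loop is safe.
    for pdf in sorted(folder.rglob("*.pdf")):
        # Skip the "-1" copies themselves. Caveat: an original whose name
        # genuinely ends in "-1" would also be skipped.
        if pdf.stem.endswith("-1"):
            continue
        if replace_with_ocr_copy(pdf):
            replaced += 1
    return replaced
```

Running `process_folder(Path("~/Scans").expanduser())` (a made-up path) on a copy of the folder first would be the sane way to try it.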
Should I simply import all the documents into DT (putting them into the database package) or just index them in place? I kind of like the idea of one file containing all the documents, with the organization into folders taking place in DT. I do worry about something happening to that one big file, though. I guess if I back it up I'll be OK… Have there been any issues with database corruption? This information is very important (basically my entire life of bills/statements/household stuff), so losing it isn't an option.
What kind of backup strategy is sound? Is using Time Machine enough?
Does the software have any kind of function (grouping?) where you can select a scanned document and it will attempt to put it in the correct folder based on the data in that document? For example, I always scan my power bill, cable bill and water bill at the same time. It would be nice to be able to select these documents, click some kind of magical button and have it put them in the right folders based on the contents of those folders. Perhaps it could give you a "match quality" or something like that to keep things sane, but it would still make things easier. So, basically like grouping, but at a global level.
[EDIT – I found the classify/autoclassify feature… but how do I know autoclassify did it right?]
This seems like an amazing package, and playing with it makes me think it will work well for me, but I'd like to hear from other people who are using it.