Scanning workflow too complex

MikeP · March 9, 2013, 8:55pm

I’m migrating from PC to Mac, and DTPO appears to be an awesome replacement for PaperPort, but I am finding the scanning process ridiculously complex for such a simple task. To get a multi-page document into DT using my Epson GT-S50 dedicated duplex scanner, the steps are:

1. Menu “File > Import > From Scanner” to open the Image Capture ‘sub-app’
2. Tick Duplex (this setting never gets saved, which is annoying)
3. Give the document a name
4. Scan
5. Expand the “Documents” tree
6. Select the document I just scanned
7. Click the ‘Send to’ button in the bottom left to move it from the Image Capture sub-app into DT’s Inbox
8. Wait for OCR to finish
9. File it from the Inbox into the folder I want
10. At some point I’ve also got to delete it from the Image Capture app (also annoying!)

In workflow terms, document scanning should be fast and simple. I can do anywhere from 5 to 50 scans a day. DT forces me to manually perform four actions on each document on its journey from paper to searchable PDF (name it, OCR it, file it, delete the scan).

In PaperPort the same could be done with a single click. Whatever folder I was in, I’d click a ‘scan’ button and a searchable PDF was created from whatever pages were in the scanner. While the OCR was running (a few seconds), I would drop the next page(s) into the scanner, change folder if necessary and click ‘scan’ to process the next document. I could scan and file 10 multi-page documents in 2-3 minutes. DT is terribly fiddly in this regard.

Am I missing something, or is there a way to simplify things? The other alternative I’m currently looking at is purchasing ABBYY/PDFpen/Readiris to do the scan-and-OCR step and using DT Pro instead of DTPO, but even so it is still a two-step process for each document, and it’s frustrating because technically DTPO contains all the capabilities I need.

Any input appreciated.
Thanks,
Mike

Bill_DeVille · March 9, 2013, 10:27pm

You don’t have to delete the original scanner file, as there’s an option in DEVONthink Pro Office Preferences > OCR to take care of that.

You don’t have to name the document at the time of text recognition. There’s a Preferences option to eliminate that step, which means that you can establish a queue of documents being processed for OCR that doesn’t stop and wait for you to name each file. I prefer naming the searchable PDF within the database, as in many cases I can select a text string in the document that would make a good Name, then press Control-Command-I. Often, little or no typing is required to Name each document.

Filing can be simplified with a bit of planning. If you have a series of new documents going into a specific database, make that database frontmost and set Preferences > Import - Destination to send new items to the Inbox of that database. Then the Classify assistant can speed up filing.

My favorite scanner for DEVONthink Pro Office is the Fujitsu ScanSnap. It can be set to handle duplex pages automatically and detect/skip blank pages. ScanSnaps are fast, handle paper smoothly and the optics produce images with excellent clarity and contrast, resulting in good OCR accuracy. Put a multipage paper document in the feeder, press the Scan button and in moments OCR starts in DEVONthink Pro Office. While that document is being processed, put another document (one page or multipage, two-sided copy or one-sided copy) in the feeder and press the Scan button. Keep doing that until the stack of paper to be scanned is finished. It’s a very efficient workflow.

MikeP · March 11, 2013, 12:05am

Thanks Bill for taking the time to respond. The ScanSnap process you describe is exactly my previous workflow and how PaperPort works with any scanner. Drop stuff in, press button, when it runs out of paper that’s the end of your document. Repeat.

It’s a pity Devon requires a specific scanner with proprietary software for a good workflow. The Epson I have is comparable to the ScanSnap S1500 but somewhat faster, especially in duplex.

With your input, I have now worked out a slightly more efficient way to do this (it really could be made more intuitive), but it’s still a pain. A lot of clicking is still needed to get to the scan dialog, then I still always need to set my scanner parameters (at least each time DTPO is launched), and then I need to count my pages and type in the page count for each job.

I tried to use blank pages as separators, but ran into a rather silly quirk: in duplex mode it adds a blank page to the end of each document (because it sees one sheet as two blank pages). Really, this is simple stuff…

I ran into a couple of other annoyances and bugs. If a queue is interrupted, the in-process items stay in “Processing” in the scan queue and cannot be removed. I need to shut down and restart DTPO.
Another odd one: “OCR failed” on the 4th scan in a batch, whilst it was still OCRing the first one. No further messages or clues as to what’s wrong.

The speed of scanning and OCR is also pretty awful - although the results are excellent.
A scan and OCR of 5 pages at 200dpi took 3 minutes 20 seconds. At 300dpi it took 4 minutes. Using Epson Scan and ABBYY, the same job took 59 seconds at 200dpi and 1 minute 2 seconds at 300dpi. The OCR results were almost identical (both performed very well on fine print on a coloured background).

Sorry about the long post, but I like DevonThink as a product and am hoping some of this gets back to the right people. At the moment the scanning feature seems more like an afterthought and definitely needs a bit of attention.