Scanner/software recommendations for document management?

Hi there,

I have been thinking about trying to go paperless with all the bills, statements, etc. that I get in the mail every day. Ideally, I would like to scan all those papers into PDFs that would be fed into a document management system, which could then classify each paper by date, keywords, etc. I did a number of searches for such a solution on the Mac and finally came up with DEVONthink. So my questions are:

  1. Do you think DEVONthink is the right tool to use for my needs?

  2. Do you know of any Mac-compatible scanners with automatic document feeders (preferably duplex, for scanning both sides of a page) that work well in this situation?

Any guidance would be greatly appreciated!

Regards,
Bill Lin
billin@jadeforest.com

This is one thing I use DEVONthink for. Here’s my workflow:

Bills/statements come in one of two ways. If I can download a PDF, so much the easier. If not, I use an HP All-in-One scanner with Readiris OCR/scanning software. This creates a PDF image with indexable text. I import it to DT, which indexes the text and classifies it into the paperwork structure I have built. Very easy and works very well.

I believe Fujitsu makes some scanners like the one you are looking for. Something like a ‘Snapscan’. There is a Japanese OS X driver for it on the internet somewhere. The scanners are fairly pricey though.

Hope this helps,

John

Hey John,

Thanks for the reply - it sounds like you’re doing exactly what I’m hoping to do, myself. Thanks, too, for pointing out the Fujitsu scanners. I’d looked at them before but couldn’t find a Mac OS X driver at the time. After reading your post, however, I dug deeper and found that a Mac OS X driver is apparently in the works:
http://forums.macosxhints.com/printthread.php?t=32046

The referenced model, the Fujitsu ScanSnap fi-5110EOX, is apparently a great, compact heavy-use document scanner that’s the cheapest of its type. At the lowest price I could find, $336 after rebate, I don’t know that most people would call it cheap, but perhaps it’s worth it for a machine that will eat all the various papers coming into my house. Hmm…

How many documents would you say you’ve scanned in so far? Hundreds? Thousands? How has DEVONthink’s performance held up with the number of docs you have?

Anyone else have any suggestions for Mac-compatible duplex ADF scanners?

Regards,
Bill Lin
billin@jadeforest.com

I use an HP 7310 All-in-One
Network Compatible
Will scan both sides of a sheet
Comes with Readiris 9.0
I need to use some AppleScript to do what I want, but I’m getting more and more comfortable with just dumping scans to DT and either looking for thumbnails or using DT’s quick search.

Scan and OCR as PDF to retain the original look but searchable

7310 list price 399.99; got mine for 256.00. It has wired networking, but I connect it to my router and it works nicely as a printer for my PDA.
7410 list price 499.99; has wireless networking built in.

Good Luck
Mike (my first post hope it helps)

In issue #2 of DEVONtalk the Canon LiDE scanners are mentioned as being compatible with DEVONthink. Canon prices for this line of scanners seem to be a lot lower than those of the models mentioned in this thread. Does anyone have experience using the Canon scanners?

I am primarily interested in scanning text from myriad sources (such as newspaper articles) that I now have to enter manually or store as clippings in files. I would not be using a scanner heavily. And I use a film scanner for digital imaging purposes, so that isn’t a necessary feature when it comes to a flatbed (whereas the physical size of the scanner is somewhat important).

Has OCR really improved dramatically since I last looked into it (several years ago)? Is it actually useful technology?

Mojo:

I’ve had an older Canon LiDE scanner, the N1220U, for 4 or 5 years. I use it primarily for scanning documents such as business letters, bills & receipts, some newspaper, magazine and journal articles. Once in a while, copies of photos. Most scans are at 300 dpi, black & white, when I plan to run them through OCR for input to DEVONthink.

Teamed with ReadIRIS 9, it does a very acceptable job of producing PDF+text output. If the input docs are good quality, the OCR text output is essentially error-free (except for fine print, text in logos and the like). Yes, OCR has gotten better over the years. I had to scan a number of papers about 10 years ago. The best OCR software at that time was pretty bad – I spent many hours editing output. With ReadIRIS 9 and PDF+text output, I don’t even bother with editing; it’s good enough for my purposes.

My scanner setup is slow, running with USB 1.1. OK for occasional use, but not to support a paperless office.

One of these days I’ll look into a faster scanner with automatic paper feed, such as one or two suggested above, and with FireWire or USB 2 connection.

I also use a LiDE 20, which I use when I have an odd-size original.
Very inexpensive, and it will scan, as Bill said, to PDF+text.
The software that comes with the scanner gives you an option to output a searchable PDF.
The output file is a two-part file: the image in front that looks like the original, and a layer of text behind it that you don’t see but DEVONthink sees and can search. Very simple once it’s set up: put the paper in, punch a button, and it’s scanned into DT as a searchable file that looks like the original.
Just a note: the OCRed text layer isn’t perfect, but usually enough key words are recognized to find the document with a text search.
Don’t know if the LiDE 20 is still available, but I think the 30 is, and maybe other models, which I assume would have the built-in OCR in the software.

Mike

My apologies for not getting back to the forum sooner…

I had decided to order a Canoscan 8400F and I just happened to mention it to my neighbor this afternoon, and he offered to give me his old (but mint condition) Epson Perfection 1260 scanner.

It appears that it will work fine with Mac OS X, but I am curious about OCR software that I can use with it. I imagine that the software that is bundled with the scanner is for OS 9 (but I haven’t taken possession yet so this is a wild guess on my part.)

If someone has experience with OCR software that will work with the Epson I would appreciate hearing about it.

Mojo:

I just checked the Epson support site. There is a TWAIN driver for the Perfection 1260 scanner that will work under OS X 10.2 and 10.3.

For producing PDF + text (which displays the original image in PDF, but has a searchable text layer), ReadIris 9 has done a good job for me.

Scanning is a 2-step process for me: scan and save as TIFF or PDF images, then OCR and save as PDF+text. Scanning takes longer with my old Canon N1220U (USB 1) than the OCR processing. I’ve got boxes of papers I’d like to scan and OCR, but will probably wait until I get something like the Fujitsu scanners, which are much faster (USB 2) and can do 2-sided scanning.

To my surprise, I’ve gotten good results with OCR of images taken with my Ricoh Caplio RR1 digital camera. The RR1 has a special black and white TIFF mode for copying text. Good lighting and a good copy stand (which I don’t have) are necessary. Capturing images this way is certainly faster than with my Canon N1220U, and I may play with this a bit more. A portable copy stand might have me set up for doing library research.

Thanks for the info Bill. I checked out the Readiris 9 software. At $129 it costs as much as a new Canoscan 8400F, so it isn’t much of a deal even if I get the scanner for free. And if I understand correctly, using the software is a two-step process because it doesn’t have its own scanner driver.

I’m a little confused at this point because the review I read claimed that Readiris 9 is the only Panther-compatible software currently available. But Canon scanners are recommended on the DEVON web site, along with the OCR software that is bundled with the scanner.

It would seem that OCR isn’t quite as easy as I thought. I was under the impression that a person can scan a document and it will be converted to plain text, but I guess that isn’t the case. You first have to scan and save as a TIFF or PDF, and then use the OCR software to produce a searchable PDF that is identical to the original? I’m not sure that this is going to do me much good, since I want to scan newspaper and magazine articles.

Hey Mojo,

You can get ReadIris Pro 9 for $99 at Amazon.com. Better yet, there’s a $50 coupon on it if you buy it today or tomorrow (click the “Special Offer” link at the top of the page):
http://tinyurl.com/3vw2j

At $50, ReadIris Pro is a bargain and a worthy companion to the rest of one’s setup. Personally, I just bought a Fujitsu ScanSnap 5110EOX document scanner + DEVONthink + ReadIris Pro 9, and I’m ready to go to town on the whole paperless office. Here goes nothin’!

Bill

I’m envious, as I had to do some scanning today with my SLOW scanner.

ReadIris 9 can output scanned and OCR’d material in several formats: PDF, PDF+Text, text and (Word) RTF.

Straight PDF output is very compact – but you can see any OCR errors that may have crept in.

If I need to edit OCR errors, I export as Word RTF, spell-check in Word, then save as PDF. The resulting PDF is compact and looks good. ReadIris does a great job of maintaining the original document’s formatting in the Word RTF conversion, but there can be font and line-break differences.

I usually export from ReadIris as PDF+Text. The resulting files can be large, and I often run them through PStill to reduce the file size (by up to a factor of 10). PDF+Text is the way to go for legal documents, or other documents that should retain the exact appearance of the original file.

The consumer market for OCR software is pretty small, a fact that has always surprised me. The result is that there’s little software development going on for Macs. Over the years, I’ve bought three OCR apps: FineReader Pro 5 (runs only under Classic), Omni-Page Pro and ReadIris 9 Pro. Of these only ReadIris has had recent development. I would rank ReadIris the best of the three, followed by FineReader Pro 5, with Omni-Page Pro a distant third (I’ve never been satisfied with Omni-Page’s backgrounds and images). All three do a reasonably good job of text conversion – much better than the apps of 10 years ago. ReadIris 9 also does a good job of capturing text and images from PDFs into editable Word RTF.

I have a feeling that DEVONthink will stimulate more people to move their important paper documents into computer-readable form.

Thanks for the additional info. Now I have something to chew on manana…

I too am interested in this type of document management. A document (say a bill for example) would have a date (or even a date range) but it would not necessarily be the creation or modification date of the file. How would you use DevonThink to find things by Date?

For an image-only PDF, e.g. a bill from Joe’s Hardware, you could add simple keywords to the Comment field in the document’s Info area, such as date (050502), type (bill), category (hardware). Or you could use a similar scheme in naming the documents, perhaps one in which the document name begins with the date (050502), followed by type (bill, receipt, etc.) and category (hardware, auto, whatever).

Example: Using the Search tool you could do a search on a keyword in the Comment field, then do a Name sort, which, in my example of Names beginning with YYMMDD, would allow inspection/selection for a date or date range. Depending on how you use a simple set of keywords there are lots of possibilities. DEVONthink Pro has scripts that let one replicate selected Search results into a new group, then attach a script to that group that is in effect a second-level search just of the search hits. (This procedure can be done by hand in DEVONthink PE, of course: manually replicate the search results into a new group, then search/sort/select items from that group.)

So with a bit of prior thought about your particular needs, metadata about your PDF documents can be incorporated into the Name and Comment fields that allow many logical possibilities for managing and searching your data.
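
For anyone who wants to see the logic of that name/comment scheme spelled out, here is a rough sketch in Python. It has nothing to do with DEVONthink’s own scripting; the `Doc` class, the function names, and the space-separated name layout ("YYMMDD type category") are just assumptions made up for illustration. It shows why a YYMMDD prefix is handy: plain text sorting puts the names in date order, so a date range is just a string comparison.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a stored document (not a DEVONthink API)."""
    name: str     # assumed layout: "YYMMDD type category"
    comment: str  # free-form keywords, e.g. "bill hardware"


def search_comment(docs, keyword):
    """Return docs whose comment contains the keyword as a whole word."""
    kw = keyword.lower()
    return [d for d in docs if kw in d.comment.lower().split()]


def in_date_range(docs, start, end):
    """Keep docs whose name begins with a YYMMDD prefix in [start, end],
    sorted by name (which is also date order for this prefix)."""
    hits = [
        d for d in docs
        if len(d.name) >= 6
        and d.name[:6].isdigit()
        and start <= d.name[:6] <= end
    ]
    return sorted(hits, key=lambda d: d.name)


docs = [
    Doc("050502 bill hardware", "bill hardware"),
    Doc("050115 receipt auto", "receipt auto"),
    Doc("050620 bill auto", "bill auto"),
    Doc("041130 bill hardware", "bill hardware"),
]

# First-level search on a Comment keyword, then narrow by date range.
bills = search_comment(docs, "bill")
spring_bills = in_date_range(bills, "050101", "050531")
print([d.name for d in spring_bills])
```

One caveat with a two-digit year: the text sort only matches date order within a single century, so a YYYYMMDD prefix would be safer if your papers go back before 2000.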