DEVONthink, Readiris Pro & ScanSnap

archivist · March 31, 2006, 10:40am

I own all three. But I can’t find a clue in DEVONthink Pro 1.1 about how to scan and OCR a document into DEVONthink in one pass.
Did I miss something?
Or is this script not yet implemented, and only announced to the rest of the world as coming soon?
Has anyone else already written an AppleScript for this purpose?

Regards,
Erik de Jong

annard · March 31, 2006, 11:58am

We demonstrated the integration at both Macworld and CeBIT this year, so it is written. I cannot say when or how, but it will be released in some form later this year.

At the moment you could try to send the output of the ScanSnap to Readiris and its output to DEVONthink Pro. I think if you have the corporate version you could use Folder Actions to set up an automated workflow (one folder for output of the ScanSnap to Readiris, another for output of the latter to DEVONthink Pro). Not sure how well it would work, though.
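A minimal sketch of that two-folder idea, in Python rather than Folder Actions, and only as an illustration. The function names `watch_once` and `open_with` are made up for this example, and the application names you would pass in ("Readiris", "DEVONthink Pro") are assumptions about how the apps are named on a given Mac. The only system call used is the standard macOS `open -a` command.

```python
import subprocess
from pathlib import Path
from typing import Callable, List, Set


def watch_once(folder: Path, seen: Set[str], handler: Callable[[Path], None],
               suffix: str = ".pdf") -> List[Path]:
    """Hand each not-yet-seen file in `folder` to `handler`; return the files handled."""
    handled = []
    for item in sorted(folder.iterdir()):
        # Only pick up regular files with the expected extension, once each.
        if item.is_file() and item.suffix.lower() == suffix and item.name not in seen:
            handler(item)
            seen.add(item.name)
            handled.append(item)
    return handled


def open_with(app_name: str) -> Callable[[Path], None]:
    """Build a handler that opens a file in the named macOS application."""
    def handler(path: Path) -> None:
        # `open -a` is the standard macOS launcher; the app name is an assumption.
        subprocess.run(["open", "-a", app_name, str(path)], check=True)
    return handler
```

In use, one would call `watch_once` every few seconds on the ScanSnap output folder with `open_with("Readiris")`, and on the Readiris output folder with `open_with("DEVONthink Pro")`, keeping a separate `seen` set for each folder. This only hands files over; it does not start the OCR step inside Readiris.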

For now I can only ask you to stay tuned…

creed · April 6, 2006, 3:40am

I too am interested in this capability. It seems DEVONtechnologies is making a big marketing splash with Fujitsu on this topic, but customers can’t do it? Bwaaaaa!

rickl · May 31, 2006, 11:36am

Please add me to the list of people keenly interested in this!

Bob_Sprague · May 31, 2006, 3:26pm

I too am interested, but I’m waiting to buy the scanner until I see the DTP integration. Put me on your mailing list!

Is ReadIris doing a good job of OCR?

Oyvind_Solstad · May 31, 2006, 4:45pm

I would love a feature like that. When I read newspapers and magazines, I rip out anything interesting. A scanner that took the documents, and automatically scanned them, OCR’ed them and entered them in DT would be a wonderful time saver.

Oyvind

howarth · May 31, 2006, 5:49pm

I think that scan-OCR process might not be as time-saving as you would wish. It’s far quicker to write a short note in DTP describing the contents of a physical article and where you’ve stored it.

Also, a very high percentage of periodicals now have online sites with the full text of their articles. I would use DA to look for titles before clipping or ripping.

Oyvind_Solstad · May 31, 2006, 7:56pm

I usually Google the articles or pieces I want to enter in DT, but it’s mostly a waste of time. Far from everything is published online. There’s a reason that about a million different paper magazines are still being sold despite the internet.

I’ve also tried: Scanner vs Typing. If I had a setup that had a button on the scanner saying “to DT” or the opposite - a button in DT that said “from Scanner”, and a good OCR solution, I would choose that over typing (which takes thinking to do).

The best thing would be if my mobile phone (which is never more than a meter away no matter where I am) had an amazing 5 megapixel camera: Snap a picture, press a button, and off it goes by Bluetooth to the Mac that OCRs it and puts it in DT. I can dream…

Bill_DeVille · June 1, 2006, 12:04am

howarth, your cautionary note about efficiency is a good one. There’s no point in scanning and OCR’ing a paper document if an online equivalent is easily available. And even at the 30-35 ppm scanning rate of the ScanSnap it’s not always feasible or efficient to scan everything, notably bound material such as books. In that case insertion of a note into the DT Pro database remains efficient and useful.

Some organizations distribute papers and articles in PDF format, but they’ve been scanned at low resolution and OCR is impossible. In that case, I import them and just enter some notes in the Info panel’s Content field. That also applies to some scanned documents that are in bad shape for OCR (faded or low-resolution text, handwritten notes, etc.).

Document scanning followed by OCR into a DEVONthink database can be very useful. Lawyers, for example, are often handed numerous documents that must be read and analyzed. The ability to transform them into computer-readable form enhances the ability to search the documents and makes huge stacks of paper much more manageable and transportable.

Businesses generally deal with many paper documents which, when scanned and OCRed, can be found more quickly, take a relatively minuscule filing space and can easily be backed up off-site without the time and expense of making additional sets of paper copies.

One of my own projects is to scan and OCR some of my old papers and books that date to that antediluvian period before computer-readable formats became common. I also hope to eventually recover some of my home office space that’s currently occupied by ever-growing folders and boxes of paper: old project documentation (some of which has historical interest to others), correspondence, personal and financial records and so on. Some of those papers can be stacked into a ScanSnap document feeder and scanned at a rate of 30 or so pages per minute, then automatically entered into a queue to be OCRed and transferred to my database while I’m using my computer for other purposes.

As an inveterate packrat, I’ve got thousands of pages of documents accumulated over the years. Although I may remember that I kept something, I often have trouble finding it. With over 1.5 terabytes of available hard drive space on my computers and DT Pro, I dream of turning my collection into one that not only allows me to find stuff, but also lets me easily respond to requests that I get for copies of some of the material.

Although I’ve been doing limited scanning and OCR for years using a flatbed scanner, each page takes over a minute to insert, set up and scan, and pages must be exchanged manually. It’s not feasible to tackle my collection that way. But at 30-35 pages per minute scan time (two-sided paper included) and automatic DEVONtechnologies OCR and database transfer, the ScanSnap/DEVON combination can make paper replacement practical.
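To put rough numbers on that comparison, here is a small Python sketch. The collection size of 5,000 pages is a hypothetical figure chosen only for illustration; the rates are the ones quoted above (at best 1 page per minute on the flatbed, and the low end of 30-35 pages per minute with the sheet feeder). It counts scanning time only, not OCR or loading the hopper.

```python
def scan_minutes(pages: int, pages_per_minute: float) -> float:
    """Minutes needed to scan `pages` at a steady rate."""
    return pages / pages_per_minute


pages = 5000  # hypothetical collection size, not a figure from this thread

flatbed = scan_minutes(pages, 1)   # "over a minute" per page, so 1 ppm at best
feeder = scan_minutes(pages, 30)   # low end of the quoted 30-35 ppm

print(f"flatbed: {flatbed / 60:.1f} h, feeder: {feeder / 60:.1f} h")
print(f"speed-up: {flatbed / feeder:.0f}x")
```

Under those assumptions the flatbed needs about 83 hours of hands-on scanning against under 3 hours for the feeder, which is the difference between impractical and a few evenings of work.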

Will I ever eliminate all the paper from my office? Probably not. But I expect to at least reverse the rate of growth while making some of those documents far easier to find and more useful.

pb1 · June 4, 2006, 4:31pm

Let me try. I don’t know about Scansnap, but in Readiris Pro:

Choose from menubar > Settings > Text Format…. I prefer saving as PDF (Text-Image) but you can save the document in any format.
In the Output box (located at the bottom right of the window), click Send to and choose Add Application.
Browse to the folder where DTP is located, highlight DTP, and press return (or click the Open button).
Click the OK button to close the Text Format window.

Did I win a cookie?

pgodley · June 8, 2006, 4:38am

You can do the same thing (send files automatically) from Scansnap Manager to Readiris:

• Choose from menubar > Settings

• Select “Application” from the button panel going across the window

• Click the “Add or Remove” button > “Add” button > “Browse” button

• Browse to the folder where Readiris is located, highlight Readiris, and press return or click the button “Open.”

• Close the “Add or Remove” window

• Select “Readiris” from the Application dropdown menu

I have PDF selected under “File option” and the ScanSnap successfully sends files directly to my copy of Readiris 9.0. After I initiate OCR, Readiris sends PDF+Text files directly to DTP without a problem, using pb1’s configuration instructions above.

All that’s left to make it completely automatic is a QuicKeys sequence that detects a new Readiris window, waits for the file to load into Readiris and then starts the OCR process.

milhouse · June 8, 2006, 6:05pm

In case others are interested, I found a ScanSnap 5110EOXM on Newegg for $316 (after mail-in rebate)!
Of course I ordered one.
That seems rather like a steal compared to the $450 - $650 pricing I’ve seen elsewhere.

Here’s the link:

dealmac.com/search.html?search=F … &x=12&y=10

I am looking forward to getting rid of much paper-based clutter in the coming weeks.

I also look forward to a DTPro + ScanSnap solution in the future.

cheers

Knight_of_Nee · June 25, 2006, 3:09am

So whatever happened with this? I haven’t seen anything more about the paperless office stuff. I have an HP Scanjet that I want to scan directly into DTP with. Any suggestions?

Bill_DeVille · June 25, 2006, 3:45am

As Annard stated, stay tuned.

I got a Fujitsu ScanSnap for Mac a few days ago and have already scanned several hundred pages. The ScanSnap is fast, actually faster than the multiple-thousand-dollar scanners my agency bought four years ago. And it has no problem with duplexed paper (printed on both sides).

As with any stack feeder, throughput is great if the papers stacked in the hopper are in good shape. Flimsy paper can result in multiple pages being grabbed. Stacks that had been stapled together can also have feed problems (of course, staples must be removed before scanning). But even when one has to feed pages a sheet at a time, scanning is quick.

Even the ScanSnap won’t make my office paperless anytime soon; I’ve got many thousands of pages of paper in file boxes. But I’m more than keeping up with incoming paper and making a dent in a few file boxes. The HUGE advantage is that putting the scanned and OCRed material into DT Pro means that I can actually FIND THINGS! And hard drive space is much smaller and cheaper than file cabinets.

I’m keeping my Canon LIDE 500F flatbed scanner for the occasional photo and scans of book pages. But the ScanSnap is more than an order of magnitude faster for unbound paper scans.