Fujitsu ScanSnap + Readiris + DEVONthink = Paperless Office

Well, in this case, it is more of a “Paperless Home”.

We get so much information in the mail that it is hard to manage. So, I decided to bite the bullet and come up with a way of handling everything digitally.

It all starts with the Fujitsu ScanSnap fi-5110EOXM document scanner. I have it set to:

  • Duplex scan
  • Throw away blank pages
  • Always color scan
  • Low compression
  • “Best” scanning quality

It takes about 10 seconds to scan a duplex page.

I put the document in the scanner tray and hit the “Scan” button. The document is scanned and the output is sent to Readiris. It takes about 4 to 5 seconds per page for Readiris to analyze and locate the different text areas.

Once in Readiris, I click “Recognize”, which OCRs all the pages and asks where to save the resulting file. I choose to save all my scans as PDF+Text. That way, I have a copy of the original, plus I can search the content in DEVONthink. It takes Readiris about 2 seconds a page to format and save out the PDF file.
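As a rough back-of-the-envelope check, the per-page figures quoted above can be added up. This is only a sketch: it assumes every figure applies to one duplex sheet, and it leaves out the OCR step itself, because no time is given for it. The function name `estimate_seconds` is made up for this illustration.

```python
# Per-sheet timings quoted in the post, in seconds (all approximate).
SCAN_SECONDS = 10          # duplex scan of one sheet
ANALYZE_SECONDS = (4, 5)   # Readiris page analysis: low and high figure
SAVE_SECONDS = 2           # formatting and saving the PDF


def estimate_seconds(sheets):
    """Return a (low, high) estimate in seconds for handling `sheets` sheets.

    Assumption: the OCR step is not timed in the post, so it is excluded.
    """
    low = sheets * (SCAN_SECONDS + ANALYZE_SECONDS[0] + SAVE_SECONDS)
    high = sheets * (SCAN_SECONDS + ANALYZE_SECONDS[1] + SAVE_SECONDS)
    return low, high


if __name__ == "__main__":
    low, high = estimate_seconds(20)
    print(f"20 sheets: roughly {low} to {high} seconds, plus OCR time")
```

So a 20-sheet statement would take somewhere over five minutes before recognition time is counted.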

I keep my files in a directory called “Filing Cabinet” on my Desktop. Within this directory are category directories (Utilities, Financial, Medical), and within each of those are directories named after the providers. I then use the “Synchronize” command to import the changes into DEVONthink. I choose to organize this way, outside of DEVONthink, because the directory is synchronized with another directory on my server, which has a mirrored RAID and daily automated snapshot backups. I wouldn’t want to lose all this information!
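The category/provider layout described above could be automated with a small helper. The following is a minimal sketch, not part of the original workflow: the function name `file_scan` and the example category and provider are hypothetical, and the demo runs in a temporary directory rather than a real “Filing Cabinet”.

```python
import shutil
import tempfile
from pathlib import Path


def file_scan(pdf, cabinet, category, provider):
    """Move `pdf` into cabinet/category/provider, creating folders as needed.

    Refuses to overwrite an existing file with the same name.
    """
    destination_dir = Path(cabinet) / category / provider
    destination_dir.mkdir(parents=True, exist_ok=True)
    destination = destination_dir / Path(pdf).name
    if destination.exists():
        raise FileExistsError(f"{destination} already exists")
    shutil.move(str(pdf), str(destination))
    return destination


if __name__ == "__main__":
    # Demo in a throwaway directory; the names below are made up.
    with tempfile.TemporaryDirectory() as tmp:
        scan = Path(tmp) / "statement.pdf"
        scan.write_bytes(b"%PDF-1.4 placeholder")
        cabinet = Path(tmp) / "Filing Cabinet"
        print(file_scan(scan, cabinet, "Utilities", "Example Power Co"))
```

Refusing to overwrite is a deliberate choice here, since two statements from the same provider can easily end up with the same file name.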

This all seems to be working really well. It’s nice to be able to search all my paper mail for a word or two. However, there are a few features I would love DEVONthink to support:

(1) Edit Creation/Modification Times

A lot of times, I am inputting files from the past. Thus, having a creation and modification time of today is incorrect. Now, I can run commands via the Terminal to update this information, but it would be so much easier if I could do it as I was viewing the file within DEVONthink. That way, I could see the date on the scanned file and change it via the “Get Info” pane. I would hope that this would not just change the metadata on the file but the actual ctime and mtime on the file system.
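The post does not say which Terminal commands are used, so here is one possible approach sketched in Python instead. It sets a file’s access and modification times to a given document date using `os.utime`. Two caveats: on Unix-like systems “ctime” is the inode change time, which the system maintains itself and which cannot be set directly, and this sketch does not attempt to set the creation date that the Finder shows. The function name `set_document_date` and the choice of noon local time are assumptions made for this example.

```python
import os
import tempfile
from datetime import datetime
from pathlib import Path


def set_document_date(path, date_text):
    """Set the access and modification times of `path` to noon (local time)
    on the date given as YYYY-MM-DD. Returns the timestamp that was applied.
    """
    moment = datetime.strptime(date_text, "%Y-%m-%d").replace(hour=12)
    timestamp = moment.timestamp()
    os.utime(path, (timestamp, timestamp))
    return timestamp


if __name__ == "__main__":
    # Demo on a throwaway file; the date is made up.
    with tempfile.TemporaryDirectory() as tmp:
        scan = Path(tmp) / "old-statement.pdf"
        scan.write_bytes(b"placeholder")
        set_document_date(scan, "2005-03-14")
        print(datetime.fromtimestamp(scan.stat().st_mtime))
```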

(2) Tagging

Just like iPhoto has tags, it would be very advantageous to be able to tag files with keywords. For example, I may get correspondence from my financial institutions which is tax related. While the files would go in the institution’s folder, I would love to be able to tag the file with “Taxes 2005” so I can find all my tax documents at once. Obviously, I could just enter “Taxes 2005” in the comments section, but a small keyword bar where I could view, add, delete, and change the keywords associated with a file would make life easier.

One other question:

I’ve been doing all this work with a demo copy of DEVONthink. I’m now ready to make a purchase. Given the way I am using the app, would there be any benefit to getting DEVONthink Pro?

Thanks,

- Tony

Tony:

Very good report. And thanks for your comments.

Direct-to-DT Pro scanning is a development project, although that will probably be an extra-cost option.

This is very interesting, Tony. Thanks for the detail. I just “discovered” the ScanSnap, and have an eye on it (the other eye on my wallet :frowning: ). Also good to know that ReadIRIS works in your application.

Have you investigated using an Automator program, saved as a folder action, that would automatically prompt you for Spotlight comments on each file entered into the folder? This might help your workflow, though it’s probably not everything you’d like.

I can’t really answer your question, since I’m not (yet :wink: ) attempting to duplicate your paperless office workflow. DEVONtech maintains a table comparing the features of DT and DTP; you can find it here.

Of course, you may find additional uses for DT after you purchase it. If so, the additional features of DTP may play an important role in that scenario.

Your report is very interesting for us.
I discovered (and bought) the ScanSnap at Apple Expo in Paris, France, in September 2005. The scanner is very, very efficient.
With Folder Actions scripting, it is easy to create scenarios. BUT the Acrobat 7 Standard in the Fujitsu bundle is buggy (it cannot OCR the accented characters commonly used in Europe; the bug has been there since version 6, but Adobe sleeps, sleeps…), so, like you, I tested Readiris 11, very successfully…
BUT the ScanSnap Manager app UI is not fluid (you have to go through 2 or 3 tabs to change settings and destinations)… SO, I have created a small app (ScanSnap Prefs Manager) to change the scan settings (the scanning scenario) in one click before scanning a paper… And so, with Folder Actions, it is possible to:
  • scan a document and send it by fax
  • scan a picture and archive it in the Pictures folder
  • scan a page as a searchable PDF and archive it in DT Pro…
With an AppleScript attached to a document (a note or…), I plan to set the scanner as desired in DTP scenarios…
In a SOHO or in a department, doc management is now X possible!
I hope to release a package in a few weeks…

Michel,

I would be very interested in your application and scripts, and would be happy to test them for you, if you are interested. Since what you are doing sounds quite similar to something I was working on a while ago, I thought I might share a small snippet of code using Readiris that seemed (to me at least) relatively hard-won. It sounds like you might be further along than I was. I was able to script Readiris to open a document and start recognizing it, but I was unable to figure out the syntax for its “save as” command to get it to save the document as PDF. It’s close to working, though, and if it did, one could also script the whole cycle: DEVONthink (raw PDF) -> Readiris (OCR) -> DEVONthink (PDF+text).

Please tell me if I can be of help. I sure would love to see these two programs able to really work together!

best,

Erico


-- Folder that holds the raw PDFs (no trailing slash)
set thePath to "/Users/eric/Pictures"

-- Convert the POSIX path to an HFS path outside the Finder block,
-- where the POSIX file coercion behaves reliably
set theFolder to (POSIX file thePath) as text

tell application "Finder"
	set folderList to entire contents of folder theFolder
	repeat with theItem in folderList
		set theFilePath to theItem as text
		if theFilePath ends with ".pdf" then
			tell application "Readiris"
				open file theFilePath
				--rotate by clock180
				deskew
				try
					-- OCR can be slow, so allow up to 6000 seconds
					with timeout of 6000 seconds
						-- Saves over the source PDF itself
						recognize front document saving to (theFilePath as file specification)
					end timeout
				on error errMsg
					-- Report the failure instead of silently ignoring it
					log errMsg
				end try
				--THIS DOES NOT WORK, BUT WHY NOT?
				----save front document in file "path:path:pathf" as PDF
			end tell
		end if
	end repeat
end tell



As you mention, Readiris 11’s AppleScript support offers some surprises…

  1. In the Settings menu, under Text Format, the output must be unchecked. (If the file specification syntax is not right, the result is saved under an automatic name in the Readiris folder inside Documents.)

  2. The correct code is:

recognize front document saving to "Disk:Users:michel:Documents:TempReadiris:fromRI.pdf" as file specification

BUT for this to work, the file “fromRI.pdf” MUST already exist, and it is overwritten. Try it by creating a sample PDF first: it will be overwritten.
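The two requirements above (a colon-separated path, and a target file that already exists) can be prepared by a small helper before the AppleScript runs. This is a sketch under assumptions: the function names are hypothetical, the startup disk name has to be supplied by the caller, the conversion only handles paths on the startup disk whose names contain no “:” or “/” characters, and it is untested whether Readiris accepts an empty placeholder rather than a real sample PDF.

```python
import tempfile
from pathlib import Path, PurePosixPath


def posix_to_hfs(posix_path, startup_disk):
    """Convert an absolute POSIX path on the startup disk into a
    colon-separated HFS-style path such as "Disk:Users:name:file.pdf".
    """
    parts = PurePosixPath(posix_path).parts
    if not parts or parts[0] != "/":
        raise ValueError("expected an absolute POSIX path")
    return ":".join((startup_disk,) + parts[1:])


def ensure_placeholder(target):
    """Create an empty file at `target` (and its folder) if it is missing."""
    target = Path(target)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.touch(exist_ok=True)
    return target


if __name__ == "__main__":
    print(posix_to_hfs("/Users/michel/Documents/TempReadiris/fromRI.pdf", "Disk"))
    # Demo of the placeholder step in a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        print(ensure_placeholder(Path(tmp) / "TempReadiris" / "fromRI.pdf").exists())
```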

I am now working on a controlled naming flow for one scenario: scanning, then a Folder Action triggering the OCR script and saving into DT Pro.

Next step in a few days…

Hi guys,

It’s great to see the interest in this solution. At Macworld we announced scanner support that will be forthcoming sometime this year, and we also demoed it on the show floor. There we had an Automator workflow set up that ran OCR on the output of a Fujitsu ScanSnap scanner and fed the result into DT Pro. It worked very nicely: just press the Scan button. :slight_smile:

There will be support for setting the creation date and PDF keywords. :slight_smile:

So we will ship this, but in what form and at what price is not clear yet. Stay tuned for more specific announcements.

Shouldn’t the first one be doable now via AppleScript? There’s already the “switch mod date to creation date” script. Of course, I don’t know how to write such a script :slight_smile: Anyone?

  • Is using “aliases” in the Info panel (Tools/Show Info) inside DT a form of tagging? They become potential alternative wiki-link terms.
  • In DT Pro, set the preferences to “import Finder comments”; the comments are then specifically searchable within DT.

I strongly support the request for a keywords feature in DT, and not just for PDF documents but for any kind of document. This is one of the things I really do miss in DT. Database applications which are much simpler than DT (MacJournal, for instance) already have a keywords feature.

For what it’s worth… I just unpacked my brand-new Fujitsu ScanSnap, pointed it at DEVONthink Pro, and pressed the Scan button. Multi-page color 2-sided scans showed up instantly as PDFs inside DTP. No Automator, no AppleScript, nothing.

I haven’t tried OCR yet (because my IRISlink license is four versions old!), but for image management, it works wonderfully!