scan to OCR to Devonthink easy way?

Newbie to all this! Got a new ScanSnap hooked up to my OS X system and have Acrobat Standard and the full package of DEVON tools.

My question is how to set up a smooth workflow for adding searchable scanned documents to my database. I can see bits and pieces of what is needed but don’t see an optimal way of getting everything where I want it. Right now I wind up with PDFs on my desktop, open them with Acrobat, recognise text and save the document back to the desktop. I don’t know if that OCR step is needed or if there is a special save I need to do. Also, it is really annoying that the “Save As” dialog will not let me navigate to where I want to save stuff. So I close the document and drag it to where I want to keep it, then fire up DEVONthink and drag the document in there. It would help me to understand what DEVONthink is storing… a link? It seems like putting the original somewhere consistently would be a good thing. I tried not keeping the original (trashed it and emptied the Trash), and it seems like a very compressed image of the document is all that survived.

Here is the desired state:

  1. searchable document stored in a logical place like username > Documents > scan archival
  2. document captured/indexed etc. in DEVONthink

If I could do this with automator and a couple of clicks, a file name and return it sure would be nice.
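The filing half of that desired state (step 1) could be sketched in a few lines of Python. This is only a rough illustration, not anything supplied by DEVONthink or Automator: the function name `archive_scan` is made up for the example, and it does no OCR and no import into the database. It just moves an already-OCRed PDF into the archive folder under a name you type.

```python
import shutil
from pathlib import Path


def archive_scan(src, archive_dir, new_name):
    """Move one scanned PDF into the archive folder under a chosen name.

    `archive_scan` is a hypothetical helper for illustration only.
    """
    src = Path(src)
    archive_dir = Path(archive_dir)
    # Create the archive folder (and any parents) if it does not exist yet.
    archive_dir.mkdir(parents=True, exist_ok=True)
    dest = archive_dir / (new_name + ".pdf")
    # Refuse to overwrite an earlier scan that has the same name.
    if dest.exists():
        raise FileExistsError(f"{dest} already exists; pick another name")
    shutil.move(str(src), str(dest))
    return dest
```

A call might look like `archive_scan(Path.home() / "Desktop" / "scan.pdf", Path.home() / "Documents" / "scan archival", "water-bill")`, where both the source file name and the new name are placeholders.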

How do you want to import the PDFs to DT Pro?

You want to capture the text for searching and analysis. There are several options available, depending on Preferences settings. These boil down to:

  • Index import, leaving the PDF externally linked. Pro: Smallest database size. Cons: Phrase search doesn’t work. Less portable database.
  • Files & Folders import, leaving the PDF externally linked. Pro: Smaller database size. Phrase search works. Con: Less portable database.
  • Files & Folders import, copy PDF to database body. Pro: Phrase search works. Portable database. Con: Largest database size.
  • Files & Folders import, copy PDF to database Files folder. Pro: Phrase search works. Portable database. Con: Somewhat larger database size.

Notes on above:

  • All searches work for Index-imported files except for Phrase searches.
  • Portability means that the database can be moved to another computer and the original PDFs will still be available for reading/printing.
  • I don’t need to worry about hard drive space and I want my databases to be portable, so I use the last option in the list above. And even were the database to become corrupt, I could retrieve my PDFs by opening the package and retrieving them from the Files folder, so I can safely delete the originals in the Finder. (But always make backup copies of your database, anyway.)
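For the recovery case mentioned in the last note, pulling the PDFs back out of the package's Files folder can be sketched as below. This assumes only what is said above, namely that the database is a package (a folder in the Finder's eyes) containing a Files folder. How that folder is organised inside is not known here, so the sketch searches it recursively. The function name `recover_pdfs` is invented for the example.

```python
import shutil
from pathlib import Path


def recover_pdfs(package_path, dest_dir):
    """Copy every PDF found under <package>/Files into dest_dir.

    Hypothetical helper; the internal layout of Files is an assumption,
    so the whole folder tree is searched.
    """
    files_dir = Path(package_path) / "Files"
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for pdf in sorted(files_dir.rglob("*")):
        if pdf.is_file() and pdf.suffix.lower() == ".pdf":
            target = dest_dir / pdf.name
            n = 1
            # Add a numeric suffix rather than overwrite a same-named file.
            while target.exists():
                target = dest_dir / f"{pdf.stem}-{n}{pdf.suffix}"
                n += 1
            shutil.copy2(pdf, target)  # copy2 keeps modification times
            copied.append(target)
    return copied
```

It copies rather than moves, so the package itself is left untouched.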

Obviously, scanned PDFs must undergo OCR for readable text to be available. The finished files, some of which may be multi-page documents, can be saved or dragged into a folder to which a Folder Action script is attached. DT Pro provides scripts for that purpose, so that the ‘deposit’ folder can be triggered to automatically send new documents to your database. For example, you can attach a script for Files & Folders import to the ‘deposit’ folder. Each time a new OCRed PDF is saved or dropped into the ‘deposit’ folder, the script will perform the import to your DT Pro database.
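To make the 'deposit' folder idea concrete, here is a minimal Python sketch of the same pattern: notice each new PDF that lands in a folder and hand it to some action. It is not the Folder Action script that ships with DT Pro and it does not talk to DT Pro at all. The `handle` callback stands in for whatever the import step would be, and the names `find_new_pdfs` and `watch` are made up for the example.

```python
import time
from pathlib import Path


def find_new_pdfs(deposit_dir, seen):
    """Return PDFs in deposit_dir whose names are not in `seen` yet.

    Newly found names are added to `seen`, so each file is reported once.
    """
    new = []
    for path in sorted(Path(deposit_dir).glob("*.pdf")):
        if path.name not in seen:
            seen.add(path.name)
            new.append(path)
    return new


def watch(deposit_dir, handle, interval=5.0):
    """Poll the folder forever, calling handle(path) for each new PDF.

    `handle` is a placeholder for the real import action.
    """
    seen = set()
    while True:
        for pdf in find_new_pdfs(deposit_dir, seen):
            handle(pdf)
        time.sleep(interval)
```

A real Folder Action is triggered by the system rather than by polling. A polling loop like this one could also pick up a file that is still being written, so treat it as an illustration of the flow only.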

Note: some users do OCR with Acrobat Professional, others use ReadIRIS Pro. I use ReadIRIS, but also have Acrobat for the ability to insert or remove document pages.

Note: DEVONtechnologies is working on an addition to DT Pro that will automate OCR from the scanner directly to DT Pro. Because of the size of the additional code, this will be an additional cost option when released. No schedule has been set for release.

Tip: It costs money, but Default Folder simplifies routine tasks such as choosing a Finder folder when doing Save or Save As operations.

Bill,
Thanks for the detail. It explained a bunch of things. Now I am trying things and have more questions.

My preference is like yours… Files & Folders import, copy PDF to database Files folder.

On the configuration to use database folder.
I set my preference on PDF&PS to copy files to database folder. Dragged a file into devonthink, quit the program, and went on a search for the file. If the program is importing the file somewhere, my search should have found it? The only copy is the original sitting on my desktop. I also looked in Library, Applications and the devonthink folder in Documents. Where is that folder?

I had not tried folder actions before… created a folder, right-clicked to bring up Attach a Folder Action, browsed to choose my folder and then to find the actions in the DEVONthink folders… attached Action Import and Action Index and then tried it out. I can’t see that anything is happening… sure seems like I would need to tell it where to put those new items.

I can apply the OCR in Acrobat but don’t see where/how it saves the result. I open a file, run the OCR, then do Save As, and the file size is almost the same as the original. Seems like it needs to grow. I may need to check with Adobe for that info. The add-on to go direct from scanner to DEVONthink is a winner at the right price.
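One rough way to check whether the OCR result actually made it into the saved file is to look for font resources in the PDF. A searchable PDF normally references at least one font for its text layer, while a pure image scan usually does not. The Python sketch below is a crude heuristic built on that assumption, not a real PDF parser, and `looks_searchable` is a made-up name. A True result suggests a text layer is present. A False result is inconclusive, because some PDFs store these entries in compressed form where a plain byte search cannot see them.

```python
from pathlib import Path


def looks_searchable(pdf_path):
    """Crude heuristic: True if the raw PDF bytes mention a font resource.

    Assumption: image-only scans carry no /Font entries. False can also
    mean the entries are compressed and therefore invisible to this check.
    """
    data = Path(pdf_path).read_bytes()
    return b"/Font" in data
```

The text layer is probably small next to the page images, so a nearly unchanged file size does not by itself mean the OCR was lost.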

Default Folder is pretty pricy for a one trick pony that seems like it should be a basic part of every piece of software. I fear a comparison to the windows world would look bad on that one.

Where did my imported documents go?

You can tell DT Pro where to send new Files & Folders imports by setting DT Pro Preferences > Import. At the bottom of the Import panel you can set the group to which new documents are sent. I’ve created a group called “Incoming” and set it as the target. There’s also an option to display the last document imported.

If you like to drag files into DT Pro, I recommend the Groups panel. Here’s how:

  • In DT Pro Preferences > General make certain the option “Hide ‘Groups’ panel when inactive” is unchecked.
  • Select Tools > Show Groups.
  • Move the Groups panel to the right side of the screen and click the yellow button to minimize it to the Dock.
  • Make another application, e.g. the Finder, frontmost. Click the Groups panel’s icon in the Dock to maximize the panel.
  • Drag a file from the Finder to the destination of your choice in the Groups panel.

Note: You can also drag selected text from a document into a desired group/subgroup from any application.

Default Folder is a bit more than a one-trick pony and a bit smarter than Windows. But I wish Apple folded its functions into the OS. As to the price, I’ll grant that it makes DT Pro look terribly underpriced. :)