Newbie needs help with tagging & OCR

I just got my ScanSnap and DTPO yesterday and I am pretty overwhelmed. I have a zillion questions, but I will start with a few basics to get going.

  1. How do I get DT to OCR my PDFs?
    I have my ScanSnap set to save to a folder without performing OCR (I'm still trying to decide which piece of software to use for each function: DT, Evernote, Receipt Wallet, etc.). Also, the OCR tool wouldn't queue items, so I had to select each file manually to OCR it, which was a pain. Is there a workaround? I figured I would have DT do the OCR for me, but when I drag the newly scanned PDFs into DT it doesn't seem to perform that task. How can I fix this?

  2. I want to tag all my newly imported and OCR'd documents and create smart folders (à la iPhoto and iTunes smart albums and playlists) instead of manually sorting things into folders. I can't seem to figure out how to tag a bunch of documents at once.

  3. Merge PDFs and move pages around
    When I import my newly scanned and OCR’d documents into DT, some of the pages will need to be trashed, others moved around within a PDF and some merged with other documents. How do I do this?

THANKS A BUNCH! I am so excited to empty my file cabinet!!!

I’m pretty much in the same boat.

Install the add-ons, which add OCR functionality to DEVONthink. Then, in ScanSnap Manager, select the DEVONthink profile. This will send your scans directly to DEVONthink. I have profiles for Receipt Wallet, standard Acrobat and DEVONthink; each has its own use. I've found it easiest to scan directly into DEVONthink and let it do the OCR. While one scan is going through the recognition process you can scan your next document; it will go into the queue.

After OCR completes, you'll be presented with a dialogue to name your document and add keywords. Once that's done, click your new file in the Inbox and you'll see the page thumbnails in the sidebar. Control-click (or right-click) the thumbnail of a page you don't want and select Cut. You can also drag the thumbnails into whatever order you choose.

I am not sure how to install the "add-ons." When DEVONthink launches, it does say I have ABBYY FineReader OCR. Are those the add-ons you are talking about? What do I do with the PDFs I have already created that have not been OCR'd? How do I get them OCR'd and into DTPO?

I have asked other users this as well, and I'm curious to hear why people don't use the Help > Search option in our application to find out what is possible with it. In this case, typing "PDF" or "Convert" will show you the menu option to use. It is a great way to explore the application's menus.

I did search for "OCR" but didn't get the results I was looking for. In my experience, help files in general (not yours specifically) aren't very helpful and are a waste of time.

On a side note, I did discover that I could select a bunch of my PDFs in DTPO and right-click them to convert them to searchable text. My PDFs are being processed now, but I noticed that I now have two documents with the same name, one PDF and the other PDF+Text. Can I delete one of them? Which one? I don't want to clutter up my database with duplicates. Thanks.

Take a look at Preferences > OCR. There’s an option to delete the image-only PDF after it has been converted to a searchable PDF.

Which one to delete? If Kind = PDF, it’s image-only, not searchable. If Kind = PDF+Text, it’s searchable.