OCR without importing

obiobson · January 21, 2010, 2:26pm

Hi,

Unfortunately, I haven’t found anything on this in the forum, although it seems like it should be a common question:

With DEVONthink Pro Office, is there any way to apply OCR to PDF files but NOT import them, so that only the OCR text layer remains in the database and not the actual file? This would be very handy, because importing dozens of (big) PDFs would dramatically increase the size of the database.

My scenario is: I have a lot of scanned sheet music and would like to be able to search for certain titles. But I DON’T want to import a few hundred MB of PDF files into my DEVONthink database.

Thanks for your suggestions.

Bill_DeVille · January 21, 2010, 8:14pm

No, OCR of scanner output or PDFs will result in storing the resulting searchable PDF in your database.

However, if all you need to do is search for PDFs of sheet music by title, you could save the image-only PDFs into a Finder folder, then Index-capture (File > Index) that folder into your database. The actual PDF files will remain outside your database, but you will be able to search for them within the database by Name. You can also add notes in the Comment field or Tag the files, and the results of searches will be displayed within your database.

In that case, you could simply scan your sheet music directly to a Finder folder, rather than to DT Pro Office, by controlling the scanner using its own driver/software rather than by controlling it from within DT Pro Office.
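
As an aside, since the search here is by Name only, the same kind of lookup can be roughly approximated on the Finder folder itself. The following is only a minimal Python sketch, not anything DEVONthink provides: the function name `find_pdfs_by_title` and the folder `~/Scans/SheetMusic` are made up for illustration, and it assumes the title of each piece is part of the PDF’s file name.

```python
from pathlib import Path


def find_pdfs_by_title(folder, query):
    """Return PDFs under `folder` whose file name contains `query`.

    Matching is case-insensitive and looks only at the file name,
    not at the (image-only) PDF contents.
    """
    root = Path(folder).expanduser()
    if not root.is_dir():
        # Nothing to search if the folder does not exist.
        return []
    needle = query.casefold()
    return sorted(
        p for p in root.rglob("*")
        if p.is_file()
        and p.suffix.lower() == ".pdf"
        and needle in p.stem.casefold()
    )


if __name__ == "__main__":
    # Hypothetical folder and search term, for illustration only.
    for path in find_pdfs_by_title("~/Scans/SheetMusic", "sonata"):
        print(path)
```

This only matches file names; Comments and Tags as described above would still live in the database.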