Searchable PDF

ngc4900 · September 19, 2010, 1:58am

Hello,

I just purchased DEVONthink Pro Office. I purchased the “Take Control” book as well.

My disclaimer: I apologize if this is a stupid question.

I mainly import PDFs, emails, and paper items such as receipts, which I scan to PDF with my ScanSnap. Before DT, I never used OCR software, since most of my stuff lived in folders in the OS X Finder and Spotlight usually found what I needed. Now that I have more data to store, I feel that a database will benefit me.

Now that I am bringing PDF files into DT, does it make sense to make them all searchable? It seems to increase the file size. Am I correct that if I don’t use OCR, I will only be able to search a PDF by its name, metadata, group location, etc., and not by its contents?

I hope I am making sense…

Maybe a better question is: when would you NOT want to perform an OCR on a PDF file?

Just now, I imported a PDF attachment (a receipt) from Apple Mail into DT without performing OCR on the file, and I was able to search for words in the document.

So I guess I am confused about using OCR with PDFs.

thanks

Kim Marietta

Snow Leopard 10.6.4
2.8 GHz Intel Core 2 Duo
4 GB memory
MacBook Pro
Fujitsu S1300 Scanner
DTPO 2.0.3

kmlawson · September 19, 2010, 2:15am

Hi, there are two kinds of PDF. One is essentially a series of images: a scanned document, a collection of photos of a document, etc. This kind does not have a searchable “text layer”.

The other kind, usually created from a digital document rather than scanned from paper, preserves the original text in the document’s text layer. That is most likely why the receipt from your email was searchable without OCR.

Thus, you only need to “convert” (OCR) PDFs that lack that text layer. In the “Kind” column you will see either “PDF” or “PDF+Text”; only documents of the former kind need to be converted to become searchable.
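
If you ever want to check files outside DEVONthink, the same distinction can be roughly approximated in a script. Below is a minimal Python sketch, not how DEVONthink itself decides. It assumes that a PDF containing no font resources is image-only, since drawing text on a page requires a font. The function name `classify_pdf_kind` is hypothetical, and the labels are simply borrowed from the “Kind” column. The heuristic can misjudge files: newer PDFs may keep font dictionaries inside compressed object streams, where a plain byte search will not see them.

```python
import re


def classify_pdf_kind(data: bytes) -> str:
    """Guess whether raw PDF bytes contain a text layer.

    Heuristic only (an assumption, not DEVONthink's method): text on a
    page needs a font resource, so a file with no /Font entry is treated
    as an image-only scan.
    """
    # A PDF header should appear near the start of the file.
    if b"%PDF-" not in data[:1024]:
        raise ValueError("not a PDF file")

    # Match the name /Font but not longer names such as /FontDescriptor.
    has_font = re.search(rb"/Font\b", data) is not None
    return "PDF+Text" if has_font else "PDF"


def classify_pdf_file(path: str) -> str:
    """Read a file from disk and classify it."""
    with open(path, "rb") as handle:
        return classify_pdf_kind(handle.read())
```

Calling `classify_pdf_file` on a scanned receipt would be expected to return “PDF” (a candidate for OCR), while a receipt saved from an email would usually return “PDF+Text”.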

K

ngc4900 · September 19, 2010, 4:23pm

Thanks so much. That makes perfect sense. Pointing me to the “Kind” column was very helpful.