I have a good number of old PDF files, some dating back to 2002. Although I was able to import them into DT, they are not “read” by the software: DT cannot search inside these files. What is the best way to address this? How can I make these files readable by DT?
A related question: I now have some 500 articles and files in DT, and I don’t know exactly which ones DT has “read” and which ones it hasn't. How can I isolate the files that DT has not read properly?
Image-only PDFs do not contain a searchable text layer. If they were scanned at sufficient resolution (300 dpi or better is recommended), such PDFs can be converted to searchable PDFs by OCR (optical character recognition).
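The 300 dpi guideline comes down to a simple ratio: pixels divided by inches along each side of the page. Below is a minimal Python sketch of that check. It assumes you already know a scan's pixel dimensions (for example, from an image viewer's info panel) and the physical page size. The function names are hypothetical, and nothing here is part of DEVONthink.

```python
# Minimal sketch: estimate the effective resolution of a scanned page.
# Assumption (not from DEVONthink): the pixel dimensions of the scan and the
# physical page size in inches are known.

RECOMMENDED_DPI = 300  # threshold suggested for reliable OCR


def effective_dpi(pixels: int, inches: float) -> float:
    """Dots per inch along one axis of the scanned page."""
    if inches <= 0:
        raise ValueError("page dimension must be positive")
    return pixels / inches


def good_for_ocr(width_px: int, height_px: int,
                 width_in: float, height_in: float) -> bool:
    """True if both axes reach the recommended resolution."""
    return min(effective_dpi(width_px, width_in),
               effective_dpi(height_px, height_in)) >= RECOMMENDED_DPI


# A US Letter page (8.5 x 11 in) scanned at 300 dpi is 2550 x 3300 pixels.
print(good_for_ocr(2550, 3300, 8.5, 11))  # True
# The same page scanned at 150 dpi (1275 x 1650 pixels) falls short.
print(good_for_ocr(1275, 1650, 8.5, 11))  # False
```

Checking the smaller of the two axis resolutions guards against scans that were stretched or cropped unevenly.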
DEVONthink Pro Office includes an OCR capability and can convert image-only PDFs to searchable PDFs.
DEVONthink displays the Kind of image-only PDFs as ‘PDF’ and that of searchable PDFs as ‘PDF+Text’.
You can add the Kind column to (or remove it from) a view window. For example, while viewing the All PDFs smart group, choose View > Columns and check the option for Kind. Then click the Kind header to sort all your PDFs by Kind.
Those shown as PDF are image-only and are candidates for conversion to searchable PDFs (Kind = PDF+Text).
Thank you so much. Very helpful indeed.