Saving OCR layer to indexed PDFs

Sorry if this has been addressed elsewhere. I was unable to find any threads.

I’m using Office Pro. I realized today that if I OCR PDFs in an indexed folder, the OCR layer is searchable within DEVONthink but not when I open the PDF directly from the file system in another application (e.g. Preview). If I export the PDFs to a new folder, the OCR layer comes along with them.

Is there a way to have DT save the OCR layer directly to the PDF in the file system when it does OCR on indexed files? Thanks.

When DEVONthink converts a PDF to a searchable PDF, it does not overwrite or delete the original non-searchable PDF. For an indexed folder, the new searchable PDF is internal to DEVONthink. So you have two PDFs in the same DEVONthink group: the original is indexed, and the new searchable PDF is internal. If you want the new PDF to move to the indexed folder in the filesystem, select it and choose Move to External Folder from the contextual menu. The file will be moved outside the database (remaining indexed). Because two files in the same filesystem folder cannot have the same name, the external file will have “-1” suffixed to the name.
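To illustrate the naming point only, here is a minimal Python sketch of how a “-1” suffix can be chosen when a name is already taken in a folder. This is not DEVONthink's actual code; the function name `non_colliding_path` and the behavior of counting up past “-1” are my own assumptions for the example.

```python
from pathlib import Path


def non_colliding_path(folder: Path, filename: str) -> Path:
    """Return a path inside `folder` that does not clash with an existing file.

    Hypothetical helper: if `filename` is taken, try "name-1.ext", then
    "name-2.ext", and so on. Counting past "-1" is an assumption.
    """
    candidate = folder / filename
    stem, suffix = candidate.stem, candidate.suffix
    counter = 1
    while candidate.exists():
        candidate = folder / f"{stem}-{counter}{suffix}"
        counter += 1
    return candidate
```

So if the indexed folder already holds the original Scan.pdf, the searchable copy would land next to it as Scan-1.pdf, while inside the database both can keep the same display name.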

DEVONthink does all this in order not to interfere with your data (i.e., it does not delete the non-searchable file) and not to interfere with your naming conventions (i.e., within the program it will use the same display name for both files, but externally the filesystem does not permit this).

If you plan to do a lot of OCRing, it’s usually more convenient to import the files, do the OCR, and then delete the non-searchable files (if you wish).

Perfect. Thank you. I finally did realize that the OCR’ed PDFs within the indexed folder were internal when I did a reveal on them. This really mystified me until I read your post because I just assumed anything in the indexed folder was indexed. It never occurred to me that the indexed folder could turn into a mix of indexed and non-indexed files. Now that I know this, however, I can move them manually back into the external folder after they are OCR’ed.