Refresh PDF After OCR

I have all my PDFs in DEVONthink Pro (great app, by the way!), and many of them are simple image PDFs. They were all imported using the "copy to database folder" option, and I have deleted the originals. I want to convert some of them to PDF+Text using Acrobat's OCR, which is easy enough: I open them, run Capture, and close Acrobat. The problem is that the DEVONthink Pro database doesn't update, even though the actual files in the database are updated. Is there some way to force a refresh of those files?

Selecting the PDFs and choosing File > Synchronize should update the files.

Hm, that doesn't seem to work. The PDF is still identified as "Image", and the words in it don't show up in a search, even though they do if I open the document and search it in Acrobat. I thought Synchronize was only for indexed files.

Any other thoughts?
Is the PDF encrypted? Can Preview search the contents of the PDF? And finally, which version of Mac OS X are you using?

Not encrypted. Preview can search it. Mac OS X 10.4.4.

Thanks for your help (and quick response).

Did the log (see Tools > Log, although this panel should open automatically) say that the PDF document was updated? Or is the PDF searchable if you import it again?

The log does not show an update. The PDF is searchable (and correctly identified as PDF+Text, rather than Image) if I reimport.

Then synchronize didn’t update the document, maybe because the creation/modification date is still the same?

Does "Synchronize" work on non-indexed files, i.e., files that were simply imported? These files are all imported, but they are stored in the Files folder of the database rather than added to the database file directly.

Touching the modification date doesn't make Synchronize update the file, by the way.

Yes, it should support both indexed and imported files. Could you please check if the path of the imported PDF document is still the correct one?

Is it not simply a problem with pdfkit vs pdftext?

I don’t think so as the imported PDF document is searchable. Anyway, could you send the original PDF document and the modified PDF document to our support address? Then I could check this over here, thanks!

These are all my personal documents, so let me see if I can reproduce the problem with a dummy document. By the way, the path is correct: it points to the file inside the database package.

Well, if the path points to a file inside the database package, then synchronizing can't work, because from DEVONthink's point of view the file in the package is still the same old one and does not need to be updated. Synchronizing is only available for external material.

Right, but if I open that file and make changes, the changes are saved to the file in the database. You can see this because the changes are preserved when I open it a second time, or when I export and reimport it.

Put another way, there is no external copy of this file anywhere (that is, outside the database package).

Actually, exporting and reimporting is simple enough, if that's the only way to do this.

Synchronizing currently checks only external files. However, v1.1 might be able to synchronize internal files too (those stored in the database folder).