Consolidating Databases and Identifying OCRd Documents


Could someone tell me the best way to:

  1. Consolidate a number of DEVONthink databases into one database

  2. Identify which PDFs within a database have been OCRd and which have not?


  1. You can export the contents of a database to a (new) folder in the Finder, then use File > Import > Files & Folders to import the folder, or all the contents of the folder, into the database that’s to become the consolidated database. Alternatively, you might use the Scripts > Export > Daily Backup script, then import those files and folders into the consolidated database.
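If several databases have each been exported to their own Finder folder, the folders can be gathered into one parent folder before a single import. The following is only a rough sketch of that gathering step in Python, done outside DEVONthink; the function name `gather_exports` and the folder names are made up for illustration, and the import itself still happens through File > Import > Files & Folders.

```python
import shutil
from pathlib import Path


def gather_exports(export_dirs, target_dir):
    """Copy each exported database folder into target_dir as its own subfolder.

    export_dirs and target_dir are hypothetical paths chosen by the user;
    nothing here talks to DEVONthink itself.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in map(Path, export_dirs):
        dest = target / src.name
        if dest.exists():
            # Refuse to merge silently: rename one of the exports first.
            raise FileExistsError(f"{dest} already exists")
        shutil.copytree(src, dest)  # keeps the group (folder) hierarchy intact
        copied.append(dest)
    return copied
```

Keeping each export in its own subfolder means the original databases remain distinguishable as top-level groups after the import.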

Comment: The DEVONthink 2 applications are planned for release before the end of 2008. You might want to hold off, as the process may become simpler, and in any case multiple databases can be open at the same time, and searches will work across the open databases.

  2. Use Tools > History. This will provide a flat file view of all the documents in your database. If there’s not already a Kind column, use View > Columns and check Kind. Now you can sort by Kind. Image-only PDFs have Kind = PDF. Searchable PDFs have Kind = PDF+Text. In this way, you can identify the candidates for OCR.
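For PDFs that sit in a Finder folder rather than in a database, a similar split can be approximated with a script. This is a hedged sketch, not how DEVONthink decides the Kind column: it assumes the third-party `pypdf` package is installed, the helper names and the `min_chars` threshold are invented for illustration, and the labels merely mirror the Kind values mentioned above.

```python
def has_text_layer(page_texts, min_chars=20):
    """Return True if the page texts hold at least min_chars non-whitespace characters.

    min_chars is an arbitrary threshold (an assumption) so that a stray
    character or two does not count as a text layer.
    """
    total = sum(len("".join(text.split())) for text in page_texts if text)
    return total >= min_chars


def classify_pdf(path):
    """Label a PDF file as 'PDF+Text' (searchable) or 'PDF' (image-only)."""
    # Assumption: pypdf is installed (pip install pypdf).
    from pypdf import PdfReader

    reader = PdfReader(path)
    texts = [page.extract_text() or "" for page in reader.pages]
    return "PDF+Text" if has_text_layer(texts) else "PDF"
```

Files labeled `PDF` by `classify_pdf` would be the candidates for OCR. Text extraction is imperfect, so a borderline result is worth checking by hand.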

Note: If memory serves, when you select multiple PDFs and invoke Data > Convert > to Searchable PDF, the existing group locations of the documents may be changed. To be on the safe side and keep the organizational locations intact, select image-only PDFs one at a time for conversion.

Note: Preferences > OCR provides user choice as to whether the original PDF will be sent to the trash, or retained in the database (in which case there would be two copies, image-only and searchable).

Thank you Bill for such a complete and helpful answer.

I think I will hold off on the consolidation until I have purchased DTP 2.0. In the meantime I can sort out the OCRd from the non-OCRd documents.

BTW, has any comparison been carried out between the effectiveness of FineReader OCR and Adobe Acrobat 9? Both are available to me, but I am not sure which provides the better accuracy. Have you tested them both?

Suggest you try both, as I can’t comment. I found some glitches in the OCR text layer with early releases of Acrobat 8, depending on the PDF version number that was saved.