Consolidating Databases and Identifying OCRd Documents

joesan · November 19, 2008, 1:50pm

Hi,

Could someone tell me the best way to:

1. Consolidate a number of DevonThink databases into one database?
2. Identify which PDFs within a database have been OCRd and which have not?

Thanks!

Bill_DeVille · November 19, 2008, 5:55pm

You can export the contents of a database to a (new) folder in the Finder, then use File > Import > Files & Folders to import either the folder itself or all of its contents into the database that’s to become the consolidated database. Alternatively, you might use the Scripts > Export > Daily Backup script, then import those files and folders into the consolidated database.
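If several databases have each been exported to a folder, it may be convenient to gather those folders in one place before importing. The following is only a rough sketch in Python, not a DEVONthink feature: the function name `stage_exports` and the staging folder are hypothetical, and it assumes each export is an ordinary Finder folder.

```python
import shutil
from pathlib import Path


def stage_exports(export_dirs, staging_dir):
    """Copy each exported folder into its own subfolder of staging_dir.

    Hypothetical helper: assumes every entry in export_dirs is a plain
    folder produced by exporting one database.
    """
    staging = Path(staging_dir)
    staging.mkdir(parents=True, exist_ok=True)
    staged = []
    for export in map(Path, export_dirs):
        target = staging / export.name
        # If two exports share a folder name, add a numeric suffix
        # instead of overwriting the earlier copy.
        suffix = 2
        while target.exists():
            target = staging / f"{export.name}-{suffix}"
            suffix += 1
        shutil.copytree(export, target)
        staged.append(target)
    return staged
```

The staging folder could then be imported in one pass with File > Import > Files & Folders. Keeping each export in its own subfolder preserves a record of which database a document came from.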

Comment: The DEVONthink 2 applications are planned for release before the end of 2008. You might want to hold off, as the process may become simpler. In any case, with version 2 multiple databases can be open at the same time, and searches will work across the open databases.

Use Tools > History. This will provide a flat file view of all the documents in your database. If there’s not already a Kind column, use View > Columns and check Kind. Now you can sort by Kind. Image-only PDFs have Kind = PDF. Searchable PDFs have Kind = PDF+Text. In this way, you can identify the candidates for OCR.
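The same Kind rule can be written down as a small filter. This is just an illustrative sketch in Python: the `records` list is typed in by hand with made-up file names, and nothing here assumes DEVONthink can export the History view in this form.

```python
def split_by_kind(records):
    """Split (name, kind) pairs into image-only and searchable PDF names."""
    image_only, searchable = [], []
    for name, kind in records:
        if kind == "PDF+Text":
            # Already has a text layer, so no OCR is needed.
            searchable.append(name)
        elif kind == "PDF":
            # Image-only PDF: a candidate for OCR.
            image_only.append(name)
        # Any other kind (RTF, HTML, ...) is ignored.
    return image_only, searchable


# Hypothetical sample data, not taken from a real database.
records = [
    ("scan-001.pdf", "PDF"),
    ("report.pdf", "PDF+Text"),
    ("notes.rtf", "RTF"),
]
needs_ocr, already_ocrd = split_by_kind(records)
print(needs_ocr)
print(already_ocrd)
```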

Note: If memory serves, when you select multiple PDFs and invoke Data > Convert > to Searchable PDF, the existing group locations of the documents may be changed. To be on the safe side and keep the organizational locations intact, select image-only PDFs one at a time for conversion.

Note: Preferences > OCR provides user choice as to whether the original PDF will be sent to the trash, or retained in the database (in which case there would be two copies, image-only and searchable).

joesan · November 19, 2008, 6:54pm

Thank you Bill for such a complete and helpful answer.

I think I will hold off on the consolidation until I have purchased DTP 2.0. In the meantime I can sort out the OCRd from the non-OCRd documents.

BTW, has any comparison been carried out between the effectiveness of Finereader OCR and Adobe Acrobat 9? Both are available to me, but I am not sure which provides the better accuracy. Have you tested them both?

Bill_DeVille · November 19, 2008, 7:21pm

I suggest you try both, as I can’t comment. I found some glitches in the OCR text layer with early releases of Acrobat 8, depending on the PDF version number that was saved.