DTPO imports Evernote notes directly, but if you have a lot of documents that were scanned as JPGs, you lose the ability to search them.
Evernote lets you search them, but it doesn’t let you migrate the OCR text.
You can’t OCR such documents again in DTPO either, because it doesn’t let you treat the imported JPG as an image and hit the OCR button.
Thanks to a forum user, I found this service:
Their servers sync between several cloud services (Evernote, Dropbox, Google Drive…).
After signing up and permitting them to connect to your data via API, you can set up syncs like “One-way sync: every Evernote note in notebook xyz -> convert to PDF -> put it in folder xyz on Dropbox.” I unchecked every other format option, so every note was converted to a PDF.
When syncing is finished and your computer has retrieved the Google Drive content, you can drag and drop the PDFs into DTPO and OCR them (select all, OCR).
This works pretty well: I had only 78 documents with a sync error among the first 3000 notes that I synced this way. I will continue with the remaining syncs and report back here at the end.
I had to choose “Google Drive” rather than Dropbox, even though Dropbox is much more reliable than Google Drive, because Dropbox replaces the timestamp of every document with the date of the sync. According to CloudHQ support, this is because the Dropbox API doesn’t permit keeping the date.
With Google Drive, your documents keep the creation date.
If other users go this route, I would be happy to hear how it worked.
I forgot: it seems that Google Drive can also OCR the documents while syncing (in the Google Drive preferences). For me it’s too late, as I am already OCRing the imported documents in DTPO, but next time …
Can you judge the OCR quality of Evernote vs. Google Drive vs. DTPO?
No, sorry, unfortunately I didn’t make any comparisons: the Evernote OCR isn’t easily accessible, and I didn’t use the Google Drive OCR because it was too late.
One way to roughly check the quality of OCR is to select a PDF+Text in DEVONthink and use Data > Convert > To Plain Text. This will reveal only the text layer of a PDF, which is usually what OCR creates. It’s usually enough to check a page or two of the PDF this way to get an idea of how well the text was recognized. By doing this with PDFs from different OCR sources, you can get a general idea of how the OCR engines compare.
Great idea, I will do this, thank you.
Kudos on the text layer technique belong to others, not me, especially Bill DeVille who made this tip part of DEVONforum lore a few days after PDFs were invented … or at least a looooong time ago
/hoists Bill onto the shoulders of the cheering crowd
Just to note a refinement on the text layer technique: you can quantify the quality of the OCR if you start off the test by scanning a printed text file. After scanning, and going around the loop of extracting a text layer from the PDF+Text, you can use one of the many file diff tools to see (and, with some tools, automatically count) the differences between the text layer and the original text file. This can be handy when running through various OCR settings to find the best balance of accuracy, file size, etc.
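If you would rather script the comparison than use a diff tool, here is a minimal sketch in Python of the idea described above. It assumes you already have both texts as plain strings (the original text and the text layer exported from the PDF+Text). The function names `ocr_accuracy` and `word_differences` are made up for this example, and the word-level similarity ratio from Python's standard `difflib` module is just one possible way to score the result, not something DEVONthink or any OCR engine provides.

```python
import difflib


def ocr_accuracy(original: str, ocr_text: str) -> float:
    """Return a similarity ratio between 0 and 1, comparing word by word."""
    orig_words = original.split()
    ocr_words = ocr_text.split()
    matcher = difflib.SequenceMatcher(None, orig_words, ocr_words, autojunk=False)
    return matcher.ratio()


def word_differences(original: str, ocr_text: str):
    """List (operation, original_words, ocr_words) for every region that differs."""
    orig_words = original.split()
    ocr_words = ocr_text.split()
    matcher = difflib.SequenceMatcher(None, orig_words, ocr_words, autojunk=False)
    return [
        (tag, orig_words[i1:i2], ocr_words[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


if __name__ == "__main__":
    # Hypothetical sample strings; in practice, read the original text file
    # and the exported text layer from disk instead.
    original = "The quick brown fox jumps over the lazy dog"
    ocr_text = "The qu1ck brown fox jumps over the lazy dog"

    print(f"similarity: {ocr_accuracy(original, ocr_text):.3f}")
    for tag, expected, got in word_differences(original, ocr_text):
        print(tag, expected, "->", got)
```

Running this once per OCR setting, always against the same original text, gives one number per setting that you can compare. Splitting on whitespace means differences in line breaks are ignored, which seems reasonable because OCR rarely preserves the original line layout.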