I have imported a large number of e-mails into DEVONthink Pro, and many of them include PDF files. Is it possible to run OCR on those PDF files?
Not while they’re attachments in the emails.
Do the attachments actually need OCR?
The Pro edition indexes the contents of attachments, when possible.
I have tried searching for some text that should be in PDF files attached to the e-mails.
Should I re-index them, and if so, what is the best way?
Drag the attachment out of the email into the database to see if there is a text layer in it.
My 2 cents here: with the Apple Mail plugin (probably a leftover from DTPO 2) I automatically import the mails and their related attachments to an “Attachments” subfolder. If this folder is watched by a smart rule, the OCR should then happen more or less “by magic”.
Yes, that works. But when I import the complete e-mail with a PDF attachment, there is no OCR.
How can I change this? I have 3000+ e-mails with PDF attachments that I would like to have OCR'd.
Is there any way to get OCR on my database of imported e-mails with PDF attachments?
The contents (including attachments) of emails can’t be modified. The only possibility would be to store the attachments on their own (e.g. via the “Add attachments to DEVONthink” script in the Scripts menu extra).
Could you help me a bit with this?
So I have a database called OUTLOOK ARCHIVE, filled with e-mails that include PDF attachments.
Should I first create a new database specifically for the attachments? And what do I do from there?
I can’t find the “Add attachments to DEVONthink” script.
This and other scripts can be installed via DEVONthink 3 > Install Add-On… and can be found in the Scripts menu extra while Apple Mail or Microsoft Outlook is the active application. To use them, select some messages first; the script will then add the attachments of those messages to DEVONthink.
My fault - this script is only available for Apple Mail.
Is there any other way to get a script that would pull all PDF attachments out of the database and put them in a separate folder?
You could use this script:
However, this requires at least DEVONthink Pro 3 and, with lots of messages, might run for a while.
I’m curious. Why do you assume the attached PDFs need OCR?
Are you suggesting that in every email with a PDF attachment, the PDF file is automatically searchable?
Scans in particular might require OCR, but other PDF documents usually don't.
Just because a file is a PDF, that does not mean it needs OCR. As @cgrunenberg mentioned, this would only be the case for scans and PDFs with no text layer. Unless you are receiving scans from someone, the assumption would be that there is a text layer, as many PDFs come from text-based sources like Word, InDesign, etc.
OK. I did some testing with PDFs attached to e-mails. We get daily invoices from suppliers as PDFs by e-mail. When I search for some keywords from the invoices, DEVONthink Pro 3 doesn’t find any of them…
Could someone help me with suggestions?
I use the Microsoft Outlook script.
See the script that I suggested (Extract image files from formatted notes?), which could be used to extract all attachments from the selected emails. You could of course customize the script so that it adds only PDF documents.
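For anyone who would rather do the "PDF attachments only" step outside DEVONthink, here is a minimal Python sketch of the same idea. It is not the DEVONthink script referred to above. It assumes the messages are available as .eml files on disk, and the function name extract_pdf_attachments and the output folder are made up for illustration. It uses only Python's standard email module.

```python
from email import message_from_bytes, policy
from pathlib import Path


def extract_pdf_attachments(eml_bytes: bytes, out_dir: Path) -> list[Path]:
    """Save every PDF attachment of one raw e-mail into out_dir.

    Hypothetical helper for illustration; not part of DEVONthink.
    Returns the paths of the files that were written.
    """
    # policy.default gives us the modern EmailMessage API (iter_attachments etc.)
    msg = message_from_bytes(eml_bytes, policy=policy.default)
    out_dir.mkdir(parents=True, exist_ok=True)

    saved = []
    for part in msg.iter_attachments():
        name = part.get_filename() or "attachment.pdf"
        # Accept a part if either the declared type or the file name says PDF.
        is_pdf = (
            part.get_content_type() == "application/pdf"
            or name.lower().endswith(".pdf")
        )
        if not is_pdf:
            continue
        # Keep only the base name so a crafted file name cannot escape out_dir.
        # Note: an existing file with the same name is overwritten.
        target = out_dir / Path(name).name
        target.write_bytes(part.get_payload(decode=True))
        saved.append(target)
    return saved
```

To process a whole export, you would loop over the .eml files in a folder and pass each file's bytes to the function. The saved PDFs could then be imported into a group watched by an OCR smart rule, as suggested earlier in the thread. Only top-level attachments are handled here; PDFs inside forwarded messages would need extra work.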