OCR on PDF files attached to e-mails?

Could you help me a bit with this?

I have a database called OUTLOOK ARCHIVE, filled with e-mails that include PDF attachments.
Should I first create a new database specifically for the attachments? And where do I go from there?

I can’t find the "Add attachments to DEVONthink" script.

Thanks

This and other scripts can be installed via DEVONthink 3 > Install Add-On… and can be found in the Scripts menu extra while Apple Mail or Microsoft Outlook is the active application. To use them, select some messages first; the script will then add the attachments of those messages to DEVONthink.

Sorry, I still don’t see the "Add attachments" script.

My fault - this script is only available for Apple Mail.

Is there any other way to get a script that would pull all PDF attachments out of the database and put them in a separate map?

You could use this script:

However, this requires at least DEVONthink Pro 3 and might run for a while if there are lots of messages.

I’m curious. Why do you assume the attached PDFs need OCR?

Are you suggesting that for every email with a PDF attachment, the PDF file is automatically searchable?

Scans in particular might require OCR, but other PDF documents usually don’t.

Just because a file is a PDF does not mean it needs OCR. As @cgrunenberg mentioned, this would only be the case for scans and PDFs with no text layer. Unless you are receiving scans from someone, the assumption would be that there is a text layer, as many PDFs come from text-based sources like Word, InDesign, etc.
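One rough way to check whether PDFs already have a text layer is to look at their word count. The following is only a minimal sketch, assuming the PDFs are already records in DEVONthink 3 and are selected, and assuming that a `word count` of 0 is a reasonable hint that a document has no searchable text. A word count of 0 could also have other causes, so treat the result as a hint rather than proof.

```applescript
-- Sketch: count selected PDFs that appear to have no text layer
tell application id "DNtp"
	set theSelection to the selection
	set pdfCount to 0
	set emptyCount to 0
	
	repeat with theRecord in theSelection
		if type of theRecord is PDF document then
			set pdfCount to pdfCount + 1
			-- Assumption: a word count of 0 suggests the PDF may need OCR
			if word count of theRecord is 0 then set emptyCount to emptyCount + 1
		end if
	end repeat
	
	display dialog (emptyCount as string) & " of " & (pdfCount as string) & " selected PDF(s) have a word count of 0 and may need OCR."
end tell
```

If the count is 0 for your invoices, they most likely already contain text and OCR would not be needed for them.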

OK. I did some testing with PDFs attached to e-mails. We get daily invoices from suppliers as PDFs by e-mail. When I search for keywords from the invoices, DEVONthink Pro 3 doesn’t find any of them…

Could someone help me with suggestions?

I use the Microsoft Outlook script.

See the script that I suggested (Extract image files from formatted notes?), which could be used to extract all attachments from the selected emails. You could of course customize the script so that it adds only PDF documents.

Thanks.

I have no experience at all with scripts. What steps should I follow to adjust the script so that it adds only PDF documents?

And would it be possible for me to make a map inside the database called “pdf attachments” so that the script puts all PDFs in that map?

E.g. like this…

-- Import attachments of selected emails or formatted notes

tell application id "DNtp"
	set theSelection to the selection
	set tmpFolder to path to temporary items
	set tmpPath to POSIX path of tmpFolder
	
	repeat with theRecord in theSelection
		if (type of theRecord is unknown and path of theRecord ends with ".eml") or (type of theRecord is formatted note) then
			set theRTF to convert record theRecord to rich
			
			try
				if type of theRTF is rtfd then
					set thePath to path of theRTF
					set theGroup to parent 1 of theRecord
					
					tell application "Finder"
						set filelist to every file in ((POSIX file thePath) as alias)
						repeat with theFile in filelist
							set theAttachment to POSIX path of (theFile as string)
							
							if theAttachment ends with ".pdf" then
								-- Importing skips files inside the database package,
								-- therefore let's move them to a temporary folder first
								set theAttachment to move ((POSIX file theAttachment) as alias) to tmpFolder with replacing
								set theAttachment to POSIX path of (theAttachment as string)
								tell application id "DNtp" to import theAttachment to theGroup
							end if
						end repeat
					end tell
				end if
			end try
			
			delete record theRTF
		end if
	end repeat
end tell

A map? Do you mean a group?

Yes, sorry, a group.

I would like to call this group: PDF BIJLAGEN

You could replace this…

tell application id "DNtp" to import theAttachment to theGroup

…with…

tell application id "DNtp"
	set theGroup to create location "/PDF BIJLAGEN" in (database of theRecord)
	import theAttachment to theGroup
end tell
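To try the `create location` step on its own before touching the full script, something like the following could be run first. It is a minimal sketch, assuming the target database is open and is the current one; as far as I understand, `create location` only creates the group hierarchy if it does not exist yet, so running it repeatedly should not produce duplicate groups.

```applescript
-- Sketch: create (or reuse) a top-level group and show where it ended up
tell application id "DNtp"
	-- Assumption: the database that holds the emails is the current database
	set theDatabase to current database
	set theGroup to create location "/PDF BIJLAGEN" in theDatabase
	display dialog "Group: " & (location of theGroup) & (name of theGroup)
end tell
```

In the full script, `database of theRecord` is used instead of `current database`, so the group is created in whichever database the selected email belongs to.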

Like I said, I have no experience with scripts…

So like this?

-- Import attachments of selected emails or formatted notes

tell application id "DNtp"
	set theSelection to the selection
	set tmpFolder to path to temporary items
	set tmpPath to POSIX path of tmpFolder
	
	repeat with theRecord in theSelection
		if (type of theRecord is unknown and path of theRecord ends with ".eml") or (type of theRecord is formatted note) then
			set theRTF to convert record theRecord to rich
			
			try
				if type of theRTF is rtfd then
					set thePath to path of theRTF
					set theGroup to parent 1 of theRecord
					
					tell application "Finder"
						set filelist to every file in ((POSIX file thePath) as alias)
						repeat with theFile in filelist
							set theAttachment to POSIX path of (theFile as string)
							if theAttachment ends with ".pdf" then
								-- Importing skips files inside the database package,
								-- therefore let's move them to a temporary folder first
								set theAttachment to move ((POSIX file theAttachment) as alias) to tmpFolder with replacing
								set theAttachment to POSIX path of (theAttachment as string)
								set theGroup to create location "/PDF BIJLAGEN" in (database of theRecord)
								import theAttachment to theGroup
							end if
						end repeat
					end tell
				end if
			end try
			
			delete record theRTF
		end if
	end repeat
end tell

It doesn’t seem to be working…

Where do I put this script and how should I run it?

You didn’t insert the first/last line of the replacement snippet.

-- Import attachments of selected emails or formatted notes

tell application id "DNtp"
	set theSelection to the selection
	set tmpFolder to path to temporary items
	set tmpPath to POSIX path of tmpFolder
	
	repeat with theRecord in theSelection
		if (type of theRecord is unknown and path of theRecord ends with ".eml") or (type of theRecord is formatted note) then
			set theRTF to convert record theRecord to rich
			
			try
				if type of theRTF is rtfd then
					set thePath to path of theRTF
					set theGroup to parent 1 of theRecord
					
					tell application "Finder"
						set filelist to every file in ((POSIX file thePath) as alias)
						repeat with theFile in filelist
							set theAttachment to POSIX path of (theFile as string)
							if theAttachment ends with ".pdf" then
								-- Importing skips files inside the database package,
								-- therefore let's move them to a temporary folder first
								set theAttachment to move ((POSIX file theAttachment) as alias) to tmpFolder with replacing
								set theAttachment to POSIX path of (theAttachment as string)
								tell application id "DNtp"
									set theGroup to create location "/PDF BIJLAGEN" in (database of theRecord)
									import theAttachment to theGroup
								end tell
							end if
						end repeat
					end tell
				end if
			end try
			
			delete record theRTF
		end if
	end repeat
end tell

I have created a test database, TEST MAIL, with a group called PDF BIJLAGEN.

I have no idea how I should run the script now (and is the script above correct now?).