OCR on PDF files attached to e-mails?

Do you still use the above script or a modified one? Or any third-party scripting additions?

This one:

-- Import attachments of selected emails or formatted notes

tell application id "DNtp"
	set theSelection to the selection
	set tmpFolder to path to temporary items
	set tmpPath to POSIX path of tmpFolder
	
	repeat with theRecord in theSelection
		if (type of theRecord is unknown and path of theRecord ends with ".eml") or (type of theRecord is formatted note) then
			set theRTF to convert record theRecord to rich
			
			try
				if type of theRTF is rtfd then
					set thePath to path of theRTF
					set theGroup to parent 1 of theRecord
					
					tell application "Finder"
						set filelist to every file in ((POSIX file thePath) as alias)
						repeat with theFile in filelist
							set theAttachment to POSIX path of (theFile as string)
							if theAttachment ends with ".pdf" then
								-- Importing skips files inside the database package,
								-- therefore let's move them to a temporary folder first
								set theAttachment to move ((POSIX file theAttachment) as alias) to tmpFolder with replacing
								set theAttachment to POSIX path of (theAttachment as string)
								tell application id "DNtp"
									set theGroup to create location "/PDF BIJLAGEN" in (database of theRecord)
									import theAttachment to theGroup
								end tell
							end if
						end repeat
					end tell
				end if
			end try
			
			delete record theRTF
		end if
	end repeat
end tell

Maybe you can give me the one you have tested and that works?

This script from a former post works fine over here:

-- Import attachments of selected emails or formatted notes

tell application id "DNtp"
	set theSelection to the selection
	set tmpFolder to path to temporary items
	set tmpPath to POSIX path of tmpFolder
	
	repeat with theRecord in theSelection
		if (type of theRecord is unknown and path of theRecord ends with ".eml") or (type of theRecord is formatted note) then
			set theRTF to convert record theRecord to rich
			
			try
				if type of theRTF is rtfd then
					set thePath to path of theRTF
					set theGroup to parent 1 of theRecord
					
					tell application "Finder"
						set filelist to every file in ((POSIX file thePath) as alias)
						repeat with theFile in filelist
							set theAttachment to POSIX path of (theFile as string)
							if theAttachment ends with ".pdf" then
								-- Importing skips files inside the database package,
								-- therefore let's move them to a temporary folder first
								set theAttachment to move ((POSIX file theAttachment) as alias) to tmpFolder with replacing
								set theAttachment to POSIX path of (theAttachment as string)
								tell application id "DNtp" to import theAttachment to theGroup
							end if
						end repeat
					end tell
				end if
			end try
			
			delete record theRTF
		end if
	end repeat
end tell

Again, this script also works, but only from the Script Editor…
This script also does not skip e-mails that were already processed, so it creates duplicates.

Which of my settings are different from yours?

That’s a good question. We’re using the same versions of macOS and DEVONthink.

Any new thoughts on how to get this working?

Unfortunately none as it’s working as expected over here. Does it work using a clean, second user account (see System Preferences > Users & Groups…)?

If I use a clean second user account, then the Script Editor reports “false” after I run the script…

And did it actually import anything? Do the selected emails contain PDF attachments and do you use the Pro or Server edition?

I used the same database, which contains 3 e-mails with PDF attachments.
DEVONthink 3, Pro edition.

In that case the only option is to debug the script on your own, e.g. by adding logging or displaying a dialog after each step.
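
A minimal sketch of what such step-by-step debugging could look like. It reuses only the commands from the script in this thread; the message texts are made up for illustration. Note that the `try` block in the posted script has no `on error` branch, so any failure inside it is silently swallowed. Adding one makes the error visible. As far as I know, `log` output only shows up in Script Editor's log pane, so `display dialog` is the more useful choice when the script is run from within DEVONthink.

```applescript
-- Debugging sketch: report each step instead of failing silently
tell application id "DNtp"
	set theSelection to the selection
	log "Selected records: " & (count of theSelection)
	
	repeat with theRecord in theSelection
		log "Record: " & (name of theRecord) & " / path: " & (path of theRecord)
		try
			set theRTF to convert record theRecord to rich
			log "Converted: " & (path of theRTF)
			-- ... the Finder/import steps of the original script would go here ...
			delete record theRTF
		on error errMsg number errNum
			-- Without this branch the error would be hidden by the try block
			display dialog "Step failed: " & errMsg & " (" & errNum & ")"
		end try
	end repeat
end tell
```

If a dialog appears, its message and error number usually point to the step that fails on that particular Mac.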

Is there a way for me to upload the complete script so you can check it?
What I don’t understand is this: if you have exactly the same settings as I do and it works for you but not on my Mac, wouldn’t that mean the script itself is faulty?

Actually I use exactly the same script from your post.

Might also be a different macOS setting or some 3rd-party app.

But what I don’t understand: if you have already extracted the PDFs from some e-mails and you run the same script again, does it skip the ones you have already extracted? When I run the script twice from the Script Editor, I also get every PDF twice.

So that’s why somehow I think you have a different script than I have…

The script doesn’t skip anything, the PDF attachments are always imported.

Thanks. It would be nice if I could run the script every now and then to extract new PDFs from the email archive (which grows every time I sync with Outlook) without re-importing every PDF that is already in that group.

Because I have zero knowledge of scripting, I feel this is where it stops for me.

I simply can’t understand why it doesn’t work on my system: a fresh 2020 iMac with Outlook and DEVONthink, all latest versions.
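
One possible way to avoid re-importing, sketched under assumptions: tag each email once its attachments have been handled, and skip tagged emails on later runs. The tag name `pdf-extracted` is hypothetical, and the comment in the middle stands in for the extraction steps of the script posted earlier in this thread. This is not part of the original script.

```applescript
-- Sketch: skip emails that were already processed in an earlier run
property processedTag : "pdf-extracted" -- hypothetical tag name, pick your own

tell application id "DNtp"
	set theSelection to the selection
	repeat with theRecord in theSelection
		set theTags to tags of theRecord
		if theTags does not contain processedTag then
			-- ... convert the record and import its PDF attachments here,
			-- using the same steps as the script above ...
			
			-- Mark the email so the next run skips it
			set tags of theRecord to theTags & {processedTag}
		end if
	end repeat
end tell
```

As written, the email is tagged even if the extraction step fails, so in a real script the tagging should probably happen only after a successful import.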

Are you running the script in Outlook or DEVONthink?

I am seeing no issue selecting an email in DEVONthink and running the script.

The email

The attached PDF extracted

Running directly from DEVONthink

Is an *.eml e-mail the same as an “Email message”?

Because in your screenshot I see attachment.eml, while my e-mails are marked as “Email message”.
Could that be the issue?

Look at mine:

Yes, an .eml file is an email.
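
To check this on your own records, a small sketch like the following can show both values side by side. It assumes that records expose a `kind` property (the human-readable label such as “Email message”) in your version's scripting dictionary. The script in this thread does not look at that label; it tests whether the record's `path` ends with ".eml".

```applescript
-- Sketch: compare the displayed kind with what the script actually tests
tell application id "DNtp"
	set theSelection to the selection
	repeat with theRecord in theSelection
		set theKind to kind of theRecord -- assumed property, e.g. "Email message"
		set thePath to path of theRecord
		set isEml to (thePath ends with ".eml")
		display dialog "Kind: " & theKind & return & "Path: " & thePath & return & "Ends with .eml: " & isEml
	end repeat
end tell
```

If the last line of the dialog shows true, the file extension is not what keeps the script from processing that email.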