Extract image files from formatted notes?

jsmith · August 28, 2019, 11:12am

Is there a way to automatically/batch extract image files from formatted notes? I’d like to batch convert the image contents of multiple files into PDF then run OCR.

jsmith · August 28, 2019, 11:15am

If you convert the whole note to PDF the images are “printed” onto a page or multiple pages which isn’t ideal.

cgrunenberg · August 28, 2019, 11:22am

That’s not possible currently, neither via the interface nor via AppleScript.

jsmith · August 28, 2019, 11:29am

Many thanks for the swift response.

Any third-party options or workarounds you can think of?

cgrunenberg · August 28, 2019, 11:32am

A script converting the notes to rich text first should actually work:

tell application id "DNtp"
	set theSelection to the selection
	set tmpFolder to path to temporary items
	set tmpPath to POSIX path of tmpFolder
	
	repeat with theRecord in theSelection
		try
			if type of theRecord is formatted note then
				set theRTF to convert record theRecord to rich
				if type of theRTF is rtfd then
					set thePath to path of theRTF
					set theGroup to parent 1 of theRecord
					tell application "Finder"
						set filelist to every file in ((POSIX file thePath) as alias)
						repeat with theFile in filelist
							set theAttachment to POSIX path of (theFile as string)
							if theAttachment does not end with ".rtf" then
								-- Importing skips files inside the database package,
								-- therefore let's move them to a temporary folder first
								set theAttachment to move ((POSIX file theAttachment) as alias) to tmpFolder with replacing
								set theAttachment to POSIX path of (theAttachment as string)
								tell application id "DNtp" to import theAttachment to theGroup
							end if
						end repeat
					end tell
				end if
				delete record theRTF
			end if
		end try
	end repeat
end tell

jsmith · August 28, 2019, 11:49am

Amazing. Thank you!

jsmith · August 28, 2019, 11:52am

Any way to name the extracted images after the original file name sequentially?

Mel · January 12, 2022, 7:39pm

Thanks for posting this script, which I thought might help me.
Unfortunately, when applying it I get the error message "on performSmartRule (Expected end of line, etc. but found end of script.)"
I’m trying to separate the mixed items in merged notes imported from Evernote into DEVONthink Pro and then have them grouped. These items are mostly JPGs and PDFs.
Is there a way the script could be altered to run correctly?

cgrunenberg · January 12, 2022, 7:43pm

The above script doesn’t contain this line actually. Where & how exactly did you use the script’s code?

Mel · January 12, 2022, 7:52pm

Hi, Thanks for replying.
I made a smart rule and applied the script to a test folder containing four html and rtfd files.
The html files are the result of an import from Evernote and the rtfd files are the result of a test conversion with DEVONthink.
My problem is that I have thousands of mixed notes which I have imported from Evernote.
Each of these notes is a merged note containing JPGs and PDFs and a text timestamp.
I would like to extract the items, group them and then make the items searchable.
I am able to extract the items successfully by exporting from DEVONthink to an external folder, but of course they are not grouped.

Is there a way I could automate the extraction, grouping and OCR within DEVONthink?

If so this could be a useful solution for others migrating from Evernote.

cgrunenberg · January 12, 2022, 8:02pm

The source of the smart rule’s script doesn’t seem to be valid and causes the error message.

Mel · January 12, 2022, 8:25pm

Does this mean that I have set the rule incorrectly?
Below is the Smart Rule I created with the script you provided earlier in the thread.

Mel · January 12, 2022, 11:31pm

To try and avoid the source problem I created a new folder and a single RTFD file to test the script.
The error message I now receive is "on performSmartRule (Error -1708)"
Is there something simple that I could be doing wrong?
Thanks for helping me.

cgrunenberg · January 13, 2022, 9:45am

The syntax of the script used by the Execute Script action is not correct according to the error. Press the Edit Script… button and then the small preview button in the lower left corner to check the syntax.

Mel · January 13, 2022, 7:11pm

Thanks. I found the below which is a copy and paste of the Smart Rule posted earlier. Should I be changing parts of it to suit my situation? I’m using DEVONthink Pro 3.8.

tell application id "DNtp"
	set theSelection to the selection
	set tmpFolder to path to temporary items
	set tmpPath to POSIX path of tmpFolder
	
	repeat with theRecord in theSelection
		try
			if type of theRecord is formatted note then
				set theRTF to convert record theRecord to rich
				if type of theRTF is rtfd then
					set thePath to path of theRTF
					set theGroup to parent 1 of theRecord
					tell application "Finder"
						set filelist to every file in ((POSIX file thePath) as alias)
						repeat with theFile in filelist
							set theAttachment to POSIX path of (theFile as string)
							if theAttachment does not end with ".rtf" then
								-- Importing skips files inside the database package,
								-- therefore let's move them to a temporary folder first
								set theAttachment to move ((POSIX file theAttachment) as alias) to tmpFolder with replacing
								set theAttachment to POSIX path of (theAttachment as string)
								tell application id "DNtp" to import theAttachment to theGroup
							end if
						end repeat
					end tell
				end if
				delete record theRTF
			end if
		end try
	end repeat
end tell

chrillek · January 13, 2022, 7:45pm

You can’t use it like that in a smart rule. Please read the documentation, section “Automation”. Make sure you wrap your script in an on performSmartRule handler. In this context, accessing the selection makes no sense because the handler receives the “selected” records (i.e. those matching the smart rule’s criteria) as a parameter.
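For illustration, here is roughly what that wrapping could look like. This is only a sketch and has not been run against a database: it is the script from earlier in the thread with the selection replaced by the handler's parameter. The handler name is the one shown in the error message above; the parameter name theRecords is just a label chosen for this example.

```applescript
-- Sketch only: the script from earlier in the thread, wrapped in the handler
-- that a smart rule's script is expected to provide. Untested.
on performSmartRule(theRecords)
	tell application id "DNtp"
		set tmpFolder to path to temporary items
		
		-- theRecords holds the items matching the smart rule's criteria,
		-- so "the selection" is no longer needed
		repeat with theRecord in theRecords
			try
				if type of theRecord is formatted note then
					set theRTF to convert record theRecord to rich
					if type of theRTF is rtfd then
						set thePath to path of theRTF
						set theGroup to parent 1 of theRecord
						tell application "Finder"
							set filelist to every file in ((POSIX file thePath) as alias)
							repeat with theFile in filelist
								set theAttachment to POSIX path of (theFile as string)
								if theAttachment does not end with ".rtf" then
									-- Importing skips files inside the database package,
									-- therefore move them to a temporary folder first
									set theAttachment to move ((POSIX file theAttachment) as alias) to tmpFolder with replacing
									set theAttachment to POSIX path of (theAttachment as string)
									tell application id "DNtp" to import theAttachment to theGroup
								end if
							end repeat
						end tell
					end if
					-- Remove the temporary rich text copy again
					delete record theRTF
				end if
			end try
		end repeat
	end tell
end performSmartRule
```

Note that the type test only lets formatted notes through. Records of any other type, which may include the HTML documents described above, are skipped silently, and the try block hides any errors, so a run that appears to do nothing is not necessarily a syntax problem.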

If you search around in this forum, you’ll find a ton of examples for smart rule scripts that you can use as a starting point.

Slightly off-topic: Since images are apparently included as data URLs in a formatted note, it should be possible to extract them using JavaScript (or AppleScript, but I wouldn’t want to try that). These data URLs could then be converted to proper images with online tools (your favorite search engine will tell you more about those).

What exactly is it that you’re trying to achieve?

Mel · January 13, 2022, 11:59pm

My EN workflow over the years involved scanning documents individually as JPG files (with a Fujitsu ScanSnap), then merging them to create an EN note. I then relied on EN’s servers for searching.

On importing my EN files (ENEX files via Import > Files and Folders, not DT’s built-in feature) I’m left with numerous HTML documents in DT, with the only searchable text being the timestamp created by the scanner.

Additionally, the HTML documents have other files “embedded”, e.g. searchable PDFs, XLS files, etc.

My aim was to use a smart rule to extract and group the individual files, then OCR the non-searchable documents.

My current solution involves converting the HTML files to RTFD then exporting (Export > As Website) to an external folder which separates the embedded files successfully. I’ll then import the files into DT, sort the relevant non-searchable files via a smart folder and perform OCR on them. The searchable files won’t be grouped but this should be acceptable for my purposes. I might experiment with DT’s “Group Similar Items” a little more but have found the results a little hit and miss.

I’m quite comfortable scanning and importing into DT and will make sure that everything is searchable going forward!

Thanks again for your help and have a good weekend.

chrillek · January 16, 2022, 11:47am

In case anyone is interested in this topic, I whipped up an example JavaScript function (using some ObjC, too) to do just this: Extract embedded files from Evernote (or possibly formatted) notes.

BLUEFROG · January 16, 2022, 5:34pm

Very nice. Thanks for sharing it.

Mel · January 17, 2022, 10:20am

Thanks! It certainly helped reduce the steps required for my migration and may help others with the same problem.