Extract image files from formatted notes?

Hi @chrillek, the link to your script isn’t working at the moment. Any chance you could share it again?

Thanks for letting me know. I’ve updated the link in the original post and am posting it here, too.

I’m finally getting around to giving this a shot now, but I’m struggling to get the script to work in a Smart Rule.

This is how I’ve set it up:

With the script being:

function performsmartrule(records) {
    let app = Application("DEVONthink 3");
    ObjC.import("Foundation");

    function extractDataURIs(file, prefix, targetFolder) {
        /* file: Path to source file with embedded data URIs 
         prefix: If set, prefix of the generated files. 
                If not set, the file's name will be used 
                if that's not possible the prefix will be set to 
                  "embedded-data"
         targetFolder: Folder where the generated files will be saved. 
           If not set, the current user's desktop will be used.
        */

        const myDesktop = Application.currentApplication().pathTo("desktop");

        /* Setup target folder and file name prefix */
        const targetDir = targetFolder ? targetFolder : myDesktop;
        const basename = (() => {
            if (prefix) {
                return prefix;
            } else {
                /* Get the file's basename w/o extension. If that fails
                 use "embedded-data" as prefix */
                const fileDestruct = file.match(/.*\/([^.]+)(:?\.*)?/);
                return fileDestruct ? fileDestruct[1] : "embedded-data";
            }
        })();

        /* Read the current record's raw data into 'data' */
        const fm = $.NSFileManager.defaultManager;
        const data = fm.contentsAtPath(file);

        /* Convert the raw data to an UTF-8 encoded JavaScript string */
        const txt = $.NSString.alloc.initWithDataEncoding(
            data,
            $.NSUTF8StringEncoding
        ).js;

        /* Assemble all data URIs in an array looking for 
           'src="data:doc type/extension;base64,...."'
           Note the usage of the 's' flag in matchAll to treat the whole 
           string as a single line.
           */
        const base64Matches = [
            ...txt.matchAll(/src="(data:(?:.*?)\/(.*?);base64,.*?)"/gs),
        ];

        /* Loop over all data URIs */
        base64Matches.forEach((data, index) => {
            /* The first capturing group contains the complete data URL
             'data:image/png;base64,...'
               The second capturing group of the RE contains 
                   the MIME type "extension", i.e. jpg, png etc.
              */
            const fullMatch = data[1];
            const extension = data[2];

            /* Build an NSURL from the complete data URL. 
            Note: MUST URL-escape the raw data first! */
            const matchNSString = $.NSString.alloc.initWithString(fullMatch);
            const url = $.NSURL.URLWithString(
                matchNSString.stringByAddingPercentEscapesUsingEncoding(
                    $.NSASCIIStringEncoding
                )
            );

            /* Build an NSImage from the NSURL */
            const error = $();
            const imageData = $.NSData.dataWithContentsOfURLOptionsError(
                url,
                null,
                error
            );

            /* If the image could not be created, log the error */
            if (!ObjC.deepUnwrap(imageData)) {
                const errorMessage = $(error.localizedDescription).js;
                console.log(errorMessage);
            }

            /* Build a new file name of the form
               basename/prefix-number.extension
            */
            const newfile = `${basename}/${prefix}-${index + 1}.${extension}`;

            /* Write the image to the file */
            imageData.writeToFileAtomically(newfile, true);
        });
    }
    // Loop through each DevonThink record
    for (let record of records) {
        app.logMessage(record.name());
        app.logMessage(record.path());

        // Extract necessary information from the DevonThink record
        let file = record.path();

        // Extract the name of the DevonThink record and use it as the prefix
        let prefix = record.name();

        // Optional: Determine the target folder based on the record, if desired
        // let targetFolder = "";

        extractDataURIs(file, prefix);
    }

}

(() => {
    let app = Application("DEVONthink 3");
    performsmartrule(app.selectedRecords());
})();

First and foremost, this crashes with the error “on performSmartRule (Error: Error: Message not understood.)”.

Secondly, I am a bit lost regarding how to define the targetFolder so that the output ends up in the same DT group as the Formatted Note.

Any ideas or pointers?

PS! I noticed a typo on line 23 in your published script. It should read:

return fileDestruct ? fileDestruct[1] : "embedded-data";

instead of

return fileDestruct ? fileDesctruct[1] : "embedded-data";

With or without writing your log messages to the log window?

This part is only needed when you run the script in Script Editor on some selected records. Remove it if you’re using the script in a smart rule.
For the time being, I suggest leaving these lines where they are: copy/paste the complete script into Script Editor, select one record, and then run the script there. Enable logging first. Then you can see where the error message originates.

Don’t confuse the concepts. A DT group is just a logical construct, it has no representation in the file system. I suggest saving your image somewhere in the file system first and then importing it into the correct DT group.
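
A minimal sketch of that second step, assuming the DEVONthink 3 scripting dictionary exposes an import command that takes a `to` group and a `locationGroup` property on records (check both in the dictionary before relying on them). The helper name `importIntoRecordGroup` is made up for illustration:

```javascript
// Sketch only: `import` and `locationGroup` are assumptions based on the
// DEVONthink 3 scripting dictionary; verify them there before use.
// `importIntoRecordGroup` is a hypothetical helper, not part of the script.
function importIntoRecordGroup(app, record, filePath) {
    // The group that contains the formatted note.
    const group = record.locationGroup();
    // Import the file that was written to disk into that group.
    return app.import(filePath, { to: group });
}
```

Inside the smart rule this would be called with the `app` object, the current `record`, and the path of a file that was just written, once the write has succeeded.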

Thanks for spotting the typo, I’ll fix that asap.

Still gives same error after commenting out the logging.

Same error there:

Error -1708: Message not understood.

In the log I get this:

app = Application("DEVONthink 3")
	app.selectedRecords()
		--> [app.databases.byId(1).contents.byId(492327)]
	app.databases.byId(1).contents.byId(492327).path()
		--> "/Users/tkrunning/Library/Application Support/DEVONthink 3/Inbox.dtBase2/Files.noindex/html/5/myfile.html"
	app.databases.byId(1).contents.byId(492327).name()
		--> "myfile"

app = Application("Script Editor")
	app.pathTo(["desktop"])
		--> Error -1708: Message not understood.
Result:
Error -1708: Message not understood.

Guessing it might be caused by the part of the code that sets the path to my desktop.

I also tried passing in a targetFolder, without that changing anything:

        // Optional: Determine the target folder based on the record, if desired
        let targetFolder = "/Users/tkrunning/Library/Mobile Documents/com~apple~CloudDocs/DevonThink Inbox with OCR";

        // Call the function
        extractDataURIs(file, prefix, targetFolder);

That does not answer my question at all.

No guessing there – that’s the culprit.
Change that to

const ca = Application.currentApplication();
ca.includeStandardAdditions = true;
const myDesktop = ca.pathTo("desktop");

Why would that change the error in the problematic line? It’s executed anyway, even if you pass in a targetFolder.

The issue here was that I forgot to set ca.includeStandardAdditions = true; before calling pathTo. JXA is picky in that respect.

Thanks a lot for all the help!

I’ve updated the script now to reflect those changes, and added a console.log() statement to verify that the myDesktop constant is set correctly.

Now there aren’t any errors, but no file is output either.

The log output is as follows:

app = Application("DEVONthink 3")
	app.selectedRecords()
	app.databases.byId(1).contents.byId(492327).path()
	app.databases.byId(1).contents.byId(492327).name()

app = Application("Script Editor")
	app.pathTo("desktop")
	/* /Users/tkrunning/Desktop */

Result:
undefined

Any ideas why it’s still not working as expected?

Perhaps I misunderstood your question. Were you not asking whether it was my logging statements (e.g. app.logMessage(record.name());) that caused the error? The answer to that is that the error is present whether the logging statements are there or not.

Ok, now the error is out of the way. It becomes a bit more complicated to track down the problem, since Script Editor won’t tell you anything about the ObjC-JXA bridge. I suggest peppering the code with console.log(...) statements to see what works and what doesn’t.

For example, put a
console.log(fullMatch);
and
console.log(extension);
right after
const extension = data[2];
Similarly, use console.log(imageData); right after const imageData = … ;

Unfortunately, there’s no better way to debug JXA code than peppering it with these console.log() calls.
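
The parts that don’t touch the ObjC bridge can at least be checked in isolation. Here is a small sketch that runs the script’s regular expression against a made-up HTML snippet (the two data URIs are placeholders, not real images) and prints what would end up in `extension` and the file number:

```javascript
// Hypothetical sample input: two tiny, made-up data URIs (not real images).
// The first one has a line break after "base64," like the logged fullMatch.
const sampleHtml = `<p><img src="data:image/png;base64,
AAAA"/></p><img alt="x" src="data:image/jpeg;base64,QkJC">`;

// Same pattern as in the script: group 1 is the whole data URI,
// group 2 the MIME subtype. The 's' flag lets '.' match line breaks.
const dataUriPattern = /src="(data:(?:.*?)\/(.*?);base64,.*?)"/gs;

const found = [...sampleHtml.matchAll(dataUriPattern)].map((m, index) => ({
    number: index + 1,      // the script uses index + 1 in the file name
    extension: m[2],        // e.g. "png" or "jpeg"
    uriLength: m[1].length, // length of the complete data URI
}));

console.log(JSON.stringify(found));
```

If the extensions and the count look right here, the problem is more likely in the NSURL/NSData part or in writing the file.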

I intended to ask “Do you see these messages or don’t you?”. If you saw them, the error was located after them. But that’s out of the way now.

Here’s the output of those three (I truncated the first one), are they what you’d expect?

/* fullMatch data:image/jpeg;base64,
iVBORw0KGgoAAAANSUhEUgAABqEAAAlgCAMAAAA/Fm0FAAAA8FBMVEUgIDwNDRkgIDsyMl5FRYAaHy4PEhoiKDs1PlxIVXxaa50QGBgkGSoUDhguIDVIMlNhRHF7Vo6UaKyueslxcXEUExQsKyxEQ0RcW1x0c3SMi4yko6S8u7zU09Ts6+wiKyQUGRUtOTBGWUtgeWZ5mYBAJTcVDBIyHStOLkRrP12IUHelYZDCcqnfg8L8lNu5fmYaEg47KCFc[REMOVED THE REST OF THE OUTPUT AS IT'S VERY LONG]   */
/* extension jpeg */
/* imageData [id OS_dispatch_data] */

Edit 1: I also console logged newfile:

/* newfile myfile/myfile-2.jpeg */

Could that be part of the issue as the note contains two image files, not just one?

Edit 2: I tried with a formatted note containing just one image and it also doesn’t output anything. Log output in that case is /* newfile myotherfile/myotherfile-1.jpeg */

Edit 3: I’m not sure whether the newfile constant should also include the targetDir (I don’t see it being used anywhere), so gave that a try. I also added a bit more debugging:

            const newfile = `${targetDir}/${basename}/${prefix}-${index + 1}.${extension}`;
			
			console.log("newfile",newfile);
			
			if (!imageData.writeToFileAtomically(newfile, true)) {
			    console.log("Failed to write file:", newfile);
			}

This outputs:

/* newfile /Users/tkrunning/Desktop/myotherfile/myotherfile-1.jpeg */
/* Failed to write file: /Users/tkrunning/Desktop/myotherfile/myotherfile-1.jpeg */

But still no file written to the expected folder…

Thanks for providing the data for me to play with. The problem was actually stupid code and missing error checking. One should add this to the script

  // make sure the target dir exists

  ca.doShellScript(`mkdir -p "${basename}"`);

after the lines that define the basename. In your case, the script saved the file to a folder “embedded-data” that didn’t exist, and that didn’t even raise an error. If you create that folder, the file is written there correctly.

But as you’re looping over DT records, all this is probably not a good idea – the script was written more from the standpoint of something running outside of DT. I’d suggest doing something like

const newfile = `/Users/tkrunning/Desktop/WHATEVER/${prefix}-${index}.${extension}`;

and creating the folder with mkdir -p ~/Desktop/WHATEVER in the Terminal first.

Thank you so much! I’ve finally got it working now. In addition to your fix I also made the following tweaks to the lines where the directory is created and newfile is set:

ca.doShellScript(`mkdir -p "${targetDir}/${basename}"`);
const newfile = `${targetDir}/${basename}/${prefix}-${index + 1}.${extension}`;

That way it uses the directory I pass in to the function, otherwise it wasn’t being used.
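
One caveat with building the mkdir command this way: a record name containing a double quote, a `$` or a backtick would break the double-quoted string. A possible way around it, as a sketch only (`shellQuote` and `mkdirCommand` are hypothetical helpers, not part of the script above), is to single-quote the path before handing it to doShellScript:

```javascript
// Hypothetical helper (not part of the original script): wrap a string in
// single quotes for /bin/sh, escaping any embedded single quote.
function shellQuote(text) {
    return "'" + String(text).replace(/'/g, "'\\''") + "'";
}

// Build the command string that would be handed to ca.doShellScript().
function mkdirCommand(targetDir, basename) {
    return `mkdir -p ${shellQuote(`${targetDir}/${basename}`)}`;
}

// Example with a made-up folder and record name.
console.log(mkdirCommand("/Users/me/Desktop", 'Tom\'s "notes"'));
```

In the script, the call would then become ca.doShellScript(mkdirCommand(targetDir, basename)).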

EDIT: I tweaked it a bit further so I can easily toggle whether the subdirectory is created (by passing the boolean createSubDir to the function). Here’s my full updated code, in case it’s helpful for others:

function performsmartrule(records) {
    let app = Application("DEVONthink 3");
    ObjC.import("Foundation");

    function extractDataURIs(file, prefix, targetFolder, createSubDir) {
        /* file: Path to source file with embedded data URIs 
         prefix: If set, prefix of the generated files. 
                If not set, the file's name will be used 
                if that's not possible the prefix will be set to 
                  "embedded-data"
         targetFolder: Folder where the generated files will be saved. 
           If not set, the current user's desktop will be used.
         createSubDir: If true, the files are saved in a sub-directory 
           of targetFolder named after the prefix.
        */

        const ca = Application.currentApplication();
        ca.includeStandardAdditions = true;
        const myDesktop = ca.pathTo("desktop");

        /* Setup target folder and file name prefix */
        const targetDir = targetFolder ? targetFolder : myDesktop;
        const basename = (() => {
            if (prefix) {
                return prefix;
            } else {
                /* Get the file's basename w/o extension. If that fails
                 use "embedded-data" as prefix */
                const fileDestruct = file.match(/.*\/([^.]+)(?:\..*)?/);
                return fileDestruct ? fileDestruct[1] : "embedded-data";
            }
        })();
		
        // Make sure the target dir exists (only when writing to a sub-directory)

        if (createSubDir) {
            ca.doShellScript(`mkdir -p "${targetDir}/${basename}"`);
        }

        /* Read the current record's raw data into 'data' */
        const fm = $.NSFileManager.defaultManager;
        const data = fm.contentsAtPath(file);

        /* Convert the raw data to an UTF-8 encoded JavaScript string */
        const txt = $.NSString.alloc.initWithDataEncoding(
            data,
            $.NSUTF8StringEncoding
        ).js;

        /* Assemble all data URIs in an array looking for 
           'src="data:doc type/extension;base64,...."'
           Note the usage of the 's' flag in matchAll to treat the whole 
           string as a single line.
           */
        const base64Matches = [
            ...txt.matchAll(/src="(data:(?:.*?)\/(.*?);base64,.*?)"/gs),
        ];

        /* Loop over all data URIs */
        base64Matches.forEach((data, index) => {
            /* The first capturing group contains the complete data URL
             'data:image/png;base64,...'
               The second capturing group of the RE contains 
                   the MIME type "extension", i.e. jpg, png etc.
              */
            const fullMatch = data[1];
            const extension = data[2];

            /* Build an NSURL from the complete data URL. 
            Note: MUST URL-escape the raw data first! */
            const matchNSString = $.NSString.alloc.initWithString(fullMatch);
            const url = $.NSURL.URLWithString(
                matchNSString.stringByAddingPercentEscapesUsingEncoding(
                    $.NSASCIIStringEncoding
                )
            );

            /* Load the decoded bytes into an NSData object from the NSURL */
            const error = $();
            const imageData = $.NSData.dataWithContentsOfURLOptionsError(
                url,
                null,
                error
            );

            /* If the data could not be loaded, log the error and skip this URI */
            if (!ObjC.deepUnwrap(imageData)) {
                const errorMessage = $(error.localizedDescription).js;
                console.log(errorMessage);
                return;
            }

            /* Build a new file name of the form
               targetDir/[basename/]prefix-number.extension
            */
            const newfile = `${targetDir}/${createSubDir ? `${basename}/` : '' }${prefix}-${index + 1}.${extension}`;

            /* Write the image to the file and log a failed write */
            if (!imageData.writeToFileAtomically(newfile, true)) {
                console.log("Failed to write file:", newfile);
            }
        });
    }
    // Loop through each DevonThink record
    for (let record of records) {
        app.logMessage(record.name());
        app.logMessage(record.path());

        // Extract necessary information from the DevonThink record
        let file = record.path();

        // Extract the name of the DevonThink record and use it as the prefix
        let prefix = record.name();

        // Change this to the folder where the images should be saved
        let targetFolder = "/Users/tkrunning/Library/Mobile Documents/com~apple~CloudDocs/DevonThink Inbox with OCR";

        // Set to true to save the images in a sub-directory named after the record
        let createSubDir = false;

        extractDataURIs(file, prefix, targetFolder, createSubDir);
    }

}

// Only needed when running in Script Editor; remove this block in a Smart Rule
(() => {
    let app = Application("DEVONthink 3");
    performsmartrule(app.selectedRecords());
})();

If someone else wants to use this just make sure to change the createSubDir and targetFolder variables to suit your needs.

Why? If it exists mkdir -p doesn’t do anything at all.

It also impacts this line:

const newfile = `${targetDir}/${createSubDir ? `${basename}/` : '' }${prefix}-${index + 1}.${extension}`;

For my use case I may or may not want to create a sub-directory with the name of the formatted note within the directory I specify with targetFolder depending on the files I’m processing. This makes it easy to toggle that behavior on or off.
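
To illustrate the toggle, here is the same template literal wrapped in a standalone function (`buildOutputPath` is a hypothetical name, and the folder is made up) so the two resulting paths can be compared outside DEVONthink:

```javascript
// Hypothetical helper mirroring the template literal from the script.
function buildOutputPath(targetDir, basename, prefix, index, extension, createSubDir) {
    // Only insert the sub-directory segment when the toggle is on.
    const subDir = createSubDir ? `${basename}/` : "";
    return `${targetDir}/${subDir}${prefix}-${index + 1}.${extension}`;
}

// With the sub-directory: /Users/me/Desktop/myfile/myfile-1.jpeg
console.log(buildOutputPath("/Users/me/Desktop", "myfile", "myfile", 0, "jpeg", true));
// Without it: /Users/me/Desktop/myfile-1.jpeg
console.log(buildOutputPath("/Users/me/Desktop", "myfile", "myfile", 0, "jpeg", false));
```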