Trying to relink all URLs after importing Evernote's ENEX files, using AppleScript

The problem is that neither AppleScript nor JavaScript is good at tasks that require a lot of computation. Today I ran into an unbelievable 80 GB memory leak (I have only 32 GB of physical memory; the rest was compressed memory and swap on the SSD). My guess is that the script uses a lot of regular expressions based on NSStrings that are not released after use and thus stay in memory until the script finishes. Every iteration of the find-replace-link-in-every-note subroutine copied the content of a note, including images, and must have kept it in memory, fully or partially.
Objective-C has @autoreleasepool and other memory management mechanisms to prevent this, and Swift and many other system-level languages are good at either manual or automatic memory management. Furthermore, compiled and optimised code runs many times faster than a script. This is why I try to avoid scripting languages for tasks like this :slight_smile:

Please post the script that you used.

devonthink_relinkscpt_new.scpt.zip (11.8 KB)
Here it is. It's your new version plus some lines that print debug info.

I can't speak for AppleScript. JavaScript as a language can't be slow or fast, nor can it incur memory leaks. The runtime environment … that's another thing.
And I seriously doubt that the JS engines in browsers are slow or that they leak memory on the order of magnitude you experienced. The same holds for Node.js.

Given that the engine executing JXA is the same one that runs JS in Safari, the speed of the JS code itself should be OK. However, JXA talks to Apple's automation technology (Apple Events), and I suppose that this creates bottlenecks (for JXA as well as AppleScript). It would certainly be interesting to compare JavaScript's built-in regex support with the ObjC bridge in terms of memory leaks.
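For reference, this is roughly what the "built-in regex" side of that comparison could look like: a plain JavaScript sketch that collects every Evernote link target from a note's HTML without going through the ObjC bridge, so it runs in any JS engine (Node.js, Safari or JXA). The function name `findEvernoteLinks` and the sample HTML are hypothetical, and it assumes links are written as `<a href="evernote:///view/...">`.

```javascript
// Collect every evernote:///view/... link target from a note's HTML source
// using only JavaScript's built-in RegExp (no ObjC bridge involved).
// Assumption: links appear as <a href="evernote:///view/...">.
function findEvernoteLinks(html) {
  // The "g" flag is required: without it only the first link would be found.
  const re = /<a href="(evernote:\/\/\/view\/[^"]*)"/g;
  const links = [];
  let m;
  while ((m = re.exec(html)) !== null) {
    links.push(m[1]); // capture group 1 holds the URL without the quotes
  }
  return links;
}

// Usage with a hypothetical note source
const sample =
  '<p><a href="evernote:///view/1/s1/aaa/aaa/">A</a> and ' +
  '<a href="https://example.com">web</a> and ' +
  '<a href="evernote:///view/1/s1/bbb/bbb/">B</a></p>';
// logs the two evernote:///view/... URLs; the https link is ignored
console.log(findEvernoteLinks(sample));
```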

Found this post by Shane Stanley (developer of Script Debugger and author of “Everyday AppleScriptObjC”):

AppleScript memory management is poor at the best of times, and worse with AppleScriptObjC. The good news is that the value in the memory column of Activity Monitor is almost meaningless.

I never experienced any problems with these handlers. However, I've probably never run a replacement across thousands of records at once.

It's quite common for modern browsers to take up gigabytes of RAM when 10+ tabs are open.
I'm not saying any language is the root of all evil :slightly_smiling_face: JavaScript, Node.js etc. are just tools, and it's up to people to use them properly. However, many of us don't, for various reasons: casual mistakes and lack of experience, lack of time (business deadlines), lack of motivation (if it works, don't touch it), the race for money… Many big companies prefer Electron and React Native based apps to save time and money, forgetting about the headaches their users get in return. The result is an oversized app that consumes a lot of resources to render its views the way a browser does. Native apps are leaner, much faster and much more energy efficient (a big concern on mobile devices). The best indicator is that such a switch always draws a lot of negative user feedback. Take the most popular password manager for Macs, for instance.
So yes, code interpreted at runtime, including script-powered web pages, is slower and has many other drawbacks compared with compiled code. That's why modern browsers tend to implement low-level features like WebAssembly, WebGPU, WebGL etc.

From my point of view, the Mac's Script Editor isn't a developer-friendly tool, especially for debugging. So far I haven't found anything like Xcode's Instruments for catching memory leaks. But I'm sure that neither AS nor JS was designed to handle regex-based search-and-replace over gigabytes of data…

Download Script Debugger.

First of all, I must say I was amazed by your ability to bridge AppleScript to Objective-C classes. I hadn't even imagined that it was possible. You deserve all the best just for making it real…
In my case it resulted in a hang, which is very uncommon nowadays, so whatever Activity Monitor says, I can be sure it wasn't good.
However, it's not obvious what caused it. Maybe the code itself is already about the best you can get with AppleScript for this case. Maybe the problem lies hidden deep in the system environment and was caused by a recent macOS update or by what is installed on my machine (a lot of different language environments, Homebrew packages, developer tools etc.).

It makes a difference, thank you!


I didn't see your message before.
I would suggest not appending the URL (or ID) to the contents. In my case I found only 20-30 notes with URLs, so I copied and pasted those URLs to the beginning of the content, which freed the URL field to store the note's own internal address. Comparing against the URL field is much more efficient in the later find-and-replace step.
My second piece of advice: don't store IDs, store the full URLs. Having only IDs forces you to construct URLs at runtime, which can become a real problem if you are going to process a lot of material.

It was the opposite of constructing URLs: I parsed the URLs to retrieve the target note ID.

For my use, storing the note ID was useful (for example ID_73a4b241-2198-4072-8401-212759efe87b).
This made it searchable in the macOS file system and in DEVONthink.

I successfully replaced the Evernote links with DEVONthink links.

May I ask you to share your experience? How did you parse the links, i.e. iterate over the contents of a note and extract the URL part? Was it done the way @pete31 proposed, or differently?

Step 1: Parse the note

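-- split the note's HTML source at every href=" so that each item starts with a link target
set AppleScript's text item delimiters to {"href=\""}
set delimitedList to every text item of noteSource
repeat with theItem from 1 to count of delimitedList
	set delimitedSource to item theItem of delimitedList
	-- Step 2 runs here for each item
end repeat
set AppleScript's text item delimiters to ""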

Step 2: Parse the link

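-- only items that begin with an Evernote note link are of interest
if delimitedSource starts with "evernote:///view/" then
	-- evernote:///view/<user-id>/<shard>/<note-guid>/<note-guid>/ split at "/" puts the note GUID in item 7
	set AppleScript's text item delimiters to "/"
	set parsedList to text items of delimitedSource
	set ENnoteID to item 7 of parsedList as text
end if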
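For readers more at home in JavaScript, Steps 1 and 2 map directly onto `String.prototype.split`. This is a sketch of the same technique, not the poster's code; `evernoteNoteIds` is a hypothetical name, and it assumes links of the form `evernote:///view/<user-id>/<shard>/<note-guid>/<note-guid>/`, where the GUID lands at index 6 (AppleScript's item 7, since AppleScript lists are 1-based).

```javascript
// JavaScript rendering of Step 1 (split at href=") and Step 2 (split the link at "/").
// Assumed link format: evernote:///view/<user-id>/<shard>/<note-guid>/<note-guid>/
function evernoteNoteIds(noteSource) {
  const ids = [];
  // Step 1: every piece after the first starts right after an href="
  for (const piece of noteSource.split('href="')) {
    // Step 2: only pieces that begin with an Evernote note link are relevant
    if (piece.startsWith("evernote:///view/")) {
      // pieces: "evernote:", "", "", "view", user-id, shard, note-guid, ...
      const parts = piece.split("/");
      ids.push(parts[6]); // 0-based index 6 == AppleScript's item 7
    }
  }
  return ids;
}

const html =
  '<a href="evernote:///view/123/s45/73a4b241-2198-4072-8401-212759efe87b/73a4b241-2198-4072-8401-212759efe87b/">note</a>';
console.log(evernoteNoteIds(html)); // [ '73a4b241-2198-4072-8401-212759efe87b' ]
```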

Damn, I didn't think of simply using text item delimiters.
Obviously I think of regex first, and also assume that it's the best option (which it is not …)

I did use them e.g. …

… however, that was only a preparatory step before using, well, regex (but it was necessary in that case).

Note to self: Text item delimiters are great. Use them.
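Text item delimiters behave much like `split` and `join` in JavaScript: setting the delimiters and asking for `every text item` splits the text, and coercing the list back to text with different delimiters joins it. A sketch of the thread's find-and-replace idiom in that form (the name mirrors the AppleScript handler `findAndReplaceInText`, but this is only the same technique, not a port; the empty-search guard is an addition of this sketch):

```javascript
// Plain find-and-replace without regular expressions, the same idea as
// AppleScript's text item delimiters: split at the search string, join with the replacement.
// Every occurrence is replaced, and characters like "(" or "." are not treated specially.
function findAndReplaceInText(text, searchString, replacementString) {
  if (searchString === "") return text; // splitting at "" would insert between every character
  return text.split(searchString).join(replacementString);
}

console.log(findAndReplaceInText("a.b.c", ".", "/")); // a/b/c
```

Because no pattern is compiled, there is nothing regex-specific to escape or to keep alive between calls.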

I propose a JavaScript version of the script below. It is a tad shorter than the AppleScript version, partly because it does not include any debugging code. One of the main differences is that it retrieves the source of the relevant records directly with the whose call. That should (or shouldn't it?) be faster than looping over each record and retrieving its source there.
It does use a regular expression, however, to find the Evernote link to replace. I'm not sure whether that really incurs a performance or memory penalty. However, I did remove the unnecessary non-capturing groups that are present in the RE in @pete31's script.

Caveat: I did not test this script, for lack of Evernote notes. It seems to be syntactically OK, and at least the part up to the first forEach loop does not throw an error.

'use strict';
(() => {
  const app = Application("DEVONthink 3");
  /* formatted notes among the selected records */
  const notes = app.selectedRecords.whose({_match: [ObjectSpecifier().type, "formatted note"]});
  const records = notes();        /* the records themselves, needed to write the result back */
  const sources = notes.source(); /* all sources fetched in one go */
  const EVnote = `<a href="evernote:///view/`;
  /* the "g" flag makes replace() visit every Evernote link, not just the first one */
  const EVRE = new RegExp(`<a href="(evernote:///view.*?)"`, "g");
  sources.forEach((s, i) => {
    if (!s.includes(EVnote)) return; /* no links to other notes in here */
    const newSource = s.replace(EVRE, (anchor, EVURL) => {
      const results = app.search(`kind:formattednote url==${EVURL}`);
      if (!results || results.length === 0) return anchor; /* no counterpart: keep the link */
      /* replace the complete evernote:///view/... URL
      with the DT reference URL (x-devonthink-item://...) */
      return anchor.replace(EVURL, results[0].referenceURL());
    });
    /* write back only if something was replaced */
    if (newSource !== s) records[i].source = newSource;
  });
})()

This is exactly the approach I was going to adopt in my initial script too: splitting the body of each note into parts. Your example prompted me to rewrite the code to use delimiters only.

-- Replace evernote:/// Link URLs in Formatted Notes with DEVONthink Reference URLs
-- Note: This script finds a link's corresponding record by searching the link's URL in DEVONthink's URL property.
-- Select some formatted notes first

set anchorPreUrlPart to "<a href=\""
set urlToReplacePrefix to "evernote:///view/"
set currentNoteIndex to 0 # DEBUG

tell application id "DNtp"
	try
		set theNotes to selected records whose type = formatted note and URL starts with urlToReplacePrefix
		set recordsCount to (count of theNotes) as text # DEBUG
		repeat with thisNote in theNotes
			set currentNoteIndex to currentNoteIndex + 1 # DEBUG 
			set currentNoteSource to source of thisNote
			if currentNoteSource contains (anchorPreUrlPart & urlToReplacePrefix) then
				log "========= Processing note " & (currentNoteIndex as text) & " of " & recordsCount & ":: " & (name of thisNote) as text # DEBUG
				set AppleScript's text item delimiters to {anchorPreUrlPart}
				set delimitedList to every text item of currentNoteSource
				repeat with delimitedSource in delimitedList
					if delimitedSource starts with urlToReplacePrefix then
						set AppleScript's text item delimiters to {"\""}
						set evernoteUrlToReplace to the first text item of delimitedSource
						set theResults to search "type:formattednote url==" & evernoteUrlToReplace
						if theResults ≠ {} then
							set thisResult to the first item of theResults
							set thisResult_ReferenceURL to reference URL of thisResult
							log (evernoteUrlToReplace & " -> " & (name of thisResult) as text) & " => " & thisResult_ReferenceURL as text # DEBUG
							set currentNoteSource to my findAndReplaceInText(currentNoteSource, evernoteUrlToReplace, thisResult_ReferenceURL)
						else # DEBUG
							log "No counterparts found with url: " & evernoteUrlToReplace as text # DEBUG
						end if
					end if
				end repeat
				set source of thisNote to currentNoteSource
			else # DEBUG
				log "========= Note " & (currentNoteIndex as text) & " of " & recordsCount & " doesn't seem to have any Evernote urls:: " & (name of thisNote) as text # DEBUG
			end if
		end repeat
	on error error_message number error_number
		if the error_number is not -128 then display alert "DEVONthink" message error_message as warning
		return
	end try
end tell

on findAndReplaceInText(theText, theSearchString, theReplacementString)
	set AppleScript's text item delimiters to theSearchString
	set theTextItems to every text item of theText
	set AppleScript's text item delimiters to theReplacementString
	set theText to theTextItems as string
	set AppleScript's text item delimiters to ""
	return theText
end findAndReplaceInText
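The delimiter-only approach of the script above can be tried outside DEVONthink in plain JavaScript. In this sketch the `search "type:formattednote url==..."` call is replaced by a hypothetical `Map` from Evernote URLs to DEVONthink reference URLs (the kind of exact-URL lookup that storing each note's Evernote address in its URL field makes possible). `relinkSource` and the sample data are illustrative only, and the `x-devonthink-item://` identifiers are made up.

```javascript
// Delimiter-only relinking, mirroring the AppleScript above:
// split the source at <a href=", take each piece that starts with evernote:///view/,
// cut it at the closing quote, and replace that URL if a counterpart is known.
// Assumption: urlMap stands in for DEVONthink's search (Evernote URL -> reference URL).
const anchorPreUrlPart = '<a href="';
const urlToReplacePrefix = "evernote:///view/";

function relinkSource(source, urlMap) {
  if (!source.includes(anchorPreUrlPart + urlToReplacePrefix)) return source;
  let result = source;
  for (const piece of source.split(anchorPreUrlPart)) {
    if (!piece.startsWith(urlToReplacePrefix)) continue;
    const evernoteUrl = piece.split('"')[0]; // text up to the closing quote
    const referenceUrl = urlMap.get(evernoteUrl);
    if (referenceUrl !== undefined) {
      // replace every occurrence, like findAndReplaceInText
      result = result.split(evernoteUrl).join(referenceUrl);
    } // else: no counterpart found, the link is left unchanged
  }
  return result;
}

// Usage with hypothetical data
const urlMap = new Map([
  ["evernote:///view/1/s1/aaa/aaa/", "x-devonthink-item://11111111-AAAA"],
]);
const before =
  '<a href="evernote:///view/1/s1/aaa/aaa/">A</a> <a href="evernote:///view/1/s1/zzz/zzz/">Z</a>';
console.log(relinkSource(before, urlMap));
// <a href="x-devonthink-item://11111111-AAAA">A</a> <a href="evernote:///view/1/s1/zzz/zzz/">Z</a>
```

Building such a lookup once and reusing it would avoid issuing one search per link, at the cost of holding all URL pairs in memory.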

This code is based on splitting only. It still incurs a 38 GB memory leak when run in Script Debugger. That is less than half of the previous result, bearing in mind that after leaking 80 GB my Mac halted at about half of the records. And it runs through all 4260 records in 27 minutes 6 seconds (4 times faster) and finishes successfully.
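For scale, the timing above works out to roughly 0.38 seconds per record:

```latex
\frac{27 \times 60\,\text{s} + 6\,\text{s}}{4260\ \text{records}}
  = \frac{1626\,\text{s}}{4260\ \text{records}}
  \approx 0.38\,\text{s per record}
```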

UPD1: the line set recordsCount to (count of theNotes) as text was moved out of the loop.

Thank you! I'm eager to test your approach against my latest version in the name of computer science. Will JS beat AS? Place your bets, gentlemen!

Does

set theSources to source of selected records whose type ...
repeat with thisNote in theSources

make a difference? This should at least reduce the number of Apple Events somewhat.
Rationale: Since you do not access anything but the source of the records, you can get at it directly. You must of course change
set source of thisNote to currentNoteSource
to
set thisNote to currentNoteSource
(yes, the variable name is a bit unfortunate, but that way you do not have to change a lot more).
Also, set recordsCount to (count of theNotes) as text should be outside the loop, since the size of theNotes does not change.


Of course! Thank you, this leftover was still in the last version of the script; it's fixed now.

I must have changed it in the revised version of the script, which doesn’t use regex at all.

@Idify I asked Shane Stanley what might have caused this, here’s his answer:

The main cause is probably AppleScript's poor memory management. AppleScript uses a system of waiting until memory is short before stopping and doing garbage collection. This mostly works fine for plain AppleScript, but with AppleScriptObjC the underlying referenced objects are retained until garbage collection, potentially resulting in huge memory use. Compounding matters, it appears that memory used by an instance of AppleScript is not released properly. When run from an applet it usually doesn't matter, unless you're doing something like processing a lot of large files such as images, but a few big tests in an editor can clog the system. I suspect the debugging only complicated the situation.

I wasn’t aware of such potential problems. Sorry again.

So, without wanting to start a language war, could someone check the behavior of the JavaScript version? Since it does not involve ObjC and the script engine is a different one, it might be less memory demanding (might!).