This script creates webarchives from Safari’s selection and makes sure that they get the correct URL.
First off, it’s not DEVONthink’s fault. It’s something @Apple should fix.
Webarchives are binary plist files. They contain a key WebResourceURL, which should hold the URL that was current at the time the webarchive was captured. Unfortunately it quite often doesn’t hold the correct one but some other URL. Let’s call it the “internal URL”.
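To make the internal URL tangible, here is a minimal Python sketch that reads it from a webarchive. It is not part of the script described in this post. It assumes the usual webarchive layout in which WebResourceURL sits inside the top-level WebMainResource dictionary; the helper name `internal_url` and the tiny in-memory sample archive are made up for illustration.

```python
import plistlib


def internal_url(archive_bytes: bytes) -> str:
    """Return the WebResourceURL stored for the main resource of a webarchive."""
    # plistlib.loads detects binary vs. XML plists automatically.
    archive = plistlib.loads(archive_bytes)
    # Assumption: the URL lives under the top-level WebMainResource dictionary.
    return archive["WebMainResource"]["WebResourceURL"]


# A tiny stand-in webarchive built in memory so the example is self-contained.
sample = plistlib.dumps(
    {
        "WebMainResource": {
            "WebResourceData": b"<html><body>Hello</body></html>",
            "WebResourceMIMEType": "text/html",
            "WebResourceTextEncodingName": "UTF-8",
            "WebResourceURL": "https://discourse.devontechnologies.com/latest",
        }
    },
    fmt=plistlib.FMT_BINARY,
)

print(internal_url(sample))
```

For a real file you would pass the file's bytes instead of `sample`, e.g. the result of `pathlib.Path(...).read_bytes()`.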
Services are a way in which one app can make some of its functionality accessible in other apps, e.g. to capture a selected portion of a page in Safari we can use the DEVONthink service Capture Web Archive.
Services rely on what the app they are invoked from provides. If we use DEVONthink’s service Capture Web Archive from Safari, the service doesn’t know that it’s called from Safari. All it knows is what data it can get.
The connection between the invoked service and the app it is invoked from is the pasteboard. That’s where the app puts its data when the service is invoked, and that’s where the service gets the data from.
Safari often fails to provide the correct URL, i.e. it puts some other URL on the pasteboard (one that is related to the correct URL, though).
Now the problem is not that DEVONthink’s service creates a webarchive from the wrong URL. The webarchive’s content is always correct.
But inside the webarchive there’s the internal URL (WebResourceURL), and DEVONthink uses it to populate the URL in the inspector, the URL column and the view pane’s URL field. That URL is not always correct because Safari failed to provide the correct one.
I noticed this when capturing from Discourse threads: instead of the current post’s URL I ended up with a webarchive whose URL was e.g. https://discourse.devontechnologies.com/latest. That doesn’t help much if I want to visit the place I captured from.
Example sites where this happens:
- Discourse forums
- URLs that were opened by clicking Markdown headings
Especially the last point “Documentation” is super annoying. Imagine you captured stuff about something you want to learn. Then other stuff gets in the way. Long after you’ve captured those webarchives you find time to look into the topic again. Meanwhile things might have changed, so you try to visit the URL, but it doesn’t point to where you captured from. Instead you’re taken somewhere …
Since services use the pasteboard to share data, it’s possible to “jump in” and manipulate the pasteboard’s content before we programmatically invoke DEVONthink’s service:
- get current Safari URL
- get internal URL
- compare them
- if not equal replace internal URL with current Safari URL
- invoke DEVONthink service
This way we get a webarchive, created by DEVONthink, and we can be sure that it contains the correct URL. Again, it’s not DEVONthink’s fault but Apple’s.
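The compare-and-replace steps above can be sketched in Python. This is only an illustration of the logic, not the script itself: the real script is AppleScript and works on the pasteboard, which this sketch skips by taking the webarchive bytes and the current Safari URL as plain arguments. The function name `ensure_correct_url`, the WebMainResource layout and the thread URL used as “current URL” are assumptions made for the example.

```python
import plistlib
from typing import Tuple


def ensure_correct_url(archive_bytes: bytes, current_url: str) -> Tuple[bytes, bool]:
    """Return (archive bytes, changed) with the internal URL set to current_url."""
    archive = plistlib.loads(archive_bytes)
    main = archive["WebMainResource"]  # assumed location of WebResourceURL
    # Compare the internal URL with the current Safari URL.
    if main.get("WebResourceURL") == current_url:
        return archive_bytes, False  # already correct, nothing to do
    # Not equal: replace the internal URL and re-serialize as a binary plist.
    main["WebResourceURL"] = current_url
    return plistlib.dumps(archive, fmt=plistlib.FMT_BINARY), True


# Stand-in for what Safari put on the pasteboard (wrong internal URL).
captured = plistlib.dumps(
    {
        "WebMainResource": {
            "WebResourceData": b"<p>selected post</p>",
            "WebResourceMIMEType": "text/html",
            "WebResourceURL": "https://discourse.devontechnologies.com/latest",
        }
    },
    fmt=plistlib.FMT_BINARY,
)
# Hypothetical URL of the thread that was actually open in Safari.
current = "https://discourse.devontechnologies.com/t/some-thread/1"

fixed, changed = ensure_correct_url(captured, current)
print(changed, plistlib.loads(fixed)["WebMainResource"]["WebResourceURL"])
```

Only the URL entry is touched. The captured content (WebResourceData) passes through unchanged, which matches the point that the webarchive’s content was never the problem.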
The difference between a webarchive that’s captured via this script and one that’s captured via DEVONthink’s service is only the internal URL, i.e. the value of key WebResourceURL.
If you want to verify this, open the results of both methods in BBEdit and use menu Search > Find differences > Compare Two Front Windows.
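If you prefer a programmatic check over a visual diff, something like the following Python sketch could do the same comparison. It is an assumption-laden illustration: the helper name `differs_only_in_internal_url` is made up, it assumes WebResourceURL sits under WebMainResource, and the two sample archives are built in memory rather than read from real captures.

```python
import plistlib


def differs_only_in_internal_url(a_bytes: bytes, b_bytes: bytes) -> bool:
    """True if two webarchives are identical except for the main WebResourceURL."""
    a = plistlib.loads(a_bytes)
    b = plistlib.loads(b_bytes)
    # Remove the internal URL from both, then compare whatever is left.
    url_a = a["WebMainResource"].pop("WebResourceURL", None)
    url_b = b["WebMainResource"].pop("WebResourceURL", None)
    return a == b and url_a != url_b


def make_archive(url: str) -> bytes:
    """Build a tiny stand-in webarchive with the given internal URL."""
    return plistlib.dumps(
        {
            "WebMainResource": {
                "WebResourceData": b"<p>same content</p>",
                "WebResourceMIMEType": "text/html",
                "WebResourceURL": url,
            }
        },
        fmt=plistlib.FMT_BINARY,
    )


via_service = make_archive("https://discourse.devontechnologies.com/latest")
# Hypothetical thread URL standing in for the correct one.
via_script = make_archive("https://discourse.devontechnologies.com/t/some-thread/1")

print(differs_only_in_internal_url(via_service, via_script))
```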
As the script is used in Safari, it’s necessary to run it from macOS’s script menu or an app that runs AppleScript, e.g. Alfred, Keyboard Maestro, FastScripts. Used with an Alfred NSAppleScript action, I don’t see a speed difference between the script and DEVONthink’s service.
Set displayDialog to true if you want to see when the script prevents you from capturing a webarchive with a wrong URL.