Script: Create webarchive from selection with correct URL

This script creates webarchives from Safari’s selection and makes sure that they get the correct URL.

Background

First off, it’s not DEVONthink’s fault. It’s something @Apple should fix.

Format

Safari webarchives are binary plist files. They contain a key WebResourceURL (in the WebMainResource dictionary) which should hold the URL of the page at the time the webarchive was captured. Unfortunately, quite often it doesn’t hold the correct URL but a different one. Let’s call whatever URL ends up in WebResourceURL the “internal URL”.
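Because a webarchive is an ordinary binary property list, the internal URL can be read with any plist parser. Here is a minimal, platform-neutral sketch in Python using the standard plistlib module. It assumes the URL sits in the WebMainResource dictionary, the same key path the script below uses. The function name is made up for illustration.

```python
import plistlib
from typing import Optional


def read_internal_url(webarchive_bytes: bytes) -> Optional[str]:
    """Return the WebResourceURL of a webarchive's main resource, if any."""
    # plistlib.loads detects the binary plist format ("bplist00") by itself
    archive = plistlib.loads(webarchive_bytes)
    # Assumed layout: {"WebMainResource": {"WebResourceURL": ..., ...}, ...}
    main_resource = archive.get("WebMainResource", {})
    return main_resource.get("WebResourceURL")
```

For a saved file, `read_internal_url(pathlib.Path("page.webarchive").read_bytes())` returns the URL that ends up in DEVONthink's URL fields. The file name is a placeholder.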

Services

Services are a way for one app to make some of its functionality accessible in other apps. For example, to capture a selected portion of a page in Safari we can use DEVONthink’s service Capture Web Archive.

Services rely on what the app they are invoked from provides. If we use DEVONthink’s service Capture Web Archive from Safari, the service doesn’t know that it’s called from Safari; all it knows is the data it gets.

The connection between the invoked service and the app it is invoked from is the pasteboard. That’s where the app puts its data when the service is invoked, and that’s where the service gets the data from.

Problem

Safari often fails to provide the correct URL, i.e. the data it puts on the pasteboard contains a different URL (though one that is related to the correct URL).

The problem is not that DEVONthink’s service creates a webarchive from the wrong page: the webarchive’s content is always correct.

But inside the webarchive there’s the internal URL (WebResourceURL), and DEVONthink uses it to populate the URL in the inspector, the URL column and the view pane’s URL field. This URL is not always correct, because Safari failed to provide the correct one.

I noticed this when capturing from Discourse threads: instead of the current post’s URL, I ended up with a webarchive whose URL was e.g. https://discourse.devontechnologies.com/latest. That doesn’t help much if I want to visit the place I captured from.

Example sites where this happens:

  • Discourse forums
  • URLs that were opened by clicking Markdown headings
    • GitHub
    • Documentation

Especially the last point, “Documentation”, is super annoying. Imagine you captured stuff about something you want to learn. Then other stuff gets in the way. Long after you captured those webarchives, you find time to look into the topic again. Things might have changed in the meantime, so you try to visit the URL, but it doesn’t point to where you captured from. Instead you’re taken somewhere else.

Script

Because services use the pasteboard to share data, it’s possible to “jump in” and manipulate the pasteboard’s content before we programmatically invoke DEVONthink’s service:

  • get current Safari URL
  • get internal URL (from the webarchive data Safari puts on the pasteboard when the selection is copied)
  • compare them
    • if not equal replace internal URL with current Safari URL
  • invoke DEVONthink service
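The compare-and-replace step in this list does not depend on AppleScript. Here is a platform-neutral sketch in Python with plistlib that works on the raw bytes of a webarchive. The function name is hypothetical, and the sketch assumes the same WebMainResource.WebResourceURL key path as the script. It only illustrates the logic. It does not touch the pasteboard or invoke the service.

```python
import plistlib


def fix_internal_url(webarchive_bytes: bytes, current_url: str) -> bytes:
    """Return webarchive bytes whose WebResourceURL equals current_url.

    Mirrors the script's logic: compare, and only if the URLs differ,
    replace the internal URL and serialize again as a binary plist.
    """
    archive = plistlib.loads(webarchive_bytes)
    main_resource = archive.setdefault("WebMainResource", {})
    if main_resource.get("WebResourceURL") == current_url:
        return webarchive_bytes  # already correct, leave the data untouched
    main_resource["WebResourceURL"] = current_url
    # Same output format the script requests (NSPropertyListBinaryFormat_v1_0)
    return plistlib.dumps(archive, fmt=plistlib.FMT_BINARY)
```

All other values, including the page data in WebResourceData, are written back unchanged. Only the internal URL is replaced.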

This way we get a webarchive, created by DEVONthink, and we can be sure that it contains the correct URL. Again, it’s not DEVONthink’s fault but Apple’s.

Result

The difference between a webarchive that’s captured via this script and one that’s captured via DEVONthink’s service is only the internal URL, i.e. the value of key WebResourceURL.

If you want to verify this, open the results of both methods in BBEdit and use the menu Search > Find differences > Compare Two Front Windows.
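As an alternative to a visual diff, the same check can be done programmatically. The idea is to walk both property lists and collect the key paths whose values differ. Here is a minimal sketch in Python, assuming both webarchives were saved as files. The function names are hypothetical.

```python
import plistlib
from typing import Any, List


def differing_key_paths(a: Any, b: Any, path: str = "") -> List[str]:
    """Return the dotted key paths at which two parsed plists differ."""
    if isinstance(a, dict) and isinstance(b, dict):
        diffs: List[str] = []
        for key in sorted(set(a) | set(b)):
            sub_path = f"{path}.{key}" if path else key
            if key not in a or key not in b:
                diffs.append(sub_path)  # key exists on one side only
            else:
                diffs.extend(differing_key_paths(a[key], b[key], sub_path))
        return diffs
    if isinstance(a, list) and isinstance(b, list) and len(a) == len(b):
        item_diffs: List[str] = []
        for index, (item_a, item_b) in enumerate(zip(a, b)):
            item_diffs.extend(differing_key_paths(item_a, item_b, f"{path}[{index}]"))
        return item_diffs
    # Leaf values (strings, data, numbers, ...) or mismatched containers
    return [] if a == b else [path]


def compare_webarchive_files(path_a: str, path_b: str) -> List[str]:
    """Parse two .webarchive files and return the key paths that differ."""
    with open(path_a, "rb") as file_a, open(path_b, "rb") as file_b:
        return differing_key_paths(plistlib.load(file_a), plistlib.load(file_b))
```

If the statement above holds, comparing a webarchive captured via the script with one captured via the service should report only `WebMainResource.WebResourceURL`.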

Setup

As the script is used in Safari, it’s necessary to run it from macOS’s script menu or from an app that runs AppleScript, e.g. Alfred, Keyboard Maestro or FastScripts. When I use it with an Alfred NSAppleScript action, I don’t see a speed difference between the script and DEVONthink’s service.

Set property displayDialog to true if you want to see when the script prevents you from capturing a webarchive with a wrong URL.


-- Create webarchive from selection with correct URL

-- Note: Wrong URLs are Apple's fault, not DEVONthink's

use AppleScript version "2.4"
use framework "Foundation"
use framework "AppKit"
use scripting additions

property theDelay : 0.25
property theTryTimes : 20
property displayDialog : false -- show dialog with original "WebResourceURL" and Safari's current URL 

tell application "Safari"
	try
		if not (exists window 1) then return
		if not (exists current tab of window 1) then return
		set theURL to URL of current tab of window 1
		if theURL is in {missing value, "bookmarks://", "history://"} then return
		set theSelectedText to do JavaScript "\"\"+window.getSelection();" in current tab of window 1
		if theSelectedText = "" then return
		activate
	on error error_message number error_number
		if the error_number is not -128 then display alert "Safari" message error_message as warning
		return
	end try
end tell

tell application "System Events"
	try
		tell process "Safari" to keystroke "c" using command down
		delay theDelay
	on error error_message number error_number
		if the error_number is not -128 then display alert "System Events" message error_message as warning
		return
	end try
end tell

try
	set thePasteboard to current application's NSPasteboard's generalPasteboard()
	-- Poll the pasteboard until Safari has put the selection's webarchive data on it
	-- (dataForType: returns missing value as long as that type isn't available)
	set theWebarchiveData to missing value
	repeat theTryTimes times
		set theWebarchiveData to (thePasteboard's dataForType:"com.apple.webarchive")
		if theWebarchiveData is not missing value then exit repeat
		delay theDelay
	end repeat
	if theWebarchiveData is missing value then error "Pasteboard: No webarchive data"
	
	-- Webarchives are binary plists; parse into mutable containers so the URL can be replaced
	set theWebarchivePlist to (current application's NSPropertyListSerialization's propertyListWithData:theWebarchiveData options:(current application's NSPropertyListMutableContainers) |format|:(missing value) |error|:(missing value))
	if theWebarchivePlist is missing value then error "Pasteboard: Could not read webarchive data"
	
	set theWebResourceURLKey to theWebarchivePlist's valueForKeyPath:"WebMainResource.WebResourceURL"
	
	if theWebResourceURLKey as string ≠ theURL then
		theWebarchivePlist's setValue:theURL forKeyPath:"WebMainResource.WebResourceURL"
		set theWebarchiveData_new to current application's NSPropertyListSerialization's dataWithPropertyList:theWebarchivePlist |format|:(current application's NSPropertyListBinaryFormat_v1_0) options:0 |error|:(missing value)
		thePasteboard's clearContents()
		(thePasteboard's setData:theWebarchiveData_new forType:"com.apple.webarchive")
		if displayDialog then
			set theWebResourceURLKey to theWebResourceURLKey as string
			tell application "Safari"
				activate
				display dialog "WebResourceURL:" & linefeed & linefeed & theWebResourceURLKey & linefeed & linefeed & "Safari URL:" & linefeed & linefeed & theURL with title "Capture Web Archive"
			end tell
		end if
	end if
	
	set performService to current application's NSPerformService("DEVONthink Pro: Capture Web Archive", current application's NSPasteboard's generalPasteboard())
	
on error error_message number error_number
	activate
	if the error_number is not -128 then display alert "Error: Handler \"Capture Web Archive\"" message error_message as warning
	error number -128
end try


Done.
