Capturing SingleFile URLs

So I’ve been trying to capture the URL of files saved with SingleFile with the following smart rules, but neither option is working for me. Any chance someone could tell me where I’m going wrong? Is my regex syntax incorrect?

Smart_Rule_v1

Smart_Rule_v2

I don’t know anything about SingleFile, so I can only guess.

Do you see the indicator number behind the Smart Rule that shows the number of files it should apply to?

If not, then the conditions themselves are already failing to match.

Did you try Kind is HTML page?

Do you have an example file you can ZIP and post?

The Scan Text action doesn’t use the raw HTML code but the indexed plain text representation (see Data > Convert > To Plain Text).
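To illustrate why a pattern aimed at the markup finds nothing in the plain text, here is a minimal sketch. The sample HTML and the `toPlainText` helper are assumptions made up for this example; the helper is only a crude stand-in for a plain-text conversion, not how DEVONthink actually does it.

```javascript
// Hypothetical sample resembling the head of a saved page (not the real file).
const html = '<html><head><link rel=canonical href=https://example.com/article>' +
  '<title>Example</title></head><body><p>Hello</p></body></html>';

// Crude stand-in for a plain-text conversion: drop every tag, collapse whitespace.
// This only illustrates the effect; the real conversion is certainly more involved.
function toPlainText(markup) {
  return markup.replace(/<[^>]*>/g, " ").replace(/\s+/g, " ").trim();
}

// Pattern that targets the canonical link in the raw markup.
const pattern = /rel=canonical href=([^\s>]+)/;

const plain = toPlainText(html);
const matchInHtml = html.match(pattern);
const matchInPlain = plain.match(pattern);

console.log(plain);                                  // the tags, and the URL inside them, are gone
console.log(matchInHtml ? matchInHtml[1] : null);    // URL found in the raw HTML
console.log(matchInPlain);                           // null: nothing left to match
```

Once the tags are stripped, the `<link>` element and its `href` no longer exist in the text, so the regex syntax can be perfectly fine and still never match.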

In that case, would it be possible to use a record’s data property in a smart rule to scan for raw HTML?

The data property is actually just a binary blob and in many cases no text at all (or even really huge). A simple script should be able to handle this, therefore an example file would be useful.

Sure. Here’s a page I just saved with SingleFile as an example. The original URL of the page is visible in the raw html, so it’s just a little frustrating I can’t seem to grab it!

How to Reinstall macOS on M1 Apple Silicon Macs [osxdaily.com].html.zip (956.0 KB)

I had a bad feeling that was the case. I was thinking of using Hazel’s “Contents: contain match” function as a workaround but same issue. Next stop Keyboard Maestro unless an easier solution can be found.

Right. I got some weird stuff starting with '****'($3C21444… I suppose that could be decoded somehow, but I have no idea how.
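For what it's worth, that kind of output looks like the file's bytes written as hexadecimal. A minimal sketch of decoding such a string follows; `hexToText` is a hypothetical helper, and it assumes one byte per character (plain ASCII), so multi-byte UTF-8 characters would not come out right.

```javascript
// Hypothetical helper: turn a string of hex digit pairs into text.
// Assumes single-byte (ASCII/Latin-1) characters only.
function hexToText(hex) {
  const clean = hex.replace(/[^0-9a-fA-F]/g, ""); // ignore "$", spaces, etc.
  let out = "";
  for (let i = 0; i + 1 < clean.length; i += 2) {
    out += String.fromCharCode(parseInt(clean.slice(i, i + 2), 16));
  }
  return out;
}

// The first three complete bytes visible in the output quoted above.
const firstBytes = hexToText("$3C2144");

// An illustrative longer string (made up for this example, not the actual data).
const illustrative = hexToText("3C21444F4354595045");

console.log(firstBytes);
console.log(illustrative);
```

The first bytes decode to `<!D`, which is consistent with the start of an HTML document, so the blob is most likely just the raw HTML in a different notation.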

@Wseriese As for Hazel: its “contents contain match” condition is probably far too cumbersome for parsing a URL. I’d try a JavaScript script that fishes the matching text out of the file, though that might also be tedious.

This seems to work for me:

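'use strict';

// Smart rule entry point: DEVONthink passes in the records the rule matched.
function performsmartrule(records) {
	var app = Application("DEVONthink 3");
	app.includeStandardAdditions = true;
	// Match the unquoted canonical link as it appears in the SingleFile output.
	// Stop at whitespace or ">" so the capture cannot run on into later tags
	// on the same line (a greedy (.*)> would capture up to the last ">").
	const re = /rel=canonical href=([^\s>]+)/;
	records.forEach(r => {
		// source() returns the raw HTML rather than the indexed plain text.
		const s = r.source();
		const rem = s ? s.match(re) : null;
		if (rem) {
			r.url = rem[1];
		}
	});
}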
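If you want to try the pattern outside DEVONthink first, a standalone sketch like the following can be run in any JavaScript environment. `extractCanonicalUrl` and the sample strings are hypothetical and made up for illustration. This variant also accepts quoted attribute values and assumes `rel` comes before `href` inside the `<link>` tag.

```javascript
// Hypothetical helper: pull the canonical URL out of raw HTML, or return null.
// Accepts unquoted, double-quoted and single-quoted href values.
// Assumes rel=canonical appears before href within the same <link> tag.
function extractCanonicalUrl(source) {
  const re = /<link\b[^>]*\brel=["']?canonical["']?[^>]*?\bhref=(?:"([^"]*)"|'([^']*)'|([^\s>]+))/i;
  const m = source.match(re);
  if (!m) return null;
  return m[1] ?? m[2] ?? m[3];
}

// Made-up samples: minified markup with unquoted attributes, and a quoted variant.
const unquoted = '<head><link rel=canonical href=https://example.com/a><meta name=x></head>';
const quoted = '<head><link rel="canonical" href="https://example.com/b"></head>';

// For comparison: a greedy group runs on to the last ">" on the line.
const greedy = unquoted.match(/rel=canonical href=(.*)>/);

console.log(extractCanonicalUrl(unquoted));
console.log(extractCanonicalUrl(quoted));
console.log(extractCanonicalUrl('<head><title>No link</title></head>'));
console.log(greedy[1]);
```

The last line shows why a greedy `(.*)>` can over-capture when more tags follow on the same line, which is likely in minified pages.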

BTW:
Formatted notes captured by the latest version are very similar to SingleFile actually.


That’s the magic property :wink:

Perfect, works like a charm. Thank you so much for this!

The next release will further improve clipping of web pages, e.g. this is a formatted note clipped using the latest internal build:

How to Reinstall macOS on M1 Apple Silicon Macs.html (581.5 KB)

The result is comparable to SingleFile’s, but the file is a lot smaller.