Import a newly downloaded HTML file into a DEVONthink database

Hello everyone, I want to write a script/automation outside of DEVONthink to complete my workflow. I like to save useful web pages as HTML, but the HTML pages saved by DEVONthink often have small problems, so I prefer to use the SingleFile extension. The general steps of the script are: first, simulate a key press to trigger the shortcut I set in the browser for the SingleFile extension, which saves the current web page as HTML. The download path is ~/Documents/Inbox. At the same time, save the URL of the web page to a variable, URL. Then wait about 3 seconds for the download to complete. Finally, import the newly downloaded HTML file into the DEVONthink database and set its URL to the value saved in the variable. I consulted the DEVONthink dictionary in Script Editor and guessed that I should be able to use “import” for this. However, I’m not sure how to locate the HTML file that was just downloaded to ~/Documents/Inbox and import it into DEVONthink. I searched the Finder dictionary in Script Editor but did not find a direct way to get the latest HTML file. Could anyone help me with this? I am really not familiar with programming.
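For the "locate the latest HTML file and import it" step, a minimal sketch in JavaScript for Automation (JXA) might look like the following. It is untested against a real setup and rests on several assumptions: the helper name `newestHtml` is made up for illustration, the folder is read through the Objective-C bridge rather than the Finder dictionary, DEVONthink's `import` command is assumed to accept a POSIX path and return the new record, and the browser is assumed to be Microsoft Edge with the saved page still in the front tab.

```javascript
// Pick the most recently modified .html/.htm entry from a list of
// {name, modified} objects; returns null when the list has no HTML file.
function newestHtml(entries) {
  const html = entries.filter(e => /\.html?$/i.test(e.name));
  if (html.length === 0) return null;
  return html.reduce((a, b) => (b.modified > a.modified ? b : a));
}

// JXA-only part, guarded so the helper above can be tried on its own.
if (typeof Application !== 'undefined' && typeof $ !== 'undefined') {
  const folder = $('~/Documents/Inbox').stringByExpandingTildeInPath.js;
  const fm = $.NSFileManager.defaultManager;
  const names = ObjC.deepUnwrap(fm.contentsOfDirectoryAtPathError(folder, null)) || [];

  // Collect each entry's modification time (seconds since 1970).
  const entries = names.map(name => {
    const attrs = fm.attributesOfItemAtPathError(folder + '/' + name, null);
    const date = attrs.objectForKey('NSFileModificationDate');
    return { name: name, modified: date.timeIntervalSince1970 };
  });

  const newest = newestHtml(entries);
  if (newest) {
    // Assumption: import() takes a POSIX path and returns the new record.
    const record = Application('DEVONthink 3').import(folder + '/' + newest.name);
    record.url = Application('Microsoft Edge').windows[0].activeTab.url();
  }
}
```

The selection logic is kept in a plain function so it can be checked without DEVONthink running; only the guarded block touches the file system and the applications.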

Why are you downloading to ~/Documents/Inbox instead of using DEVONthink’s Inbox alias in the Finder’s sidebar?

It’s because the SingleFile extension downloads html files directly to the system download folder, which for me is ~/Documents/Inbox. It seems that it doesn’t provide options to customize the output folder.

Since it appears you are stuck with ~/Documents/Inbox as the output folder for SingleFile, and given your “I’m really not familiar with programming” statement at the end, I suggest you use the Hazel application (https://www.noodlesoft.com) to watch this folder and move any incoming files to the DEVONthink Global Inbox. Once they are in DEVONthink, you can use Smart Rules to do more automation. That avoids messing with AppleScript, unless of course you want to.

Thank you for your suggestion! I have considered using Hazel, which is really convenient. The only issue is that if I use it to import HTML into DEVONthink, the URL is missing from the record's metadata. Is there a way to solve this problem?

I can’t think of anything off the top of my head. I usually rely on the document to put the URL in the footer, and since I use DEVONthink’s “Clip to DEVONthink”, which does all that, I have never explored it.

There is probably a way using scripting, but I avoid that as much as possible. Brain already clogged with stuff. :wink:

Is there still any way? :pleading_face: It is a bit of a bother, as I need to copy the URL into the metadata of every HTML record.

Doesn’t DEVONthink’s “Clip to DEVONthink” extension (available for Chrome and Safari) work for you? It puts the URL into the DEVONthink metadata.

It does work, yet the HTML files it captures often have small problems.

Apparently, SingleFile has a command line interface: see the GitHub repository gildas-lormeau/SingleFile (“Web Extension and CLI tool for saving a faithful copy of an entire web page in a single HTML file”).
I’d use that instead of fiddling around with UI scripting, especially since you can then specify the target name and folder. And the URL is there, anyway.

And you can configure the extension to run a JavaScript script before and after saving (see the same GitHub repository, gildas-lormeau/SingleFile). So, you’re all set.

Excitingly, I came up with a way: Hazel matches the HTML file newly downloaded by the SingleFile extension and moves it to the Global Inbox folder in DEVONthink. At the same time, I run a script that stores the URL of the browser’s currently active tab in a variable. After the file is moved, the script finds the HTML record in the Inbox by file name and sets the record’s URL to the stored value. I wrote this code in Script Editor, and it ran very smoothly there:

if (typeof exports === 'undefined') exports = {}

function timer (repeats, func, delay) {
  var args = Array.prototype.slice.call(arguments, 2, -1)
  args.unshift(this)
  var boundFunc = func.bind.apply(func, args)
  var operation = $.NSBlockOperation.blockOperationWithBlock(boundFunc)
  var timer = $.NSTimer.timerWithTimeIntervalTargetSelectorUserInfoRepeats(
    delay / 1000, operation, 'main', null, repeats
  )
  $.NSRunLoop.currentRunLoop.addTimerForMode(timer, "timer")
  return timer
}

function invalidate(timeoutID) {
  $(timeoutID.invalidate)
}

function run() {
  $.NSRunLoop.currentRunLoop.runModeBeforeDate("timer", $.NSDate.distantFuture)
}

var setTimeout = timer.bind(undefined, false)
var setInterval = timer.bind(undefined, true)
var clearTimeout = invalidate
var clearInterval = invalidate
setTimeout.run = setInterval.run = run

exports.setTimeout = setTimeout
exports.setInterval = setInterval
exports.clearTimeout = clearTimeout
exports.clearInterval = clearInterval
exports.run = run

var fileName = "test";

var app = Application("DEVONthink 3");
app.includeStandardAdditions = true;
var edge = Application("Microsoft Edge");
var url = edge.windows[0].activeTab.url();
var database = app.databases.byId(1);
setTimeout(() => {
  var record = app.search("name:" + fileName + " kind:HTML text" + " scope:inbox" + " {any: url==chrome-extension://efnbkdcfmcmnhlkaijjjmhjjgladedno/ url==chrome-extension://mpiodijhokgodhhofbcjdecpffjipkle/}")[0];
  record.url = url;
}, 1000);

To ensure the accuracy of the matching, I have set the following matching conditions in Hazel for the ~/Documents/Inbox folder:

Extension is html
Subfolder depth is 0
Date added is after date last matched
Source URL is chrome-extension://efnbkdcfmcmnhlkaijjjmhjjgladedno / chrome-extension://mpiodijhokgodhhofbcjdecpffjipkle

When I tried to embed the script into Hazel, it didn’t work as expected. The URL was not set correctly. Could you please let me know where I went wrong?

function hazelProcessFile(theFile, inputAttributes) {

	if (typeof exports === 'undefined') exports = {}

	function timer (repeats, func, delay) {
		var args = Array.prototype.slice.call(arguments, 2, -1)
		args.unshift(this)
		var boundFunc = func.bind.apply(func, args)
		var operation = $.NSBlockOperation.blockOperationWithBlock(boundFunc)
		var timer = $.NSTimer.timerWithTimeIntervalTargetSelectorUserInfoRepeats(
		delay / 1000, operation, 'main', null, repeats
		)
		$.NSRunLoop.currentRunLoop.addTimerForMode(timer, "timer")
		return timer
	}

	function invalidate(timeoutID) {
		$(timeoutID.invalidate)
	}

	function run() {
		$.NSRunLoop.currentRunLoop.runModeBeforeDate("timer", $.NSDate.distantFuture)
	}

	var setTimeout = timer.bind(undefined, false)
	var setInterval = timer.bind(undefined, true)
	var clearTimeout = invalidate
	var clearInterval = invalidate
	setTimeout.run = setInterval.run = run

	exports.setTimeout = setTimeout
	exports.setInterval = setInterval
	exports.clearTimeout = clearTimeout
	exports.clearInterval = clearInterval
	exports.run = run


	var app = Application("DEVONthink 3");
	app.includeStandardAdditions = true;
	var edge = Application("Microsoft Edge");
	var url = edge.windows[0].activeTab.url();
	var database = app.databases.byId(1);
	setTimeout(() => {
		// Put the code to run after the delay here
		var record = app.search("name:" + fileName + " kind:HTML text" + " scope:inbox" + " {any: url==chrome-extension://efnbkdcfmcmnhlkaijjjmhjjgladedno/ url==chrome-extension://mpiodijhokgodhhofbcjdecpffjipkle/}")[0];
		record.url = url;
	}, 5000);

}

That looks like a nice job. Might be a tad too complicated and not very robust (setting timers is not the most reliable approach, for example). I’d go for the CLI and a simple shell script.
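To illustrate the remark about timers: instead of waiting a fixed time and hoping the record has arrived, a script can check repeatedly and stop as soon as the record shows up. The following is only a sketch under assumptions; `pollUntil` is a hypothetical helper, the search string is a shortened form of the one in the script above, and the JXA part assumes the global `delay()` function (which takes seconds) is available where the script runs.

```javascript
// Call probe() up to `attempts` times, pausing between tries, and return
// the first truthy result, or null if every attempt came back empty.
function pollUntil(probe, attempts, pause) {
  for (let i = 0; i < attempts; i++) {
    const result = probe();
    if (result) return result;
    if (i < attempts - 1) pause();
  }
  return null;
}

// JXA-only usage, guarded so the helper can be tried on its own.
if (typeof Application !== 'undefined') {
  const app = Application('DEVONthink 3');
  const query = 'name:test kind:HTML text scope:inbox';
  // Look for the record up to 10 times, half a second apart.
  const record = pollUntil(() => app.search(query)[0], 10, () => delay(0.5));
  if (record) {
    record.url = Application('Microsoft Edge').windows[0].activeTab.url();
  }
}
```

Because the pause is passed in as a function, the loop itself does not depend on any timer machinery and returns as soon as the record is found.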

It still doesn’t work in Hazel. I really have no idea what’s going wrong; could you please help me with this? :pleading_face: I’ve worked so much on it, and programming is just so difficult for me.

I’d use that instead of fiddling around with UI scripting.

Amen!

That would be too challenging for me :pleading_face: I just hope my complicated and silly script can work :zipper_mouth_face:

I just noticed a silly mistake… I was careless and left out the fileName variable,

so the code in Hazel should now be as follows, although it still does not work:

function hazelProcessFile(theFile, inputAttributes) {

	if (typeof exports === 'undefined') exports = {}

	function timer (repeats, func, delay) {
		var args = Array.prototype.slice.call(arguments, 2, -1)
		args.unshift(this)
		var boundFunc = func.bind.apply(func, args)
		var operation = $.NSBlockOperation.blockOperationWithBlock(boundFunc)
		var timer = $.NSTimer.timerWithTimeIntervalTargetSelectorUserInfoRepeats(
		delay / 1000, operation, 'main', null, repeats
		)
		$.NSRunLoop.currentRunLoop.addTimerForMode(timer, "timer")
		return timer
	}

	function invalidate(timeoutID) {
		$(timeoutID.invalidate)
	}

	function run() {
		$.NSRunLoop.currentRunLoop.runModeBeforeDate("timer", $.NSDate.distantFuture)
	}

	var setTimeout = timer.bind(undefined, false)
	var setInterval = timer.bind(undefined, true)
	var clearTimeout = invalidate
	var clearInterval = invalidate
	setTimeout.run = setInterval.run = run

	exports.setTimeout = setTimeout
	exports.setInterval = setInterval
	exports.clearTimeout = clearTimeout
	exports.clearInterval = clearInterval
	exports.run = run

	var fileName = inputAttributes[0];
	var app = Application("DEVONthink 3");
	app.includeStandardAdditions = true;
	var edge = Application("Microsoft Edge");
	var url = edge.windows[0].activeTab.url();
	var database = app.databases.byId(1);
	setTimeout(() => {
		// Put the code to run after the delay here
		var record = app.search("name:" + fileName + " kind:HTML text" + " scope:inbox" + " {any: url==chrome-extension://efnbkdcfmcmnhlkaijjjmhjjgladedno/ url==chrome-extension://mpiodijhokgodhhofbcjdecpffjipkle/}")[0];
		if (url) {
			record.url = url;
		}
	}, 3000);
}

  • Are you writing this script yourself or cutting and pasting from other sources?
    • If the former, you have a deeper grasp of programming than many users do, even if the script isn’t fully working yet. (And nobody writes a script of any consequence correctly the first time.) :wink:

Some by myself, some by searching :rofl:

That’s how we learn :smiley:

I just don’t understand why it works in Script Editor but fails in Hazel…
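The thread does not establish why the Hazel version fails. One possible explanation, offered only as a guess, is that hazelProcessFile returns before the timer ever fires, because nothing runs the "timer" run-loop mode. A variant without timers could be sketched as below. The assumptions: `baseName` and `buildQuery` are hypothetical helpers, `theFile` is assumed to convert to a POSIX path via toString(), the global `delay()` is assumed to be available inside Hazel, the search string is a shortened form of the one above (the url clause is left out), and the Hazel rule is assumed to have moved the file to the Global Inbox before this script runs.

```javascript
// Strip directory and extension: "/a/b/Some Page.html" -> "Some Page".
function baseName(path) {
  const file = String(path).split('/').pop();
  return file.replace(/\.[^.]+$/, '');
}

// Build a shortened DEVONthink search string for a record name.
function buildQuery(name) {
  return 'name:' + name + ' kind:HTML text scope:inbox';
}

function hazelProcessFile(theFile, inputAttributes) {
  // Derive the name from the file itself instead of inputAttributes[0].
  const name = baseName(theFile.toString());
  const url = Application('Microsoft Edge').windows[0].activeTab.url();
  const app = Application('DEVONthink 3');

  // Synchronous pause (seconds), so the function does not return early.
  delay(3);

  const record = app.search(buildQuery(name))[0];
  // Only write the URL when both the record and the URL were found.
  if (record && url) {
    record.url = url;
  }
}
```

Checking `record` before assigning also avoids an error when the search returns nothing, which the versions above would hit on `record.url = url`.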