Reload the web archive after creating web archive

Hi,

I’m trying to create web archives using the script below:

tell application "DEVONthink Pro"
	set contentRecord to content record of think window 1
	set theGroup to first parent of contentRecord
	tell text of think window 1
		repeat with theAttribute in attribute runs
			if exists URL of theAttribute then
				if URL of theAttribute starts with "https://" then
					set theUrl to URL of theAttribute
					set theCopy to create web document from theUrl in theGroup
					set theCopy to refresh
					set creation date of theCopy to creation date of contentRecord
					set label of theCopy to label of contentRecord
					set state of theCopy to state of contentRecord
				end if
			end if
		end repeat
	end tell
end tell

The problem is that when I create a web archive of a protected webpage, every archive created is just the login page. If I manually open each web archive created by the above script and click the reload button, the page loads correctly (after logging in), but then I still have to update the web archive.

Is there a better way to go about this? An example URL would be “Post a new topic” on this very forum, if I wanted to make a web archive of that page (posting.php?mode=post&f=20).

Thanks in advance.

You could use the contextual menu command “Update Captured Archive” to update existing web archives.

Instead of using Clip to DEVONthink you could e.g. print a PDF to DEVONthink or save a web archive to the global inbox.

Is there a way to automate updating the web archives? For instance, if they meet some criteria, etc…

Only by using custom scripts and e.g. a scheduled smart rule. But this wouldn’t be efficient, and for pages that no longer exist and are now redirected, the results might even be undesirable.

Thanks, what command can I use to automate updating the web archives?

There’s no such command, so this is a little more complicated. Here’s a simple example:

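tell application id "DNtp"
	repeat with theRecord in (selection as list)
		-- Only selected web archives are updated
		if type of theRecord is webarchive then
			set theURL to URL of theRecord
			set theParent to parent 1 of theRecord
			-- Capture a fresh, temporary archive of the same URL
			set newArchive to create web document from theURL in theParent
			set newData to data of newArchive
			-- Replace the contents of the existing archive in place
			set data of theRecord to newData
			-- Remove the temporary archive
			delete record newArchive
		end if
	end repeat
end tell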
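
For the scheduled smart rule mentioned earlier, the same logic could be wrapped in a smart rule script roughly like this. It is only a sketch: it assumes the rule's conditions already limit the matched records to the web archives you want refreshed, and that a failed capture returns missing value, which I have not verified for every case. The earlier caveats still apply: pages behind a login or redirected pages may overwrite a good archive with something unwanted.

```applescript
-- Sketch of a smart rule script (used with the rule's script action).
-- DEVONthink calls performSmartRule with the records matched by the rule.
on performSmartRule(theRecords)
	tell application id "DNtp"
		repeat with theRecord in theRecords
			if type of theRecord is webarchive then
				set theURL to URL of theRecord
				if theURL is not "" then
					set theParent to parent 1 of theRecord
					-- Capture a fresh, temporary archive next to the original
					set newArchive to create web document from theURL in theParent
					-- Assumption: a failed capture yields missing value
					if newArchive is not missing value then
						set data of theRecord to (data of newArchive)
						delete record newArchive
					end if
				end if
			end if
		end repeat
	end tell
end performSmartRule
```

Scheduling (hourly, daily, and so on) would be set in the smart rule itself rather than in the script.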

I have a slightly simpler version of the same problem and likely a lesser grasp of AppleScript and DT3; I only have experience with R and just started using DEVONthink this month. I’m new here but not exactly a noob, so I’m going to experiment with the example cgrunenberg graciously supplied. However, if anyone has tips or scripts better suited to my particular need, I would very much appreciate the insight and hopefully one day reciprocate.

My issue is as follows: I subscribe to an RSS feed of my county’s EMS incident status. The source is a U.S. state government website and, naturally, the feed is a bit fussy. To manage the fuss, I simply convert the [feed].asp to an HTML page, which renders exactly what I want and removes the need to interact with [feed].asp. I can click the contextual menu for the HTML page in DT3 and select “update captured page” to refresh it with updated EMS statuses. I would like to automate this refresh at an hourly or twice-hourly interval and have it search for two text strings after each refresh; if either or both are matched, I’d simply like a notification. The strings are the names of two towns. Each is only one word, nothing crazy.

That’s it! Without the automation, using DT3 for this feed is just a Rube Goldberg machine. The benchmark for DT3 to beat is me setting a recurring reminder, navigating to the website, and manually performing two word searches. Not terrible work, but ripe for automation nonetheless. I’ll report back in case my experience is helpful to anyone.

And thanks in advance to anyone who helps me!

Update 1: there is an extra step. First, click the “reload/stop reloading” button in the toolbar; second, click the gear in the toolbar (contextual menu) and select “update captured page”; third, perform the searches. The new first step fetches new information, whereas the second step merely applies the reader-friendly format I want.

Edit: a script is needed because the smart rule builder contains neither “update captured page” nor a “refresh” action, and the similar functions offered either launch my external web browser, create duplicate documents, stop updating the information, or render the page differently. I just want a script that works in the background and only demands my attention or my CPU when there is a match or a problem.

Here’s a simple example that updates selected HTML pages and displays a notification. However, this does not work for HTML pages inside feeds, as all of them are read-only.

property pWords : {"Apple", "iCloud"}

tell application id "DNtp"
	repeat with theRecord in (selection as list)
		if type of theRecord is html then
			set theURL to URL of theRecord
			set theSource to download markup from theURL
			if theSource is not "" then
				set oldSource to source of theRecord
				if theSource is not equal to oldSource then -- Different source?
					set source of theRecord to theSource
					repeat with theWord in pWords
						if plain text of theRecord contains theWord then
							set theTitle to name of theRecord
							set theSubtitle to "Found " & theWord
							display notification with title theTitle subtitle theSubtitle
							exit repeat
						end if
					end repeat
				end if
			end if
		end if
	end repeat
end tell
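
Since the request was for a background refresh that reports either or both town names, here is a possible variant written as a smart rule script. Treat it as a sketch under a few assumptions: "TownA" and "TownB" are placeholders for the real names, the rule is scheduled (e.g. hourly) and matches only the HTML record in question, and a failed download returns missing value or an empty string. Like the example above, it will not work for read-only HTML pages inside feeds.

```applescript
property pWords : {"TownA", "TownB"} -- placeholder town names

-- DEVONthink calls performSmartRule with the records matched by the rule.
on performSmartRule(theRecords)
	tell application id "DNtp"
		repeat with theRecord in theRecords
			if type of theRecord is html then
				set theSource to download markup from (URL of theRecord)
				if theSource is not missing value and theSource is not "" then
					-- Only act when the page actually changed
					if theSource is not equal to (source of theRecord) then
						set source of theRecord to theSource
						set theText to plain text of theRecord
						-- Collect every word that occurs, not just the first
						set theMatches to {}
						repeat with theWord in pWords
							if theText contains theWord then set end of theMatches to (theWord as text)
						end repeat
						if theMatches is not {} then
							-- Join the matches with ", " and restore the delimiters
							set oldDelimiters to AppleScript's text item delimiters
							set AppleScript's text item delimiters to ", "
							set theSubtitle to "Found " & (theMatches as text)
							set AppleScript's text item delimiters to oldDelimiters
							display notification with title (name of theRecord) subtitle theSubtitle
						end if
					end if
				end if
			end if
		end repeat
	end tell
end performSmartRule
```

Unlike the selection-based version, this one does not exit at the first match, so a single notification can name both towns. It stays silent when the page has not changed since the last run.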