One-stop solution for capturing web archives?

I actually can’t believe DT Pro (still) doesn’t seem to offer this:

Grab the URI from Safari and automatically turn it into a web archive inside DT Pro. (Yes, I know, DT Pro can’t get a web archive directly from Safari, but this way it shouldn’t have to.)

I can’t figure out how to do it via Automator, and I don’t know enough AppleScript to try it that way. Is it possible? I couldn’t really find a (working) solution here in the forum.

You could try this script:


tell application "Safari" 
   activate 
   try 
      if not (exists document 1) then error "No document is open." 
      set this_url to the URL of document 1 
      set this_title to the name of window 1 
   end try 
end tell 

tell application "DEVONthink Pro" 
   with timeout of 120 seconds 
      set theRecord to create record with {name:this_title, type:nexus, URL:this_url} 
      set theWindow to open window for record theRecord 
       
      repeat while loading of theWindow 
         delay 1 
      end repeat 
       
      set theURL to URL of theWindow 
      set theSource to source of theWindow 
      set theName to get title of theSource 
      set theData to web archive of theWindow 
      set theArchive to create record with {name:theName, type:html, URL:theURL, source:theSource} 
      set data of theArchive to theData 
       
      delete record theRecord -- Closes window 
   end timeout 
end tell 

Thank you very much! It is—in a David Pogue way—»the script that should have been in the box.« Or is it? In which case: Where was it hidden? :slight_smile:

Is “nexus” a valid document type in DT Pro? I get an error that the name nexus is not defined when I run this script.

I guess you’re using a beta of v1.1; use “link” instead (existing compiled scripts are not affected by this modification).

Yes, that fixes it; “link” works.
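For anyone following along with the released v1.1, the record-creation line of the first script would then read as follows (a sketch; only the type value changes from the version posted above):

```applescript
-- Same line as in the first script, with "nexus" replaced by "link"
-- for DT Pro 1.1 and later
set theRecord to create record with {name:this_title, type:link, URL:this_url}
```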

The next release will support a script as simple as this one:


-- Add web archive from Safari to DEVONthink
-- Created by Christian Grunenberg on Wed Mar 15 2006.
-- Copyright (c) 2006. All rights reserved.

tell application "Safari"
	activate
	try
		if not (exists document 1) then error "No document is open."
		set this_url to the URL of document 1
		set this_title to the name of window 1
		
		tell application "DEVONthink Pro"
			set theArchive to create record with {name:this_title, type:html, URL:this_url}
			set data of theArchive to download web archive from this_url
		end tell
	on error error_message number error_number
		if the error_number is not -128 then
			try
				display alert "Safari" message error_message as warning
			on error number error_number
				if error_number is -1708 then display dialog error_message buttons {"OK"} default button 1
			end try
		end if
	end try
end tell

No more need for temporary items or windows popping up :wink:

Sounds cool! Any chances for a public beta soon? 8)

There won’t be a public beta but DT Pro 1.1 should be available at the end of the month.

Does this also work with capturing feeds?

I edited the script “Latest Macintosh News (Internal)”, which really works great (kudos), but it does not capture the pages themselves. I can’t see how to merge this “capturing a website” script with “Latest Macintosh News (Internal)”. Is it possible?

Of course it is possible :wink:

You could replace (“Latest Macintosh News (Internal).scpt”)…


					set this_HTML to |html| of this_item
					set this_record to create record with {name:this_name, type:html, URL:this_link, source:this_HTML, label:1, attached script:"~/Library/Application Support/DEVONthink Pro/Feeds/_Mark as read.scpt", date:|calendarDate| of this_item} in this_group

…with…


					set this_record to create record with {name:this_name, type:html, URL:this_link, label:1, attached script:"~/Library/Application Support/DEVONthink Pro/Feeds/_Mark as read.scpt", date:|calendarDate| of this_item} in this_group
					set data of this_record to download web archive from this_link

Note: Requires DT Pro 1.1.

I tried this, but it still seems to import links to the web pages rather than the web archives. The little icon/state still indicates as much.

DT Pro 1.1 uses some caching to speed up execution of (triggered) scripts; please quit and relaunch DT Pro to be sure the correct version has been executed.

Hi, all this worked beautifully, and I had a number of feeds working well - until last week, that is. I’m not sure why, other than that I reinstalled the Save to DT Pro script following other suggestions in this forum (thanks Bill).

The feeds seem to be imported (at least there is a progress bar), but nothing is actually added, even though I know that new posts have appeared on the blogs I capture.

I did not change the script itself.
Any suggestions? Should I reinstall this somehow?

Smolk

PS
This is the script:

-- Latest Macintosh News (Internal) %DID NOT CHANGE THE NAME YET!
-- Created by Christian Grunenberg on Aug Sun 01 2004.
-- Copyright (c) 2004-2005. All rights reserved.

-- Add or remove sites/URLs to/from the following lists
property these_sites : {"balashon", "Hebrew Aramaic Philology", "parshablog", "rif", "codex", "yediah (guttmann)", "abnormal", "abecedaria", "English Hebraica", "Maven Yavin", "Nach Yomi", "On the Main Line", "Toldot", "Hard Hitting News", "Unicode Fonts", "Miqra"}
property these_urls : {"http://balashon.blogspot.com/atom.xml", "http://hebphil.blogspot.com/atom.xml", "http://parsha.blogspot.com/atom.xml", "http://alfasi.blogspot.com/atom.xml", "http://biblical-studies.ca/blog/feed/", "http://yediah.blogspot.com/atom.xml", "http://WWW.telecomtally.com/blog/atom.xml", "http://abecedaria.blogspot.com/atom.xml", "http://englishhebraica.blogspot.com/atom.xml", "http://mavenyavin.blogspot.com/atom.xml", "http://nach-yomi.blogspot.com/atom.xml", "http://onthemainline.blogspot.com/atom.xml", "http://toldot.blogspot.com/atom.xml", "http://Realjewishnews.blogspot.com/atom.xml", "http://www.travelphrases.info/gallery/rss/GalleryOfUnicodeFonts_AllFonts.rss", "http://216.12.134.77/forums/rss.aspx?ForumID=6&Mode=0"}

-- Location of news inside database
property this_location : "/MyFeeds/"

tell application "DEVONthink Pro"
	activate
	try
		set site to 1
		show progress indicator "Downloading News..." steps (count of these_urls)
		repeat with this_url in these_urls
			set this_RSS to download markup from this_url
			set these_items to get items of feed this_RSS
			set this_path to this_location & (item site of these_sites)
			
			set this_group to create location this_path
			set URL of this_group to this_url
			set attached script of this_group to "~/Library/Application Support/DEVONthink Pro/Feeds/_Synchronize Feeds (Internal).scpt"
			
			repeat with this_item in these_items
				set this_title to title of this_item
				set this_name to my replaceCharacter(this_title, "/", "-")
				set this_link to |link| of this_item
				
				if (not (exists record with URL this_link)) or (not (exists record at this_path & "/" & this_name)) then
					set this_record to create record with {name:this_name, type:html, URL:this_link, label:1, attached script:"~/Library/Application Support/DEVONthink Pro/Feeds/_Mark as read.scpt", date:|calendarDate| of this_item} in this_group
					set data of this_record to download web archive from this_link
				end if
			end repeat
			step progress indicator (item site of these_sites)
			set site to site + 1
		end repeat
	end try
	hide progress indicator
end tell

on replaceCharacter(theString, theOriginalChar, theNewChar)
	set {od, AppleScript's text item delimiters} to {AppleScript's text item delimiters, theOriginalChar}
	set theStringParts to text items of theString
	if (count of theStringParts) is greater than 1 then
		set theString to text item 1 of theStringParts as string
		repeat with eachPart in items 2 thru -1 of theStringParts
			set theString to theString & theNewChar & eachPart as string
		end repeat
	end if
	set AppleScript's text item delimiters to od
	return theString
end replaceCharacter

The script is currently running; it takes forever to download all the web archives initially :wink: But everything seems to be fine so far (I’m using a beta of v1.1.2 that fixes rare crashes of the “download web archive” command, but that shouldn’t make a difference).

A little bit later… you should add a “try … end try” statement around the “download web archive” command: some servers time out over here, the download command then returns no data, and the repeat loop is cancelled.


try
	set data of this_record to download web archive from this_link
end try
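Building on that advice, a slightly fuller sketch (the fallback behavior is my assumption, not part of the original script): if the download fails, the record simply keeps its URL and plain HTML type, so a later run can attempt the archive again.

```applescript
try
	set data of this_record to download web archive from this_link
on error
	-- Server timed out or returned no data; leave the record as a
	-- plain link so a later synchronization can retry the download.
end try
```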

Quite a few days later… I still had no luck. But then something else happened: “can’t connect to host”. I searched this forum and found out that a security program had somehow been set to prevent DT from accessing the internet.

Still a day later, I realised this must also have been the problem with the feeds all along. And it was: DT worked as usual.

So it was down to me (or Little Snitch, which I will keep an eye on). Thanks for the effort, sorry about the trouble.