Capturing pages of weblocs in bulk?

Hi all-

Haven’t hit on how to do this, but maybe someone could suggest a solution or another approach?

I have a few HTML pages (à la export) of web URLs. I can import them as weblocs into DTPO, but I then want to capture the pages to get the content.

I can do this via the contextual menu (capture page/frame), but is there an equivalent menu command? And would it be scriptable via the AppleScript dictionary?

Or is there another avenue of approach that would do the same thing?

Thanks in advance, Charles

Hi all-

Spent some time diddling with Automator, AppleScript and QuicKeys. I find these all a bit frustrating, but got something working in Automator. I should spend a bit more time studying the manuals…

Here’s the process:

  1. Use the “Links to DEVONthink” AppleScript to get a database of weblocs

  2. Export the weblocs into a folder

  3. Use the attached Automator workflow to open the weblocs in DEVONagent, and capture each page in DEVONthink.

The trick is to have the AppleScript in the workflow wait until the page has completely downloaded in DA. Otherwise you’re capturing nothing. I currently have a 20 second delay, which is really painful. I’m sure it could be shortened.
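
One way the fixed delay might be shortened is to poll until the page reports that it has finished loading, with an upper limit so the workflow cannot hang. This is only a sketch: the handler name waitForPage is made up for illustration, and it assumes that DEVONagent's scripting dictionary exposes a boolean "loading" property on its browser windows, which should be checked in Script Editor before relying on it.

```applescript
-- Sketch only. Assumes DEVONagent browser windows have a boolean
-- "loading" property; verify this in the application's dictionary.
-- waitForPage is a hypothetical helper, not part of any DEVON product.
on waitForPage(maxSeconds)
	tell application "DEVONagent"
		repeat maxSeconds times
			-- Stop waiting as soon as the front browser has finished loading
			if not (loading of browser 1) then return true
			delay 1
		end repeat
		-- Timed out: report whether the page finished in the last second
		return not (loading of browser 1)
	end tell
end waitForPage

-- Wait up to 20 seconds, but continue earlier if the page is ready
set pageReady to waitForPage(20)
```

If the page has not started loading by the time the handler is called, the first check could pass too early, so a short delay before calling it may still be needed.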

Comments appreciated! Charles

This script is similar and captures images, PDF documents or web archives (depending on the URL’s target):

-- Convert URLs to web documents
-- Created by Christian Grunenberg on Wed Mar 15 2006.
-- Copyright (c) 2006-2008. All rights reserved.

tell application id "com.devon-technologies.thinkpro2"
	set theSelection to the selection
	if theSelection is not {} then
		try
			show progress indicator "Converting..." steps (count of theSelection)
			repeat with theRecord in theSelection
				set theName to name of theRecord
				step progress indicator theName
				-- Only records that carry a web URL can be converted
				if exists URL of theRecord then
					set theURL to URL of theRecord
					if theURL begins with "http:" or theURL begins with "https:" then
						-- Store the captured document next to the original record
						set theGroup to parent 1 of theRecord
						create web document from theURL name theName in theGroup
					end if
				end if
			end repeat
			hide progress indicator
		on error error_message number error_number
			hide progress indicator
			-- -128 means the user cancelled, so stay quiet in that case
			if the error_number is not -128 then display alert "DEVONthink Pro" message error_message as warning
		end try
	end if
end tell

Thanks for the script Christian!

It’s very elegant, although it creates a webarchive instead of an HTML file. I assume I could Frankenstein what I made in Automator with your script, and get exactly what I want.

Do you have any judgements on pros and cons of storing HTML vs webarchives? I thought that the HTML would save space (while enabling DTPO searching), but perhaps I’m missing something.

Thanks again, Charles

Web archives include frames, are indexed and therefore searchable, and can be viewed offline or when the original web page is no longer available.
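
For anyone who still prefers plain HTML records, a variation on the script above might look like the following. It is a rough sketch, not a tested replacement: it assumes that "download markup from" returns the page source and that "create record with" accepts an html record with a "source" property, as I understand the DEVONthink Pro dictionary, and it leaves out the progress indicator and error handling for brevity.

```applescript
-- Sketch only: store the page's HTML source instead of a web archive.
-- Assumes "download markup from" and the "source" property work as
-- described in the DEVONthink Pro dictionary; verify before use.
tell application id "com.devon-technologies.thinkpro2"
	set theSelection to the selection
	repeat with theRecord in theSelection
		set theURL to URL of theRecord
		if theURL begins with "http:" or theURL begins with "https:" then
			-- Fetch the raw markup of the page
			set theSource to download markup from theURL
			-- Create the HTML record next to the original webloc
			create record with {name:(name of theRecord), type:html, URL:theURL, source:theSource} in (parent 1 of theRecord)
		end if
	end repeat
end tell
```

An HTML record created this way holds only the markup, so images, stylesheets and frame contents still depend on the original site being reachable.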