Haven’t hit on how to do this, but maybe someone could suggest a solution or another approach?
I have a few HTML pages (a la del.icio.us export) of web URLs. I can import them as weblocs into DTPO, but then I want to capture the pages to get the content.
I can do this via the contextual menu (capture page/frame), but is there an equivalent menu command? And would it be scriptable via the AppleScript dictionary?
Or is there another avenue of approach that would do the same thing?
Spent some time fiddling with Automator, AppleScript and QuicKeys. I find these all a bit frustrating, but got something working in Automator. I should spend a bit more time studying the manuals…
Here’s the process:
1. Use the “Links to DEVONthink” AppleScript to get a database of weblocs.
2. Export the weblocs into a folder.
3. Use the attached Automator workflow to open the weblocs in DEVONagent, and capture each page in DEVONthink.
The trick is to have the AppleScript in the workflow wait until the page is completely downloaded in DEVONagent. Otherwise you’re capturing nothing. I currently have a fixed 20-second delay per page, which is really painful. I’m sure it could be shortened.
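One way the fixed delay might be shortened is to poll until the page reports that it has finished loading, with a timeout as a safety net. This is only a sketch: the handler name waitForPage is made up, and it assumes DEVONagent’s browser windows expose a boolean “loading” property, which should be checked against the dictionary in Script Editor before relying on it.

```applescript
-- Sketch only: poll instead of waiting a fixed 20 seconds.
-- ASSUMPTION: "loading of browser 1" exists in DEVONagent's dictionary;
-- verify in Script Editor. waitForPage is a hypothetical handler name.
on waitForPage(maxSeconds)
	set waited to 0
	tell application "DEVONagent"
		repeat while waited < maxSeconds
			-- Stop as soon as the frontmost browser is no longer loading
			if not (loading of browser 1) then return true
			delay 0.5
			set waited to waited + 0.5
		end repeat
	end tell
	-- Timed out: the caller decides whether to capture anyway or skip
	return false
end waitForPage
```

Calling waitForPage(20) in place of the delay keeps the old worst case, but returns sooner for pages that load quickly.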
This script is similar and captures images, PDF documents or web archives (depending on the URL’s target):
-- Convert URLs to web documents
-- Created by Christian Grunenberg on Wed Mar 15 2006.
-- Copyright (c) 2006-2008. All rights reserved.

tell application id "com.devon-technologies.thinkpro2"
	set theSelection to the selection
	if theSelection is not {} then
		try
			activate
			show progress indicator "Converting..." steps (count of theSelection)
			repeat with theRecord in theSelection
				set theName to name of theRecord
				step progress indicator theName
				if exists URL of theRecord then
					set theurl to URL of theRecord
					if theurl begins with "http:" or theurl begins with "https:" then
						set theGroup to parent 1 of theRecord
						create web document from theurl name theName in theGroup
					end if
				end if
			end repeat
			hide progress indicator
		on error error_message number error_number
			hide progress indicator
			if the error_number is not -128 then display alert "DEVONthink Pro" message error_message as warning
		end try
	end if
end tell
It’s very elegant, although it creates a webarchive instead of an HTML file. I assume I could Frankenstein what I made in Automator with your script, and get exactly what I want.
Do you have any judgements on pros and cons of storing HTML vs webarchives? I thought that the HTML would save space (while enabling DTPO searching), but perhaps I’m missing something.
Web archives include frames (and they’re indexed and therefore searchable) and can be viewed offline or if the original web page isn’t available anymore.
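If plain HTML is still preferred, a variation on the script above might look like the following. It is a rough sketch rather than a tested script: it assumes the “download markup from” and “create record with” commands, and the “source” property, behave as listed in the DEVONthink Pro 2 dictionary, so check them in Script Editor first.

```applescript
-- Sketch only: store each selected URL as an HTML record, not a web archive.
-- ASSUMPTION: "download markup from", "create record with" and the "source"
-- property work as listed in the DEVONthink Pro 2 dictionary.
tell application id "com.devon-technologies.thinkpro2"
	set theSelection to the selection
	repeat with theRecord in theSelection
		set theurl to URL of theRecord
		if theurl begins with "http:" or theurl begins with "https:" then
			try
				-- Fetch only the page markup (no images or other resources)
				set theSource to download markup from theurl
				create record with {name:(name of theRecord), type:html, URL:theurl, source:theSource} in (parent 1 of theRecord)
			end try
		end if
	end repeat
end tell
```

The trade-off is the one described above: an HTML record holds only the markup, so it is smaller and still searchable, but images and frames are not stored with it and the page may not display fully offline.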