There is an excellent add-on for Firefox called ScrapBook. It makes it extremely easy to capture web page contents and bookmarks in Firefox, much the same way that Safari is used to capture web archives and links in DEVONthink.
One interesting difference is that ScrapBook captures pages in a very standard way: it downloads the “index.html” file, and all attached images and style sheets, into a single directory. Thus it creates an archive that can be viewed using any browser, on any system.
However, it names these directories based on the moment of capture, not the title; so until you visit the contained index.html file, you really have no idea – looking at the filesystem – of what the site represents.
Of course you can see the titles easily using the ScrapBook sidebar in Firefox. Or you can search the titles and contents of all saved pages using ScrapBook’s handy search facility. But this is not how most DEVONthinkers want their data found! I want the option to see related pages in my DT database, or file the page under another group, or capture the data entirely in DEVONthink, away from the wiles of ScrapBook.
I’ve created two scripts to help with integrating ScrapBook and DEVONthink. First, configure ScrapBook to save its data in a location you can see from DEVONthink. Index the “data” directory within that ScrapBook directory into DEVONthink (you can drag-and-drop it in, while holding down Command and Option). I call mine “-- Web History”, because I’ve told ScrapBook to capture every page I visit. Name it whatever you like.
Note that now you can view and search pages from DEVONthink. However, it will be very hard to determine what they are, since the titles are all numerical. Even searching will not show you what the page is, until you click on it and view its contents.
To get around this, I take advantage of the fact that the name of an indexed entry in DEVONthink does not have to correspond to its name on disk. I attach the following script (based on Christian’s work) to my “-- Web History” group: select the group, press Shift-Cmd-I, and set its script to the file where you saved the code below. Name the file whatever you like, as long as the group’s script setting points to it:
on triggered(theRecord)
    tell application "DEVONthink Pro"
        try
            set cacheGroup to theRecord
            -- Pick up any pages ScrapBook has saved since the last visit
            synchronize record cacheGroup
            set this_selection to the children of cacheGroup
            set this_count to count of this_selection
            if this_count > 0 then
                show progress indicator "Renaming" steps this_count
                repeat with this_group in this_selection
                    step progress indicator (name of this_group) as string
                    -- Look for index.html, then index.htm; a failed lookup raises
                    -- an error, so each attempt gets its own try block
                    set this_item to missing value
                    try
                        set this_item to (first child of this_group whose name is "index.html")
                    end try
                    if this_item is missing value or type of this_item is not html then
                        try
                            set this_item to (first child of this_group whose name is "index.htm")
                        end try
                    end if
                    if this_item is not missing value and type of this_item is html then
                        set this_source to source of this_item
                        if this_source is not missing value then
                            set this_title to get title of this_source
                            if this_title is not missing value and this_title is not "" then
                                -- Rename the DEVONthink entry to the page title
                                set the name of this_group to this_title
                                -- Clicking the renamed group will open its index page
                                set the attached script of this_group to ¬
                                    "YOUR_HOME/Library/Scripts/Open index page.scpt"
                            end if
                        end if
                    end if
                end repeat
                hide progress indicator
            end if
        on error error_message number error_number
            hide progress indicator
            -- Error -128 means the user cancelled, so stay quiet in that case
            if the error_number is not -128 then
                try
                    display alert "DEVONthink Pro" message error_message as warning
                on error number error_number
                    if error_number is -1708 then display dialog error_message buttons {"OK"} default button 1
                end try
            end if
        end try
    end tell
end triggered
NOTE: You will have to replace “YOUR_HOME” in the above script with your own home directory, such as “/Users/myusername”. You’ll also have to create the script “Open index page.scpt” in your ~/Library/Scripts directory. That script simply opens the associated index.html page whenever you click on the group for that page. It looks like this:
on triggered(theRecord)
    tell application "DEVONthink Pro"
        try
            -- Look for index.html, then index.htm; a failed lookup raises
            -- an error, so each attempt gets its own try block
            set index_page to missing value
            try
                set index_page to (first child of theRecord whose name is "index.html")
            end try
            if index_page is missing value or type of index_page is not html then
                try
                    set index_page to (first child of theRecord whose name is "index.htm")
                end try
            end if
            if index_page is not missing value and type of index_page is html then
                open window for record index_page
            end if
        on error error_message number error_number
            -- Error -128 means the user cancelled, so stay quiet in that case
            if the error_number is not -128 then
                try
                    display alert "DEVONthink Pro" message error_message as warning
                on error number error_number
                    if error_number is -1708 then display dialog error_message buttons {"OK"} default button 1
                end try
            end if
        end try
    end tell
end triggered
Now, when I click on my “-- Web History” group, DEVONthink automatically synchronizes the folder’s contents with my ScrapBook data and renames every contained folder to the title of its web page. I doubt this will work for sites whose main page is “index.aspx” or “index.jsp”. You may have to tailor the script based on your usage.
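One possible way to tailor the lookup is to try a list of candidate file names in turn. The sketch below is only an illustration: the handler name findIndexPage and the extra file names are assumptions, not part of the scripts above, and it has not been verified against a live database. It relies only on DEVONthink terms the scripts above already use (child, name). Whether DEVONthink reports an “index.aspx” or “index.jsp” file as type html is also an open question, so the type check in the main script may need adjusting as well.

```applescript
-- Hypothetical helper: return the first child of theGroup whose name matches
-- one of candidateNames (tried in order), or missing value if none matches.
on findIndexPage(theGroup, candidateNames)
    tell application "DEVONthink Pro"
        repeat with candidateRef in candidateNames
            -- Dereference the loop variable so the filter compares plain text
            set candidateName to contents of candidateRef
            try
                -- A lookup with no match raises an error; ignore it and move on
                set foundPage to (first child of theGroup whose name is candidateName)
                return foundPage
            end try
        end repeat
    end tell
    return missing value
end findIndexPage
```

Inside the renaming script, the lookup could then read `set this_item to my findIndexPage(this_group, {"index.html", "index.htm", "index.aspx", "index.jsp"})`, followed by a test for missing value before the item is used. The `my` keyword is needed because the call sits inside a tell block.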
Now DEVONthink’s search results will mean a whole lot more, letting you know the name of the page before viewing its contents. Further, when you click on a sub-folder within your ScrapBook index, it will automatically open a new window visiting the page, saving you from having to manually locate – and click on – the index.html page within that folder.
I hope this makes using ScrapBook and DEVONthink together easier for people. It’s the fastest and easiest way I’ve found for viewing Firefox-visited pages within DEVONthink.
And last but not least, if you want to capture a page you’re looking at wholly within DEVONthink (maybe to categorize it, or divorce it from ScrapBook), just right-click in the browser window and select “Capture Web Archive”. The new web archive will be in your home directory, and you can organize it (or even Auto Classify it) from there.
Happy surfing,
John