Using DEVONthink with Firefox's ScrapBook

There is an excellent add-on for Firefox called ScrapBook. It makes it extremely easy to capture web page contents and bookmarks in Firefox, much the same way that Safari is used to capture web archives and links in DEVONthink.

One interesting difference is that ScrapBook captures pages in a very standard way: it downloads the “index.html” file, and all attached images and style sheets, into a single directory. Thus it creates an archive that can be viewed using any browser, on any system.

However, it names these directories based on the moment of capture, not the title; so until you visit the contained index.html file, you really have no idea – looking at the filesystem – of what the site represents.
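To see the problem outside of any application, here is a minimal Python sketch that walks a ScrapBook “data” directory and prints which title hides behind each numerically named folder. It assumes only what is described above: one sub-directory per capture, each holding an index.html. The function names and the path in the usage note are hypothetical, not part of ScrapBook or DEVONthink.

```python
from html.parser import HTMLParser
from pathlib import Path


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)


def page_title(index_file):
    """Return the <title> text of an HTML file, or None if it has none."""
    parser = TitleParser()
    parser.feed(Path(index_file).read_text(encoding="utf-8", errors="replace"))
    parser.close()
    # Collapse runs of whitespace so multi-line titles become one line.
    title = " ".join("".join(parser.parts).split())
    return title or None


def scrapbook_titles(data_dir):
    """Map each capture directory name to the title of its index.html."""
    titles = {}
    for folder in sorted(Path(data_dir).iterdir()):
        index_file = folder / "index.html"
        if folder.is_dir() and index_file.is_file():
            titles[folder.name] = page_title(index_file)
    return titles
```

Calling `scrapbook_titles("/path/to/ScrapBook/data")` (substitute your own location) returns a dictionary keyed by directory name. It is handy for a quick look, but it does nothing to make the titles visible inside DEVONthink.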

Of course you can see the titles easily using the ScrapBook sidebar in Firefox. Or you can search the titles and contents of all saved pages using ScrapBook’s handy search facility. But this is not how most DEVONthinkers want their data found! I want the option to see related pages in my DT database, or file the page under another group, or capture the data entirely in DEVONthink, away from the wiles of ScrapBook.

I’ve created two scripts to help with integrating ScrapBook and DEVONthink. First, configure ScrapBook to save its data in a location you can see from DEVONthink. Index the “data” directory within that ScrapBook directory into DEVONthink (you can drag-and-drop it in, while holding down Command and Option). I call mine “-- Web History”, because I’ve told ScrapBook to capture every page I visit. Name it whatever you like.

Note that now you can view and search pages from DEVONthink. However, it will be very hard to determine what they are, since the titles are all numerical. Even searching will not show you what the page is, until you click on it and view its contents.

To get around this, I take advantage of the fact that the name of an indexed entry in DEVONthink does not have to correspond to its name on disk. I attach a script (based on Christian’s work) to my “-- Web History” group by pressing Shift-Cmd-I and setting the group’s script to the file containing the code below. Save it under whatever name you like, as long as the group’s script points to where you’ve saved it:

on triggered(theRecord)
  tell application "DEVONthink Pro"
    try
      set cacheGroup to theRecord
      synchronize record cacheGroup
      set this_selection to the children of cacheGroup
      set this_count to count of this_selection
      if this_count > 0 then
        show progress indicator "Renaming" steps this_count
        repeat with this_group in this_selection
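          -- Look for index.html first, then fall back to index.htm.
          -- "first child ... whose" raises an error when nothing matches,
          -- so each lookup gets its own try block.
          set this_item to missing value
          set this_type to missing value
          try
            set this_item to (first child of this_group whose name is "index.html")
          on error
            try
              set this_item to (first child of this_group whose name is "index.htm")
            end try
          end try
          if this_item is not missing value then set this_type to the type of this_item
          step progress indicator (name of this_group) as string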
          if this_type is equal to html then
            set this_url to the URL of this_item
            set this_source to source of this_item
            if this_source is not missing value then
              set this_title to get title of this_source
              if this_title is not missing value and this_title is not "" then
                set the name of this_group to this_title
                set the attached script of this_group to ¬
                  "YOUR_HOME/Library/Scripts/Open index page.scpt"
              end if
            end if
          end if
        end repeat
        hide progress indicator
      end if
    on error error_message number error_number
      hide progress indicator
      if the error_number is not -128 then
        try
          display alert "DEVONthink Pro" message error_message as warning
        on error number error_number
          if error_number is -1708 then display dialog error_message buttons {"OK"} default button 1
        end try
      end if
    end try
  end tell
end triggered

NOTE: First, you will have to replace “YOUR_HOME” in the above script with your own home directory, such as “/Users/myusername”. Second, you’ll have to create the script “Open index page.scpt” in your ~/Library/Scripts directory. That script simply opens the associated index.html page whenever you click on the group for that page. It looks like this:

on triggered(theRecord)
  tell application "DEVONthink Pro"
    try
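      -- "first child ... whose" raises an error when nothing matches,
      -- so the index.htm fallback lives in the error handler.
      set index_page to missing value
      try
        set index_page to (first child of theRecord whose name is "index.html")
      on error
        try
          set index_page to (first child of theRecord whose name is "index.htm")
        end try
      end try
      if index_page is not missing value and type of index_page is html then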
        open window for record index_page
      end if
    on error error_message number error_number
      if the error_number is not -128 then
        try
          display alert "DEVONthink Pro" message error_message as warning
        on error number error_number
          if error_number is -1708 then display dialog error_message buttons {"OK"} default button 1
        end try
      end if
    end try
  end tell
end triggered

At this point, clicking on my “-- Web History” group automatically synchronizes the group’s contents with the contents of my ScrapBook and renames every contained folder to the title of its related webpage. I doubt this will work for sites whose main page is saved as “index.aspx” or “index.jsp”, since the script only looks for “index.html” and “index.htm”. You may have to tailor the script based on your usage.
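If you do need to tailor the lookup, the idea is simply an ordered list of candidate file names, tried one after another. Here is a small Python sketch of that fallback at the filesystem level. The “index.aspx” and “index.jsp” entries are guesses at what such captures might be called, not something I have confirmed ScrapBook produces, and the function name is hypothetical.

```python
from pathlib import Path

# Preferred names first; the last two are unverified guesses
# for pages served by ASP.NET or JSP sites.
CANDIDATE_NAMES = ("index.html", "index.htm", "index.aspx", "index.jsp")


def find_index_page(folder, candidates=CANDIDATE_NAMES):
    """Return the first candidate file that exists in folder, else None."""
    folder = Path(folder)
    for name in candidates:
        path = folder / name
        if path.is_file():
            return path
    return None
```

The same ordering could be carried back into the AppleScript by adding one more nested lookup per extra file name.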

Now DEVONthink’s search results will mean a whole lot more, letting you know the name of the page before viewing its contents. Further, when you click on a sub-folder within your ScrapBook index, it will automatically open a new window visiting the page, saving you from having to manually locate – and click on – the index.html page within that folder.

I hope this makes using ScrapBook and DEVONthink together easier for people. It’s the fastest and easiest way I’ve found for viewing Firefox-visited pages within DEVONthink.

And last but not least, if you want to capture a page you’re looking at wholly within DEVONthink (maybe to categorize it, or divorce it from ScrapBook), just right-click in the browser window and select “Capture Web Archive”. The new web archive will be in your home directory, and you can organize it (or even Auto Classify it) from there.

Happy surfing,
John