I ran into a bit of a problem while trying to migrate from Firefox’s quite nice Scrapbook plugin to the hopefully even more excellent DEVONthink Pro, which may become my sole pile for digital written things collected here and there. My AppleScript attempts produced some results that I can’t explain and which don’t seem to be documented either. As far as I can tell, there are some hiccups with DEVONthink’s “create record” and/or “set data” commands.
The moving source…
Scrapbook’s data management is relatively easy:
- a root folder
  - one folder per grabbed page (named with a timestamp)
    - the files of the web page in that folder (index.html as the main page, .jpg, .swf, …)
    - index.dat
The index.dat has exactly the information you wouldn’t want to miss in DEVONthink:
- date of creation
- URL
- folders
- title
- comments
Slightly unfortunately, the Scrapbook developer(s) orphaned the folder property in index.dat a few months ago, probably last October. The folder data is now kept exclusively in a kind of central directory file called scrapbook.rdf, which I learned the morning after my MacBook had run my AppleScript for a decent part of the night. (As I’m not too familiar with scripting these XML-like files, I preferred to manually move a few hundred pages into the correct DT groups. DT’s categorisation support has been a great help.)
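For anyone attempting the same migration, here is a minimal sketch of how the index.dat fields could be pulled out in plain AppleScript. It assumes each line of index.dat has the form key, tab, value; that layout, the key names in the sample and the handler name valueForKey are my assumptions, so check them against your own files first.

```applescript
-- Return the value stored under theKey in text made of "key<TAB>value" lines,
-- or missing value if the key is absent. (Assumed layout, see above.)
on valueForKey(theKey, theText)
	set oldDelims to AppleScript's text item delimiters
	set theValue to missing value
	set AppleScript's text item delimiters to {tab}
	repeat with aLine in paragraphs of theText
		set parts to text items of (aLine as text)
		if (count of parts) > 1 and item 1 of parts is theKey then
			-- Rejoin the rest in case the value itself contains tabs
			set theValue to (items 2 thru -1 of parts) as text
			exit repeat
		end if
	end repeat
	set AppleScript's text item delimiters to oldDelims
	return theValue
end valueForKey

-- Hypothetical sample standing in for the contents of one index.dat.
-- A real file could be read with: read POSIX file thePath as «class utf8»
set sampleText to "id" & tab & "20050310200743" & linefeed & ¬
	"title" & tab & "Das Leben als Chance zur Krise" & linefeed & ¬
	"source" & tab & "http://www.test.de/order/blub.html"

valueForKey("title", sampleText)
```

Reading the file as «class utf8» may also make a separate re-encoding step for umlauts unnecessary, though I have only sketched that here.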
…and a stubborn target
As to the role of DT in that migration procedure, the important lines of code are the ones Christian has posted several times as an example of how to import web pages as web archives into DT:
set theArchive to create record with {name:theName, URL:theURL, type:html} in theGroup
set data of theArchive to download web archive from theURL
The script processed my four-digit number of Scrapbook pages almost flawlessly. Everything looked very nice in DT: nice pages with nice metadata in the address bar and the info box, put nicely into groups mirroring the Scrapbook folders (apart from the pages saved since October). Only a few minor issues were obvious:
a) DT’s address bar still shows something like “file://pathtoScrapbookEtc” instead of “http://thiswasmyhome.org” even though the URL field has been set to the latter by AppleScript,
b) clicking on the at-sign next to the path field in the Information window does nothing,
c) thumbnails of web archives imported by a script only appear once you click into and edit the web archive (is there a script for placebo-editing lots of pages?).
d) (Off-topic: I had a look at DT’s export features, just in case. I still have tons of stuff in old files from askSam SurfSaver, a once nice, now doomed Windows program. I think your “DEVONtech_storage” file is a bit too proprietary and too tight-lipped.)
…with an issue
However, when I “disconnect” the source from DEVONthink by either renaming the source folder in Finder (in the case of my Scrapbook files with their “file://” URLs) or by switching off AirPort (in the case of pages with “http” URLs like something.com), DEVONthink is no longer able to display the page. It just displays nothing. Strangely, however, the DEVONthink address bar (?) still displays reasonable information on the size, the number of characters and the subject, and the Information window even shows all the right values. Clicking on the menu command Format > Edit source (Quelltext bearbeiten) even reveals good-looking HTML code. Clicking then on Format > View page (Seite betrachten) again shows a blank piece of white nothing. (FYI: the database properties dialogue shows that there are hardly any pictures in the database. However, the page sizes are quite often > 100 kB.) You can still drag the web archive from DT to the Finder and open it in a text editor; the source seems to include binary data for JPGs etc., and the archive opens well in Safari. According to the text editor, it’s UTF-8 encoded. (I had quite a problem reading umlauts from Scrapbook’s UTF-8 encoded index.dat, so I chose a third-party library, “Textcommands”, to re-encode the input string from UTF-8 to Unicode.)
After grabbing pages from within Safari with the “Add web archive for DEVONthink.scpt” script located in ~/library/scripts/Applications/Safari, I think DT treats http URLs differently from file URLs. DT never manages to show pages that were grabbed from addresses like “file:///iamyourmachine/weareusers/thisisyourhome/justgrabme.html” once the source is “disconnected”. Interestingly enough, however, you can still click on the “Open with…” command, select, let’s say, “Safari”, and there is the page that DT couldn’t show.
I’ll attach some code with comments describing how strangely and differently DT deals with different parameters for the “set data” command.
set dtrootgroup to "Inbox_test"
set source_URL to "file://localhost/folders/Scrapbook/20050310200743/index.html"
(*
dealing with scrapbook folder, looping and repeating, dealing with index.dat etc....
i'm happy to share it once we've solved this set data-thing....
*)
create_new_rec_webarchive("Das Leben als Chance zur Krise", source_URL, "www.test.de/order/blub.html", {"history", "warfare"}, dtrootgroup, "20050310162344")
on create_new_rec_webarchive(theTitle, scrapbookURL, originURL, scrapbookfolder, devonwebstoregroup, timestamp)
	tell application "DEVONthink Pro"
		try
			-- Set target group: join the folder list with "/" to build the group path
			set AppleScript's text item delimiters to {"/"}
			set targetpath to devonwebstoregroup & "/" & (scrapbookfolder as string)
			set AppleScript's text item delimiters to {""}
			set targetlocation to create location targetpath
			-- Create Record
			set theArchive to create record with {name:theTitle, type:html, URL:originURL, path:scrapbookURL} in targetlocation
			--#########################
			--##### the crucial DT lines
			-- Fill the webarchive with content
			--set data of theArchive to download web archive from scrapbookURL
			-- content not displayed in dt if disconnected from source,
			-- "open with...> safari" works fine
			--set data of theArchive to download web archive from "http://www.heise.de/newsticker/meldung/87932"
			-- works fine in script, shows page in DT even with
			-- airport off, everything's just perfect
			set data of theArchive to download web archive from "http://localhost/~meandmyhome/index.html"
			-- raises err 1700 from dt, even though the
			-- URL mentioned works fine in Safari
			--set data of theArchive to download web archive from "http://localhost/~meandmyhome/scrapbookroot/20050310200743/index.html"
			-- raises err 1700 from dt, even though the
			-- URL mentioned works fine in Safari
			-- stuff commented out
			--set path of theArchive to scrapbookURL
			--set URL of theArchive to originURL
			--set comments = folder
			--set creation date = timestamp...
			--..omitted stuff
		on error errText number errNum
			log "###ERROR: " & scrapbookURL
			log errText & ", " & errNum
		end try
	end tell
end create_new_rec_webarchive
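The handler above leaves the “set creation date = timestamp” step commented out. As a rough sketch of that step, the following turns a Scrapbook timestamp into an AppleScript date. It assumes the timestamp is laid out as YYYYMMDDhhmmss, which the folder names suggest; the handler name dateFromTimestamp is made up for this example, and assigning the result to the record is left out because that depends on what DT’s dictionary allows.

```applescript
-- Convert a 14-character timestamp such as "20050310162344"
-- (assumed layout: YYYYMMDDhhmmss) into an AppleScript date.
on dateFromTimestamp(ts)
	set d to current date
	-- Start from day 1 so that changing the month cannot overflow
	-- (e.g. day 31 carried into a 30-day month).
	set day of d to 1
	set year of d to (text 1 thru 4 of ts) as integer
	set month of d to (text 5 thru 6 of ts) as integer
	set day of d to (text 7 thru 8 of ts) as integer
	-- "time" is the number of seconds since midnight
	set time of d to ((text 9 thru 10 of ts) as integer) * hours + ¬
		((text 11 thru 12 of ts) as integer) * minutes + ¬
		((text 13 thru 14 of ts) as integer)
	return d
end dateFromTimestamp

dateFromTimestamp("20050310162344")
```

Building the date from its components instead of coercing a string avoids depending on the date format of the system locale.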