So, in the end, I wrote a little script that does the job – and found two more glaring bugs in DEVONthink’s AppleScript.
Here’s the script:
tell application id "DNtp"
    -- "XXX" and "YYY" are placeholders for the group and database names
    set myGroup to parent "XXX" of database "YYY"
    set myItems to children of myGroup whose type is bookmark
    repeat with myItem in myItems
        -- Remember the bookmark's metadata so the new record can reuse it
        set myName to name of myItem
        set myURL to URL of myItem
        set myLabel to label of myItem
        set myState to state of myItem
        -- Open the URL in a new tab in a new window
        set myTab to open tab for URL myURL
        -- Wait until the page has finished loading
        repeat while (loading of myTab = true)
            delay 1
        end repeat
        set mySource to source of myTab
        set myRecord to create record with {type:html, name:myName, source:mySource, URL:myURL, label:myLabel, state:myState} in myGroup
        -- Disabled because of the second bug described below
        --if myRecord ≠ missing value then delete record myItem
        close myTab saving no
    end repeat
end tell
What this does is take all the bookmarks from a particular group; get the name, URL, and other useful metadata; open the URL in a new tab in a new window; get the source from that tab; and create a new HTML record in the original group. The repeat loop with the delay is to allow the tab to load completely before capturing the source.
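One thing the wait loop does not handle is a page that never finishes loading; in that case it would spin forever. A minimal sketch of a capped wait for a single URL, assuming the same `loading` and `source` tab properties used in the script above. The example URL, the names `maxWait` and `waited`, and the 30-second limit are illustrative assumptions, not anything DEVONthink prescribes:

```applescript
-- Sketch only: maxWait and waited are illustrative names, and the
-- 30-second cap is an arbitrary assumption, not a DEVONthink default.
tell application id "DNtp"
    set maxWait to 30 -- seconds to wait before giving up on a page
    set myTab to open tab for URL "https://www.example.com/"
    set waited to 0
    -- Poll once per second until the tab is loaded or the cap is reached
    repeat while (loading of myTab) and (waited < maxWait)
        delay 1
        set waited to waited + 1
    end repeat
    if loading of myTab then
        -- Still loading after maxWait seconds: capture nothing
        set mySource to missing value
    else
        set mySource to source of myTab
    end if
    close myTab saving no
end tell
```

In the full script, a `missing value` source would be the cue to skip `create record` for that bookmark and leave the original untouched.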
This works well. The only downside is that windows keep popping up beneath the applications you are working in and then closing again. But there is a reason for this.
I wanted the new tab to open in a viewer window of the current database, to avoid all these windows popping up and disappearing. Unfortunately, that doesn’t work because of a bug in DEVONthink. If you use ‘open tab for … in window’, the correct content is displayed momentarily, but what is captured is the source of the previously opened tab, which then becomes the new content. In other words, you capture your first tab over and over again. DEVONthink creates the record with the correct URL, but the source does not match. Forcing DEVONthink to open each tab in a new window captures the correct source.
You will see I have commented out a statement to delete the old record. Ideally, once a new record has been created, the old one can be expunged. This is where the second bug comes in. If you delete the old record, DEVONthink somehow gives the new record you have just created the source of the previously created record. In other words, the same thing happens as above: the new record has the correct URL, but the source does not match. I’ve left the statement in, commented out, in the hope this gets fixed at some point in the future.
Anyway, this works, and for now I am a happy camper because at least the HTML content gets indexed. And instead of occupying 12 GB, the 24,000 records now occupy only 500 MB.