DEVONthink Web Page archiving tip

I often store web pages in DEVONthink, and in the past I tended to use the Archive bookmarklet to do that. While that works well, many web pages carry a lot of material that isn’t part of the content (e.g., ads, sidebars). To get around that without having to select and clip the relevant parts, I recommend Readability.

Just add it as a bookmarklet, then select the bookmarklet on the page of interest. The page is reformatted so that only the main content remains.

Readability has a print option at the top left of the reformatted page. Select it and print to PDF. DEVONthink Pro users can add the PDF directly using the Save PDF to DEVONthink Pro script. This gives you both a quick way to add the page to DEVONthink and a nicely formatted document that’s focussed on the content.

I would recommend DEVONagent as a solution. It’s ideal for research-intensive web searches. It also produces semantic networks of search results with weighted connections. The search results are very clean (text only), with all the rubbish removed. You can then click one button to “post” them to DEVONthink.

I’ve been using DEVONagent for this purpose since I purchased my DTP 1.x and DEVONagent bundle way back in 2006… Love it :smiley:


I use DEVONagent too, but this is for the case when you’re browsing and find a useful page. In those cases, I find this a useful technique. Plus, it should work in most browsers since it doesn’t require web archives.

Neat tip, thanks! Unfortunately, in saving as PDF and then importing to DTP, the URL info gets stripped out, so I no longer have a record of where the entry came from. Unless I’m missing something …

If you use the Print PDF to DEVONthink Pro script, you should get the URL field filled in (at least I do). Of course, that requires DT Pro or DT Pro Office. However, if you have the Print Headers and Footers option set, you’ll get the URL in the footer of the PDF itself (at least in Safari). I’m fairly certain the other browsers have a similar printing option.

Ah yes, I see, thanks for that tip too. I tend to save my PDFs and documents to a dedicated folder, then index (rather than import) them, though, which again leaves me URL-less. But I guess that’s the price to be paid … unless I’m missing something else, which given my record so far seems entirely possible …