Updating webarchives

I don’t update the webarchive, just the URL.

I don’t know how Update Captured Archive behaves if the key WebResourceURL was updated. @cgrunenberg :slight_smile: ?

@pete31

Please forgive me for not understanding :frowning: here.

I’ve been updating the URL - as it appears in the Inspector:

But that doesn’t change (the appearance of) the web archive - in the View/Edit pane.

Your script, though, seems always to place the correct URL for the webarchive in the View/Edit pane. Thanks!

And it remains a webarchive, which - 90% of the time - looks fine. The URL also launches (in Safari) and everything looks OK.

But, as you can see, the appearance of the webarchive is broken in the View/Edit pane for some URLs - even though it looks as though your script has worked!

It seems to make no difference: when the page renders incorrectly, it continues to do so after Update Captured Archive.

No idea.

My starting point is a webarchive whose displayed content is exactly what I want. Only the internal URL (and therefore DEVONthink’s URL) doesn’t match the URL in Safari’s address bar. To change the internal URL to the correct one I take the steps explained in my previous post.

I don’t update the displayed content of any webarchive. If the displayed content isn’t what I want I’ll capture a new webarchive.
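For anyone wondering what the “internal URL” actually is: a webarchive file is a binary property list whose WebMainResource dictionary carries a WebResourceURL key. Here’s a minimal sketch, assuming that standard layout, of how one could read and rewrite that key with Python’s plistlib. The path and URL are placeholders, this is not the script under discussion, and you’d want to work on a copy:

```python
#!/usr/bin/env python3
# Minimal sketch (not the script under discussion): inspect and rewrite the
# internal URL of a .webarchive. A webarchive is a binary property list; the
# captured page lives in WebMainResource, whose WebResourceURL key is the
# "internal URL" referred to above. Path and URL are placeholders.
import plistlib
from pathlib import Path

archive_path = Path("~/Desktop/example.webarchive").expanduser()  # placeholder
new_url = "https://example.com/correct-page"                      # placeholder

with archive_path.open("rb") as f:
    archive = plistlib.load(f)              # plistlib auto-detects binary plists

main = archive["WebMainResource"]
print("Current internal URL:", main.get("WebResourceURL"))

main["WebResourceURL"] = new_url            # rewrite the internal URL

with archive_path.open("wb") as f:
    plistlib.dump(archive, f, fmt=plistlib.FMT_BINARY)  # write it back as binary
```

Whether rewriting this key is always safe is exactly what gets debated later in this thread.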

And the displayed content was fine before you used Update Captured Archive?

Does the displayed content also break if you don’t update the internal URL and just use Update Captured Archive?

@pete31

No.

But to be totally clear - and this is strange - for 90% of the webarchives I have updated, the page renders perfectly in DT3.7.2.

I change the URL in the Inspector to the correct URL. I run your script. In the View/Edit pane the page changes (to that of the newly-corrected URL), the webarchive updates and it renders perfectly.

But for about 10% of the pages I’ve tried, I change the URL in the Inspector. (If I use Launch URL, the page appears as it should.)

But the webarchive representation is either corrupt, incomplete or otherwise different.

As in my screenshots above. Odd.

But only for certain sites/URLs!

No. Running Update Captured Archive puts the old/incorrect URL into the URL field in the Inspector. It then displays ‘correctly’ - by which I mean that it displays the old/incorrect content of the webarchive.

IOW, what I’m saying is that your script seems to be working as it should, and as you designed and wrote it. Wonderful; thanks; so useful.

But that the version of the webarchive displayed is - sometimes - corrupted!


Just tried two other things for one of the webarchives which does not display properly as a web archive:

  1. Converted it to PDF: same display faults
  2. Exported as website (attached): same display faults

About the Parker Library on the Web Project.html.zip (3.1 KB)

What I think may be happening is that it’s unable to fetch all the necessary resources (CSS, images, scripts etc.) to display properly when converted like this into a webarchive?

Two other pieces of potentially useful information:

  1. clipping that URL as a webarchive saves it perfectly
  2. single-clicking the URL of the incorrectly displayed page also launches the correct address properly.

It seems (and I may be wrong) as though something is going wrong in the conversion process.

Hope that helps, @pete31

Converting to PDF takes the displayed content, i.e. the content that’s saved inside the webarchive, and creates a PDF from that. It does not capture a PDF from the URL that’s stored in the webarchive.

I make use of this by

  • capturing only a selected part of a site as a webarchive
  • converting the webarchive to PDF

This way I get the best of both worlds: a PDF whose “clutter freeness” I can control beforehand.

The only downside (as you know) is that sometimes the browser doesn’t report the correct URL. Apart from that it’s the best capture method I’ve found so far.


Edit: Script: Create webarchive from selection with correct URL


Exporting doesn’t change the content. DEVONthink never changes files, neither on import nor on export.

Yes, as explained in DEVONthink’s help Documentation > Documents > HTML-Based Formats:

Note: Web archives can be very useful with web pages using statically linked content. However, some popular and monetized sites get their contents dynamically from other sources, so the actual data is not in the underlying HTML. These pages may have missing content due to this, require an internet connection to display content, and run JavaScript. If you encounter this, a PDF may be a better archiving option.

PS I didn’t look at your attached files as there’s nothing I can do about it :slight_smile:

Thanks for confirming that I shouldn’t expect converting to PDF (or exporting) to make any difference; and for your guidance on HTML and DT etc. :slight_smile:

I just thought that maybe you’d see something significant there in those files.

I’ve just been through all the webarchives I found to be out-of-date when I ran the Check URLs script earlier today.

There are 79 of them.

Of these, no more than 7 (Yes, that’s right, just seven) display incorrectly when your script is run.

That would lead me to suspect that there must be something ‘special’ (use of a CDN, incorrectly formatted HTML etc) which is causing them to break.

If it weren’t for the fact that when I clip a webarchive from their URLs/sites, it imports and displays perfectly in DT!

So - unless I switch to PDFs (see below) - all I have to do is reClip those URLs as webarchives manually.

IOW your script has saved - and will save - me hours. Thanks again!

New to DT, and to AppleScript: can I create a new folder of my own in

~/Library/Application Scripts/com.devon-technologies.think3/Menu

to store new/external/third party scripts such as yours in, please?

== snip ==

I tried capturing as PDF when I first started to use DT a few weeks ago - without much success.

But now I’m beginning to think that some format of PDF is the better way to capture website content…

Web archives contain their own URLs for various resources. The command just uses the currently loaded web page & its resources, creates an updated web archive and updates the URL of the item. Therefore it’s useful for updating captured web archives in case of changed contents (e.g. after reloading the page); it’s not intended for updating invalid URLs.
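To see those per-resource URLs for yourself, you can list them straight from the archive’s plist structure. A rough sketch, assuming the usual WebMainResource / WebSubresources layout (the path is a placeholder):

```python
#!/usr/bin/env python3
# Sketch: list the URLs a webarchive stores for its own resources.
# Assumes the usual layout: WebMainResource (the page itself) plus an
# optional WebSubresources array (images, CSS, scripts, ...).
# The path is a placeholder.
import plistlib
from pathlib import Path

archive_path = Path("~/Desktop/example.webarchive").expanduser()  # placeholder

with archive_path.open("rb") as f:
    archive = plistlib.load(f)

print("Main resource:", archive["WebMainResource"].get("WebResourceURL"))
for res in archive.get("WebSubresources", []):
    print("Subresource: ", res.get("WebResourceURL"))
```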


Thanks! A somewhat related question: What NS class/method does the DEVONthink service Capture Web Archive use to get the Safari selection? I know how to create a webarchive from the clipboard but couldn’t find the method that should be used to set the clipboard to the selection. There must be a whole class for this kind of stuff, no?

Services actually receive the complete & required information from the source application but do not even know which application it is. In this case Safari provides the web archive data of the selection.


Hi @cgrunenberg!

May I ask you if what I am experiencing - and @pete31 has kindly helped me with - is behaviour that you’d expect, please?

And if not, what I can do to put it right?

Thanks!

If you clip a webarchive, the content is based on the internals of the webarchive, not controlled by the URL field in the Info inspector.

Just as with almost every file in DEVONthink, you can add a URL as a reference.
Only bookmarks vary based on the URL, but that is because they dynamically load the content the URL points to. Webarchives do not do this.
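One way to convince yourself of this: the page the View/Edit pane renders is stored inside the archive itself, in the main resource’s WebResourceData, so it doesn’t change when the URL field does. A quick sketch, assuming the standard webarchive plist layout (placeholder path):

```python
#!/usr/bin/env python3
# Sketch: the page shown in the View/Edit pane is stored in the webarchive
# itself (WebResourceData of the main resource); it is not fetched from the
# URL field in the Info inspector. Placeholder path.
import plistlib
from pathlib import Path

archive_path = Path("~/Desktop/example.webarchive").expanduser()  # placeholder

with archive_path.open("rb") as f:
    main = plistlib.load(f)["WebMainResource"]

encoding = main.get("WebResourceTextEncodingName") or "utf-8"
html = main["WebResourceData"].decode(encoding, errors="replace")
print(html[:500])  # the stored HTML is right here, baked into the file
```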

Jim,

Thanks. Got it. That explains why, when I update the URL in the Inspector (as you did by pickling yourselves in your example :slight_smile: ), the webarchive remains the same and displays the same content as before.

I don’t suppose it’s possible to update a webarchive, is it… other than with another script or (inbuilt) command?

You may remember helping me a few days ago by pointing me in the direction of the ‘Check Bookmarks’ script. That works beautifully, thanks.

What I’ve then been doing is going through the Invalid URLs Group which it creates and using Pete’s script to update the webarchives.

The oddness is that 90% of webarchives get updated properly by Pete’s script (which is fantastic).

But there are exceptions, where the webarchives display as either corrupted and/or with missing components/elements.

At this point, that - the incorrect display of the updated webarchive (which I now appreciate is unrelated to anything in the Inspector) - is what I want to find a way to put right.

TIA!

@mksBelper what was your starting point before you ran the script and before you used Update Captured Archive?

  • A: correctly displaying webarchive
  • B: not correctly displaying webarchive

Hi, Pete, I hope I’m clear that I’m very grateful for your script - and NOT complaining :slight_smile: !

I only used Update Captured Archive a couple of times; then I discovered that it doesn’t do what I want… as Jim and you pointed out, changing the URL in the Inspector doesn’t update the webarchive so Update Captured Archive really serves no purpose (for me, in this operation).

A: before I ran the script the webarchive displayed completely correctly - as if I’d (manually) Clipped a webarchive from each URL in question.

But what was displaying was, of course, an out-of-date version of the site.

In about 90% of cases the script updates the webarchive’s contents correctly (that is, it fetches the updated content) and the display is perfect.

In about 10% of cases the script also updates the webarchive’s contents correctly but the display is not correct.

Common sense tells me - I think! - that this behavior is site-dependent… certain resources are not fetched.

OTOH, clipping (DT’s own routine) the self-same URLs always both fetches the current content and displays correctly.

Got it, thanks.

I didn’t think about the fact that users could use Update Captured Archive on webarchives they previously ran the script on (as I never use that option). Obviously something in changing the internal URL can break webarchives if one uses Update Captured Archive afterwards. I’ll add a warning note. Sorry for the trouble!

Thanks to you!

That may well be the case, Pete; but of the seven webarchives which the script appeared to break, I’m 99% sure that I ran Update Captured Archive on only one or two of them, at most.

IOW I do believe that Update Captured Archive really is irrelevant in our case here.

No trouble at all!

Apart from anything else, your script is extremely useful.

In the back of my mind still lies the fact that the DT (internal) script I’m running, ‘Check Bookmarks’, may have that name for a reason - and doesn’t, in fact, reliably update webarchives (at least as its main job), but is designed to check and update (or mark as invalid) ‘regular’ URLs.

I have several thousand webarchives in my 10,000+ DT database (which is basically an import from EagleFiler).

Being the meticulous kind of person I am, I really want to have the contents of those sites as up to date and ready-for-use as I can.

The ‘Check Bookmarks’ script is a real boon - it seems to work, and it helps me keep up to date. But I suspect I wouldn’t be having this trouble if I’d chosen one of the PDF options, particularly because - as @cgrunenberg keeps pointing out - webarchives are officially deprecated by Apple. But my first few attempts with PDFs were not successful: partial pages, missing images, corrupt content etc. As it happens, EagleFiler is excellent at capturing webarchives.

If I may (and at the slight risk of going off topic), reading posts in this thread from this post from @cgrunenberg onwards, I’m still a little confused.

While, as Jim points out, there should be a ‘Check Bookmarks’ script, my installation doesn’t have it (maybe ‘Check URLs’ has superseded it); and ‘Check links’ appears not to be documented.

I wonder if there is a better way altogether of checking for outdated links of all kinds.

Ok, this is new info. Then it seems it’s not safe to update the internal URL.

I’ve no idea how webarchives work; the only thing I can think of is that, after we change WebResourceURL, the loading of some resources breaks because they perhaps use links that are relative to the internal URL.
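To illustrate that hypothesis with a generic example (not taken from any of the affected pages): a relative link resolves against the page’s base URL, so a different base sends it somewhere else.

```python
#!/usr/bin/env python3
# Generic illustration of the hypothesis above: a relative link resolves
# against the page's base URL, so changing that base changes where every
# relative reference points. Example URLs are hypothetical.
from urllib.parse import urljoin

relative_link = "assets/style.css"  # hypothetical relative resource

print(urljoin("https://old.example.com/page/", relative_link))
# -> https://old.example.com/page/assets/style.css
print(urljoin("https://new.example.com/other/", relative_link))
# -> https://new.example.com/other/assets/style.css
```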

But I don’t believe that’s the case: a Google search for "WebResourceURL" webarchive relative only yields 50 results, and the whole problem of ending up with webarchives that have a wrong URL is Apple’s fault, so I’d expect a lot more results, e.g. developers reporting this problem.

Not sure what to do. Probably better to delete the script then.

Any chance you could definitively verify that changing WebResourceURL breaks webarchives? I’m writing a workaround that would make sure we only capture webarchives with Safari’s current URL - but if changing WebResourceURL potentially breaks them, that would of course be useless.
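Just to sketch the idea behind such a check (a hypothetical outline only, not the actual workaround; the path is a placeholder): compare Safari’s current URL with the archive’s internal WebResourceURL before accepting the capture.

```python
#!/usr/bin/env python3
# Outline of the idea only (not the actual workaround): accept a capture only
# if the archive's internal URL matches what Safari's front window is showing.
# The archive path is a placeholder.
import plistlib
import subprocess
from pathlib import Path

archive_path = Path("~/Desktop/example.webarchive").expanduser()  # placeholder

# Ask Safari for the URL of its front document via a one-line AppleScript.
safari_url = subprocess.run(
    ["osascript", "-e", 'tell application "Safari" to get URL of front document'],
    capture_output=True, text=True, check=True,
).stdout.strip()

with archive_path.open("rb") as f:
    internal_url = plistlib.load(f)["WebMainResource"].get("WebResourceURL")

if internal_url == safari_url:
    print("Internal URL matches Safari's current URL.")
else:
    print(f"Mismatch: archive says {internal_url!r}, Safari says {safari_url!r}")
```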

Yes, that’s info that’s floating around in this forum. But as far as I know it’s wrong.

@pete31, thanks for persevering with me on this!

I have a (DT) database of, as I say, about 10,000 files.

They’re divided into seven top-level heads… because I want to colour-code them and the Finder allows no more than seven colours. That’s OK.

Once I was settled into DT, I decided to check/correct/validate (or, where impossible to update, delete) all my webarchives. Webarchives are (so far) the only way I’m keeping URLs. That may change; this thread is helping here :slight_smile: .

I’m doing one top level ‘subject area’ roughly every couple of days.

I’ll be doing one tomorrow, all being well.

All the files in that next top level Group are so far untouched: I haven’t run the ‘Check URLs’ script; I haven’t run your script and I haven’t run the Update Captured Archive routine on any of them.

So, Yes, I have a chance to record in much greater detail exactly what happens.

Definitely! Probably tomorrow, Wednesday 2 June.

I’ll make a step-by-step record of exactly what happens with clear examples of anything that breaks.

That makes sense. It also explains why it works for the vast majority but not for a minority. Which is why I uploaded the zip of the files that do break - in the hope that a common factor can be detected.

Yes, that does make sense.

I could do so. But in view of the fact that I only had seven ‘failures’ and was able to correct these manually simply by clipping a webarchive into my DT inbox and then putting the webarchive in its rightful place, the errors we’re encountering took only 15 minutes to correct; the high success rate far outweighs that. I’d like, though, to be able to point your script at every webarchive and have it update them all in one go :slight_smile: .

Here’s a thought. Suppose tomorrow, or Thursday, I run your script against a webarchive which is already up to date and displaying properly and see if it breaks it!

(Which makes me think: have you not had a single case of a webarchive which starts failing to display correctly when you run your script, Pete?)

OK.

There’s obviously work to be done here, isn’t there? I’ll do what I can when I run the ‘Check URLs’ script next :slight_smile: .


Nope. Otherwise I wouldn’t have posted it. Looking forward to reading whether you’ve found a pattern.