Help! WebArchives not working as expected!?

Junction10 · February 9, 2022, 9:02am

Hi,

I’ve been saving certain web pages or social media pages to Devonthink as WebArchives for a number of years, so I have a ‘browser format’ version of the page as it was when captured, locally, rather than keep pulling data off the original source server…

I always save the page as a WebArchive to Devonthink Pro, and even after the images or pages have been deleted from the sites, the WebArchive in Devonthink has still remained intact. To nail that home, I also ‘lock’ each WebArchive I store.

I’ve just noticed, though, that I have a Twitter post saved as a web archive, and the tweet has since been deleted. The WebArchive in Devonthink shows the “Hmm… This page doesn’t exist. Try searching for something else” error message that Twitter throws up when a tweet is missing.

Why is it fetching content from Twitter, and not the local saved WebArchive? I haven’t ‘refreshed’ the source or anything… really worrying if a webarchive is not actually keeping everything intact…

How can I save a web archive that is a true reflection of how the page was when it was saved, but still keep it in a browser format, rather than a screenshot or PDF?

Thanks,

J

cgrunenberg · February 9, 2022, 9:35am

That’s unfortunately one of the limitations of Apple’s web archive format: it’s not really suitable for dynamic web pages that load content on demand, especially if the dynamic content might vary (e.g. depending on user input or time). Therefore either clutter-free web archives or other file formats like RTF, PDF or formatted notes would be an option.
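
To see what a given web archive actually holds locally, a rough Python sketch like the one below may help. It assumes the file is a property list with `WebMainResource` and `WebSubresources` entries, each carrying `WebResourceURL`, `WebResourceMIMEType` and `WebResourceData`; those key names reflect how the format is commonly described rather than an official specification, and `list_webarchive_resources` is just a hypothetical helper name. Anything the page loads by script after opening would not appear in this list, which would be consistent with the behaviour described above.

```python
import plistlib
import sys


def list_webarchive_resources(data: bytes):
    """Return (url, mime_type, size_in_bytes) for each resource stored in the archive.

    Assumption: the archive is a plist with 'WebMainResource' (a dict) and
    'WebSubresources' (a list of dicts); these key names are not taken from
    an official spec.
    """
    archive = plistlib.loads(data)  # auto-detects binary vs. XML plist
    main = archive.get("WebMainResource")
    entries = ([main] if main else []) + list(archive.get("WebSubresources", []))
    return [
        (
            entry.get("WebResourceURL", ""),
            entry.get("WebResourceMIMEType", ""),
            len(entry.get("WebResourceData", b"")),
        )
        for entry in entries
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python list_webarchive.py saved-page.webarchive
    with open(sys.argv[1], "rb") as handle:
        for url, mime, size in list_webarchive_resources(handle.read()):
            print(f"{size:>10}  {mime:<24}  {url}")
```

If the tweet text or images do not show up among the listed resources, they were probably never stored in the file and were being fetched live each time the archive was opened.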

Jayboux · February 9, 2022, 4:07pm

Hi Junction10,

I feel the same way you do in the desire for interactivity of my archives.

You can use the Chrome plugin (also a standalone program) SingleFile. It works as you ask, compressing everything into one usable webpage. Even the ads come along for the ride.

For something a little more automated, there is ArchiveBox, which uses SingleFile as one of its many methods of webpage archiving.