WebArchive Question

Could you please clarify something for me? When I choose to save a Web Archive what exactly am I saving? Is it a static view of that web page (that won’t change if the actual web page does) or is it a marker (essentially a bookmark) to the actual web page?

I presume, since I have the option to save a “bookmark” to DEVONthink, that a web archive is, in fact, a static rendering of that page, but then when I look at it in DEVONthink or DEVONthink To Go, the archive appears to be loading the actual web page.

My goal is to have access to a page regardless of whether it changes (or is even still online) in the future. Appreciate the clarification!

– Robert

Once upon a time, that was true, and it still is for many sites. However, many “popular”, paywalled, and clickbait-filled sites have their content delivered from external sources. The webarchive will contain JavaScript and other code that downloads and displays the content when you view the page, i.e., it’s live, not a static capture.

If you want a static capture, a PDF is generally considered the best option. Note that the dynamic delivery, etc. can also sometimes produce an incorrect PDF.
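As a rough way to see whether a saved page leans on the kind of scripts described above, one could scan its HTML for `<script>` tags. This is only a sketch using Python's standard library; the names `ScriptFinder` and `find_scripts` are made up for illustration, and a page with scripts is not necessarily "live", it is just more likely to fetch content at view time.

```python
from html.parser import HTMLParser


class ScriptFinder(HTMLParser):
    """Collects external script URLs and counts inline scripts."""

    def __init__(self):
        super().__init__()
        self.external = []  # values of <script src="...">
        self.inline = 0     # <script> tags without a src attribute

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.external.append(src)
            else:
                self.inline += 1


def find_scripts(html):
    """Return (list of external script URLs, number of inline scripts)."""
    finder = ScriptFinder()
    finder.feed(html)
    return finder.external, finder.inline
```

A page that reports no scripts at all is a good candidate for a truly static capture; one that reports many external scripts may render differently (or not at all) when offline.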

Thanks, Jim. Your support (and speed) is exemplary. Much appreciated.

– Robert

You’re welcome!

Thanks for the details @BLUEFROG
I’m still a bit confused… how does the algo decide which to keep static and which to keep dynamic?

Paywalled sites obviously won’t be static, but isn’t the idea to have a snapshot representation (whatever that looks like) at the time of capture?

I have disabled JS to prevent any code executing… I understand some saved clippings won’t load properly (say, a fancy agency website with animations, etc.)…

I’d prefer not to use PDF, as I’d like to keep it fully searchable and potentially open for conversion later if exporting out to another service/app in the future…

how does the algo decide which to keep static and which to keep dynamic?
… isn’t the idea to have a snapshot representation (whatever that looks like) at the time of capture?

What content is (or isn’t) captured is not up to us, but up to what the mechanism imports from the site. The site’s design, meaning how it’s built and how its content is delivered, determines what’s static and what’s dynamic.

I’d prefer not to use PDF, as I’d like to keep it fully searchable

I’m not sure why you’d think PDF isn’t fully searchable in a web capture. It certainly would be, as much as the other formats would be.

Ok thank you for the thoughts.

I guess my question was simply what the OP asked: if my internet connection is off, will I have access to what I’ve saved? I understand your reply now…

About the PDF, yup I get that the text is searchable…
I was thinking more of long-term flexibility: a PDF feels like a flattened file that can’t really be edited outside of Adobe or some other image-editing software… while a .webarchive, or another format that retains HTML + CSS, could potentially be processed later programmatically if needed.

Though I’m not sure how portable a .webarchive is…
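On the portability point: a .webarchive is an Apple property list, so in principle it can be read outside of Apple apps. Here is a minimal sketch in Python using the standard `plistlib` module. It assumes the usual key layout (`WebMainResource`, `WebResourceData`, `WebResourceTextEncodingName`, `WebResourceURL`, `WebSubresources`); the function name `extract_main_html` is just for illustration, and I haven’t checked this against every archive DEVONthink produces.

```python
import plistlib


def extract_main_html(archive_bytes):
    """Return (page URL, main HTML text, list of subresource URLs).

    Assumes the archive follows the common .webarchive key layout.
    """
    plist = plistlib.loads(archive_bytes)  # detects binary or XML plist
    main = plist["WebMainResource"]
    encoding = main.get("WebResourceTextEncodingName") or "utf-8"
    raw = main["WebResourceData"]
    try:
        html = raw.decode(encoding, errors="replace")
    except LookupError:
        # Unknown encoding name stored in the archive: fall back to UTF-8.
        html = raw.decode("utf-8", errors="replace")
    subresources = [
        res.get("WebResourceURL", "")
        for res in plist.get("WebSubresources", [])
    ]
    return main.get("WebResourceURL", ""), html, subresources
```

To try it on a real file, read the bytes first, e.g. `extract_main_html(open("page.webarchive", "rb").read())` (the filename is a placeholder). Images, stylesheets, and scripts live in the subresources list, so getting the HTML out is the easy part; rebuilding a fully working page elsewhere would take more work.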

Since I’m only interested in the content of a website, I always capture as Markdown. When it works (which is almost always), it’s a ‘static snapshot’ of the content of the page. It’s searchable, and the layout can be manipulated for printing (or even edited).

I suppose I go with this method as it’s most like the ‘clutter-free’ clipping I used to do in Evernote.

Thanks @funkydan2

My needs are a bit different.
I clip the entire page, even with all the ads/junk, because it’s the visual design of the page that I remember. When I’m scanning hundreds of search results from an archive that has many thousands, color and layout help me spot what I need quickly.

Scanning hundreds of text files means you gotta read everything :wink: