Webarchive doesn't show static copy of web page

Hello all,

I would like to clip web pages to DT as webarchives. I have the impression that these webarchives are not a snapshot copy of the web page but rather open the page in a browser-like window. This is inconvenient, e.g. when I have clipped a web page that is only accessible from within a network and not from the outside.
What can I do to save a real snapshot as webarchive?

Best regards, Michael

If you capture HTML, you can never be sure that it looks exactly the same today as it looked yesterday or will look tomorrow. Nowadays websites are not static content anymore; lots of them are dynamically generated with JavaScript.

If you want static content, you’d have to use a static format like PDF. Even Markdown is kind of dynamic when it’s converted to HTML or PDF: image links and the like may load different content now than they did before.

If you want a snapshot, and still want pure HTML, use Formatted Note, that is HTML with embedded resources. If not, as Michaell has said, use PDF.
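To illustrate what "HTML with embedded resources" can mean: one common way to make an HTML file independent of the network is to replace image URLs with `data:` URIs that carry the image bytes inline. The sketch below shows that general technique only. Whether DEVONthink's Formatted Note stores its resources this way is not stated in this thread, and the function names `to_data_uri` and `embed_image` are made up for the example.

```python
import base64
import mimetypes


def to_data_uri(data, filename):
    """Encode raw bytes as a data: URI, guessing the MIME type from the file name."""
    mime, _ = mimetypes.guess_type(filename)
    mime = mime or "application/octet-stream"  # fallback for unknown extensions
    payload = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{payload}"


def embed_image(html, src, data):
    """Replace every src="<src>" reference with an inline copy of the image bytes.

    Hypothetical helper: a plain string replacement, good enough for a demo,
    not a general HTML rewriter.
    """
    return html.replace(f'src="{src}"', f'src="{to_data_uri(data, src)}"')
```

After `embed_image`, the `<img>` tag no longer points at the network, so that image looks the same offline and cannot change later. Scripts, stylesheets and fonts would each need the same treatment.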

Thanks for answering!
In my eyes a “webarchive” should be different from a bookmark in a way that it displays a local archive of the snapshot of a webpage… I wasn’t aware of DT’s webarchive being more like a bookmark.

As Apple has deprecated the webarchive format, this might be another motivator to move to PDFs or formatted notes.

Webarchive is a format conceived by Apple. AFAIK, it was never used by anyone outside of the Appleverse. And given the implications of JavaScript, it was never static anyway. Thus not an archive in the strict sense of the word.


I gave “formatted note” a try, when “cleaned up” is unselected the result is great!

Do you know if “HTML Page” as a format is also immune to dynamic content changes?
I could open it while blocking DT internet access and it looked much better than “Formatted Note” (which in some cases was complete chaos).

No, it is not, because an HTML Page (which is, for example, what the DevonSave script captures) needs to download images and other resources from the internet. If you have DTTG, you can run your own experiments. Capture a web page with some images or live content using the DevonSave Shortcut (search here to find it), then sync with DT. Disconnect from the internet before opening the captured result and you will see that some parts are missing. Then reconnect and open the capture again, and it completes. After that you can convert it into a Formatted Note or PDF.

Normally, Formatted Notes and pure HTML pages lack some advanced formatting and “live” features such as server database access. Depending on the website, it can be almost impossible to capture a page without broken parts or strange artefacts.

What I do is use the DevonSave script on my iPad/iPhone, then sync into DT on my Mac. There I open the page, edit or paste in the faulty parts, remove other garbage, and then convert it into PDF. But what I usually capture is more or less documentation sites or static news I want to save.

2 samples of what I do:
Una fórmula que genera los primeros 42.000 millones de decimales de π y luego deja de ser precisa - Microsiervos (Matemáticas).pdf (82.3 KB)
Four Unpublished Letters from Nicolas Fatio de Duillier to Isaac Newton in- Nuncius Volume 34 Issue 3 (2019).pdf (895.6 KB)


Whatever its name: if it captures HTML, it is either subject to content changes (think images, JavaScript) or it is not complete (think network not accessible). The only exceptions to this rule are text-only HTML pages with no references to any other content on the net. Which kind of goes against the idea of “HT” in “HTML”.
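A rough way to check whether a saved HTML file falls under that exception is to list the resources it would still fetch from the network when rendered. The sketch below uses Python's standard `html.parser` for this. The class name `ExternalRefFinder`, the function `external_refs` and the chosen set of tag/attribute pairs are assumptions made for the example. It ignores `url(...)` references inside CSS and anything JavaScript requests at runtime, so an empty result is a hint, not a guarantee.

```python
from html.parser import HTMLParser


class ExternalRefFinder(HTMLParser):
    """Collect URLs the browser would fetch over the network while rendering."""

    # (tag, attribute) pairs that trigger a fetch; deliberately incomplete.
    FETCH_ATTRS = {
        ("img", "src"),
        ("script", "src"),
        ("link", "href"),
        ("iframe", "src"),
        ("video", "src"),
        ("audio", "src"),
        ("source", "src"),
    }

    def __init__(self):
        super().__init__()
        self.refs = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if (tag, name) not in self.FETCH_ATTRS or not value:
                continue
            # Only absolute or protocol-relative URLs point at the network;
            # data: URIs and relative paths are treated as local here.
            if value.startswith(("http://", "https://", "//")):
                self.refs.append(value)


def external_refs(html):
    """Return the network URLs referenced by fetch-triggering attributes."""
    finder = ExternalRefFinder()
    finder.feed(html)
    finder.close()
    return finder.refs
```

Ordinary links (`<a href=...>`) are left out on purpose: they do not affect how the saved page looks, only where a click leads.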

I think you have to define your goals:

  • snap a kind of picture of the HTML page as you see it now that will never change: use PDF.
  • keep a copy that looks more like HTML, has clickable links embedded etc: use HTML, either as a formatted note or as HTML or as web archive. But don’t expect it to look tomorrow as it looked today.

I’m afraid you can’t have it both ways: dynamic content that does not change is a contradictio in adjecto. Either it doesn’t change, then it’s not dynamic. Or it’s dynamic, then it can change.

You can of course always have a look at webarchive, too. No idea what they’re doing, though.


Thanks. For now I’m using the InstaWeb app on iOS to save PDFs, because it is the only app I know that preserves links in PDFs on iOS.

On Mac, if Devon’s PDF options look bad (cookie notices, etc.), I print to PDF or use an extension like Page Screenshot.

Thank you. The main issue I have is that the formatting of PDFs looks bad sometimes, but as mentioned in my comment above, there are options for this too (all with downsides unfortunately).

Webarchives are not a DEVONthink format, but an Apple format. Also, I wouldn’t call them “like bookmarks”:

They try to archive all of a page’s web resources locally and then show these local resources in a browser-like window, but they do not necessarily capture every resource a page uses to display its content (e.g. JavaScript might fetch stuff later).
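If you want to see which resources a given webarchive actually stored, you can inspect the file yourself. As far as I know, a `.webarchive` file is a property list with a `WebMainResource` entry and an optional `WebSubresources` list, each resource carrying a `WebResourceURL`. Treat those key names as an assumption rather than documented fact. The function name `list_archived_urls` is invented for this sketch.

```python
import plistlib


def list_archived_urls(webarchive_bytes):
    """Return the URLs of the main resource and all stored subresources.

    Assumes the key names WebMainResource, WebSubresources and WebResourceURL;
    nested frame archives are not followed in this sketch.
    """
    archive = plistlib.loads(webarchive_bytes)  # detects XML or binary plists
    urls = [archive["WebMainResource"]["WebResourceURL"]]
    for sub in archive.get("WebSubresources", []):
        urls.append(sub["WebResourceURL"])
    return urls
```

Usage would be along the lines of `list_archived_urls(open("page.webarchive", "rb").read())`, where `page.webarchive` is a placeholder file name. Anything the page shows in a browser that is missing from this list was not archived and will be fetched live, or be absent, when the archive is opened.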

My problem with PDFs is that, in my experience, they very often show the page laid out quite differently from what I see in a browser like Firefox. (The reason, I guess, is that PDF generation uses the CSS print media queries, which are often implemented rather half-heartedly.)

In today’s world of complex web sites, the only way I’ve found to get an exact rendering of a web page (not something I do often), is to do a screen shot as a PNG or JPG. I rely on PDF, often of the Reader View, and accept that gets me the content.


Even what an HTML document looks like in a browser is not fixed. You may choose a different font and/or font size than the author of the document/CSS did. You may make your browser window smaller or larger, causing a reflow. You may decide to not load images. You may do a ton of other things (or your browser might without you even noticing).
I’m afraid @rmschne is right here: if you need a photo, take a photo. If you need the content as frozen as possible at this moment in time, use PDF. If you insist on HTML, be aware that what you see tomorrow might not be what you saw yesterday and what you see here might not look like what you see there.
