Properties of webarchive in DTPO?

Hi all-

I had some memory of a post on this, but couldn’t locate it. I’m curious about the properties of a webarchive in DTPO. In other words, is it a static document format?

When I select a webarchive document in DTPO, I notice activity in the URL bar. Is DTPO downloading the latest content from the URL?

If not, is there a way to “refresh” the content of a webarchive?

Bill DeVille also commented a while back that webarchives are “flaky.” Still true?

Thanks! Charles

I’ve never run into problems with any of the many webarchives I’ve saved, though the pages I capture may be simpler than the kind that can still cause trouble.

Christian has noted that WebArchive is the least stable of Apple’s proprietary file formats, as there have been bugs in WebKit. Some of the older WebArchive files captured under earlier versions of OS X may have problems, especially under Snow Leopard.

Dynamic images (especially common in ads) trigger online activity when a webarchive is displayed, which would explain what you’re seeing in the URL bar.

My usual capture mode is as rich text of selected text and images, as I’m more interested in content than in the page layout.

I wonder if Web Archive Extractor could salvage content from those problematic files.

I avoid those in webarchive captures.

Preprocessing a page with Printliminator is one way I get rid of unwanted cruft before capturing it as a webarchive.

Hey you two-

Thanks for the responses.

The one question of mine that (I think) hasn’t been answered is this: if I webarchive the page of a blog, for example, will DT preserve that state? Or will DT continue to “refresh” the webarchive as new posts are blogged, thereby displacing the state (and content) that I originally captured?

Happy Thanksgiving!


Small observation: I checked a handful of webarchives from 12 and 36 months ago, from the NY Times and a forum I follow (not the DT forum). Both were static, with no updated content. In sections of the page that the NY Times normally updates throughout the day, the content is still what was there when I captured the archives 36 months ago.

My experience matches korm’s, i.e. webarchive content has remained static.
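
For anyone who wants to check what a given webarchive actually holds on disk, here is a rough Python sketch. It assumes the file is a property list with a "WebMainResource" entry and an optional "WebSubresources" list, each resource carrying "WebResourceURL", "WebResourceMIMEType" and "WebResourceData" keys, which is the layout I understand Safari-style webarchives to use. The function name is my own, and this says nothing about how DTPO itself reads the file. Anything listed with a nonzero size is stored in the archive rather than fetched fresh.

```python
import plistlib
import sys


def list_webarchive_resources(path):
    """Return (url, mime_type, size_in_bytes) for each resource stored in the file.

    Assumes the usual webarchive layout: a property list with a
    "WebMainResource" dict and an optional "WebSubresources" list.
    """
    with open(path, "rb") as f:
        # plistlib detects binary vs. XML property lists automatically.
        archive = plistlib.load(f)

    resources = [archive["WebMainResource"]] + list(archive.get("WebSubresources", []))
    return [
        (
            res.get("WebResourceURL"),
            res.get("WebResourceMIMEType"),
            len(res.get("WebResourceData", b"")),
        )
        for res in resources
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python list_webarchive.py saved_page.webarchive
    for url, mime, size in list_webarchive_resources(sys.argv[1]):
        print(f"{size:>10}  {mime}  {url}")
```

Run it against a copy of the file exported from the database rather than the original, just to be safe.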

Thanks Korm and SJK!