I ran this test today, and DTP 1.3.1 apparently fails it:
I have a collection of HTML and image files in a small directory. This is a verbatim web capture using Firefox’s ScrapBook add-on, so I know every little bit of the webpage is right there on my disk.
I imported index.html from this directory into DTP. This allows me to see the web page correctly.
I right-clicked and chose ‘Capture Webarchive’. Now I have a second item with the correct title, which also looks just like the web page.
So in theory, the Webarchive is now a snapshot of the web page, living independently. Right?
I deleted the original directory from disk.
I deleted Library/Caches/Safari and Library/Caches/DEVONthink Pro.
I opened up my DTP database containing the Webarchive. Lo and behold, all I see is a blank white screen. If I “Open With” Safari, it shows me the webarchive, but DTP will not show me the contents of this page anymore.
To help reproduce this bug, I’ve put up the HTML+images archive of the web page in question here:
What I want here is a stable, reproducible, network-independent way of turning a web page into a single file. WebArchives are supposed to do that, but I’ve found that sometimes, when you delete the browser’s cache and you’re offline, they don’t render correctly. Or am I doing something wrong when I create them?
Not sure what the trouble is. I’ve got lots of WebArchives in my database and they all display properly in DT Pro, whether or not I’m connected to the Internet, and whether or not the pages I captured still exist on the Internet (of course, I captured them in case the pages were later removed). Most of my recent WebArchives are real estate listings of log cabins in Brown County, Indiana. I checked out a lot of them and bought one.
But I usually don’t capture material as WebArchive documents, mainly because many journals and science/environment/policy news sources put ads and other extraneous material on the page alongside the article I want. So 99% of my captures are rich text captures of selected text and images.
Now that Apple has released a version of Safari for Windows, I wonder if that version can display WebArchive files on Windows computers. Has anyone tried that?
That’s a WebKit issue related to web archives that contain local file data but no longer have a file representation. This doesn’t affect Internet web archives, of course, and V2 (using files instead of one huge database) will solve this anyway.
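If you want to see whether a WebArchive actually carries its own copies of the page and images, and which of them came from local file:// URLs, a minimal Python sketch like the one below can list them. It assumes the usual .webarchive layout, a property list with the keys WebMainResource, WebSubresources and WebSubframeArchives, where each resource has WebResourceURL and WebResourceData; the function names are hypothetical. It only reports what is embedded. It cannot tell you whether WebKit will render it, and going by the explanation above, an archive whose data is all present can still show up blank.

```python
import plistlib
import sys
from urllib.parse import urlparse


def _walk(archive, rows, prefix):
    """Append (role, URL, embedded byte count) for one archive and its subframes."""
    main = archive.get("WebMainResource", {})
    rows.append((prefix + "main",
                 main.get("WebResourceURL", ""),
                 len(main.get("WebResourceData", b""))))
    for sub in archive.get("WebSubresources", []):
        rows.append((prefix + "sub",
                     sub.get("WebResourceURL", ""),
                     len(sub.get("WebResourceData", b""))))
    # Frames (e.g. iframes) are stored as nested archives with the same layout.
    for frame in archive.get("WebSubframeArchives", []):
        _walk(frame, rows, prefix + "frame/")


def summarize_webarchive(raw):
    """Hypothetical helper: list every resource in a .webarchive blob.

    plistlib.loads auto-detects binary vs. XML property lists.
    """
    rows = []
    _walk(plistlib.loads(raw), rows, "")
    return rows


def local_file_resources(rows):
    """Return the URLs of resources that were captured from file:// URLs."""
    return [url for _, url, _ in rows if urlparse(url).scheme == "file"]


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as fh:
        rows = summarize_webarchive(fh.read())
    for role, url, size in rows:
        # A size of 0 means the archive has no copy of that resource.
        print(f"{role:12} {size:>9} bytes  {url}")
    print("local file resources:", len(local_file_resources(rows)))
```

Run it as `python3 inspect_webarchive.py Page.webarchive` (the script name is just an example). Resources listed with 0 bytes are not stored in the archive, so they can only load if the original files or cache entries are still around.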
So, Christian, the problem I described a few months ago is caused by that WebKit issue you mentioned? And it’ll be gone by simply installing the notorious V2?
I don’t dare to ask when I could possibly install that release.