More on webarchives that turn "invisible"

jwiegley · October 1, 2007, 11:50pm

Ok, I have a very simple set of steps that still reproduces the problem in DTP 1.3.3:

Save a complete webpage, HTML, stylesheets and images.
Open the webpage on disk and create a webarchive from it, capturing it in DTP.
Delete the original webpage and images from disk. Just moving them to the Trash is not good enough; you need to wipe out the files altogether.
View the webarchive in DTP now after restarting. You can’t see it!
Export the file and double-click on it; it displays fine in Safari.
If you use the Info window to delete the original URL (which points to a now non-existent file on disk), the URL will not change if you click on another entry and then click back on the webarchive. Even though the Info page now has no URL! It must be reading the URL directly from the webarchive.

I’m faced right now with a whole slew of webpages that I converted into webarchives by “capturing” them in DEVONthink, but I cannot view them at all without first exporting the files and then opening them in Safari.

John

Bill_DeVille · October 2, 2007, 12:15am

John, why don’t you directly capture WebArchive files? The way you are currently doing it, DT tries to maintain the URL of the captured HTML page rather than the original URL on the Internet. But that URL no longer has a reference, as the files were deleted.

If you add the bookmarklets from the Extras folder on the download disk image to your browser, there will be a button that will directly capture a WebArchive of the page being viewed.

Personally, I capture rich text notes of selected images and text because I don’t want to capture ads and other extraneous material.

jwiegley · October 2, 2007, 12:32am

My problem is that I stopped using DTP to capture web pages several months ago, and I’m just not going back to it. So I have hundreds of pages that I captured with Firefox in this format, which I want to convert to a webarchive within DTP (some of the content is not available on the web anymore; I tried to recapture directly from the Web as much as possible).

John

Bill_DeVille · October 2, 2007, 3:12am

I suspect that when you create a WebArchive from the page downloaded by Firefox (Scrapbook?) DT Pro is using the “file” URL from your disk copy of the page. Check that by looking at the Info panel of such a WebArchive in your database. If that’s the case, the original URL didn’t carry over – which could be important for documentation of the material.

If you can’t see the WebArchive in your database (as I understand it, the original file was deleted from your drive), what happens if you go offline, e.g. turn off AirPort if you are using a WiFi connection? Can you now see the page?

See your earlier post: http://www.devon-technologies.com/phpBB2/viewtopic.php?t=4476&highlight=firefox+bookmarklet.

Note that there’s a script in the global scripts menu that’s available when Firefox is frontmost, that will allow capture of a WebArchive of the viewed page into your DT Pro database. What’s really happening is that, because Firefox isn’t scriptable (AppleScript), the URL is captured by the script, then the built-in browser in DT Pro downloads the page as a WebArchive. That script could be modified (and saved to the same location with a different name) to capture an HTML page instead of a WebArchive. The URL field in the Info panel will be correct.

The Archive bookmarklet in the DT Pro download disk image Extras folder can be added to the Firefox bookmarks toolbar and will save a WebArchive of the viewed page to your database. The URL field in the Info panel will be correct.

I know that many people prefer Firefox, and that some of the extensions are very useful for special purposes. But I probably capture more downloads from the Web than the average bear, and find that the Services and scripting limitations of Firefox make it too slow and clumsy for my purposes. But I’m probably spoiled by the convenience of capture options in DEVONagent and the DT built-in browser. To each his own.

jwiegley · October 2, 2007, 4:53am

No, sadly this does not work. Does that mean that if you have your HTML+images on disk (and not accessible via a web browser), there is no way to capture it as a webarchive in DTP? If I export all of my webarchives, they are viewable in Safari, just never in DTP.

John

cgrunenberg · October 2, 2007, 6:28am

That’s a known issue in WebKit, which can’t properly render archives that have no file representation and contain local file URLs, as in your case. It’s one of the many reasons why v2 will be file-based.
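
For anyone who wants to check whether an exported webarchive falls into this category, here is a minimal sketch in Python. It assumes the exported file is a property list using the usual webarchive keys (WebMainResource, WebSubresources, WebSubframeArchives, WebResourceURL); the function names are made up for illustration and are not part of DEVONthink or WebKit. It only lists the local file URLs stored in the archive and does not try to repair anything.

```python
import plistlib


def load_webarchive(path):
    """Parse a .webarchive file (a property list) into a dict."""
    with open(path, "rb") as fp:
        return plistlib.load(fp)


def local_file_urls(archive) -> list:
    """Return every file: URL recorded in a parsed webarchive.

    Looks at the main resource, the subresources (images, stylesheets)
    and, recursively, any subframe archives.
    """
    urls = []
    resources = [archive.get("WebMainResource", {})]
    resources += archive.get("WebSubresources", [])
    for res in resources:
        url = res.get("WebResourceURL", "")
        if url.startswith("file:"):
            urls.append(url)
    for sub in archive.get("WebSubframeArchives", []):
        urls.extend(local_file_urls(sub))
    return urls
```

Usage would be something like `local_file_urls(load_webarchive("exported.webarchive"))`, where the file name is a placeholder. A non-empty result suggests the archive was built from a page saved on disk rather than captured from its original web address.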

jwiegley · October 2, 2007, 7:29pm

Thanks, Chris. I’m eagerly awaiting the fruits of your labor in that department!

John