URLs in the future

Just reading a few posts over in the Scrivener forums and one of them (http://www.literatureandlatte.com/forum/viewtopic.php?f=19&t=5867#p47773) got me thinking.

I regularly pull in webarchives to DTPO. Daily.

When I export these, however, the URL is not retained. At least not as far as I can tell.

I mean, long live Devon Technologies (and I sincerely mean that), but what if, as the poster in the link above indicates, 15 years from now DTPO does not exist?


Using which method(s)?

Where do you expect it to be retained?

Should be stored as /WebMainResource/WebResourceURL in both the original and exported files, which can be examined with Property List Editor. See: microcoder: Safari’s WebArchive format
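If you'd rather not open each file in a plist editor, the same key can be read with a few lines of Python. This is only a rough sketch: it assumes the .webarchive is an ordinary property list (binary or XML) with the WebMainResource/WebResourceURL key described above, and the function names are made up for illustration.

```python
import plistlib
from pathlib import Path
from typing import Optional


def source_url_from_bytes(data: bytes) -> Optional[str]:
    """Return WebMainResource/WebResourceURL from raw webarchive bytes, or None."""
    archive = plistlib.loads(data)  # auto-detects binary vs. XML plists
    main = archive.get("WebMainResource", {})
    return main.get("WebResourceURL")


def webarchive_source_url(path: str) -> Optional[str]:
    """Hypothetical convenience wrapper: read a .webarchive file from disk."""
    return source_url_from_bytes(Path(path).read_bytes())
```

Calling webarchive_source_url("page.webarchive") (the file name is just a placeholder) should return the original URL as a string, or None if the key is missing.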

Web Archive Extractor is a handy GUI utility for doing what its name implies.

Not sure why that matters because there are plenty of URLs that won’t survive 15 years, especially those increasingly being created using URL shortening services (which I consider an unwise practice in contexts where prolonged URL survival may be useful and intended).

When I’m in NetNewsWire (or Safari), usually just Command+%.

Good tips, thanks. TextWrangler viewed the URL just fine.

True enough. I’m not so much concerned if the URL eventually dies, but rather that I have a record (for research purposes) of where something came from.

With the history of Mac apps to date, odds are it won’t. But the data (your webarchives, etc.) are not captive. They are in their native format inside the DT database package. This gets discussed frequently in the forum.

Hi korm,

Indeed, it is discussed quite a bit - some great threads, too. Storing files in their native format is a great way to take the risk out of someone purchasing the product.

What I was saying is that, in this specific instance (a webarchive), having the native format is fine, but I thought the metadata (i.e., the original URL of the webarchive itself) was retained only in DTPO’s format. As sjk pointed out, however, and this is something I didn’t know, it is indeed available in the file itself.

Reason I asked is because saving with File > Save As… from Safari (e.g. to the global Inbox folder) adds the URL to kMDItemWhereFroms Spotlight metadata (and com.apple.metadata:kMDItemWhereFroms xattr). And with Ecamm Network’s free DownloadComment installed the URL is also added to Spotlight Comments (kMDItemFinderComment and com.apple.metadata:kMDItemFinderComment).
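For anyone wanting to check that xattr outside Finder or Spotlight, here is a rough Python sketch. It assumes the com.apple.metadata:kMDItemWhereFroms value is a binary property list holding a list of URL strings, and that the macOS xattr command-line tool is available; the function names are hypothetical.

```python
import plistlib
import subprocess
from typing import List

WHERE_FROMS = "com.apple.metadata:kMDItemWhereFroms"


def decode_where_froms(raw: bytes) -> List[str]:
    """Decode the raw xattr value (assumed to be a plist array of strings)."""
    return list(plistlib.loads(raw))


def read_where_froms(path: str) -> List[str]:
    """macOS only: fetch the xattr as hex via `xattr -px`, then decode it."""
    hex_dump = subprocess.run(
        ["xattr", "-px", WHERE_FROMS, path],
        check=True, capture_output=True, text=True,
    ).stdout
    # bytes.fromhex ignores the spaces and newlines in the hex dump
    return decode_where_froms(bytes.fromhex(hex_dump))
```

On a Mac, read_where_froms on a file saved from Safari should give back the URL(s) recorded at download time; it raises an error if the attribute is not set.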

All that can be retained when exporting from DT. Also, when exporting with File > Export > Files and Folders… any document URLs are stored in the associated DEVONtech_storage file(s) (which, for some unknown reason, still don’t have an extension). The DEVONtech_storage files that DT reads during importing aren’t particularly useful outside DT, though it wouldn’t be hard to extract URLs (and other metadata) from them if desired.

Does DT display the URL for documents captured with the Capture Web Archive service (Command-%) from NetNewsWire? If yes, I wonder whether it uses the /WebMainResource/WebResourceURL embedded in those documents or stores the URL separately.

Nice. Presumably TW’s a better plist viewer/editor than Apple’s stinky Property List Editor (at least in Leopard; dunno if it’s changed, for better (not hard) or worse (not easy), in Snow Leopard). :wink:

Same here.

One thing that originally appealed to me about DT was its relatively thorough retention of metadata during importing/exporting that other products omitted. Some people who claim DT isn’t “Mac-like” enough for them seem to overlook the more subtle things it does that qualify it as being more Mac-like for my purposes.