OK, here’s a problem for all you Gods and Goddesses out there. I have had this same concern with other software, so it would be nice to understand this once and for all.
If I save a page as an archive, and the page is WITHIN a website that I had to log into, then when I go to look at the archive later it is just the log-in page. It’s not the page I wanted at all.
Why is it that an archive cannot be an HTML version of THAT PARTICULAR PAGE and be kept on my computer forever more? It’s not really an HTML version, it’s more of a link, really, since it also seems to update itself if the page itself is updated. This isn’t what I think of as an ‘archive’. This to me is just the same as saving the link. Am I wrong? I remember having this same problem with Webstractor.
What you need to do is save the page as a PDF to have a real copy on your hard drive forever more. But in fact, using DT’s bookmarklets, the PDF buttons make a PDF of … the log-in page! That’s so weird. What you have to do, it seems, is PRINT the page, and choose PRINT PDF TO DT. And then it usually works.
At the extreme end, even printing to PDF sometimes gives me a PDF of the log-in page. For instance, I made a payment today via Western Union and I was on the receipt page of WU, with my payment information and all that before my eyes, and I tried printing it to PDF to save it as a receipt and it printed the Western Union home page! I had to actually take a SCREENSHOT of the page in order to keep a record of it. Perhaps that page is ultra-secure somehow.
Anyway, this is important, because I’m saving lots of webarchives for research, and I’m going to be worried if, when I go to get them later, they are all archives of updated pages, which no longer contain the information I need.
Many sites, including my bank’s website, prohibit a second download of a displayed page unless the user logs in a second time. What you have discovered is that capturing such a page by means of a script or bookmarklet triggers a re-download of the page, and so all you capture is the login page.
Of course, you can select the text/image content of such a page and capture it as a rich text note.
Usually, to preserve formatting when I’m saving a bank transaction to a database I ‘print’ the page as PDF. The Print command is allowed by my bank, and the page doesn’t have to be re-downloaded, so that always works.
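To make the difference between re-downloading a page and printing the one already on screen concrete, here is a minimal sketch in Python. It is only a toy model: it assumes the site decides what to send based on a session cookie, and the names `fetch`, `VALID_SESSIONS`, and `session_id` are made up for illustration. It is not how DEVONthink, Safari, or any particular bank is actually implemented.

```python
# Toy model of a login-protected site. All names here are hypothetical;
# this is not code from DEVONthink, Safari, or any real site.
VALID_SESSIONS = {"abc123"}


def fetch(url, cookies=None):
    """Return the HTML an (assumed) server would send for this request."""
    session = (cookies or {}).get("session_id")
    if session in VALID_SESSIONS:
        return "<h1>Receipt: payment details</h1>"
    # No valid session on this request, so the server answers with the login page.
    return "<h1>Please log in</h1>"


# The browser holds the session cookie, so it shows the real page.
browser_view = fetch("https://example.com/receipt", {"session_id": "abc123"})

# A capture tool that requests the URL again without that session
# only ever receives the login page.
recaptured = fetch("https://example.com/receipt")

# Printing to PDF works from what is already displayed; nothing is fetched again.
printed_pdf_source = browser_view

print(recaptured)
print(printed_pdf_source)
```

In this model the bookmarklet route corresponds to `recaptured`, and the Print route corresponds to `printed_pdf_source`.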
Yep, that formatting is quite satisfactory for my infrequent usage purposes. And mentioning Capture PDF led me to discover that I can get the same result using the Add PDF Document to DEVONthink script directly from Safari. Problem solved – thanks!
D’oh! Thanks, Charles. All this time I’ve never paid attention to those icons (and wish I had).
I actually prefer that printable view, without any CSS adjustments, to the capture method for creating PDFs of full threads. But maybe I’ll find reasons to use the capture method, too. Problem re-solved.
Yeah, it’s pretty good. Would be nice if the font size difference between body, quotes, code, etc. were a bit smaller, but that’s a pretty trivial issue.
OK I can see the perfection of korm’s formatting if you want to surf in DT. I don’t know where Add PDF Document to DEVONthink is – if it’s a script, and you’re in Safari, do you need to put the script into Safari’s scripts folder, sjk?
What I have noticed now is that there is a contextual menu in Safari itself which does exactly what I’m talking about – it saves the page as an archive rather than as a PDF, and it doesn’t update when the page is updated. And you know why? Because the address for the page is on my hard drive rather than on the web. The HTML archives I save in DEVONthink all have addresses on the web. Doesn’t that – I’m asking this again – technically make them links rather than archives?
Meanwhile I can always save pages as archives in Safari itself and drag them into DT… although I don’t suppose this is going to be so easy if I’m saving a bunch of webpages from DEVONagent into DT.
Of all these questions, my main one is: can somebody who made DT tell me why archives have a web address rather than a hard-drive address? This is me learning. Cheers.