I use the web clipper (in Firefox) a lot to capture web pages. What annoys me is that a website I see in Firefox (cookies confirmed) lands in DEVONthink in such a way that I have to confirm cookies again every time I view the webpage as a document in DEVONthink.
I used to use Evernote to clip webpages, and it would capture a static view of a webpage at that moment.
What I’d like to accomplish in DEVONthink is this:
capture a static view of that site
preferably not in PDF format, but rather HTML
the way I see it in the browser
no reload in DEVONthink whenever I view the document
I could capture as PDF, but even then I see a big banner overlapping the document saying “confirm cookies”.
What am I missing?
Apparently I had turned off accepting cookies in DEVONthink
Off the top of my head: Cookies are not part of the web page. They are stored by the browser (and sent by the server). So if you save an HTML page as is, the cookies are not part of it.
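To illustrate the point about cookies living outside the page, here is a minimal Python sketch. The raw response and the cookie name `consent` are made up for illustration, and this says nothing about how DEVONthink or any browser actually stores things. It only shows that a cookie travels in the HTTP headers, so saving the HTML body alone leaves the cookie behind.

```python
from http.cookies import SimpleCookie

# Hypothetical raw HTTP response, invented for this example.
RAW_RESPONSE = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "Set-Cookie: consent=accepted; Path=/\r\n"
    "\r\n"
    "<html><body><p>Article text</p></body></html>"
)


def split_response(raw):
    """Split a raw HTTP response into a headers dict and the body."""
    head, _, body = raw.partition("\r\n\r\n")
    headers = {}
    for line in head.split("\r\n")[1:]:  # skip the status line
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers, body


headers, body = split_response(RAW_RESPONSE)

# The cookie is carried by a header, which the browser keeps in its own store.
cookie = SimpleCookie()
cookie.load(headers["set-cookie"])
consent_value = cookie["consent"].value

# What "save the HTML page as is" would write to disk: the body only.
saved_page = body
```

So when another application opens `saved_page`, it has no record of the consent that was given in the browser.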
Regardless of the technicalities: DT says that it is saving the HTML data, and it even shows the size of the page. But it really seems to reload it from the original server, which leads to the behavior that you describe. And which is not desirable.
I think that “HTML page” should mean exactly that: the page itself at the time it was saved. Not something the server sends when DT tries to display the page again. Also, the documentation does not seem to describe the behavior shown by DT.
If you want to capture an HTML page frozen at the moment you saw it, my best bet would be a portable format like PDF. Especially if (from your quote) images in Webarchives are downloaded again when the computer you’re viewing them on is online (what a weird concept of “archive”). Which would go against the whole idea of having a frozen copy of the page.
Not necessarily just dynamic content. Markdown webpages clipped by DEVONthink, whether clutter-free or not, load from their respective servers too. Images, that is. Which is the reason they do not qualify for archiving either.
Good point. As long as references to files (aka URLs) are saved, regardless of the format, the result is not an archive. Only if the current contents of the referenced files are embedded in the final result does one have a real archive. Less complicated: PDF is the best choice here.
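For anyone who wants to check a saved HTML file against that criterion, here is a rough Python sketch using only the standard library. The function name `find_external_refs` and the sample markup are my own invention, and the list of tags is an assumption, not a complete inventory of everything a page can load. It lists resources that would still be fetched from a server, while embedded `data:` URIs and ordinary links are ignored.

```python
from html.parser import HTMLParser

# Tags whose src/href is fetched automatically when the page is displayed.
# This list is an assumption and not exhaustive (CSS url(...) is not covered).
LOADING_TAGS = ("img", "script", "link", "iframe")


class ExternalRefFinder(HTMLParser):
    """Collect (tag, url) pairs for resources that point to a remote server."""

    def __init__(self):
        super().__init__()
        self.external = []

    def handle_starttag(self, tag, attrs):
        if tag not in LOADING_TAGS:
            return  # e.g. <a href="..."> is only a link, nothing is loaded
        for name, value in attrs:
            if name in ("src", "href") and value:
                if value.startswith(("http://", "https://", "//")):
                    self.external.append((tag, value))


def find_external_refs(html_text):
    """Return all remote resources referenced by the given HTML text."""
    finder = ExternalRefFinder()
    finder.feed(html_text)
    finder.close()
    return finder.external


# Made-up sample: one remote image, one embedded image, one plain link.
SAMPLE = (
    '<html><body>'
    '<img src="https://example.com/a.png">'
    '<img src="data:image/png;base64,AAAA">'
    '<a href="https://example.com/">link</a>'
    '</body></html>'
)

remote = find_external_refs(SAMPLE)
```

An empty result suggests the file is self-contained in the sense described above; anything in the list will be requested again whenever the document is opened while online.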
I use .pdf to get a pretty good representation of the page/s. I used to choose unpaginated from the clipper drop-down, but I found recently that DT now saves one long strip, which it then tries to print on a single A4 sheet (it renders fine on screen, so I hadn’t noticed). Now I have changed to paginated pdf, which prints OK. This happens in both Safari and Firefox.
Some newspapers prevent clipping to DT; I have found that sometimes it works to export the page as pdf in Safari (File/Export as pdf) and then drag it into DT.
I save on iPad-DTTG as Webarchive to one database which is synced to my Desktop-Mac. There a smart rule converts it to a pdf, which is synced back to my iPad-DTTG (the webarchive is deleted). Most importantly: All weblinks are saved.