Clip to DEVONthink / Save the webpage as-is without cookie confirmation

I use the web clipper (in Firefox) a lot to capture web pages. What annoys me is that a page I see in Firefox with cookies already confirmed lands in DEVONthink in a way that makes me re-confirm cookies every time I view it as a document in DEVONthink.

I used to use Evernote to clip webpages, and it would capture a static view of a webpage at that moment.

What I’d like to accomplish in DEVONthink is this:

  • capture a static view of that site
  • preferably not in PDF format, but rather HTML
  • the way I see it in the browser
  • no reload in DEVONthink whenever I view the document

I could capture as PDF, but even then I see a big banner overlapping the document saying “confirm cookies”.

What am I missing?

Edit

Apparently I had turned off accepting cookies in DEVONthink

Settings > Web > Accept Cookies > Never

But thanks for all the answers!

Off the top of my head: Cookies are not part of the web page. They are stored by the browser (and sent by the server). So if you save an HTML page as is, the cookies are not part of it.
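To illustrate the point about cookies living outside the page, here is a minimal Python sketch. The raw response, the cookie name `consent`, and the helper functions are made up for illustration; nothing here reflects how DEVONthink or Firefox actually store cookies.

```python
from http.cookies import SimpleCookie

# Hypothetical raw HTTP response; header values are invented for illustration.
RAW_RESPONSE = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "Set-Cookie: consent=accepted; Path=/\r\n"
    "\r\n"
    "<html><body><p>Hello</p></body></html>"
)


def split_response(raw):
    """Split a raw HTTP response into (header lines, body)."""
    head, _, body = raw.partition("\r\n\r\n")
    return head.split("\r\n")[1:], body  # drop the status line


def cookies_from_headers(header_lines):
    """Collect cookie names and values from Set-Cookie header lines."""
    jar = SimpleCookie()
    for line in header_lines:
        name, _, value = line.partition(":")
        if name.strip().lower() == "set-cookie":
            jar.load(value.strip())
    return {key: morsel.value for key, morsel in jar.items()}


headers, body = split_response(RAW_RESPONSE)
print(cookies_from_headers(headers))  # {'consent': 'accepted'}
print("consent" in body)              # False
```

The cookie only shows up in the headers. Saving the body (the HTML) to disk therefore does not carry the consent along with it.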

Regardless of the technicalities: DT says that it is saving the HTML data, and it even shows the size of the page. But it really seems to reload it from the original server, which leads to the behavior that you describe. And which is not desirable.

I think that “HTML page” should mean exactly that: the page itself at the time it was saved. Not something the server sends when DT tries to display the page again. Also, the documentation does not seem to describe the behavior shown by DT.

The captured HTML page might contain JavaScript, which DEVONthink supports (see Preferences > Web), and that might cause this behavior. The best options are to use a different file format or the clutter-free layout.

What I have observed is that cookies I confirm in Chrome browser (on a Mac) need no re-confirmation in DEVONthink. But I’d rather not use Chrome browser but stick to Firefox.

So how is it possible that confirmed cookies in Chrome need no re-confirmation but those from Firefox do?

Also, when I capture a website at a given moment, I need to be able to rely on the information captured. The fact that a captured HTML page can change in DEVONthink makes it unacceptable for my work.

But maybe I am missing something, and there is a switch that makes it possible to create a static copy of the webpage.

Unfortunately, in HTML format there is no way to check the “clutter-free” option.

As @cgrunenberg said: the HTML page might contain JavaScript. Which in turn might build part of the page by requesting data from the server. Imagine a shop system… No static HTML pages there.

Then you should use PDF or MD. HTML has long been a dynamic format, thanks to JavaScript. And not only that: even simple links in the page or images are dynamic elements that are loaded from a server (<img src="http://example.com/image.png">). There’s no way to make them static in an HTML document that you save, regardless of the software you use for that. If http://example.com/image.png points to a car today and to a motorcycle tomorrow, you have to use a format other than HTML if you want to be sure that your file always shows the car.
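If you want to see how many such server-side references a saved page contains, a rough Python sketch along these lines could list them. The class and function names are my own, and the set of tags checked is an assumption that is far from exhaustive (CSS `url(...)` references and anything JavaScript requests at runtime are not covered).

```python
from html.parser import HTMLParser


class RemoteReferenceFinder(HTMLParser):
    """Collect URLs a viewer would have to fetch when rendering the page."""

    # Tag/attribute pairs checked in this sketch; not an exhaustive list.
    WATCHED = {("img", "src"), ("script", "src"), ("link", "href"), ("iframe", "src")}

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            is_remote = bool(value) and value.startswith(("http://", "https://", "//"))
            if (tag, name) in self.WATCHED and is_remote:
                self.urls.append(value)


def remote_references(html):
    """Return all remote URLs referenced by the watched tags in `html`."""
    finder = RemoteReferenceFinder()
    finder.feed(html)
    finder.close()
    return finder.urls


saved_page = '<p>My car:</p><img src="http://example.com/image.png"><img src="local.png">'
print(remote_references(saved_page))  # ['http://example.com/image.png']
```

Every URL in that list is something the viewer fetches again at display time, so the saved file only shows what the server delivers on that day.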

You could use formatted notes which are based on HTML too.

The hint with JavaScript was very helpful.

Turned off JavaScript in

Settings > Web > "Enable JavaScript"

and now I am able to see the captured pages (from within Firefox) without re-confirming cookies in DEVONthink.

Thank you very much!

But the pages can still change over time. Or rather their representation.

So I should rather save / clip in Webarchive format instead of HTML?

This comment here suggests so:

Webarchive is deprecated (Apple deprecates WebArchives - what does this mean for DEVONthink?) and not widely supported. Meaning that apparently the only browser you can use to display them is Safari.

If you want to capture an HTML page frozen at the moment you saw it, my best bet would be a portable format like PDF. Especially if (from your quote) images in Webarchives are downloaded again when the computer you’re viewing them on is online (what a weird concept of “archive”). That would go against the whole idea of having a frozen copy of the page.

Ahh, the joys of dynamic content delivery… sigh.

Not necessarily just dynamic content. Markdown webpages clipped by DEVONthink, whether clutter-free or not, load from their respective servers too. Images, that is. Which is the reason they do not qualify for archiving either.

Good point. As long as references to files (aka URLs) are saved, regardless of the format, the result is not an archive. Only if the current contents of the referenced files are embedded in the final result does one have a real archive. Less complicated: PDF is the best choice here.
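For what "embedded" means in practice, here is a hedged Python sketch that swaps remote image URLs for base64 `data:` URIs, so the bytes travel inside the file. The `fetch` callable and `fake_fetch` are hypothetical stand-ins for a real download, the regex only handles double-quoted `src` attributes, and this is not what DEVONthink or any clipper is known to do.

```python
import base64
import re

# Matches <img ... src="http(s)://..."> with a double-quoted src only.
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=")(https?://[^"]+)(")', re.IGNORECASE)


def embed_images(html, fetch):
    """Replace remote <img> URLs with base64 data URIs.

    `fetch` is a hypothetical callable: URL -> (mime_type, bytes).
    """
    def replace(match):
        mime_type, data = fetch(match.group(2))
        encoded = base64.b64encode(data).decode("ascii")
        return f"{match.group(1)}data:{mime_type};base64,{encoded}{match.group(3)}"

    return IMG_SRC.sub(replace, html)


def fake_fetch(url):
    # Stand-in for a real download; placeholder bytes, not a valid PNG.
    return "image/png", b"car"


page = '<img src="http://example.com/image.png">'
print(embed_images(page, fake_fetch))
```

After this step the image no longer depends on what the server returns later. Scripts, stylesheets and fonts would each need the same treatment, which is part of why PDF is the less complicated route.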

I use PDF to get a pretty good representation of the page(s). I used to choose unpaginated from the clipper drop-down, but I found recently that DT now saves one long strip, which it then tries to print all on one A4 sheet (it renders fine on screen, so I hadn’t noticed). Now I have changed to paginated PDF, which prints OK. This happens in both Safari and Firefox.
Some newspapers prevent clipping to DT; I have found that it sometimes works to export the page as PDF in Safari (File > Export as PDF) and then drag it into DT.

That’s at least one step too many: Just print the webpage and pick Save PDF to DEVONthink 3. This method respects the activation status of Reader View too.

I save on iPad-DTTG as Webarchive to one database, which is synced to my Desktop-Mac. There a smart rule converts it to a PDF, which is synced back to my iPad-DTTG (the webarchive is deleted). Most importantly: all weblinks are saved.