But this is very inefficient. I use the browser extension to archive web pages to DEVONthink. It sends the URL and title to the ‘Clip to DEVONthink’ form, but it captures the page without either the Safari cookies or the DEVONthink browser cookies.
Personally, I almost never want to capture a full Web page. In most cases, there’s an article that I want to capture, plus irrelevant images and text on the page that I don’t want. The extra material would increase the file size, and the irrelevant text would reduce the accuracy and precision of searches and of the AI assistants.
My preferred capture mode is rich text, using the appropriate Service in Safari, which has the keyboard shortcut Command-) and captures the selected area of the page. In Safari, I usually click the button at the left of the URL address field, which automatically isolates the main content of the page (the desired article). Then I press Command-A to select that content and Command-) to capture it. The URL of the Web page is captured in the document’s Info panel, just as it is with a WebArchive capture.
This works without the dual access mode of Clip to DEVONthink, so you won’t end up capturing only a login page.
The rich text capture will include the clickable links, images, formatted text, lists, and tables in the selected area of the page. Often, I’ll then delete images that do not add to the information content.
As a result, my Web page captures usually result in a file size that’s from one to three orders of magnitude smaller than a WebArchive capture of the full page, and searches of my captures avoid irrelevant text that might be included in the WebArchive capture.
Over the years I’ve captured tens of thousands of Web pages this way, with enormous savings in file size and improved efficiency of searches and the AI assistants.
Note: Neither Chrome nor Firefox is capable of capturing rich text. The Service to capture rich text does work in Safari, DEVONagent, and DEVONthink’s built-in browser.