I want to save the content of a page in case it gets deleted. I also want to know whether the content of the page has been updated. So I would like to know how a web archive handles this, and whether there is a better approach for this scenario.
For dynamic webpages that load content on demand, a web archive is not really a future-proof format (and it is limited to Apple’s platforms too). A different format like PDF, formatted note, Markdown or rich text is recommended for archiving, as these don’t require the original webpage.
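On the question of noticing updates: one generic way, independent of DEVONthink, is to compare a fingerprint of the saved copy with a fingerprint of a freshly fetched copy. The sketch below is only an illustration; the function names `content_fingerprint` and `has_changed` are made up for this example, and it assumes you already have both versions of the page as strings.

```python
import hashlib
import re


def content_fingerprint(html: str) -> str:
    """Return a SHA-256 hex digest of the page source with whitespace normalized."""
    # Collapse runs of whitespace so trivial reformatting does not count as a change.
    normalized = re.sub(r"\s+", " ", html).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def has_changed(saved_html: str, current_html: str) -> bool:
    """Compare a saved copy with a freshly fetched copy of the same page."""
    return content_fingerprint(saved_html) != content_fingerprint(current_html)
```

Note that dynamic pages often embed session tokens, timestamps or ads that differ on every load, so a raw comparison like this can report a change even when the article text is identical. Comparing only the extracted text would be more robust.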
I’m following your advice, but formatted notes and PDFs often don’t fully download the images.
I get some pictures like this:
That’s probably a dynamic website. It’s likely that the web archive doesn’t contain the images either and downloads them from the server instead. In this case, instead of clipping, you could print a PDF to DEVONthink or take a note using Services.
Thank you for your patience, and one more question. I’m saving the same page as a formatted note (I don’t choose clutter-free!). Sometimes the file is up to 8 MB and sometimes it is only about 500 KB. How does this happen?
Absolutely the same URL using the same settings?
Yes, but the procedure differs slightly. I’ll describe what I see, though not necessarily accurately. When I activate the Chrome plugin, the web page refreshes. If I confirm the download without waiting for the refresh to finish, I get the large file. If instead I activate the plugin, wait about 20 seconds for the refresh to finish, and then confirm the download, I get the small file. The site I tested is Medium.
Today was my first time using DEVONthink, and I want to use it as a replacement for the Evernote web clipper. So I’m trying to figure out which format preserves the look of the page as much as possible while keeping an acceptable file size.
DEVONthink uses the currently rendered page to improve compatibility with websites requiring a login, so it’s definitely recommended not to clip while the page is still loading.
You mean that when I get the 400 KB file, the operation is correct? What’s the extra data in the 8 MB file? That’s a full 20 times difference.
It probably depends on the website and on what kind of data is loaded when.
That page has many images in it, so the bulk of the size is most likely from them.
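If you want to check this yourself, here is a rough sketch. It assumes the saved file is HTML with its images embedded inline as base64 data URIs; that is an assumption about the file, not something confirmed in this thread. The function name `embedded_image_stats` is hypothetical.

```python
import re
from typing import Tuple

# Matches inline images embedded as base64 data URIs,
# e.g. src="data:image/png;base64,....". Assumes that embedding style.
DATA_URI = re.compile(r"data:image/[\w.+-]+;base64,([A-Za-z0-9+/=]+)")


def embedded_image_stats(html: str) -> Tuple[int, int]:
    """Return (number of inline images, approximate decoded size in bytes)."""
    payloads = DATA_URI.findall(html)
    # Base64 encodes 3 bytes as 4 characters, so the decoded size
    # is roughly 3/4 of the encoded text length.
    approx_bytes = sum(len(p) * 3 // 4 for p in payloads)
    return len(payloads), approx_bytes
```

Running it on the text of the large and the small file and comparing the two results should show whether inline images account for the difference.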
After much testing on this, I save most pages as PDF (losing videos and animations). Formatted Note is my second choice, since in the end these are self-contained HTML files that can be viewed on Windows without any problem (as PDFs can), and they usually keep animations.
I’ve seen some advice elsewhere on the web that says to open the page in DEVONthink first and then save it in another format, and that this works better than clipping directly from another browser. The reasoning is that DEVONthink does some kind of processing of the page when it is clipped, and this processing can’t be controlled. By contrast, DEVONthink has a built-in WebKit browser (except that you can’t enter a URL directly in the address bar), so when you open the page there and then convert it to another format, you get a true WYSIWYG result.
Do you agree with this suggestion, even though it adds the extra step of saving the page to DEVONthink first?
I use DEVONthink on a Mac. For some web pages that contain images, I’ve found it works to switch to Reader View (where available) and then print to PDF. This retains the images in the PDF, and the result is formatted properly.
It’s also an elegant solution.