Web Archives Deprecated - Should We Migrate?

Looking at the Apple developer docs, WebArchives have been deprecated as of macOS 10.14. Should we be converting our web archives to something more durable, e.g. PDFs, or just using bookmarks within DEVONthink? Does DEVONthink have a replacement in mind? Thanks!

If you want an archive to be a snapshot of a website at a certain point in time, you have to go with an immutable format like PDF, i.e. something that has neither dynamic content like JavaScript nor links.

A bookmark will only show you what is there at the moment you open it. That has the advantage of being small and the disadvantage of presenting you with an error message if the original web page goes away.

Yup, same page. I guess the question is whether we keep using web archives, which have the benefit that links still work, or stop and choose between bookmarks and PDFs.

I print the pages to PDF (from Reader view so it doesn’t have the ads) and the in-text links work fine for me.

Correction: They do not. I have no idea what I’ve been clicking on!

That’s strange. I routinely export to PDF from Safari on macOS, and the links work. Perhaps there’s a difference in browser behavior?

I only use PDFs printed from Safari’s Reader Mode and also the exported Safari Full Page PDFs from the File > Export as PDF feature, and all links work fine for me. Links not working would be a non-starter for me.
Printed PDFs without Reader Mode also retain all links.

Thank you for flagging this, and thanks @chrk too. I went to check after I wrote my comment and the PDF I tested didn’t work, so I thought maybe I’d muddled things up and it was only actual PDFs downloaded from the web that do this. However, having seen your comments, I’ve just tried a couple of other PDFs in DT that are definitely web pages in reader format that I printed to PDF, and the links are fine. So I think I was just very unlucky, and the one I tested after I wrote my comment doesn’t work for some reason (I can’t remember how I got it, so it’s probably my error or a blip!).

PDF is still best after all :blush: I try to use it for all web pages and for emails (I try not to save many emails, but I think the PDF format is more useful for archiving than the email format).

I was playing around with the Safari extension to clip (what I used previously for Web Archives) and did a “clutter free” PDF. The output is not what I was hoping for…

[Image: Screen Shot 2022-02-26 at 6.38.52 PM]

I never use the DEVONthink clipper for PDFs because the clutter free option fails 7 out of 10 tries, like in your example, and the normal option practically always includes some banner, cookie notice or other pop-up obscuring the content.
The best options for PDF in my experience are Safari Reader View printed to PDF and Safari’s File > Export as PDF feature. Other browsers don’t have anything comparable. The Export as PDF feature in particular works like a full-page screenshot but also preserves links.

The only way around this issue would be for DEVONthink’s clipper to gain a local PDF feature (like Safari’s) that does not send everything through DT servers. This would guarantee that you clip to DT what you actually see in the browser.

Thanks for the reply @chrk, I’ll do some more testing along that route. One downside (besides the 3-step process) is that the URL is not preserved as metadata on the document, so I can’t use the “Launch URL” feature.

My feature request would be a simple one-click PDF capture that includes all the page metadata and title information. Might need to AppleScript this…

Yuck. It is in fact capturing only the details element of this page. I guess that might be due to the fact that the web page authors decided to use JS code to simulate that element (completely unnecessary nowadays).
Firefox’s “Print to PDF” function renders everything but this details element (no wonder: since the details are only shown when the user clicks on the summary part, there’s hardly anything it can do). The same happens, btw, if you choose the non-uncluttered PDF option in the DT add-on.

In any case, saving HTML to PDF can lose you some functionality and content (like the said details element). The best results are to be expected if the web page provides styles for printing. Which deplorably few do.
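
As a rough way to tell in advance whether a page provides styles for printing, here is a minimal Python sketch. It only inspects the HTML it is given for a `<link>` or `<style>` tag with `media="print"`, or for an inline `@media print` rule. It does not fetch external stylesheets, so a `False` result is not conclusive. The names `PrintStyleDetector` and `has_print_styles` are made up for this example.

```python
import re
from html.parser import HTMLParser


class PrintStyleDetector(HTMLParser):
    """Looks for evidence of print-specific CSS in an HTML document."""

    def __init__(self):
        super().__init__()
        self.has_print_styles = False
        self._in_style = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Attribute values can be None (valueless attributes), hence "or ''".
        media = (attrs.get("media") or "").lower()
        if tag == "link":
            rel = (attrs.get("rel") or "").lower()
            if "stylesheet" in rel and "print" in media:
                self.has_print_styles = True
        elif tag == "style":
            self._in_style = True
            if "print" in media:
                self.has_print_styles = True

    def handle_endtag(self, tag):
        if tag == "style":
            self._in_style = False

    def handle_data(self, data):
        # Inline rules such as "@media print { ... }" inside a <style> block.
        if self._in_style and re.search(r"@media[^{]*\bprint\b", data, re.I):
            self.has_print_styles = True


def has_print_styles(html: str) -> bool:
    """Return True if the given HTML declares any print-specific CSS."""
    detector = PrintStyleDetector()
    detector.feed(html)
    detector.close()
    return detector.has_print_styles
```

Feeding it the saved HTML source of a page gives a quick hint about whether printing to PDF is likely to produce a clean result.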

So true, agree. Honestly this is why part of me just wants bookmarks but then I don’t have the ability to search for content within the page details.

You’re welcome.

About the URL…
When using the “full-page” Safari > File > Export as PDF feature, it is indeed not included.

However, when using print to PDF from Safari (Reader Mode or not), it is included as long as you use DEVONthink’s PDF services add-on. You can find this in the menu DEVONthink > Install Add-Ons.

Then, in any print dialog, you get an option to save a PDF to DT, directly. This will include the URL.

Using the default “Save as PDF” does not include the URL.

In System Preferences > Keyboard > Shortcuts, you can also assign a shortcut to it (mine is cmd+p).

So, hitting cmd+p twice will import the PDF to DT, including the URL.

To automate this, I made a BetterTouchTool flow. So with just one gesture on the trackpad, Safari opens Reader Mode, sends the PDF to DT and closes Reader Mode afterwards. (The delays are only needed when using this for 50+ page articles by the way.)

With the help of BetterTouchTool, this is my 1-click solution. You could also use a keyboard shortcut to initiate it.

I use a workaround with the Evernote web clipper (my only remaining use of Evernote). That web clipper is pretty good. I clip as a ‘simplified article’ and after I have a couple of articles (easily edited and cleaned in Evernote), I import them to DEVONthink. This results in formatted notes.

This is my solution also… until DT3 can improve their web clipper that is :wink:

I use almost the same method as @chrk. I switch to Reader on the page I like, then I use my shortcut cmd+p+p (cmd+p twice) to print to PDF, which writes straight to my DT inbox. The whole process takes less than 30 seconds.

Switching to Reader view and printing on iPhone and iPad works the same too, albeit with an extra step to save to DT.

hey :slight_smile:
Which docs say that web archives are deprecated, and does that mean they can no longer be created, or no longer be opened?

I used an internet search engine with the query “apple web archive deprecated” and found the Apple Developer Documentation. Apple might have published more, but I did not look further.

I’ve solved the mystery of PDFs without clickable links. I’ve noticed that if you do “Reader view + print to PDF” on the iPhone, the links in the PDF are not clickable. I have no idea why and haven’t had a chance to test this on an iPad. PDFs made on a Mac are fine.

I’ve been wondering why this is the case for a while now. I only know one iOS app that includes links in generated PDFs, InstaWeb, which I use regularly. It’s obviously possible to keep the links intact, so it’s absurd that most apps don’t do it.
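
For anyone who wants to check a batch of saved PDFs for this, here is a rough Python sketch. It assumes clickable links are stored the usual way, as link annotations (`/Subtype /Link`), and looks for that marker in the raw file and inside Flate-compressed streams. The function name `pdf_has_link_annotations` is hypothetical. Encrypted files and other compression filters are not handled, so treat `False` as "none found" rather than proof that there are no links.

```python
import re
import zlib

# Link annotations are PDF dictionaries carrying "/Subtype /Link".
LINK_ANNOTATION = re.compile(rb"/Subtype\s*/Link\b")
# Stream bodies sit between the "stream" and "endstream" keywords.
STREAM = re.compile(rb"stream\r?\n(.*?)endstream", re.S)


def pdf_has_link_annotations(data: bytes) -> bool:
    """Heuristically report whether raw PDF bytes contain link annotations."""
    # Case 1: the annotation dictionary is stored uncompressed.
    if LINK_ANNOTATION.search(data):
        return True
    # Case 2: it may live inside a Flate-compressed object stream.
    for match in STREAM.finditer(data):
        try:
            # decompressobj tolerates the trailing newline before "endstream".
            inflated = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate data (e.g. an image); skip it
        if LINK_ANNOTATION.search(inflated):
            return True
    return False
```

To try it on a file, read the bytes first, for example `pdf_has_link_annotations(open("article.pdf", "rb").read())`, where `article.pdf` is a placeholder name.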