Apple deprecates WebArchives - what does this mean for DEVONthink?

Web archives just aren’t practical as an offline solution.

Webarchives are still practical for many sites, just not all of them.

PDF is too rigid, especially since being able to edit the captured page is key (to remove useless footer stuff, for example)

PDFs are still the best option for static representations of a page. If they’re captured as Paginated PDFs, you may be able to excise pages at the end of the document.

I guess, but then you get all kinds of crap that can’t be removed. And the pagination is static, too. What if you want to convert from Letter to A4 at some later point?

Have you tried the SingleFile extension? You can use it to remove practically any elements you desire, then save the page as html, which is static but also infinitely editable and which can be converted to practically any other file format at a later date.

The format has been discussed but nothing decided either way.

Can you share your rule for us non-programmer types? Thanks!

I don’t think that there’s any programming involved here – just a simple smart rule like this (not tested!)

I’ve found when converting WebArchives to Paginated PDFs that documents containing mathematical formulas with exponents are expressed incorrectly. The exponent comes down off its perch. Any thoughts on correcting this?

First, sorry for the long delay in responding to this thread, things were busy IRL.

Unfortunately, I have come across many pages where capturing as PDF cuts text in the middle of lines, e.g. https://www.keensoft.es/en/alfresco-devcon-2019/ . At the moment, capturing web content is a bit of a mixed bag with DEVONthink – I would welcome a new, independent approach beyond PDF and WebArchives. However, I also understand how tricky it is to even decide on a format in this context. Let’s hope some alternative will present itself.

No problem at all!

And we are always working on something so you never know :slight_smile:

Damn man, now I can’t wait for something. :grin:

Looking into webarchives I found this thread again. Worth mentioning:

The WebArchive class is deprecated, not the file format webarchive.

There’s a new method, from the WWDC20 notes:

WKWebView has learned to create Web Archives with createWebArchiveData(completionHandler:)

Everything’s fine :slight_smile:
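To illustrate that the file format is separate from the deprecated class: a .webarchive file is a property list, so it can be read without any Apple framework. Below is a minimal Python sketch using only the standard library's plistlib. It assumes the key names Safari writes (WebMainResource for the page itself, WebSubresources for embedded images, stylesheets and scripts, and WebResourceURL / WebResourceMIMEType / WebResourceData inside each resource); subframe archives are ignored here.

```python
import plistlib


def summarize_webarchive(data: bytes) -> dict:
    """Summarize a .webarchive file given its raw bytes.

    Assumes Safari's key names: WebMainResource holds the page itself,
    WebSubresources is a list of embedded images, stylesheets, scripts, etc.
    """
    # plistlib auto-detects XML vs. binary property lists.
    archive = plistlib.loads(data)
    main = archive["WebMainResource"]
    subresources = archive.get("WebSubresources", [])
    return {
        "url": main.get("WebResourceURL"),
        "mime": main.get("WebResourceMIMEType"),
        # Size in bytes of the archived main document (usually HTML)
        "main_size": len(main.get("WebResourceData", b"")),
        # URL of every resource that was actually saved inside the archive
        "embedded": [r.get("WebResourceURL") for r in subresources],
    }
```

Usage would be something like `summarize_webarchive(open("page.webarchive", "rb").read())`, where page.webarchive is a placeholder filename. Since plistlib ships with Python on every platform, this also works outside the Appleverse, at least for inspecting or extracting the archived content.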

As long as you don’t plan on ever using these archives outside of the Appleverse.

I simply love WebArchives!!!
Can't imagine living without them.

What I noticed is that DTTG (and probably DT too) seems to access the remote site!
So most of the time when I work with WebArchives, I first disable Internet access on the iPad and then handle the WebArchives; otherwise, DTTG seems to try to access the original website.

This can be annoying, of course.
So a way to disable active internet access for WebArchives would be GREAT :wink:
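One possible reason (an assumption, not confirmed in this thread) is that the archived HTML still references resources that were never embedded, for example scripts or images served from other hosts; a viewer that renders the page may try to fetch those. The Python sketch below checks a .webarchive for that case: it resolves the src attributes (and stylesheet/icon links) in the main HTML against the page URL and lists the http(s) URLs that are not among the archive's WebSubresources. The key names follow Safari's format as assumed above; the tag list is a heuristic, and resources requested at runtime by JavaScript, CSS url(...) or srcset are not detected.

```python
import plistlib
from html.parser import HTMLParser
from urllib.parse import urljoin

# Tags whose src attribute normally triggers a network load when rendered.
SRC_TAGS = {"img", "script", "iframe", "source", "video", "audio", "embed"}


class ResourceRefCollector(HTMLParser):
    """Collect resource URLs from src attributes and from <link> stylesheets/icons."""

    def __init__(self):
        super().__init__()
        self.refs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        url = None
        if tag in SRC_TAGS:
            url = attrs.get("src")
        elif tag == "link":
            rel = (attrs.get("rel") or "").lower()
            # Skip links that do not load anything, such as rel="canonical".
            if "stylesheet" in rel or "icon" in rel:
                url = attrs.get("href")
        if url:
            self.refs.append(url)


def decode_main_html(main: dict) -> str:
    """Decode the archived HTML using the encoding name stored in the archive."""
    encoding = main.get("WebResourceTextEncodingName") or "utf-8"
    try:
        return main["WebResourceData"].decode(encoding, errors="replace")
    except LookupError:  # unknown encoding name: fall back to UTF-8
        return main["WebResourceData"].decode("utf-8", errors="replace")


def unembedded_remote_refs(data: bytes) -> list:
    """List http(s) resources the page references but the archive does not contain."""
    archive = plistlib.loads(data)
    main = archive["WebMainResource"]
    base = main.get("WebResourceURL", "")
    embedded = {r.get("WebResourceURL") for r in archive.get("WebSubresources", [])}

    collector = ResourceRefCollector()
    collector.feed(decode_main_html(main))
    missing = []
    for ref in collector.refs:
        absolute = urljoin(base, ref)  # resolve relative URLs against the page URL
        if absolute.startswith(("http://", "https://")) and absolute not in embedded:
            missing.append(absolute)
    return missing
```

An empty result suggests the static parts of the page are all inside the archive; a non-empty one names the hosts a viewer might try to contact.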

Alternatively, clipper can capture webpages with SingleFile. It produces a self-contained HTML file with all the images, styles, and scripts embedded. The files can be viewed in any regular browser, so they don't require anything special beyond WebKit.
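SingleFile itself is a browser extension and far more thorough, but the core idea of a self-contained HTML file (every resource embedded inline) can be sketched in Python by converting an existing .webarchive: each embedded subresource becomes a base64 data: URI that replaces the matching src/href value. This is a rough illustration, assuming Safari's webarchive key names; the function names are hypothetical, and it does not rewrite CSS url(...) references, srcset lists, or subframe archives.

```python
import base64
import html
import plistlib
import re
from urllib.parse import urljoin

# Matches src="..." / href='...' attributes. A regex is a deliberately rough
# approach: it will not catch CSS url(...) references or srcset lists.
ATTR_RE = re.compile(r"""(\b(?:src|href)\s*=\s*)(["'])(.*?)\2""", re.IGNORECASE | re.DOTALL)


def to_data_uri(resource: dict) -> str:
    """Encode one archived resource as a base64 data: URI."""
    mime = resource.get("WebResourceMIMEType") or "application/octet-stream"
    payload = base64.b64encode(resource["WebResourceData"]).decode("ascii")
    return f"data:{mime};base64,{payload}"


def webarchive_to_single_html(data: bytes) -> str:
    """Rewrite src/href values that point at embedded resources into data: URIs."""
    archive = plistlib.loads(data)
    main = archive["WebMainResource"]
    base = main.get("WebResourceURL", "")
    encoding = main.get("WebResourceTextEncodingName") or "utf-8"
    try:
        page = main["WebResourceData"].decode(encoding, errors="replace")
    except LookupError:  # unknown encoding name: fall back to UTF-8
        page = main["WebResourceData"].decode("utf-8", errors="replace")

    # Map each embedded resource's absolute URL to its data: URI.
    inline = {
        r["WebResourceURL"]: to_data_uri(r)
        for r in archive.get("WebSubresources", [])
        if "WebResourceURL" in r and "WebResourceData" in r
    }

    def replace(match):
        prefix, quote, value = match.groups()
        # Attribute values may contain entities such as &amp;; resolve relative URLs.
        absolute = urljoin(base, html.unescape(value))
        if absolute in inline:
            return f"{prefix}{quote}{inline[absolute]}{quote}"
        return match.group(0)  # leave anything not embedded untouched

    return ATTR_RE.sub(replace, page)
```

Writing the returned string to a .html file gives a single document that any browser can open, with the caveats above about references the regex does not see.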

Thanks a lot for this reference! SingleFile looks like a very interesting project; unfortunately, due to “real life” interfering, I won’t have the time to test it for a while. Perhaps someone else is also interested and could look into DEVONthink / Safari integration of this tool, possibly via a script?

I’m curious to see whether SingleFile works with this page: Golang project structuring — Ben Johnson way | by vignesh dharuman | SellerApp | Medium. This page contains code snippets hosted on GitHub; I have tried every format DT can clip to, including web archive, and cannot get the code embedded in the output.

Scroll down and wait until the GitHub parts have loaded. Afterwards, two approaches work here in Safari:

Webarchive (via dragging)

  • Select the part that you want to clip
  • Drag it onto DEVONthink’s icon in the dock

PDF (via printing)

  • Press ⌘+P
  • In the left corner, under PDF, select Save PDF to DEVONthink 3

Instead of using this menu with the mouse, you can create a keyboard shortcut; search the forum for how.

Thank you for the suggestions @pete31. I’m just not having much luck capturing the entire document. When I print to PDF, the file is at the mercy of the paper page size, which proves too narrow to contain the code samples.

I also tried selecting all the text in the article and dragging it to the DT dock icon, but that also gave spotty results.

I’m including the files created in DT for reference:
PDF printing.pdf (383.0 KB)
select and drag to DT icon.webarchive.zip (279.9 KB)

Too narrow?

The width of the printed PDF is controlled by the Page Setup in the printing application.

True, @BLUEFROG. I was able to get a much higher percentage of the code parts to render by going from Portrait to Landscape, but it did not get everything. Just for kicks I created a page that was 50x50 inches and it appears to have captured all the information.
monstrous page pdf.pdf (377.5 KB)
It is obviously too wide, and having to tweak and proofread kind of defeats the purpose of quick and easy clipping.

Umm, I’ve now checked the records I captured more carefully, and they are also missing parts (PDF) or don’t wrap (Webarchive). So that was a bad suggestion (I was in a hurry …)