Capturing Medium articles to WebArchive

scotte · November 9, 2022, 8:05pm

I want to capture some Medium articles to DEVONthink (DT) so that I can reference them later. I am a Medium member, and some of these articles are members-only.

When I first capture them as WebArchives, all is good and they open fine. However, after a while, when I open them again, they appear to load normally and then the page switches to a 500 error with the message “Apologies, but something went wrong on our end.” My Medium account info is still shown at the top right, but the article content is gone.

I am not enabling “clutter free” when I save them (the format isn’t as nice).

I would have expected the complete page to be rendered from the WebArchive itself, but it appears it is still going out to Medium to retrieve content, and my guess is that I’m logged into Medium with a different session than the one that was active when I captured the article.

Ideas / solutions?

chrillek · November 9, 2022, 8:31pm

I don’t use Medium but I can imagine that they embed JavaScript in their pages to check credentials. That will probably execute when you open the web archive.
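To see what a saved WebArchive actually contains, a short Python sketch can help. It assumes the usual WebKit layout of a .webarchive file (for example one exported from DEVONthink or saved from Safari): a property list with a `WebMainResource` dictionary and an optional `WebSubresources` list, whose entries carry `WebResourceMIMEType` and `WebResourceData` keys. The function name `summarize_webarchive` is hypothetical. It counts the stored subresources by MIME type and the `<script>` tags in the main page, which gives a rough idea of how much JavaScript could run again when the archive is opened.

```python
import plistlib
import re
import sys
from collections import Counter


def summarize_webarchive(raw):
    """Summarize the raw bytes of a .webarchive (assumed WebKit layout).

    Returns (Counter of subresource MIME types,
             number of <script> tags in the main HTML).
    """
    archive = plistlib.loads(raw)  # detects binary vs XML plist automatically

    subresources = archive.get("WebSubresources", [])
    mime_counts = Counter(
        r.get("WebResourceMIMEType", "unknown") for r in subresources
    )

    # Count opening <script> tags (inline or with src=) in the main page.
    main_html = archive["WebMainResource"].get("WebResourceData", b"")
    script_tags = len(re.findall(rb"<script\b", main_html, re.IGNORECASE))
    return mime_counts, script_tags


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        counts, tags = summarize_webarchive(f.read())
    for mime, n in counts.most_common():
        print(f"{n:4d}  {mime}")
    print(f"{tags:4d}  <script> tags in the main HTML")
```

Run it with the path of a .webarchive as its only argument; a Medium capture would be expected to list several JavaScript subresources, though the exact numbers depend on the page.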

AlanRalph · November 10, 2022, 8:53am

Unfortunately, Medium are one of a growing number of websites who make it difficult to save contents for posterity. Your best bet is to treat the WebArchive as an intermediate step, and use DEVONthink’s conversion tools to make a copy in a different format. You may want to clean up the WebArchive first to remove things like header navigation, sidebars and footers — it’s surprisingly easy, literally select and delete.

chrillek · November 10, 2022, 9:12am

In addition: Perhaps the Medium people provide a useful CSS version for printing, so that using “print to PDF” directly from their site might be an option.

AlanRalph · November 10, 2022, 1:36pm

That is another option, though I’d highly recommend previewing the output first to see if anything gets missed or mangled in the process.

BLUEFROG · November 10, 2022, 1:41pm

I’m with you on this - and Medium has been a particular sore spot.

Capturing as a web archive. Doing a bit of cleanup in DEVONthink, if needed. Then Data > Convert > To PDF (One Page).

RuslanI · July 18, 2023, 5:25am

I’ve been having some major problems capturing Medium articles with embedded GitHub snippets. I wanted to try this approach, and it seemed to work, but then it didn’t.

So first I captured this article as a web archive. When I opened it in DT, it took noticeably long to load (mostly the code snippets), but it finally did. Then I deleted some unneeded stuff (mostly the footer with suggested articles) and converted it to PDF. When I opened the PDF in DT, there were two problems:

All my edits were gone from the PDF: whatever I had deleted in the web archive was back again.
Most of the code snippets were missing from the PDF, although they were all present in the web archive version of the article.

Do you have any pointers on how to properly save such articles?
Preferably with my edits, but most importantly with code snippets.

cgrunenberg · July 18, 2023, 6:18am

A screenshot of the settings used would be useful. In the worst case, exporting a PDF (single page) or printing a PDF (paginated) from Safari to DEVONthink should almost always work. So should saving a webarchive in DEVONthink’s inbox folder.

RuslanI · August 23, 2025, 10:54am

Where can I find the conversion settings to make a screenshot of?

Because this clearly does not work for me.

Here is a screenshot of the bottom of the article after I saved it as a WebArchive in DT and deleted the unneeded stuff at the bottom:

This is the same view of the PDF after I used Data → Convert → To PDF:

As you can see, all the stuff I deleted is back again.

chrillek · August 23, 2025, 11:24am

Medium is not a website but a JavaScript application. (That’s me being polite.) You can delete all you want; opening the WebArchive will activate the JavaScript again and have it load (or create or whatever) the original page again.
Try convincing them to deliver HTML instead.
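If the goal is only to stop an archived page from re-running Medium's JavaScript, one possible workaround is to strip the scripts from a copy of the archive. This is a sketch, untested against Medium. It relies on the same assumed .webarchive layout (`WebMainResource`, `WebSubresources`), plus the assumption that `WebSubframeArchives` entries (embedded frames) share that layout; `strip_scripts` and `strip_file` are hypothetical names. It uses a simple regular expression, so it assumes the HTML is in an ASCII-compatible encoding such as UTF-8, and it leaves inline event handlers such as `onload=` untouched. If Medium produces the article body, or the embedded GitHub snippets, only through JavaScript, the stripped copy may come out incomplete.

```python
import plistlib
import re
import sys

# A whole <script ...>...</script> element, spanning newlines, any letter case.
SCRIPT_RE = re.compile(rb"<script\b[^>]*>.*?</script\s*>", re.IGNORECASE | re.DOTALL)


def strip_scripts(archive):
    """Return a copy of a parsed .webarchive dict with JavaScript removed.

    Removes <script> elements from the main HTML, drops subresources whose
    MIME type mentions "javascript", and recurses into subframe archives.
    The input dict is not modified.
    """
    cleaned = dict(archive)  # keep any other top-level keys as they are

    main = dict(archive["WebMainResource"])
    main["WebResourceData"] = SCRIPT_RE.sub(b"", main.get("WebResourceData", b""))
    cleaned["WebMainResource"] = main

    if "WebSubresources" in archive:
        cleaned["WebSubresources"] = [
            r for r in archive["WebSubresources"]
            if "javascript" not in r.get("WebResourceMIMEType", "").lower()
        ]
    if "WebSubframeArchives" in archive:
        cleaned["WebSubframeArchives"] = [
            strip_scripts(frame) for frame in archive["WebSubframeArchives"]
        ]
    return cleaned


def strip_file(src, dst):
    """Read a .webarchive from src and write a script-free copy to dst."""
    with open(src, "rb") as f:
        archive = plistlib.load(f)
    with open(dst, "wb") as f:
        plistlib.dump(strip_scripts(archive), f, fmt=plistlib.FMT_BINARY)


if __name__ == "__main__" and len(sys.argv) == 3:
    strip_file(sys.argv[1], sys.argv[2])
```

Pass the original archive and an output path as the two arguments, then open the result in Safari or import it into DEVONthink. Keeping the original is wise, since the stripped copy may be missing content that only JavaScript would have drawn.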