Can't Create Web Archives with Sites that Require Login or Age Verification

Phileosophos · April 11, 2023, 4:24pm

I’ve read a few posts on this subject, but none of the tips provided in any of them seems to work. In particular, I’ll use two recent examples I cannot get to work: (1) a Wall Street Journal piece on regulators missing obvious banking audit red flags, and (2) a piece by NRA ILA on “innovative” gun control.

The first of those two requires a login, which I have. I’ve used the procedure suggested in other threads to no avail. In other words, I created a bookmark in DT, logged in so the content was all shown correctly, then converted the bookmark to a web archive. The newly created archive item shows me only the headline and demands I log in to the site to see more. I get the same behavior if I open the site in Safari and use the service/extension/whatever-it-is to capture to DEVONthink.

The second of those two refuses to show you any content if you live in California (as I unfortunately do) because one of the state’s absurd new laws requires the NRA to self-censor by age “for the children”. In short, one must always enter a birth date on a blocking page to access the otherwise free content. I’ve tried the same couple of techniques, and every web archive I create contains only the blocking page and none of the content.

Are there any other meaningful approaches for capturing web archives? This is disappointing. Thanks.

BLUEFROG · April 11, 2023, 4:28pm

VPN to a more reasonable state?

I’m physically in Michigan at the moment and I clipped the NRA page with no issue.

PS: I VPN’d into Los Angeles and see the obnoxious result you see…

Phileosophos · April 11, 2023, 4:40pm

I confess I hadn’t thought of using a VPN to mask my location. I’ll see if that works for the second piece.

BLUEFROG · April 11, 2023, 4:45pm

“I confess I hadn’t thought of using a VPN to mask my location.”

That’s why we’re here.

Phileosophos · April 13, 2023, 8:39pm

The good news is that I was easily able to set my VPN to a less-idiotic state and capture the second piece. The bad news is that I still can’t find any way to make the WSJ piece work. Any other ideas for that? I find it stupidly irritating that 30+ years after I built my first web site there’s still no reliable way to capture a good snapshot and trim out all the garbage I don’t need, though DT comes about as close as anything to working. It’s positively disgraceful how incompetent the browsers remain even today at rendering a site to a simple PDF file or something.

rmschne · April 14, 2023, 5:54am

FYI, when I need to keep a copy of a WSJ article in my DEVONthink research database, I use their Print button to create a PDF and save that. I often even compress that PDF to eliminate the bloat of the images. Saving to the DEVONthink Global Inbox makes it easy to integrate into DEVONthink.

WSJ, like many commercial web sites, does a lot of tricks on its servers that do not really facilitate the job of browsers or clipping by other apps. It’s not in their interest to make it easy for users to save their product, and they automate the heck out of publishing the sites.

BLUEFROG · April 14, 2023, 12:23pm

Glad to hear the VPN worked.

Regarding the WSJ, I don’t pay money to any “news” site so I can’t specifically assist on it.

Maria · April 14, 2023, 12:34pm

Good to hear that there are people with similar problems.

I have the same problem with FAZ, NZZ, and Süddeutsche Zeitung. No problems occur when I make an Apple Note from such a page or clip it with Evernote. I have tried many workarounds, but being able to get a clean Formatted Note or Markdown file from such a page would be nice. I am willing to run some systematic tests with the Süddeutsche if I know that it is a problem that is being worked on.

Cheers
