Is capturing PDFs from developer.apple.com broken?

I’ve been meaning to write that I’ve also started using the Export to PDF facility in Safari, after discovering recently that it started working again to produce unpaginated PDFs [1]. The output has been excellent, and the approach has the advantage of saving what you actually see in your Safari window (e.g., allowing ad blockers to do their thing). Your method adds scrolling the page in Safari before the export, which is a good idea for getting lazy-loaded elements to appear.

[1] I used to use the Safari export, then for some years had to stop because it no longer produced single-page PDFs – in fact, for a while I used a free utility called Paparazzi to get single-page PDFs, back in the days when I used Evernote (and I even wrote a utility to automate the process). Then at some point, the capability in Safari must have been restored, because I noticed only recently that Export to PDF works as hoped, at least on macOS 10.13.6 (which I know is ancient, and I don’t know if it works the same on later macOS versions). I haven’t changed my DEVONthink code yet, though.

I’m looking forward to trying out the new capabilities in the upcoming new version of DEVONthink mentioned by @cgrunenberg !

URLs of problematic dynamic websites not requiring a login would be great, thanks.

Could someone please check whether Apple Developer URLs display correctly in DEVONthink? (I don’t want to create a new thread, and the question is somewhat related to this one.)

  1. Create a bookmark for https://developer.apple.com/documentation/foundation/nsstring?language=objc

  2. Open bookmark

What do you see?

That’s with and without JS activated (I generally have JS off in DT; turning it on and refreshing made no difference, the result remained the same).

Thanks! Seems like it’s necessary to restart DEVONthink before the change takes effect.

Here’s what I’ve been seeing for some time. When the page loads, it’s white for a short time and then I see this:

If I scroll down I get this:

I’ve no idea what’s happening.

After capturing via Safari stopped working reliably (I suspect because Apple changed the site’s layout around that time), I developed a very nice script that I used for some weeks to capture from within DEVONthink. This worked perfectly; it was almost too good. Now it’s useless, as I can’t properly view Apple URLs anymore. No idea why.

I can capture the website you named, both as HTML and as a beautifully formatted PDF (both from Safari 14.1.2 using sorter).

The problem is that capturing Apple Developer documentation via Safari doesn’t work reliably (anymore) over here. I used to capture this way and it almost always worked, but at some point it became very unreliable. Sometimes it took five or more attempts until I got a proper PDF :frowning:

No idea why you can view the site and I can’t. I just rebooted and didn’t open any app but DEVONthink. Still the same result.

(Sorry, yes, I’ve just scrolled up and read through this post, so I realise capturing successfully once is not worth much). For what it’s worth, I’ve just restarted DT (for you - only for you, Pete - I’ll have to reopen all my databases and enter all my passwords :stuck_out_tongue_winking_eye: [Edit: oh heavens, there’s a risk you won’t understand my humour - nobody ever does - so I’ve replaced “…” with “:stuck_out_tongue_winking_eye:”]) with JS active, and the bookmark loads no problem:

Is something blocking things at your end? Pi-hole, etc.?

Again, for what it’s worth, I’m on macOS 11.5.1

Wow, thank you! No worries, understood it :grinning:

There’s nothing blocking, as I turned off AdGuard’s “start after login”.

I have a suspicion, but it’s probably very unrealistic. The script does the following:

  • extracts all links from a PDF
  • checks whether each URL is already in the database
    • if yes: replicate the record into a new group
    • if no: create a bookmark in this new group

This way I can select e.g. the NSString class’s PDF and get everything I already have and everything I could capture. I then look through the created bookmarks and capture what looks promising and afterwards delete the group.
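For anyone following along, the decision logic described above can be sketched roughly like this. It is plain Python, not the actual script and not DEVONthink’s scripting interface: the function name `plan_capture_group`, the exact-string URL comparison and the skipping of duplicate links are all assumptions made for illustration.

```python
def plan_capture_group(pdf_links, known_urls):
    """Split the links found in a PDF into two lists.

    pdf_links  -- URLs extracted from the PDF (hypothetical input)
    known_urls -- set of URLs that already exist in the database (hypothetical input)

    Returns (to_replicate, to_bookmark):
      to_replicate -- URLs already in the database; their records would be
                      replicated into the new group
      to_bookmark  -- URLs not yet in the database; a bookmark would be
                      created for each in the new group
    """
    to_replicate = []
    to_bookmark = []
    seen = set()
    for url in pdf_links:
        # Assumption: a link that occurs several times in the PDF is handled once.
        if url in seen:
            continue
        seen.add(url)
        # Assumption: URLs are compared as exact strings, without normalization.
        if url in known_urls:
            to_replicate.append(url)
        else:
            to_bookmark.append(url)
    return to_replicate, to_bookmark
```

Deleting the group and running this again after a few captures would move the newly captured URLs from the bookmark list to the replicate list, which is the “updated state” effect described in the next paragraph.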

The problem now could be (but again, this is probably very unrealistic) that I don’t only delete the group when I’m done. Quite often I deleted it just to get an updated state (in order to avoid capturing the same URL twice), i.e. to see replicants of newly captured PDFs instead of the bookmarks. As this worked so well, I captured a few hundred PDFs and created a lot more bookmarks in the process. Did I maybe violate the terms of use? Doesn’t make any sense, does it? I can view the site in Safari …

Open the bookmark… in DEVONthink?

Light Mode

Dark Mode - Use dark background for documents enabled

Dark Mode - Use dark background for documents disabled

Do you have anything specified in Preferences > Web > Style Sheet?


PDF Captured from Safari 14.1.2 in Catalina and Big Sur…

Yes. It displays correctly in Safari and shows this strange behavior in DEVONthink.

No.

The only differences between your (and Blanc’s) setup and mine seem to be:

  • the macOS version, as I’m still on Mojave
  • I created a lot of bookmarks in a short time

I have confirmed the behavior on macOS Mojave.

I’m curious why you’re not upgrading your OS.

Thank you very much!

I didn’t upgrade as I’m still using a mid-2012 MacBook Pro (you used one too, I think) and Catalina was not really what I wanted to use. I’ll get a new Mac but don’t want to get it now, as I don’t like to use the first generation (M1) of anything. I’m also hoping that a coming MacBook will allow using multiple external monitors without additional hardware.

OK, I see the same in Script Debugger’s result pane:

Scrolling down:

@cgrunenberg any idea what’s going on? Maybe DEVONthink and Script Debugger are using an older WebKit version? In Safari the page displays fine.

Safari’s engine is not identical to the WebKit framework actually.

Seems Apple changed the Developer site again.

It’s now possible again to view Apple Developer bookmarks in DEVONthink (on macOS Mojave) :slight_smile:

Hi, I’m not sure this is the right place, but I didn’t want to start another thread…
I’ve been having intermittent problems capturing web pages as PDF (using different settings): DT3 captures the website as a bookmark, not as a PDF.
If I reboot my Mac and try again, it seems to work.
Not sure why this is happening…
Setup: macOS v11.6, M1 MacBook Pro, DT3 3.7.2
Workflow: browse web → use the Safari clip to DT3 extension (selecting paginated PDF usually, sometimes ‘uncluttered’) → web page captured as bookmark!
If I try using the sorter, I get the same result…
Any ideas?
Joe

Yes, that currently seems to be the only option that works.

AFAIK the next release will make rebooting unnecessary (as it will restart those things that cause the problem automatically, I think).

Thanks, Pete31

The next release will automatically restart its background process for capturing, so a reboot shouldn’t be necessary anymore.