Why are some static webpages not clipped completely in PDF format?

chrk · July 10, 2021, 11:48am

I have no clue about any technical aspects of this, but I wanted to contribute that in my experience I can also confirm that Safari’s Export as PDF always takes a local snapshot of what is seen in the browser. This becomes apparent when disabling (local) adblock extensions, as it changes the resulting PDF. It behaves like one of those screenshot page extensions, but keeps links intact and is overall the best option for saving as PDF in my experience.

The PDF clipper options in DT don’t seem to happen locally, so the results are always different from what I see in the browser. This also applies to one-page PDFs, as there are sometimes banners, cookie notices or other annoyances visible. That’s why I never really use it any more, because I never know what I get in the end.

naitree · July 10, 2021, 12:26pm

I find that it does not always keep links intact. Links are often dropped, which makes it unreliable when links are important information on the page.

In my personal experience, there’s a way to control the result of the one-page PDF clipper, using a three-step procedure:

1. Clip the target webpage as a webarchive to DT.
2. Open the webarchive in DT and make the necessary edits, e.g., delete extra headers/footers/ads.
3. Clip the webarchive again into a one-page PDF (via the gear icon next to the address bar).

In my experience, the result will always look identical to the rendered webarchive. This procedure has worked (for me) for every webpage that requires redaction, and it preserves links, except for these weird Tübingen pages that I ran into today.

cgrunenberg · July 10, 2021, 12:37pm

I just clipped the above page as PDF documents. The paginated PDF used the print style and the single-page PDF the current layout, as expected. The only issue is caused by the annoying cookie overlays, as DEVONthink has to download & render the page on its own for clipping.

chrk · July 10, 2021, 1:27pm

Not ideal either.

I’ll try your other method too. Given the effort involved, it wouldn’t be sustainable in my case, but it’s good to have as a last resort.

Would be nice if Fireshot was available for Mac.

BLUEFROG · July 10, 2021, 1:47pm

While we don’t do support for others’ apps, maybe look at…

chrk · July 10, 2021, 2:05pm

Thank you. Unfortunately, this has the same issue as DT’s PDF clipper in that it does not capture a local WYSIWYG PDF, like Safari’s Export as PDF option or utilities like Fireshot do. That makes it as difficult to use as DT’s own clipper, because elements like cookie notices and other pop-ups obscure content and result in unusable PDFs.

BLUEFROG · July 10, 2021, 2:12pm

This is what I got in Paparazzi…

Section 2 | University of Tübingen.pdf.zip (94.5 KB)

chrk · July 10, 2021, 2:20pm

Looks good and keeps links intact. It’s the same result I got from Safari’s Export as PDF.

This could be an option in cases when Safari’s export doesn’t keep links and the webpage doesn’t have ads, cookie notices or other junk pop-ups.

If the website has ads et al., Paparazzi will also capture them and will be less useful than the Export as PDF option in Safari, as long as an adblocker like AdGuard is used in Safari.

It will be interesting to see if something better can be achieved with Shortcuts on Mac in the future.

naitree · July 11, 2021, 12:52am

Thanks for helping @cgrunenberg.

Would you mind checking again that all 3 example pages were clipped completely? Specifically, in my case, the single-page PDFs of Section 2 | University of Tübingen and Section 3 | University of Tübingen do mostly preserve the screen layout, but if I scroll to the bottom of the clipped PDFs, I can see that the pages were cut off and are incomplete. Can you confirm your clips are complete?

For the Fachschaften | University of Tübingen page, the screen layout is not preserved. On screen I see a two-column layout full of links, but the clipped one-page PDF has a single-column layout, much like the print layout but not exactly. In addition, the page was also cut off at the bottom, so it’s incomplete. Can you confirm that your clip preserves the two-column layout and is complete?

Here are my resulting PDFs. Note that I opened the URLs in DT, so the cookie popups had already been dismissed.
Section 2 - University of Tübingen.pdf (125.2 KB)
Section 3 - University of Tübingen.pdf (253.5 KB)
Fachschaften - University of Tübingen.pdf (148.2 KB)

cgrunenberg · July 12, 2021, 3:05pm

I can confirm that the pages are incomplete. This seems to be an issue with WebKit’s PDF generation. We’ll check whether it’s possible to work around this (but quite often it isn’t in the case of WebKit).

naitree · July 13, 2021, 7:44am

In either case, thanks for the effort.

naitree · August 6, 2021, 12:06pm

Today I discovered that some Wikipedia pages I’ve saved in one-page PDF format are incomplete. So apparently the bug could break the PDF (One Page) clipper for many sites, to the point that I’d say it’s unreliable to use in its current state. (Maybe WebKit is to blame, but I don’t know.)

Here’s an example clipping Apple Inc. - Wikipedia.

If you compare the original link opened in DT’s browser (NOT in external Safari) with the clipped PDF, you can see that they are almost identical except for the “References” section, where the three-column layout was reduced to a single-column layout, resulting in the page being cut off at the bottom.

At the start of References section in original page, a 3-column layout:

At the start of the clipped one-page pdf, a single-column layout:

At the bottom of the original complete page:

At the bottom of the clipped incomplete one-page pdf:

cgrunenberg · August 6, 2021, 12:26pm

Thanks for the feedback but it’s still the same WebKit issue unfortunately.

naitree · August 6, 2021, 1:47pm

I just realized that a workaround could be to first narrow the DT browser window so that the multi-column layout automatically reduces to a single-column layout, and then clip to PDF. When I tested it this way, the clipped page was complete.
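To illustrate why narrowing the window might help, here is a minimal sketch of one possible explanation. It assumes (this is a guess, not confirmed WebKit behavior) that the one-page PDF height is taken from the on-screen layout while the content is rendered in a single column. The function names and the numbers are hypothetical and only model the arithmetic:

```python
import math


def layout_height(item_count: int, item_height: float, columns: int) -> float:
    """Height of a list of equal-height items split into balanced columns."""
    rows = math.ceil(item_count / columns)
    return rows * item_height


def visible_fraction(item_count: int, item_height: float,
                     screen_columns: int, pdf_columns: int = 1) -> float:
    """Fraction of the content that fits if the PDF page height is taken
    from the on-screen layout but the content is drawn with pdf_columns.
    This is an assumed model, not WebKit's actual algorithm."""
    page_height = layout_height(item_count, item_height, screen_columns)
    needed_height = layout_height(item_count, item_height, pdf_columns)
    return min(1.0, page_height / needed_height)


# Hypothetical reference list: 300 entries, 20 pt each.
wide_window = visible_fraction(300, 20.0, screen_columns=3)    # 3 columns on screen
narrow_window = visible_fraction(300, 20.0, screen_columns=1)  # already 1 column
print(f"wide window: {wide_window:.0%} of the list fits")
print(f"narrow window: {narrow_window:.0%} of the list fits")
```

Under this model, the clip is only complete when the on-screen layout already matches the single-column layout used for the PDF, which is what narrowing the window achieves.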

Edit: Sometimes the layout won’t reduce to single-column no matter what. But the WebKit-clipped page always uses a single-column layout. Apple must be a master at creating bugs, that’s why it’s called bugOS…