Hi everyone, today while clipping some static webpages from the University of Tübingen site, I noticed that some pages are not saved completely when clipped as PDF.
When clipping, I followed this procedure:
1. Clip the Safari webpage as a bookmark using the Sorter
2. Open the bookmark in DTP’s built-in browser
3. Check that the rendering looks the same as in Safari
4. Clip by clicking the gear icon next to the address bar and choosing PDF (one page)
Normally, if I understand correctly, the saved PDF should look basically the same as in the browser. But not this time. The pages were not clipped completely, and some of the layout was wrong.
Note that all linked pages are basically static. They render normally when JavaScript is disabled.
I also tried first saving the pages as webarchive, then converting to PDF (via the context menu) or clipping as PDF (via the gear icon), to no avail.
Hi @mksBelper , thanks for replying. I just read the linked thread, but my case is different. All the pages that I’m having problems capturing are basically static, without JS loading content dynamically. I can confirm all 3 pages render correctly when JavaScript is disabled (in DTP and Safari). So it’s not a problem of dynamic loading.
Understood. If you search for ‘web clipping’ you’ll find a number of parallel discussions, the totality of which leads me to believe that improvements across the board in this area/functionality are in the offing in future releases.
If I take DT out of the equation with, say, the Fachschaften | University of Tübingen link and Export it as a PDF from Safari, I get the same result as you do in your (146.8K) file.
That makes me wonder whether perhaps any export/conversion to PDF will always result in the format you’re seeing.
Perhaps some resource (CSS, image), as opposed to a (Java)script, is failing to load?
How did you export as PDF from Safari? I just tried to export from Safari via File > Export as PDF menu item, and it produced a complete page, except most links are not clickable anymore (probably a glitch of Safari PDF export).
All this is a function of the tool which converts the web page to PDF. I vaguely recall that Adobe Acrobat could save as PDF with active links. I no longer have that product in use or installed on my Mac, so I cannot test for you. I found Converting web pages to PDF, Adobe Acrobat which suggests that it does this along with a lot of other things. Perhaps give it a try if retaining links is important to you. Adobe Acrobat surely can be integrated with DEVONthink.
If by “static” you mean “not depending on JavaScript”, you’re right. But the guys and gals in Tübingen went a step further: They added media queries to their style sheets. So that when you print e.g. Section 2 | University of Tübingen the layout is drastically changed (for example, there’s only one column now).
Although the PDF output does not look like the screen version, it probably looks exactly like Uni Tübingen’s web developers wanted it to look in print. In my opinion, this has nothing to do with DT. You could try to disable the media queries in the style sheet. Or save as HTML or MD.
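To make the "disable the media queries" suggestion a bit more concrete, here is a minimal Python sketch that removes top-level `@media` blocks mentioning `print` from a style sheet's text. It is only an illustration: the function name `strip_print_media` and the sample CSS are made up, nothing here is DEVONthink or Safari functionality, and the brace matching is deliberately naive (it assumes no braces inside CSS strings or comments and does not look inside nested at-rules).

```python
import re

# Matches the start of an @media rule and captures its query text.
MEDIA_START = re.compile(r"@media\b([^{]*)\{")


def strip_print_media(css: str) -> str:
    """Return css with top-level @media blocks that target print removed."""
    kept = []
    pos = 0
    while True:
        match = MEDIA_START.search(css, pos)
        if match is None:
            kept.append(css[pos:])  # no more @media rules, keep the rest
            break
        # Walk forward to the brace that closes this @media block.
        depth = 1
        end = match.end()
        while end < len(css) and depth > 0:
            if css[end] == "{":
                depth += 1
            elif css[end] == "}":
                depth -= 1
            end += 1
        query = match.group(1)
        targets_print = (
            re.search(r"\bprint\b", query) is not None
            and re.search(r"\bnot\s+print\b", query) is None
        )
        if targets_print:
            kept.append(css[pos:match.start()])  # keep text before, drop block
        else:
            kept.append(css[pos:end])  # keep text before and the block itself
        pos = end
    return "".join(kept)


# Hypothetical sample, loosely modelled on a one-column print layout.
sample = (
    "body{color:black}\n"
    "@media print{.nav{display:none}.cols{column-count:1}}\n"
    "@media screen and (min-width:600px){.cols{column-count:2}}\n"
)
cleaned = strip_print_media(sample)
```

With the print block gone, a locally saved copy of the page that links to the cleaned style sheet should print with its screen layout, at least as far as the style sheet is concerned.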
Thanks for the suggestion. I just tried the Acrobat way of doing this. It… kinda works, with various other major/minor annoyances (e.g., images are lost for whatever reason), just like every other half-baked Acrobat feature.
Adobe Acrobat has features to pull web pages into PDF. I sort of recall it can do entire web sites too, but I’m not sure and can’t check. Nor am I interested!!!
On a related note: Trying to capture HTML 1:1 in print is a hopeless idea. Think of animated gifs. Think of dynamically generated content. Think of transparency. CSS animations… whatever. Print has fixed dimensions, screens have … not really, given that you can shrink/grow browser windows, change font sizes etc.
Screen is screen and print is print. That’s what some people accept and that’s why they provide print style sheets.
I think that this idea has changed quite a bit. Of course, it is still possible for users to change their preferred font and font size and have them override the ones set by a style sheet. But browsers nowadays are very much following the orders of CSS and HTML. What (hopefully) has changed is the idea that a “pixel-accurate layout” is possible at all.
I agree with you and @rmschne that the media query (@media print) does affect the browser’s Print and DT’s Clipping as Paginated PDF. In fact, both produce the same page layout and honor the media query as expected.
But I’m not so sure that the media query should affect DT’s Clipping as One Page PDF. As I understand it, it should generate a PDF just like Safari’s File > Export as PDF, which always generates a PDF that looks identical to how the webpage is rendered on screen, like a snapshot.
Because in this case Safari’s Export as PDF does generate PDFs that are complete and identical to the on-screen rendering, I’m wondering whether there’s a bug in DT’s implementation of Clipping as One Page PDF.
The differences between “Export as PDF” and “Print … to PDF” were already discussed seven years ago:
It seems that Apple decided to build something into their browser that is … let’s say “peculiar”? If one prints to PDF, one can be fairly sure that the result does not depend on the browser. With this “Export to PDF” thingy, all bets are off. Does it ignore media conditions in style sheets? All of them? Some of them? Who knows. One of these Apple black boxes, it seems.
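One way to probe the black box rather than guess would be a tiny test page whose visible text depends on the active media type. The Python sketch below only builds such a page as a string. The function name `build_media_probe_page` and the class names are hypothetical, and it makes no claim about what Safari or DT will actually output.

```python
def build_media_probe_page() -> str:
    """Build an HTML page that shows different text for screen and print."""
    return """<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Media query probe</title>
<style>
  /* Hide both markers by default, then reveal one per media type. */
  .screen-only, .print-only { display: none; }
  @media screen { .screen-only { display: block; } }
  @media print  { .print-only  { display: block; } }
</style>
</head>
<body>
<p class="screen-only">SCREEN rules were applied.</p>
<p class="print-only">PRINT rules were applied.</p>
</body>
</html>
"""


page = build_media_probe_page()
```

Saving `page` to a local file (for example with `pathlib.Path("media_probe.html").write_text(page)`, where the file name is arbitrary), opening it in the browser, and then running it through Print, Export as PDF, and the one-page and paginated clipping options would show which marker line ends up in each PDF.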
Pagination is not really the question here, given that many browsers do not implement all @page requests correctly anyway.