Webpage-to-PDF version cannot show long lines of code

chamjam · February 22, 2024, 7:17am

I want to save a web page as a static version. I searched the community, and people suggest saving the page as a PDF. But in the PDF, any code line that is wider than its code block is cut off and cannot be displayed.

Is there any way to solve this? Or is there another way to save a webpage permanently?

chrillek · February 22, 2024, 7:27am

No. Code in HTML doesn’t reflow. And in PDF, nothing reflows.
You could try printing to a landscape PDF from your browser.

mhucka · February 22, 2024, 3:52pm

I snapshot a lot of web pages, and I use PDF in pretty much all cases, and consequently have often run into the same problem as you. I concur with what @chrillek wrote: PDF is problematic for cases like this.

You may already have realized this, so my apologies if this is repeating something you already know. One way to think about the situation is that the PDF format describes a snapshot of what you see. If the content is cut off, the snapshot captured in PDF form will be as well. So, one way to try to avoid the problem is to manipulate the original source before capturing it in PDF format. Rotating to landscape format may be enough for some cases; reducing the page zoom/magnification in the browser can sometimes be another way (it doesn’t work in all cases, of course, but sometimes it makes enough of a difference). Some web pages may offer additional controls over the way content is displayed, for example, by letting you change the “theme” of the page. Some pages are also available in different variants. For example, for a repository on GitHub that uses GitHub Pages, the same content you come across on the GitHub Pages site may be available (and presented differently) in the repository’s README file. Checking both might reveal that one or the other format wraps the lines differently.

But sometimes it’s just impossible to make it work, because the authors of the source web page did not arrange the content in a suitable way. My own approach for those cases is to save the text somehow, either by using some of the text-format capture facilities in DEVONthink, or saving the page source as HTML.

If you save a lot of things that have this problem (e.g., if you’re trying to save a lot of things from GitHub), it may be best to work out alternatives to using PDF. For example, to save the content of a GitHub gist, I would probably get the raw text instead. For repositories, you can export entire repos as zip archives (there’s a button for that right on every GitHub repo’s front page), or you can use command-line tools (e.g., github-backup).
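As a rough illustration of the zip-archive route, here is a minimal Python sketch that downloads the same archive the button on the repo page produces. The function names are made up for this example, the default branch name "main" is an assumption (older repos often use "master"), and the URL pattern is the one GitHub currently serves for branch archives, so check it against the repo you actually want.

```python
from urllib.parse import quote
from urllib.request import urlretrieve


def repo_zip_url(owner: str, repo: str, branch: str = "main") -> str:
    """Build the download URL for a branch archive of a public GitHub repo.

    Assumption: the branch exists; "main" is only a guess at the default.
    """
    return (
        f"https://github.com/{quote(owner)}/{quote(repo)}"
        f"/archive/refs/heads/{quote(branch)}.zip"
    )


def download_repo_zip(owner: str, repo: str, branch: str = "main") -> str:
    """Download the archive into the current directory and return its file name."""
    # Branch names may contain slashes, which are not valid in a file name.
    dest = f"{repo}-{branch.replace('/', '-')}.zip"
    urlretrieve(repo_zip_url(owner, repo, branch), dest)
    return dest
```

Calling `download_repo_zip("some-owner", "some-repo")` (placeholder names) would leave `some-repo-main.zip` in the working directory, which can then be imported into DEVONthink like any other file.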

chrillek · February 22, 2024, 4:04pm

Printing to PDF might sometimes give better results than clipping if the site’s authors have provided a print CSS. That might exclude unneeded parts of the page and/or rearrange the content so that it fits better on print.
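When a site has no print stylesheet of its own, one possible workaround is to save the page source as HTML, add a small print rule to the saved copy, and print that copy to PDF. The Python sketch below is only an illustration under those assumptions: the function name and the exact CSS are my own choices, and pages that style their code blocks with something other than `pre`/`code` elements will need different selectors.

```python
import re

# Assumed rule set: when printing, let preformatted code wrap instead of
# running past the edge of its block.
WRAP_CSS = """
@media print {
  pre, code {
    white-space: pre-wrap !important;
    overflow-wrap: anywhere !important;
  }
}
"""


def add_print_wrap_css(html: str) -> str:
    """Return a copy of html with a print-only <style> block added.

    The block goes just before </head>; if the document has no </head>,
    it is prepended to the document instead.
    """
    style = f"<style>{WRAP_CSS}</style>"
    match = re.search(r"</head\s*>", html, flags=re.IGNORECASE)
    if match is None:
        return style + html
    return html[:match.start()] + style + html[match.start():]
```

To use it, read the saved file into a string, pass it through `add_print_wrap_css`, write the result to a new file, then open that file in a browser and print to PDF. Long code lines should then wrap onto the next line rather than being cut off, at the cost of no longer matching the original line breaks.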

chamjam · February 23, 2024, 7:09am

Your reply was so detailed, thank you very much. It seems that there is no good way to deal with this problem at present. For now I am going to use web archives to save web pages. I hope that DEVONthink can support permanent storage of web pages in the future.

chamjam · February 23, 2024, 7:14am

Thanks for your reply. I tried this, and sometimes it’s a good way to save web pages, but it’s not suitable for all pages, and the operation is a bit complicated (I have a lot of pages to save). So for the time being, I’ll use web archives to store them.

rfog · February 24, 2024, 8:11am

When the direct capture to PDF generates a file I don’t like, I usually capture to a Formatted Note instead, then edit it manually (normally removing endings or leftover commercial stuff, adding images that were not captured, replacing videos with their URLs, and so on), and then convert it to PDF.

(And that is the reason I have set the DEVONSave Shortcut combination not to convert to PDF automatically in DTTG when DT generates the capture. That allows me to edit the resulting capture.)