I have some PDFs saved from webpages as one long page. They look like this:
Can I convert them somehow to a paginated PDF where each page has A4 dimensions?
I saw the topic “Converting non-paginated into paginated PDFs”, but it hasn’t received any update since 2016.
Is this still not possible? Or has anybody found a way to do this outside of DEVONthink?
(No pressure, I’m just checking. I know there are lots of things to be done…)
A PDF is basically a sequence of drawing operations on one or several pages. Imagine you have a long sheet of paper, and you draw a bunch of tiny images on it, scribble some tiny text, etc. Now you want to
- enlarge all the images and text because they are tiny and difficult to read
- and you want to create a set of fixed-size pages from this single sheet.
You can use a copying machine and make an enlarged copy. Or take a photograph and enlarge that – there are many options for enlarging. But where do you cut the large sheet? Yep, you’re human, you know where an image starts and ends. Or what a paragraph of text or a table is – you wouldn’t cut through them.
A program? Not so much. All it “sees” are drawing instructions. It has no concept of text or tables. Perhaps of images, but only if they are simple – imagine a circular photo with text floating around it …
I’m simplifying here, but the principle is that there are no objects in a PDF (at least not reliably). So, a program wouldn’t be able to figure out where a new page should begin. Therefore, you shouldn’t get your hopes up that this feature will arrive anytime soon.
Unless, perhaps, one of the AI fans trains their favourite app to do that…?
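To make the cutting problem concrete, here is a minimal Python sketch. It assumes something PDF does not reliably give you: a list of vertical extents (`boxes`) for each piece of content, measured from the top of the long sheet. The function name `choose_cuts` and its parameters are hypothetical, made up for illustration. Even with that information, all a program can do is move a cut up to the top of whatever it would otherwise slice through, and it still has to cut straight through anything taller than a page.

```python
def choose_cuts(boxes, total_height, page_height):
    """Pick cut positions for a long sheet.

    boxes: (top, bottom) extents of content, measured from the top of the sheet.
           ASSUMPTION: these extents are known, which a PDF does not guarantee.
    Returns the list of y positions where the sheet is cut.
    """
    # Merge overlapping extents into solid blocks of content.
    merged = []
    for top, bottom in sorted(boxes):
        if merged and top <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], bottom)
        else:
            merged.append([top, bottom])

    cuts = []
    start = 0.0
    while total_height - start > page_height:
        limit = start + page_height
        cut = limit  # default: hard cut at the page boundary
        for top, bottom in merged:
            if top < limit < bottom:
                # The boundary falls inside a block. Move the cut up to the
                # block's top, unless the block is taller than the page.
                if top > start:
                    cut = top
                break
        cuts.append(cut)
        start = cut
    return cuts
```

With blocks at 0-300, 350-900 and 950-1200 on a sheet 1200 high and a page height of 842, the cuts land at 350 and 950, so the first page is mostly empty. A single block of height 2000 is simply chopped at 842 and 1684.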
Cool, detailed explanation. Makes total sense.
However, the feature could just draw the long page on a virtual raster canvas and then cut this canvas (preferably with some overlap) into pages.
These pages would be raster pages, but at least the needed resolution could be determined beforehand, because it is for A4 output.
Surely not perfect, but certainly better than reading a looooong one-page PDF…
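As a rough sketch of the arithmetic behind this idea (not an actual DEVONthink feature), the following Python computes where the slices would fall and how large each raster page would be. The function names and the `overlap` parameter are my own assumptions. It scales the long page to A4 width, then moves an A4-high window down the sheet, stepping a little less than a full window each time so that consecutive pages overlap.

```python
A4_WIDTH_PT = 595.276   # 210 mm in PDF points (1 pt = 1/72 inch)
A4_HEIGHT_PT = 841.89   # 297 mm in PDF points


def slice_long_page(src_width, src_height, overlap=0.0):
    """Return (scale, slices) for cutting a long page into A4 pages.

    slices is a list of (top, bottom) pairs in source units, measured from
    the top of the long page. overlap is the fraction of each page that is
    repeated on the next one (hypothetical parameter, 0 <= overlap < 1).
    """
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be a fraction in [0, 1)")
    scale = A4_WIDTH_PT / src_width    # factor that fits the width to A4
    window = A4_HEIGHT_PT / scale      # A4 page height in source units
    step = window * (1 - overlap)      # distance between slice tops
    slices = []
    top = 0.0
    while True:
        bottom = min(top + window, src_height)
        slices.append((top, bottom))
        if bottom >= src_height - 1e-9:  # reached the end of the sheet
            break
        top += step
    return scale, slices


def raster_size_px(dpi):
    """Pixel dimensions of one A4 page at the given resolution."""
    return round(A4_WIDTH_PT / 72 * dpi), round(A4_HEIGHT_PT / 72 * dpi)
```

For example, `raster_size_px(300)` gives `(2480, 3508)`, so the resolution per page really is known up front. A sheet that is already A4 wide and 2000 points tall yields three slices, with or without a 10% overlap.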
Off the top of my head, you could open the long page in Preview and use rectangular selections to copy (raster) parts of it into new pages.
If it was a raster canvas, it would be resolution-dependent and no longer searchable.
True and true.
However, I think searchability could be preserved if the new PDF first gets a text layer from the old PDF and then the raster (image) pages.
Have a look at Apple’s PDFKit. How would you go about creating the new PDF using that framework? Or Core Graphics?
This stuff is so limited and, in the case of Core Graphics, so under-documented that the only way I see is writing your own PDF engine.
Or saving things as paginated PDF in the first place.
If the single page PDF documents were clipped from the web, then they should have a URL and it might be possible to capture them again in the desired (paginated) format.