Automatic page fit to width in continuous scroll view

nsflanagan · September 16, 2023, 2:09pm

As a feature request, I would like to see a display option for resized pages in continuous scroll.

80% of the documents I work with have been OCR’d by DT or SnapScan from archival documents or books. Some of the originals were printed in odd page sizes, and the OCR process produces irregular page sizes on its own.

As suggested in some earlier topics, I have been using single page view, but it’s a drag when skimming or trying to interpret the context of search hits. Similarly, I have tried the old kludge of printing to PDF to resize the pages, but:

Using traditional page sizes is silly because I don’t really plan to print any documents in my database;
Variation in page length is more common, so many pages are shrunk to fit vertically, which I don’t want, or I have to use legal/tabloid size and then crop the bottoms off some pages;
It prints with a white border, which I then have to switch to black, because the text on many documents is faded (even after processing) and the white washes it out.
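To illustrate why printing to PDF shrinks the long pages: a print-to-fit step typically scales each page by a single factor chosen so the whole page fits on the sheet, i.e. the smaller of the width ratio and the height ratio. Here is a minimal Python sketch, assuming US Letter paper (612 × 792 pt) and made-up page sizes; real print dialogs may offer other scaling modes.

```python
def fit_on_paper(page_w: float, page_h: float, paper_w: float, paper_h: float) -> float:
    """Uniform scale factor that makes the whole page fit on the paper (aspect ratio kept)."""
    # The tighter of the two constraints wins, so a tall page is limited by its height.
    return min(paper_w / page_w, paper_h / page_h)

# US Letter in points; the page sizes below are hypothetical scans, also in points.
LETTER = (612.0, 792.0)
pages = {"regular": (600.0, 780.0), "long": (600.0, 1100.0)}

for name, (w, h) in pages.items():
    s = fit_on_paper(w, h, *LETTER)
    print(f"{name}: scale={s:.3f}, printed width={w * s:.0f}pt of {LETTER[0]:.0f}pt")
```

The long page is limited by its height (scale 0.72), so it ends up only 432 pt wide on a 612 pt sheet, which is exactly the shrinking described above.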

And more importantly, this is really just a UI issue for me! Having all the pages automatically fit to width in continuous scroll is definitely a desired feature.
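The requested behavior is simple to state: in continuous scroll, each page gets its own zoom factor equal to the view width divided by that page's width, and the pages are stacked using their scaled heights. The sketch below is hypothetical (the function name, the 8 pt gap, and the page sizes are illustrative, not DEVONthink's API or rendering code).

```python
from dataclasses import dataclass

@dataclass
class PlacedPage:
    zoom: float    # per-page scale factor
    top: float     # y offset of the page's top edge in the scroll view
    height: float  # displayed height after scaling

def layout_fit_width(page_sizes, view_width, gap=8.0):
    """Stack pages top to bottom, scaling each one so its width equals view_width."""
    placed = []
    y = 0.0
    for w, h in page_sizes:
        zoom = view_width / w  # every page gets its own zoom, so widths all match
        height = h * zoom
        placed.append(PlacedPage(zoom, y, height))
        y += height + gap
    return placed

# Hypothetical pages of slightly different sizes, in points.
for page in layout_fit_width([(612, 792), (580, 800), (640, 760)], view_width=900.0):
    print(page)
```

With a single global zoom, the 580 pt page would appear narrower than its neighbours; with a per-page zoom, all three come out 900 pt wide and only their heights differ.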

BLUEFROG · September 16, 2023, 3:11pm

The request is noted, with no promises. It’s the first of its kind I recall receiving, so it doesn’t appear it’s troubling many people. Also, I’d wager most PDFs people handle or generate are not made of varying page sizes. Just something to consider.

nsflanagan · September 16, 2023, 4:22pm

I won’t dispute that others aren’t bothered, but it’s happening on basically everything I scan, so I don’t think it’s a truly rare problem. I find DT’s OCR program tweaks page sizes slightly even for documents I’ve put through a flatbed scanner. It looks janky, at least.

I think it would be valuable for anyone doing archival work. Thanks!

Ryan_N · September 18, 2023, 6:37pm

I too would love to see buttons in the header to auto-fit the document to the view, either by page height or page width. This need is what compelled me to buy Apple’s trackpad a year or so ago.

nsflanagan · January 19, 2024, 1:06am

I think I have identified the cause of this issue and found a workaround that works well enough.

I was taking image files (jpg and png) and running them through DT’s OCR. It seems like the deskew function was tweaking the size of the images as it imported them, sometimes wildly. I guess this means that deskew cannot be disabled for images… perhaps this is a bug? On the other hand, I find that the deskew function does turn off when running OCR on PDFs.

What I do now is three steps:

First, I batch process the files in a photo editing program. I use Affinity, it’s good enough for this kind of work. I adjust, compress, and set a uniform fixed width through its macro interface.
Second, I convert to PDF in DT.
Finally, I run OCR with deskew turned off (via settings).
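The "uniform fixed width" step amounts to scaling every image by target width / original width, so the widths match while the heights still follow each page's proportions. Here is a minimal sketch of just the size calculation, assuming the pixel dimensions are already known; the filenames and the 2000 px target are hypothetical, and the actual resampling is left to the image editor (Affinity's macro in this workflow).

```python
def uniform_width_size(width_px: int, height_px: int, target_width_px: int) -> tuple[int, int]:
    """Scale (width, height) so the width equals target_width_px, keeping the aspect ratio."""
    scale = target_width_px / width_px
    # Round the height to whole pixels; never return a zero-height image.
    return target_width_px, max(1, round(height_px * scale))

# Hypothetical scans of slightly different sizes, as a camera or flatbed might produce.
scans = {"p001.png": (2480, 3508), "p002.png": (2391, 3530), "p003.png": (2502, 3490)}
for name, (w, h) in scans.items():
    print(name, uniform_width_size(w, h, 2000))
```

For example, a 2480 × 3508 px scan becomes 2000 × 2829 px, so every page lands in the PDF at the same width.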

This produces consistent document sizes and saves time in the end, since I don’t have to redo work or get into nitty-gritty editing. With a cropping step, this approach also yields good results for documents I captured with a camera or my phone, which is the only option in some archives.

The one downside is that without deskew, the OCR is less able to read words or form lines, particularly with handwriting.

I hope this is helpful to someone in the future.