How to correct the paper size of a PDF

Hi to all,

I just bought a Brother ADS-2700W document scanner, which works really well and integrates nicely with DT3. But now I have found something suspicious: the PDFs from the scanner have a paper size of 122.8 cm by 86.7 cm!!! (yes, centimeters!!!). I could not find any cause for this wrong size (apart from the resolution: 150 dpi halves the size… but for better OCR results, I always scan at 300 dpi). So I was wondering whether there is some tip/script/whatever that solves this problem within DT3… or perhaps someone knows this problem and has a hint for the Brother scanner.

Thanks in advance for any suggestions/hints
Ulrich

Is the paper size wrong in Preview too or only in DEVONthink?

It is wrong in Preview, too… The paper size is definitely already wrong in the PDF from the Brother scanner.

Which scanner software do you use?

I only installed the scanner driver shipped with the Brother ADS-2700W; for configuration, there is a tool named “Brother ADS-2700W Remote Setup.app”. Then I just hit the scan button on the document scanner and everything worked automatically…

And these are the latest versions of the driver/software? Because this sounds like a major and obvious bug that should be easily fixable (well, for Brother…)

Yes, unfortunately, it’s all new… I downloaded and installed the versions yesterday…
If I switch to multipage TIFF and let DT3 do the PDF conversion and the OCR via a “smart rule”, then everything is OK, but that only works for black & white… for greyscale and color, multipage TIFF is not available…

Are TIFF and PDF the only supported formats?

For greyscale and color, there is JPEG, but that produces a JPG file for every page, which I then have to combine manually in DT3, which is not an option at all… PDF is the only format that is available for all three modes (black & white, greyscale, color).

there is JPEG, but that produces a JPG-file for every page, which I then have to combine manually in DT3, which is not an option at all…

Are you referring to when scanning in DEVONthink’s Import > Image Capture?

No, everything is done from the document scanner… the scan is started via the button on the scanner, the scanned file (PDF) is transferred to the inbox, and there it is finally OCRed. A really nice workflow, apart from the issue that the paper size of the PDF is wrong…
I opened a ticket on the Brother website; perhaps it is a bug in the newest driver… but I was hoping that there is some script/rule/whatever to change the paper size in DT3 that I could use as a workaround…

Where are you seeing the paper size reported?
PM me a PDF and the size it’s supposed to be.

Thanks.

The file is actually the correct sized page.

3 0 obj
<< /Type /Page /Parent 4 0 R /Resources 7 0 R /Contents 5 0 R /MediaBox [0 0 2480 3508]
/Rotate 0 >>
endobj

2480 / 300dpi = 8.27in or 210mm
3508 / 300dpi = ~11.7in or 297mm

The size is being misreported. The viewer divides the MediaBox values by 72 points per inch instead of by the scan resolution.

So…
2480/72 = 34.44 in or ~87 cm
3508/72 = 48.72 in or ~124 cm

Here are the values from a US Letter PDF printed from TextEdit…

2 0 obj
<< /Type /Page /Parent 3 0 R /Resources 6 0 R /Contents 4 0 R /MediaBox [0 0 612 792]
>>
endobj

And indeed 612 792 is 8.5 x 11 at 72 points/inch.
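
To make the two readings of the scanner's MediaBox easier to compare, here is a minimal Python sketch of the arithmetic above. The function name `size_in_cm` is only illustrative, and the 300 dpi figure is the scan resolution mentioned earlier in the thread, not something read from the file.

```python
POINTS_PER_INCH = 72   # default PDF user space: 1 unit = 1/72 inch
CM_PER_INCH = 2.54


def size_in_cm(media_box, units_per_inch):
    """Convert a MediaBox [x0, y0, x1, y1] to (width_cm, height_cm).

    units_per_inch is how many MediaBox units are assumed to make one inch.
    """
    x0, y0, x1, y1 = media_box
    width_in = (x1 - x0) / units_per_inch
    height_in = (y1 - y0) / units_per_inch
    return (round(width_in * CM_PER_INCH, 1), round(height_in * CM_PER_INCH, 1))


scanner_box = [0, 0, 2480, 3508]

# Read as points (what Preview and DT3 display): a poster-sized page.
as_points = size_in_cm(scanner_box, POINTS_PER_INCH)

# Read as pixels at the assumed 300 dpi scan resolution: DIN A4.
as_pixels = size_in_cm(scanner_box, 300)

print("as points:", as_points)
print("as pixels:", as_pixels)
```

The same function applied to `[0, 0, 612, 792]` with 72 units per inch gives the US Letter size.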

That’s very interesting… The macOS Preview app also shows the “wrong values”, exactly the same values DT3 shows in the information panel.

But if I open the file in Preview and print it, it is scaled to fill a DIN A4 page (because of the option to scale to the paper size). That is OK for DIN A4 pages, but smaller scans (DIN A5 or smaller) are also printed at full DIN A4 size, because with their wrong values they are also much “bigger” than DIN A4.

So, if I understand it correctly, the correct values for the PDF MediaBox would be about 595.2 x 841.92… these are roughly the values I found in “correctly created” PDF files in DIN A4 format.

But I guess there is no short AppleScript that could correct these values, as long as the document scanner creates the PDF files with (size * scan resolution) values? :slight_smile:
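
Not an AppleScript, but as a rough sketch of the calculation such a correction would need: rescale each MediaBox value by 72 / dpi. The function name `corrected_media_box` is hypothetical, and the scan resolution has to be supplied by hand, because the MediaBox itself does not record it.

```python
def corrected_media_box(media_box, scan_dpi):
    """Rescale a pixel-valued MediaBox to PDF points (1/72 inch).

    Assumes the box was written as (size in inches * scan_dpi)
    instead of (size in inches * 72).
    """
    factor = 72 / scan_dpi
    return [round(value * factor, 2) for value in media_box]


# The scanner's page, assuming the 300 dpi setting used for OCR.
print(corrected_media_box([0, 0, 2480, 3508], 300))
```

Note that this only computes the target numbers. Actually rewriting a file would need a PDF library, and the page content (the scanned image) would have to be scaled by the same factor, otherwise only a corner of the scan would remain visible inside the smaller box.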

The macOS Preview app also shows the “wrong values”, exactly the same values DT3 shows in the information panel.

Well, we also use Apple’s PDFKit. :slight_smile:

these are the values I found in “correct created” PDF files in the DIN A4 format.

Where are you seeing this?

And no, you wouldn’t see centimeters, millimeters, or inches in the MediaBox. In fact, just as with any raster image, the pixel dimensions are the important values here.

<Where are you seeing this?>

For example, if I print this webpage to a PDF, I find the following values:

<< /Type /Page /Parent 3 0 R /Resources 6 0 R /Contents 4 0 R /MediaBox [0 0 595.2756 841.8898]

So in the end, the problem is caused by the Brother scanner driver creating the PDF incorrectly: it writes (size * dpi) instead of (size * 72), right?

Interesting… it’s a localized value. I show mine based on points: 612 792 which equals 8.5 x 11 (US Letter).

And this is also different since you’re printing to a PDF. That is a different mechanism than scanning. Scanning involves a resolution of the scanned image. Printing is defined by a chosen page size.

For example, here I have printed this webpage with a page size of 3 x 5 inches.

And the MediaBox correctly reports: MediaBox [0 0 216 360].

So in this case it’s not an apples-to-apples comparison of the method for generating the PDF.
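
For reference, the MediaBox values quoted for the printed PDFs all follow from the chosen page size at 72 points per inch. A small Python sketch of that conversion (the helper names are made up for illustration):

```python
POINTS_PER_INCH = 72
MM_PER_INCH = 25.4


def inches_to_points(width_in, height_in):
    """Page size in inches -> (width, height) in PDF points."""
    return (round(width_in * POINTS_PER_INCH, 4),
            round(height_in * POINTS_PER_INCH, 4))


def mm_to_points(width_mm, height_mm):
    """Page size in millimeters -> (width, height) in PDF points."""
    return inches_to_points(width_mm / MM_PER_INCH, height_mm / MM_PER_INCH)


print("US Letter:", inches_to_points(8.5, 11))
print("3 x 5 in: ", inches_to_points(3, 5))
print("DIN A4:   ", mm_to_points(210, 297))
```

This is why a PDF printed on DIN A4 shows 595.2756 x 841.8898 while a US Letter one shows 612 x 792: the difference comes from the page size, not from a different unit.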

So, meanwhile, I have an answer from Brother. They say that the “Scan to PC” function uses the OS X Quartz PDF engine. But they also say that this Quartz engine seems to be corrupted or manipulated, because their internal tests do not show this error.

That’s a pity, because I don’t think the Quartz engine is damaged: PDF creation from OS X otherwise always works fine…