Page/display size weirdness after OCR conversions

Hoping someone can explain this…

I regularly photograph books and documents in order to import them into DEVONthink 3 with OCR.

I export the images from Apple Photos in full size, with resolution typically in the region of 3000 x 4000 pixels.

If I use either TIFF or JPEG as the export format, the resulting PDF+text documents (after being imported into DT3 with OCR) appear so small at “actual size” that I have to zoom in quite a bit just to read them, even though the inspector reports them as around 17 x 23 cm (not that far off A4 in printing terms). Interestingly, zooming in reveals quite a lot of detail.

However, if I export from Apple Photos in PNG format, the resulting PDF+text documents in DT3 appear much, much larger at “actual size”; the inspector confirms this, reporting them as typically 100 x 140 cm! Yet, weirdly, they take up about the same storage (around 75 KB), and when I zoom the text to the same size as in the documents originally exported as TIFF or JPEG, they look similar in image quality.
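
A rough worked example may help here. An image's printed or page size is its pixel count divided by its resolution (DPI) metadata, so two files with identical pixels can report very different physical sizes. The sketch below assumes roughly 3000 x 4000 pixels and an illustrative 72 dpi (a value many applications fall back to when no resolution is recorded; this is an assumption, not a diagnosis of what Photos or DT3 actually did). It also works backwards to the resolution implied by a 17 cm page width.

```python
# Physical size implied by pixel dimensions plus resolution (DPI) metadata.
# The 72 dpi value below is an illustrative assumption only.
CM_PER_INCH = 2.54  # exact by definition


def print_size_cm(width_px: int, height_px: int, dpi: float) -> tuple[float, float]:
    """Return (width, height) in centimetres for an image rendered at `dpi`."""
    return (width_px / dpi * CM_PER_INCH, height_px / dpi * CM_PER_INCH)


def implied_dpi(pixels: int, size_cm: float) -> float:
    """Resolution needed to fit `pixels` into a length of `size_cm` centimetres."""
    return pixels / (size_cm / CM_PER_INCH)


if __name__ == "__main__":
    # Approximate pixel dimensions from the post, at an assumed 72 dpi.
    w, h = print_size_cm(3000, 4000, 72)
    print(f"{w:.1f} x {h:.1f} cm")  # 105.8 x 141.1 cm
    # Resolution implied by spreading 3000 px across a 17 cm page width.
    print(f"{implied_dpi(3000, 17):.0f} dpi")  # 448 dpi
```

Because the pixel dimensions are only "in the region of" 3000 x 4000, these figures are approximate; the point is that the same pixels give a page several times larger when the recorded resolution is low.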

I’m very confused about what is happening here and why PNGs are treated so differently by DT3??
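
One way to narrow this down is to check whether the exported PNG records a resolution at all. In the PNG format that lives in the optional pHYs chunk (pixels per unit on each axis, plus a unit byte where 1 means metres). The minimal stdlib-only sketch below walks the chunk list and converts pHYs to DPI; the helper names are hypothetical, and what DT3 or the OCR engine does when the chunk is missing is not something this code can tell you.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
METRES_PER_INCH = 0.0254


def png_dpi(data: bytes):
    """Return (x_dpi, y_dpi) from a PNG's pHYs chunk, or None if the chunk is
    absent or its unit is unspecified (unit byte 0 = aspect ratio only)."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if ctype == b"pHYs":
            ppu_x, ppu_y, unit = struct.unpack(">IIB", body)
            if unit != 1:  # 1 = pixels per metre
                return None
            return (ppu_x * METRES_PER_INCH, ppu_y * METRES_PER_INCH)
        if ctype in (b"IDAT", b"IEND"):  # pHYs must come before image data
            break
        pos += 12 + length  # skip length + type + data + CRC
    return None


def make_chunk(ctype: bytes, body: bytes) -> bytes:
    """Build one PNG chunk (demo helper) with a correct CRC over type + data."""
    crc = zlib.crc32(ctype + body) & 0xFFFFFFFF
    return struct.pack(">I", len(body)) + ctype + body + struct.pack(">I", crc)


if __name__ == "__main__":
    # Synthetic 3000 x 4000 PNG header carrying a 300 dpi pHYs chunk.
    ihdr = make_chunk(b"IHDR", struct.pack(">IIBBBBB", 3000, 4000, 8, 2, 0, 0, 0))
    phys = make_chunk(b"pHYs", struct.pack(">IIB", 11811, 11811, 1))
    sample = PNG_SIGNATURE + ihdr + phys + make_chunk(b"IEND", b"")
    x_dpi, y_dpi = png_dpi(sample)
    print(f"{x_dpi:.1f} x {y_dpi:.1f} dpi")  # 300.0 x 300.0 dpi
```

For a real export, something like `png_dpi(open("export.png", "rb").read())` (hypothetical filename) would show whether Photos wrote a resolution; a `None` result means readers have to guess.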

I THINK the answer is in some way related to the behaviour I see in Apple’s Preview app, which has a setting in Preferences to “Define 100% scale as” either “1 point equals 1 screen pixel” (which matches DT3’s default behaviour with TIFFs and JPEGs converted to OCR’d PDF) or “Size on screen equals size on printout” (which is the size I kinda expect to see, and matches the size of the PNGs imported into DT3).
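
The two Preview settings can be sketched as simple arithmetic. A PDF page is measured in points (1/72 inch by default). Under "1 point equals 1 screen pixel" a page N points wide occupies N pixels; under "Size on screen equals size on printout" it is scaled by the display's pixels per inch. The mode labels and the 110 ppi display below are hypothetical, and real Macs add Retina and display-scaling factors this sketch ignores.

```python
PT_PER_INCH = 72.0  # PDF default user-space unit: 1 pt = 1/72 inch
CM_PER_INCH = 2.54


def cm_to_pt(cm: float) -> float:
    """Convert a page length in centimetres to PDF points."""
    return cm / CM_PER_INCH * PT_PER_INCH


def on_screen_px(page_pt: float, mode: str, display_ppi: float) -> float:
    """On-screen length in pixels of a page length at 100% zoom.

    mode (hypothetical labels):
      "point_equals_pixel" - 1 point is drawn as 1 screen pixel
      "match_printout"     - scaled so the on-screen size matches print size
    """
    if mode == "point_equals_pixel":
        return page_pt
    if mode == "match_printout":
        return page_pt / PT_PER_INCH * display_ppi
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    width_pt = cm_to_pt(17)  # the ~17 cm TIFF/JPEG page width from the post
    print(f"{on_screen_px(width_pt, 'point_equals_pixel', 110):.0f} px")  # 482 px
    print(f"{on_screen_px(width_pt, 'match_printout', 110):.0f} px")  # 736 px
```

On any display denser than 72 ppi, the first mode draws the page smaller than its printed size, which is consistent with the "looks tiny at actual size" impression.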

My head hurts!

Could you send or provide a link to a before-and-after sample? I have been unable to reproduce the same behaviour.

Yeah sure - I’ve described the process (with screenshots) in the attached PDF document…

Thanks in advance for any assistance!

DTP images issue.pdf (4.2 MB)

I’ve attached the “original image” here (exported from Apple Photos without any conversion) so you can try exporting it from Photos yourself as TIFF and PNG. (It shows as an image in this attachment.)

I’ve also attached the resulting OCR’d PDF files - you can see even in Apple Preview the difference in sizes (though they’re both about the same in KB).

PNG version.pdf (74.3 KB)
TIFF version.pdf (71.4 KB)

Thanks for the files. The PNG version appears larger because the page size of the PDF is 10 times larger than it should be. As yet I haven’t been able to reproduce the issue by OCRing a PNG converted from the original JPG, so I am not sure whether the glitch occurred on export from Photos or when the PDF file was generated by the ABBYY OCR. I will talk to ABBYY about this; however, it is unlikely that I will get an answer until the new year.

Do you know if there was a resolution to this in the end? Thanks so much.

I’m having the same problem with OCR’d PDF files. The uploaded file “View Corruption 1” shows the location of a rectangular anomaly in an OCR’d PDF file. The “View Corruption 2” file shows how the text inside the anomaly moves independently of the text outside it when I use the cursor.

Of the seven PDF-reading apps I have, DT3.0 is the only one with this problem. The forum shows this problem has been known to the good DT folks for six months, yet there seems to be no solution either in sight or in progress. Please fix!

DT3.0

The latest version is 3.5. Which version of macOS are you using? DEVONthink neither renders the pages nor handles mouse/text input on its own.