I am not sure if this is a bug, but I have noticed that if I OCR a PDF to a searchable PDF and then select some text in the internal PDF viewer, the copied text has words strung together without spaces. Opening the same file externally in PDF Expert does not show this problem. Here is an example from the internal viewer:
“Mrs. Friedman tells me that she has met Mr. Foss and Iamlookingforwardtoseeinghimagain. Ialsohadhoped to get a glimpse of Mr. Hinsley, but so far he has been kept pretty busy.”
In PDF Expert, the same selection looks like this:
“Mrs. Friedman tells me that she has met Mr. Foss and I am looking forward to seeing him again. I also had hoped to get a glimpse of Mr. Hinsley, but so far he has been kept pretty busy.”
The OCRed PDF file can be downloaded here:
The original PDF file, which has a text layer but with errors, can be downloaded here:
As a test, I OCR’ed this file again in my own copy of ABBYY FineReader PDF, Version 15.2.14 (Build 1093333; Part # 1418.21), and imported that copy into DEVONthink 3. When I select the same text in this copy in the internal viewer, I get exactly the same problem:
“Mrs. Friedman tells me that she has met Mr. Foss and Iamlookingforwardtoseeinghimagain. Ialsohadhoped to get a glimpse of Mr. Hinsley, but so far he has been kept pretty busy.”
In PDF Expert, the selected text is perfect.
One thing I noticed is that the internally OCR’ed copy is 443.8 KB, while the externally OCR’ed copy is only 56.5 KB. The original PDF is 75.9 KB. I suppose this might be due to OCR settings etc. My settings for ABBYY FineReader PDF are: Text under the page image, no compression, and Balanced image quality.
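In case it helps with reproducing this, here is a minimal Python sketch for listing exactly which words get run together when comparing the two selections quoted above. It assumes both selections are pasted in by hand as plain strings and that they differ only in missing spaces; the function name `find_merged_words` is my own and not part of any tool mentioned here.

```python
def find_merged_words(good: str, bad: str) -> list[tuple[str, list[str]]]:
    """Return (merged_token, original_words) pairs found in `bad`.

    Assumes `bad` is the same text as `good` with some spaces dropped.
    """
    good_tokens = good.split()
    merged = []
    i = 0  # position in good_tokens
    for token in bad.split():
        parts = []
        joined = ""
        # Consume words from the good text until they cover this token.
        while i < len(good_tokens) and len(joined) < len(token):
            parts.append(good_tokens[i])
            joined += good_tokens[i]
            i += 1
        if joined != token:
            raise ValueError(f"texts differ by more than spacing near {token!r}")
        if len(parts) > 1:
            merged.append((token, parts))
    return merged


if __name__ == "__main__":
    # Shortened excerpts of the two selections from this post.
    pdf_expert = "Mr. Foss and I am looking forward to seeing him again. I also had hoped to get"
    internal = "Mr. Foss and Iamlookingforwardtoseeinghimagain. Ialsohadhoped to get"
    for token, words in find_merged_words(pdf_expert, internal):
        print(f"{token} <- {' '.join(words)}")
```

The comparison walks both texts in order, so it only works when the two selections cover the same passage; if the texts differ in anything other than spacing, it raises an error instead of guessing.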
That’s an issue with macOS’s PDFKit; it’s the same in Preview.app. PDF Expert uses its own engine.
Many thanks for the quick answer. That explains it all. PDFKit once again. My greatest wish is not for a new MacBook Pro with an M4 chip, but rather a PDFKit Pro.
You and us both.