Characters copied from a PDF change when pasted into a plain-text document

Hi
I have a PDF document with non-English characters. When I copy part of the text and paste it into a new plain-text document, the non-English characters change.

This happens only when the source document was OCR’ed inside DEVONthink, not with documents created as PDF/A.

Any suggestions on how to deal with this?

It’s unclear what’s happening without a copy of the document. Does the same happen when using e.g. Preview & TextEdit?

OK - here is a ‘before’ and ‘after’:
DT original
DT plain text

A technical note: The words you see in the document don’t exist before OCR, except in your mind. It’s a picture of words. OCR is never 100% accurate and there is no perfect OCR engine.

That being said, what are your OCR preferences in DEVONthink?

Yes - I remind myself that this is the case: no words but pictures of words. However, this happens only with some OCR-ed files. Others seem to be OK.

These are not texts I scanned myself, but files downloaded from the internet, which must have been scanned at some point.

Can you ZIP a problematic PDF, then start a support ticket and attach the PDF for us to inspect? Thanks!

Try this Shortcut to “extract” text from a selected area of the screen, utilizing one of macOS’s built-in services. From my experience working with Chinese text, this shortcut always gives better results than the text layer of the PDF itself.
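
To narrow down what actually changes in the pasted text, it may help to list the code points of the non-ASCII characters and check whether the two versions differ only in Unicode normalization (for example, a precomposed "é" versus "e" followed by a combining accent) or whether the text layer contains different characters altogether. The following is a minimal sketch in Python using only the standard library; the function names `describe_chars` and `differs_only_by_normalization` and the sample strings are made up for illustration and are not taken from the files in this thread.

```python
import unicodedata


def describe_chars(text):
    """Return (char, code point, Unicode name) for every non-ASCII character."""
    rows = []
    for ch in text:
        if ord(ch) > 127:
            name = unicodedata.name(ch, "UNKNOWN")
            rows.append((ch, f"U+{ord(ch):04X}", name))
    return rows


def differs_only_by_normalization(a, b):
    """True if the two strings are equal once both are normalized to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Hypothetical samples: precomposed "é" vs. "e" + combining acute accent.
original = "caf\u00e9"
pasted = "cafe\u0301"

for row in describe_chars(pasted):
    print(row)
print(differs_only_by_normalization(original, pasted))
```

If the two strings match after normalization, the paste is probably just using a different but equivalent encoding of the same characters. If they do not match, the OCR text layer itself likely holds the wrong characters, and re-running OCR or extracting the text another way is the more promising route.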