OCR'ing a PDF+Text document mangles the document


I have a PDF+Text document that gets completely mangled when I run it through DT3’s OCR. Since it’s a bank statement, I’m not going to upload the original PDF, only the result of the OCR process: Kontoauszug_1063145302_Nr_2019_008_per_2019_12_30 Kopie.pdf (12.3 KB)

Note: in this case, the document was already PDF+Text when I downloaded it from the bank. If I OCR a PDF in DT3 that was already OCR’ed in DT3 before, it works OK – the file is legible and apparently identical to the first one.

The next release will include an updated engine. However, without the original document it’s hard to tell whether this engine will fix the issue. If you’re interested in a beta, just send me an email.