OCR'ing a PDF+Text document mangles the document

chrillek · March 18, 2020, 10:44am

Hi,

I have a PDF+Text document that gets completely mangled when I run it through DT3's OCR. Since it's a bank statement, I'm not going to upload the original PDF, only the result of the OCR process:

Kontoauszug_1063145302_Nr_2019_008_per_2019_12_30 Kopie.pdf (12.3 KB)

Note: in this case, the document was already PDF+Text when I downloaded it from the bank. If I OCR a PDF in DT3 that DT3 had already OCR'ed, it works fine: the result is legible and apparently identical to the first version.

cgrunenberg · March 18, 2020, 10:56am

The next release will include an updated engine. However, without the original document it's hard to tell whether this engine will fix the issue. If you're interested in a beta, just send me an email.