OCR Quality with ABBYY engine vs FineReader app

I know that @aedwards is working with ABBYY on the FineReader engine problems that have been widely reported on this forum.

Just to provide a bit of perspective and suggest an area for improvement, I OCR’d the same file (a statement provided by my bank as a PDF without a text layer) with the FineReader app and with DT3’s ABBYY engine. Visual quality is identical to my eye, but the DT3 result was 3.9 MB while the FR result was 138.4 KB. Versions: DT 3.5.1, FR 12.1.13.

I know that the engine is not the full version, but I would hope that ABBYY can bring the performance closer in terms of file size without sacrificing quality.

FRP settings (scripted):

languages enum langEnglish
saving type saveSettingsSingleFile
image quality imageQualityBalanced
export mode pdfLayoutTextUnder with keep page numbers headers and footers and keep pictures
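
For anyone who wants to repeat the size comparison on their own files, here is a minimal sketch in Python. The function names are hypothetical, and the byte counts assume the sizes quoted above are in decimal units (1 MB = 1,000,000 bytes), which Finder typically uses but which I have not verified for these two files.

```python
import os


def size_ratio(larger_bytes: int, smaller_bytes: int) -> float:
    """Return how many times larger the first size is than the second."""
    if smaller_bytes <= 0:
        raise ValueError("smaller_bytes must be positive")
    return larger_bytes / smaller_bytes


def compare_files(path_a: str, path_b: str) -> float:
    """Compare two files on disk by size (path_a relative to path_b)."""
    return size_ratio(os.path.getsize(path_a), os.path.getsize(path_b))


if __name__ == "__main__":
    # Sizes reported in the post, taken as decimal units (an assumption).
    dt3_bytes = 3_900_000  # 3.9 MB, DT3 ABBYY engine output
    fr_bytes = 138_400     # 138.4 KB, FineReader app output
    print(f"DT3 output is about {size_ratio(dt3_bytes, fr_bytes):.0f}x larger")
```

To compare real files, call `compare_files` with the paths to the two OCR'd PDFs.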

Larger file sizes usually result from the file being resaved after OCR, which can be due to:

  • Transferring annotations
  • Adding metadata

ABBYY compresses PDFs better than Apple’s PDFKit does. However, the next update fixes an issue where the PDF file was resaved when it did not need to be, so it should produce a similar file size for your test file.

Thank you, Alan. Looking forward to the revision.

Alan, if you are in charge of OCR for DT, take a look at MRC compression under Bug (Big) Sur. It completely fails to display any PDF compressed with MRC… Even apps that use an SDK other than Apple’s (like PDF Expert) fail to show any text, only backgrounds. @aedwards

Thanks, I will take a look.

@aedwards, Beta 3 resolves the issue with MRC compression.