DEVONthink 4.1, Mac mini M1 running macOS Tahoe 26.0.1
Having added around 600 words to the OCR custom dictionary, I reran the DT OCR on 34 PDF documents (I selected all 34 in the group, right-clicked for OCR and let it run).
Once OCR processing had completed for the whole group, I reinspected the PDFs and was very pleased with the updated results from the OCR custom dictionary. However, one PDF in particular had an absolute corker of an OCR experience.
This is a section of a page from the PDF:
It’s not the best for OCR, but I would have thought that it wouldn’t have caused too much of an issue for DT4/ABBYY.
This is what the text layer contained after the OCR process:
RGO a-y gyp!vmo ecfdyvBqh qw [m!’hmcBfT 2nh’yqc JmdBm!c MO nym-vT
mmryf B-y wqooqp!c’ qffyhamB!qcf hy’mhp!c’ B-y f!vrcyff !c B-y pydpBf pnh!c’
B-y vymh ncpyh hydqh
(The remainder of the text layer from this particular PDF is very similar)
When I reran the OCR process on just that one file, making no changes to the file or any settings and not even restarting DT, I eventually received the following text layer:
The Medical Inspector of Emigrants, Surgeon-Captain A. Leahy,
makes the following observations regarding the sickness in the depdts during
the year under report
This PDF is not much different from the others in terms of size, page count or character/word count.
I’ve checked the remaining documents and they are fine.
What happened during the first OCR run on this document that was apparently fixed by the second OCR run?
Cheers!
dp
