On a side note: the German version uses “Konvertieren/nach durchsuchbares PDF/nach Text”. I find the preposition a bit awkward and would suggest “in” instead of “nach”, which is what DT uses on the desktop.
The OCR engine is Tesseract. The quality slider adjusts the level of compression applied to each page image when generating the final PDF file. In general, around 75% gives a good compromise between file size and quality. Setting the quality value lower will make the PDF file smaller, but depending on the content it can make the text harder to read.
Thank you!
What would you say are the main differences between the ABBYY engine on the Mac and Tesseract on iOS? And can you recommend which one to prefer if both are available?
It does depend on the content that you are OCR’ing. In general, however, ABBYY OCR in DEVONthink 3 will recognise text more accurately across a range of different document types. This is not to say that Tesseract provides poor results; it is just that ABBYY OCR can utilise more system resources on the Mac than are available, or allowed, on an iOS device.
A bit of a niche request, but since DTTG uses Tesseract, would it be possible to add the deu_frak/frk libraries?
I have many German documents printed in Fraktur, and it would be nice to finally OCR them. Neither ABBYY FineReader Pro nor DEVONthink Pro can decipher the typeface. Letting DTTG tackle the task intrigues me.
Out of curiosity: does Tesseract recognize different gothic (if that’s the right term) fonts in general? I’m under the impression that Fraktur encompasses a huge gamut of character shapes. But maybe that’s not a problem or I’m mistaken?
No, generally OCR that’s not trained on Fraktur does quite poorly. Take this sample text, for instance.
The first line reads
“Aktum Dienstags den 3. März 1891”
DEVONthink reads it as
“Mtum Pitnfliigs öen 3. Miar? 1891.”
I expect that ABBYY FineReader Pro would come up with something similarly unhelpful, but it doesn’t work with my new Mac. Tesseract with the Fraktur libraries might be more successful.
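To put a rough number on how far off that reading is, one option is a character error rate: the edit distance between the correct line and the OCR output, divided by the length of the correct line. The sketch below is only an illustration. The function names are made up for this example, and nobody in this thread has said that either OCR engine is evaluated this way.

```python
import unicodedata


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            substitution = previous[j - 1] + (char_a != char_b)
            current.append(min(deletion, insertion, substitution))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, recognized: str) -> float:
    """Edit distance divided by the length of the reference text."""
    # Normalize so that umlauts count as one character each,
    # however they happen to be encoded.
    reference = unicodedata.normalize("NFC", reference)
    recognized = unicodedata.normalize("NFC", recognized)
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(reference, recognized) / len(reference)


# The two lines quoted above: the correct reading and the OCR output.
reference = "Aktum Dienstags den 3. März 1891"
recognized = "Mtum Pitnfliigs öen 3. Miar? 1891."

cer = character_error_rate(reference, recognized)
print(f"Character error rate: {cer:.0%}")
```

Running the same comparison on the output of a Fraktur-trained model would show whether it actually does better on this page.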
I would have thought that iOS could provide just as accurate results at a faster speed if the neural hardware on iOS is used. Perhaps in the future, once ABBYY is optimized for the new M1 Mac hardware, the kit will run fantastically on iOS as well. Either way, it’s mind-blowing to see PDF OCR on a phone in your pocket.
Fraktur is not included in the standard ABBYY supported languages and requires a specialist licence. I had seen that Tesseract supports Fraktur; however, we haven’t had a chance to test it yet. We will be adding support for other languages in future updates and I have noted your request for Fraktur.
I assume you have an M1 Mac. In DEVONthink 3, the ABBYY OCR runs under Rosetta 2.
As far as I understood their GitHub pages, Tesseract should run on your desktop system just fine. So you could actually find out how good the deu_frak rules are on your M1 Mac (it should be blazingly fast). I’m not so sure about the quality of their rules, though. They said something about deu_frak last being updated for version 3-something; 4 is the current one.
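For anyone who wants to try this on the desktop, here is a minimal sketch of calling the Tesseract command-line tool from Python. It assumes that `tesseract` is installed and on the PATH and that the traineddata file for the chosen language code (`frk` or `deu_frak`, the codes mentioned earlier in this thread) is present in its tessdata folder. The helper names and the file name in the usage note are hypothetical.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, output_base, language="frk",
                            searchable_pdf=False):
    """Build the argument list for the tesseract command-line tool.

    Basic usage is: tesseract IMAGE OUTPUTBASE -l LANG [CONFIGFILE]
    """
    command = ["tesseract", image_path, output_base, "-l", language]
    if searchable_pdf:
        # The "pdf" config makes tesseract write OUTPUTBASE.pdf
        # instead of OUTPUTBASE.txt.
        command.append("pdf")
    return command


def run_ocr(image_path, output_base, language="frk", searchable_pdf=False):
    """Run tesseract and return the name of the file it should have written."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract was not found on the PATH")
    command = build_tesseract_command(image_path, output_base,
                                      language, searchable_pdf)
    # check=True raises CalledProcessError if tesseract exits with an error,
    # for example when the requested language data is not installed.
    subprocess.run(command, check=True)
    return output_base + (".pdf" if searchable_pdf else ".txt")
```

Something like `run_ocr("protokoll.png", "protokoll", language="frk")` followed by the same call with `language="deu_frak"` would give two text files to compare side by side.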
@aedwards I’m curious, why Tesseract instead of the Apple VisionKit framework?
I believe Vision is faster and more accurate than Tesseract, but it requires iOS 13+.