Most probably just the Vision framework. Not very secret and not bad, but far from perfect: it often gets the order of text chunks on the same line wrong.
If that’s the Textify available on GitHub, it is but an interface to Tesseract. I didn’t have the time to look into it. As @cgrunenberg and @MsLogica said, it may well be that it does not add a text layer to the PDF but just extracts the OCR’d text from the PDF to be used somewhere else.
Check out “Use Tesseract to add a text layer to a PDF via OCR” on GitHub to see how to add a text layer to the PDF with Tesseract. Perhaps that gets you a better result.
Alternatively, there is OCRmyPDF: see “Installing OCRmyPDF” in the ocrmypdf 17.4.2 documentation.
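In case it helps, here is a minimal sketch of how OCRmyPDF could be driven from Python once it is installed. It assumes the `ocrmypdf` command is on your PATH and that the Tesseract Arabic language data (`ara`) is installed. The function names, the `mode` parameter and the file names are made up for illustration; only the command-line flags come from OCRmyPDF itself.

```python
import shutil
import subprocess

# OCRmyPDF flags that decide what happens to pages that already contain text.
_MODES = {
    "skip": "--skip-text",   # leave pages with existing text untouched
    "redo": "--redo-ocr",    # try to replace an existing hidden OCR layer
    "force": "--force-ocr",  # rasterize every page and OCR it again
}


def build_ocrmypdf_command(src, dst, language="ara", mode="skip"):
    """Return the ocrmypdf command line as a list (hypothetical helper)."""
    if mode not in _MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    return ["ocrmypdf", "-l", language, _MODES[mode], src, dst]


def add_text_layer(src, dst, language="ara", mode="skip"):
    """Run ocrmypdf on src and write the searchable PDF to dst."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf is not installed or not on PATH")
    # check=True raises CalledProcessError if ocrmypdf exits with an error.
    subprocess.run(build_ocrmypdf_command(src, dst, language, mode), check=True)


if __name__ == "__main__":
    # Hypothetical file names; "redo" replaces a poor existing OCR layer.
    print(build_ocrmypdf_command("scan.pdf", "scan-ocr.pdf", mode="redo"))
```

If the PDF already carries a poor text layer from an earlier OCR pass, `mode="redo"` or `mode="force"` is probably what you want. `"skip"` only fills in pages that have no text at all.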
That’s strange. I can copy correct Arabic text from the file, and to me that is what “searchable” means.
And yes, it seems I had the option “Settings > Files > Import > Recognition > Make text in PDF documents searchable” enabled. I’ve now disabled it, and I’ll wait for the results in future tests.
Thank you.
Yes, that seems to be what’s actually happening, and I didn’t know about it! I assumed that if a file can be searched in Preview, it is “searchable” in every other application too!
Thank you so much for taking the time to search for tools that can help me. I will try them when I have free time over the next two weeks and will report the results here, to share the experience.
I will try following the steps in the two links you provided, hoping to achieve the desired results.
Thank you.
I tested this, and after excluding more than 1.5 million words, the preferences file reached 50 MB. As you predicted, DEVONthink became very slow and sluggish.
I have now restored my old preferences, and the app is back to its normal speed. I agree: massive exclusion is not the way to go. I will focus on better OCR for my Arabic files instead.