I have about 30,000 page images in formats including 1-bit TIFF (scanned from microfilm), JPG, and PNG. Since upgrading to Ventura (13.0) I have noticed that a Finder search (which uses Spotlight) will bring up images that contain the word or phrase I search for, whether the text is typed, typeset, or, in many cases, handwritten in cursive or block characters.
What I have found is that this is tied to Spotlight, and so far it seems that the files must be on the internal SSD (2 TB) of my MacBook Pro (2020, Intel 6-core i7). My external SSD is not being indexed by Spotlight.
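Whether Spotlight indexing is switched on for a given volume can be checked with the `mdutil -s` command. The sketch below is a minimal Python wrapper around it, not a definitive tool: the volume path `/Volumes/Scans` is hypothetical, and the status wording it looks for ("Indexing enabled.") is an assumption about what `mdutil` prints, which may differ between macOS versions.

```python
import subprocess


def indexing_enabled(mdutil_output):
    """Return True if the text printed by `mdutil -s` reports indexing as enabled.

    Assumption: the status line contains the phrase "Indexing enabled".
    """
    return "indexing enabled" in mdutil_output.lower()


def volume_status(volume):
    """Run `mdutil -s <volume>` (macOS only) and return everything it printed."""
    result = subprocess.run(
        ["mdutil", "-s", volume],
        capture_output=True,
        text=True,
        check=False,  # mdutil may exit non-zero for volumes it cannot inspect
    )
    return result.stdout + result.stderr


# Example use on a Mac (hypothetical volume name):
#   print(indexing_enabled(volume_status("/Volumes/Scans")))
```

If this reports the external volume as disabled, `sudo mdutil -i on /Volumes/Scans` is the usual way to turn indexing on. Whether the text inside images then becomes searchable on an external drive is not something this sketch can confirm.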
It is possible to open one of these images in Preview, choose the text selection tool, and select and copy the text. As with most OCR, the result is so-so, especially with handwriting. But what impresses me is that a search for a word like “estate” will bring up image matches that contain the word, even in cursive, and even if it visually appears to be two words. When I copy and paste, though, the identification of the correct letters is not quite there. Clearly some fuzzy matching is going on to bring up the file names in the search.
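The same kind of search can be tried from a script with `mdfind`, the command-line front end to Spotlight. This is only a sketch under assumptions: the folder path is hypothetical, the list of image extensions simply mirrors the formats mentioned above, and it is not confirmed here that `mdfind` returns exactly the same image-text matches as a Finder search.

```python
import subprocess
from pathlib import Path

# Extensions for the image formats mentioned in this post.
IMAGE_SUFFIXES = {".tif", ".tiff", ".jpg", ".jpeg", ".png"}


def build_mdfind_command(term, folder):
    """Build an mdfind command that searches for `term` only under `folder`."""
    return ["mdfind", "-onlyin", folder, term]


def image_paths(mdfind_output):
    """Keep only the result lines whose file extension is an image type."""
    return [
        line
        for line in mdfind_output.splitlines()
        if Path(line).suffix.lower() in IMAGE_SUFFIXES
    ]


def search_images(term, folder):
    """Run the search (macOS only) and return the matching image paths."""
    result = subprocess.run(
        build_mdfind_command(term, folder),
        capture_output=True,
        text=True,
        check=True,
    )
    return image_paths(result.stdout)


# Example use on a Mac (hypothetical folder):
#   for path in search_images("estate", "/Users/me/Scans"):
#       print(path)
```

Filtering by extension after the search keeps the query itself as plain as what one would type into the Finder search field.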
Since the external SSD is not indexed, I made room to copy 22 GB of image files to the internal SSD. After about a day, search results began to appear. I don’t yet know whether there is a way to see the progress of the indexing, so I can't tell when all documents have been processed. But my keyword searches are bringing up hundreds of documents that contain the word, which is a desirable feature. (Well, if it is not documented, maybe it is a useful bug.)
When looking for logs to show that a file was processed, I did a grep for a file that seemed to be indexed and found a reference in a binary log file in
/var/log/DiagnosticMessages
I don’t know how a file like this is normally read, but grep did find a match, so there is some reference to the image file in it. I just don’t know what it means.
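One way to see what surrounds a grep match in a binary file is to pull out the runs of printable characters, much as the `strings` utility does, and keep only the runs that mention the file name. The sketch below does that in Python. The log file name and the image name in the usage comment are hypothetical, and this shows only the readable fragments, not the log's actual record structure.

```python
import re


def printable_runs(data, min_len=4):
    """Return runs of printable ASCII at least `min_len` bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.decode("ascii") for match in pattern.findall(data)]


def runs_mentioning(path, needle, min_len=4):
    """Read a binary file and return the printable runs that contain `needle`."""
    with open(path, "rb") as handle:
        data = handle.read()
    return [run for run in printable_runs(data, min_len) if needle in run]


# Example use (hypothetical log and image names):
#   for run in runs_mentioning("/var/log/DiagnosticMessages/some.asl", "page_0042.tif"):
#       print(run)
```

Reading files under /var/log may require administrator rights, so the script might need to be run with sudo.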
A conversation today with two levels of Apple support indicated that this is a feature still in development and that it is not well documented, whether internally, for developers, or for the public. Nevertheless, it is in the system, even on Intel Macs, and seems to work reasonably well.
Obviously this is a supplement to DT Pro. It is another tool to access our documents and may be of interest.
Sorry to reply to a year-old thread but it seemed to be the most relevant place to do so short of starting a new one.
James D. Keeline