Apple "Live Text" / Vision Framework OCR

chrillek · February 5, 2022, 10:25pm

Right. I’m aware of the abilities of the Vision framework for short pieces of text, and that’s what I use, too.
But I have no idea how well it handles a real document, for example a normal two-column text from a newspaper. And I haven’t heard anything about it in that context.
Also: does it work locally or does it require an Internet connection?
Edit: Answering this last question, Apple claims that everything happens on the local device. And Apple also keeps suspiciously mum about anything remotely resembling OCR in documents. They’re only talking about images, and only about apparently very short sequences of text (business cards and the like).

And just for the heck of it, here is some sample JavaScript code that uses the Vision framework for text recognition. I tried it with two JPEGs, i.e. real photos, not with any PDFs yet. The script can be run with osascript -l JavaScript <filename> or copy/pasted into Script Editor and run there.

I tried it out with a JPEG that was converted from PDF in Preview, and the results were actually quite good. Only a table derailed it a bit, but that is to be expected. So the script could be used for OCR, but it would require amendments for PDFs: they usually consist of more than one page, so the script would have to loop over all pages, converting each one to an NSImage object and then running character recognition on it.

ObjC.import('Foundation');
ObjC.import('Vision');


(() => {
  const error = $();
  
  const directory = "/Path/to/folder/with/images";
  const images = ["Image1.png","Image2.png"];
  images.forEach(i => {
    // Join folder and file name with a slash; "directory" has no trailing one.
    const path = `${directory}/${i}`;
    const fileURL = $.NSURL.fileURLWithPathIsDirectory(path, false);
    
    const request = $.VNRecognizeTextRequest.alloc.init;
    request.setRecognitionLanguages(ObjC.wrap([$.NSString.alloc.initWithString('de-DE')]));
    const reqArray = $.NSArray.arrayWithObject(request);
    
    const imageRequestHandler = $.VNImageRequestHandler.alloc.initWithURLOptions(fileURL,{});
    
    const success = imageRequestHandler.performRequestsError(reqArray, error);
    if (!success) {
      console.log($(error.localizedDescription).js)
    } else {
      const successArray = request.results.js;
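      successArray.forEach(segment => {
        // Each result is a VNRecognizedTextObservation; topCandidates(1) is the
        // documented way to get its best transcription.
        const candidates = segment.topCandidates(1).js;
        if (candidates.length > 0) console.log(candidates[0].string.js);
      })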
    }
  })
})()
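
The PDF amendment described above (loop over all pages, turn each into an image, run recognition on it) might look roughly like the following. This is only a sketch and I have not run it against real PDFs. The names ocrAllPages, renderPage and recognize, as well as the path /Path/to/document.pdf, are placeholders of mine. It assumes that PDFKit is reachable via ObjC.import('Quartz'), and that rendering a page through NSImage and its TIFFRepresentation gives Vision a usable bitmap; the default resolution may well be too coarse for small print. The page loop is kept free of any Objective-C calls, so it can be checked separately from the Vision part.

```javascript
// Run `recognize` on every page of a document and collect the lines per page.
// renderPage(i) returns image data for page i, recognize(data) an array of strings.
function ocrAllPages(pageCount, renderPage, recognize) {
  const pages = [];
  for (let i = 0; i < pageCount; i++) {
    pages.push(recognize(renderPage(i)));
  }
  return pages;
}

// The Objective-C bridge only exists in JXA (osascript / Script Editor).
if (typeof ObjC !== 'undefined') {
  ObjC.import('AppKit');
  ObjC.import('Quartz'); // assumption: this makes PDFKit's PDFDocument available
  ObjC.import('Vision');

  const pdfPath = '/Path/to/document.pdf'; // placeholder
  const doc = $.PDFDocument.alloc.initWithURL($.NSURL.fileURLWithPath(pdfPath));

  // A page's dataRepresentation is a one-page PDF; NSImage can load that,
  // and TIFFRepresentation turns it into bitmap data.
  const renderPage = i => {
    const pageData = doc.pageAtIndex(i).dataRepresentation;
    return $.NSImage.alloc.initWithData(pageData).TIFFRepresentation;
  };

  // Same recognition steps as in the script above, but fed from NSData.
  const recognize = imageData => {
    const request = $.VNRecognizeTextRequest.alloc.init;
    const handler = $.VNImageRequestHandler.alloc
      .initWithDataOptions(imageData, $.NSDictionary.dictionary);
    const error = $();
    const ok = handler.performRequestsError($.NSArray.arrayWithObject(request), error);
    if (!ok) return [];
    return request.results.js
      .map(observation => observation.topCandidates(1).js)
      .filter(candidates => candidates.length > 0)
      .map(candidates => candidates[0].string.js);
  };

  ocrAllPages(Number(doc.pageCount), renderPage, recognize).forEach((lines, i) => {
    console.log(`--- page ${i + 1} ---`);
    lines.forEach(line => console.log(line));
  });
}
```

Tables and multi-column layouts would presumably still come out in whatever order Vision returns the segments, so this does nothing about the table problem mentioned above.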