Add Automatic Audio Transcription -- Similar Approach As OCR

chrillek · February 20, 2022, 5:20pm

It is a limitation according to the spec – audio is processed in „segments“

AW2307 · February 20, 2022, 7:03pm

Yes, but in practice you select an audio file of whatever length and it just gets transcribed without interruption. The only limitation is that every page or so (1 minute audio) there are separators (===).

chrillek · February 21, 2022, 7:01am

In the meantime, I came up with a JXA script (that does not work). I post it since perhaps someone else might want to fiddle around with it and can get it to work.
Problem: The handler function (which should be called with the recognized “chunks” of audio by the recognizer) is never ever called. That’s probably due to some glitch (or feature?) in the ObjC bridge. Or it is simply not possible to use callback functions in JXA – who knows. It would be nearly trivial to translate that to AppleScript, if somebody knows how to get that to work as expected.

ObjC.import('Foundation');
ObjC.import('Speech');

function handler(result, error) {
  const transcription = result.bestTranscription.formattedString.js;
  console.log(transcript);
  if (result.isFinal) {
    console.log(`${result.bestTranscription.formattedString}`);
  }
}

/* Fix the path to a speech sample */
const fileURL = $.NSURL.fileURLWithPath(ObjC.wrap('/Users/.../OSR_us_000_0010_8k.wav'));
const locale = $.NSLocale.localeWithLocaleIdentifier(ObjC.wrap('en-US'));
const recognizer = $.SFSpeechRecognizer.alloc.initWithLocale(locale);
const request = $.SFSpeechURLRecognitionRequest.alloc.initWithURL(fileURL);
recognizer.recognitionTaskWithRequestResultHandler(request, handler);

pete31 · February 23, 2022, 3:55am

No idea as SFSpeechRecognizer is only available in macOS 10.15+ and I’m still using Mojave. But I’ll try it as soon as I got a new mac.

chrillek · February 23, 2022, 7:59am

Don’t get your hopes up. Quoting from “Everyday AppleScript-ObjC”:

The biggest no-go zone is methods that take what are called blocks as arguments. A block is a kind of in-line function, and so the methods expect something like a chunk of Objective-C code. There is no way AppleScriptObjC can provide that.

Alternatively, one could try to use the other recognitionTask method that requires a delegate. I tried that, too, but managed only to send Script Editor and osascript into lala land – they appeared to be doing something and never came back. Didn’t find anything on delegates in the a.m. book, though. I found this discussion in the Apple forums: AppleScriptObjC help button - Apple Community
Will try to figure that out.

chrillek · February 23, 2022, 3:12pm

Below you’ll find my probably very bad attempt at doing that in AS. Note that I try to use a delegate to process the speech. The code compiles and kind of runs, but none of the delegate’s methods is ever called.
Note that the task state changes from 0 (starting) to 2 (finishing), but … well, the problem seems to be the delegate. I guess that the method names are not ok, but I don’t know … Maybe it’s simply not possible because the Speech framework is so recent and ObjC is old.

use framework "Foundation"
use framework "Speech"

script MyDelegate -- this will contain any delegate methods
	
	on speechRecognitionDidDetectSpeech:task -- handle a help
		log "Detected speech"
		return true -- the help request has been handled
	end speechRecognitionDidDetectSpeech:
	
	on speechRecognitionTaskFinishedReadingAudio:task
		log "Finished reading audio"
		return true
	end speechRecognitionTaskFinishedReadingAudio:
	
	on speechRecognitionTaskDidHypothesizeTranscription:task transcription:t
		log "Hypothesize transcription"
		return true
	end speechRecognitionTaskDidHypothesizeTranscription:transcription:
	on speechRecognitionTaskDidFinishRecognition:task recognition:r
		log "Did finish recognition"
		return true
	end speechRecognitionTaskDidFinishRecognition:recognition:
	on speechRecognitionTaskDidFinishSuccessfully:task success:s
		log "finished successfully" & success
		return true
	end speechRecognitionTaskDidFinishSuccessfully:success:
	on speechRecognitionTaskWasCancelled:task
		log "cancelled"
		return true
	end speechRecognitionTaskWasCancelled:
end script

set filename to "/Users/ck/Downloads/OSR_us_000_0010_8k.wav"
set fileURL to current application's NSURL's fileURLWithPath:filename
log fileURL as string
set locale to current application's NSLocale's localeWithLocaleIdentifier:"en-US"
set recognizer to current application's SFSpeechRecognizer's alloc's initWithLocale:locale
set recognizer's delegate to MyDelegate
set request to current application's SFSpeechURLRecognitionRequest's alloc's initWithURL:fileURL

set task to recognizer's recognitionTaskWithRequest:request delegate:MyDelegate
set theError to task's |error|()
log theError
log task's state as string
tell task
	finish()
end tell
log task's state as string
set theError to task's |error|()
log theError

pete31 · February 23, 2022, 7:22pm

Never done that and can’t test, so I’m afraid I can’t help.

Huh? Where did you get the names from? Other than looking them up in XCode (or in the online documentation) I’ve no idea. sorry.

chrillek · February 23, 2022, 9:07pm

The names are from the documentation. And yes, I know you can’t test that now. Just wanted to get it off my chest.

wotiva · April 8, 2025, 3:37am

Interesting to see how long this has been thought about. Great solution.