ETA for Gemini option for transcription?

soligram · December 11, 2025, 9:12am

Hi folks. It seems that Gemini is currently available (and I’m using it) as an option for all the AI features except transcription. The Apple Speech option isn’t very accurate for the videos I have added to DT - but Gemini is generally very accurate for me (I routinely transcribe client calls so that we can both have a copy).

I’m assuming (please correct me if wrong) that Gemini for transcription is on the roadmap - presumably there’s not much implementation work involved (or at least not much that differs from the other use cases). I’d be very grateful for an ETA before I go off researching Voxtral and Whisper, signing up for accounts, etc.

TIA.

cgrunenberg · December 11, 2025, 9:15am

Thanks for the suggestion, but there are currently no plans for additional transcription services in the near future. Maybe later, depending on demand (as we can’t add and support every option out there), but no promises.

Did you use the local or remote Apple Speech transcription? And which version of macOS do you use?

soligram · December 11, 2025, 10:05am

I completely understand that - I made the assumption as Gemini is already supported for every other AI function in DT (which presumably means the already-implemented methods/classes and UI elements are relatively reusable), and so it seemed odd that it wouldn’t be implemented for transcription as well (especially given the recognised quality of its transcription capability, trained on probably hundreds of millions of Meet calls).

Apologies, I should have specified - the remote Apple Speech model. It’s really not great. I’m on macOS 26.1 and DT 4.1.1 on an M4 MacBook Pro.

cgrunenberg · December 11, 2025, 10:16am

What’s the spoken language of the videos? Did you choose the right one in Settings > AI > Transcription? Apple Speech doesn’t support automatic language detection.

soligram · December 11, 2025, 10:38am

US English (with that selected in prefs), and well-presented - they’re commercial training videos with clear, slow-paced, well-enunciated speech. The kind of stuff Gemini wouldn’t make a single misstep with. The remote Apple model just doesn’t seem very good. Lots of mistakes.