Video and Audio Transcription

Is there a way to let DEVONthink make transcriptions of audio and video files?

So they are searchable…

Ideally when you click on a sentence in the transcription you go to that point in the video or audio file…

It would be an awesome addition to DEVONthink itself. But I’m looking for a way to do it now.
Is there good Mac software that can do it on-device?

Maybe an automation between that software and DEVONthink?

Welcome @pjc9

Sorry but no, DEVONthink will not transcribe files for you.
Using an annotation file (See Tools > Inspectors > Annotations & Reminders), you can create your own transcriptions.

Maybe an automation between that software and DEVONthink?

This depends on whether the third-party application has provided any inter-application communication functions. Many apps nowadays don’t.

There are several online services (see e.g. Add Automatic Audio Transcription -- Similar Approach As OCR), but DEVONthink doesn’t currently support any of them.

Automated audio/video transcriptions would be incredible. +1

I have worked with video + annotation files over the last few months. A transcription with links would be nice, but I’d also be happy if Insert Back Link produced a link that jumps to the current timestamp ;D

What kind of file did you annotate?

This is already possible using the Annotations button > Insert Back Link command.

Oh that’s nice! Thanks for the heads up.

You’re welcome :)

Would it be very hard to tap into Apple’s Speech framework for parsing audio files?

I know, the recognition is not the greatest, but at least it would help finding content.

Define “hard”. AFAICT, it is not possible with scripting (at least not for me). So someone would have to write a Swift (or perhaps Objective-C) program.

There might be examples of such programs available on the web. I didn’t check, though.
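
For anyone who wants to experiment, here is a rough, untested sketch of what such a Swift program might look like. It assumes Apple's Speech framework (`SFSpeechRecognizer` with an `SFSpeechURLRecognitionRequest`) on macOS 10.15 or later. The names `formatTimestamp`, `transcriptLines` and `transcribe(fileAt:)` are made up for illustration, as are the error domain and the default `en-US` locale. Nothing here is part of DEVONthink.

```swift
import Foundation
#if canImport(Speech)
import Speech
#endif

/// Formats a time offset in seconds as "mm:ss" (hypothetical helper).
func formatTimestamp(_ seconds: TimeInterval) -> String {
    let total = Int(seconds.rounded(.down))
    // Zero-pad single-digit values so lines sort and align nicely.
    let pad = { (n: Int) -> String in n < 10 ? "0\(n)" : "\(n)" }
    return "\(pad(total / 60)):\(pad(total % 60))"
}

/// Turns (offset, text) pairs into one searchable line per segment.
func transcriptLines(_ segments: [(offset: TimeInterval, text: String)]) -> [String] {
    return segments.map { "[\(formatTimestamp($0.offset))] \($0.text)" }
}

#if canImport(Speech)
/// Transcribes an audio file and hands timestamped lines to `completion`.
/// Assumption: the caller keeps the process (run loop) alive until then.
@available(macOS 10.15, iOS 13, *)
func transcribe(fileAt url: URL,
                locale: Locale = Locale(identifier: "en-US"),
                completion: @escaping (Result<[String], Error>) -> Void) {
    SFSpeechRecognizer.requestAuthorization { status in
        guard status == .authorized,
              let recognizer = SFSpeechRecognizer(locale: locale),
              recognizer.isAvailable else {
            // Hypothetical error domain, used only in this sketch.
            completion(.failure(NSError(domain: "TranscribeSketch", code: 1, userInfo: nil)))
            return
        }
        let request = SFSpeechURLRecognitionRequest(url: url)
        request.shouldReportPartialResults = false
        // Prefer on-device recognition where the locale supports it.
        if recognizer.supportsOnDeviceRecognition {
            request.requiresOnDeviceRecognition = true
        }
        // Note: the task is not retained here; a real tool should keep a reference.
        _ = recognizer.recognitionTask(with: request) { result, error in
            if let error = error {
                completion(.failure(error))
                return
            }
            guard let result = result, result.isFinal else { return }
            // Each segment carries its offset into the audio, in seconds.
            let segments = result.bestTranscription.segments.map {
                (offset: $0.timestamp, text: $0.substring)
            }
            completion(.success(transcriptLines(segments)))
        }
    }
}
#endif
```

The per-segment timestamps are what would make "click a sentence, jump to that point" feasible in principle. As far as I know, the app also needs a speech recognition usage description in its Info.plist before authorization can be granted, so a bare command-line script may not be enough. The resulting lines could then be pasted into an annotation file by hand.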

Welcome @hansdorsch

I can’t speak to the level of difficulty. However, “the recognition is not the greatest”, if an accurate assessment, is not a standard we’d be comfortable with. We strive to provide a much better experience than “not the greatest”, as far as we are currently able. The other issue is that a poor implementation would increase the support burden, not only for tech support but for development as well.

That all being said, we appreciate the suggestion and will take a look at it.

Maybe I shouldn’t have mentioned the recognition quality, because there is no 100% accurate automatic transcription, and there probably never will be.

For me, the level of accuracy the Apple framework delivers is definitely “good enough” and way better than “not at all”.

I have been using an app called JustPressRecord for a couple of years now. It lets me record audio and automatically transcribes it.

The quality depends on different factors, mostly on the audio quality.
I use it for voice memos of up to a couple of minutes.
If the transcription fails to recognize a word, I can always listen back to the audio and correct it.

The transcription uses the Apple framework, works on-device, and is free (the app is a one-time payment).

For interviews and podcast transcription, I use a web service called Sonix. It is more accurate and gives me an editor that links the audio to the text. It is charged by the minute and works in the browser.

My main use case for transcription would be in the iPhone app:

Long-press the app icon > “New Media document” > “audio note”.

Then I would record whatever I want to remember and tap “done”.

Later, on iOS or Mac, I would use an option similar to “OCR”, but called “transcribe audio”, and get the transcription as a comment or annotation.

This would be just enough and would make the audio notes so much more useful.

No worries and thanks for the clarification.
Interesting ideas for us to consider.

