Video and Audio Transcription

I use DaVinci Resolve, and agree that its new transcription capabilities are brilliant. I also make audio recordings of interviews that I run through Otter.ai to transcribe, then drop an exported PDF of the transcription into DT to work from when I’m writing.

I’d be absolutely delighted if DT had DaVinci-like transcription capability. As AI becomes ubiquitous (for better or worse), I’m becoming more grudging about paying the Otter.ai sub and have been looking for other solutions. I’d been waiting for the new transcription capability in Rogue Amoeba’s app Audio Hijack to mature - it seems a bit beta at the moment - but if Christian says transcription is coming to DEVONthink, I’ll hang out for that. Impatiently.

When this is added, will it force a new floating window? I’m hoping it won’t, but instead will appear as a new pane within the document view. I ask because a new window would break the two-window layout I’m using with documents:

It would be ideal for me if the transcript were available as a new pane within the document window so that it doesn’t force me out of my fixed layout. Perhaps it could occupy the upper or lower half of the media region, with the media player in the other half.

Do we have an expected release date for this version?

Well, there’s one internally but we don’t announce release dates, I’m sorry.

Are we talking months or years?

Likely not years, but also not months.

I’ve been using an app called Aiko for transcriptions (has an iOS app as well as macOS) and it’s amazing. Runs on-device, is very quick, and incredibly accurate.

My question is, what is everyone’s recommendation for where to store those transcriptions? In the file’s comments? As a separate file linked to the original video file?

Would love to hear everybody’s thoughts!

For DEVONthink, multiple options (transcribing to indexed & searchable text, to comments, to annotations, maybe to .srt files) are actually planned. Transcribing to annotations in particular should be quite useful, as the annotations can be easily viewed & edited via the Annotations & Reminders inspector and the timestamps are clickable.
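
To make the .srt option concrete, here is a minimal Python sketch that turns timestamped transcript segments into SubRip text. The `Segment` type, the function names and the sample values are hypothetical and only for illustration; this says nothing about how DEVONthink will actually implement it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One transcribed phrase (hypothetical structure, times in seconds)."""
    start: float
    end: float
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as a SubRip timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments: List[Segment]) -> str:
    """Render segments as numbered SubRip cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.text}\n"
        )
    return "\n".join(cues)


if __name__ == "__main__":
    # Made-up sample data, not taken from a real recording.
    print(to_srt([Segment(0.0, 2.5, "Thanks for joining me."),
                  Segment(2.5, 6.0, "Let's start with your background.")]))
```

Because each cue carries its own start and end time, the same data could in principle also drive clickable timestamps.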

Yes please to a format that allows for clickable timestamps.

Agreed, please don’t make users hunt and peck up and down the playback bar trying to find the source of the text.

Please clarify what you’re describing here.

If you have a very long recording, and you generate a transcript, you still have the problem that, when you find the specific quote you want in the transcript, you then have to search forward and backward through the recording, trying to find where that quote occurs in the recording. This is why I mentioned the DaVinci Resolve implementation, hoping that people will download and try it (Resolve is free). In Resolve, you can click on any word in the transcript and it will jump to the point of the recording where that word occurs, helping the user skip the hunt-and-peck process. It’s a game-changer. Any type of clickable timestamps placed throughout the transcript would be similarly helpful and time-saving.
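
For anyone wondering what sits behind that kind of click-to-jump behaviour, here is a rough Python sketch of the idea, assuming the transcriber emits a start time for every word. The `Word` type and the function names are made up for illustration; this is not a description of how Resolve or DEVONthink actually does it.

```python
import string
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    """A transcribed word with its start time in seconds (hypothetical)."""
    text: str
    start: float


def _normalize(token: str) -> str:
    """Lower-case a token and strip surrounding punctuation."""
    return token.strip(string.punctuation).lower()


def find_quote(words: List[Word], quote: str) -> Optional[float]:
    """Return the start time of the first occurrence of quote, or None."""
    target = [_normalize(t) for t in quote.split()]
    if not target:
        return None
    tokens = [_normalize(w.text) for w in words]
    for i in range(len(tokens) - len(target) + 1):
        if tokens[i:i + len(target)] == target:
            return words[i].start  # the player would seek to this time
    return None


def word_index_at(words: List[Word], playback_time: float) -> Optional[int]:
    """Return the index of the word being spoken at playback_time.

    Assumes words are sorted by start time; useful for highlighting
    the current word while the recording plays.
    """
    if not words:
        return None
    starts = [w.start for w in words]
    return max(bisect_right(starts, playback_time) - 1, 0)
```

The first function covers the "I found the quote, now where is it?" case; the second covers the reverse direction, from playback position back to the transcript.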

It feels like DT is approaching a place where, should this transcript-to-video become a reality, @BLUEFROG, @cgrunenberg, et al. are going to have to prepare some standard responses to
“Why can’t I edit clips together in DT?”

In the text world, you can cut and copy text from any number of formats (RTF, md, docx) and cobble them together in another new doc.

Video editing works entirely differently from making a new text doc. (A sequence in Resolve, Premiere, or Avid is, in Devonian terms, indexing a ton of files on media hard drives.)

If you think there’s a lot of text editing software and layout formats, please take some time to read about codecs, pixel aspect ratios, and frame rates.

I welcome new features like everyone else but I’ve also lived through all of the video editing packages getting bloated with added functionality and all of them have gone through dark phases of waiting for the available CPU/GPU to catch up to make them stable again.

DT just runs QuickTime Player, at least on my machine. I don’t really see that as a worry; DT is about data organization.

As a TV/film pro I am curious about how you would use the transcript linked to video function.

  • Screening and making notes to be edited in another video editing program?
  • Reporting/writing about the video and the content but never directly using the source video again?
  • Quoting the video in other mediums but maintaining the original copy for legal and reference purposes?

I could see a function of DT where, using a time-stamped linked transcript and associated video file, one could export an AAF or simple EDL of selected clips from a video and those could be imported into any edit platform.
EDL can be a simple text file. Everything else runs into some sort of licensing or IP issue.
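
To show how little is needed for the EDL side, here is a rough Python sketch that writes a CMX3600-style cut list from selected time ranges. Everything in it is an assumption for illustration: the `Clip` type, the function names, an integer non-drop frame rate of 25, the generic reel name `AX`, and a record start of 01:00:00:00. Whether a given editor accepts the result as-is is untested.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Clip:
    """A selected range of a source file (hypothetical, times in seconds)."""
    name: str
    start: float
    end: float


def frames_to_timecode(frames: int, fps: int) -> str:
    """Format a frame count as non-drop-frame timecode HH:MM:SS:FF."""
    total_secs, ff = divmod(frames, fps)
    hh, rem = divmod(total_secs, 3600)
    mm, ss = divmod(rem, 60)
    return f"{hh:02d}:{mm:02d}:{ss:02d}:{ff:02d}"


def to_edl(title: str, clips: List[Clip], fps: int = 25,
           record_start_secs: int = 3600) -> str:
    """Lay the clips end to end on one video track as simple cuts."""
    lines = [f"TITLE: {title}", "FCM: NON-DROP FRAME", ""]
    rec_in = record_start_secs * fps  # assumed timeline start, 01:00:00:00
    for number, clip in enumerate(clips, start=1):
        src_in = round(clip.start * fps)
        src_out = round(clip.end * fps)
        rec_out = rec_in + (src_out - src_in)
        timecodes = " ".join(
            frames_to_timecode(f, fps)
            for f in (src_in, src_out, rec_in, rec_out)
        )
        # Columns: event number, reel, track (V), transition (C = cut), timecodes
        lines.append(f"{number:03d}  AX       V     C        {timecodes}")
        lines.append(f"* FROM CLIP NAME: {clip.name}")
        rec_in = rec_out  # next clip starts where this one ended
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up selections from a single interview file.
    print(to_edl("Interview selects", [Clip("interview.mov", 10.0, 15.0),
                                       Clip("interview.mov", 62.4, 64.0)]))
```

The time ranges would come straight from the timestamped transcript, so the selection step could stay in text while the actual cutting happens in the editor.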

I currently make heavy use of frame links with video and audio files. I use DT for history/sociology research, so my tasks revolve around making RTF files for topics, and then I use links to video, audio, and PDFs to group all the relevant instances for that topic.

I do also make videos, but once I get to that part of my process, I just use timestamps or the DaVinci Resolve transcripts to get to the relevant video/audio sections.

For my part, I mostly transcribe audio interviews. Timestamps are useful when taking notes from those transcripts as a point of reference should I need to double-check a quote or its context some time in the future.

This also applies to video. I don’t really need the jump-to-word capability in video that @EthicalEgret mentions, but timestamps at regular intervals throughout the transcript are very useful.
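
As a small illustration of the regular-interval idea, here is a Python sketch that drops a timestamp marker into a transcript roughly once per interval. The input shape (a list of `(start_seconds, text)` tuples) and the function name are assumptions for the example, not any app's real output format.

```python
from typing import List, Tuple


def with_interval_timestamps(segments: List[Tuple[float, str]],
                             interval: float = 60.0) -> str:
    """Insert a [HH:MM:SS] marker before the first segment of each interval.

    segments must be in playback order. Markers show the actual start
    time of the segment they precede, not the rounded interval boundary.
    """
    lines = []
    next_mark = 0.0
    for start, text in segments:
        if start >= next_mark:
            minutes, seconds = divmod(int(start), 60)
            hours, minutes = divmod(minutes, 60)
            lines.append(f"[{hours:02d}:{minutes:02d}:{seconds:02d}]")
            # Schedule the next marker at the following interval boundary.
            next_mark = (start // interval + 1) * interval
        lines.append(text)
    return "\n".join(lines)
```

A marker every minute or so is usually enough to find a quote again without cluttering the text.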

(I work with video frequently, but the end result is usually a video. Audio recordings are a step towards turning something into print. I seldom make a video of an interview that I’m going to write about, because it’s too much overhead.)

Both DEVONthink and QuickTime Player use the AVKit framework and therefore DEVONthink seems to “run” QuickTime Player but doesn’t.
