I recently wrote a script. It works just fine, but it is way too slow on my MacBook. I wonder whether the newer non-Intel processors would speed it up. So, check it out :)
General installation instructions
- First, install the `openai-whisper` command-line utility. The easiest way is `brew`: in Terminal, type `brew install openai-whisper`. The utility is described on GitHub (https://github.com/openai/whisper). You may want to play around with the parameters and language models, so see the GitHub page and the built-in help: `whisper -h`. The script currently uses the large model (~3 GB). It is downloaded automatically; you only need to specify the model name.
- Check whether you have `ffmpeg` installed. If not, type in Terminal: `brew install ffmpeg`
- Save the script and tweak it as needed:
-- Script to transcribe any media with sound and add the transcription to the Finder comments of the record
-- The language is detected automatically and added to the custom metadata of the record
-- Using openai-whisper cli
-- Created by Silverstone on 17.08.2024
-- 17.08.2024 - Added Timer
use AppleScript version "2.4" -- Yosemite (10.10) or later
use scripting additions
use framework "Foundation"
-- Local variables
set theOutputFolder to "/Users/ilya/Documents/DevonthinkTempItems/" -- Temporary path you will use for transcription TXT files
set ShellPath to "PATH=$PATH:/usr/local/bin:/Users/ilya/.local/bin; " -- Locations where the script looks for the utilities it uses (whisper, ffmpeg, etc.; see the instructions on the forum)
set theLanguageModel to " --model large" -- Choose your model here - https://github.com/openai/whisper
tell application id "DNtp"
set theRecords to (get selection)
set RecordCount to (count of theRecords)
if RecordCount > 0 then
show progress indicator "Transcribing Media…" steps RecordCount with cancel button
set GlobalStartTime to (current date) -- Timer start
set theNumber to 0
set GoodNumber to 0
set theLanguage to ""
repeat with theRecord in theRecords
step progress indicator "(" & (theNumber + 1) & " of " & RecordCount & ") - " & ((name of theRecord) as string)
set StartTime to (current date) -- Timer start
--Constructing arguments
set theInput to path of theRecord
set baseName to (current application's NSString's stringWithString:(theInput))'s lastPathComponent()'s stringByDeletingPathExtension() as text
set TXToutput to theOutputFolder & baseName & ".txt"
--Transcribing using the OpenAI Whisper model
set theText to do shell script ShellPath & "whisper " & quoted form of theInput & theLanguageModel & " -f txt -o " & quoted form of theOutputFolder
set theTranscription to do shell script "/usr/bin/iconv -t UTF-8 " & quoted form of TXToutput
--Detecting the language of the media from theText (uses first 30 seconds of media)
set saveTID to AppleScript's text item delimiters
set AppleScript's text item delimiters to "Detected language: "
set theList to text items of theText
set theText to item 2 of theList
set AppleScript's text item delimiters to "["
set theList to text items of theText
set theLanguage to item 1 of theList
set AppleScript's text item delimiters to ""
set theLanguage to (characters 1 thru -2 of theLanguage) as string
set AppleScript's text item delimiters to saveTID
if theLanguage is not "" then
add custom meta data theLanguage for "languageofcontent" to theRecord
else
add custom meta data "unknown" for "languageofcontent" to theRecord
end if
-- Setting Timer strings (needed for log)
set EndTime to (current date) -- Timer stop
set theElapsed to my secondsToTimeString(EndTime - StartTime)
set theDuration to do shell script "/usr/local/bin/ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 " & quoted form of theInput
set theDuration to (current application's NSString's stringWithString:theDuration) as real --Converting string to a number
set theDurationText to my secondsToTimeString((my roundThis(theDuration, 0)) div 1)
set theRatio to my roundThis((EndTime - StartTime) / theDuration, 2)
set LogText to "Transcribing Media '" & (name of theRecord) & "': "
set InfoLogText to "Start: " & (StartTime as text) & return & "End: " & (EndTime as text) & return & "Elapsed: " & theElapsed & " | Duration: " & theDurationText & " | Ratio: " & theRatio
if theTranscription is not "" then
set the comment of theRecord to theTranscription
add custom meta data "true" for "transcribed" to theRecord
log message LogText & "Success!" info InfoLogText
set GoodNumber to GoodNumber + 1
else
log message LogText & "Nothing to transcribe..."
end if
set theNumber to theNumber + 1
if cancelled progress then exit repeat
end repeat
hide progress indicator
-- Setting Global Timer strings
set GlobalEndTime to (current date) -- Global Timer stop
set GlobalElapsed to my secondsToTimeString(GlobalEndTime - GlobalStartTime)
display notification ((GoodNumber) as string) & " of " & ((RecordCount) as string) & " record(s) successfully transcribed." & return & "Elapsed: " & GlobalElapsed with title "Transcribing Media"
end if
end tell
-- Getting time string from seconds
on secondsToTimeString(t)
-- Comment the code if t's likely to be less than a day: 'set d', 'set t' and last 'Set timeString'.
set d to t div days
set t to t mod days
tell (1000000 + (t div hours) * 10000 + (t mod hours div minutes) * 100 + t mod minutes) as text
set timeString to (text 2 thru 3 & ":" & text 4 thru 5 & ":" & text 6 thru 7)
end tell
set timeString to text 2 thru 4 of ((1000 + d) as text) & ":" & timeString
return timeString
end secondsToTimeString
-- Rounding the number
on roundThis(n, numDecimals)
set x to 10 ^ numDecimals
tell n * x to return (it div 0.5 - it div 1) / x
end roundThis
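If you want to check in Terminal what the language-detection part of the script is looking for, here is a small shell sketch of the same idea. It assumes Whisper prints a line starting with "Detected language: " (which is what the script relies on). The function name `detected_language` and the sample text are made up for illustration. Unlike the AppleScript version, which would stop with an error at `item 2 of theList` when that line is missing, this sketch falls back to "unknown".

```shell
#!/bin/sh
# Sketch only: pull the language name out of Whisper's console output (read from stdin).
# Assumes a line of the form "Detected language: <name>", as the AppleScript does.
detected_language() {
    lang=$(sed -n 's/^Detected language: *//p' | head -n 1)
    # Fall back to "unknown" when no such line was printed
    printf '%s\n' "${lang:-unknown}"
}

# Illustrative sample, not real output from any particular file
sample='Detecting language using up to the first 30 seconds.
Detected language: English
[00:00.000 --> 00:04.000]  Hello and welcome.'

printf '%s\n' "$sample" | detected_language
```

To try it on a real file, you could pipe the output of the `whisper` command from the script into `detected_language` instead of the sample text.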
What this script does:
- It transcribes the media file and saves the transcription in the Finder comments field of the record. This is useful if you want full-text search across your audio or video records. The comments even survive export (you can see the transcription in Finder).
- It also detects the language of the speech and saves it to a custom metadata field (currently `languageofcontent`; you can set it up as you wish at any time, or use your own).
- It also saves the boolean custom metadata field `transcribed` to mark the record as transcribed.
- The script has a progress indicator, so you can follow the progress when you process a batch of records.
- The script has a timer, so you can see the time it took to transcribe (Elapsed), the Duration of the media file, and a special figure, Ratio, which is Elapsed divided by Duration. You can find all this data while transcribing and, afterwards, in DEVONthink's standard log window.
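To make the Ratio figure concrete, here is a tiny shell sketch of the same arithmetic. The function name `ratio` is hypothetical, both arguments are in seconds, and the guard against a zero duration is my addition (the script itself does not have one). Rounding is done by `printf`, so the last digit may occasionally differ from the script's `roundThis`.

```shell
#!/bin/sh
# Sketch only: Ratio = Elapsed / Duration, both in seconds, two decimals.
ratio() {
    awk -v elapsed="$1" -v duration="$2" 'BEGIN {
        # Avoid dividing by zero when the duration is missing or 0
        if (duration <= 0) { print "n/a"; exit }
        printf "%.2f\n", elapsed / duration
    }'
}

ratio 1200 60    # a 1-minute clip that took 20 minutes: prints 20.00
ratio 45 90      # faster than real time: prints 0.50
```

A Ratio above 1 means transcription is slower than real time. The duration can come from the same `ffprobe` command the script uses.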
Tweaking the script
Everything you need to tweak is at the beginning of the script. You will need Script Editor to do this (or Script Debugger, if you have it):
- Set the path to your temporary folder. The script will use it to save the transcription TXT files.
- Set the locations of the command-line utilities so that the `do shell script` command runs without errors. The issue is that AppleScript runs commands in a different shell environment, which is not aware of your `PATH` variable, so a command that works in Terminal may still fail in AppleScript. In most cases it is enough to give the explicit path to the executable, like: `do shell script "/usr/bin/iconv"`. But if that command calls other executables in other locations (e.g. ffmpeg), you will get an error. So ask Terminal for the location: `which ffmpeg`, and add its directory to the `ShellPath` string, using ":" as the divider. See the example in the script. If the shell script still fails, find out which executable it can't find, ask Terminal for its path, and add that to the string as well.
- Language model. You can experiment with the models. See their descriptions and names on the Whisper GitHub page (https://github.com/openai/whisper).
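If you don't want to collect the locations by hand, something like the following sketch can build the `ShellPath` string for you. Run it in Terminal, where your normal `PATH` is available. The function names `tool_dir` and `shell_path_prefix` are hypothetical, and the list of tools (whisper, ffmpeg, ffprobe) is my guess at what the script needs on your machine, so adjust it as required.

```shell
#!/bin/sh
# Sketch only: build a ShellPath-style prefix for AppleScript's `do shell script`.

# Print the directory containing the given command; fail if it is not a file on disk.
tool_dir() {
    tool_path=$(command -v "$1" 2>/dev/null) || return 1
    case "$tool_path" in
        /*) dirname "$tool_path" ;;
        *) return 1 ;;   # builtin, function, or alias: nothing to add
    esac
}

# Print "PATH=$PATH:dir1:dir2; " for all tools given as arguments.
# Missing tools are reported on stderr and skipped.
shell_path_prefix() {
    dirs=""
    for tool in "$@"; do
        if dir=$(tool_dir "$tool"); then
            case ":$dirs:" in
                *":$dir:"*) ;;                        # directory already listed
                *) dirs="${dirs:+$dirs:}$dir" ;;
            esac
        else
            echo "not found: $tool" >&2
        fi
    done
    [ -n "$dirs" ] || return 1
    # Single quotes keep $PATH literal, as the AppleScript string expects
    printf 'PATH=$PATH:%s; \n' "$dirs"
}

shell_path_prefix whisper ffmpeg ffprobe || echo "install the missing tools first" >&2
```

Paste the printed line between the quotes of `set ShellPath to "…"` at the top of the script. Any "not found" message tells you which utility still needs to be installed.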
That’s all!
You are all set up. Happy transcribing!
PS
I'm curious what Ratio you get on your machines, because my figures are too high, around 20 (meaning that transcribing takes 20 times longer than the duration of the media)! Yes, the transcription is very nice, with all the punctuation and sentences, but 20 is a bit too much ))
I don't know whether that is because of my Intel MacBook Pro (16,2) or the large model…
Please share your Ratio.