You could likely do this through Claude Cowork (or a similar tool) connected to both the DEVONthink MCP server and the Azure Document Intelligence MCP server.
Or you could do it for free by asking Claude Cowork to create an OCR skill using a local Tesseract installation.
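To give an idea of what such a Tesseract-based skill might boil down to, here is a minimal sketch. It is separate from the Azure script discussed in this thread. The function name `ocr_image_to_pdf` and the `_ocr` output suffix are my own placeholders, and it assumes `tesseract` is already on your PATH and that the input is an image (PNG, TIFF, JPEG), since Tesseract does not read PDFs directly.

```shell
#!/bin/sh
# Hypothetical helper (name and "_ocr" suffix are illustrative assumptions):
# OCR a single image into a searchable PDF using a local Tesseract install.
ocr_image_to_pdf() {
    ocr_input=$1
    ocr_lang=${2:-eng}   # Tesseract language code, defaults to English

    # Bail out early if Tesseract is not installed
    if ! command -v tesseract >/dev/null 2>&1; then
        echo "tesseract not found on PATH" >&2
        return 127
    fi

    # Strip the extension; Tesseract appends ".pdf" to the output base itself
    ocr_base="${ocr_input%.*}_ocr"

    # The trailing "pdf" config tells Tesseract to write a searchable PDF
    tesseract "$ocr_input" "$ocr_base" -l "$ocr_lang" pdf
}
```

For example, `ocr_image_to_pdf "$HOME/Scans/page1.png" deu` should write `page1_ocr.pdf` next to the image, provided the German language data is installed.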
For DEVONthink, I wrote an AppleScript that runs the azure_ocr.sh shell script from above on each selected PDF:
-- DEVONthink: Run OCR for all selected PDFs
tell application id "DNtp"
    activate

    -- Path to the OCR shell script
    set scriptPath to (POSIX path of (path to home folder)) & ¬
        "Library/Application Scripts/com.devon-technologies.think/Menu/Azure-DocAI/azure_ocr.sh"

    -- Get selected records
    set selectedRecords to selection
    if selectedRecords is {} then
        display dialog "No records selected." buttons {"OK"} default button 1
        return
    end if

    set totalCount to count of selectedRecords
    set currentIndex to 0

    -- Show DEVONthink progress indicator
    show progress indicator "Microsoft Document AI" steps totalCount with cancel button

    repeat with theRecord in selectedRecords
        -- Stop if the user clicked the cancel button
        if cancelled progress then exit repeat

        set currentIndex to currentIndex + 1
        -- Fallback so the error dialog still works if reading the name fails
        set recordName to "(unknown)"
        try
            set recordName to name of theRecord
            -- DEVONthink already returns the record path as a POSIX path
            set posixPath to path of theRecord

            -- Update progress text
            step progress indicator "Running " & currentIndex & "/" & totalCount & ": " & recordName

            -- Process PDFs only (AppleScript text comparison ignores case by default)
            if posixPath ends with ".pdf" then
                -- Run OCR script
                do shell script quoted form of scriptPath & space & quoted form of posixPath
            end if
        on error errMsg number errNum
            hide progress indicator
            display dialog "Error processing: " & recordName & return & errMsg ¬
                buttons {"OK"} default button 1
            return
        end try
    end repeat

    -- Hide progress indicator
    hide progress indicator
end tell
I also have a problem with PDFs that already have a text layer: Document AI simply adds a second layer on top, and then a third, a fourth, and so on with each run. Does anyone know how to remove the existing text layer beforehand without installing Ghostscript?
My OS is macOS Tahoe 26.5, with no Homebrew packages installed.