Microsoft Document AI

I know of another application that can send PDFs to Microsoft Document AI, and its recognition results are excellent.

Has anyone tried something like this with DEVONthink? Is there perhaps a ready-made script available?

You could probably do this with Claude Cowork, or a similar tool, connected to both the DEVONthink MCP server and an Azure Document Intelligence MCP server.

Or you could do it for free by asking Claude Cowork to create an OCR skill using a local Tesseract installation.

I have created a script for this. It sends the PDF to Microsoft Document AI and replaces the original file with a searchable PDF.

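#!/bin/zsh
# Send a PDF to Azure Document Intelligence (prebuilt-read model) and
# replace it in place with the searchable PDF the service returns.
# Needs curl and jq; macOS 15 and later ship both in /usr/bin.
# Error messages go to stderr, so DEVONthink's "do shell script" shows them.

PDF_PATH="$1"
[[ -f "$PDF_PATH" ]] || { echo "Usage: azure_ocr.sh /path/to/file.pdf" >&2; exit 1; }

AZURE_ENDPOINT="https://YOUR-ENDPOINT.cognitiveservices.azure.com"
AZURE_KEY="YOUR_API_KEY"
API_VERSION="2024-11-30"

TMP_DIR=$(mktemp -d)
# Delete the temp directory on every exit, including the error exits below
trap 'rm -rf "$TMP_DIR"' EXIT

HEADERS_FILE="$TMP_DIR/headers.txt"
STATUS_FILE="$TMP_DIR/status.json"
OUTPUT_PDF="$TMP_DIR/output.pdf"

# output=pdf asks the service to also produce a searchable PDF
ANALYZE_URL="$AZURE_ENDPOINT/documentintelligence/documentModels/prebuilt-read:analyze?api-version=$API_VERSION&output=pdf"

# Submit the PDF (path quoted so file names with spaces work); the service
# answers with an Operation-Location header pointing at the result
curl -s \
  -D "$HEADERS_FILE" \
  -X POST "$ANALYZE_URL" \
  -H "Ocp-Apim-Subscription-Key: $AZURE_KEY" \
  -H "Content-Type: application/pdf" \
  --data-binary "@$PDF_PATH" \
  > /dev/null

OPERATION_URL=$(sed -n 's/^operation-location: //Ip' "$HEADERS_FILE" | tr -d '\r')

[[ -z "$OPERATION_URL" ]] && { echo "No Operation-Location header; check endpoint and key" >&2; exit 1; }

# Poll every 2 seconds; give up after 150 attempts (about 5 minutes)
STATUS="running"
ATTEMPTS=0

while [[ "$STATUS" == "running" || "$STATUS" == "notStarted" ]]; do
    (( ++ATTEMPTS > 150 )) && { echo "Timed out waiting for analysis" >&2; exit 1; }
    sleep 2
    curl -s -H "Ocp-Apim-Subscription-Key: $AZURE_KEY" "$OPERATION_URL" > "$STATUS_FILE"
    STATUS=$(jq -r '.status' "$STATUS_FILE")
done

[[ "$STATUS" != "succeeded" ]] && { echo "Analysis ended with status: $STATUS" >&2; exit 1; }

# The searchable PDF lives at .../analyzeResults/{resultId}/pdf
BASE_URL="${OPERATION_URL%%\?*}"
PDF_URL="$BASE_URL/pdf?api-version=$API_VERSION"

HTTP_CODE=$(curl -L -s \
  -w "%{http_code}" \
  -o "$OUTPUT_PDF" \
  -H "Ocp-Apim-Subscription-Key: $AZURE_KEY" \
  "$PDF_URL")

[[ "$HTTP_CODE" != "200" ]] && { echo "PDF download failed (HTTP $HTTP_CODE)" >&2; exit 1; }

# Overwrite the original only if the download really is a PDF
if file --mime-type "$OUTPUT_PDF" | grep -q "application/pdf"; then
    mv "$OUTPUT_PDF" "$PDF_PATH"
else
    echo "Downloaded file is not a PDF" >&2
    exit 1
fi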
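The two string steps in the script above (finding the `Operation-Location` header and turning it into the URL of the searchable PDF) can be checked offline, without an Azure key. This is a minimal sketch that runs in zsh or bash; `operation_url_from_headers` and `pdf_url_from_operation` are hypothetical helper names, the endpoint and result ID `abc123` are made-up placeholders, and the `.../analyzeResults/{resultId}/pdf` URL shape is simply the one the script already uses.

```shell
# Offline check of the header parsing and URL building in azure_ocr.sh.
# operation_url_from_headers and pdf_url_from_operation are hypothetical
# helper names; the endpoint and result ID below are placeholders.

# Print the Operation-Location URL from a saved header file.
# The I flag makes the match case-insensitive (HTTP header names are).
operation_url_from_headers() {
  sed -n 's/^operation-location: //Ip' "$1" | tr -d '\r'
}

# Strip the query string (from the first ?) and append /pdf plus the API version
pdf_url_from_operation() {
  local base="${1%%[?]*}"
  printf '%s/pdf?api-version=%s\n' "$base" "$2"
}

# Fake response headers with CRLF line endings, as curl -D writes them
headers_file=$(mktemp)
printf 'HTTP/1.1 202 Accepted\r\nOperation-Location: https://example.cognitiveservices.azure.com/documentintelligence/documentModels/prebuilt-read/analyzeResults/abc123?api-version=2024-11-30\r\n\r\n' > "$headers_file"

op_url=$(operation_url_from_headers "$headers_file")
pdf_url=$(pdf_url_from_operation "$op_url" "2024-11-30")
rm -f "$headers_file"

echo "$op_url"
echo "$pdf_url"
```

If the first line comes out empty when you point the same parsing at a real response, the POST itself was rejected, which is what the `[[ -z "$OPERATION_URL" ]]` check in the script guards against.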

For DEVONthink, I created an AppleScript that runs the shell script above on every selected PDF:

-- DEVONthink: Run OCR for all selected PDFs

tell application id "DNtp"
	activate
	
	-- Path to the OCR shell script
	set scriptPath to POSIX path of (path to home folder) & ¬
		"Library/Application Scripts/com.devon-technologies.think/Menu/Azure-DocAI/azure_ocr.sh"
	
	-- Get selected records
	set selectedRecords to selection
	
	if selectedRecords is {} then
		display dialog "No records selected." buttons {"OK"} default button 1
		return
	end if
	
	set totalCount to count of selectedRecords
	set currentIndex to 0
	
	-- Show DEVONthink progress indicator
	show progress indicator "Microsoft Document AI" steps totalCount with cancel button
	
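	repeat with theRecord in selectedRecords
		
		-- Stop if the user clicked Cancel in the progress window
		if cancelled progress then exit repeat
		
		set currentIndex to currentIndex + 1
		
		try
			-- Read the name first so the error dialog can show it
			set recordName to name of theRecord
			-- DEVONthink's path property is already a POSIX path
			set posixPath to path of theRecord
			
			-- Update progress text
			step progress indicator "Running " & currentIndex & "/" & totalCount & ": " & recordName
			
			-- Process PDFs only ("ends with" ignores case by default, so .PDF matches too)
			if posixPath ends with ".pdf" then
				
				-- Run the OCR script via zsh, so the file does not need to be executable
				do shell script "/bin/zsh " & quoted form of scriptPath & space & quoted form of posixPath
				
			end if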
			
		on error errMsg number errNum
			
			hide progress indicator
			
			display dialog "Error processing: " & recordName & return & errMsg ¬
				buttons {"OK"} default button 1
			
			return
			
		end try
		
	end repeat
	
	-- Hide progress indicator
	hide progress indicator
	
end tell

I also have a problem with PDFs that already have a text layer. Document AI simply adds another text layer on top, so repeated runs produce a second, third, fourth layer, and so on. Does anyone have an idea how to delete the existing text layer beforehand without installing Ghostscript?

I'm on macOS Tahoe 26.5, with nothing installed via Homebrew.