Smart Rule AppleScript to OCR a PDF using Nitro PDF Pro (lossless)

tkrunning · September 9, 2023, 8:54am

Thank you for your valuable input, Jim!

As mentioned, this script is something that was posted on the MPU forum years ago, before PDFpen was renamed Nitro. I just didn’t think to change the comments when I changed the app name in the script.

Great feedback!

I’ve updated the script used for the Smart Rule (with a bit of help from ChatGPT):

on performSmartRule(theRecords)
    log "Starting script"
    tell application id "DNtp"
        repeat with theRecord in theRecords
            set thePath to path of theRecord
            my processFileWithOCR(thePath)
        end repeat
    end tell
end performSmartRule

on processFileWithOCR(thePath)
    tell application "Nitro PDF Pro"
        -- Open the file
        open (POSIX file thePath) as alias
        
        -- Perform OCR
        tell document 1
            ocr
            repeat while performing ocr
                delay 1
            end repeat
            delay 1
            close with saving
        end tell
        
        -- Quit if the front window is Preferences, taken here to mean no documents are open
        if name of window 1 is "Preferences" then
            quit
        end if
    end tell
end processFileWithOCR
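
The window-name check above depends on the Preferences window being frontmost and on its English title. As a rough sketch of an alternative, the handler below counts open documents instead. It assumes Nitro PDF Pro exposes the standard `documents` element in its scripting dictionary, which is not confirmed here, and `quitNitroIfIdle` is a hypothetical handler name, not part of the original script.

```applescript
-- Sketch only: assumes Nitro PDF Pro supports the standard `documents` element.
-- quitNitroIfIdle is a hypothetical helper name.
on quitNitroIfIdle()
    tell application "Nitro PDF Pro"
        -- Quit only when no documents remain open
        if (count of documents) is 0 then
            quit
        end if
    end tell
end quitNitroIfIdle
```

If that assumption holds, calling `my quitNitroIfIdle()` at the end of `processFileWithOCR`, after the `tell document 1` block, could replace the `name of window 1` test.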

Yeah, for sure. I know the limitations in DEVONthink are caused by the OCR engine (which has been discussed elsewhere on the forum), and beyond changing the OCR engine there’s nothing the DT developers can do to fix the issue. For my purposes it’s important that the OCR process is lossless, i.e. that it does not resample the actual images in the PDF, which is why I’m relying on Nitro instead.

Is there any benefit to doing it this way instead of using character count? Any edge cases I’m not considering?