Here is a small script to OCR a selected PDF from within DT using FineReader Pro v12:
tell application "DEVONthink Pro"
    set thepath to path of item 1 of (get selection)
    tell application "FineReader"
        -- add ocr text layer underneath and compress
        export to pdf thepath from file thepath export mode text under image with use mrc
        -- uncomment next line if you want text to be on top of image. Works well if scan is good quality
        -- export to pdf thepath from file thepath export mode text over image with use mrc without keep pictures
    end tell
end tell
Why FineReader over DT’s own OCR? DT’s own OCR uses the FineReader engine and does very well at text recognition, but it lacks some of ABBYY’s more advanced features, such as compression or conversion to PDF/A.
Why FineReader over Acrobat DC Pro? Adobe seems to have forgotten with their latest release that a measure of stability is an essential feature. Also, your users will just love it when you replace your awful non-standard UI with an even more incomprehensible one.
Language coverage is another issue: the right-to-left languages of the Levant, for example, are supported in FineReader Pro but not by the DT distribution.
I tried to add a loop to rescan multiple (selected) documents with FineReader, but it does not work:
tell application "DEVONthink Pro"
    set theSelection to the selection
    repeat with theRecord in theSelection
        set thepath to path of item 1 of theRecord
        tell application "FineReader"
            -- add ocr text layer underneath and compress
            export to pdf thepath from file thepath export mode text under image with use mrc
            -- uncomment next line if you want text to be on top of image. Works well if scan is good quality
            -- export to pdf thepath from file thepath export mode text over image with use mrc without keep pictures
        end tell
    end repeat
end tell
I have not tested it extensively, but this version seems to work:
tell application "DEVONthink Pro"
    set theSelection to the selection
    repeat with theRecord in theSelection
        set thepath to path of item 1 of theRecord
        tell application "FineReader"
            -- wait until FineReader has finished the previous document
            repeat until not (is busy)
            end repeat
            -- add ocr text layer underneath and compress
            export to pdf thepath from file thepath export mode text under image with use mrc
            -- uncomment next line if you want text to be on top of image. Works well if scan is good quality
            -- export to pdf thepath from file thepath export mode text over image with use mrc without keep pictures
        end tell
    end repeat
end tell
Works fine for me without an error. Are you using the App Store version of FineReader or the direct download?
I should have specified: this script works with the direct-download version. Its AppleScript support differs significantly from that of the App Store version because of sandboxing. I have not tried the App Store version. You might try contacting ABBYY directly. It seems they are pretty helpful in resolving AppleScript differences between the two versions.
If the version of FineReader you are using does not support the ‘is busy’ command, then Script Editor and Script Debugger won’t compile the script.
If I look at this thread http://macscripter.net/viewtopic.php?id=43516 it looks like the App Store version of FineReader uses a different approach. Perhaps this more verbose approach might be more successful. (I am not sure why it should be, but AppleScript is a funny thing.)
on WaitWhileBusy()
    repeat while IsMainApplicationBusy()
    end repeat
end WaitWhileBusy
on IsMainApplicationBusy()
    tell application "FineReader OCR Pro"
        set resultBoolean to is busy
    end tell
    return resultBoolean
end IsMainApplicationBusy
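One thing to watch with both wait loops above: the loop body is empty, so the script polls FineReader as fast as it can and never gives up if FineReader hangs. Here is an untested sketch of a gentler variant that pauses between polls and stops after a timeout. The handler names, the one-second pause and the timeout value are my own choices, not anything from ABBYY. It assumes your FineReader build has the ‘is busy’ command and is named "FineReader" (the direct download); per this thread the App Store build calls itself "FineReader OCR Pro".

```applescript
-- Sketch only: waitForFineReader and fineReaderIsBusy are hypothetical helper names.
-- Assumes the installed FineReader dictionary provides "is busy".
on waitForFineReader(timeoutSeconds)
    set deadline to (current date) + timeoutSeconds
    repeat while fineReaderIsBusy()
        if (current date) > deadline then
            set msg to "FineReader still busy after " & timeoutSeconds & " seconds"
            error msg
        end if
        delay 1 -- pause between polls instead of spinning
    end repeat
end waitForFineReader

on fineReaderIsBusy()
    -- Change the name to "FineReader OCR Pro" for the App Store build.
    tell application "FineReader"
        set resultBoolean to is busy
    end tell
    return resultBoolean
end fineReaderIsBusy
```

You would call something like waitForFineReader(600) just before each export command in the loop, in place of the empty repeat.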
I had been using FineReader Pro 12.0.7, purchased from the developer, not the Mac App Store. I checked for an upgrade and there was one, 12.1.3, so I upgraded, and now the script compiles.
Yes, sometimes it’s critical to check for and install the latest version of an app. Frederick’s comment about differences in code between apps purchased from Apple’s App Store and from the developer is also potentially important. Especially where AppleScript support matters, I’m more and more in favor of downloading from the developer, as Apple’s emphasis on sandboxing can cripple AppleScript dictionaries.
Off topic: The free version of FineReader included with ScanSnap scanners will OCR scans produced by that scanner, but not image-only files produced by other means. The OCR module included in DEVONthink Pro Office doesn’t have that limitation, and of course versions of ABBYY FineReader that are purchased from ABBYY don’t have that limitation either.
Thanks for the script. It successfully sent the PDF to FineReader, which scanned it and completed the OCR.
I understood from the script that FineReader saves the file to the same location. I could not figure out how to get DTPO to automatically recognise the file and replace the original.
The script works… and it doesn’t
what? again, please:
ok, here is the story: “FineReader” works… but “FineReader” is the trial version from the ABBYY homepage. Once you buy the program in the Apple App Store, the program calls itself “FineReader OCR Pro”. THAT program does not accept calls from the outside world, neither via workflow nor via Automator.
the good news: finereader 12 (trial or app store version) creates wonderful, really small files compared with the old “Abbyy FineReader 8” engine.
so, please, dear DTPO Team pleeeeease implement the new engine for OCR.
@Chris: If only life were so simple. What ABBYY offers consumers is not necessarily what it offers developers at the same time. In fact, the Mac and PC versions of the ABBYY engine don’t have feature parity either. We can’t just implement something if it’s not an available resource through ABBYY’s licensing program.
That being said, it is possible that an update will come in the future.
In case it wasn’t clear - only buy the version of FineReader direct from ABBYY’s store. If you email them you might be able to persuade them to swap your App Store version for the real unrestricted version.
I’m wondering whether this is also relevant to the discussions elsewhere about support for double-byte languages such as Japanese? If I OCR’d files using the Pro version, would search in DT Pro Office work better; or is the OCR itself a side issue, with the main issue being that DT is designed mainly for European, alphabetic orthographies?
FineReader Pro 12 produces much, much better results than the engines found in DEVONthink or ScanSnap. I’m guessing ABBYY won’t license the technology because they would then be forgoing the revenue from people buying FineReader Pro. Here’s an idea: importune ABBYY to allow DEVONthink to use the engine if FineReader Pro is installed on the user’s machine.