ABBYY Finereader Pro v12

Here is a small script to OCR a selected PDF from within DT using Finereader Pro v12:

tell application "DEVONthink Pro"
	set thepath to path of item 1 of (get selection)
	tell application "FineReader"
		
		-- add ocr text layer underneath and compress
		export to pdf thepath from file thepath export mode text under image with use mrc 
		
		-- uncomment next line if you want text to be on top of image. Works well if scan is good quality
		-- export to pdf thepath from file thepath export mode text over image with use mrc without keep pictures
		
	end tell
end tell

Why Finereader over DT’s own OCR? DT’s own OCR uses the Finereader engine and does very well at text recognition, but it doesn’t have some of Abbyy’s more advanced features, such as compression or conversion to PDF/A.

Why Finereader over Acrobat DC Pro? Adobe seems to have forgotten with their latest release that a measure of stability is an essential feature. Also, your users will just love it when you replace your awful non-standard UI with an even more incomprehensible UI.

Frederiko


Thank you for this script. I was looking for a simple way to get documents to/from ABBYY FineReader and you already had it worked out.

Language coverage is another issue: the right-to-left languages of the Levant, for example, are supported in Finereader Pro, but not by the DT distribution.

I tried to add a loop to rescan multiple (selected) documents with Finereader, but it does not work:

tell application "DEVONthink Pro"
	set theSelection to the selection
	repeat with theRecord in theSelection
		set thepath to path of item 1 of theRecord
		tell application "FineReader"
			
			-- add ocr text layer underneath and compress
			export to pdf thepath from file thepath export mode text under image with use mrc
			
			-- uncomment next line if you want text to be on top of image. Works well if scan is good quality
			-- export to pdf thepath from file thepath export mode text over image with use mrc without keep pictures
		end tell
	end repeat
end tell

Any hints?

I have not tested it extensively, but this seems to work. It waits until FineReader is no longer busy before exporting each document:

tell application "DEVONthink Pro"
	set theSelection to the selection
	repeat with theRecord in theSelection
		set thepath to path of theRecord
		tell application "FineReader"
			repeat until not (is busy)
			end repeat
			-- add ocr text layer underneath and compress
			export to pdf thepath from file thepath export mode text under image with use mrc
			
			-- uncomment next line if you want text to be on top of image. Works well if scan is good quality
			-- export to pdf thepath from file thepath export mode text over image with use mrc without keep pictures
		end tell
	end repeat
end tell

Frederiko

“repeat until not (is busy)”

Both Script Editor and Script Debugger complain about this line.

Expected expression, “)”, etc. but found “is”.

Works fine for me without an error. Are you using the app store version of Finereader or the direct download?

I should have specified that this script works with the direct download version. Because of sandboxing, the AppleScript support in the direct download version is significantly different from the App Store version. I have not tried the App Store version. You might try contacting Abbyy directly; they seem to be pretty helpful in resolving AppleScript differences between the two versions.

Frederiko

This has nothing to do with Finereader, I believe. I can’t compile the script due to the error given.

If the version of Finereader you are using does not support the ‘is busy’ command then Script Editor and Script Debugger won’t compile.

If I look at this thread http://macscripter.net/viewtopic.php?id=43516 it looks like the App Store version of Finereader uses a different approach. Perhaps this more verbose approach might be more successful. (I am not sure why it should be, but AppleScript is a funny thing.)

on WaitWhileBusy()
   repeat while IsMainApplicationBusy()
   end repeat
end WaitWhileBusy

on IsMainApplicationBusy()
   tell application "FineReader OCR Pro"
       set resultBoolean to is busy
   end tell
   return resultBoolean
end IsMainApplicationBusy
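Combining the selection loop with these handlers gives a batch version that waits for FineReader between documents. This is an untested sketch under stated assumptions: it targets the direct download version (application name "FineReader") whose dictionary provides `is busy` and the `export to pdf` command used in the scripts above. The one-second `delay` (so the script polls instead of spinning) and the `.pdf` extension check are additions of this sketch, not part of the original scripts.

```applescript
-- Sketch: batch OCR of the selected DEVONthink records with FineReader.
-- Assumes the direct download FineReader ("FineReader"), not the App Store build.
tell application "DEVONthink Pro"
	set theSelection to the selection
	repeat with theRecord in theSelection
		-- take the path of the record itself (no "item 1 of" needed in the loop)
		set thepath to path of theRecord
		-- only hand PDFs to FineReader; AppleScript ignores case here by default
		if thepath ends with ".pdf" then
			-- wait until FineReader has finished the previous document
			my WaitWhileBusy()
			tell application "FineReader"
				-- add OCR text layer underneath the image and compress
				export to pdf thepath from file thepath export mode text under image with use mrc
			end tell
		end if
	end repeat
end tell

on WaitWhileBusy()
	repeat while IsMainApplicationBusy()
		delay 1 -- poll once per second instead of looping flat out
	end repeat
end WaitWhileBusy

on IsMainApplicationBusy()
	tell application "FineReader"
		set resultBoolean to is busy
	end tell
	return resultBoolean
end IsMainApplicationBusy
```

The handlers are called with `my` because they are invoked from inside the `tell application "DEVONthink Pro"` block; without it, AppleScript would send the call to DEVONthink instead of the script.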

Sorry I can’t be more help

Frederiko

EDITED…

I had been using Finereader Pro 12.0.7, purchased from the developer, not the mac app store. I checked for an upgrade, there was one, 12.1.3, so I upgraded and now the script can be compiled.

Yes, sometimes it’s critical to check for and install the latest version of an app. Frederiko’s comment about code differences between apps purchased from Apple’s App Store and from the developer is also potentially important. Especially where AppleScript support matters, I’m more and more in favor of downloading from the developer, as Apple’s emphasis on sandboxing can cripple AppleScript dictionaries.

Off topic: The free version of FineReader included with ScanSnap scanners will OCR scans produced by that scanner, but not image-only files produced by other means. The OCR module included in DEVONthink Pro Office doesn’t have that limitation, and of course versions of ABBYY FineReader purchased from ABBYY don’t have that limitation either.

Works fine for me, thank you very much.

Abbyy Finereader 12.1.3, Non App Store Version.

Thanks for the script. It successfully sent the PDF to FineReader, which scanned it and completed the OCR.
I understood from the script that FineReader saves the file in the same location, but I could not figure out how to get DTPO to recognise the file automatically and replace the original.

Thanks for your help,
Nawaf

The script replaces the file in DT with the file OCRed by Finereader. DT knows the file has changed and reindexes the file automatically.

The script works… and it doesn’t.
What? Again, please:

OK, here is the story: “FineReader” works, but “FineReader” is the trial version from the ABBYY homepage. Once you buy the program in the Apple App Store, it calls itself “FineReader OCR Pro”. THAT program does not accept calls from the outside world, neither via workflow nor via Automator.

The good news: FineReader 12 (trial or App Store version) creates wonderfully small files compared with the old “Abbyy FineReader 8” engine.

So please, dear DTPO team, pleeeeease implement the new engine for OCR.

thnx,
Chris

@Chris: If only life were so simple. What ABBYY offers consumers is not necessarily what it offers developers at the same time. In fact, the Mac and PC versions of the ABBYY engine don’t have feature parity either. We can’t just implement something if it’s not an available resource through ABBYY’s licensing program.

That being said, it is possible an update would be coming in the future.

In case it wasn’t clear - only buy the version of FineReader direct from Abbyy’s store. If you email them you might be able to persuade them to swap your app store version for the real unrestricted version.

Frederiko

I’m wondering whether this is also relevant to the discussions elsewhere about support for double-byte languages such as Japanese? If I OCR’d files using the Pro version, would search in DT Pro Office work better; or is the OCR itself a side issue, with the main issue being that DT is designed mainly for European, alphabetic orthographies?

FineReader Pro 12 produces much, much better results than the engines found in DEVONthink or ScanSnap. I’m guessing ABBYY won’t license the technology because they would then be forgoing the revenue from people buying FineReader Pro. Here’s an idea: importune ABBYY to allow DEVONthink to use the engine if FineReader Pro is installed on the user’s machine.


What a wonderful idea.