MRC & OCR quality support inside DT3

These days I mainly use standalone ABBYY FineReader because it lets me specify more options, such as image quality and MRC.

It would be nice if I could do all of that inside DT3 without having to use standalone ABBYY on top, ideally case by case with separate menu items.

Currently I use this AppleScript, which is triggered by a smart rule when I tag something with ocr_pending. It produces an MRC version and a high-quality version side by side and imports both into DEVONthink, so I can decide later how important scan quality is for a given document (some I might want to print again; others are just for archiving, so MRC is fine).
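
The idea can be sketched roughly as follows. This is a minimal Python sketch of the planning step only, not the actual AppleScript: it does not call ABBYY or DEVONthink. The tag name ocr_pending comes from the post; the function name, the `_mrc`/`_hq` suffixes, and the example path are assumptions made up for illustration.

```python
from pathlib import Path

PENDING_TAG = "ocr_pending"  # tag name mentioned in the post

# Hypothetical output variants: filename suffix -> what the variant is for.
VARIANTS = {
    "_mrc": "MRC-compressed, small, fine for archiving",
    "_hq": "high image quality, suitable for printing again",
}


def plan_outputs(source, tags):
    """Return the output PDF paths for a document, or [] if it is not pending."""
    if PENDING_TAG not in tags:
        return []
    src = Path(source)
    # One output per variant, written next to the source file.
    return [src.with_name(f"{src.stem}{suffix}.pdf") for suffix in VARIANTS]


if __name__ == "__main__":
    for out in plan_outputs("/scans/invoice.pdf", ["ocr_pending", "2020"]):
        print(out.name)
```

Both files would then be imported side by side, and the unwanted one deleted later.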

Thank you for the feedback! The next release will include an updated engine which should improve the quality.

Besides quality, is it possible to add MRC as well, or is this something the embedded version can’t do?

We’re checking the MRC support of the engine and whether this will be an improvement. Please note that the stand-alone FineReader app uses the latest engine which is not yet available for third-party developers.

I think if you select “Compress PDF” in the DT OCR options you are using MRC, or at least a “soft” version of MRC. In DT2, OCRing a PDF produced a large, very low-quality PDF with low-quality text (visibly very blurry), but starting with one of the latest DT3 updates the resulting PDF is almost perfect: high text quality and a very small file size, which is one of the hallmarks of MRC.

(I use a fairly recent ABBYY Server version for batch OCR, and sometimes the DT result is better than what it produces. So much better that I now use DT OCR for scraped content and photos of text (saving citations from paper books) instead of switching on my “OCR machine”, which, well, ahem, cannot be connected to the internet.)

Ah, let me give that a try.

It would still be nice to have separate menu items somehow, so it can be set case by case depending on whether a lossy version is OK or the document needs to be HQ.

How do you like ABBYY Server? I use standalone ABBYY on my Mac mini server at home because it can be controlled through its AppleScript interface, so it’s completely automated.

My Mac mini is not reachable from the internet, but it does download and sync things. So I have the above AppleScript set to run via smart rules on everything tagged ocr_pending on sync. That gives me an ABBYY-Server-ish workflow inside DT3 without having to expose the machine.

I use the server version because it runs unattended, and I normally batch OCR scanned magazines and books. I have (and buy) some collections of very old material that I want in electronic form, mainly for personal use and backup. For example, I have a complete National Geographic from the first issue through 2010 that I use for internal reference, and I’m thinking of offering it to the Internet Archive.

However, the server version is limited to two cores and 3,000 pages per month, which, well, I’ve exceeded (that is why I cannot connect it to the internet, so the program can’t call home). I’m not using the tool commercially or to earn money. It is the Windows version.

But it has a very interesting feature: you can set up several watched folders, each with its own OCR options. I have two: one for high-quality results and another for heavily optimized image/text. I switch on that PC, drop the files in, and wait.
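
For anyone who wants to imitate that watched-folder setup with a scriptable OCR tool, the routing part could look something like this. It is only a rough Python sketch under assumptions: the folder names, the option keys, and the `options_for` function are hypothetical, and nothing here is the ABBYY Server API or its configuration format.

```python
from pathlib import Path

# Hypothetical watched folders and the OCR options attached to each one.
PROFILES = {
    "hq": {"image_quality": "high", "mrc": False},
    "optimized": {"image_quality": "low", "mrc": True},
}


def options_for(dropped_file, watch_root):
    """Pick OCR options from the watched folder a file was dropped into."""
    # relative_to raises ValueError if the file is outside watch_root.
    relative = Path(dropped_file).relative_to(watch_root)
    if len(relative.parts) < 2 or relative.parts[0] not in PROFILES:
        raise ValueError(f"{dropped_file} is not inside a known watched folder")
    return PROFILES[relative.parts[0]]


if __name__ == "__main__":
    print(options_for("/watch/hq/book.pdf", "/watch"))
    print(options_for("/watch/optimized/magazine.pdf", "/watch"))
```

The point of keying the options on the folder is that choosing quality becomes a drag-and-drop decision rather than a settings change.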

However, if you can script the normal version and don’t do a lot of OCR, I think the normal version is the better choice because it is waaaaay faster (it uses all CPU cores, at least on Windows). And now that I know it can be scripted, you’ve given me the idea of purchasing the macOS version (my main concern is OS reinstallation and the ABBYY activations).

Standalone ABBYY for Windows appears to have a command-line interface as well: https://abbyy.technology/en:kb:code-sample:commandline_ocr

Mmmm… Interesting, very interesting. Thanks!!!

You may try out this Smart Rule version of my script.

Hope it can be helpful for you.

Hello!

I’m trying to use this script in a Smart Rule, but I get the error

“Expected end of line, etc. but found “from”.”

on line

“export to pdf pdfPath from file itemPath image quality high quality ocr”

Any idea what could be wrong?

Thanks in advance.

OK, found this:

Shitty Catalina, it lives up to its name in Spanish.

Run as a Folder Action directly from Finder, it works: