I use standalone ABBYY these days mainly because it lets me specify more options, such as quality and MRC.
It would be nice if I could do all of that inside DT3 without having to use standalone ABBYY on top, ideally case by case with separate menu items.
Currently I use this AppleScript here, which is triggered by a smart rule when I tag something with ocr_pending. It gives me an MRC version and a high-quality version side by side and imports both into DEVONthink, so I can decide later how important scan quality is for a given document (some I might want to print again; others are just for archiving, so MRC is fine).
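For anyone who wants to reason about that "two versions side by side" step outside of AppleScript, here is a minimal Python sketch of just the planning logic. It does not call ABBYY or DEVONthink at all; the `OcrVariant` fields, the `_mrc` / `_hq` suffixes, and the quality numbers are hypothetical names and values I made up for illustration, not real settings of either app.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class OcrVariant:
    suffix: str    # appended to the file stem (hypothetical naming scheme)
    use_mrc: bool  # hypothetical flag: whether MRC compression is requested
    quality: int   # hypothetical image-quality setting, 0-100


# Two illustrative profiles: a small lossy MRC copy and a high-quality copy.
VARIANTS = (
    OcrVariant(suffix="_mrc", use_mrc=True, quality=50),
    OcrVariant(suffix="_hq", use_mrc=False, quality=95),
)


def plan_outputs(source_name, tags):
    """Return (output file name, variant) pairs for a document.

    Only documents carrying the ocr_pending tag get any outputs planned.
    """
    if "ocr_pending" not in tags:
        return []
    stem = PurePosixPath(source_name).stem
    return [(f"{stem}{v.suffix}.pdf", v) for v in VARIANTS]


if __name__ == "__main__":
    for name, variant in plan_outputs("invoice.pdf", {"ocr_pending"}):
        print(name, variant)
```

The actual OCR call and the import back into DEVONthink would still have to be done by whatever scripting interface you use; this only shows how one tagged document fans out into two planned results.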
We’re checking the engine’s MRC support and whether it would be an improvement. Please note that the stand-alone FineReader app uses the latest engine, which is not yet available to third-party developers.
I think if you select “Compress PDF” in the DT OCR options you are using MRC, or at least a “soft” version of MRC. In DT2, OCRing a PDF produced a large, very low quality PDF with low quality text (you could see it was incredibly blurry), but starting with one of the latest DT3 updates, the resulting PDF is almost perfect: high text quality and a very small file size, which is one of the hallmarks of MRC.
(I use a more or less recent ABBYY server version to batch OCR, and sometimes the DT result is better than what it produces. So much better that I’m using DT OCR for scraped content and photos of text (saving citations from paper books) instead of switching on my “OCR machine”, which, well, ahem, cannot be connected to the internet.)
It would still be nice to have it as separate menu items somehow, so it can be set case by case depending on whether the document is OK as a lossy version or needs to be HQ.
How do you like ABBYY Server? I use standalone ABBYY on my Mac mini server at home because it can be controlled through its AppleScript interface, so it’s completely automated.
My Mac mini is not reachable from the internet, but it is downloading and syncing stuff. So I have the above AppleScript set to run via smart rules on everything that arrives with ocr_pending on sync. That gives me an ABBYY-server-ish workflow inside DT3 without having to expose the machine.
I use the server version because it runs unattended, and normally I batch OCR scanned magazines and books. I have (and buy) some collections of very old material that I want in electronic form, mainly for personal use and backup. For example, I have a complete run of National Geographic from the first issue until 2010 that I use for internal reference, and I’m thinking of offering it to the Internet Archive.
However, the server version has a limit of two cores and 3,000 pages per month that, well, I’ve exceeded (that is the reason I cannot connect it to the internet: to keep the program from calling home). I’m not using the tool commercially or to make money. It is the Windows version.
But it has a very interesting feature: you can have different automatically watched folders, each with its own OCR options. I have two: one for high-quality results and another for heavily optimized image/text. I switch on that PC, drop the files in, and wait.
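The watched-folder idea is easy to imitate in a script if you don't have the server version. Below is a rough Python sketch of the dispatch part only: it maps the subfolder a file was dropped into onto a set of OCR options. The folder names (`ocr_inbox`, `hq`, `optimized`) and the option keys are assumptions of mine, not ABBYY's actual configuration, and the OCR call itself is left out.

```python
from pathlib import PurePosixPath

# Hypothetical OCR options per watched subfolder (not real ABBYY settings).
PROFILES = {
    "hq": {"mrc": False, "quality": 95},
    "optimized": {"mrc": True, "quality": 50},
}


def profile_for(dropped_file, root="ocr_inbox"):
    """Pick OCR options from the subfolder a file was dropped into.

    Expects relative paths shaped like "<root>/<profile>/<file>".
    Returns None when the file is outside the root, sits directly in
    the root, or is in a subfolder that has no profile.
    """
    parts = PurePosixPath(dropped_file).parts
    if len(parts) < 3 or parts[0] != root:
        return None
    return PROFILES.get(parts[1])


if __name__ == "__main__":
    print(profile_for("ocr_inbox/hq/magazine_1950_01.pdf"))
    print(profile_for("ocr_inbox/optimized/magazine_1950_02.pdf"))
```

Keeping the options in one dictionary means adding a third folder later is a one-line change, which is roughly what makes the server feature so convenient.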
However, if you can script the normal version and don’t do a lot of OCR, I think it is better to use the normal version because it is waaaaay faster (it uses all CPU cores, at least on Windows). And knowing that it can be scripted, you’ve given me the idea of purchasing the macOS version (my main concern is OS reinstallation and the ABBYY activations).