OCR speed on Macs somewhat generally - anyone with a Mac Pro?

You make several good points. The reality is that I did achieve basically what I was hoping to do, which was to run the machine full tilt OCRing files. What I was trying to see now was whether the added instance would increase throughput. If it would, would running multiple instances on a dedicated iMac, say, be sufficient versus the added $ on a Mac Pro? But your question is relevant. I would like a Mac Pro but can only justify it if it would aid in the occasional large document OCR project, of which I have one now. So I am curious about what the power of a Mac Pro would do to my apparent rate of 0.8 pages per second. If it would significantly improve that, then I would be tempted to get one, to crank through the documents I have and then to power a new 4k monitor for when I am reading the million pages.

So, to recap: my iMac will crank out 0.8 pages per second but be relatively unusable for anything else (running two instances). How that compares to what might happen with a Mac Pro is another question. If I knew I could buy a Windows machine and it would crank out files as fast or faster, I guess I would consider it. But since I am my own IT guy, I am not inclined to service Windows if I don’t have to!
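To put the 0.8 pages per second figure in perspective, here is a rough back-of-envelope sketch in Python. The page counts (1 million and 6 million) are the figures mentioned in this thread. The speedup factor of 2 is purely an assumption for illustration, not a measured Mac Pro result.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def days_to_ocr(pages: int, pages_per_second: float) -> float:
    """Return the days of continuous running needed to OCR `pages`."""
    if pages_per_second <= 0:
        raise ValueError("pages_per_second must be positive")
    return pages / pages_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    measured_rate = 0.8    # pages per second, iMac running two instances
    assumed_speedup = 2.0  # assumption only, not a benchmark
    for pages in (1_000_000, 6_000_000):
        base = days_to_ocr(pages, measured_rate)
        faster = days_to_ocr(pages, measured_rate * assumed_speedup)
        print(f"{pages:,} pages: {base:.1f} days now, {faster:.1f} days at 2x")
```

At the measured rate, a million pages is roughly two weeks of nonstop running and six million is close to three months, so even a modest speedup (or a second machine) changes the schedule noticeably.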

Sorry, didn’t want to come across as too “lectury”, but probably did. The PC route probably only makes sense if you can round up a box (or 2) that’s sitting around somewhere. Then it’s only a matter of putting Acrobat on it and feeding it the PDF files via a memory stick. That’s probably not much of an admin job. No net needed (other than for registering Acrobat), so no security issues (that’s the only admin work that takes effort, in my view, on a Windows machine).

I have not owned a desktop since my Amiga 1000 in 1986, so I’m not the best person to judge this (I deal with tons of desktops and servers at work, but those are all Linux, Unix, Windows, no Macs), but I would suspect that despite the “monster” specs on the new Mac Pros, the actual bare CPU performance, which probably matters most for this OCR job, has not really improved that much over recent years. I suspect there will be no “killer” improvement there. In these new machines, it’s more about GPU computing, Thunderbolt connectivity, and driving a zillion 4k displays.

If you want to get a Mac Pro anyway, you could run your iMac in dedicated OCR mode, and use “half” of your Mac Pro for OCRing as well.

I’m not sure whether the new App Nap feature in 10.9 lets you manually fine-tune how much oomph you give to an application. Many years ago, I used a utility called AppStop (I think under 10.4, and it no longer exists) that could halt an application or reduce its CPU access (essentially re-nicing the job). That way, you could run the Mac Pro full tilt on OCR, and whenever you want to do other things, you could halt Acrobat to get full access to your machine. Maybe someone knows a current utility that can do that (or whether just using “nice” on the command line would work; I don’t know how that plays with complex apps on the Mac).
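For what it’s worth, the halt/resume and re-nice idea can be sketched with standard Unix signals and priorities, which macOS supports. This is a minimal Python sketch, not a tested recipe for Acrobat: the function names are made up for illustration, the process ID has to be looked up first (for example in Activity Monitor), and how Acrobat reacts to being stopped mid-OCR, or whether it uses helper processes that would need the same treatment, is unknown to me.

```python
import os
import signal


def pause(pid: int) -> None:
    """Suspend a process: it stays in memory but gets no CPU time."""
    os.kill(pid, signal.SIGSTOP)


def resume(pid: int) -> None:
    """Let a previously suspended process continue where it left off."""
    os.kill(pid, signal.SIGCONT)


def lower_priority(pid: int, niceness: int = 10) -> None:
    """Re-nice a process: a higher niceness means a lower scheduling priority.

    An ordinary user can raise the niceness of their own processes,
    but usually cannot lower it again without admin rights.
    """
    os.setpriority(os.PRIO_PROCESS, pid, niceness)
```

Re-nicing is the gentler option, since the OCR job keeps running whenever the machine is otherwise idle; pausing guarantees the full machine but needs a manual resume afterwards.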

By the way, one question that still needs answering: From my own experience, which essentially consists of having OCR’ed around 100 files one by one with Acrobat in the past two or three months, I would say one in 50 files in my case gave an exception in Acrobat (“cannot render page” or similar). It simply would not do it. On the same file, ABBYY/DT had no problems. On my scale that’s absolutely not a problem. On your scale, that would mean that there are quite a few such files in one night’s run. Maybe you should do a test run over a full night on your iMac to see how smoothly that typically goes.
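To get a feel for “quite a few such files”, here is a rough Python estimate. Only the 0.8 pages per second rate and the one-in-50 failure rate come from this thread, and the failure rate is based on about 100 files, so it is very loose. The 8-hour night and the 20 pages per file are assumptions picked purely for illustration.

```python
def files_per_run(pages_per_second: float, hours: float,
                  pages_per_file: float) -> float:
    """Estimate how many files one unattended run gets through."""
    return pages_per_second * hours * 3600 / pages_per_file


def expected_failures(files: float, failure_rate: float = 1 / 50) -> float:
    """Expected number of files that fail, given a per-file failure rate."""
    return files * failure_rate


if __name__ == "__main__":
    # Assumptions: 8-hour night, 20 pages per file (both hypothetical).
    files = files_per_run(pages_per_second=0.8, hours=8, pages_per_file=20)
    print(f"about {files:.0f} files per night, "
          f"about {expected_failures(files):.0f} expected failures")
```

Under these assumptions that is on the order of a thousand files and a couple of dozen failures per night, which is why it matters whether a failed file stops the whole batch or just gets skipped.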

I use Acrobat Pro XI for multi-file OCR quite a bit, but I’ve never bothered to explore configuring it for specific cores or threads. (Don’t even know if that’s possible.)

OTOH, I’ve never tried to do anything on a PC six million times. There are services, though, that will do this for money, such as bit.ly/1rfMeG0

I have a Mac Pro (4,1). I use HandBrake to encode videos using all 8 cores, yet the computer is still very responsive; I don’t notice it at all.

You might want to look into something cloud-based: boot up a server for a month or so and use a Linux tool (if available, no idea) to process in parallel.
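As a sketch of what “process in parallel” could look like on such a server, here is a small Python driver that runs one OCR process per file, a few at a time, and records which files failed instead of stopping. The default command assumes the open-source OCRmyPDF tool is installed (basic usage: `ocrmypdf input.pdf output.pdf`); that choice is my assumption, nobody in this thread has tested it, and the function names are made up for illustration.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, Iterable, List, Tuple


def ocrmypdf_command(src: Path, dst: Path) -> List[str]:
    # Assumption: OCRmyPDF is installed and on the PATH.
    return ["ocrmypdf", str(src), str(dst)]


def ocr_all(sources: Iterable[Path], out_dir: Path, workers: int = 4,
            build_command: Callable[[Path, Path], List[str]] = ocrmypdf_command
            ) -> List[Tuple[Path, bool]]:
    """OCR every file in `sources`, `workers` at a time.

    Returns (source, succeeded) pairs so failures can be retried later.
    """
    out_dir.mkdir(parents=True, exist_ok=True)

    def run_one(src: Path) -> Tuple[Path, bool]:
        cmd = build_command(src, out_dir / src.name)
        try:
            result = subprocess.run(cmd, capture_output=True)
        except OSError:
            # The tool is missing or could not be started.
            return src, False
        return src, result.returncode == 0

    # Threads are enough here: the heavy work happens in the child processes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_one, sources))
```

Setting `workers` to roughly the number of cores is a reasonable starting point; the list of failed files can then be fed to a second tool or handled by hand.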

I agree that lots of “all cores” type of work (HandBrake, iMovie) leaves these machines amazingly responsive.
The cloud solution might not be useful depending on the nature of the material. As a physicist, I don’t suddenly get 6 million pages dumped on me. This sounds more like a legal case.