OCR gets stuck when working on a batch of PDFs

saschabur · December 9, 2019, 5:36am

In one of my databases, I still have some 50 ebooks in PDF format that should be OCR'd.
If I launch OCR on all of them, the OCR stops somewhere, and DT's OCREngine takes 2-7 GB of memory in Activity Monitor. How should I manage this problem?

BLUEFROG · December 9, 2019, 2:40pm

Don’t launch OCR on all of them.

No, seriously. You should always be circumspect when trying to process large volumes of data like that. I’d personally suggest queuing up a maximum of 5 at a time, but that would also depend on the size and number of pages in each.

Remember: Just because something technically can be done doesn’t mean it should be done. I can technically hammer nails with one of my bass guitars, but I would likely be very unhappy with some of the effects of doing it.
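
For anyone who wants to plan the batches up front, here is a minimal Python sketch of the "maximum of 5 at a time" idea. It only splits a list of files into groups and prints them; it does not talk to DEVONthink or run OCR. The `ebooks` folder name and the `batches` helper are hypothetical, made up for illustration, and the group size of 5 is just the suggestion from this thread, so lower it for large or page-heavy PDFs.

```python
from pathlib import Path
from typing import Iterable, Iterator, List


def batches(items: Iterable, max_count: int = 5) -> Iterator[List]:
    """Yield consecutive groups of at most max_count items."""
    if max_count < 1:
        raise ValueError("max_count must be at least 1")
    group = []
    for item in items:
        group.append(item)
        if len(group) == max_count:
            yield group
            group = []
    if group:
        # Emit the final, possibly shorter, group.
        yield group


if __name__ == "__main__":
    # Hypothetical folder holding the PDFs that still need OCR.
    pdfs = sorted(Path("ebooks").glob("*.pdf"))
    for number, group in enumerate(batches(pdfs), start=1):
        print(f"Batch {number}:")
        for pdf in group:
            print(f"  {pdf.name}")
```

With 50 ebooks this gives 10 batches. Queue one batch for OCR, wait for it to finish, then move on to the next.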