OCR very slow

Hi everyone,

I run OCR on many documents. Sometimes it’s very slow, but never as slow as it is now. I am converting a 304-page document and DTPO has been working on it for more than 24 hours. Before this attempt, I stopped and restarted the process to see if that would speed things up. I know this is a long document, but I have done this before and it has never taken anywhere near this long. I’m posting to learn whether there’s anything I can do to speed things up, keeping in mind that I know it will take time.


Is there something different about this document compared with others you’ve OCR’d? Does it have numerous images? How was the document created: from a scanner, a camera, or something else?

You can always split a PDF and OCR the pieces separately.
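The splitting suggestion above can be planned with a few lines of Python. This is only a rough sketch: the function name `chunk_page_ranges` and the chunk size of 50 pages are assumptions made for illustration, not anything DEVONthink or Acrobat prescribes. It computes the page ranges for each piece; the actual splitting would still be done in whatever PDF tool you prefer.

```python
def chunk_page_ranges(total_pages, chunk_size):
    """Return 1-indexed, inclusive (first_page, last_page) ranges.

    Hypothetical helper: both the name and the idea of fixed-size
    chunks are illustrative assumptions.
    """
    if total_pages < 0 or chunk_size < 1:
        raise ValueError("total_pages must be >= 0 and chunk_size >= 1")
    ranges = []
    first = 1
    while first <= total_pages:
        # The last chunk may be shorter than chunk_size.
        last = min(first + chunk_size - 1, total_pages)
        ranges.append((first, last))
        first = last + 1
    return ranges


if __name__ == "__main__":
    # 304 pages is the document length mentioned in the thread;
    # 50 pages per piece is an arbitrary choice.
    for first, last in chunk_page_ranges(304, 50):
        print(f"pages {first}-{last}")
```

With smaller pieces, a stalled OCR run costs only one piece's worth of time, and it may become easier to see whether a particular range of pages is causing the slowdown.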

Thanks, but no, it’s no different from the others. As with many, many other documents in my database, this is a PDF that I created in Adobe. What I did, and have done many times, is take individual JPEGs, convert them to a single PDF, save the PDF in DTPO, and then run OCR. All of the JPEGs are images of typewritten memos, letters, and so forth.

Your suggestion of splitting is a good one. It’s probably time to do that. This is taking an unusually long time.

If you’re compiling the PDF from JPEGs in Adobe, why not just make individual PDFs, recognize the text in those individual PDFs, and then combine the pieces? IMO, you should do all this work in Acrobat if you have it, since it’s much better suited to constructing PDFs than DEVONthink.

I’ve done it the way I describe because it’s been less time-consuming to simply create one PDF and then let OCR do its work. Creating 304 PDFs and then running OCR on 304 individual PDFs would be cumbersome; I’d prefer to simply get it going on a long document, walk away, and come back when it’s done. I’ve done this in Adobe, but I have always found DTPO’s OCR more than adequate for my purposes, and in my experience Adobe has been no faster.

What I meant was using Acrobat’s batch OCR (it can be applied to a folder of documents, for example) and then combining the results. But, whatever.

Apologies for misunderstanding. Perhaps I will give that a try. Thanks.