Hi, Cesare. I’ve successfully run many 50-page scans through OCR and into a database on my MacBook Pro dual core 2.0 GHz with 2 GB RAM and on my Power Mac G5 2.3 GHz dual core with 5 GB RAM.
OCR uses a lot of computer resources, when you think about what’s going on. A long document may take a while, especially if there’s not much free RAM available when you start scanning, although there are some things you can control.
I use a little preference pane called MenuMeters (Google it). That lets me monitor the activity of both CPU cores and the amount of free RAM available.
Free RAM is important. Although Apple’s Virtual Memory will let a memory-intensive operation proceed to completion, it does so by swapping data between disk and RAM, using VM swap files. Compared with memory operations in physical RAM, working with memory that has been swapped out to disk is far slower.
If I’ve got little free RAM available, I know that a memory-intensive procedure that I’m about to start (such as scanning and OCR) may prove slower than I’d like. So I can quit some other applications I’m not using at the moment, and quit and relaunch DTPO to free up some more memory. If I’ve accumulated large VM swap files, I may simply reboot before doing scan/OCR.
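For anyone who would rather check free RAM from a script than from a menu bar tool, here is a rough sketch in Python. It assumes the output layout of the macOS command-line tool vm_stat (a "page size of N bytes" header and a "Pages free:" line). The function names and the 256 MB threshold are my own illustrative choices, not anything built into DTPO or MenuMeters. Note that "free" pages understate what is really available, since the system can also reclaim inactive pages.

```python
import re
import subprocess

# Hypothetical threshold: below this, consider quitting other apps first.
LOW_RAM_MB = 256


def parse_free_mb(vm_stat_text):
    """Return free physical memory in MB, parsed from vm_stat output text."""
    size = re.search(r"page size of (\d+) bytes", vm_stat_text)
    free = re.search(r"Pages free:\s+(\d+)\.", vm_stat_text)
    if size is None or free is None:
        raise ValueError("unrecognized vm_stat output")
    # free pages * bytes per page, converted to megabytes
    return int(size.group(1)) * int(free.group(1)) / (1024 * 1024)


def current_free_mb():
    """Run vm_stat (macOS only) and return the free RAM in MB."""
    result = subprocess.run(
        ["vm_stat"], capture_output=True, text=True, check=True
    )
    return parse_free_mb(result.stdout)


def is_low_on_ram(free_mb, threshold_mb=LOW_RAM_MB):
    """True if free RAM is under the (arbitrary) threshold."""
    return free_mb < threshold_mb
```

On a Mac, calling is_low_on_ram(current_free_mb()) before a big scan/OCR job gives a quick yes/no on whether it is worth quitting a few applications first.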
If you have a large DTPO database, loading it may use a lot of memory. You may find it efficient to create a new, empty database when you are going to scan many paper documents. This will free up your computer resources (especially important if you have limited RAM) so that the scan/OCR process will be noticeably faster. Later, you can move the new OCR’d files to their appropriate database.
Option to OCR only the first x pages of a PDF? Personally, I wouldn’t be satisfied with that. I’ve got many long PDFs in which the information in the last y pages may be more important for my searching and analysis needs than the information in the first x pages. It’s not just what the document is about; it’s about the information that it actually contains.
I’m not even certain that such an option could be implemented, given the OCR plugin we are using.
Does anyone else want to comment on that suggested option?