I’ve been importing a bunch of files (c. 1 million single-page PDFs) into a fresh database. Most of these have some kind of OCR layer, so upon import DevonThink was indexing them all. I was a bit disappointed that the OCR speed seemed to be really slow, but hey, I thought, this is just the price to pay.
I just discovered something that dramatically increases the speed of OCR and indexing (by as much as a factor of 10, by my measurement). I stumbled on it while running OCR on documents in my global inbox.
I wanted to keep an eye on the “activity” window while watching a YouTube video in my browser, so I resized the browser window and closed DevonThink’s main window. Then I opened the activity window from the toolbar and left it in the foreground, with the YouTube/browser window in the background off to the side.
I had previously measured the speed of indexing (and OCR) to get a sense of how long the whole process was going to take, since I can’t use the files the way I want until it finishes. I had estimated 25 days of constant run-time. (The indexing was running REALLY slowly.)
But with the main window closed, I noticed that the speed is dramatically faster. It’s WHIZZING through both OCR and indexing tasks. I now estimate that it’ll take only 103 hours to complete.
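For anyone who wants to make the same kind of estimate, here is a rough sketch of the arithmetic in Python. It assumes throughput stays constant for the whole run, which it may well not. The helper `eta_hours` and the sample numbers in the last lines (500 files in 60 seconds) are made up for illustration; only the 25 days and 103 hours come from my own estimates above. Note that those two totals work out to a speedup of a bit under 6x over the whole job, so the factor of 10 I measured looks like a peak figure rather than the average.

```python
def eta_hours(remaining_items: int, items_done: int, elapsed_seconds: float) -> float:
    """Estimate hours left, assuming the sampled throughput stays constant."""
    if items_done <= 0 or elapsed_seconds <= 0:
        raise ValueError("need a positive sample to estimate throughput")
    rate_per_hour = items_done / (elapsed_seconds / 3600)
    return remaining_items / rate_per_hour


# The two total-time estimates quoted in the post.
before_hours = 25 * 24  # 25 days with the main window open
after_hours = 103       # with the main window closed
speedup = before_hours / after_hours
print(f"overall speedup: about {speedup:.1f}x")

# Hypothetical sample (not a real measurement): 500 files finished
# in 60 seconds, with 1,000,000 files still queued.
sample_eta = eta_hours(1_000_000, 500, 60)
print(f"estimated time left: {sample_eta:.1f} hours")
```

To use it, watch the activity window for a minute or so, count how many files finish, and plug in that count along with the number still queued.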
My initial hunch is that the preset smart groups are updating every time a new file is indexed, and that this is what’s causing the bottleneck. Anyway, I’m writing up my experience here in case it’s useful, either as an edge case for databases with large numbers of documents or for others who are seeing really slow OCR or indexing speeds.
If you are, close the window and witness the SPEED!