Weird tip to dramatically boost OCR and indexing speeds

I’ve been importing a bunch of files (c. 1 million single-page PDFs) into a fresh database. Most of these already have some kind of OCR layer, so DEVONthink was indexing them all on import. I was a bit disappointed that the OCR seemed really slow, but hey, I thought, this is just the price to pay.

I just discovered something that dramatically increases the speed of OCR (by as much as a factor of 10, by my measurement) and indexing. (I discovered this while running OCR on documents in my global inbox.)

I wanted to keep an eye on the “activity” window while watching a YouTube video in my browser, so I resized my browser window and then closed DEVONthink’s main window. Then I opened the activity window from the toolbar, leaving it in the foreground with the YouTube/browser window off to the side in the background.

I had previously measured the speed of indexing (and OCR) to estimate how long the whole process was going to take, since I can’t use the files the way I want until it finishes. I had estimated 25 days of constant run-time. (The indexing was running REALLY slowly.)

But with the main window closed, I noticed that the speed is dramatically faster. It’s WHIZZING through both OCR and indexing tasks. I now estimate that it’ll take only 103 hours to complete.
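
For anyone who wants to redo this kind of estimate, here is a rough Python sketch of the arithmetic. It is not anything DEVONthink provides: the function names `eta_hours` and `speedup` are made up for illustration, and it assumes throughput stays constant for the rest of the run, which may well not hold. The only figures taken from this post are the two estimates (25 days and 103 hours).

```python
def eta_hours(remaining_items: int, items_done: int, elapsed_seconds: float) -> float:
    """Estimate hours left, assuming the observed throughput stays constant.

    items_done / elapsed_seconds is a sample you time yourself, e.g. by
    watching the counter in the activity window for a few minutes.
    """
    if items_done <= 0 or elapsed_seconds <= 0:
        raise ValueError("need a positive sample of work and time")
    rate = items_done / elapsed_seconds  # items per second
    return remaining_items / rate / 3600  # seconds -> hours


def speedup(before_hours: float, after_hours: float) -> float:
    """Ratio between two completion-time estimates for the same job."""
    return before_hours / after_hours


# The two estimates from this post: 25 days before, 103 hours after.
before = 25 * 24  # 600 hours
after = 103
print(f"{speedup(before, after):.1f}x")  # prints "5.8x"
```

So the two whole-job estimates work out to a bit under 6x overall; the factor of 10 mentioned above was the best case I measured.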

My initial hunch is that the preset smart groups are updating every time a new file is indexed, and that this is what’s causing the bottleneck. Anyway, I am just writing up my experience here in case it is useful, either as an edge case for databases with very large numbers of documents, or for others who are having the same trouble with really slow OCR or indexing.

If you are, close the window and witness the SPEED! 🙂

Note: Putting 1,000,000 files into a database has the potential to cause some performance issues.
Also, it is really inadvisable to try to import or OCR such a large number of files at once.

Wow! If you generated 50 PDFs per day, you’d need about 55 years to accumulate 1,000,000 such files.

Did you add any smart rules/groups on your own? The default ones shouldn’t cause this. Collapsing the smart rules/groups section of the sidebar should be sufficient too.

I was running in a fresh database, so no. But I had some really large numbers of files…

The global smart groups/rules can be found in the sidebar and could affect a new database too depending on their settings.

Is there any way to disable those, or to stop them from running automatically (aside from deleting them, that is)?

Collapsing the sections in the sidebar might be sufficient.