Is a 40GB database of eBooks too much?

I am using a registered copy of DEVONthink Pro Office 2.0.1. I have built a database of about 40GB worth of eBooks. There are 2,833 groups, 6,428 HTML pages, 5,498 plain text files, 4,129 images, 12,012 PDFs, 155 QuickTime files and 12,883 Unknowns.

I can load the database (it takes about a minute and a half on a MacBook Pro with a 2.5GHz CPU and 4GB of RAM). Once loaded, it is rather snappy to browse and search. But if I try to do a Verify & Repair or a Rebuild, after a while DEVONthink simply crashes.

I don’t know why…

Of course, I could simply ignore the problem, keep throwing more stuff into the database, and use it as if nothing were wrong, but if I do that, I know I will regret it someday.

Should I split the database into smaller chunks? Are roughly 44,000 documents taking up 40GB too much?

Thanks!

The number of items and their file size hardly matters; only the number of total/unique words (see Database Properties) is important. Judging from your description, you're probably running out of real memory and therefore might have to split the database.

There were 5.3M unique words and 305M words in total.

I have now split the database by publisher.

That's approximately the maximum until DEVONthink is available in a 64-bit version. But the first number is quite huge - do the eBooks use multiple languages?
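
For a rough sense of why numbers like these strain a 32-bit process, here is a back-of-envelope sketch. The per-entry byte figures are illustrative assumptions, not DEVONthink's actual index format:

```python
# Rough estimate of the in-memory index size for a database with
# ~5.3M unique words and ~305M total word occurrences.
unique_words = 5_300_000      # concordance entries
total_words = 305_000_000     # word occurrences across all documents

bytes_per_entry = 64          # assumed: word string + counts + postings pointer
bytes_per_occurrence = 8      # assumed: document ID + position per occurrence

index_bytes = unique_words * bytes_per_entry + total_words * bytes_per_occurrence
print(f"Estimated index size: {index_bytes / 2**30:.1f} GiB")
# -> roughly 2.6 GiB, uncomfortably close to the ~3-4 GiB a 32-bit
#    process can address, which is why whole-index operations such as
#    Verify & Repair or Rebuild can push it over the edge.
```

Under these assumptions the index alone approaches the 32-bit address-space ceiling, so splitting the database (or waiting for a 64-bit version) is the practical remedy.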

Indeed, there could be books in 3-4 languages (English, French, Spanish, and …?).

Thanks for the ballpark figure for the maximum number of unique words in a database.

EDIT: there's also a lot of source code!