New Job, Big Library

I recently started a new gig within the policy group of a large corporation. We have a huge library of policies, documentation of their creation, and historical versions. On the surface, this seems like it would be a great use of DEVONthink. Since it’s all on a shared drive and reasonably well organized, I can Index (not import) the library and bootstrap my ability to find things.

However, the execution seems to be fraught with peril. By my estimation, I’ve indexed less than half of the shared drive, and my DT database file is ~100GB. DT regularly gets paused by the operating system because the whole system is out of memory.

Has anyone else used DT in a similar situation? Is there a better strategy for what I’m trying to do?
(screenshot attached: 2017-10-02_16-11-35.png)

How many items and words does the database currently contain (see File > Database Properties)? Or could you post a screenshot of this panel? Thanks! In the end, the size of the files doesn’t matter, but the word count does.

Just over 1 million unique words, 1.5 billion total.

That’s definitely the highest word count I’ve ever seen in a database, and it’s the reason for the high memory usage, as DEVONthink’s index always includes all words/numbers of the documents. The only (cumbersome) workaround that comes to mind would be to use multiple databases and to open only one at a time.

OK, in that case the likely culprit is something I suspected: it’s indexing files full of content that isn’t real words. I’ve tried to cull some of that extraneous data by choosing which file types I index, but I can’t be selective enough; i.e., I can either index both XLS and CSV or neither.
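Before re-indexing, I might run something like this rough Python sketch to see which extensions contribute the most words (the path and extension list are placeholders for our shared drive, and the counts are only approximate since it just splits plain-text files on whitespace rather than using DEVONthink’s own tokenizer):

```python
#!/usr/bin/env python3
"""Rough tally of total and unique "words" per file extension,
to spot the types that would bloat a full-text index."""
import os
import re
from collections import Counter

ROOT = "/Volumes/PolicyLibrary"              # placeholder shared-drive path
TEXT_EXTS = {".txt", ".csv", ".md", ".log"}  # plain-text-ish types to sample
WORD = re.compile(r"\S+")

totals = Counter()   # total word count per extension
uniques = {}         # set of distinct words per extension

for dirpath, _, filenames in os.walk(ROOT):
    for name in filenames:
        ext = os.path.splitext(name)[1].lower()
        if ext not in TEXT_EXTS:
            continue
        try:
            with open(os.path.join(dirpath, name), errors="ignore") as f:
                words = WORD.findall(f.read())
        except OSError:
            continue
        totals[ext] += len(words)
        uniques.setdefault(ext, set()).update(words)

for ext, count in totals.most_common():
    print(f"{ext}: {count:,} total words, {len(uniques[ext]):,} unique")
```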

If I deselect some file types and index a folder structure, can I re-enable those types later and have them picked up when I choose “Update Indexed Items” from the menu?

It would be better and more reliable to move the undesired file types to a different folder that isn’t indexed.
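If it helps, here’s a minimal Python sketch of that approach: it moves the unwanted file types out of the indexed tree into a parallel folder while preserving the relative directory structure. The ROOT/DEST paths and extension list are placeholders, and it defaults to a dry run so you can review what would be moved first.

```python
#!/usr/bin/env python3
"""Move files with unwanted extensions out of the indexed tree into a
parallel, non-indexed folder, keeping the relative directory layout."""
import os
import shutil

ROOT = "/Volumes/PolicyLibrary"            # indexed tree (placeholder path)
DEST = "/Volumes/PolicyLibrary_excluded"   # folder that is not indexed
SKIP_EXTS = {".csv", ".xls", ".xlsx"}      # types to pull out of the index
DRY_RUN = True                             # set to False to actually move files

for dirpath, _, filenames in os.walk(ROOT):
    for name in filenames:
        if os.path.splitext(name)[1].lower() not in SKIP_EXTS:
            continue
        src = os.path.join(dirpath, name)
        rel = os.path.relpath(src, ROOT)
        dst = os.path.join(DEST, rel)
        print(("would move: " if DRY_RUN else "moving: ") + rel)
        if not DRY_RUN:
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.move(src, dst)
```

Keeping the relative layout also makes it straightforward to move those files back later if you decide you want them indexed after all.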