2.5-million-word DB

I have a literature database (PDFs, some with annotations) that currently contains a good 2 million individual words and is 23 GB in size — all on a 2020 MacBook Air with 8 GB of RAM. Should I split the collection now, or is there any reason not to keep adding literature to it for the time being?

That’s within comfortable limits for a modern machine.
If you get above 4.5 million unique words or 250,000+ items in a database, you may want to start thinking about a split.


Or once you exceed roughly 300 million words in total — but in the end it depends heavily on the machine in use, its performance, and the available RAM. Some users have billions of words and millions of items in their databases. Note, however, that splitting databases and then opening all of them at the same time doesn’t reduce memory usage.

I’m one of those, um, packrat users of DT :slight_smile:. My largest database has 5.6 million unique words, 560 million words in total, 18 thousand documents, and 200 GB. That’s a literature database, perhaps comparable to the situation Quoyle is asking about. A couple of years ago, I consolidated several fairly large databases that I pretty much always kept open into this single behemoth, in order to simplify searching and to take greater advantage of replicants in groups scattered across the database. I noticed a slight cost for searching the unified database — some tasks take just a wee bit longer — but nothing really bothersome. Now, this is on a 2017 MacBook Pro running Ventura with 16 GB of RAM, and DT is the biggest user of RAM on the computer. But it’s certainly workable if you’ve got the storage and the RAM, and in any case this approach has become central to my research workflow. Hats off to the brilliant DT designers.