For many reasons having to do with my academic workflow these days – too many writing projects! – I’ve found that my primary use of DT3 as a repository for the texts I use in my research and writing (mostly .pdfs of journal articles and books; I’m a prof in the humanities) has been to combine multiple large-ish databases into one or two very large databases that are, essentially, my local giant encyclopedia. This gives me maximal flexibility in terms of searching, linking, tagging, replicating, etc. across a very large number of files, and makes it easier to discover all kinds of cross-connections in the fields in which I work.
I’m aware that doing this probably reduces DT3’s speed and efficiency in some basic functions, but that loss seems to be more than made up for by increased convenience and efficiency at the wetware end of application use (my brain): it’s like having one giant archive with everything present at the same time, albeit carefully structured with groups and tags.
So… I’m working on a regular basis with a couple of big databases (140 GB + 470 million words total, and 58 GB + 75 million words total). All of the contents of each of those are imported files (no indexed files). And then I have a fair number of much smaller databases, project-related, teaching-related, or archival in nature, none of which is bigger than 10 GB or so (most are much smaller than that), and many of which include indexed files from elsewhere on my computer (a 16 GB Intel MacBook Pro). I keep the two big databases open all the time and open or close the project and teaching databases as I require them. I practice scrupulous backup procedures: Time Machine, Carbon Copy Cloner, Arq, multiple redundant copies saved.
What would I lose or gain if I further combined the two big databases into a single really big database? As I keep them open all the time, their words (total and unique) are always loaded; in effect, as I understand how DT3 manages this stuff, the performance penalties of using two large databases, when they are open all the time, are roughly the same as those of using one combined database open all the time. My guess is that there is a good bit of overlap between the unique words of the two databases, so making them into one really big database, while it would increase the file size of the largest database DT3 manages, would not increase the total word count loaded at any one time, and would reduce the overall unique word count compared with the two separate unique counts added together. (I don’t know how much that matters.)
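To make the arithmetic behind that guess concrete, here is a toy sketch in Python. It says nothing about how DT3 actually builds or stores its index; the two tiny "databases" and the `word_stats` helper are made up purely for illustration. It only shows that merging leaves the total word count equal to the sum of the two, while the merged unique count falls somewhere between the larger of the two unique counts and their sum, depending on overlap.

```python
def word_stats(documents):
    """Return (total word count, set of unique words) for a list of texts.

    Hypothetical helper for illustration only: words are split on
    whitespace and lowercased, which is far cruder than a real indexer.
    """
    words = [w.lower() for doc in documents for w in doc.split()]
    return len(words), set(words)


# Two made-up miniature "databases" with overlapping vocabulary.
db_a = ["the archive holds many texts", "texts on history and theory"]
db_b = ["theory and method in the humanities", "history of the archive"]

total_a, unique_a = word_stats(db_a)
total_b, unique_b = word_stats(db_b)
total_merged, unique_merged = word_stats(db_a + db_b)

shared = unique_a & unique_b  # words that appear in both databases

print("total words:  A =", total_a, " B =", total_b, " merged =", total_merged)
print("unique words: A =", len(unique_a), " B =", len(unique_b),
      " merged =", len(unique_merged))
print("shared unique words:", len(shared))
```

The more the two vocabularies overlap, the further the merged unique count drops below the two separate counts added together. Whether that has any practical effect on memory use is exactly the part I can’t judge.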
Are there any obvious advantages in terms of memory footprint or other performance of going for the large, single database + satellite specialty databases? Are there any real and worrisome downsides?
(Made a couple of edits after initially posting to clarify some details.)