Questions about databases versus groups within a database

I read the excellent pinned thread started by Bill about splitting databases, but that info is about 6 years old and I don’t know whether macOS has meaningfully changed things like how it handles memory management, or whether DevonThink Pro Office has changed in that time.

I’m re-engaging with DevonThink and bringing over about 5,000 notes from Evernote. By the way, kudos to the Devon folks for making that easy. More kudos for the excellent DevonThink To Go app, it’s a terrific app.

So anyway, I’m doing what a lot of DT users do, I’d imagine: trying to determine whether I should have fewer databases with more groups within them, or split things up into more databases. One issue seems to be that a search launched from the built-in search field only searches the currently selected database. To search across all of them I have to open the search window. Not a big deal, but I wish there were a preference I could set to have the search field search all databases.

Does the AI “see also/classify” functionality still only “see” within the currently selected database? If so, then that would seem to be an argument in favor of fewer databases which have more obviously disparate information therein.

Are there any meaningful memory or disk management issues now with High Sierra (and moving into Mojave)? If someone has plenty of hard drive space and a decent amount of RAM, should these be considerations for how to structure this stuff?

Anything else I’m not considering that argues one way or another?

Size in gigabytes isn’t the critical number. If you check out File > Database Properties > … for a given database, the number of words / unique words are more critical. On a modern machine with 8GB RAM, a comfortable limit is 40,000,000 words and 4,000,000 unique words in a database. (Note: This does not scale in a linear way, so a machine with 16GB wouldn’t necessarily have a comfortable limit of 80,000,000 words / 8,000,000 unique words.) So text content in a database is far more important.
If you have a database of images, it will have very few words but be large in gigabytes.
If you have a database of emails, it will have many words, but may be smaller in gigabytes.
The second one may perform more poorly as the number of words increases beyond the comfortable limit.

Smaller, more focused databases will generally perform better, Sync faster, and be more data-safe in the event of a catastrophe (avoiding the “all your eggs in one basket” problem). They also give you the opportunity to close databases when you’re not using them. This frees up resources, not only for DEVONthink, but for the rest of the system. There is no benefit to having a bunch of unused databases open all the time.

The search field only searches the current database. Tools > Search is for all open databases.

And yes, See Also / Classify only functions within a database. This may change in a future release.

Thanks Jim.

I have a lot of PDFs and web archives. How do those relate to your information above about size versus word count?

The text content of both formats will contribute to the number of words in the database, hence the index will grow and require more resources.
Due to compression, PDFs may not be large regarding file size, but the number of words is the more critical factor. For example, I have a 2MB PDF that has approx 530,000 words in it.
Webarchives are often larger but contain fewer total words.

Thanks Jim, makes sense.

I checked my main database (I have 3 in total but one is the primary one with which I’ll be interacting regularly). For that database I have 7.4 million total words, 193,000 unique words.

So would a reasonable strategy be to keep that as is for now and then just monitor it and when it approaches the limits you reference above, to then split it up to keep the total word count below those limits?

It seems that to maximize both search and AI usefulness that fewer databases are better than more, and that one of the main reasons to break up databases is to avoid performance hits. If I’m understanding that correctly, then the approach I propose above seems reasonable.
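As a rough illustration of the "monitor it" idea, here is a minimal Python sketch that compares word counts against the guideline figures quoted earlier in the thread for an 8GB machine. The counts are typed in by hand from File > Database Properties; the `headroom` function and its `warn_at` threshold are hypothetical names and values chosen for this example, not anything DEVONthink provides. Since the guidelines reportedly do not scale linearly with RAM, treat the result as a loose indicator only.

```python
# Guideline figures quoted earlier in the thread for a machine with 8 GB RAM.
# They are rough comfort limits, not hard caps.
COMFORT_TOTAL_WORDS = 40_000_000
COMFORT_UNIQUE_WORDS = 4_000_000


def headroom(total_words, unique_words, warn_at=0.8):
    """Return how much of each guideline is used and a simple warning flag.

    warn_at is a hypothetical threshold picked for illustration only.
    """
    total_fraction = total_words / COMFORT_TOTAL_WORDS
    unique_fraction = unique_words / COMFORT_UNIQUE_WORDS
    return {
        "total_fraction": total_fraction,
        "unique_fraction": unique_fraction,
        # Flag the database once either count nears its guideline.
        "consider_splitting": max(total_fraction, unique_fraction) >= warn_at,
    }


# Figures reported for the main database in this thread,
# read manually from File > Database Properties.
report = headroom(total_words=7_400_000, unique_words=193_000)
print(f"total words:  {report['total_fraction']:.1%} of guideline")
print(f"unique words: {report['unique_fraction']:.1%} of guideline")
print("consider splitting:", report["consider_splitting"])
```

With the numbers above, the main database sits at roughly 18.5% of the total-word guideline and under 5% of the unique-word guideline, so there is plenty of room before a split would be worth considering on performance grounds alone.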

Thanks for your assistance here, much appreciated.

You didn’t mention what your kit is: how much memory, etc.

I regularly (10 hr / day / every day) have six databases open whose collective size is 12+ GB and which contain 89 million words (20% of which are unique). For me, Search is rapid, highly reliable, etc. I’m on a machine now with 16 GB memory, but have not always been over the years that I have been running these databases in DEVONthink. I’m pretty confident that even if I grew the number and size of databases open at one time by 50% or more, the performance would be great.

BTW, don’t forget that databases do not need to be opened to be searchable. If you enable Spotlight indexing for a database (in database preferences) then your data will be available for searching in Spotlight or any other app that uses Spotlight – such as Houdah Spot, FoxTrot, etc.

Yes, though the amount of “monitoring” would probably be minimal. The limits I mentioned are not hard and fast. They are general guidelines that keep performance good while remaining reasonable. Also, because available resources differ from machine to machine, not only in installed RAM but in the ever-changing availability as apps and processes start and quit, it’s a guideline, not a rule.

I wouldn’t say this as an axiom, as it depends on what you’re doing. “Smaller, more focused” can still produce a larger database, but I generally do not suggest having more all-encompassing databases. Even a minimal division of a personal and a work database is a good idea. However, I suggest even more granularity. For example, I have a financial database that’s separate from my personal database, even though it’s technically “personal”. I also have multiple work databases that exist outside my main work database.

Thanks guys. I’ve got a 4-year-old MacBook Pro with 16 GB of RAM and a 500 GB SSD. I keep looking for reasons to get a new laptop and just can’t justify it. This does everything I need, and very well.

I appreciate the advice. Right now I have one database for reference materials (work and personal; I might split those up, but there’s enough overlap that I’d like “see also” functionality to be able to see both datasets at the same time). Then another for financial statements, receipts, etc., and another for email archiving, if I can ever figure out a way to functionally use it now that I’ve switched from Mail to Airmail 3.

I guess one other consideration would be whether there are any strategies that don’t make much difference on the Mac but would impact usability in some way on my iPhone with DevonThink To Go. Any thoughts there? I guess what I’m really going after is a way to structure my databases such that I’m maximizing functionality on both the macOS and iOS platforms.