Thoughts on database setups from experienced users

SamCo · March 27, 2019, 1:59pm

Hi there. After using the trial for a while I bought DTPO, and I also use DTTG. I would like to put my whole life in it. I'd like to hear how other users have set up their databases and what does and doesn't work for them.
I'd also like to know why one shouldn't put everything in one database, and where people find it useful to have separate databases.
Currently I have databases for things like personal, college, research…

Thoughts?
Thanks

cgrunenberg · March 27, 2019, 2:18pm

I’m using a similar setup with multiple databases (personal, emails, job, hobbies etc.). This makes it easier to, e.g., use different sync and server (DEVONthink Pro Office) settings. In addition, I usually don’t need all this data at the same time. This also improves performance, reduces memory usage and makes See Also & Classify more precise compared to one huge database. And of course, in case of a severe issue (e.g. a broken sync store or a damaged database), this causes less trouble and maintenance.

BLUEFROG · March 27, 2019, 3:20pm

As Criss already touched on…
Smaller, more focused databases will generally perform better, initially sync faster, provide more relevant hits in search and AI functions, and be more data-safe in the event of a catastrophe (avoiding the “all your eggs in one basket” problem).
They also give you the opportunity to close unused databases when you’re not using them. This frees up resources, not only for DEVONthink, but the rest of the system. There is no benefit to having a bunch of unused databases open all the time.

Your current approach sounds like a good one to me.

ghoetker · March 28, 2019, 9:37pm

In addition to the above helpful comments, I’d note that my organisation schemes actually change over time. For example, I have a mid-sized “Research” database. When I am working intensively on a given topic, I might move the relevant groups into their own database for ease of searching. I can move them back into the main database when appropriate.

So, I’d encourage you to let your organisation system evolve with your needs.

korm · March 28, 2019, 11:02pm

I have one very large database that has been in use for years: it is my main database for a long-standing client. As projects in that database age out, I transfer the data to an “archive” database for the same client so that I don’t keep old stuff hanging around in the main database.

The archive database is usually not open, but is Spotlight indexed so I can find stuff that lives there, via Spotlight, if I need to. Remember: Spotlight indices for your databases are really important.

Other than that I have an ongoing small database for household matters, and several other databases for personal research, and the like.

I also have a database I deliberately keep small as my main macOS <–> iOS database. Just easier to work the sync process that way, in my experience.

Hens · April 5, 2019, 5:48am

Korm, I was wondering if you have ever made a thread about all your scripts?

I have read that you have made lots of helpful scripts.

New here….
I wish there were a thread exclusively for scripts, a list of users' scripts, rather than having them all scattered around as they are right now.

Hens · April 5, 2019, 6:17am

…Another question.

How do I go about archiving a database on an external drive? Can Spotlight indices work when I plug in the external drive?

korm · April 5, 2019, 8:56am

(will post an answer if requested, in a different thread)

BLUEFROG · April 5, 2019, 12:42pm

@Hens
Please start a new thread as this one is about setting up databases, not scripts and Spotlight. Thanks!

korm · April 5, 2019, 1:30pm

Oops, sorry. I deleted the post.

BLUEFROG · April 5, 2019, 1:35pm

Haha. No worries!

Hens · April 6, 2019, 2:34am

Yes, I’m aware this thread had nothing to do with scripts, but since I saw korm here, I didn’t want to miss the opportunity. From reading his posts, he has contributed a lot here.

I didn’t want to private message him; he would have thought I was spam lol

But you are right, and sorry.

Hens · April 6, 2019, 2:37am

Thanks korm, yes I think we should start a scripting thread lol

Yclipse · April 6, 2019, 12:13pm

One major consideration for me is how often I need to do a focused search. If I have a database that includes 30 different projects at various stages of development, I may have one or two that are large and very active. If I frequently need to search a word or phrase, the search will come up with all hits in the primary database. If it is important to limit the hits to just one project, it is better that it have its own database.

SamCo · April 6, 2019, 8:53pm

Thanks that’s really useful info.

SamCo · April 13, 2019, 3:50am

So now I have 17 databases open in DTPO, ranging from 1.4 GB to 2.2 GB in size (there’s no comparative view showing the number of files in each database). We’ve been talking about the speed benefits of many smaller open databases over one large database. 17, however, seems excessive, and I suspect I will want more, so is there a limit to the number of databases before the speed slows down significantly? Should I be opening and closing databases and keeping the number of open ones under, say, 10 or something? That would slow me down more than the computer, so much so that I would rather have one big one to chuck everything into.
More importantly - is keeping many databases open going to slow down the process enough that I am going to know the difference?
Am I going to notice the speed difference between one or two large databases and 17 tiny-to-medium ones?
Same question for DTTG…

Hens · April 13, 2019, 6:46am

17? lol

I think you should use about 4 or 6

I have 5, the ones I use most. I keep my other databases on an external drive and open them when I need to.

In DTTG you could purge and only download what you really need.

korm · April 13, 2019, 10:46am

Advice from @cgrunenberg in the past has been that it is the total number of “words” in the open database(s) that affects performance. I’m not sure where that was written, now that we have been migrated to this new forum.

I guess the other answer is “what works for you”? If you have bad performance on your machine, and it looks like DEVONthink is causing it, then shut something down. Everyone’s machine is different, depending on whatever you’re running or doing. So “you ought to do XX” is maybe not helpful. I cannot imagine what I would do with 17 databases open simultaneously, but you can, and if that works for you, then go for it.

Over here, my current setup when I launch DEVONthink is 7 databases comprising about 2 million words. But that info is about as useful as saying “I’m 6'1" and you should be too.”

BLUEFROG · April 13, 2019, 1:44pm

17? lol

I think you should use about 4 or 6

Umm… I have 12 databases open at this moment.

I guess the other answer is “what works for you”?

@korm is correct in this statement.

There is nothing wrong with having 17 databases, if it works for you and makes sense to you. In fact, smaller, more focused databases allow you to manage resources by closing unnecessary databases.

That would slow me down more than the computer

It would slow you down how?

so much so that I would rather have one big one to chuck everything into.

That’s your prerogative but you’re also giving up the benefits of smaller, separate databases.

saltlane · August 12, 2024, 2:26pm

Responding to the OP, I thought I would offer my simple setup as an example of how I use DT. I hope people find it useful. DT is extremely powerful and can be used in so many ways.

A good place to start is how I divide my data between databases.

Deciding on databases

It has taken a bit of trial and error to get my databases right, but data is easy to move about in DT along the way. The key questions in deciding how to split my information were:

Do I want to sync the data?
Is it an ‘area’ of work where I need everything together?
Does the data share a common tagging schema?

Databases I have include:

Work (used for all work stuff)
Birds (holds lots of notes about my bird watching hobby - the feathered ones)
Newsletter (I write a monthly village newsletter)
Village Hall (I am on the management committee)
Reference (reference materials and personal stuff like car insurance documents, purchase receipts, vacation info, manuals, guides)
Thoughts (collection of quotes, snippets, and ideas)
Cookbook (collection of recipes)

Yes, I could have it all in one giant database broken down into major groups as above, but the tagging schemas differ between several databases (see below), and each schema makes sense within its own database. Groups in the databases are generally only two or three levels deep, and I try to make group names understandable and as unique as possible. For me, this avoids the confusion of lots of identically named groups in a search list.

Do I want to sync the data?

Yes, I know I could sync everything everywhere, but why, when I don’t need it on my iPhone or iPad? There is plenty of room, but why sync gigabytes of data when I don’t need to? I don’t sync Work to my devices as it contains very confidential information. It could be safely encrypted etc., but it is far safer not to have it on my iPhone in the first place. I want Cookbook on my iPhone for ideas when shopping, and Reference for insurance details if I have a car accident. Newsletter stays on my MacBook as that is the only place I work on it.

Is it an ‘area’ of work?

An ‘area’ of work is what I am going to be doing when I open my MacBook, drink a coffee, and get going. Work pretty much speaks for itself. When working on the Newsletter, I only need the newsletter information, like past issues, publishing details, ideas for future issues, etc. You might argue that Birds could go into Reference, but I have a specific tag schema relating to bird classification, and it gives me a classification view from the tag list without lots of other tags cluttering it. Another consideration when grouping data is whether I want to use replicants (they only work within the same database).

Does the data share a common tagging schema?

I am a bit OCD and hate messy tag lists. Tagging in Work relates to the type of document (proposal, report, agenda, etc.) and its status (final, draft, in review, waiting on someone). Tagging in Newsletter is by month and the information’s source. For my Cookbook, the schema is ingredients. I don’t want to have pumpkin next to proposal in my tag list, which is why they sit in different databases. For this reason, I do not unify tags but keep them listed in each database only. Another reason for not unifying tags is that both Village Hall and Work have an agenda tag. When I unified them, I got two agenda tags and was never sure which was which. When working in a database, it is easy to remember that database’s tag schema. Not keeping to a tag schema ends up being ‘garbage in, garbage out’, making it difficult to find anything by tag. Was it tagged vacation or holiday?

Workflow

Now I have my databases, how do I get information into them and process it?

The Global Inbox is my clearing house

All new information, whether from my iPad in DTTG or DT on my MacBook, gets sent, shared or dragged into the Global Inbox. Smart rules based on a tag then move it into the Inbox of the appropriate database for processing when I am working in that ‘area’. I can add these simple tags in the Sorter when clipping, when saving in Finder, when sharing to DTTG, or in the Global Inbox itself.

Simple Smart rules

Simple smart rules look for the simple tag and move the item into the correct database, deleting the tag as soon as it is moved. So tagging an item newsletter moves it to the Newsletter Inbox. Similarly, tagging it cookbook moves it into Cookbook.
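To make the routing logic concrete, here is a minimal Python model of what such a rule does. It is only an illustration, assuming one routing tag per database; DEVONthink smart rules are set up in the app itself, and the names `ROUTES`, `Item` and `route_item` are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from routing tag to destination database.
# In DEVONthink this is configured as smart rules in the app, not in code.
ROUTES = {
    "newsletter": "Newsletter",
    "cookbook": "Cookbook",
}


@dataclass
class Item:
    name: str
    tags: set = field(default_factory=set)
    location: str = "Global Inbox"


def route_item(item: Item) -> Item:
    """Move an item from the Global Inbox to the matching database's Inbox,
    then remove the routing tag so it doesn't clutter that database's tag list."""
    if item.location != "Global Inbox":
        return item
    for tag, database in ROUTES.items():
        if tag in item.tags:
            item.location = f"{database}/Inbox"
            item.tags.discard(tag)
            break  # first matching rule wins
    return item


recipe = route_item(Item("Pumpkin soup", {"cookbook", "pumpkin"}))
print(recipe.location, sorted(recipe.tags))  # Cookbook/Inbox ['pumpkin']
```

Deleting the routing tag after the move is what keeps each database's own tag schema clean: only the content tags (here `pumpkin`) arrive in Cookbook.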

The local DB Inbox

When I am working in an ‘area’ and have that database (and usually the Reference DB) open, I will further process anything in the database’s Inbox. Workspaces can be your friend here for opening whichever databases are needed (or just have the lot open). If you put a database in Favorites and it is closed, clicking on the favourite will open it.

Items in the database’s Inbox might be further tagged, moved into the right group (using the move shortcut Ctrl-Command-M or See Also & Classify), or discarded to stop the garbage pile getting too big. In some databases, like Work, I have a few additional smart rules that try to auto-tag based on the file name and/or contents when the item arrives in the database Inbox. This allows for targeted rules appropriate to that database. For example, if the name contains ‘plan’, it is tagged plan. I did get excited about trying to automate as much as possible based on content, but got a lot of ‘false positives’. A report from someone might mention a ‘proposal’ and get tagged proposal when it wasn’t one. Sometimes, human oversight is the fastest way to do it right.
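The difference between name-based and content-based auto-tagging can be sketched in a few lines of Python. This is a hypothetical model, not DEVONthink's smart-rule engine: the keyword table `RULES` and the function `auto_tags` are illustrative names, and matching is a simple case-insensitive substring test.

```python
# Hypothetical keyword -> tag rules, mirroring "if the name contains 'plan' it is tagged plan".
RULES = {"plan": "plan", "proposal": "proposal", "agenda": "agenda"}


def auto_tags(text: str) -> set:
    """Return the tags whose keyword appears anywhere in `text` (case-insensitive)."""
    lowered = text.lower()
    return {tag for keyword, tag in RULES.items() if keyword in lowered}


# Matching on the file name alone is fairly reliable...
print(auto_tags("2024 Marketing Plan.docx"))  # {'plan'}

# ...but matching on content tags a report just because it mentions a proposal.
report_name = "June status report.pdf"
report_body = "Feedback on last month's proposal is attached."
print(auto_tags(report_name))  # set()
print(auto_tags(report_body))  # {'proposal'}
```

Plain substring matching has a further trap: "plan" also matches words like "explanation", which is another reason a quick human review in the database Inbox pays off.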

There are lots of examples on the forum where people use rules to rename bank statements and such things when scanned.

My end of year review

Over the Christmas period, with a beer in hand, I will go through some of my databases (particularly Reference) and decide what can go. I don’t need the car insurance documents from three years ago or the manual for the old fridge. Doing the occasional review is good for keeping the garbage in check. Searching is more efficient if the results list is not filled with irrelevant stuff. I think of my databases like my house. I don’t want them to look like the worst hoarder’s home on TV, with piles of old newspapers and junk everywhere.

In Summary

Overall, my workflow is simple. When I have an item and save or share it, I add a tag for the ‘area’/database it needs to end up in. When I am next working in that ‘area’, the item(s) are sitting in that database’s Inbox, waiting for final review and processing. The final review stage doesn’t take any time, ensures everything is correct and, importantly, makes me think “do I really need this?”