Optimal size for search/Magic Hat?

I clip almost everything I read on the Web to one of two folders: one for potentially work-related articles, the other for everything else. I have broad interests, so “everything else” covers a lot of ground.

That results in a couple of big databases. My work-related clippings go to a group in my work database; that group now has 369 items in it. Everything else goes in a “Clippings” database, which now has 442 items in it. I’m a relative newbie to DevonThink; I created these databases on April 6.

Is this the optimal way to organize these databases, from a search and Magic Hat perspective? Will I get the best results this way? Are there other gotchas I may encounter? I can think of one possibility: the databases might get huge over time, and I might want to watch out for that. Is that a concern?


You’d have to define “best results”. See Also won’t work across databases, so if you require connections between work and non-work files, you’d need to use one database. On the other hand, the full search (see Tools > Search) searches all open databases.

Personally, I advocate smaller, more focused databases for better performance, faster syncing, data-safety in the event of a catastrophe (avoiding the “all your eggs in one basket” problem), and the ability to close unused databases.

Thank you, Bluefrog!

A reasonable question. The problem with answering it is that I’m saving these documents “in case I need them for something someday.” So it’s hard for me to figure out what the use cases are. However, here are several that I can think of:

The primary use case, and the reason I went with DevonThink rather than a cheaper alternative Finder replacement, is finding serendipitous connections. I’m a business journalist by trade, covering the technology industry. If I’m writing an article about, say, a new Google announcement, I want DevonThink to remind me that it’s similar to something Oracle did six months ago, which I completely forgot about, but which DevonThink remembers because I saved a clipping about it back then.

Two other use cases that come to mind:

  • I know I read an article on something a couple of months ago, and I’m trying to track it down again. I don’t remember the author, or where it ran, and the keywords are likely to be common in the entire corpus of articles I save. I do remember the gist of the article, though.

  • Looking for articles on a particular subject: like Google, but based on my own corpus of articles saved over the past few months.

Based on your suggestion, I can see that it might make sense to divide the groups into separate databases by broad topic. Does DT make that easy to do with an existing database or group of hundreds of articles?

Yup, that’s why my “work” clippings file is a group in my work database, and the non-work clippings are their own database.

But if I have separate databases for clippings from my work documents, I won’t be able to use the Magic Hat/See Also for work documents, will I?

Related: Does it help with Magic Hat to use tags and to classify articles in separate groups?

Thanks again!

This is a difficult thing to answer, as “easy” is a subjective term. The AI could be employed, but it has to be trained (like any good assistant). This means it would not be very “accurate” until you’ve already curated to some degree.

No, See Also & Classify will not work across databases.

Nope. The AI relies on content to make See Also connections. It disregards filenames, locations, and metadata. (Location is used by Classify, which connects content to locations to make filing suggestions.)

This is very helpful.

I think I’m probably better off selectively clipping snippets rather than just willy-nilly archiving pages.

This is particularly true of how-tos. For example, for a while I got really into coffee and began clipping articles like mad about how to make great coffee. It seems like everybody who writes a how-to article on making great coffee feels compelled to start with a few paragraphs of enthusiasm about how awesome coffee is, which is just noise. So it pays to clip only the parts about how to make great coffee and leave the noise behind.

It’s not just coffee. It’s everything.

Indeed, being more selective in what you clip or put in your databases will affect the results. On the other hand, some of the “charm” or “surprise” of See Also is that it can make connections we might not have made ourselves. This could even be because of the “noise”. But this is more useful when exploring in an organic, unfocused way.