What is the optimal folder structure for a large database?

rmathes23 · January 12, 2005, 3:48pm

I’m a new DT/DA user and as my db is growing I figured I’d ask if there is an optimal folder structure for larger databases.

Right now, i’ve got about a dozen first level groups. i can see that growing to maybe a couple dozen, but probably not more than that. from there i add groups vertically/hierarchically. Not sure how deep i’ll drive that folder structure. DT’s superb search functionality obviates the need to use folder structures to enhance retrieval ability, so i’m thinking it just needs to be a structure that allows me to make sense of the search results returned.

i’m also thinking that if i start slicing the groups too thinly then i’ll decrease the volume of docs in each group, which seems like it will degrade the quality of auto classification.

any thoughts on this would be appreciated.

eboehnisch · January 12, 2005, 8:28pm

Generally, the fewer documents you have in a group, the better the auto-classification will work. Of course, this depends on the quality of the documents and on how sharply you can divide them among different groups. We recommend not mixing groups and documents in one group.

Best,

Eric.

Maria · January 13, 2005, 12:02am

Good to know!

Maria

rmathes23 · January 13, 2005, 3:18am

thanks for the response.

so the way i’m interpreting that statement about mixing docs and groups is that if a group has sub groups, then ALL docs in that group should be nested within one of the subgroups and not sitting at the group level.

for example…

if I have Group 1, and subgroups 1.1, 1.2 and 1.3, all docs under group 1 should be in one of those three subgroups. I shouldn’t have the subgroups and then also have doc 1, doc 2 and doc 3 at the Group 1 level. Is that right?

When you say fewer docs = better auto classification, can you give me an idea regarding scale? How many docs before this functionality starts to meaningfully degrade? Or put another way, is ‘too many docs’ 10, 100 or 1000?

My database is pretty small at this point (especially compared to some of the databases I read about here at these forums) and I’ve been extremely impressed at how well the auto classification is working.

Timotheus · January 13, 2005, 6:08am

But why this recommendation not to mix groups and documents in one group? Does this kind of mixing have negative effects only upon the auto-classification, or does it also have other negative effects (like, for instance, slowing down the application while searching)?

eboehnisch · January 13, 2005, 8:15am

Exactly: if a group has sub-groups, all of its documents should sit in one of those sub-groups.

As for scale, you can’t say exactly. It all depends on your data. The better DEVONthink can distinguish between the docs in each group, the better the classification. So, e.g., put all docs about aquarium fishes in group “Aquarium”, and all docs about car maintenance in group “Car”. If you find that you can even sub-divide these, for example into “How to feed fishes” and “Fish types”, do it. The finer the granularity, the better.

Best,

Eric.

eboehnisch · January 13, 2005, 8:17am

Yes, this mixing has negative effects on auto-classification. DT then simply doesn’t know if it should put documents into the group itself or its sub-groups. And no, it does not have any other negative effects, e.g. on performance.

Best,

Eric.

rmathes23 · January 13, 2005, 11:28am

thanks Eric, very helpful stuff.

could you address the point regarding folder structure?

for example, does DT search/auto classification work better if your db is 20 groups wide and 20 groups deep, or 200 groups wide and 2 groups deep?

or does it make any difference at all? my guess is not, but just want to check while i’m still early enough in the usage stage to modify my usage to fit the app’s preferences.

eboehnisch · January 13, 2005, 11:35am

No, it should not make any difference. From my experience, a depth of four to five levels is good, but more for human convenience than for auto-classification.

Best,

Eric.

ChemBob · January 13, 2005, 1:54pm

OK, then this seems to beg another question. What about when you import a folder from the Finder? I have some very deep folders of work from the past 17 years or so where there are very often folders at some levels containing both other folders and some documents relevant to that folder level. I could rearrange the structure in DT I suppose, adding groups to subsume the loose folder documents, but wouldn’t that mess me up for synchronizing? Also it would be a huge amount of work, whether I reorganized all this work in DT or in the Finder before importing. Any suggestions? Just how big a deal is this? (I’m wondering if ultimately the better organization would be worth the time and effort to manually organize it.)

Thanks,
ChemBob
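
For anyone facing the same import, one rough way to gauge how big a deal this is would be to count the folders that directly hold both subfolders and loose files before importing anything. The following is only a minimal Python sketch run against the file system, not a DEVONthink feature; `find_mixed_folders` is a hypothetical helper name, and skipping hidden entries such as `.DS_Store` is an assumption about what should count as a "document".

```python
import os


def find_mixed_folders(root):
    """Return sorted paths of folders that directly contain both subfolders and files."""
    mixed = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Assumption: hidden entries (e.g. .DS_Store) are not real documents or groups.
        visible_dirs = [d for d in dirnames if not d.startswith(".")]
        visible_files = [f for f in filenames if not f.startswith(".")]
        # Prune hidden directories in place so os.walk does not descend into them.
        dirnames[:] = visible_dirs
        if visible_dirs and visible_files:
            mixed.append(dirpath)
    return sorted(mixed)
```

Calling `find_mixed_folders("/path/to/old/work")` (the path is a placeholder) lists every folder whose loose files would end up next to sub-groups after an import. A short list suggests a quick manual cleanup; a long one supports importing selectively instead.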

eboehnisch · January 13, 2005, 5:53pm

Yes, this would interfere with syncing. As DEVONthink is more an information manager than a file manager, I would recommend not importing everything from your 17 years of folders, but selectively transferring those documents you want to have in DEVONthink to your database and taking the chance to reorganise. Again: DEVONthink is not a sophisticated file manager; it’s a database, after all. And we usually do not recommend keeping documents both in your file system and in the database. Even though we’ve just introduced a sync command, we do not really encourage its use.

Best,

Eric.

ChemBob · January 13, 2005, 9:35pm

OK, now I am confused. I put all my old work folders into DT because I’m a research scientist and have worked on a gazillion (seems like) projects over the years. During that time my naming conventions for files and folders weren’t always perfect and, frankly, I’d forgotten a lot of what I’ve done. I wanted rapid searchability and access to this information because there is a lot of stuff in there that I did that can be used in current proposals to clients, etc. I figured DT could rapidly ferret through all my old files for work I’ve done relevant to the current proposal topic and I could grab examples and experience highlights and slap them into the proposal. Now, when the new proposal is written, submitted and filed away I wanted it to also serve as an info source for potential proposals in the future, so I wanted it to be in DT also. So say I have a folder with the proposal, some background materials, an outline in Word (or NT or whatever), some web page links, etc. I’ve got it all in DT and I change the outline in Word at the request of the client. Wouldn’t I want to sync DT to the folder again to get the changes so that I could, if need be, search the database and have the changed version show up rather than the outdated one?

You say you don’t recommend keeping documents both in the file system and in the database but I thought the DT manual actually recommended the “copy files into database” option. Am I misinterpreting what you meant in your post, above? And you’re saying you don’t recommend using the sync command. Why? What can go wrong?

Sorry if these are stupid questions but one of the main uses I have for DT is the rapid searching and ranking of my old work to help me build new proposals and new manuscripts based on past projects.

ChemBob

Bill_DeVille · January 13, 2005, 11:41pm

I’ll let Eric respond to your questions, but I need to make a point: Do not import package documents such as NoteTaker files directly into DEVONthink (especially if you plan to delete the original notebook from your drive). DT imports package files as folders and files, and that can cause problems such as ‘lost’ files. Bad karma!

If I want to include a whole NT notebook in DT, I first convert the notebook to a PDF file, then import that to DT. Often, I’ll substitute the path for the original notebook for the path of the PDF, so that I can open the original notebook under NoteTaker from within DT. Or I may export the notebook as a .doc or RTFD file, and then import that file into DT.

So: there are file types that should NOT be imported into DEVONthink – but there are still ways of incorporating searchable text from such files into DT.

eboehnisch · January 14, 2005, 4:21pm

When you have documents that you have to edit outside DEVONthink, e.g. QuarkXPress layouts or Word documents, then, yes, syncing is the way to do it.

Generally, when you have only textual information and, say, PDFs, I would recommend keeping them all either in the database itself or at least in the database folder, not anywhere else in the file system. As always with synchronization, there’s a good chance that you move something around and the synced party misses exactly this.

I meant: it’s never very straightforward to keep copies of files in folder XXX AND copies of them in the database. The database folder is managed by DEVONthink and so I see it as part of the database (in DT Pro it will be e.g. part of database packages).

You could move the folder in the file system and the link will break. Of course, you can repair this. It’s just one more factor you have to think about when you do something. In general, the synchronization command is very robust.

Best,

Eric.

Maria · January 15, 2005, 12:48am

Obviously, these are not stupid questions. They are related to the problem of how to use DT most efficiently - as regards the time we have to invest to get it working as well as regards the results we get from DT.

I am in a similar situation to ChemBob, but not only do I have a folder structure that grew over about 15 years; it also changed more than a hundred times, and it contains several versions of the same file and several identical backups, as well as a variety of file formats that will soon vanish from real life. In this situation, I do not see any other solution than going through all my old backup CDs (fortunately, there was a tendency not to produce so many files in the earlier years of computing) and importing them as PDFs into DT. After that I can eventually discard the Classic environment and start a more effective backup strategy: backing up DT databases only.

In terms of efficiency, preparing DT this way is time-consuming, but I hope for better performance in my work later. Writing this, I got another idea: try the link command to have DT do the work of comparing files etc. in a new database that exists only to clean up my CD backup collection, before the results are imported into my main database. If it works, I will report on that. Thanks, ChemBob, for your inspiring comment!

Bye,
Maria
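
As a complement to the idea above, the "several identical backups" could also be found outside DT before importing anything. This is a different technique from the link command Maria mentions: a minimal Python sketch that groups files by a SHA-256 hash of their contents. The helper names `file_digest` and `find_identical_files` are hypothetical, and it only detects byte-for-byte identical copies, not different versions of the same document.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_identical_files(root):
    """Return lists of paths under root whose contents are byte-for-byte identical."""
    by_digest = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            by_digest[file_digest(path)].append(path)
    # Keep only digests shared by more than one file.
    return sorted(sorted(paths) for paths in by_digest.values() if len(paths) > 1)
```

Each inner list returned by `find_identical_files` is one set of duplicates; keeping a single file from each set before importing would shrink the backup collection without losing content.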