Evaluating DT / Some questions

Uniqueuser · January 16, 2010, 4:51pm

I’m currently evaluating whether DT could be a useful tool for me, but I’m not sure yet, mostly because I might not have fully understood all of its concepts.

So, here is what I would like to do and how I normally work:

I don’t want any tool to hijack my files, mails, etc. If anything, the tool should use the files where they are and just add whatever additional information it needs.

I understand that DT can do this with files. Is it possible with Mail as well? I want to keep all my emails within Apple Mail, as I use a bunch of accounts with smart mailboxes etc.

I don’t want to tag any files. Everything the files are about is contained in the files. Hence, the tool should extract whatever makes sense. I think DT has this ability. But what’s the difference from using the Spotlight indexing system?
How fast is DT on a database that consists of 200,000 emails and around 200 GB of documents? I tested some other tools and most just choke on these numbers.
What’s the benefit of having DT do categories based on the file path? I can use a file manager to access files like this.
I’m not a fan of using folders, groups, etc. at all because all of this has one problem: it’s a hierarchy that should be MECE. But this is not what the world looks like. A hierarchy needs to be perfect in that every piece fits.

Anyway, my concept is simple: I have a bunch of data, and I need a way to layer context levels above it. Fast, and flexible. Using rules, AI, queries… whatever can filter and select the files matching these criteria.

And then a way to traverse the results at high speed, getting previews etc.

Bill_DeVille · January 16, 2010, 10:09pm

Thanks for a post that raises some interesting and important questions.

I’ll start first, though, with your item 5. Every now and then I see a comment that hierarchical structure in organizing things, whether documents or boxes of screws on a hardware store shelf, is ‘bad’, perhaps with a note that this isn’t the way the universe “is”.

You mentioned the MECE Principle, which basically states that, when organizing things into groups, the groups should be divided into subgroups that “comprehensively represent that group (no gaps) without overlapping. This is desirable for the purpose of analysis, because it avoids both the problem of double counting and the risk of overlooking information.” (Quote from Wikipedia.)
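For readers who prefer to see the principle quoted above in set terms, here is a minimal sketch in Python. It assumes that items and groups can be modelled as plain sets. The function name `check_groups` and the sample file names are hypothetical, made up for illustration, and have nothing to do with how DEVONthink stores or classifies anything.

```python
from itertools import combinations


def check_groups(universe, groups):
    """Return (mutually_exclusive, collectively_exhaustive) for a grouping.

    universe: iterable of all items that should be filed somewhere.
    groups:   dict mapping a group name to an iterable of its items.
    """
    sets = [set(members) for members in groups.values()]
    # No overlap: every pair of groups shares no item (no double counting).
    mutually_exclusive = all(a.isdisjoint(b) for a, b in combinations(sets, 2))
    # No gaps: every item of the universe is in at least one group.
    covered = set().union(*sets)
    collectively_exhaustive = set(universe).issubset(covered)
    return mutually_exclusive, collectively_exhaustive


# Hypothetical example: three documents filed into two groups.
documents = {"invoice.pdf", "meeting-notes.txt", "holiday.jpg"}
filing = {
    "work": {"invoice.pdf", "meeting-notes.txt"},
    "private": {"holiday.jpg", "invoice.pdf"},  # invoice filed twice
}
exclusive, exhaustive = check_groups(documents, filing)
```

In this example `exhaustive` is true but `exclusive` is false, because the invoice sits in two groups. That is exactly the kind of overlap a strict reading of the principle forbids.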

The MECE Principle has been adapted into management analysis from another discipline, mathematical logic (set theory), where it completely makes sense. It has become something of a fad in management theory, in my opinion, and when applied in that discipline can easily lead to results that make little or no sense. I happen to be critical of another management “principle” (the “Towers” concept in analyzing organizational structure in terms of functions) and of the “Precautionary Principle” in environmental management of technologies.

DEVONthink allows the user to cluster items into groups, or not. It’s optional. DEVONthink allows the user to create subgroups within groups, subgroups within subgroups, and so on; it’s optional. The user may create a completely flat (non-hierarchical) database, mix flat and grouped items in a database or use any degree of hierarchical structure desired. It’s optional. If desired, replicants allow filing an item in multiple groups.

I’m not at all bound by the MECE Principle in classifying information in my databases, and if DEVONthink were to require that, I would consider that a silly waste of time and effort with little if any practical return. If, however, I were cataloging metal screws in a database, I would find it useful to group (or tag) items by screw size, and in subgroups by composition: brass, stainless steel, etc.

The reality is that in interacting with the universe we often find it useful to classify some things as related (groups) and to view some of those groups hierarchically (animals > mammals > canines > dogs). For some purposes we may be satisfied with rough “clusters” of similar items, and for other purposes we may need more detailed classification of similar items.

I work with topically designed DT Pro Office databases. My decision as to which database will contain a new item is a form of classification or grouping. My main database contains at any time thousands of unclassified items, and a mixture of ‘flat’ groups and groups that contain subgroups. Often, when working on a project, I will add more detailed structure to some of my documents, and perhaps remove that added structure after completion of the project. For my own convenience in thinking about the content, most items end up in groups.

Because searches and See Also do not look at my organization of documents within a database, and these are among my most useful tools in finding and evaluating information content, I spend only so much time and effort in grouping and subgrouping content as I find useful for other reasons. Yes, for some topics I’ll spend time creating a hierarchical structure, as that may help me understand the relationships of information — even though a search or See Also operation would find those items anyway. Once I’ve created a group and populated it, Classify becomes useful to file new content into that group.

I view tagging similarly. I think all tagging systems are logically unsatisfying and operationally inconsistent and inefficient, so I never bother tagging content a priori (as it is added to a database). In the course of working on a project I will sometimes add tags to project-related items, but often remove those tags when the project is completed (I sometimes find that previously assigned tags may be an impediment in trying to take a different perspective for a new project).

Uniqueuser · January 18, 2010, 7:29pm

Thanks for your detailed response.

I’m a context person. This means that every piece of information would be grouped/tagged/you-name-it differently depending on the context.

And one information piece can make a lot of sense in different contexts.

I understood that I could do this using tags, but I would need to add them manually, and the amount of stuff I have to handle is just too much for that.

Using an iterative approach like you described is interesting. IMO this leads more to a context-specific collection that has a due date. Once the context is no longer needed, the collection can be cleared.

The collection metaphor suggests that DT can be used as a “better” local search machine (working against a private set of information pieces), helping to find related material, which then can be grouped together for a specific context/project/…

That’s an interesting idea. The question is how much better it is (i.e., how much time I save) to use a multi-stage approach like this:

I search the internet, using DT to keep and manage everything I find interesting along the way. This builds up my DT database.
I’m adding all kinds of documents to DT as well. Plain, flat, as is. DT will be like Spotlight on steroids.

Now if I need to find something I do:

Search using DT against my local knowledge base. Group and cluster stuff as I collect the relevant pieces of information.
If I still haven’t found everything I need, I add a round of internet research, filling DT further.

And the question for me is: How much more work is this, and is the result of using this approach better/faster than either only searching the internet or using something like Spotlight against my document collection?