flat hierarchies

mtheoryx83 · August 14, 2006, 7:23pm

Since DT/DA have the AI function and the ability to formulate very precise queries, doesn’t it make some sense to ignore groups and just maintain a database using a flat hierarchy (e.g., Gmail and del.icio.us)?

I have pondered this question for a while. I can certainly understand that a massively large database could benefit from groups and smart groups. But when a person is just starting to add items to the database, a flat hierarchy seems best. Is it possible to just add all the documents in bulk, then somehow let DT decide how best to group them together, if at all?

It is my understanding that the reason people are using the groups is simply to facilitate DT’s AI in searching for relationships. But, the main benefit of DT’s AI, as far as I understand it, is to show you relationships you may not have thought of. So, by using groups, and a general old school folder structure, are you not limiting the very function of the AI we paid for by restraining things into groups?

Just wondering, as I am beginning to dump huge amounts of info in one of my new databases. I will be primarily using DTPro and DA for school and research, and don’t want to limit any new and exciting relationships the AI can suggest to me. Oh, and I’m pretty lazy, so I don’t really want to do the old school method of creating a ton of folders and sub folders and sub sub…well you all remember Windows 95, so, yeah, I hate all that.

Appreciate all the advice, and look forward to others’ insights into this “philosophy of piling up my digital crap”

jmck · August 15, 2006, 1:37am

I’ve been wondering similar things lately, just because sometimes simply being able to see all those folders full of stuff tugging at your attention is distracting.
The massively increased amount of information we all have available to sit on does make it hard to keep one’s piles straight. DT’s AI is I suppose one answer to this question – let the machine do it – but while this may be great for research material, it has less relevance for the more mundane GTD usages which I’m mostly interested in.

In favor of groups and (user defined) hierarchy, I will say that hierarchy is meaning to some extent, so keeping an enormous flat DB is depriving oneself of a vector of meaning…

edf · August 20, 2006, 3:36am

It certainly would make sense to have a flat hierarchy for all of the data, and to represent the order of the data as a tree structure that is generated dynamically based on the contents of the database.

In the case of DT, though, this would not work. My understanding is that the groups are used to generate the relationships, i.e. placement of an item in a group is a classification of that item. There is no way for the user to specify a starting set of categories and relationships – the AI would merely be able to show documents that are similar, not which category a document belongs to.

One methodology to experiment with would be to store all files in a single, top-level ‘all files’ group, then build a hierarchy of groups which contain replicants of the original file. A script could then be written that would automatically create replicants (in the correct locations) of items inserted into the ‘all files’ group, based on the classification algorithm.

Thus the original set of replicants and groups defines the categories of the data and trains the AI. By using a third top level group to duplicate newly-added replicants (e.g. the top level would contain All_Files, Classification, Classification_Log), and allowing the user to correct the classification of replicants in the group (really just by deleting the auto-generated replicants and manually creating replicants in the correct classification), one should be able to train the AI in the same manner as a neural net is trained.
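To make the idea concrete, here is a rough sketch of that workflow as a toy model in Python. Nothing in it is DEVONthink's scripting interface or its real classification algorithm: the `Database` class, its method names, and the simple word-overlap scoring are all assumptions made up for illustration.

```python
from collections import Counter


def words(text):
    """Lowercased word counts; a stand-in for real text analysis."""
    return Counter(text.lower().split())


class Database:
    """Hypothetical model: one flat store plus category groups of replicants."""

    def __init__(self):
        self.all_files = {}        # name -> text (the single originals)
        self.classification = {}   # category -> set of names (replicants)
        self.log = []              # (name, category) pairs placed automatically

    def file_manually(self, name, text, category):
        """User places a replicant by hand; this acts as training data."""
        self.all_files[name] = text
        self.classification.setdefault(category, set()).add(name)

    def classify(self, text):
        """Return the category whose members share the most words with text."""
        target = words(text)
        best, best_score = None, 0
        for category, names in self.classification.items():
            profile = Counter()
            for n in names:
                profile += words(self.all_files[n])
            score = sum((profile & target).values())
            if score > best_score:
                best, best_score = category, score
        return best

    def add(self, name, text):
        """Add to the flat store, auto-replicate into the best category, log it."""
        category = self.classify(text)
        self.all_files[name] = text
        if category is not None:
            self.classification[category].add(name)
            self.log.append((name, category))
        return category

    def correct(self, name, wrong, right):
        """User fixes a bad guess: drop the auto replicant, add the right one."""
        self.classification[wrong].discard(name)
        self.classification.setdefault(right, set()).add(name)
        if (name, wrong) in self.log:
            self.log.remove((name, wrong))


db = Database()
db.file_manually("darwin", "natural selection species variation", "Evolution")
db.file_manually("recipe", "flour butter sugar oven", "Cooking")
guess = db.add("finches", "beak variation among species of finches")
```

Each call to `correct` changes which documents define a category, so later guesses are scored against the corrected groups. That is the "training" step in this sketch; whether DT behaves the same way internally is not something the sketch can show.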

Bill_DeVille · August 20, 2006, 7:24pm

As I sometimes create new databases by dumping over thousands of documents from DEVONagent searches, I’ve tried a variety of approaches to try to add some organization to the material – not just for myself, but for others who may also look at it.

Sometimes I’ve made copies of such a “single-level” database and tried various approaches to organization of the material. Of course, I’m more likely to spend effort on this if the database is intended to be retained and have additional content added in the future.

One approach is to do some searches and then replicate the results (or selected “high-ranking” results) into new groups created for that purpose. With some thought, planning and a bit of grunt work this can carve out a hierarchical organization that makes some sense to me, and that DT Pro’s Classification feature can begin to recognize. Usually, though, despite my best efforts, I’ll end up with a gaggle of still-unclassified items that, for lack of anything else to do, I toss into an “Unclassified” group for possible future evaluation – and I go to the Info panel of that group and tell DT Pro to ignore it for Classification purposes.

Later on, especially if I’m continuing to use the database with further additions of content and some manual “forcing” training of DT Pro, I may make another duplicate of the database and see if the material in my Unclassified group will now be classified by DT Pro.

Still another approach, usually mixed with the above approaches, is to select a number of unclassified items and invoke the Auto-Group command. DT Pro will then try to group together documents with related content. The result is, of course, not hierarchical. Usually DT Pro will still leave ungrouped a number of the selected items.

Depending on the textual content of the items being Auto-Grouped, the results can range from useful to frustrating. In the best case, some of the new groups will really be useful, perhaps “smarter” than I might have been in seeing relationships. In the worst case I’ll end up with a large number of groups that contain only two or three documents and that would take a lot of work and time to evaluate and use.

I find that Auto-Grouping usually works best when I’ve already created a group with a sizeable number of contents and would like to make that group “finer grained” as the top level with sub-groups.
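For readers who want a feel for why grouping by content can leave stragglers and tiny groups, here is a minimal sketch of one generic way to do it: greedy clustering on word overlap. It is not how Auto-Group is implemented (that is not described here); the `auto_group` function, the Jaccard measure, and the 0.3 threshold are assumptions chosen for illustration.

```python
def jaccard(a, b):
    """Share of distinct words two documents have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def auto_group(docs, threshold=0.3):
    """Greedily group documents whose word sets overlap enough.

    docs maps a name to its text. Returns (groups, ungrouped), where
    groups is a list of name lists with at least two members each.
    """
    word_sets = {name: set(text.lower().split()) for name, text in docs.items()}
    clusters = []  # each cluster is a list of names; the first name is its seed
    for name in docs:
        for cluster in clusters:
            # Compare against the seed document of each existing cluster.
            if jaccard(word_sets[name], word_sets[cluster[0]]) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])  # nothing similar enough: start a new one
    groups = [c for c in clusters if len(c) > 1]
    ungrouped = [c[0] for c in clusters if len(c) == 1]
    return groups, ungrouped


docs = {
    "a": "genome complexity and evolution",
    "b": "evolution of genome complexity",
    "c": "synthetic molecules from dna",
    "d": "bread baking at home",
}
groups, ungrouped = auto_group(docs)
```

In this toy version, raising the threshold leaves more items ungrouped, and lowering it lumps loosely related items together. Running it on the contents of one existing group instead of the whole database is the equivalent of the "finer grained" use described above.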

Hacking out an organizational structure as above results in a lot of replicants, especially a lot of straggling, unclassified replicants. If I’m designing a database that I’ll maintain for a good while with the expectation of adding more content, I’d like it to look better.

So I’ll select just the well-organized stuff and export it, then import that material into a new database. That new database will be the basis of a continuing and growing topical reference collection.

The organization resulted from a combination of manual design and content movement, auto-grouping and “training”. As the organizational structure becomes more defined, the database begins to interact with me and participate in its own further organization.

If new content being added is sufficiently similar to the existing content (my databases are topical) the database can suggest where to ‘file’ the new content. At some point I may turn on Auto-Classification and let the database make most of those filing decisions.

Confession: none of my databases is completely organized. At any time there may be hundreds or thousands of unfiled items. Some groups are well-organized into a fine-grained structure of subgroups. But some groups are more like “catch-all” containers, with only rudimentary organization. I don’t have any “static” databases; all of them continue to grow and evolve.

There are payoffs in spending some time and effort to organize – or at least partially organize – database contents. I’ve got two objectives: when looking at the database structure it should “tell” me about the topics of the documents; and I want to hand-off most of the responsibility for filing decisions to the computer, as I hate that job.

As much as I hate filing, I hate tagging even more. Various forms of tagging get attention in this forum, and there are proponents of this or that form of tagging. Tagging is some sort of extension of the concept of classification into a hierarchical system of “folders”, by adding a keyword, color, state or other “mark” to an item.

But tagging is primitive. It’s what we had to do before the days of computers, and tagging evolved as special techniques in the early days of computing, because all databases were dumb and depended on tagging to find anything at all.

And tagging is limiting and inconsistent, especially in the context of the documents in my database. My database includes documents such as Darwin’s Origin of Species, Lynch and Conery’s paper, The Origins of Genome Complexity and David R. Liu’s paper, Translating DNA into Synthetic Molecules. As it happens, my database has a group named “Evolution” and DT Pro’s Classify suggested that I file each of these documents into that group when each was added. Not a bad suggestion and I accepted it, although one of those documents is also filed in two other groups as well.

If I had also tagged each of these documents with the keyword Evolution that would be OK, I suppose. But there are very important differences in content among the three. Darwin’s paper is important in the history of science. At the moment, that distinction can’t (yet) be applied to the other two. One of them involves discussions of geophysics and geochemistry. Two of them involve molecular biology. One has important conclusions about synthetic chemistry and synthetic biology.

Of course, after reading and thinking about each of these documents, I suppose I could come up with some tags that distinguish each of them from the others. There are hints of such tags in the previous paragraph. But each of the documents is “richer” in content than just a few tags could describe. Some aspect of a document may be important to me in one context, but a different aspect may become important if I’m researching a different topic. If I become dependent on tags, it’s likely that I’ll keep adding or modifying tags continually.

Tagging is time-consuming and takes effort. I may add batches of thousands of documents to a database. I’m simply not going to bother with tagging individual documents except in special cases, even using scripting approaches. I might consider “smart groups” and search results as an extension of classification, and so as tagging, but that’s about it.

Do I do tagging in special cases? Yes. I may mark a document by adding a comment about it in the Comment field, such as the fact that it’s a citation in an article I’m writing – or I might do that in a separate note that links to it as a citation. If it’s a draft, I may mark its State as unfinished, then either mark it as finished or clear the State when I’m done with it.

One can also use hyperlinking to mark or “tag” relationships among documents, including Wiki linking. When I’m writing an article I may start by doing a List outline on a Table of Contents page, then link each component of the List to a document that’s a section or subsection of the project. And I may use an associated file to link to citations that will be used as footnotes or endnotes in the finished article. I’ll use (probably temporary) State or Label tags in progress so that I can quickly check the status of the project.

But if I don’t do general tagging, can I still get “tag clouds”? Of course, and in ways not inherently limited by any tagging scheme. That’s why I love DT Pro. When I do a search or create a smart group, I’ve identified documents that have some commonality. When I do a See Also operation the suggested list has some commonality of contextual relationships. Or when in a rich text document I select a word (shall I call it a keyword?) and press the Option key, DT Pro shows me a list of documents that also contain that word. And of course there are still other ways I can use DT Pro to show me some sort of commonality between documents or portions of documents.
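The "select a word, see every document that contains it" lookup needs no tags at all, only an index built from the text itself. A minimal sketch of such an inverted index follows; the `build_index` function and the sample texts are illustrative assumptions, not a description of DT Pro's internals.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercased word to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for raw in text.lower().split():
            word = raw.strip(".,;:!?\"'()")  # drop surrounding punctuation
            if word:
                index[word].add(name)
    return index


docs = {
    "origin": "On the origin of species by natural selection.",
    "genome": "The origins of genome complexity.",
    "dna": "Translating DNA into synthetic molecules.",
}
index = build_index(docs)
hits = sorted(index["of"])
```

Every word in every document becomes a lookup key automatically, so the set of possible "keywords" grows with the content and is never limited to whatever tags someone remembered to add.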

What really thrills me when I’m researching in one of my databases is when DT Pro helps me discover a relationship between ideas that’s new and useful to me. Not only does it not require tagging to do that, I suspect that tagging would actually hinder the process of discovery.

edf · August 30, 2006, 5:56am

I stumbled upon a great application for AutoGroup just now.

I have 3 DEVONagent queries cronned to run once a week; the results get dumped into my Incoming folder in DT.

Usually these results contain a huge amount of irrelevant or redundant material, mostly aggregators and definitions from online encyclopedias. I’ve fine-tuned the queries quite a bit, but they still need work.

Anyway, I just selected the entire contents of the folder and auto-grouped them, and found that it did a good job of grouping irrelevant documents together.

I’m going to experiment with this approach by creating an ‘Undesirable’ group and storing irrelevant links there, and seeing what auto-classify will do … maybe I can teach DT to weed out the bad hits that make it through my query.

–Eric

sjk · September 3, 2006, 2:11am

As an alternative to mimicking the hierarchical filesystem I’ve often wished DEVONthink used something like an iTunes/iPhoto Library storage model, with a kind of implicit replication of items in playlist/album-like virtual groups. More like Yojimbo, but with nested hierarchy support.

Having DT items that can easily coexist in multiple locations, independently of where they’re “physically” stored, seems like it would be a more natural, flexible way for me to organize and manage them. It would relieve a burden of vigilance about where items/groups exist that’s currently necessitated by their “physical” containment within a rigidly enforced hierarchical structure. Maybe that helps make it more a process of thinking where I’d want to put things than where I’d need to put them.

I’m not overly concerned about how and where the master copy of my iTunes tracks and iPhoto images exist in the filesystem while I’m organizing them into playlists and albums. And it’s nice not always having to consider whether deleting tracks/images or their containing playlists/albums is permanently destructive; the master content is preserved unless/until I explicitly want it removed.
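A small sketch of what that separation between storage and organization might look like as a data structure. This is a hypothetical model, not how iTunes, iPhoto, or DEVONthink actually store anything; the `Library` class and its method names are invented for illustration.

```python
class Library:
    """Toy model: master items live in one store; groups only hold references."""

    def __init__(self):
        self.items = {}    # item id -> content (the master copies)
        self.groups = {}   # group name -> set of item ids

    def add_item(self, item_id, content):
        self.items[item_id] = content

    def add_to_group(self, group, item_id):
        if item_id not in self.items:
            raise KeyError(item_id)
        self.groups.setdefault(group, set()).add(item_id)

    def delete_group(self, group):
        """Removing a group drops references only; master items are untouched."""
        self.groups.pop(group, None)

    def delete_item(self, item_id):
        """Explicit removal of the master copy also clears every reference."""
        self.items.pop(item_id, None)
        for members in self.groups.values():
            members.discard(item_id)

    def groups_of(self, item_id):
        """Names of all groups that currently reference the item, sorted."""
        return sorted(g for g, members in self.groups.items() if item_id in members)


lib = Library()
lib.add_item("note-1", "meeting notes")
lib.add_to_group("Work", "note-1")
lib.add_to_group("2006", "note-1")
lib.delete_group("Work")  # the note itself survives
```

The point of the design is that `delete_group` can never lose content, while `delete_item` is the one deliberately destructive operation.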

I don’t know if a similar abstraction layer between storage and organization would really be effective for DT but the concept remains intriguing.

Oh, and it’s been interesting observing how much more comfortable and confident my wife, a technical novice, is with managing iTunes/iPhoto content than Finder files/folders.