Top-level organization that's tidy, semantic, and scales?

merlinmann · May 27, 2006, 9:31pm

I’m happy to say that my DEVONthink collection is becoming increasingly large and useful to me. And with the addition of Smart Groups, I’ve developed some simple ways to bubble up the information I’m most interested in monitoring and mining regularly.

My question – which I suspect has been covered before – relates to Best Practices for storing and grouping information at the top level. What are the best options to consider in keeping things tidy yet DEVONthinkingly semantic?

Given that my topics of interest are becoming broad and deep, I’d thought it might be best to first store all my materials in general silos like “People,” “Concepts,” “Books,” “PDFs,” “Photos,” “Sites,” and so on, always dropping new info into a vertical category about “what it is” before further organizing (via replicants, tags, or Smart Groups) into “what it means.” Would that be useful, or would it just confuse DEVONthink? (aka “Hmm, Merlin seems to think Benjamin Franklin, Captain Beefheart, and a bunch of fictional characters are related…”)

Is there a guideline for choosing how to set yourself up in a way that scales best for future grouping and classifying, as well as ensuring that you give DT the appropriate “hints” on what the relationships mean to you? I’m still pretty new to DT, so I’d be grateful for any advice on how veteran DEVONthink users have had the most success. (Or, of course, an RTFM link to other posts and tutorials would be humbly appreciated.)

Many thanks!

Bill_DeVille · May 28, 2006, 1:03am

The more contextually related a group of documents is, the easier it will be for DT Pro to “see” the contextual relationships between those documents.

The user starts the classification system in the database by clustering documents the user perceives to be related into groups and subgroups.

Let’s start with your silo concept. Create a group named People. Don’t put individual records into the group. Instead, create subgroups, perhaps one for Famous Americans. In that group you could include other subgroups such as American Independence, in which you would include some documents about Benjamin Franklin. And another subgroup named American Scientists, in which you might include references about Benjamin Franklin’s theoretical studies on electricity (yes, he did much more than fly kites).

Still under your People silo you might create a group named Fictional Characters, then subgroups as you wish under that, perhaps distinguishing plays from movies from children’s stories from cartoons, or whatever makes sense to you.

The silos you described will often overlap. You might end up placing a new book about the life of Benjamin Franklin both under your People silo and your Books silo, into an appropriate subgroup in each. As your database grows in size, DT Pro will probably make recommendations for such multiple classification once in a while.

In any organizational structure, the better you break it out into clusters of related documents, the more likely it will become that DT Pro’s Classification recommendations for new additions will make sense to you.

I would question using “PDFs” as a silo. PDFs are documents about something, rather than topics in themselves. If by Sites you mean a collection of URLs to web sites that interest you, that’s perfectly appropriate, but it’s just a collection of bookmarks rather than an organizational scheme for documents (I’ve got a Bookmarks group with subgroups such as Scientific Journals, etc.).

I’m not quite sure what you mean by a Concepts silo. But if you understand what you mean and come up with a systematic classification scheme it will probably work for you.

The more clear and consistent you are in working out your organizational structure and following it in starting your database, the more likely it will become that DT Pro will “see” your intended relationships between documents. The larger your database becomes, the better DT Pro will become at suggesting locations for new additions.

eiron · May 28, 2006, 10:33am

I have to disagree with Bill here, which doesn’t happen very often. By dividing your “People” folder into sub-subjects like “American Independence” you’d just be fooling yourself that “People” is a relevant category. It seems to me that DT is going to want to put all documents related to “American Independence” in that folder regardless of whether its parent folder is “People”, “History”, “Myth”, or “Things to thank the French for”. And I doubt you’d want to bloat and confuse your database by having separate “American Independence” folders in each of your categorical silos.

What I would do in this case is use some form of metadata (labels, comments, wikiwords inside texts etc) for your categories and use searches and smart groups to keep track of them. That said, if you really want big high-level silos I see no harm in it as long as you check “exclude from classification” for all their folders and subfolders. Correct me if I’m wrong, Bill, but it seems to me that once a folder has been “excluded”, Ben Franklin and Captain Beefheart can frolic together without confusing the neighbours.

All you need do is pick your Ben Franklin document, press classify, command-click on all the appropriate folders in the sidelist (Famous Americans, Francophiles, proto 43folders junkies, ThisWeek, ToRead etc), press “move to” and DT will put a replicant in each of the selected folders. One caveat: the “Move to” button does indeed move the doc out of its original folder, so you must also select the doc’s current silo from the list if you want to keep a copy there. It’s a pain, but it might be best to start by creating a replicant in the appropriate silo, and then pressing classify. I’d prefer a “replicate to” option, but such is life.

I’ve been working on an approach that’s a bit different:

I import, clip and gather to my InBox folder using scripts.
I “get my inbox to empty” by marking each source with a coloured label (Urgent, ToDo, High, Low, Reference, Personal, Junk; note the untidy conceptual variety) and filing all sources by topic, with replicants in multiple topic folders within my “Library” folder (Dramaturgy, Cognition&Theatre, VideoTech etc), using “classify” to help me place them. I don’t worry too much about the size of the database.
VITAL STEP: While reading a source I clip, copy and write snippets of text (which are often in the Steven Johnson 200-500 word sweet spot), and annotate them with multiple keywords - either in comments or in the rich text itself (using my own annotation text style which I have keyed to F1.)
These keywords might be as conceptually different as “PlayReview”, “CoolImage”, “Good Metaphor”, “act2”, “Milo’s Apotheosis” & “Duh”: the sort of thing I write in the margins of books. I try to keep it freeform and generate keywords quickly, like doing a little text mindmap of all the links to an idea that my brain burbles up. This supplements the text of the snippet, giving the DT AI more to work with in searches and classification. It also has the added value of making me really think about why I find a particular passage interesting, and it has a way of forcing me to categorize my thoughts while keeping those categories flexible. DT is pretty smart and it allows me to be a bit sloppy here, and I’m rather enjoying the creation of what I think of as fuzzy metadata.
When I’m doing general research I might look in the whole database or even restrict my searches to the source material. But when I’m writing or trying to define what I think on a subject I try to narrow my searches to the snippets/notes alone. That way I’m focusing on what I think is interesting. By using mashedwords for some keys, I further narrow the links/searches down to those bits/snips/texts that I’ve duly annotated.
I don’t worry too much about how I file these snippets; I have huge sloppy folders full of them. I let keywords and searches do the heavy lifting. Often these snippets are single concepts, tools or ideas to which I give simple names or aliases. I then can wikilink to them automatically while brainstorming, fleshing things out or actually writing. Generally speaking the snippets automatically carry their source info, but even if they don’t it’s not too hard to pin down the original with a search.
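To make the "narrow the search to annotated snippets" idea concrete, here is a minimal sketch in Python. It is not DEVONthink's search and uses none of its APIs; the Record class, the "source"/"snippet" labels, and the sample records are all hypothetical, and matching is a plain case-insensitive substring test over the text plus the freeform keywords.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    # Hypothetical stand-in for a database item; not a DEVONthink type.
    name: str
    kind: str                      # "source" or "snippet" (assumed labels)
    text: str
    keywords: set = field(default_factory=set)


def search(records, term, snippets_only=False):
    """Return names of records whose text or keywords contain `term`."""
    term = term.lower()
    hits = []
    for record in records:
        if snippets_only and record.kind != "snippet":
            continue
        # Keywords supplement the text, so both are searched together.
        haystack = record.text.lower() + " " + " ".join(
            keyword.lower() for keyword in record.keywords
        )
        if term in haystack:
            hits.append(record.name)
    return hits


# Made-up sample data for illustration only.
records = [
    Record("Journal article", "source", "A long article that mentions act2 once."),
    Record("Margin note", "snippet", "He hesitates at the door.", {"act2", "Good Metaphor"}),
    Record("Market dialogue", "snippet", "Overheard at the stall.", {"CoolImage"}),
]

everything = search(records, "act2")
notes_only = search(records, "act2", snippets_only=True)
```

Here everything includes both the article and the margin note, while notes_only keeps just the annotated snippet, which is the effect of restricting a search to one's own notes.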

All this would be too messy if I didn’t trust DT’s ability to quickly find a good list of related data in a number of different ways.

In short, DT’s multiple strategies allow both order and disorder, thereby allowing me to focus on results rather than method. So try mixing up your strategies and don’t worry too much about the “how” for now. With a little work, DT will always be able to find the “what”. Tidiness is a means, not an end.

Bill_DeVille · May 28, 2006, 3:32pm

Hi, eiron:

I’m in general agreement with your criticism and your approach.

The reality is that DT Pro will accommodate a wide range of organizational schemes, from rigidly structured and maintained to pretty sloppy.

I spend little time thinking about organizing material. I do create new groups for writing or research projects. But I depend a great deal on DT Pro to support my research needs by finding or suggesting information.

In A Connecticut Yankee in King Arthur’s Court Mark Twain wrote about the desire to hang the person who wrote the hymn In the Sweet Bye and Bye.

I feel much the same about whoever came up with the buzz word “silos” for handling organization and management issues. The approach is to lump things together so as to produce the smallest possible number of “silos” or lumps of things. I think it’s one of the dumbest fads to have appeared in recent years and have seen it produce organizational disasters time after time, because it arbitrarily focussed on superficial similarities and ignored profound functional differences. It’s one of those ideas that has a germ of truth, but is often abused.

That said, users are free to use almost any organizational approach they choose and DT Pro’s AI features will probably remain helpful.

Except in the sense that I create organizational groupings for certain types of information and for specific projects, I don’t bother with “tagging” because I deal with such a wide range of categories of information and with so many thousands of references that tagging could be complicated and time-consuming and continually require revision. But I depend a great deal on the ability of DT Pro to suggest items related to whatever I’m working on at the moment. That’s metadata that DT Pro and I create interactively on the fly. Rarely, I will create smart groups for a particular purpose. More often I’ll create groups containing replicants from search results, some of which will be thrown away after I finish a project while others may remain permanent. I prefer replicating search results to a new group because that allows me to inspect the results and throw out the ones I think are less useful; I can’t do that with smart groups.

And like Vannevar Bush’s Memex machine, DT Pro lets me write notes and create links between items that become a part of my database and add to its value.

I like freedom to look at my reference materials in new ways. That’s where DT Pro shines.

eiron · May 28, 2006, 3:59pm

Bill,
I’m not overly fond of silos either; I was just picking up on Merlin’s terminology without really giving it much thought. To me silos are just one more feature of what makes the Saskatchewan landscape a level of hell.

It would be nice if the kind of content I gathered allowed me to trust DT without metadata and tagging, but my needs are more artsy-fartsy than scientific, and thus come with less of a built-in system of classification. When an item in your database could be a scene from a Tom Stoppard play, a snippet of dialogue from the guy at the market, a journal article on NeuroAesthetics, a play review one wrote or a striking piece of prose from the New Yorker (i.e. as notable for its Form as for its Content), you need metadata to identify its potential relevance. All these sources are very different, with very different vocabularies; only metadata can identify that they all have a common topic as vague as “regret” or all apply to a particular character in a play I’m staging.

howarth · May 28, 2006, 4:38pm

Very interesting points, Bill and eiron. My experience is that collecting data in DTP reflects the usual course of my thinking, which does NOT begin with large categories, but rather tons of gravel and silt, among which a few flecks of gold may from time to time wash out. (Mining is at the heart of my latest project.)

So it makes little sense for me to categorize by data types (pdf, url, rtf), though it might for others who are writing about media-message relations. I do use that method to distinguish between online and offline data. I have a hard time recalling the contents of file boxes or book shelves, so I make lists to remind me: Books in Hallway, Books in Endnote, Gray File Case, Original Trip Notes, etc. Searches will find those contents when they are topically relevant.

As the data accumulates, I compile many topical folders, and as the writing progresses, I merge those folders into units reflecting the project chapters. It’s worked for both nonfiction and fiction. I read and re-read my notes, breaking them up or combining them, until the line of story or argument emerges.

There’s no one way to use DTP; for me it’s the tool that helps me gather data, sift and shape it, and then use it as a guide while I do the writing. I like that, because it lets me find my own way.

mdl · May 29, 2006, 4:25pm

Very interesting thread! Thanks!

DT Pro has quickly become my repository for anything I find interesting or potentially useful. Items I would never have thought to clip or store before are now dumped unceremoniously into DT Pro. At first I tried to tag and classify everything I imported into the database, but this was far too time-consuming. I ended up fiddling constantly with tags, metadata, and smart groups on items I might or might not use again.

Recently, I’ve found it very useful to create groups and tags only when needed. If I know why I’m importing something into the database, I can classify and tag to my heart’s content. Otherwise, I don’t worry about it. I simply have a massive folder (oops, group) titled “random web clippings” and another hierarchy for notes, organized by source. (I probably wouldn’t even need to organize these notes, since I enter bibliographical info in the comments field, which would allow for the creation of smart groups when needed.) I let DT Pro do the work when I need to find something for a project. Not having to organize everything is a luxury. Thanks to powerful search features, I can afford to be lazy, and my less important data can remain one big sloppy soup.

A couple of things that have helped:

keeping clippings and notes in the sweet spot of 250-500 words (à la Steven Johnson)
color coding items by type (generally I like to distinguish between web clippings, notes, reference materials, drafts, and finished writings; different colors make it easy to tell which is which when I use the See Also feature)

A couple of things I’m considering:

tagging items by type (e.g., webclippings, mydrafts, mynotes, etc.). This would enable me to create smart groups organized by type, so that all web clippings or all notes would stay together in a group even after I’ve classified them by subject.
The reason I like to keep all my web clippings or all my notes in a single folder is that it enables me to create a quick “tag cloud.” (O.K. so I do tag some of my items). Basically, I select all the items in the folder and then use the script “Assemble” under “Comments.” This handy script creates a text file containing the aggregated comments of the selected items. If I double click on this newly created file and then hit the “Words” button, the result is a list of tags that I can sort either alphabetically or by frequency.
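For anyone who wants to see what that "tag cloud" amounts to, here is a rough Python sketch of the same idea. It does not call the Assemble script or the Words button and knows nothing about DEVONthink; it assumes each item's comment is a string of tags separated by spaces or commas, and the sample comments are made up.

```python
from collections import Counter


def tag_cloud(comments):
    """Count tags across comment strings (tags split on spaces or commas)."""
    counts = Counter()
    for comment in comments:
        for tag in comment.replace(",", " ").split():
            counts[tag.lower()] += 1
    return counts


def by_alphabet(counts):
    """Tags sorted alphabetically, as (tag, count) pairs."""
    return sorted(counts.items())


def by_frequency(counts):
    """Tags sorted by descending count, ties broken alphabetically."""
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))


# Made-up comments standing in for the aggregated comment fields.
comments = [
    "webclippings gtd",
    "mynotes, gtd, writing",
    "webclippings writing GTD",
]

counts = tag_cloud(comments)
alphabetical = by_alphabet(counts)
most_used = by_frequency(counts)
```

Sorting by frequency puts the most-used tags first, which is the view that shows what a big sloppy folder is actually about.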

eiron · May 29, 2006, 4:49pm

It seems like your approach doesn’t differ all that much from mine, mdl. Glad to see I’m not alone in the part neat/part messy crowd.
Great idea for the “tag cloud” too; I was wondering what people use that Assemble script for. Now I know. I’ll give it a try.

Thanks.

Eiron