As I sometimes create new databases by dumping in thousands of documents from DEVONagent searches, I’ve tried a variety of approaches to add some organization to the material – not just for myself, but for others who may also look at it.
Sometimes I’ve made copies of such a “single-level” database and tried various approaches to organization of the material. Of course, I’m more likely to spend effort on this if the database is intended to be retained and have additional content added in the future.
One approach is to do some searches and then replicate the results (or selected “high-ranking” results) into new groups created for that purpose. With some thought, planning and a bit of grunt work this can carve out a hierarchical organization that makes some sense to me, and that DT Pro’s Classification feature can begin to recognize. Usually, though, despite my best efforts, I’ll end up with a gaggle of still-unclassified items that, for lack of anything else to do, I toss into an “Unclassified” group for possible future evaluation – and I go to the Info panel of that group and tell DT Pro to ignore it for Classification purposes.
Later on, especially if I’m continuing to use the database with further additions of content and some manual “forcing” training of DT Pro, I may make another duplicate of the database and see if the material in my Unclassified group will now be classified by DT Pro.
Still another approach, usually mixed with the above approaches, is to select a number of unclassified items and invoke the Auto-Group command. DT Pro will then try to group together documents with related content. The result is, of course, not hierarchical. Usually DT Pro will still leave a number of the selected items ungrouped.
Depending on the textual content of the items being Auto-Grouped, the results can range from useful to frustrating. In the best case, some of the new groups will really be useful, perhaps “smarter” than I might have been in seeing relationships. In the worst case I’ll end up with a large number of groups that contain only two or three documents and that would take a lot of work and time to evaluate and use.
I find that Auto-Grouping usually works best when I’ve already created a group with a sizeable number of contents and would like to make that group “finer grained” as the top level with sub-groups.
Hacking out an organizational structure as above results in a lot of replicants, especially a lot of straggling, unclassified replicants. If I’m designing a database that I’ll maintain for a good while with the expectation of adding more content, I’d like it to look better.
So I’ll select just the well-organized stuff and export it, then import that material into a new database. That new database will be the basis of a continuing and growing topical reference collection.
The organization resulted from a combination of manual design and content movement, auto-grouping and “training”. As the organizational structure becomes more defined, the database begins to interact with me and participate in its own further organization.
If new content being added is sufficiently similar to the existing content (my databases are topical) the database can suggest where to ‘file’ the new content. At some point I may turn on Auto-Classification and let the database make most of those filing decisions.
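For anyone curious how a database might "suggest where to file" something, here is a rough sketch of the general idea of ranking groups by content similarity. It is only an illustration under my own assumptions: I don't know how DT Pro's Classify actually works internally, and the word-count comparison, the function names (word_counts, cosine, suggest_group) and the sample text below are all made up for the example.

```python
import math
import re
from collections import Counter

def word_counts(text):
    """Lowercased word-frequency vector for a piece of text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def suggest_group(new_text, groups):
    """Rank group names by how similar the new text is to each group's combined text.

    `groups` maps a group name to a list of document texts already filed there.
    This is a hypothetical stand-in, not DT Pro's actual classification method.
    """
    new_vec = word_counts(new_text)
    scores = {
        name: cosine(new_vec, word_counts(" ".join(docs)))
        for name, docs in groups.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Invented sample content, purely for illustration.
groups = {
    "Evolution": [
        "natural selection acts on variation in species",
        "genome complexity and the origin of species",
    ],
    "Geochemistry": ["isotope ratios in sediment and rock samples"],
}
ranking = suggest_group("variation and selection in island species", groups)
print(ranking[0][0])  # name of the best-matching group
```

The point of the sketch is that the suggestion gets better as the groups fill up with representative content, which matches my experience: the more defined the structure, the more useful the filing suggestions.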
Confession: none of my databases is completely organized. At any time there may be hundreds or thousands of unfiled items. Some groups are well-organized into a fine-grained structure of subgroups. But some groups are more like “catch-all” containers, with only rudimentary organization. I don’t have any “static” databases; all of them continue to grow and evolve.
There are payoffs in spending some time and effort to organize – or at least partially organize – database contents. I’ve got two objectives: when looking at the database structure it should “tell” me about the topics of the documents; and I want to hand-off most of the responsibility for filing decisions to the computer, as I hate that job.
As much as I hate filing, I hate tagging even more. Various forms of tagging get attention in this forum, and there are proponents of this or that form of tagging. Tagging is some sort of extension of the concept of classification into a hierarchical system of “folders”, by adding a keyword, color, state or other “mark” to an item.
But tagging is primitive. It’s what we had to do before the days of computers, and tagging evolved as special techniques in the early days of computing, because all databases were dumb and depended on tagging to find anything at all.
And tagging is limiting and inconsistent, especially in the context of the documents in my database. My database includes documents such as Darwin’s Origin of Species; Lynch and Conery’s paper, The Origins of Genome Complexity; and David R. Liu’s paper, Translating DNA into Synthetic Molecules. As it happens, my database has a group named “Evolution” and DT Pro’s Classify suggested that I file each of these documents into that group when each was added. Not a bad suggestion and I accepted it, although one of those documents is also filed in two other groups.
If I had also tagged each of these documents with the keyword Evolution that would be OK, I suppose. But there are very important differences in content among the three. Darwin’s book is important in the history of science. At the moment, that distinction can’t (yet) be applied to the other two. One of them involves discussions of geophysics and geochemistry. Two of them involve molecular biology. One has important conclusions about synthetic chemistry and synthetic biology.
Of course, after reading and thinking about each of these documents, I suppose I could come up with some tags that distinguish each of them from the others. There are hints of such tags in the previous paragraph. But each of the documents is “richer” in content than just a few tags could describe. Some aspect of a document may be important to me in one context, but a different aspect may become important if I’m researching a different topic. If I become dependent on tags, it’s likely that I’ll keep adding or modifying tags continually.
Tagging is time-consuming and takes effort. I may add batches of thousands of documents to a database. I’m simply not going to bother with tagging individual documents except in special cases, even using scripting approaches. I might consider “smart groups” and search results as an extension of classification, and so as a form of tagging, but that’s about it.
Do I do tagging in special cases? Yes. I may mark a document by adding a comment about it in the Comment field, such as the fact that it’s a citation in an article I’m writing – or I might do that in a separate note that links to it as a citation. If it’s a draft, I may mark its State as unfinished, then either mark it as finished or clear the State when I’m done with it.
One can also use hyperlinking to mark or “tag” relationships among documents, including Wiki linking. When I’m writing an article I may start by doing a List outline on a Table of Contents page, then link each component of the List to a document that’s a section or subsection of the project. And I may use an associated file to link to citations that will be used as footnotes or endnotes in the finished article. I’ll use (probably temporary) State or Label tags in progress so that I can quickly check the status of the project.
But if I don’t do general tagging, can I still get “tag clouds”? Of course, and in ways not inherently limited by any tagging scheme. That’s why I love DT Pro. When I do a search or create a smart group, I’ve identified documents that have some commonality. When I do a See Also operation the suggested list has some commonality of contextual relationships. Or when in a rich text document I select a word (shall I call it a keyword?) and press the Option key, DT Pro shows me a list of documents that also contain that word. And of course there are still other ways I can use DT Pro to show me some sort of commonality between documents or portions of documents.
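To make the "commonality without tags" idea concrete, here is a minimal sketch of a word index built from the documents' own text, so that picking any word gives you the other documents containing it. This is my own illustration, not a description of how DT Pro does it; the function names (build_index, documents_sharing) and the three sample notes are hypothetical.

```python
import re
from collections import defaultdict

def build_index(documents):
    """Map each distinct lowercased word to the set of document titles containing it."""
    index = defaultdict(set)
    for title, text in documents.items():
        for word in set(re.findall(r"[a-z]+", text.lower())):
            index[word].add(title)
    return index

def documents_sharing(word, index, exclude=None):
    """Titles of documents containing `word`, optionally leaving out the current one."""
    titles = index.get(word.lower(), set())
    return sorted(t for t in titles if t != exclude)

# Invented sample notes, purely for illustration.
documents = {
    "Note A": "Selection shapes genomes over long time spans",
    "Note B": "Synthetic molecules evolved by selection in the lab",
    "Note C": "Sediment chemistry and isotope ratios",
}
index = build_index(documents)
related = documents_sharing("Selection", index, exclude="Note A")
print(related)
```

Nobody had to decide ahead of time that "selection" was a keyword. Every word in every document is already a potential "tag", which is why I don't feel the need to add my own.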
What really thrills me when I’m researching in one of my databases is when DT Pro helps me discover a relationship between ideas that’s new and useful to me. Not only does it not require tagging to do that, I suspect that tagging would actually hinder the process of discovery.