Is classification of documents into multiple folders useful?

I have been under the impression that the AI weights documents by how they are classified, and that the more structure in the classification scheme (i.e., nestedness), the better. I have dutifully constructed a hierarchy of folders that reflects how I view my particular portion of the universe.

Needless to say, this takes more effort than dumping all the documents into one folder. Lots more. I have 4000+ records and 70 groups.

So I did an experiment. My workflow is to import documents into the inbox and then replicate or move each one to 1–3 folders, so I was curious how the AI’s “See Also” output changed when a PDF sat in the inbox vs. when it had been classified. I ran this experiment on 10 documents. The result?

No difference whatsoever. The same documents showed up in the See Also window, ranked the same way, regardless of the classification scheme.

This suggests to me that

  1. at best, the AI treats the words in the title of the folder as just more of the many words in the document, and
  2. I’ve been wasting a lot of time classifying.

This is too bad for a second reason. It would be pretty nifty if I could tell the AI to weight by folder classification. And, for that matter, by the title I give the document. This metadata, which I personally input, should matter more.

But in the meantime, it would be useful if folks could clear this up once and for all: regarding the performance of “See Also”, do the title and folder classification really matter?

1 Like

The Classify AI routine weighs existing documents by how they are classified and recommends classification of new additions based on that analysis, comparing contextual patterns of words used in the new document to contextual patterns of words in the existing groups of documents. Similarly, the Group routine examines word usage and contextual relationships among a group of selected, unclassified documents and creates new groups in which to place similar documents.

But the See Also and See Similar Text routines weigh the contextual patterns of words in the content of documents in order to suggest other documents that may be contextually similar to the one that’s being viewed.
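To make the distinction concrete, here is a toy sketch. DEVONthink’s actual algorithms are unpublished, so everything below (the bag-of-words vectors, the cosine measure, the pooled-group trick) is an assumption for illustration, not the real implementation; the point is only that See Also scores raw content against raw content, while Classify scores content against existing groups:

```python
# Toy mental model only. DEVONtechnologies has not published the real
# algorithms; the representation and similarity measure here are assumptions.
from collections import Counter
import math

def vector(text):
    """Reduce a document to a bag-of-words frequency vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two word-frequency vectors."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) \
         * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def see_also(doc, corpus):
    """'See Also' analogue: rank other documents by content similarity alone."""
    v = vector(doc)
    return sorted(corpus, key=lambda d: cosine(v, vector(d)), reverse=True)

def classify(doc, groups):
    """'Classify' analogue: compare a new document against the pooled text
    of each existing group and suggest the closest groups first."""
    v = vector(doc)
    centroids = {name: vector(" ".join(docs)) for name, docs in groups.items()}
    return sorted(centroids, key=lambda name: cosine(v, centroids[name]),
                  reverse=True)
```

Note that neither sketch ever consults a document’s Name or its location in the group hierarchy, which matches the behavior reported above.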

The Name of a document is metadata and may or may not also be in the content of that document. I prefer – for my own edification – to give descriptive names to documents. But other users may use cryptic names related, perhaps, to an organizational scheme. See Also doesn’t “care”.

Classification is another form of metadata about the contents of a database. As in the case of naming documents, I like to organize documents in terms of relationships among them that I understand and am comfortable with. I will look in the contents of a particular group for a document, because that’s where I likely will have placed it. That’s often a convenience to me. But if I do a database-wide search, the Search tool will find that document even if I made a mistake in filing it. Search doesn’t “care”, nor does See Also. But of course if I’m consistent about the way I file things, the Classify AI routine will make better and better suggestions about the location of new content as my database grows.

I’ll confess that I’m often more interested in finding things (Search) or in finding relationships between ideas (helped by See Also) than in very definitively organizing the placement within groups of each and every document in my database. So I’m grateful that Search and See Also don’t punish me for getting sloppy with organization (although the Classify routine’s utility weakens the sloppier I get).

1 Like

When I use DEVONthink, my only goal in classifying documents is to create a “value added” scenario for See Also.

“See Also” is the tool that allows me to find related documents that I had forgotten about; it is the “second brain”. Based on what Bill says, it seems that we don’t “train” DT for See Also at all. Instead, it always looks for links within the text of existing documents but ignores the very personal, and hence singularly powerful, information we use to place each document in the context of our individual research programs.

Here is an example of why this is a real problem.

In ecology, the word “stoichiometry” has been introduced recently to integrate a variety of studies (from biogeochemistry to nutrition to population ecology) into a coherent new research program. In most of the PDFs I own that are relevant to the field of stoichiometry, the word never shows up in the text of the article. It may show up in the title I give the article, and it definitely shows up when that article is replicated to the Stoichiometry folder.

(I am sure any user of DT can think of a similar scenario. )

The notion that this info is lost when I “See Also” a document is disappointing. :cry: And it seems easily remediable by adding a preference: “use title and group names in See Also, and weight by X”.

I’m almost afraid to ask this next question. But is folder and title information incorporated into a “Find” search? Does using the “Fuzzy” option increase or decrease this likelihood?

1 Like

As an old chemist and erstwhile philosopher and historian of science I’ve always been interested in how terminology and methodologies spread from one field to another.

Stoichiometry is basically the discovery of, and methodologies associated with, quantitative measurements that have predictive value. The concept was familiar to the ancient Greeks and has been a mainstay in chemistry for a long time, e.g., reaction equilibria.

Political scientists discovered quantitative approaches back in the 1960s, and I once teased a friend about papers that read like “Tom Swift and His Electric Factor Analysis Machine”. (Tom Swift was the hero of a series of children’s novels in the early 20th century; he used science and technology to solve mysteries.)

Quantitative approaches are important in many fields, certainly in ecology. I remember a landmark study of field mouse populations related to nutrient availability that was done on a tract of land over a period of years. There were equilibria between population size of the mice and the quantity of nutrients on the site, but with temporary undershoots or overshoots of population size related to nutrient levels. In general, population size adjusted to nutrient availability, a dynamic relationship. Other studies have focussed on behavioral changes of mice (or their predators) during periods of excessive population density, or in adaptation to inadequate nutrient availability.

You are interested in classifying ecological literature that uses stoichiometric approaches, even though the term “stoichiometry” doesn’t appear in the content of many of the writings.

See Also is probably “smarter” than you think. Its forte is finding similarities of words, and especially the contextual relationships among words, in a collection of documents. No, See Also doesn’t look at Names or at the group locations of documents (although classification may help you, the human part of the interactive team, organize your own thought).

Let me give an example. Dogs are canines. So are wolves, foxes and coyotes. Suppose you are viewing an article about dogs, which doesn’t include the term “canine”. You invoke See Also and find that the list includes an article about wolves, even though the term “dog” doesn’t appear in that article about wolves. How did that happen? Somewhere in that database is a “bridge” document that includes the term “canine” as related both to dogs and to wolves. The greater the number of such “bridge” documents, or the greater the frequency with which the relationship is defined even in a single “bridge” document, the more likely See Also is to make such a connection.
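In toy form (invented vocabulary, and certainly not DEVONthink’s actual code), the situation looks something like this. A plain word-overlap measure only makes the dog-to-wolf connection in two hops, by way of the bridge; whatever See Also actually computes can evidently collapse that into one, but the principle is the same:

```python
# Toy illustration of the "bridge" effect. The vocabulary is invented and
# this is not DEVONthink's actual algorithm.
dog    = {"dog", "bark", "fetch"}
wolf   = {"wolf", "pack", "howl"}
bridge = {"canine", "dog", "wolf"}   # a glossary-style "bridge" document

def overlap(a, b):
    """Crude similarity: fraction of shared vocabulary (Jaccard index)."""
    return len(a & b) / len(a | b)

print(overlap(dog, wolf))     # 0.0  no direct connection at all
print(overlap(dog, bridge))   # 0.2  the dog article surfaces the bridge...
print(overlap(wolf, bridge))  # 0.2  ...and the bridge surfaces the wolf
```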

Take that as a tip. You are trying to force a connection among documents by grouping them. The connection may be the concept of stoichiometry, but that term doesn’t exist in many of the documents in your collection. One way to enhance the behavior of See Also so it makes that connection would be to make sure there are one or more documents in the collection that “bridge” the term “stoichiometry” to other terms or word patterns common to the concept. That bridge document might be a beautifully written overview of the field, or it might be a “nonsense” document that is basically a glossary of related terms, perhaps repeated for emphasis.

I still organize my database collections, at least to some degree, for my own benefit. I can’t create and hold in my mind tables of all the tens of millions of words in my database, and the patterns in which those words occur, the way See Also can. But my database isn’t trained as a chemist, or ecologist, or economist, or whatever my interests may be. So I’m responsible for determining the pertinence of documents suggested by See Also. Some of those suggestions may be “dumb” while others are “brilliant”; it’s up to me to make the distinction. This is human/machine interaction, and I often find it very useful.

Sometimes I find it useful to follow a trail of See Also suggestions. The first list of suggestions may not give me what I’m looking for, but selecting a document from that list and invoking See Also again may lead me to the discovery of a relationship I hadn’t thought of.

2 Likes

“You are trying to force a connection among documents by grouping them.”

What?

You are presuming that a database is going to have enough information that “See Also” can learn that wolves and dogs are both canines without listening to the user. The user shouldn’t have to stock the database with enough documents for See Also to learn from; See Also should be able to learn from everything, and the user can turn off what’s taken into account if they want.

See Also gets it wrong all the time, though now that I know what it looks at, the list of things I’m given makes a lot of sense. Unfortunately, as it stands now, it’s pretty much useless.

My databases are not ideal; they are not built for AI. If I only need two papers in a given field, then that’s all that will be there, which means there often isn’t enough information for your bridge. And while I would and do input additional information in the form of classification, naming, etc., I’m not going to add more documents just so the AI can learn.

It seems ridiculous to me that the AI is designed to look at only some of the information, and that when that isn’t enough, the user hasn’t done enough, the user is trying to force connections. I think that users, with non-artificial intelligence, might just know something about their databases, and the AI should take advantage of that. And if the users don’t know anything, if their inputs hinder the AI, hopefully they would know enough to realize that and tell the AI to ignore them. :wink:

In all seriousness, though: it’s not as if this is a package for simpletons. It has a learning curve, and users need to be able to either self-teach or read a manual, so I don’t think that letting them decide how much input the AI accepts would overwhelm them, and it would make this feature work (and work better) for a lot more people.

1 Like

Include the following when calculating See Also:

  [x] Document text (default)
  [ ] Document title
  [ ] Folder
  [ ] Folder hierarchy

Then assume there is some intelligent weighting, such that the information the user brings to the document is weighted more heavily.
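In pseudocode terms, the request amounts to something like the sketch below. No such knob exists in DEVONthink today, so the field names, the weights, and the crude similarity measure are all invented for illustration:

```python
# Sketch of the *requested* behavior, not an existing DEVONthink feature.
# All field names, weights, and the similarity measure are invented.
from dataclasses import dataclass
from typing import List

@dataclass
class Record:
    text: str
    title: str
    groups: List[str]   # every group the record is filed or replicated into

def word_overlap(a: str, b: str) -> float:
    """Crude similarity: fraction of shared words (Jaccard index)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def see_also_score(candidate: Record, current: Record,
                   w_text: float = 1.0, w_title: float = 2.0,
                   w_group: float = 3.0) -> float:
    """Blend content similarity with user-entered metadata, weighting the
    metadata more heavily because the user chose it deliberately."""
    shared = set(candidate.groups) & set(current.groups)
    union = set(candidate.groups) | set(current.groups)
    group_sim = len(shared) / len(union) if union else 0.0
    return (w_text  * word_overlap(candidate.text,  current.text)
          + w_title * word_overlap(candidate.title, current.title)
          + w_group * group_sim)
```

With the group weight set high, two papers replicated into the same Stoichiometry group would surface in each other’s See Also lists even though the word never appears in either PDF.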

1 Like