I stand by both parts of that response to a Support ticket, but in different ways.
The first paragraph refers to issues of integrating the new tagging features in Mavericks with the existing OpenMeta-compatible tagging scheme in DEVONthink, including differences between DEVONthink and the Finder and differences between Imported and Indexed documents in DEVONthink. These issues will certainly receive attention from the developers and from the users of DEVONthink.
The other parts of that response reflected my personal opinions about the ROI (return on investment) of spending a lot of time tagging or keywording documents. Those opinions are not a policy position of DEVONtechnologies, but I hope they are of use to users of DEVONthink.
Tags or keywords can be very useful tools for capturing unique characteristics of documents that make them more easily identifiable and retrievable, especially if those characteristics remain valid across repeated accesses and are easy to assign. For example, if I tag a collection of notes and photos to identify them as related to a trip to Malta, I’ve made them easier to retrieve later. Tags can also serve other purposes, such as identifying a set of references in my collection of documents that are useful for a particular research/writing project. In the first case I would probably leave those tags or keywords permanently in place. In the second case I might remove them after completing the project, and in fact add value to my database by removing them.
Back in the day I was managing a university center that accepted scientific and technical queries related to environmental issues, and searched computer tapes for information about federally funded research that might provide useful answers.
We searched computer tapes by keywords. The result of a search was a list of numbers that matched the numbers of more than a million paper copies of abstracts, which were filed in shoeboxes in a Quonset hut on campus. We sent the search lists to staff in the Quonset hut, who then pulled the corresponding abstracts, made photocopies of them and sent us back the photocopies.
Our staff was supplemented by hiring a number of graduate students familiar with various scientific and engineering disciplines.
When we received a query, the first task was to translate it into keywords likely to pull relevant material in the computer search stage. The second was to examine the stack of photocopied abstracts resulting from the search and determine their relevance to the original query. Relevant abstracts were organized and sent back to the Quonset hut staff to be pasted up on letter-sized paper and photocopied as collections, which were sent as the response to the query.
At the time, this was a bleeding-edge project that often did provide useful information to people who sent in queries. We received support in part from federal funding, and in part from fees charged to (primarily) industrial and governmental customers. It did help disseminate the results of federally funded research to potential users of that information. Today, of course, it seems very primitive.
There are serious fundamental problems in attempting to make documents retrievable by assigning keywords or tags to them. These problems have often been addressed in the field of information science.
One problem is comprehensiveness of keywording/tagging. A given document may be relevant to multiple topics. Limiting the keyword or tag to a very high-level topic, such as air pollution, would (in the case of our information dissemination center) return many thousands of abstracts in a search result. That’s not very useful; keywords should filter the search to produce results specific to a query. So each important topical element of an abstract needed a keyword at the lowest level of terminology possible. Typically, the keywords on those computer tapes had been assigned at the federal agency that sent them to us.
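The mechanics behind that kind of search can be sketched as an inverted index mapping each assigned keyword to the accession numbers of matching abstracts. This is a minimal illustration with invented abstract numbers and keywords, not the actual tape format; it shows why a coarse keyword like "air pollution" returns everything, while a more specific keyword narrows the result to something usable.

```python
# Hypothetical abstracts: accession number -> keywords assigned by an indexer.
abstracts = {
    101: {"air pollution", "sulfur dioxide", "power plants"},
    102: {"air pollution", "ozone", "urban transport"},
    103: {"air pollution", "particulates", "health effects"},
}

# Build the inverted index: keyword -> set of abstract numbers.
index = {}
for number, keywords in abstracts.items():
    for kw in keywords:
        index.setdefault(kw, set()).add(number)

def search(*keywords):
    """Return abstract numbers carrying ALL of the given keywords."""
    results = [index.get(kw, set()) for kw in keywords]
    return set.intersection(*results) if results else set()

print(sorted(search("air pollution")))                    # -> [101, 102, 103]
print(sorted(search("air pollution", "sulfur dioxide")))  # -> [101]
```

The coarse query matches every abstract in the collection, which at the scale of a million abstracts is useless; only the specific keyword, if the indexer happened to assign it, filters the search usefully.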
This requires the person assigning keywords to an abstract to recognize the elements of information it contains, and to assign one or more keywords to each “element” that might be important. That takes time, though, and so can make keyword assignment expensive.
Quite often, in reviewing the final results of a query response, one of us would recognize that potentially important information we were familiar with had been left out, usually because the keywords used to describe it had not matched the search, and/or because the person choosing keywords for the query had omitted an important one.
During that project I visited several of the federal agencies that did the keywording and supplied the tapes, to discuss this problem. They had tried to mitigate it with two approaches: glossaries of keywords, and staff training. While there were some improvements (which raised the cost of the effort), there was never a satisfactory solution. I had the same kind of problem at my end, in the phase of translating a query into a set of keywords.
Setting aside a related problem (the terminology used to describe information differs across disciplines even for closely related items, and within a given discipline it tends to change over time), the fundamental problem with comprehensiveness of descriptors is that it cannot be mitigated very much without a drastic increase in time and effort.
The second hair-pulling issue is consistency in the application of descriptors, whether by different individuals or by the same person at different times. Glossaries and personnel training helped somewhat, but never enough to keep this from being a serious problem. Adding an additional layer of review of the descriptors used for a document helped too, but added substantially to cost.
Based on that experience, and on the fact that I often need to approach the information in my research databases from differing perspectives, I do not tag new items as they are added to those databases. I simply don’t have the time to do an adequate job of it, and wouldn’t consider the effort well repaid. DEVONthink gives me full-text searches and the ability to vary search criteria to improve results when I’m looking for information. See Also can sometimes help overcome variations in the terms used for similar topics. The DEVONthink environment is very different from our information dissemination project in the old days, which relied entirely on descriptors for searches.
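The advantage of full-text search over descriptor-only search can be sketched in a few lines. The documents and keywords here are invented for illustration: a descriptor search misses a relevant report because the indexer never assigned the term, while a full-text search still finds the term in the body.

```python
# Hypothetical documents, each with indexer-assigned keywords and full text.
documents = {
    "report_a": {"keywords": {"water quality"},
                 "text": "Effects of acid rain on lake water quality."},
    "report_b": {"keywords": {"forestry"},
                 "text": "Acid rain damage to spruce forests."},
}

def keyword_search(term):
    """Find documents only via their assigned keywords."""
    return {name for name, doc in documents.items()
            if term in doc["keywords"]}

def fulltext_search(term):
    """Find documents by scanning their full text (case-insensitive)."""
    return {name for name, doc in documents.items()
            if term.lower() in doc["text"].lower()}

print(keyword_search("acid rain"))   # -> set(): no indexer used that term
print(fulltext_search("acid rain"))  # -> {'report_a', 'report_b'}
```

Both reports discuss acid rain, but neither indexer chose it as a descriptor, so the keyword search returns nothing; the full text still carries the term, which is the gap a full-text engine closes without any tagging effort.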
That doesn’t mean that I consider tagging unimportant. It does mean that I tend to restrict tagging to a relatively small number of items, where that becomes a major aid to retrieval or use of the tagged items.
I often dump hundreds of new documents into a database. It’s unlikely that I’ll consider upfront tagging for any of them to be worth my time. In a few cases, such as the example of associating notes and photos of a trip to Malta, I might do so.
Feel free, as always, to consider me an eccentric. I probably am.