Keywords and Tags

xaxa · September 17, 2010, 4:09pm

Hi,
In the document properties there is a field to add “keywords” to a document. They seem to match Apple’s kMDItemKeywords and are thus indexed in Spotlight. What are they best used for and what is their relation to tags? Would there be a way to sync or merge the two?

xaxa

Bill_DeVille · September 17, 2010, 7:48pm

Document Properties are “available” for only a limited set of filetypes and DEVONthink has no control over making a given field in Document Properties editable.

Because I work with a variety of filetypes in my databases, my own workflows entirely ignore Document Properties. My DT Pro Office Preferences > OCR setup “turns off” the option to add Document Properties to searchable PDFs, as I find the modal procedure irritating because it slows processing of a queue of scans. I find it faster and easier to rename and organize/tag searchable PDFs after they have been imported to a database, and by doing this my PDFs integrate well in organization, tagging and searching with all the other content in my databases.

I suppose it would be possible to design a script that would convert keywords contained in Document Properties to tags, assuming that a consistent method of separating keywords had been used.
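As a rough illustration of the conversion step only, here is a minimal Python sketch. It assumes the keywords arrive as a single string separated by commas or semicolons; the function name `keywords_to_tags` and the separator set are hypothetical, and nothing here uses DEVONthink's own scripting interface. Reading the keywords from a document and writing the tags back would still have to be done separately.

```python
import re

def keywords_to_tags(keyword_field, separators=",;"):
    """Split a raw keywords string into a clean, de-duplicated list of tags.

    Assumption: keywords were consistently separated by one of the
    characters in `separators` (commas or semicolons by default).
    """
    # Build a character class from the assumed separator characters.
    pattern = "[" + re.escape(separators) + "]"
    tags = []
    seen = set()
    for part in re.split(pattern, keyword_field):
        # Trim the ends and collapse runs of internal whitespace.
        tag = " ".join(part.split())
        key = tag.lower()  # compare case-insensitively to catch duplicates
        if tag and key not in seen:
            seen.add(key)
            tags.append(tag)
    return tags

print(keywords_to_tags("water quality; Hydrology,  water  quality ,soil"))
# prints ['water quality', 'Hydrology', 'soil']
```

If keywords were entered with mixed or ambiguous separators (for example, spaces between single-word keywords), this simple split would not be enough, which is why the consistency assumption matters.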

Confession: I view keywords (and to some degree tags as well) as relics of the stone age of computing and document management, as they were absolutely necessary to find anything before the advent of full-text indexing and the AI tools provided in DEVONthink.

Back in those stone age days, I was project director of a university center designed to disseminate federally-funded scientific and technical research related to environmental problems. We received from federal agencies computer tapes that contained keywords descriptive of the content of abstracts of research publications, and a code that allowed one to pull the corresponding paper abstract.

So keywords were the critical factor in both describing information content, and in searching for it. The person assigning keywords should be thoroughly familiar with the material being read. A person not familiar with the discipline covered in the material is not likely to be able to precisely define and use appropriate keywords, but I quickly found that even an expert in the field may not consistently apply keywords to similar documents, so there’s an inconsistency problem that was often evident in the computer tapes we received.

At our end, we had to analyze a customer’s request for information in terms of defining keywords most likely to result in a useful search of the computer tapes. The persons we assigned to develop searches by keyword also had to be familiar with the scientific and technical disciplines involved, and with the general practices that had been used at “the other end” in assigning keywords. Again, we found that even the same person designing searches by keyword tended to be inconsistent over time.

So there are two basic problems in reducing the content of a document to keywords (or tags): 1) precision in distilling the material to a few discrete terms, and 2) consistency in applying those terms. There has been a lot of discussion in the literature of information science about both problems.

It’s not too difficult to design a set of consistently applicable keywords (or tags) for a catalog of items, such as for the nuts and bolts on the shelves of a hardware store. But the task of assigning keywords to collections of correspondence, scientific papers or books and papers on economic theory can be very difficult, especially if one tries to go much beyond the most obvious “top end” keywords to distinguish the items – a lot of time and effort can be expended on that, and I’ve grown to think that most of it is a waste of time, given the tools in DEVONthink to help me find what I need for a particular purpose.

Do I use keywords or tags? Yes, I use some degree of group organization as a form of keyword assignment, and I fairly often use tags that are applied when I’m working on a project (and that I often remove when the project is completed). Often, when I’m working on a project, I want to take a fresh look at my collections of tens of thousands of reference materials, and most or all of the time I might have expended in tightly categorizing those materials probably wouldn’t have been helpful, and might even be an impediment to that “fresh look”.

Count me as lazy, so far as concerns “front end” organization, keywords or tags when I’m adding new content to a database. Long ago, I spent a lot of time on such tasks when adding new content to a database, and I’ve concluded that most of that effort was wasted (except for cases where “cataloging” is important, chiefly for financial records). I find it much more profitable to do that sort of work at the project level, using DEVONthink tools such as searches, smart groups, See Also and See Selected Text to pull together just the material that I need for a project and to explore it for facts and ideas. I’ll make notes about important material, often with links to references. I find See Also a wonderful tool, especially in a large database, to explore for related material – and its ability to “bridge” related terms avoids the inconsistency problem I would have faced had I spent a lot of time keywording or tagging the items.

KeithKendrick · September 23, 2010, 5:48pm

Hi Bill,

I continue to be confused by your opinions about tags. And I really don’t want to be!

I might be simply making a logical misstep in my understanding of the capabilities of the tools and of good ideas about how to use them. Please tell me if there is a silly mistake in the oversimplified list below:

data is fundamental input to the senses, including the mind;
information is data invested with some meaning (for our purposes here, by the mind);
knowledge is information invested with some additional meaning (i.e. patterns recognized across pieces of information);
wisdom is knowledge invested with some additional meaning (i.e. patterns recognized across pieces of knowledge);
the human mind can make the leaps that invest meaning in data, information, and knowledge, but machines can’t, at least not to the same extent;
if someone stores a piece of data, information, and/or knowledge in a database, it would be a good idea to also capture any additional meaning that a human mind identifies with that piece, and to associate those things somehow in the database;
the only reason to store data, information, or knowledge is the expectation of an interest in eventually retrieving it;
the method of storing it, therefore, should be closely related to the anticipated method(s) used to retrieve it.

If there isn’t a major flaw in the thinking indicated above, then please tell me why keywords and/or tags and/or comments aren’t an excellent way to capture additional meaning that one might want to add to an article captured in a DTPO database.

It seems to me that we should store information (data, knowledge, etc.) in the smallest chunks in which we are likely to want to retrieve them, and then all retrieval is done by generating reports - the request that causes the report should just indicate which pieces of information (data, knowledge, etc.) are of interest at that time. So, the most important elements of the whole storage/retrieval system are metadata (including tags) and smart groups - the metadata being a user’s means of efficiently investing additional meaning in a piece of information (data, knowledge, etc.).

I think iTunes is an easy example of this - the storage unit is song, because that is the typical unit we will want to retrieve (rather than words or phrases); the reports are the collection(s) of songs we want at that time - defined by metadata like album, artist, genre, word or phrase in the song title, etc.
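To make the "storage unit plus metadata plus report" idea concrete, here is a small Python sketch. It is only a toy model: the `Item` class, the `smart_group` function and the sample records are all hypothetical, and it does not reflect how DEVONthink or iTunes actually implement smart groups or playlists.

```python
from dataclasses import dataclass, field

@dataclass
class Item:
    """One stored unit (an article, a song, ...) plus user-supplied metadata."""
    title: str
    tags: set = field(default_factory=set)
    comment: str = ""

def smart_group(items, required_tags=(), text=""):
    """Return items that carry all required tags and, if text is given,
    contain it in the title or comment (case-insensitive)."""
    wanted = {t.lower() for t in required_tags}
    needle = text.lower()
    result = []
    for item in items:
        item_tags = {t.lower() for t in item.tags}
        if not wanted <= item_tags:  # every required tag must be present
            continue
        if needle and needle not in (item.title + " " + item.comment).lower():
            continue
        result.append(item)
    return result

# Hypothetical sample data.
library = [
    Item("Groundwater survey", {"hydrology", "project-x"}, "key reference"),
    Item("Soil chemistry notes", {"soil"}),
    Item("River flow data", {"Hydrology"}),
]

print([i.title for i in smart_group(library, required_tags=["hydrology"])])
# prints ['Groundwater survey', 'River flow data']
```

The "report" here is just the list returned by a saved query over metadata; the stored items themselves are never reorganized.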

If I’m searching a database other than my own, then I might like to take a guess at the metadata that someone else might have used to invest additional meaning in the individual chunks stored there, AND, I would like to have something like AI making suggestions based on its ability to invest meaning. If I am searching MY database, though, I want to be able to give priority to the meaning that I have invested in the pieces that I have chosen to store. That is, I frequently want to focus the search on things like tags, keywords, and comments. Unfortunately, tags seem to be second-class citizens in the DT search tools/processes world; I don’t understand why that should be.

What am I missing in my understanding of this whole topic?

Thanks! Keith

rickla · March 28, 2011, 8:02am

I can understand that Bill may consider his previous message to be an answer to Keith’s last question, but I’m sure there are other people who are thinking something similar. Can anyone suggest an alternative answer?