Yes, that is an interesting paper. It describes an approach to classifying large collections of text files by topic. The purpose of such classification is to let the researcher winnow the collection down to the items related to a topic. In this case, the algorithms would already have tagged the files by topic, so the researcher can quickly find potentially useful documents by tag.
Logically, there’s no difference between tags and keywords. Applying such a classification to an item is usually done to make it easier to retrieve by a search, and/or to aggregate or relate items that share the classification.
There are serious logical and methodological problems with applying and using keywords or tags. These can be summarized as problems of consistency, comprehensiveness and context. My own experience as director of a computer information center back in the days when enormous collections of documents could only be searched by keyword turned me into a curmudgeon concerning the return on investment of time and energy in trying to apply such a priori classifications to every document in my databases. I regard tags or keywords as sometimes useful, but I take the time to assign them only for limited purposes, where the investment of my time and energy is likely to be well repaid.
But the cited paper is about having the computer assign the tags, so that the researcher can avoid reading all those documents and applying topical tags by hand. Great! The human doesn’t have to do the work! An algorithm is likely to be more consistent than a human in applying tags, but the logical and methodological issues of comprehensiveness and context remain. (Do you really want to see hundreds or thousands of tags per document, depending on nuances of comprehensiveness and context, tags that relate not only to the document itself but to others in the database?)
Information science has made a lot of progress in assisting humans to work with the information content of documents. Computers can very quickly do some things for which the human brain isn’t wired. We have already reached the stage where a human researcher can use a computer synergistically, interacting with the computer’s software to find and analyze information much more easily than in the pre-computer past. But true semantic analysis hasn’t yet been achieved, nor is one’s computer trained in disciplines such as chemistry, ecology and so on. In that sense, it remains the human’s responsibility to evaluate information found or suggested by the computer.
DEVONthink and DEVONagent have been suggesting keywords or topics identified by contextual relationships in text content since these applications first appeared. In DEVONthink, open a document to display its content. In the navigation bar immediately above the pane in which the document is displayed there’s a Keyword button. Click it, and a list of suggested keywords is displayed. DEVONagent Pro’s Digest view of search results displays a list of topic terms, and even a graphical view of the relationships among search hits by topic term.
In DEVONthink, Option-click on any single-word term in a document. A list of all other documents in the database that contain that term is displayed. (That’s not complicated, of course.)
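To illustrate why a term lookup like that isn’t complicated, here is a minimal Python sketch of an inverted index, which maps each word to the documents that contain it. This is a generic technique, not DEVONthink’s actual implementation, which I haven’t seen; the function names and the sample documents are made up for the example.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercased word to the set of document names containing it.

    `documents` is a hypothetical dict of {name: text}; a real database
    would read the text from files.
    """
    index = defaultdict(set)
    for name, text in documents.items():
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            index[word].add(name)
    return index


def documents_containing(index, term):
    """Return the sorted names of all documents that contain `term`."""
    return sorted(index.get(term.lower(), set()))


# Made-up sample data, for illustration only.
documents = {
    "wetlands.txt": "Mercury accumulates in wetland sediments.",
    "fish.txt": "Fish tissue samples showed elevated mercury.",
    "budget.txt": "The annual budget covers laboratory supplies.",
}

index = build_index(documents)
print(documents_containing(index, "Mercury"))
```

The index is built once, so each later lookup is a single dictionary access no matter how many documents there are.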
Indeed, some of the algorithms in DEVONthink and DEVONagent Pro go well beyond simple use of terms contained in a document, such as the keywords listed for that document. See Also, for example, lists similar documents in the database, and that list may include one that doesn’t contain the same keywords, yet is found to be contextually related! Such a suggestion can be unexpected and, if on review I find the suggestion useful, it’s the kind that makes me shout Eureka! It’s a conceptual relationship I hadn’t thought of. No, See Also isn’t true semantic analysis that can identify a concept regardless of the terms used to express it. That’s a really tough target for information science. But sometimes See Also approaches the results that would be expected from true semantic analysis. Of course, it’s up to the user to evaluate suggestions and recognize the really useful ones.
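I don’t know how See Also is implemented, so take the following only as a rough sketch of one common way to rank documents by similarity of their whole text rather than by shared keywords: cosine similarity over word counts. The technique is a stand-in chosen for illustration, and the function names and sample documents are made up.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Count how often each lowercased word occurs in the text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 to 1.0)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_similar(target, documents):
    """Rank every other document by similarity to `target`, best first.

    Hypothetical helper: `documents` is a dict of {name: text}.
    """
    target_vector = term_vector(documents[target])
    scores = [
        (name, cosine_similarity(target_vector, term_vector(text)))
        for name, text in documents.items()
        if name != target
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Made-up sample data, for illustration only.
documents = {
    "a": "mercury levels in wetland sediments and fish",
    "b": "fish and sediments from the wetland were sampled",
    "c": "annual budget for laboratory supplies",
}

for name, score in rank_similar("a", documents):
    print(name, round(score, 2))
```

Document "b" never mentions mercury, yet it ranks first because the rest of its vocabulary overlaps with "a". That is still plain word matching, not semantic analysis, and evaluating whether the suggestion is useful remains the user’s job.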
As time goes on, computer hardware becomes more powerful and software more powerful and sophisticated. Isn’t that wonderful?