Recent comments about the usefulness of the See Also feature in DT Pro (and about DT Pro’s strengths and weaknesses as a research tool) are found in this thread: < devon-technologies.com/phpBB … php?t=3211>.
Recommendations for modification of See Also
Peter Gallagher notes that I’m enthusiastic about See Also as a research tool (which I am, with my main database), but he would like the ability to tweak the algorithms himself so as to make the results more focused. For example, he suggests that additional weight be given to the terms in the title of a document, and to the terms used in bolded headings and subheadings, as opposed to the body text of the document. Other such tweaks previously discussed might include weighting of the organizational structure of the database (which in fact does get consideration in the algorithms) and/or user-assigned keywords. (In some past threads, some users have commented that they couldn’t “trust” the AI algorithms unless they were disclosed. The algorithms are proprietary and are not likely to be disclosed.)
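To make Peter's suggestion concrete, here is a minimal sketch of what "extra weight for title terms" could mean. This is not DT Pro's algorithm, which is proprietary and unknown to me. It is a plain word-count cosine similarity, and the names term_vector, cosine, see_also and the title_weight parameter are all made up for illustration.

```python
import math
import re
from collections import Counter

def term_vector(title, body, title_weight=3.0):
    """Count words, adding extra weight for words in the title (hypothetical tweak)."""
    vec = Counter()
    for word in re.findall(r"[a-z]+", body.lower()):
        vec[word] += 1.0
    for word in re.findall(r"[a-z]+", title.lower()):
        vec[word] += title_weight
    return vec

def cosine(a, b):
    """Cosine similarity between two weighted word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def see_also(current, others, title_weight=3.0):
    """Rank (title, body) pairs by similarity to the current (title, body) pair."""
    cur = term_vector(*current, title_weight=title_weight)
    scored = [
        (cosine(cur, term_vector(title, body, title_weight=title_weight)), title)
        for title, body in others
    ]
    # Highest score first; documents with no overlap at all are dropped.
    return [title for score, title in sorted(scored, reverse=True) if score > 0]

current = ("Mercury discharge limits", "the river was sampled weekly")
others = [
    ("Mercury toxicology", "fish tissue data"),
    ("River sampling notes", "the river was sampled weekly by staff"),
]
body_only = see_also(current, others, title_weight=0.0)
title_heavy = see_also(current, others, title_weight=5.0)
```

With title_weight set to 0 only the body text counts, so only the sampling notes are suggested. With a heavy title weight the document sharing the title word "Mercury" moves to the top. That shift is exactly the kind of focusing Peter is asking for, and also the kind that can hide connections found only in the body text.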
In a similar vein, talazem reminds us: “I think everyone here with a high school education (everyone, I’d assume) remembers when a history or english lit teacher tuned you in to the fact that if you SCAN the table of contents and index of a book, plus any abstract that you might find, you’ll have gotten the whole of the gist of the book; the rest is evidence and details.” So he sides with Peter and wishes that the AI routine could be tweaked to act as recommended by his high school teacher.
My own use and expectations of See Also
First, I neither expect nor want an exhaustive “catalog” of all the books, articles and notes about a topic to be presented in the See Also slide-out panel. I try (with more or less efficiency) to do that with my organizational structure of the database. If I wish to produce such a catalog (usually to improve my organizational structure), I’ll use searches and/or smart groups. As Christian has noted, the search features in version 2.0 will become much more powerful for such purposes.
Second, even if I could tweak the algorithms to look for bolded text or tables of contents, the variety of layouts and formatting in my collection of documents would probably make that of limited utility, as well as posing a complicated set of development problems. Perhaps keywords might be considered more easily. But personally I try to avoid keywords or tagging, as (a) I have never come up with a consistent scheme that would cover all of my possible uses of reference materials and (b) I sometimes add hundreds or thousands of references to a database, and I don’t have the time or inclination to tag them.
Third, I’ve gone through the process of producing large bibliographies (more than 3,000 references in one) using talazem’s teacher’s trick of producing index card notes taken from “scanning” tables of contents and introductory chapters (and as much more as feasible). But when I’m doing research I’m not interested in the similarity of documents in that sense. I’m looking for the “evidence and details” instead. Similar TOCs don’t necessarily imply much about similarity of the contents.
I hope, when I press the See Also button, to get a list of suggestions that will help me explore a topic and perhaps lead me to new (hence unexpected) insights about relationships to other topics or ideas. The results, when I press that button, can be highly variable, depending on the text contents of the document I’m viewing and the other content of the database.
I monitor the memory usage on my Mac using a preference pane named MenuMeters. If I’ve just launched DT Pro and I select a document and then press the See Also button, I notice a very large drop in free memory. That’s because DT Pro is comparing the word patterns in the document being viewed to the word patterns in the other documents in the database, a very big task indeed. It may take a few seconds to produce the first set of suggestions. Subsequent uses of See Also on other documents produce virtually instantaneous results on my computers. That’s because DT Pro retains the “setup” for See Also until the application is quit.
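The slow-first, fast-afterwards behavior described above is what one would expect from any program that builds its word data once and keeps it in memory. Here is a small sketch of that general pattern. It is only my guess at the shape of such a cache, not DT Pro's actual internals, and the class SeeAlsoIndex and its shared-word score are invented for the example.

```python
import re
from collections import Counter

class SeeAlsoIndex:
    """Hypothetical cache: word counts are built on first use, then reused."""

    def __init__(self, documents):
        self.documents = documents   # {name: text}
        self._vectors = None         # filled in lazily on the first request
        self.builds = 0              # how many times the costly setup has run

    def _ensure_built(self):
        # The expensive step: scan every document once and keep the result.
        if self._vectors is None:
            self._vectors = {
                name: Counter(re.findall(r"\w+", text.lower()))
                for name, text in self.documents.items()
            }
            self.builds += 1

    def suggestions(self, name):
        """Names of other documents sharing words with `name`, best match first."""
        self._ensure_built()
        current = self._vectors[name]
        scores = {}
        for other, vec in self._vectors.items():
            if other != name:
                shared = sum(min(current[w], vec[w]) for w in current if w in vec)
                if shared:
                    scores[other] = shared
        return sorted(scores, key=scores.get, reverse=True)

index = SeeAlsoIndex({
    "a": "lead in water",
    "b": "lead in soil",
    "c": "tax law",
})
first = index.suggestions("a")    # triggers the one-time setup
second = index.suggestions("b")   # reuses the cached word counts
```

The builds counter stays at 1 however many times suggestions is called, which mirrors the memory staying allocated until the application is quit.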
I’ll often follow a trail of suggestions, choosing a suggested document and running the See Also routine on it to see where it may lead me.
My main database of more than 20,000 documents deals with my interests in environmental science and technology, and associated policy and regulatory issues. It covers a broad range of scientific and engineering disciplines, from chemistry to conservation ecology, case histories of pollution problems, as well as law, economics and many other topics. Many of these disciplines have a highly structured “language” of technical terminology; others do not. Sometimes I’ll Option-click on a term to see other documents that use that term.
Perhaps I’ll examine a draft environmental regulation that sets limits on a pollutant discharged into the environment. Is it enforceable, i.e., are there available analytical techniques to measure the contaminant? Does the normal background already exceed the discharge limit? Does the toxicological information support the proposed standard? How does one balance risk assessments with cost-benefits of the proposed standard? (Those questions are often raised, and I’ve seen proposals fail because they had not been asked in advance.)
DT Pro provides me a very useful set of tools to ask and answer questions like that. I may identify areas for which I’ve got insufficient information, leading me to go looking for additional references. In such cases, the See Also suggestions may be “dumb”.
The utility of See Also suggestions depends, of course, on the content of the document being viewed and on the other documents in the database. If the text of the viewed document isn’t contextually similar to anything else in the database, there will be few or no suggestions, and they may not be of any use. But in most cases, in my database, there will be potentially useful suggestions. There are occasional glitches. I’ve moved the user manual for my Infiniti G35x car into my main database because I look at it frequently. Although it’s by no means the largest PDF file there, for some reason it turns up very frequently in See Also suggestions. I simply ignore it. One of these days I’ll remove this “magnet” file.
I try always to remember that DT Pro doesn’t “know” anything about chemistry, or toxicology, or the law. It’s up to me to understand and interpret the material I’m looking at. I’m interacting with the information in my database, and I’m responsible for decisions.
So I use See Also to explore connections between the documents in my database. The connections that often prove most valuable are those that may seem surprising, those I wouldn’t have thought of. These are not random. There are logical connections between the documents (throwing out outliers such as the user manual for my car). It’s those “surprising” ones that can lead to a new understanding, or even a new idea. That, in a nutshell, is why I would be disappointed if See Also simply regurgitated a catalog of all the other documents that say the same thing as the one I’m looking at. Example: If I’m looking at a page about dogs and press See Also, I’ll be pleased if the suggestions include documents about canines, carnivores, or perhaps pets.
The “Tower of Babel” problem of multiple languages and linguistic analysis
Maria, who deals with multiple languages in her database, wants the AI function to “see” correspondences between documents regardless of the language. Timotheus and talazem echo the desire for better multilingual capabilities.
No question about it. The Classify and See Also features would be much stronger if they could handle multiple languages. Perhaps some day that may happen.
Maria has suggested that the developers of DT Pro should construct interpretative tables between languages so that terms used among various languages could be correlated in searches and AI functions.
But which languages? There are a great many languages used in DT databases.
And which words? I don’t see any way to decide except all words, including all of the idiosyncratic variations of use and context.
For years very large organizations with very large funding have been working on similar problems. I’m not aware of any claim that a universal solution to language translation has yet been developed. Phone companies, for example, have developed correspondences between limited sets of words or phrases for many languages, but even those limited sets are very large.
For searches, though, the enhanced search capabilities of DT Pro version 2.0 will provide some assistance, as it should be possible to improve searching for terms in multiple languages.
So it should be possible to enter a query such as:
((“ExactEnglish” OR “ExactLatvian” OR “ExactJapanesevariant”) BUT NOT (“ExactRussian” OR “ExactPolish”)) BEFORE “ExactGerman”
That’s a silly example. But that kind of query will allow much more useful searches of multi-language databases.
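For anyone curious how a query of that shape could be evaluated, here is a rough sketch. I don't know how version 2.0 will actually define OR, BUT NOT and BEFORE, so the semantics below are my own assumptions, and the function names positions and matches are invented. It also assumes words are separated by spaces or punctuation, which is not true of Japanese text, so that part of the example is left out.

```python
import re

def positions(text, term):
    """Word positions at which the exact term occurs (case-insensitive)."""
    words = re.findall(r"\w+", text.lower())
    return [i for i, w in enumerate(words) if w == term.lower()]

def matches(text, any_of, none_of, before):
    """Assumed meaning: (any_of OR ...) BUT NOT (none_of OR ...) BEFORE `before`."""
    # BUT NOT: reject the document if any excluded term appears anywhere.
    if any(positions(text, t) for t in none_of):
        return False
    hits = [p for t in any_of for p in positions(text, t)]
    later = positions(text, before)
    # BEFORE (assumed): some wanted term occurs earlier than some occurrence
    # of the `before` term.
    return bool(hits) and bool(later) and min(hits) < max(later)

wanted = ["water", "ūdens"]      # English, Latvian
excluded = ["вода", "woda"]      # Russian, Polish
anchor = "wasser"                # German

ok = matches("Water quality report, siehe Wasser", wanted, excluded, anchor)
wrong_order = matches("Wasser und water", wanted, excluded, anchor)
has_excluded = matches("water вода Wasser", wanted, excluded, anchor)
```

The user still has to supply the equivalent term in each language by hand. The query only saves running several separate searches and combining the results.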