Tips on improving DT's AI?

I was hoping someone could help me get more out of DT’s AI capabilities by listing the most important, most effective steps I can take to improve its accuracy.

A couple of questions I have about this:

Should I choose ‘Get Info’ and specifically check the box to ‘Exclude from classification’ for items that I don’t think I will be searching for very often, and which I don’t want cluttering up the AI’s brain? Now that I think about it, does choosing ‘Exclude from classification’ prevent the item from being searched for, or does it only control whether the item is included in, or excluded from, the ‘Classify’ feature?

Are there any tricks that anyone can suggest with regard to adding ‘comments’ to the metadata of a document that might prove handy? For example, a method I’ve seen others use for tagging documents with Spotlight-searchable comments in the Finder is to add a symbol at the beginning of each comment (such as @dog, @cat, etc.). The idea is that if you used Spotlight to search for dog, you might get hundreds of bookmarks, documents, PDFs, videos, music, etc., but a search for @dog would return just the one file you tagged with @dog.

If I were to add broad, subject-level comments to the comments field of every document in a chosen group and all of its sub-groups, wouldn’t that allow me to narrow down searches more easily, and wouldn’t it also assist other DT AI capabilities such as See Also, Classify, etc.?

Aside from applying comments, are there any other tips anyone can offer as to how I can aid in improving DT’s AI capabilities?

Please advise…Thanks!

The item is excluded only from classification, not from See Also or searching.

Currently comments don’t have any impact on the AI.

You might have a look at “Building Your Database > Adding Your files” in the help.

If I understand correctly, the “see also” and “auto-classify” functions work both through lexical analysis and by how the documents are grouped together (sharing folders and hierarchy).

So if one were to use keyword tags inside the comments field, along with an AppleScript that collected (replicated) entries into folders named after those tags, would those “tags” (now shared folders) then have an impact on the AI?

Thanks.

No. Classify does contextual analysis of a document to recommend placement into a group that contains similar documents. See Also doesn’t consider organizational structure – which, I will argue, makes it more valuable. For example, I can find relationships to a document about chemical reaction equilibria in documents that belong to other disciplines such as ecology and economics. I don’t want those relationships to be obscured by the fact that those related documents are stored in quite different groups in my database.

No for See Also. Yes for Classify, if you were to introduce a new document to it for analysis.

There are tricks you can play on See Also. Suppose you have in your database articles about dogs and articles about wolves, none of which contain the term canine. You can “teach” See Also about the family relationship by creating a new document in which you repeat the text string “dog dogs wolf wolves canine canines” a number of times. You’ve created a textual “bridge” between the terms. Now, if you apply See Also while reading an article about canines, which doesn’t mention specifically dogs or wolves, See Also will likely suggest articles about dogs and wolves.
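To picture why such a bridge document can help, here is a minimal Python sketch. It is not DEVONthink's actual algorithm, which isn't described in this thread. It assumes a toy model in which each word of the document being read is expanded with words that share a document with it somewhere in the database, and the result is compared to every document by cosine similarity. The function names, the sample texts and the 0.5 weight are all made up for illustration.

```python
import math
from collections import Counter, defaultdict

def tokens(text):
    return text.lower().split()

def cooccurrence(docs):
    """Map each term to the set-like Counter of terms sharing a document with it."""
    co = defaultdict(Counter)
    for text in docs.values():
        terms = set(tokens(text))
        for a in terms:
            for b in terms:
                if a != b:
                    co[a][b] += 1
    return co

def expand(text, co):
    """Bag of words plus, at half weight (an arbitrary choice), co-occurring terms."""
    vec = Counter(tokens(text))
    expanded = Counter(vec)
    for term, count in vec.items():
        for other in co.get(term, ()):
            expanded[other] += 0.5 * count
    return expanded

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def see_also(query, docs):
    """Score every document against the expanded query (toy stand-in for See Also)."""
    co = cooccurrence(docs)
    q = expand(query, co)
    return {name: cosine(q, Counter(tokens(text))) for name, text in docs.items()}

# Hypothetical sample database: none of these mention "canine".
library = {
    "dog article": "dog chases ball dogs bark loudly",
    "wolf article": "wolf pack hunts wolves howl nightly",
    "tax article": "income tax forms due april",
}
bridge = "dog dogs wolf wolves canine canines " * 5
query = "canine teeth grow long"

without_bridge = see_also(query, library)
with_bridge = see_also(query, {**library, "bridge": bridge})

for name in library:
    print(name, round(without_bridge[name], 3), round(with_bridge[name], 3))
```

In this toy model the dog and wolf articles only get a nonzero score for the "canine" query once the bridge document is in the database, while the unrelated tax article stays at zero either way.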

As your collection of documents grows, it will tend to naturally create such “bridges” in contextual patterns. My main database, which contains tens of thousands of documents, often surprises me with See Also suggestions. Many of them are dumb, not useful; but some of them provide new insights that are very useful.

In a long article or book that covers a variety of topics, one might select a specific paragraph or section and use See Selected Text (Control-click or right-click on the selection and choose the option), which will examine possible contextual relationships between that particular excerpt and the other documents in the database.

I tried testing your suggestion for tricking see also, and it has spawned a whole new set of questions I have that maybe you can help me with…

I created 4 documents, entitled:

john
paul
mary
bridge

The contents of each of these documents are as follows:

john - john likes mary but hates paul

paul - paul doesn’t like mary

mary - mary likes john and paul

bridge - john john paul paul mary mary john john paul paul mary mary john john paul paul mary mary john john paul paul mary mary john paul mary

The ‘See Also’ results for the above documents are confusing. They are:

for mary document - paul, bridge and john are all recommended

for bridge document - paul, mary and john are all recommended

for paul document - mary, bridge and john are all recommended (but strangely, both paul and bridge are listed as the top documents, each with 100% certainty)

for john document - both mary and bridge are recommended, but not paul (even though the word ‘paul’ is mentioned in the text of this document)

I don’t suppose there’s any logical reasoning behind the strange and unexpected results of this test that you’d be willing to share, is there?

I don’t see anything strange or uncertain about the results. :)

DEVONthink isn’t running formal syllogisms. Don’t assume that DEVONthink understands the grammar, syntax and meanings of words in English, Latvian or Italian. It’s analyzing contextual patterns of text strings. But notice that in your “John” document you have introduced a term next to “Paul” (“hates”) that doesn’t exist in the documents for “Mary” or “Paul”.
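The point about text strings versus grammar can be illustrated with a rough Python sketch. It assumes a plain bag-of-words comparison with cosine similarity, which is only a stand-in and not a description of what DEVONthink actually computes, so the rankings it prints should not be expected to match the See Also lists reported above. The four texts are the test documents from the previous post; the helper names and the `reordered` sentence are made up for illustration.

```python
import math
from collections import Counter

def bag(text):
    """Word counts only: order, grammar and meaning are thrown away."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

docs = {
    "john": "john likes mary but hates paul",
    "paul": "paul doesn't like mary",
    "mary": "mary likes john and paul",
    "bridge": "john john paul paul mary mary " * 4 + "john paul mary",
}

def ranked(name):
    """Other documents sorted by similarity to the named one, best first."""
    me = bag(docs[name])
    others = [(cosine(me, bag(text)), other) for other, text in docs.items() if other != name]
    return sorted(others, reverse=True)

for name in docs:
    print(name, [(other, round(score, 2)) for score, other in ranked(name)])

# Same words, opposite meaning: a bag-of-words model cannot tell them apart.
reordered = "paul hates mary but likes john"
print(math.isclose(cosine(bag(docs["john"]), bag(reordered)), 1.0))
```

Under this assumption, "john likes mary but hates paul" and "paul hates mary but likes john" score as identical, which is why who likes or hates whom has no bearing on the suggestions in a model of this kind.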

Your Bridge document did establish contextual relationships similar to my example of tying together dogs and wolves as instances of canines. You might have gone a bit further and tied John, Paul and Mary to the term “people” or perhaps “homo sapiens”.

I usually don’t pay a lot of attention to the rankings in See Also lists.

Your example isn’t, of course, as rich as a large database. In my own database dealing with environmental topics, See Also will suggest an article on chemical equilibria when I’m looking at a report on the changes in species populations when an invasive species is introduced. That’s why I use it, as there really is such a relationship and See Also found it, although I hadn’t thought of it at the time. But it remains my responsibility to “grok” that useful connection, and to reject other suggestions that are not of interest to me.