Someone else's use of AI

edgley · July 7, 2014, 10:26pm

Old, interesting and I think still relevant:

http://www.stevenberlinjohnson.com/movabletype/archives/000230.html

Bill_DeVille · July 8, 2014, 12:14am

Those AI assistants have been in DEVONthink since it was released in 2002, and differentiate it in important ways from other document/information management database applications. I discovered DEVONthink back then, used it to manage a large collection of my documents and have used it since.

Steven Johnson has written about DEVONthink many times, and discussed his use of it, especially the See Also assistant, in his book, Where Good Ideas Come From: The Natural History of Innovation. In that book he notes that See Also is most useful to him when it makes suggestions that he wouldn’t have thought of, sometimes in chaotic ways. He often follows trails of See Also suggestions, picking an interesting article from the list, and invoking See Also on it and perhaps still others in such a trail of ideas.

Johnson’s essay, to which edgley provided a link, describes one of his databases, which contains groups holding notes and excerpts from various authors and books. Most of these documents are relatively short, and he speculates that See Also works best with documents in a “sweet spot” between 50 and 500 words in length. He then speculates that longer documents could provide better “grist” for See Also if they were split into smaller segments.

I often use See Also when I’m doing research for a project. The database in which I usually work holds some 30,000 documents ranging from abstracts to long reports and books. I have no intention of splitting those longer references into short chunks! It would take a lot of time and effort to break a long document into chunks that remain topically coherent. If, alternatively, I used a “chopper” algorithm that created, say, 400 word chunks, too many concepts might become fractionated and lose contextual coherence. In any case, I have no intention of vandalizing my databases in this way.
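To make the objection concrete, here is a minimal Python sketch of such a naive “chopper”. The function name chop and the parameter chunk_words are hypothetical, chosen only for illustration; this is not a DEVONthink feature or anything Johnson describes using. It cuts every 400 words regardless of sentence or topic boundaries, which is where the loss of contextual coherence comes from.

```python
def chop(text, chunk_words=400):
    """Split text into consecutive chunks of at most chunk_words words.

    Hypothetical illustration only: the cut points ignore sentences,
    paragraphs and topics, so a concept can end up split across chunks.
    """
    words = text.split()
    return [
        " ".join(words[i:i + chunk_words])
        for i in range(0, len(words), chunk_words)
    ]


if __name__ == "__main__":
    # A 1,000-word stand-in document yields chunks of 400, 400 and 200 words.
    sample = "word " * 1000
    print([len(chunk.split()) for chunk in chop(sample)])
```

A chopper that kept chunks topically coherent would need to know where topics begin and end, which is the time-consuming part.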

Let’s face it. In another venue Johnson confessed that he has paid research assistants who create many of those notes and excerpts in his databases. I don’t have that luxury. Instead, I rely on DEVONthink as my research assistant.

I do my draft writing within DEVONthink, and in the process of research I create a lot of rich text notes that hold links to reference sources. For a long document I’ll often create an Annotation note, and perhaps still other rich text notes will link to that and other documents of interest for a topic. I’ve found that writing notes supports my understanding and retention of what I’m reading much better than underlining or highlighting does. As my notes are visible to See Also, that’s my compromise with Johnson. (And I haven’t vandalized my references by messing them up with highlights and text boxes, which I think are really ugly and distracting.)

See Also is based on computer algorithms that compare the contextual relationships of the terms used in a document to the contextual relationships of all other documents in a database. Those algorithms look at the words that were used, their frequencies of use and ultimately the patterns of word usage in documents. They do this very, very quickly. I can’t think that way. But See Also knows absolutely nothing about the scientific discipline being discussed, or other content of the document.
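As a rough illustration of comparing documents purely by word usage, here is a minimal Python sketch. It is an assumption-laden stand-in, not DEVONthink’s actual algorithm, which is not described in this thread: it swaps in plain TF-IDF weighting with cosine similarity, and the names tokenize, tfidf_vectors, cosine and see_also are hypothetical. Like See Also, it knows nothing about the subject matter; it only counts words.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Map each document name to a {term: weight} dict.

    Assumed weighting: term frequency times a smoothed inverse document
    frequency, so words found in fewer documents count for more.
    """
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n_docs = len(docs)
    doc_freq = Counter()
    for term_counts in counts.values():
        doc_freq.update(term_counts.keys())
    return {
        name: {
            term: tf * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, tf in term_counts.items()
        }
        for name, term_counts in counts.items()
    }


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def see_also(name, docs, top=5):
    """Rank the other documents by similarity to the named one."""
    vectors = tfidf_vectors(docs)
    scores = [
        (other, cosine(vectors[name], vectors[other]))
        for other in docs
        if other != name
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top]


if __name__ == "__main__":
    docs = {
        "mercury_fish": "mercury contamination in fish and health effects of mercury",
        "fish_toxicology": "toxicological studies of mercury in edible fish",
        "car_manual": "engine oil tire pressure and brake maintenance schedule",
    }
    # The toxicology note shares far more vocabulary, so it ranks first.
    for other, score in see_also("mercury_fish", docs):
        print(other, round(score, 3))
```

The ranking is only a list of candidates; deciding whether a suggestion is useful still falls to the reader.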

When See Also suggests other documents that may be contextually similar to the document being viewed, it’s up to the human user to decide whether or not the suggestion is useful. I’ve had training and experience in several scientific and engineering disciplines and in areas of environmental law and policy. It’s up to me to determine whether a suggestion made by See Also helps me explore an idea. I’m not interested in those suggestions that I might expect. I’ll probably glance at some suggestions and immediately reject them as not interesting. The really valuable ones are those that lead me to an unexpected insight. That doesn’t happen every time I invoke See Also. But it happens just often enough to keep my research efforts interesting and rewarding. Once in a while there’s a Eureka! moment.

I’ve found nothing better to end writer’s block than playing with See Also.

See Related Text is similar. When I’ve written about a topic I will often select that text, Control-click and choose that option. I’ll be presented with suggestions of contextually similar documents that might be useful for comparing my approach with those of other writers.

Tip: When I’m using See Also or See Related Text I usually open possibly interesting suggestions in a new tab of the document pane or window, so that I can move among the documents without losing scrolling position.

edgley · July 8, 2014, 12:49am

Which is why I believe in one huge DB for absolutely everything: the chance of picking up a connection that would otherwise have been split across 2 DBs is what gets me hot about this sort of thing.

Bill_DeVille · July 8, 2014, 3:57pm

I prefer multiple databases precisely in order to improve the use of the AI assistants and searches within databases.

For example, as a supplement to my Main database that I use for research on environmental topics, I have another large database that is devoted to technical and methodological issues, such as environmental sampling procedures, chemical analytical procedures, statistical procedures for evaluation of environmental data, quality assurance procedures, risk assessments, risk/benefit methodologies, cost/benefit methodologies, etc.

When I’m working in the Main database and do a search or invoke See Also for, e.g., health-related issues of mercury contamination in fish I don’t want to see a lot of results that deal with sampling methods and issues and chemical analytical procedures and issues. That would waste my time, were I to merge those two databases. I want to focus on documents that deal with toxicological studies, case histories, regulatory standards and the like.

Conversely, when I’m focused on methodological issues such as sampling and analysis for mercury in edible portions of fish, I don’t want to have to weed out of the results lists a lot of less useful items from my Main database. There are a lot of results in the methodological database for such a focus, as I’ve got hundreds of documents (primarily procedures in the U.S. and EU) that deal specifically with this topic.

By splitting these items into separate databases I’ve made my work more efficient. That goes for other databases I’ve created to meet specific needs and interests, as well.

Sometimes I’ll file a document in more than one database. No problem.

alanshutko · July 8, 2014, 6:49pm

You know what would be really interesting…

ABBYY has a lot of technology to infer document structure. It would be great if DEVONthink could somehow tie into that to virtually divide things based on sections, keeping track of the locations within each document where the sections exist. So if I’m looking at a recipe on grilled chicken, it might list “Saveur 124, section 1” or “Saveur 124, ‘Barbecued Chicken’” if it could infer the title from the content.

This would go so well with the AI that’s already in there!

Ryan_Fuse · July 11, 2014, 3:41pm

This is the reason why I bought DEVONthink. In fact, it might be the only reason. The rest of the features are just a bonus.

I’m curious, Bill_DeVille: how do you use See Also with PDFs or other large documents? As far as I know, See Also will point to the document but not to the particular page or paragraph that’s relevant.

Bill_DeVille · July 11, 2014, 5:16pm

Most of my documents are a thousand words or less. But there are hundreds that are larger, and some run to hundreds of pages. I rarely have a problem with that in reviewing See Also suggestions. And although most of the large documents are PDFs, I try to avoid capturing HTML Web documents as PDF, because I want to a) exclude extraneous content and b) reduce file size; instead I capture the relevant content as rich text. Bottom line, most of my documents are not PDF.

Scenario: I’m viewing a paper about a wetlands issue and invoke See Also. Among the suggestions will be a couple of documents that run to hundreds of pages and are PDFs, and/or my notes about those long documents. I included those two long reports in my database because they contain a wealth of information about wetlands issues. I’m pretty familiar with their contents, and have created Annotation notes for each, as well as other rich text notes that reference them. My notes do contain Page Links to specific pages of those long reports. As those notes are searchable and “visible” to See Also, if I’m lucky I’ll have already “split out” information that See Also considers contextually related to that wetlands document from which I originated the request for suggestions.

But suppose See Also suggests only the long documents? Perhaps my existing notes just didn’t capture the topic that See Also is looking for within them. But it’s also possible that the sheer frequency of use of similar terms in those two documents overwhelmed their use in my notes, so that the notes were not suggested. To check that possibility, I can temporarily exclude those two documents from See Also and repeat the request from the original document. If some of my notes about them now turn up, perhaps I’ll find them useful. If not, I might spend a bit of time going through the long reports, probably opening them in Preview in order to do some Find searches for terms that might help me find specific information. Why Preview? Because a Find search presents a list of occurrences of the term in the context of a few other words, and clicking a promising occurrence will take me to that page. I might find the information useful to my research, and in that case add it to my notes. If so, See Also was indeed useful.

I mentioned the ability to exclude a document from See Also analysis. Some documents may act as a “magnet” for See Also, showing up frequently but turning out not to be useful suggestions. Example: Some years ago I purchased an Infiniti G35x and dumped the PDF of its user manual into a database. But that user manual covered so many topics and had so many terms in it that it tended to show up in almost every list offered by See Also. Solution: Exclude it from See Also.