DTPro: A nice review by the scientific community

Fred · September 6, 2006, 10:59pm

DT and DTPro are reviewed at the MacResearch.org web site, by Marco Coïsson, a researcher with INRIM in Italy. It’s a meaty review, well-illustrated with screen shots, and is found at:

 [macresearch.org/devonthink_pro](http://www.macresearch.org/devonthink_pro)

“MacResearch.org is an open and independent community for scientists using Mac OS X and related hardware in their research.” The web site is, of course:

 [macresearch.org/](http://www.macresearch.org/)

dekay · November 9, 2006, 6:08pm

Actually I believe he is missing out on most of DTP’s AI features ;(

Peter_Gallagher · November 9, 2006, 9:11pm

I don’t agree. I think this reviewer has accurately pointed out the most disappointing aspect of DT’s alleged ‘artificial intelligence’.

This is the same point that some of us have been making about DT for quite a while now. DT would be MUCH more useful if it took account of very easily recognized clues about which words are IMPORTANT in a document and weighted them accordingly. For example, words and phrases in the title, or words in a document that also occur in the title of the folder to which the document has been classified or (optionally) words in a rich text that have bold emphasis (likely to be headings). As far as I can tell, from experience, DT does none of these things.
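To make the kind of weighting described above concrete, here is a minimal sketch. It is not DT's algorithm, and the boost values and the names `TITLE_BOOST`, `FOLDER_BOOST`, `BOLD_BOOST` and `weighted_terms` are hypothetical, chosen only for illustration. Each word in the body counts once, plus a bonus if it also appears in the document title, the folder title, or a bold run.

```python
import re
from collections import Counter

# Hypothetical boosts; none of these values come from DEVONthink.
TITLE_BOOST = 2.0
FOLDER_BOOST = 1.0
BOLD_BOOST = 0.5

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def weighted_terms(body, title="", folder="", bold_runs=()):
    """Score each body word: 1.0 per occurrence, plus a boost for every
    'importance clue' (title, folder title, bold text) the word matches."""
    title_words = set(tokenize(title))
    folder_words = set(tokenize(folder))
    bold_words = set(tokenize(" ".join(bold_runs)))
    scores = Counter()
    for word in tokenize(body):
        weight = 1.0
        if word in title_words:
            weight += TITLE_BOOST
        if word in folder_words:
            weight += FOLDER_BOOST
        if word in bold_words:
            weight += BOLD_BOOST
        scores[word] += weight
    return scores

scores = weighted_terms(
    body="Overfishing reduces the catch. The decline of fisheries is global.",
    title="Ocean Fisheries Decline",
    folder="Fisheries",
    bold_runs=["Overfishing"],
)
# Words sorted from highest to lowest weighted score.
ranked = [word for word, _ in scores.most_common()]
```

With plain frequency counting, "the" would be the top word in this toy document; with the boosts, "fisheries" and "decline" move ahead of it.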

I consider this the oddest shortcoming of DT. In my view, it makes DT a ‘storage database’—which is what I use it for—rather than a research tool.

I also agree with the author’s comments on the limited search capabilities. He’s not aware, apparently, that it’s possible to search through a group of files (‘in selection’). But, in my opinion, that is because the search options are not well presented in the interface. Even in the specialized search dialog (CMD+F), the option to search a specific domain (e.g. a folder), which many researchers would consider very important, is located at the BOTTOM of a column of widgets under a drop-down that doesn’t advertise its function (I found it more or less by accident).

While I’m on this dialog… a researcher with a moderately complex domain to work in wants PROXIMITY searching. There’s no better way to make distinctions in collections that are statistically very similar from a word-frequency point of view. The lack of any proximity searches in DT is another of those things that, frankly, just puzzles me.

I sometimes wish DT had a ‘plug-in’ structure (the scripting interface won’t do it), because there are plenty of well-established algorithms for proximity searches.

Best wishes,

Peter

Bill_DeVille · November 10, 2006, 12:08am

Hi, Peter:

Is See Also really useful?

It’s true that the See Also feature doesn’t give special weight to titles or to bolded text. That might or might not increase the accuracy of the list of suggestions of similar documents. (Many people use cryptic document names as titles, which would probably reduce the accuracy, however.)

Nevertheless, I find See Also does make an extremely useful research assistant.

Example: I just used See Also while viewing an article on a gene-altered bacterium that resulted in new properties (in this case, creating an altered strain of E. coli that smells like mint and banana). The See Also list included several very relevant references on synthetic biology. This in a database with more than 20,000 documents, many of which are very large PDF files. As I do use meaningful titles, a quick inspection lets me identify documents in the list that I might want to pursue. So DT Pro did a very good job.

Example: Here’s the See Also list for a recent paper titled “Global Loss of Biodiversity Harming Ocean Bounty”:

Impacts of Biodiversity Loss on Ocean Ecosystem Services – Worm et al. 314 (5800): 787 – Science

Status and Trends of Amphibian Declines and Extinctions Worldwide

PLoS Biology: Troubled Waters: The Future of Global Fisheries

Biodiversity Effects on Soil Processes Explained by Interspecific Functional Dissimilarity

Ecosystem-Based Fishery Management

My Way News - Overfishing May Harm Seafood Population

Ecology - Wikipedia, the free encyclopedia

Ecology for a Crowded Planet

Biodiversity Research Still Grounded – Hendriks et al. 312 (5781): 1715 – Science

Ecological Linkages Between Aboveground and Belowground Biota

Environmental Politics Bibliography

OCEANOGRAPHY: Microbes, Molecules, and Marine Ecosystems

Effectiveness of the global protected area network in representing species diversity

Comparative Risk Resource Guide Third Edition, 1997

Science - ECOLOGY: How Extinction Patterns Affect Ecosystems - Raffaelli 306 (5699): 1141

SUSTAINABILITY: Resolving Mismatches in U.S. Ocean Governance – Crowder et al. 313 (5787): 617 – Science

Surgeon General Secondhand Smoke.pdf

That’s the unedited list of See Also suggestions. I think it’s pretty darned good! The lowest ranked item, the Surgeon General’s report on secondhand smoke, is the least relevant – but contains information about lung cancer incidence in the fisheries industry and also relationships to ingestion of salted fish.

If I were a journalist writing an overview of potential fishery problems, every single one of the suggested contents (except the secondhand smoke document) would provide useful information to help understand and develop the topic. I looked at each article in the list; it’s really a great reading list on the topic.

Better search operators?

I agree with you, Peter. That’s why version 2 will have the same search operators and query formulation that are currently in DEVONagent and EasyFind. I’m especially looking forward to proximity operators, which will include NEAR/n, BEFORE/n and AFTER/n.
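For readers unfamiliar with proximity operators, here is a rough sketch of what NEAR/n, BEFORE/n and AFTER/n could mean. The semantics are an assumption (NEAR: the two terms lie within n words of each other in either order; BEFORE: the first term precedes the second by at most n words; AFTER: the reverse), and the function names are hypothetical. This is not the DEVONagent implementation.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def positions(tokens, term):
    """Return every index at which term occurs in the token list."""
    term = term.lower()
    return [i for i, tok in enumerate(tokens) if tok == term]

def near(text, a, b, n):
    """Assumed NEAR/n: a and b occur within n words of each other, any order."""
    tokens = tokenize(text)
    return any(
        i != j and abs(i - j) <= n
        for i in positions(tokens, a)
        for j in positions(tokens, b)
    )

def before(text, a, b, n):
    """Assumed BEFORE/n: a occurs at most n words before b."""
    tokens = tokenize(text)
    return any(
        0 < j - i <= n
        for i in positions(tokens, a)
        for j in positions(tokens, b)
    )

def after(text, a, b, n):
    """Assumed AFTER/n: a occurs at most n words after b."""
    return before(text, b, a, n)

text = "Overfishing has caused the collapse of several cod stocks"
hit_near = near(text, "collapse", "cod", 3)      # three words apart
miss_near = near(text, "collapse", "cod", 2)     # window too small
hit_before = before(text, "collapse", "cod", 3)
miss_before = before(text, "cod", "collapse", 3)
hit_after = after(text, "cod", "collapse", 3)
```

A real engine would work from a positional index rather than re-tokenizing each document, but the comparisons on word positions would be the same idea.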

But I’ll note that with a bit of trickery one can already do searches such as combining an exact phrase AND x and NOT y. How? Do the PHRASE operator search first. Replicate the results to a target group created for that purpose. Then search that group for the term x and replicate the results to another target group. Now – you guessed it – search for term y and delete the search results. (That will be so much more straightforward in 2.0.)
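The replicate-and-search trick above amounts to set operations: each search narrows the previous result group, and the final delete is a set difference. A minimal sketch, assuming simple case-insensitive substring matching and a made-up four-document collection (the `search` helper and the sample data are hypothetical, not DT's API):

```python
def search(docs, query, within=None):
    """Return the names of documents whose text contains query.
    'within' restricts the search to a previous result group."""
    names = docs.keys() if within is None else within
    return {name for name in names if query.lower() in docs[name].lower()}

# Made-up sample collection for illustration only.
docs = {
    "a": "Ocean ecosystem services and fisheries management",
    "b": "Ocean ecosystem impacts of aquaculture on fisheries",
    "c": "Ocean ecosystem chemistry",
    "d": "Fisheries in the ocean; ecosystem notes",
}

# Step 1: exact phrase search, results "replicated" to a first group.
phrase_group = search(docs, "ocean ecosystem")
# Step 2: search that group for term x, results go to a second group.
x_group = search(docs, "fisheries", within=phrase_group)
# Step 3: search the second group for term y and delete those results.
result = x_group - search(docs, "aquaculture", within=x_group)
```

What remains in `result` is the equivalent of PHRASE AND x AND NOT y.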

Peter_Gallagher · November 10, 2006, 1:38am

Hi Bill,

I think we’ll continue to have different views on the weighting of terms in the DT concordance.

From this review and from other questions and comments in the Forum it looks like I’m not alone in my contention that search, classification and ‘see also’ would all be more useful if they took account of obvious clues to the importance–not just the relative frequency–of words in a document.

As it stands, I find that I get lists of a dozen or so documents (as in your case) from a search or even a ‘see also’ that seem to be ordered by the frequency of occurrence. Often, only one or two documents somewhere on the list use the search term or phrase in a relevant/significant way.

Now I have to search through them all more mechanically or by making guesses. I use ‘meaningful’ titles, too. But even so, who can remember the content of a document from its title six months or two years after filing? Not me. That’s where a ‘research tool’ as opposed to a ‘storage database’ should earn its keep.

Glad to hear about proximity searching coming up.

Best,

Peter

Bill_DeVille · November 10, 2006, 3:49am

Hi, Peter. One more time on See Also.

I’m beginning to think that you and I have very different expectations of what a list of contextually related documents should contain. The list of similar documents presented about declining fish populations is a good example.

I don’t want to narrow the list of See Also suggestions to words or concepts that I’ve predetermined as being the most important. Why is that? It’s because I’m looking for expansion of concepts.

So that example list delighted me. The article about declining populations of amphibians doesn’t appear, at first blush, to have much to do with overfishing. But in fact it presents information that helps one to grasp the complexity of declining populations, and some of the difficulties in looking for causative factors. So I’ll give See Also a genius grade for pulling that one up to show me.

Likewise that article “Biodiversity Effects on Soil Processes Explained by Interspecific Functional Dissimilarity”. Is that about fish? No. But it presents some perspectives on the functional roles of species in a complex ecological setting. It helps me understand how ecological niches – and the ecology as a whole – change when one or several species – of fish, if you like – almost disappear from an ecosystem. Does it have relevance to how I might think about fisheries resources? Emphatically, yes. Would I have seen that article if See Also were more weighted to article titles, or to keywords I might impose? No. And I would have missed a cue about some important principles, and would have lost some useful insights.

I would have been very disappointed if that See Also list had just served up a list of other articles about declining fish populations resulting from overfishing. I could do that with a search query. That’s a “mere file system” and it’s not what I hope for when I click that See Also button.

The reason I consider DT Pro a good research assistant is that it does put up some unexpected results that do have a contextual/conceptual relationship to the original article. Often, some of the suggestions are “dumb”. It’s up to me to recognize relationships that may be interesting and useful. I find it exciting when See Also serves up a suggestion that can lead to a new way of understanding a topic. It did just that for me in both examples mentioned in the previous post in this thread.

I don’t pay attention to the ranking given the suggested list items. Like you, I don’t remember the content of all of the articles in my database; there are more than 20,000 of them. But I do use descriptive titles, and I’ve made it a habit to explore some of the items on the list, and sometimes click See Also on them. So one can quickly pick up additional useful references about fisheries and fisheries regulations, or how restaurant menus change as a popular fish is replaced by another species because of population collapse; but also literature about stressed ecosystems and the results of population collapses of species.

So I use See Also to explore and expand upon a topic, not to catalog all the articles with the same subject matter. I delight in those unexpected things on the list, where the real gold often is found.

talazem · November 10, 2006, 10:18am

Not to add fuel to the flames, but I’m going to have to throw my weight behind Peter’s arguments above. Having read Bill’s postings and Peter’s arguments, I find that my experience of using the AI functionality resembles Peter’s frustrations. Often I will try to use “See Also” and the other AI abilities to find an article or note-entry. The title, while not used the same way by all, as Bill noted, IS a natural place for an AI engine to look, considering most people do give at least a moment’s thought to what they’re calling an article, a journal entry, or a note-entry. Other elements, like headings etc., are also natural places for a person to scan to see if there are similarities.

I think everyone here with a high school education (everyone, I’d assume) remembers when a history or English lit teacher tuned you in to the fact that if you SCAN the table of contents and index of a book, plus any abstract that you might find, you’ll have gotten the whole of the gist of the book; the rest is evidence and details.

I personally agree with Peter, and wish such basic wisdom was implemented in DT’s algorithms for the AI.

Maria · November 10, 2006, 10:52am

Ditto to all talazem said.

It could solve the multilingual problem, too.

Maria

talazem · November 10, 2006, 10:58am

I hadn’t even thought of that, but that’s so true. My own academic research is in English, but due to its nature it is FULL of technical terminology and short phrases and passages in other languages (largely Arabic); the AI has been decidedly unhelpful for such interjections and passages (unless the whole paragraph or note is in that language).

Maria’s right: depending on how the above was implemented, such multi-language AI abilities would be greatly improved.

yaxpac · November 10, 2006, 11:10am

Hi all - I am only adding this comment as an important illustration of the bigger picture, not as a discussion point for the topic at hand.

Peter, it is worth noting that the ability to search a specific domain might be considered difficult to find (we all recognize the complexity of DT); however, the very nicely put together video tutorial for DT Pro has a very nice description of the nuances of the search window. This includes a demonstration of the option to search a particular domain.

Here is a screen shot from the actual video:

My point is simply that while this is a very complex program, many of us tend to gloss over such minutiae as these demonstrations. We know we can figure it out. However, the 20+ minutes it takes to review these tutorials can save oodles of hours of trying to figure something out. As for me, I am as guilty as anyone of skipping things like this; however, with DT I am finding it is critical to read every how-to to begin to understand the scale of the complexity. When I was watching the search part of the video I thought to myself, “I wonder why you can’t drill down your research area”. Then it popped up on the screen and I thought, “ahhh, of course… sure am glad I am watching this”.

Again this point is not for discussion as it will force a tangent to this useful thread. This is just an observation. If you would like to comment please do so in another forum thread and paste a link to your discussion there if you feel it is relevant.

Best,

R. joe

Timotheus · November 10, 2006, 12:48pm

DT Pro is good for storing, but a poor research tool: such is the inescapable truth.

Unfortunately, the guys at DT invest their “limited resources”, as they call them, in multi-user capabilities and similar things. Instead of first getting the core things right, they are dedicating themselves to features which are undoubtedly important, but which in my opinion should come after better searching, after multilingual capabilities, and so on.

As it is now, in my opinion DT is not a very convenient tool for doing academic research.

If there existed a database application with clearly better searching capabilities, with a clear multilingual orientation, with a well designed keywords feature and so on, I would switch immediately.

yaxpac · November 10, 2006, 1:43pm

But you miss the point of this thread and DT. Those criteria you want do not exist. Do you remember the Mosaic browser? It lacked a few of the features we take for granted today. Building applications, especially one as complicated as DT, is an incremental process, and the refinement of this feature is not an overnight task. And I do believe that if you take a close look at DA, we might be seeing some of the work under the hood that we can eventually see incorporated into DT, and more. The developers of DT seem to be quite attentive to their audience’s ideas and suggestions (and pretty witty too, I might add). So try to keep ideas and suggestions on point and constructive.

Note:

does not fall into the category of “constructive ideas and suggestions”.

cgrunenberg · November 10, 2006, 1:50pm

The only thing I can say at the moment is that we’re already working on the database and search engine of DT 2. And that searching of DT 2 will outperform anything available for the Mac (in terms of speed, accuracy and functionality). Stay tuned…

Peter_Gallagher · November 10, 2006, 8:01pm

Especially in view of that promise, Christian – and because I always find Bill’s enthusiasm contagious — I certainly will.

Best,

Peter