Topic Maps, Word Clouds, etc. -- DocEar


DTPO is by far my favorite application. It’s not just a ‘work’ thing. It’s a ‘learning’ thing. The classification features help me figure out what to read if I’m just not ‘getting it’ in one document. Moreover, as I write technical documents, it helps me find references. Generally, it’s my ‘offline brain’.

Over the years, I’ve tried many, many applications. I’m a software and gadget geek. For example, I don’t think there is an Outliner, Todo app, MindMapper, or Knowledge Management system out there that I don’t either own or for which I haven’t had a trial.

As I mentioned, classification is the most interesting part of DTPO for me. Indeed, I’d love to be able to get more detail from the classification engine so that I could integrate visualizations from R and other Apps.

All this being said, there is another app out there that intrigues me: DocEar. DocEar has many of the features that, from time to time, I try to figure out how to replicate within DTPO. I’d be interested to hear whether anyone knows of other solutions similar to DocEar that I might be able to use with DTPO.

Here is a particularly interesting visualization:


This uses the Aduna cluster map library, a Java-based visualization library which is, I believe, freely available though unmaintained. Indeed, it’s used in Carrot^2 and Scan (both available through SourceForge).

In the absence of a direct integration, being able to provide data to products like Carrot^2 would be better than nothing, in my opinion.

DEVONthink needs to stay focused on its core business (e.g. DTTG 2 et al.). I am not at all familiar with the possibilities of 3rd party software interfacing with DT to tap the information contained in DT. But Soma-Zone’s Ammonite application demonstrates that it is possible to make word clouds etc. So in my view, visualization experts should build a 3rd party product that ties in, and sell it separately.

I apologize in advance for my biased view, but I am truly curious. I just cannot get anything out of these visualizations. I’ve tried hard, checked out mind maps and similar approaches. I always come to the conclusion that simple lists and good search capabilities are superior. Now, I can understand that different minds work differently. Maybe I’m just not the type to make sense of mind maps, and others might immediately “resonate” with them.

However, one thing makes me suspicious: Usually, screenshots provided by the proponents naturally make a particularly strong case. CAD programs are a good example. Even the simplest 2D CAD software usually advertises with insanely cool looking sports cars supposedly drafted with that application. So you can immediately understand that this software is powerful; the difficulty will be to reproduce this with your own work.

Now: Screenshots and examples for visualization and mind mapping software (and the screenshot shown above is the perfect example) are invariably weak. And equally invariably, they do not present a powerful work scenario, but some odd hodgepodge including shopping lists or contrived examples. They immediately make me wonder why all this graphical power is needed to represent something that trivial. Do we really need this graph above to find out that there is a pdf in the collection that contains the word “aperture”? And worse: This only makes sense for the simplest associations. What about everything that “is a pdf, contains the word aperture, but not the word lightroom”? Visualizations are limited largely to 2 dimensions (well, with crazy arrows going all over the place between mindmap clouds, this can be expanded a little, but not much), and that’s not enough for most things.

A funny example is the infamous “Afghanistan Stability/COIN Dynamics” chart:

I really would like to see an eye-opening example that could pull me into this. And who better to do this than the people who want to sell this kind of software? Yet their showcases remain unconvincing, at least for me. I’m not a nay-sayer, but like Steve Martin’s boss in “Parenthood”, I’m saying “Dazzle me!”.

Apologies for missing Korm’s posts. I’ve been traveling. I apologize in advance because it’s late and I’m tired from the aforementioned travel. I’ll take a cue from Korm and edit later.

I think that perhaps you had a knee-jerk reaction to something that looks like a mind-map yet, with respect, is not.

This is a cluster diagram. Since its purpose is to provide an overview of classifications, it does not take the place of lists and search when one knows exactly that for which one is looking. In my opinion, it does allow one to navigate naturally when one doesn’t know where to start.

Better examples may be had if a Google image search is run using the terms ‘aduna cluster map’. I chose this one screenshot because it was very simple.

I notice that you reference Ammonite. I have had it installed for a number of years. I do not see why such functionality could not be easily replicated directly within DTPO, and in a way that is much more appealing. From what I am able to see, Ammonite is just a list of tags with what appears to be a simple count being used to increase font size and color. It’s a tag cloud, not a word cloud. I would expect the latter to be built from the concordance of a database. Also, I simply do not like Ammonite’s presentation. A personal preference.
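To make the tag cloud versus word cloud distinction concrete, here is a minimal sketch in Python. It is not how Ammonite or DTPO actually work; the document layout (a dict with `tags` and `text`), the stopword list, and the point sizes are all assumptions made up for illustration. The tag cloud counts tags only, while the word cloud counts words from the document text itself, which is closer to what a concordance would give you.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real one would be much longer.
STOPWORDS = {"the", "and", "on", "a", "of", "in", "to", "is"}

def scale_font_sizes(counts, min_pt=10, max_pt=36):
    """Map raw counts linearly onto a font-size range (sizes in points)."""
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid division by zero when all counts are equal
    return {
        term: round(min_pt + (n - lo) * (max_pt - min_pt) / span)
        for term, n in counts.items()
    }

def tag_cloud(documents):
    """Tag cloud: sizes come from how often each tag was assigned."""
    counts = Counter(tag for doc in documents for tag in doc["tags"])
    return scale_font_sizes(counts)

def word_cloud(documents, top_n=5):
    """Word cloud: sizes come from word frequencies in the text itself."""
    counts = Counter()
    for doc in documents:
        words = re.findall(r"[a-z']+", doc["text"].lower())
        counts.update(w for w in words if w not in STOPWORDS)
    return scale_font_sizes(dict(counts.most_common(top_n)))

# Made-up sample data, only to show the two functions side by side.
docs = [
    {"tags": ["photo", "raw"], "text": "Aperture and shutter. Aperture priority mode."},
    {"tags": ["photo"], "text": "The aperture ring on the lens."},
    {"tags": ["photo"], "text": "lens"},
]
tag_sizes = tag_cloud(docs)
word_sizes = word_cloud(docs)
```

The same scaling function serves both, so the only real difference is where the counts come from: metadata in one case, the contents of the database in the other.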

I suppose that we should agree to disagree about DTTG being ‘core’. Myself, I find that I never use it. Probably for the same reason that you do not like mindmaps: I simply cannot see its utility. BTW, I don’t ‘get’ Twitter either, so maybe I’m just a dinosaur.

Speaking of mind-maps, I like to switch between outlines and mind-maps specifically because different parts of the brain are involved in processing the information. As a result, you may sometimes see things one way that you would not see at all otherwise. A facsimile of the ‘second set of eyes’ pattern.

Finally, there are many algorithms that have to do with clustering of a textual corpus. Some of those recognize languages, parts of speech, topics, etc. The result sets can be quite large. Reducing that to a manageable size while still conveying useful information is the purpose of visualizations.
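As a rough illustration of what clustering a textual corpus means, here is a minimal sketch using only the Python standard library. It is a greedy, single-pass clustering on cosine similarity of plain word counts, chosen because it is short. It is not the algorithm used by DTPO, Carrot^2, or the Aduna library, and the `threshold` value and sample texts are assumptions for the example.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Turn a text into a word -> count mapping."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def cluster(texts, threshold=0.3):
    """Assign each text to the most similar existing cluster, or start a new one.

    Returns a list of clusters, each a list of indices into `texts`.
    """
    clusters = []  # each entry: {"centroid": Counter, "members": [indices]}
    for i, text in enumerate(texts):
        vec = bag_of_words(text)
        best = max(clusters, key=lambda c: cosine(vec, c["centroid"]), default=None)
        if best is not None and cosine(vec, best["centroid"]) >= threshold:
            best["members"].append(i)
            best["centroid"].update(vec)  # Counter.update adds the counts
        else:
            clusters.append({"centroid": Counter(vec), "members": [i]})
    return [c["members"] for c in clusters]

# Made-up sample corpus: two photography notes and two programming notes.
texts = [
    "aperture shutter lens camera",
    "camera lens aperture tripod",
    "python code function loop",
    "loop function python variable",
]
groups = cluster(texts)
```

Even on four toy documents the output is just lists of indices, which hints at why some form of visualization becomes attractive once the corpus runs to thousands of items.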

I am a mere programmer who has a lot of papers, books, documents, etc. not originated by me that I wish to organize and navigate. Indeed, I wish to cluster my whole life and see what results.


A fantastic recent example I have seen of this type of visualisation is Hannah Jacob’s project. There is a nice summary of the project at with links to her presentation and slideshow. Interestingly, it starts with all of the texts in DEVONthink, processes the text with Python and feeds the output into a variety of timeline, mapping, word tree and word cloud tools before pulling it all together in the Omeka CMS.

I find the idea of visualisation of my DT information a highly tantalising but mostly frustrating exercise. I have many of the same lingering suspicions about the validity of these visualisations as gg378 expressed. Visualisation, whether by cluster diagram, timeline or word cloud, works best when it is the product of a process of careful distillation to the most essential elements. It must take a relationship that is complex to express in words and make that relationship clearer.

My amateur attempts at applying machine learning, graph databases and other techniques to my GBs of documents in DT haven’t yet produced the kind of insights I was hoping for. Jacob’s reference to her seemingly neat workflow as a ‘highly organised lie’ definitely resonated with me.

Thank you for the reference to Docear. There are some really interesting ideas buried underneath the mess of a truly horrible interface. Pity the software is almost unusable.