Concordance - what do you use it for?


DT has got a feature called concordance, which delivers some cheesy statistics about the data in your database. So what? What do you use that information for? Just to satisfy your curiosity? Come on, there must be more to it.



Personally, I use it to understand why “See Also” or “Classify” sometimes just don’t work, and to think of ways to improve the results afterwards ;) And as I’m a writer too, I’m of course also interested in the statistics.

Software evaluation based on newsgroup posts?

Presumably if all the posts in a newsgroup such as this were transferred to a DT database, then the frequency of certain key words could highlight the most commonly occurring issues. The frequency ratio of positive words to negative ones (like ‘crash’ and ‘great’ for example) could also be used to measure the overall user response to the software. Obviously, for the latter to present a meaningful conclusion, the forum sections would have to be balanced in such a way that both positive and negative responses could occur naturally.
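To make the ratio idea concrete, here is a minimal sketch in Python. It assumes the posts have been exported as plain text strings; it does not use DT itself, and the word lists and sample posts are invented purely for illustration.

```python
import re
from collections import Counter

# Hypothetical word lists, chosen only for illustration.
POSITIVE = {"great", "love", "excellent"}
NEGATIVE = {"crash", "bug", "broken"}

def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z']+", text.lower())

def sentiment_ratio(posts):
    """Return (positive count, negative count, positive/negative ratio)."""
    counts = Counter(word for post in posts for word in tokenize(post))
    pos = sum(counts[w] for w in POSITIVE)
    neg = sum(counts[w] for w in NEGATIVE)
    # Avoid dividing by zero when no negative words occur at all.
    ratio = pos / neg if neg else float("inf")
    return pos, neg, ratio

# Invented sample posts.
posts = [
    "Great update, I love the new search.",
    "Another crash on import, this bug is still there.",
    "Great work overall.",
]
print(sentiment_ratio(posts))
```

A ratio above 1 would suggest more positive than negative words, subject to the balance caveat mentioned above.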

If a software developer set up multiple databases to cover specific time frames in relation to the forum posts (to coincide with new version releases), then by comparing the keyword frequencies between the databases, the success or failure of various bug fixes in subsequent versions could be deduced.
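A rough sketch of that comparison, again in Python and again assuming plain-text exports rather than anything DT provides. The version labels and posts are made up. Counts are normalised per 1000 words so that a busier forum period does not look worse simply because it has more posts.

```python
import re
from collections import Counter

def rate_per_thousand(posts, keyword):
    """Occurrences of `keyword` per 1000 words across the given posts."""
    words = [w for post in posts for w in re.findall(r"[a-z']+", post.lower())]
    if not words:
        return 0.0
    return 1000 * Counter(words)[keyword] / len(words)

# Hypothetical posts grouped by release; the labels are invented.
posts_by_version = {
    "version A": ["crash on launch", "crash when importing", "works fine"],
    "version B": ["works fine now", "no problems so far", "one crash today"],
}

for version, posts in posts_by_version.items():
    print(version, round(rate_per_thousand(posts, "crash"), 1))
```

A falling rate for a word like "crash" between releases would hint that a fix worked, though it proves nothing on its own.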

The above may be of use to a software (or any other product) developer, but from a writer’s point of view, word lengths and their frequencies may give a rough guide to the reading age a document is suited to. Another use for word frequency may be to highlight any ‘overuse’ of certain key words in a specific document.
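Both writer-oriented ideas can be sketched in a few lines of Python. This is an illustration only: the stop-word list is a tiny invented one, and average word length is a crude stand-in for reading age, since established readability measures also take sentence length into account.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "and", "of", "to", "in", "is", "it"}

def word_stats(text, top=3):
    """Return (average word length, most frequent non-stop words)."""
    words = re.findall(r"[a-z']+", text.lower())
    average_length = sum(len(w) for w in words) / len(words) if words else 0.0
    # Ignore common function words so that real 'overuse' stands out.
    content_words = [w for w in words if w not in STOP_WORDS]
    return average_length, Counter(content_words).most_common(top)

sample = "The cat sat. The cat ran."
print(word_stats(sample, top=1))
```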


Hey Sandman!

This is a fascinating reply… thanks for the insights and the inspiration!

Best regards

Thanks Christian,

Another thing I need to add is that if positive and negative word frequency is used on a single-word basis (as opposed to a qualified expression), then the following errors can occur:

‘great’ is classified as a positive.
‘not great’ is also classed as a positive.
Even if ‘not’ was classified as a negative word, the combined term ‘not great’ would effectively only be classed as a neutral expression.
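One common workaround for the ‘not great’ problem is to flip a word's polarity when a negating word comes directly before it. The Python sketch below shows the idea; the word lists are invented, and it only looks one word back, so constructions like "not very great" still slip through.

```python
import re

# Hypothetical word lists, for illustration only.
POSITIVE = {"great", "good"}
NEGATIVE = {"bad", "broken"}
NEGATORS = {"not", "never", "no"}

def score(text):
    """Sum word polarities, flipping a word that directly follows a negator."""
    words = re.findall(r"[a-z']+", text.lower())
    total = 0
    for i, word in enumerate(words):
        # +1 for a positive word, -1 for a negative one, 0 otherwise.
        polarity = (word in POSITIVE) - (word in NEGATIVE)
        if polarity and i > 0 and words[i - 1] in NEGATORS:
            polarity = -polarity
        total += polarity
    return total

for phrase in ("great", "not great", "not bad"):
    print(phrase, score(phrase))
```

With this rule ‘not great’ counts as a negative rather than a positive or a neutral.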

There is also the possibility that, in general, customers decline to give voluntary positive feedback about a product when all is going well, but at the first hint of a problem the complaints come flooding in. In this case, the distribution of positive to negative words will be negatively weighted from the start.


Well… as a newbie, I’ve been using it to scan through research entries – they call it “datamining” now, whereas we used to call it “having a look at my notes” – and some interesting things crop up.  Like, for example, I’m using “Rochester” and “Classical” a lot, and much of that is in combination.  So my brain has been trying to make a connection without telling me.  And DT, by inviting me to muck about with the data, has brought that to my attention. Which in turn has given me a useful way forward in my work.
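For anyone who wants to check that kind of pairing outside DT, here is a small Python sketch that counts how many entries contain both of two terms. It assumes the entries are available as plain text; the sample entries are invented.

```python
import re
from collections import Counter
from itertools import combinations

def cooccurrence(entries, terms):
    """Count, per pair of terms, the entries in which both terms appear."""
    terms = {t.lower() for t in terms}
    pairs = Counter()
    for entry in entries:
        words = set(re.findall(r"[a-z']+", entry.lower()))
        # Sort so that each pair is always stored in the same order.
        present = sorted(terms & words)
        pairs.update(combinations(present, 2))
    return pairs

# Invented research entries.
entries = [
    "Rochester and the classical revival",
    "Classical sources on Rochester",
    "Notes on Rochester",
]
print(cooccurrence(entries, ["Rochester", "Classical"]))
```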