How is the "weight of words" counted?

Westwind · September 15, 2016, 1:46am

Hi,

How is the “weight” of words in the concordance calculated? I’m trying to understand the logic behind this function, because I often get results that I can’t explain.

Any explanation?

Regards,
Oliver

cgrunenberg · September 16, 2016, 1:11pm

The weight depends, among other things, on the number of occurrences of a word within a document, but also on its occurrences across all documents in the database and on how many different groups the word is used in (a word found in more groups is more common and therefore less important).
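
To make the interplay of these factors concrete, here is a minimal sketch assuming a generic TF-IDF-style scheme. It is not DEVONthink's actual formula, which is not given in this thread; the function name, its parameters, and the sample numbers are all hypothetical.

```python
import math

def term_weight(local_count, doc_length, docs_with_term, total_docs,
                groups_with_term, total_groups):
    """Hypothetical TF-IDF-style weight (NOT DEVONthink's real algorithm)."""
    # Local factor: how prominent the word is inside this one document.
    tf = local_count / doc_length
    # Global factor: words found in few documents score higher.
    idf_docs = math.log((1 + total_docs) / (1 + docs_with_term)) + 1
    # Group factor: words spread over many groups score lower.
    idf_groups = math.log((1 + total_groups) / (1 + groups_with_term)) + 1
    return tf * idf_docs * idf_groups

# A word used once here, but found in only 2 of 1000 documents and 1 of 50 groups.
rare = term_weight(1, 500, 2, 1000, 1, 50)
# A word used 20 times here, but found in 900 documents and 48 groups.
common = term_weight(20, 500, 900, 1000, 48, 50)

print(rare > common)
```

Under these assumed numbers, the word that occurs only once in the document still ends up with the higher weight, because it is rare in the rest of the database.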

Westwind · September 17, 2016, 1:25am

Thanks for the answer.
If it worked the way you explain, I could understand it. But I asked because it is obviously NOT working like that. Please look at the attachment: the word with the highest weight is a family name. The next one is a word with a frequency of 1. And so on …
This is why I asked.

Regards,
Oliver

Bill_DeVille · September 17, 2016, 9:05pm

Christian wrote the algorithms that determine term weighting, and his response is a correct overview of the procedure. As he noted, the weight depends on more than just how often a term is used.

How many other documents in your database, and how many groups in your database hold documents containing that family name, for example?

Try experimenting with a word list. You will probably find that the highest-weighted term, when used as a query term, returns fewer search results than a lower-weighted term. In other words, the more common a term is in other documents, the lower the weight it is assigned in your original document.
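
The same experiment can be imitated outside the application. The sketch below assumes a tiny made-up set of documents and uses plain inverse document frequency as a stand-in for "rarity"; the document names, texts, and helper functions are hypothetical and do not reflect how DEVONthink searches or weights terms internally.

```python
import math

# Hypothetical mini database: document name -> text.
documents = {
    "letter.txt": "dear mr mueller thank you for the report",
    "notes.txt": "the report covers the budget and the schedule",
    "memo.txt": "please send the budget report to the team",
}

def search_hits(term):
    """Count the documents that contain the term, like a one-word query."""
    return sum(term in text.split() for text in documents.values())

def rarity(term):
    """Inverse document frequency: larger when fewer documents match.

    Only valid for terms that occur in at least one document.
    """
    return math.log(len(documents) / search_hits(term))

for term in ("mueller", "budget", "report"):
    print(term, search_hits(term), round(rarity(term), 3))
```

Here the family name matches a single document and gets the highest rarity score, while "report" matches every document and gets the lowest.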

cgrunenberg · September 21, 2016, 12:48pm

The frequency shown in the Concordance drawer is only the local frequency within the displayed document. The other factors (global frequency, occurrences in groups) aren’t displayed.