Hi community!
I am wondering how DT ranks the relevance of a file for a search term: by the number of occurrences of the term in the file, or by the term's percentage of the whole text?
You say “a search term”. Are you talking about a single word?
yes.
Then a document with more occurrences of a single word would rank higher than a document with fewer.
The blue whale is the largest animal on Earth.
… or …
This sentence has the word blue in it.
No fruit is blue.
Being sad is often called "being blue".
Blue is a primary color on traditional color wheels.
Which do you think would rank higher?
It’s percentage, and @Bluefrog’s second example would rank higher not because it has more occurrences of the search term but because the search term occurs once per seven words of the text rather than once per nine words. A short article will generally be ranked higher in a search than a book-length document with a lower focus on the term as a proportion of wordcount. (This is normally what you want.)
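For anyone who wants to check the arithmetic on the two examples, here is a minimal Python sketch. It is not DEVONthink's actual ranking code, which isn't shown in this thread; the function name `term_density` and the simple letters-only tokenization are assumptions made purely for illustration.

```python
import re

def term_density(text, term):
    """Return (hits, total_words, hits / total_words) for a single-word term.

    Hypothetical helper: splits on runs of letters and ignores case.
    A real search engine may tokenize and weight terms differently.
    """
    words = re.findall(r"[a-z]+", text.lower())
    hits = words.count(term.lower())
    density = hits / len(words) if words else 0.0
    return hits, len(words), density

example_1 = "The blue whale is the largest animal on Earth."
example_2 = (
    "This sentence has the word blue in it. "
    "No fruit is blue. "
    'Being sad is often called "being blue". '
    "Blue is a primary color on traditional color wheels."
)

for name, text in [("example 1", example_1), ("example 2", example_2)]:
    hits, total, density = term_density(text, "blue")
    print(f"{name}: {hits} of {total} words, density {density:.3f}")
# example 1: 1 of 9 words, density 0.111
# example 2: 4 of 28 words, density 0.143
```

Four hits in 28 words is one per seven, against one per nine for the first example, so the second text comes out ahead on proportion as well as on raw count.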
I see. Thanks a lot!