Is there an option I’m not seeing that would make the Concordance count words containing apostrophes (or hyphens) as distinct words?
For example, the two words Can’t and Can produce a frequency count of 2 for Can and a frequency count of 1 for the word T (assuming I allow one-letter words). You’re and You have the same problem: they are counted as the words You and Re.
DEVONthink treats strings of alphanumeric characters as words. Non-alphanumeric characters, including punctuation marks and hyphens, are treated the same as a space: they only separate words, and are therefore ignored in searches and in the Concordance. Case is also ignored.
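To picture the effect of this rule, here is a rough Python sketch. It assumes that a word is simply any run of Unicode letters or digits and that everything else is a separator; it is an approximation for illustration, not DEVONthink's actual code, and the function name concordance is hypothetical.

```python
import re
from collections import Counter

# Approximation (assumption, not DEVONthink's code): a word is any run of
# Unicode letters or digits; every other character acts like a space.
WORD = re.compile(r"[^\W_]+")

def concordance(text: str) -> Counter:
    """Count words case-insensitively, splitting on every non-alphanumeric character."""
    return Counter(m.group(0).lower() for m in WORD.finditer(text))

counts = concordance("Can’t and Can. You’re and You.")
print(counts["can"], counts["t"], counts["you"], counts["re"])
```

Because the apostrophe counts as a separator, Can’t contributes one "can" and one "t", which matches the counts described in the question.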
In a text layout application that uses soft hyphens, and therefore displays the same word hyphenated or unhyphenated depending on the margin settings, DEVONthink may not “see” the soft hyphenation. Depending on the algorithms and dictionary used for soft hyphenation, the same word may also be hyphenated differently when line widths change.
But if such text is printed to paper and then OCRed, the conversion breaks a word at the hard, printed hyphen into separate text strings. Those separate strings are treated as different words in the Concordance.
So if I want word counts for Can vs. Can’t, my only option is to delete all ’ characters so that I’m comparing can with cant. I guess the other option would be to write my own script.
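For the write-your-own-script route, a minimal Python sketch could look like the one below. It is not a DEVONthink feature; the function name count_words and the strip_apostrophes parameter are hypothetical. It assumes a word is a run of letters or digits that may be joined by a straight (') or curly (’) apostrophe, so Can’t stays one word. Curly apostrophes are normalized to straight ones so that both spellings are counted together.

```python
import re
from collections import Counter

# Assumed word pattern: letters/digits, optionally joined by ' or ’,
# so contractions such as "Can’t" or "You’re" stay a single word.
WORD = re.compile(r"[^\W_]+(?:['’][^\W_]+)*")

def count_words(text: str, strip_apostrophes: bool = False) -> Counter:
    """Case-insensitive word counts that keep contractions whole.

    With strip_apostrophes=True, apostrophes are removed from each word,
    mimicking the workaround of comparing "can" with "cant".
    """
    # Normalize the curly apostrophe to a straight one so "can’t" == "can't".
    words = (m.group(0).lower().replace("’", "'") for m in WORD.finditer(text))
    if strip_apostrophes:
        words = (w.replace("'", "") for w in words)
    return Counter(words)

sample = "Can’t and Can. You’re and You."
print(count_words(sample))
print(count_words(sample, strip_apostrophes=True))
```

One design caveat: an apostrophe at the end of a word (a plural possessive such as dogs’) is not kept, because the pattern requires letters or digits after the apostrophe.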