I know it’s been asked before, and the answer has always been “no”, but does DT3x support “word stemming”? For instance, searching for the word “swim~” and having DT3 search for “swim, swimming, swam, swum”? I know that I can use a wildcard for “swim*”, and get “swim, swimming, swimsuit, swimmer, etc”, but not the other various forms of the word swim.
I’ve searched the manual for the word “stemming” and not found what I was looking for, and hope that it’s there, but referred to as something different.
The stem being “sw”? Sorry, bad joke. Frankly, I don’t see that coming. It’s different for all languages, and not even possible for all languages (think Asian languages). I’m fairly certain that it’s not there (certainly not for my native language). And I also think it is a bit out of the range of DT.
Just think “swamp” and “swan”, both dangerously close to “swam” – you’d need a whole English/American dictionary, and this is only 1.5 languages. Then there are one or two others … One of them very versatile with pre- and suffixes: we have arbeiten, bearbeiten, verarbeiten, umarbeiten, durcharbeiten… should they be found if someone is looking for “arbeit~” or for “~arbeit”?
Apart from the fact that swimming in a swamp is pushing it a bit … I doubt that this is what the OP is looking for. I suppose they also want “go~” to find “goes”, “going”, “went” and “gone” or “sit~” to match “sitting”, “sits”, “sat” (if the latter even exists, my irregular verbs are a bit rusty). In these cases, wildcards don’t get you very far.
It might even be challenging to get that working at all for English words: “book” → “(she) books” vs. “the books”, “voice” → “she voices” vs. “the voices”. There’s probably a reason for linguistics being a domain of AI.
Well, obviously… if the request is for lexical lookup then clearly the answer is no.
However the OP says that they can’t use wildcards to find ‘swim’ words where the third letter changes. All I’ve done is point out that you can do that, and shown how. I could have added that you can limit the third letter search to ‘a’, ‘i’ and ‘u’. sw[aiu]m*.
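To make the trade-off concrete, here is a small Python sketch of what a pattern like sw[aiu]m* catches. It uses Python's fnmatch module as a stand-in and assumes the bracket and star syntax behaves like shell-style globbing; DEVONthink's own wildcard handling may differ. The word list is made up for illustration.

```python
from fnmatch import fnmatchcase

# Pattern from the post above: "sw", then one of a/i/u, then "m", then anything.
PATTERN = "sw[aiu]m*"

# Illustrative word list (not taken from any real database).
words = ["swim", "swimming", "swam", "swum", "swimsuit", "swamp", "swan"]

# Keep only the words the wildcard pattern matches.
matches = [w for w in words if fnmatchcase(w, PATTERN)]
print(matches)
```

The pattern finds swim, swimming, swam and swum, but it also lets swimsuit and swamp through, which is the false-positive problem mentioned earlier in the thread. Only swan is rejected.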
I think that the OP does not really want to figure out the appropriate wildcard combinations for every case. They’re probably more after a generic solution: throw the indicative of a verb at the search, get back all documents containing the different forms this verb can appear in.
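As a rough illustration of what such a generic solution has to do, here is a minimal Python sketch that expands a “verb~” term into an OR query over its forms. Everything in it is an assumption for illustration: the FORMS table is hand-written and tiny (a real tool would need a full dictionary per language), expand_query is a hypothetical helper, and the sketch assumes the search syntax accepts OR between terms.

```python
# Hypothetical, hand-written table of verb forms. A real implementation
# would need a complete dictionary for every supported language.
FORMS = {
    "swim": ["swim", "swims", "swimming", "swam", "swum"],
    "go": ["go", "goes", "going", "went", "gone"],
    "sit": ["sit", "sits", "sitting", "sat"],
}

def expand_query(term):
    """Turn 'swim~' into an OR query over all known forms of 'swim'.

    Terms without a trailing '~' are returned unchanged; unknown verbs
    fall back to the bare base form.
    """
    if not term.endswith("~"):
        return term
    base = term[:-1]
    return " OR ".join(FORMS.get(base, [base]))

print(expand_query("swim~"))
print(expand_query("go~"))
```

The lookup itself is trivial; the hard part is the table, which is exactly the “whole dictionary” objection raised above.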
NB finding an expression for “be” is probably futile, anyway.
Maybe. But my examples should just show how to work around it as well as possible. Makes me wonder - is there any software or tool (CLI) on the Mac that can stem text? Then maybe a script could be used to stem the text of items and add the stemmed text as a comment or custom metadata.
No, I don’t personally own any software that does stemming as I don’t often have the need. However, I’ve used tons of software at work over several decades that definitely stems anything that you throw at it, including irregular verbs. But these were very expensive eDiscovery & digital forensics apps that run on servers that are purpose-built for quickly finding, culling, and filtering words in terabytes of data. There are also some smaller desktop apps like DTSearch that do stemming, but they probably won’t handle irregular verbs.
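For what it's worth, the stemming half of that script idea could look roughly like the Python sketch below. It is a deliberately naive suffix stripper, not a real stemming algorithm such as Porter's, and the names naive_stem and stemmed_text are made up here. Writing the result back into a comment or custom metadata field is left out, since that depends on the scripting interface.

```python
import re

# Suffixes to strip, longest first. English only, and very incomplete.
SUFFIXES = ("ing", "ed", "es", "s")

def naive_stem(word):
    """Strip one common English suffix; irregular forms are left untouched."""
    for suffix in SUFFIXES:
        # Require at least three letters to remain after stripping.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            stem = word[:-len(suffix)]
            # Undo a doubled final consonant: "swimm" -> "swim".
            if len(stem) >= 2 and stem[-1] == stem[-2] and stem[-1] not in "aeiouls":
                stem = stem[:-1]
            return stem
    return word

def stemmed_text(text):
    """Return the sorted, de-duplicated stems of all words in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    return " ".join(sorted({naive_stem(w) for w in words}))

print(stemmed_text("She swims; he was swimming."))
print(naive_stem("swam"))
```

Regular forms like “swims” and “swimming” collapse to “swim”, but “swam” stays as it is, so a search on the stemmed text would still miss irregular verbs without a dictionary.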
Sorry for reviving this thread, but Apple provides a “Natural Language” framework which may be interesting for future Devon search features.
I believe that is the framework the PDF Search app uses, and it works well for this “stemming” issue (at least in English and French) which is very useful.