Suppose I have a Group (abc).
All my documents relating to subject (abc) go into that Group.
These documents are usually in two languages (50/50), sometimes three (50/40/10).
Would it be best to split Group (abc) into two Groups, one per language, even though the subject is the same (e.g. Group (abc) - Language 1 / Group (abc) - Language 2),
“best” being assessed in the context of AI classification,
OR can we consider that the language does not matter as long as the Group contains a substantial number of documents?
Subgroups should improve the accuracy but that’s hard to tell without having access to the contents.
I ran a test and it does improve accuracy: I now get 100% accuracy with two subgroups vs. 80% with one group. Still, 80% is not bad, so I wonder if it’s worth it!
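To illustrate why splitting by language could help, here is a toy sketch in Python. It is not DEVONthink's actual classifier; it assumes a simple bag-of-words profile per group and cosine similarity, and the sample documents and the helper names (`bag`, `profile`, `cosine`) are made up for the example.

```python
from collections import Counter
from math import sqrt

def bag(text):
    # Lowercased word counts for one document (hypothetical helper).
    return Counter(text.lower().split())

def profile(docs):
    # Sum the word counts of every document in a group.
    total = Counter()
    for doc in docs:
        total += bag(doc)
    return total

def cosine(a, b):
    # Cosine similarity between two word-count vectors.
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Made-up sample documents on the same subject in two languages.
english = ["invoice payment due date", "payment received for invoice"]
french = ["facture paiement date limite", "paiement recu pour facture"]

mixed_group = profile(english + french)   # one group, both languages
english_subgroup = profile(english)       # one subgroup per language

new_doc = bag("invoice payment reminder")

score_mixed = cosine(new_doc, mixed_group)
score_english = cosine(new_doc, english_subgroup)

print(f"mixed group:      {score_mixed:.3f}")
print(f"english subgroup: {score_english:.3f}")
```

In this toy model the new English document matches the English-only subgroup more strongly than the mixed group, because the words from the other language add weight to the mixed profile without ever matching. Whether the real classifier behaves the same way is something only a test on the actual database can show.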
Now I have a main Group (abc) and two sub-groups on subject (abc), sorted by language.
I would now like to create another group (abc) containing all the documents of the two sub-groups. I see two ways to do this:
- Create a new Group (abc) containing replicants of the documents in the two sub-groups.
It’s my understanding that replicants are like aliases or images (correct?), so they will not affect classification as their weight is zero (correct?)
- Create a smart group that excludes Groups, as I only want the documents. In this scenario, do I have to add a tag “abc” as a second condition, or can I do it another way?
Is there any other possibility I did not “see”?
Thanks
No, replicants aren’t aliases: all replicants are identical and there’s no original. Therefore they do affect the classification.