Group - Classification - Languages

smilingtiger · October 27, 2016, 6:37am

Suppose I have a group (abc).
All my documents on the subject (abc) go into that group.
These documents are usually in two languages (50/50), sometimes three (50/40/10).

Would it be best to split group (abc) into two groups, one per language, even though the subject is the same (e.g. Group (abc) - Language 1 / Group (abc) - Language 2)?
“Best” here means best for AI classification.
Or does the language not matter, as long as the group contains a substantial number of documents?

cgrunenberg · October 27, 2016, 7:58am

Subgroups should improve the accuracy, but that’s hard to tell without having access to the contents.

smilingtiger · October 27, 2016, 8:21am

I ran a test, and it does improve accuracy. I now get 100% accuracy with two subgroups vs. 80% with one group. Still, 80% is not bad… I wonder if it’s worth it!
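
To illustrate why per-language subgroups might classify better, here is a toy sketch in Python. It is not DEVONthink's algorithm, which is not described in this thread. It assumes a simple bag-of-words profile per group and cosine similarity, and the sample texts and the names `profile` and `cosine` are made up for illustration. It shows only that a mixed-language group has a more diluted word profile than a single-language subgroup.

```python
from collections import Counter
import math

def profile(docs):
    """Sum word counts over all documents in a group (toy bag-of-words)."""
    counts = Counter()
    for text in docs:
        counts.update(text.lower().split())
    return counts

def cosine(a, b):
    """Cosine similarity between two word-count profiles."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical documents on the same subject in two languages.
english = ["invoice payment due", "payment invoice received"]
french = ["facture paiement dû", "paiement facture reçu"]

mixed_group = profile(english + french)   # one group, both languages
english_subgroup = profile(english)       # subgroup for one language only

# A new English document to classify.
new_doc = profile(["invoice payment reminder"])

sim_mixed = cosine(new_doc, mixed_group)
sim_english = cosine(new_doc, english_subgroup)

# The single-language subgroup matches the new document more closely,
# because the French words no longer dilute the group's profile.
print(sim_english > sim_mixed)
```

In this toy model, the words from the other language add weight to the group's profile without ever matching the new document, which lowers the similarity score.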

smilingtiger · October 27, 2016, 8:48am

Now I have a main group (abc) and two subgroups, sorted by language, on subject (abc).
I would now like to create another group (abc) containing all the documents of the two subgroups. To do that, I see the following options:

Create a new group (abc) containing replicants of the documents in the two subgroups.
My understanding is that replicants are like aliases or images (correct?), so they will not affect classification because their weight is zero (correct?).
Create a smart group that excludes groups, as I only want the documents. In this scenario, do I have to add a tag “abc” as a second condition, or can I do it another way?

Is there any other possibility I did not “see”?

Thanks

cgrunenberg · October 27, 2016, 12:08pm

No, replicants aren’t aliases. All replicants are identical and there’s no original, so they do affect the classification.
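
As a rough analogy only, the difference can be sketched in Python. This does not reflect DEVONthink's internals. The `Document` and `Group` classes and the `word_count` method are hypothetical names invented for illustration. The sketch treats a replicant as the same object appearing in several groups, so every group that holds it counts its full content.

```python
class Document:
    """Hypothetical stand-in for a document."""
    def __init__(self, name, text):
        self.name = name
        self.text = text

class Group:
    """Hypothetical stand-in for a group holding documents."""
    def __init__(self, name):
        self.name = name
        self.items = []

    def word_count(self):
        # Every item in the group contributes its full content.
        return sum(len(d.text.split()) for d in self.items)

doc = Document("report", "quarterly sales report")
lang1 = Group("abc - Language 1")
combined = Group("abc - all")

lang1.items.append(doc)
combined.items.append(doc)  # "replicate": same object, no copy, no original

# Both groups hold the very same document, not a pointer to an original.
same = lang1.items[0] is combined.items[0]

# The document counts fully in each group that contains it.
print(same, lang1.word_count(), combined.word_count())
```

Under this analogy, a group filled with replicants carries the same content as the subgroups it draws from, which is consistent with the point that replicants do affect classification.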