Outlier detection feature?

Sometimes, because of user (my!) error, I move
things into the wrong groups by accident and
don’t realize my mistake until much later on.

Since DTPro is performing statistical analysis,
could it also have a detector for statistical outliers,
to flag documents that have a low likelihood of belonging
to a particular group (or groups)?

Obviously, the first document sent to a group cannot
be flagged as an outlier, by definition.

Ooh, this is interesting. I do sometimes accidentally drop something in the wrong group, and then have to remember what I was dragging and search for it: a major pain. This would be a cool fix.

But it would also be another very interesting way to look at data. I imagine not just a list of “outliers” but a way to view information spatially: visually grouping some documents in the center while others are spread farther out like a solar system.

My organization system for the bulk of my research categorizes items by historical period, theme/topic, writer, theoretical approach. I rely heavily on replicants. I’d be very interested to see what the software thinks is the “center” of my “Victorian” group for instance, and which were outliers.

Of course, a visual metaphor seems like an awfully big engineering task, even if it is cool. Can anyone think of a better implementation of similar functionality?

I really liked the idea! Maybe I should add it to the monstrous list I just posted. :wink:
My guess is that it should be easy to implement. I think DT gives a score for each document-group pair when you press the Classify button. So all we need is a list of documents sorted according to this score. Brilliant! Thanks!
I hope Devon people will have enough time for a new sidebar item.
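
The sorted-list idea above could look roughly like the sketch below. The scores, the document names, and the `rank_by_fit` helper are all made up for illustration; this is not DT's actual scoring or API, only a guess at the shape of the data (one number per document for a given group, higher meaning a better fit).

```python
from typing import Dict, List, Tuple


def rank_by_fit(scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (document, score) pairs with the weakest fit first."""
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical scores for one group; higher means a better fit.
victorian = {
    "Dickens notes": 0.91,
    "Bronte essay": 0.84,
    "Grocery list": 0.12,
}

# The likely misfiled documents end up at the top of the list.
for name, score in rank_by_fit(victorian):
    print(f"{score:.2f}  {name}")
```

With the weakest fits listed first, a misfiled document would tend to surface near the top without any extra outlier test.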

Best,
pj

A visual ‘map’ type display as in DA would be nice
indeed for DTP. :slight_smile:

What I was thinking of was just a way to show the
following (listed in order of complexity, from easiest to
most difficult)…

  • ‘flagging’ (e.g. label in some color) an entry if it
    is likely an outlier.
  • likelihood of an entry being a statistical outlier, as
    some numerical quantifier.
  • how much that outlier influences the overall relationship
    of group entries, both when it is counted as part of the
    group and when it is not. A sort of what-if
    scenario.
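
To make the three levels above concrete, here is a rough sketch. It assumes a plain bag-of-words model and cosine similarity against the group's average word vector, which is almost certainly not what DTP does internally. The function names, the 0.3 threshold, and the sample documents are all hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Mapping


def vectorize(text: str) -> Counter:
    """Turn a document into a bag-of-words count vector."""
    return Counter(text.lower().split())


def centroid(vectors: List[Mapping[str, float]]) -> Dict[str, float]:
    """Average word vector of a set of documents."""
    total: Counter = Counter()
    for vec in vectors:
        total.update(vec)
    return {word: count / len(vectors) for word, count in total.items()}


def cosine(a: Mapping[str, float], b: Mapping[str, float]) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(value * b.get(word, 0.0) for word, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def outlier_report(group: Dict[str, str], threshold: float = 0.3) -> Dict[str, dict]:
    """For each document: a fit score, a flag, and a what-if influence."""
    if len(group) < 2:
        return {}  # a lone document cannot be an outlier
    vectors = {name: vectorize(text) for name, text in group.items()}
    with_all = centroid(list(vectors.values()))
    report = {}
    for name, vec in vectors.items():
        # What-if: the group's centroid with this document left out.
        without = centroid([v for n, v in vectors.items() if n != name])
        fit = cosine(vec, without)
        report[name] = {
            "fit": fit,                                    # numerical quantifier
            "flagged": fit < threshold,                    # simple flag
            "influence": 1.0 - cosine(with_all, without),  # centroid shift
        }
    return report


# Hypothetical group contents.
group = {
    "a": "victorian novel serial fiction london",
    "b": "victorian novel poetry london empire",
    "c": "victorian fiction empire serial novel",
    "d": "invoice printer toner cartridge order",
}
for name, row in outlier_report(group).items():
    print(name, row["flagged"], round(row["fit"], 2), round(row["influence"], 3))
```

Each document is compared against the group with itself left out, so a single stray file cannot pull the "center" toward itself and hide. The leave-one-out pass is also why this costs more than a plain sorted list.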

I realize much of this has the potential to blur the distinction
between DTP and DA, but it also seems to me that much of
this is already calculated and filtered out in both products
anyway. (?)

Thanks,
-Shin

That’s indeed an interesting idea, but although v1.2.1 will speed up “See Also” (and similar commands) a lot, it will still be too slow to do this on the fly. But maybe an additional command/window/view will do this in the future.