Classify Outstanding Except for One Case

I am totally impressed with the Classify (Magic Hat) function for moving items from my Inbasket to Groups, but there is one case where it regularly seems to fail: medical statements. We are a family of six, so some statements are continually misclassified, and I wish there were a way to tell it where to look for the key info. Here are two examples.

Case 1 - Medical bill from doctor
All the bills are addressed to me, but there is always a “Patient Name” listed as well. I have a group called “Medical”, and within it a group for each of our names: “Mike”, “Karen”, etc. For some reason, Classify always suggests the same person for these bills (actually, it's usually one of the kids), even though each statement has the correct name on it somewhere.

Case 2 - Explanation of Benefits from Insurance
This is much the same situation: they are addressed to me, but the rest of the information, including the patient name, is for one of the kids. Plus, the amounts and dates all match the corresponding medical bill, which is often already in the correct person's folder.

These statements and bills repeatedly get misclassified, while everything else Classify does is totally brilliant.

Bump. Anyone have any ideas on this?

I suspect that the documents are so similar in content that the Classify routine doesn't see distinctive differences. In other words, the relatively few characters that differ (the proper names) aren't enough to give Classify a solid cue.

In what form are these documents? PDF or PDF+Text? If there is no machine-readable text in the document (e.g., a scanned PDF that has not been OCRed), then DT can't discriminate on much other than the document name. Open the “Words” sidebar for one of these documents to check whether DT actually sees the words you expect.

At times it is possible to “prime” the Classify routine by adding comments to documents (“Karen”, “Mike”, etc.), which gives DT another field on which to apply its AI routines.

I am using PDF+Text in all cases, and the correct names are in there. I think it's that the pages are so similar, and the one name that differs is the key. I keep hoping it will eventually figure it out, but this use case seems to defeat its ability to learn.