Some questions on concordance and one question on custom meta data

ngan · August 23, 2019, 4:20am

I am using Script Debugger. The stall seems to depend on the unique word count. Anything under 10,000 is fine and fast (0.01 to 0.04 sec). The editor stalls for counts over 30,000, but I don’t have any group between 10,000 and 30,000 to test.
I am just focusing on the implications of the concordance of a single document. It’s fast enough to let me think about using a script to do some sort of auto-tagging (just a pet project…). A really rough idea is to store two word lists in the comment or a custom metadata field of each tag in a group of 100-200 tags: one list for OR and another for AND. The default orList is just the name and aliases of the tag. Auto-tagging can be achieved by using “contains” to get a match between the concordance and the two lists.
Looping through the tag list and using

 if (conList contains orList) and (conList contains andList ) then ...

should do. So, (1) my constrained tag script can pop up and show the recommended tags in the first section, with the entire constrained tag tree shown below for additional choices, and (2) the matching condition can be adjusted by editing the word lists in the tag’s comment/field directly.
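
A minimal sketch of the matching step in plain AppleScript, under some assumptions: the sample words are made up, and the handler names containsAny and containsAll are hypothetical helpers, not part of any application's dictionary. In a real script conList would come from the document's concordance and the two word lists from the tag's comment or custom metadata field. One caveat worth checking before relying on the one-liner above: when the right-hand operand of "contains" is a list, AppleScript tests for that list as a contiguous run of items in the same order, not for "any of" or "all of" the words, so the sketch checks word by word instead.

```applescript
-- True if at least one word of wordList occurs in sourceList (OR semantics)
on containsAny(sourceList, wordList)
	repeat with w in wordList
		if sourceList contains (contents of w) then return true
	end repeat
	return false
end containsAny

-- True if every word of wordList occurs in sourceList (AND semantics);
-- an empty wordList counts as a match
on containsAll(sourceList, wordList)
	repeat with w in wordList
		if sourceList does not contain (contents of w) then return false
	end repeat
	return true
end containsAll

-- Hypothetical sample data standing in for the concordance and the tag's lists
set conList to {"invoice", "tax", "2019", "payment", "bank"}
set orList to {"receipt", "invoice"}
set andList to {"tax", "payment"}

if containsAny(conList, orList) and containsAll(conList, andList) then
	-- this is where the tag would be added to the recommended section
	set isRecommended to true
else
	set isRecommended to false
end if
return isRecommended
```

String comparison in AppleScript ignores case by default, so "Invoice" in the concordance would still match "invoice" in a list unless the test is wrapped in a "considering case" block.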

I think auto-tagging is more meaningful for a single item or a handful of items (e.g., classifying inbox items), each containing a reasonably small number of words. Any sort of auto-tagging on a large number of documents, or on long documents, risks assigning too many unnecessary tags, and there is no good way to check or audit the quality of the assignments on several hundred or several thousand auto-tagged items after the fact.