Auto or semi-auto tagging is tough to crack. This script is my final attempt (for now). I’d say it is somewhere between “interesting” and “somewhat useful for some tasks”.
Most of the core elements have already been discussed here: TaggerV1.1: auto and semi-auto tagging (new option for item's name-only matching)
I have added two unique features and rearranged the script's working modes.
(1) It is not uncommon to have tags with the same name in different tag groups. Most tagging scripts (including Tagger V1.1) can’t handle this situation. Tagger V2.0 assigns the right tag even when there are multiple identically named tags.
(2) I tried to utilise DT’s “see also” function. The script extracts the top n documents from “see also”, finds the tags most commonly assigned to those documents, and suggests them. The parameters are configurable.
(3) I rearranged the workflow of this script to be consistent with the AutoGroup script. There are four modes:
Manual mode works like a tag panel: you tag items yourself, and you can also create new tags and put them under a group.
Semi-Auto mode suggests tags by matching the criteria set up in each tag against the name or concordance of the document, and by finding the most common tags among the “see also” documents.
Auto mode assigns tags to a selection automatically, based on name and concordance matching as described in TaggerV1.1.
Batch mode lets you select a list of items and tag them all with the same tags at once.
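Feature (1) is easiest to see with a toy model: a script that looks tags up by bare name cannot tell two tags called “Bank” apart. A minimal sketch of one way around this, keying every tag by its full group path and resolving a name only within a given group. It is written in Python purely to illustrate the idea; it is not DEVONthink's AppleScript API and not necessarily how Tagger V2.0 does it. The tag paths and the helpers `index_by_name` and `resolve` are hypothetical.

```python
from collections import defaultdict

# Hypothetical tag tree. Each tag is identified by its full location path,
# so two tags both named "Bank" remain distinct.
TAG_PATHS = [
    "/Tags/Finance/Bank",
    "/Tags/Places/Bank",
    "/Tags/Topics/N.Conceptual",
]

def index_by_name(paths):
    """Map each bare tag name to every full path that carries that name."""
    index = defaultdict(list)
    for path in paths:
        name = path.rsplit("/", 1)[-1]
        index[name].append(path)
    return dict(index)

def resolve(name, group, index):
    """Return the path of `name` that lives under `group`, or None.

    Returning None (instead of the first identically named tag) avoids
    silently assigning a tag from the wrong group.
    """
    prefix = group.rstrip("/") + "/"
    for path in index.get(name, []):
        if path.startswith(prefix):
            return path
    return None

index = index_by_name(TAG_PATHS)
print(index["Bank"])                          # ['/Tags/Finance/Bank', '/Tags/Places/Bank']
print(resolve("Bank", "/Tags/Places", index))  # /Tags/Places/Bank
```

The design choice is simply that the path, not the name, is the identity of a tag; the name is only used to look up candidates.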
Quick demo
(1) Semi-Auto Mode: the list shows two types of suggestions (I am using real data). The most interesting one is “Tags (DT Top 10 See Also)”. The “(4)” next to “N.Conceptual” means that 4 of the top 10 “see also” documents are tagged with “N.Conceptual”. There is no suggestion from word matching because I haven’t set up the matching words in my real data.
(2) Manual Mode. Similar to Constrained Tagging V2.0 (with the ability to create and gather new tags). You can select multiple documents, but you need to decide on the tags one by one. This mode is less cluttered by suggested tags, which is particularly useful when you (well, I) want to limit the choice of tags.
(3) I think Auto mode is more suitable for home management, such as statements or short notes.
(4) A log is generated.
(5) Issues with auto-tagging: matching the words of a long document against the words in the TagOR and TagAND fields still produces quite a few unexpected suggestions. For example, a piece of literature was matched to “Banking Statement” (because the word “banking” appears in it) and to “ngan” (because one of its words partially contains “ngan”).
I am still trying to figure out whether more complex predicates can be embedded, but I think that requires a good knowledge of shell scripting or regex, so it is a much later task.
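The two false positives in (5) are different in kind, and a regex predicate only helps with one of them. A minimal sketch, in Python for illustration only, comparing a naive substring check with a whole-word check using `\b` word boundaries. This is a swapped-in technique, not what the scripts currently do; the sample text and both helper names are hypothetical.

```python
import re

def substring_match(keyword, text):
    """Naive check: true if the keyword occurs anywhere, even inside a longer word."""
    return keyword.lower() in text.lower()

def whole_word_match(keyword, text):
    """Regex check: the keyword must stand alone between word boundaries."""
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None

# Hypothetical document text, for illustration only.
text = "Colonial banking and trade in Tanganyika"

print(substring_match("ngan", text))      # True  (hit inside "Tanganyika")
print(whole_word_match("ngan", text))     # False (no standalone "ngan")
print(whole_word_match("banking", text))  # True  (a real word, so still matched)
```

Whole-word matching removes the partial-word case (“ngan”), but a document that genuinely contains “banking” will still match “Banking Statement”; that kind of false positive needs a different signal, such as restricting the match to the item's name.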
The options:
property topTagGpLocation : "/Tags" -- the tag group that contains all tags
property theNewTagsLocation : "/Tags/New Tags" -- place to save newly created tags
property taggingMode : "S"
property nmMatch : true
property conMatch : true
property seeAlsoMatch : true
property hourBetweenUpdate : 4
property minSeeAlsoWeight : 0.5
property maxSeeAlsoDocs : 10
property minSameTagsInSeeAlsoDocs : 2
topTagGpLocation: you can change the location to the most commonly used tag group if constrained tagging is all you need.
taggingMode: “S”/“M”/“A”/“B” for Semi-Auto/Manual/Automatic/Batch mode
nmMatch: match only the item’s name against the tag name, aliases, and the words in the wordOr and WordAnd fields.
conMatch: concordance-based match.
seeAlsoMatch: include tag recommendations based on DT’s “see also”.
nmMatch, conMatch, and seeAlsoMatch can be used in any combination.
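To make “any combination” concrete, here is a minimal sketch of how the three switches could combine: each enabled source contributes suggestions and the result is their union. It is Python for illustration only. The `TagRule` class, the `suggest` helper, and especially the semantics given to wordOr (any listed word is enough) and WordAnd (every listed word must appear) are assumptions, not taken from the scripts.

```python
import re
from dataclasses import dataclass, field

def words_of(text):
    """Lowercase word tokens of an item name or document body."""
    return set(re.findall(r"\w+", text.lower()))

@dataclass
class TagRule:
    # Hypothetical stand-in for one tag's matching criteria.
    name: str
    aliases: list = field(default_factory=list)
    word_or: list = field(default_factory=list)   # assumed: any one word is enough
    word_and: list = field(default_factory=list)  # assumed: every word must appear

    def matches(self, words):
        if any(n.lower() in words for n in [self.name, *self.aliases]):
            return True
        if self.word_or and any(w.lower() in words for w in self.word_or):
            return True
        return bool(self.word_and) and all(w.lower() in words for w in self.word_and)

def suggest(name, body, rules, see_also_tags,
            nm_match=True, con_match=True, see_also_match=True):
    """Union of the suggestions from whichever sources are switched on."""
    suggestions = set()
    if nm_match:      # name-only matching
        name_words = words_of(name)
        suggestions |= {r.name for r in rules if r.matches(name_words)}
    if con_match:     # matching against the document's words
        body_words = words_of(body)
        suggestions |= {r.name for r in rules if r.matches(body_words)}
    if see_also_match:
        suggestions |= set(see_also_tags)
    return suggestions

rules = [
    TagRule("Utilities", word_or=["electricity", "gas"]),
    TagRule("Tax", word_and=["income", "return"]),
]
print(suggest("Electricity bill March", "", rules, [],
              con_match=False, see_also_match=False))  # {'Utilities'}
```

Taking the union keeps each switch independent: turning a source off can only remove suggestions, never change what the other sources propose.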
The options relating to seeAlsoMatch:
Tuning the precision of tag suggestions from “see also” is more of an art than a science. These three options determine the quality of the suggested “see also” tags, and suitable values depend on the characteristics of the items in your database.
minSeeAlsoWeight: a cut-off weight for a “see also” document to be sampled. If a “see also” document’s weight is too low (it is less similar to the target document), there is no point in using its tags as a reference.
maxSeeAlsoDocs: the maximum number of documents sampled for computing the suggested “see also” tags. For example, there might be 20 “see also” documents with a weight above 0.5, but only the first 10 are sampled.
minSameTagsInSeeAlsoDocs: “2” means a tag must be used by at least two of the sampled “see also” documents to be suggested. For example, if you believe your existing documents are tagged quite appropriately, a lower number might be good enough.
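The three options above amount to a filter, a cap, and a vote. A minimal sketch in Python, for illustration only: it takes the “see also” results as hypothetical `(weight, tags)` pairs rather than calling DEVONthink, and whether the weight cut-off is inclusive is an assumption. The sample data is made up to reproduce the “N.Conceptual (4)” style of output from the demo.

```python
from collections import Counter

def see_also_tag_suggestions(see_also, min_weight=0.5, max_docs=10, min_same_tags=2):
    """Suggest tags by counting how many sampled "see also" documents carry each tag.

    see_also: list of (weight, tags) pairs for the target document.
    The parameters mirror minSeeAlsoWeight, maxSeeAlsoDocs and
    minSameTagsInSeeAlsoDocs.
    """
    # Most similar documents first.
    ranked = sorted(see_also, key=lambda pair: pair[0], reverse=True)
    # Drop weak matches (inclusive cut-off is an assumption), then cap the sample.
    sampled = [tags for weight, tags in ranked if weight >= min_weight][:max_docs]
    # Count each tag at most once per sampled document.
    counts = Counter(tag for tags in sampled for tag in set(tags))
    return [(tag, n) for tag, n in counts.most_common() if n >= min_same_tags]

# Hypothetical "see also" results for one document.
docs = [
    (0.90, ["N.Conceptual", "Banking"]),
    (0.80, ["N.Conceptual"]),
    (0.70, ["N.Conceptual", "Theory"]),
    (0.60, ["Theory"]),
    (0.55, ["N.Conceptual"]),
    (0.30, ["Banking", "Theory"]),  # below the cut-off, so ignored
]

for tag, n in see_also_tag_suggestions(docs):
    print(f"{tag} ({n})")
# N.Conceptual (4)
# Theory (2)
```

Lowering `max_docs` to 3 in this example leaves only “N.Conceptual (3)”, which shows how the cap and the vote threshold interact.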
The scripts:
The scripts are packaged in the same way as Tagger V1.1: the main script, which reminds you to update the info after a certain period; a manual update script for updating the info after you make changes; and an empty “tagger.plist” file that should be placed in the toolbar script menu directory.
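The update reminder can be pictured as a simple staleness check. A minimal sketch under stated assumptions: that hourBetweenUpdate is the reminder interval, and that the time of the last update is stored in a plist under a hypothetical key `lastUpdate`. The real contents of “tagger.plist” are not described here, and the actual scripts are AppleScript; Python's standard `plistlib` is used only to illustrate the idea.

```python
import plistlib
from datetime import datetime, timedelta

def save_state(last_update):
    """Serialise the (hypothetical) lastUpdate timestamp as XML plist bytes."""
    return plistlib.dumps({"lastUpdate": last_update})

def load_last_update(data):
    """Read lastUpdate back; an empty file or missing key yields None."""
    if not data.strip():
        return None
    return plistlib.loads(data).get("lastUpdate")

def needs_update(last_update, now, hours_between_update=4):
    """True when there is no record yet or the info is at least N hours old."""
    if last_update is None:
        return True
    return now - last_update >= timedelta(hours=hours_between_update)

# XML plists store dates to whole seconds, so avoid microseconds here.
t0 = datetime(2024, 1, 1, 8, 0, 0)
state = save_state(t0)
print(needs_update(load_last_update(state), t0 + timedelta(hours=3)))  # False
print(needs_update(load_last_update(state), t0 + timedelta(hours=5)))  # True
```

Treating an empty file as “never updated” matches the idea of shipping an empty plist: the first run would always prompt for an update.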
Archive.zip (511.5 KB)
Any feedback or suggestions on the workflow, particularly on ways to get more precise tag suggestions, are very welcome.
Cheers