Tagger V2. Two unique features, half-useful, but still interesting (I think)

ngan · September 1, 2019, 1:52pm

Auto or semi-auto tagging is tough to crack. This script is my final attempt (for now). I’d say it falls somewhere between “interesting” and “somewhat useful for some tasks”.

Most of the core elements are already discussed here: TaggerV1.1: auto and semi-auto tagging (new option for item's name-only matching)

I added two unique features and rearranged the working modes of the script.

(1) It is not uncommon to have tags of the same name in different tag groups. Most tagging scripts (including Tagger V1.1) can’t handle this situation. Tagger V2.0 will assign the right tag even when there are multiple identically named tags.
(2) I tried to utilise the “see also” function of DT. The script takes the top n documents from “see also”, finds the tags most commonly assigned to those documents, and suggests those tags. The parameters are configurable.
(3) I rearranged the workflow of this script to be consistent with the AutoGroup script. There are 4 modes:
Manual mode is just like a tag panel for tagging, plus the ability to create new tags and put them under a group.
Semi-Auto mode suggests tags by matching the criteria set up in each tag against the name or concordance of the document, and by picking the more common tags from “see also”.
Auto mode assigns tags to a selection automatically based on name and concordance matches, as described in TaggerV1.1.
Batch mode lets you select a list of items and tag them all with the same tags at once.
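For point (1), one way to tell identically named tags apart is to key them by their full location instead of by name alone. The following is a minimal sketch in Python (not the script's AppleScript). The tag paths and the helper names `index_by_name` and `resolve` are made up for illustration and are not taken from Tagger.

```python
from collections import defaultdict

# Hypothetical tag tree: each entry is the full location of one tag.
TAG_LOCATIONS = [
    "/Tags/Topic/S.Institutions",
    "/Tags/Variable/V.Institutions",
    "/Tags/Topic/Banking",
    "/Tags/Home/Banking",
]

def index_by_name(locations):
    """Map each tag name to every location that holds a tag of that name."""
    index = defaultdict(list)
    for location in locations:
        name = location.rsplit("/", 1)[-1]  # last path segment is the tag name
        index[name].append(location)
    return dict(index)

def resolve(name, group, index):
    """Return the location of `name` inside `group`, or None if it is absent."""
    prefix = group.rstrip("/") + "/"
    for location in index.get(name, []):
        if location.startswith(prefix):
            return location
    return None

index = index_by_name(TAG_LOCATIONS)
print(index["Banking"])                         # both locations of "Banking"
print(resolve("Banking", "/Tags/Home", index))  # the one under /Tags/Home
```

Showing the location next to the name in a pick list is then enough for the user to choose the intended tag.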

Quick demo
(1) Semi-Auto Mode. The list shows two types of suggestions (I am using real data). The most interesting one is “Tags (DT Top 10 See Also)”. The “(4)” next to “N.Conceptual” means that 4 out of the top 10 “see-also” documents are tagged with “N.Conceptual”. There is no suggestion from word match because I haven’t set up the matching words in my real data.

(2) Manual Mode. Similar to Constrained Tagging V2.0 (with the ability to create and gather new tags). You can select multiple documents but need to decide the tags one by one. This mode is less cluttered by suggested tags, which helps particularly when you (me) want to limit the choice of tags.

(3) I think Auto mode is more suitable for home management, such as statements or short notes.

(4) A log is generated.

(5) Issues with auto-tagging: Matching the words of a long document against the words in the TagOR and TagAND fields still creates quite a few unexpected suggestions. For example, a piece of literature is matched to “Banking Statement” (because the word “banking” appears in it) and to “ngan” (because a word in it partially includes “ngan”).
I am still trying to figure out whether more complex predicates can be embedded. But I think the task requires a good knowledge of shell scripting or regex, so this is a much later task.
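The partial-match half of that problem (a longer word that merely contains “ngan”) can be handled with a whole-word regex. The following is a minimal sketch in Python, not part of Tagger. The sample sentence and the function name `matches_whole_word` are invented for illustration.

```python
import re

def matches_whole_word(keyword, text):
    """True if `keyword` occurs in `text` as a whole word, ignoring case."""
    # Lookarounds reject hits that have a letter, digit, or underscore
    # directly before or after the keyword.
    pattern = r"(?<!\w)" + re.escape(keyword) + r"(?!\w)"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None

text = "Manganese exports and banking regulation in emerging markets."

print("ngan" in text.lower())               # plain substring test hits "Manganese"
print(matches_whole_word("ngan", text))     # whole-word test does not
print(matches_whole_word("banking", text))  # a genuine word still matches
```

This only removes partial-word hits. A paper that really contains the word “banking” would still be suggested “Banking Statement”, so that case needs a stricter condition (for example an AND word) rather than a regex.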

The options:

property topTagGpLocation : "/Tags" -- the tag group that contains all tags
property theNewTagsLocation : "/Tags/New Tags" -- place to save newly created tags

property taggingMode : "S"
property nmMatch : true
property conMatch : true
property seeAlsoMatch : true
property hourBetweenUpdate : 4

property minSeeAlsoWeight : 0.5
property maxSeeAlsoDocs : 10
property minSameTagsInSeeAlsoDocs : 2

topTagGpLocation: you can change the location to the most commonly used tag group if constrained tagging is all you need.
taggingMode: “S”/“M”/“A”/“B” for Semi-Auto/Manual/Automatic/Batch mode
nmMatch: name-only match. The item’s name is matched against the tag name, its aliases, and the words in the fields wordOr and WordAnd.
conMatch: concordance-based match.
seeAlsoMatch: include tag recommendations based on DT’s “see also”.
nmMatch, conMatch, and seeAlsoMatch can be used in any combination.
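To make the interplay of nmMatch and conMatch concrete, here is a minimal Python sketch (not the script's AppleScript). It rests on assumptions that the post does not confirm: that a tag matches when any of its name, aliases, or OR words is present and all of its AND words are present, and that the two sources are checked separately and their suggestions merged. The dictionary keys and sample data are hypothetical.

```python
import re

def words_of(text):
    """Lower-cased set of the words in a piece of text."""
    return set(re.findall(r"\w+", text.lower()))

def tag_matches(tag, words):
    """Assumed rule: any name/alias/OR word present, and every AND word present."""
    any_terms = {t.lower() for t in [tag["name"], *tag["aliases"], *tag["word_or"]]}
    all_terms = {t.lower() for t in tag["word_and"]}
    return bool(any_terms & words) and all_terms <= words

def suggest(doc, tags, nm_match=True, con_match=True):
    """Suggest tags from the item name and/or its concordance."""
    sources = []
    if nm_match:
        sources.append(words_of(doc["name"]))
    if con_match:
        sources.append({w.lower() for w in doc["concordance"]})
    return [t["name"] for t in tags if any(tag_matches(t, s) for s in sources)]

tags = [
    {"name": "HSBC", "aliases": [], "word_or": [], "word_and": []},
    {"name": "Statements", "aliases": ["statement"], "word_or": [], "word_and": []},
    {"name": "Bank", "aliases": [], "word_or": ["account"], "word_and": []},
    {"name": "Medical", "aliases": [], "word_or": ["clinic"], "word_and": ["dr"]},
]
doc = {"name": "HSBC statement ngan 2010.03",
       "concordance": ["balance", "account", "interest"]}

print(suggest(doc, tags, nm_match=True, con_match=False))
print(suggest(doc, tags, nm_match=False, con_match=True))
print(suggest(doc, tags))
```

Multi-word tag names would need phrase matching instead of the single-word sets used here.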

The options relating to seeAlsoMatch:
Getting precise tag suggestions from “see also” is an art. The three options below determine the quality of the suggested “see-also” tags. The right value for each option depends on the characteristics of the items in your database.
minSeeAlsoWeight: a cut-off weight for a “see-also” document to be sampled. If the weight of a “see-also” document is too low (it is less similar to the target document), there is no point in using its tags for reference.
maxSeeAlsoDocs: the maximum number of documents sampled for computing the suggested “see-also” tags. E.g. there might be 20 “see-also” documents with weight >0.5, but only the first 10 are sampled.
minSameTagsInSeeAlsoDocs: “2” means a tag must be used by at least two of the sampled “see-also” documents to be suggested. E.g. if you believe that your existing documents are quite appropriately tagged, a lower number might be good enough.
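The way the three options interact can be sketched as follows. This is a Python illustration of the logic as described above, not the actual AppleScript. The input format (a list of weight and tag pairs already sorted by descending weight), the inclusive cut-off, and the sample data are assumptions.

```python
from collections import Counter

def suggest_from_see_also(see_also, min_weight=0.5, max_docs=10, min_same_tags=2):
    """see_also: list of (weight, tags) pairs, sorted by descending weight."""
    # 1. Drop documents below the weight cut-off, then keep the first max_docs.
    sampled = [tags for weight, tags in see_also if weight >= min_weight][:max_docs]
    # 2. Count in how many sampled documents each tag appears.
    counts = Counter(tag for tags in sampled for tag in set(tags))
    # 3. Keep tags used by at least min_same_tags documents, most common first.
    kept = [(tag, n) for tag, n in counts.items() if n >= min_same_tags]
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))

see_also = [
    (0.92, ["N.Conceptual", "S.Institutions"]),
    (0.81, ["N.Conceptual"]),
    (0.77, ["N.Conceptual", "V.Institutions"]),
    (0.64, ["S.Institutions", "N.Conceptual"]),
    (0.41, ["V.Institutions"]),  # below the 0.5 cut-off, so never sampled
]

print(suggest_from_see_also(see_also))
print(suggest_from_see_also(see_also, max_docs=2))
```

The count attached to each tag corresponds to the number shown in brackets next to a suggestion, such as the “(4)” in the demo.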

The scripts:
The scripts are packaged in the same way as Tagger V1.1: the main script, which will remind you to update the info after a certain period; a manual update script for you to update the info after making changes; and an empty “tagger.plist” file that should be placed under the toolbar script menu directory.
Archive.zip (511.5 KB)

Any feedback or suggestions on the workflow, particularly on ways to find more precise tag suggestions, are very much welcome.

Cheers

ngan · September 3, 2019, 7:22am

A minor change in Tagger V2.1. The script will now show the location of the suggested tags for distinguishing tags of the same name.

The main script Tagger V2.1
TaggerV2.1 (Public).scpt.zip (483.6 KB)

Silverstone · November 17, 2019, 10:09pm

Hi, @ngan
For now I use tags solely as folder-tags. I saw all these scripts, including auto-tagging from concordance, and “top-5 from see-also” is an interesting idea too. Maybe I would invest some time in tagging my stuff, but I still don’t understand the real value of all this.

Whenever I used tags, I remember only the time it took me, but never the value… Maybe it would make some sense if it were fully automatic?

Anyway, it would be interesting to know how you use tags once everything is ready and tagged. Tag filter in the side pane? Looking for connected tags (clicking on them)? Other ways or tools?
I mean DT3 of course.

ngan · November 21, 2019, 12:28pm

I am still seeking the answer myself. These are my current thoughts:

The paradigm of tagging consists of five interrelated elements: (1) what is the nature of the info? (2) how is this info used? (3) what is the role of tags given (2), and how do they complement groups? (4) how to tag efficiently and effectively given (3)? (5) what additional functions are required to maximise the utility of tags?

The additional functions I built.
I wrote tagger, stack, pTag, autoGroup, and sTag, and all those scripts have quite a few options built in, because I believe that different types and purposes of info require different paradigms. Please also note that the scripts are 1st/2nd gen only, and some of them are buggy.

‘Tagger’ has 6 ways of tagging: (1) tagging within a constrained tags list; (2) tag suggestions based on matching the name of an item with tags’ names; (3) tag suggestions based on matching the concordance of an item with tags’ names; (4) tag suggestions based on matching (2) and (3) with additional keywords defined in the two common CMDs of each tag (OR/AND condition); (5) tag suggestions based on the tags of DT’s see-also items; and finally (6) options for batch-tagging, one-item-at-a-time-in-selection, and manual/auto-tagging.
‘pTag’ creates and refreshes a cross table of items based on the selected tags group or tags.
‘sTag’ quickly creates smart groups based on the selection of tags with AND/NOT condition (not posted).
‘AutoGroup’ moves/replicates items in batch mode to one/more groups based on the tags of the item.
‘Stack’ breaks an item into small bits of info, and the bits are tagged and grouped under an item-linked group.
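The AND/NOT selection that sTag turns into a smart group boils down to a set test on each item's tags. The following is a minimal Python sketch of that predicate only. It does not create anything in DT, and the item names and the function `smart_group` are hypothetical.

```python
def smart_group(items, all_of, none_of=()):
    """Names of items that carry every tag in all_of and no tag in none_of."""
    required, excluded = set(all_of), set(none_of)
    return [
        name
        for name, tags in items.items()
        if required <= set(tags) and not (excluded & set(tags))
    ]

items = {
    "HSBC statement ngan 2010.03": ["HSBC", "Bank", "Statements", "2010", "ngan"],
    "HSBC statement ngan 2011.03": ["HSBC", "Bank", "Statements", "2011", "ngan"],
    "HSBC credit card ngan 2010.05": ["HSBC", "Bank", "Bills", "2010", "ngan"],
}

print(smart_group(items, all_of=["HSBC", "Bank", "2010", "ngan"], none_of=["Bills"]))
```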

I have two types of info: home-related info and academic literature.

Home-related info. Nature of info: the info-content of each item is homogeneous and unique. Usage of info: occasional retrieval, but I know exactly what I need to find. Role of tags and groups: a way to file and retrieve misc info with minimum effort.

Before: I used to create numerous sub-groups for different statements/docs. I always needed to OCR and move items to different sub-groups.

I now experiment with using just a few broadly defined groups (tax, medical, bills, statements, etc.). I give each item a name with complete info (e.g. eyes assessment Dr.XXX ngan 2018.11). I use name-only matching, add variations of keywords under each tag, and use the auto-tagging option in tagger to tag items in batch mode. I use autoGroup to move multiple items to different groups in batch mode. I use sTag to create a smart group the first time I need to retrieve info (e.g. a smart group with tags being HSBC and Bank and Statements and 2010 and ngan). The smart groups become permanent subgroups for future usage.

Note: This is an unnecessary and over-fancy solution. If the name already contains complete info, a simple search in DT will do the job nicely.

Academic literature. Nature of info: the info-content of each item is heterogeneous and multi-aspect. Usage of info: writing. Papers are reviewed and notes are taken, with different bits of info to be used in different chapters and projects. Role of tags: a way to consolidate and retrieve my repository of knowledge for each specific category (abstract, theory, topic and subject of study, methodology, model, variable designs, etc.). Groups are more for the purpose of project management (by using replicants).

Before: I put all the literature in one group. I reviewed each piece of literature and created one main annotation for it. I tagged the literature during the review. I needed to remind myself to copy different bits from the main annotation to another set of category-specific notes to consolidate similar thoughts and knowledge. I used labels for review status and CMD for static info (year, journals, etc.).

Problems: (1) I can’t tag literature consistently. I use many variations of tags for the same category. Sometimes I should assign similar tags in different main tag categories and I miss one (e.g. the tag “S.Institutions” is in the topical tags group and “V.Institutions” is in the variable design tags group, and I only assign one of them). (2) If I forget to copy and consolidate the different bits of info from the main annotation into the category notes (which happened frequently), the ideas/knowledge are lost. I need to use tags or labels to filter all reviewed literature and find out whether I have missing bits. (3) Each piece of literature is info-rich. Even if I manage to tag diligently and correctly, when I use tags to filter the relevant papers/annotations, I still need to search for the relevant bits within the doc.

I now experiment with taking notes by bits.
(1) When I review an article, I use ‘stack’ to create and tag each index card (with quoted text) and write down my ideas, or just an index card with only notes that is linked to a certain page. Since the info-content of each index card is much more specific, I am more confident of tagging it correctly, and the constrained tags list helps me to tag consistently. The concordance-based tag suggestions, now based on a small block of text, are becoming more relevant, too.
(2) Since all bits from the same literature are under the same stack and sorted, I can choose to combine all cards, after a full review, into one document and make it the main annotation.
(3) I use sTag to create smart groups to gather category-specific cards from all stacks (i.e. documents). I no longer need to create another set of consolidated notes and I will no longer be losing my knowledge (if I can tag consistently!).
(4) I still tag the literature but in broader categories. The suggestion of tags based on “see-also” helps a little bit. I also always create the first index card to contain the abstract for all literature I index into DT (reviewed or not). With a smart group holding all abstracts, I can take a snapshot of my pool of literature and choose which one to review.
(5) I use pTag to create cross tables of tags. So it is now easier for me to know, in a snapshot, which literature is reviewed or what literature is available under different categories of tags. pTag also helps me to know the quality of my tagging.
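The cross table in (5) can be pictured as one row per item and one column per tag. Below is a minimal Python sketch of that idea only. pTag's actual output format is not described in this thread, and the paper names, tag list, and function names are made up.

```python
def cross_table(items, tags):
    """Rows are items, columns are tags; a cell is True when the item has the tag."""
    return {name: [tag in item_tags for tag in tags] for name, item_tags in items.items()}

def render(table, tags):
    """Plain-text rendering with an 'x' wherever an item carries a tag."""
    width = max(len(name) for name in table)
    lines = [" " * width + " | " + " | ".join(tags)]
    for name, row in table.items():
        cells = [("x" if hit else "").center(len(tag)) for tag, hit in zip(tags, row)]
        lines.append(name.ljust(width) + " | " + " | ".join(cells))
    return "\n".join(lines)

tags = ["N.Conceptual", "S.Institutions", "Reviewed"]
items = {
    "Paper A": ["N.Conceptual", "Reviewed"],
    "Paper B": ["S.Institutions"],
    "Paper C": [],
}

table = cross_table(items, tags)
print(render(table, tags))

# Rows without any hit point at items that still need tagging.
untagged = [name for name, row in table.items() if not any(row)]
print(untagged)
```

Empty rows and sparsely filled columns are the quick visual cue for gaps in tagging.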

Notes:
This workflow is still experimental, and I continue to make many custom changes to my scripts. Creating the right tags in my tags tree and tagging the bits correctly still depend on my quality as a researcher. But my knowledge won’t get lost now, and I can retrieve and review my ideas with much higher precision and efficiency. Obviously, I still need to open 10+ papers for ad hoc searching during writing (the first or second review is never perfect)! I also need to gradually break down all of my previous annotations into bits!

Cheers.