Proof of concept: Auto-tagging script

ngan · August 23, 2019, 10:39am

Another strange idea: auto-tagging.

Demo:
(1) I have about 50 tags in this group. I have set up two custom metadata fields, “ATagOr” and “ATagAnd”. Auto-tagging happens when the name OR the aliases OR any word in “ATagOr” is found in the document AND, if “ATagAnd” contains any words, at least one of those words is found as well. There are two tags that have this setup. All my tags are prefixed and have no aliases, so their names won’t find a match in the document. If the names or aliases of your tags are actual words, they will be used to find a match in the document.

(2) A document with no tag assigned. Click the “AutoTag POC” button.

(3) The document is tagged. Additional words or phrases can be added to the two fields of any tag at any time (separated by commas, with no space before or after each comma). So your auto-tagging grows organically, in a “human-intelligence” manner.
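
The matching rule described in the demo can be sketched outside DEVONthink. The snippet below is only an illustration in Python, not the script itself: the function name `should_tag` and the sample words are hypothetical, and it assumes that one matching AND word is enough. Note that this comparison is case-sensitive, whereas AppleScript's `contains` ignores case by default.

```python
def should_tag(doc_words, or_terms, and_terms):
    """Return True when any OR term occurs in the document and,
    if AND terms are given, at least one of them occurs as well."""
    words = set(doc_words)
    has_or = any(term in words for term in or_terms)
    if not and_terms:
        return has_or
    has_and = any(term in words for term in and_terms)
    return has_or and has_and


# Hypothetical example data, not taken from the original post.
concordance = ["invoice", "electricity", "2019", "payment"]
print(should_tag(concordance, ["invoice", "bill"], []))            # True
print(should_tag(concordance, ["invoice"], ["water"]))             # False
print(should_tag(concordance, ["invoice"], ["water", "payment"]))  # True
```

Because `any` stops at the first hit, each tag is evaluated at most once and cannot be added twice.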

Before showing the very primitive proof-of-concept script:
Disclaimer:
(1) I believe that auto-tagging is best suited to one or a few items at a time, and to items with a small amount of text (snippets of text, bills, PDFs under 20 pages, etc.), because it is very difficult to control the quality of automatic tag assignments across a large number of items or items with many words. Perhaps the best application of auto-tagging is to classify an item when it is in the inbox, or a project item, or a piece of literature that has just been downloaded. So this script is designed for one or very few items. I hope I’ll be able to do this kind of one/few-item-focused tagging; see also “Some questions on concordance and one question on custom meta data”.
(2) You can’t use this script unless (a) you have set up the two fields mentioned above (data type: multiline text) AND (b) each field contains at least a single space " " (use batch processing to do it). The reason for (b) is that a custom metadata field can’t be accessed by a script unless it has something inside (such as " "). I have not coded for any exception or error scenario in the script, and this script only handles the immediate children of a tag group.
(3) Once again, the DT3 dictionary is really comprehensive for doing almost anything. Those who know shell scripting with Unix commands will be able to develop much more powerful conditional auto-tagging under this concept.
(4) Smart rules and smart groups can achieve the same results AND can perform tag assignment with much more advanced predicates. But this approach is cleaner (no bunch of rules and groups), and may encourage more ad hoc adjustment of criteria (you will know which tags need their criteria added or changed immediately after a wrong or missed assignment, and there is no need to search for or remember which smart rule/group contains the search words), and it’s fun. EDITED: smart rules/groups are doing exactly what they are meant for: dealing with a large number of items.
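
Disclaimer (2) and the "no space around the comma" rule both come down to how the field text is split into words. As a rough sketch only (Python, with a hypothetical helper name `parse_terms`), a more forgiving parser could treat the placeholder " " as "no words" and trim stray spaces. It does not remove the need for the placeholder itself, which is a matter of reading the field from DEVONthink.

```python
def parse_terms(field_value):
    """Split a comma-separated field into clean terms.

    Missing values, blank strings, and stray spaces around commas
    yield an empty or trimmed list instead of errors or empty terms.
    """
    if not field_value:
        return []
    return [part.strip() for part in field_value.split(",") if part.strip()]


print(parse_terms(" "))                   # []
print(parse_terms("tax, invoice ,bill"))  # ['tax', 'invoice', 'bill']
print(parse_terms(None))                  # []
```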

The script:

use AppleScript version "2.4" -- Yosemite (10.10) or later
use scripting additions

-- Ngan 2019.08.23
property topTagGpLoc : "/Tags/S.Subject Test"
property mdTagOR : "mdtagor" -- data type is single line text
property mdTagAND : "mdtagand" -- data type is single line text


tell application id "DNtp"
	
	-- the record shown in the frontmost window, and its word list
	set theDocs to content record of think window 1
	set theConList to get concordance of record theDocs
	-- only the immediate children of the tag group are considered
	set cTagGp to get record at topTagGpLoc
	set theTags to (children of cTagGp whose tag type is ordinary tag)
	set autoTag to {}
	
	repeat with eachTag in theTags
		set theTagKMD to the custom meta data of eachTag
		-- OR candidates: tag name, aliases, and the comma-separated words of the OR field
		set tlOR to {name of eachTag} & {aliases of eachTag} & my strToList(mdTagOR of theTagKMD, ",")
		set tlAND to my strToList(mdTagAND of theTagKMD, ",")
		repeat with eachWordOR in tlOR
			-- a field holding only " " means there is no AND condition
			if tlAND is {" "} then
				if theConList contains eachWordOR then set end of autoTag to eachTag's name
			else
				-- the name is appended once per matching OR/AND pair, so autoTag may hold duplicates
				repeat with eachWordAND in tlAND
					if (theConList contains eachWordOR) and (theConList contains eachWordAND) then set end of autoTag to (name of eachTag)
				end repeat
			end if
		end repeat
	end repeat
	
	if autoTag is not {} then set tags of theDocs to (tags of theDocs) & autoTag
	
end tell

on strToList(thestr, d)
	local theList
	set {tid, text item delimiters} to {text item delimiters, d}
	set theList to every text item of thestr
	set text item delimiters to tid
	return theList
end strToList

korm · August 23, 2019, 11:17am

Interesting. I don’t think I would have done it this way. I think I would have had a “stop list” of words that matter, and compare that stop list to the concordance. If a word in the concordance of a document appears in the stop list, then I would add a relevant tag. Your pre-defined tag list(s) are essentially the same as what I’m calling a “stop list”. I just think it’s easier to maintain the stop list concept. Maybe not.
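
One way to read the "stop list" idea is as a single table that maps each word that matters to a tag, which is then compared with the concordance in one step. The following is a minimal sketch in Python under that assumption; the table `WORD_TO_TAG`, the tag names, and the function `tags_for` are hypothetical and not taken from anyone's actual setup.

```python
# Hypothetical central word list: each word that matters maps to a tag.
WORD_TO_TAG = {
    "invoice": "S.Finance",
    "receipt": "S.Finance",
    "warranty": "S.Household",
}


def tags_for(doc_words, word_to_tag=WORD_TO_TAG):
    """Return the sorted tags whose trigger words occur in the document."""
    hits = set(doc_words) & set(word_to_tag)
    return sorted({word_to_tag[word] for word in hits})


print(tags_for(["the", "invoice", "and", "receipt"]))  # ['S.Finance']
print(tags_for(["warranty", "invoice"]))               # ['S.Finance', 'S.Household']
print(tags_for(["nothing", "relevant"]))               # []
```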

(You might want to correct your userid in line 3.)

ngan · August 23, 2019, 11:24am

Thanks for the advice!

You are absolutely correct that there are several ways to make the script faster. Another one is that the loop should exit as soon as it finds the first match of a word in both OR and AND. And I think converting the concordance and the match list to strings and using a shell script to compare them would be more optimised. I am just beginning to figure out the flow of logic and am still learning all these techniques…

But I like the word list to be within the metadata fields of each tag for ease of user customisation: (a) the user may want to add tag criteria in place right after an activation, and (b) the user can just edit the fields of all tags in list view if a batch-mode evaluation of criteria is needed. On the other hand, if I think from an app developer’s perspective, a centralised stop list (a sheet, text file, or plist?) will be easier to maintain. Just a beginner’s thinking.

EDITED after some more thinking…

I am starting to see what you are saying (if I understand correctly). For performance, the list of tags and their OR/AND criteria should be written to a text file or plist. Then the comparison of the concordance against the list is just a matter of comparing lists, which will be a lot faster than looping through the properties of each tag at run time. Users can still change the criteria within each tag’s fields; they just need to save the changes to the file/plist as an update (and/or through an automatic process, e.g. checking the last updated time against the current time when the script is activated). If that’s the correct understanding, that’s what I’m planning to do once I get the function right anyway, and I will do it together with the other constrained-tag script (I plan to combine the two scripts). Thanks again.
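
The caching idea above might look roughly like the sketch below. It is written in Python and uses a JSON file purely for illustration (a plist would work the same way); the function names, the file layout, and the `rebuild` callback that would collect criteria from the tags are all assumptions.

```python
import json
import os
import tempfile
import time


def save_criteria(path, criteria):
    """Write {tag: {"or": [...], "and": [...]}} to a JSON cache file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(criteria, handle)


def load_criteria(path, last_edit_time, rebuild):
    """Return cached criteria, rebuilding first if the cache is stale.

    `last_edit_time` is when any tag's fields last changed (epoch seconds);
    `rebuild` is a callable that collects fresh criteria from the tags.
    """
    if not os.path.exists(path) or os.path.getmtime(path) < last_edit_time:
        save_criteria(path, rebuild())
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Hypothetical usage: the cache does not exist yet, so it is built once.
cache = os.path.join(tempfile.mkdtemp(), "tag_criteria.json")
print(load_criteria(cache, time.time(), lambda: {"S.Finance": {"or": ["invoice"], "and": []}}))
```

On later runs the file is read directly unless a tag has been edited since the cache was written.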

korm · August 23, 2019, 1:19pm

Evaluating concordance entries of less than a minimal character length is probably not productive, especially in a lengthy document where the concordance might be large. It’s not a big resource issue, but if your script were ever used to evaluate more than one document at a time, the resource load might grow noticeably if the script was spending most of its time looking for word matches that do not matter.
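
A minimal sketch of that filtering step, in Python for illustration only. The helper name `significant_words` and the default threshold of four characters are assumptions; the right cut-off would depend on the tags in use.

```python
def significant_words(doc_words, min_length=4):
    """Keep only concordance entries with at least `min_length` characters."""
    return [word for word in doc_words if len(word) >= min_length]


sample = ["a", "of", "the", "invoice", "paid", "tax"]
print(significant_words(sample))                # ['invoice', 'paid']
print(significant_words(sample, min_length=3))  # ['the', 'invoice', 'paid', 'tax']
```

Any trigger word shorter than the threshold would never match, so the cut-off should not exceed the shortest word used in the tag criteria.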

BTW, the concept behind your script might fill part of the gap noticed by those readers who miss the “auto classify” feature that was not carried over to v3 from v2. It’s a different animal, so I’m not suggesting the script is a replacement for auto classify.

ngan · August 23, 2019, 2:54pm

good tip

Hmm… I’ve never thought about it that way. I am just mimicking the retro approach of Lotus Agenda; it’s the way Agenda uses categories to build flexible views of info that fascinates me!

Thanks again.