Convert hashtags to tag option - exists, but doesn't seem to work, and isn't documented

In case this is still of interest, here is my “Smart Rule”, which I have now run successfully numerous times on 200+ Markdown files with various tags. If the folder contains many Markdown files, be patient: 200 files take perhaps 20-40 seconds for the script to update all tags from the Markdown text.

[Image: screenshot of the Smart Rule]

For info: the rule name in English is “Update tags on markdown files in FSNotes folder” (the FSNotes folder is an indexed folder).

The script to enter in “Edit script” is:

on performSmartRule(selectedItems)
	tell application id "DNtp"
		
		-- Go through each record
		repeat with thisRec in selectedItems
			
			-- Get the text of the record (assuming text/markdown file)
			set _text to plain text of thisRec
			
			-- Use shell script to extract text after character "#", ending with a space/new-line etc
			
			-- UPDATED search line v2: ignore the "#" inside web links, since we don't want URL fragments as tags.
			-- The sed step first strips the whitespace that grep matched before the "#", then removes the "#" itself.
			set grepResults to do shell script "grep -E -o " & quote & "(\\s|^)#\\S*" & quote & " <<< " & quoted form of _text & ¬
				"| sed -E " & quote & "s/^[[:space:]]+//; s/#//g" & quote
			
			
			-- More info about the cryptic line above:
			--
			-- The v1 line also picked up "#ch02lev2sec7" as a tag when the text contained a link, e.g.
			-- (https://www.oreilly.com/library/view/automotive-spice-in/9781933952291/ch02.html#ch02lev2sec7),
			-- which is not what we want, so I looked at the regex to find the problem.
			--
			-- grep -E: "-E" enables extended regular expressions, which the ( | ) alternation below relies on.
			--
			-- The regex as written in AppleScript is: "(\\s|^)#\\S*"
			-- (It can be verified at https://regexr.com/.) The actual regex is (\s|^)#\S*, but each backslash must be doubled when it is written inside an AppleScript string.
			--
			-- The regex performs the following search:
			-- (\s|^) : match a whitespace character OR the start of a line, which rules out a "#" with text directly before it
			-- #      : match the character "#"
			-- \S*    : keep matching until the next whitespace character or the end of the line (\S is the inverse of \s, which matches whitespace)
			
			
			-- OLD v1
			--		set grepResults to do shell script "grep -o " & quote & "#\\S*" & quote & " <<< " & quoted form of _text & ¬
			--			"| sed -E " & quote & "s/#//g" & quote
			
			-- transform the output to a list
			set grepResultsList to paragraphs of grepResults
			
			-- Clear the current record's tag list
			set tags of thisRec to {}
			
			-- Add found tags one by one
			repeat with foundTag in grepResultsList
				
				set foundTagAsString to foundTag as string
				
				-- If the tag is an empty string (likely a "#" with no text after it, such as a Markdown header), skip it
				if foundTagAsString is not "" then
					-- Add this tag to the record's tag list
					set tags of thisRec to tags of thisRec & foundTagAsString
				end if
				
			end repeat
			
		end repeat
		
	end tell
	
end performSmartRule
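
If you want to check what the grep/sed part does before letting it loose on your notes, the pipeline can be tried on its own in Terminal. The following is only a sketch: the function name extract_tags and the sample text are made up for illustration, and the final grep -v stands in for the empty-string check that the AppleScript does itself.

```shell
#!/bin/bash
# Standalone sketch of the hashtag extraction used in the Smart Rule.
# extract_tags is a hypothetical helper name, not part of DEVONthink.
# It reads text on stdin and prints one tag per line.
extract_tags() {
  # 1. Keep only "#word" runs that start a line or follow whitespace
  # 2. Strip the matched leading whitespace, then remove every "#"
  # 3. Drop empty results (e.g. the lone "#" of a Markdown header)
  grep -E -o '(\s|^)#\S*' \
    | sed -E 's/^[[:space:]]+//; s/#//g' \
    | grep -v '^$'
}

# Made-up sample note: a header, two inline tags, a link with a fragment, a tag at line start
sample='# Heading
Notes on #automotive and #spice
See https://example.com/ch02.html#ch02lev2sec7 for details
#todo'

printf '%s\n' "$sample" | extract_tags
```

With this sample the header and the "#ch02lev2sec7" fragment of the link should be ignored, leaving automotive, spice and todo.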

To run it, right-click the smart rule and choose “Apply Rule”, then wait for it to complete (the time depends on the number of files).

Over time you will accumulate tags that are no longer used, or create tags with similar but not identical names (because you might not remember what you wrote last time). It is therefore a good idea to revisit the DEVONthink tag section for the database every now and then to make sure you consistently use the same tag names (I, for instance, only use lower-case letters). A script could also be written to walk through the empty tags and highlight them for some kind of action.
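
As a rough illustration of the consistency check, here is a small shell sketch that spots tags differing only in upper/lower case. It assumes you have the tag names as plain text, one per line (how you get them out of DEVONthink is up to you); find_case_variants is a made-up helper name and the tag list is sample data.

```shell
#!/bin/bash
# find_case_variants: hypothetical helper, not a DEVONthink feature.
# Reads tag names (one per line) on stdin and prints, in lower case,
# every tag that exists in more than one upper/lower-case spelling.
find_case_variants() {
  # sort -u : drop exact duplicates so a tag listed twice is not reported
  # tr      : fold every spelling to lower case
  # uniq -d : print only names that now occur more than once
  sort -u | tr '[:upper:]' '[:lower:]' | sort | uniq -d
}

# Made-up tag list for illustration
printf '%s\n' todo Todo spice Automotive automotive TODO | find_case_variants
```

For the sample list this should report automotive and todo, since each occurs in several spellings, while spice is left out.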