Convert hashtags to tag option - exists, but doesn't seem to work, and isn't documented

It’s the standard function run off the Data > Tags menu, the one provided with the basic installation.

This same Data > Tags process is also hit and miss for me. Sometimes I find that if I close DT and re-open it, the conversion will then work. It also seems to get caught up on text files NOT originally created in DT. (In other words, I have fewer issues when I quickly generate a text file in DT and then convert it. It is text files imported from elsewhere, then converted, that seem to snag.)

If of interest, I found an AppleScript for another program (https://c-command.com/forums/showthread.php/5539-Script-to-convert-hashtags-to-EagleFiler-tags) and used it to create the AppleScript below.
If you select a Markdown/text file in DEVONthink, it will replace all of the record's tags with whatever hashtags it finds in the file. I’m no AppleScript expert, but it works as far as I have tested.

tell application id "DNtp"
	
	-- Get selected record(s) in Devonthink
	set selectedItems to selection
	
	-- Go through each record
	repeat with thisRec in selectedItems
		
		-- Get the text of the record (assuming text/markdown file)
		set _text to plain text of thisRec
		
		-- Use shell script to extract text after character "#", ending with a space/new-line etc
		
		-- UPDATED search line in v2: ignore hashes inside web links, we don't want the # of a link to become a tag
		-- (grep -o also returns the whitespace matched in front of the "#", so sed strips that together with the "#")
		set grepResults to do shell script "grep -E -o " & quote & "(\\s|^)#\\S*" & quote & " <<< " & quoted form of _text & ¬
			"| sed -E " & quote & "s/[[:space:]]*#//g" & quote
		
		
		-- More info about the cryptic line above...
		--
		-- The line from v1 also picked up "#ch02lev2sec7" as a tag from a text containing a link, e.g. (https://www.oreilly.com/library/view/automotive-spice-in/9781933952291/ch02.html#ch02lev2sec7),
		--
		-- which is not what we want, so I looked at the regex to see what the problem was and found it.
		--
		-- grep -E → -E enables extended regular expressions (needed on the Mac, I think I read)
		--
		-- The regex as written in AppleScript is: "(\\s|^)#\\S*"
		-- (It can be verified on this page: https://regexr.com/.) Note that the actual regex is (\s|^)#\S*, but in AppleScript you need to add an extra backslash per backslash.
		--
		-- So the regex performs the following search:
		-- (\s|^) → (whitespace such as a space or tab) OR (start of line), which rules out a "#" that has text directly before it
		-- and
		-- #   → find the character "#"
		-- \S* → then keep matching characters until the next space/line break (capital \S is the inverse of \s, which matches whitespace)
		
		
		-- OLD v1
		--		set grepResults to do shell script "grep -o " & quote & "#\\S*" & quote & " <<< " & quoted form of _text & ¬
		--			"| sed -E " & quote & "s/#//g" & quote
		
		-- transform the output to a list
		set grepResultsList to paragraphs of grepResults
		
		-- Clear the current records tag list
		set tags of thisRec to {}
		
		-- Add found tags one by one
		repeat with foundTag in grepResultsList
			
			set foundTagAsString to foundTag as string
			
			-- If the tag is an empty string (likely a # without text afterwards, like markdown header, skip it)
			if foundTagAsString is not "" then
				-- Add this tag to the records tag list
				set tags of thisRec to tags of thisRec & foundTagAsString
			end if
			
		end repeat
		
	end repeat
	
end tell
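
For anyone who wants to check what the search pattern picks up without touching a database, here is a minimal Python sketch of the same idea. It is an assumption-laden stand-in for the grep/sed pipeline, not part of the script: the function name `extract_tags` and the sample text are made up for illustration, and Python's `re` module is not identical to grep in every corner case.

```python
import re

# Same idea as the grep pattern: whitespace or start of line, then "#",
# then a run of non-space characters (captured without the "#").
HASHTAG = re.compile(r"(?:\s|^)#(\S*)", re.MULTILINE)


def extract_tags(text):
    """Return hashtag names found in text, without '#', skipping empty ones."""
    tags = []
    for match in HASHTAG.finditer(text):
        # Mirror the sed step: drop any remaining "#" (e.g. "##" headers).
        name = match.group(1).replace("#", "")
        if name:
            tags.append(name)
    return tags


sample = (
    "# Heading\n"
    "Notes on #project and #Teaching_ideas\n"
    "See https://example.com/page.html#section2\n"
)
print(extract_tags(sample))  # ['project', 'Teaching_ideas']
```

The Markdown header and the `#section2` fragment inside the link are both skipped, which is the behaviour the v2 pattern is after.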

  • Update after the first post: I found a problem in the regex string; it should be fixed now.

(Please also note: if you include it in a smart rule, make sure to set “markdown” as the file kind in the filter. Also note that it clears all existing tags on the file and replaces them with whatever hashtags the file now contains.)


Are you intending to clear the record’s Tags first?

Hi, yes, that is intended for my specific use case.

Because I like my Markdown files to be the “master of tags”. Using the above code (as a starting point/example to build other AppleScripts on), I can use, for example, a smart rule to periodically scan a folder or files and update the tags based on the current text content. I mostly edit Markdown files in other programs (like FSNotes).
(It happens that I remove text and a tag from a text file, and then I don’t want the file/record to keep old tags hanging on in DEVONthink.)

…if of interest again… here is my “Smart Rule”, which I have now tried successfully numerous times on 200+ Markdown files with various tags. If you have many Markdown files in the folder, be patient (200 files take perhaps 20-40 seconds to update all tags from the Markdown text using the script).

[image: screenshot of the smart rule]

For info: the rule name in English is “Update tags on markdown files in FSNotes folder” (which is an indexed folder).

The script to enter in “Edit script” is:

on performSmartRule(selectedItems)
	tell application id "DNtp"
		
		-- Go through each record
		repeat with thisRec in selectedItems
			
			-- Get the text of the record (assuming text/markdown file)
			set _text to plain text of thisRec
			
			-- Use shell script to extract text after character "#", ending with a space/new-line etc
			
			-- UPDATED search line v2: ignore hashes inside web links, we don't want the # of a link to become a tag
			-- (grep -o also returns the whitespace matched in front of the "#", so sed strips that together with the "#")
			set grepResults to do shell script "grep -E -o " & quote & "(\\s|^)#\\S*" & quote & " <<< " & quoted form of _text & ¬
				"| sed -E " & quote & "s/[[:space:]]*#//g" & quote
			
			
			-- More info about the cryptic line above...
			--
			-- The line from v1 also picked up "#ch02lev2sec7" as a tag from a text containing a link, e.g. (https://www.oreilly.com/library/view/automotive-spice-in/9781933952291/ch02.html#ch02lev2sec7),
			--
			-- which is not what we want, so I looked at the regex to see what the problem was and found it.
			--
			-- grep -E → -E enables extended regular expressions (needed on the Mac, I think I read)
			--
			-- The regex as written in AppleScript is: "(\\s|^)#\\S*"
			-- (It can be verified on this page: https://regexr.com/.) Note that the actual regex is (\s|^)#\S*, but in AppleScript you need to add an extra backslash per backslash.
			--
			-- So the regex performs the following search:
			-- (\s|^) → (whitespace such as a space or tab) OR (start of line), which rules out a "#" that has text directly before it
			-- and
			-- #   → find the character "#"
			-- \S* → then keep matching characters until the next space/line break (capital \S is the inverse of \s, which matches whitespace)
			
			
			-- OLD v1
			--		set grepResults to do shell script "grep -o " & quote & "#\\S*" & quote & " <<< " & quoted form of _text & ¬
			--			"| sed -E " & quote & "s/#//g" & quote
			
			-- transform the output to a list
			set grepResultsList to paragraphs of grepResults
			
			-- Clear the current records tag list
			set tags of thisRec to {}
			
			-- Add found tags one by one
			repeat with foundTag in grepResultsList
				
				set foundTagAsString to foundTag as string
				
				-- If the tag is an empty string (likely a # without text afterwards, like markdown header, skip it)
				if foundTagAsString is not "" then
					-- Add this tag to the records tag list
					set tags of thisRec to tags of thisRec & foundTagAsString
				end if
				
			end repeat
			
		end repeat
		
	end tell
	
end performSmartRule

To run it, right-click on the smart rule and press “Apply Rule”, then wait for it to complete (how long depends on the number of files).

Over time you will accumulate tags that are no longer used, or create tags with similar but not identical names (because you might not remember what you wrote last time), so it can be a good idea to revisit the DEVONthink tag section for the database every now and then to make sure you consistently enter the same tag names (I, for instance, only use lower-case letters). A script could also be made to walk through empty tags and highlight them for some kind of action.
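
As a rough illustration of that housekeeping idea, the Python sketch below groups tag names that differ only in letter case. It assumes you already have the tag names as a plain list (how you export them from the database is left open), and `find_case_variants` is a made-up helper name, not anything DEVONthink provides.

```python
from collections import defaultdict


def find_case_variants(tag_names):
    """Group tag names that differ only by letter case."""
    groups = defaultdict(list)
    for name in tag_names:
        groups[name.casefold()].append(name)
    # Keep only the groups that contain more than one distinct spelling.
    return {
        key: sorted(set(names))
        for key, names in groups.items()
        if len(set(names)) > 1
    }


print(find_case_variants(["outlook", "Outlook", "teaching_ideas", "todo"]))
# {'outlook': ['Outlook', 'outlook']}
```

Each group it reports is a candidate for merging into one consistent spelling.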

This hashtag conversion doesn’t seem to work on tags that have an underscore in them, like #Teaching_ideas.

Running beta 5

This is confirmed, but hashtags do not contain punctuation. Obviously this could be extended for use in DEVONthink 3, but this would be non-standard and I wouldn’t suggest it personally.

Development would have to assess this.

Do any popular apps or online services support this?

“Convert hashtags to text” still does not work, AFAICT.

Yes, for one, Agenda supports use of hashtags internally in notes. When notes are exported from Agenda to DEVONthink (via Share extensions) as markdown, the hashtags are not converted to tags.

Bear does this wonderfully. Best I have seen.

Bear does… what?

I’m not seeing an issue here with importing a file or even adding hashtags to a Markdown document and having them detected automatically.

However, @cgrunenberg would have to comment on…

  1. I don’t believe the Tags should be detected in MultiMarkdown metadata, e.g., the second line…
  2. The detected tags are now preserving the # and previously didn’t. I don’t think they should.

I see what’s happening now

  1. Tags with special characters in them (#make_doc) are not recognized. E.g., #makedoc is recognized. I don’t think the parser should care as long as there are no spaces in the tag.
  2. Tags with caps in them (#Outlook) are created as all lowercase (i.e., #outlook). The parser should not change case. E.g., #Jim_Neumann should not become #jim_neumann.
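
To make the two points above concrete, here is a small Python sketch of a parser that follows the rule "anything without spaces counts" and leaves the case alone. It is only an illustration of the requested behaviour, not DEVONthink's parser; the name `parse_hashtags` and the sample sentence are invented for the example.

```python
import re

# "#" at the start of the text or right after whitespace, followed by a run
# of characters that are neither whitespace nor another "#".
TAG = re.compile(r"(?<!\S)#([^\s#]+)")


def parse_hashtags(text):
    """Return unique hashtag names in order of appearance, case preserved."""
    seen = []
    for name in TAG.findall(text):
        if name not in seen:
            seen.append(name)
    return seen


print(parse_hashtags(
    "Call #Jim_Neumann about #make_doc and file under #Outlook #Outlook"
))
# ['Jim_Neumann', 'make_doc', 'Outlook']
```

One consequence of the "no spaces" rule is that trailing punctuation stays part of the tag (`#make_doc,` would yield `make_doc,`), so a real parser would probably need to trim that.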

This too.

It detects the tags and updates the menu (very quickly). All tags are links to view all files under it.

To create nested tags, you just type the tag name / tag name.

[EDITED: deleted last image]

The next release will fix this, support underscores and won’t change the case (only if tagging is not case insensitive of course, see File > Database Properties)

@cgrunenberg is there any reason for convert hashtags to tags not showing up as an option in smart rule creation nor being available as a script in the Devonthink script folder? AFAIK the only way to activate it is importing a file to Devonthink or clicking in a specific menu option.

Also, would it be possible for it to interpret / in order to create nested tags? E.g. Parent Tag/Child Tag.
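
A minimal sketch of what interpreting "/" could look like, written in Python purely for illustration: it splits a tag into its levels and lists each ancestor path, so a parent could be created before its child. The function names are hypothetical and nothing here reflects how DEVONthink handles nested tags internally.

```python
def split_nested_tag(tag):
    """Split 'Parent/Child' into its levels, ignoring empty segments."""
    return [part.strip() for part in tag.split("/") if part.strip()]


def tag_paths(tag):
    """Return every ancestor path, parents first."""
    parts = split_nested_tag(tag)
    return ["/".join(parts[: i + 1]) for i in range(len(parts))]


print(tag_paths("Parent Tag/Child Tag"))
# ['Parent Tag', 'Parent Tag/Child Tag']
```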

The usual reason - too many requests, not enough time :slight_smile: It’s at least planned for future releases.

BTW: There’s a smart rule script “Assign Tags” which shows how to use the "extract keywords from … " AppleScript command


I totally agree we need this option.
Because DT doesn’t want to implement a real-time Markdown editor, I bought a third-party tool, but for my use case it is almost useless when my tags are not getting updated. So please either implement a real-time editor or support features for using a third-party app. :pray:
Can I wait for this or do I need to find another solution?

Check this box and the tags will get updated.