Problem: Setting smart rule for auto tagging indexed files based on text content

ymwu · July 25, 2020, 7:56am

Hi,

Background: I’m using Obsidian as my Markdown editor and primary note-taking app, while using DT as my library for all the clipped articles, PDFs, etc.

To better integrate my notes in Obsidian and DT, I’m indexing the “vault” folder (à la Obsidian; this could be any folder in Finder), where all my .md notes are stored, in DT, so that I can preserve all the backlinks/wikilinks, etc.

The problem: Tags
In all my .md notes, the #tags I type in (which Obsidian detects as tags) seem to be just plain text to DT.
I’d like to set up a smart rule that scans the text, finds any string starting with #, e.g. #TagName1, and adds a tag #TagName1 in DT.

Since I have my own tagging system across all my apps, doing this should allow me to integrate my .md notes into the hierarchical tag structure in my DT.

Maybe I need a script to do this, but I’m no coder and haven’t found threads like this. (Or maybe I’ve missed them? If so, please kindly point me there. Thanks!)

Cheers!

Blanc · July 25, 2020, 8:25am

Do you have a list of tags, or are the possibilities endless? Is there a delimiter - i.e. in your text, is the #tag always followed by a space or other character? Are your tags the only occurrences of # in your text?

Depending on your answers, this should be easily coded (I’m sitting in front of the first steps already)

PS presumably the preference “convert hashtags to tags” doesn’t work for you? As far as I can tell, that will do almost exactly what you want, adding tags which are marked with #, although it will name the tag TagName1 rather than #TagName1

And just for the sake of it, because coding is fun, this is the script which would do the same, assuming the tags are all delimited by a space, a comma or full stop:

on performSmartRule(theRecords)
	tell application id "DNtp"
		repeat with theRecord in theRecords
			set documentText to plain text of theRecord
			set theTags to tags of theRecord
			repeat
				set theNewTag to ""
				-- find the next #, end if there isn't another one
				set thePosition to offset of "#" in documentText
				if thePosition is equal to 0 then
					exit repeat
				end if
				set theSearch to text thePosition thru -1 of documentText
				-- find the delimiter space, comma or full stop
				set spaceEnd to offset of " " in theSearch
				set commaEnd to offset of "," in theSearch
				set dotEnd to offset of "." in theSearch
				set theList to {spaceEnd, commaEnd, dotEnd}
				set theEnd to 100
				-- find which delimiter comes first and use it
				repeat with a from 1 to count of theList
					if item a of theList is less than theEnd then
						if item a of theList is not equal to 0 then
							set theEnd to item a of theList
						end if
					end if
				end repeat
				set theNewTag to text thePosition thru (theEnd + thePosition - 2) of documentText
				set documentText to text (theEnd + thePosition) thru -1 of documentText
				set theTags to (theTags & theNewTag)
			end repeat
			set tags of theRecord to theTags
		end repeat
	end tell
end performSmartRule

“I don’t code” is perfectly reasonable, but in case you thought “perhaps I should”, this is a simple script to play with. I stopped refining it when I realised the built-in function in DT should do the trick for you; if it doesn’t, the script should be refined to find a tag at the very end of the document, where nothing but the end of the document delimits it.
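
One way the end-of-document case could be handled is sketched below in JavaScript, which can be tried outside DEVONthink. The function name extractHashtags is hypothetical and not part of the script above. It mirrors the same scan (find a #, cut at the first space, comma or full stop) but starts from the end of the text as the default cut point, so a trailing tag is still picked up. Like the AppleScript, it does not treat a line break as a delimiter.

```javascript
// Hypothetical helper (not part of the script above): scan for "#" and
// cut each tag at the first delimiter, or at the end of the text.
function extractHashtags(text) {
  const delimiters = [" ", ",", "."]; // same three delimiters as the AppleScript
  const tags = [];
  let start = text.indexOf("#");
  while (start !== -1) {
    // Default to the end of the text, so a tag in last position is still found.
    let end = text.length;
    for (const d of delimiters) {
      const pos = text.indexOf(d, start);
      if (pos !== -1 && pos < end) end = pos;
    }
    // Skip a lone "#" that is immediately followed by a delimiter.
    if (end > start + 1) tags.push(text.slice(start, end));
    // Continue with the next "#" after the tag just handled.
    start = text.indexOf("#", end);
  }
  return tags;
}
```

Starting from the text length instead of a fixed number such as 100 also means a long stretch without any delimiter cannot push the cut point past the end of the text.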

ymwu · July 25, 2020, 10:06am

Thank you @Blanc!

“convert hashtags to tags” seems to work only when importing, not when indexing files in DT? It was the first thing I tried, and it failed.

I will fiddle with the script above and see what I can do. And yes coding is actually fun (worked on Excel VBA when studying; discontinued since), just that I couldn’t find a starting point earlier. Thanks again!

Blanc · July 25, 2020, 10:23am

That’s possible - I don’t know.

The script should work “as is” - I tried it on some texts and it did what it should. I just haven’t optimised it to pick up a tag if it is the last word in the text and not followed by a delimiter. Btw if you have other delimiters in your text (i.e. other than space, comma, full stop) it’s simple to add more - I’d point you in the right direction if you need help. As far as the last word is concerned, I’ll play with the script later on to add that. It would also be easy to make it leave out the “#” when actually tagging the document if you wanted (a “+1” added in the right place should do that).

Further delimiters:

Add a line set variable to offset of "delimiter" in theSearch for each one, choosing a unique variable name (e.g. colonEnd or semicolonEnd, as in the lines before) and replacing the word delimiter with whatever the delimiter is, e.g. : or ;. Then add the variable to the list a little further down, e.g. {spaceEnd, commaEnd, dotEnd, colonEnd, semicolonEnd}.
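
For comparison, here is a rough JavaScript sketch of the same idea, where the delimiters are kept in one list instead of one variable per delimiter. The names makeTagPattern and findTags and the keepHash parameter are made up for this illustration. The sketch assumes whitespace always ends a tag, and it can optionally leave out the “#”.

```javascript
// Hypothetical helper: build the tag pattern from a list of delimiter characters.
function makeTagPattern(delimiters) {
  // Escape characters that are special inside a regex character class.
  const escaped = delimiters.map(d => d.replace(/[\\\]\[^-]/g, "\\$&")).join("");
  // "#" followed by one or more characters that are not "#", whitespace, or a delimiter.
  return new RegExp("#[^#\\s" + escaped + "]+", "g");
}

// Hypothetical helper: return all tags in the text, with or without the leading "#".
function findTags(text, delimiters, keepHash = true) {
  // match() returns null when there is no match, so fall back to an empty list.
  const matches = text.match(makeTagPattern(delimiters)) || [];
  return keepHash ? matches : matches.map(m => m.slice(1));
}
```

Adding a further delimiter then only means adding one more character to the list, for example findTags(someText, [",", ".", ":", ";"]).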

chrillek · July 25, 2020, 10:32am

This is one of the cases where I find a JavaScript solution more appropriate, because it is less complicated than the AppleScript version.
Also, I’m not sure about this line:

set theTags to (theTags & theNewTag)

According to the documentation “tags of record” is a list, and “&” is a string concatenation operator. But my AppleScript is not good enough to know if this construct appends the last found tag to the list here or if it indeed does string concatenation only (and would thus probably raise an error).

A JavaScript implementation might look like this (untested):

function performSmartRule(records) {
  let app = Application("DEVONthink 3");
  /* match every #tag: "#" followed by characters other than "#", whitespace, dot, or comma;
     a regex literal is used because "\s" inside a string literal loses its backslash */
  let regex = /#[^#\s.,]+/g;
  records.forEach(rec => {
    /* match() returns null when nothing is found, so fall back to an empty list */
    let found = rec.plainText().match(regex) || [];
    if (found.length > 0) {
      /* rec.tags() returns a copy, so assign the extended list back to the record;
         each tag keeps its leading "#" */
      rec.tags = rec.tags().concat(found);
    }
  }) /* records.forEach */
} /* performSmartRule */
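
A smart rule may run more than once on the same note, and both scripts hand DEVONthink the existing tags plus every tag found in the text. Whether DEVONthink drops repeated tags by itself is not established in this thread, so the sketch below removes them before assigning. The name mergeTags is hypothetical.

```javascript
// Hypothetical helper: combine existing and newly found tags, dropping
// duplicates while keeping the order of first appearance.
function mergeTags(existing, found) {
  return [...new Set([...existing, ...found])];
}

// Possible use inside a smart rule (an assumption, untested in DEVONthink):
//   rec.tags = mergeTags(rec.tags(), found);
```

A Set keeps insertion order, so tags already on the record stay in front and new ones are appended once.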

Blanc · July 25, 2020, 10:34am

it works

Yours is certainly less complicated; I’m not sure if that’s because it’s JavaScript or because you, as opposed to me, know how to code. I just know how to get where I’m going, which probably leaves a lot to be desired in my code quality. I could certainly have used regex, which would have been fewer lines and presumably more efficient, too (although my results with regex have been a little unpredictable, again probably because I haven’t properly understood it yet). Anyhow, thanks for your input. I like it, because it gives me new ideas in return for what I’ve provided, and that’s good.

chrillek · July 25, 2020, 10:44am

Yes. JS has a lot more built-in string capabilities than AppleScript (as seen here). Whereas AppleScript has to go through a lot of loops and things to get to all the tags, JS needs just one line. AppleScript is a bit like Basic, in this respect.

Thanks for clearing up the question about the & operator. Another weird decision by Apple, probably with the intention of making coding easier (fewer operators to remember …).

Blanc · July 25, 2020, 10:46am

That may be why I feel more at home with it - I started coding on a C64, then an Amiga 500 - and then found I couldn’t be friends with C or Pascal when I switched to a 486 running OS/2 and gave up

(edited, because further experimentation showed that what I had written was not true: the & operator simply adds to the list)

cgrunenberg · July 27, 2020, 8:05am

The option to convert hashtags to tags is broken for Markdown documents in the current release; the next release will fix this.