Any way to expose metadata in Markdown documents to DT?

MultiMarkdown files can have metadata. MultiMarkdown supports a subset of YAML metadata, and does not seem to complain about other YAML. YAML metadata in Markdown documents is used by prominent Markdown-using applications like the Pandoc converter and static site generators like Hugo and Jekyll. GitHub Pages files use it. In other words, there are lots of reasons one might write Markdown with metadata.

For example, one might use the following sort of metadata block (with YAML’s “---” delimiters being optional for MultiMarkdown):

---
title: "Document 2"
date: 2019-11-18T09:58:00-05:00
draft: false
tags: research, notebook
---

# Main Section

body material

Happily, DEVONthink’s MultiMarkdown processing respects such metadata by hiding it in previews, so it doesn’t clutter up the screen. My question is whether there is a way to expose some or all of these metadata fields to DEVONthink, or for DEVONthink to read them.

An application of this metadata visibility would be the following sort of automation:

  • A Markdown file is imported to DEVONthink.
  • Instead of processing the first line as the title (which gives it the title “---”), DEVONthink recognizes that the title value is “Document 2” and assigns that as the title.
  • Perhaps other fields are processed, or visible for scripting, like date or tags?

I imagine it might be possible to script this by calling a separate YAML processor like yq from within a shell script in a DT smart rule. I’m wondering whether there might be any more direct way to do it within DEVONthink.
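
To make the idea concrete, here is a minimal sketch in Python of the kind of parsing such a script would need to do. It is not a YAML parser and not anything DEVONthink provides: the function name read_front_matter is made up for illustration, and it assumes a flat block of "key: value" lines between "---" delimiters, as in the example above.

```python
def read_front_matter(text):
    """Return (metadata dict, body) for a note with an optional '---' block.

    Hypothetical helper: handles flat "key: value" lines only, not real YAML.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    meta = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Closing delimiter found: everything after it is the body.
            return meta, "\n".join(lines[i + 1:])
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    # No closing delimiter: treat the whole text as body.
    return {}, text


note = '''---
title: "Document 2"
date: 2019-11-18T09:58:00-05:00
draft: false
tags: research, notebook
---

# Main Section

body material
'''

meta, body = read_front_matter(note)
print(meta["title"])
print(meta["tags"])
```

Note that the title value keeps its surrounding quotation marks here; a real YAML processor would remove them.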

No, there is no support for parsing YAML data, though, as noted, you could parse the plain text via scripting.

FYI, if you don’t use the --- blocks, DT still hides metadata. Also, it does recognize some MMD metadata, such as css: some-css-file.

That said, I, too, would love to be able to use MMD metadata better within DT. I have fiddled with scripting it, and it is fragile: it depends on frequently re-reading and re-parsing the file, even if the headers haven’t changed. I guess that’s really the only way to access this data, but I wonder if it could be better if DT were more aware of it somehow.

I’m sure @cgrunenberg could figure it out, but it’s not likely to happen soon.

This should work in a smart rule too.

-- Use MultiMarkdown metadata for record properties

property theKeys : {"title", "tags"}

tell application id "DNtp"
	try
		set windowClass to class of window 1
		if {viewer window, search window} contains windowClass then
			set currentRecord_s to selection of window 1
		else if windowClass = document window then
			set currentRecord_s to content record of window 1 as list
		end if
		
		repeat with thisRecord in currentRecord_s
			set theText to plain text of thisRecord
			set theValues to {}
			
			repeat with thisKey in theKeys
				set lineStart to thisKey & ": " as string
				set foundValue to false
				repeat with thisLine in paragraphs of theText
					if thisLine starts with lineStart then
						set end of theValues to my replaceString(thisLine, lineStart, "")
						set foundValue to true
						exit repeat
					end if
				end repeat
				if foundValue = false then set end of theValues to ""
			end repeat
			
			set theName to item 1 of theValues
			if theName ≠ "" then set name of thisRecord to theName
			
			set theTags to item 2 of theValues
			if theTags ≠ "" then
				if theTags contains "," then
					set theTagList to my createList(theTags, ",")
				else if theTags contains ";" then
					set theTagList to my createList(theTags, ";")
				else
					set theTagList to {theTags}
				end if
				set tags of thisRecord to (tags of thisRecord) & theTagList
			end if
		end repeat
		
		
	on error error_message number error_number
		if the error_number is not -128 then display alert "DEVONthink" message error_message as warning
		return
	end try
end tell


on replaceString(theText, oldString, newString)
	local ASTID, theText, oldString, newString, lst
	set ASTID to AppleScript's text item delimiters
	try
		considering case
			set AppleScript's text item delimiters to oldString
			set lst to every text item of theText
			set AppleScript's text item delimiters to newString
			set theText to lst as string
		end considering
		set AppleScript's text item delimiters to ASTID
		return theText
	on error eMsg number eNum
		set AppleScript's text item delimiters to ASTID
		error "Can't replaceString: " & eMsg number eNum
	end try
end replaceString

on createList(theText, theDelimiter)
	set d to AppleScript's text item delimiters
	set AppleScript's text item delimiters to theDelimiter
	set TextItems to text items of theText
	set AppleScript's text item delimiters to d
	return TextItems
end createList

@pete31 Incredible work! Thanks very much indeed, as this works beautifully from my toolbar.

I am now working on a short script that removes surrounding quotation marks (in various formats), should they be present in the YAML. YAML values are often wrapped in quotes, because unquoted colons break them, and authors love colons in titles.

UPDATE: I made a Smart Rule to do this, below.

Thanks. Did you succeed with removing the quotation marks?

This helps a lot. How would you modify this script if the list of tags in the Markdown header itself contains colons? This is due to the legacy of an old data management structure that, unfortunately, I cannot change. Is there a straightforward fix?

EDIT: Sorry, my problem was actually a different one (some of my exported metadata used tabs and some used spaces after the colon so I had to account for that). Now all works well.
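
For anyone hitting the same thing, here is a rough Python sketch of a tags line parser that tolerates tabs or spaces after the colon and leaves colons inside the tag values alone. The name parse_tags is hypothetical. It mirrors the comma-or-semicolon logic of the AppleScript above, and additionally trims whitespace around each tag.

```python
import re

# "tags:" followed by any mix of spaces and tabs, then the value.
TAGS_LINE = re.compile(r"^tags:[ \t]*(.*)$")


def parse_tags(line):
    """Return the list of tags on a 'tags:' line, or [] for any other line."""
    match = TAGS_LINE.match(line)
    if match is None:
        return []
    value = match.group(1)
    # Prefer commas; fall back to semicolons, as the AppleScript does.
    delimiter = "," if "," in value else ";"
    return [tag.strip() for tag in value.split(delimiter) if tag.strip()]


print(parse_tags("tags:\tproject:alpha, area:home"))
print(parse_tags("tags: research; notebook"))
```

Only the first colon (the one right after the key) is treated as a separator, so tags such as project:alpha survive intact.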

Removing quotation marks from titles (where they wrap titles in YAML, but aren’t part of the names) just became a lot easier (for me) in version 3.5. It can now be done in a Smart Rule using regular expressions.

I’ve scanned the name for the regular expression ^[\"\'](.+)[\"\']$ which means: “Look for a name with single or double quotes at the beginning and end, and if you find that, capture the text between them.” Then I replace the name with the captured text, which is just \1, or in other words, the first capture-group.
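
For illustration only, the same pattern and replacement can be tried outside DEVONthink. This is a small Python sketch, with strip_wrapping_quotes as a made-up name, and Python's re module standing in for the Smart Rule's regular expression engine, which may differ in details.

```python
import re

# Same idea as the Smart Rule pattern: a quote at each end, capture the middle.
QUOTED = re.compile(r"""^["'](.+)["']$""")


def strip_wrapping_quotes(name):
    """Replace a quote-wrapped name with the first capture group (\\1)."""
    return QUOTED.sub(r"\1", name)


print(strip_wrapping_quotes('"Document 2"'))
print(strip_wrapping_quotes("'Notes: a title with a colon'"))
print(strip_wrapping_quotes('He said "hi"'))
```

Two caveats: the pattern does not require the opening and closing quotes to be the same kind, and it only covers straight quotes, not typographic ones.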

So, this looks like:

[Screenshot: Smart Rule configuration]

One quirk is that renaming an item with @pete31’s AppleScript above doesn’t trip the “On Renaming” event in Smart Rules, though I think that would be the most natural trigger. So I’ve used the “On Moving” event to trigger it.

Scripts don’t trigger events (at least as long as they don’t use the extended perform smart rule AppleScript command of version 3.5).

In case you’re trying to run the script I posted in this thread you’ll find that it doesn’t work in DEVONthink 3.6.

That’s due to DEVONthink’s new handling of “invalid arguments”.

After the release of DEVONthink 3 I decided to continue to use “search window” in scripts so that DEVONthink 2 users could use them in, well, search windows. With version 3.6 that’s not possible anymore.

If you want to use the script you’ll have to replace this voluminous block …

set windowClass to class of window 1
if {viewer window, search window} contains windowClass then
	set currentRecord_s to selection of window 1
else if windowClass = document window then
	set currentRecord_s to content record of window 1 as list
end if

… with this neat line …

set currentRecord_s to selected records

… which does what the six lines have done. Wow, that’s great! :smiley:

Is there a way to search for documents based on their Markdown front-matter? I have custom front-matter attributes (category and notetype) that I use to keep various “kinds” of notes separated and would (very much) like to be able to search for specific types of notes.

The source of such a document would be useful. Maybe it’s sufficient to enable the hidden preference IndexRawMarkdownSource.

Hi Christian,

I enabled IndexRawMarkdownSource (using defaults write com.devon-technologies.think3 IndexRawMarkdownSource -bool TRUE) and got almost where I wanted to be. That is, about ⅓ of notes matching a given search showed up. After editing and saving one of the notes that didn’t show up in the search results, it appeared, so I thought that perhaps the index wasn’t up to date. After rebuilding the database and repeating the search, all notes matching the search now show up.

This is exactly what I was looking for. Thank you (!). I’ve been using DT for years now, and still have many (almost daily) moments where I think to myself, “How did I ever manage without DEVONthink?”. DT (v3, especially) is a superb product, and I, for one, am very grateful for its existence (and for the developers behind it).

Thank you for the nice feedback, definitely appreciated!

What kind of queries are you doing here, by the way? I’m curious!

Hi Ryan,

I have many (~12K) “Zettelkasten-style” notes that use various custom Markdown front-matter attributes such as ‘category’ and ‘notetype’. For example, I might have a note with the following front-matter:

---
category: howto
notetype: macos
---
<Note body>

After enabling IndexRawMarkdownSource (as per previous post), I can now search for notes using queries such as category:howto, notetype:macos, or category:howto notetype:macos.

Note that you may have to rebuild your database/s to get the Markdown front-matter to be fully indexed.
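
If you want to check independently which notes should match, for example to see whether the search results are complete, something like the following Python sketch would do. It is entirely separate from DEVONthink's index: the helper names and the sample file names are made up, and it assumes flat "key: value" front-matter between "---" lines, as above.

```python
def front_matter_attributes(text):
    """Return the flat 'key: value' pairs of a leading '---' block, if any."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    attributes = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return attributes
        key, sep, value = line.partition(":")
        if sep:
            attributes[key.strip()] = value.strip()
    return {}  # no closing delimiter, so not a front-matter block


def matches(text, **wanted):
    """True if every wanted attribute has exactly the given value."""
    attributes = front_matter_attributes(text)
    return all(attributes.get(key) == value for key, value in wanted.items())


# Hypothetical sample notes, keyed by file name.
notes = {
    "howto-macos.md": "---\ncategory: howto\nnotetype: macos\n---\n<Note body>\n",
    "howto-linux.md": "---\ncategory: howto\nnotetype: linux\n---\n<Note body>\n",
    "plain.md": "No front matter here.\n",
}

hits = sorted(
    name for name, text in notes.items()
    if matches(text, category="howto", notetype="macos")
)
print(hits)
```

In practice the dictionary of sample notes would be replaced by reading the Markdown files from wherever they are stored.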

I seem to have run into a bug with DT (Pro, v3.9.4) with indexing done by IndexRawMarkdownSource. I regularly work on two machines (MacBook and iMac). Notes created directly on a particular machine (MacBook, say) are properly indexed based on Markdown front-matter. However, once those same notes sync to the other machine (iMac, in this case), searches based on Markdown front-matter don’t find the new notes.

The only solution I’ve found so far is to rebuild my “Notebox” database on the “other” machine. This “rebuilding” process is becoming tedious and this behaviour is, to my mind, a bug.

I’m willing to work with the DT folks to get to the bottom of this / see a fix provided, if needed.

Also, not sure if there is a better place to report bugs (?)

Is IndexRawMarkdownSource enabled on both Macs? It should be if that’s what you need.