Extract metadata in markdown file with "scan text" function

analogue_man · June 8, 2025, 6:53pm

I need help using the scan text function in a smart rule. I have *.md files with a metadata header such as

Title: 
Author: 
Zettel-ID: 001.001d
Keywords: 
Created:

Now I would like to extract the Zettel-ID and place it in a dedicated custom metadata field called “id”. I have tested my regex successfully; it is Zettel-ID: (.+), so it captures any ID no matter how my ID scheme develops.
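For reference, the pattern can be checked outside DEVONthink with a few lines of plain JavaScript. This is only a minimal sketch: the sample header is the one from this post, and the variable names are made up for illustration. It shows that the regex itself matches, and says nothing about what text a smart rule actually scans.

```javascript
// Sample header copied from the post; only the Zettel-ID line has a value.
const header = [
  "Title: ",
  "Author: ",
  "Zettel-ID: 001.001d",
  "Keywords: ",
  "Created:",
].join("\n");

// Same pattern as in the post. Without the `s` flag, `.` does not match a
// line break, so the capture group holds only the rest of the Zettel-ID line.
const match = header.match(/Zettel-ID: (.+)/);
const zettelId = match ? match[1] : null;

console.log(zettelId); // 001.001d
```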

Yet, I cannot get the smart rule to work with the metadata. Change ID to \1 does not give me any results. I tried to debug it via Display Alert and got zero results as well.

I have come to assume that metadata in .md files cannot be processed by smart rules, but I am not sure. What am I not seeing?

BLUEFROG · June 8, 2025, 8:36pm

DEVONthink doesn’t index hidden content, by default.
If you’re running the DEVONthink 4 beta, select Help > Hidden Preferences then click the On link for IndexRawMarkdownSource. Create a new Markdown doc with that metadata and try the automation again.

analogue_man · June 8, 2025, 8:45pm

Thank you for your response! Is that only available in 4.0 (beta)?

BLUEFROG · June 8, 2025, 8:49pm

In DEVONthink 3.x, (assuming the extra script isn’t installed)…

1. Open DEVONthink’s Scripts menu > More Scripts.
2. Install the Fix Hidden Prefs script.
3. Select Scripts menu > Tabs > Open Hidden Preferences and a window will appear with DEVONthink’s hidden preferences.
4. Locate the desired preference and click the On/Off link on the desired option.

Obviously, only steps 3 and 4 would be used going forward.

PS: While not always necessary, quitting and relaunching DEVONthink after changing a hidden preference is a good idea.

analogue_man · June 8, 2025, 9:14pm

thanks! that worked

cgrunenberg · June 9, 2025, 12:41pm

The next release will add a markdown parameter to the get metadata of AppleScript command to easily retrieve Markdown metadata too.

troejgaard · June 9, 2025, 2:40pm

Any arbitrary metadata key, not just properties? Hallelujah!
I know it’s already possible to manually parse the metadata header in a script, but that sounds so much easier.
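A minimal sketch of what such manual parsing can look like in JavaScript. The function name parseMetadataHeader is hypothetical and not part of any DEVONthink API. It assumes the header starts on the first line, ends at the first blank line, and uses one "Key: value" entry per line; multi-line values and other parts of the MultiMarkdown rules are not handled.

```javascript
// Hypothetical helper, not a DEVONthink API: parse a MultiMarkdown-style
// metadata header into a plain object.
// Assumptions: the header starts on the first line, ends at the first blank
// line, and every entry is a single "Key: value" line.
function parseMetadataHeader(text) {
  const metadata = {};
  for (const line of text.split(/\r?\n/)) {
    if (line.trim() === "") break;               // a blank line ends the header
    const entry = line.match(/^([^:]+):\s*(.*)$/);
    if (!entry) break;                           // not a "Key: value" line, so stop
    // Lower-case the key and drop whitespace so lookups are predictable.
    const key = entry[1].toLowerCase().replace(/\s+/g, "");
    metadata[key] = entry[2].trim();
  }
  return metadata;
}

const sample =
  "Title: \nAuthor: \nZettel-ID: 001.001d\nKeywords: \nCreated:\n\nBody text starts here.";
const meta = parseMetadataHeader(sample);

console.log(meta["zettel-id"]); // 001.001d
```

Keys without a value, such as Title in the sample, end up as empty strings rather than being dropped.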

cgrunenberg · June 9, 2025, 2:43pm

Yes, anything.

BLUEFROG · June 9, 2025, 3:54pm

What are you thinking of re: arbitrary metadata?

troejgaard · June 9, 2025, 4:38pm

metadata key
Nothing in particular, I just wanted to spell out my appreciation. Some MMD metadata keys are already indexed as properties. Now we’ll get an easy way to fetch any other metadata key we might come up with.

(Well, I assume it must follow the MultiMarkdown specification, which seems obvious, but aside from that.)

BLUEFROG · June 9, 2025, 4:48pm

Gotcha. I don’t know the implementation yet but it’s an interesting idea.

analogue_man · July 18, 2025, 12:12pm

For some reason I cannot get that extraction/Custom Metadata change to work. I checked the regex again and again, it should work. The Smart Rule is not changing the Custom Metadata “ID” to the Zettel-ID “001.001d”.

In the header it is as a single line

… Zettel-ID: 001.001d …

Any idea what I am doing wrong? I am running DTP 4.0.2.

saltlane · July 18, 2025, 12:25pm

Have you checked the selection? Single-click the smart rule in the left-hand navigation pane and it should show all the documents that satisfy the selection criteria. I have had this happen where the smart rule didn’t select anything. The condition ID is “” may be the issue if it is not selecting anything. Is there an ‘is not’ option for ID to find the documents?

analogue_man · July 18, 2025, 12:35pm

The correct document is found. The issue seems to be with the regex > ID

saltlane · July 18, 2025, 12:47pm

Does using (.*) make any difference? (zero or more of any character as opposed to one or more of any character) - I am not a regex expert
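The difference between the two quantifiers can be seen in a small JavaScript sketch. The sample strings are made up for illustration. With a non-empty ID both patterns capture the same text, so the switch only matters when nothing follows the colon and space.

```javascript
const plus = /Zettel-ID: (.+)/; // one or more characters after the prefix
const star = /Zettel-ID: (.*)/; // zero or more characters after the prefix

const filled = "Zettel-ID: 001.001d";
const empty = "Zettel-ID: ";

console.log(plus.exec(filled)[1]); // 001.001d
console.log(star.exec(filled)[1]); // 001.001d
console.log(plus.exec(empty));     // null, because (.+) needs at least one character
console.log(star.exec(empty)[1] === ""); // true, (.*) matches an empty ID
```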

chrillek · July 18, 2025, 1:24pm

No, the issue seems to be with the indexing of the metadata.

I set the correct default value for IndexRawMarkdownSource
I rebuilt the database to make sure that the MD file gets re-indexed
I tried your RE with a file having Zettel-ID: ... in its metadata
Nothing found by the RE
I created a Zettel-ID: … line in the normal text: everything hunky-dory.
I also tried using Author: in the MD metadata, which is defined in the MMD rules. No luck.

Perhaps a regression in 4.0.2? I seem to remember that this worked at some point in the past.

Aside: A smart rule script can easily do what you want. The one below is written in JavaScript because it makes regular expression processing easy. In AppleScript, you would use text item delimiters to split your MD file at the correct place.

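function performsmartrule(records) {
  const app = Application("DEVONthink");
  // Only Markdown records carry the metadata header.
  records.filter(r => r.recordType() === "markdown").forEach(r => {
    const txt = r.plainText();
    // The m flag lets ^ and $ match at line boundaries, so the header line is found anywhere in the text.
    const zettelMatch = txt.match(/^Zettel-ID: (.*)$/m);
    if (zettelMatch) {
      // Store the captured ID in the custom metadata field "ID".
      app.addCustomMetaData(zettelMatch[1], {for: 'ID', to: r});
    }
  });
}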

BLUEFROG · July 18, 2025, 1:57pm

@cgrunenberg: There is a bug for sure.
If the document contains only custom MultiMarkdown metadata, Scan Text works. If there is other content, it doesn’t.

From a batch process…

cgrunenberg · July 18, 2025, 2:36pm

The Scan Text action has always used only the text body of documents (without metadata), as IndexRawMarkdownSource affects only the search index.
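To illustrate why a pattern aimed at the header then finds nothing, here is a rough JavaScript sketch. It is not how DEVONthink works internally; it simply assumes that the text body is whatever follows the first blank line, and the function name bodyWithoutHeader is hypothetical.

```javascript
// Illustration only, under the assumption that the body is everything after
// the first blank line. DEVONthink's actual handling is not shown here.
function bodyWithoutHeader(text) {
  const end = text.search(/\r?\n\r?\n/);       // first blank line, if any
  return end === -1 ? text : text.slice(end).trimStart();
}

const doc = "Title: Note\nZettel-ID: 001.001d\n\nFirst paragraph of the note.";
const pattern = /Zettel-ID: (.+)/;

console.log(pattern.test(doc));                    // true: the full source contains the header
console.log(pattern.test(bodyWithoutHeader(doc))); // false: the body alone does not
```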

cgrunenberg · July 18, 2025, 2:41pm

And that’s actually working as intended; the document with only metadata shouldn’t match.

cgrunenberg · July 18, 2025, 2:55pm

The easiest option in version 4 is actually a script for the Script with Input/Output action. Afterwards the result can be easily used via the Script Output placeholder. The next release will include such a script.

on scriptOutput(theRecord, theInput)
	tell application id "DNtp"
		if record type of theRecord is markdown then
			set theText to plain text of theRecord
			set theMetadata to get metadata of markdown theText
			return |zettel-id| of theMetadata
		end if
	end tell
end scriptOutput