Extract metadata from Markdown files with the "Scan Text" action

I need help using the Scan Text action in a smart rule. I have *.md files with a metadata header such as:

Title: 
Author: 
Zettel-ID: 001.001d
Keywords: 
Created: 

Now I would like to extract the Zettel-ID and place it in a dedicated custom metadata field called “id”. I have tested my regex, Zettel-ID: (.+), and it matches. This way I can use any ID format I want, no matter how it develops.
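
For reference, the pattern itself does capture the ID when it is run against plain text. A minimal JavaScript sketch, assuming the header arrives as one newline-separated string (the sample text is made up for illustration):

```javascript
// Check that the pattern from the question captures the ID from a
// metadata header like the one above. The sample text is illustrative.
const header = [
  "Title: ",
  "Author: ",
  "Zettel-ID: 001.001d",
  "Keywords: ",
  "Created: ",
].join("\n");

// Same pattern as in the question; group 1 is what \1 refers to in the smart rule.
// "." does not match a newline, so the capture stops at the end of the line.
const pattern = /Zettel-ID: (.+)/;

const match = header.match(pattern);
const id = match ? match[1] : null;
console.log(id); // prints 001.001d
```

If this matches but the smart rule does not, the question becomes which text the rule's action is actually scanning.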

Yet I cannot get the smart rule to work with the metadata: Change ID to \1 does not produce any result. I tried to debug it with Display Alert and got nothing there either.

I'm beginning to assume that metadata in .md files cannot be processed by smart rules, but I am not sure. What am I not seeing?

By default, DEVONthink doesn’t index hidden content such as the Markdown metadata header.
If you’re running the DEVONthink 4 beta, select Help > Hidden Preferences then click the On link for IndexRawMarkdownSource. Create a new Markdown doc with that metadata and try the automation again.

Thank you for your response! Is that only available in 4.0 (beta)?

In DEVONthink 3.x (assuming the extra script isn’t installed):

  1. Open DEVONthink’s Scripts menu > More Scripts.
  2. Install the Fix Hidden Prefs script.
  3. Select Scripts menu > Tabs > Open Hidden Preferences. A window listing DEVONthink’s hidden preferences will appear.
  4. Locate the preference you want and click its On/Off link.

Obviously, only steps 3 and 4 would be used going forward.

PS: While not always necessary, quitting and relaunching DEVONthink after changing a hidden preference is a good idea.

thanks! that worked

The next release will add a markdown parameter to the AppleScript command get metadata of, making it easy to retrieve Markdown metadata too.

Any arbitrary metadata key, not just properties? Hallelujah! :tada:
I know it’s already possible to manually parse the metadata header in a script, but that sounds so much easier.
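
For comparison, this is roughly what parsing the header by hand involves. A minimal JavaScript sketch, not DEVONthink's implementation: the function name parseMetadataHeader is hypothetical, and its rules (the header ends at the first blank line, indented lines continue the previous value, keys are lowercased with spaces removed as an approximation of MultiMarkdown's key normalization) are simplifying assumptions.

```javascript
// Hand-rolled MultiMarkdown-style metadata parser (simplified sketch).
// Assumptions: the header starts on the first line, ends at the first blank
// line, and uses "Key: value" lines; indented lines continue the previous value.
function parseMetadataHeader(text) {
  const metadata = {};
  let lastKey = null;
  for (const line of text.split(/\r?\n/)) {
    if (line.trim() === "") break; // a blank line ends the header
    const kv = line.match(/^([^:\s][^:]*):\s*(.*)$/);
    if (kv) {
      // Normalize the key: lowercase, no spaces (approximation of MMD)
      lastKey = kv[1].toLowerCase().replace(/\s+/g, "");
      metadata[lastKey] = kv[2].trim();
    } else if (lastKey !== null && /^\s/.test(line)) {
      // Indented continuation line: append it to the previous value
      metadata[lastKey] += " " + line.trim();
    } else {
      break; // not a metadata line, so the header is over
    }
  }
  return metadata;
}

const doc = "Title: My note\nZettel-ID: 001.001d\nKeywords: zettelkasten\n\nBody text here.";
const meta = parseMetadataHeader(doc);
console.log(meta["zettel-id"]); // prints 001.001d
```

A real script would also have to cope with documents whose first line merely looks like a Key: value pair, which is one reason a built-in command is welcome.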

Yes, anything.

What are you thinking of re: arbitrary metadata?

“Arbitrary metadata key” :wink:
Nothing in particular; I just wanted to spell out my appreciation. Some MMD metadata keys are already indexed as properties. Now we’ll have an easy way to fetch any other metadata key we might come up with.

(Well, I assume it must follow the MultiMarkdown specification, which seems obvious, but aside from that.)

Gotcha. I don’t know the implementation yet but it’s an interesting idea.

For some reason I cannot get that extraction/Custom Metadata change to work. I have checked the regex again and again, and it should work. The smart rule is not changing the custom metadata field “ID” to the Zettel-ID “001.001d”.

In the header, it appears as a single line:

… Zettel-ID: 001.001d …

Any idea what I am doing wrong? I am running DTP 4.0.2.

Have you checked the selection? Single-click the smart rule in the left-hand navigation pane, and it should show all the documents that satisfy the selection criteria. I have had cases where the smart rule didn’t select anything. If it is not selecting anything, the condition ID is “” may be the issue. Is there an ‘is not’ option for ID to find the documents?

The correct document is found. The issue seems to be with the regex-to-ID step.

Does using (.*) make any difference? (zero or more of any character as opposed to one or more of any character) - I am not a regex expert :roll_eyes:
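
The difference only matters when the field is empty. A minimal JavaScript sketch with made-up sample text; for a non-empty ID such as 001.001d, both patterns capture the same value, so switching would not by itself explain the failure:

```javascript
// Compare (.+) and (.*) on the same text. The sample lines are illustrative.
const withId = "Keywords: \nZettel-ID: 001.001d\nCreated: ";
const emptyId = "Keywords: \nZettel-ID: \nCreated: ";

const plus = /Zettel-ID: (.+)/; // one or more characters (newlines excluded)
const star = /Zettel-ID: (.*)/; // zero or more characters (newlines excluded)

// With an ID present, both patterns capture the same thing.
console.log(withId.match(plus)[1], withId.match(star)[1]); // prints 001.001d 001.001d

// With an empty value, (.+) finds no match at all, while (.*) matches
// and captures an empty string.
console.log(emptyId.match(plus)); // prints null
console.log(JSON.stringify(emptyId.match(star)[1])); // prints ""
```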

No, the issue seems to be with the indexing of the metadata.

  • I set the correct default value for IndexRawMarkdownSource
  • I rebuilt the database to make sure that the MD file gets re-indexed
  • I tried your RE with a file having Zettel-ID: ... in its metadata
  • Nothing found by the RE
  • I created a Zettel-ID: … line in the normal text: everything hunky-dory.
  • I also tried using Author: in the MD metadata, which is defined by the MMD rules. No luck.

Perhaps a regression in 4.0.2? I seem to remember that this worked at some point in the past.

Aside: a smart rule script can easily do what you want. The one below is written in JavaScript because it makes regular expression processing easy. In AppleScript, you would use text item delimiters to split your MD file at the correct place.

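function performsmartrule(records) {
  const app = Application("DEVONthink");
  // Process only Markdown records
  records.filter(r => r.recordType() === "markdown").forEach(r => {
    // The plain text of a Markdown record includes its metadata header
    const txt = r.plainText();
    // With the m flag, ^ and $ match at line boundaries
    const zettelMatch = txt.match(/^Zettel-ID: (.*)$/m);
    if (zettelMatch) {
      // Write the captured ID into the custom metadata field "ID"
      app.addCustomMetaData(zettelMatch[1].trim(), {for: "ID", to: r});
    }
  });
}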
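
The AppleScript route mentioned above (splitting the text at a delimiter) boils down to the following. It is shown in plain JavaScript for comparison; extractBySplit is a hypothetical helper, not part of any DEVONthink API, and the sketch assumes the key is followed by a colon and exactly one space.

```javascript
// "Split at a delimiter" approach, similar to setting AppleScript's
// text item delimiters to the marker and taking the second text item.
function extractBySplit(text, key) {
  const marker = key + ": ";
  const parts = text.split(marker);
  if (parts.length < 2) return null; // key not present
  // Keep only the rest of the line that follows the first occurrence
  return parts[1].split("\n")[0].trim();
}

const note = "Title: Test\nZettel-ID: 001.001d\nKeywords: \n\nBody";
console.log(extractBySplit(note, "Zettel-ID")); // prints 001.001d
```

Unlike the regex with the ^ anchor and m flag, this matches the marker anywhere in the text, not only at the start of a line.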

@cgrunenberg: There is a bug for sure.
If there is only custom MultiMarkdown metadata, Scan Text works. If there is other content as well, it doesn’t.

From a batch process…

The Scan Text action has always used only the text body of documents (without metadata), as IndexRawMarkdownSource affects only the search index.

And that’s actually working as intended; it’s the document with only metadata that shouldn’t match.
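
A simplified model of why the rule found nothing: if the action only sees the body, an ID in the header is invisible to the regex. This JavaScript sketch is an illustration under assumptions, not DEVONthink's code; splitMetadata is a hypothetical helper, and its rule (the header is everything before the first blank line, provided the first line looks like Key: value) is a simplification.

```javascript
// Split a Markdown source into header and body (simplified assumption).
function splitMetadata(source) {
  const lines = source.split(/\r?\n/);
  const blank = lines.findIndex(line => line.trim() === "");
  // Assume a header exists only if the first line looks like "Key: value"
  // and a blank line follows somewhere
  if (blank === -1 || !/^[^:\s][^:]*:/.test(lines[0])) {
    return { header: "", body: source };
  }
  return {
    header: lines.slice(0, blank).join("\n"),
    body: lines.slice(blank + 1).join("\n"),
  };
}

const pattern = /Zettel-ID: (.+)/;

const inHeader = "Title: Test\nZettel-ID: 001.001d\n\nSome body text.";
const inBody = "Title: Test\n\nZettel-ID: 001.001d\nSome body text.";

// Scanning only the body misses an ID that lives in the header...
console.log(pattern.test(splitMetadata(inHeader).body)); // prints false
// ...but finds one written into the body text.
console.log(pattern.test(splitMetadata(inBody).body)); // prints true
```

This mirrors the test reported earlier in the thread: the ID in the metadata was not found, while the same line in the normal text was.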

The easiest option in version 4 is actually a script for the Script with Input/Output action. Afterwards, the result can easily be used via the Script Output placeholder. The next release will include such a script.

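on scriptOutput(theRecord, theInput)
	tell application id "DNtp"
		-- Only Markdown documents carry MultiMarkdown metadata
		if record type of theRecord is markdown then
			-- The plain text includes the metadata header
			set theText to plain text of theRecord
			-- Parse the header into an AppleScript record
			set theMetadata to get metadata of markdown theText
			-- The returned value is available via the Script Output placeholder
			return |zettel-id| of theMetadata
		end if
	end tell
end scriptOutput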