Automating DT with JavaScript: Splitting Markdown

Occasionally, people have asked for a way to split Markdown files into new documents. There are basically two methods. The simple one uses “split here” markers, which are thrown away in the process: you insert something like “$$$” into your Markdown files wherever you want them split. This is described in the first example.

The second one is a bit more complicated. It allows you to define the marker as a regular expression and generates new records at the points where this expression matches. An obvious example would be to split at a certain level of headlines.

Splitting Markdown, simple case

The next example splits Markdown records at a predefined marker. If the prefix variable is set to ‘’, the script will generate new records with the name of the original one, appending “-1”, “-2” etc. If prefix is set to something else, the script will generate record names prefix-1, prefix-2 and so on.

If no (markdown) records are selected, the script will bail out with an error message. Also, if one of the selected records does not contain the marker at all, an alert is displayed and has to be acknowledged by the user.

Note that the marker disappears in the process; it is purely meant as a “cut here” indicator. See the next example for a “keep the marker” approach.
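To see what happens to the marker and how the names are built, here is a minimal sketch in plain JavaScript that runs without DEVONthink. The helper previewSplit and the sample text are made up for illustration; they are not part of the script below.

```javascript
// Sample text with two "$$$" markers, each on its own line.
const sampleText = 'first part\n$$$\nsecond part\n$$$\nthird part';

// Hypothetical helper: returns the name/content pairs that would become new records.
function previewSplit(text, marker, prefix) {
  // split() removes the marker and returns only the text around it.
  return text.split(marker).map((content, i) => ({
    name: `${prefix}-${i + 1}`,
    content,
  }));
}

const preview = previewSplit(sampleText, '$$$', 'prefix');
preview.forEach(p => console.log(p.name, JSON.stringify(p.content)));
```

The line breaks around the marker stay in the chunks, so you may want to call trim() on each chunk if the extra blank lines bother you.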


(() => {
const marker = '$$$' // Marker to split at. Should be on a single line.
/*
Prefix for new records. 
New records will be named 'prefix-1', 'prefix-2' and so on. 
Use '' to use original record's name as prefix
*/
const prefix = 'prefix';

const app = Application("DEVONthink 3");
app.includeStandardAdditions = true;
/*
* get all markdown records from selection
*/
const MDrecords = (app.selectedRecords()).filter(r => r.type() === "markdown");

if (MDrecords.length === 0) { // Abort if no MD records selected.
  app.displayAlert(
    `No Markdown documents selected`, {
    as:  "critical",
    buttons: ['OK'],
  });
  return;
}

// Loop over all Markdown records and split them

MDrecords.forEach(m => splitFile(m, marker, prefix === '' ? m.name() : prefix ));


/* 
Function to split document 'doc' at pattern 'at' into a bunch of new documents named 
'prefix-1', 'prefix-2', 'prefix-3' and so on
*/

function splitFile(doc, at, prefix) {
  const group = doc.parents[0]; /* get the group of the current MD document */
  const chunks = doc.plainText().split(at); /* get the Markdown's text and split it in chunks at the marker */
  if (chunks.length === 1) {
  /* Abort if only one chunk is found, since then there's no marker in it */
    app.displayAlert(
      `No matches found for ${at} in document "${doc.name()}"`, {
      as:  "critical",
      buttons: ['Cancel'],
    }) 
    return;
  }
  let counter = 1;
  chunks.forEach(c =>  newRecord(`${prefix}-${counter++}`, group, c));		
}

/*
Function to create a new markdown record 'recName' in 'inGroup' with plain text 'txt'
*/
function newRecord(recName, inGroup, txt) {
  app.createRecordWith({name: recName, type: "markdown", "plain text": txt},
  {in: inGroup});
}

})()

Split Markdown records at headlines

If you want to split Markdown records at headlines, you’ll most probably want to keep those. That’s not possible with the preceding example, since it uses JavaScript’s split method which throws away the strings it splits at. So in order to split somewhere and keep that text, you need a different approach. To illustrate, let’s assume that you have a longish Markdown document that you want to split at the second level headlines. Those are indicated by ## at the beginning of a line.

So assuming you have a Markdown record like this

# Titel

introduction

## First headline

first paragraph

## Second headline

second paragraph

you’d get three new records: the first one containing everything from “# Titel” to just before “## First headline”, the second one everything from “## First headline” to just before “## Second headline” and the last one everything from “## Second headline” to the end.

The previous script only needs minor modifications. Set the marker like so:
const marker = new RegExp(/(^##\s+.*$)/, "gm");
This defines a regular expression as two “#” signs at the start of a line, followed by at least one whitespace character, followed by anything up to the end of the line. In the flags "gm", “g” makes the expression global, and “m” lets ^ and $ match the beginning and end of each line, respectively, instead of only the beginning and end of the whole text.
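If you want to check what this expression matches before running it against real records, the following sketch in plain JavaScript applies it to the sample document from above. The names headlineMarker, sampleDoc and found are only used for this illustration.

```javascript
// The same expression as in the script, under a different name.
const headlineMarker = new RegExp(/(^##\s+.*$)/, "gm");

const sampleDoc = '# Titel\n\nintroduction\n\n## First headline\n\nfirst paragraph\n\n## Second headline\n\nsecond paragraph\n';

// Collect the position and the text of every match.
const found = [...sampleDoc.matchAll(headlineMarker)].map(m => ({
  index: m.index,
  headline: m[0],
}));
found.forEach(f => console.log(f.index, f.headline));

// First- and third-level headlines do not match, because "##" must be
// at the start of a line and followed directly by whitespace.
const deeperMatches = [...'### Deeper headline'.matchAll(headlineMarker)].length;
```

Only the two second-level headlines are found; “# Titel” is skipped.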

In the function splitFile, change the lines
const chunks = doc … if (chunks.length === 1) {
to this:

const text = doc.plainText();
const matches = [...text.matchAll(at)]; /* get all matches into an array */
if (matches.length === 0) {

Here, you save the text of the record in its own variable text, which you’ll need later on. Then you collect all matches for the regular expression (at) in the array matches. For matchAll to work, the regular expression must be defined as “global” as shown before; otherwise it throws a TypeError.
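The requirement for the “g” flag can be tried out in isolation. This is a small sketch in plain JavaScript with made-up names (textToScan, withGlobal, withoutGlobal); it only demonstrates the behaviour of matchAll and is not part of the script.

```javascript
const textToScan = 'intro\n## One\nbody\n## Two\nbody';

const withGlobal = /^##\s+.*$/gm;   // has the "g" flag
const withoutGlobal = /^##\s+.*$/m; // "g" flag missing

// Works: matchAll returns an iterator over all matches.
const globalCount = [...textToScan.matchAll(withGlobal)].length;

// Fails: matchAll refuses a regular expression without the "g" flag.
let errorName = null;
try {
  textToScan.matchAll(withoutGlobal);
} catch (e) {
  errorName = e.name;
}
console.log(globalCount, errorName);
```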

Finally, you need to iterate over the matches to create the new records like so:

let start = 0;
matches.forEach(m => {
  newRecord(`${prefix}-${counter++}`, group, text.substr(start, m.index - start));
  start = m.index;
})
// handle last match
newRecord(`${prefix}-${counter++}`, group, text.substr(start, text.length - start));

Every element of matches is itself an array with a special property index, which contains the numerical position where this match starts. The first new record should comprise everything from the beginning of the text up to just before the first headline, i.e. the first match. So the variable start is set to 0, and the function newRecord is passed the part of the text starting at 0 and consisting of all the characters before the first match (m.index - start characters). After that first step, the script sets start to the beginning of the first match, so the next pass through the loop writes the text from the first headline up to just before the second one, and so on.

You may have noticed that the first match is saved in the second new record. So at the end of the forEach loop, the text starting at the last match (i.e. the last headline) has not been written yet. That’s what the final line above takes care of.
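The loop and the final step can be tried without DEVONthink by collecting the chunks in an array instead of creating records. The following is a sketch only: the function splitKeepingMarkers is a hypothetical stand-in for splitFile, and it uses slice with start and end positions where the script uses substr with a start and a length.

```javascript
// Hypothetical stand-in for splitFile: returns the chunks instead of creating records.
function splitKeepingMarkers(text, at) {
  const matches = [...text.matchAll(at)];
  const chunks = [];
  let start = 0;
  matches.forEach(m => {
    // Everything from the previous match (or the beginning) up to this match.
    chunks.push(text.slice(start, m.index));
    start = m.index;
  });
  // The text from the last match to the end has not been written yet.
  chunks.push(text.slice(start));
  return chunks;
}

const fullText = '# Titel\n\nintroduction\n\n## First headline\n\nfirst paragraph\n\n## Second headline\n\nsecond paragraph\n';
const pieces = splitKeepingMarkers(fullText, /(^##\s+.*$)/gm);
pieces.forEach((p, i) => console.log(i + 1, JSON.stringify(p)));
```

Nothing gets lost this way: joining the chunks gives back the original text. One thing to keep in mind is that a text starting directly with a matching headline produces an empty first chunk, so you might want to skip empty chunks before creating records.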

The full script:
(() => {
/*
Regular expression to split at. You can also use a simpler expression like /## /g, but that would also match in the middle of a line and inside deeper headline levels such as ###.
*/
const marker = new RegExp(/(^##\s+.*$)/, "gm");
/*
Prefix for new records.
New records will be named 'prefix-1', 'prefix-2' and so on.
Use '' to use original record's name as prefix
*/
const prefix = '';
const app = Application("DEVONthink 3");
app.includeStandardAdditions = true;
/*
* get all markdown records from selection
*/
const MDrecords = (app.selectedRecords()).filter(r => r.type() === "markdown");

if (MDrecords.length === 0) { // Abort if no MD records selected.
	app.displayAlert(
		`No Markdown documents selected`, {
			as:  "critical",
			buttons: ['OK'],
		});
	return;
}

// Loop over all Markdown records and split them

MDrecords.forEach(m => splitFile(m, marker, prefix === '' ? m.name() : prefix ));

/* 
Function to split document 'doc' at pattern 'at' into a bunch of new documents named 
'prefix-1', 'prefix-2', 'prefix-3' and so on
*/

function splitFile(doc, at, prefix) {
	const group = doc.parents[0]; /* get the group of the current MD document */
	const text = doc.plainText();
	const matches = [...text.matchAll(at)]; /* get all matches into an array */
	if (matches.length === 0) {
	/* Abort if no matches are found, since then there's no headline to split at */
		app.displayAlert(
			`No matches found for ${at} in document "${doc.name()}"`, {
				as:  "critical",
				buttons: ['Cancel'],
			})
		return;
	}
	let counter = 1;
	let start = 0;
	matches.forEach(m => {
		newRecord(`${prefix}-${counter++}`, group, text.substr(start, m.index - start));
		start = m.index;
	})
	// handle last match
	newRecord(`${prefix}-${counter++}`, group, text.substr(start, text.length - start));
}

/*
Function to create a new markdown record 'recName' in 'inGroup' with plain text 'txt'
*/
function newRecord(recName, inGroup, txt) {
	app.createRecordWith({name: recName, type: "markdown", "plain text": txt},
	 {in: inGroup});
}

})()

Very nice automated approach.
Thanks for sharing (and explaining) it :slight_smile:

Hello, @chrillek,
I wish to use your first script, but my knowledge of javascript is zero. So, after I copy your script, what do I do then?
If you or anyone else here can guide me, it would be great.
Thank you,
Yuval

A good start in the world of automation with DEVONthink is to read the relevant portions of the “Automation” appendix in the DEVONthink Handbook (page 181 in version 3.8 of that outstanding document).

Thank you, @rmschne,
I read that, but I still don’t know how to activate the script.
If it’s an AppleScript, I save it with the Mac’s built-in Script Editor and then put it in the scripts folder of DT to use it from there. It doesn’t seem to be the case here.
So, what should I do?

What makes you think that? Did you copy / paste the code in Script Editor (changing its language to JavaScript) and save it to DT’s script folder from there? What happened? What did not happen? Any error messages?

“It doesn’t seem to be the case here” is unfortunately not a helpful problem description.

@chrillek, that’s what I was missing. Now that I changed the language, everything is working great!

When I said, “that’s not the case,” I meant I don’t even know what is the right question.
Thank you very much for the script

Also take a look at the blog post by @chrillek linked at DEVONtechnologies | How to Use JavaScript for Automation (which I found via Google for you).

@rmschne, thank you for that link. I will check it out.