Occasionally, people have asked for a way to split Markdown files into new documents. There are basically two methods. The simple one uses “split here” markers, which are thrown away in the process: you insert something like “$$$” into your Markdown files wherever you want them to be split. This is described in the first example.
The second one is a bit more complicated. It lets you define the marker as a regular expression and generates a new document at every point where the expression matches. An obvious example would be to split at a certain level of headlines.
Splitting Markdown, simple case
The next example splits Markdown records at a predefined marker. If the prefix variable is set to '', the script will generate new records with the name of the original ones, appending “-1”, “-2” etc. If prefix is set to something else, the script will generate records named prefix-1, prefix-2 and so on.
If no (markdown) records are selected, the script will bail out with an error message. Also, if one of the selected records does not contain the marker at all, an alert is displayed and has to be acknowledged by the user.
Note that the marker disappears in the process; it is purely meant as a “cut here” indicator. See the next section for a script that keeps the marker.
(() => {
  const marker = '$$$'; // Marker to split at. Should be on a single line.
  /*
    Prefix for new records.
    New records will be named 'prefix-1', 'prefix-2' and so on.
    Use '' to use original record's name as prefix
  */
  const prefix = 'prefix';
  const app = Application("DEVONthink 3");
  app.includeStandardAdditions = true;
  /*
   * get all markdown records from selection
   */
  const MDrecords = (app.selectedRecords()).filter(r => r.type() === "markdown");
  if (MDrecords.length === 0) { // Abort if no MD records selected.
    app.displayAlert(
      `No Markdown documents selected`, {
        as: "critical",
        buttons: ['OK'],
      });
    return;
  }
  // Loop over all Markdown records and split them
  MDrecords.forEach(m => splitFile(m, marker, prefix === '' ? m.name() : prefix));
  /*
    Function to split document 'doc' at pattern 'at' into a bunch of new documents named
    'prefix-1', 'prefix-2', 'prefix-3' and so on
  */
  function splitFile(doc, at, prefix) {
    const group = doc.parents[0]; /* get the group of the current MD document */
    const chunks = doc.plainText().split(at); /* get the Markdown's text and split it in chunks at the marker */
    if (chunks.length === 1) {
      /* Abort if only one chunk is found, since then there's no marker in it */
      app.displayAlert(
        `No matches found for ${at} in document "${doc.name()}"`, {
          as: "critical",
          buttons: ['Cancel'],
        });
      return;
    }
    let counter = 1;
    chunks.forEach(c => newRecord(`${prefix}-${counter++}`, group, c));
  }
  /*
    Function to create a new markdown record 'recName' in 'inGroup' with plain text 'txt'
  */
  function newRecord(recName, inGroup, txt) {
    app.createRecordWith({name: recName, type: "markdown", "plain text": txt},
      {in: inGroup});
  }
})()
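The splitting and naming logic of this script can be tried without DEVONthink, for example in Node.js. The following is only a minimal sketch under that assumption: the helper name splitAtMarker, the prefix "notes" and the sample text are made up for illustration, and the helper returns plain objects instead of creating records.

```javascript
// Hypothetical helper, not part of the script above: splits 'text' at the
// literal string 'marker' and pairs each chunk with the name the script would use.
function splitAtMarker(text, marker, prefix) {
  const chunks = text.split(marker); // the marker itself is discarded
  if (chunks.length === 1) {
    return []; // marker not found, so there is nothing to split
  }
  return chunks.map((content, i) => ({ name: `${prefix}-${i + 1}`, content }));
}

// Made-up sample text with the marker on lines of its own.
const sample = "first part\n$$$\nsecond part\n$$$\nthird part";
const parts = splitAtMarker(sample, "$$$", "notes");
console.log(parts.map(p => p.name)); // [ 'notes-1', 'notes-2', 'notes-3' ]
```

Note that the chunks keep the line breaks surrounding the marker. If that is not wanted, each chunk could be passed through trim() before it is stored.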
Split Markdown records at headlines
If you want to split Markdown records at headlines, you’ll most probably want to keep those. That’s not possible with the preceding example, since it calls JavaScript’s split method with a plain string, which throws away the strings it splits at. So in order to split somewhere and keep that text, you need a different approach. To illustrate, let’s assume that you have a longish Markdown document that you want to split at the second-level headlines. Those are indicated by ## at the beginning of a line.
So assuming you have a Markdown record like this:
# Title
introduction
## First headline
first paragraph
## Second headline
second paragraph
you’d get three new records: the first one containing everything from “# Title” to just before “## First headline”, the second one everything from “## First headline” to just before “## Second headline”, and the last one everything from “## Second headline” to the end.
The previous script only needs minor modifications. Set the marker like so:
const marker = new RegExp(/(^##\s+.*$)/, "gm");
This defines a regular expression that matches two “#” signs at the start of a line, followed by at least one whitespace character, followed by anything up to the end of the line. Of the "gm" flags, “g” makes the expression global, and “m” lets ^ and $ match at the beginning and end of each line, respectively.
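To see which lines this expression picks up, it can be run against a few test lines outside DEVONthink, for example in Node.js. This is only a sketch: the sample lines are made up, and the expression is written as a regex literal with the flags attached, which is equivalent to the new RegExp(…, "gm") form used in the script.

```javascript
// Same expression as in the script, written as a literal with its flags.
const marker = /(^##\s+.*$)/gm;

// Made-up sample lines covering the interesting cases.
const sample = [
  "# Title",                    // first level: no match
  "## First headline",          // second level: match
  "text with ## in the middle", // "##" not at the start of the line: no match
  "### Third level",            // third "#" instead of whitespace: no match
  "##No space",                 // whitespace after "##" is missing: no match
  "## Second headline",         // second level: match
].join("\n");

// With a global expression, match() returns all matched strings.
const headlines = sample.match(marker);
console.log(headlines); // [ '## First headline', '## Second headline' ]
```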
In the function splitFile, change the lines
const chunks = doc … if (chunks.length === 1) {
to this:
const text = doc.plainText();
const matches = [...text.matchAll(at)]; /* get all matches into an array */
if (matches.length === 0) {
Here, you save the text of the record in its own variable text, which you’ll need later on. Then you collect all matches for the regular expression (at) in the array matches. For matchAll to work, the regular expression must be defined as “global” as shown before; otherwise, you’ll see an error.
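What matchAll delivers, and what happens without the “g” flag, can be checked in isolation. The snippet below is a sketch that assumes a plain JavaScript environment such as Node.js; the sample text mirrors the example record from above, and the variable names positions and errorName are only illustrative.

```javascript
// Sample text mirroring the example record; plain JavaScript, no DEVONthink needed.
const text = "# Title\nintroduction\n## First headline\nfirst paragraph\n## Second headline\nsecond paragraph";

// Spread the iterator returned by matchAll into an array, as the script does.
const matches = [...text.matchAll(/(^##\s+.*$)/gm)];

// Each match is an array: element 0 is the matched text, 'index' its position.
const positions = matches.map(m => ({ headline: m[0], index: m.index }));
console.log(positions);

// Without the "g" flag, matchAll throws a TypeError.
let errorName = null;
try {
  text.matchAll(/(^##\s+.*$)/m);
} catch (e) {
  errorName = e.name;
}
console.log(errorName); // TypeError
```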
Finally, you need to iterate over the matches to create the new records like so:
let start = 0;
matches.forEach(m => {
  newRecord(`${prefix}-${counter++}`, group, text.substr(start, m.index - start));
  start = m.index;
});
// handle last match
newRecord(`${prefix}-${counter++}`, group, text.substr(start, text.length - start));
Every element of matches is itself an array with a special property index, which contains the numerical position where this match starts. The first new record should comprise everything from the beginning of the text to just before the first headline, i.e. the first match. So the variable start is set to 0, and the function newRecord is passed the part of the text starting at 0 and consisting of all the characters before the first match (m.index - start). After that first step, the script sets start to the beginning of the first match, so that the next record begins with that headline, and so on.
You may have noticed that the first match is saved in the second new record. So at the end of the forEach loop, the text starting at the last match (i.e. the last headline) has not been written yet. That’s what the final line above takes care of.
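The loop and the final call can be condensed into one function that returns the chunks instead of creating records, which makes the behaviour easy to check outside DEVONthink. This is a sketch, not the script’s own code: the helper name splitBeforeMatches is made up, and it uses slice(start, end) where the script uses substr(start, length).

```javascript
// Hypothetical helper: cuts 'text' in front of every match of the global
// expression 'at', so that each match starts a new chunk.
function splitBeforeMatches(text, at) {
  const chunks = [];
  let start = 0;
  for (const m of text.matchAll(at)) {
    chunks.push(text.slice(start, m.index)); // everything before this match
    start = m.index;                         // the next chunk begins at the match
  }
  chunks.push(text.slice(start)); // from the last match to the end of the text
  return chunks;
}

// Sample text mirroring the example record from above.
const text = "# Title\nintroduction\n## First headline\nfirst paragraph\n## Second headline\nsecond paragraph";
const chunks = splitBeforeMatches(text, /(^##\s+.*$)/gm);
console.log(chunks.length); // 3
console.log(chunks[1]);     // starts with "## First headline"
```

One case worth keeping in mind: if the text begins with a matching headline, the first match sits at position 0 and the first chunk is empty. The script as shown would then create an empty first record, so filtering out empty chunks may be worth adding.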
The full script:
(() => {
  /*
    Regular expression to split at.
    You can also use a simple string like /## /, but that would match anywhere in the text, too.
  */
  const marker = new RegExp(/(^##\s+.*$)/, "gm");
  /*
    Prefix for new records.
    New records will be named 'prefix-1', 'prefix-2' and so on.
    Use '' to use original record's name as prefix
  */
  const prefix = '';
  const app = Application("DEVONthink 3");
  app.includeStandardAdditions = true;
  /*
   * get all markdown records from selection
   */
  const MDrecords = (app.selectedRecords()).filter(r => r.type() === "markdown");
  if (MDrecords.length === 0) { // Abort if no MD records selected.
    app.displayAlert(
      `No Markdown documents selected`, {
        as: "critical",
        buttons: ['OK'],
      });
    return;
  }
  // Loop over all Markdown records and split them
  MDrecords.forEach(m => splitFile(m, marker, prefix === '' ? m.name() : prefix));
  /*
    Function to split document 'doc' at pattern 'at' into a bunch of new documents named
    'prefix-1', 'prefix-2', 'prefix-3' and so on
  */
  function splitFile(doc, at, prefix) {
    const group = doc.parents[0]; /* get the group of the current MD document */
    const text = doc.plainText();
    const matches = [...text.matchAll(at)]; /* get all matches into an array */
    if (matches.length === 0) {
      /* Abort if no match is found, since then there's no marker in the document */
      app.displayAlert(
        `No matches found for ${at} in document "${doc.name()}"`, {
          as: "critical",
          buttons: ['Cancel'],
        });
      return;
    }
    let counter = 1;
    let start = 0;
    matches.forEach(m => {
      newRecord(`${prefix}-${counter++}`, group, text.substr(start, m.index - start));
      start = m.index;
    });
    // handle last match
    newRecord(`${prefix}-${counter++}`, group, text.substr(start, text.length - start));
  }
  /*
    Function to create a new markdown record 'recName' in 'inGroup' with plain text 'txt'
  */
  function newRecord(recName, inGroup, txt) {
    app.createRecordWith({name: recName, type: "markdown", "plain text": txt},
      {in: inGroup});
  }
})()