Setting created/modified dates from markdown front matter

I’ve been working to convert a couple decades of content in OneNote to DEVONthink and have made a lot of progress. I started with a freely available tool on GitHub and had to do some customizing. But as a result, it spits out my OneNote content as markdown with front matter that includes the original note creation/modification times in plain text in the following format:

    Created: 2018-01-06 09:46:08 -0800
    Modified: 2018-01-06 09:48:51 -0800

My latest step along the way was to figure out how to write a script that could take those created/modified dates and use them to set the actual created/modified dates for the corresponding imported database records. I thought I’d post my findings here both to help others in the same situation and ask a question. So here’s the script I presently have working:

/*
	This function takes a bunch of rich text and then a bit of date-header text, which
	should be something like "Created: " or "Modified: " and then extracts the remainder
	of the line and returns it as the textual value of the given date.
*/

function getDateText(richText, dateHeader) {
    let eolMarker = "\n";
    let datePos = richText.search(dateHeader);
    let partial = richText.substr(datePos);
    let eolPos = partial.search(eolMarker);
    return partial.substr(0, eolPos);
}

/*
    Takes a text string that represents a date in "YYYY-MM-DD HH:MM:SS -TZ00" format and
    returns a new Date() object that represents the resulting date and time.
    Note: the time zone offset is not captured, so the time is interpreted as local time.
*/

function dateFromString(inputString) {

    /*
        I'm honestly not sure why all this rigamarole is required, when it
        seems I *should* be able simply to supply the string to the Date()
        constructor and get it to work (as I did in a sandbox), but that
        doesn't ever work. This does, so I'm sticking with it.
    */

    const dateRegex = /(\d{4})-(\d{2})-(\d{2}).+(\d{2}):(\d{2}):(\d{2})/;
    var parts = dateRegex.exec(inputString);

    var year = parseInt(parts[1], 10);
    var month = parseInt(parts[2], 10) - 1;	// Date() month is zero-based
    var day = parseInt(parts[3], 10);
    var hour = parseInt(parts[4], 10);
    var minute = parseInt(parts[5], 10);
    var second = parseInt(parts[6], 10);

    return new Date(year, month, day, hour, minute, second);

}


(() => {

    // First, get the app and the subset of selected documents that are markdown.

    const app = Application("DEVONthink 3");
    app.includeStandardAdditions = true;
    const sel = app.selectedRecords.whose({ _match: [ObjectSpecifier().type, "markdown"] })();

    // Iterate over all of them extracting the dates and setting them into the records.

    sel.forEach(g => {
        let createdText = getDateText(g.richText(), "Created: ");
        let modifiedText = getDateText(g.richText(), "Modified: ");
        const dateCreated = dateFromString(createdText);
        const dateModified = dateFromString(modifiedText);
        //console.log("Created on " + dateCreated + "\nModified on " + dateModified);

        g.creationDate = dateCreated;
        g.modificationDate = dateModified;
    });

})()

As you can see, this processes only those notes that are both selected and in markdown format, which has been convenient for testing. The one piece of the puzzle I lack is how to process all the markdown notes in a given database. I thought I’d ask here to see if maybe somebody could help me finish this step in the project. Thanks in advance!


If it’s only necessary once, then the easiest approach is to search for kind:markdown, to select the results and to run the script.

Not really. I have 30+ notebooks importing into 30+ databases and keep finding new ways to tweak the process to improve the results. Is there any decent way to iterate on all the markdown documents in a given database? Or even all the documents seeing as the record type makes it easy to recognize markdown?

I’d also suggest using something like app.search("kind: markdown"). If needed, add {in: app.databases['YOUR DATABASE']} after the search string.
Rationale:

  • That way, you can search through all databases at once. Which is more difficult if you use the records collection of a database, since you’ll have to do that for every database in turn.
  • search with “kind” is presumably a lot faster than database.records().filter(r => r.type() === "markdown") since DT organizes its imported records by type on disk: all MD records are already collected underneath the same folder. First getting all the records and then filtering them will incur higher disk and CPU costs, aka “time”.

Some suggestions regarding your script:

  • The function dateFromString is, in my opinion, overly complex and unneeded. What should work with a string like “2018-01-06 09:46:08 -0800” is this:
    return new Date(string.replace(/(\S+) (\S+) (\S+)/,"$1T$2$3")); That simply adds a “T” between date and time and removes the space before the time zone. It seems to do what it’s supposed to, though I’m not sure about the time zone. But since your original code ignores that too, I wouldn’t worry about it.
  • You could try a simple app.selectedRecords().filter(r => r.type() === "markdown") instead of using whose here. Easier to write and understand, and perhaps faster.
  • Your getDateText is a bit too complicated (for my taste, that is). Instead of introducing a lot of local variables (that are in fact consts!) you could use a simple RE like so: const RE = new RegExp(`^${dateHeader}(.*)$`,"m"); return richText.match(RE)[1];
  • If you’re processing a lot of records, some optimizations might be in order, such as defining these regular expressions only once at the top of the script, saving r.richText() as a constant (and using plainText(), as richText() is a misnomer here, though it works).
  • Don’t use let nor var for constants. It’s confusing.
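Taken together, the suggestions above could be sketched roughly like this. The function names toIsoDateString and frontMatterDate are made up for illustration, and the sketch assumes the front matter lines look exactly like the “Created: 2018-01-06 09:46:08 -0800” example from the first post. Unlike the one-line replace, it also inserts a colon into the offset, on the assumption that the strict format wants ±HH:mm.

```javascript
// Regular expression defined once, outside the functions, so it is not rebuilt per record.
// Captures date, time, and the two halves of the UTC offset.
const DATE_PARTS = /(\d{4}-\d{2}-\d{2})\s+(\d{2}:\d{2}:\d{2})\s*([+-]\d{2}):?(\d{2})/;

// Hypothetical helper: "2018-01-06 09:46:08 -0800" -> "2018-01-06T09:46:08-08:00".
// Returns null if the text does not contain a date in the expected shape.
function toIsoDateString(text) {
    const m = DATE_PARTS.exec(text);
    return m ? `${m[1]}T${m[2]}${m[3]}:${m[4]}` : null;
}

// Hypothetical helper: finds the line starting with `header` (e.g. "Created:")
// and returns its value as a Date, or null if the line is missing or malformed.
// Assumes `header` contains no regular expression metacharacters.
function frontMatterDate(text, header) {
    const match = text.match(new RegExp(`^${header}\\s*(.*)$`, "m"));
    if (!match) return null;
    const iso = toIsoDateString(match[1]);
    return iso ? new Date(iso) : null;
}
```

In the script, the text would be read once per record, e.g. const text = r.plainText(), and then passed to frontMatterDate(text, "Created:") and frontMatterDate(text, "Modified:"). Checking for null before assigning the dates avoids overwriting them on notes without front matter.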

Actually the structure of the Files.noindex folder doesn’t affect the search at all.

Thanks for the info, I’ll try that. Thanks also for the suggestions. For what it’s worth, I was aware of a couple of them but had problems along the way. I tend to develop JavaScript using a sandbox tool on the web, and while multiple sandboxes coped with a string constructor for the Date() object, for example, I couldn’t get one working with the AppleScript tool no matter what I did. I still don’t know why. But all good suggestions nevertheless.

Which AppleScript tool are you referring to?

The Script Editor application included by default with macOS Monterey v12.7.3. That’s the only one I know.

Ah. I was confused by the mention of “AppleScript”.

Setting the date directly from the original string works just fine here. The timezone seems to be ok, too: With -08:00, the time in your date is 17:46:08 in UTC, which is 18:46:08 in CET (CET being UTC+1h).

Good to know. I was more surprised that it didn’t work with the text as it was, seeing as all the sandboxes I used were happy to construct a Date() from it. Go figure.

Well, one would have to see the code to know what might have gone wrong.

Here’s a link to one of the sandboxes I was using yesterday: JavaScript Playground

I can’t get that search qualifier you suggested to work either. For example, consider the following line of JavaScript:

const sel = app.search('kind:markdown', {in: app.databases['Personal']});

That gives me “Error -50: Parameter error.” It doesn’t seem to matter which database name I give it, it fails. If I simply remove the second argument, then I get all the markdown documents as expected, so it’s clearly the second parameter I’ve got wrong. Any suggestions?

That’s my fault. The scripting dictionary says that the value for the in parameter should be a record. Instead, I specified a database. What does work is

const sel = app.search('type:markdown', {in: app.databases['Personal'].root()});

The root property is the top-level group of the database and a record (in the DT sense).
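For the case of 30+ databases, a loop around that search might look something like the sketch below. The helper names markdownRecordsIn and forEachMarkdownRecord are invented for this example, and the app object is passed in as a parameter so the functions can be tried outside DEVONthink with a stand-in object. It uses the 'kind:markdown' query that was reported to work earlier in the thread.

```javascript
// Hypothetical helper: all markdown records in one database.
// `app` is assumed to be Application("DEVONthink 3") when run in Script Editor or osascript.
// The `in` parameter expects a record, so the database's root group is passed.
function markdownRecordsIn(app, databaseName) {
    const root = app.databases[databaseName].root();
    return app.search("kind:markdown", { in: root });
}

// Hypothetical helper: calls `callback(record, databaseName)` for every markdown
// record in each of the named databases and returns how many records were visited.
function forEachMarkdownRecord(app, databaseNames, callback) {
    let count = 0;
    for (const name of databaseNames) {
        for (const record of markdownRecordsIn(app, name)) {
            callback(record, name);
            count += 1;
        }
    }
    return count;
}

// Possible use inside DEVONthink (database names are placeholders):
// const app = Application("DEVONthink 3");
// forEachMarkdownRecord(app, ["Personal"], r => { /* set r.creationDate etc. */ });
```

The callback is where the date extraction and the assignments to creationDate and modificationDate from the original script would go.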

Goodness, that’s slooooow. The code you tried to run was this

var date = new Date("2018-01-06 09:46:08 -0800");
console.log(date);

That code runs ok in Firefox and Chrome. In Safari and Script Editor it produces an “Invalid Date”. This behavior is perfectly ok. As the MDN documentation on the constructor of the Date class says:

The JavaScript specification only specifies one format to be universally supported: the date time string format, a simplification of the ISO 8601 calendar date extended format. The format is as follows:
YYYY-MM-DDTHH:mm:ss.sssZ

And further down it says:

You are encouraged to make sure your input conforms to the date time string format above for maximum compatibility, because support for other formats is not guaranteed.

So, Safari and Script Editor behave correctly. Chrome and Firefox are more forgiving in that they support laxer date strings like the one you provided.
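One way to guard against such differences is to accept only the documented format and to check the result explicitly, since the Date constructor does not throw on bad input. The following is a sketch only; isValidDate and parseIsoOrNull are names invented for this example, and the pattern covers just the full date-time form with a “Z” or ±HH:mm offset.

```javascript
// A Date built from an unparseable string has NaN as its time value.
function isValidDate(date) {
    return date instanceof Date && !Number.isNaN(date.getTime());
}

// YYYY-MM-DDTHH:mm:ss with optional milliseconds, followed by Z or a ±HH:mm offset.
const ISO_DATE_TIME = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d{3})?(Z|[+-]\d{2}:\d{2})$/;

// Hypothetical helper: returns a Date for strings in the strict format, otherwise null.
// Strings that an engine might parse leniently are rejected up front,
// so the result does not depend on where the script runs.
function parseIsoOrNull(text) {
    if (!ISO_DATE_TIME.test(text)) return null;
    const date = new Date(text);
    return isValidDate(date) ? date : null;
}
```

With this, the string from the playground, "2018-01-06 09:46:08 -0800", is rejected everywhere instead of working in some engines and failing in others.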

I suggest always checking the documentation at MDN when you run into an error on one platform but not on the other. Apple’s JavaScript implementation is, in my experience, quite up-to-date and close to the standard – much more than Safari’s HTML/CSS implementation. If they decide not to implement more than required, that’s perfectly fine, IMO.

Thanks for the tip on the “in:” qualifier. I’m glad I wasn’t screwing something up. I’ll have to try that new approach. It seems like you’re somehow referencing documentation of which I’m unaware. Could you point me to whatever reference it is you’re using? I’ve been doing things just guessing and searching the forums for lack of better information.

As to the JavaScript bit, I suppose it was silly of me to expect code playgrounds to enforce the language specification. I’m used to languages not being so loose, but then I suppose this is JavaScript we’re talking about. I should have known better. Thanks!

The language isn’t loose by any means. In the case of the Date constructor, it specifies a minimum requirement. Nobody can stop implementors from doing more than that, though.

As to the “playgrounds”: Those are (in my opinion) useless. Just start your browser and open its developer tools. In their “Console”, you can fiddle around with JavaScript as much as you want. And that’s the original, not some kind of plugged-in “playground”. It’s also fast and free. You can also use Apple’s osascript command-line tool:
osascript -l JavaScript file.js
runs the code in file.js.

A future release will include an option to use the document dates (of PDF, rich text, HTML, Markdown, movies, images etc.) if available when importing files instead of the file creation/modification dates. This will also support Markdown frontmatter.

I learned JavaScript when it was a no-types, no-checking, anything goes toy barely capable of making text blink on a horribly designed MySpace page, which seems to have been its only real reason for being at the time. Thankfully, it’s finally turned into something worthwhile in the meantime. I can see why you find the playgrounds useless–it certainly misled me–but they’re easier to work with than a browser IMO for more than a couple lines.

Ah, that would be lovely, though I now have a script that does the job reliably 🙂