I’m having trouble importing a Markdown file with metadata. I have DEVONthink Pro 3.9.6.
Could you help? I’m attaching a test .md file and the script I’m trying.
Script markdown import.zip (6.9 KB)
I’ve tried several ways to automate importing of tags, but it’s not so straightforward.
If I import a Markdown file with hashtags like “#Philosophy/Transcendentalism, #introspection, #divine immanence, #self-reliance, #rejection of external authority”, only the word adjacent to the # is converted to a tag, even though the delimiter should be ", ".
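As observed above, a hashtag only captures the word next to the #, so a comma-separated tag line has to be split on the comma instead. A minimal JavaScript sketch, assuming the tag line has already been isolated from the document (splitTagLine is a hypothetical helper name):

```javascript
// Hypothetical helper: turn a comma-separated tag line into individual tag names.
// Assumes tags are separated by commas (optionally followed by spaces) and may start with "#".
function splitTagLine(line) {
  return line
    .split(",")                        // split on the delimiter, not on whitespace
    .map(tag => tag.trim())            // drop the space that follows each comma
    .map(tag => tag.replace(/^#/, "")) // remove one leading "#", if present
    .filter(tag => tag.length > 0);    // ignore empty entries such as a trailing comma
}

const line = "#Philosophy/Transcendentalism, #introspection, #divine immanence, #self-reliance, #rejection of external authority";
console.log(splitTagLine(line)); // five tags; multi-word tags like "divine immanence" stay intact
```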
I’ve been racking my brain, but can’t come up with anything. I keep getting tag errors. This is the latest script:
use AppleScript version "2.5"
use scripting additions
use framework "Foundation"
on run
-- We assume DEVONthink exists and it is version 3
set devonThink to application id "com.devon-technologies.think3"
tell devonThink
set theDatabase to current database
if theDatabase is missing value then
display alert "Please open a database in DEVONthink 3."
return
end if
set theFiles to choose file with multiple selections allowed
repeat with aFile in theFiles
try
-- Extract content from a file and then get the metadata and cleaned content
set theContent to readFileAsUTF8(aFile)
set {metadata, cleanContent} to extractMetadataAndContent(theContent)
-- Import the cleaned content to DEVONthink
set theRecord to import {name:(getFileName(aFile)), type:markdown, content:cleanContent} to theDatabase
-- Process metadata and Rename the record
tell theRecord
processMetadata(it, metadata)
renameRecord(it, metadata)
end tell
on error errorMessage number errorNumber
display alert "Error processing file " & (POSIX path of aFile) message errorMessage
end try
end repeat
end tell
end run
on readFileAsUTF8(aFile)
set fileURL to current application's NSURL's fileURLWithPath:(POSIX path of aFile)
set {theContent, theError} to current application's NSString's stringWithContentsOfURL:fileURL encoding:(current application's NSUTF8StringEncoding) |error|:(reference)
if theContent is missing value then
if theError is not missing value then
set errorMessage to theError's localizedDescription() as text
else
set errorMessage to "Unknown error"
end if
error "Failed to read file: " & (POSIX path of aFile) & " - " & errorMessage
end if
return theContent as text
end readFileAsUTF8
on extractMetadataAndContent(theContent)
set prevTIDs to AppleScript's text item delimiters
set AppleScript's text item delimiters to "---"
set contentParts to text items of theContent
set AppleScript's text item delimiters to prevTIDs
if (count of contentParts) < 3 then error "Invalid file format"
set metadata to {}
set rawMetadata to item 2 of contentParts
set metadataLines to paragraphs of rawMetadata
repeat with aLine in metadataLines
if aLine contains ": " then
set {key, value} to my splitString(aLine, ": ")
set metadata to metadata & {{key, value}}
end if
end repeat
return {metadata, item 3 of contentParts} -- Clean content is the 3rd item in the list
end extractMetadataAndContent
on processMetadata(theRecord, metadata)
repeat with metaPair in metadata
set {key, value} to metaPair
if key is "tags" then
set tagList to my splitString(value, ", ")
add tagList as tags
else
set custom meta data key to value
end if
end repeat
end processMetadata
on renameRecord(theRecord, metadata)
set newName to ""
repeat with metaPair in metadata
set {key, value} to metaPair
if key is "Title" then
set newName to value
exit repeat
else if key is "Author" and newName is "" then
set newName to value
end if
end repeat
if newName is not "" then set its name to newName
end renameRecord
on splitString(theString, theDelimiter)
set prevTIDs to AppleScript's text item delimiters
set AppleScript's text item delimiters to theDelimiter
set theArray to every text item of theString
set AppleScript's text item delimiters to prevTIDs
return theArray
end splitString
on getFileName(aFile)
return (name of (info for aFile))
end getFileName
First off: your test.md contains a weird character on the first line (before the line beginning with ---). The same character appears after the metadata. I’d get rid of that.
Second: I’d not use AppleScript to handle this kind of rather advanced string processing. It might be possible, but it’s not pretty, and quite complicated. Instead, use JavaScript.
Next, the steps you perform are too complicated. Just import the files into DT and clean them up there; there’s no need for ASObjC just for that. Also, type is an obsolete parameter for import.
Your getFileName handler uses info for, which is deprecated. I guess the name returned by that command is just the filename, which can instead be obtained by a simple string operation (see below in the JS code).
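For reference, that string operation just takes the last path component. A small JavaScript sketch (the helper names and the example path are hypothetical); the second helper also drops the extension, in case the record name should not include .md:

```javascript
// Hypothetical helpers: derive a name from a POSIX path with plain string operations
// instead of the deprecated "info for" command.
function fileNameFromPath(posixPath) {
  return posixPath.split("/").pop();          // last path component, e.g. "test.md"
}

function baseNameWithoutExtension(posixPath) {
  const name = fileNameFromPath(posixPath);
  const dot = name.lastIndexOf(".");
  return dot > 0 ? name.slice(0, dot) : name; // names like ".hidden" are left unchanged
}

console.log(fileNameFromPath("/Users/me/Documents/test.md"));         // test.md
console.log(baseNameWithoutExtension("/Users/me/Documents/test.md")); // test
```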
The JavaScript code could look similar to the following. I didn’t bother much with error checks and am not sure that the decision on a new name follows your logic.
// I leave out the file selection steps, assuming their paths are in the `paths` array
const app = Application("DEVONthink 3");
const database = app.currentDatabase();
/* Regular expression to find the metadata block */
const MDregEx = /^---\n(.*?)---$/ms;
/* Loop over all files */
paths.forEach(p => {
const record = app.import(p,{name: p.split('/').pop(), to: database.root()});
const txt = record.plainText();
// get the metadata and remove them from the text
const match = txt.match(MDregEx);
if (!match) return; // continue with next record if no metadata found
// Remove metadata from MD file
record.plainText = txt.replace(MDregEx, '');
// Build a string array containing one entry for each line of metadata
const metadata = match[1].split('\n');
let newName = undefined;
// Loop over the metadata
metadata.forEach(md => {
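const idx = md.indexOf(':');
if (idx === -1) return; // Skip over lines without a colon
// Split on the first colon only, so values such as "Reference: Book: Chapter 2" stay intact
const key = md.slice(0, idx).trim();
const value = md.slice(idx + 1).trim();
if (key === 'tags') {
record.tags = value.split(',').map(t => t.trim()).filter(t => t); // drop the spaces after commas
} else {
app.addCustomMetaData(value, {for: key, to: record});
// Prefer Title; fall back to Author only if no name has been chosen yet
if (value && (key === 'Title' || (key === 'Author' && !newName))) {
newName = value;
}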
}
})
if (newName) {
record.name = newName;
}
})
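To try the metadata step outside DEVONthink first, here is a plain-JavaScript sketch (runnable with Node) that uses the same regular expression. splitFrontMatter is a hypothetical name, and it assumes "\n" line endings and "key: value" lines; it splits each line on its first colon only, so a value containing a colon keeps it.

```javascript
// Plain-JavaScript version of the metadata step, runnable outside DEVONthink (e.g. with Node).
// Assumptions: "\n" line endings, a block delimited by lines containing only "---",
// and "key: value" lines inside it. splitFrontMatter is a hypothetical helper name.
const MDregEx = /^---\n(.*?)---$/ms;

function splitFrontMatter(txt) {
  const match = txt.match(MDregEx);
  if (!match) return { metadata: {}, body: txt }; // no metadata block: leave the text alone
  const metadata = {};
  for (const line of match[1].split("\n")) {
    const idx = line.indexOf(":");
    if (idx === -1) continue;                      // skip lines without a colon
    const key = line.slice(0, idx).trim();
    metadata[key] = line.slice(idx + 1).trim();    // everything after the FIRST colon
  }
  return { metadata, body: txt.replace(MDregEx, "") };
}

const demo = splitFrontMatter("---\nTitle: A: B\n---\nText");
console.log(demo.metadata.Title); // A: B
```

One caveat of this regular expression: because of the m flag, ^ can match at any line start, so a document without front matter but with two --- horizontal rules would have the text between them treated as metadata. Checking that match.index is 0 would rule that out, at the cost of requiring the block to start on line 1.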
Thanks… I’ll see how I could do it in JavaScript.
The weird characters are actually Unicode characters. I used them because I can’t imagine I’d ever use them in a real text file, and it was an easy way to “encapsulate” the metadata so I could easily delete it later through regex or AppleScript…
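Deleting such sentinel lines later is indeed simple. A minimal JavaScript sketch with a hypothetical stripSentinelLines helper, assuming "\n" line endings; it removes only lines that consist of nothing but the sentinel, and uses a line filter rather than a regex to avoid escaping issues:

```javascript
// Hypothetical helper: delete every line that consists only of the sentinel character,
// such as the "۞" lines placed around the metadata block.
function stripSentinelLines(text, sentinel = "۞") {
  return text
    .split("\n")
    .filter(line => line.trim() !== sentinel) // keep every line that is not just the sentinel
    .join("\n");
}
```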
In essence, I just wanted to import these things into the metadata…
Then I kept running into errors, and just tweaking until I gave up.
Edit
I looked up the available AppleScript commands for DEVONthink, and even referenced the AppleScript 2.5 documentation, but it seems that DEVONthink isn’t using standard AppleScript in certain areas.
If you have an updated reference, please share.
You need to clarify this. DEVONthink certainly uses standard AppleScript but applications have independent implementations of the various functions they want to make scriptable.
Also, according to the script in the ZIP, you are trying to write metadata to Markdown documents. That is not supported, as is evidenced by the lack of functionality in the Info > Properties inspector when a Markdown document is selected.
set devonThink to application id "com.devon-technologies.think3"
The recommended form is application id "DNtp" and there’s no need to set this to a variable. Just tell it directly.
Also, the language you use is a matter of comfort and experience. JavaScript is not for everyone, just as AppleScript or ASOC is not. That being said, this applies custom metadata and tags from a selected Markdown document’s content…
tell application id "DNtp"
repeat with theRecord in (selected records whose (type is markdown))
set src to plain text of theRecord
set mdMarker to false
set od to AppleScript's text item delimiters
repeat with theParagraph in (paragraphs of src)
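-- theParagraph is a reference to a list item, so compare its contents
if ((contents of theParagraph) is "---") and (not mdMarker) then
set mdMarker to true
else if ((contents of theParagraph) is "---") and mdMarker then
exit repeat
else if mdMarker then -- only read lines inside the metadata block
if theParagraph contains ":" then
set AppleScript's text item delimiters to ":"
-- Split on the first colon only; rejoin the rest so values containing colons survive
set theKey to text item 1 of theParagraph
set theValue to (text items 2 thru -1 of theParagraph) as text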
log {theKey, theValue}
if (theKey is not "Tags") then
add custom meta data theValue for theKey to theRecord
else
set AppleScript's text item delimiters to ","
set tagList to (text items of theValue)
set tags of theRecord to (theRecord's tags & tagList)
end if
set AppleScript's text item delimiters to od
end if
end if
end repeat
end repeat
end tell
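For comparison, here is the same paragraph-by-paragraph scan as a plain-JavaScript sketch (runnable with Node, no DEVONthink calls). scanMetadata is a hypothetical name; it assumes "\n" line endings, ignores everything before the opening ---, splits each line on its first colon, and, like AppleScript’s case-insensitive is, treats "tags" and "Tags" alike.

```javascript
// Sketch of the paragraph scan: only lines between the first and second "---" are read.
function scanMetadata(src) {
  const custom = {};
  let tags = [];
  let inBlock = false;                      // plays the role of mdMarker
  for (const paragraph of src.split("\n")) {
    if (paragraph === "---") {
      if (inBlock) break;                   // closing marker: stop scanning
      inBlock = true;                       // opening marker
      continue;
    }
    if (!inBlock) continue;                 // ignore text before the block
    const idx = paragraph.indexOf(":");
    if (idx === -1) continue;               // not a "key: value" line
    const key = paragraph.slice(0, idx).trim();
    const value = paragraph.slice(idx + 1).trim();
    if (key.toLowerCase() === "tags") {     // mimic AppleScript's case-insensitive comparison
      tags = tags.concat(value.split(",").map(t => t.trim()).filter(t => t));
    } else {
      custom[key] = value;
    }
  }
  return { custom, tags };
}

console.log(scanMetadata("---\ntags: a, b c\n---\nBody").tags); // [ 'a', 'b c' ]
```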
It could easily be implemented as a smart script for use with batch processing and smart rules.
I’m wondering if it would just be easier to set IndexRawMarkdownSource to true…
I’m OK with reindexing, or just building a new database, but I don’t want the metadata in the .md file after it’s been imported…
I don’t want to sift through a lot of similar matches.
This is not a guaranteed occurrence. It depends on what you’re searching for and in what scope / context.
If you are planning to use or transmit the Markdown document outside DEVONthink, it would be wise to retain that text. The applied custom metadata in DEVONthink is not going to be used by an app like Typora, etc.
I wasn’t planning on leaving DEVONthink.
Also, I can export the metadata with the text as CSV… Most of my data is short (less than 1k words), so I could just add it back.
The difficult thing is how to add the metadata from the text.
I think I’m going to have to rethink this.
I was able to do the tags effectively, but the other stuff is tricky to do in one go.
By the way, your ۞ symbol on the first line keeps the metadata block from being hidden.
PS: Here is a small modification that also strips the metadata from the text (though I still don’t care for the idea)…
tell application id "DNtp"
repeat with theRecord in (selected records whose (type is markdown))
set src to plain text of theRecord
set {incr, hasMetadata, docModified, mdMarker} to {1, false, false, 1}
set od to AppleScript's text item delimiters
repeat with theParagraph in (paragraphs of src)
if (text of theParagraph is "---") and (not hasMetadata) then
set hasMetadata to true
set mdMarker to incr
else if (text of theParagraph is "---") and hasMetadata then
exit repeat
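else if hasMetadata then -- only read lines inside the metadata block
if theParagraph contains ":" then
set docModified to true
set AppleScript's text item delimiters to ":"
-- Split on the first colon only; rejoin the rest so values containing colons survive
set theKey to text item 1 of theParagraph
set theValue to (text items 2 thru -1 of theParagraph) as text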
if (theKey is not "Tags") then
add custom meta data theValue for theKey to theRecord
else
set AppleScript's text item delimiters to ","
set tagList to (text items of theValue)
set tags of theRecord to (theRecord's tags & tagList)
end if
set AppleScript's text item delimiters to od
end if
end if
set incr to incr + 1
end repeat
---------- Remove this section if you don't want to remove the metadata text in the content.
if docModified and (mdMarker is not 1) then
set AppleScript's text item delimiters to linefeed
set modText to ({paragraphs (mdMarker + incr) thru -1 of src} as string)
set AppleScript's text item delimiters to od
set plain text of theRecord to modText
end if
----------
end repeat
end tell
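The index arithmetic is the fiddly part. In the script above, incr ends on the 1-based number of the closing --- paragraph, so the text after the block starts at paragraph incr + 1. Using mdMarker + incr happens to skip exactly one more line (the trailing ۞) because the opening --- sits on line 2 in the test file, and the mdMarker is not 1 check means nothing is stripped when the block starts on line 1. A minimal JavaScript sketch of the general rule, with a hypothetical bodyAfterMetadata helper (0-based indices, "\n" line endings assumed):

```javascript
// Sketch: return the text that follows the closing "---" of the metadata block.
function bodyAfterMetadata(src) {
  const lines = src.split("\n");
  const open = lines.indexOf("---");            // 0-based index of the opening marker
  if (open === -1) return src;                  // no metadata block: keep everything
  const close = lines.indexOf("---", open + 1); // 0-based index of the closing marker
  if (close === -1) return src;                 // unterminated block: keep everything
  return lines.slice(close + 1).join("\n");     // everything after the closing marker
}
```

Note that this keeps a trailing ۞ line; removing it is a separate, deliberate step.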
Haha! One is more than sufficient.
Make sure you duplicate a few documents and test the script, especially for the content stripping, before committing to using it on production files.
And more importantly to me: do you understand what’s going on in the script, the reasons for the bits and bobs in it?
Oh!
I see… There’s no undo… It’s gone for good!
Yes…
Thank you for the warning.
I’m in Hanoi time, and my brain doesn’t work now.
In case anyone is wondering what I’m doing and why I’m even rambling on about this: I do literary analysis. I have an AI script that takes a text excerpt, processes it, and spits out a template like the one in the test file. This is the prompt. The list of categories can be changed to whatever someone is working with.
You are a master archivist tasked with helping to organize and retrieve excerpts from various readings, primarily in literature but potentially covering a wide range of interests. Your goal is to distill and describe the essence of each excerpt through carefully chosen tags that will facilitate easy retrieval based on concepts.
Here is the excerpt to analyze:
<excerpt>
{{EXCERPT}}
</excerpt>
Your task is to create a set of tags that accurately capture the main concepts and themes of this excerpt. These tags should follow a specific structure and meet certain requirements:
1. Tag Structure:
- Only for the first tag, use the format: tag/subtag
- Separate multiple tags with a comma and space ", "
2. Tag Requirements:
- Only the first tag must be a tag/subtag type
- It must include the most appropriate category from the provided list of Academic Disciplines
- The subtag must be a keyword, or keyword-phrase that best fits the concept in the text excerpt
- Create up to 4 additional tags that continue the pattern of identifying the concept from broad to narrow
B. Metadata Requirements:
1. Provide a title for the excerpt (if not obvious, create a brief descriptive one that could aid in memorization in a declarative sentence form)
2. Provide an author (if known, otherwise leave blank)
3. Provide a Reference (if known, otherwise leave blank)
Here is the list of Academic Disciplines to choose from:
<AcademicDisciplines>
# Philosophy
Aesthetics
Applied philosophy
Philosophy of economics
Philosophy of education
Philosophy of engineering
Philosophy of history
Philosophy of language
Philosophy of law
Philosophy of mathematics
Philosophy of music
Philosophy of psychology
Philosophy of religion
Philosophy of physical sciences
Philosophy of biology
Philosophy of chemistry
Philosophy of physics
Philosophy of social science
Philosophy of technology
Systems philosophy
Political Philosophy
Epistemology
Justification
Reasoning errors
Ethics
Applied ethics
Animal rights
Bioethics
Environmental ethics
Meta-ethics
Moral psychology, Descriptive ethics, Value theory
Normative ethics
Virtue ethics
Logic
Mathematical logic
Philosophical logic
Meta-philosophy
Metaphysics
Philosophy of Action
Determinism and Free will
Ontology
Philosophy of mind
Philosophy of pain
Philosophy of artificial intelligence
Philosophy of perception
Philosophy of space and time
Teleology
Theism and Atheism
Philosophical traditions and schools
African philosophy
Analytic philosophy
Aristotelianism
Continental philosophy
Eastern philosophy
Feminist philosophy
Islamic philosophy
Platonism
Social philosophy and political philosophy
Anarchism
Feminist philosophy
Libertarianism
Marxism
</AcademicDisciplines>
To create the tags:
1. Carefully read and analyze the excerpt
2. Identify the main concepts, themes, and ideas presented
3. Select the most appropriate Academic Discipline that aligns with the excerpt's content
4. Choose a specific subtag that best represents the core concept of the excerpt
5. Create up to 3 additional tags that further refine and narrow down the concepts, moving from broad to specific
Format your final output as follows:
۞
---
Author: [insert author if known]
Title: [brief descriptive, and memorizable title in declarative sentence form]
Reference: [insert if known]
tags: [Insert your tags here, following the specified format and requirements]
---
۞
1. Note that there must be two return characters.
1a. Formatting is markdown YAML.
Ensure that your tags accurately reflect the content of the excerpt and would be useful for retrieving this information later based on its concepts and themes.
Do not comment on the tags.
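Since the output format is fixed by the prompt, the tags line can be checked before import. A hypothetical checkTagLine sketch in JavaScript, assuming that "only the first tag must be a tag/subtag type" means the later tags contain no "/". The prompt asks for "up to 4" additional tags in one place and "up to 3" in another, so the limit is a parameter here:

```javascript
// Hypothetical checker for a model-generated tags line; returns a list of problems.
function checkTagLine(tagLine, maxExtraTags = 4) {
  const tags = tagLine.split(",").map(t => t.trim()).filter(t => t);
  const problems = [];
  if (tags.length === 0) problems.push("no tags");
  else if (!/^[^/]+\/[^/]+$/.test(tags[0])) problems.push("first tag is not tag/subtag");
  if (tags.slice(1).some(t => t.includes("/"))) problems.push("a later tag contains '/'");
  if (tags.length - 1 > maxExtraTags) problems.push("too many additional tags");
  return problems;                              // an empty array means the line passes
}

console.log(checkTagLine("Philosophy, introspection")); // [ 'first tag is not tag/subtag' ]
```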
I strongly recommend you remove the first ۞ and make sure the metadata starts on the very first line of the document. Include the ۞ after the metadata block, if desired. I would also recommend you remove the content-stripping section from the last version of the script I posted.
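For files that already have the ۞ line first, the recommended layout can be produced with a small string operation. A minimal JavaScript sketch (moveMetadataToTop is a hypothetical name; "\n" line endings assumed) that drops sentinel and blank lines before the opening --- and leaves everything after it, including a trailing ۞, untouched:

```javascript
// Sketch: make the metadata block start on line 1 by dropping leading sentinel/blank lines.
function moveMetadataToTop(src, sentinel = "۞") {
  const lines = src.split("\n");
  let first = 0;
  while (first < lines.length &&
         (lines[first].trim() === sentinel || lines[first].trim() === "")) {
    first++;                                    // skip leading sentinel and blank lines
  }
  if (lines[first] !== "---") return src;       // no block right after them: change nothing
  return lines.slice(first).join("\n");
}
```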
PS: Get some sleep. Some problems are better to rest on and come back to afresh.
PPS: The result without the content cutting and the change I recommended…
In case anyone is interested, or in case I somehow delete the script: this new script works as a YAML thingy. Re-running it updates the fields after they change, and it respects the “---” boundary. You can also add your own #tags manually, which will autopopulate if you have creating tags from # set up. It also renames the record to whatever is set in the title.
tell application id "DNtp"
repeat with theRecord in (selected records whose (type is markdown))
-- First clear all existing custom metadata
set custom meta data of theRecord to {}
-- Clear existing tags
set tags of theRecord to {}
set src to plain text of theRecord
set mdMarker to false
set od to AppleScript's text item delimiters
set foundTitle to "" -- Variable to hold the title for renaming
set paraCount to count paragraphs of src
set i to 1
repeat until i > paraCount
set theParagraph to paragraph i of src
if (theParagraph is "---") and (not mdMarker) then
set mdMarker to true
else if (theParagraph is "---") and mdMarker then
exit repeat
else if mdMarker then
if theParagraph contains ":" then
set AppleScript's text item delimiters to ":"
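-- Split on the first colon only; rejoin the rest so values containing colons survive
set theKey to text item 1 of theParagraph
set theValue to (text items 2 thru -1 of theParagraph) as text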
log {theKey, theValue}
if (theKey is "title") then
set foundTitle to theValue
end if
if (theKey is not "Tags") then
add custom meta data theValue for theKey to theRecord
else
set AppleScript's text item delimiters to ","
set tagList to (text items of theValue)
set tags of theRecord to tagList
end if
set AppleScript's text item delimiters to od
end if
end if
set i to i + 1
end repeat
if (foundTitle is not "") then
set name of theRecord to foundTitle
end if
end repeat
end tell