Hi there, here’s a script to convert RTF to MultiMarkdown.
Textutil or pandoc produces a header
CocoaVersion: 1671.5
Generator: Cocoa HTML Writer
which can be turned off by deleting the “-s” option in the pandoc part of the shell script.
But without “-s” I couldn’t find a way to keep first lines that end with a colon: they vanish from the rendered view (I think that’s because they look like MultiMarkdown metadata, key: value, but I’m new to textutil and pandoc…). If you’re sure none of your first lines ends with a colon, remove the “-s” option. In the end I removed the header afterwards with TextSoap, but I’m pretty sure that’s not the normal way to go.
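If you’d rather not reach for TextSoap, the header can also be cut in AppleScript once the .md file exists. This is a minimal sketch, assuming pandoc writes the header as `key: value` lines followed by one empty line and that it always starts with one of the two keys shown above; `stripCocoaHeader` is a hypothetical name. Keep in mind that once the header is gone, a first line ending in a colon is exposed to the metadata problem again, so its colons would still need escaping.

```applescript
-- Hypothetical helper, not part of the script in this post.
-- Assumption: the header is every line before the first empty line and
-- starts with "CocoaVersion:" or "Generator:"; otherwise nothing is removed.
on stripCocoaHeader(theText)
	set theLines to paragraphs of theText
	if (count of theLines) = 0 then return theText
	set firstLine to item 1 of theLines
	if not (firstLine starts with "CocoaVersion:" or firstLine starts with "Generator:") then return theText
	repeat with i from 1 to (count of theLines)
		if item i of theLines is "" then
			if i = (count of theLines) then return "" -- nothing after the header
			set oldDelims to AppleScript's text item delimiters
			set AppleScript's text item delimiters to linefeed
			set theBody to (items (i + 1) thru -1 of theLines) as text
			set AppleScript's text item delimiters to oldDelims
			return theBody
		end if
	end repeat
	return theText -- no empty line found: leave the text untouched
end stripCocoaHeader
```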
-- Convert RTF (→ textutil → HTML → pandoc →) to MultiMarkdown
tell application "Finder"
try
set theTempFolder to make new folder in desktop with properties {name:"TEMP - RTF to MultiMarkdown"}
on error
display notification "Folder already exists!"
return
end try
end tell
tell application id "DNtp"
try
set windowClass to class of window 1
if {viewer window, search window} contains windowClass then
set currentRecord_s to selection of window 1
else if windowClass = document window then
set currentRecord_s to content record of window 1 as list
end if
set theTempGroup to indicate (POSIX path of (path to desktop) & "TEMP - RTF to MultiMarkdown/") to incoming group
set theOutputGroup to display group selector "Output to:"
repeat with thisRecord in currentRecord_s
if type of thisRecord = rtf then
try
tell thisRecord
set theRTFURL to URL
set theRTFCreationDate to creation date
set theRTFAdditionDate to addition date
set theRTFModificationDate to modification date
set theRTFComment to comment
end tell
set thePath to path of thisRecord
set theName to name of thisRecord
set theNameWithoutExtension to my Basename(theName)
if theNameWithoutExtension contains "/" then set theNameWithoutExtension to my encode_Text(theNameWithoutExtension, true, false)
if (count of characters in theNameWithoutExtension) > 250 then set theNameWithoutExtension to (characters 1 thru 250 in theNameWithoutExtension as string)
set theOutputPath to (POSIX path of (path to desktop) & "TEMP - RTF to MultiMarkdown/") & theNameWithoutExtension & ".md"
set theShellScript to "textutil " & quoted form of thePath & " -strip -convert html -stdout | /usr/local/bin/pandoc -t markdown_mmd --wrap=preserve -s -o " & quoted form of theOutputPath & " -f html-native_divs-native_spans" -- quoted form of keeps paths with spaces or apostrophes intact
set convertToMultiMarkdown to do shell script theShellScript
repeat with i from 1 to 20
try
set theIndexedRecord to (child 1 of theTempGroup)
exit repeat
on error
delay 1.5
end try
end repeat
set moveIntoDatabase to consolidate record theIndexedRecord
set theMultiMarkdownRecord to move record theIndexedRecord to theOutputGroup -- move returns the moved record, so there's no need to guess its position in the group
tell theMultiMarkdownRecord
set URL to theRTFURL
set creation date to theRTFCreationDate
set addition date to theRTFAdditionDate
set modification date to theRTFModificationDate
set comment to theRTFComment
end tell
on error
set label of thisRecord to 1
end try
end if
end repeat
set cleanUpDEVONthink to delete record theTempGroup
open window for record theOutputGroup
activate
on error error_message number error_number
if the error_number is not -128 then display alert "DEVONthink" message error_message as warning
end try
end tell
tell application "Finder" to set cleanUpFinder to delete theTempFolder -- delete TEMP folder
on Basename(filename)
set revName to reverse of characters of filename as string
set revNameWithoutExtension to characters ((character offset of "." in revName) + 1) thru -1 in revName as string
set theBasename to reverse of characters of revNameWithoutExtension as string
end Basename
on encode_Text(theText, encodeCommonSpecialCharacters, encodeExtendedSpecialCharacters)
set theStandardCharacters to "abcdefghijklmnopqrstuvwxyz0123456789"
set theCommonSpecialCharacterList to "$+!'/?;&@=#%><{}\"~`^\\|*"
set theExtendedSpecialCharacterList to ".-_:"
set theAcceptableCharacters to theStandardCharacters
if encodeCommonSpecialCharacters is false then set theAcceptableCharacters to theAcceptableCharacters & theCommonSpecialCharacterList
if encodeExtendedSpecialCharacters is false then set theAcceptableCharacters to theAcceptableCharacters & theExtendedSpecialCharacterList
set theEncodedText to ""
repeat with theCurrentCharacter in theText
if theCurrentCharacter is in theAcceptableCharacters then
set theEncodedText to (theEncodedText & theCurrentCharacter)
else
set theEncodedText to (theEncodedText & encodeCharacter(theCurrentCharacter)) as string
end if
end repeat
return theEncodedText
end encode_Text
on encodeCharacter(theCharacter)
set theASCIINumber to (the ASCII number theCharacter)
set theHexList to {"0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "A", "B", "C", "D", "E", "F"}
set theFirstItem to item ((theASCIINumber div 16) + 1) of theHexList
set theSecondItem to item ((theASCIINumber mod 16) + 1) of theHexList
return ("%" & theFirstItem & theSecondItem) as string
end encodeCharacter
It needs Pandoc and RegexAndStuffLib installed (put the “RegexAndStuffLib” script in /Users/Username/Library/Script Libraries/).
There’s an option to remove the empty lines that pandoc produces (removing lines afterwards isn’t ideal, but I couldn’t find a pandoc option to avoid them…). If the resulting markdown record doesn’t look similar to the RTF record in unrendered view, try again with removeEmptyLines set to false.
Make sure to uncomment / add all properties you’d like the markdown record to inherit from the RTF record.
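For readers without RegexAndStuffLib: the first replacement in the removeEmptyLines step turns each empty line between paragraphs into a Markdown hard line break (two trailing spaces plus a linefeed), so lines that belonged together in the RTF stay together. This is a minimal sketch of just that step with plain text item delimiters; `replaceText` and `collapseBlankLines` are hypothetical names, and the second regex (clearing lines that contain only spaces) is not reproduced here.

```applescript
-- Hypothetical generic search/replace using text item delimiters.
on replaceText(theText, searchString, replacementString)
	set oldDelims to AppleScript's text item delimiters
	set AppleScript's text item delimiters to searchString
	set theItems to text items of theText
	set AppleScript's text item delimiters to replacementString
	set theResult to theItems as text
	set AppleScript's text item delimiters to oldDelims
	return theResult
end replaceText

-- Every "\n\n" becomes "  \n", i.e. a Markdown hard line break.
on collapseBlankLines(theText)
	return replaceText(theText, linefeed & linefeed, space & space & linefeed)
end collapseBlankLines
```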
-- Convert RTF to MultiMarkdown (via textutil and pandoc)
-- This script needs Pandoc (https://pandoc.org/installing.html) and RegexAndStuffLib (https://latenightsw.com/support/freeware/) installed.
-- It does not support RTFD
use scripting additions
use script "RegexAndStuffLib" version "1.0.6"
property removeEmptyLines : true
tell application id "DNtp"
try
set windowClass to class of window 1
if {viewer window, search window} contains windowClass then
set currentRecord_s to selection of window 1
else if windowClass = document window then
set currentRecord_s to content record of window 1 as list
end if
set theOutputGroup to display group selector
set displaySuffix to do shell script "defaults read com.devon-technologies.think3 DisplaySuffix"
show progress indicator "Converting... " steps (count of currentRecord_s) with cancel button
repeat with thisRecord in currentRecord_s
if type of thisRecord = rtf then
try
if displaySuffix = "0" then -- do shell script returns text, so compare with "0", not the number 0
set theName to name of thisRecord
else
set theName to my basename(name of thisRecord)
end if
step progress indicator theName
if theName contains "/" then
set theName to my encode_Text(theName, true, true) -- encode in case the name contains e.g. an url
set encodedName to true
else
set encodedName to false
end if
set thePath to path of thisRecord
set theOutputPath to (POSIX path of (path to temporary items folder) & theName & ".md") as string
set convertToMultiMarkdown to do shell script "textutil " & quoted form of thePath & " -convert html -stdout | /usr/local/bin/pandoc -f html-native_divs-native_spans -t markdown_mmd --wrap=preserve -o " & quoted form of theOutputPath
set newRecord to indicate theOutputPath to theOutputGroup
consolidate record newRecord
tell application "Finder" to delete file (POSIX file theOutputPath as alias)
tell newRecord
set URL to (URL of thisRecord)
set comment to (comment of thisRecord)
#set creation date to (creation date of thisRecord)
#set addition date to (addition date of thisRecord)
#set modification date to (modification date of thisRecord)
set theText to plain text
set firstLine to paragraph 1 in theText
if firstLine contains ":" then
set escapedFirstLine to regex change firstLine search pattern (":") replace template ("\\\\:")
set escapedText_List to ((escapedFirstLine as list) & paragraphs 2 thru -1 in theText) as list
set escapedText to my string_From_List(escapedText_List, linefeed)
set plain text to escapedText
set theText to plain text
end if
if removeEmptyLines = true then
set cleanText_1 to regex change theText search pattern ("\\n\\n") replace template (space & space & linefeed)
set cleanText_2 to regex change cleanText_1 search pattern ("^ +$") replace template ("")
set plain text to cleanText_2
end if
if encodedName = true then
set name to my decode_Text(name)
end if
end tell
on error
set label of thisRecord to 1
end try
end if
end repeat
hide progress indicator
open window for record theOutputGroup
activate
on error error_message number error_number
hide progress indicator
if the error_number is not -128 then display alert "DEVONthink" message error_message as warning
return
end try
end tell
on basename(filename)
set revName to reverse of characters of filename as string
set revNameWithoutExtension to characters ((character offset of "." in revName) + 1) thru -1 in revName as string
set theBasename to reverse of characters of revNameWithoutExtension as string
end basename
on encode_Text(theText, encodeCommonSpecialCharacters, encodeExtendedSpecialCharacters)
set theStandardCharacters to "abcdefghijklmnopqrstuvwxyz0123456789"
set theCommonSpecialCharacterList to "$+!'/?;&@=#%><{}\"~`^\\|*"
set theExtendedSpecialCharacterList to ".-_:"
set theAcceptableCharacters to theStandardCharacters
if encodeCommonSpecialCharacters is false then set theAcceptableCharacters to theAcceptableCharacters & theCommonSpecialCharacterList
if encodeExtendedSpecialCharacters is false then set theAcceptableCharacters to theAcceptableCharacters & theExtendedSpecialCharacterList
set theEncodedText to ""
repeat with theCurrentCharacter in theText
if theCurrentCharacter is in theAcceptableCharacters then
set theEncodedText to (theEncodedText & theCurrentCharacter)
else
set theEncodedText to (theEncodedText & encodeCharacter(theCurrentCharacter)) as string
end if
end repeat
return theEncodedText
end encode_Text
on encodeCharacter(theCharacter)
set theASCIINumber to (the ASCII number theCharacter)
set theHexList to {"0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "A", "B", "C", "D", "E", "F"}
set theFirstItem to item ((theASCIINumber div 16) + 1) of theHexList
set theSecondItem to item ((theASCIINumber mod 16) + 1) of theHexList
return ("%" & theFirstItem & theSecondItem) as string
end encodeCharacter
on decode_Text(theText)
local str
try
return (do shell script "/bin/echo " & quoted form of theText & ¬
" | perl -MURI::Escape -lne 'print uri_unescape($_)'")
on error eMsg number eNum
error "Can't urlDecode: " & eMsg number eNum
end try
end decode_Text
on string_From_List(theList, theDelimiter)
set theString to ""
set theCount to 0
repeat with thisItem in theList
set theCount to theCount + 1
set thisItem to thisItem as string
if theCount ≠ (count of theList) then
set theString to theString & thisItem & theDelimiter
else
set theString to theString & thisItem
end if
end repeat
return theString
end string_From_List
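A side note on encode_Text/encodeCharacter: `ASCII number` is deprecated and works with the system’s legacy 8-bit encoding, so for a name that contains both “/” and non-ASCII characters (e.g. accented letters) the %XX codes may not decode back correctly, because perl’s `uri_unescape` turns them into raw bytes that `do shell script` then reads as UTF-8. Below is a sketch of an alternative using Foundation through AppleScriptObjC, assuming macOS 10.10 or later; the handler names are hypothetical. Letters and digits (accented ones included) are kept, everything else, “/” included, becomes UTF-8 %XX escapes.

```applescript
use AppleScript version "2.4" -- AppleScriptObjC in ordinary scripts needs macOS 10.10+
use framework "Foundation"
use scripting additions

-- Hypothetical replacement for encode_Text: keeps Unicode letters and digits,
-- percent-encodes everything else as UTF-8.
on percentEncode(theText)
	set allowedCharacters to current application's NSCharacterSet's alphanumericCharacterSet()
	set theString to current application's NSString's stringWithString:theText
	return (theString's stringByAddingPercentEncodingWithAllowedCharacters:allowedCharacters) as text
end percentEncode

-- Hypothetical replacement for decode_Text; returns missing value for malformed %XX sequences.
on percentDecode(theText)
	set theString to current application's NSString's stringWithString:theText
	set theDecoded to theString's stringByRemovingPercentEncoding()
	if theDecoded is missing value then return missing value
	return theDecoded as text
end percentDecode
```

Inside the DEVONthink tell block they would be called with `my`, e.g. `set theName to my percentEncode(theName)`. A script that uses `use framework` also needs `use scripting additions`, which this script already has.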
THANK YOU! It works smoothly for RTF files that have no images, and that’s already perfect for my purpose.
I learnt quite a few good tricks on system-level file manipulation from reading the code!
This is the first time I’ve seen Pandoc at work, and it is powerful.
I think I understand most of your program flow, but I hope you won’t mind me asking two questions:
Why does the script need “DisplaySuffix” and use it as a condition for whether or not to change “theName” with basename()? Perhaps it’s more for your specific settings?
set displaySuffix to do shell script "defaults read com.devon-technologies.think3 DisplaySuffix"
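On the DisplaySuffix question: `do shell script` always returns text, and AppleScript’s `=` doesn’t coerce between text and numbers, so comparing the result with the number 0 is never true; the comparison has to be against "0". `defaults read` also fails when the key has never been written. This is a sketch that handles both; the handler name and the fallback for a missing key are assumptions.

```applescript
-- Hypothetical helper: true if DEVONthink 3 is set to display filename suffixes.
-- Assumption: a missing DisplaySuffix key is treated as "suffixes shown".
on suffixesAreShown()
	try
		set theValue to do shell script "defaults read com.devon-technologies.think3 DisplaySuffix"
	on error
		return true -- key not set (or defaults failed)
	end try
	-- do shell script returns text, so compare with "0", not with the number 0
	return theValue is not "0"
end suffixesAreShown
```

The check before `set theName to name of thisRecord` could then read `if not my suffixesAreShown() then`.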
I wonder why the script needs regex and this block if there’s a “:” in the “plain text”. I’m asking because the script still works as expected, and still retains/converts all DT links to Markdown format, when I comment out the block. EDITED: is it to keep Markdown from interpreting a first line containing “:” as metadata?
if firstLine contains ":" then
set escapedFirstLine to regex change firstLine search pattern (":") replace template ("\\\\:")
set escapedText_List to ((escapedFirstLine as list) & paragraphs 2 thru -1 in theText) as list
set escapedText to my string_From_List(escapedText_List, linefeed)
set plain text to escapedText
set theText to plain text
end if
Yes, it’s to avoid interpretation as metadata. For other readers:
MultiMarkdown treats a first line containing a : as metadata and hides it in rendered view (see MultiMarkdown Syntax Guide). When converting from RTF we don’t want a first line that contains a : to be hidden; escaping prevents this. In short:
If the first line in the resulting markdown record contains a : and also contains formatting, there’s no problem.
If the first line isn’t formatted, it will be treated as metadata unless we escape the :.
The easiest way to handle this is to always escape if there’s a colon.
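For anyone who’d rather not install RegexAndStuffLib just for this, here is a sketch of the same first-line escaping with plain text item delimiters; `escapeFirstLineColons` is a hypothetical name. Like the regex in the later RTFD version of the script, it leaves the colon in “://” alone so URLs in the first line stay intact; unlike that regex, it doesn’t check for colons that are already escaped.

```applescript
-- Hypothetical handler: escape every ":" in the first line as "\:" so
-- MultiMarkdown doesn't read the line as "key: value" metadata.
-- Colons that are part of "://" (URLs) are left alone.
-- Note: lines are re-joined with linefeed, which normalizes line endings.
on escapeFirstLineColons(theText)
	if theText is "" then return theText
	set theLines to paragraphs of theText
	set firstLine to item 1 of theLines
	if firstLine does not contain ":" then return theText
	set oldDelims to AppleScript's text item delimiters
	-- split around "://" first, so those colons are never touched
	set AppleScript's text item delimiters to "://"
	set urlParts to text items of firstLine
	set escapedParts to {}
	repeat with thisPart in urlParts
		set AppleScript's text item delimiters to ":"
		set colonParts to text items of (contents of thisPart)
		set AppleScript's text item delimiters to "\\:" -- backslash + colon
		set end of escapedParts to colonParts as text
	end repeat
	set AppleScript's text item delimiters to "://"
	set item 1 of theLines to escapedParts as text
	set AppleScript's text item delimiters to linefeed
	set theResult to theLines as text
	set AppleScript's text item delimiters to oldDelims
	return theResult
end escapeFirstLineColons
```

Inside `tell newRecord` it could replace the whole block that starts with `if firstLine contains ":" then`: `set plain text to my escapeFirstLineColons(plain text)`.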
It is. There are so many options that I haven’t read the whole User’s Guide yet. There might be formatting in your RTFs that the script doesn’t cover, so it’s a good idea to read the guide and add whatever options you need.
I’ve found an option that might make it possible to convert RTFDs too:
--extract-media=DIR
Extract images and other media contained in or linked from the
source document to the path DIR, creating it if necessary, and
adjust the images references in the document so they point to
the extracted files. If the source format is a binary container
(docx, epub, or odt), the media is extracted from the container
and the original filenames are used. Otherwise the media is
read from the file system or downloaded, and new filenames are
constructed based on SHA1 hashes of the contents.
-- Convert RTF to MultiMarkdown (via textutil and pandoc)
-- This script needs Pandoc (https://pandoc.org/installing.html) and RegexAndStuffLib (https://latenightsw.com/support/freeware/).
-- This version converts RTF and RTFD - but only images are preserved, other attachments are not supported!
use scripting additions
use script "RegexAndStuffLib" version "1.0.6"
property moveMarkdownRecord : false -- set to true if you want markdown and image records in one group
property removeEmptyLines : false
tell application id "DNtp"
try
set windowClass to class of window 1
if {viewer window, search window} contains windowClass then
set currentRecord_s to selection of window 1
else if windowClass = document window then
set currentRecord_s to content record of window 1 as list
end if
set theDestinationGroup to display group selector
set tempPath to POSIX path of (path to temporary items folder)
show progress indicator "Converting... " steps (count of currentRecord_s) with cancel button
repeat with thisRecord in currentRecord_s
if (type of thisRecord) is in {rtf, rtfd} then
set theName to my recordName(name of thisRecord, filename of thisRecord)
step progress indicator theName
set tempName to do shell script "date \"+%Y%m%d%H%M%S\""
if (type of thisRecord) = rtf then
try
set thePath to path of thisRecord
set theOutputPath to (tempPath & tempName & ".md") as string
set convertToMultiMarkdown to do shell script "textutil " & quoted form of thePath & " -convert html -stdout | /usr/local/bin/pandoc -f html-native_divs-native_spans -t markdown_mmd --wrap=preserve -o " & quoted form of theOutputPath
set newRecord to indicate theOutputPath to theDestinationGroup
tell application "Finder" to delete file (POSIX file theOutputPath as alias)
on error
set label of thisRecord to 1
end try
else
try
set theSource to source of thisRecord
set theSourcePath to (tempPath & tempName & ".html") as string
set theOutputPath to (tempPath & tempName & ".md") as string
set theExtractionPath to (tempPath & tempName) as string
set createExtractionFolder to do shell script "mkdir -p " & quoted form of theExtractionPath
set sourceFile to open for access theSourcePath with write permission
write theSource as «class utf8» to sourceFile
close access sourceFile
set convertToMultiMarkdown to do shell script "/usr/local/bin/pandoc -f html-native_divs-native_spans -t markdown_mmd --wrap=preserve --extract-media=" & quoted form of theExtractionPath & " -o " & quoted form of theOutputPath & " " & quoted form of theSourcePath
set newRecord to indicate theOutputPath to theDestinationGroup
set theGroup to indicate theExtractionPath to theDestinationGroup
tell application "Finder"
delete folder (POSIX file theExtractionPath as alias)
delete file (POSIX file theSourcePath as alias)
delete file (POSIX file theOutputPath as alias)
end tell
set name of theGroup to (theName & ".md") as string
if moveMarkdownRecord = true then move record newRecord to theGroup
set theText to plain text of newRecord
set theParagraphs to paragraphs of theText
set theText_List to {}
repeat with thisParagraph in theParagraphs
set thisParagraph to thisParagraph as string
if thisParagraph contains theExtractionPath then
set theFilename to item 1 of (regex search thisParagraph search pattern "(?<=/)[a-z|0-9]{40}\\.(.*?)(?=\\))" as string)
repeat with thisChild in (children of theGroup)
if (filename of thisChild) = theFilename then
set replaceLink to regex change thisParagraph search pattern "(?<=!?\\[\\]\\()(.*?)" & theFilename & "(?=\\))" replace template (reference URL of thisChild)
set end of theText_List to replaceLink
exit repeat
end if
end repeat
else
set end of theText_List to thisParagraph
end if
end repeat
set plain text of newRecord to my string_From_List(theText_List, linefeed)
on error
set label of thisRecord to 1
end try
end if
set hasNewRecord to false
try
set hasNewRecord to (newRecord is not missing value) -- errors (and is skipped) if the very first conversion already failed
end try
if hasNewRecord then
tell newRecord
set name to (theName & ".md") as string
set URL to (URL of thisRecord)
set creation date to (creation date of thisRecord)
set addition date to (addition date of thisRecord)
set modification date to (modification date of thisRecord)
set comment to (comment of thisRecord)
set theText to plain text
set firstLine to paragraph 1 in theText
if firstLine contains ":" then
set escapedFirstLine to regex change firstLine search pattern ("(?<!\\\\):(?!//)") replace template ("\\\\:")
set escapedText_List to ((escapedFirstLine as list) & paragraphs 2 thru -1 in theText) as list
set escapedText to my string_From_List(escapedText_List, linefeed)
set plain text to escapedText
set theText to plain text
end if
if removeEmptyLines = true then
set cleanText_1 to regex change theText search pattern ("\\n\\n") replace template (space & space & linefeed)
set cleanText_2 to regex change cleanText_1 search pattern ("^ +$") replace template ("")
set plain text to cleanText_2
end if
end tell
set newRecord to missing value -- reset, so a failed conversion in the next iteration can't rename this record
end if
end if
end repeat
hide progress indicator
open window for record theDestinationGroup
activate
on error error_message number error_number
hide progress indicator
tell application "Finder"
try
delete folder (POSIX file theExtractionPath as alias)
delete file (POSIX file theSourcePath as alias)
delete file (POSIX file theOutputPath as alias)
end try
end tell
if the error_number is not -128 then display alert "DEVONthink" message error_message as warning
return
end try
end tell
on recordName(theName, theFilename)
set revName to reverse of (characters of theName) as string
set suffixName to reverse of (characters 1 thru ((character offset of "." in revName) - 1) in revName) as string
set revFileName to reverse of (characters of theFilename) as string
set suffixFileName to reverse of (characters 1 thru ((character offset of "." in revFileName) - 1) in revFileName) as string
if suffixName = suffixFileName then set theName to reverse of (characters ((character offset of "." in revName) + 1) thru -1 in revName) as string
return theName
end recordName
on string_From_List(theList, theDelimiter)
set theString to ""
set theCount to 0
repeat with thisItem in theList
set theCount to theCount + 1
set thisItem to thisItem as string
if theCount ≠ (count of theList) then
set theString to theString & thisItem & theDelimiter
else
set theString to theString & thisItem
end if
end repeat
return theString
end string_From_List
If I wanted the markdown file to be created next to the original RTF file, I guess I’d have to set theDestinationGroup to the group that contains the current record. Any chance you could show me how to modify the code accordingly?
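One way to do it, assuming DEVONthink 3: remove the `set theDestinationGroup to display group selector` line and pick the destination per record at the top of the repeat loop instead. A record can be replicated into several groups; this sketch simply uses its first parent and falls back to the database’s top level if no parent group can be found. `destinationFor` is a hypothetical helper.

```applescript
-- Hypothetical helper: the group a record lives in (its first parent),
-- or the database's root if no parent group can be found.
on destinationFor(theRecord)
	tell application id "DNtp"
		try
			return parent 1 of theRecord
		on error
			return root of (database of theRecord)
		end try
	end tell
end destinationFor

-- Usage inside the existing loop (DEVONthink 3):
tell application id "DNtp"
	repeat with thisRecord in (selected records)
		if (type of thisRecord) is in {rtf, rtfd} then
			set theDestinationGroup to my destinationFor(thisRecord)
			-- ...convert as before; "indicate ... to theDestinationGroup"
			-- now puts the .md next to the RTF
		end if
	end repeat
end tell
```

Because the destination now changes per record, the `open window for record theDestinationGroup` line at the end opens the group of the last converted record.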
In case you’re trying to run the script I posted in this thread, you’ll find that it doesn’t work in DEVONthink 3.6.
That’s due to DEVONthink’s new handling of “invalid arguments”. After the release of DEVONthink 3 I decided to keep using “search window” in scripts so that DEVONthink 2 users could use them in, well, search windows. With version 3.6 that’s not possible anymore.
If you want to use the script, you’ll have to replace this voluminous block …
set windowClass to class of window 1
if {viewer window, search window} contains windowClass then
set currentRecord_s to selection of window 1
else if windowClass = document window then
set currentRecord_s to content record of window 1 as list
end if
… with this neat line …
set currentRecord_s to selected records
… which does what the six lines have done. Wow, that’s great!