Convert Markdown to PDF / DOCX in Devonthink using Pandoc

Silverstone · March 16, 2019, 11:03pm

EDIT: For the latest working script and examples of converted files, scroll down to my latest posts.

Hello everyone!

I have a bash script that converts MD to PDF using pandoc. I also automated it for Path Finder using AppleScript and Keyboard Maestro, but I want to do it right in DEVONthink. I’ve seen the DTPO script that converts files to PDF, but I don’t know how to make it work with my shell script. Here it is:

$ export PATH=/Library/TeX/texbin:$PATH
$ /usr/local/bin/pandoc "Path to MD file" -s -o "Path for output PDF file" --pdf-engine=xelatex --toc
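For reference, the two commands above can be wrapped in a small shell function, so that a caller (Keyboard Maestro, or AppleScript's do shell script) only has to pass one quoted path. This is a sketch: the names md2pdf and pdf_path_for are hypothetical, and it assumes pandoc in /usr/local/bin and MacTeX in /Library/TeX/texbin, as in the original commands.

```shell
#!/bin/sh
# Hypothetical helper: "Name.md" -> "Name.pdf" in the same folder.
# Only a trailing ".md" is replaced; any other name just gets ".pdf" appended.
pdf_path_for() {
  printf '%s\n' "${1%.md}.pdf"
}

# Hypothetical wrapper around the two commands above.
# Usage: md2pdf "Path to MD file"
md2pdf() {
  md_in=$1
  pdf_out=$(pdf_path_for "$md_in")
  # The PATH assignment applies to this pandoc call only, so pandoc can
  # find xelatex in the MacTeX folder without changing the calling shell.
  PATH=/Library/TeX/texbin:$PATH /usr/local/bin/pandoc "$md_in" -s -o "$pdf_out" \
    --pdf-engine=xelatex --toc
}
```

With this, md2pdf "$HOME/Notes/My note.md" would write "My note.pdf" next to the source file.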

I’ve uploaded my KM macro in case anyone would like to play with it. You can make it work from Finder or another app, but you need to install pandoc and TeX.
Markdown to PDF.kmmacros.zip (1.6 KB)

So, I’d appreciate it if anyone could help me make this shell script work with the AppleScript we have for converting to PDF in DEVONthink.

BLUEFROG · March 17, 2019, 12:05am

What part specifically are you needing assistance with - invoking a shell command or DEVONthink specifics?

Silverstone · March 17, 2019, 6:39am

I would like to incorporate the shell script above into the current DEVONthink AppleScript that converts files to PDF (it is in the Convert folder of the DTPO scripts).

I know how to make it work with Finder, but I need it to work with DTPO directly.

cgrunenberg · March 18, 2019, 7:41am

BTW:
A future release will support conversion to PDFs both via the user interface and via AppleScript without third-party tools.

zeitlings · March 18, 2019, 1:41pm

To use pandoc, wouldn’t you have to call Terminal?

Maybe you can work with something along these lines:

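tell application id "DNtp"
	-- selection is a list, so take its first record
	set theItem to item 1 of (selection as list)
	set itemPath to path of theItem
	set itemName to name of theItem
end tell
set outPath to POSIX path of (path to desktop folder) & itemName & ".pdf"

tell application "Terminal"
	-- quoted form of keeps spaces and special characters in the paths intact
	set currentTab to do script ("pandoc --wrap=preserve -s " & quoted form of itemPath & " -o " & quoted form of outPath)
	delay 5
	do script ("exit") in currentTab
end tell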

Using do shell script with curl, you could make use of e.g. Docverter, somewhere along these lines:
(https://docverter.com/api/)

set outPath to POSIX path of (path to desktop folder) & itemName & ".pdf"
do shell script "curl --form from=markdown --form to=pdf --form " & quoted form of ("input_files[]=@" & itemPath) & " -o " & quoted form of outPath & " http://c.docverter.com/convert && open " & quoted form of outPath

DTPO has no idea what to do with the received PDF, though. I don’t know how to fetch the file and place it inside the current group. Perhaps you can get the location of the file and use it to construct a path. I’m too unfamiliar with the architecture, though, sorry.

set itemLocation to location of theItem
set parentPath to id of current group

None of this is working code, merely some ideas.

Edit:
Using the “how to get the path to a group” thread, I reckoned the Terminal approach might work, but in the end I’m prompted with some Unicode problem.

Silverstone · March 18, 2019, 6:36pm

Thanks all for the answers,

Finally, I did it!
So, if anyone has all the pandoc stuff set up, with its extremely flexible, fine-tunable and beautiful PDF output, you may use this script (thanks to Christian Grunenberg for the original script):

-- Convert Markdown documents to Pandoc PDFs (using XeTeX)
-- Created by Christian Grunenberg on Mon Dec 01 2008.
-- Copyright (c) 2008-2011. All rights reserved.
-- Slightly changed by Silverstone on March 18 2019, 
-- All copyrights go to great DEVONtech Team ;)

tell application id "com.devon-technologies.thinkpro2"
	try
		set theSelection to the selection
		if theSelection is not {} then
			show progress indicator "Converting..." steps (count of theSelection)
			set theWindow to missing value
			repeat with theRecord in theSelection
				set theName to (name of theRecord) as string
				step progress indicator theName
				if cancelled progress then exit repeat
				
				set theType to type of theRecord
				if theType is not group and theType is not feed and theType is not smart group then
					if theWindow is missing value then
						set theWindow to think window of (open tab for record theRecord)
					else
						set record of theWindow to theRecord
					end if
					
					repeat while loading of theWindow
						delay 0.5
					end repeat
					
					set Path_to_MD to path of theRecord
-- Setup Your Temporary Folder Here:
					set theOutput to "/Users/ilya/Documents/00_Temp/" & theName & ".pdf"
					
-- Construct your personal command line options here:
					do shell script "export PATH=/Library/TeX/texbin:$PATH && /usr/local/bin/pandoc " & Path_to_MD & " -s -o " & theOutput & " --pdf-engine=xelatex --toc"
					
					try
						set theParents to parents of theRecord
						set thePDF to import theOutput to (item 1 of theParents) name theName
						
						repeat with i from 2 to (count of theParents)
							replicate record thePDF to (item i of theParents)
						end repeat
						
						set URL of thePDF to URL of theRecord
						set creation date of thePDF to creation date of theRecord
						set modification date of thePDF to modification date of theRecord
						set comment of thePDF to comment of theRecord
						set label of thePDF to label of theRecord
					end try
					tell application "Finder" to delete theOutput as POSIX file
				end if
			end repeat
			if theWindow is not missing value then close theWindow saving no
			hide progress indicator
		end if
	on error error_message number error_number
		hide progress indicator
		if the error_number is not -128 then display alert "DEVONthink Pro" message error_message as warning
	end try
end tell

Just a few words about tuning:

You need to set up your own temporary folder (the place is marked in the script), which is used for creating the PDFs during the conversion. They are deleted after the import. I just don’t know of a shorter or more effective way to do it.
You need to set up the Pandoc converter and create the PDF templates you like (unlimited possibilities: text, titles, graphics, a table of contents both inside the PDF and as an outline, notes, math, bibliography, pagination and all that pro typography stuff). All links and cross-links are fully preserved. Along with templates, you may use your favourite Pandoc command-line options (the place is marked in the script).
The script reproduces all replicants (if any) of the source MD, as well as the other important metadata (creation and modification dates, label, URL and comment).
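On the temporary folder question, here is a sketch of one shorter route: let mktemp create a fresh private folder and let a trap remove it afterwards. From AppleScript the same kind of folder can be created with set tmpDir to do shell script "mktemp -d" (or, without the shell, POSIX path of (path to temporary items)). The variable names below are illustrative only.

```shell
#!/bin/sh
# Create a private scratch folder instead of a fixed, user-specific one
# such as /Users/ilya/Documents/00_Temp/. mktemp -d makes a new directory
# that only the current user can access and prints its path.
scratch=$(mktemp -d "${TMPDIR:-/tmp}/md2pdf.XXXXXX") || exit 1

# Delete the folder (and anything left in it) when this script exits.
trap 'rm -rf "$scratch"' EXIT

# Example output location for one converted document.
pdf_out="$scratch/My note.pdf"
printf 'pandoc output would go to: %s\n' "$pdf_out"
```

Because every run gets its own folder, two conversions running at the same time cannot overwrite each other's PDFs.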

Create fully functional, professional-looking PDFs from your working MDs, right in your groups, with one click. A good solution for those who rely heavily in their workflow on the nice DTPO and DTTG feature of clipping web pages as clutter-free Markdown.

Happy experimenting!

PS
@cgrunenberg, could you please take a look and say whether all is good with this script? I tested it and it works fine and is stable. I’m asking just in case I didn’t take into account some deeper DEVONthink matters.

cgrunenberg · March 19, 2019, 7:30am

Loading the document in a window is actually unnecessary in this case, as the rendered document isn’t used. In addition, the bundle identifier shouldn’t be used to script DEVONthink, so that scripts are compatible with future versions/editions. Here’s a revised script:

-- Convert Markdown documents to Pandoc PDFs (using XeTeX)
-- Created by Christian Grunenberg on Mon Dec 01 2008.
-- Copyright (c) 2008-2011. All rights reserved.
-- Slightly changed by Silverstone on March 18 2019, 
-- All copyrights go to great DEVONtech Team ;)
    
tell application id "DNtp"
    try
        set theSelection to the selection
        if theSelection is not {} then
            show progress indicator "Converting..." steps (count of theSelection)
            repeat with theRecord in theSelection
                set theName to (name of theRecord) as string
                step progress indicator theName
                if cancelled progress then exit repeat
                
                set theType to type of theRecord
                if theType is not group and theType is not feed and theType is not smart group then
                    set Path_to_MD to path of theRecord
                    -- Setup Your Temporary Folder Here:
                    set theOutput to "/Users/ilya/Documents/00_Temp/" & theName & ".pdf"
                    
                    -- Construct your personal command line options here:
                    do shell script "export PATH=/Library/TeX/texbin:$PATH && /usr/local/bin/pandoc " & Path_to_MD & " -s -o " & theOutput & " --pdf-engine=xelatex --toc"
                    
                    try
                        set theParents to parents of theRecord
                        set thePDF to import theOutput to (item 1 of theParents) name theName
                        
                        repeat with i from 2 to (count of theParents)
                            replicate record thePDF to (item i of theParents)
                        end repeat
                        
                        set URL of thePDF to URL of theRecord
                        set creation date of thePDF to creation date of theRecord
                        set modification date of thePDF to modification date of theRecord
                        set comment of thePDF to comment of theRecord
                        set label of thePDF to label of theRecord
                    end try
                    tell application "Finder" to delete theOutput as POSIX file
                end if
            end repeat
            hide progress indicator
        end if
    on error error_message number error_number
        hide progress indicator
        if the error_number is not -128 then display alert "DEVONthink Pro" message error_message as warning
    end try
end tell

Silverstone · March 19, 2019, 1:14pm

Thank you.

A little update: the script can now handle long filenames with almost any symbols and spaces in them (I added quoting in the shell script).

The new version is here:

-- Convert Markdown documents to Pandoc PDFs (using XeTeX)
-- Created by Christian Grunenberg on Mon Dec 01 2008.
-- Copyright (c) 2008-2011. All rights reserved.
-- Slightly changed by Silverstone on March 18 2019, 
-- All copyrights go to great DEVONtech Team ;)

tell application id "DNtp"
	try
		set theSelection to the selection
		if theSelection is not {} then
			show progress indicator "Converting..." steps (count of theSelection)
			repeat with theRecord in theSelection
				set theName to (name of theRecord) as string
				step progress indicator theName
				if cancelled progress then exit repeat
				
				set theType to type of theRecord
				if theType is not group and theType is not feed and theType is not smart group then
					set Path_to_MD to path of theRecord
					
					-- Setup Your Temporary Folder Here:
					set theOutput to "/Users/ilya/Documents/00_Temp/" & theName & ".pdf"
					
					-- Construct your personal command line options here:
					do shell script "export PATH=/Library/TeX/texbin:$PATH && /usr/local/bin/pandoc \"" & Path_to_MD & "\" -s -o \"" & theOutput & "\" --pdf-engine=xelatex --toc"
					
					try
						set theParents to parents of theRecord
						set thePDF to import theOutput to (item 1 of theParents) name theName
						
						repeat with i from 2 to (count of theParents)
							replicate record thePDF to (item i of theParents)
						end repeat
						
						set URL of thePDF to URL of theRecord
						set creation date of thePDF to creation date of theRecord
						set modification date of thePDF to modification date of theRecord
						set comment of thePDF to comment of theRecord
						set label of thePDF to label of theRecord
					end try
					tell application "Finder" to delete theOutput as POSIX file
				end if
			end repeat
			hide progress indicator
		end if
	on error error_message number error_number
		hide progress indicator
		if the error_number is not -128 then display alert "DEVONthink Pro" message error_message as warning
	end try
end tell
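A note on the quoting: double quotes protect spaces, but the shell still interprets $, backticks and backslashes inside them, and a file name that itself contains a double quote ends the quoted string early. AppleScript's built-in quoted form of (e.g. quoted form of Path_to_MD) sidesteps this by single-quoting the text. Below is a sketch of the same idea on the shell side; shell_quote is a hypothetical name, meant to mirror the style of quoted form of rather than be a byte-exact copy of it.

```shell
#!/bin/sh
# Hypothetical helper: quote one string for safe use in a shell command.
# The text is wrapped in single quotes, inside which the shell expands
# nothing; each embedded ' becomes '\'' (close quote, escaped quote,
# reopen quote). Trailing newlines in the input are not preserved.
shell_quote() {
  printf "'%s'\n" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

# A file name that would break a double-quoted command line.
name='Tom'"'"'s "draft" costs $5 `today`.md'
shell_quote "$name"
```

The printed result can be pasted into a command line and the shell will hand pandoc exactly the original name.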

And here are some random PDFs made with this script after clipping web pages as clutter-free Markdown (without any formatting):

Quitting Evernote for DEVONthink – Yann Rousse – Medium.pdf (479.3 KB)
What is life really like in Africa? - Quora.pdf (5.1 MB)
What Is Long-Form Content and Why Does It Work?.pdf (765.1 KB)

Silverstone · March 19, 2019, 1:30pm

This is how nice a Word document made from the same Markdown looks:
What Is Long-Form Content and Why Does It Work?.docx.zip (669.7 KB)

Again, no editing at all! Just converting from MD.

BLUEFROG · March 19, 2019, 1:47pm

Thanks for sharing the examples. A visual of the output will certainly be helpful to those interested in this process.

phillipsmn · March 24, 2019, 2:18pm

For those who don’t know how to use (or don’t want to use) Pandoc, which is great, and would feel more comfortable with a GUI option, I definitely recommend Marked 2 by Brett Terpstra. There are a ton of features, including custom CSS files and exporting to various output formats. I’ve used it in tandem with most of my MultiMarkdown editors for years, and it works great with DEVONthink via the “Open with” menu item.

If you include your own CSS in Devonthink MD files, then you can simply Print to PDF using the built-in Mac dialog.

Bernardo_V · April 27, 2019, 8:40pm

Is this implemented already in DT3?

Chazzo · February 19, 2020, 5:27pm

A belated thank-you for this, @Silverstone and @cgrunenberg. I appreciate DT3’s built-in document type conversions, but the flexibility of pandoc is also welcome.

I do a lot of writing in Markdown but I have to send Word files to other people, typically with consistent styles. pandoc’s ability to copy Word styles from a reference document is very useful here.

If anyone happens to be playing around with reference documents, I have a query. pandoc’s --data-dir option works for me (it looks for a file named ‘reference.docx’ in a specified folder). A more flexible option is --reference-doc=, where you specify a filename and so you could choose between different sets of styles. The latter option works when I run it from the terminal, but within this script I get an error message about the reference file not being UTF-8. The shell script seems to find the reference file OK, so I don’t think it’s an issue with escaping, POSIX path or whatever. Suggestions appreciated.
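On the UTF-8 error: I can’t say for sure what causes it here, but one possibility worth ruling out is that the .docx path reaches pandoc as a separate argument, which pandoc would then try to read as another (Markdown) input file. A tiny debugging aid makes such a split visible; show_args and the reference path are hypothetical.

```shell
#!/bin/sh
# Debugging aid: print every argument in [brackets], one per line, to see
# exactly how the shell split a command line before pandoc receives it.
show_args() {
  printf '[%s]\n' "$@"
}

# Hypothetical reference document whose path contains spaces.
ref="$HOME/Pandoc Styles/Letter styles.docx"

# Split: a space after "=" leaves the option value empty and turns the
# .docx into an extra argument, which pandoc would treat as an INPUT file.
show_args notes.md -o notes.docx --reference-doc= "$ref"

# Joined: option and value arrive as a single argument.
show_args notes.md -o notes.docx "--reference-doc=$ref"
```

Temporarily putting show_args in place of pandoc inside the do shell script string shows whether the reference document ends up on its own line.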

PS: I find the escaped pandoc string hard to decipher, especially when you start adding more arguments! A tiny tweak is to build the string first: set myPandoc to "export PATH=..., and then: do shell script myPandoc. The AppleScript variable myPandoc is a bit easier to troubleshoot.
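Building on the PS, another way to get rid of the nested escaping is to move the pandoc call into its own small shell script and pass the paths as arguments, so the AppleScript side only needs quoted form of for each one. A sketch, assuming the hypothetical file name md2docx.sh and pandoc in /usr/local/bin:

```shell
#!/bin/sh
# md2docx.sh (hypothetical name): convert Markdown to DOCX with pandoc.
# Usage: md2docx.sh INPUT.md OUTPUT.docx [REFERENCE.docx]

# do shell script starts /bin/sh with a minimal PATH, so add the usual
# pandoc and MacTeX locations explicitly.
PATH=/usr/local/bin:/Library/TeX/texbin:$PATH
export PATH

md2docx() {
  if [ "$#" -ge 3 ]; then
    # Option and value form ONE argument, even if the path has spaces.
    pandoc "$1" -s -o "$2" "--reference-doc=$3"
  else
    pandoc "$1" -s -o "$2"
  fi
}

# Convert only when called with at least an input and an output path.
if [ "$#" -ge 2 ]; then
  md2docx "$@"
fi
```

Saved as, say, ~/bin/md2docx.sh and made executable with chmod +x, it could be called as do shell script "~/bin/md2docx.sh " & quoted form of Path_to_MD & " " & quoted form of theOutput, adding a third quoted argument to pick a reference document.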

ksandvik · February 19, 2020, 7:28pm

BTW, Typora also uses pandoc to export Markdown files to various formats, including Word and PDF. So if you have it as your default Markdown editor, just double-click the file and export.

Bernardo_V · February 19, 2020, 10:04pm

I took the script and added support for Keyboard Maestro variables as parameters. Perhaps it could be useful to some here.

nnettsplace · June 21, 2020, 6:32am

Is there any way to extend the script to convert DEVONthink item links to file paths, so that pandoc recognizes the image paths?

E.g.:
convert
![Riley township](x-devonthink-item://7ED1EF6C-234E-4F6E-9753-8D99A2EE32E7)

to
![Riley township](/Users/user/ResearchSources.dtBase2/Files.noindex/png/b/60922 Combination Atlas St. Clair Co., Mich. 15 (Riley T6R14E) (with Markups) (zoomed).png)

?

chrillek · June 22, 2020, 1:51pm

If I understand you correctly, you want to massage the Markdown file before passing it to Pandoc, so that it contains file references instead of DT3 references?
That might be possible by modifying the plain text of the record: find all the x-devonthink-item references, replace each one with the path of the record the URL points to, and then pass the modified text on to pandoc.

Personally, I wouldn’t want to do that in AppleScript, because its string processing sucks. Oh, and while you’re at it: I’m not sure that spaces et al. work OK in a filename like the one you mentioned, so you may have to URL-encode the path, too.
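Here is a sketch of the text-massaging part, done in the shell rather than in AppleScript. Assumptions: the links look exactly like the x-devonthink-item://UUID example above; the UUID-to-path lookup still has to come from DEVONthink (in DT3 something like path of (get record with uuid theUUID)); and instead of URL-encoding, the path is wrapped in <...>, which I believe pandoc's Markdown reader accepts for destinations with spaces, but please verify that with your pandoc version. dt_uuids and dt_replace are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helper: list the DEVONthink item UUIDs linked from a
# Markdown file, one per line, without duplicates.
dt_uuids() {
  grep -o 'x-devonthink-item://[0-9A-Fa-f-]*' "$1" |
    sed 's|x-devonthink-item://||' |
    sort -u
}

# Hypothetical helper: print the Markdown file with every link to one
# UUID replaced by a file path wrapped in <...> (assumed to be accepted
# by pandoc for paths with spaces). Plain string matching is used, so
# characters like & or / in the path need no escaping.
# Usage: dt_replace FILE.md UUID /path/to/file.png
dt_replace() {
  UUID=$2 NEWPATH=$3 awk '
    {
      target = "x-devonthink-item://" ENVIRON["UUID"]
      repl = "<" ENVIRON["NEWPATH"] ">"
      out = ""
      rest = $0
      while ((i = index(rest, target)) > 0) {
        out = out substr(rest, 1, i - 1) repl
        rest = substr(rest, i + length(target))
      }
      print out rest
    }' "$1"
}
```

The idea is to loop over the output of dt_uuids, ask DEVONthink for each record's path, and pipe the rewritten text into pandoc.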

jooz · August 1, 2020, 8:15pm

Hi,

I am trying to re-implement the script to convert from docx to Markdown. Although @Bernardo_V’s script is great, it is way too sophisticated for my needs, as I just need this one conversion and want it to happen automatically on a folder via a smart rule in DT3.

So far I was not successful.

I adjusted the following part:

set theOutput to "/Users/USER/Downloads/" & theName & ".md"
				

do shell script "/usr/local/bin/pandoc --wrap=none --extract-media=images" & "Path_to_Docx" & "\" -o \"" & theOutput

That is what I get:

This command works for me directly in the shell:

/usr/local/bin/pandoc --wrap=none --extract-media=images "DOC.docx" -o NEW.md

Bernardo_V · August 2, 2020, 12:13am

Your shell is probably zsh. AppleScript does not use zsh, so you need to export the path. Hence the first part of the command:

do shell script "export PATH=/Library/TeX/texbin:$PATH && /usr/local/bin/pandoc \""

You might be fine if you add it. (Didn’t test)

chrillek · August 2, 2020, 8:06am

I suppose there’s a space missing, like so

--extract-media=images" & " Path_to_Docx" &

And of course @Bernardo_V is probably right about zsh
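Putting the replies together, here is a sketch of the conversion as a shell function. Besides the missing space and the unmatched quotes, one more thing may bite: do shell script does not start in your home folder (it typically runs with / as the working directory), so a relative --extract-media=images may point to a folder you cannot write to. The sketch therefore extracts media next to the output file. docx2md is a hypothetical name, and pandoc is assumed to live in /usr/local/bin.

```shell
#!/bin/sh
# Hypothetical helper: convert a Word file to Markdown with pandoc.
# Usage: docx2md "/path/to/DOC.docx" "/path/to/NEW.md"
docx2md() {
  # Absolute media folder next to the output, e.g. /path/to/images,
  # so the result does not depend on the current working directory.
  media_dir="$(dirname "$2")/images"
  # Put /usr/local/bin on PATH for this call, then run pandoc with every
  # path passed as a single quoted argument.
  PATH=/usr/local/bin:$PATH pandoc --wrap=none --extract-media="$media_dir" "$1" -o "$2"
}
```

In the AppleScript version, the equivalent is to wrap both paths in quoted form of and pass an absolute folder to --extract-media.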