Convert Markdown to PDF / DOCX in Devonthink using Pandoc

Thanks for sharing the examples. A visual of the output will certainly be helpful to those interested in this process.

For those who don’t know how/want to use Pandoc (which is great) and would feel more comfortable with a GUI option, I definitely recommend Marked 2 by Brett Terpstra. There are a ton of features including custom CSS files and exporting to various outputs. I’ve used it in tandem with most of my multimarkdown editors for years and it works great with Devonthink via the “Open with” menu item.

If you include your own CSS in Devonthink MD files, then you can simply Print to PDF using the built-in Mac dialog.

Is this implemented already in DT3?

A belated thank-you for this, @Silverstone and @cgrunenberg. I appreciate DT3’s built-in document type conversions, but the flexibility of pandoc is also welcome.

I do a lot of writing in Markdown but I have to send Word files to other people, typically with consistent styles. pandoc’s ability to copy Word styles from a reference document is very useful here.

If anyone happens to be playing around with reference documents, I have a query. pandoc’s --data-dir option works for me (it looks for a file named ‘reference.docx’ in a specified folder). A more flexible option is --reference-doc=, where you specify a filename and so you could choose between different sets of styles. The latter option works when I run it from the terminal, but within this script I get an error message about the reference file not being UTF-8. The shell script seems to find the reference file OK, so I don’t think it’s an issue with escaping, POSIX path or whatever. Suggestions appreciated.

PS: I find the escaped pandoc string hard to decipher, especially when you start adding more arguments! A tiny tweak is to build the string first: set myPandoc to "export PATH=..., and then: do shell script myPandoc. The AppleScript variable myPandoc is a bit easier to troubleshoot.

BTW Typora also uses pandoc to export Markdown files to various formats, including Word and PDF. So if you have it as your default Markdown editor, double-click the file and export.

I took the script and added support for Keyboard Maestro variables as parameters. Perhaps it could be useful to some here.

Is there any way to extend the script so that it converts DEVONthink URIs to file paths, so that pandoc recognizes the image path?

E.g.:
convert
![Riley township](x-devonthink-item://7ED1EF6C-234E-4F6E-9753-8D99A2EE32E7)

to
![Riley township](/Users/user/ResearchSources.dtBase2/Files.noindex/png/b/60922 Combination Atlas St. Clair Co., Mich. 15 (Riley T6R14E) (with Markups) (zoomed).png)

?

If I understand you correctly, you want to massage the Markdown file before passing it to Pandoc so that it contains file paths instead of DT3 references?
That might be possible by modifying the plain text of the record: find all the x-devonthink-item references, replace each one with the path of the record its URL points to, and then pass the modified text on to pandoc.

Personally, I’d not want to do that in AppleScript, because its string processing sucks. Oh, and while you’re at it: I’m not sure that spaces et al work ok in a filename like the one you mentioned. So maybe you’ll have to URL encode the path while you’re at it.
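
A rough sketch of that massaging step as a shell filter, in case it helps. Everything here is an assumption rather than part of the original script: the function names are made up, and the resolver presumes DEVONthink 3 is running and that its AppleScript dictionary answers "get record with uuid". Only spaces are percent-encoded; other special characters are left alone.

```shell
#!/bin/sh
# Hypothetical resolver: ask DEVONthink for the file path behind an item UUID.
# Assumption: DEVONthink 3 is running and supports "get record with uuid".
resolve_item_path() {
    osascript -e "tell application id \"DNtp\" to get path of (get record with uuid \"$1\")"
}

# Read Markdown on stdin, write it to stdout with every
# x-devonthink-item://UUID link replaced by the resolved file path.
rewrite_item_links() {
    text=$(cat)
    # Collect each distinct UUID that appears in an item link.
    for uuid in $(printf '%s\n' "$text" |
                  grep -oE 'x-devonthink-item://[0-9A-Fa-f-]+' |
                  sed 's|.*//||' | sort -u); do
        # Skip links that cannot be resolved and leave them untouched.
        path=$(resolve_item_path "$uuid") || continue
        # Percent-encode spaces so the path survives inside a Markdown link.
        encoded=$(printf '%s' "$path" | sed 's/ /%20/g')
        # Literal (non-regex) replacement, so odd characters in the path are safe.
        text=$(printf '%s\n' "$text" |
               FROM="x-devonthink-item://$uuid" TO="$encoded" awk '
                   BEGIN { from = ENVIRON["FROM"]; to = ENVIRON["TO"] }
                   {
                       out = ""; line = $0
                       while ((i = index(line, from)) > 0) {
                           out = out substr(line, 1, i - 1) to
                           line = substr(line, i + length(from))
                       }
                       print out line
                   }')
    done
    printf '%s\n' "$text"
}
```

Usage would be something like rewrite_item_links < note.md > note-for-pandoc.md, with pandoc then reading the second file. Not tested against a real database.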

Hi,

I am trying to re-implement the script to convert from docx > markdown. Although @Bernardo_V’s script is great it is way too sophisticated for my needs as I just need this one translation and want this to happen automatically on a folder via a smart rule in DT3.

So far I was not successful.

I adjusted the following part:

set theOutput to "/Users/USER/Downloads/" & theName & ".md"
				

do shell script "/usr/local/bin/pandoc --wrap=none --extract-media=images" & "Path_to_Docx" & "\" -o \"" & theOutput

That is what I get:

This bash works for me directly on the shell:

/usr/local/bin/pandoc --wrap=none --extract-media=images "DOC.docx" -o NEW.md

Your shell is probably zsh. AppleScript's do shell script does not use zsh (it runs sh with a minimal PATH), so you need to export the path. Hence the first part of the command:

do shell script "export PATH=/Library/TeX/texbin:$PATH && /usr/local/bin/pandoc \""

You might be fine if you add it. (Didn’t test)
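
To check whether a tool would be found once the PATH is extended, a small lookup helper can be handy. This is only a sketch: find_tool is a made-up name, and the directory list is an assumption (add whatever folder your pandoc actually lives in).

```shell
#!/bin/sh
# Look up a command after prepending extra directories to PATH.
# The directory list below is an assumption; edit it for your install.
find_tool() {
    (
        # The subshell keeps the PATH change from leaking into the caller.
        PATH="/usr/local/bin:/opt/homebrew/bin:/Library/TeX/texbin:$PATH"
        command -v "$1"
    )
}
```

For example, find_tool pandoc prints the full path if pandoc is reachable and returns a non-zero status otherwise.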

I suppose there’s a space missing, like so

--extract-media=images" & " Path_to_Docx" &

And of course @Bernardo_V is probably right about zsh

Thanks @Bernardo_V & @chrillek for your tips.

I implemented both step by step, but I am still facing an error (in this example I am trying to convert a document in DEVONthink named Thinking.docx into an .md file):

pandoc: Thinking.docx.md: openBinaryFile: does not exist (No such file or directory)

Searching for this error message points to path configuration problems, but in my case the path should be okay.

				-- Setup Your Temporary Folder Here:
				set theOutput to "/Users/MY_USER/Downloads/" & theName & ".md"
				
				-- Construct your personal command line options here:
				do shell script "export PATH=/Library/TeX/texbin:$PATH && /usr/local/bin/pandoc --wrap=none --extract-media=images" & " Path_to_Docx" & "\" -o \"" & theOutput

Should you have any other ideas, I would appreciate any help!

Is your input file

Or “thinking.docx”?

I checked again, it is “Thinking.docx”

Looking at the error message, I wonder if there is something wrong with the line that appends .md to the name of the file.

set theOutput to "/Users/MY_USER/Downloads/" & theName & ".md"

Couldn’t tell from the error whether it occurred on input or output, and thought it might be trying to open a (non-existent) thinking.docx.md.

The other question is: Do you really have a folder named

(Do a Shift-Command-G in the finder and copy/paste the folder name in the dialog box’ input line). I’d suppose not - MY_USER is a placeholder for the real user name (like “jooz”, maybe).

This translates to
--extract-media=images Path_to_DocX on the shell level. Which is most probably NOT what you want, since Path_to_DocX is not a command or anything the shell understands. It is probably (!) an AppleScript variable, in which case the line might work better as

do shell script "export PATH=/Library/TeX/texbin:$PATH && /usr/local/bin/pandoc --wrap=none --extract-media=images " & Path_to_Docx & "\" -o \"" & theOutput

(Note the space after images!) I don’t know why the -o should be quoted here, but I don’t really care, that shouldn’t set off the shell one way or another.

However, if the path to your document(s) contains spaces, you might have to quote it like so

do shell script "export PATH=/Library/TeX/texbin:$PATH && /usr/local/bin/pandoc --wrap=none --extract-media=images \"" & Path_to_Docx & "\" -o \"" & theOutput & "\""

(not tested).
Unfortunately, all this is overly complicated. I’d rather go for a simple AppleScript that calls a shell script with the input file name as its only parameter. Then let the shell script do all the rest. Otherwise, you have all these problems with quoting, overly verbose and illegible shell calls etc.
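
A minimal sketch of what such a shell script could look like, under some assumptions: docx_to_md, PANDOC and OUT_DIR are names made up for this example, the pandoc location is the one used earlier in the thread, and the media folder is absolute (a change from the relative "images" above) so the pictures end up next to the output.

```shell
#!/bin/sh
# Sketch of a helper that AppleScript could call with one argument.
# PANDOC and OUT_DIR are assumed defaults; change them for your machine.
PANDOC="${PANDOC:-/usr/local/bin/pandoc}"
OUT_DIR="${OUT_DIR:-$HOME/Downloads}"

# Convert one .docx file to Markdown and print the path of the result.
docx_to_md() {
    input=$1
    # Drop the directory and the .docx suffix: "/a b/Thinking.docx" -> "Thinking".
    name=$(basename "$input" .docx)
    output="$OUT_DIR/$name.md"
    # Every variable is double-quoted, so spaces in paths are harmless.
    "$PANDOC" --wrap=none --extract-media="$OUT_DIR/images" \
        "$input" -o "$output" || return 1
    printf '%s\n' "$output"
}
```

Saved as a script that ends with docx_to_md "$1", it could be called from AppleScript with do shell script and quoted form of the record's path, which takes care of the escaping on the AppleScript side. Not tested inside a smart rule.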

Hei @chrillek i really appreciate all the detailed recommendations!

  • "/Users/MY_USER/Downloads/" does exist with the proper user name. That was the first thing i tested as well once i hit this error
  • Path_to_DocX sounds like a variable to me. The script above sets it as set Path_to_Doc to path of theRecord
  • if the path to your document(s) contain spaces > yes, good catch. It may indeed include spaces
  • I tested the

do shell script “export PATH=/Library/TeX/texbin:$PATH && /usr/local/bin/pandoc --wrap=none --extract-media=images “” & Path_to_Docx & “” -o “” & theOutput & “””

And we are coming back to the undefined variable as you alluded to above.

Unfortunately, all this is overly complicated. I’d rather go for a simple AppleScript that calls a shell script with the input file name as its only parameter.

My initial idea was to use a shell script fired up by Keyboard Maestro. But I would prefer DT3’s integrated smart rule, so that this translation happens automatically. I will look into your recommendation to rebuild the flow and just call a simple shell script from AppleScript.

Btw, the reason I am trying to get this working is to finally have direct integration between iThoughts mindmaps and DT3. iThoughts is able to export into docx but not into md with embedded pictures … so one needs to go via this workaround.

In an ideal world, it would be cool if DT3 could translate docx into markdown + images directly, either via a bespoke script in its library or as a native feature.

iThoughts is able to export into docx but not into md with embedded pictures … so one needs to go via this workaround.

Have you contacted iThoughts about them possibly adding support for this?

Yes sir. I hope that Craig (iThoughts developer) will consider implementing this. He was very responsive to the feedback in the past.
For now, I need to get this working with some bash.

Argh. I’m sorry, I didn’t notice that the backslashes were gone in this code snippet. The end should read
images \"" & Path_to_Docx & "\" -o \"" & theOutput & "\""
What you’d want to see as a shell command is
export PATH=/Library/TeX/texbin:$PATH && /usr/local/bin/pandoc --wrap=none --extract-media=images "Document.docx" -o "Converted.md"
So you need quotes around the original document name and the converted one, to protect spaces and other special characters. And you need to make sure that not the string “Path_to_Docx” is passed to the shell but the value of the variable of that name.
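
To see why the quotes matter, here is a tiny demonstration. It is only an illustration: count_args and the path are made up, and the splitting shown is what sh and bash do (zsh does not split unquoted variables by default).

```shell
#!/bin/sh
# Print how many separate arguments the shell passed in.
count_args() { printf '%s\n' "$#"; }

doc="/Users/me/My Notes/Thinking.docx"   # hypothetical path with a space

count_args $doc      # unquoted: sh and bash split at the space, so 2 arguments
count_args "$doc"    # quoted: the path stays one argument
```

In the unquoted case pandoc would be handed two broken file names instead of one document.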

I’m not quite sure that what you want to achieve (export a mindmap from iThoughts to Markdown) is feasible. As I see it, the mindmaps can be fairly complicated two-dimensional graphs. Is this really something that can be represented in Markdown? How so? On the other hand, wouldn’t PDF be a more useful format (and supported by iThoughts, too) because it can preserve the graphics? And a PDF should be searchable, too (like Markdown).