Applescript to extract the PDF's TOC into a markdown file

This script extracts the Table of Contents from the selected PDF file into a Markdown file. For it to work, you’ll need to install MuPDF and add the script RegexAndStuffLib to ~/Library/Script Libraries.


The Script

use AppleScript version "2.4" -- Yosemite (10.10) or later
use script "RegexAndStuffLib"
use scripting additions

tell application id "DNtp"
	set theRecords to the selection
	repeat with theRecord in theRecords
		
		-- Propriedades
		set {theName, thePath, theRefURL, theURL} to {the name, the path, the reference URL, the URL} of theRecord
		
		set theScript to "/usr/local/bin/mutool show " & quoted form of thePath & " outline"
		tell me to set theToc to do shell script theScript
		
		set theToc to my replaceText(theToc, "\\n", " ")
		set theToc to regex change theToc search pattern "	#(.*?),.+" replace template " (page $1)"
		set theToc to regex change theToc search pattern "\" \\(" replace template "]("
		set theToc to regex change theToc search pattern "\"" replace template "["
		set theToc to regex change theToc search pattern "[\\||\\+|-]" replace template ""
		set theToc to regex change theToc search pattern "				" replace template "4 "
		set theToc to regex change theToc search pattern "			" replace template "3 "
		set theToc to regex change theToc search pattern "		" replace template "2 "
		set theToc to regex change theToc search pattern "	" replace template "* "
		set theToc to regex change theToc search pattern "^2" replace template "    *"
		set theToc to regex change theToc search pattern "^3" replace template "        *"
		set theToc to regex change theToc search pattern "^4" replace template "            *"
		
		try
			set oldDelims to AppleScript's text item delimiters
			set AppleScript's text item delimiters to {return}
			set theLines to every text item of theToc
			set AppleScript's text item delimiters to oldDelims
		on error
			set AppleScript's text item delimiters to oldDelims
		end try
		
		set theTableofContents to {}
		repeat with theLine in theLines
			set thePage to regex search theLine search pattern "\\(page ([0-9]{1,4})\\)" replace template "$1"
			set thePage to item 1 of thePage as text
			set thePage to thePage - 1
			set thePage to "(" & theRefURL & "?page=" & thePage & ")"
			set theRef to regex change theLine search pattern "(\\(page [0-9]{1,4}\\))" replace template "page"
			set theRef to my replaceText(theRef, "page", thePage)
			set theTableofContents to theTableofContents & theRef & return
		end repeat
		
		
		--set theMD to "tags: #Table_of_Contents" & return & return & "# " & theName & return & return & "## Table of Contents" & return & return & theTableofContents
		
		set theRecName to theName & " - Table of Contents"
		set theReference to get custom meta data for "Reference" from theRecord default value ""
		set theBibkey to get custom meta data for "Bibkey" from theRecord default value ""
		if theBibkey is "" then set theBibkey to the aliases of theRecord
		set theReference to "`[#" & theBibkey & "]: " & theReference & "`"
		set theMD to "# " & theName & return & return & theReference & return & return & "## Table of Contents" & return & return & theTableofContents
		set theResult to create record with {name:theRecName, type:markdown, content:theMD}
		set the tags of theResult to "_\\Bib Toc"
		
		set theReferenceURL to the reference URL of theResult
		set the URL of theResult to theURL
		add custom meta data theReferenceURL for "Link2" to theRecord
		open tab for record theResult
		
		
	end repeat
end tell


on replaceText(theString, old, new)
	set {TID, text item delimiters} to {text item delimiters, old}
	set theStringItems to text items of theString
	set text item delimiters to new
	set theString to theStringItems as text
	set text item delimiters to TID
	return theString
end replaceText

2 Likes

store it as part of the records custom metadata.

Can you clarify what you mean by this?

Sure.

This is what I mean:

If you ever need the ToC links, they are all there waiting to be copied and pasted.

Another option would be to save it as a markdown file, which I am experimenting with right now.

Just updated the first post with an applescript that will do everything.
It doesn’t rely any longer on Keyboard Maestro or automator.

Hi Bernardo,

I’m having a bit of a problem installing MuPDF. I downloaded the source file (mupdf-1.17.0-source.tar.gz) to my download folder and pasted the following command (as indicated in the MuPDF website) in Terminal:

  tar xzf mupdf-1.9a-source.tar.gz && cd mupdf-1.9a-source
  export XCFLAGS=-I/opt/X11/include/X11
  make prefix=/usr/local HAVE_GLFW=no install

No luck. I noticed the name of the downloaded file and changed the command to

  tar xzf mupdf-1.17.0-source.tar.gz && cd mupdf-1.17.0-source
  export XCFLAGS=-I/opt/X11/include/X11
  make prefix=/usr/local HAVE_GLFW=no install

Also no luck. Gives me the following:

Error opening archive: Failed to open 'mupdf-1.17.0-source.tar.gz'

I’m at a dead end. Maybe you can nudge me in the right direction?

The error message clearly says that the file is not there. What’s the output of

ls -l

in this folder? Is there a file mupdf-1.17.0-source.tar.gz?

I think I managed to install the mupdf (had to move it to /~ instead of /downloads). Still, I don’t think it installed correctly since the mutool commands give me -bash: mutool: command not found.

Edit: managed to install mupdf via homebrew. It works. :slight_smile: Now, alas, I cannot make @Bernardo_V script work. I installed the script RegexAndStuffLib at ~/Library/Scripts, but when I run the DNtp script it gives me an error:

Error Number: -2700
This script contains uncompiled changes and cannot be run.

That is not the right folder. :wink:

The script RegexAndStuffLib has to be in ~/Library/Script Libraries for the script to compile.

had to move it to /~ instead of /downloads

That makes no sense. You can install the script anywhere you want, in ~/ or ~/Downloads (please note the position of the slash!)
As to running it with bash: Use the complete path like

/Users/<your user name>/mutool 

or

/Users/<your user name>/Downloads/mutool

Yes, in my last message the path was with an error. I meant to say ~/ and ~/downloads. Thanks.

In any case, I already managed to install MuPDF (with homebrew) and it’s working fine.

I created the ~/Library/Script Libraries (didn’t exist) and placed the RegexAndStuffLib.scptd within. (The full path of the script is /Users/[me]/Library/Script Libraries/RegexAndStuffLib.scptd.)

Alas, it gives me a syntax error when I run The Script.