Script to create individual Markdown notes from PDF annotations

msteffens · October 14, 2024, 10:35pm

Here’s a script that creates individual Markdown notes for annotations contained in one or more selected PDFs:

For each of the PDFs selected in DEVONthink, this script will iterate over its contained PDF annotations and create or update a Markdown record for each markup or text annotation.

The URL of each Markdown record will be set to a deep link that directly points to the corresponding PDF annotation. I.e., clicking this deep link will open the associated PDF and scroll the corresponding PDF annotation into view.

For each Markdown record, the script assigns a color label that matches your annotation’s highlight color.

My script is similar in spirit to @Frederiko’s Annotation Pane script. However, my script doesn’t offer a custom UI to set annotation properties but instead parses the PDF annotations directly.

In addition, the script recognizes some markup in PDF annotation notes. This lets you specify the annotation’s name/title and comment as well as its flagged status, star rating, tags and custom metadata. Example annotation note as supported by this script:

# Your title for this annotation

Your comment about this annotation.

< *** @tag @another tag @:flagged @:metadatakey:Some value
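To make the markup above more concrete, here is a minimal Python sketch of how such a note could be split into its parts. It is only an illustration based on the example note, not the script's actual AppleScript parser: the function names, the returned dictionary keys, and the rule that items in the metadata line are separated by whitespace followed by "@" (so that a tag may contain spaces) are all assumptions.

```python
import re


def _parse_metadata_line(text, result):
    """Fill `result` from the text that follows the leading "<" (assumed syntax)."""
    # Assumption: items are separated by whitespace followed by "@",
    # which lets a tag such as "another tag" contain a space.
    for item in re.split(r"\s+(?=@)", text.strip()):
        if re.fullmatch(r"\*+", item):
            # A run of asterisks is read as the star rating.
            result["rating"] = len(item)
        elif item.startswith("@:"):
            key, sep, value = item[2:].partition(":")
            if sep:
                # "@:key:value" is read as a custom metadata entry.
                result["metadata"][key] = value.strip()
            elif key == "flagged":
                result["flagged"] = True
            # Other "@:name" items are ignored in this sketch.
        elif item.startswith("@"):
            result["tags"].append(item[1:].strip())


def parse_annotation_note(note):
    """Split an annotation note into title, comment and metadata (illustrative only)."""
    result = {"title": None, "comment": "", "rating": 0,
              "tags": [], "flagged": False, "metadata": {}}
    comment_lines = []
    for line in note.splitlines():
        stripped = line.strip()
        if stripped.startswith("# ") and result["title"] is None:
            # The first Markdown-style heading becomes the title.
            result["title"] = stripped[2:].strip()
        elif stripped.startswith("<"):
            _parse_metadata_line(stripped[1:], result)
        else:
            comment_lines.append(line)
    result["comment"] = "\n".join(comment_lines).strip()
    return result
```

Calling `parse_annotation_note()` on the example note would yield the title, the comment text, a rating of 3, the tags "tag" and "another tag", the flagged status and one custom metadata entry.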

If a DOI was found for a PDF, the script can also fetch its bibliographic metadata and set the custom metadata and/or Finder comment of the Markdown records & their group accordingly.

For further details, please see the script’s README as well as the notes at the top of the script.

msteffens · October 15, 2024, 11:44am

Here are a few screenshots that illustrate the script’s input & output:

You’d start as usual by annotating a PDF (in DEVONthink on a Mac or on iOS, or in any third-party app that saves PDF annotations to the PDF).

Here’s a view of an annotated PDF, after importing & opening it in DEVONthink:

In the above screenshot, the first PDF annotation is selected, and you can see that I’ve added my own comments/notes to each of the displayed annotations. Within these annotation notes, I’ve used Markdown-style headings (like # Overall Arctic…) which will become the title of the Markdown note that will be created by the script for the PDF annotation.

In a “metadata line” (which is a line starting with < ), I’ve added a 3-star rating (***), some @tags and the special tag @:flagged which will mark the created Markdown note with a flag.

After running the script with this PDF selected, this is how the created Markdown note looks:

Note that the Markdown heading has been set as the note’s name (prefixed with the annotation’s page), and the properties and tags have been set according to the markup in my PDF annotation note.

The body text of the note contains your Markdown heading and your comments for that annotation, with the highlighted text in between.

In addition, a DEVONthink color label has been set for this note that roughly matches the original PDF annotation highlight color (red).
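One way to picture such a "rough match" is a nearest-color lookup. The following Python sketch is purely illustrative and not taken from the script: the label numbers, the RGB values in `LABEL_COLORS` (label colors are configurable in DEVONthink) and the use of squared distance in RGB space are all assumptions.

```python
# Hypothetical palette: label index -> (red, green, blue), each 0-255.
# These values are placeholders, not DEVONthink's actual label colors.
LABEL_COLORS = {
    1: (255, 59, 48),   # red
    2: (52, 199, 89),   # green
    3: (0, 122, 255),   # blue
    4: (255, 204, 0),   # yellow
    5: (255, 149, 0),   # orange
    6: (175, 82, 222),  # purple
}


def nearest_label(rgb):
    """Return the label index whose color is closest to `rgb` (squared RGB distance)."""
    def squared_distance(label):
        return sum((a - b) ** 2 for a, b in zip(rgb, LABEL_COLORS[label]))
    return min(LABEL_COLORS, key=squared_distance)
```

With this placeholder palette, a highlight color that is only approximately red still ends up with the red label, which is the kind of approximate match described above.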

Clicking the URL for this note would again open the PDF, and scroll the original PDF annotation into view.

By default, the script will also try to extract a DOI from the PDF’s own metadata (or from its first page), and fetch bibliographic metadata (and optionally also BibTeX data) for it. These get set as custom metadata of the note (and for its group folder):
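For readers curious how a DOI might be spotted in a PDF's metadata or first-page text, here is a small Python sketch. It is not the script's own implementation: the pattern is a commonly used approximation for modern DOIs, and the function name `find_doi` is made up for this example.

```python
import re

# Approximate pattern for modern DOIs: "10.", a 4-9 digit registrant prefix,
# a slash, and a suffix made of common DOI characters.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def find_doi(text):
    """Return the first DOI-like string found in `text`, or None."""
    match = DOI_PATTERN.search(text)
    if match is None:
        return None
    # Strip sentence punctuation that may directly follow the DOI in running text.
    # Parentheses are kept because they can be part of a valid DOI.
    return match.group(0).rstrip(".,;:")
```

A DOI found this way could then be used to request bibliographic metadata from a DOI resolver, which is a step this sketch leaves out.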

Each note (as well as the original PDF) will also have a link in their custom metadata (“Pdfannotations”) that links back to the corresponding group folder.

The many note properties & metadata can help when searching your notes (or when setting up smart groups):

jpavao · October 16, 2024, 8:58pm

Nice work. Thanks for sharing. DT is a great app, and scripts like this extend it nicely.

Phileosophos · October 17, 2024, 8:04pm

Thanks for sharing! That looks like a lovely tool I am going to have to try!

msteffens · October 17, 2024, 8:27pm

As people had issues installing the script, I’ve made a quick screencast that shows how to download, install and use the script:

As shown in the screencast, it’s easiest to make use of the direct “download” link that’s available in the “Installation” section of the script’s Readme.

After the download has completed, just double-click the .zip file to unzip it. Then move the .scptd file to the DEVONthink script folder that’s located at ~/Library/Application Scripts/com.devon-technologies.think3/Menu.

tja · October 17, 2024, 8:32pm

For those like me, who mostly work on the terminal and never use Finder, it’s esp. simple:

Just unzip the ZIP file from https://github.com/extracts/mac-scripting/raw/master/DEVONthink/DEVONthink_Notes_from_PDF_Annotations/DEVONthink_Notes_from_PDF_Annotations.scptd.zip

unzip DEVONthink_Notes_from_PDF_Annotations.scptd.zip

And then move the contained script bundle (that’s a directory) to the script menu folder of DEVONthink:

mv DEVONthink_Notes_from_PDF_Annotations.scptd ~/Library/Application\ Scripts/com.devon-technologies.think3/Menu/

Done.

msteffens · October 18, 2024, 8:18pm

@bangersandmash wrote:

@msteffens This is amazing and is immediately useful to me. Thank you for sharing it. Even more surprising that we seem to have very similar approaches to annotating PDFs!

Thank you!

Now… I hate to be “that guy” (but here we go…) Where you use “@” symbols to denote your tags, I just put mine on a new line that starts with “Tags:” followed by comma-separated values that represent the tags. I cannot figure out how to modify your script to capture these. If you’re willing, any chance you could suggest how this could be done? Thanks in advance for any advice you’re willing to share.

I think the easiest approach would be to inject a “filter” method which transforms your markup syntax into the one used by the script. To do so, please open the script in Script Editor and search for this line:

    set annotText to (pdfAnnotation's annotText)

Insert the following line in front of the above line:

    set aComment to my preprocessAnnotationComment(aComment)

Then, at the very bottom of the script, insert this script handler:

-- Transforms the given annotation comment/notes (which may contain custom markup
-- syntax) into a Keypoints-style format that's supported by this script and returns it.
on preprocessAnnotationComment(aComment)
	-- convert tags
	-- input: a separate line that starts with “Tags:” followed by comma-separated values that represent the tags
	set transformedLines to {}
	set tagsLineRegex to "(?<=^|[\\r\\n])Tags:\\s*"
	set tagDelimiterRegex to "(?<=^<|[\\r\\n]<)\\s+|\\s*,\\s*"
	
	repeat with aLine in paragraphs of aComment
		if (KeypointsLib's regexMatch(aLine, tagsLineRegex)) is not "" then
			set aLine to KeypointsLib's regexReplace(aLine, tagsLineRegex, "< ")
			set aLine to KeypointsLib's regexReplace(aLine, tagDelimiterRegex, " @")
		end if
		copy aLine as text to end of transformedLines
	end repeat
	
	set transformedString to KeypointsLib's mergeTextItems(transformedLines, linefeed) & linefeed
	
	return transformedString
end preprocessAnnotationComment

I’m sure this could be done in a better way but it should get the work done.
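To show what this transformation is meant to produce, here is a rough Python equivalent with a sample input. It only approximates the AppleScript handler above (it splits on commas instead of reusing the same regular expressions), and it does not use KeypointsLib; the sample comment is made up.

```python
import re


def preprocess_annotation_comment(comment):
    """Rewrite "Tags: a, b" lines as "< @a @b" metadata lines (approximation)."""
    transformed_lines = []
    for line in comment.splitlines():
        match = re.match(r"Tags:\s*", line)
        if match:
            # Split the rest of the line on commas and prefix each tag with "@".
            tags = [tag.strip() for tag in line[match.end():].split(",") if tag.strip()]
            line = "< " + " ".join("@" + tag for tag in tags)
        transformed_lines.append(line)
    # Like the handler above, end the result with a trailing linefeed.
    return "\n".join(transformed_lines) + "\n"


sample = "# Sea ice decline\nMy comment.\nTags: arctic, sea ice, climate"
print(preprocess_annotation_comment(sample))
```

For the sample comment, the "Tags:" line becomes `< @arctic @sea ice @climate`, while all other lines pass through unchanged.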

bangersandmash · October 19, 2024, 3:51pm

Wow! Amazing. Unfortunately, I get an error that says "The document “DEVONthink_Notes_from_PDF_Annotations.scptd” could not be saved." (Error -45), which obviously has nothing to do with your script. I’ll just pre-process with something that converts my tags to your format and go with it. I really appreciate you taking the time to respond, though, and I learned a lot about AppleScript along the way – which is a lovely bonus. Have a great weekend!

chrillek · October 19, 2024, 7:04pm

You get this error when you do what exactly?

msteffens · October 19, 2024, 7:06pm

Many thanks for the kind words, you’re welcome!

Hmm, maybe it has to do with my script (as it was saved by the third-party app Script Debugger, makes heavy use of ASObjC code, and uses an included library, so it’s rather complex). But I’m not sure.

I’ve found only one other mention of this error. In their case, the only workaround that helped was apparently downloading a copy of Script Debugger (it has a free Lite version), then opening, editing & saving the script with Script Debugger.

I’m confident that this would work, but I can understand if that’s too much hassle for you. That said, having a copy of Script Debugger on your machine is definitely worth it IMO. This app belongs among the best Mac apps ever made, along with better-known ones like DEVONthink, BBEdit or Scrivener.

msteffens · October 19, 2024, 7:28pm

Alternatively, you could try to download the .applescript or .scpt version of the script (which aren’t script bundles and don’t include the script library file). Then try to open, edit & save again in Script Editor.

If you go that route, however, you’ll also need to put the used script library into a Script Libraries folder inside your Library folder that’s within your home folder. See here for more info.

bangersandmash · October 19, 2024, 11:23pm

Script Debugger did the trick! Everything is working beautifully!

reb2012 · October 24, 2024, 10:00am

Thank you for sharing this script which is potentially very useful. I am definitely not an expert user, but for me it worked first time. I just used the DTP menu to open the scripts folder, dropped the file into the Menu subfolder, and it worked immediately. I note that it appears to read only highlighted text in the PDF and text that is in text boxes (the notes that have to be clicked on to open them), not text that is typed directly on the page, something I have been using more recently to avoid having to click on an annotation to see it.

I have one question. I have not worked out how to get the md files to be listed in DTP in the order in which they appear on the page of the PDF document so that when I merge them they appear in the “correct” order. Is this a matter of simply changing the sort order in DTP, or is it necessary to do something about titles/filenames? This is potentially significant because I can see many situations where I will want to merge all the annotations created in a single PDF, or perhaps those on selected page(s) of a PDF.

msteffens · October 24, 2024, 4:44pm

I’m glad to hear that the script worked fine out of the box! It’s true that the script only extracts text and markup (highlight, underline, strikethrough & squiggly) annotations. It may be possible to support other annotation types, but I’d need to test what I get returned for these types from the PDFKit framework.

The order of annotation notes is indeed not perfect yet. I think that a correct sort order could be achieved by the script but it would require adding some sortable identifier string as part of the note’s name (or any other field that can be used for sorting in DEVONthink). If you don’t want the script to always renumber all notes (when a new annotation note gets added) then the script would need to add some kind of identifier string that encapsulates the annotation’s x/y positions on the page, and which can handle a typical two-column layout. I’ve implemented this before for my own app, but I had left this out for the script since I wasn’t sure how to best achieve this w/o polluting the note’s name.

msteffens · November 10, 2024, 12:49pm

@reb2012, I’ve just released version 1.2 of the script which adds support for free text annotations, i.e. annotations whose text is always visible (instead of being displayed in a pop-up window).

For each Markdown record, version 1.2 of the script also adds a sort identifier string to an annotationorder custom metadata field. This metadata field can be used in DEVONthink to sort annotations in the order they appear in the text of a PDF page.

By default, sort identifier generation tries to respect a typical 2-column PDF text layout. Note that this may not always be perfect, but you can disable or tweak this via properties in the script. And, of course, you can always edit the sort identifier strings manually to account for any tricky cases.
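As an illustration of the general idea, the following Python sketch builds a sortable string from an annotation's page number and position. It is not the script's implementation and does not reproduce the real format of the annotationorder field: the function name, the parameters, the equal-width column split and the identifier layout are all assumptions, and coordinates are taken to be PDF points with the origin at the bottom left of the page.

```python
def sort_identifier(page, x, y, page_width, page_height, columns=2):
    """Build a string that sorts by page, then column, then top-to-bottom position."""
    # Assumption: the page is divided into equally wide text columns.
    column_width = page_width / columns
    column = min(max(int(x // column_width), 0), columns - 1)
    # PDF coordinates grow upwards, so measure the distance from the top edge.
    distance_from_top = max(0, round(page_height - y))
    # Zero-padding makes plain string sorting agree with numeric order.
    return f"{page:04d}-{column}-{distance_from_top:05d}-{max(0, round(x)):05d}"


# Three annotations on page 3 of a 612 x 792 pt page, given as (x, y).
positions = [(330, 750), (72, 300), (72, 700)]
identifiers = sorted(sort_identifier(3, x, y, 612, 792) for x, y in positions)
print(identifiers)
```

Sorting these strings places both left-column annotations (top one first) before the right-column annotation, even though the latter sits higher on the page. Passing `columns=1` in this sketch falls back to plain top-to-bottom ordering.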

I’ve now added the above-mentioned filter method as a hook into the script. This method is now called for every annotation that has an annotation comment. As explained above, this can be used to preprocess & transform the given annotation comment (which may contain custom markup syntax) into a Keypoints-style format that’s supported by this script.

@bangersandmash, to re-enable your use case of converting “Tags: …, …” lines in annotation comments, just open script version 1.2 with Script Debugger and search for this line:

return aComment

and comment it out (i.e., prefix it with --):

--return aComment

Then save the script again.

For other use cases, the hook method preprocessAnnotationComment() would need to be adapted.

In the future, the script may be further enhanced by adding similar hook methods that allow for easy customization of other aspects of the script.

In addition, I’d like to allow the name & content of created Markdown records to be generated via a template mechanism. This would, for example, allow for custom YAML headers which, in turn, would ease the reuse of the created Markdown files with other applications.

reb2012 · November 10, 2024, 5:40pm

Thank you very much for revising this. That was unexpected and very welcome.

The first time I tried it, the new version did not work, but I ran it in Script Editor, which told me that the problem was the missing KeypointsScriptingLib. I then read the instructions in the comments, and installed the library as instructed. The script now appears to be working perfectly.

msteffens · November 10, 2024, 6:55pm

You’re welcome! Your suggestions helped to improve the script, thank you!

Ah, my mistake, sorry for the trouble! For some reason, the updated .scptd file did not contain the KeypointsScriptingLib anymore. I’ve fixed this issue and re-uploaded the DEVONthink_Notes_from_PDF_Annotations.scptd.zip file which can be downloaded here. That script package should be ready to go (again).

But great to hear that you could help yourself already!

I’m glad to hear this. Let me know if you run into further issues or have any more suggestions.

bangersandmash · November 11, 2024, 2:42pm

Wow, incredible stuff. Thanks for these tweaks @msteffens. Your work has already been a goldmine for me. Looking forward to installing this.

reb2012 · November 14, 2024, 3:06pm

A very minor note. If you drop a file into DTP3 and then run the script immediately, it says that no PDF with annotations has been highlighted. Wait a moment (presumably while DTP3 indexes or something like that) and it works perfectly.

The script is also useful if all you want to do is count the annotations.

msteffens · November 14, 2024, 3:50pm

Interesting, thanks for the note. Not sure if the script can do anything about this but I guess not.

If you just want to see PDF annotation counts for all records selected in DEVONthink, you could also run a script like this:

tell application id "DNtp"
	-- collect one "name: count" line for each selected PDF
	set selRecords to selected records as list
	set annotationsByPDF to {}
	
	repeat with theRecord in selRecords
		if (type of theRecord is PDF document) then
			set annotationsByPDF to annotationsByPDF & ¬
				(name of theRecord & ": " & annotation count of theRecord & linefeed)
		end if
	end repeat
	
	-- show the collected lines, or "(none)" if no PDF was selected
	set infoText to "(none)"
	if annotationsByPDF is not {} then set infoText to annotationsByPDF as string
	display alert "Selected records with PDF Annotations:" message infoText as informational buttons {"OK"} default button "OK"
end tell

It might be neat to change this script so that it displays results as selectable list items, and then instructs the “DEVONthink Notes from PDF Annotations” script to import PDF annotations for all chosen items.