Obsidian and PDF++: what deep PDF integration with Markdown can look like

This is a bit off-topic for DEVONthink per se, but since many people here highlight and annotate PDFs using Markdown, I’d like to point out an excellent companion to DT: Obsidian with the new PDF++ plugin. The depth of PDF integration the author has achieved is very inspirational in terms of how annotations can link back to Markdown and vice versa (if you’re an Obsidian user: try installing the plugin to see the enormous range of Markdown-PDF integration options).

I’m not trying to suggest anything regarding the up- or downsides of tools like Obsidian. But as this is an open source project, I just thought this might be interesting for those working with PDF and Markdown annotations and serve as an inspiration!

Interesting, thanks for sharing.

I may have misunderstood, but it looks to me that it’s adding Obsidian-flavour highlights to PDFs, not standard annotations? (E.g. the highlights are only visible in the Obsidian viewer?) Which would make it an automatic no for me (I have a rule that I only undertake PDF annotations in the standard format so that it’s cross-compatible and not locked into the PDF or a specific app).

Otherwise it looks quite good, though DT does most of this stuff natively (as does PDFExpert, which uses standard annotations and is cross-compatible).

That block highlighting function is useful in principle - a couple of apps offer it. I’ve yet to find a use case myself, but I understand why our mathematician friends and others might need it. (I’m either highlighting words, or I end up screengrabbing an entire diagram, so I’ve not made much use of the block highlight function.)

@MsLogica it’s not adding anything Obsidian-flavoured to make it work. It uses standard annotations, but can also offer other ways of working. It can write the highlights into the PDF as you make them (using standard annotations, which are also editable by DT), or keep them in a separate annotation file (my personal preference) and highlight them dynamically. It’s fully cross-compatible by default; the additional modes are there only if you want them. This is from the PDF++ settings:

I’ve built something similar for DT a while back:

For more information, see the scripts in this thread: Stream annotations from your PDF reading sessions with DEVONthink - #34 by mdbraber

I only try to use things that are cross-compatible, without having to use DT, Obsidian or any other tool, which is why Markdown and PDF are favorites. I see DT and other tools more as viewers that offer an additional layer of convenience.

Ah, it wasn’t clear in the documentation, thank you for clarifying.

Thanks for sharing, @mdbraber (and thanks for your great contribution to making PDF and Markdown play nice together within DTP as well, truly an inspiration)!

Features like those in the Obsidian PDF Plus plugin are exactly what I wished were possible within DEVONthink a while back: (mainly) being able to make highlights and annotations while reviewing the PDF, but storing all the highlights and annotations (and notes) directly in a separate (Markdown) file (not via export), with a 1-1 mapping/backlink (both ways), without ever editing the “original PDF” (leaving the PDF as original as possible).

A short backstory to illustrate the reasoning: I experienced (a lot of) corrupted PDFs, e.g. a corrupted font/OCR layer after doing some simple highlights/annotations within DEVONthink (using Apple PDFKit), making the PDF lose its entire text layer (or rendering the text as garbage, which also makes the actual text “unsearchable”, etc.). DTP/PDFKit has improved here, e.g. by refusing to edit PDFs where editing might cause corruption (which is good), but it left me wishing for alternatives that don’t tamper with the original PDF (which was being re-saved, with a risk of corruption, every time I added highlights/annotations).

As usual, there are pros & cons to separating the PDF and the “metadata”, but as long as the output format is good and readable/easy to understand (e.g. a page number and x/y), the pros outweigh the cons for my use cases.

Some added benefits while keeping the source untouched, and annotations/highlights separate are:

  • Blazing-fast synchronization, as only tiny bits of text (the highlights) need to be updated every time you make a “modification”.
  • PDFs do not have to be re-saved and re-synced every time a new highlight or annotation is added.
  • Version control (e.g. git) and diff(s) on highlights/annotations only (something I personally really like, as I use git/version control for my Obsidian/Markdown as well, and it keeps a kind of “work log” of what’s changed on a day-to-day basis).
  • Version control on the PDF itself (the original should stay unchanged, but it’s possible).
  • Possible to re-use the metadata (e.g. highlights) on multiple revisions of basically the same PDF (which I do often, e.g. release notes with “added content at the end”, an updated DTP manual after a new release, or similar). One can simply “copy” the previous note and update the source mapping to the new PDF, and it will automatically “overlay” when opening the new PDF, so all previous highlights/annotations/connections apply to the new PDF as well.
    • Here I also found alternative methods: one matching on the page number and x/y offset (word/letters, etc.), and another (like hypothes.is, or hypothes.is-based plugins) matching on a “search phrase”, i.e. an exact match on (unique) content (basically a search string, highlighting the matching parts). Again, both have their pros & cons.
  • The (Markdown) note with highlights and annotations can be viewed, searched, and updated in a separate process, without needing to “load the PDF first”, edit “via a PDF tool”, etc. This can open up new possibilities, e.g. using other tools for editing the “extracts” (e.g. Obsidian/Vim, etc.), based on user preference, while still maintaining a connection to the source PDF.
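
To make the “search phrase” style of matching a bit more concrete, here is a minimal Python sketch of how a highlight stored outside the PDF could be found again in a newer revision of the same document. It is only an illustration under stated assumptions: the `QuoteAnchor` class and `locate` function are hypothetical names, the page text is assumed to be already extracted as a plain string, and this is not how PDF++ or hypothes.is actually implement it.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class QuoteAnchor:
    """A highlight kept outside the PDF, identified by its text (hypothetical structure)."""
    exact: str          # the highlighted text itself
    prefix: str = ""    # a few characters before it, to disambiguate repeated phrases
    suffix: str = ""    # a few characters after it


def locate(anchor: QuoteAnchor, page_text: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) character offsets of the highlight in page_text, or None."""
    start = page_text.find(anchor.exact)
    while start != -1:
        end = start + len(anchor.exact)
        # Accept this occurrence only if the surrounding context matches too.
        before_ok = page_text[:start].endswith(anchor.prefix)
        after_ok = page_text[end:].startswith(anchor.suffix)
        if before_ok and after_ok:
            return start, end
        start = page_text.find(anchor.exact, start + 1)
    return None


# Two revisions of "basically the same" document; the second has extra content up front.
old_revision = "Release notes. Sync is faster. Known issues: none."
new_revision = "Release notes, 2nd edition. Intro added. Sync is faster. Known issues: none."

anchor = QuoteAnchor(exact="Sync is faster", prefix=". ", suffix=". Known")
old_span = locate(anchor, old_revision)
new_span = locate(anchor, new_revision)
```

The same stored anchor resolves in both revisions even though the character offsets differ, which is the advantage over purely positional (page number and x/y) matching; the trade-off is that it fails as soon as the quoted text itself changes.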

I guess the cons speak for themselves, but the biggest one is not being able to view the PDF together with its annotations without a tool that can connect the two (when thinking about future-proofing). One option (possible with e.g. Zotero) is keeping a separate exported copy of the PDF with the annotations included, which is what I have opted for until now. Again, if the outputs are easy to read/understand, it might not be a problem.

So yeah, lots of (possibly unneeded) information here, but just wanted to leave my comments, and a short background for needing this.

The end goal (for me), is having a good process while reviewing a PDF, mainly consuming the information, doing quick highlights and annotations “as I review”, and in the end, connecting the information with other relevant parts (using Obsidian here, as it’s quite an awesome tool). The last part is being able to go back to the original reference in the future, if needed. The “extracted annotations”, including my (personal) notes and connection (to other notes), are the most important part, and the backlinked PDF reference is a “nice to have”-feature. Having it all-in-one (like with PDF Plus, now), is really cool! So thanks for sharing, again!

Side note(s)

  • I’ve been tracking (Mozilla) pdf.js development with support for doing highlights and (editing) annotations (which has been partially available since late 2023), and I guess this will be available within Obsidian “when it’s ready”. Will be interesting to see how the implementation plays out while connecting the dots between the PDF, annotations, notes/Markdown, and DTP.
  • PDF Plus (plugin) is using pdf-lib, if I understand correctly from the GitHub page (if you want the actual “editing” of the PDF); have you seen any issues (if using it) so far? As you might understand from the above, I’m a bit concerned about corrupting my PDFs (again).

It’s possible to do this conveniently with a script. This script basically gets text stored in the clipboard, changes it into the format you desire, and then inserts it into the annotation file of the currently opened PDF document. First you press shift-command-C to Copy Source Link of selected text in a PDF. Then press a custom keyboard shortcut to run your script.
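
As a rough illustration of the “reformat and insert” step, here is a minimal Python sketch. It is not the actual script referred to in this post: the function names, the blockquote layout, and the placeholder link are all assumptions made for the example, and reading the clipboard and resolving the annotation file of the open document are left out (both would have to come from DT or the operating system).

```python
import tempfile
from pathlib import Path


def format_annotation(quote: str, source_link: str) -> str:
    """Turn a copied passage and its source link into one Markdown blockquote entry."""
    # Collapse line breaks and repeated spaces that PDF text selection often introduces.
    clean = " ".join(quote.split())
    return f"> {clean} ([source]({source_link}))\n\n"


def append_annotation(annotation_file: Path, quote: str, source_link: str) -> None:
    """Append the formatted entry to the annotation file, creating it if needed."""
    with annotation_file.open("a", encoding="utf-8") as handle:
        handle.write(format_annotation(quote, source_link))


# Usage with a throwaway file; the link below is a made-up placeholder, not a real item link.
with tempfile.TemporaryDirectory() as folder:
    notes = Path(folder) / "annotations.md"
    append_annotation(notes, "Sync is\n  faster", "x-devonthink-item://EXAMPLE?page=3")
    append_annotation(notes, "Known issues: none", "x-devonthink-item://EXAMPLE?page=7")
    written = notes.read_text(encoding="utf-8")
```

Appending rather than rewriting the file keeps each reading session’s entries in order and leaves anything you have typed into the note by hand untouched.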

As a “one-way export”, yes; but if the highlights/annotations are not stored within the PDF, and only go from the clipboard into the Markdown file, will they then be shown (as an “overlay”) when (re)opening the very same PDF later, based on what’s stored in the Markdown file (being used to store the annotations)? This is what is possible with the combination of Obsidian, PDF and PDF Plus now!

I know it’s possible in theory to “re-markup” (or whatever it’s called), based on the awesome work by @mdbraber (linked to above), but as far as I understand, that relies on annotations stored in the actual PDF (while keeping them in sync with the Markdown file), hence it’s not even possible in my case (where DTP either can’t annotate the PDF using PDFKit, or risks corrupting it).
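
For an overlay like that to work, a viewer has to read the stored entries back out of the note before it can draw anything. A minimal Python sketch of that reading half is below, assuming a made-up line format with a page number and x/y position; the format, the `Highlight` type and `read_highlights` are hypothetical and are not what PDF Plus or DT actually use.

```python
import re
from typing import List, NamedTuple


class Highlight(NamedTuple):
    page: int
    x: float
    y: float
    text: str


# Hypothetical line format (an assumption for this sketch):
#   - p.12 @ (72.0, 540.5): "quoted text"
LINE = re.compile(r'^- p\.(\d+) @ \(([\d.]+), ([\d.]+)\): "(.*)"$')


def read_highlights(markdown: str) -> List[Highlight]:
    """Collect every highlight line in a note; all other lines are left alone."""
    found = []
    for line in markdown.splitlines():
        match = LINE.match(line.strip())
        if match:
            page, x, y, text = match.groups()
            found.append(Highlight(int(page), float(x), float(y), text))
    return found


note = """\
# Reading notes

- p.3 @ (72.0, 540.5): "Sync is faster"
My own comment on this passage.
- p.7 @ (90.0, 120.0): "Known issues: none"
"""

highlights = read_highlights(note)
# A viewer would only need to draw overlays on these pages.
pages_to_overlay = sorted({h.page for h in highlights})
```

Because personal comments and other prose are simply skipped, the same file can serve as both the human-readable note and the data source for the overlay.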
