Thanks for sharing, @mdbraber (and thanks for your great contribution to making PDF and Markdown play nice together within DTP as well, truly an inspiration)!
Features like those in the Obsidian PDF Plus plugin are exactly what I wished were possible within DEVONthink a while back: mainly, being able to highlight and annotate while reviewing a PDF, but store all the highlights and annotations (and notes) directly in a separate (Markdown) file (not via export), with a 1:1 mapping/backlink in both directions, without ever editing the “original PDF” (leaving the PDF as untouched as possible).
A short backstory to illustrate my reasoning: I’ve run into (a lot of) corrupted PDFs, e.g. a corrupted font/OCR layer after doing some simple highlights/annotations on a PDF within DEVONthink (using Apple PDFKit), making the PDF lose its entire text layer (or render the text as garbage, which also makes the actual text “unsearchable”, etc.). DTP/PDFKit has improved here, e.g. by preventing edits to PDFs that might cause corruption (which is good), but it left me wishing for alternatives that don’t tamper with the original PDF (which risks corruption every time the PDF is re-saved after adding highlights/annotations).
As usual, there are pros & cons to separating the PDF and its “metadata”, but as long as the output formats are good and readable/easy to understand (e.g. a page number and x/y coordinates), the pros outweigh the cons for my use cases.
Some added benefits of keeping the source untouched and the annotations/highlights separate:
- Blazing-fast synchronization, as highlights are only tiny bits of text that need to be updated every time you make a “modification”.
- PDFs do not have to be re-saved and re-synced every time a new highlight or annotation is added.
- Version control (e.g. git) and diffs on highlights/annotations only (something I personally really like, as I use git for my Obsidian/Markdown notes as well, and it keeps a kind of “work log” of what’s changed on a day-to-day basis).
- Version control on the PDF itself (if it’s the original, it should never change, but it’s possible).
- It’s possible to re-use the metadata (e.g. highlights) across multiple revisions of basically the same PDF (which I do often, e.g. release notes with “added content at the end”, such as an updated DTP manual after a new release). One can simply “copy” the previous note and update the source mapping to the new PDF; the highlights are then automatically “overlaid” when opening the new PDF, so all previous highlights/annotations/connections apply to the new PDF as well.
- Here I also found alternative methods: one matches based on the page number and x/y offset (of words/letters, etc.), and another (like hypothes.is, or hypothes.is-based plugins) matches based on a “search phrase”, i.e. entirely on (unique) content (basically searching for a string and highlighting the matching parts). Again, both have their pros & cons.
- A (Markdown) note with highlights and annotations can be viewed, searched, and updated in a separate process, without needing to “load the PDF first”, edit “via a PDF tool”, etc. This opens up new possibilities, e.g. using other tools (such as Obsidian/Vim) for editing the “extracts”, based on user preference, while still maintaining a connection to the source PDF.
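To make the “search phrase” approach a bit more concrete, here’s a minimal Python sketch of quote-based re-anchoring, e.g. when pointing an existing note at a new revision of a PDF. It assumes the plain text of each page has already been extracted; the `TextQuote` record and `anchor_quote` function are hypothetical, loosely modeled on the exact/prefix/suffix fields of the W3C Web Annotation `TextQuoteSelector`. Real tools typically also do fuzzy matching, which this sketch leaves out.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TextQuote:
    """Hypothetical highlight record: the highlighted text plus some surrounding context."""
    exact: str
    prefix: str = ""
    suffix: str = ""


def anchor_quote(pages: List[str], quote: TextQuote) -> Optional[Tuple[int, int, int]]:
    """Return (page_no, start, end) of the best match of quote.exact, or None.

    Every occurrence of the exact text is scored by how much of the stored
    context (prefix/suffix) matches around it; ties keep the earliest match.
    """
    best: Optional[Tuple[int, int, int]] = None
    best_score = -1
    for page_no, text in enumerate(pages, start=1):
        start = text.find(quote.exact)
        while start != -1:
            end = start + len(quote.exact)
            score = 0
            # Does the text just before the match equal the stored prefix?
            if quote.prefix and text[max(0, start - len(quote.prefix)):start] == quote.prefix:
                score += 1
            # Does the text just after the match equal the stored suffix?
            if quote.suffix and text[end:end + len(quote.suffix)] == quote.suffix:
                score += 1
            if score > best_score:
                best, best_score = (page_no, start, end), score
            start = text.find(quote.exact, start + 1)
    return best


if __name__ == "__main__":
    pages = ["Release notes v1. New: sync.", "Appendix. New: sync engine rewritten."]
    quote = TextQuote(exact="sync", prefix="New: ", suffix=" engine")
    print(anchor_quote(pages, quote))  # (2, 15, 19)
```

The prefix/suffix context is what tells repeated occurrences of the same phrase apart; the page/x-y approach gets that disambiguation from position instead, but roughly speaking it breaks as soon as the layout shifts between revisions.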
I guess the cons speak for themselves, but the biggest one is not being able to view the PDF with its annotations without a tool capable of connecting the two (when thinking about future-proofing). One option (possible with e.g. Zotero) is keeping a separate exported copy of the PDF (including annotations), which is what I’ve opted for until now. Again, if the outputs are easy to read/understand, it might not be a problem.
So yeah, lots of (possibly unneeded) information here, but I just wanted to leave my comments and a short background on why I need this.
The end goal (for me) is having a good process for reviewing a PDF: mainly consuming the information, doing quick highlights and annotations “as I review”, and, in the end, connecting the information with other relevant parts (using Obsidian here, as it’s quite an awesome tool). The last part is being able to go back to the original reference in the future, if needed. The “extracted annotations”, including my (personal) notes and connections (to other notes), are the most important part, and the backlinked PDF reference is a “nice-to-have” feature. Having it all in one (like with PDF Plus now) is really cool! So thanks again for sharing!
Side note(s)
- I’ve been tracking (Mozilla) pdf.js development of support for highlighting and editing annotations (partially available since late 2023), and I guess this will come to Obsidian “when it’s ready”. It will be interesting to see how the implementation plays out in connecting the dots between the PDF, annotations, notes/Markdown, and DTP.
- If I understand the GitHub page correctly, the PDF Plus plugin uses pdf-lib (if you want the actual “editing” done on the PDF itself); have you seen any issues so far (if you’re using it)? As you might understand from the above, I’m a bit concerned about corrupting my PDFs (again).