Stream annotations for dummies

Dellu · November 16, 2022, 11:34am

This is a workflow I developed a long time ago and have been using ever since. It is very simple, but quite effective. I was encouraged to post it by the wonderful script by @ryanjamurphy.

How is my system different from his?
Mine is for the uninitiated: for people who have no skill to tweak and tinker with scripts. Since his script is advanced and could be hard for some people to use, I am posting this as an alternative for people who have less skill (or time) and are willing to invest in paid apps.

My system also has some advantages over ryanjamurphy’s:

Images are supported.
You can annotate back and forth across pages (page 3, then back to page 1, etc.): all the annotations are synced to DEVONthink with no problem.
It is very simple to run. You don’t need any scripting skill to use it. The only script in this workflow is the sed search-and-replace in one of the Hazel rules. The Highlights app does everything else for you.
Changes are reflected immediately.

My workflow also has its own problems: you need to invest in Hazel and the Highlights app. Note that I am using a very old version of Highlights. The app has since moved to a subscription. The old version still works fine, but I cannot be sure whether the settings I describe here have changed in newer versions.

The purpose:

You annotate in the Highlights app, and all the annotations you make there are synced to DT in near real time (within a couple of seconds). There is no manual exporting. You highlight and save, and your annotations appear in DT as properly formatted Markdown.

Tools you need

gsed: GNU sed for macOS, free software. You can install it via Homebrew (the formula is gnu-sed).
Hazel
Highlights app

Steps

Go to the preferences of the Highlights app and tick "Save sidecar Markdown file" in the General tab. This makes Highlights export the annotations every time you change them. This is the most important part of the workflow: I build everything around the annotation sidecars provided by Highlights. Every time you save your annotated PDF, the sidecar is updated. The sidecar is in .md format, or a textbundle if it contains images.
Create two folders in Finder:
a) m_highlights
b) Highlights

The Highlights folder is where we are going to store our exported markdown file. You can create this folder inside your Obsidian vault, if you are into that. That folder is what we are going to index in DT. The m_highlights folder is used just to process the workflow. You can create it anywhere.

Now we are going to install the Hazel rules that do the processing.

Set up these rules (articles.hazelrules.zip (4.5 KB)) on your Book/articles folder. I assume you store your PDF articles or books in one folder; put these rules on the folder where your PDFs live. They move the sidecar file from that folder to the m_highlights folder.
Set up this Hazel rule (h_highlights.hazelrules.zip (3.5 KB)) on your m_highlights folder.

These rules move and convert the sidecars.

Set up more rules (Highlights.hazelrules.zip (5.1 KB)) on your Highlights folder.

These clean up the Markdown file. The main change is to make the headings in the PDF appear as headings in the Markdown.

You need to highlight the headings in the PDF so that they show up in the Markdown.

We are finished. You simply index the Highlights folder in DT.

You will get markdown notes with proper formats. They have page numbers, markdown quotations, your own notes outside of the quotations, and headings of the pdf mapped to the headings (chapters, sections and subsections) of the markdown.

ryanjamurphy · November 16, 2022, 1:07pm

Neat use of Highlights! There may be a way to do most of this from within DEVONthink via Applescript and Smart Rules, too, cutting out Hazel and gsed. Could be better for some.

ryanjamurphy · November 16, 2022, 1:24pm

I’m not sure offhand, but if Applescript can’t handle it you can probably pull in gsed via Applescript. (I imagine that’s what you are using gsed for?)

Dellu · November 16, 2022, 1:28pm

The gsed part only modifies the text. That part can simply be ported into your script. The hard part is exporting the image annotations. Highlights exports the image annotations packaged into a textbundle file. When you rename the .textbundle file so that it becomes a plain folder, you get the Markdown file and the images under the /assets folder. That is very transparent.
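For anyone who wants to see that layout from a script, here is a minimal Python sketch that reads a textbundle as the ordinary folder it is. It assumes the text file is named text.md (or text.markdown) and that images sit under assets/, as described above; `read_textbundle` is a made-up helper name, not part of Highlights or DEVONthink.

```python
from pathlib import Path


def read_textbundle(bundle: Path) -> tuple[str, list[Path]]:
    """Return the Markdown text and the asset files of a .textbundle folder."""
    # Assumption: the bundle holds one text.* file (text.md or text.markdown)
    # and, optionally, an assets/ folder with the exported images.
    text_files = sorted(bundle.glob("text.*"))
    if not text_files:
        raise FileNotFoundError(f"no text.* file in {bundle}")
    markdown = text_files[0].read_text(encoding="utf-8")

    assets_dir = bundle / "assets"
    assets = (
        sorted(p for p in assets_dir.iterdir() if p.is_file())
        if assets_dir.is_dir()
        else []
    )
    return markdown, assets
```

Calling `read_textbundle(Path("paper.textbundle"))` (a hypothetical path) would give back the note text plus the list of image files, with no renaming needed.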

I know DEVONthink also exports image annotations. But the way they are exported in DT is obscure. I don’t know where they are stored.

ryanjamurphy · November 16, 2022, 1:32pm

In that case, this should be pretty trivial to do via Applescript!

To be clear, I like my “reading sessions” approach for different reasons, but in case anyone else is reading and wants to do this within DEVONthink, I think it’s worth a shot.

Dellu · November 16, 2022, 1:39pm

Indeed, there is a huge advantage to your approach: we are not locked into one reading/exporting app. I personally like PDF Expert much better than Highlights. The Highlights I have is old and clunky. I cannot update it because the app has gone to subscription. But I am sticking with it because I get very neat exports.

The best part is that the images are visible to any Markdown reader, including Obsidian, Typora and DT itself. I like it because I snap a lot of image annotations when I am reading. There are many items in a PDF that I cannot just highlight: drawings, images, graphs, tables and formulas never come out clean in a text highlight. I put a rectangle around them when I am reading. When I export the annotations, they appear as images. I like this. It makes the annotations complete and fully readable.

ryanjamurphy · November 16, 2022, 1:54pm

The extraction of images is indeed stellar.

What I like about reading sessions is that they are a nice way of breaking up the different “moments of thinking” that might come from reviewing something, particularly longer items like scholarly articles or books. I.e., I don’t want all of a reading’s annotations extracted into one thing.

There might be a way of extracting only the changes from your Highlights exports by looking at the difference between an earlier extraction and the newer one… which would give me the best of both worlds. Someday/Maybe!

Dellu · November 16, 2022, 1:56pm

I see. You are extracting only some part of the annotations. I didn’t understand that part. That is neat.

I don’t think that kind of workflow is possible with Highlights because it updates the sidecar file in place.

A simple way of imitating that kind of thinking with my method is simply to zoom into a certain section or subsection of the exported markdown annotation. If you are reading Chapter 4 of a dissertation, you can zoom into that heading, even if the annotations of the whole dissertation are exported.

FoldingText for example allows zooming into a single heading. Obsidian also might have a plugin for zooming to a heading. I didn’t check that one.
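If your editor cannot zoom, the same idea can be approximated with a few lines of Python. This is only a sketch: it assumes the exported note uses `#`-style Markdown headings, and `zoom_to_heading` is a name invented here for illustration. It ignores edge cases such as `#` lines inside code fences.

```python
import re

# Matches "#" to "######" headings and captures the level and the title.
HEADING = re.compile(r"^(#{1,6})\s+(.*?)\s*$")


def zoom_to_heading(markdown: str, title: str) -> str:
    """Return the section under `title`, including its subsections."""
    lines = markdown.splitlines()
    start = None
    level = 0
    for i, line in enumerate(lines):
        match = HEADING.match(line)
        if not match:
            continue
        if start is None:
            # Still looking for the heading we want to zoom into.
            if match.group(2) == title:
                start, level = i, len(match.group(1))
        elif len(match.group(1)) <= level:
            # Next heading at the same or a higher level ends the section.
            return "\n".join(lines[start:i])
    return "\n".join(lines[start:]) if start is not None else ""
```

For the dissertation example, `zoom_to_heading(text, "Chapter 4")` would return only the annotations under that heading, assuming the heading is titled exactly that way in the export.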

ryanjamurphy · November 16, 2022, 2:30pm

It’s like Jurassic Park. The question isn’t if we can… it’s if we should! It might not be worth the effort.

To that end, I don’t want my curious workflows to distract. This is a great workflow that will probably be better than my reading sessions approach for most people.

Thinking out loud for my own purposes here...

Ideally, for this automation, Highlights’s sidecar export would extract annotations in the order they are made. Unfortunately that is not the case: annotations are listed in the order they appear in the document. So we must find some other way to diff-check a changed Highlights sidecar file against previously-exported reading sessions.

A rough algorithm could be:

On a change in a Highlights sidecar file:

Get the Highlights sidecar file
Get all previous reading sessions files for this reading
Concatenate the reading sessions text
Split the reading sessions text by lines
For each line:
- Find and replace the line in the Highlights sidecar file with nothing

That will leave only new/different annotations in the Highlights sidecar text. Now save this text as a new reading session.

Obviously this wouldn’t be perfect. In particular, the find-and-replace technique is very hamfisted. It would work well in the ideal cases (e.g., where each annotation is completely unique), but could easily cause trouble if the user annotates the same phrase twice in a document. For example, if you accidentally highlighted only the word “and” in an old annotation, it would erase all instances of “and” from new reading sessions.

Still, that failure is probably rare in intended use, and extracting image annotations could well be worth it.
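A rough Python sketch of the algorithm above, with one deliberate change: instead of find-and-replace on substrings, it drops a sidecar line only when the whole line already appears in an earlier session. That sidesteps the “and” problem, but an annotation that repeats an earlier line exactly, or a repeated page or section header, would still be dropped. The function name and the idea of passing sessions in as strings are assumptions for illustration; nothing here is a Highlights or DEVONthink API.

```python
def new_annotations(sidecar_text: str, previous_sessions: list[str]) -> str:
    """Return the sidecar lines that no earlier reading session contains."""
    # Every non-blank line already exported in an earlier session.
    seen = {
        line.strip()
        for session in previous_sessions
        for line in session.splitlines()
        if line.strip()
    }

    kept: list[str] = []
    for line in sidecar_text.splitlines():
        stripped = line.strip()
        if stripped in seen:
            continue  # already exported in an earlier session
        if not stripped and (not kept or not kept[-1].strip()):
            continue  # skip leading or repeated blank lines
        kept.append(line)
    return "\n".join(kept).strip()
```

The returned text is what would be saved as the new reading session; an empty string means nothing has changed since the last one.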

Dellu · November 16, 2022, 3:40pm

In the long run, investing in your script is a much better move for the whole community.

The developers of DT might help in a few cases. They can improve the export format, for example, by making it cleaner than the current state.
They can also make the exporting of the image annotations more explicit.

That way, we don’t have to be locked to a single application such as Highlights. We will be able to export our annotations in better forms.

rctill · March 11, 2023, 11:19pm

Is there a way to do this without Highlights Pro? I don’t like subscriptions.

Dellu · March 12, 2023, 6:05am

I would also like to find a way. But so far, Highlights Pro is the only app I know of that automatically writes a sidecar file for the annotations.

If the developers of DT can look at ways to export images transparently, @ryanjamurphy’s script could be modified to do it.