Exporting highlight colour information in Summarise Highlights

I use DT frequently to review PDF documents, and I use different highlight colours to mark different aspects or uses of the highlighted text. After going through a document, I often use DT's Summarise Highlights command to export the highlighted text (plus any notes I might have added) for further analysis outside of DT, typically in Tinderbox. Unfortunately, the highlight colour information is not consistently preserved in the export.

This is illustrated below with a test PDF document I created for this purpose:

If I now use the excellent Summarise Highlights command and choose Markdown as the export format, I get the following:

Notice that the green and blue highlight colours are lost in the process and are now represented as yellow highlights in the summary markdown file. There is no other information in the markdown file on the original highlight colour.

If, on the other hand, I choose Rich Text as the export format, I get:

The highlight colours are preserved, but they would need to be extracted from the RTF document somehow. I don't know how to do this, and it represents an extra step that I would like to avoid if possible.
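For anyone willing to script the extra step, the colours can in principle be read from the RTF source itself. The minimal Python sketch below assumes (unverified for DT's actual output) that background colours are written the usual RTF way: a `\colortbl` colour table plus `\cbN` or `\highlightN` control words that point into it. The function names `parse_colour_table` and `background_colours` are hypothetical, and the approach is a plain regular-expression scan, not a full RTF parser.

```python
import re


def parse_colour_table(rtf: str) -> list:
    """Return the RTF colour table; entries without RGB values (the 'auto'
    colour at index 0) are returned as None."""
    match = re.search(r"\{\\colortbl(.*?)\}", rtf, re.DOTALL)
    if match is None:
        return []
    # Every table entry is terminated by ';', so the last split part is empty.
    entries = match.group(1).split(";")[:-1]
    table = []
    for entry in entries:
        parts = [re.search(rf"\\{name}(\d+)", entry) for name in ("red", "green", "blue")]
        if all(parts):
            table.append(tuple(int(m.group(1)) for m in parts))
        else:
            table.append(None)
    return table


def background_colours(rtf: str) -> list:
    """Hex codes of the colours referenced by \\cbN or \\highlightN control
    words, in order of first appearance (may include a white 'reset' colour)."""
    table = parse_colour_table(rtf)
    found = []
    for m in re.finditer(r"\\(?:cb|highlight)(\d+)", rtf):
        index = int(m.group(1))
        if 0 < index < len(table) and table[index] is not None:
            code = "#{:02X}{:02X}{:02X}".format(*table[index])
            if code not in found:
                found.append(code)
    return found
```

Run against an RTF summary, `background_colours` would list each distinct background colour once. A white entry (often used to reset the background) may need to be filtered out by hand.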

Not illustrated here is the result of exporting as Sheet, which generates a TSV file with one row per highlight. I may be wrong, but none of the columns in the TSV file seem to include the original highlight colour, so I'm stuck here too.

I'm hoping this feature might be considered by the developer in future updates. I have noticed related requests from other users, for instance here and here. I realise that other apps outside of DT provide this function, but I would greatly prefer this functionality to be available within DEVONthink, especially since all the other steps take place within DT.

Markdown (in this case MultiMarkdown) doesn't support this, and we usually avoid adding proprietary extensions to standard file formats or bloating Markdown with HTML.


Thanks for the feedback @cgrunenberg. I understand you do not want to break existing standards.

Perhaps I'm mistaken, but for Markdown I was thinking more of a text-based tag for each highlight giving its colour (rather than highlighting the text itself). I don't think this would break any standards, since it would simply be customised textual output. For TSV, I assumed an additional column with the highlight colour could be added?
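To make the two proposals concrete, here is a minimal Python sketch of what such output could look like. The `Highlight` record, the `[colour: …]` tag and the TSV column names are all hypothetical; they are not DT's actual export format, which is not documented here. The point is only that both outputs stay plain text: the colour becomes an ordinary text tag in Markdown and one extra column in TSV.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Highlight:
    text: str
    page: int
    colour: str  # raw colour value, e.g. "#FFFF00" (hypothetical field)


def to_markdown(highlights: list) -> str:
    # The colour is written as a plain-text tag, so no HTML or
    # non-standard Markdown syntax is needed.
    return "\n".join(f"* {h.text} (p. {h.page}) [colour: {h.colour}]" for h in highlights)


def to_tsv(highlights: list) -> str:
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["Text", "Page", "Colour"])  # hypothetical column names
    for h in highlights:
        writer.writerow([h.text, h.page, h.colour])
    return buffer.getvalue()
```

A tool such as Tinderbox could then pick up the colour from the tag or the column without any RTF parsing.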

I've included a screenshot of the output from the Highlights app (using the same input file as above) to illustrate how highlight colour can be included in Markdown output. Note that I much prefer DEVONthink's output format and its option to include a page link for each highlight, which ensures I have a reference for every highlight should I split the file into smaller notes. Including colour information in addition to the link would be ideal.


One issue is that the colors defined in Preferences > Colors > Highlighting are frequently not the ones used by the document, e.g. if the settings were changed in the past or other apps were used to annotate the document. In that case there would be no name to add to the output.

Good point. If I've understood correctly, colour names such as yellow, blue or green that refer to highlights can be lost if apps other than DEVONthink are used for highlighting. On the other hand, the colour specification itself (as opposed to the name) seems to be preserved by at least two common annotation programs; I've tested this myself using Highlights and Acrobat Reader.

Here is the output of the Highlights app:

and the corresponding Rich Text output from DT's Summarise Highlights:

Similarly, when I use Adobe Acrobat Reader, the output is:

and the DT Summarise Highlights output:

Above, we can see that the colours differ between the apps (look at the yellow highlights, for instance) but are consistent between each PDF and its Summarise Highlights output. I assume some kind of colour tag or similar is stored that allows Summarise Highlights to track and reproduce the colour.
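For reference, the PDF specification stores an annotation's colour in the `/C` entry of its annotation dictionary: an array of 0, 1, 3 or 4 numbers between 0.0 and 1.0, meaning transparent, DeviceGray, DeviceRGB or DeviceCMYK respectively. Whether DT reads exactly this entry is an assumption, but it would explain why the colours survive across apps. The hypothetical helper below turns such an array (obtained from any PDF library) into the kind of hex code suggested in this thread; the CMYK branch is a naive conversion without colour management.

```python
def pdf_colour_to_hex(c):
    """Convert a PDF annotation /C array to '#RRGGBB' (None if transparent)."""
    if len(c) == 0:
        return None  # an empty array means the annotation is transparent
    if len(c) == 1:
        r = g = b = c[0]  # DeviceGray
    elif len(c) == 3:
        r, g, b = c  # DeviceRGB
    elif len(c) == 4:
        cyan, magenta, yellow, black = c  # DeviceCMYK, naive conversion
        r = (1 - cyan) * (1 - black)
        g = (1 - magenta) * (1 - black)
        b = (1 - yellow) * (1 - black)
    else:
        raise ValueError(f"unexpected /C length: {len(c)}")
    return "#{:02X}{:02X}{:02X}".format(*(round(v * 255) for v in (r, g, b)))
```

For example, a typical yellow highlight stored as `[1, 1, 0]` becomes `#FFFF00`.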

One possible solution might be the following:

  • If highlighting is done only within DT, the colours can be looked up in a table and exported with their colour names (yellow, green, blue, etc.).
  • When external highlighting apps are used, it is understood that the colour names are lost. However, highlight colours are usually distinct and well separated in colour space, so either an algorithm assigns a name based on the nearest reference colour (e.g. the colour closest to RGB = 255,255,0 is labelled yellow), or the raw colour value is provided instead (e.g. #FFFF00 for yellow).
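The nearest-reference-colour idea from the second bullet can be sketched in a few lines of Python. The palette below is a hypothetical example, not DEVONthink's actual highlight colours, and the distance is plain Euclidean distance in RGB. A `max_distance` threshold (an arbitrary value chosen for illustration) makes the function fall back to the raw hex code when nothing is close, which combines both options from the bullet.

```python
import math

# Hypothetical reference palette for illustration only; these are NOT
# DEVONthink's actual highlight colours.
REFERENCE_COLOURS = {
    "yellow": (255, 255, 0),
    "green": (0, 255, 0),
    "blue": (0, 128, 255),
    "red": (255, 0, 0),
    "pink": (255, 105, 180),
}


def hex_to_rgb(code: str) -> tuple:
    code = code.lstrip("#")
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))


def colour_label(code: str, palette=REFERENCE_COLOURS, max_distance=120.0) -> str:
    """Name of the nearest reference colour (Euclidean distance in RGB),
    or the raw hex code if nothing is within max_distance."""
    rgb = hex_to_rgb(code)
    name, distance = min(
        ((n, math.dist(rgb, ref)) for n, ref in palette.items()),
        key=lambda pair: pair[1],
    )
    return name if distance <= max_distance else code
```

A pale yellow such as `#FAF03C` is labelled `yellow`, while a purple like `#800080` is too far from every reference and is passed through unchanged. RGB distance is only a rough proxy for perceived similarity; a perceptual colour space such as CIELAB would classify borderline shades more reliably.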

Other, more tech-savvy users might have better proposals.

The choice of highlight colour can carry a lot of meaning when annotating PDFs; as you can see, I'm still motivated to find a solution, or at least to convince @cgrunenberg to find one 🙂

Highlights does not preserve custom highlight color names.

And the names in DEVONthink's preferences are only used by the user interface; they're not part of any highlighted document. That's why a mapping would be necessary, but it could easily fail due to different users, different settings or different apps.


I think we are saying the same thing, i.e. that the colour "names" appearing in the DT highlight menu are not stored in the document. However, the colour specification itself, perhaps expressed as an RGB triplet or as a hex string as in HTML, must be stored somehow; otherwise the RTF summary would not be able to reproduce the highlight colours.

Given the lack of colour names, the choices seem to be:

  • Include only the colour information that is available when summarising (the same as, or derived from, what is embedded in the RTF output)
  • Map colours to the names found in the menu when summarising (only works for users who stay within DT for their annotations)
  • Map colours to names based on the nearest reference colour (more complicated, as each colour needs to be classified)

The first choice above is the most robust; the second is similar to what the Highlights app provides; the third is more general but complicated, and perhaps error-prone if two highlight colours are very similar (e.g. two different shades of yellow).
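The trade-off between the first two choices can be seen in a tiny Python sketch. `USER_PALETTE` is a hypothetical snapshot of one user's named highlight colours, not DT's real settings. An exact lookup gives a name when the colour matches precisely (the second choice) and otherwise falls back to the raw value (the first choice), so a colour that is even one step off, as can happen after settings change or with another app, silently loses its name.

```python
# Hypothetical snapshot of one user's named highlight colours.
USER_PALETTE = {"#FFFF00": "yellow", "#00FF00": "green", "#00BFFF": "blue"}


def label_exact(code: str, palette=USER_PALETTE) -> str:
    """Second choice (exact lookup of a named colour), with the first
    choice (the raw colour value) as the fallback."""
    code = code.upper()
    return palette.get(code, code)
```

Here `label_exact("#ffff00")` gives `yellow`, while the barely different `#FFFE00` comes back as `#FFFE00`, which is exactly why this mapping only works for users who annotate entirely within DT.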

Another solution might be to offer the user the option to group all highlights by their colours first and then export.
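Grouping has the advantage of not needing any colour names at all: highlights only have to be compared by their raw colour values. A minimal Python sketch, assuming each highlight is available as a hypothetical `(text, colour)` pair, could group them in order of first appearance and emit one Markdown section per colour:

```python
def group_by_colour(highlights):
    """Group (text, colour) pairs by colour, keeping first-seen order."""
    groups = {}
    for text, colour in highlights:
        groups.setdefault(colour, []).append(text)
    return groups


def export_grouped(highlights) -> str:
    lines = []
    for colour, texts in group_by_colour(highlights).items():
        lines.append(f"## Colour {colour}")
        lines.extend(f"* {text}" for text in texts)
        lines.append("")  # blank line between sections
    return "\n".join(lines)
```

Because Python dictionaries preserve insertion order, the sections follow the order in which each colour first appears in the document.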