Sending annotation to DTTG

kseggleton · October 7, 2016, 7:37pm

I really like reading about other people’s workflows and am interested to know how other people are automating DTTG. One of the things I like about the desktop version of DT is the ability to use third-party apps or scripts to extract annotations from PDFs in DT. On iOS it is a little more difficult but not impossible. I thought that I would share how I extract annotations from PDFs and send them back to DTTG as Markdown-formatted text. While the annotation tools in DTTG are nice, LiquidText can extract annotations and save them either as a docx file or to the clipboard. I do the latter.

First I copy a PDF from DTTG to LiquidText, then I mark up the PDF using LiquidText’s annotation tools, then I export the notes outline to the clipboard via the share sheet. Finally I fire up the following Python script in Pythonista (launched via a Launch Center Pro widget in Notification Center).

#Script to take contents of clipboard, derived from LiquidText, strip out unnecessary words, format for Markdown and send to DevonThink

import re
import clipboard
import urllib.parse
import webbrowser

clip = clipboard.get()

#Replaces carriage returns with new line
stripReturn = re.sub("\r","\n", clip)

#Search for title and remove unnecessary words
searchTitle = re.match('Notes\sin\s(.*)\.pdf',stripReturn)
title = searchTitle.group(1)
cleanTitle = re.sub('(\‘)|(\.pdf)','',title)

#Remove first two lines
stripTitle = re.sub('Notes\sin.*\n',"", stripReturn)

#Remove unnecessary words and format in Markdown
stripObjectGroup = re.sub("Object\sGroup","", stripTitle)
stripExcerpt = re.sub("Excerpt:\t","> ",stripObjectGroup)
stripComment = re.sub("Comment:\s\s","", stripExcerpt)

#Add back title
text = '# Notes from ' + cleanTitle +  stripComment

#URL encode
encodeText = urllib.parse.quote(text)
encodeTitle = urllib.parse.quote(cleanTitle)

#Send to DevonThink
devonURl = 'x-devonthink://createText?title=' + encodeTitle + '&location=&text=' + encodeText

webbrowser.open(devonURl)

The LCP URL scheme for launching a Pythonista script takes the form pythonista3://{{name_of_script}}?action=run

The same thing can be achieved with the Workflow app and Drafts. What would be great is if the URL scheme for DTTG were expanded so that more iOS automation could be achieved.
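
For readers who want to try the formatting and URL-building steps outside Pythonista, here is a minimal sketch that wraps them in two plain functions. The substitutions and the `title`, `location` and `text` parameters are taken from the script above; the function names and the sample input are hypothetical, and real LiquidText output may look different.

```python
import re
import urllib.parse


def liquidtext_to_markdown(body):
    """Apply the same three substitutions as the script above to the note body."""
    body = re.sub(r"Object\sGroup", "", body)
    body = re.sub(r"Excerpt:\t", "> ", body)    # excerpts become Markdown blockquotes
    body = re.sub(r"Comment:\s\s", "", body)    # comments become plain paragraphs
    return body


def build_devonthink_url(title, text):
    """Percent-encode title and text and assemble the createText URL."""
    return (
        "x-devonthink://createText?title="
        + urllib.parse.quote(title)
        + "&location=&text="
        + urllib.parse.quote(text)
    )


# Hypothetical sample input, not real LiquidText output.
sample = "Excerpt:\tA quoted passage\nComment:  My remark"
markdown = liquidtext_to_markdown(sample)
url = build_devonthink_url("My Paper", markdown)
print(url)
```

In Pythonista the resulting URL would then be passed to `webbrowser.open`, as in the script above.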

korm · October 7, 2016, 7:57pm

This is great. I’ll want to test it out. I don’t recall anyone using Pythonista with DTTG before, so I hope this is a new trend.

bister · September 1, 2017, 12:33am

Hi

I know nothing about Python, so apologies if I’ve missed something. I copied and pasted your code into a Pythonista script.

However when I run it I get an error from
line 15 title = searchTitle.group(1).

It says ‘NoneType’ object has no attribute ‘group’

Is there an easy fix?

kind regards

Michael

kseggleton · September 1, 2017, 7:53am

I have to admit that I am not using this script these days as I am doing my annotations in PDF Expert and then using a Workflow.app workflow to convert the annotations to Markdown and export to DevonThink. The problem in the above script can be corrected by changing the below line:


searchTitle = re.match('Notes\sin\s(.*)\.pdf',stripReturn)

to the following:


searchTitle = re.match('Notes\sin\s(.*)',stripReturn)
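
The ‘NoneType’ error arises because `re.match` returns `None` when the pattern does not match at the start of the text, and the script then calls `.group(1)` on that `None`. As a hedged sketch, the title lookup could be guarded as below. The function name `extract_title` is hypothetical, and the cleanup step mirrors the `cleanTitle` line of the original script.

```python
import re


def extract_title(text):
    """Return the title from a leading 'Notes in ...' line, or None if absent."""
    match = re.match(r"Notes\sin\s(.*)", text)
    if match is None:
        # The clipboard did not start with the expected header line.
        return None
    # Same cleanup as the original script: drop an opening quote mark and ".pdf".
    return re.sub(r"‘|\.pdf", "", match.group(1))


print(extract_title("Notes in My Paper.pdf\nExcerpt:\tsome text"))
print(extract_title("Something else entirely"))
```

A caller can then check for `None` and show a message instead of crashing when the clipboard does not hold a LiquidText export.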

bister · September 1, 2017, 7:57am

Thanks… I’ll try it. Although PDF Expert and a workflow sound even better. Is it in the workflow directory on the website?

cheers

Michael

kseggleton · September 1, 2017, 8:21am

In PDF Expert you need to ‘save a copy’ of your annotations (found under the ellipsis menu) as an ‘annotations summary’. PDF Expert will then create a rich text file of your annotations. You can send this directly to DevonThink or you can run the following Workflow https://workflow.is/workflows/d056118836314d15b5c6f1ddca7e3725 which will convert the annotations summary to a Markdown formatted file.

brookter · September 1, 2017, 10:23am

This is very timely and very welcome! I’ve just been trying to work my way through a process to integrate DTTG (good at making and searching PDF annotations) and DTPO (not good at making and searching PDF annotations at all) and I’ve just reached the ‘Trying PDF Expert because that works on iPad and Desktop’ approach after spending some time in the Highlights (crashes all the time) and Skim (too much faff converting annotations) cul-de-sacs.

Your workflow rounds it all off very nicely – thank you!

I’ve just amended it in one small way – the file it produced for me was plain text (.txt), so DTTG didn’t trigger the Markdown preview. Of course you can convert it in DTTG but it’s an extra step.

I managed to set the output file to be markdown by adding .md as straight text after the Group from Matched Text variable in the Set Name section, keeping Don’t Include File Extension (in Advanced) as unset.

This seems to work – the annotations are now recognised as Markdown in DTTG by default.

Thanks very much for the workflow – I mention the amendment only because it took me about 20 minutes to work out how to do it, so I thought it might help someone who’s also new like me to Workflow.

bister · September 1, 2017, 10:38am

Hi
thanks for all the advice. It all works.

In an ideal world I’d be capturing each note as a separate snippet in Devonthink ultimately all grouped with the original text. And of course I’d be able to grab figures. That’s what Sente used to allow with a script to post to Devonthink.
But at least I’ve got back to a point where all the annotations are getting into Devonthink

So now I actually have to do some work instead of fiddling

Michael

aroddick · September 1, 2017, 7:58pm

Following all this with great interest (like many here, a “Sente refugee”). I also would love to hear if the annotations can be saved somehow (through workflow?) as separate text files and sent directly to DT. My current workflow is to annotate in PDF Expert, send back to Bookends, then run a script from there to split them into individual files back on my Mac machine (which are then imported to DT). But if we could avoid that step…oh happy days!

kseggleton · September 3, 2017, 3:15am

This is possible with Workflow.app. For example, the following workflow will take annotations from PDF Expert (as described above), split them into separate Markdown files based on the quote, name the files with the first few words of the quote, and send each quote (and any notes associated with that quote) through to a group in DTTG named ‘Annotations of name of PDF’. The group is saved in an inbox or another group of your choosing, although you will need to adjust the workflow to insert the UUID of the inbox in DTTG that you want the grouped annotations to go to (the UUID is the string of numbers in a DEVONthink item link).

The workflow is here: https://workflow.is/workflows/b9ce3e4bc1e24a24aac29b1929f82ddf
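
For anyone who prefers to see the splitting logic written out, the following is a rough Python sketch of the idea described above, not a transcription of the workflow itself. It assumes the annotations are already in Markdown with each quote on a line starting with `> `. The function name, the five-word file name length and the sample text are all hypothetical.

```python
def split_annotations(markdown, words_in_name=5):
    """Split Markdown into one (filename, content) pair per quoted passage."""
    chunks = []
    for line in markdown.splitlines():
        if line.startswith("> "):
            chunks.append([line])        # a quote starts a new chunk
        elif chunks:
            chunks[-1].append(line)      # notes stay with the preceding quote
    result = []
    for lines in chunks:
        # Name the file after the first few words of the quote.
        words = lines[0][2:].split()[:words_in_name]
        name = " ".join(words) + ".md"
        result.append((name, "\n".join(lines).strip()))
    return result


# Hypothetical sample input.
sample = "> First quoted passage from the paper\nA note on it\n\n> Second quote\n"
for name, content in split_annotations(sample):
    print(name)
```

The sketch does not sanitise file names, so a quote containing characters such as `/` would need extra handling before saving.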

aroddick · September 3, 2017, 9:36pm

Wonderful! Thanks so much. Is it greedy to wonder if the page number can be inserted as well?

kseggleton · September 3, 2017, 10:00pm

That is doable - you would need to adjust the REGEX command in the workflow that searches for the highlights and notes. In the workflow there are two ‘Replace Text’ actions near the beginning of the workflow. In the first one replace the ‘Find Text’ expressions with:


highlight \[(.*)\]\:

And then in the ‘Replace With’ field underneath write:


> $1

In the second ‘Replace Text’ action replace the ‘Find Text’ field with:


note \[(.*)\]\:

And in the ‘Replace With’ field underneath write:

$1

You will then get your page numbers in the Markdown files, and your files will be named with the page number first, followed by the first few words of the file.
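
To see what these two replacements do, here is a small Python sketch that applies the same expressions with `re.sub`. Python writes the captured group as `\1` where Workflow uses `$1`. The sample annotation summary is hypothetical; the actual PDF Expert export may be laid out differently.

```python
import re

# Hypothetical sample of an annotations summary.
summary = "highlight [page 12]: A quoted passage\nnote [page 12]: My remark"

# First 'Replace Text' action: highlights become blockquotes that keep the page label.
step1 = re.sub(r"highlight \[(.*)\]\:", r"> \1", summary)

# Second 'Replace Text' action: notes keep the page label as plain text.
step2 = re.sub(r"note \[(.*)\]\:", r"\1", step1)

print(step2)
```

Because the bracketed text is captured and written back, the page label stays at the start of each quote and note instead of being discarded.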

aroddick · September 4, 2017, 12:40am

Wow! This is great — thanks for your time with this.

bister · September 8, 2017, 1:38am

kseggleton:

I have to admit that I am not using this script these days as I am doing my annotations in PDF Expert and then using a Workflow.app workflow to convert the annotations to Markdown and export to DevonThink. The problem in the above script can be corrected by changing the below line:
searchTitle = re.match('Notes\sin\s(.*)\.pdf',stripReturn)
to the following:
searchTitle = re.match('Notes\sin\s(.*)',stripReturn)

Hi again

Thanks. Both the PDF Expert export using Workflow and the LiquidText Python script work. The PDF Expert route is nicer in that you get separate snippets for each note. But of course neither does images.

Have you found a way, using either PDF Expert or LiquidText, to clip or import tables and figures into DEVONthink?

Cheers

Michael

kseggleton · September 8, 2017, 4:48am

PDF Expert currently doesn’t allow selection of non-text items. In LiquidText you can select non-text items and export the annotation summary manually to DTTG as a DOCX document, which should include diagrams.

bister · September 8, 2017, 6:04am

Hi

You are absolutely right - LiquidText is the only PDF reader I’ve come across on iOS that allows image selection. I’ve had some success using it with Copied (an app which manages the clipboard history and syncs it across all your devices) but it is quite labour intensive. I’ve put in a request with Readdle to consider adding non-text selection… here’s hoping.

Ultimately I am aiming for the following:

something that works across iOS (including phone) and macOS
the ability to easily mark annotations within the pdf file: text but also images and tables/figures
the ability to add comments to the pdf
then to be able to save all those quotes/comments/images to Devonthink as individual snippets stored within a group specific to the source (linked back to the original source which is in Bookends in my case)

The point of saving everything in the pdf file is that it will be more robust than having extra annotation files. The reason for using Devonthink is to be able to locate things easily in the future.

It seems like this is the workflow everyone is more or less heading towards. There are tools around that do all the individual bits (if stuck together with scripts and workflows). But nothing does everything. Sente and the Sente2Devonthink scripts came closest - but are sadly no longer with us.

For now I think I’m stuck using PDF Expert for text annotations and doing the extracting on my iPad (unless anyone knows of a script on macOS that does the same as the workflow you kindly linked). I’ll have to put comments in saying there is an important figure to grab and at some point grab them on my desktop with Highlights or Preview and copy them by hand into a media file in the group where the comments for the reference are.

I have considered running Devonthink and a pdf reader in split screen on the iPad and manually creating annotations in Devonthink as I read. But anything which gets in the way of actually concentrating on what I’m reading makes it hard to get to grips with the material.

Anyway, thanks for your help. I’m going to check back on this thread regularly and see if anyone has any new bright ideas

M

bister · September 10, 2017, 6:08am

An update… iOS 11 addresses nearly all the issues! I’m annotating PDFs in PDF Expert for text and extracting annotations to DTTG via the workflow. I’m using the iOS screen capture and crop function to capture figures. I then share these to a list for the article in Copied (clipboard management app). This is very seamless. Finally I dump the list of clippings from Copied into the group generated in DT by the workflow that extracts the text annotations. Sounds complicated but it really isn’t. And it’s effortless.

Cheers

M

nestor · April 14, 2021, 2:38pm

Hello Kseggleton,
very interesting… how do you manage the same workflow in DT? I mean, now that LiquidText has a Mac version, I tend to finish the annotation part on my iPad and afterwards open the LiquidText project on my Mac and export. But in which format do you import the PDF into DT, and how do you extract the annotations and excerpts? Thanks!

kseggleton · April 14, 2021, 6:19pm

I stopped using LiquidText some time ago. These days I annotate in Bookends and then export into DT with some AppleScripts.

nestor · April 15, 2021, 11:25am

Ok thanks anyway!