Extract annotations and highlighted text from PDF into RTF

This would be a huge boon!

If only DTPO could get an extension like ZotFile 2.0 for Zotero. It seems like the key piece is pdf.js, whatever that is:

★ Extract Annotations from PDF Files

After highlighting and annotating pdfs on your tablet (or with the PDF reader application on your computer), ZotFile can automatically extract the highlighted text and note annotations from the pdf. The extracted text is saved in a Zotero note. Thanks to Joe Devietti, this feature is now available on all platforms based on the pdf.js library.

columbia.edu/~jpl2136/zotfile.html
Any idea if this sort of thing is easily implementable for DTPO?

Also, check out how ZotFile handles editing on a tablet, which DTToGo completely fails at. I would love to have this capability too!

One possibility is to use an Automator workflow with these actions:

  1. Get Selected Records
  2. Run AppleScript:

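on run {input, parameters}
	-- Collect the file paths of the records selected in DEVONthink
	set thePaths to {}
	repeat with theRecord in input
		set thePath to path of theRecord
		-- Skip records that have no file path
		if thePath is not "" then set thePaths to thePaths & thePath
	end repeat
	-- Pass the list of paths on to the next Automator action
	return thePaths
end run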

  3. Extract PDF Annotations
  4. Run AppleScript:

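on run {input, parameters}
	-- Expect exactly one result from the "Extract PDF Annotations" action
	if class of input is list and (count of input) is 1 then
		-- Create a plain-text record named "Annotations" from that result
		tell application "DEVONthink Pro" to create record with {name:"Annotations", type:txt, content:item 1 of input}
	end if
end run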

The workflow could be simplified by replacing the second step with the “Get Item from Records” action and the last step with the “New Text Record” action. Obviously I’m a scripter, not an automator :)

Thanks!
I’ve actually discovered a workflow that I find ideal. DTPO is still part of it, but the key is iAnnotate on iPad:

iAnnotate was the deciding factor for me to get an iPad. I love everything about the app, but especially the ability to export highlighted text and comments as a text file. Syncing with Dropbox works great, and I have the target folder in Dropbox indexed by DEVONthink Pro Office on my MacBook, so all annotations appear in DTPO. The annotations are not searchable within the PDF itself, though, so here’s what I do:

  1. First I set up an e-mail address through sendtodropbox.com. Anything sent to this address ends up in a folder in Dropbox.
  2. Then I configure that Dropbox folder so that any new files are imported into DTPO.
  3. When I’m finished annotating a file in iAnnotate, I e-mail the comments and highlights to that address, and the text file ends up in DTPO.
  4. I attach the text file to the PDF in DTPO using the annotation script.

Of course, none of this is necessary if you just want an annotated document to show up in DTPO. But I like having a separate document that has all my notes and quotes in one place. And considering that I spend two hours reading an article, the 30 seconds this manual process takes are totally worth it.

I also love Joliprint, which gives me a bookmarklet in Safari on the iPad that turns any web page into a PDF that can then be saved to Dropbox or opened directly in iAnnotate.

Joliprint also ingeniously lets you turn any shareable content from any app on your iPad into a PDF by adding their Twitter handle to a tweet. The results from within the NYTimes iPad app are just stunning (though legally shady?).

By the way, I ran this Automator workflow to process PDFs that I had previously annotated in DTPO, but it doesn’t do what I especially want, because Preview’s “Extract PDF Annotations” Automator action only extracts the text of comments/notes, not highlighted text. It will show you on which page you highlighted something, but it won’t extract that text.

Suggestion: integrate iAnnotate’s service within DTPO, just as you integrate ABBYY OCR. This is a huge deal as more and more academics move to digital workspaces for reading PDFs. (Just look at the dozens of frustrated forum threads out in Mac-help-land: everybody wants to do this, and so far only a few iPad apps can.)