Any way to remove the PDF annotation prefix (page, highlight, date+time) that DT inserts before each annotation?

When I annotate a PDF and copy and paste the annotations from the inspector into an RTF file, DT automatically adds a prefix to each highlighted word, sentence, or paragraph (when they are highlighted separately), for example:
2 Highlight 2020-11-20, 21:39:28 (page, highlight, date+time)

In some cases this can be useful, but for my use it mostly makes reading the annotations extremely tedious. Is there any way to eliminate the prefix? My only workaround has been to OCR the whole annotations section in the inspector, which is not a realistic solution.

thank you very much

Try this.

-- Remove page, annotation type and date from manually copied PDF annotation

-- 1. Copy your annotation in the annotation inspector
-- 2. Run the script
-- 3. Paste

-- NOTE: from a short test it seems "thePattern" works with my locale;
-- for any other locale you'll probably have to adjust the date matching: \\d\\d\\.\\d\\d\\.\\d\\d\\, \\d\\d\\:\\d\\d\\:\\d\\d

use AppleScript version "2.4"
use framework "Foundation"
use scripting additions


set thePattern to "^\\d+\\t\\w+ ?\\w+\\t\\w+\\t\\d\\d\\.\\d\\d\\.\\d\\d\\, \\d\\d\\:\\d\\d\\:\\d\\d\\t"


set theText to the clipboard as string
set theText_clean to my regexReplace(theText, thePattern, "")
if theText_clean ≠ theText then
	set the clipboard to theText_clean & return
	display notification "Copied cleaned annotation"
else
	display dialog "There's either no annotation in your clipboard or you have to adjust the Regex" with title "Clean annotation"
end if
return theText_clean

on regexReplace(theText, thePattern, theReplacement)
	try
		set theString to current application's NSString's stringWithString:theText
		-- Use the NSString length (UTF-16 units) so the range covers the whole string even when it contains emoji
		set theLength to theString's |length|()
		set newString to theString's stringByReplacingOccurrencesOfString:(thePattern) withString:(theReplacement) options:(current application's NSRegularExpressionSearch) range:{location:0, |length|:theLength}
		-- Return the result explicitly instead of relying on the implicit result
		return newString as string
	on error error_message number error_number
		activate
		display alert "Error: Handler \"regexReplace\"" message error_message as warning
		error number -128
	end try
end regexReplace

A very nice script, thank you.
It does not work, even though I checked that the annotations are in the clipboard. Error message below. It is probably related to the date format. I will figure it out.
I am also trying to use the script for generic regex text cleanup.
May I ask where in the script I would put a different pattern and replacement if I want to use it for something else?
thank you
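
For reuse as a generic cleanup tool, the only parts that should need to change are the pattern and the replacement string passed to the handler. A minimal sketch, assuming the text to clean is on the clipboard; the date-reordering pattern and the `$3.$2.$1` template are illustrative choices, not part of the original script:

```applescript
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

-- Hypothetical example: turn ISO dates such as 2020-11-20 into 20.11.2020.
-- Parentheses create capture groups; $1, $2, $3 refer to them in the replacement.
set thePattern to "(\\d{4})-(\\d{2})-(\\d{2})"
set theReplacement to "$3.$2.$1"

set theText to the clipboard as string
set the clipboard to my regexReplace(theText, thePattern, theReplacement)

on regexReplace(theText, thePattern, theReplacement)
	set theString to current application's NSString's stringWithString:theText
	set theLength to theString's |length|()
	set newString to theString's stringByReplacingOccurrencesOfString:thePattern withString:theReplacement options:(current application's NSRegularExpressionSearch) range:{location:0, |length|:theLength}
	return newString as string
end regexReplace
```

To delete matches instead of rewriting them, pass an empty string as the replacement, which is what the annotation script does.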

If you post an annotation that doesn’t work I’ll take a look. If you don’t want to post your real username replace it, e.g. if your real username is “petE 31” change it to “userA 31”.

very kind of you

my name is not in the prefix

4 Highlight 2020-11-20, 21:46:18 PNEUMONITIS

4 Highlight 2020-11-20, 21:46:22 . Rituximab has also been reported to cause
pneumonitis, cough, dyspnea, and pulmonary infiltrates.


This is my analysis of the prefix, from left to right:

  1. page number, let's say 0 to 1000

  2. one tab

  3. the word Highlight

  4. two (2) tabs

  5. date and time as 2020-11-20, 21:48:47 (note the space after the comma)

  6. one tab
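
The six pieces listed above can be translated into a regex one by one. This is only a sketch under the assumption that the fields really are separated by tabs exactly as described; the variable names and the sample line are made up for illustration:

```applescript
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

-- Each part corresponds to one item of the list above
set pagePart to "^\\d+" -- 1. page number at the start of the text
set tabPart to "\\t" -- 2. and 6. one tab
set typePart to "Highlight" -- 3. the literal word
set datePart to "\\d{4}-\\d{2}-\\d{2}, \\d{2}:\\d{2}:\\d{2}" -- 5. date, comma, space, time

-- 4. two tabs between the annotation type and the date
set thePattern to pagePart & tabPart & typePart & tabPart & tabPart & datePart & tabPart

-- Hypothetical sample line built with real tab characters
set sampleLine to "4" & tab & "Highlight" & tab & tab & "2020-11-20, 21:46:18" & tab & "PNEUMONITIS"

set theString to current application's NSString's stringWithString:sampleLine
set theLength to theString's |length|()
set newString to theString's stringByReplacingOccurrencesOfString:thePattern withString:"" options:(current application's NSRegularExpressionSearch) range:{location:0, |length|:theLength}
return newString as string
```

With the sample line above, only the annotation text should remain. If the copied text uses different separators, the pattern will not match and the text comes back unchanged.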


I am adding the snapshot below because the forum does not render the text I typed above correctly

Didn’t think about that.

Try this.

-- Remove page, annotation type and date from manually copied PDF annotation

-- 1. Copy your annotation in the annotation inspector
-- 2. Run the script
-- 3. Paste

-- NOTE: This seems to work with date format 2021-11-20 or 20.11.20 and with or without an author name
-- However it's likely that you need to adjust the pattern as I'm by no means familiar with Regex

use AppleScript version "2.4"
use framework "Foundation"
use scripting additions


set thePattern to "^\\d+\\t(\\w+ ?\\w+)\\t(\\w+ ?\\w+)?\\t(\\d\\d\\d\\d-\\d\\d-\\d\\d|\\d\\d\\.\\d\\d\\.\\d\\d), \\d\\d:\\d\\d:\\d\\d\\t"


set theText to the clipboard as string
set theText_clean to my regexReplace(theText, thePattern, "")
if theText_clean ≠ theText then
	set the clipboard to theText_clean & return
	display notification "Copied cleaned annotation"
else
	display dialog "There's either no annotation in your clipboard or you have to adjust the Regex" with title "Clean annotation"
end if
return theText_clean

on regexReplace(theText, thePattern, theReplacement)
	try
		set theString to current application's NSString's stringWithString:theText
		-- Use the NSString length (UTF-16 units) so the range covers the whole string even when it contains emoji
		set theLength to theString's |length|()
		set newString to theString's stringByReplacingOccurrencesOfString:(thePattern) withString:(theReplacement) options:(current application's NSRegularExpressionSearch) range:{location:0, |length|:theLength}
		-- Return the result explicitly instead of relying on the implicit result
		return newString as string
	on error error_message number error_number
		activate
		display alert "Error: Handler \"regexReplace\"" message error_message as warning
		error number -128
	end try
end regexReplace

thank you @pete31

unfortunately all the script does is to add my name to the prefix (I changed my name to John Smith as you suggested)

Before

After

1 Highlight John Smith 2020-11-20, 17:18:10

There’s nothing in the script that could do this.

You don’t get the error dialog anymore, right?

I really don’t understand what is going on. As of 5 minutes ago, DT is adding my name to the prefix. I am not on drugs and not completely crazy. I have no clue what is going on. I have multiple proofs that the name was not there before. Perhaps it is a different PDF?
I will retest your initial solution.
you are very patient !

You added an authorname in Preferences Edit > Authorname. Or what did you do here:

No, I did not change any configuration. The name change was simply when I pasted a prefix.

I doubt that’s possible. If you’ve copied annotations in DEVONthink 3.6 before and they had no authorname then there’s no way I can think of that annotations now include an authorname without changing DEVONthink preferences.

This is what Script Debugger says:

I’m new to AppleScriptObjC so I’m not sure why this happens. However, it should be possible to compile by adding a space on a blank line and then trying to compile again. If that doesn’t work I’m out. The script works over here.

I am working on it and will get back to you. I have some ideas.

I now understand. When the annotations are created with the native DT PDF viewer, the author name is included. When the PDF is annotated in an external app via "Open in external app" (in this case PDF Expert), the name is absent.
Thank you SO MUCH for your help.
I will make the changes using the regex101 site.
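
Since the author field is filled in for annotations made in the native viewer and empty for those made in an external app, one pattern can cover both by allowing that field to be empty. The sketch below also adds the `(?m)` flag so that several copied annotations are cleaned in one pass. It assumes tab-separated fields and the 2020-11-20 date format; the sample lines are made up, and this is a variation on the scripts above, not a tested replacement for them:

```applescript
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

-- (?m) lets ^ match at the start of every line, not only at the start of the text.
-- The second [^...]* field is the author name, which may be empty.
set thePattern to "(?m)^\\d+\\t[^\\t\\r\\n]+\\t[^\\t\\r\\n]*\\t\\d{4}-\\d{2}-\\d{2}, \\d{2}:\\d{2}:\\d{2}\\t"

-- Hypothetical sample lines, one with and one without an author name
set withAuthor to "1" & tab & "Highlight" & tab & "John Smith" & tab & "2020-11-20, 17:18:10" & tab & "first note"
set withoutAuthor to "4" & tab & "Highlight" & tab & tab & "2020-11-20, 21:46:18" & tab & "PNEUMONITIS"
set theText to withAuthor & linefeed & withoutAuthor

set theString to current application's NSString's stringWithString:theText
set theLength to theString's |length|()
set newString to theString's stringByReplacingOccurrencesOfString:thePattern withString:"" options:(current application's NSRegularExpressionSearch) range:{location:0, |length|:theLength}
return newString as string
```

Lines that do not start with a prefix, such as the continuation lines of a multi-line highlight, should pass through unchanged.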