The following AppleScript, which I found via this article, parses Kindle’s “My Clippings” text file, which contains highlights, notes, etc.:
tell application "DEVONthink Pro"
	set theSelection to the selection
	if theSelection is {} then error "Please select some contents."
	display dialog "Enter the desired text delimiter (or nothing to break at each paragraph):" default answer "" buttons {"OK"} default button 1
	set SplitPointRegEx to text returned of the result
	if SplitPointRegEx is equal to "" then set SplitPointRegEx to ASCII character 10
	set OldDelimiters to AppleScript's text item delimiters
	repeat with CurrentItem in theSelection
		set AppleScript's text item delimiters to SplitPointRegEx
		set theSource to the plain text of CurrentItem
		set RepeatCount to 0 as integer
		set TotalCount to (count each text item of theSource) as integer
		repeat until RepeatCount is equal to TotalCount
			set RepeatCount to RepeatCount + 1
			set CurrentText to (text item RepeatCount of theSource)
			if length of CurrentText is greater than 0 then
				create record with {name:CurrentText, type:txt, plain text:CurrentText}
			end if
		end repeat
	end repeat
	set AppleScript's text item delimiters to OldDelimiters
end tell
The problem with the code, as you can see, is how it names files. Filenames are enormous, because each record is named with the full text of the highlight. I know absolutely no AppleScript and am wondering if one of you kind people might be able to easily alter the script to name the txt file based on only the first two lines of the “CurrentText” value?
For instance, the highlight
Annals Of the Former World (John McPhee)
- Your Highlight on page 20 | Location 297-299 | Added on Saturday, September 20, 2014 1:41:46 PM
There seemed, indeed, to be more than a little of the humanities in this subject. Geologists communicated in English; and they could name things in a manner that sent shivers through the bones.
would return the filename:
Annals Of the Former World (John McPhee) - Your Highlight on page 20 | Location 297-299 | Added on Saturday, September 20, 2014 1:41:46 PM.txt
tell application id "DNtp"
	set theSelection to the selection
	if theSelection is {} then error "Please select some contents."
	display dialog "Enter the desired text delimiter (or nothing to break at each paragraph):" default answer "" buttons {"OK"} default button 1
	set SplitPointRegEx to text returned of the result
	if SplitPointRegEx is equal to "" then set SplitPointRegEx to ASCII character 10
	set OldDelimiters to AppleScript's text item delimiters
	repeat with CurrentItem in theSelection
		set AppleScript's text item delimiters to SplitPointRegEx
		set theSource to the plain text of CurrentItem
		set RepeatCount to 0 as integer
		set TotalCount to (count each text item of theSource) as integer
		repeat until RepeatCount is equal to TotalCount
			set RepeatCount to RepeatCount + 1
			set CurrentText to (text item RepeatCount of theSource)
			if length of CurrentText is greater than 0 then
				if length of CurrentText is less than 20 then -- set "20" to whatever you want as max length
					set theTitle to CurrentText
				else
					set theTitle to texts 1 thru 20 of CurrentText
				end if
				create record with {name:theTitle, type:txt, plain text:CurrentText}
			end if
		end repeat
	end repeat
	set AppleScript's text item delimiters to OldDelimiters
end tell
The problem I have with this script is that it creates a separate file for each line in the source file: specifically, each block of text terminated by \n (ASCII 10). Perhaps that’s just the case with the sample you posted?
I tried changing the max length to “255”, which I believe is the character limit for filenames in OS X, so that each file would have a unique name. Unfortunately, the clip I provided above resulted in a file titled “Annals of the Form”, and so did all the other annotations from that book.
I wonder if the problem is that each individual text file produced by the script contains a blank first line? Perhaps the blank line is being counted as a certain number of characters?
if length of CurrentText is less than 255 then -- set "20" to whatever you want as max
I raised it to the maximum to see how long a filename would have to be to ensure uniqueness, but for some reason the length of the filename didn’t change at all.
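For the naming asked for earlier in the thread (the first two lines of each clipping), here is a minimal sketch rather than a tested fix. The handler name titleFromFirstLines and its parameters are hypothetical, not part of DEVONthink or the posted script. It skips blank lines, which would also cover a clipping that starts with an empty line. Note too that in the script above the comparison (“less than 255”) and the truncation (“texts 1 thru 20”) use separate numbers, so changing only the first would still cut names at 20 characters; that is a reading of the code, not something I have run.

```applescript
-- Hypothetical helper: build a record name from the first non-empty lines of a clipping.
-- theText: one clipping; maxLines: how many lines to keep; maxLength: cap on the name length.
on titleFromFirstLines(theText, maxLines, maxLength)
	set collected to {}
	repeat with aLine in paragraphs of theText
		set lineText to contents of aLine
		-- Skip empty lines, e.g. a blank line at the start of a clipping
		if lineText is not "" then
			set end of collected to lineText
			if (count of collected) >= maxLines then exit repeat
		end if
	end repeat
	-- Join the kept lines with a single space, then restore the delimiters
	set savedDelimiters to AppleScript's text item delimiters
	set AppleScript's text item delimiters to " "
	set theTitle to collected as text
	set AppleScript's text item delimiters to savedDelimiters
	-- Truncate only if the joined title exceeds the cap
	if (length of theTitle) > maxLength then set theTitle to text 1 thru maxLength of theTitle
	return theTitle
end titleFromFirstLines

-- Assumed usage inside the existing tell block ("my" is needed to call a script handler there):
-- create record with {name:my titleFromFirstLines(CurrentText, 2, 255), type:txt, plain text:CurrentText}
```

The handler would go outside the tell block. This only gives one record per clipping if the delimiter typed into the dialog is the separator line between clippings (a row of equals signs in the Kindle file, as far as I know); leaving the dialog empty splits at every line feed, which matches the one-file-per-line behaviour described above.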