Script to split doc to RTFs in DT according to outline/style

nickharambee · November 3, 2010, 8:31am

Hi

I would like to split my book notes in DT, creating a separate RTF for each paragraph. The first way I thought of doing this was by outline in an MS Word document. The document would have an outline of 3 levels and then body text. I would then want to split this into folders and files according to the outline as follows:

Level 1, Level 2 and Level 3 become a group hierarchy in DT, with the name of each group matching the text in the Levels. Then each section of body text under Level 3 (separated by bullet points) becomes an RTF file in the group for Level 3.

Additionally, at the end of each section of text would be a text delimiter and then a small amount of additional text which would be used to name the RTF files.

I then had a leaning towards using TextEdit, as I like to avoid using MS Word whenever possible. Of course TextEdit doesn’t have an outline view, so I would have to use some other method of knowing where to split the document.

I imagine that the simplest method would be with styles. So I have one style for Parts of the Book, one for Chapters and one for Sections. These would then become groups in DT.

I am wondering if anyone has tried to do anything similar and whether there is a similar script knocking about to get me started.

Thanks

Nick

nickharambee · November 4, 2010, 10:41pm

Just to say that I have managed to put together a script that pretty much does what I want it to do, using just DT, and various text delimiters:

tell application "DEVONthink Pro"
	set theseItems to the selection
	repeat with thisItem in theseItems
		set bookName to (name of thisItem as string)
		set authorName to texts 1 thru ((offset of "-" in bookName) - 2) of bookName
		set titleName to texts ((offset of "-" in bookName) + 2) thru -1 of bookName
		set authorLoc to create location "Sync/psychotherapy/book notes/" & authorName in database "nick"
		set titleLoc to create location "Sync/psychotherapy/book notes/" & authorName & "/" & titleName in database "nick"
		
		set theText to text of window 1
		set sectionNo to 1
		set thisPart to ""
		set thisChapter to ""
		activate
		repeat with j from 1 to (count paragraphs in theText)
			set theParagraph to paragraph j of theText
			set paraText to text of theParagraph
			set thisTag to ""
			if theParagraph begins with "*** " then
				set thisPart to texts 5 thru -1 of paraText
				create location "Sync/psychotherapy/book notes/" & authorName & "/" & titleName & "/" & thisPart in database "nick"
			else if theParagraph begins with "** " then
				set sectionNo to 1
				set thisChapter to texts 4 thru -1 of paraText
				create location "Sync/psychotherapy/book notes/" & authorName & "/" & titleName & "/" & thisPart & "/" & thisChapter in database "nick"
			else if theParagraph begins with "* " then
				if thisPart is "" and thisChapter is "" then
					set thisSection to (sectionNo as string) & ". " & texts 3 thru -1 of paraText
					set x to create location "Sync/psychotherapy/book notes/" & authorName & "/" & titleName & "/" & thisSection in database "nick"
					set sectionNo to (sectionNo + 1)
				else
					set thisSection to (sectionNo as string) & ". " & texts 3 thru -1 of paraText
					set x to create location "Sync/psychotherapy/book notes/" & authorName & "/" & titleName & "/" & thisPart & "/" & thisChapter & "/" & thisSection in database "nick"
					set sectionNo to (sectionNo + 1)
				end if
			else if theParagraph is not "" then
				if paraText contains "~" then
					set thisName to texts ((offset of "~" in paraText) + 1) thru ((offset of "*" in paraText) - 2) of paraText
				else
					set thisName to texts 1 thru ((offset of ":" in paraText) - 1) of paraText
				end if
				if paraText starts with "0" then
					set thisText to texts ((offset of ":" in paraText) + 2) thru ((offset of "*" in paraText) - 2) of paraText & return & return & texts (offset of "{" in bookName) thru ((offset of "}" in bookName) - 1) of bookName & "@" & texts ((offset of "/" in paraText) + 1) thru ((offset of ":" in paraText) - 1) of paraText & "}"
				else if texts 1 thru 4 of paraText contains "/" then
					set thisText to texts ((offset of ":" in paraText) + 2) thru ((offset of "*" in paraText) - 2) of paraText & return & return & texts (offset of "{" in bookName) thru ((offset of "}" in bookName) - 1) of bookName & "@" & texts 1 thru ((offset of "/" in paraText) - 1) of paraText & "}"
				else
					set thisText to texts ((offset of ":" in paraText) + 2) thru ((offset of "*" in paraText) - 2) of paraText & return & return & texts (offset of "{" in bookName) thru ((offset of "}" in bookName) - 1) of bookName & "@" & texts 1 thru ((offset of ":" in paraText) - 1) of paraText & "}"
				end if
				if paraText contains "*" then
					set thisTag to texts ((offset of "*" in paraText) + 1) thru -1 of paraText
				end if
				set thisPage to texts 1 thru ((offset of ":" in paraText) - 1) of paraText
				create record with {name:thisName, type:rtf, plain text:thisText, comment:thisPage, tags:thisTag} in x
			end if
			
		end repeat
		-- titleLoc (created above) is this same location, so reuse it
		move record thisItem to titleLoc
		set name of thisItem to "Full Text"
	end repeat
end tell

It processes a book’s worth of notes in just a couple of seconds and adds tags to the new records as well, which is pleasing. It creates a group hierarchy in DT that mirrors the part/chapter/section structure of the notes.

If anyone would like to do something similar and wants to know more about the delimiters/format I am using then I’d be happy to let you know.

There are a few things that I haven’t worked out how to do yet, though, and I wonder if someone could help me out:

1. How to specify whether groups created are excluded from tagging or not.
2. How to refer to a single record. I will only be processing one file/record at a time, but couldn’t work out how to refer to just one record, so I have wrapped the script in a “set theseItems to the selection/repeat with thisItem in theseItems” loop, as this is a construct I am familiar with. Similarly, to get the text of the current document I have used the statement “set theText to text of window 1”, when perhaps there is a way of referring to the text of the document rather than the window.
3. I am wondering if it is possible to determine the type of paragraph by font attributes rather than characters, e.g. “if font of theParagraph is bold then”. When I tried this I got an error message stating something like “can’t get font of the paragraph”.
4. The script is sometimes returning an error on attempting to rename the original file after it has been moved (see end of script): “error “DEVONthink Pro got an error: Can’t set content id 100608 of database id 1 to “Full Text”.” number -10006 from content id 100608 of database id 1”
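
For points 2 to 4, here is a rough, untested sketch of things that might be worth trying. It assumes that records expose "plain text" and "rich text" properties, that the rich text answers to "font of paragraph n", and that renaming before the move avoids the error; none of this is confirmed, so each name should be checked against the DEVONthink Pro dictionary in Script Editor. The group path with "Example Author/Example Title" is a placeholder, not a real location.

```applescript
-- Sketch only: property and command names are assumptions taken from the
-- DEVONthink Pro scripting dictionary; verify them before relying on this.
tell application "DEVONthink Pro"
	-- Point 2: take one record out of the selection instead of looping
	set theseItems to the selection
	if theseItems is {} then error "Select the notes document first."
	set thisItem to item 1 of theseItems
	
	-- Point 2: read the text from the record rather than from window 1
	set theText to plain text of thisItem
	
	-- Point 3: ask for the font while the paragraph is still a reference
	-- into the styled text; "paragraph j of theText" is a plain string,
	-- which may be why "font of theParagraph" fails
	set firstFont to font of paragraph 1 of rich text of thisItem
	set firstIsBold to (firstFont contains "Bold") -- assumes a name like "Helvetica-Bold"
	
	-- Point 4: rename first, then move, so the rename does not depend on
	-- a reference that may no longer resolve after the move
	set name of thisItem to "Full Text"
	set y to create location "Sync/psychotherapy/book notes/Example Author/Example Title" in database "nick" -- placeholder path
	move record thisItem to y
end tell
```

In the full script the rename would have to come after bookName has been read, since the author and title are parsed from the original name.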

In time I want to adapt the script so that it can work out from the notes how many levels there are in the hierarchy for a particular book, so that the one script will work well with any book structure.
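
One possible starting point for that, as a minimal sketch: count the leading asterisks of each paragraph and take the largest count as the number of levels in the book. The handler names headingLevel and maxHeadingLevel are hypothetical, and the sketch assumes the same delimiters as the script above ("*** " for parts, "** " for chapters, "* " for sections), so the largest count marks the top level.

```applescript
-- Hypothetical helper: number of leading "*" characters in a paragraph,
-- or 0 if the paragraph is not a heading in the "*** Name" format.
on headingLevel(paraText)
	set level to 0
	repeat with c in characters of paraText
		if contents of c is "*" then
			set level to level + 1
		else
			exit repeat
		end if
	end repeat
	-- a heading marker must be followed by a space, as in "** Chapter"
	if level > 0 and (length of paraText) > level and character (level + 1) of paraText is " " then
		return level
	end if
	return 0
end headingLevel

-- Hypothetical helper: the deepest heading marker found anywhere in the notes,
-- i.e. how many group levels this particular book uses.
on maxHeadingLevel(theText)
	set deepest to 0
	repeat with p in paragraphs of theText
		set n to headingLevel(contents of p)
		if n > deepest then set deepest to n
	end repeat
	return deepest
end maxHeadingLevel
```

With the result of maxHeadingLevel(theText) in hand, the main loop could build the group path from however many levels are present, rather than branching on thisPart and thisChapter being empty.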

Nick