Split a document using delimiters?

I have several very long documents divided into short notes by a delimiter, such as: ^^^. I’d like to be able to split the document at each delimiter (it would be nice to have the delimiter deleted at the same time). Is there any way to do this in DT3?

A couple of years ago I used Tinderbox’s Explode command. I added a delimiter at the end of each paragraph of a long Word doc with “find and replace” and then imported it into TBX. Now I find TBX to be pretty challenging, but without doubt, Explode was the easiest “slice and dice” in their armory.

Is there any similarly easy way to accomplish this in DT3? If not, could it be added as a feature request? My sense is that many researchers also work with delimited text (paragraphs, subsections, sections) and would appreciate such a feature if it's not yet available.

Thanks,
Linn

Tinderbox's Explode uses regex, so it won't work unless you escape ^^^ as \^\^\^.
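To illustrate why the escaping matters, here is a minimal sketch using Python's `re` module as a stand-in. Tinderbox's regex engine may differ in details, and the `explode` helper and the sample text are made up for this example.

```python
import re

DELIMITER = "^^^"  # the literal marker from the question


def explode(text, delimiter=DELIMITER):
    """Split text on a literal delimiter using a regex; drop empty pieces."""
    pattern = re.escape(delimiter)  # turns "^^^" into the escaped form \^\^\^
    pieces = re.split(pattern, text)
    return [piece.strip() for piece in pieces if piece.strip()]


sample = "First note\n\n^^^\n\nSecond note\n\n^^^\n\nThird note"

# Unescaped, each "^" is a start-of-text anchor, so the pattern can only
# match the empty string at the very beginning, never the literal characters.
print(re.findall(r"^^^", sample))

# Escaped, the pattern matches the three literal carets.
print(explode(sample))
```

Escaping the delimiter programmatically (here with `re.escape`) avoids having to remember which characters are special.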

DEVONthink has no built-in explode command but this is a perfect case for AppleScript's text item delimiters.

This script creates new records (and a version of the source record without delimiters). I don't know exactly how your delimiters are set, so you may have to change theDelimiter.

Edit: This script handles only plain text - if you want to split RTF(D) text use this instead.

-- Explode text into new text records (and create version of source text without delimiters)

property theDelimiter : linefeed & linefeed & "^^^" & linefeed & linefeed

tell application id "DNtp"
	try
		set windowClass to class of window 1
		if {viewer window, search window} contains windowClass then
			set currentRecord_s to selection of window 1
		else if windowClass = document window then
			set currentRecord_s to content record of window 1 as list
		end if
		
		set theRecord to item 1 of currentRecord_s
		set theText to plain text of theRecord
		
		set d to AppleScript's text item delimiters
		set AppleScript's text item delimiters to theDelimiter
		set TextItems to text items of theText
		set AppleScript's text item delimiters to d -- always set them back
		
		set theGroup to (parent 1 of theRecord)
		
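		repeat with thisTextItem in TextItems
			-- name each new record after the first non-empty paragraph of its text
			set theName to "Untitled" -- fallback if the text item has no non-empty paragraph
			repeat with thisParagraph in (paragraphs of thisTextItem)
				-- loop variables are references, so compare their contents
				if (contents of thisParagraph) ≠ "" then
					set theName to contents of thisParagraph
					exit repeat
				end if
			end repeat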
			
			set thisRecord to create record with {name:theName, plain text:thisTextItem, type:text} in theGroup
		end repeat
		
		set theTextWithoutDelimiters to my string_From_List(TextItems, linefeed)
		set recordWithoutDelimiters to create record with {name:(name of theRecord & " (without Delimiters)"), plain text:theTextWithoutDelimiters, type:text} in theGroup
		
		
	on error error_message number error_number
		if the error_number is not -128 then display alert "DEVONthink" message error_message as warning
		return
	end try
end tell


on string_From_List(theList, theDelimiter)
	set theString to ""
	set theCount to 0
	
	repeat with thisItem in theList
		set theCount to theCount + 1
		set thisItem to thisItem as string
		if theCount ≠ (count of theList) then
			set theString to theString & thisItem & theDelimiter
		else
			set theString to theString & thisItem
		end if
	end repeat
	
	return theString
end string_From_List

This script is a good example, if not the best, of why I will probably never understand why some people are so fond of AppleScript. Other programming languages have a simple split command with two arguments: the source text and the delimiter. One line of code and you get an array of items.
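For comparison, here is roughly what that looks like in Python. This is only a sketch: the delimiter mirrors theDelimiter from the script above (blank line, ^^^, blank line), and the function names `split_notes` and `first_line` are invented for the example. It does not talk to DEVONthink at all.

```python
# Same delimiter as theDelimiter in the AppleScript: blank line, ^^^, blank line
DELIMITER = "\n\n^^^\n\n"


def split_notes(text, delimiter=DELIMITER):
    """The one-liner: source text plus delimiter gives a list of notes."""
    return text.split(delimiter)


def first_line(note):
    """Return the first non-empty line of a note, to use as its title."""
    for line in note.splitlines():
        if line.strip():
            return line
    return ""


document = "Title A\nbody of A\n\n^^^\n\nTitle B\nbody of B"

for note in split_notes(document):
    print(first_line(note), "->", len(note), "characters")
```

The split itself really is one line; the rest is just picking a title for each piece, which the AppleScript has to do as well.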

*

The script above handles only plain text which might be a problem.

*

Linn, you might do a search for “Kindle” or “Kindle Clippings” in this forum. Amazon’s Kindle creates a txt file out of highlighted text snippets and comments and divides them with delimiters. And because of that there are a number of topics here about splitting such files into single items which might be useful for you too. (But again: They handle just plain text.)

*

Another recommendation: Leave Word if you can and switch to Scrivener. It is the writing tool for anything more complex than a letter, and it focuses on writers with lots of research material.

Also, Scrivener's 'chapters' (or whatever the single parts of the document are) are single files by nature and can therefore be easily exported to, say, DEVONthink. For the opposite direction, bringing in and splitting a document that already has delimiters, Scrivener has an "Import and Split" feature which does exactly what its name says, and it is simple to use. Both directions work with rich text too.

By the way, I am not suggesting replacing DEVONthink with Scrivener. While there is some overlap in functions, they are two different kinds of beasts that complement each other really well.

Because it works for customizing off-the-shelf software! That's a huge plus.

I ran this script about a year ago for a summary of about 500 abstracts. I just tried it with a small test file and it still works for RTF and plain text files, except for images in the file.

But you need to change the tell statement to


tell application id "DNtp"

Now there’s one that preserves images

Many thanks for the help with the delimiter problem. I learned more than I anticipated from your replies. I'm not a programmer, so @pete31's short AppleScript was interesting for learning how that problem could be solved in code. The back and forth with @ngan encourages me to give it a try next time.

@suavito, thanks for the suggestion to search here for Kindle clippings. It hadn’t occurred to me to export them into DT3 – it will encourage me to do more professional reading in Kindle. As for Scrivener, I use it until I get what I’m really writing about, and then finish off in Word, which is also required by many publishers. I knew about making a split or two in a Scrivener document, but not about the Import and Split – which immediately solved the problem and allowed me to pull out the pieces that I needed in DT3. Thanks!

It wasn't really solved, as the short script above only supports plain text.

Meanwhile there’s a script that splits RTF(D).

Thanks. I didn’t realize that there could be such a difference between the two.

In case you're trying to run the script I posted in this thread, you'll find that it doesn't work in DEVONthink 3.6. That's due to DEVONthink's new handling of "invalid arguments".

After the release of DEVONthink 3 I decided to continue to use “search window” in scripts so that DEVONthink 2 users could use them in, well, search windows. With version 3.6 that’s not possible anymore.

If you want to use the script you’ll have to replace this voluminous block …

set windowClass to class of window 1
if {viewer window, search window} contains windowClass then
	set currentRecord_s to selection of window 1
else if windowClass = document window then
	set currentRecord_s to content record of window 1 as list
end if

… with this neat line …

set currentRecord_s to selected records

… which does what the six lines have done. Wow, that's great!