Mass editing advice

I’m currently working on a bunch (a WHOLE LOT) of PDFs. I want to first convert them all to rich text, and then run lots of find-and-replace operations across the database. Is there an easy way to do this that I am missing?

Thanks for the help

Do you want to use regular expressions (powerful) or just conventional find & replace (simple)?

Like, do you want to change all words beginning with “cat” except “catherine” to words beginning with “dog,” or do you just want to change all "cat"s to "dog"s?

It would be simple find and replace, mostly to fix the problems inherent in making rich text from PDFs: finding space+paragraph or paragraph+space and changing them to just a space. It’s not a huge change, but we’re talking a couple thousand documents.


That makes it a lot easier :)

Just select a bunch of RTF files in DEVONthink and run the script on them. It should be pretty quick.

The code would be something like:

tell application "DEVONthink Pro"
	set theFindString to "a" -- Whatever you're looking for
	set theReplaceString to "b" -- Whatever you're replacing it with
	-- Don't modify any of this stuff:
	set theOldDelimiters to AppleScript's text item delimiters
	set theSelection to the selection
	repeat with thisSelection in theSelection
		set theRichText to the rich text of thisSelection
		set AppleScript's text item delimiters to theFindString
		set theTextItems to the text items of theRichText
		set AppleScript's text item delimiters to theReplaceString
		-- Join the pieces with theReplaceString (also safe when nothing matched)
		set theNewText to theTextItems as text
		set the rich text of thisSelection to theNewText
	end repeat
	set AppleScript's text item delimiters to theOldDelimiters
end tell
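
For the specific cleanup you described (space+paragraph or paragraph+space turned into a single space), the same delimiter trick can be wrapped in a handler and called twice. This is only a rough sketch: replaceText is a name I made up, it works on ordinary AppleScript text outside DEVONthink, and it assumes the paragraph breaks are line feeds. If your files use carriage returns, swap linefeed for return.

```applescript
-- Hypothetical helper: replace every occurrence of findString in sourceText.
on replaceText(sourceText, findString, replaceString)
	set oldDelimiters to AppleScript's text item delimiters
	-- Split the text wherever findString occurs...
	set AppleScript's text item delimiters to findString
	set theItems to text items of sourceText
	-- ...then glue the pieces back together with replaceString.
	set AppleScript's text item delimiters to replaceString
	set newText to theItems as text
	set AppleScript's text item delimiters to oldDelimiters
	return newText
end replaceText

-- Made-up sample text with both kinds of stray break.
set sample to "first line " & linefeed & "second line" & linefeed & " third line"
set cleaned to replaceText(sample, " " & linefeed, " ") -- space+paragraph
set cleaned to replaceText(cleaned, linefeed & " ", " ") -- paragraph+space
-- cleaned is now "first line second line third line"
```

Two passes are needed because a delimiter pass only matches one find string at a time. Inside the repeat loop above you would call the handler as my replaceText(...), since it is being called from within a tell block.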

The problem is that it annihilates your formatting. Basically, everything ends up looking like you converted it to plain text and then back to rich text. I’m not sure how to get around that. There might be a way to do it by checking attribute runs or something like that, but I don’t really understand it.

Edit: I poked around a little bit. There’s no real way to do this and preserve the formatting, so far as I can tell. As soon as DEVONthink gets the rich text of something, it becomes just a dumb string with no formatting information.

It’s not a problem with DEVONthink but rather TextEdit, and it’s not a problem with TextEdit but a (generally good) design decision. It’d be far more difficult to manipulate the string with rich text instructions.

Hmm, I don’t think I’d want to do that. The text has Unicode Greek in it, and I’m always wary of moving to plain text; it seems something always changes.

Thanks for your help.

Hark! There is a way. Download this script: …

It would be hellishly slow, though. It also requires some changes – basically just needs to be looped and have the dialogs removed, I think. Looking at my script and his you can probably figure it out, or I can help if you want.

Hmm. Something goes wrong. It seems to be rather selective about what it preserves and what it doesn’t :-/