I’m currently working on a bunch (a WHOLE LOT) of PDFs. I want to first convert them all to rich text, and then run lots of find-and-replace operations across the database. Is there an easy way to do this that I’m missing?
Thanks for the help.
Do you want to use regular expressions (powerful) or just conventional find & replace (simple)?
For example, do you want to change every word beginning with “cat” (except “catherine”) so that it begins with “dog”, or do you just want to change every “cat” to “dog”?
It would be a simple find and replace, mostly to fix the problems inherent in making rich text from PDFs: a space followed by a paragraph break, or a paragraph break followed by a space, both of which should become just a space. It’s not a huge change, but we’re talking a couple thousand documents.
Danny
That makes it a lot easier.
Just select a bunch of RTF documents and run the script. It should be pretty quick.
The code would be something like:
tell application "DEVONthink Pro"
	set theFindString to "a" -- Whatever you're looking for
	set theReplaceString to "b" -- Whatever you're replacing it with
	-- Don't modify any of this stuff:
	set theOldDelimiters to AppleScript's text item delimiters
	set theSelection to the selection
	repeat with thisSelection in theSelection
		set theRichText to the rich text of thisSelection
		-- Split the text wherever the find string occurs
		set AppleScript's text item delimiters to theFindString
		set theTextItems to the text items of theRichText
		-- Rejoin the pieces with the replacement string between them
		-- (if there is no match, the text comes back unchanged)
		set AppleScript's text item delimiters to theReplaceString
		set theNewText to theTextItems as text
		set the rich text of thisSelection to theNewText
	end repeat
	-- Restore the delimiters so other scripts aren't affected
	set AppleScript's text item delimiters to theOldDelimiters
end tell
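To target the two PDF-conversion artifacts Danny described, the find-and-replace step could be wrapped in a couple of handlers. This is a minimal sketch, not tested against DEVONthink itself: the handler names `replaceAll` and `cleanPDFBreaks` are hypothetical, and it assumes paragraph breaks come through as linefeed or return characters. Because it works on plain AppleScript strings, it loses formatting in exactly the same way as the script above.

```applescript
-- Hypothetical helper: replace every occurrence of findText in sourceText
on replaceAll(sourceText, findText, replacementText)
	set oldDelims to AppleScript's text item delimiters
	set AppleScript's text item delimiters to findText
	set theItems to text items of sourceText
	set AppleScript's text item delimiters to replacementText
	set resultText to theItems as text
	-- Restore whatever delimiters were in effect before
	set AppleScript's text item delimiters to oldDelims
	return resultText
end replaceAll

-- Hypothetical helper: turn "space + paragraph break" and
-- "paragraph break + space" into a single space.
-- Assumes breaks are returns or linefeeds; CRLF pairs are not handled specially.
on cleanPDFBreaks(sourceText)
	set cleaned to sourceText
	repeat with breakRef in {return, linefeed}
		set br to contents of breakRef
		set cleaned to replaceAll(cleaned, space & br, space)
		set cleaned to replaceAll(cleaned, br & space, space)
	end repeat
	return cleaned
end cleanPDFBreaks
```

If you call these from inside the `tell application "DEVONthink Pro"` block, prefix them with `my` (for example `set theNewText to my cleanPDFBreaks(theRichText)`) so that AppleScript runs them itself instead of sending them to DEVONthink.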
The problem is that this script annihilates your formatting: everything ends up looking as if you had converted it to plain text and then back to rich text. I’m not sure how to get around that. There might be a way to do it by working with attribute runs or something like that, but I don’t really understand them.
Edit: I poked around a little bit. There’s no real way to do this and preserve the formatting, so far as I can tell. As soon as DEVONthink gets the rich text of something, it becomes just a dumb string with no formatting information.
It’s not really a DEVONthink problem; TextEdit behaves the same way, and even there it’s not a bug but a (generally good) design decision. Manipulating strings would be far more difficult if they carried rich text formatting instructions along with them.
Hmm, I don’t think I’d want to do that. The documents contain Unicode Greek, and I’m always wary of going through plain text; it seems something always changes.
Thanks for your help.
Hark! There is a way. Download this script: danshockley.com/downloads/Rt … 8.scpt.zip
It would be hellishly slow, though. It also needs some changes: I think it basically just needs to be wrapped in a loop and have its dialogs removed. By comparing my script with his you can probably figure it out, or I can help if you want.
Hmm. Something goes wrong. It seems to be rather selective about which formatting it preserves and which it doesn’t. :-/