Script for replacing em dash with two hyphens

I need help replacing every em dash (—) with two hyphens (--) across 21,000+ notes. Any help at all would be greatly appreciated. Thanks!

This is indeed tricky but these threads might be useful:

viewtopic.php?f=20&t=7664
viewtopic.php?f=20&t=11221

Personally, I would probably drop into the UNIX world to do that, but that certainly isn’t for everyone. Something like this from the terminal:

Script removed because it could modify binary files if the user is not careful.  See next post instead for a safer version.

Naturally, make sure you have a backup first.

I really should have added a few caveats to that script. Don’t let it work on binary files like pictures or PDFs. Obviously you need to change the “/path/to/your/DEVONthink/database.dtBase2” part to match your database location, but you do want to keep the “/Files.noindex” at the end.

This version will only modify txt and rtf files and so should be safe.

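cd /path/to/your/DEVONthink/database.dtBase2/Files.noindex
# List the txt/rtf files that contain an em dash, then rewrite each one.
find . -type f \( -name '*.txt' -o -name '*.rtf' \) -print0 \
| xargs -0 grep -l '—' \
| while IFS= read -r file    # IFS= and -r keep odd file names intact
do
        echo "Fixing:  ${file##*/}"
        tmp=$(mktemp /tmp/emdash.XXXXXX)    # unique temp file, not a fixed /tmp/new
        sed 's/—/--/g' "${file}" > "${tmp}"
        touch -r "${file}" "${tmp}"         # keep the original modification time
        mv "${tmp}" "${file}"
done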
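If you would rather see the loop work before pointing it at a real database, here is a minimal sketch that runs the same steps against a throwaway directory. The scratch directory and the two sample file names are made up for the test; nothing in it touches DEVONthink.

```shell
# Hypothetical scratch directory with two sample notes (not a real database).
scratch=$(mktemp -d "${TMPDIR:-/tmp}/emdash-test.XXXXXX")
printf 'one — two\n' > "$scratch/sample.txt"
printf 'no dashes here\n' > "$scratch/clean.txt"

cd "$scratch" || exit 1

# Same pipeline as the script above: find candidates, keep the ones that
# contain an em dash, and rewrite each through a temporary file.
find . -type f \( -name '*.txt' -o -name '*.rtf' \) -print0 \
| xargs -0 grep -l '—' \
| while IFS= read -r file
do
        echo "Fixing:  ${file##*/}"
        tmp=$(mktemp "${TMPDIR:-/tmp}/emdash.XXXXXX")
        sed 's/—/--/g' "$file" > "$tmp"
        touch -r "$file" "$tmp"
        mv "$tmp" "$file"
done

cat "$scratch/sample.txt"
```

Only sample.txt should be reported as fixed; clean.txt is never rewritten because grep -l does not list it. One thing worth checking on a copy of your own data: RTF files often store an em dash as an escape such as \'97 or \emdash rather than as the literal character, and in that case the pattern would not match them.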

If you just want to see a list of files that have the em dash without actually changing anything, try this.

cd /path/to/your/DEVONthink/database.dtBase2/Files.noindex
find . -type f \( -name '*.txt' -o -name '*.rtf' \) -print0 \
| xargs -0 grep -l '—'
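If a plain list is not enough, a small variation can show how many lines in each file contain an em dash. This is only a sketch run against a made-up scratch directory; the file names a.txt and b.txt are hypothetical, and grep -c counts matching lines, not individual dashes.

```shell
# Hypothetical scratch directory with sample notes (not a real database).
scratch=$(mktemp -d "${TMPDIR:-/tmp}/emdash-count.XXXXXX")
printf 'first — line\nsecond — line\nplain line\n' > "$scratch/a.txt"
printf 'nothing to fix\n' > "$scratch/b.txt"

# -c prints a per-file count of matching lines, -H forces the file name,
# and the second grep drops files whose count is zero.
counts=$(cd "$scratch" && find . -type f \( -name '*.txt' -o -name '*.rtf' \) -print0 \
        | xargs -0 grep -cH '—' \
        | grep -v ':0$' \
        | sort)

echo "$counts"
```

Like the listing command, this only reads the files and changes nothing.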

Is there a purpose for the “rm …” after the “mv …”? That’ll generate “rm: /tmp/new: No such file or directory” errors, unless it’s “rm -f …”.

Doh! you’re right. The rm is unnecessary. I’ll edit the post.

Your script’s ${file##*/} usage inspired me to look up ${parameter##word} expansion in the bash man page and realize several of mine could be using that instead of running a basename process. :)
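For anyone else curious about that expansion, a minimal sketch of the comparison follows. The path is a made-up example.

```shell
file="./some/dir/My Note.txt"    # hypothetical path, for illustration only

# ${file##*/} strips the longest prefix matching "*/", leaving the file name.
name_expansion=${file##*/}

# basename gives the same answer here but starts a separate process.
name_basename=$(basename "$file")

# ${file%/*} strips the shortest suffix matching "/*", leaving the directory.
dir=${file%/*}

echo "$name_expansion"    # My Note.txt
echo "$name_basename"     # My Note.txt
echo "$dir"               # ./some/dir
```

The two are not identical in every case: for a path with a trailing slash such as "dir/", the expansion yields an empty string while basename prints "dir".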

Thanks for your help. I’ll try to digest this and let you know what happens.