Find (search) in RTF and RTFD documents

How do you find Returns, Line Feeds (Control+Returns), and Tabs when editing an RTF or RTFD document?

I keep having to use an external editor for simple find (search and replace) operations.

By copying the line feed, Return, or Tab character and pasting it into the Find panel.

Or possibly add those characters to the Favorites tab of Character Palette and insert them from there.

Returns copied and pasted are not reliable. Some may be found, some not, sometimes none at all. Tabs and Line Feeds (Control+Returns) never work.

Copy the entire RTF or RTFD document to another editor and these exact same find (search and replace) operations do work as expected when copied and pasted into the find and replace dialog.

I don’t believe any invisible characters are in the character palette.

View “code tables.” 0x09, 0x0A, 0x0D and so on…

C

I don’t believe any characters aren’t in Character Palette. :)

Thanks for the followup.

Thanks for the suggestion. Now I get it. Those “invisible” characters are in the character palette, they are just not displayed, not visible! 0009 Tab, 000A LF, 000D Return, and 0020 Space.

Take one of those not-displayed invisible characters, copy it, and paste it into the DTPO Find dialog, and those characters are still not found in an RTF or RTFD document. And that was my question to start with.

Works like a charm here. The one thing I am having trouble with is the “Find in Database” choice, which seems to be doing nothing (i.e., no dialog appears). Of course, I may have no idea of how to use it.

Best, Charles

OK, I see that “Find in Database…” just shifts focus to the find input area on the current database window. That works fine here.

C

To clarify: Find in a DTPO RTF or RTFD document just does not work reliably or as expected, especially on imported documents not created in DTPO.

Also, I was expecting the LF 000A to be what you get from a Control+Return (line spacing, no paragraph spacing), which may not be the case. Turn on Show Invisible Characters and they look exactly like a Return (although without paragraph spacing).

Paste an LF (000A) from the Character Palette into the Find dialog and Returns are found, but no LFs. Paste a CR (000D) and nothing is found. Paste a Tab (0009) and they are all found.

Copy an LF (one entered with Control+Return) from the document, paste it into the Find dialog, and LFs are found, but no Returns. Paste a CR and they are all found. Paste a Tab and they are all found.

Okay. Tabs do work as expected.

Bringing the entire document from DTPO into an external RTF editor exhibits none of these glitches. The Control+Returns that looked like Returns in DTPO now appear as LFs, or whatever they are (line spacing, no paragraph spacing), in the other RTF editor.

OK, like from where? Can you post an example?

This might be your problem. First of all, where is all this Ctrl-Ret stuff coming from? Does DT offer this as a formatting option?

If you look at this dump of an RTF, the effect of Ctrl-Ret is to insert an RTF Unicode escape sequence (\uc0\u8232). 8232 is decimal for U+2028, the Unicode LINE SEPARATOR character, so Ctrl-Ret is stored as a line separator rather than as an LF:

\f0\fs24 \cf0 here's a line\
another line\
here's a line with ctrl-ret
\f1 \uc0\u8232 
\f0 another line\
another line\
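
For anyone who wants to check what \u8232 in the dump above stands for, here is a minimal Python sketch. It assumes only that RTF writes a Unicode character as \u followed by a signed decimal number (values above 32767 appear as negatives). The function name decode_rtf_unicode_escapes is made up for illustration, and this is not a real RTF parser.

```python
import re
import unicodedata

# Matches an RTF Unicode escape such as \u8232. The number is decimal and may
# be negative. Control words like \uc0 or \ul do not match, because a letter
# (not a digit) follows the "u".
RTF_UNICODE_ESCAPE = re.compile(r"\\u(-?\d+)")


def decode_rtf_unicode_escapes(rtf_fragment):
    """Return (code point, Unicode name) for every \\uN escape in the fragment."""
    results = []
    for match in RTF_UNICODE_ESCAPE.finditer(rtf_fragment):
        value = int(match.group(1))
        if value < 0:
            # RTF stores code points above 32767 as negative 16-bit numbers.
            value += 65536
        name = unicodedata.name(chr(value), "<unnamed>")
        results.append((f"U+{value:04X}", name))
    return results


fragment = r"\f1 \uc0\u8232 "
print(decode_rtf_unicode_escapes(fragment))
# [('U+2028', 'LINE SEPARATOR')]
```

So the character behind a Ctrl-Ret in this dump is U+2028, which is a different character from both LF (U+000A) and CR (U+000D).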

OSX uses 0x0a (LF) as its line ending, à la Unix. Pre-OSX Mac OS used 0x0d (CR), and Windows uses 0x0d,0x0a (CR LF). So if you swap your understanding of what constitutes EOL, your report above makes perfect sense: 0x0a finds line endings, 0x0d finds nothing (because OSX doesn’t use them) and tabs 0x09 are OK.
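
To make the different line endings concrete, here is a small Python sketch that tallies them in a piece of plain text. It is only an illustration: the sample string is invented, the function name count_breaks is hypothetical, and it assumes (based on the \u8232 in the dump above) that a Ctrl-Ret ends up in the text as U+2028.

```python
from collections import Counter

# Single characters worth counting, with readable labels.
SINGLE_BREAKS = {
    "\n": "LF 0x0a (Unix / OSX)",
    "\r": "CR 0x0d (pre-OSX Mac)",
    "\u2028": "LINE SEPARATOR U+2028 (assumed Ctrl-Ret)",
    "\t": "TAB 0x09",
}


def count_breaks(text):
    """Count line endings and tabs, treating CR LF as one Windows ending."""
    counts = Counter()
    i = 0
    while i < len(text):
        if text.startswith("\r\n", i):
            counts["CR LF 0x0d,0x0a (Windows)"] += 1
            i += 2
            continue
        label = SINGLE_BREAKS.get(text[i])
        if label is not None:
            counts[label] += 1
        i += 1
    return counts


# Invented sample: three LFs, one line separator, one tab, one CR LF.
sample = (
    "here's a line\n"
    "another line\n"
    "here's a line with ctrl-ret\u2028"
    "another line\n"
    "\tindented\r\n"
)

for label, count in count_breaks(sample).items():
    print(f"{count} x {label}")
```

Searching this sample for LF never lands on the U+2028, and searching for CR alone finds nothing outside the CR LF pair, which would match the report above.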

I don’t really understand what you’re saying here, but maybe I’ve answered your question.

Best, Charles

The raw RTF example above was produced in DT, but I checked TextEdit and got the same thing. This probably isn’t a surprise, because likely DT uses the OSX RTF services, same as TE.

I don’t use M$ Word but NeoOffice has a completely different approach:

\ltrpar\s1\cf0{\*\hyphen2\hyphlead2\hyphtrail2\hyphmax0}\rtlch\af4\afs24\lang255\ltrch\dbch\af3\langfe255\hich\f0\fs24\lang1033\loch\f0\fs24\lang1033 {\rtlch \ltrch\loch\f0\fs24\lang1033\i0\b0 Here's a line}
\par \pard\plain \ltrpar\s1\cf0{\*\hyphen2\hyphlead2\hyphtrail2\hyphmax0}\rtlch\af4\afs24\lang255\ltrch\dbch\af3\langfe255\hich\f0\fs24\lang1033\loch\f0\fs24\lang1033 {\rtlch \ltrch\loch\f0\fs24\lang1033\i0\b0 Here's a line with ctrl-ret\line Here's a line}
\par \pard\plain \ltrpar\s1\cf0{\*\hyphen2\hyphlead2\hyphtrail2\hyphmax0}\rtlch\af4\afs24\lang255\ltrch\dbch\af3\langfe255\hich\f0\fs24\lang1033\loch\f0\fs24\lang1033 {\rtlch \ltrch\loch\f0\fs24\lang1033\i0\b0 Here's a line}
\par \pard\plain \ltrpar\s1\cf0{\*\hyphen2\hyphlead2\hyphtrail2\hyphmax0}\rtlch\af4\afs24\lang255\ltrch\dbch\af3\langfe255\hich\f0\fs24\lang1033\loch\f0\fs24\lang1033 

You’ll see that the Ctrl-Ret is coded as \line inside a paragraph, whereas the other line endings are coded as \par (each followed by \pard to reset the paragraph formatting). This is straight out of the M$ RTF specification.

OSX handles things differently because it lives in a Unicode world, while an RTF file itself is 7-bit ASCII, so non-ASCII characters have to be written as escapes such as \u8232.
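
As a rough way to compare the two dumps, the following Python sketch counts paragraph breaks and line breaks in raw RTF source. It is a deliberately simplified tokenizer, not a full RTF reader, and the name count_break_controls is invented here. It treats \par and a backslash followed by a newline as paragraph breaks (the RTF specification treats a backslash-escaped CR or LF as \par), and \line or \u8232 as line breaks. The two sample strings are shortened versions of the dumps above.

```python
import re
from collections import Counter

# One RTF control: a control word with optional numeric parameter, a
# backslash-escaped newline, or any other escaped symbol (\\, \{, \*, ...).
CONTROL = re.compile(r"\\(?:([a-z]+)(-?\d+)?|(\r\n|\r|\n)|.)")


def count_break_controls(rtf_source):
    """Count paragraph breaks and line breaks found in raw RTF source."""
    counts = Counter()
    for match in CONTROL.finditer(rtf_source):
        word, param, newline = match.groups()
        if word == "par" or newline is not None:
            counts["paragraph break"] += 1
        elif word == "line" or (word == "u" and param == "8232"):
            counts["line break"] += 1
    return counts


# Shortened from the DT / TextEdit dump.
apple_style = r"""\f0\fs24 \cf0 here's a line\
another line\
here's a line with ctrl-ret
\f1 \uc0\u8232
\f0 another line\
another line\
"""

# Shortened from the NeoOffice dump.
office_style = (
    r"{\b0 Here's a line}\par \pard\plain "
    r"{\b0 Here's a line with ctrl-ret\line Here's a line}\par \pard\plain"
)

print(dict(count_break_controls(apple_style)))
print(dict(count_break_controls(office_style)))
```

Note that \pard is read as its own control word, so it is not miscounted as \par.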

C

Charles, thanks for your followup and investigation of what’s happening, but this has gone way beyond my original question.

I manually enter Control+Returns in DTPO documents to inhibit paragraph spacing, typically where spacing is added after paragraphs and is temporarily not wanted.

Control+Returns often come in content from websites, particularly blogs.

All I want is to reliably find Returns and Control+Returns in DTPO RTF and RTFD documents.

Checking deeper, I think my problems start when two or more Returns and/or Control+Returns occur together in documents, maybe even mixed in with Tabs. That is not uncommon in content taken from websites. You cannot tell the two types of Returns apart when they occur in DTPO. And then being forced to copy a character from the document to get the target for a Find adds to the uncertainty about what is being found.

I kinda get the Character Palette OSX Unicode thing, but I think in practice it boils down to this: only copy something from within a DTPO document to start a Find within a DTPO document.

Whatever it is that makes this awkward, it is more practical and more reliable to copy the entire RTF or RTFD document to an external editor, one where you can see the difference between Line Breaks, Paragraph Breaks, and Page Breaks, and then search (find) them. That capability should be part of DTPO.
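
To illustrate the kind of display being asked for, here is a minimal Python sketch that makes the invisible characters in a piece of text visible, with a different marker for each kind of break. The marker symbols and the name show_invisibles are arbitrary choices for this example and are not taken from DTPO or any particular editor. It also assumes a Control+Return is stored as U+2028, as the RTF dump earlier in the thread suggests.

```python
# Visible stand-ins for characters that normally cannot be told apart.
MARKERS = {
    " ": "·",          # space
    "\t": "→",         # tab
    "\n": "¶\n",       # Return (paragraph break), keep the actual break
    "\r": "␍",         # carriage return
    "\u2028": "↵\n",   # Control+Return (line separator, assumed), shown as a break
}


def show_invisibles(text):
    """Return a copy of text with invisible characters replaced by markers."""
    return "".join(MARKERS.get(ch, ch) for ch in text)


sample = "first line\u2028same paragraph\n\tnew paragraph\n"
print(show_invisibles(sample))
```

With distinct markers, a run of mixed Returns, Control+Returns, and Tabs can be read off directly instead of guessed at.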

Suggestion: There are some simple RTF text editors that provide some features not available in TextEdit.

One of them is Bean, for example. It’s hard to beat the price of Bean for simple text editing.

Select the RTF document in your DEVONthink database that you want to format and choose the contextual menu option, Open With. I’ve got Bean in the Applications folder, and it shows at the top of the list of alternative apps to open the file (which is convenient). So open your text file under Bean. NOTE: If you don’t use Smart Quotes, for example, you may want to set up the preferences in Bean to your taste.

In Bean, select View > Show Invisibles. Also note that Bean integrates with the DEVONtechnologies WordService Services.

When finished, press Save.

That’s probably more convenient than copy/paste back and forth.

If you need to print a DT database text document, open it in Bean. You can control margins and add header/footer. Bean doesn’t do footnotes, but is handy to have around. There are other such Mac text editors, some of which can even add footnotes – for free. Just remember that if you have set up a fancy RTF document in an external text editor, save it back to your database and then edit it again with the internal text editor, all the fancy stuff will be lost. (If you want to avoid inadvertent editing of that document within the database, just Lock that document. A bonus of locking a text document is that when reading it, the Space bar will page down through a long document a screen page at a time.)

Thanks for the comments and space bar tip, Bill. I do primarily use Bean as you describe, and without fancy formatting. Most of the source material is taken from websites and is not paginated.

It’s just that I don’t like the intermediate step of using an external editor for relatively simple search (find) and replace operations that should be adequately handled in DTPO. I also don’t like not being able to directly enter Tabs, Control+Returns, and Returns in the Find dialog, or not being able to tell the invisible characters for a Return and a Control+Return apart.