Searching for punctuation characters

bangersandmash · January 27, 2008, 8:43pm

Is there any way to search for occurrences of punctuation in DEVONthink? I have several thousand documents that contain strings of text that start and end with {{ and }}. I’d like to be able to search for these braces but cannot figure out how to do so. Any ideas?

thanks

bangersandmash

Bill_DeVille · January 28, 2008, 12:05am

DEVONthink builds a Concordance of the text strings contained in a database. The default is to include alphanumeric strings ranging from 3 to 50 characters in length. Characters such as punctuation marks, parentheses, brackets, etc. are not included in the Concordance. For the sake of convenience, we’ll call all of the strings indexed in the Concordance “words”.

My main database contains 472,370 unique words in the Concordance. As most of the content of my database is in English, the most common word is “the” – it occurs 1,574,775 times, and appears in documents in 358 groups within the database. The word “The” is the 5th most common word in my database, appearing 217,452 times in documents contained within 338 groups. Although I didn’t check, I’m sure that the word “THE” is listed also.

So DEVONthink “knows” every word used in the database, which documents contain them, and the groups in which those documents are contained. DEVONthink can also analyze contextual relationships among words, in each document and across the entire database, which is the basis for artificial intelligence features such as Classify and See Also, as well as for rankings assigned to search results.

The Concordance indexing is used by Search. As might be expected, an Exact search for “The” will be a bit faster than a No Case search for “the”.

But Search won’t find “{{” or “}}” (or strings preceded or followed by those characters). They are ignored in the indexing of text used to construct the Concordance, just as commas, periods and parentheses are ignored.
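To make the effect concrete, here is a minimal Python sketch of a concordance-style index. It is not DEVONthink's actual code: the regular expression, the function name `build_concordance`, and the sample text are assumptions for illustration, and only the 3 to 50 character limit and the case-sensitive counting come from the description above.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of ASCII letters or digits.
# Punctuation such as { } , . ( ) never becomes part of a word.
WORD = re.compile(r"[A-Za-z0-9]+")

def build_concordance(text, min_len=3, max_len=50):
    """Count each alphanumeric run whose length is within the limits."""
    counts = Counter()
    for match in WORD.finditer(text):
        word = match.group()
        if min_len <= len(word) <= max_len:
            counts[word] += 1  # case-sensitive: "the" and "The" are separate
    return counts

# Hypothetical sample document
sample = "Dear {{name}}, the invoice for {{month}} is attached. The total is due."
concordance = build_concordance(sample)

print("name" in concordance)   # the text between the braces is indexed
print("{{" in concordance)     # the braces themselves are not
```

In this sketch "name" and "month" end up in the index, but nothing records that they were wrapped in braces, which is why an index-based search has nothing to match “{{” against.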

You can, once a document is opened, use Find to search that document for “{{”.

Can you use Spotlight for such a search? The problem is that “{{” can occur in the code of PDFs and images.

Perhaps the best approach would be to export the database contents using Scripts > Export > Daily Backup and then use an external text editor capable of searching for any character in a “bulk” search.
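If a text editor is not handy, a short script can do the same bulk search. The following is only a sketch, assuming the export ends up as a folder of plain-text files; the function name `files_containing`, the suffix list, and the example path are hypothetical. Restricting the scan to text suffixes sidesteps the false hits from PDF and image code mentioned above.

```python
from pathlib import Path

def files_containing(root, needle, suffixes=(".txt", ".md")):
    """Return the sorted paths under root whose text contains needle literally."""
    hits = []
    for path in Path(root).rglob("*"):
        # Skip folders and anything that is not a plain-text file
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        # errors="ignore" drops undecodable bytes instead of stopping the scan
        text = path.read_text(encoding="utf-8", errors="ignore")
        if needle in text:
            hits.append(path)
    return sorted(hits)

# Example call (the folder path is a placeholder):
# for hit in files_containing("/path/to/exported/folder", "{{"):
#     print(hit)
```

Because the comparison is a plain substring test, every character in the search string counts, including braces and other punctuation.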

ndouglas · January 30, 2008, 2:59am

I’ve had a similar problem, Bangersandmash. I like using “[tag] title” as the titles of my documents. Searching for brackets, even as an exact phrase in titles, doesn’t yield the expected results – so I pretty much just use the above convention as a way of sorting now.

I would like it if, say, an ability to define certain strings as delimiters/markers/whatever were added to DEVONthink’s database. Of course, considering all of the improvements that I’m hoping to see soon from DEVONthink, this is fairly low on the list.

Bill, how is the punctuation “sorted out”? If it’s done by a value by value method, like, “ignore ‘{’ and ‘[’ and ‘.’” and so forth, perhaps there’s a value in the Unicode or other text encodings that does not count as punctuation but is clearly distinguishable from normal text. Something like ø or something, which is generally not used in English.

(Checking now...) ø (Option + o) seems to work in DT’s search. My guess is that other characters, such as ≤ and ≥, would work too. Maybe that will help. It’s no more keystrokes than a {, either.

cgrunenberg · February 11, 2008, 2:39pm

The wildcard search is an exact string search, therefore you could use wildcards like {{…}} but this is of course slower than phrase searching.

andrew_dotdot · March 7, 2008, 11:18am

I have databases full of programming code that I work with and the documentation files that I use for the language. It allows me to find both the documented use and examples of usage from the codebase when I’m trying to figure out how to do things.

I’m really suffering from not being able to search for the difference between TODAY and TODAY-, for example. Since the code base is in hundreds of small script files, it’s not really possible to search documents individually, as has been suggested.

I know I’m using DevonThink in an unconventional way, maybe, but it’s a real frustration. Is there no way to add the ability to search for TODAY^^- , for example, where each “punctuation character” is preceded by a certain flag character (^ in this case) to say “please don’t ignore me!” ?

:A)>

cgrunenberg · March 7, 2008, 1:38pm

Why don’t you use the wildcard *TODAY*, see above?
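As a rough illustration of why an exact string wildcard helps here, the sketch below uses Python's `fnmatch` module as a stand-in. It is not DEVONthink's search engine, and the sample lines and the helper name `matching` are made up; the point is only that a pattern matched against the raw text keeps punctuation such as - and {{ significant.

```python
from fnmatch import fnmatchcase

# Hypothetical script lines standing in for documents
lines = [
    "SET start = TODAY",
    "SET cutoff = TODAY-30",
    "PRINT {{TODAY}}",
]

def matching(pattern, candidates):
    """Return the candidates that match the glob pattern, case-sensitively."""
    return [c for c in candidates if fnmatchcase(c, pattern)]

any_today = matching("*TODAY*", lines)     # every line that mentions TODAY
with_minus = matching("*TODAY-*", lines)   # only the line containing TODAY-
with_braces = matching("*{{*}}*", lines)   # only the line containing {{ ... }}

print(with_minus)
print(with_braces)
```

Adding the hyphen to the pattern narrows the result to the TODAY- line, which is the distinction an index of alphanumeric words cannot make.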

bangersandmash · March 19, 2008, 12:28pm

Thank you! This is exactly what I needed. Much appreciated.

bangersandmash

andrew_dotdot · March 19, 2008, 4:16pm

This notion of wildcards used like “*TODAY*”, with asterisks enclosing the term you want to find, seems odd to me. Where does this come from?

I’m used to “b*r” giving you bar, beer, boar, badger, breeder, etc… and “b$$r” giving you bear, beer, boar, etc.

bangersandmash · March 20, 2008, 5:01pm

Andrew,

I’m just taking a swing in the dark here, but most searching systems that I have encountered (usually the large scale bibliographic type like Medline and Web of Science) can only match wild-cards when they have at least one letter to start with. This means that they are useful for missing letters in the middle and at the end of words but they are unable to cope with missing letters at the beginning of words. They are unable, for example, to find every occurrence of “biot,” only words that start with “biot” (biot* will get you biotic, biotoxin, etc).

DEVONthink seems to be able to handle missing letters at the beginning of words, which gives it the ability to pick up prefixes as well as situations where your search term is part of a compound word. In the case of searching for “biot,” you could use DEVONthink to search for *biot* and pick up abiotic as well as biotic and biotoxin. This is what the asterisk in front is for. You can, of course, search using wild-cards in the way that you’re used to… TODAY*

However, it looks like you can still only use wild-cards to search on filenames.

Of course, that’s just a guess based on 10 seconds of testing. I’m sure someone wiser will correct me if I’m out to lunch.
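For what it's worth, the difference between a trailing asterisk and asterisks on both sides can be sketched with Python's `fnmatch` module. This only illustrates the general glob convention, not how DEVONthink implements it; the word lists and the helper name `search` are made up, and `fnmatch` writes the single-character wildcard as ? rather than $.

```python
from fnmatch import fnmatchcase

# Hypothetical word lists
bio_words = ["abiotic", "biotic", "biotoxin", "symbiotic", "robot"]
b_words = ["bar", "beer", "boar", "badger"]

def search(pattern, words):
    """Return the words that match the glob pattern from start to end."""
    return [w for w in words if fnmatchcase(w, pattern)]

prefix_only = search("biot*", bio_words)   # the word must start with "biot"
anywhere = search("*biot*", bio_words)     # "biot" may appear anywhere

any_middle = search("b*r", b_words)        # any run of letters between b and r
two_middle = search("b??r", b_words)       # exactly two letters between b and r

print(prefix_only)
print(anywhere)
```

With only the trailing asterisk, abiotic and symbiotic are missed; with asterisks on both sides they are picked up along with biotic and biotoxin.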