Search (boolean) in DTPO 2

jdean · December 18, 2008, 9:07pm

Question: how do I search for a term like “Carol” so that it is distinguished from “carol”? It was possible to do this in the previous version. In general, it would be nice to have a guide to boolean search. THX

Greg_Jones · December 18, 2008, 9:38pm

Do a search on “boolean” from within the DTPO2 Help menu, and you’ll get a full list of the boolean operators. Also, you’ll note that searches in DTPO2 are case-insensitive, unlike (as you mentioned) how searches performed in earlier versions.

Bill_DeVille · December 18, 2008, 10:02pm

Searches in DEVONthink 2, like searches in DEVONagent, are case-independent. There’s no difference between “Carol” and “carol”.

But you might try a trick like this, to avoid listing documents that refer only to “Christmas carol”.

carol NOT (carol NEAR/2 christmas)

In English, that means "find documents that contain the term “carol” but not if “carol” is within 2 words of “christmas”".

I just ran that search expression in one of my databases. Got 170 results.

If I do the search just for “carol” I get 172 results.

But if I search for “carol” AND “christmas” I get 7 results. That tells me that there are two documents that contain “carol” within two words of “christmas”, but 5 more that contain the two terms more widely separated than 2 words.
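To make the arithmetic above concrete, here is a minimal Python sketch of how such a query can be read as set operations. It is not DEVONthink's implementation: the toy documents, the tokenizer, and the exact meaning of "within 2 words" (a position difference of at most 2) are all assumptions made for illustration.

```python
import re

def words(text):
    """Lower-case word tokens, mirroring a case-insensitive search."""
    return re.findall(r"[a-z0-9]+", text.lower())

def near(tokens, a, b, distance):
    """True if some occurrence of a lies within `distance` positions of b."""
    pos_a = [i for i, t in enumerate(tokens) if t == a]
    pos_b = [i for i, t in enumerate(tokens) if t == b]
    return any(abs(i - j) <= distance for i in pos_a for j in pos_b)

# Hypothetical documents, invented for this example.
docs = {
    "a": "Carol Smith bought a car.",
    "b": "We sang a Christmas carol.",
    "c": "At Christmas the choir practised for weeks before the first carol.",
    "d": "No match here.",
}

tokens = {name: words(text) for name, text in docs.items()}

# carol
has_carol = {n for n, t in tokens.items() if "carol" in t}
# carol AND christmas
carol_and_christmas = {n for n in has_carol if "christmas" in tokens[n]}
# carol NEAR/2 christmas
carol_near_christmas = {
    n for n in has_carol if near(tokens[n], "carol", "christmas", 2)
}
# carol NOT (carol NEAR/2 christmas)
carol_not_near = has_carol - carol_near_christmas

print(sorted(has_carol))            # ['a', 'b', 'c']
print(sorted(carol_and_christmas))  # ['b', 'c']
print(sorted(carol_not_near))       # ['a', 'c']
```

Document "c" plays the role of the "5 more" above: it contains both terms, but too far apart to be excluded by the NEAR clause.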

Yes, there are times I wish for case-sensitive searching. But the new search operators and syntax are so much more powerful (and faster) than in DEVONthink 1, that I’m still happy.

Use quotation marks to denote an exact string. You can write a query such as:

(“carol smith” OR “carol jones” OR “mrs. thomas jones”) “mercedes 300sl”

if you have a database of car owners and want to check under both her single and married names, and know that she owns a Mercedes 300SL. Of course, you may well have still more information to use in confirming the search results. In English, your query means, look for this person under any of three names, who owns a Mercedes 300SL. You couldn’t directly write such a query in DEVONthink 1.

Here are Christian’s search tips:

Search tips:
- Syntax of operators is compatible with DEVONagent/EasyFind, Finder/Spotlight, common search engines and common languages (C/C++/Objective-C, Java, JavaScript)
- Unlimited complexity of query term, implicit/default AND operator
- Wildcards (*?) including ranges of characters ([a-b]), sets of characters ([abc…] or [a|b|c|…]) and exclusions of characters ([^…]), for example mpg[1|2|3] or mpg[^2]
Note: Ranges, sets and exclusions can be combined and the substring operator (~) of DEVONagent is supported too.
- Advanced operators (AND, OR, XOR/EOR, AFTER, BEFORE, NEXT, NEAR, {BUT|AND} NOT, NOT {NEAR|AFTER|BEFORE|NEXT}), phrases and parentheses.
Note: Priority of operators is parentheses > phrase/hyphens > (NOT) BEFORE/AFTER/NEAR/NEXT > NOT > AND/OR/XOR/EOR. Terms with the same priority but without parentheses are evaluated from left to right.
- Range for proximity operators can be specified by user and is unlimited (by default 10 words)
- Words concatenated by non-white separators (e.g. info@devon-technologies.com or page_id) are treated like phrases
- Words separated by hyphens are handled like word1word2 OR “word1 word2”, e.g. e-mail is equal to email OR “e mail”
- Characters separated by dots are considered to be abbreviations and therefore handled like words separated by hyphens, e.g. the term t.a.t.u is equal to “t a t u” OR tatu
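
The last two tips (hyphens and dotted abbreviations) can be sketched in a few lines of Python. This only illustrates the rewriting rule as described in the tips. The function name `expand_term` and the patterns used to recognise such terms are assumptions, not DEVONthink's actual parser.

```python
import re

# Words joined by hyphens (e-mail), or single characters joined by dots (t.a.t.u).
EXPANDABLE = re.compile(r"\w+(?:-\w+)+|\w(?:\.\w)+\.?")

def expand_term(term):
    """Rewrite a hyphenated or dotted term as: joined OR "spaced phrase".

    Anything else (plain words, e-mail addresses, page_id, ...) is
    returned unchanged.
    """
    if not EXPANDABLE.fullmatch(term):
        return term
    parts = [p for p in re.split(r"[-.]", term) if p]
    return f'{"".join(parts)} OR "{" ".join(parts)}"'

print(expand_term("e-mail"))   # email OR "e mail"
print(expand_term("t.a.t.u"))  # tatu OR "t a t u"
print(expand_term("page_id"))  # page_id
```

Terms such as info@devon-technologies.com fall through unchanged here, since the tips say they are treated like phrases rather than expanded.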

jdean · December 19, 2008, 8:29am

Thank you for your reply. The option to choose between Carol and carol (case dependent search) was available in previous versions. What you suggest is too complicated. It would be nice to have this option back.
THX

Bill_DeVille · December 19, 2008, 7:19pm

As I said, there are times when case dependency would be nice.

But remember, your requested case-sensitive search could still return a document about Carol Smith and also a document about “A Christmas Carol”. And it would miss a document that refers to CAROL SMITH. So in DEVONthink 1, I almost always had searches set for “no case”.

Searches in DEVONthink 2 are not only much faster, they can be literally orders of magnitude more powerful in their ability to frame a desired query.

I was having a little fun in my previous post, illustrating the kinds of questions one can pose about text content in a database. But my point was to illustrate that, once one becomes familiar with the simple vocabulary of the operators and the syntax (the way a query is understood, or parsed), you can pose quite rich questions to your database.

There are some fundamental issues about why DEVONthink 2 has dropped case sensitivity. Each database builds a Concordance that contains every word (text string) in the database. The artificial intelligence features that are built into DEVONthink 2 also analyze the contextual relationships of words used in each document, and compare those relationships to every other document in the database (See Also), or to the contents of the groups in the database (Classify).

In DEVONthink 1, the Concordance contained a separate entry for each case variant of a word (the Concordance by default lists every alphanumeric string having from three to fifty characters), e.g., “Carol”, “carol”, “CAROL”, and perhaps also typos such as “cArol”. Many words had at least two case variants, simply because they were capitalized at the beginning of a sentence, or in a heading or title. That not only made the memory size of the DEVONthink 1 Concordance larger than the DEVONthink 2 Concordance, it also resulted in more work (and larger memory needs) for the AI features in DEVONthink 1.
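
As a rough illustration of the size difference, the Python sketch below builds a toy word list both ways. It is not how DEVONthink stores its Concordance. The sample text and the `concordance` function are assumptions, and only the three-to-fifty-character rule is taken from the description above.

```python
import re
from collections import Counter

def concordance(text, case_sensitive):
    """Count every alphanumeric string of 3 to 50 characters."""
    tokens = [t for t in re.findall(r"[A-Za-z0-9]+", text) if 3 <= len(t) <= 50]
    if not case_sensitive:
        tokens = [t.lower() for t in tokens]
    return Counter(tokens)

# Hypothetical sample text, invented for this example.
sample = "Carol sang a carol. CAROL SMITH met Carol; cArol is a typo."

sensitive = concordance(sample, case_sensitive=True)
insensitive = concordance(sample, case_sensitive=False)

print(len(sensitive))        # 8 entries, four of them variants of "carol"
print(len(insensitive))      # 5 entries
print(insensitive["carol"])  # 5 occurrences folded into one entry
```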

The result is that a DEVONthink 2 database takes less memory; so, given the available RAM on a computer, one can manage a larger database and/or multiple open databases with much improved responsiveness. Memory reduction also results from a different database design, so that text, HTML and WebArchive files no longer have to be loaded into memory when a database is opened, but are stored in the internal Files.noindex folder.

AmberV · January 5, 2009, 10:02pm

Argh! Case-sensitive searches were one of the few things setting DT apart from the competition’s search engines. It is supremely maddening to be unable to find two different phrases with different case, when the meanings of the two can be completely different! No amount of Boolean operators can salvage the difference between searching for {R1.2} and {r1.2}, tokens which mean something totally different in my files.

Another thing that is killing my searches is the complete ignorance of punctuation. It seems that above example internally parses to /[rR]1.?2/ in regex-speak. That can match a whole lot of stuff that it isn’t meant to match. This kind of weird example aside, there are still thousands of files in my archive differentiated by having a -i- and -I- in their filename. A lower-case ‘i’ means something different than an upper-case ‘I’, for one thing, and for another thing, I cannot even search for that string at all because hyphens are just taken to mean white-space. This means any file name that just so happens to have an ’ i ’ (of either case!) in it, also matches.
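
A quick Python check shows how much wider that pattern is than the token that was typed. The pattern is the one guessed at above, not a confirmed description of DEVONthink's internals, and the candidate strings are made up for illustration.

```python
import re

# The guessed internal equivalent of searching for {R1.2}.
fuzzy = re.compile(r"[rR]1.?2")

# Hypothetical strings that might occur in a database.
candidates = ["{R1.2}", "{r1.2}", "r12", "R1-2", "chapter1 2"]

fuzzy_hits = [s for s in candidates if fuzzy.search(s)]
literal_hits = [s for s in candidates if "{R1.2}" in s]

print(fuzzy_hits)    # all five candidates match
print(literal_hits)  # ['{R1.2}']
```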

I wish there was a “slow” but literal mode, or better yet a regex mode. Something that looked for precisely what I typed in and nothing else. I don’t mind waiting 3.8 seconds instead of 2.9 seconds, really! Such a search mode wouldn’t require changing the concordance—that argument does make sense.

I haven’t used your software heavily in the past, but I am liking 2.0. It’s just frustrating that searching—nearly across the entire Mac OS X spectrum—is becoming this fuzzy, impossible to precisely control, thing.

All right, now I’m done ranting.

sjk · January 5, 2009, 10:44pm

I’m impressed you waited +2.5 years since joining the forum to make it your first post.

I also miss case-sensitive searching.

AmberV · January 5, 2009, 11:05pm

Oh dear! And my first ever post has to be a big long whine. Ha.

sjk · January 5, 2009, 11:37pm

Reasoned opinions/suggestions don’t qualify as whining, IMO.

ndouglas · January 6, 2009, 12:15am

A regex search would be amazing. As would the number of completely bewildered people on this forum.

I have many record names like these:

[Hit List] January 5, 2009
[Introduction] My thoughts on sjk

The current syntax, unless I misunderstand, makes it impossible to search for brackets. At the very least, I’d like an escape character of some sort. Full regex would be amazing.
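
For comparison, this is how an escape mechanism handles brackets in an ordinary regex engine, sketched with Python's `re` module. It says nothing about DEVONthink's query syntax. The helper `find_literal` and the third record name are invented for the example.

```python
import re

names = [
    "[Hit List] January 5, 2009",
    "[Introduction] My thoughts on sjk",
    "Hit List archive",  # hypothetical extra record
]

def find_literal(needle, haystack):
    """Match `needle` character for character, brackets included."""
    pattern = re.compile(re.escape(needle))
    return [s for s in haystack if pattern.search(s)]

# Unescaped, "[Hit List]" is a character class: any one of H, i, t, space, L, s.
unescaped = [s for s in names if re.search("[Hit List]", s)]

print(find_literal("[Hit List]", names))  # ['[Hit List] January 5, 2009']
print(len(unescaped))                     # 3
```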

I like the speed and size of the case-insensitive databases, and I hardly ever use things where case is important, but I think it’s generally bad to take away power… would it be possible to allow users to specify when creating a database whether they want it to be case-sensitive or case-insensitive?

AsafKeller · January 6, 2009, 1:00pm

Another strong vote of support for case-sensitive and regex searches. The loss of case-sensitivity severely and negatively affects the usability of my databases.

sjk · January 6, 2009, 3:04pm

Hmm, subtle revenge for the girl thingy teasing?

Has Dtech posted a reason for removing case-sensitive searching? Presumably it was for performance reasons though it’s hard to imagine it making such a significant difference.

mueller-scheessel · January 6, 2009, 3:10pm

If I had to choose between case-sensitiveness and boolean searches, I would opt for the second choice. However, I agree that case-sensitive searches would be very handy, especially within languages like German, where differences between upper and lower case can be very important.

Nils

ndouglas · January 6, 2009, 3:44pm

Yeah, a couple times (scroll up). The problem is not the performance of Search, it’s the Concordance/AI labor. The database is entirely case-independent right now, so the search can’t be case-dependent.

sjk · January 6, 2009, 4:05pm

Same here. And maybe case-sensitive searching was removed in v2 in order to support boolean searches rather than it being a performance issue? [edit: nevermind; just saw kalisphoenix’s response – thanks]

I sure don’t understand those differences and importance since my German wife often puzzles me using lower case when writing common proper nouns in English.

AmberV · January 6, 2009, 5:45pm

Which is precisely why there should be an advanced search mode which is entirely decoupled from the primary index; something I was kind of trying to get at, but not saying well. I’m all for an extremely efficient internal indexing system, especially in an application like DT that does so much analysis of text—but this shouldn’t be at the expense of the ability to find specific strings.

I don’t see why there needs to be a choice between Boolean syntax and case-sensitivity either. These are two entirely different operations and, while they can cooperate, are not mutually exclusive. There are plenty of case-sensitive searches out there that also sport full Booleans and much more.

Arguably, a powerful search should be at the very top of their list. I’m not even opposed to the fuzzy, give-me-lots-of-chaff-that-might-be-what-I-wanted-but-don’t-really-know-how-to-search type searching. That’s fine and does come in handy. Sacrificing any sort of precise search mechanism for this kind of SpotLight-esque exercise in vagary is a dire failing in an application which has the primary intention of mass textual archival and retrieval!

And yes, the lack of any kind of punctuation search, brackets included, is nuts. I have many thousands of cross-referenced interdocument links using a text-based token system which relies upon punctuation delimiters. I can no longer find any of these link targets or sources. I have to use grep while mucking about in the bundle from Terminal, and that doesn’t help me out one bit when it comes to procedural searching.

I think that is a critical thing here. There are in my opinion, two main “modes” for searching. There are searches you perform in order to retrieve a document or a collection of documents. Then there are searches which are performed in order to enact some bulk action upon the search results.

FloodLight type searching is fine for the first mode; in fact it is preferable, because we might not always remember the precise terms and phrases that were used in the document(s) we wish to retrieve. But when you are searching with a procedural purpose (to take, say, 800 results and do a bulk meta-data change on them, or move them to a new sub-group), absolute precision is required. Right now, if I wanted to make a bulk change to the ‘-m-’ fork of files, I would get roughly 2,500 returns, nearly half of which are chaff ‘-M-’ files. This means hunting and pecking through ~1,250 files, which is absolutely out of the question, and therefore my intended task simply doesn’t get done.

So I say keep the toolbar search in fuzzy mode; due to interface limitations you cannot really do much in the way of advanced searching in just one line of real-time results. Then have a nice dialogue box that offers regex options, selective group controls, case-sensitivity, and so forth. Yes, it means a “slow” grep style search through the database, but at least the results are accurate, and if this were optimised with even a minimal separate index from the primary index it would be even better. If the dialogue were threaded, you could start a search and come back to it when it finished (results could be displayed in the dialogue; think BBEdit or TextMate’s project search). If this is all “advanced” and optional and included in only Pro and up, you don’t have floods of confused people wondering why \d keeps finding every single digit in the database. Keeping the results separate means you could even save out search sets easily. Instant group-full-of-replicants so you can work off of the list at leisure.
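
To show how little the "slow" mode would need, here is a minimal Python sketch of a linear, index-free scan with case-sensitivity and regex switches. It is only an illustration of the idea, not a proposal for how DEVONthink should implement it. The `scan` function and the file names are hypothetical.

```python
import re

def scan(records, query, case_sensitive=True, regex=False):
    """Linearly scan record names, bypassing any index.

    With regex=False the query is matched literally, punctuation included.
    """
    flags = 0 if case_sensitive else re.IGNORECASE
    pattern = re.compile(query if regex else re.escape(query), flags)
    return [name for name in records if pattern.search(name)]

# Hypothetical file names using the -m- / -M- and -i- / -I- conventions.
records = [
    "report-m-2008.txt",
    "report-M-2008.txt",
    "notes-i-draft.txt",
    "notes-I-final.txt",
    "summary m 2008.txt",
]

print(scan(records, "-m-"))
# ['report-m-2008.txt']
print(scan(records, "-m-", case_sensitive=False))
# ['report-m-2008.txt', 'report-M-2008.txt']
print(scan(records, r"-[iI]-", regex=True))
# ['notes-i-draft.txt', 'notes-I-final.txt']
```

Because the scan never consults the case-folded index, the hyphens stay significant and "summary m 2008.txt" is not returned for any of these queries.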

This kind of stuff seems to me anyway, fundamental to a powerful archival system.

ndouglas · January 6, 2009, 8:05pm

Good points, Amber, and I agree whole-heartedly.

sjk · January 6, 2009, 9:28pm

Is there anyone who disagrees with Amber’s compelling reasons for DTP optionally supporting more precise (e.g. case-sensitive, regexp, exclusionary) searching in situations it’s desirable/necessary because current methods are weak/insufficient?

Just happened upon this apropos definition/example:

Sometimes some of us want/need more granularity from our DTP databases.