"exact" searches not so "exact"

Hi

I am an academic and a keen user of DEVONthink Pro Office. It has revolutionised the way I do research on topics in my field.

Recently though, a simple problem has arisen that I find really puzzling. When I search for a single word enclosed in quotation marks, for example “vision”, my understanding is that it should search the database for only that word. However, it also finds ‘subdivision’, ‘division’, ‘provision’, ‘revision’, etc. With large databases this has the potential to soak up a lot of time. It happens whether ‘Fuzzy’ is enabled or not.

Can you tell me if my expectations are wrong or if this is normal behaviour for DTPO? If it’s not, do you have any suggestions as to how I can correct this? Thanks.

But in reality, the results you got did return the exact text string you had defined by enclosing it in quotation marks. That string occurs in words such as “vision”, “revision”, etc. See what happens if you do a search for “vis” (using quotation marks).

If you wish to search for a single word such as “vision”, do a search for that word; do not enclose it in quotation marks, and DEVONthink will find all the documents that contain that exact word.

Often, I want to search for an exact string such as “Jonathon Bentley Smith”. I want to find only documents that contain the set of words in that string and in that order, and not any document that doesn’t meet those criteria. I do not, for example, want to read about Jonathon Jones, Bentley Mackey and Harrison Smith, or about Jonathon Smith Bentley. Enclosing my search string within quotation marks will fit my needs.

Thanks, Bill.
I had been trying various permutations of the search term, including your suggestions, prior to my posting, all to no avail. After your reply I retried, but got the same results.

Could the problem be in some search set-up option that I am unaware of?

Are you saying that if you search for a word (an exact string of characters) such as “vision” (WITHOUT enclosing it in quotation marks), your results list includes documents that don’t contain that word, and you see highlighted terms other than “vision” in a results document?

If so, make certain that Fuzzy searching isn’t checked.

If I want to search for a word, I don’t enclose it in quotation marks.

If I want to search for an exact text string, usually a multiword phrase, I enclose it in quotation marks.

Yes.

I can search for “box” with OR without quotes. In both cases the same number of “items found” is reported. In both cases the items found include “Dropbox” and “mailboxes”, and these (parts of) words are highlighted appropriately.

“Search for” is set to “All”. Both “Ignore Diacritics” and “Fuzzy” are NOT checked. “Flag”, “Unread”, “Locking”, and “Label” are all set to “Any”. Nothing is set in “Advanced…”

DEVONthink Pro 2.3.2

Followup to my previous post re: “box”

In the two cases I checked (“Dropbox” and “mailboxes”), and possibly others, the documents in question are PDF+Text. In both of those documents, there does occur an unadorned word “box” somewhere in the document. However, the first occurrences, which are highlighted by Find results, happen to be “Dropbox” and “mailboxes”, as reported.

So maybe the following is occurring: (1) Find locates an unadorned instance of “box” in the document and marks it as a “hit”, (2) the results displayed by Find highlight ALL occurrences of “box”, thus producing, IMHO, misleading/confusing results.

In this test, I was searching for “box” WITHOUT the quotes.

Over here, Search for any string always returns every instance of the string, including instances with and without prefixes or suffixes. Enclosing or not enclosing the string in quotation marks, having or not having Fuzzy checked, do not change the results. IOW, what @Dave_Emme and @robstev#258 report. If I need to have an exact hit on a search, I open the document in its native editor (Acrobat, Word, etc.) where it is possible to control the search results.

I stand corrected, in that I didn’t respond completely to the OP’s issue. :blush: DEVONthink’s searches are indeed exact, but confusion can arise for another reason.

My explanations were basically correct. A word is an exact string of alphanumeric characters, and is separated from other words by spaces or by punctuation marks.
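To make that definition of a word concrete, here is a minimal Python sketch. It is not DEVONthink’s actual code; the `words` and `contains_word` helpers are hypothetical, and the rule “a word is a run of letters or digits” is only an approximation of the definition above.

```python
import re

def words(text):
    # Assumption: a word is a maximal run of letters or digits;
    # spaces and punctuation act as separators.
    return [w.lower() for w in re.findall(r"[A-Za-z0-9]+", text)]

def contains_word(text, query):
    # A document is a hit only if the query equals one whole word.
    return query.lower() in words(text)

doc_a = "The division asked for a revision."
doc_b = "Her vision, after the revision, improved."

print(contains_word(doc_a, "vision"))  # no standalone "vision" in doc_a
print(contains_word(doc_b, "vision"))  # doc_b has the standalone word
```

Under this rule “division” and “revision” are different words from “vision”, so only the second document would appear in the results list.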

If you wish to search for a single word, therefore, it’s not necessary to enclose it in quotation marks, but it makes no difference whether or not quotation marks are used. Quotation marks, however, are required if one wishes to define a multiword text string as an exact string of alphanumeric characters.

If you search for a word such as “vision” (whether enclosed within quotation marks or not), the list of search results will include only documents that contain that word. That’s why a search for a single word returns the same number of results whether or not it is enclosed in quotation marks.

Here’s where confusion arises. A document in that search results list will (if possible for the document filetype) highlight the word “vision”. Be assured that somewhere in that document, its name or other metadata (if “All” is the search option), the word “vision” is present and is highlighted; every document in the search results met the search criterion, so that the results list is indeed “exact”. But other words that contain that exact string will also be highlighted, such as “visions”, “revision”, “visionary”, etc. That’s an artifact of the highlighting procedure, and not a failure of the search procedure to return exact results.
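The difference between the two steps can be sketched in Python. This is only an illustration of substring highlighting versus whole-word highlighting, not the routine DEVONthink or OS X actually uses; both function names are made up for the example.

```python
import re

def substring_spans(text, term):
    # Highlights every occurrence of the string, even inside longer words.
    return [(m.start(), m.end())
            for m in re.finditer(re.escape(term), text, re.IGNORECASE)]

def whole_word_spans(text, term):
    # Highlights the term only where it stands alone as a word.
    pattern = r"\b" + re.escape(term) + r"\b"
    return [(m.start(), m.end())
            for m in re.finditer(pattern, text, re.IGNORECASE)]

text = "A revision of the vision for visionary work."

for start, end in substring_spans(text, "vision"):
    print("substring highlight:", text[start:end], "at", start)

for start, end in whole_word_spans(text, "vision"):
    print("whole-word highlight:", text[start:end], "at", start)
```

The substring version marks three places (inside “revision”, the word “vision” itself, and inside “visionary”), while the whole-word version marks only the one standalone occurrence. The document is a legitimate hit either way.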

Repeat that search using the word “vis”. The number of results will likely be much smaller than for a search for “vision”. But when you examine the highlighted terms in a search result, “vision” will also be highlighted (in part) if that string exists in the document. As the word “vis” almost always occurs in the phrase “vis a vis”, do a search for that multiword string, enclosed in quotes. The highlighting will then be much easier to interpret, as the multiword string will be highlighted and the substring “vis” that may occur in other words such as “vision” and “visited” will NOT be highlighted. (The downside of correct highlighting of a phrase: in a long document, where “vis a vis” occurs only on page 155, it may take a while for DEVONthink to highlight it and scroll to that page.)

Need I say, “Bug Report”? Or at least, “Feature Request”?

Please only highlight the things I’ve explicitly searched for.

Providing highlighting of query terms in documents in a list of search results is a convenience. But it’s a convenience that’s simply not available for a number of document filetypes, and the conventional approaches to highlighting, including those built into OS X, can generate false positives.

Whenever possible, DEVONthink uses code that is already available in OS X. In the case of PDFs and some other document filetypes, there are existing ‘hooks’ in OS X to generate features such as highlighting of the query term “vision” in PDFs. But the string “vision” will also be highlighted as part of other terms such as “division” or “visionary”. Those are false positives for occurrences of the query term, but are usually quickly recognizable as such, because the entire word isn’t highlighted.

Single word query highlighting is more likely to generate false positives (more so for short words than long words) than are multiword queries enclosed in quotation marks. Compare, for example, the highlight results in a PDF for the string ‘vis a vis’ (not enclosed in quotes) with the highlight results for “vis a vis” (enclosed in quotes). If a document does contain the words ‘vis’ and ‘a’ and the search is done without use of enclosing quotation marks, every occurrence of the letter ‘a’ in the document is highlighted!
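A small Python sketch shows why the unquoted query produces so many more highlights. It assumes, purely for illustration, that an unquoted query is split into separate terms and each term is highlighted wherever it occurs as a substring; `highlight_count` is a hypothetical helper, not a DEVONthink function.

```python
import re

def highlight_count(text, query, quoted):
    # Quoted: the whole phrase is one term.
    # Unquoted: each distinct word is highlighted on its own.
    terms = [query] if quoted else list(dict.fromkeys(query.split()))
    return sum(len(re.findall(re.escape(t), text, re.IGNORECASE))
               for t in terms)

text = "She was cautious vis a vis the early data and visited again."

print("quoted:  ", highlight_count(text, "vis a vis", quoted=True))
print("unquoted:", highlight_count(text, "vis a vis", quoted=False))
```

In the unquoted case every letter “a” in the sentence counts as a highlight, along with the “vis” inside “visited”, whereas the quoted phrase is highlighted exactly once.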

Most of the single-word searches I do are for technical terms or proper names, and I don’t find the highlighted results likely to be confusing. Most of the multiword searches I do are enclosed in quotes and are even less likely to be confusing. But if I were depending on highlighting to see how often and in what contexts an author used a certain relatively short word in his writings, I would probably wish for a different career, or I would switch to using software that’s specifically designed to do that sort of thing.

The biggest problem with highlighting of query terms, at least in my databases, is that not all the filetypes that hold important information can be highlighted at all when looked at in a set of search results, especially (but not exclusively) those rendered in DEVONthink by Quick Look. In such cases, all that I can be sure of is that the query criteria were met, and that can be demonstrated by going outside of DEVONthink to the parent application and searching in it. Or perhaps converting the text content of the document to plain text and then including that text in the search.

Would it be technically possible to develop a highlighting scheme that highlights only the exact query terms? Yes, in at least some document types. But attempting that for all the document filetypes that are fairly common in DEVONthink databases would require a major dedication of resources, so it would probably not make sense as a focus for DEVONtechnologies.

A common complaint about DEVONthink’s highlighting when a proximity operator has been used is that all the query terms are highlighted, not just those that fall within the proximity span of the NEAR, BEFORE or AFTER operator. But to attempt to highlight only the terms or the text string within the bounds of the proximity operator raises some very tricky logical problems. I’ve seen a couple of search utilities in the Windows world that attempt to do that. I’ve got documents that break those utilities, so that the resulting highlighted documents become essentially unreadable and utterly confusing. Think, for example, of cases where term pairs meeting the proximity criterion overlay each other in multiple instances, especially if ‘n’ (word separation) is not a very small number. Then use more than one proximity operator in the search. :slight_smile:

Cases of such overlays of proximity term pairs are surprisingly common.
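Here is a rough Python sketch of how quickly such overlays appear. It assumes a simplified NEAR that matches two terms at most `n` word positions apart; the real operator semantics in DEVONthink may differ, and `near_pairs` and `overlapping` are hypothetical names used only for this example.

```python
import re

def near_pairs(text, a, b, n):
    # Assumption: "a NEAR/n b" matches when the two words are
    # at most n word positions apart, in either order.
    tokens = [w.lower() for w in re.findall(r"[A-Za-z0-9]+", text)]
    pos_a = [i for i, w in enumerate(tokens) if w == a]
    pos_b = [i for i, w in enumerate(tokens) if w == b]
    return [(min(i, j), max(i, j))
            for i in pos_a for j in pos_b if abs(i - j) <= n]

def overlapping(spans):
    # Counts pairs of spans that share at least one word position.
    return sum(1
               for x in range(len(spans))
               for y in range(x + 1, len(spans))
               if spans[x][0] <= spans[y][1] and spans[y][0] <= spans[x][1])

text = "search index search results index search"
spans = near_pairs(text, "search", "index", 2)

print(spans)
print("overlapping span pairs:", overlapping(spans))
```

Even in this six-word example there are four matching spans, and three pairs of them overlap, so a highlighter would have to decide how to draw regions that share words.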

Bottom line: Don’t confuse the effectiveness of DEVONthink searches to identify documents that meet a search criterion, with issues of identifying the search terms within a document by means of highlighting. The search always works as designed. But sometimes, depending on the document content, the way the query was designed, and the possibility that a query term (especially a short one) might be highlighted within other, irrelevant words in the document, it’s not always easy to identify the number and locations of occurrences of one or more of the query terms. To those issues, add the problem that not all documents accept highlighting. But there’s specialized software designed specifically for heavy-duty analysis of words; DEVONthink isn’t designed to compete with that software category.

Christian tracks user requests and has continually evolved DEVONthink to satisfy many such requests. He’s aware of problems with the user’s convenience in locating occurrences of query terms. I wouldn’t be surprised to see improvements in the future. But it likely won’t prove feasible to always present highlighting of “just the term” in all documents for all queries.

In response to the OP’s initial question, one could search for " vision ".
Notice the spaces before and after the word.
Would that help? Or is my suggestion as naive as I suspect?

That would be a good approach; however, DEVONthink ignores leading and trailing spaces in search strings.

I just read all the way through this post, and I still can’t figure out how to search my documents for “command” and not “commanders”.

Thanks,
Ray

E.g. by using the search window (see Tools > Search…) or by using the toolbar search and disabling the option “Prefix while typing”.