Providing highlighting of query terms in documents in a list of search results is a convenience. But it’s a convenience that’s simply not available for a number of document filetypes, and the conventional approaches to highlighting, including those built into OS X, can generate false positives.
Whenever possible, DEVONthink uses code that’s already available in OS X. In the case of PDFs and some other document filetypes, there are existing ‘hooks’ in OS X that provide features such as highlighting of the query term “vision” in PDFs. But the string “vision” will also be highlighted as part of other terms such as “division” or “visionary”. Those are false positives for occurrences of the query term, but they are usually quickly recognizable as such, because the entire word isn’t highlighted.
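To make the “vision” versus “division” point concrete, here is a minimal Python sketch. It is not DEVONthink’s or OS X’s actual code; the function names are made up for illustration, and it only assumes that the highlighter matches the query string wherever it occurs, which is the behavior described above. A whole-word variant is shown for comparison.

```python
import re

def substring_spans(text, term):
    """Spans of every occurrence of term, including inside longer words."""
    return [m.span() for m in re.finditer(re.escape(term), text, re.IGNORECASE)]

def whole_word_spans(text, term):
    """Spans of term only where it stands alone as a word."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return [m.span() for m in re.finditer(pattern, text, re.IGNORECASE)]

def mark(text, spans):
    """Wrap each (start, end) span in square brackets to imitate highlighting."""
    out, last = [], 0
    for start, end in spans:
        out.append(text[last:start])
        out.append("[" + text[start:end] + "]")
        last = end
    out.append(text[last:])
    return "".join(out)

sample = "Her vision for the division was visionary."

# Substring matching: the false positives are partial-word highlights.
print(mark(sample, substring_spans(sample, "vision")))
# Her [vision] for the di[vision] was [vision]ary.

# Whole-word matching: only the real occurrence is highlighted.
print(mark(sample, whole_word_spans(sample, "vision")))
# Her [vision] for the division was visionary.
```

The bracketed fragments inside “division” and “visionary” are the partial-word highlights that give a false positive away at a glance.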
Highlighting of single-word queries is more likely to generate false positives (more so for short words than for long ones) than highlighting of multiword queries enclosed in quotation marks. Compare, for example, the highlight results in a PDF for the string ‘vis a vis’ (not enclosed in quotes) with the highlight results for “vis a vis” (enclosed in quotes). If a document does contain the words ‘vis’ and ‘a’ and the search is done without enclosing quotation marks, every occurrence of the letter ‘a’ in the document is highlighted!
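The ‘vis a vis’ comparison can be imitated in a few lines of Python. This is only a sketch under the assumption that an unquoted query is highlighted term by term as plain substrings, while a quoted query is highlighted as one contiguous string; the helper names are hypothetical.

```python
import re

def term_spans(text, query):
    """Unquoted query: highlight each distinct term independently."""
    spans = []
    for term in dict.fromkeys(query.lower().split()):  # drop repeated terms
        spans += [m.span() for m in re.finditer(re.escape(term), text, re.IGNORECASE)]
    return sorted(spans)

def phrase_spans(text, phrase):
    """Quoted query: highlight only the contiguous phrase."""
    return [m.span() for m in re.finditer(re.escape(phrase), text, re.IGNORECASE)]

sample = "The data, vis a vis last year, was a bit flat."

unquoted = term_spans(sample, "vis a vis")
quoted = phrase_spans(sample, "vis a vis")

print(len(unquoted), "highlights without quotes")
print(len(quoted), "highlight with quotes")
```

Without quotes, every ‘a’ in “data”, “last”, “year”, “was” and “flat” gets marked along with the real hits; with quotes there is a single highlight.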
Most of the single-word searches I do are for technical terms or proper names, and I don’t find the highlighted results likely to be confusing. Most of the multiword searches I do are enclosed in quotes and are even less likely to be confusing. But if I were depending on highlighting to see how often and in what contexts an author used a certain relatively short word in his writings, I would probably wish for a different career, or I would switch to using software that’s specifically designed to do that sort of thing.
The biggest problem with highlighting of query terms, at least in my databases, is that not all the filetypes that hold important information can be highlighted at all when viewed in a set of search results, especially (but not exclusively) those rendered in DEVONthink by Quick Look. In such cases, all that I can be sure of is that the query criteria were met. That can be demonstrated by going outside of DEVONthink to the parent application and searching there, or perhaps by converting the text content of the document to plain text and then including that text in the search.
Would it be technically possible to develop a highlighting scheme that highlights only the exact query terms? Yes, in at least some document types. But attempting that for all the document filetypes that are fairly common in DEVONthink databases would require a major dedication of DEVONtechnologies’ resources, and it probably wouldn’t make sense as a focus for them.
A common complaint about DEVONthink’s highlighting when a proximity operator has been used is that all the query terms are highlighted, not just those that fall within the proximity span of the NEAR, BEFORE or AFTER operator. But attempting to highlight only the terms or the text string within the bounds of the proximity operator raises some very tricky logical problems. I’ve seen a couple of search utilities in the Windows world that attempt to do that. I’ve got documents that break those utilities, so that the resulting highlighted documents become essentially unreadable and utterly confusing. Think, for example, of cases where term pairs meeting the proximity criterion overlay each other in multiple instances, especially if ‘n’ (word separation) is not a very small number. Then use more than one proximity operator in the search.
Cases of such overlays of proximity term pairs are surprisingly common.
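Here is a small Python sketch of how such overlays arise. It is not how DEVONthink evaluates NEAR; it simply assumes that two terms match when they are at most n words apart, in either order, and then checks which of the matching spans share words. The function names and the sample text are invented for illustration.

```python
import re

def near_pairs(words, a, b, n):
    """All (i, j) word positions where a and b are at most n words apart."""
    pos_a = [i for i, w in enumerate(words) if w == a]
    pos_b = [i for i, w in enumerate(words) if w == b]
    return [(i, j) for i in pos_a for j in pos_b if abs(i - j) <= n]

def overlapping(pairs):
    """Pairs of proximity spans that share at least one word position."""
    spans = [(min(i, j), max(i, j)) for i, j in pairs]
    return [(s, t) for k, s in enumerate(spans) for t in spans[k + 1:]
            if s[0] <= t[1] and t[0] <= s[1]]

text = "search index search results index search engine index"
words = re.findall(r"\w+", text.lower())

pairs = near_pairs(words, "search", "index", 3)
print(len(pairs), "qualifying pairs")
print(len(overlapping(pairs)), "overlaps between their spans")

# Word positions that span-based highlighting would cover.
covered = sorted({k for i, j in pairs for k in range(min(i, j), max(i, j) + 1)})
print(covered == list(range(len(words))))  # True: the whole text is covered
```

Even in this eight-word example the qualifying spans chain together, so highlighting “only the text within the proximity span” ends up highlighting everything. A larger n or a second proximity operator only makes that worse.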
Bottom line: Don’t confuse the effectiveness of DEVONthink searches at identifying documents that meet a search criterion with the issue of identifying the search terms within a document by means of highlighting. The search always works as designed. But depending on the document content, the way the query was designed, and the possibility that a query term (especially a short one) might be highlighted within other, irrelevant words in the document, it’s not always easy to identify the number and locations of occurrences of one or more of the query terms. To those issues, add the problem that not all documents accept highlighting. But there’s specialized software designed specifically for heavy-duty analysis of words; DEVONthink isn’t designed to compete with that software category.
Christian tracks user requests and has continually evolved DEVONthink to satisfy many such requests. He’s aware of problems with the user’s convenience in locating occurrences of query terms. I wouldn’t be surprised to see improvements in the future. But it likely won’t prove feasible to always present highlighting of “just the term” in all documents for all queries.