Thomas Huxley was a prominent 19th-century scientist and writer. Aldous Huxley was a prominent 20th-century writer. They were related. Many years ago, I spent several days with Aldous Huxley when he came to visit the research group in which I worked at the University of Texas. I never met Thomas Huxley. Although I’ve been around a long time, he died before I was born.
I’ve got documents in a database referring to each, and sometimes to both Huxleys.
If I want to search for just those documents that refer to Aldous, but not to Thomas, I could write this query:
Huxley (Aldous NOT Thomas)
That will exclude from the results any document that has the word “Thomas” within it, even those that refer to a Thomas who is not Thomas Huxley. So I might want to refine that query a bit, to
Huxley (Aldous NOT (Thomas NEAR2 Huxley))
That would exclude documents that refer to “Thomas Huxley”, “Thomas Henry Huxley”, or “Huxley, Thomas Henry”. But it would not exclude a document that mentioned both Aldous Huxley and Thomas Pinkerton.
Even so, a search result document that contained a sentence like this could still slip in a reference to Thomas: “Thomas and Aldous were two famous members of the Huxley family.” My NEAR2 operator doesn’t exclude that mention of Thomas Huxley.
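To make the behavior of that last query concrete, here is a rough sketch in Python. It is not how the application implements searching; it only mimics the logic of Huxley (Aldous NOT (Thomas NEAR2 Huxley)) on a few sample sentences. The helper names (words, near, matches) are made up for illustration, and I am assuming NEAR2 means "at most two word positions apart, in either order".

```python
import re

def words(text):
    # Lowercased word tokens; punctuation is ignored (an assumption).
    return re.findall(r"[a-z0-9]+", text.lower())

def near(tokens, a, b, n):
    # True if some occurrence of a and some occurrence of b
    # are at most n word positions apart, in either order.
    pos_a = [i for i, t in enumerate(tokens) if t == a]
    pos_b = [i for i, t in enumerate(tokens) if t == b]
    return any(abs(i - j) <= n for i in pos_a for j in pos_b)

def matches(text):
    # Huxley (Aldous NOT (Thomas NEAR2 Huxley))
    t = words(text)
    return ("huxley" in t and "aldous" in t
            and not near(t, "thomas", "huxley", 2))

samples = {
    "aldous_only": "Aldous Huxley wrote Brave New World.",
    "both_huxleys": "Aldous Huxley was the grandson of Thomas Henry Huxley.",
    "pinkerton": "Aldous Huxley corresponded with Thomas Pinkerton.",
    "slips_in": "Thomas and Aldous were two famous members of the Huxley family.",
}

results = {name: matches(text) for name, text in samples.items()}
for name, hit in results.items():
    print(name, hit)
```

Under those assumptions, only the "both_huxleys" sample is excluded. The "slips_in" sample still matches, because "Thomas" and "Huxley" are nine words apart there, well outside the NEAR2 window.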
No. I’m not sure what you are trying to do. But to do a search of that kind you will need Wildcards.
I did a search in one of my databases for all the documents that contain the sequence “xyz” in text strings.
Query: *xyz*
Asterisks turn this into a Wildcards search.
Result: 56 documents. (Of those 56, 42 contain the ‘word’ string “xyz”. The other 14 contain the string “xyz” within a longer text string, so it’s necessary to use Wildcards to find all of them.)
Next, I did a search to exclude those 56 documents:
Query: [a-z] NOT *xyz*
Result: 24,269 documents.
That leaves over a thousand documents that were not accounted for by these two searches. I didn’t total up the categories of documents with zero words, but that sounds about right, counting bookmarks, image-only PDFs of handwritten notes or diagrams, pictures, etc.
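For anyone who wants to see the bookkeeping, here is a small Python sketch of the same idea on a toy set of four documents. It is only an illustration, not the application's search engine: the function names are invented, and I am treating the first half of the second query as a stand-in for "the document contains at least one word".

```python
import re

def words(text):
    # Lowercased word tokens; punctuation is ignored (an assumption).
    return re.findall(r"[a-z0-9]+", text.lower())

def whole_word(text, term):
    # Plain term: matches only when the term is a word by itself.
    return term in words(text)

def wildcard(text, term):
    # Term with asterisks on both sides: matches inside longer strings too.
    return any(term in w for w in words(text))

docs = [
    "xyz is used here as a placeholder",   # contains the word xyz
    "see the file named abcxyz123",        # xyz only inside a longer string
    "a plain note about gardening",        # has words, no xyz
    "",                                    # image-only item: zero words
]

word_hits = sum(whole_word(d, "xyz") for d in docs)
wild_hits = sum(wildcard(d, "xyz") for d in docs)
# Stand-in for the second query: has at least one word, and no xyz anywhere.
rest_hits = sum(bool(words(d)) and not wildcard(d, "xyz") for d in docs)
unaccounted = len(docs) - wild_hits - rest_hits

print(word_hits, wild_hits, rest_hits, unaccounted)
```

The two searches together never account for the zero-word item, which is the same kind of gap as the thousand-odd documents in my database.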
Thank you for your patience, Bill. But I’m still confused.
I tried to follow your example on a database I happened to have open:
Assuming I ran “git (svn NOT workflow)” as a search, it should return all documents that contain the words “git” and “svn” and do not contain the term “workflow”, correct?
However, the search results still contain documents containing the term workflow:
Searching is (as in v1.x) field-specific; the content is probably matching the term. That’s necessary for proximity/phrase searching and to make results consistent, e.g. word1 NEAR/9999999 word2 returns the same results as word1 AND word2.
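One possible reading of "field-specific", sketched in Python: the whole query is tested against each field separately, and the document is returned if any single field satisfies it. This is an assumption about the behavior, not a description of the actual implementation, and the field names and helper functions below are hypothetical.

```python
import re

def words(text):
    # Set of lowercased word tokens in one piece of text.
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def query(tokens):
    # git (svn NOT workflow)
    return "git" in tokens and "svn" in tokens and "workflow" not in tokens

def match_per_field(doc):
    # Assumed field-specific behavior: one field must satisfy the whole query.
    return any(query(words(value)) for value in doc.values())

def match_whole_document(doc):
    # What the question expected: all fields pooled before testing.
    return query(words(" ".join(doc.values())))

# Hypothetical document: the content satisfies the query by itself,
# while "workflow" appears only in the name.
doc = {
    "name": "Team workflow notes",
    "content": "Moving from svn to git",
}

per_field = match_per_field(doc)
whole = match_whole_document(doc)
print(per_field, whole)
```

With that reading, the document is returned because its content matches on its own terms, even though "workflow" shows up elsewhere in the same document.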