Boolean NOT (NEAR ...)?

FrankT · September 1, 2023, 6:30pm

I wish you to be as perceptive in analyzing the documents you find as you are in searching for them.

Mindstormer · September 1, 2023, 6:36pm

As do I!

One of the dangers is that perceptivity in analysis is sometimes proportionate to the time available divided by the number of results one has to go through! Even narrowing the results by 10% could represent hundreds of hours saved over a decade. This all arose because I’d rather dedicate as much time as possible to analyzing results that match given criteria as closely as currently possible than spend a lot of that precious time weeding out material that was irrelevant to begin with.

FrankT · September 1, 2023, 6:44pm

Our art is to be able to distinguish the important from the unimportant. More data does not automatically lead to more knowledge, even if many believe that … I’ll stop now. I don’t know any better than you.

Mindstormer · September 1, 2023, 7:21pm

You’re completely right, and I agree.

FrankT · September 1, 2023, 8:39pm

I really don’t know anything about complicated/complex searches. Because that’s the case, I search for the most important thing first. Then I search only in the search results further and then (if necessary) again in the search results … but that’s probably useless in your case?

Mindstormer · September 1, 2023, 9:17pm

If I always knew what I was looking for among personal files, that would be one thing. But I’m working with a growing archive of historical documents spanning 180 years. Multiple searches and sub-searches can help, depending on the study methodology and delimitations. Finding and knowing what is important is a process of discovery aided by the nature and strength of the queries.

cgrunenberg · September 2, 2023, 8:41am

Shouldn’t a simple query like "new testament" OR "old testament" be sufficient? OPT doesn’t affect matching or highlighting but this might rank results including the word higher.

chrillek · September 2, 2023, 8:48am

That’ll find all “articles”, but the OP wants to exclude those that also contain “new/old covenant”. Since an “article” is simply an undefined part of the document, I doubt that is possible.

cgrunenberg · September 2, 2023, 9:07am

Hmmm… easiest approach would be ("new testament" OR "old testament") NOT ("new covenant" OR "old covenant"). Not very elegant but should work. Just like this approx. solution using wildcards:

"[no][el][wd] testament" -"[no][el][wd] covenant"
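
To make the “approx.” concrete: each bracket stands for one character taken from that set, so the pattern covers more than just “new” and “old”. A small Python sketch (plain enumeration, not DEVONthink code) lists every three-letter word the three sets can produce:

```python
from itertools import product

# The three character sets from the wildcard pattern [no][el][wd].
classes = ["no", "el", "wd"]

# Every combination of one character per set.
words = ["".join(chars) for chars in product(*classes)]
print(words)
```

The list contains “new” and “old”, plus six other combinations that are unlikely to occur in real text.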

chrillek · September 2, 2023, 9:26am

It really is more complicated than that. Or rather “not solvable”, in my opinion. Because we’re not talking about a complete document (in that case, your query would of course do the right thing). But rather something like:

A document has articles 1, 2, and 3 where
article 1 talks about any testament and the old covenant
article 2 talks about any testament and a marriage covenant
article 3 talks about the book of Job (or whatever)

What the OP would like to happen is that this document is found because one of its articles matches their “give me any testament and any covenant as long as the latter is neither old nor new”. But a query like "testament" NOT ("old" or "new") "covenant" will weed out the complete document, so that “article 2” will not be found, whatever we do.

As there’s no “limit my search to something that only humans can identify because there is apparently no clear boundary inside the document” operator, it simply can’t work. Cake have not eat. Or cake eat not have.

If there were clearly definable/defined boundaries between the articles, one could somehow script it by splitting the document into articles at the boundaries and then searching only in the articles.
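
A minimal sketch of that splitting idea in Python, run on exported plain text rather than inside DEVONthink. Everything here is an assumption for illustration: the boundary (a line of dashes), the function names, and the sample text. A real archive would need whatever marker actually separates its articles.

```python
import re

# Hypothetical boundary: a line consisting only of three or more dashes.
BOUNDARY = re.compile(r"^-{3,}[ \t]*$", re.MULTILINE)

TESTAMENT = re.compile(r"\btestament\b", re.IGNORECASE)
COVENANT = re.compile(r"\bcovenant\b", re.IGNORECASE)
OLD_NEW_COVENANT = re.compile(r"\b(?:old|new)\s+covenant\b", re.IGNORECASE)


def split_articles(text):
    """Split a document into articles at the assumed boundary lines."""
    return [part.strip() for part in BOUNDARY.split(text) if part.strip()]


def matching_articles(text):
    """Return articles that mention a testament and a covenant,
    but never the old or new covenant."""
    return [
        article
        for article in split_articles(text)
        if TESTAMENT.search(article)
        and COVENANT.search(article)
        and not OLD_NEW_COVENANT.search(article)
    ]


# Made-up sample mirroring articles 1, 2, and 3 above.
document = """\
The Old Testament speaks of the old covenant.
-----
The New Testament is quoted on the marriage covenant.
-----
Notes on the book of Job.
"""

hits = matching_articles(document)
print(hits)
```

Because the exclusion is applied per article instead of per document, the second article survives even though the first one mentions the old covenant.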

I love the idea of a “now testament”

cgrunenberg · September 3, 2023, 8:22am

Interesting idea but not matched by the wildcards

BLUEFROG · September 3, 2023, 1:22pm

True but it would match the oew testament.

cgrunenberg · September 4, 2023, 5:39am

Now how likely is that? But if it should be an OCR issue, then it’s unclear anyway whether it should be matched or not

aaaaaaaaaaaaaaaa · March 16, 2024, 2:14am

Thanks for that syntax, @chrillek! I was struggling to find a way to search for text excerpts that contain the word “late” while excluding every instance of phrases like “late seventeenth century”, “late twentieth century”, “late 1840s”, “late 1990s”, etc…

This was my solution
late (NOT(late NEXT/2 (centur*) OR [12]???s))

Works wonderfully!
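
For anyone who wants to check excerpts outside DEVONthink, here is a rough Python regex approximation of the same intent. It is not how DEVONthink evaluates the query. It assumes that NEXT/2 means “followed within two words” and that each ? stands for exactly one character, and the sample sentences are made up.

```python
import re

# "late" that is NOT followed, within two words, by centur* or
# something shaped like [12]???s (e.g. 1840s, 1990s).
LATE = re.compile(
    r"\blate\b"
    r"(?!(?:\W+\w+)?\W+(?:centur\w*|[12]\w{3}s)\b)",
    re.IGNORECASE,
)

samples = [
    "She arrived late to the meeting.",
    "A late seventeenth century pamphlet.",
    "Printed in the late 1840s.",
    "Published late in the twentieth century.",
]

# Keep only the excerpts with at least one acceptable "late".
kept = [s for s in samples if LATE.search(s)]
print(kept)
```

The last sample is kept because “century” sits more than two words after “late”.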

Mindstormer · March 17, 2024, 1:04pm

I tested this at length this morning, and while it appears to return results, it does not seem to do so reliably across file types. I’m guessing there is a bug lurking somewhere: RTF and TXT files kept showing up as results, yet showed no in-file hits, even though simpler searches for the same basic terms indicate they should.

The syntax you shared does seem to work as expected with PDF files, however.

saltlane · March 17, 2024, 1:19pm

A small idea…I haven’t tried so don’t know if it will work. Why not set up a smart group to initially create a subset and then search within that?

Mindstormer · March 17, 2024, 1:50pm

If referring to the original post, I’m not quite sure what this would look like. Return all results with fox in a smart group, and then how would one isolate cases where sleeping is in the broader context but not within 3 words? The same problem remains.

cgrunenberg · March 18, 2024, 6:52am

Adding the text: prefix to the toolbar search ensures that only the document’s body is searched.

Mindstormer · March 18, 2024, 11:05am

I appreciate the reminder. In most cases, I find the extra step unnecessary.
In this case, it also doesn’t resolve the bug. I did go ahead and submit a bug report yesterday that demonstrates it.

rkaplan · March 26, 2024, 9:57am

I am convinced this is your main stumbling block

One way or another you must separate out the individual articles.

Otherwise the sorting task that you are describing does not have specific well-defined objective criteria - which is why you are struggling to translate it into computer form.