I wish you to be as perceptive in analyzing the documents you find as you are in searching for them.
As do I!
One of the dangers is that perceptivity in analysis is sometimes proportionate to the time available divided by the number of results one has to go through! Even narrowing the results by 10% could represent hundreds of hours saved over a decade. This all arose because I’d rather dedicate as much time as possible to analyzing results that match given criteria as closely as is currently possible than spend that precious time weeding out material that was irrelevant to begin with.
Our art is to be able to distinguish the important from the unimportant. More data does not automatically lead to more knowledge, even if many believe that … I’ll stop now. I don’t know any better than you.
You’re completely right, and I agree.
I really don’t know anything about complicated/complex searches. Because that’s the case, I search for the most important thing first. Then I search further only within the search results, and then (if necessary) again within those results … but that’s probably useless in your case?
If I always knew what I was looking for among personal files, that would be one thing. But I’m working with a growing archive of historical documents spanning 180 years. Multiple searches and sub-searches can help, depending on the study methodology and delimitations. Finding and knowing what is important is a process of discovery aided by the nature and strength of the queries.
Shouldn’t a simple query like "new testament" OR "old testament" be sufficient? OPT doesn’t affect matching or highlighting but this might rank results including the word higher.
That’ll find all “articles”, but the OP wants to exclude those that also contain “new/old covenant”. Since an “article” is simply an undefined part of the document, I doubt that is possible.
Hmmm… the easiest approach would be ("new testament" OR "old testament") NOT ("new covenant" OR "old covenant"). Not very elegant, but it should work. Just like this approximate solution using wildcards:
"[no][el][wd] testament" -"[no][el][wd] covenant"
It really is more complicated than that. Or rather “not solvable”, in my opinion. Because we’re not talking about a complete document (in that case, your query would of course do the right thing), but rather something like:
- A document has articles 1, 2, and 3 where
  - article 1 talks about any testament and the old covenant
  - article 2 talks about any testament and a marriage covenant
  - article 3 talks about the book of Job (or whatever)
What the OP would like is for this document to be found because one of its articles matches their “give me any testament and any covenant as long as the latter is neither old nor new”. But a query like "testament" NOT ("old" or "new") "covenant" will weed out the complete document, so “article 2” will not be found, whatever we do.
As there’s no “limit my search to something that only humans can identify because there is apparently no clear boundary inside the document” operator, it simply can’t work. Cake have not eat. Or cake eat not have.
If there were clearly definable/defined boundaries between the articles, one could somehow script it by splitting the document into articles at the boundaries and then searching only in the articles.
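To make that splitting idea concrete, here is a minimal Python sketch. It assumes something the thread does not establish: that articles are separated by a recognizable marker (here a hypothetical line of five or more `=` characters). The names `split_articles`, `has_other_covenant`, and `matching_articles`, as well as the sample text, are invented for illustration, and the check runs outside the search application.

```python
import re

# Hypothetical article boundary: a line made up only of five or more '='.
ARTICLE_BOUNDARY = re.compile(r"^={5,}[ \t]*$", re.MULTILINE)

TESTAMENT = re.compile(r"\btestament\b", re.IGNORECASE)
# Capture the word directly before "covenant", if there is one.
COVENANT = re.compile(r"(?:\b(\w+)\s+)?\bcovenant\b", re.IGNORECASE)


def split_articles(document):
    """Split the document at the assumed boundary lines."""
    return [part.strip() for part in ARTICLE_BOUNDARY.split(document) if part.strip()]


def has_other_covenant(article):
    """True if some 'covenant' is not directly preceded by 'old' or 'new'."""
    return any(
        (m.group(1) or "").lower() not in {"old", "new"}
        for m in COVENANT.finditer(article)
    )


def matching_articles(document):
    """Return 1-based numbers of articles with a testament and another covenant."""
    return [
        number
        for number, article in enumerate(split_articles(document), start=1)
        if TESTAMENT.search(article) and has_other_covenant(article)
    ]


sample = """The New Testament fulfils the old covenant.
=====
The Old Testament describes a marriage covenant.
=====
The book of Job stands apart.
"""

print(matching_articles(sample))  # [2]
```

Because each article is tested separately, the "old covenant" in article 1 no longer hides article 2. Without a reliable boundary marker, though, the split step has nothing to work with, which is the point made above.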
I love the idea of a “now testament”
Interesting idea but not matched by the wildcards
True but it would match the oew testament.
Now how likely is that? But if it should be an OCR issue, then it’s unclear anyway whether it should be matched or not.
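For anyone who wants to see exactly which words the three character classes in `[no][el][wd]` admit, they can be enumerated. This small Python sketch uses the standard `re` and `itertools` modules, not the search application's own wildcard engine, so it assumes bracketed classes behave the same way in both.

```python
import itertools
import re

pattern = re.compile(r"[no][el][wd]")

# Every three-letter string the three classes can produce.
candidates = ["".join(chars) for chars in itertools.product("no", "el", "wd")]
print(candidates)
# ['new', 'ned', 'nlw', 'nld', 'oew', 'oed', 'olw', 'old']

print(pattern.fullmatch("now") is None)      # True: 'o' is not in [el]
print(pattern.fullmatch("oew") is not None)  # True: matched, as noted above
```

So the pattern accepts eight strings, of which only "new" and "old" are wanted; the other six are the price of the shorter query.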
Thanks for that syntax, @chrillek! I was struggling to find a way to search for text excerpts that contain the word “late” while excluding every instance of phrases like “late seventeenth century”, “late twentieth century”, “late 1840s”, “late 1990s”, etc.
This was my solution
late (NOT(late NEXT/2 (centur*) OR [12]???s))
Works wonderfully!
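For checking excerpts outside the application, the same filter can be roughly approximated with a regular expression. This is only a Python analogue, not the application's query syntax, and it rests on two assumptions: that `NEXT/2` means "followed within two words", and that `[12]???s` is meant to catch decades such as `1840s` (the sketch narrows the `?` wildcards to digits). It also works per occurrence of "late" rather than per document, and the sample excerpts are made up.

```python
import re

# "late" that is NOT followed, within two words, by centur* or a decade like 1840s.
LATE = re.compile(
    r"\blate\b(?!(?:\s+\w+){0,1}\s+(?:centur\w*|[12]\d{3}s)\b)",
    re.IGNORECASE,
)

excerpts = [
    "He arrived late to the meeting.",
    "Built in the late seventeenth century.",
    "A fashion of the late 1840s.",
    "The late Mr. Smith wrote it.",
]

kept = [text for text in excerpts if LATE.search(text)]
for text in kept:
    print(text)
# He arrived late to the meeting.
# The late Mr. Smith wrote it.
```

Hyphenated forms such as "late twentieth-century" are not handled by this sketch and would need an extra alternative in the lookahead.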
I tested this at length this morning, and I find that while it appears to return results, it does not seem to do so reliably across file types. I’m guessing there is a persistent bug somewhere, as I kept getting RTF and TXT files listed as results that showed no in-file matches, even though simpler searches for the same basic terms suggest they should have.
The syntax you shared does seem to work as expected with PDF files, however.
A small idea… I haven’t tried it, so I don’t know if it will work. Why not set up a smart group to initially create a subset and then search within that?
If referring to the original post, I’m not quite sure what this would look like. Return all results with fox in a smart group, and then how would one isolate cases where sleeping is in the broader context but not within 3 words? The same problem remains.
Adding the text: prefix to the toolbar search ensures that only the document’s body is searched.
I appreciate the reminder. In most cases, I find the extra step unnecessary.
In this case, it also doesn’t resolve the bug. I did go ahead and submit a bug report yesterday that demonstrates it.
I am convinced this is your main stumbling block.
One way or another you must separate out the individual articles.
Otherwise the sorting task that you are describing does not have specific, well-defined, objective criteria, which is why you are struggling to translate it into computer form.