search predicate question

kenliles · June 10, 2015, 7:22pm

Couple of related dumb questions on search (doc content):

How do you search for a word without also matching that string sequence inside other words?
For example:
a search for “cc” as a word that doesn’t also catch suCCessful.
I tried putting quotes around the string, but it looks like that still returns a positive on string-within-a-word. The ?, *, [list] operators add conditions to the match, but only character-based ones that I can see (maybe the old brain isn’t working). I need something like a null character surrounding the string… I’m probably missing something embarrassing.

Looks like : (colon character) and a space are not explicitly searchable as characters (I assume because they belong to the list of actionable search characters),
but is there a way to force the search to include them?
For example, in searching for emails:
“From:” NEAR “Sent:”

edit:
Point of clarification on the above - it looks like Search IS looking for the word as requested (not the string within a word), but it’s highlighting the string within the word, which made me think that was the positive find. Might be worth making a note of that - it’s a bit confusing from an interface standpoint. For clarity - I’m in a Search window using multiple OR predicates like “From:” NEAR “Sent:”, and even though the strings within words don’t trigger the finds, they are still highlighted, leading the user to think these are the find cases.

Anyway -
So the remaining question is whether there is a way to force the search to treat a colon as part of the word or string.
Being able to search for : as part of a word or string would potentially be an easier alternative, eliminating positive finds of strings within words. Thanks.

BLUEFROG · June 10, 2015, 9:18pm

Since it appears you are searching for emails, I am curious why you’re approaching it this way.

kenliles · June 10, 2015, 10:01pm

Good question -
Massive legal productions of documents are delivered ‘flattened’ (everything as searchable-text PDF), inevitably intermixing native docs, images, and other sources with email communications, agreements, etc.
Isolating the email communications requires a search predicate that keys on email layout formats rather than native file types (since all docs are delivered in the same searchable-text PDF format).
I’ve developed several search predicates that fairly accurately isolate different doc types using only the enclosed text format. This is a continuation of that effort, to refine it further…

In short, imagine all the file types on your computer delivered to you as searchable PDFs, and the job at hand is to reverse engineer which are emails, which are native Word docs, agreements, images, communications, etc.

The automated job at hand -
discern what a file’s ‘content-type’ is (different from ‘file-type’ but potentially related), based only on its internal ‘word image’ or construct pattern. It’s sort of a textual pattern recognition problem on a massive scale. In this case, I’m isolating email-like communication documents from other types, using only the format of the internal word structure to build the search predicate, since the file type itself has been flattened to a single searchable type, which eliminates that (metadata) distinction external to the file itself. Couple that with doing it automatically on GBs of docs to avoid the manual effort…
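
For readers who want to experiment with this kind of layout-based classification outside of DEVONthink, here is a minimal Python sketch. It assumes the text has already been extracted from each PDF, and the header labels, the `looks_like_email` name, and the threshold of three labels are all hypothetical choices for illustration, not anything taken from this thread's actual predicates.

```python
import re

# Hypothetical header labels; adjust to the layouts seen in a given production.
HEADER_LABELS = ("From", "Sent", "To", "Subject")

def count_email_headers(text: str) -> int:
    """Count distinct labels that start a line and are followed by a colon."""
    found = 0
    for label in HEADER_LABELS:
        pattern = rf"^\s*{re.escape(label)}\s*:"
        if re.search(pattern, text, flags=re.MULTILINE | re.IGNORECASE):
            found += 1
    return found

def looks_like_email(text: str, min_headers: int = 3) -> bool:
    """Treat a document as email-like if enough header labels are present."""
    return count_email_headers(text) >= min_headers

email_text = "From: Alice\nSent: Monday\nTo: Bob\nSubject: Hi\n\nSee attached."
contract_text = "This agreement is sent from the seller to the buyer."

print(looks_like_email(email_text))     # True
print(looks_like_email(contract_text))  # False
```

Anchoring each label to the start of a line and requiring the colon is what keeps ordinary prose containing "from" or "sent" from counting as a header.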

DT works quite well for much of this- I’m continually refining…

Bill_DeVille · June 10, 2015, 10:03pm

DEVONthink will not recognize a colon as a searchable character. It is treated as a space, as are most other non-alphanumeric characters such as +, @, and $.
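
To picture what that means for a term like “From:”, here is a rough Python sketch of a tokenizer that maps every non-alphanumeric character to a space before splitting into words. It is only an illustration of the behavior described above, not DEVONthink's actual implementation; the `tokenize` name and the ASCII-only character class are assumptions.

```python
import re

def tokenize(text: str) -> list[str]:
    """Replace each run of non-alphanumeric characters with a space, then split."""
    return re.sub(r"[^0-9A-Za-z]+", " ", text).split()

print(tokenize("From:"))                  # ['From']
print(tokenize("From"))                   # ['From']
print(tokenize("From: a+b@example.com"))  # ['From', 'a', 'b', 'example', 'com']
```

Under this model “From:” and “From” produce the same token, which is why the colon cannot be used to narrow the search.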

In the case of results of a proximity operator such as NEAR or AFTER, both terms are highlighted wherever they occur in the document, whether or not they are part of an actual proximity pair.
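
The difference between what is matched and what is highlighted can be sketched in a few lines of Python. This is a simplified model with assumptions: the `near_pairs` helper and the window of 10 words are hypothetical, and it is not how DEVONthink computes proximity internally.

```python
def near_pairs(tokens, a, b, window=10):
    """Return (i, j) index pairs where a and b occur within `window` words."""
    pos_a = [i for i, t in enumerate(tokens) if t == a]
    pos_b = [i for i, t in enumerate(tokens) if t == b]
    return [(i, j) for i in pos_a for j in pos_b if abs(i - j) <= window]

tokens = ["from", "alice", "sent", "monday"] + ["filler"] * 20 + ["sent"]

# Every occurrence of either term, which is what gets highlighted.
highlighted = [i for i, t in enumerate(tokens) if t in ("from", "sent")]
# Only the occurrences that actually form a proximity pair.
pairs = near_pairs(tokens, "from", "sent", window=10)

print(highlighted)  # [0, 2, 24]
print(pairs)        # [(0, 2)]
```

The “sent” at position 24 is highlighted even though it is too far from “from” to be part of a matching pair.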

kenliles · June 10, 2015, 11:27pm

Got it - thanks Bill. It might be worth considering a Preferences setting that allows common punctuation characters to be included in predicate searches. Or another mechanism, say, where punctuation enclosed in a literal quoted search gets included instead of mapped to a space. Food for thought.

Got it, thanks - a suggestion for the box:
Highlight the actual matching proximity pair in a different color, so the successful search predicate pair can be picked out visually. This would make the process of developing successful search predicates a bit easier.

thanks for your help–