Script: Search results with context

pete31 · January 15, 2021, 8:25am

This script translates a DEVONthink query into regex, searches in selected records and creates a Markdown record with search hits plus context.

It makes it possible to view search hits of one or more selected records at once.

Usage

Do a toolbar search in a main window
Select some result records
Run script

Result

A Markdown record with search hits plus context
Click a search hit to open the record with the search hit highlighted.

Context length

Context length can be set in property maxContextCharacters.
Search hits are joined if their ranges plus context overlap.
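
The joining rule can be sketched roughly as follows. This is a minimal Python illustration, not the script's actual AppleScript code. It assumes that hits are half-open `(start, end)` character ranges and that the context is added on each side of a hit. The function name `merge_hits_with_context` is hypothetical.

```python
def merge_hits_with_context(hits, text_length, max_context_characters=100):
    """Expand each hit by the context length and join overlapping ranges.

    hits: list of half-open (start, end) character ranges (assumed format).
    Returns a sorted list of merged (start, end) ranges, clamped to the text.
    """
    # Expand every hit by the context on both sides, clamped to the text bounds.
    expanded = sorted(
        (max(0, start - max_context_characters),
         min(text_length, end + max_context_characters))
        for start, end in hits
    )

    merged = []
    for start, end in expanded:
        # Ranges that overlap (or touch) the previous one are joined into it.
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


hits = [(200, 205), (350, 355), (900, 905)]
print(merge_hits_with_context(hits, 1000, 100))
print(merge_hits_with_context(hits, 1000, 50))
```

With a larger context the first two hits end up in one excerpt. With a smaller context they stay separate.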

Result with maxContextCharacters set to 100:

Result with maxContextCharacters set to 150:

Link behaviour

If property revealLinkedRecord is set to true then clicking a record link reveals the record in the main window.

CSS:

If you use a CSS file and want to change the styling of existing script output records in the future you can add something like

searchResultHit {
  color: #1e1e1e;
  background-color: #ffa718;
}

a.recordLink {
  color: inherit;
  background-color: inherit;
  text-decoration: none; 
}

to your file and then set property theCSS to "".

Thanks @chrillek and @BLUEFROG!

Debugging

This script is quite experimental. In its current state it does not attempt to translate all possible DEVONthink queries.

If it fails to translate a query into regex, just post the first part of the resulting Markdown record’s source and I’ll see whether it’s possible to add support for this kind of query.

Note:

If clicking a search hit doesn’t highlight it, and therefore doesn’t open the result record on the right page, then the search hit plus context probably extends past the end of the page. In this case it’s not possible to highlight it. Please check whether that’s the case before you post a debugging request.

Caution

Regex-searching is an expensive operation. Don’t simply select all result records before running this script. If possible, select only those records you’re interested in.

Download

Search results with context (v 1.1).scpt.zip (89,1 KB)

chrillek · January 15, 2021, 11:53am

Hi Pete,

In your code, I see for example this RE
(?<=^|[^\p{L}|\p{N}|\$|€|£|¥|%|§])"(?:.*?)"(?=$|[^\p{L}|\p{N}|\$|€|£|¥|%|§]) (quoted words).

That’s a positive look behind (?<=) for
“either the beginning of a line or the negation of ‘a letter, or a digit, or one of four currency symbols, or %, or §’”.
Followed by any number of any symbols enclosed in double quotes, followed by a positive look ahead for either end of line or the same character class as before. More on this character class at the end, that’s TL;DR.

That expression translates into something like ‘symbols enclosed in quotes, but only if they are not part of something that might be a word’. Basically, you’d match

 "abc"
""

but not

x"abc"$

Is that the intention? And what is the reason for this implementation, instead of simply looking for anything enclosed in double quotes?
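
To make that reading concrete, here is a small Python sketch. It is an approximation, not the script's pattern: Python's `re` module supports neither `\p{L}` nor a look-behind with alternatives of different width, so this version swaps in negative look-arounds and uses `\w` as a stand-in for letters and digits (an assumption, `\w` also includes the underscore). The names `BOUNDARY_CLASS` and `find_quoted` are made up for the example.

```python
import re

# Assumption: \w approximates \p{L}\p{N} (it additionally matches "_").
BOUNDARY_CLASS = r"[\w$€£¥%§]"

# "Not preceded by / not followed by a word-like character" is equivalent to
# "preceded by line start or a non-word-like character" (and likewise at the end).
QUOTED = re.compile(rf'(?<!{BOUNDARY_CLASS})".*?"(?!{BOUNDARY_CLASS})')


def find_quoted(text):
    """Return the first quoted string that is not glued to a word, or None."""
    match = QUOTED.search(text)
    return match.group(0) if match else None


for sample in (' "abc"', '""', 'x"abc"$'):
    print(repr(sample), "->", find_quoted(sample))
```

The first two samples match, the third does not, because the opening quote is preceded by `x` and the closing quote is followed by `$`.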

TL;DR: Character class
It starts with a character class marker [ followed by a negation ^ (referring to all the next characters and character classes), the “all letters” class \p{L}, followed by a vertical bar |, the “all digits” class \p{N}, and another vertical bar.
I suppose that you want the vertical bar for alternation. But that’s not necessary inside a character class; the class itself already matches “any of”. You’d need alternation in cases like (cat|dog|cow). For single characters, one would use either alternation or a class, but not both. Which is, by the way, exactly what you do on the next line.

As I see it (and I might of course be wrong), your RE looks for a single character that is none of

a letter
a digit
a vertical bar
$ or € or £ or ¥ or % sign or § sign

The bar inside a character class does not stand for alternation, but just for itself. So you’d in fact also exclude a bar with your character class, which might or might not be intended. Similarly, a $ sign does not need to be escaped inside a character class (nor do most of the other special characters).
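
A quick way to check this is a reduced test case. The Python sketch below uses a simplified ASCII class (`a-z`, `0-9`, `$`) instead of the Unicode properties, because Python's `re` module does not support `\p{L}`. The pattern names and the helper `first_match` are only for illustration.

```python
import re

# Negated class written with pipes, as in the original expression.
with_pipes = re.compile(r"[^a-z|0-9|\$]")
# The same class without pipes and without escaping the dollar sign.
without_pipes = re.compile(r"[^a-z0-9$]")


def first_match(pattern, text):
    """Return the first character matched by the pattern, or None."""
    match = pattern.search(text)
    return match.group(0) if match else None


# The pipe is a literal member of the first class, so it is excluded there.
print(first_match(with_pipes, "a|b"))
print(first_match(without_pipes, "a|b"))
# An unescaped $ inside a class is a literal dollar sign.
print(first_match(without_pipes, "a$b"))
```

The first class finds nothing in `a|b` because the bar is one of its members, while the second class matches the bar.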

If your intention is to define a (huge!) negative character class, you might write it shorter (and clearer) as
[^\p{L}\p{N}\p{Sc}%§] where \p{Sc} stands for currency symbols, which are a lot more than Dollar, Euro, Pound and Yen (see “Liste der Unicode-Zeichen der Kategorie „Währungssymbol“”, the list of Unicode characters in the category “currency symbol”).

The claim that regex-searching is an expensive operation is not true in general. It depends a lot on the RE engine and on the RE itself. For example, you tend to use a lot of non-capturing groups in your expressions. In the example above, "(?:.*?)" might as well be written as ".*?", which requires 18 fewer steps, or as "[^"]*", which again shaves off some steps. BTW: These patterns also match empty strings like "", whereas "[^"]+" does not. The numbers are for a test case with about six (un)matches.
Nowadays, most RE engines are fast enough and well implemented (which explains why REs are available in every modern programming language).
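
The matching behaviour of these variants (not the step counts, which depend on the engine) can be compared with a short Python sketch. The sample text and the dictionary keys are made up for the example.

```python
import re

text = 'say "abc" and "x y" then ""'

patterns = {
    "lazy_in_group": r'"(?:.*?)"',  # the original form
    "lazy": r'".*?"',               # same matches, no group
    "negated_star": r'"[^"]*"',     # no backtracking into the quotes
    "negated_plus": r'"[^"]+"',     # requires at least one character
}

# findall returns the full match here because no pattern has a capturing group.
results = {name: re.findall(pattern, text) for name, pattern in patterns.items()}

for name, found in results.items():
    print(name, found)
```

The first three variants return the same three strings, including the empty `""`. The `+` variant skips the empty pair of quotes.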

pete31 · January 16, 2021, 11:15am

Thanks @chrillek!

No, I changed regexes several times and obviously just copy pasted at some point.

Yes, I read that at some point and removed them, but then put them back in because it looked messy without the pipes.

Not intended.

Didn’t know, didn’t try.

Yes, this is needed to match exactly what DEVONthink matches, using \p{Sc} won’t work.

Fixed everything and updated the script above. Thanks!

pete31 · January 21, 2021, 6:33am

@cgrunenberg is it possible to get the state of the toolbar search’s options “Ignore Diacritics” and “Fuzzy”? I already compared com.devon-technologies.think3.plist with the options enabled/disabled but couldn’t find anything usable.

cgrunenberg · January 21, 2021, 8:49am

That’s interesting, there should be a key in the preferences but there isn’t. The next release will add the key SearchComparison; it’s a bitmap. Bit 0 means no case (not relevant for toolbar search), bit 1 ignores diacritics and bit 2 is fuzzy.
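
Decoding such a bitmap could look like the following Python sketch. It only covers the bit layout described above and assumes the value is available as a plain integer. How the preference is read, and the constant and function names, are assumptions for illustration.

```python
# Bit positions as described above (the names are hypothetical).
NO_CASE = 1 << 0            # bit 0: no case
IGNORE_DIACRITICS = 1 << 1  # bit 1: ignore diacritics
FUZZY = 1 << 2              # bit 2: fuzzy


def decode_search_comparison(value):
    """Split the SearchComparison integer into its three option flags."""
    return {
        "no_case": bool(value & NO_CASE),
        "ignore_diacritics": bool(value & IGNORE_DIACRITICS),
        "fuzzy": bool(value & FUZZY),
    }


# Example: bits 1 and 2 set (binary 110).
print(decode_search_comparison(0b110))
```

A script could then pick its regex options depending on `ignore_diacritics` and `fuzzy`.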

pete31 · January 21, 2021, 8:52am

Ok, so it wasn’t me.

Awesome, thank you very much!

rkaplan · September 4, 2024, 5:29am

Hi @pete31

I know this is an old post - I put it aside back then and some recent work makes it more pertinent. And it still works great.

This may be one of the all-time most helpful scripts I can recall - much appreciated.