Script: Search results with context

This script translates a DEVONthink query into regex, searches in selected records and creates a Markdown record with search hits plus context.

It makes it possible to view search hits of one or more selected records at once.

Usage

  • Do a toolbar search in a main window

  • Select some result records

  • Run script

Result

  • A Markdown record with search hits plus context

  • Click a search hit to open the record with the search hit highlighted.


Context length

  • Context length can be set in property maxContextCharacters.

  • Search hits are joined if their ranges plus context overlap.
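
The joining rule above can be sketched in Python as a plain interval merge. This is only an illustration, not the script's actual code: the function name `merge_hits`, its parameters, and the assumption that the context length is applied on each side of a hit are all hypothetical.

```python
def merge_hits(hits, max_context, text_length):
    """Merge search hits whose context windows overlap.

    hits: list of (start, end) character offsets of search hits.
    max_context: characters of context on EACH side of a hit (assumption).
    text_length: length of the searched text, used to clamp windows.
    """
    windows = []
    for start, end in sorted(hits):
        # Widen the hit by the context length, clamped to the text bounds.
        lo = max(0, start - max_context)
        hi = min(text_length, end + max_context)
        if windows and lo <= windows[-1][1]:
            # Overlaps the previous window: extend it instead of adding a new one.
            windows[-1] = (windows[-1][0], max(windows[-1][1], hi))
        else:
            windows.append((lo, hi))
    return windows


merged = merge_hits([(100, 105), (150, 155), (1000, 1005)], 50, 2000)
print(merged)
```

With a context of 50 characters the first two hits end up in one window and the third stays separate, which matches the behaviour described for larger context lengths producing fewer, longer excerpts.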

Result with maxContextCharacters set to 100:

Result with maxContextCharacters set to 150:


Link behaviour

  • If property revealLinkedRecord is set to true then clicking a record link reveals the record in the main window.

CSS:

If you use a CSS file and want to change the styling of existing script output records in the future you can add something like

searchResultHit {
  color: #1e1e1e;
  background-color: #ffa718;
}

a.recordLink {
  color: inherit;
  background-color: inherit;
  text-decoration: none; 
}

to your file and then set property theCSS to "".

Thanks @chrillek and @BLUEFROG!


Debugging

This script is quite experimental. In its current state it does not attempt to translate all possible DEVONthink queries.

If it fails to translate a query into regex, just post the first part of the resulting Markdown record’s source and I’ll see whether it’s possible to add support for this kind of query.

Note:

If clicking a search hit doesn’t highlight it, and the result record therefore doesn’t open on the right page, the search hit plus context probably extends past the end of the page. In this case it’s not possible to highlight it. Please check whether that’s the case before you post a debugging request.


:warning: Caution

Regex-searching is an expensive operation. Don’t simply select all result records before running this script. If possible, select only those records you’re interested in.


Download

Search results with context (v 1.1).scpt.zip (89,1 KB)

Hi Pete,

In your code, I see for example this RE
(?<=^|[^\p{L}|\p{N}|\$|€|£|¥|%|§])"(?:.*?)"(?=$|[^\p{L}|\p{N}|\$|€|£|¥|%|§]) (quoted words).

That’s a positive lookbehind (?<=) for
“either the beginning of a line or the negation of ‘a letter, or a digit, or one of four currency symbols, or %, or §’”.
Followed by any number of any characters enclosed in double quotes, followed by a positive lookahead for either the end of a line or the same character class as before. More on this character class at the end, under TL;DR.

That expression translates into something like ‘symbols enclosed in quotes, but only if they are not part of something that might be a word’. Basically, you’d match

 "abc"
""

but not

x"abc"$

Is that the intention? And what is the reason for this implementation, instead of simply looking for anything enclosed in double quotes?
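
To make the three examples above easy to try, here is a rough Python sketch. It is an approximation, not the script's regex: Python's `re` module only accepts fixed-width lookbehinds and has no `\p{L}`/`\p{N}`, so the lookarounds are rewritten as negative ones (which should be equivalent here) and `\w` stands in for letters and digits. `WORDISH`, `QUOTED` and `quoted_hits` are made-up names.

```python
import re

# Stand-in for the class [\p{L}\p{N}$€£¥%§]; note that \w also covers "_" (assumption).
WORDISH = r"[\w$€£¥%§]"

# Negative lookarounds replace (?<=^|[^...]) and (?=$|[^...]) from the original.
QUOTED = re.compile(rf'(?<!{WORDISH})"(?:.*?)"(?!{WORDISH})')


def quoted_hits(text):
    """Return quoted strings that are not glued to a word-like character."""
    return QUOTED.findall(text)


for sample in [' "abc"', '""', 'x"abc"$']:
    print(repr(sample), quoted_hits(sample))
```

The first two samples produce a hit, the third does not, because the opening quote directly follows a letter.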

TL;DR: Character class
It starts with a character class marker [ followed by a negation ^ (referring to all the following characters and character classes), the “all letters” class \p{L}, followed by a vertical bar |, the “all digits” class \p{N}, and another vertical bar.
I suppose that you want the vertical bar for alternation. But that’s not necessary inside a character class, the class itself already matches “any of”. You’d need alternation in cases like (cat|dog|cow). For single characters, one would use either alternation or a class, but not both. Which is by the way exactly what you do on the next line :wink:

As I see it (and I might of course be wrong) your RE looks for a single character that is none of

  • a letter
  • a digit
  • a vertical bar
  • $ or € or £ or ¥ or % sign or § sign

The bar inside a character class does not stand for alternation, but just for itself. So you’d in fact also exclude a bar with your character class, which might or might not be intended. Similarly, a $ sign does not need to be escaped inside a character class (nor do most of the other special characters).
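
A small Python check illustrates the point about the bar. It is only a sketch: ASCII ranges stand in for `\p{L}` and `\p{N}`, which Python's `re` module does not support, and the variable names are made up for this example.

```python
import re

# Negated class written with "alternation" bars and an escaped $, as in the original.
with_pipes = re.compile(r"[^a-z|0-9|\$]")
# The same class without bars and without the escape.
without_pipes = re.compile(r"[^a-z0-9$]")

sample = "a|1$ -"

# With bars in the class, "|" is excluded along with letters, digits and "$".
print(with_pipes.findall(sample))     # [' ', '-']
# Without bars, "|" is an ordinary character that the negated class matches.
print(without_pipes.findall(sample))  # ['|', ' ', '-']
```

Both classes treat the unescaped and the escaped $ the same way, so the only behavioural difference is whether a literal bar is excluded.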

If your intention is to define a (huge!) negated character class, you might write it shorter (and clearer) as
[^\p{L}\p{N}\p{Sc}%§] where \p{Sc} stands for currency symbols, which covers a lot more than Dollar, Euro, Pound and Yen (https://www.compart.com/de/unicode/category/Sc)

That regex searching is an expensive operation is not true, generally :wink: It depends a lot on the RE engine and on the RE itself. For example, you tend to use a lot of non-capturing groups in your expressions. In the example above, "(?:.*?)" might as well be written as ".*?" which requires 18 fewer steps, or as "[^"]*" which again shaves off some steps. BTW: These patterns also match empty strings like "", whereas "[^"]+" does not. The numbers are for a test case with about six (un)matches.
Nowadays, most RE engines are fast enough and well implemented (which explains why REs are available in every modern programming language).
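
The quoted-string variants mentioned above can be compared with a short Python sketch. It only shows which strings each pattern matches; step counts depend on the engine and are not measured here. The sample text and variable names are invented for illustration.

```python
import re

sample = 'say "abc" then "x y" and ""'

# Non-capturing group around a lazy quantifier, as in the original pattern.
lazy_group = re.findall(r'"(?:.*?)"', sample)
# The same without the group.
lazy = re.findall(r'".*?"', sample)
# Negated class instead of a lazy quantifier; still allows empty quotes.
negated = re.findall(r'"[^"]*"', sample)
# Requires at least one character between the quotes.
non_empty = re.findall(r'"[^"]+"', sample)

print(lazy_group)  # ['"abc"', '"x y"', '""']
print(non_empty)   # ['"abc"', '"x y"']
```

On this sample the first three patterns return the same list, and only the `+` variant drops the empty pair of quotes.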

Thanks @chrillek!

No, I changed the regexes several times and obviously just copy-pasted at some point.

Yes, read that at some point, removed them but then put them back in as without the pipes it looked messy :smile:

Not intended.

Didn’t know, didn’t try.

Yes, this is needed to match exactly what DEVONthink matches, using \p{Sc} won’t work.

Fixed everything and updated the script above. Thanks!

@cgrunenberg is it possible to get the state of the toolbar search’s options “Ignore Diacritics” and “Fuzzy”? I already compared com.devon-technologies.think3.plist with the options enabled and disabled but couldn’t find anything usable.

That’s interesting, there should be a key in the preferences but there isn’t :slight_smile: The next release will add the key SearchComparison, it’s a bitmap. Bit 0 means no case (not relevant for toolbar search), bit 1 ignores diacritics and bit 2 is fuzzy.
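
Decoding such a bitmap could look like the Python sketch below. It assumes that bit 0 is the least significant bit and that the value has already been read from the preferences as an integer; the constant and function names are made up, and reading the key itself is not shown.

```python
# Bit positions as described above; bit 0 is assumed to be the least significant bit.
NO_CASE = 1 << 0            # not relevant for toolbar search
IGNORE_DIACRITICS = 1 << 1
FUZZY = 1 << 2


def decode_search_comparison(value):
    """Split a SearchComparison integer into its three flags."""
    return {
        "no_case": bool(value & NO_CASE),
        "ignore_diacritics": bool(value & IGNORE_DIACRITICS),
        "fuzzy": bool(value & FUZZY),
    }


# 6 == 0b110: diacritics ignored and fuzzy enabled, case flag not set.
print(decode_search_comparison(6))
```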

Ok, so it wasn’t me :slight_smile:

Awesome, thank you very much!