Automatically managing Concordance word exclusions

chrisgve · June 28, 2025, 4:19pm

I’ve started using DT4 to collect all my reference documents. One feature that seems to have great potential is Concordance; unfortunately, it grabs a lot of noise, like numbers, non-words (I have a lot of equations in my documents), roman numerals, and common English words. At the moment, in my migration, I have about 15k words in Concordance, and curating them is no longer realistic. Is there a way to automate the process?

BLUEFROG · June 28, 2025, 5:18pm

Welcome @chrisgve

No, there is no automation for excluding words from the Concordance.

cgrunenberg · June 28, 2025, 5:51pm

At least excluding numbers should be easy: just sort by name, then select all the numbers and exclude them via the contextual menu.

troejgaard · June 28, 2025, 10:40pm

Welcome @chrisgve

I actually wasn’t aware the concordance inspector has a broader scope than single documents. I never used it much, and so never bothered excluding words, but perhaps that might change. I also didn’t realize excluded words are both persistent and application-wide.

What use cases or workflows are you imagining? How many words are you excluding / would you want to exclude?

Out of curiosity, I tried excluding a few words and then ran defaults read com.devon-technologies.think3 (for DEVONthink 4, use com.devon-technologies.think; I look forward to upgrading, but I’m still on DT3 right now).
The excluded words are stored as an array in the key ExcludedWords (at least for DT3).

I don’t understand what you want to automate (or why), but in some cases it might be easier to “manage” the list using the defaults command. However, I’m by no means an expert here, so I wouldn’t do that without hearing what @cgrunenberg says. And remember, don’t change an application’s preferences with defaults while the application is running.

For example, defaults has an -array-add option. It doesn’t look like the array needs any sorting, so that seems like the most straightforward CLI method?

… Okay, since it seemed relatively harmless and I don’t care about the excluded words yet, I did some testing anyway: I quit DEVONthink, took a backup, then tried some defaults commands.

-array-add adds entries (of type string) to the array.

~ % defaults read com.devon-technologies.think3 ExcludedWords
(
    In,
    Of,
    The,
    A,
    Or,
    To,
    And,
    You
)
~ % defaults write com.devon-technologies.think3 ExcludedWords -array-add Me Them They I II III IV
~ % defaults read com.devon-technologies.think3 ExcludedWords
(
    In,
    Of,
    The,
    A,
    Or,
    To,
    And,
    You,
    Me,
    Them,
    They,
    I,
    II,
    III,
    IV
)

Launched DEVONthink and checked – works as expected.

Quit again, then tested -array. This option replaces the existing array completely.

~ % defaults write com.devon-technologies.think3 ExcludedWords -array One Two Three
~ % defaults read com.devon-technologies.think3 ExcludedWords 
(
    One,
    Two,
    Three
)

Launched DEVONthink and checked – works as expected.

So. One option could be to export the current array to a text file:

defaults read com.devon-technologies.think3 ExcludedWords > ~/Documents/DT_ExcludedWords.txt

Then clean up the result (drop the enclosing parentheses and the trailing commas), sort the lines alphabetically, and add any new entries you want. Then join the lines with a single space between entries (quote any entry that contains a space) and pass the result to the -array option to replace the existing array.
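Those clean-up, sort and join steps can also be scripted. What follows is a minimal Node.js sketch, not a tested tool: it assumes the old-style plist output shown above, where entries containing spaces are printed in double quotes, and it does not decode escape sequences inside quoted entries. All function names are hypothetical. Passing each word to execFileSync as its own argument avoids shell quoting altogether.

```javascript
// Sketch: clean up `defaults read <domain> ExcludedWords` output, merge in new
// words, and write the result back with `defaults write ... -array`.
// Assumes the old-style plist array format shown above; escape sequences
// inside quoted entries are NOT decoded. Function names are hypothetical.
const { execFileSync } = require("child_process");

// '(\n    In,\n    "foo bar",\n    You\n)' -> ["In", "foo bar", "You"]
function parseDefaultsArray(text) {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line !== "" && line !== "(" && line !== ")")
    .map((line) => line.replace(/,$/, ""))          // drop trailing comma
    .map((line) => line.replace(/^"(.*)"$/, "$1")); // drop surrounding quotes
}

// Union of both lists, duplicates removed, sorted alphabetically.
function mergeWords(existing, additions) {
  return [...new Set([...existing, ...additions])].sort((a, b) =>
    a.localeCompare(b)
  );
}

// Argument list for `defaults`; every word is a separate argument, so
// entries containing spaces need no shell quoting.
function buildWriteArgs(domain, words) {
  return ["write", domain, "ExcludedWords", "-array", ...words];
}

// Reads the current list, merges in newWords and writes it back.
// Quit DEVONthink (and take a backup) before calling this.
function updateExcludedWords(domain, newWords) {
  const current = execFileSync("defaults", ["read", domain, "ExcludedWords"], {
    encoding: "utf8",
  });
  const merged = mergeWords(parseDefaultsArray(current), newWords);
  execFileSync("defaults", buildWriteArgs(domain, merged));
  return merged;
}
```

For example, with DEVONthink quit: updateExcludedWords("com.devon-technologies.think3", ["Me", "Them", "IV"]). If the ExcludedWords key does not exist yet, defaults read exits with an error and execFileSync throws, so exclude at least one word in the GUI first.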

Still, I recommend waiting for developer input before anyone goes wild with this. (And do check whether anything has changed in DEVONthink 4.)

cgrunenberg · June 29, 2025, 7:45am

Modifying DEVONthink’s internal preferences is usually highly discouraged, but in this case it’s the only option, and it’s a fairly harmless preference.

troejgaard · June 29, 2025, 9:37am

Thank you.
I assumed as much and feel I should apologize for disregarding my own advice. I only dared to go ahead because it did seem relatively harmless and I made sure to take precautions.

chrisgve · June 29, 2025, 9:45am

Thanks for diving into this; it’s a hack, but it could be a savior. Now, if it were possible to extract the list of included words, some kind of automation would be possible, including making sure that DT4 is closed when the script runs.
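One way to make sure DEVONthink is closed before its preferences are touched is to check for its process first. A minimal Node.js sketch, assuming the process is named "DEVONthink" for DT4 (or "DEVONthink 3" for DT3), which is worth confirming in Activity Monitor; the function names are hypothetical.

```javascript
// Sketch: refuse to touch DEVONthink's preferences while it is running.
// Assumes the process name is "DEVONthink" (DT4) or "DEVONthink 3" (DT3).
// pgrep exits with status 0 when a matching process exists, and
// execFileSync throws on any non-zero exit status.
const { execFileSync } = require("child_process");

function isProcessRunning(name) {
  try {
    execFileSync("pgrep", ["-x", name], { stdio: "ignore" });
    return true;
  } catch {
    return false; // no match (or pgrep itself could not run)
  }
}

// Call this before any `defaults write` on DEVONthink's domain.
function assertQuit(processName) {
  if (isProcessRunning(processName)) {
    throw new Error(
      `${processName} is running; quit it before changing its preferences`
    );
  }
}
```

Inside a JXA script, Application("DEVONthink").running() answers the same question without spawning pgrep.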

chrillek · June 29, 2025, 9:57am

Proof of concept in JavaScript:

const app = Application("DEVONthink") // use "DEVONthink 3" for DT3
const dbName = "Whatever"; // change to your database
const db = app.databases[dbName].root();

const concordance = app.getConcordanceOf({record: db});
console.log(concordance);

Loop this over all your databases and join the results into one gigantic Set of strings. That should be what you’re looking for.
Using a Set takes care of duplicates.
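A sketch of that loop in JXA, assuming, as the post implies, that getConcordanceOf returns an array of words, and that app.databases() lists the open databases. The function name is hypothetical; the typeof check keeps the DEVONthink part from running anywhere but under osascript.

```javascript
// JXA sketch: run with `osascript -l JavaScript all_words.js`.
// Assumes getConcordanceOf returns an array of strings and that
// app.databases() returns the open databases.
function collectConcordance(app) {
  const words = new Set(); // a Set drops duplicates across databases
  for (const db of app.databases()) {
    for (const word of app.getConcordanceOf({ record: db.root() })) {
      words.add(word);
    }
  }
  return [...words].sort();
}

// Only runs under osascript, where the JXA global Application exists.
if (typeof Application !== "undefined") {
  const app = Application("DEVONthink"); // "DEVONthink 3" for DT3
  console.log(collectConcordance(app).join("\n"));
}
```

Under osascript, console.log writes to standard error, so something like osascript -l JavaScript all_words.js 2> words.txt captures one word per line.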

troejgaard · June 29, 2025, 10:16am

What do you mean it’s a hack? defaults has shipped with macOS for a long time.

You still didn’t really explain what you want and why or give any context. I can’t even tell if tight curation of Concordance is a good approach or if you’re trying to fit a square peg into a round hole.

chrisgve · June 29, 2025, 11:05am

I am sorry, I did not mean to offend you. You are absolutely right that defaults is a long-standing tool in macOS. What I meant is that the solution uses an undocumented (from a DEVONthink point of view) way of modifying the settings. I’ll try to work out a combination of your solution and @chrillek’s proof of concept. Thank you, and I apologize for being clumsy.

chrisgve · June 29, 2025, 11:28am

JavaScript scripting does not work, but AppleScript does, so I’ll explore using AppleScript for the time being.

chrillek · June 29, 2025, 11:48am

I tried the script and it did work. So more useful information than “does not work” would help to clarify:

How do you run the script?
Do you see an error message? If so, what does it say?
If not, what else makes you conclude that “it does not work”?

chrisgve · June 29, 2025, 11:56am

I ran it in Script Editor with the language set to JavaScript, after adjusting the database name. After running for a few seconds, I get “Script Error”, and the console result shows “Error -1741: An error occurred.”

chrisgve · June 29, 2025, 11:58am

I added a semicolon at the end of line 1; the script ran longer, but with the same result.

chrillek · June 29, 2025, 12:06pm

The semicolon is irrelevant. -1741 means “buffer for AEFlattenDesc too small”, and I have no idea what that’s supposed to mean. In any case, the code is ok.
How many words are in your concordance? On which line of the script does the error occur? And what do you see when you save the code to a file (say “script.js”) and execute it with
osascript -l JavaScript script.js
in the Terminal?

chrisgve · June 29, 2025, 4:33pm

Thanks for your guidance. I’ve been able to execute it from the Terminal, and I am getting “test_script.js: execution error: Error: Error: An error occurred. (-1741)” again, with no further information. I guess I’ll have to work with smaller parts of the database and loop over them to build up the list; then I can build a filter and use @troejgaard’s recipe to update the exclude list. It looks like I have some JavaScript to learn in my future.
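A filter along those lines might look like this minimal sketch. The categories mirror the noise described in the first post (numbers, non-words from equations, roman numerals, common English words); the stop-word list is a tiny hypothetical sample to be replaced with a real one, and the function names are hypothetical too.

```javascript
// Sketch of a noise filter for concordance words.
// STOP_WORDS is a tiny hypothetical sample, not a recommended list.
const STOP_WORDS = new Set(["a", "and", "in", "of", "or", "the", "to", "you"]);

// Uppercase roman numerals only; a case-insensitive version would also
// match ordinary words such as "mix".
const ROMAN = /^M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$/;

function shouldExclude(word) {
  if (word.length < 2) return true;          // single letters, e.g. equation variables
  if (/\d/.test(word)) return true;          // numbers and tokens like "x2"
  if (!/^\p{L}+$/u.test(word)) return true;  // anything that is not purely letters
  if (ROMAN.test(word)) return true;         // II, XIV, ...
  return STOP_WORDS.has(word.toLowerCase()); // common English words
}

// Returns the words that look like noise, in their original order.
function exclusionCandidates(words) {
  return words.filter(shouldExclude);
}
```

Uppercase acronyms that happen to be valid numerals, such as CD or MD, are flagged as well, so review the candidates before feeding them to defaults write … -array-add.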

chrillek · June 29, 2025, 5:06pm

I tried the code with databases of between 15,000 and 62,000 words, with no issues. So you’re probably right that it’s a problem of size. You could try looping over all groups, passing each one as the record parameter to getConcordanceOf.
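A JXA sketch of that per-group variant, assuming getConcordanceOf accepts any record (group or document) and returns an array of words. It walks the top-level items of each open database, so each call covers less text, and it records the items that still fail instead of aborting. The function name is hypothetical.

```javascript
// JXA sketch: one getConcordanceOf call per top-level item instead of one
// per database. Assumes getConcordanceOf accepts any record and returns
// an array of strings. Items that still fail are reported and skipped.
function collectByTopLevelItems(app) {
  const words = new Set();
  const failed = [];
  for (const db of app.databases()) {
    for (const item of db.root().children()) {
      try {
        for (const word of app.getConcordanceOf({ record: item })) {
          words.add(word);
        }
      } catch (e) {
        failed.push(`${db.name()}/${item.name()}: ${e.message}`);
      }
    }
  }
  return { words: [...words].sort(), failed };
}

// Only runs under osascript, where the JXA global Application exists.
if (typeof Application !== "undefined") {
  const result = collectByTopLevelItems(Application("DEVONthink")); // "DEVONthink 3" for DT3
  result.failed.forEach((f) => console.log("skipped " + f));
  console.log(result.words.join("\n"));
}
```

An item that still fails could be split further by applying the same loop to its own children.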

chrisgve · June 29, 2025, 5:31pm

Interesting; my database is on the order of 15,000 words, so it is odd that it would break. Anyway, I’ll figure it out at some point; it’s not too urgent right now.

BLUEFROG · June 29, 2025, 5:54pm

The Concordance is not limited to the database. It is relative to the selection or location. So why not just curate them when dealing with individual documents?

chrisgve · June 29, 2025, 6:06pm

Because I am migrating many (as in hundreds of) documents into DT4, curating document by document is a bit of a stretch. But maybe Concordance is not a feature that will be useful to me at all, and I should not worry about this.