I’ve started using DT4 to collect all my reference documents. One feature that seems to have great potential is Concordance; unfortunately, it grabs a lot of noise, like numbers, non-words (I have a lot of equations in my documents), roman numerals, and common English words. At the moment, in my migration, I have about 15k words in Concordance, and curating them is no longer realistic. Is there a way to automate the process?
Welcome @chrisgve
No, there is no automation for excluding words from the Concordance.
At least excluding numbers should be easy: just sort by name, then select all numbers and exclude them via the contextual menu.
Welcome @chrisgve
I actually wasn’t aware the concordance inspector has a broader scope than single documents. I never used it much—and so, never bothered excluding words—but perhaps that might change. I also didn’t realize excluded words are both persistent and application-wide.
What use cases or workflows are you imagining? How many words are you excluding / would you want to exclude?
Out of curiosity, I tried excluding a few words and then ran defaults read com.devon-technologies.think3 (for DEVONthink 4, use com.devon-technologies.think; I look forward to upgrading, but I’m still on DT3 right now). The excluded words are stored as an array in the key ExcludedWords (at least for DT3).
I don’t understand what you want to automate (or why), but in some cases it might be easier to “manage” the list using the defaults command. However, I’m by no means an expert here, so I wouldn’t do that without hearing what @cgrunenberg says. And remember, don’t use defaults while an application is running.
For example, defaults has an -array-add option. It doesn’t look like the array needs any sorting, so that seems like the most straightforward CLI method?
… Okay, since it seemed relatively harmless and I don’t care about the excluded words yet, I did some testing anyway. I quit DEVONthink, took a backup, then tried some defaults commands.
-array-add adds entries (of type string) to the array.
~ % defaults read com.devon-technologies.think3 ExcludedWords
(
In,
Of,
The,
A,
Or,
To,
And,
You
)
~ % defaults write com.devon-technologies.think3 ExcludedWords -array-add Me Them They I II III IV
~ % defaults read com.devon-technologies.think3 ExcludedWords
(
In,
Of,
The,
A,
Or,
To,
And,
You,
Me,
Them,
They,
I,
II,
III,
IV
)
Launched DEVONthink and checked – works like expected.
Quit again, then tested -array. This option replaces the existing array completely.
~ % defaults write com.devon-technologies.think3 ExcludedWords -array One Two Three
~ % defaults read com.devon-technologies.think3 ExcludedWords
(
One,
Two,
Three
)
Launched DEVONthink and checked – works like expected.
So. One option could be to export the current array to a text file:
defaults read com.devon-technologies.think3 ExcludedWords >> ~/Documents/DT_ExcludedWords.txt
Then clean up the result, sort the lines alphabetically and add any new entries you want. Then join the lines with a single space between strings. Use the end result with the -array option to replace the existing array.
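The export, clean-up and rebuild steps could be scripted roughly as follows. This is a minimal sketch, not a tested tool: the function names (parseDefaultsArray, shellQuote, buildWriteCommand) are made up for illustration, and the parser assumes the simple one-entry-per-line output shown earlier in this thread, so it does not handle escaped characters inside quoted entries. It only builds the command string; running it is still up to you, after quitting DEVONthink and taking a backup.

```javascript
// Hypothetical helpers; the names are illustrative, not part of any DEVONthink API.

// Parse the text printed by `defaults read <domain> ExcludedWords`.
// Assumes one entry per line, optionally double-quoted, with a trailing comma.
function parseDefaultsArray(text) {
  return text
    .split("\n")
    .map((line) => line.trim().replace(/,$/, ""))
    .filter((line) => line !== "" && line !== "(" && line !== ")")
    .map((entry) => entry.replace(/^"(.*)"$/, "$1"));
}

// Single-quote a word for the shell so spaces or symbols cannot break the command.
function shellQuote(word) {
  return "'" + word.replace(/'/g, "'\\''") + "'";
}

// Merge old and new words, drop duplicates, sort, and build the -array command.
function buildWriteCommand(domain, existing, additions) {
  // Plain sort() orders by code unit; the thread suggests order does not matter.
  const merged = [...new Set([...existing, ...additions])].sort();
  return `defaults write ${domain} ExcludedWords -array ${merged
    .map(shellQuote)
    .join(" ")}`;
}
```

Feeding the exported text through parseDefaultsArray and passing the result, together with the new words, to buildWriteCommand gives a single command line to paste into the Terminal.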
Still, I recommend waiting for developer input before anyone goes wild with this. (And do check if anything has changed with DEVONthink 4)
Usually modifying internal preferences of DEVONthink is highly discouraged but in this case it’s the only option and a quite harmless preference.
Thank you.
I assumed as much and feel I should apologize for disregarding my own advice. I only dared to go ahead because it did seem relatively harmless and I made sure to take precaution.
Thanks for diving into this. It’s a hack, but it could be a savior. Now, if it were possible to extract the list of included words, some kind of automation would be possible, including making sure that DT4 is closed when the script runs.
Proof of concept in JavaScript:
const app = Application("DEVONthink") // use "DEVONthink 3" for DT3
const dbName = "Whatever"; // change to your database
const db = app.databases[dbName].root();
const concordance = app.getConcordanceOf({record: db});
console.log(concordance);
Let this loop over all your databases and join the results into one gigantic Set of strings. That should be what you’re looking for. Using a Set takes care of duplicates.
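The loop described above might look like the sketch below. It takes the application object as a parameter so the same function can be called from Script Editor or osascript. collectConcordance is a made-up name, and the only scripting calls it relies on are the ones from the proof of concept plus app.databases() to list the open databases, which is assumed to behave the same in DT3 and DT4.

```javascript
// Hypothetical helper: gather the concordance of every open database into one Set.
// `app` is expected to be Application("DEVONthink") (or "DEVONthink 3" for DT3).
function collectConcordance(app) {
  const words = new Set();
  for (const db of app.databases()) {
    // Same call as in the proof of concept, applied to each database's root group.
    const concordance = app.getConcordanceOf({ record: db.root() }) || [];
    for (const word of concordance) {
      words.add(word); // a Set silently ignores duplicates
    }
  }
  return words;
}

// In Script Editor (language set to JavaScript) or via osascript:
// const all = collectConcordance(Application("DEVONthink"));
// console.log(all.size);
```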
What do you mean it’s a hack? defaults has shipped with macOS for a long time.
You still didn’t really explain what you want and why or give any context. I can’t even tell if tight curation of Concordance is a good approach or if you’re trying to fit a square peg into a round hole.
I am sorry, I did not mean to offend you. You are absolutely right that defaults is a long-standing tool in macOS. What I meant is that the solution modifies the settings in a way that is undocumented (from a DEVONthink point of view). I’ll try to combine your solution with @chrillek’s proof of concept. Thank you, and I apologize for being clumsy.
JavaScript scripting does not work, but AppleScript does, so I’ll explore using AppleScript for the time being.
I tried the script and it did work. So, perhaps more useful information than “does not work” might help to clarify.
- how do you run the script,
- do you see an error message (if so, what does it say)
- if not, what else makes you conclude that “it does not work”?
I’ve used Script Editor with the language set to JavaScript and adjusted the database name. After running for a few seconds I get “Script Error”, and the console result gives me “Error -1741: An error occurred.”
I’ve added a semicolon at the end of line 1; the script ran for longer, but with the same result.
The semicolon is irrelevant. -1741 means “buffer for AEFlattenDesc too small”, and I have no idea what that’s supposed to mean. In any case, the code is ok.
How many words do you have in your concordance, on what line in the script does the error occur and what do you see when you save the code to a file (say “script.js”) and execute that with
osascript -l JavaScript script.js
in the Terminal?
Thanks for your guidance. I’ve been able to execute it from the terminal, and I am getting “test_script.js: execution error: Error: Error: An error occurred. (-1741)” again, but no more information. I guess I’ll have to work with smaller elements in the database and eventually loop over them to build up the list; then I can build a filter and use @troejgaard’s recipe to update the Exclude list. It looks like I have some JavaScript to learn in my future.
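A filter for the kinds of noise mentioned at the top of this thread (numbers, non-words, Roman numerals, common English words) could start out like the sketch below. Everything in it is an assumption for illustration: the names isNoise and wordsToExclude, the rules themselves, and the tiny stop-word sample, which would need replacing with a real list.

```javascript
// Hypothetical filter: decide which concordance words are noise worth excluding.
// The stop-word list is a tiny illustrative sample, not a complete one.
const STOP_WORDS = new Set(["a", "and", "in", "of", "or", "the", "to", "you"]);

// Matches Roman numerals from I to MMMCMXCIX (case-insensitive).
const ROMAN = /^(?=[MDCLXVI])M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$/i;

function isNoise(word) {
  // Digits with optional separators, such as 42 or 3.14.
  if (/^[\d.,]+$/.test(word)) return true;
  // Anything that is not letters (optionally joined by ' or -) counts as a non-word.
  if (!/^\p{L}+(['’-]\p{L}+)*$/u.test(word)) return true;
  // Roman numerals; note this also catches real words such as "mix" or "I".
  if (ROMAN.test(word)) return true;
  return STOP_WORDS.has(word.toLowerCase());
}

// Return the unique words that the rules above flag as noise.
function wordsToExclude(words) {
  return [...new Set(words)].filter(isNoise);
}
```

The Roman numeral rule is deliberately blunt, so reviewing the flagged words before writing them to the exclusion list seems wise.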
I tried the code with different databases between 62000 and 15000 words – no issues. So, you’re probably right that it’s a problem of size. You could try to loop over all groups, passing each of them as the record parameter to getConcordanceOf.
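Working through smaller units could be sketched like this. The function name is made up, and two things are assumed rather than confirmed in this thread: that children() on the root group returns the top-level records, and that getConcordanceOf accepts any record rather than only a database root. A failing call is recorded instead of aborting the whole run, which should also narrow down where the -1741 error comes from.

```javascript
// Hypothetical helper: collect the concordance one top-level record at a time.
// `app` is Application("DEVONthink"); `dbName` is the name of an open database.
function collectByTopLevelRecord(app, dbName) {
  const words = new Set();
  const failed = [];
  const children = app.databases[dbName].root().children();
  for (const child of children) {
    try {
      const concordance = app.getConcordanceOf({ record: child }) || [];
      concordance.forEach((word) => words.add(word));
    } catch (err) {
      // Remember which record failed (e.g. with -1741) and keep going.
      failed.push({ name: child.name(), message: String(err.message || err) });
    }
  }
  return { words, failed };
}
```

If one top-level group still fails, the same idea can be applied one level further down to that group's children.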
Interesting, my database is of the order of 15000 words, so it is odd that it would break. Anyway, I’ll figure it out at some point; it’s not too urgent right now.
The Concordance is not limited to the database. It is relative to the selection or location. So why not just curate them when dealing with individual documents?
Because I am migrating many (as in hundreds) of documents into DT4, so curation by document is a bit of a stretch. But maybe Concordance is not a feature that will be useful to me, or at all, and I should not worry about this.