Some questions on concordance and one question on custom metadata

ngan · August 22, 2019, 5:09pm

Just beginning to learn more about concordance and have a few stupid questions:

(1) Some words in the concordance are greyed out. I can see only one rule: the word exists in only one group. Are there other meanings?

(2) I can still see the concordance list when I click on a group or a tag. Is that the concordance of the items within the group? If yes, is it only for immediate children or for all levels of children within the group?

(3) According to the DT3 dictionary, “get concordance of record …” can now be applied to a record, a selection, and a group?

(4) The suggestions from “See Also and Classify” are generally quite sensible (perhaps they would be even better if I excluded more non-applicable groups). Just curious: has DT ever considered providing “See Also and Classify” for tags, since tags and groups are pretty much the same type of record? Is there a technical reason (e.g., weight can only be based on one kind of parent; documents overlap a lot more in tags, so the weights are less meaningful), or would it just be too confusing to offer classification for both groups and tags?

(5) Can custom metadata be applied to groups and tags?

Thanks again

BLUEFROG · August 22, 2019, 8:40pm

(2) The Concordance changes based on the selection. And yes, this includes all children of a group. This is easily testable by selecting a group and noting the number of words, then selecting a subgroup. There is no change.
(3) I wouldn’t say it’s applied to a record, selection, or group. The command reports on a record, whether that’s a selection or a record defined by a get record command.
(4) Development would have to weigh in on this, but Tags aren’t merely groups. They are special in that they contain the replicants for files, not files themselves.
(5) Yes, but they do not propagate to the children.

ngan · August 22, 2019, 8:48pm

Thank you very much for the explanation!

BLUEFROG · August 22, 2019, 8:55pm

You’re welcome. @cgrunenberg will have to weigh in on the grayed out items. It wasn’t previously documented so I’m not sure on it and don’t want to speak out of turn.

PS: Be cautious with the concordance command. Targeting a database as the record can easily cause a stall.

Interestingly, this code crashes Script Editor…

tell application id "DNtp"
	get concordance of record (current group)
end tell

Do you see it as well?

cgrunenberg · August 23, 2019, 4:18am

The words have no occurrences outside the document or selection.

cgrunenberg · August 23, 2019, 4:19am

No. What was the current group? Maybe too much data returned by DEVONthink crashed the editor?

ngan · August 23, 2019, 4:20am

I am using Script Debugger. It seems the stall depends on the unique word count. Anything under 10,000 seems fine and fast (ranging from 0.01 to 0.04 sec). The editor stalls for counts over 30,000, but I don’t have any group between 10,000 and 30,000.
I am just focusing on the implications of the concordance of a single document. It’s fast enough to let me think about using a script to do some sort of auto-tagging (just a pet project…). A really rough idea is to store two word lists in the comment or custom metadata field of each tag in a group of 100-200 tags: one list for OR and another for AND. The default orList is just the name and aliases of the tag. Auto-tagging can be achieved by using “contains” to get a match between the concordance and the two lists.
Looping through the tag list and using

 if (conList contains orList) and (conList contains andList ) then ...

should do. So, (1) my constrained tag script can pop up and show the recommended tags in the first section, with the entire constrained tag tree shown below for additional choices; (2) the matching condition can be changed by editing the word list in the tag’s comment/field directly.
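
A minimal sketch of that matching in plain AppleScript, under a few assumptions. One caveat it works around: when the right-hand side of “contains” is a list, AppleScript tests for a contiguous sublist in the same order, so comparing the concordance against a whole word list would not give "any of" or "all of" behaviour. The sketch therefore checks word by word. The handler names (containsAny, containsAll, tagMatches) and the sample word lists are hypothetical; in a real script conList would presumably come from get concordance of record, and the two lists from the tag's comment or custom metadata.

```applescript
-- Hypothetical helpers; all word lists are plain AppleScript lists of text.

-- True if at least one word of wordList occurs in conList.
on containsAny(conList, wordList)
	repeat with w in wordList
		if conList contains (contents of w) then return true
	end repeat
	return false
end containsAny

-- True if every word of wordList occurs in conList (an empty list matches).
on containsAll(conList, wordList)
	repeat with w in wordList
		if conList does not contain (contents of w) then return false
	end repeat
	return true
end containsAll

-- A tag is suggested when any OR word and all AND words are present.
on tagMatches(conList, orList, andList)
	return containsAny(conList, orList) and containsAll(conList, andList)
end tagMatches

-- Stand-in for a document's concordance (assumed sample data).
set conList to {"invoice", "tax", "2019", "receipt"}
tagMatches(conList, {"tax", "vat"}, {"invoice", "receipt"}) -- true
```

Text comparisons in AppleScript ignore case by default, so "Tax" and "tax" would match here unless the check is wrapped in a considering case block.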

I think auto-tagging is more meaningful for a single item or a handful of items (e.g., classifying the inbox items), each containing a reasonably small number of words. Any sort of auto-tagging on a large number of documents or on long documents risks assigning too many unnecessary tags, and there is no good way to check/audit the quality of assignment on several hundred or thousand auto-tagged items after the fact.

ngan · August 23, 2019, 4:59am

Thanks for the info.

BLUEFROG · August 23, 2019, 12:03pm

It was actually a fairly small group.
It doesn’t seem to be consistent, but here’s a report, in case it’s of any use.

Script Editor_2019-08-22-165948_IcarusX.crash.zip (24.3 KB)

cgrunenberg · August 23, 2019, 1:52pm

Only if Apple should read this

DCBerk · October 21, 2019, 1:13am

I looked at the Concordance for a single RTF file (not in a group). The words with “no occurrences outside the document” are greyed out.

With all due respect:

First, the greyed out words are so faint they are difficult to read.

Second, I don’t care if a word doesn’t appear anywhere else. And if the words that are not greyed do occur somewhere else, where might that be? Again, this is a single file.

Third, when a file is in a group, it’s a nuisance to have to scroll through over 8000, or 20,000, or whatever number of words trying to find the ones relevant to the one document I want the information on. If I wanted that info, it would be far better to just select several files.

You might want to look at how the concordance works in Scrivener: more options, more useful, not least that it lets you eliminate certain words like “that”, “an”, “or”, etc. Not many people would find those words useful information, and including them makes the resulting list ridiculously long. Most important, it is an option, so anybody who wants to know how many times they used “and” can find out…

Always nice to have options,
Thanks, June

cgrunenberg · October 21, 2019, 8:52am

Words can be excluded via the contextual menu but by default no words are excluded. Which options do you miss?

DCBerk · October 21, 2019, 10:38pm

Thanks for the response; issues with the Concordance, plus a suggestion re Sidebar/Inspector, follow:

Graph of word frequency shows graph line; no other information. Not explained in manual. What am I missing here?
Cannot adjust width of columns in Concordance except Words - all remain same width even if Inspector is made wider; column titles truncated.
Don’t understand utility of knowing how often a word appears elsewhere in db - irrelevant if it appears in Goethe and Beckett, for instance; if I cared, I could search. Can see word frequency in entire group by selecting all sub-docs.
Concordance for single file has words greyed out even with option “hide excluded”. Showing excluded words crossed out means list is just as long as before.
Scrivener Concordance allows user to create editable list of excluded words that applies to entire “Project”, i.e., DT database (like a user-created tag list). If “the” is excluded, there is no need to reset it for each search. (Can send screenshots if you don’t have the app.)
I do like that I can select a word in the Concordance and turn it into a Tag, and appreciate that selected word is highlighted in document.
Related Word map would be far more useful if connected to Mac dictionary as in Scrivener (or even better, with something like Visual Thesaurus: https://tinyurl.com/hn4ctyh). The words that appear in DT map do not seem related at all. Selected “Altogether” and got: said, talking, disagreeable, gentlemen, think, arrangement, suppose, contrive. Huh?
And may I suggest floating palettes for Inspector and Side Bar would take up less space; could leave open, toggle to hide behind main View. For those with two monitors, palettes could be “torn off” and put to one side. (I’m on a MacAir; work in 3-pane View/List, with both palettes turned off; result is 2-panes; lots more eye-friendly white space.)

cgrunenberg · October 22, 2019, 7:58am

Thank you for the feedback!

You can hide columns you don’t need via the contextual menu of the table header.

Such words appear only in the current document/selection.

The relationships are neither predefined nor based on online dictionaries, they depend completely on the contents of your database: