Getting organized: using concordance to group by keywords

ryanjamurphy · February 15, 2018, 3:25pm

Hey team!

I’m playing around with some ways to get my reference database a little more organized. I have some 4000 articles, books, bookmarks, and so on saved egregiously in three folders. Useless for browsing, I know. DEVONthink already helps me search through these files, but I’m thinking it’d be nice if they were better organized so that I can use auto-classify more aggressively as new things come in.

It seems like there could be a neat way of doing this relatively automatically: grab the top five Concordance words of 5-12 characters in length and create replicants of each file in a group titled by each of those keywords. (I’m not looking for sorting perfection, here; just an improvement over 2587 files in a single group.)

Before I strike out to write a script or something to do this, I wanted to check with the community: has this been done before? Am I approaching it in the wrong way? Can you think of something better?

Any thoughts are appreciated! Thanks!
Ryan

cgrunenberg · February 15, 2018, 3:34pm

Instead of creating replicants you could also tag the documents like this:


tell application id "DNtp"
	set theSelection to the selection
	repeat with theRecord in theSelection
		-- Words of the document, highest weight first
		set theWords to get concordance of record theRecord sorted by weight
		
		set theTags to {}
		set n to count of theWords
		if n > 5 then set n to 5 -- Max. 5 words
		
		-- Only the top n words are checked, so fewer than 5 tags may result
		repeat with i from 1 to n
			set theWord to item i of theWords
			set theLen to length of theWord
			if theLen ≥ 5 and theLen ≤ 12 then set theTags to theTags & theWord
		end repeat
		
		-- Append the new tags to the record's existing tags
		if theTags is not {} then
			set theTags to (tags of theRecord) & theTags
			set tags of theRecord to theTags
		end if
	end repeat
end tell

ryanjamurphy · February 16, 2018, 12:27am

Whoa, more than I could have asked for. Thanks for that – saves me hours of foolish trial and error!

My thinking around replicants was that it would allow me to have groups for Auto-Classify. I suppose I could create smart groups to do that instead – or, I suppose that tagging would really make it unnecessary. I’d just have to run this script on new files and, presto, they’d be “sorted” too even if they’re all in the same group.

Is there a best practice for this kind of thing?

cgrunenberg · February 16, 2018, 10:57am

Some users prefer tags, others replicants/groups/classifying, and others smart groups. DEVONthink supports all of these workflows. In the end it depends on personal preference and on the files themselves.

MicaOlaAdams · February 20, 2018, 2:00pm

Many thanks for this script. I already tweaked it a bit for my workflow. One question though: Is it possible for the script to omit keywords containing numbers?

cgrunenberg · February 20, 2018, 2:19pm

Here’s a small snippet checking whether a word contains digits:


set containsDigit to false
set len to length of theWord
repeat with i from 1 to len
	try
		set val to (character i of theWord as integer) -- Throws an exception if it's not a digit
		set containsDigit to true
	end try
end repeat
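
A variation on the snippet above, offered only as a sketch: the same check wrapped in a handler so it can be called once per word. The handler name `containsDigit` is hypothetical (it is not part of DEVONthink), and it tests membership in the string "0123456789" rather than relying on a failed integer coercion, so it only recognizes the ASCII digits 0-9.

```applescript
-- Hypothetical helper: returns true if theWord contains an ASCII digit (0-9)
on containsDigit(theWord)
	repeat with i from 1 to (length of theWord)
		-- "is in" on two strings is a substring test
		if (character i of theWord) is in "0123456789" then return true
	end repeat
	return false
end containsDigit

containsDigit("200508") -- true
containsDigit("concordance") -- false
```

When called from inside a `tell application` block, the call needs the `my` prefix (`my containsDigit(theWord)`) so that AppleScript looks for the handler in the script rather than in the application.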

MicaOlaAdams · February 20, 2018, 2:54pm

Many thanks!

My AppleScript skills improve with DevonT and this forum

Friar · May 8, 2020, 2:20am

I would like to try eliminating the numbers from the words that get tagged, but I couldn’t figure out where to place this script snippet in your prior script in this post. (I’m trying it as a smart rule.) Thanks.

BLUEFROG · May 8, 2020, 3:06am

I would like to try eliminating the numbers from those words that are tagged

For example?

Friar · May 8, 2020, 1:45pm

The script listed under this quote works well for finding the words with high weights, but some of these are numbers, such as today’s date: 200508. That is helpful for other reasons but not as a tag.

The small snippet seems to eliminate the numbers, but I could use some guidance on where to insert it into the previous script. Thanks.
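
One possible placement, as an untested sketch rather than an answer from the thread: the digit check runs inside the inner loop, right after `theWord` is set, and its result becomes an extra condition on the line that collects the tags. The snippet's loop variable is renamed from `i` to `j` here (a change from the original) so that it does not reuse the outer loop's `i`.

```applescript
tell application id "DNtp"
	set theSelection to the selection
	repeat with theRecord in theSelection
		set theWords to get concordance of record theRecord sorted by weight
		
		set theTags to {}
		set n to count of theWords
		if n > 5 then set n to 5 -- Max. 5 words
		
		repeat with i from 1 to n
			set theWord to item i of theWords
			set theLen to length of theWord
			
			-- Digit check from the earlier snippet, loop variable renamed to j
			set containsDigit to false
			repeat with j from 1 to theLen
				try
					set val to (character j of theWord as integer) -- Throws an exception if it's not a digit
					set containsDigit to true
				end try
			end repeat
			
			-- Keep the word only if the length fits and no digit was found
			if theLen ≥ 5 and theLen ≤ 12 and not containsDigit then set theTags to theTags & theWord
		end repeat
		
		if theTags is not {} then
			set theTags to (tags of theRecord) & theTags
			set tags of theRecord to theTags
		end if
	end repeat
end tell
```

As in the original script, only the top five words are examined, so a document whose heaviest words are mostly numbers may end up with very few tags. For use in a smart rule, the outer loop would presumably iterate over the records handed to the rule's script handler instead of `the selection`; that part is an assumption and is not shown here.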