I’m playing around with some ways to get my reference database a little more organized. I have some 4000 articles, books, bookmarks, and so on saved egregiously in three folders. Useless for browsing, I know. DEVONthink already helps me search through these files, but I’m thinking it’d be nice if they were better organized so that I can use auto-classify more aggressively as new things come in.
It seems like there could be a neat way of doing this relatively automatically: grab the top five Concordance words of 5-12 characters in length and create replicants of each file in a group titled by each of those keywords. (I’m not looking for sorting perfection here, just an improvement over 2587 files in a single group.)
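The keyword-selection step described above can be sketched as a plain AppleScript handler. This is a minimal sketch under stated assumptions: `topKeywords` is a hypothetical helper name, the word list is assumed to be sorted by weight with the heaviest word first (as the script later in this thread expects from DEVONthink's `get concordance of record theRecord sorted by weight`), and creating the groups and replicants is left out.

```applescript
-- Hypothetical helper: return up to maxCount words whose length lies
-- between minLen and maxLen (inclusive), in their original order.
-- Assumes theWords is already sorted by weight, heaviest first.
on topKeywords(theWords, maxCount, minLen, maxLen)
	set picked to {}
	repeat with aWord in theWords
		set w to contents of aWord -- dereference the list item
		set wLen to length of w
		if wLen >= minLen and wLen <= maxLen then
			set end of picked to w
			-- Stop as soon as enough keywords have been collected
			if (count of picked) >= maxCount then exit repeat
		end if
	end repeat
	return picked
end topKeywords
```

For example, `topKeywords({"the", "bibliography", "archive", "x"}, 5, 5, 12)` returns `{"bibliography", "archive"}`. Because it filters before counting, it keeps scanning past short or overlong words until it has five keywords or runs out of words.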
Before I strike out to write a script or something to do this, I wanted to check with the community: has this been done before? Am I approaching it in the wrong way? Can you think of something better?
Instead of creating replicants, you could also tag the documents like this:
tell application id "DNtp"
	set theSelection to the selection
	repeat with theRecord in theSelection
		-- Concordance words of the document, sorted by weight
		set theWords to get concordance of record theRecord sorted by weight
		set theTags to {}
		set n to count of theWords
		if n > 5 then set n to 5 -- Max. 5 words
		-- Of the top-weighted words, keep only those 5-12 characters long
		repeat with i from 1 to n
			set theWord to item i of theWords
			set theLen to length of theWord
			if theLen ≥ 5 and theLen ≤ 12 then set theTags to theTags & theWord
		end repeat
		if theTags is not {} then
			-- Append to the record's existing tags instead of replacing them
			set theTags to (tags of theRecord) & theTags
			set tags of theRecord to theTags
		end if
	end repeat
end tell
Whoa, more than I could have asked for. Thanks for that – saves me hours of foolish trial and error!
My thinking around replicants was that they would give me groups for Auto-Classify. I suppose I could create smart groups for that instead, or tagging might make it unnecessary altogether. I’d just have to run this script on new files and, presto, they’d be “sorted” too, even if they’re all in the same group.
Some users prefer tags, others replicants, groups, and classification, and others smart groups. DEVONthink supports all of these workflows; in the end it depends on personal preference and on the files themselves.
Many thanks for this script. I’ve already tweaked it a bit for my workflow. One question, though: is it possible for the script to skip keywords that contain numbers?
Here’s a small snippet checking whether a word contains digits:
set containsDigit to false
set len to length of theWord
repeat with i from 1 to len
	try
		set val to (character i of theWord as integer) -- Throws an exception if it's not a digit
		set containsDigit to true
		exit repeat -- One digit is enough; stop checking
	end try
end repeat
I would like to keep numbers out of the tags, but I couldn’t figure out where to place this snippet in your earlier script in this thread. (I’m trying it as a smart rule.) Thanks.
The tagging script works well at finding words with high weights, but some of them are numbers, such as today’s date: 200508. That is helpful for other reasons, but not as a tag.
This small snippet seems to catch the numbers, but I could use some guidance on where to insert it into the previous script. Thanks.
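One way to wire a digit check into the tagging script is to move it into handlers and call them where the length test happens. This is a hedged sketch, not a tested DEVONthink recipe: `hasDigit` and `isTagCandidate` are hypothetical names, and the digit test uses string membership (`is in "0123456789"`) instead of the snippet's `as integer` coercion. Paste the handlers at the end of the script, outside the `tell application id "DNtp"` block, and replace the line `if theLen ≥ 5 and theLen ≤ 12 then set theTags to theTags & theWord` with `if my isTagCandidate(theWord) then set theTags to theTags & theWord` (the `theLen` line is then no longer needed). The `my` matters: inside a `tell` block, AppleScript would otherwise send the handler call to DEVONthink, which doesn't understand it.

```applescript
-- Hypothetical helper: true if theWord contains at least one digit 0-9.
-- Uses string membership instead of the "as integer" coercion trick.
on hasDigit(theWord)
	repeat with c in (characters of theWord)
		if (contents of c) is in "0123456789" then return true
	end repeat
	return false
end hasDigit

-- Hypothetical helper: a word qualifies as a tag if it is
-- 5-12 characters long and contains no digits.
on isTagCandidate(theWord)
	set wLen to length of theWord
	if wLen < 5 or wLen > 12 then return false
	return not (hasDigit(theWord))
end isTagCandidate
```

For example, `isTagCandidate("200508")` returns false, so a date like that would no longer become a tag.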