Gather all related (by filename) records

This is slightly specialized, but hopefully not too much. Perhaps it will at least be useful as a starting point for someone.

The task to be solved

Having identified one or more records (PDFs or annotations exported from PDFs) as useful for my current research task, I wish to retrieve all of the records associated with the relevant document(s). For example, having found one annotation from paper 1 and another from paper 2, I now wish to gather the PDFs for both papers and all annotations related to them.


I have a database called “PDFs” (which indexes another folder, but I don’t think that matters). It contains the PDFs I use for my research, annotations I’ve exported from Skim in multimarkdown, and the .skim files that Skim automatically creates (which contain all the annotations in a file).

The naming scheme is along the lines of author-year-ID-etc., where ID is a unique number I’ve assigned the document.

  • Jones-2011-15.pdf
  • Jones-2011-15.skim
  • Jones-2011-15-3 Canonical reference for social
  • Jones-2011-15-3 Definition of social
  • Jones-2011-15-5 Applications of social

The last three items are exported annotations, including the page on which they appear and a “headline” that summarizes their import.

I’ll refer to Jones-2011-15 as the “searchName”.
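As a rough illustration of how the searchName relates to the record names listed above, here is a minimal sketch in plain AppleScript (no DEVONthink required). The handler name searchNameFor is hypothetical, and the sketch assumes every name starts with exactly three hyphen-separated parts (author, year, ID); a hyphenated author name would break it.

```applescript
-- Hypothetical helper: returns the author-year-ID prefix of a record name.
-- Assumes the name begins with three hyphen-separated parts.
on searchNameFor(recordName)
	set saveTID to AppleScript's text item delimiters
	set AppleScript's text item delimiters to {"-"}
	set nameParts to text items of recordName
	if (count of nameParts) < 3 then
		-- Name does not follow the scheme; hand it back unchanged
		set AppleScript's text item delimiters to saveTID
		return recordName
	end if
	-- Drop a file extension such as ".pdf" from the ID part
	set AppleScript's text item delimiters to {"."}
	set idPart to text item 1 of (item 3 of nameParts)
	-- Rejoin the three parts with hyphens
	set AppleScript's text item delimiters to {"-"}
	set searchName to {item 1 of nameParts, item 2 of nameParts, idPart} as text
	set AppleScript's text item delimiters to saveTID
	return searchName
end searchNameFor

searchNameFor("Jones-2011-15.pdf") -- "Jones-2011-15"
searchNameFor("Jones-2011-15-3 Canonical reference for social") -- "Jones-2011-15"
```

Splitting on the hyphen explicitly, rather than relying on how AppleScript defines a "word", makes the rule easier to reason about, at the cost of having to strip the extension by hand.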

The code

This qualifies as “ugly, but workable”. Suggestions for improvement welcome.

-- Retrieve into a folder all items sharing the same name as the selected items in the PDF database.
-- So, you can start with the pdf, md notes or skim notes and get everything.
--	Created by: Glenn Hoetker
--	Created on: 12/01/13 14:58:22
-- Use at your own risk.  I accept no responsibility for damage that may be done.
--	Copyright (c) 2013 Arizona State University
--	All Rights Reserved

set retrievedRecords to {}

-- Ask for the name of the new group; stop if the user cancels
try
	set dialogResult to display dialog ¬
		"What should the new group be called?" buttons {"Cancel", "OK"} ¬
		default button "OK" cancel button ¬
		"Cancel" default answer ("New Group")
on error number -128
	display dialog "User cancelled."
	return
end try

set newGroupName to text returned of dialogResult

-- Extract searchName (Jones-2011-15) from the selected records, whether the selected record is a
-- PDF, .skim, or Markdown file.
tell application "DEVONthink Pro"
	set theRecords to the selection
	repeat with eachRecord in theRecords
		set recordName to the name of eachRecord
		set saveTID to AppleScript's text item delimiters
		set AppleScript's text item delimiters to {"-"}
		set searchName to words 1 through 3 of recordName as text
		set AppleScript's text item delimiters to saveTID
		set retrievedRecords to retrievedRecords & (search searchName in current database)
	end repeat

	-- Create a new folder at the root of the PDFs database and replicate the relevant records to it

	set theGroup to create location newGroupName in "PDFs"
	repeat with eachRecord in retrievedRecords
		if parent of eachRecord is not "Trash" then replicate record eachRecord to theGroup
	end repeat
end tell

I played with smart groups and new windows, etc., but ultimately decided that a new static group with replicants of the retrieved records worked best. Once they are all in there, I can go through them, toss out any that aren’t relevant, manually add new items, etc.

Obviously, this depends on a regular record naming scheme. Perhaps it will be useful to someone.

Not a bad bit of code but I would consider how you’re going to make it more flexible in terms of the searchString you’re trying to match. (I don’t imagine all your files are named "Jones-2011-15"something. :smiley: )

Very clever approach. I’ve often thought of solving this problem with a script and was just too lazy to try. This is a keeper, Glenn. I know I can use it well. Thanks :slight_smile:

Some small suggestions. I think this

set retrievedRecords to retrievedRecords & (search searchName in current database)

might work better as this

set retrievedRecords to retrievedRecords & (search searchName in current database within titles)

This would keep the search from extending to document contents.

The code looks at the first 3 “words” of each selected document and assumes those 3 words are the index string – this rule could be a weakness for general purpose users, as Jim and Eric also note below. To adjust for the hardcoded limit to searching for occurrence of the first three words, you might replace this

set searchName to words 1 through 3 of recordName as text

with the text returned of a dialog that prompts for the search string.
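A minimal sketch of that prompt, assuming the same cancel handling as the group-name dialog in the original script. The variable defaultName and the prompt wording are placeholders, not part of Glenn's code.

```applescript
-- Placeholder default; in practice this could be derived from the selection
set defaultName to "Jones-2011-15"

try
	set promptResult to display dialog ¬
		"Gather records whose names contain:" default answer defaultName ¬
		buttons {"Cancel", "OK"} default button "OK" cancel button "Cancel"
on error number -128
	-- User cancelled, so stop the script here
	return
end try

set searchName to text returned of promptResult
```

The resulting searchName could then be passed to the search command in place of the value computed from the first three words.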

Finally, you might want to change the error handling. As it is, the routine applies only to the first “try” block and reports only one single error. To make it more general you could enclose all of the logic in a “try” block and let all errors be reported.
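One possible shape for that broader error handling is sketched below. The comment inside the try block stands in for the existing dialog, search, and replicate steps, and the alert wording is only a placeholder.

```applescript
try
	-- The dialog, search, and replicate steps of the script would go here
on error errorMessage number errorNumber
	-- Error -128 is a user cancel, so stay quiet for that one
	if errorNumber is not -128 then
		display alert "Gathering related records failed" message ¬
			errorMessage & " (" & errorNumber & ")"
	end if
end try
```

Catching the message and number in one place means a failure in the search or replicate step is reported rather than stopping the script with a generic error.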

I didn’t mean it was hardcoded (though his delimiter is, which is another issue that could be looked at). I am just suggesting that Glenn look at making it even more flexible in terms of string matching. As he noted himself, “Obviously, this depends on a regular record naming scheme.”

It is a nice little bit of code but I am encouraging him to see how much further he can take this. I want the gears to start turning. :smiley:

This looks like a great script for other users if it becomes more flexible, meaning it works with all kinds of file name structures, e.g. by allowing the user to enter the prefix that is common to everything that should be gathered.
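To illustrate what matching on a user-supplied prefix might look like, here is a small sketch in plain AppleScript that works on a list of names rather than on DEVONthink records. The handler namesWithPrefix is hypothetical and the sample names are taken from the first post plus one made-up non-matching name.

```applescript
-- Hypothetical helper: keep only the names that start with the given prefix
on namesWithPrefix(nameList, prefix)
	set matches to {}
	repeat with eachName in nameList
		set thisName to contents of eachName
		if thisName begins with prefix then set end of matches to thisName
	end repeat
	return matches
end namesWithPrefix

namesWithPrefix({"Jones-2011-15.pdf", "Jones-2011-15.skim", "Smith-2009-7.pdf"}, "Jones-2011-15")
-- {"Jones-2011-15.pdf", "Jones-2011-15.skim"}
```

Note that a bare prefix test would also match a name such as "Jones-2011-150", so a real version might need to check the character that follows the prefix.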

Thank you all for the useful input. I’ll implement some of these changes (like better use of “try”) and post the revised script once I get some time. Several people suggested I look at how to generalize the script. Fortunately, Korm was kind enough to do just that in his posting “Gather related items by replicating”. So, rather than duplicate his work, I’ll refer folks to his posting.