Here's my (hopefully) complete example. Many thanks to everyone on the forum who helped contribute to a working solution.
Use case
Automatically rename, set the creation date of, and move a scanned and OCR'd document in a DevonThink Inbox via a Smart Rule. The document text has an irregular format containing numerous dates in various formats, only one of which is the desired date. This confuses the built-in Scan Text Date functionality and prevents its use.
The resulting name will be in the format: YYYY-MM-DD - Vendor
Text snippet of the target to extract:
Please send payment to address on reverse side - do not staple
ACCOUNT NUMBER
Bill Date
Sep 9, 2020
AMOUNT DUE
128.20 #IP#09#
There are no fewer than 10 dates in 3 different formats preceding this snippet.
Requirements
- a reusable library that supports PCRE (full-featured regular expressions), which command-line tools such as sed and grep lack
- ability to capture multiple fields of data with much more complex regular expressions than this simplistic example
- multi-line matching
Solution
Create the external library in order to use AppleScript+ObjC Foundation framework
If it doesn't already exist, create the folder ~/Library/Script Libraries (AKA /Users/your-user-name/Library/Script Libraries). This can be done in the shell CLI with mkdir ~/Library/Script\ Libraries.
Can't find the Library folder? How to Always Show Library Folder in MacOS
Using Apple Script Editor, create the library file named DevonThinkLib with the following contents.
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

-- returns a list of regex captures (first capture group of each match, duplicates removed)
on regexFindCaptures(theText, thePattern)
	try
		set aString to current application's NSString's stringWithString:theText
		set {theExpr, theError} to current application's NSRegularExpression's regularExpressionWithPattern:(thePattern) options:0 |error|:(reference)
		-- an invalid pattern yields missing value; report why it failed
		if theExpr is missing value then error (theError's localizedDescription() as text)
		-- execute the regex
		set theMatches to theExpr's matchesInString:aString options:0 range:{0, aString's |length|()}
		-- extract the captures from the regex matches
		set theResults to {}
		repeat with aMatch in theMatches
			set theRange to (aMatch's rangeAtIndex:1)
			set theString to (aString's substringWithRange:theRange) as text
			if theString is not in theResults then
				set end of theResults to theString
			end if
		end repeat
		return theResults
	on error error_message number error_number
		activate
		display alert "Error: Handler \"regexFindCaptures\"" message error_message as warning
		error number -128
	end try
end regexFindCaptures
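Note that the Smart Rule script further down calls a parseDate handler on this library, and the listing above does not define one. Here is a minimal sketch of what it could look like. It assumes the captured text is an English date such as "Sep 9, 2020", and macOS 10.11 or later, where an NSDate can be coerced with "as date". The handler body and the format string are my assumptions, not the original code. If you paste it into DevonThinkLib, the three use lines are already at the top of the file and can be left out.

```applescript
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

-- Hypothetical helper: converts text such as "Sep 9, 2020" into an AppleScript date
on parseDate(theDateString)
	set theFormatter to current application's NSDateFormatter's new()
	-- fixed locale so the month abbreviation is read as English regardless of system language
	theFormatter's setLocale:(current application's NSLocale's localeWithLocaleIdentifier:"en_US_POSIX")
	-- assumed format: abbreviated month, day, comma, four-digit year
	theFormatter's setDateFormat:"MMM d, yyyy"
	set theDate to theFormatter's dateFromString:theDateString
	if theDate is missing value then error "Could not parse date: " & theDateString
	-- NSDate to AppleScript date coercion (assumes macOS 10.11 or later)
	return theDate as date
end parseDate
```

Dates that the OCR renders differently (for example without the comma) may need a second format string or some cleanup before parsing.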
Save/move this script into the ~/Library/Script Libraries folder.
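The requirements mention capturing multiple fields, while regexFindCaptures only returns the first capture group of each match. A variant that returns every group could be sketched roughly as follows. The handler name regexFindAllGroups is hypothetical, and this is an untested-in-DevonThink sketch rather than part of the working solution.

```applescript
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

-- Hypothetical variant: returns one list per match, holding the text of every capture group
on regexFindAllGroups(theText, thePattern)
	set aString to current application's NSString's stringWithString:theText
	set {theExpr, theError} to current application's NSRegularExpression's regularExpressionWithPattern:thePattern options:0 |error|:(reference)
	if theExpr is missing value then error (theError's localizedDescription() as text)
	set theMatches to theExpr's matchesInString:aString options:0 range:{0, aString's |length|()}
	set theResults to {}
	repeat with aMatch in theMatches
		set theGroups to {}
		-- range 0 is the whole match, so capture groups start at index 1
		repeat with i from 1 to ((aMatch's numberOfRanges()) - 1)
			set theRange to (aMatch's rangeAtIndex:i)
			if (theRange's |length|) is 0 then
				-- empty or non-participating group: store an empty string
				set end of theGroups to ""
			else
				set end of theGroups to ((aString's substringWithRange:theRange) as text)
			end if
		end repeat
		set end of theResults to theGroups
	end repeat
	return theResults
end regexFindAllGroups
```

With a pattern such as "(\\w{3})\\s(\\d{1,2}),\\s(\\d{4})", each match would come back as a three-item list (month, day, year), which can then be picked apart with item 1, item 2 and so on.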
Create the external script for the Smart Rule in DevonThink 3
Using Apple Script Editor, create a new script named Parse regex test with the following contents.
-- set to true to enable testing harness in script editor
set debug to false

-- for dev/testing in Script Editor
if name of current application is "Script Editor" then
	tell application id "DNtp"
		set sel to selected records
		my performSmartRule(sel)
	end tell
end if

-- the smart rule handler
on performSmartRule(theRecords)
	tell application id "DNtp"
		repeat with theRecord in theRecords
			set theText to plain text of theRecord
			set theRegex to "Bill Date.*\\n(\\w{3}\\s\\d{1,2}[,\\s]+\\d{4})"
			set theMatches to script "DevonThinkLib"'s regexFindCaptures(theText, theRegex)
			try
				-- fail with a clear message when the pattern found nothing
				if theMatches is {} then error "No bill date found in " & (name of theRecord)
				set theDateString to item 1 of theMatches
				-- note: parseDate must also be defined in DevonThinkLib
				set theCreationDate to script "DevonThinkLib"'s parseDate(theDateString)
				if creation date of theRecord is not equal to theCreationDate then
					set creation date of theRecord to theCreationDate
				end if
			on error error_message number error_number
				activate
				display alert "Error: Handler \"setting creation date\"" message error_message as warning
				error number -128
			end try
		end repeat
	end tell
end performSmartRule
Save/move the file into the <sarcasm>conveniently named and easily accessible</sarcasm> folder ~/Library/Application\ Scripts/com.devon-technologies.think3/Smart\ Rules/
Create the Smart Rule in DevonThink to use the scripts
- All of the following are true
- Content Matches 123-accountnum-34 AND vendor
- Kind is PDF/PS
- Word Count is not 0
- Perform the following actions: On Demand
- Change Modification Date Current Date
- Execute Script External Parse regex test
- Change name to Short Creation Date - Vendor
- Move to Group/folder of your choice
Test
Select a document, right-click, Apply rule, select the rule and with luck the magic will happen.
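Before applying the rule to real documents, the pattern itself can be checked in Script Editor without DevonThink. The sketch below runs the same regex against a shortened copy of the sample snippet. It calls NSRegularExpression directly instead of going through DevonThinkLib so that it runs on its own; the variable names are just for illustration.

```applescript
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

-- shortened copy of the sample snippet from this post
set sampleText to "ACCOUNT NUMBER" & linefeed & "Bill Date" & linefeed & "Sep 9, 2020" & linefeed & "AMOUNT DUE"
set thePattern to "Bill Date.*\\n(\\w{3}\\s\\d{1,2}[,\\s]+\\d{4})"

set aString to current application's NSString's stringWithString:sampleText
set {theExpr, theError} to current application's NSRegularExpression's regularExpressionWithPattern:thePattern options:0 |error|:(reference)
if theExpr is missing value then error (theError's localizedDescription() as text)

-- only the first match is needed for this check
set firstMatch to theExpr's firstMatchInString:aString options:0 range:{0, aString's |length|()}
if firstMatch is missing value then
	set theCapture to missing value
else
	-- capture group 1 holds the date text that follows "Bill Date"
	set theCapture to (aString's substringWithRange:(firstMatch's rangeAtIndex:1)) as text
end if
theCapture
```

The Result pane should show the captured date text. If it shows missing value, the pattern did not match, and it is worth comparing it against the actual OCR text of the document.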
Finale
<sarcasm>
Stand back, review your script, and marvel at how friendly and readable your new English-like script and library syntax is. Nod knowingly that AppleScript has made automation and programming easily accessible to everyday people such as yourself while confounding many, many software engineers for nearly 3 decades (as of 2021). After all, it is this accessibility and openness that led to Apple hiding the Library folder in the first place; to help you, help yourself. In hindsight, the solution was downright obvious in its pure simplicity.
</sarcasm>
Your mileage will most certainly vary.
I sincerely hope that the above example helps others avoid banging their heads against the wall. Copy/paste, modify, and expand to your needs. Please feel free to comment with bugs, errors, and corrections, as I have inevitably introduced typos, omissions, or other shortcomings whilst putting this together.