Hi,
Sorry if this is a repeated question, but I want to find the best practice for my searches.
Most of the documents in my database are not in English, so the main problem is accented characters…
Scenario 1 - searching for a word without an accent:
a) If I use the toolbar search, I get the list of results and, when selecting a particular document, the inspector search identifies the results within that document;
b) If I want a simple search inside one document, I use the inspector search directly.
Scenario 2 - searching for a word with an accent:
a) If I use the toolbar search, I need to enable the option “Ignore Diacritics”. Then I get the list of results, but when I select a particular document I can’t locate the word in it, because the inspector search does not support “Ignore Diacritics”.
b) If I want a simple search inside one document, well… not possible! (If I type the word without the accent, the search won’t find it either.)
My main question: how can I run the search and see the results in these cases?
Thanks.
Tested with égalité, it does work in an Inspector search when Ignore Case is enabled.
You’re right, I expressed myself poorly.
What I meant to say is that unless the word is spelled exactly (with its accents), I don’t get any results (using your example, if I type egalite or egalité, no results appear).
Could you please try replacing every accented character with a ? wildcard?
Great! That works! Thank you so much.
Now I can search for “Lukács” (using Luk?cs) and the results also include the documents that omit the accent!
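For anyone who wants to build such a query automatically, here is a minimal Python sketch of the idea. It is only an illustration: the function name `accents_to_question_marks` is made up for this example, and it assumes an "accented character" is any character whose canonical decomposition contains a combining mark.

```python
import unicodedata


def accents_to_question_marks(query: str) -> str:
    """Replace every character that carries a diacritic with the ? wildcard."""
    out = []
    # Work on the composed (NFC) form so each accented letter is one character.
    for ch in unicodedata.normalize("NFC", query):
        decomposed = unicodedata.normalize("NFD", ch)
        # Assumption: a character is "accented" if its decomposition
        # contains at least one combining mark.
        has_mark = any(unicodedata.combining(c) for c in decomposed)
        out.append("?" if has_mark else ch)
    return "".join(out)


print(accents_to_question_marks("Lukács"))
print(accents_to_question_marks("égalité"))
```

The resulting string can be pasted into the search field; words without accents are returned unchanged.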
Thank you!
Using a ? wildcard works, but it’s not really a clean solution as it includes matches we don’t want.
This script replaces decomposable characters with a wildcard that includes the composed character and its base character. See the difference in the capture.
Caveats:
- The option “Operators & Wildcards” must be enabled in the Search Inspector (I couldn’t do it via script, as it seems it’s not possible to enable it via UI scripting).
- The script replaces the current Search Inspector query via UI scripting, which means the script may fail. In my testing it worked fine; if it doesn’t work for you or someone else, let me know.
-- Replace decomposable characters with wildcards (in Search Inspector)

use AppleScript version "2.7"
use framework "Foundation"
use scripting additions

-- Make sure a DEVONthink window is open and remember its class
tell application id "DNtp"
    try
        if not (exists think window 1) then error "Please open a window"
        set theWindowClass to (class of think window 1) as string
    on error error_message number error_number
        if the error_number is not -128 then display alert "DEVONthink" message error_message as warning
        return
    end try
end tell

activate application id "DNtp"

-- Read the Search Inspector query via UI scripting and replace it
tell application "System Events"
    tell process "DEVONthink 3"
        try
            -- Locate the Search Inspector's text field
            if theWindowClass is in {"viewer window", "«class brws»"} then
                set theTextField to text field 1 of splitter group 1 of window 1
            else
                try
                    set theTextField to text field 1 of splitter group 1 of window 1
                on error
                    set theTextField to text field 2 of splitter group 1 of window 1
                end try
            end if
            set isEnabled to value of attribute "AXEnabled" of theTextField
            if isEnabled = true then
                set theQuery to (value of theTextField) as string
                if theQuery = "" then
                    my displayAlert("Please enter your query in the Search Inspector")
                    return
                else
                    set focused of theTextField to true
                    key code 36 -- press Return to save the old query
                    set theQuery_withWildcards to my replaceDecomposableCharactersWithWildcards(theQuery)
                    set value of theTextField to theQuery_withWildcards
                    set focused of theTextField to true
                    key code 36 -- press Return to run the new query
                end if
            else
                my displayAlert("Please open a preview")
            end if
        on error
            my displayAlert("Please open the Search Inspector")
        end try
    end tell
end tell

-- Build a query in which each decomposable character becomes "[composed|base]"
on replaceDecomposableCharactersWithWildcards(theQuery)
    try
        set theString to (current application's NSString's stringWithString:theQuery)
        set theCharacterSet to current application's NSCharacterSet's decomposableCharacterSet()
        set theList to {}
        repeat with i from 1 to (theString's |length|())
            set thisCharacter to (theString's characterAtIndex:(i - 1))
            set isDecomposableCharacter to (theCharacterSet's characterIsMember:(thisCharacter)) as boolean
            if isDecomposableCharacter = false then
                -- Plain character: keep it as typed
                set end of theList to (character i of theQuery) as string
            else
                -- Take the composed character as typed; the index (i - 1) refers to theString
                set thisCharacter_Range to (theString's rangeOfComposedCharacterSequenceAtIndex:(i - 1))
                set thisCharacter to (theString's substringWithRange:thisCharacter_Range)
                -- Derive its base character by stripping the diacritics
                set thisCharacter_withoutDiacritics to (thisCharacter's stringByApplyingTransform:(current application's NSStringTransformStripDiacritics) |reverse|:false)
                set thisCharacterWildcard to ((("[" & thisCharacter as string) & "|" & thisCharacter_withoutDiacritics as string) & "]") as string
                set end of theList to thisCharacterWildcard
            end if
        end repeat
        -- Join the pieces without a separator, then restore the delimiters
        set d to AppleScript's text item delimiters
        set AppleScript's text item delimiters to ""
        set theQuery_withWildcards to theList as text
        set AppleScript's text item delimiters to d
        return theQuery_withWildcards
    on error error_message number error_number
        activate
        display alert "Error: Handler \"replaceDecomposableCharactersWithWildcards\"" message error_message as warning
        error number -128
    end try
end replaceDecomposableCharactersWithWildcards

-- Show an informational alert in DEVONthink
on displayAlert(theMessage)
    tell application id "DNtp"
        activate
        display alert "Script: \"Replace decomposable characters with wildcards\"" buttons {"Ok"} default button 1 message theMessage as informational
    end tell
end displayAlert
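
If the AppleScript is hard to follow, the string transformation it performs can be sketched in a few lines of Python. This is only an approximation for illustration, not the script itself: the function name `accents_to_alternations` is hypothetical, and Python's `unicodedata` decomposition is assumed to be close enough to Foundation's decomposable character set for ordinary accented Latin letters.

```python
import unicodedata


def accents_to_alternations(query: str) -> str:
    """Turn each accented character into "[composed|base]", e.g. á -> [á|a]."""
    out = []
    for ch in unicodedata.normalize("NFC", query):
        decomposed = unicodedata.normalize("NFD", ch)
        marks = [c for c in decomposed if unicodedata.combining(c)]
        # Base character(s): the decomposition with all combining marks removed.
        base = "".join(c for c in decomposed if not unicodedata.combining(c))
        if marks and base:
            out.append(f"[{ch}|{base}]")
        else:
            # Unaccented characters are kept as typed.
            out.append(ch)
    return "".join(out)


print(accents_to_alternations("Lukács"))
print(accents_to_alternations("égalité"))
```

Unlike the ? wildcard, each alternation only admits the accented letter and its unaccented base, so a query built this way is narrower.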
Thank you very much for your help and for the script!
It took me a while to figure out how to use it because I am a novice when it comes to scripting. My knowledge of these matters is quite basic…
But I managed to create the script, put it in the DT scripts folder and try it.
It seems to work, and it is quite accurate.
However, sometimes there are variants for which the ? wildcard is more advantageous (even allowing for some unwanted matches). Taking the example of Lukács, some articles use a different accent (Lukàcs or Lukäcs), which is a big mess. In these cases, even with a lot of ‘noise’, the search with the ? wildcard is more inclusive.
Once again, thanks.
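
The trade-off between the two query styles can be checked with a small Python sketch. It rests on assumptions: `wildcard_to_regex` is a hypothetical helper, the translation presumes that ? matches exactly one character and that [x|y] matches either alternative, and "Lukecs" is an invented example of an unwanted match. DEVONthink's real matching may differ in details.

```python
import re
import unicodedata


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate the two wildcard forms from this thread into a regex (assumed semantics)."""
    pattern = unicodedata.normalize("NFC", pattern)
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "?":
            parts.append(".")  # assumed: ? stands for exactly one character
            i += 1
        elif ch == "[" and "]" in pattern[i:]:
            end = pattern.index("]", i)
            options = pattern[i + 1:end].split("|")
            parts.append("(?:" + "|".join(re.escape(o) for o in options) + ")")
            i = end + 1
        else:
            parts.append(re.escape(ch))
            i += 1
    return re.compile("".join(parts), re.IGNORECASE)


# Spellings mentioned in the thread, plus one invented unwanted match ("Lukecs").
variants = [unicodedata.normalize("NFC", v)
            for v in ["Lukács", "Lukacs", "Lukàcs", "Lukäcs", "Lukecs"]]

for pattern in ("Luk?cs", "Luk[á|a]cs"):
    rx = wildcard_to_regex(pattern)
    print(pattern, [v for v in variants if rx.fullmatch(v)])
```

Under these assumptions, Luk?cs accepts every spelling in the list, including the unwanted one, while Luk[á|a]cs accepts only Lukács and Lukacs.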