How do DT search links handle symbols and punctuation marks?

This search link can find and highlight the search string in the file:
[ (H.Li et al., 2008: 1  P.009e) ](x-devonthink-item://CBE0CBC1-581B-4D97-AB6A-2B3B610FAC07?search=This%20paper%20examines%20the%20role%20of%20affiliation%20with%20the%20ruling%20Communist%20Party%20in%20the%20operation%20of%20private%20enterprises%20in%20China)

But this doesn’t. The difference is the period “.” at the end of the search string, which exists in the original text.
[ (H.Li et al., 2008: 1  P.009e) ](x-devonthink-item://CBE0CBC1-581B-4D97-AB6A-2B3B610FAC07?search=This%20paper%20examines%20the%20role%20of%20affiliation%20with%20the%20ruling%20Communist%20Party%20in%20the%20operation%20of%20private%20enterprises%20in%20China.)

The question: does that mean we need to make sure that there are no symbols or punctuation marks at the end of the search string when constructing a search link in AppleScript?
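Until this is fixed, one workaround is to trim trailing punctuation before building the link. The original script is AppleScript; the sketch below is in Python and only illustrates the logic. The function name `trim_trailing_punctuation` is hypothetical, and treating every trailing Unicode punctuation, symbol, or whitespace character as removable is an assumption, not documented DEVONthink behavior.

```python
import unicodedata


def trim_trailing_punctuation(text: str) -> str:
    """Drop trailing punctuation, symbol, and whitespace characters.

    Assumption: ending the search string on a letter or digit avoids the
    problem described in this thread; this is not documented DEVONthink behavior.
    """
    end = len(text)
    while end > 0:
        ch = text[end - 1]
        category = unicodedata.category(ch)
        # P* = punctuation, S* = symbols, Z* = separators; isspace() also catches newlines
        if category[0] in ("P", "S", "Z") or ch.isspace():
            end -= 1
        else:
            break
    return text[:end]


print(trim_trailing_punctuation("private enterprises in China."))
```

For the Lounsbury example below, this would only drop the final `)` and leave the string ending in `2002`; whether DEVONthink then finds the passage is untested.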

Thank you!

What kind of document does the link point to? Actually, this shouldn’t matter.

Just an OCRed PDF.

Text copied and pasted from the PDF looks normal:

“This paper examines the role of affiliation with the ruling Communist Party in the operation of private enterprises in China. Using a nationwide survey of private firms, we find that the Party membership of private entrepreneurs has a positive effect on the performance of their firms when human capital and other relevant variables are controlled. We further find that Party membership helps private entrepreneurs to obtain loans from banks or other state institutions, and affords them more confidence in the legal system. Finally, we find Party membership to be more important to firm performance in regions with weaker market institutions and weaker legal protection.”

It’s not a single case; this happens with all of my PDF files.

Thanks for asking.

Another example is when the word at the end of the search string contains parentheses:

This won’t work:

[ (Lounsbury, 2007: 1  P.173) ](x-devonthink-item://A2E404DD-0263-4850-9B74-5645A70C539B?search=At%20the%20organization%20level,%20logics%20can%20focus%20the%20attention%20of%20key%20decision%20makers%20on%20a%20delimited%20set%20of%20issues%20and%20solutions%20(Ocasio,%201997),%20leading%20to%20logic-consistent%20deci-%20sions%20that%20reinforce%20extant%20organizational%20identities%20and%20strategies(Thornton,%202002))

This will work:

[ (Lounsbury, 2007: 1  P.173) ](x-devonthink-item://A2E404DD-0263-4850-9B74-5645A70C539B?search=At%20the%20organization%20level,%20logics%20can%20focus%20the%20attention%20of%20key%20decision%20makers%20on%20a%20delimited%20set%20of%20issues%20and%20solutions%20(Ocasio,%201997),%20leading%20to%20logic-consistent%20deci-%20sions%20that%20reinforce%20extant%20organizational%20identities%20and%20strategies)

A copy of the PDF would be useful.

I’ve sent them to you by PM. Thanks again.

Thanks for the files, I’ll check this.

The first issue is that %20 is missing right before (Thornton. The second issue is in DEVONthink itself: the string search is anchored, so that, for example, a search for “the” does not match “therefore”. The next release will improve this.
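One way to picture an “anchored” search is a regular expression whose ends are pinned to word boundaries (`\b`). The Python sketch below is a hypothetical model of the behavior described above, not DEVONthink’s actual implementation; `anchored_search` is an illustrative name.

```python
import re


def anchored_search(query: str, text: str) -> bool:
    """Return True if query occurs in text as whole words only.

    Hypothetical model: the escaped query is wrapped in regex word
    boundaries, so a match cannot start or end in the middle of a word.
    """
    pattern = r"\b" + re.escape(query) + r"\b"
    return re.search(pattern, text) is not None


print(anchored_search("the", "the role of affiliation"))  # True
print(anchored_search("the", "therefore"))                # False
print("the" in "therefore")                               # True (plain substring search)
```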

Hmm. (1) I create this link with a script that replaces every space with %20, so I wonder why that happened; I’ll check the script. But (2) what about the “.” in post #1? With “.” at the end, the search link doesn’t find the text in the PDF.
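A more robust alternative to replacing spaces by hand is to normalize whitespace and let a URL-encoding function do the escaping. This is a Python sketch of the idea (the original script is AppleScript), and `devonthink_search_link` is a hypothetical helper. It collapses any run of whitespace, including line breaks and non-breaking spaces, into a single space before encoding. One possible cause of the missing `%20`, not confirmed in this thread, is that the copied text had a line break or non-breaking space there rather than a plain space. Commas and parentheses are left literal (`safe="(),"`) to match the links above that work. If the copied text has no character at all between `strategies` and `(Thornton`, no encoding step will add one.

```python
from urllib.parse import quote


def devonthink_search_link(item_uuid: str, search_text: str) -> str:
    """Build an x-devonthink-item link with a percent-encoded search string.

    Assumption: DEVONthink accepts standard percent-encoding in the
    search parameter, as the working links in this thread suggest.
    """
    # str.split() with no argument splits on any Unicode whitespace,
    # so newlines and non-breaking spaces become ordinary single spaces
    normalized = " ".join(search_text.split())
    # Spaces become %20; commas and parentheses stay literal like the links above
    encoded = quote(normalized, safe="(),")
    return f"x-devonthink-item://{item_uuid}?search={encoded}"


print(devonthink_search_link("A2E404DD-0263-4850-9B74-5645A70C539B",
                             "identities and strategies\n(Thornton, 2002)"))
```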

See above, that is the second issue that has to be fixed.

Please allow me to ask a stupid question, because I don’t see how the above issue explains why “China” at the end of the search string works but “China.” doesn’t, when the original text in the PDF contains “China.”. Only when you have the time… Thanks again.

The anchored search was intended for text without punctuation; therefore, the second example doesn’t work.
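Extending the same hypothetical regex model shows why punctuation at the end breaks the match: `\b` needs a transition between a word character and a non-word character, and after a trailing `.` the next character (a space, or the end of the text) is also non-word, so the closing anchor can never succeed. This is an illustrative Python sketch consistent with the explanation above, not DEVONthink’s code; `anchored_match` is a hypothetical name.

```python
import re


def anchored_match(query: str, text: str) -> bool:
    """Hypothetical anchored search: the query must start and end on word boundaries."""
    pattern = r"\b" + re.escape(query) + r"\b"
    return re.search(pattern, text) is not None


text = "private enterprises in China. Using a nationwide survey"
print(anchored_match("enterprises in China", text))   # True: boundary between "a" and "."
print(anchored_match("enterprises in China.", text))  # False: no boundary between "." and " "
```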

If I understand your explanation correctly, the search string shouldn’t end with symbols or punctuation marks (for now)? Perhaps I don’t understand the term “anchored search”. If you can point me to a reference, I can study its meaning and implications myself. (I tried googling “anchored search” but only found “anchored text”, which is irrelevant.) Thanks again.

That’s right, and as the next release will fix this, you shouldn’t worry about the details 🙂


Thanks!

After more searching, I think the term is related to regex. I guess that means the search string in a DT search link uses an anchored search/regex and ignores punctuation marks in the plain text of an OCRed PDF. So a search string ending in “China.” looks for a match of all the words (without punctuation marks) but anchors the end at “.”, so it won’t find a match.
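The guess above can also be modeled directly: index the document as words with punctuation dropped, but split the query on whitespace only, so `China.` is compared against `China` and fails. This Python sketch of that hypothesis is not confirmed by the developer; `token_search` is a hypothetical name.

```python
import re

# \w+ keeps runs of letters/digits/underscore and drops punctuation
WORD = re.compile(r"\w+")


def token_search(query: str, text: str) -> bool:
    """Hypothetical model: the document is indexed as punctuation-free words,
    while the query is split on whitespace and keeps its punctuation."""
    doc_words = WORD.findall(text)
    query_words = query.split()
    n = len(query_words)
    # Look for the query words as a contiguous run of document words
    return any(doc_words[i:i + n] == query_words for i in range(len(doc_words) - n + 1))


text = "private enterprises in China. Using a nationwide survey"
print(token_search("in China", text))   # True
print(token_search("in China.", text))  # False: "China." never equals the indexed word "China"
```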