Quick question: how to search for a whole word only?

Hi. I’m wondering how to search for a whole word in DEVONthink Pro. I looked in DEVONthink Pro’s help and couldn’t find the answer (maybe I missed it):

For instance, I want to search my database for the word “chi” (the Chinese concept of life-force) and I don’t want the results to include other words that contain the letters “chi” such as “Chicago.”

Thanks.

There are a few options…

  • Quote it with a space: "chi ". This is the most obvious answer.
  • If you’re using the search field at the upper right corner of the window, click the magnifying glass and deselect Prefix while typing. Then type chi without quotes. (Prefix while typing adds an asterisk (*) to the end of the string, so it matches any word beginning with that string.) This will only return hits from the current database.
  • If you’re using the Full Search window, there is no Prefix while typing option, so you just type chi. However, you’ll also get hits from all open databases unless you restrict the search by clicking the “Databases” button.
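To illustrate the difference between a prefix search (chi*) and a whole-word search, here is a minimal Python sketch. It is not how DEVONthink implements its search; the function names and the sample sentence are made up for illustration, and it only shows the general idea of matching on word boundaries.

```python
import re

def whole_word_hits(text, word):
    """Return occurrences of `word` that stand alone as a complete word."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return pattern.findall(text)

def prefix_hits(text, prefix):
    """Return every word that begins with `prefix`, like a trailing wildcard (chi*)."""
    pattern = re.compile(r"\b" + re.escape(prefix) + r"\w*", re.IGNORECASE)
    return pattern.findall(text)

# Hypothetical sample text, not taken from any real database.
sample = "Chi flows through Chicago; children practice tai chi while stretching."

print(whole_word_hits(sample, "chi"))  # ['Chi', 'chi']
print(prefix_hits(sample, "chi"))      # ['Chi', 'Chicago', 'children', 'chi']
```

Note that neither pattern matches "stretching", because "chi" sits in the middle of that word rather than at a word boundary.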

Bluefrog, thanks for the help.

I guess I’m still confused about the Full Search window: in the Full Search window, when I just type chi, or chi with a space, I’m still getting results showing any words that contain chi.

So I’m still not clear on how to do a whole word search in the Full Search window.

Thanks.

I don’t believe this is possible as DEVONthink only searches on alphanumeric characters. I would expect in a Full Search that a space added to ‘chi’ would be ignored, unless things have changed and/or I’m overlooking something.

What I described to you is equivalent to a full word search.

Bluefrog, well then there must be something I’m doing wrong. Below is a screenshot which shows that I’m getting all sorts of results when trying to search for just “chi.” Maybe I need some instructions that are a little more detailed. Thanks.

I downloaded an exercise PDF containing words like “stretching”, “achieve”, and “watching”. This PDF never came up in the results for “chi” (no quotes).

It’s possible your index is out of whack. Personally, I would do a Tools > Rebuild Database… to clean up my DB and have the index rebuilt. And if you’re worried about losing something, you can always do a File > Export > Database Archive first (though I have only done it for Support and have never needed one personally).

I decided not to rebuild the database. I wasn’t sure if it would cause Time Machine to back up the entire database again – and that particular database is pretty large.

But I did a test with a smaller database of mine. I created some files that have words containing “chi” as well as one file that has the word “chi.”

I used the Search Window and, again, I wasn’t able to search for “chi” as a whole word. All the files with words containing the letters “chi” showed up. So I still don’t know how you were able to get it to work.

However, using the search field in the regular DEVONthink window and searching for “chi” with a space after it ("chi ") works just fine for searching within one database. The content in my databases is pretty distinct, so I can’t imagine a case where I would need to search across all the databases anyway.

I cannot recall when my database searches ever worked as you described, so I rebuilt my Inbox database as a test. I have about 40 documents in the Inbox, including a test document and a PDF+Text that I copied into the Inbox because I knew it contained multiple instances of ‘chi’. My results are still, let us say, unpredictable. The text document is only returned if it contains ‘chi’ as a word; ‘Chicago’ and ‘archive’ alone are not enough to include the document in the search results.

However, the PDF is returned in the search results when the word ‘chi’ does not appear in the document, but instead words such as archive, achieve, children, childhood, Appalachian, enriching, etc. are highlighted.

As I mentioned earlier, I don’t recall a time when searching behaved differently.

Just to mention another aspect that can affect results. Searching PDF+Text depends also on the quality of the OCR. If the OCR engine you used interpreted “chi” as “ch1” then it will not be found. Your eyes will see “chi”; the computer will not, since it only searches the text layer created by OCR.

To check quality, select a PDF+Text and convert it to Plain Text. That will extract the text layer (created by the OCR process) and you can see exactly what the search space looks like.
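Once you have the converted plain text, you could also scan it for likely misreads of the word. The following is a rough Python sketch, run outside DEVONthink on the exported text. The set of characters an OCR engine might substitute for "i" is my own assumption, and the function name is hypothetical.

```python
import re

# Assumed confusion set: characters an OCR engine might emit in place of "i".
CONFUSABLE_I = "i1l|!"

def find_word_and_misreads(text_layer, word="chi"):
    """Split standalone matches into exact hits and probable OCR misreads."""
    # Every "i" in the word may appear as any character in the confusion set.
    pattern_src = "".join(
        "[" + re.escape(CONFUSABLE_I) + "]" if ch == "i" else re.escape(ch)
        for ch in word.lower()
    )
    # Only accept matches that are not embedded in a longer alphanumeric run.
    pattern = re.compile(
        r"(?<![A-Za-z0-9])" + pattern_src + r"(?![A-Za-z0-9])", re.IGNORECASE
    )
    exact, misread = [], []
    for match in pattern.finditer(text_layer):
        token = match.group(0)
        if token.lower() == word.lower():
            exact.append(token)
        else:
            misread.append(token)
    return exact, misread

# Hypothetical text layer, for illustration only.
exact, misread = find_word_and_misreads("Tai ch1 and chi; the chl of Chicago.")
print(exact)    # ['chi']
print(misread)  # ['ch1', 'chl']
```

Anything in the second list is text your eyes would read as "chi" on the page but that a search of the text layer would miss.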

Interesting stuff… “Not all PDFs are created equal”, so we are going to have to look deeper into this. Like I said, the PDF I imported last night on Yosemite never got any hits because there were no explicit instances of “chi”. I just imported it into a DB on Mountain Lion and it’s still not found.

I’m going to make sure Criss sees this.

I just converted the PDF found above to text as korm suggested. Now the search for ‘chi’ finds 3 documents, with the third being the newly-created text file. However, the only highlighted word in the text file is ‘Chi-’ where Chinese was hyphenated for a line break. None of the instances of archive, achieve, children, childhood… were highlighted. After I edited the text file, changing Chi-nese to Chinese, the text file was no longer found in the search for ‘chi’. Why then is DEVONthink returning/highlighting ‘archive, achieve, children, childhood…’ in the PDF but not in the text file?

Going one step further, I duplicated the PDF in question and then redacted the Chi-nese text using PDFpenPro. I then converted that PDF to text and ran the search again. I get both PDFs in the search, with archive, achieve, children, childhood… highlighted and the original text file (that contains the text Chi-) created by converting the original PDF. The text file created by converting the redacted PDF does not appear in the search.
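One possible reading of the ‘Chi-’ result in the text file: if the text is split into words at every non-alphanumeric character (consistent with the earlier comment that only alphanumeric characters are searched), then a word hyphenated across a line break becomes two separate words. The Python sketch below only illustrates that hypothesis; it is not DEVONthink’s actual indexing code, the function names are invented, and it does not explain the highlighting seen in the PDFs.

```python
import re

def tokens(text):
    """Split text into lowercase words at every non-alphanumeric character."""
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]

def contains_word(text, word):
    """True if `word` appears as a complete token in `text`."""
    return word.lower() in tokens(text)

# Hypothetical snippets modelled on the hyphenated line break described above.
hyphenated = "ancient Chi-\nnese medicine"
joined = "ancient Chinese medicine"

print(tokens(hyphenated))                # ['ancient', 'chi', 'nese', 'medicine']
print(contains_word(hyphenated, "chi"))  # True
print(contains_word(joined, "chi"))      # False
```

Under that assumption, editing Chi-nese to Chinese removes the standalone "chi" token, which would match the text file dropping out of the results.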

Maybe a “quick question”, but what a very interesting topic! (Posting this so I get notifications of contributions.)