DT3b6 : Erratic search results

Hello,

I am wondering what could be causing the following: I’m doing a global search through a database with a string such as “cheminée tags:s:Archive* scope:workDB” (cheminée means “chimney” in French).

I get five results. I click on each of them with the Search pane open to the right. The string “cheminée” is automatically filled in the search field at the top of the pane, but as I advance through each document I don’t see any occurrences appearing below.

If I click on each document, then click in the search field, then press Enter, there are no occurrences either.

So I figure there must be a problem with the accented “é” and enter “chemine” (dropping the accented é) in the search bar, still in the Search pane on the right. Now if I click on each of the five documents in the search results, click in the search field on the right and press Enter, I get some occurrences for the first two documents only, which highlight the words cheminée and cheminées. However, if I enter “cheminee” in the search bar (the full word, but without the accented é), no occurrences are displayed at all, in any document.

If I open all five documents in an external viewer such as PDF Expert, I can find “cheminée” in only the first two documents, just as in DT.

So why this erratic behavior with accented characters, and why is DT returning documents in the search results where even its own code can’t find the string itself?

BTW, this second issue (with “ghost” results) has happened to me several times already, even with searches not remotely involving accented characters.

Thank you for your help.

The search term is added to the Search inspector automatically to allow for quicker in-document searching.

If you don’t specify a search prefix, the scope of a term is All. This means cheminée is being searched for in other places than just the text content.

Is Ignore Diacritics or Fuzzy enabled in the search options under the magnifying glass?

I know that. I mentioned it to make it clear that that part was functioning as expected.

What “other” place besides the text content could there be in an imported PDF which has been OCRed but not otherwise edited or added to?

I had already tried all four possible combinations of these two settings; none of them seemed to have any effect.

Metadata would also be searched. Have you checked the filename, keywords, subject?

Well, I wanted to check that for you, but now, using the exact same search term as above, I only get 1 search result (compared to the 5 I got before posting, of which only 2 were proper hits, as I described). I can’t explain why this is happening: I’ve been working on this database since this morning and haven’t changed anything in these PDFs.

Here are some more details on what’s happening here. Please understand that this is very frustrating:

  • I enter the search terms. I get 5 results.
  • In order to make sure I answer your question correctly, I try all 4 possible combinations of Fuzzy & Ignore Diacritics: I changed one of the options, click in the search field and press Enter. Nothing happens. I see the same results.
  • However, if I change an option, copy my search term, close the search field by clicking the X, then click again inside it, paste the search term and press Enter, I get different search results. This is a very inconsistent behavior from a UI standpoint. This is of course another problem I’ve just discovered which is unrelated to the original post.

So to answer your question properly now, “Ignore Diacritics” doesn’t change anything. “Fuzzy” gives 5 results when on (of which only 2 contain the actual word). When off, it gives 1 result only (one of the 2 which I just described). This is the result regardless of the “Ignore Diacritics” setting.

As to the other 3 results that I get, I checked filenames, metadata, group name and couldn’t find any occurrence of “cheminée” or “cheminee”. I looked at the “Cloud” pane and found that one of the documents had the word “Chemins” and another one the word “Chemin” which should not count as a search hit in any case, diacritics or not. The third didn’t have any word in the cloud resembling “cheminée”.

To sum up, “Fuzzy” doesn’t seem to be helpful here because it gives 3 false positives, while turning it off gives only one result when there are clearly two. I’ve checked the PDFs to see whether the actual OCR text is bad, but it’s not: copying and pasting it into a text editor shows “cheminée” clearly.

The other issue I mentioned in my original post is unrelated, I think (entering “cheminée” in the search box in the right pane doesn’t show occurrences, but “chemine” does — although it actually highlights “cheminée” as explained).

It might be a good idea — once this is fixed — to show fuzzy results in the right pane when using that option, otherwise the user doesn’t understand why a document was listed in the search results in the first place.

I hope it helps. I’ve tried to be as detailed as possible and I hope this is clear.

Thanks.

Please ZIP some example PDFs (ones that are being found whether or not they seem to match) and attach them to a support ticket. Thanks.

I’m sorry, but I am not at liberty to do so: these are documents from French archives which I’m not supposed to pass around. I could try to recreate an equivalent scenario from scratch with some text files when I have the time.

Any ideas on the non-document-specific issues I mentioned (the search bar behavior, the highlighting of occurrences)?

Thank you.

TestDB.dtBase2.zip (29.0 KB)

I’ve created this test database, which tries to mimic the behavior above using plain text files. Unfortunately, it doesn’t reproduce it: the occurrences are shown on the right as expected. I suppose it has to do with the handling of PDF files but, as I explained, I can’t share those specific ones.

As far as the “search occurrences” problem goes, the difference in behavior with my test database is related to file size. In my test database, which only contains small text files, the occurrences appear instantly. With larger PDFs this is not the case, as I already mentioned three months ago. May I suggest at least adding a spinning wheel or some other interface element that appears while the occurrences are being populated (something both Preview and PDF Expert have)? Otherwise DT gives the impression that it isn’t doing anything.

Any feedback on the search bar behavior issue described above?

Thanks.

The next release will include a progress indicator.

Thank you @cgrunenberg, this is great news.

Please let me sum up the UI issue with the search bar if you haven’t read the entire original post:

  • Enter a search term. Press Enter.
  • Change any of the “Ignore Diacritics” or “Fuzzy” options.
  • Click in the search field and press Enter. Nothing happens — you get the same results.

However, if you do this:

  • Enter a search term. Press Enter.
  • Change an option.
  • Click in the search field. Press Command+C.
  • Close the search field by clicking the X.
  • Click inside it again.
  • Command+V. Press Enter.

You get the new search results which now reflect the modified options.

This is very inconsistent behavior IMHO, and it caused much confusion when @BLUEFROG asked me to try out some settings earlier.

I hope this helps.

These options apply only to the toolbar search; the search field of the Search inspector neither uses nor supports them.

I was referring to the toolbar search :slight_smile:

Too many search fields :slight_smile: And which results aren’t updated after changing these options? The database results or the ones in the Search inspector?

The database results. I’m sorry if this is confusing when I sum it up. I believe it’s clearer if you read the entire thread, but it’s a bit long…

TL;DR The search bar issue I’ve just described with two bulleted lists has nothing to do with the Search Inspector. Both issues got connected in this thread when @BLUEFROG asked me to do some tests. I hope it’s clearer now :slight_smile:

Now I got it :slight_smile: The next release will automatically update the results after changing the options.

Two pieces of great news in one morning! As always, I appreciate your support very much.