Thanks Eric.
Yes, reindexing finished. I just did it again and saw it counting down to zero from 8000+ items.
Yesterday, I also deleted the app and reinstalled it. I chose CloudKit sync this time, because the Mac had finally finished syncing to it (I used iCloud legacy before). After the reinstall, the number of PDFs without text increased to 138.
After the reindexing today, it still shows 138 PDFs without text.
Text from the PDFs is found on Mac in DT with a standard search.
The documents were added to DT on Mac on 2020-10-01, 2020-12-09 and 2020-12-06.
Further tests in DTTG reveal some reproducible behavior.
Text in one of the listed PDFs can be selected and copied. Pasting that copied text into the search field within the PDF returns the text and its page as a result.
However, with the PDF not opened in DTTG, pasting that same copied text into the global search field and pressing enter returns no results. Selecting a shorter phrase from the document, such as a name or just 3-4 words instead of 1-2 sentences, finds the PDF.
This revealed that there might be an unrelated/additional issue in DTTG with the global search, especially when punctuation is part of the search string.
For example, searching the “Hypothesis” PDF I sent you for “Although the mechanisms involved in memory are still debated, they seem” reveals zero results from the global search, but inside the PDF, this phrase is found.
Now, when searching for “Although the mechanisms involved in memory are still debated they seem” or “Although the mechanisms involved in memory are still debated* they seem” (with an asterisk in place of the comma), the PDF is found via global search, indicating an issue with punctuation.
The same thing happens when searching for “University of Debrecen, Hungary” (no results) and “University of Debrecen* Hungary” (PDF is found).
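In case it helps with reproducing this, the manual workaround can be sketched as a small script. This only illustrates what I did by hand to the copied text before pasting it into the global search; `soften_punctuation` is a made-up helper name, and it assumes nothing about how DTTG's search handles punctuation internally.

```python
import string


def soften_punctuation(phrase: str, placeholder: str = "*") -> str:
    """Replace every ASCII punctuation character in a copied phrase.

    Hypothetical helper: it mirrors the manual workaround of swapping
    the comma for an asterisk (or dropping it with placeholder="").
    """
    table = str.maketrans({ch: placeholder for ch in string.punctuation})
    return phrase.translate(table)


# The two variants that found the PDF in my tests:
print(soften_punctuation("University of Debrecen, Hungary"))
print(soften_punctuation("University of Debrecen, Hungary", placeholder=""))
```

Note that this also replaces apostrophes and hyphens, which I have not tested separately in DTTG.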
Since this PDF was listed as without text in the smart group, I had to download it in DTTG first to run the tests described above. Curiously, after doing this, I noticed that the number of PDFs without text decreased. Downloading all of those files emptied the smart group, but the count displayed for it stayed the same.
Then, removing and re-adding the smart group also updated the number to 0.
Summary:
The entries in the PDFs without text smart group needed to be downloaded first for DTTG to recognize that they do in fact contain text. Since I don’t download most of my files, there might be an issue in DTTG with some documents (<2% of all my files) being incorrectly flagged as not containing text.
Testing this revealed a possible issue with global search: it fails to find text that a search within an opened PDF does find. I can reproduce this when punctuation is part of the search string.