I just solved an interesting problem. I mostly upload OCRed PDFs into DEVONthink 3. Yesterday I ran a search and noticed that some documents were missing from the results, even though I knew they contained the search terms.
I had no explanation.
I rebuilt the database - no luck.
Then I copied some text out of the PDFs that didn't come up in the search, pasted it, and got… gobbledygook. Instead of the words I had copied, I got completely mangled groups of letters.
I first thought DT3 had done something to these documents on import. The words were OCRed, selectable, perfectly clear and readable, but somehow selecting, copying and pasting was not doing what it should.
I retraced my process and found that the source files were OK.
BUT I run every PDF through a compression program (PDFSqueezer) to reduce its size before putting it in DT3.
Apparently, some PDFs - not all - come out with a corrupted text layer. The text still looks readable, but when you copy and paste it, you get… well… nonsensical garbage.
Of course, these garbled words will not show up in searches.
I put the uncompressed file in DT3 and - voila - it came up in a search.
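So this is NOT a problem with DT3. But if you use PDF compression programs, be aware that you should test the result: copy some text out of the compressed file, paste it somewhere, and check that it is still the words you see on the page.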
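If you compress a lot of files, that check could probably be automated. Here is a rough sketch, not something I have run against PDFSqueezer output: it assumes Poppler's pdftotext command-line tool is installed, and the function names and the 0.9 threshold are my own made-up choices. It pulls the text out of the original and the compressed PDF and checks how many of the original's words survive.

```python
import re
import subprocess


def extract_text(pdf_path):
    """Return the text layer of a PDF via Poppler's pdftotext (assumed installed)."""
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],  # "-" sends the extracted text to stdout
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def _words(text):
    """Distinct lowercase ASCII words of 3+ letters (a crude tokenizer)."""
    return set(re.findall(r"[a-z]{3,}", text.lower()))


def word_overlap(original_text, compressed_text):
    """Fraction of the original's distinct words also found in the compressed copy."""
    original_words = _words(original_text)
    if not original_words:
        return 1.0  # nothing to compare, e.g. the original has no text layer
    return len(original_words & _words(compressed_text)) / len(original_words)


def text_layer_survived(original_pdf, compressed_pdf, threshold=0.9):
    """True if most words of the original can still be extracted after compression.

    The threshold is an arbitrary starting point, not a tested value.
    """
    overlap = word_overlap(extract_text(original_pdf), extract_text(compressed_pdf))
    return overlap >= threshold
```

You would call something like text_layer_survived("scan.pdf", "scan_compressed.pdf") before importing, and keep the uncompressed file whenever it returns False. The tokenizer only looks at plain a-z letters, so for documents in other scripts it would need adjusting.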