BE AWARE! PDF search in DEVONthink can miss documents when PDFs are compressed!

I just solved an interesting problem. I mostly import OCRed PDFs into DEVONthink 3 (DT3). Yesterday I ran a search and noticed that some documents were missing from the results, even though I knew they contained the search terms.
I had no explanation.
I rebuilt the database - no luck.
Then I copied some text out of the PDFs that didn’t come up in the search, pasted it, and got… gobbledygook. Instead of the words I had copied, I got completely mangled groups of letters.
I first thought DT3 had done something to these documents on import. The words were OCRed, selectable, perfectly clear and readable, but somehow selecting, copying and pasting was not doing what it should.
I retraced my process step by step and found that the source files were OK.
BUT I run every PDF through a compression program (PDFSqueezer) to reduce its size before putting it in DT3.
Apparently, some PDFs (not all) come out with corrupted text. The text is still readable on screen, but when you copy and paste it, you get… well… nonsensical garbage.
Of course, these garbled words will not show up in searches.
I put the uncompressed file in DT3 and - voila - it came up in a search.

So this is NOT a problem with DT3, but if you use PDF compression programs, you should test the result: copy some text out of the compressed file and check that it pastes as real words.
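
For anyone who wants to automate that check, here is a rough Python sketch. It is not what I actually did (I just copied and pasted by hand). It assumes Poppler's pdftotext command-line tool is installed and on your PATH, and the function names and the 0.9 threshold are made up for illustration. It pulls the text out of the original and the compressed PDF and reports what fraction of the original's words survived.

```python
import re
import subprocess
from collections import Counter


def extract_text(pdf_path: str) -> str:
    """Return the text of a PDF via Poppler's `pdftotext` CLI.

    Assumption: `pdftotext` is installed and on PATH.
    Passing "-" as the output file sends the text to stdout.
    """
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def words(text: str) -> Counter:
    """Count lowercase word tokens made of three or more letters."""
    return Counter(re.findall(r"[^\W\d_]{3,}", text.lower()))


def surviving_fraction(original_text: str, compressed_text: str) -> float:
    """Fraction of the original's words still present after compression."""
    before = words(original_text)
    after = words(compressed_text)
    total = sum(before.values())
    if total == 0:
        return 1.0  # the original had no extractable words, so nothing was lost
    kept = sum(min(count, after[word]) for word, count in before.items())
    return kept / total


def looks_intact(original_pdf: str, compressed_pdf: str,
                 threshold: float = 0.9) -> bool:
    """True if the compressed PDF keeps at least `threshold` of the words.

    The 0.9 default is an arbitrary illustrative value, not a tested one.
    """
    fraction = surviving_fraction(
        extract_text(original_pdf), extract_text(compressed_pdf)
    )
    return fraction >= threshold
```

Calling looks_intact("scan.pdf", "scan-compressed.pdf") (hypothetical file names) before importing should flag the garbled files, since mangled letter groups will not match the original words. A file that fails the check can go into DT3 uncompressed instead.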

That also happens with DRM-protected PDFs and with PDFs generated by some tools, such as certain virtual PDF printers on Windows.

And it is not specific to macOS. Even pure Linux software such as Okular (which can be set to ignore DRM restrictions) has that issue.

Sometimes the only way to resolve it is to pass the PDF through a “dumb” OCR tool like Cisdem PDF, which “flattens” all the PDF stuff, and then through another, “intelligent” OCR tool like Abbyy Fine Reader Pro with MRC enabled.

BTW, MRC completely fails to show anything in Bug (Big) Sur.