Broken words at the end of the line

PDF-generating software puts a hyphen at the end of a line when a word has to be broken in two. The search algorithm in DT seems to treat the two halves of such a word as separate words.

Here, I have attached a sample page.

sampe_page.pdf (28.8 KB)
Download the file, import it into DEVONthink and search for the word “possessives”.

  • How many occurrences does DT recognize?
  • It gives me none.

But if I search for the same word in Adobe Acrobat, it finds 1 occurrence, because Acrobat is intelligent enough to remove the hyphen and the line break (it recognizes the word as one that was only broken for layout reasons).

I think DT needs to do the same. The results of co-occurrence analysis and of many other features would then be different (more accurate).
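
To make the request concrete, here is a minimal Python sketch of the kind of de-hyphenation I mean. It is only an illustration: I don't know how Acrobat actually does it, the function names `dehyphenate` and `count_occurrences` are made up for this example, and the sample sentence is invented rather than taken from the attached PDF. It also has an obvious weakness: a genuinely hyphenated compound that happens to break at its hyphen (e.g. "well-" / "known") would be joined incorrectly, so a real implementation would probably need a dictionary check.

```python
import re

# A word character, a hyphen, optional spaces, a line break, optional
# indentation, then another word character: the typical footprint of a
# word that was broken at the end of a line.
_BROKEN_WORD = re.compile(r"(\w)-[ \t]*\r?\n[ \t]*(\w)")


def dehyphenate(text: str) -> str:
    """Join words that were split by a hyphen at the end of a line."""
    return _BROKEN_WORD.sub(r"\1\2", text)


def count_occurrences(text: str, word: str) -> int:
    """Count whole-word, case-insensitive matches of `word` in `text`."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return len(pattern.findall(text))


# Invented sample text; the attached PDF is not reproduced here.
page = "The rules for forming posses-\nsives differ between languages."

print(count_occurrences(page, "possessives"))               # prints 0
print(count_occurrences(dehyphenate(page), "possessives"))  # prints 1
```

Searching the raw text misses the word, while searching the de-hyphenated text finds it once, which matches the behaviour I see in Acrobat.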

DEVONthink uses the PDFKit framework of macOS to index PDF documents. Unfortunately, PDFKit neither handles this on its own nor provides sufficient information for DEVONthink to handle it.