Broken words at the end of the line

PDF-generating software puts a hyphen at the end of a line if a word needs to be broken in two. The search algorithm in DT seems to treat the two halves of such a broken word as separate words.

Here, I have attached a sample page.

sampe_page.pdf (28.8 KB)
Download the file, import it into DEVONthink, and search for the word “possessives”.

  • How many occurrences does DT recognize?
  • It gives me none.

But if I search for the same word in Adobe Acrobat, it finds 1 occurrence, because Acrobat is intelligent enough to remove those line breaks (it recognizes the word as one word, broken only for layout reasons).
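To make the idea concrete, here is a minimal Python sketch of that kind of de-hyphenation applied to already extracted page text. It is only an illustration of the technique, not how Acrobat or DEVONthink actually work; the function name `dehyphenate` and the sample string are made up for this example.

```python
import re

# A hyphen (or soft hyphen) directly after a word character, then a line
# break, then the rest of the word at the start of the next line.
_LINE_END_HYPHEN = re.compile(r"(\w)[-\u00ad]\r?\n\s*(\w)")


def dehyphenate(text: str) -> str:
    """Join words that were split by a hyphen at the end of a line.

    Hypothetical helper for illustration only.
    """
    return _LINE_END_HYPHEN.sub(r"\1\2", text)


# Made-up sample text with a word broken across two lines.
page = "the use of posses-\nsives in English"

print("possessives" in page)               # False
print("possessives" in dehyphenate(page))  # True
```

One caveat: this naive rule also joins genuine compounds that happen to break at their hyphen (“well-” / “known” becomes “wellknown”), so a real implementation would need a dictionary check or similar.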

I think DT needs to do the same. The results of co-occurrence and many other features would then be different (more accurate).

DEVONthink uses the PDFKit framework of macOS to index PDF documents. Unfortunately, PDFKit neither handles this on its own nor provides sufficient information.

I encountered the same issue. Words that are hyphenated at a line break cannot be found. However, searching for the same keywords in the same PDF in PDF Expert works well.

Considering the numerous bugs in Apple’s PDFKit, I really hope you will consider switching to another PDF framework.

Considering the numerous bugs in Apple’s PDFKit, I really hope you will consider switching to another PDF framework.

Considering it is one thing; implementation is another.
This is not some trivial thing to do, or we would have done it long ago.

I understand the difficulty. But it’s really causing problems lately.

I tried DT, PDF Expert, and Adobe Acrobat, and Adobe Acrobat is the one that found all the results, including the ones broken across lines.

I truly hope DT can solve this, because there are cases in which I missed important results because of it, which forced me to search again in other PDF readers.

Also, instead of switching the whole framework, if there is any workaround I can implement, please let me know!
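For what it is worth, one workaround of the sort I have in mind would run outside DT, on text already exported from the PDF: build a search pattern that tolerates a line-end hyphen anywhere inside the term. Below is a rough Python sketch; `tolerant_pattern` is a made-up name, the sample text is invented, and nothing here uses DEVONthink or PDFKit.

```python
import re


def tolerant_pattern(term: str) -> re.Pattern:
    """Build a regex that matches `term` even when a line-end hyphen
    splits it at any position. Hypothetical helper for illustration.
    """
    # Optional gap: hyphen or soft hyphen, then a line break with
    # optional surrounding whitespace.
    gap = r"(?:[-\u00ad]\s*\n\s*)?"
    return re.compile(gap.join(re.escape(ch) for ch in term), re.IGNORECASE)


# Made-up sample: text exported from a PDF page.
text = "rules for posses-\nsives and plurals"

match = tolerant_pattern("possessives").search(text)
print(match is not None)
```

Unlike rewriting the text itself, this leaves the exported text untouched and only loosens the query, so genuine hyphenated compounds in the document are not altered.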

Welcome @DaDa

This isn’t a matter of DT solving this. This is an issue with Apple’s framework, not something we developed ourselves.

PS: Acrobat is made by Adobe, which invented the PDF format and specification in the first place. It can do many things no other PDF application can do. And PDF Expert doesn’t use PDFKit; it uses its own framework.