What's Considered a Duplicate?

I made a PDF of a Word document, then brought both the PDF and the DOCX file into DEVONthink, where each is shown as a duplicate of the other. In one sense they are duplicates: when viewed, their text content is the same and their appearance is similar. However, they have different file sizes, and their MD5 hashes or CRC checksums certainly differ. On what basis are they shown as duplicates?

File size and checksums are not considered. The content is the same, therefore they’re considered duplicates.
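
To make the distinction concrete, here is a minimal Python sketch contrasting a byte-level checksum with a comparison of text content. It is only an illustration of the general idea, not DEVONthink's actual algorithm, which isn't described here. The function names, the whitespace and case normalization, and the sample bytes and text are all assumptions made up for the example.

```python
import hashlib


def file_checksum(data: bytes) -> str:
    """Checksum of the raw bytes; differs whenever the file format differs."""
    return hashlib.md5(data).hexdigest()


def content_fingerprint(text: str) -> str:
    """Fingerprint of the text only (hypothetical normalization).

    Lowercases the text and collapses all whitespace, so line breaks and
    spacing introduced by a format conversion do not matter.
    """
    words = text.lower().split()
    return hashlib.sha256(" ".join(words).encode("utf-8")).hexdigest()


def are_duplicates_by_content(text_a: str, text_b: str) -> bool:
    """Two documents count as duplicates if their normalized text matches."""
    return content_fingerprint(text_a) == content_fingerprint(text_b)


# Made-up stand-ins for the raw bytes of a DOCX and a PDF of the same document.
docx_bytes = b"PK\x03\x04 (word container) Quarterly report"
pdf_bytes = b"%PDF-1.7 (pdf container) Quarterly report"

# Made-up stand-ins for the text extracted from each file.
docx_text = "Quarterly report:\nsales rose in March."
pdf_text = "Quarterly  report: sales rose\nin March."

print(file_checksum(docx_bytes) == file_checksum(pdf_bytes))  # byte-level comparison
print(are_duplicates_by_content(docx_text, pdf_text))         # content comparison
```

In this sketch the two files fail the byte-level comparison but match on content, which mirrors the behavior described above: the same document in two different formats is treated as a duplicate.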