I still have a problem where DT3 does not recognize duplicates. I’ll OCR two copies of the exact same document and get different word counts, or sometimes they have the exact same word count and still don’t register as duplicates. Does the word count include metadata? Right now that’s my theory on why it doesn’t work (say, one doc includes the author’s name in the metadata but another doesn’t). If it does include metadata, is there a way to turn that off? And what about documents that were in DT2 and later added to DT3: would there be a difference between them that makes DT3 less likely to catch the duplicates?

I exported 10k DT3 documents and ran them through another program that catches duplicates. It found 3k duplicates that DT3 hadn’t found. Wondering how to address this. (No, stricter recognition of duplicates is not checked on my computer.)

No, the metadata doesn’t matter; only the text of the documents does. Are the plain-text conversions of both documents actually identical?
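A rough way to check this outside DT3, assuming both OCR results have been converted to plain-text files. This is only a sketch: the function names `compare_texts` and `compare_files` and the file names in the usage note are made up for illustration, and the whitespace normalization is an assumption for the sake of the comparison, not a description of how DT3 itself decides what counts as a duplicate.

```python
import difflib
from pathlib import Path


def normalize(text: str) -> str:
    """Collapse every run of whitespace to a single space."""
    return " ".join(text.split())


def compare_texts(a: str, b: str) -> dict:
    """Compare two plain-text strings (hypothetical helper, not a DT3 API)."""
    words_a, words_b = a.split(), b.split()
    # Word-level similarity in [0, 1]; 1.0 means the word sequences match.
    matcher = difflib.SequenceMatcher(None, words_a, words_b, autojunk=False)
    return {
        "identical": a == b,
        "identical_normalized": normalize(a) == normalize(b),
        "word_counts": (len(words_a), len(words_b)),
        "word_similarity": matcher.ratio(),
    }


def compare_files(path_a, path_b) -> dict:
    """Read two text files (assumed UTF-8) and compare their contents."""
    text_a = Path(path_a).read_text(encoding="utf-8")
    text_b = Path(path_b).read_text(encoding="utf-8")
    return compare_texts(text_a, text_b)
```

For example, `compare_files("copy1.txt", "copy2.txt")` on two exported text files. If `word_counts` match but `identical_normalized` is false, the two OCR passes recognized some words differently even though the totals happen to agree.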

Are you referring to copies of the same original scan or scanning the same document twice?