Manually forcing "Duplicate files" designator

I periodically have files that are clearly duplicates, but DT does not mark them as such. Can I manually force the dup designation?


Why are they “clearly duplicates”?

Note that duplicates are shown either in blue type or with two side-by-side rectangles, depending on DEVONthink’s Preferences > General > Mark duplicates and replicants in color.

They are dups because they have the exact same file content with a different file name.

I’m a consulting engineer and frequently get the same file submitted to me by half a dozen different people. The file originated as an email, Word file, ACAD drawing, or Excel file. It has normally been converted to a PDF and usually OCR’d. Each person has frequently assigned a different name during the conversion to PDF. As a result, I’m plagued with a bunch of files with duplicate content but different names. DT recognizes these as duplicates about 60 to 80% of the time.

I then convert the dups to replicants so that when I append annotations to the PDF, the copies in each of the folders get the updated annotations. The dups that DT does not catch end up being a logistical nightmare for me.

Therefore, a way to manually handle these “orphans” would be a real time saver.

The catch is that duplicate recognition is based on a similarity score, and digital duplicates are usually 100% identical.

The way you receive your files, though, suggests that each file is a scan made by a different person.

While the source (paper) document is the same, the files have been created very differently:

  • Different scanning hardware
  • Different “Created by” metadata
  • Different scanning software
  • Different OCR recognition software and quality
  • Other problems such as lighting conditions, bent paper etc.

So in the end, each file is quite different at the data level, and DEVONthink will inevitably fail that test sometimes, even with some tolerance built into its duplicate recognition.
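To illustrate the general idea (not DEVONthink’s actual algorithm, which isn’t described in this thread), here is a minimal Python sketch comparing two approaches: an exact byte hash, which fails on even a one-character OCR difference, and a tolerant text-similarity score using the standard library’s `difflib.SequenceMatcher`. The sample OCR strings and the 0.9 cutoff are hypothetical, chosen only to show how a lightly garbled scan can still pass a similarity threshold while a heavily garbled one falls below it.

```python
import hashlib
from difflib import SequenceMatcher


def sha256_of(data: bytes) -> str:
    """Hex digest of raw bytes; equal only for byte-identical input."""
    return hashlib.sha256(data).hexdigest()


def text_similarity(a: str, b: str) -> float:
    """Ratio in [0, 1] of matching characters between two extracted texts."""
    return SequenceMatcher(None, a, b).ratio()


# Hypothetical OCR output of the same paper page from three different scans.
ocr_a = "Beam B-12 shall be W12x26, fy = 50 ksi."
ocr_b = "Bearn B-12 shal1 be W12x26, fy = 50 ksi."  # light OCR errors
ocr_c = "8eam 8-I2 sha11 he Wl2x2G, fv - S0 ksl."  # heavy OCR errors

THRESHOLD = 0.9  # hypothetical cutoff, NOT DEVONthink's real value

same_bytes = sha256_of(ocr_a.encode()) == sha256_of(ocr_b.encode())
score_b = text_similarity(ocr_a, ocr_b)
score_c = text_similarity(ocr_a, ocr_c)

print(same_bytes)                                 # False
print(round(score_b, 3), score_b >= THRESHOLD)    # 0.937 True
print(round(score_c, 3), score_c >= THRESHOLD)
```

The exact hash treats the first two scans as unrelated files, while the similarity score tolerates small errors; once enough characters are misread, the score drops below the cutoff and the pair is no longer flagged.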

I would suggest creating a tag for dupes so you can find and filter them accordingly. It is not the same as marking them as duplicates, but pretty close.


I think your assessment is exactly correct. I have toyed with the idea of either tagging them or copying the known dup file into the secondary folder. Unfortunately, this can be a tedious process. Plus, if the first file is called, say, “Ford100.pdf” and the second identical-content file is called “Chevy100.pdf”, and I renamed them both to “Toyota100.pdf”, neither party would know what I was talking about when I told them about the changes I made to “Toyota100.pdf”.

Again, if I could manually force the “duplicate” designator, I could have my cake and eat it too!