De-duping

LDunville · October 23, 2018, 3:24am

I have a reference library of about 5000 engineering specifications and articles. These documents were either scanned and OCR’d or downloaded from the web. Files often have different names. DT frequently detects the duplicates, but often it does not.

Is there any way to tell DT that two files are in fact duplicates? Any suggestions as to how to handle this issue?

Thanks
Larry

cgrunenberg · October 23, 2018, 7:14am

No, this isn’t possible. Duplicates are automatically detected on the fly by using the indexed data and other metadata. A future release will include an option for a stricter duplicate recognition.

BLUEFROG · October 23, 2018, 2:07pm

See Also should show if there’s a tight relationship between the files, even if they’re not marked as duplicates. The score would likely be a full green bar.
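As a rough illustration of how a similarity score can be high for files that are not exact duplicates, here is a minimal sketch using Jaccard overlap of word sets. This is not DT's algorithm, which is not described in this thread; the function names `word_set` and `jaccard` are made up for the example.

```python
import re


def word_set(text):
    """Return the set of lowercase alphanumeric words in text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Share of distinct words the two texts have in common (0.0 to 1.0)."""
    wa, wb = word_set(a), word_set(b)
    if not wa and not wb:
        return 1.0  # two empty texts are treated as identical
    return len(wa & wb) / len(wa | wb)
```

With a measure like this, two documents that share most of their vocabulary score close to 1.0 even when the files themselves differ, which is why a high score is a hint to compare the files rather than proof of a duplicate.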

LDunville · October 23, 2018, 5:08pm

If “See Also” yields some files that are actually dups, what action can I take?

Thanks
Larry

BLUEFROG · October 23, 2018, 5:50pm

This blog post covers dealing with duplicates: blog.devontechnologies.com/2014 … evonthink/

You can also right-click items in See Also and use the various context menu items.

russelmontgomery · December 29, 2018, 2:23pm

Hi

I have a related issue that I would like some feedback on.

I have a few years’ worth of bank statements stored in a database. Bank statements are all very similar in layout. DT is labelling files as duplicates that are not. I think it is because they are so similar.

The issue is that the genuine duplicates are lurking among the non-duplicates that DT is falsely labelling as duplicates, and they are too hard to find.

Will a future release of DT provide users with a way to more tightly control how DT defines duplicates? Or is there something in the current release that I have not noticed?

Thanks for any help you can provide.

BLUEFROG · December 29, 2018, 4:50pm

A future version will have a stricter duplicate detection option. Thanks for your patience and understanding.
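One way to approximate a strict check outside DT is to group files by a hash of their bytes. The sketch below is only an illustration and assumes the documents are reachable as ordinary files in a folder (for example, after exporting them); the names `file_digest` and `exact_duplicates` are made up for the example and are not part of DT.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def exact_duplicates(folder):
    """Group files under folder whose contents are byte-for-byte identical."""
    groups = defaultdict(list)
    for p in sorted(Path(folder).rglob("*")):
        if p.is_file():
            groups[file_digest(p)].append(p)
    # Keep only digests shared by more than one file.
    return [paths for paths in groups.values() if len(paths) > 1]
```

Calling `exact_duplicates("/path/to/exported/statements")` (a hypothetical path) returns a list of groups, each holding the paths of files with identical contents. Note that this only finds true copies: a scan and a web download of the same document have different bytes and will not be grouped.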

russelmontgomery · December 30, 2018, 6:24am

Thank you for your answer.