How to find EXACT DUPLICATES and MERGE with pre-review?

As near as I can tell, searching for “instance is duplicate” means “find anything that may be somewhat related in some way”, and then throw the list up in a window. Or, by analogy: when I am trying to find two needles in a haystack, the search is the same as asking DTP to … “Show me the barn, every cow, all the strands of straw, and the field nearby too”.

How do folks handle the desire to find sets that show only EXACT DUPLICATES using DTP? Is this something that DTP is not designed to do (e.g. do I have to use some other app)? Is this something that requires extra “tricks” (e.g. do I have to use an AppleScript)? Is this something that I will be told I would find if I would just RTFM (and I have bought and read the Getting Started with DT2 book already to no real avail)?

(Why does the word “duplicate” in DTP not mean what it means in the real world: the same; a copy? Did we forget that “similar” is fine when it comes with adverbs but “duplicate” is easier to understand when it stands by itself? Is something ever “less duplicate” than a “duplicate”?)

My dream is to have this work akin to what is done in Papers or Mendeley. Duplicates are just that (OK, Papers has some comparable glitches to DTP but they are not nearly as maddening). Also, as I go to MERGE duplicates in either of those two apps, I can review and police for the differences in the documents that are to be merged. Is this level of pre-review also impossible in DTP?


JJW

An optionally stricter duplicate recognition is planned for future releases but you have to use external apps (e.g. FileMerge) to compare duplicates.
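For a rough pre-review outside DEVONthink, the comparison can also be scripted. The following is only a minimal sketch in Python and not a DEVONthink feature; the function names `is_exact_copy` and `text_diff` are made up for illustration, and the diff part assumes the files are plain text (PDFs would need their text extracted first).

```python
import difflib
import filecmp
from pathlib import Path


def is_exact_copy(a, b):
    """Return True only if both files have identical bytes."""
    # shallow=False forces a content comparison instead of
    # trusting size and modification time alone.
    return filecmp.cmp(a, b, shallow=False)


def text_diff(a, b):
    """Return unified-diff lines for two plain-text files (empty if equal)."""
    a_lines = Path(a).read_text(encoding="utf-8", errors="replace").splitlines()
    b_lines = Path(b).read_text(encoding="utf-8", errors="replace").splitlines()
    return list(
        difflib.unified_diff(
            a_lines, b_lines, fromfile=str(a), tofile=str(b), lineterm=""
        )
    )
```

If `is_exact_copy` returns False, the lines from `text_diff` show what would have to be reconciled before merging.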

Thank you. I look forward to the stricter duplicate recognition. I might add a few remaining points of reference.

  • I would benefit by having a way to associate duplicates with each other visually in groups. As it is now, the list of duplicates (list of similar files) is shown with no immediately clear rhyme or reason behind the order of files next to each other. I refer again to how Mendeley lists duplicates in groups. One can readily determine from the group what files it believes belong to what “duplicate set”.

  • In databases that contain Indexed files especially, I would benefit by the ability to merge the one index to point to all instances and to replace (eliminate future listings of) the multiple instances. As it is now, when I merge, I get a new set and the previous duplicates remain. I hope this plays off what I understand should be the differences between a file merge at the OS level (e.g. via FileMerge) versus an index merge that could be incorporated into DT.

--> Possible Bug Report: When I merge a set of documents for the first time using the context menu, nothing happens. When I merge the set again, I end up with TWO sets of “2 merged documents”.

macOS 10.12.6 with DTP 2.9.17 on a MBP showing DTP window on a second monitor in full screen mode


JJW

The optional script “Dupes to replicants” does this (see menu Scripts > More Scripts…)

What kind of documents did you merge and which view did you use?

I appreciate the pointer and will try this. I’d propose this approach might be better appreciated as a built-in option during merge (again as per Mendeley).

Duplicate PDFs viewed in the three pane view.


JJW

Are you able to reproduce this? Over here it’s working as expected.

See this video.

dropbox.com/s/siaobe4mcd18u … e.mp4?dl=0


JJW

The video shows that you are working from a duplicates smart group. The merged documents will only appear in that smart group as duplicates after you have merged them a second time.

Ha! Silly me. OK. That makes sense. …

Although it does not in another sense seem “proper”, nor is it an intuitive design (as proven by my “bang my head against the wall a few dozen times” experience). My reading is as follows: I do something in a place where that thing should be logical (merge duplicates in a list of duplicates). I get no indication that what I attempted has actually done anything. What??? Let me try that again. OK, now I get twice the indication that something has happened. What???

OK, so I have to take EXTRA steps to merge duplicates beyond when I see them in a duplicate list. Hmmmm …???

I hope this provides rationale to consider the current approach, while logically EXACT in nature, as being un-intuitive at best and counter-productive otherwise.


JJW

Or you could switch to a different view. The Three Panes document pane will display only the files that are in the current selection of group(s) in the groups pane. Since in your example you have a duplicates smart group selected, the contents of that group are the only documents that will appear in the files pane.

Try switching to the Split view (I also use the Widescreen option in Split view) and repeat your merge process. The newly created merged document will not only be visible (created in the root of the database), DEVONthink will actually select it in the list of groups and documents in the left panel and preview it in the right panel.

You could also restore the default Recently Added smart group to the Sidebar, and you would get visual confirmation of the merge when the badge count increments by 1. An additional benefit of this is that you could select it in the Sidebar to preview the merge without manually navigating to the merged document in the files list.

Yep. I might try these options. There is enough flexibility it seems to hang yourself endlessly on the one option that does NOT work as one plans. :confused:

As a conclusion here, having other options that do work in a reasonable manner is NOT the same as having an excuse for why an option that can have a confusing outcome has been designed to allow such an outcome at all. I hope this issue might be addressed in a future release.


JJW

In retrospect, when a document is marked as a duplicate even though its filename has been changed, why is a merged document not marked a duplicate of the originals that remain behind?

Ouch. Ouch! Ouch!!! My head hurts.


JJW

The results returned by the smart group search should be consistent with the criteria specified in the smart group. You created a document that doesn’t meet the criteria of the smart group you are working with, yet you are surprised when the smart group returns the results that you specified? As an FYI, you would have seen the results that you expect using the Three Panes view had you merged documents in a regular group as opposed to a smart group.

Seriously? Come on now, you are an academic, correct? A=A and B=B, but A+B≭A or B.

But of course in the case of DT, duplicate means A ~ B or possibly A = A or possibly A >~ B or possibly B = A + \delta A or possibly … After conceding this approach, all bets are off for further arguments in my book. Math is not fuzzy one time and exact the next, whereas “duplicate” is supposed to be???

Ouch!


JJW

The filename is not a consideration in duplicate detection. It is a content / contextual match.
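For contrast, a strict, byte-exact notion of “duplicate” that also ignores filenames can be sketched outside DEVONthink. This is not how DEVONthink detects duplicates; it is a minimal Python illustration, assuming the documents are ordinary files in a folder (for example an indexed folder). The names `sha256_of` and `exact_duplicate_sets` are hypothetical. Files are grouped by a SHA-256 hash of their bytes, so each returned list is one “duplicate set”.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file's bytes in chunks so large files are not loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def exact_duplicate_sets(folder):
    """Return lists of paths whose contents are byte-for-byte identical."""
    by_hash = defaultdict(list)
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            # The filename plays no role; only the content hash is the key.
            by_hash[sha256_of(path)].append(path)
    # Keep only hashes shared by two or more files.
    return [paths for paths in by_hash.values() if len(paths) > 1]
```

Note that this is stricter than a content match: two PDFs with the same text but different embedded metadata would land in different sets.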