Existing tags disappear when removing duplicates

I am running into a problem with duplicates.

Here is my setup:

  • I am trying to manage around 20k research documents as PDFs. My application is research: a paper can be relevant for different projects or classes.
  • I like DEVONthink 3’s ability to model hierarchical tags. My current tag hierarchy is up to 3 levels deep.
    Example:
    Current > Some course > Course topic 1
    That lists the papers relevant to “Course topic 1” of “Some course”, which is in preparation.
  • I have been using “Expression Media 2” for that purpose for the last 10+ years. It is a photo tagging application that I found years ago and repurposed for my task, yet it is no longer supported on 64-bit-only versions of macOS. I have put off upgrading from macOS Mojave 10.14 for a long time. I looked around, and DEVONthink 3 can support what I need (see above).
  • I have all my papers in one single folder (actually I do have a few subfolders, but that is not relevant for the question here). I do not want to import all my 20k papers into DEVONthink 3’s database and thus decided to just index them.
  • An alternative I looked at is called “Leap”. It is more lightweight (I don’t need most of DEVONthink 3’s functionality). However, it cannot do hierarchical tags.

Here is how I envision the transition:

  • I still have a hierarchy of, say, 100-200 tags in this other system. I chose to “index” all my 20k papers. Now I want to carry the old hierarchy over to DEVONthink 3 by selecting all the papers with, say, the tag “Course topic 1” in the Finder (or the old tool) and dragging them onto the new “Course topic 1” tag in DEVONthink 3. I hold CMD + ALT to avoid the default of importing the PDFs.
  • That creates “duplicates”. So in the DEVONthink 3 database I now have the dragged pointers both in the inbox and under the smart group “Duplicates”.
  • At that moment both duplicates (the earlier original one and the newly created one) are tagged with “Course topic 1”. So far so good (although I don’t understand DEVONthink 3’s duplicates).
  • Now the problem comes: I need to get rid of the duplicates.

What is the problem:

  • Going to the “Duplicates” smart folder, Script > Data > Move duplicates to trash
  • The duplicates disappear, and the remaining items appear to still have their tags. But that is not the case: the Finder (or another tool called Leap) shows that the original tags were also removed during the deduplication.
  • DEVONthink 3 is confused and still shows the papers under the entry
    Current > Some course > Course topic 1
  • However, clicking on any of those PDFs with preview enabled deletes the tag from the PDF. (Again, the Finder had already shown the tag as deleted; DEVONthink 3 was just confused and thought the file still had the tag, but interacting with the file in any way makes DEVONthink 3 realize the tag is gone.)

Question

  • Does DEVONthink 3 have any way to remove duplicates without removing all the tags that a file was tagged with?
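
As an illustration of the behaviour being asked for, here is a minimal Python sketch. It is a plain data model only and does not call DEVONthink; the function name `merge_duplicates` and the `(path, tags)` record shape are assumptions made for the example. It keeps one entry per file path and gives it the union of the tags found on all of its duplicates.

```python
def merge_duplicates(records):
    """Collapse duplicate records into one entry per path.

    `records` is a list of (path, tags) pairs, where `tags` is any
    iterable of tag names. The surviving entry for each path carries
    the union of the tags from all of its duplicates.
    """
    merged = {}
    for path, tags in records:
        # Create the tag set on first sight of a path, then add to it.
        merged.setdefault(path, set()).update(tags)
    return merged


# Hypothetical example: the same indexed file appears twice.
records = [
    ("papers/a.pdf", ["Course topic 1"]),
    ("papers/a.pdf", ["Course topic 2"]),
    ("papers/b.pdf", ["Course topic 1"]),
]
merged = merge_duplicates(records)
```

In this model the duplicate is dropped only after its tags have been folded into the surviving entry, so no tag is lost.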

I have been playing with DEVONthink 3 for a day or two now. If the import of my hierarchies can be solved, I think I will take the plunge. Finally…

Thanks!

Welcome @wolf

Did you index the hierarchy then index some of the same files into a tags group?
If so, why?

Thank you for your response!

Yes, I think that is what I did in DT3 nomenclature.

The reason is that I want to copy the existing nested categories (equivalent to hierarchical tags, but living inside the legacy tool that I would like to replace) into DT3. That is the legacy work I need to preserve so that I can again find the PDFs related to a given topic: a big portion of the 20k PDF files is categorized into multiple overlapping nested hierarchies (thus like nested tags).

The only way I could think of to achieve this was to:

  • recreate the hierarchy inside DT3
  • pick all PDFs that are in a category
  • drag them into the equivalent category in DT3
  • repeat for all tags

Since the same PDF can appear in multiple categories (= under multiple tags), this creates duplicates in DT3 nomenclature.
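
The intended end state can be sketched in Python as a mapping from each file to all of its tags, rather than one copy of the file per tag. This is only a model of the data, not DEVONthink's API; the function name `invert_categories` and the sample paths are hypothetical.

```python
from collections import defaultdict


def invert_categories(category_to_files):
    """Turn {tag path: [files]} into {file: sorted list of tag paths}."""
    tags_by_file = defaultdict(set)
    for tag_path, files in category_to_files.items():
        for file_path in files:
            # A file listed under several categories collects several tags
            # but still appears only once as a key.
            tags_by_file[file_path].add(tag_path)
    return {f: sorted(tags) for f, tags in tags_by_file.items()}


# Hypothetical legacy categories; b.pdf belongs to two of them.
legacy = {
    "Current/Some course/Course topic 1": ["a.pdf", "b.pdf"],
    "Current/Some course/Course topic 2": ["b.pdf"],
}
tags_by_file = invert_categories(legacy)
```

Each file ends up as a single entry carrying several tags, which is the result the drag-per-category procedure is trying to reach.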

Do you see any way to either achieve the same goal (replicating the tag hierarchy) in a different way, or to remove duplicates after the fact without deleting the associated tags?

Thanks!

BLUEFROG, do you have any thoughts on how else to achieve the transfer?

Or how to deduplicate without losing tags?

Thanks!

Actually, the script just moves the duplicates to the trash; this shouldn’t affect the remaining items at all. A screenshot of the tags before/after using this script would be nice, thanks.

Having multiple tags on a file does not duplicate it.

Dragging the same file from the Finder onto multiple tags will create duplicates, but that is also not the way tags should be assigned when indexing or importing from the Finder.

I would suggest you delete the indexed group in DEVONthink and empty the Trash. If this is the indexed parent group - and it should be - you should be prompted to only remove the references in the database.

Then index the parent folder again but do not drag it to a tag group. Index it to the database. After the files have been indexed, you can drag the files in the database to a tag group to tag them.