Duplicates appearing?

We might want to continue this discussion directly until we have a solution to share

  • Are these files you’re actively editing?
  • What file types?

Now this is interesting. I’ve been using a smart group to keep an eye out for new duplicates and running the “Move Duplicates to Trash” script as necessary (every few days, as I notice them). I had about five this morning but none yesterday. This is on my main Mac.

Trying to answer your questions, and perhaps to trigger an event, I just fired up the work laptop and allowed it to sync (Bonjour). Realizing that smart groups don’t sync, I gave the laptop a “Dup-detector” as well.

211! So clearing the duplicates from one database doesn’t clear them from sync’d copies. (Perhaps that is unsurprising to you.)

They are of all file types and a surprising range of modified dates. Some of these I haven’t touched since bringing them over from Evernote years ago. And yet, these represent only the smallest fraction of the number of documents that I have indexed.

A minor annoyance is now a baffling mystery.

Oh, the laptop was running 3.8 but I’ve upgraded to 3.8.6 now, so we have parity between the two Macs.

When exactly did you update the notebook from 3.8 to 3.8.6? Before or after noticing the duplicates? In addition, are all duplicates indexed?

Saw the dups on the laptop while it was still running DT 3.8. Then I upgraded to 3.8.6.

All of my documents are indexed.

On the bus to work now, and forgot to check my main Mac this morning. The status of the laptop is unchanged from yesterday.

The issue might be related to the old version. Does this still happen after ensuring that all devices/computers run the latest version of DEVONthink (To Go)?

I’m away from home for the next several hours, but I can zap the duplicates on the laptop and monitor both Macs for recurrence. Thanks.

No dups at this point. I’ll monitor and let you know.

Okay, more strangeness. This morning I checked my desktop and found no duplicates. I fired up the laptop, which I had just de-duped yesterday, and it had a new duplicate! (Which then synced via Bonjour to the desktop.)

So duplicates will sync between systems but trashing duplicates is not synced.

A final oddity: the two duplicates have slightly different names: “PSUR/PBRER” vs “PSUR-PBRER” (the former is what I originally named the file, a couple of weeks ago).

Both systems are running 3.8.6.

You should not use colons or slashes in filenames. DEVONthink converts those illegal characters behind the scenes.
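As a rough illustration of the kind of substitution described here, a minimal Python sketch follows. It is not DEVONthink's actual code: the function name `sanitize_filename` is hypothetical, and the hyphen as the replacement character is an assumption based only on the “PSUR/PBRER” vs “PSUR-PBRER” pair reported earlier in this thread.

```python
def sanitize_filename(name: str, replacement: str = "-") -> str:
    """Return `name` with slashes and colons swapped for `replacement`.

    Hypothetical helper for illustration only; the real conversion
    DEVONthink performs behind the scenes may differ.
    """
    for ch in ("/", ":"):
        name = name.replace(ch, replacement)
    return name


# The item name from this thread and the filename it might map to.
print(sanitize_filename("PSUR/PBRER"))
```

The item name shown in the database can stay as typed; only the name used on disk would need this kind of cleanup.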

Okay, that makes sense in retrospect (though I note the offending entry is still present). I’ll follow that guidance in the future.

But I cleaned the dups off of this machine yesterday, and now we’ve got a new situation.

Aha! I just used the script to trash the duplicate and it left the version with the offending character.

I wildly speculate that DEVONthink has agents patrolling the databases looking for trouble, and when they see something like an illicit character in a file name they create a duplicate with a corrected name but leave the original lying around. When I clean up with the script, it deletes the newer “approved” copy and leaves the original. And the cycle repeats. But not quickly.

I don’t know if this explains every situation that generates duplicates, but it does explain some, and I can remove the duplicates more permanently going forward.

Does all this sound feasible?

Is this an indexed file? Is the path of the two items identical?

Yes, all of my items are indexed.
Yes, both of the duplicates appear in the same group.
Since yesterday, no new duplicates have appeared.

Oh, I need to correct something I stated yesterday: when I used the script to remove the duplicate “PSUR/PBRER” & “PSUR-PBRER” it left “PSUR/PBRER” in DEVONthink. But when I navigate to that group in the Finder the actual file left is “PSUR-PBRER”, which I expect is correct behavior.

Still weird that it happened in the first place. Might be an unexpected result of DEVONthink finding the file with the “/” in the name and correcting it to “-”. It changes the file, which is picked up by the index. But the original DEVONthink record with the “/” is left untouched.


Item names and filenames are actually independent and item names can contain almost any character contrary to filenames.

And that is good. It is how I would want things to work.

But it seems that we are leaving the original record in place. Or, that we are allowing a second record to be created with the “sanitised” file name.

That seems to be the cause of some of the duplicates we are seeing. We only need one record for the file.
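To make that suspicion concrete, here is a small Python sketch of how one might spot several records that point at the same file on disk. It works on a hand-made list of (item name, file path) pairs; the sample data, the paths, and the function name `records_sharing_a_file` are all hypothetical, and nothing here uses DEVONthink's scripting interface.

```python
from collections import defaultdict


def records_sharing_a_file(records):
    """Group item names by file path and keep paths with more than one record.

    `records` is an iterable of (item_name, file_path) pairs.
    """
    by_path = defaultdict(list)
    for item_name, path in records:
        by_path[path].append(item_name)
    return {path: names for path, names in by_path.items() if len(names) > 1}


# Made-up sample data mirroring the case described in this thread.
records = [
    ("PSUR/PBRER", "/Users/example/Indexed/PSUR-PBRER.md"),
    ("PSUR-PBRER", "/Users/example/Indexed/PSUR-PBRER.md"),
    ("Other note", "/Users/example/Indexed/Other note.md"),
]
print(records_sharing_a_file(records))
```

If the two duplicates really do share one path, a check like this would list both item names under that path.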

Are you able to reproduce this?

I have set up a test to try. Based on previous observations, I expect there to be some significant latency before a duplicate is created. If a duplicate is created.

I’ll update later.

Thanks for the diligent assistance on this! :slight_smile:

Bad news guys, another wrinkle on duplicates.

This morning I created a single “test/document.md” and used the Finder to drop it into an indexed folder. DEVONthink picked it up right away, and left the “/” alone. For the next few hours, no duplicate. (Perhaps. See below.)

Then I went out for a few hours. I returned and checked the desktop machine. No duplicates. Then I fired up the laptop, which had been sleeping. The indexed file had sync’d over. Still no duplicates.

Until a short time later:

Note that it left the “/” character alone, blowing my favorite theory. Also note that the second document was added 32 minutes after the first one. I’m pretty sure that at 7AM I was still not seeing any duplicate. But I’m beginning to doubt myself now.

Here’s the single file in the Finder:

I’m rapidly running out of potential explanations.

I am sure a list of characters to avoid in a macOS file system is documented somewhere, but I don’t have it handy right now. I am pretty sure, though, that forward and backward slashes in file names are to be avoided.

That’s right.
We’re experimenting with how those characters might trigger duplicate records in DEVONthink.

Where is the indexed folder in the Finder?