How can I restrict and redo the duplicates smart group?

After initial indexing the Duplicates smart group has 32,320 items in it. However, most of the files I can see are NOT, in fact, duplicates. Most don’t even share the same file name.

How can I adjust and then re-run the duplicates group to show me only truly duplicate files?

Filenames don’t matter, only file contents. You might want to enable the stricter recognition; see Preferences > General.
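To illustrate why names are irrelevant, here is a generic exact-match check: files whose bytes hash to the same SHA-256 digest are duplicates whatever they are called. This is a minimal Python sketch under that assumption. It is not DEVONthink’s own recognition, which this thread suggests can also group files that aren’t byte-identical, and `find_exact_duplicates` is a hypothetical helper name.

```python
import hashlib
import sys
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the SHA-256 hex digest of a file's contents, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_exact_duplicates(root: Path) -> list[list[Path]]:
    """Group files under root by content hash; names and folders are ignored."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file():
            groups[file_digest(path)].append(path)
    # Only groups with two or more files are duplicates.
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]


if __name__ == "__main__":
    root = Path(sys.argv[1] if len(sys.argv) > 1 else ".")
    for group in find_exact_duplicates(root):
        print("Duplicate set:")
        for p in group:
            print("  ", p)
```

Run it against a single folder first (e.g. `python find_dupes.py ~/Genealogy`, a hypothetical path); hashing every byte of a 200k-file tree takes a while.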


I found that setting and set it to stricter. After a reboot it might be doing something: the number is now down to 32,282. The external disk is spinning and RAM use is at 3.5GB. I’ll report back later.

What kind of files did you actually index?

Basically everything; the total is approaching 200k files.

I have now split everything into three separate databases. Moving things around temporarily maxed out my 8GB of RAM (Mac mini M1), but I’m hoping this helps resolve some issues.

Also, FYI, I have indexed, not imported. I have other apps that need access to the majority of these same documents, and I don’t want/need copies of everything inside DT.

After splitting, I closed 2 of the 3 databases, as I am trialing DT primarily to see how it might help me declutter my genealogy records. Having only one database open has dropped DT’s RAM usage to around 2GB, which is much better!

Please clarify what you mean by this. Are you referring to indexing any kind of file, or do you mean you’re indexing your home directory, entire hard drives, etc.?

Also, a screen capture showing the duplicates, including their Kind column, could be useful.

Two main directories were indexed:

  1. my username folder
  2. my Google Drive folder

A screen capture will follow shortly.

  • my username folder

This is not recommended. You shouldn’t put entire home directories into a DEVONthink database. See…

my Google Drive folder

Following along with the thesis of the linked blog post, you should be judicious in what you put in your databases. If you have specific folders you’d like to include in your databases, indexing those is suggested over just adding entire directories of uncurated data.

As has been noted often by us, DEVONthink is not a Finder nor a Spotlight replacement.


After removing everything that isn’t documents, files, etc., I now have these three databases. A quick inspection says the duplicates include a lot of false positives, but I’ll need to clear out the actual duplicates before I can be sure of the numbers.

[Screenshots of the three databases, not reproduced here, showed 3,893, 30, and 1,292 duplicates respectively.]

Removed how?
If in DEVONthink, they’d be in the databases’ Trash.
The Trash needs to be emptied as that’s a location in each database. However… you may have put yourself in a bad spot if you cherry-picked things to put in the Trash.

Please do not proceed. Open a support ticket and we can guide you to safety.

PS: Did you read the In & Out > Importing & Indexing section of the built-in Help and manual before you started?


I’ll be honest, as a former IT career guy, I’m not big on manuals. Any tool that requires reading manuals to avoid losing data is not intuitive enough for me. Reading manuals reminds me of work, and I didn’t retire early to keep working…

In other words, it’s not you, it’s me.

So yes, I did manage to delete some things I did not intend to. Fortunately these were all on Google Drive so all I had to do was take anything put in the trash today and restore it from Google Drive’s trash folder. The files are all syncing back to my hard drive. No harm, no foul.

Having read the Importing & Indexing section, it is now obvious to me that DEVONthink does not handle files in any way that will be helpful to me. Whatever I hoped it would do with regard to finding related files isn’t worth the negatives for me. Nor does it seem to work well with 8GB of RAM, unlike every other app I have installed. ¯\_(ツ)_/¯

Thanks for your prompt replies in this thread.

You’re welcome.
Indexing is a very useful function but it certainly needs to be understood and thoughtfully considered.

Enjoy the rest of the week!

I’d nevertheless recommend persisting while you’re in the trial period. Leaving the Duplicates issue on one side for now, just treat it as an additional tool for searching among the files you’ve already indexed. Run a few searches that would be useful to you; select a file and check out the Documents pane of the See Also and Classify inspector; try out the Go => To Document and Go => To Group popovers with their shortcuts and as tear-offs.

If any of this looks as if it might be useful, do these three things (in your own time):

  1. Simply because DT is so massively feature-rich, the manual can be a bit overwhelming for first-time users; Joe Kissell’s Take Control of DEVONthink 3 is a newbie-friendly gateway that won’t trigger your manual-fatigue PTSD.

  2. Max out the trial period by quitting the app when not in use (to stretch the 150-hour quota). The trial version has the full feature set not just of the basic version but of the Pro and (very expensive) Server versions. If you decide to purchase before you’ve used up the 30 days or 150 hours, defer the purchase till the trial has expired, because once you’re on the paid version you’ll lose these features forever unless you subsequently decide you want them enough to pay for them.

  3. Decide which folders you actually need to search and start over with a much more focused indexing approach. As the DT team say, kitchen-sink indexing isn’t a good use of the app’s power; stick with folders that actually have documents in them. You might end up, for example, with one database into which you index your Desktop, Downloads, and Documents folders, and another for relevant directories on your Google Drive. Even that would probably be overkill, though: starting small with a few core folders that hold a lot of documents would give you more control and speed.

DT is a huge, life-changing app, and ideal for working with stuff like genealogical data. Most users go through a similar journey of starting with “Hmm, not sure I get this and I don’t think it’s for me”, to discovering a single killer use (likely searches on large collections of indexed documents), to gradually discovering more and more features that help with their workflow, to eventually noticing that it’s quietly become the core app for everything they do. There’s often a point in this journey when you kick yourself and go “Doh! I wish I’d checked out that Pro or Server feature during the trial period. Now I’ll just have to pay up and cross fingers it does what I’m hoping.” The good news is that usually by that point it’ll be more than worth paying up.


I’m an indexer who is strongly considering moving to an importer. The original question highlighted one of my frustrations with indexing.

Hard drive space is not an issue.

Any reason why I shouldn’t convert from indexing to importing (other than time)?

Welcome @Clyde_Barrow

The original question highlighted one of my frustrations with indexing.

That question being…?

And no, there’s no generically compelling reason I can think of to not move to imported files. Indexing isn’t the default way to get data into a database for good reasons.

PS: You don’t even have to re-import the indexed items if you no longer want them in the Finder folders. Just select the indexed items and choose Data > Move Into Database.


Finding lots of duplicates in my indexed databases.

And thank you for the encouragement to shift over to importing.

You’re welcome 🙂