getting rid of myriads of tags

I’m battling the myriad tag issue :imp:

I guess these crept in through RSS feeds in the past, because I did not have Preferences > RSS > Convert categories to tags unchecked.
It’s unchecked all right now…

I tried Korm’s method from the thread “Remove/delete unused tags”, but somehow it doesn’t seem to work (anymore)?

Double-clicking the tags opens up a new window with tags only all right, and View as List is OK.
But then when I delete a couple of tags, the number of tags doesn’t go down…?
Even after emptying the trash…

Now I seem to remember that tags are in fact more or less equal to groups, just displayed differently.

So what am I missing here?
If I delete a tag, does a group with the same name survive?
Is there any script to help get rid of tags without items (count = 0)?
tags.png

Of course. They are two different entities.

Create a Smart Group with criteria: Kind is Group and Size is 0 and the Search In dropdown set to the Tags group. Easy!
EmptyTagsSmartGroup.png


Let’s look at what might be happening.

In a database you can have groups that are children of the Tags group in the root of the database. These might be thought of as “real” tags. These real tag groups should contain replicants of documents replicated from elsewhere in the database. When you use the tags bar in a DEVONthink view and add tag “abcd” to a document, DEVONthink will replicate the document to the group Tags > abcd. If abcd doesn’t exist at that location, DEVONthink will create a group with that name. If you import a file that has Finder tags, the same thing will happen: a tag group for each Finder tag is created and the document will be replicated to that group.

Also, in a database, you can open Database Properties and uncheck (turn off) Exclude Groups from Tagging; turning off that checkbox is virtually the same as enabling “Include Groups for Tagging”. Once a “normal” group (i.e., a group that is not a child of Tags) is “included for tagging”, it will behave exactly like the groups that are “real” tag groups. A big difference though is that Finder tags on imported documents are always replicated to children of Tags.

So, you can get a lot of tags in Tags if (a) you tag a lot, (b) you import documents that have a lot of Finder tags. Even if you have enabled “include groups for Tagging” you won’t have a proliferation of tag groups in Tags.

The simplest way to get rid of “tags” in Tags is to delete them. HOWEVER, if you have been in the habit of importing documents directly into the children of Tags, then you have a problem, because you have mixed replicants with non-replicants. That’s a mess for another posting.

Gentlemen, thanks for your prompt answers!

I’ll have a go at this & report back.

Korm, I’ll have to re-re-re-read your post and let it sink in, but you knew that didn’t you :laughing:
I don’t think I’m in the last case you describe. If I understand correctly, that would mean using tags as if they were groups: unfolding the tags in the triple view pane and, with a specific tag selected, dragging outside items, docs or whatever into / under the tag-as-group.

I NEVER do this…

Hmmm,

I knew it… I’m stumped… :question:
Need a lot of :bulb: :bulb: :bulb: :bulb:

Did the smart group trick;
Went into the smart group, selected all tags in it and deleted them.

Count of tags doesn’t go down…
Smart Tags group is empty now, and Trash too…
I even end up with a couple of more tags…

???

Ok - I’m not there yet… :mrgreen:
The tag count does not go down? That I don’t understand.

I tried a couple more things:

  • I created the same smart group as before, but with Size set to 1 instead of 0. I had hoped this would yield the tags with a single member; alas, this doesn’t work.

=> Do you have any smart idea to filter out tags that only have 1 member?

  • I found the remove tag script under Scripts > More.
    Using this, I managed to select tags with a couple of hundred or thousand members and remove those tags.

  • Next I tried to set up a smart group to filter out tags on the following pattern:
    ----
    Alas this doesn’t seem to work either.

Open for suggestions!
Image.png

Erwin, I don’t have an answer, but there are two things I notice about your tags in the screenshots.

You have a lot of tags with names that appear to be UUIDs generated by some system: 1d21a266-cf6a-ac90-0050569d32b9 and so on. That is definitely NOT normal. If there is nothing inside those tags then just delete them (i.e., delete the tag from inside Tags and empty the trash), then see if they regenerate themselves. I am scratching my head wondering how these things got there to begin with.

The second thing is you have a bunch of tags that seem to be path names: /invest… s-income. That’s also strange. Did you actually manually tag documents with tags with those names?

The third thing (I said there were two things; I lied): if you have a tag group that has no content, then just delete the tag and empty the trash. You’re not going to lose anything.

Have you rebuilt this database at all recently? Might be useful. I have never seen a database with ~40,000 tags :exclamation: :exclamation: :exclamation: :exclamation: :exclamation: :exclamation: :exclamation:

Definitely a problem.

(And why does your inbox have ~24,000 documents? Time for a little database hygiene maybe :laughing:)

I did try a repair, not a rebuild yet. What about sync? This DB is synced across multiple machines… Should I remove sync first?

The uuid type also caught my attention. But I must admit that it’s been a while since this originated. I’ve raised this on the forum before, and for lack of time and solutions I just let it rest… Until now.

The /blabla/zzz type of tag is definitely not mine. I gather it’s also from an rss source.
It’s to do with investing, that’s why there are also stock symbols here and there.

I have already been thinking of a way to get everything out to another DB without tags and build afresh, preferably step by step…

Using the remove tag script seems to work. The problem is that I first need to select the tag, copy its name, call the script, paste the name, and repeat this some 30,000 times…
If I could automate this?

(tags haven’t been trashed yet in the screenshot)

Thanks for thinking with me!
tags.png

Strange - all these items come from the Motley Fool (fool.com).
They most probably come in through rss.
When I remove a tag, so the tag group is 0, then trash the empty tag and empty the trash, AND then do a search on the deleted tag, it comes up with the document from which I removed the tag…

Which makes me think the tags are kind of embedded into the post. But I can’t see them, and ctrl-f doesn’t find them in the document either.
But it does in the DB??

Edging closer…
I found the uuid type is located in the keyword.

Also near 99% positive they all come from fool.com rss.
Now I wonder if it’s feasible to:

  • expand the remove tag script to also remove the keyword
  • somehow automate this in some sort of routine, or first isolate the uuid keywords + tags and then get rid of them this way?

What’s also strange:
I created a new DB and will put these rss feeds separately in there for starters.
I then tried to drag the rss items over to the new DB in another window.
No go…

I tested the same with another rss feed.
This one I could drag over without a problem??
So it seems like it’s locking the item somehow too?

I might have misunderstood if you’ve already done this, but if you uncheck “Convert categories to tags” in Preferences > RSS you shouldn’t be getting “infected” by any of the weird metadata coming from the Fool. Of course, whatever was there before you turned off that preference is going to remain until you delete it.

Korm, this is indeed unchecked.
I can also trace this: if I sort on date added, I can see up to what point in time these uuid keywords were added.

So in fact the problem is under control.

But:

  1. I prefer not to lose the old posts (with uuid)
  2. I need to find a way to edit these keywords out. I’ve been digging around but these don’t seem to be editable anywhere?
  3. If I find a way to edit out the keywords, it would be preferable to use some automation to do it recursively for all the posts.

Opening the post in Dreamweaver I can see and edit away the uuid…

Now I need to find help to try to automate this…
Any ideas? Suggestions? Help!!
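
For the automation asked about above, here is a minimal Python sketch. It rests on assumptions the thread does not confirm: that each saved post is an HTML file, and that the uuid keywords sit in a `<meta name="keywords" content="...">` element as a comma-separated list. The function name `strip_uuid_keywords` and the sample markup are made up for illustration, and the pattern is modelled on the single tag name quoted in this thread. It works on a string only and does not talk to DEVONthink, so try it on copies of the posts first.

```python
import re

# Assumption: "uuid-like" means hex groups shaped like the name quoted in
# this thread (8-4-4-12) or like a standard UUID (8-4-4-4-12).
UUID_LIKE = re.compile(
    r"[0-9a-f]{8}(?:-[0-9a-f]{4}){2,3}-[0-9a-f]{12}", re.IGNORECASE
)

# Assumption: keywords live in <meta name="keywords" content="...">,
# with the attributes in exactly this order and double-quoted.
META_KEYWORDS = re.compile(
    r'(<meta\s+name="keywords"\s+content=")([^"]*)(")', re.IGNORECASE
)


def strip_uuid_keywords(html):
    """Return html with uuid-like entries removed from the keywords meta tag."""

    def clean(match):
        keywords = [k.strip() for k in match.group(2).split(",")]
        kept = [k for k in keywords if k and not UUID_LIKE.fullmatch(k)]
        return match.group(1) + ", ".join(kept) + match.group(3)

    return META_KEYWORDS.sub(clean, html)


# Hypothetical sample markup, not taken from a real post.
sample = (
    '<head><meta name="keywords" '
    'content="investing, 1d21a266-cf6a-ac90-0050569d32b9, AAPL"></head>'
)
cleaned = strip_uuid_keywords(sample)
print(cleaned)
```

If the real posts store the keywords differently, only `META_KEYWORDS` should need to change; the filtering step stays the same.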

If you delete a tag named 1d21a266-cf6a-ac90-0050569d32b9 from the Tags group you are not going to lose the post. The posts should still be stored in their normal groups; they just will no longer have that tag assigned.

Example:

If I deleted the tag “bears are mendacious”, I would not lose the document “A laughing duck says”, because the document is “really” stored in the Examples group.

I think if you delete the tags the problem won’t recur, since those tags were created when the original was imported, and you’ve turned off the option that caused the tags to be created.

Deleting the UUID tags should not be difficult since they are probably clustered and you can select all of them and use Data > Move to Trash.

I feel I’m missing something. :confused:

By the way, just in case you need it, there is a hidden preference, DisableFinderTags. Look in Help > Documentation > Appendix > Hidden Preferences. Once enabled, you can import or index documents without grabbing their tags, and export documents without writing your DEVONthink tags to Finder tags. This is helpful if you want to export your database, create a new database by importing documents, and not have all the weird bad tags attached to those exported (and then imported) documents. You need to enable that preference before you do the export.

Korm, thanks for your perseverance!

Deleting just does not do it on my side.
I select 10 uuid tags, delete, data > move to trash
the items are indeed moved to trash, the tag count # doesn’t change.
Then go into trash > empty and the tag count # doesn’t change either.

What does work:

  • I created a new DB tag_test-2
  • In the other window, with global inbox selected, and focus on too-many-tags, I select some tags and drag these over to the other window, DB tag_test-2.

=> The dragged tags change to groups (with 0 items of course)
=> On the originating DB window, the tag count # goes down

Last night I did the following:

  • from the rss feed > get info > removed the url of the feed (I expect this to stop the feed group from receiving new items)
  • I then started a Rebuild DB

I’m not 100% sure as I approach the issue from my macbook but also my Mac mini workstation, but today it seems I can drag over the uuid items to another DB, which I couldn’t yesterday.

Do you have a way to create a smart group that will group all uuid groups / tags?
I tried with the pattern but it does not work.
I also tried with a pattern like 9b* but this yields no results either.
I guess I would need to add the Keyword field to the search, or do this using a script?

This would ease the cleaning out of this mess.
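
As a possible starting point for the script route, here is a small Python sketch that separates uuid-like names from ordinary tag names. It assumes the tag names have already been exported to a plain list (getting them out of DEVONthink is not covered here), and the pattern is a guess based on the one name quoted in this thread, which has one hex group fewer than a standard UUID. The helper name `split_uuid_like` and the sample names are hypothetical.

```python
import re

# Assumption: accept both the 8-4-4-12 shape seen in this thread and the
# standard 8-4-4-4-12 UUID shape, in upper or lower case.
UUID_LIKE = re.compile(
    r"[0-9a-f]{8}(?:-[0-9a-f]{4}){2,3}-[0-9a-f]{12}", re.IGNORECASE
)


def split_uuid_like(names):
    """Split names into (uuid_like, others), keeping the original order."""
    uuid_like = [n for n in names if UUID_LIKE.fullmatch(n)]
    others = [n for n in names if not UUID_LIKE.fullmatch(n)]
    return uuid_like, others


# Hypothetical export of tag names.
names = [
    "1d21a266-cf6a-ac90-0050569d32b9",
    "investing",
    "9b-notes",
    "123e4567-e89b-12d3-a456-426614174000",
]
uuid_like, others = split_uuid_like(names)
print(uuid_like)
print(others)
```

Matching the whole name with `fullmatch` keeps ordinary tags that merely start with hex characters, such as `9b-notes`, out of the delete list.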

If you are deleting empty tags, that is, tags that are not assigned to any documents, then the badge count of the Tags group will not go down. The badge count is not the number of tags in the Tags group; it is the number of documents in the Tags group that have tags assigned to them.

Putting it another way, you can have 1,000 tags in the Tags group, and if none of those tags have been assigned to documents, the badge count of the Tags group will be 0. You can then assign all 1,000 tags to a single document and the badge count for the Tags group will now be 1.
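
The arithmetic above can be checked with a toy model in Python. This is only a sketch of the counting rule as described in this post, not DEVONthink’s actual implementation; the `badge_count` function and the tag names are invented for the illustration.

```python
def badge_count(tags):
    """Count distinct documents that carry at least one tag.

    `tags` maps a tag name to the set of document names it is assigned to.
    """
    tagged_documents = set()
    for documents in tags.values():
        tagged_documents |= documents
    return len(tagged_documents)


# 1,000 tags, none of them assigned to any document.
tags = {f"tag-{n}": set() for n in range(1000)}
empty_badge = badge_count(tags)

# Assign all 1,000 tags to one and the same document.
for documents in tags.values():
    documents.add("A laughing duck says")
single_badge = badge_count(tags)

# Adding and then deleting an empty tag leaves the badge unchanged.
tags["unused"] = set()
before_delete = badge_count(tags)
del tags["unused"]
after_delete = badge_count(tags)

print(empty_badge, single_badge, before_delete, after_delete)
```

In this model, deleting any number of empty tags never changes the result, which matches the tag count that refused to go down earlier in the thread.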

Thanks for clarifying.
I’m slowly getting there.

I’m looking for a way to filter on tags using wildcards.
Like
tag: A*
or
tag: BE*

I looked in advanced search but I can’t seem to find a way?

This works. Start the tag name, then enter the wildcard. Remember that wildcards can only be alphanumeric.

Correction: wildcards can be any character. Search, which we are not discussing here, must be alphanumeric. Sorry for any confusion.
Screen Shot 2017-07-01 at 7.45.49 AM.png
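
To make the “start of the name, then the wildcard” idea concrete, here is a short Python sketch using shell-style patterns from the standard library. It only illustrates prefix wildcards on a made-up list of tag names; whether DEVONthink’s own matching treats case or special characters the same way is not something this thread establishes, and `filter_tags` is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def filter_tags(tag_names, pattern):
    """Return the tag names matching a shell-style pattern (case-sensitive)."""
    return [name for name in tag_names if fnmatchcase(name, pattern)]


# Hypothetical tag names.
tag_names = ["AAPL", "Apple", "BEAR", "BEst-of", "bonds", "9b-notes"]

a_tags = filter_tags(tag_names, "A*")    # names starting with "A"
be_tags = filter_tags(tag_names, "BE*")  # names starting with "BE"
print(a_tags)
print(be_tags)
```

Note that `fnmatchcase` is case-sensitive, so `bonds` is not picked up by `BE*` here.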