How to remove pre-existing Tags from a batch of documents?

Hi:

This is my first post to this Forum. I have been reading it for a long time and must say I admire the helpfulness of you all.

For some years I’ve been using DTPO as a document warehouse and search facility (with occasional use of the AI). I’ve now decided to start clean all over again. I cleaned out all my documents spread over multiple Macs and external drives. I’m now renaming and importing them back into DT Pro Office. These are tens of thousands of documents, mainly PDF and Word. Some are duplicates and I’m slowly culling them. I’m renaming the ones I keep with the YYYY-MM-DD convention and a lot of information in the title. It’s going to be a long journey …

I’m importing my files into a DTPO database and then culling, renaming and sorting them. I want to use tags as well. I’ll post separately on my tagging convention for social sciences/legal/political research and work. For now, I’ve got a problem.

When I import my documents, most of them have pre-existing tags that are unhelpful and do not fit my worldview. I’d like to delete all tags from all the documents and start afresh. But I’d like to hold on to the documents as they are – I only want to delete the tags. I’ve been searching this Forum and Google but haven’t been able to figure out how to batch delete tags from a set of files imported (not indexed) into DTPO. (I’ve also got the originals outside the database.)

The difficulty with using the enclosed DT Script is that it requires me to enter the tag I want to delete. But with a pre-existing tag list of some 500 tags, this is sort of impossible.

So my question: any way I can batch delete all tags from all documents in DTPO without deleting the documents themselves?

Would appreciate your thoughts. Once again, glad to be part of the DTPO Community. It is an amazing program and its users are a highly intelligent and seemingly friendly, helpful group. Hats off.

In your database(s) expand the Tags group. Delete any tag from that group that you do not want to use. That will delete the tag(s) from all documents to which the tag(s) are assigned.

Alternatively, if you want to delete a tag from some but not all documents, then navigate to that tag (it is a child of Tags), look at the contents of that tag, and delete the document replicants listed there.

For example – in this image of Tags > abracadabra, I can delete any one or more of the replicants of the “How to get rich” series and keep others. When I delete replicants from Tags > abracadabra, the “abracadabra” tag is removed from the “original” documents in the database.

Many thanks Korm! Grateful for your comments.

Do I understand you correctly – that deleting individual entries in the Tag Group only deletes the tag entry but not the underlying document to which the tag entry is attached? Or does deleting the Tag entry also delete the document to which it is attached? This was the worry that kept me from selecting the individual tag names in the Tag Group and just deleting them – I didn’t want to lose the document to which the tags were attached.

Deleting tag names would be so much easier provided they do not delete the document from the database as well! Seems obvious, but given the weeks I’m dedicating to this “deep clean my files” project, I didn’t want to take a chance.

So, being unsure of the effect of deleting tags on the underlying documents, and not wanting to spend my Saturday in the warm sun but rather wanting to spend it cleaning my files, I used the following long-winded workaround.

Use case: remove multiple tags from multiple documents all in one go. In my case, I had multiple documents with multiple tags – each document had multiple tags and each tag was attached to multiple documents.

  1. Create a smart folder in the database called “Tagged Docs”. Rules are: All are true for the following: (a) Kind is Any Document; (b) Instance is Tagged.

–> This will create a smart folder listing every tagged document in the database irrespective of the name of the tag.

  2. Expand the Tags “group” in the three-window view. Select all the tags and copy to clipboard with Cmd+C.

–> We now have a clipboard list of every tag used in the database.

  3. Paste the clipboard contents into a text file or a text cleaning program (I use Clean Text).

–> We now have a vertical list containing the name of every tag used in the database. It looks as follows:

Tag1
Tag2
Tag3

  4. Select all. Now run the following Find+Replace command: Find paragraph return (hit Enter once in the “Find” window) and replace with semi-colon (type ; in the “Replace” window). Please do not add any spaces following the semi-colon in the “Replace” window.

–> We now have the vertical list in Step 3 above displayed horizontally as follows: Tag1;Tag2;Tag3…

–> It is critical that there be no spaces between the semi-colon and the Tag name immediately following the semi-colon. You need: Tag1;Tag2;Tag3 and not Tag1; Tag2; Tag3.

  5. Copy this horizontal semi-coloned structure to clipboard.

  6. In DevonThink, navigate to the Smart Folder titled “Tagged Docs” and select all entries. From the Scripts menu, run “Remove all Tags from selection”. In the pop-up window, paste the clipboard contents.

–> All tags from all documents will be deleted in one go.

Note: if there is a space between the semi-colon and the TagName that follows it, this will not work – only the first recognised Tag will be removed from the “Tagged Docs” Smart Folder. However, eliminate the spaces (Tag1;Tag2;Tag3;Tag4…) and you can now remove multiple tags from multiple documents at one go.
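
For anyone who would rather not do the Find+Replace by hand, the same conversion can be sketched in a few lines of Python. This is only an illustration of the text clean-up, not part of DEVONthink or its scripts. The function name join_tags is made up for this example, and it assumes the pasted list has one tag per line.

```python
def join_tags(tag_list: str) -> str:
    """Turn a one-tag-per-line list into 'Tag1;Tag2;Tag3'.

    Hypothetical helper: it only reshapes text and does not talk to DEVONthink.
    """
    # Trim stray whitespace around each line so no space ends up next to a semi-colon.
    tags = [line.strip() for line in tag_list.splitlines()]
    # Skip blank lines, then join with a bare semi-colon (no trailing space).
    return ";".join(tag for tag in tags if tag)


pasted = """Tag1
Tag2
Tag3
"""
print(join_tags(pasted))
```

Spaces inside a tag name are kept as they are; only the whitespace at the start and end of each line is dropped, which is the part that trips up the script dialog.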

Once again, thank you for your much simpler approach of selecting the Tags and deleting them! It is too late to spend Saturday in the glorious weather outdoors, but I will do so on Sunday given that your simple “delete tags” solution saves me a load of time!

Not sure I follow all that, but it looks painfully complicated. :neutral_face:

There’s no reason to guess at how things work – it’s so easy to experiment in a test database before working with real data.

The “normal” use of tags is to add a tag to a document using the tags bar, or one of the scripts that adds tags, or to import a file that has Mavericks and/or OpenMeta tags, or to index a file that has Mavericks and/or OpenMeta tags. In all of these cases, a replicant of the document is added to that tag in the Tags group. So, deleting the tag does not remove the original document.

It is also possible to create, import or index a document directly inside a tag group. In that case, deleting the tag will delete the original. (Personally, I avoid importing documents directly into tag groups, though some readers here choose to do that for their own reasons.)

Of course before you make any significant changes you’ve backed up your databases and your original files, haven’t you?

Plus 1 on this. I don’t want to appear to gang up on the OP as this is not an unusual situation anymore. Users will spend more time asking questions about the responses received than it would take to just experiment with the proposed solution(s).

As to the original question on how to remove pre-existing tags on a selection of documents: there is a simpler way. A script is included with DEVONthink to do just that. Go to the Help menu, choose the Support Assistant, Download extras, Script menu, and on the second page there is a script to Remove tags from selection. Once installed, it will appear in the Scripts menu, Tags sub-menu.

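The problem with that script is that one needs to type into the dialog box the tags that one wishes to remove. The OP wanted to remove all tags, and wrote this comment about the included script: “The difficulty with using the enclosed DT Script is that it requires me to enter the tag I want to delete. But with a pre-existing tag list of some 500 tags, this is sort of impossible.”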

Before I gave my first suggestion, above, I spent several hours experimenting with changing the included script and found that it was actually rather tricky to script “delete all tags for a selection of documents” because groups that are included in tagging also show up as tags. I did not want to risk posting something that wrecked the OP’s data hierarchy.

And, merely deleting a lot of tag children from Tags is quite simple to do.