Excluding a tag value in a search / smart folder

Hi,

I am creating a smart folder to collect PDFs that still have to be OCRed. So I search for document type PDF and number of words less than 1. (More elegant solutions extremely welcome).

But I also have East Asian PDFs which the OCR engine won’t read anyway. They are marked with a tag “Language-Japanese”, and I want to exclude them all from the smart folder.

But I can only include tags into the search, I cannot exclude.

I found some hints on the forum, but I do not get the search running, and I just do not see any explanation about where to write my operators when I have to write matches NOT or so.

Any help around?

Maria

Hi again Maria-

A few comments:

  1. Do you know that a scanned PDF has the type “PDF + Text”? So you can make a distinction between scanned and unscanned PDFs by type.

  2. The above (and also your solution) is a bit problematic. I get PDFs from JSTOR, for example, that have a publication page and an essay. The publication page is text, and the essay is a scanned, un-OCRed, image. So I get a “PDF + Text” type with about 150 words in it, but the essay is not OCRed. Ugh!

  3. How about using a label instead of a tag? That would enable a “not” condition if you needed it, and moves your housekeeping issue (Japanese text) out of the semantics of tagging, where it really doesn’t belong, correct?

HTH, Charles

Hi Charles,

Thanks a lot for taking your time!

Yes, but I do not find this type in my selection of types. There is only “PDF+PS” on my PC, which works for PDF + Text as well as plain PDF. Did I overlook something?

Yes, I have the same JSTOR bulk of papers, and both solutions are problematic. But I somehow gave up getting a 100% solution. I thought of gradually setting the word limit up in order to cover these cases later, when I am done with the vast majority of PDFs to be OCRed. When I am done with all the papers I have accumulated before DTPO, I hope I can treat each newcomer in my database with more care…
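The idea of raising the word limit step by step can be pictured as a simple filter. This is only a rough Python sketch, not DEVONthink's own search syntax; the `Doc` record, its fields, and the sample files are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    name: str
    kind: str        # hypothetical field, e.g. "PDF"
    word_count: int  # hypothetical field


def needs_ocr(docs, max_words=0):
    """Return PDFs whose word count is at or below max_words."""
    return [d for d in docs if d.kind == "PDF" and d.word_count <= max_words]


docs = [
    Doc("scan-only.pdf", "PDF", 0),
    Doc("jstor-cover-plus-scan.pdf", "PDF", 150),
    Doc("born-digital.pdf", "PDF", 8000),
]

# First pass: strict limit, only PDFs with no words at all.
first_pass = needs_ocr(docs)
# Later pass: a raised limit also catches the "cover page plus scan" cases.
second_pass = needs_ocr(docs, max_words=200)
```

A raised limit will eventually also catch short PDFs that really are text, so the later passes need a manual check.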

This is my workaround at the moment. I never thought of a tag for the language not really belonging to a document – but the others are all about content indeed. So it is worth thinking about it, although I do not like the colorful labels in my database for aesthetic reasons… (though they are nicely designed).

Still, there may be other cases where I want to exclude one tag from my search. Can anyone tell me how to do it?

Exhausted greetings from a hot & humid South Japanese night,

Maria

You can set the preference ‘General>Colorize icons with labels’ to checked and it will be a more muted approach. Change the label colors to white or light pastels and they will be more muted still.

Currently not possible, although it is an often requested feature so hopefully we will see this added in the future.

Hi, Maria. I hope your weather cools soon.

Remember how I come up with kludges and workarounds to do things that are not directly designed into the application?

I used to emulate the NOT search operator in version 1, by performing a search for other criteria, replicating the search results to a new group, then searching in that group for the criterion to be “thrown out” and removing those results from the group.

That’s no longer necessary in DT version 2, as there is a NOT operator for most search objectives – but it’s still a useful kludge for substituting for the current lack of “is not” in advanced searches for tags.

Yes, it’s a clumsy workaround, and it would be preferable to have an “is not” comparison for tags. The workaround is limited to the current state of my database, so it can’t create a smart group.

But it works right now, so I don’t have to wait for a future version of DT to get a task over and done with. I don’t care if it is an ugly kludge. I can get a job done, probably in a couple of minutes.

I’m sure NOT comparison will be included in the smart group editor in the future.

But no matter how far the tools of DEVONthink evolve, there will ALWAYS remain cases where a particular task cannot be done in one step. In many such cases, one can analyze the logic of the desired objective and come up with a workable kludge to achieve it.

It can be fun to figure out how to cut the Gordian knot. :slight_smile:

Bill, Greg, Thanks a lot.

I am going with the label approach and have made the colors a bit more muted.

Bill, I do not get the “NOT”-approach working. I do not find real examples in the documentation about where to write what (there are operators etc., yes). I have tried a lot.

So which phrase do I have to write where to exclude a tag?

Cheers from a hot and humid morning in South Japan,
Maria

Because we don’t yet have the “is not” comparison for tags in the smart group editor or in the advanced filter of the main Search window, we’ll have to “trick” DEVONthink into showing us a list of items that meet other search criteria but are NOT tagged “X”.

This is a multi-step procedure, rather than the single-step procedure we could construct if the “is not” comparison for tags were available.

  1. Carry out a search for the criteria OTHER THAN the tag in the main Search window. The resulting Search results will include all the items in the database or a subset of them, depending on your query.

  2. Create a new group the purpose of which is to hold replicants of the search results list, and replicate the search results into that group.

  3. Do a search using the Advanced button in the main Search window, for all items in that new group that are tagged “X”.

  4. Using the new list of search results, remove any items that are tagged “X” from the group. Now the group holds only items that meet your other search criteria and are NOT tagged “X”.
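For anyone who prefers to see the logic spelled out, the four steps above amount to a set difference. A minimal Python sketch, purely illustrative: it does not call DEVONthink, and the item dictionaries, field names, and sample files are invented for the example.

```python
def emulate_not_tag(items, other_criteria, excluded_tag):
    # Steps 1-2: search on the other criteria and copy the hits to a working group.
    working_group = [it for it in items if other_criteria(it)]
    # Step 3: within that group, find the items that carry the excluded tag.
    tagged = [it for it in working_group if excluded_tag in it["tags"]]
    # Step 4: remove them; what remains is "criteria AND NOT tag".
    return [it for it in working_group if it not in tagged]


# Hypothetical sample data.
items = [
    {"name": "paper-a.pdf", "kind": "PDF", "words": 0, "tags": set()},
    {"name": "paper-b.pdf", "kind": "PDF", "words": 0, "tags": {"Language-Japanese"}},
    {"name": "paper-c.pdf", "kind": "PDF", "words": 5000, "tags": set()},
]

remaining = emulate_not_tag(
    items,
    lambda it: it["kind"] == "PDF" and it["words"] < 1,
    "Language-Japanese",
)
remaining_names = [it["name"] for it in remaining]
```

Like the manual procedure, this is a snapshot: it reflects the items as they are at the moment it runs and does not update itself the way a smart group would.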

Objective achieved. Yes, it’s messy, but it can be done in a couple of minutes. Now, if I wish, I can tag the replicants remaining in that group with tag “Y”. Or do something else with them as needed. Perhaps, for example, I was creating a bibliography of speculative fiction, and had labeled one category of them, such as Edward Bellamy’s Looking Backward, with the tag “Utopian”. Now I want to separate those out, and pick others, such as Aldous Huxley’s Brave New World, to be tagged as “Dystopian”.

Any updates on when this feature will become available? It may not be technically easy to implement, and I’m assuming that is why it’s not on the menu yet, but it’s a basic must-have if one is to do anything useful with tags once the tagging is finished. I feel DEVONthink has added the tagging feature but left out an important part, which makes half my searches impossible or really difficult. When using the tag browser one should just be able to do an option+command to exclude a tag and then save the search. This is not even groundbreaking; it’s a common feature in any tag-based application, so what happened, DEVONthink? It’s mind-boggling, really!

I want to push up the discussion above and repeat lightbox10’s question:

When will there be a NOT option or operator in the smart groups definition (for tags and other criteria as well)?

It seems to be a very basic feature and (again today) I’m missing it a lot.

(BTW: thanks Bill for pointing out the workaround which helps me for the moment, but is not elegant at all…)

Dear Bill and others,

I just tried to make that work and I have a problem with the following fact:

I have replicated the whole group to filter in a new dummy group.
Now I have invoked a search (Cmd-Shift-F) and filtered out those I want to remove from the dummy group.
But how to remove them?
In the context menu, there is only “move ALL instances to trash” available (which is definitely not what I want).
The “simple” “move to trash” command is only available in the “real” list view of a group, or am I overlooking something?

So how to remove the selected items from the dummy group without sending all their replicants to the trash?
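The difference between the two commands can be pictured with a toy model in Python. It is only meant to illustrate the question; it says nothing about how DEVONthink actually stores replicants, and the group and file names are invented.

```python
def remove_from_group(groups, group, item):
    """Drop one replicant; the other groups keep theirs."""
    groups[group].discard(item)


def trash_all_instances(groups, item):
    """Drop the item from every group that holds a replicant of it."""
    for members in groups.values():
        members.discard(item)


# Hypothetical database: the same two items replicated into two groups.
groups = {
    "Original": {"a.pdf", "b.pdf"},
    "Dummy": {"a.pdf", "b.pdf"},
}

# What the workaround needs: b.pdf leaves the dummy group only.
remove_from_group(groups, "Dummy", "b.pdf")
after_remove = {name: sorted(members) for name, members in groups.items()}

# What "move all instances to trash" does: a.pdf disappears everywhere.
trash_all_instances(groups, "a.pdf")
after_trash = {name: sorted(members) for name, members in groups.items()}
```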

The search window toolbar has a delete command.

You mean the traffic sign (see attached image)?

At least its tooltip text says “move all instances to trash”!

Well that’s surprising, disregard that. I thought for certain that it behaved the same as the delete icon in the main toolbar. I just tried it from a search window and it does delete all replicants.

So, unfortunately, even doing this workaround the hard way, “by hand”, seems impossible.
The only thing that worked for me was assigning a temporary label to all the found items, so that I could filter them in the original dummy group and remove them from it.

But that’s really strange and a hard limitation of the possibilities…
:frowning:

With every new update I keep HOPING I see a “NEW: Added NOT operator for tags in Smart Groups” but after many months, this is still not there.

From the latest release notes, why focus on things like “third mouse button support for web view” and “special groups retain their icons” when there are more important features such as tag exclusion to work on?

The workaround suggested is very cumbersome. I bought DEVONthink in order to make my life simpler, not a bigger hassle.

Can someone from the company answer as to when this feature is coming? If not, I will need to look for other software options.

You’re absolutely right!
I made a feature request in the requests and suggestions forum and hope that our wish will be heard soon!
viewtopic.php?f=4&t=12710

Just to clarify, the procedure I outlined above does allow emulation of NOT for tags. Which is to say, it does work and the logic can be expressed in various ways.

Yes, it’s messy and requires creation of some extra groups just to make it work. I would like to have the NOT operator available for tags, but in the interim I can use the kludge and obtain the same results in a couple of minutes or less. As I’ve needed this ability a couple of times, I used the kludge.

Here’s a stepwise variant that eliminates the deletion of replicants from a group (for those concerned about that).

  1. Using the full Search window for all searches in this procedure, do a search for the criteria other than tags that define the universe of items of interest.

Suppose, for example, that I’ve got a project involving literature that contains visions or projections about the future, especially about future science and technology.

That would include the broad category of what we call science fiction, as well as non-fictional extrapolations based on speculation about future developments in science and technology.

A) Polemical. Some of this literature emphasizes values, that is, a focus on whether human life would be better or worse as a result of a scientific or technological development. I might tag literature that sees positive impacts of science and technology as ‘utopian’ and literature that sees negative impacts as ‘dystopian’. For example, the speculative fiction of Edward Bellamy, e.g., Looking Backward, would be tagged as utopian. The speculative fiction of Aldous Huxley, e.g., Brave New World, and that of George Orwell, e.g., 1984, would be tagged as dystopian. Comment: Personally, I would view the utopia of Edward Bellamy as even more frightening and destructive of human values than the dystopia of George Orwell.

I’ll create a new Tags group called ‘Polemical’, with subgroups ‘Polemical utopia’, ‘Polemical dystopia’.

B) Projective: Some of this literature is primarily value-independent, projecting possible scientific and technological developments. That doesn’t mean that human values are totally removed in all cases, however.

For example, articles that appear frequently in scientific journals such as Science Magazine or Nature on topics such as future developments in the field of genomics often emphasize the potentials for diagnosis and control of diseases.

I would include in this category Vannevar Bush’s article, “As We May Think” and Ray Kurzweil’s book, The Singularity is Near.

I would include in this category many articles about genetic engineering of foods, that emphasize scientific information. But I would place others on that topic under the Polemical category instead.

I’ll create a Tags group named ‘Projective’ including as subgroups ‘Projective positive’, ‘Projective negative’, ‘Projective neutral’. I would tag Bush’s article as ‘Projective positive’. But one’s reaction to Kurzweil’s book might be more subjective and mixed. :slight_smile:

C) Entertaining: Most of this literature emphasizes entertainment of the reader as the primary purpose, whether or not values are noted in the content.

I would exclude Kurt Vonnegut’s “Harrison Bergeron” from this category and tag it as ‘Polemical dystopia’ instead. But I would include some other Vonnegut stories in category C.

Daniel Keyes’ “Flowers for Algernon” is one of my favorite science fiction stories and heavily emphasizes values. Nevertheless, I would include it in category C. (This one emphasizes the difficulty of developing any rational tagging/categorizing scheme.)

Many SciFi stories - especially the pulp variety - are analogous to the old “cowboy and Indian” stories whose implicit values might be viewed differently, depending on whether one is a cowboy or an Indian. :slight_smile:

I’ll create a Tags group called ‘Entertaining’ with subgroups ‘Entertaining positive’, ‘Entertaining negative’, ‘Entertaining neutral’.

To tag each item, I’ll ascribe it to one of the subgroups under each of my tag headings. Never tag an item with the top-level tag group, but only with a subgroup tag among these categories.

COMMENT: Are these sufficiently disjunctive and adequate categories and tags to describe a large body of literature? Certainly not, but suppose I’ve decided to use this simplistic and highly subjective system. Imagine that I’ve used the categories to define my groups and tags in a database and have applied them as consistently as I can.

  1. Suppose I’ve got all of Kurt Vonnegut’s writings concerning science and technology in my database and have tagged them according to the above scheme. Assume also that my database includes other items by or about Vonnegut, so that it includes items that were not tagged according to this scheme. Now I ask myself whether Vonnegut wrote anything that isn’t explicitly dystopian or negative about science and technology.

  2. First, I’ll do a search that will separate tagged and non-tagged items by or about Vonnegut. I’ll use the full Search window with its default settings, enter ‘Vonnegut’ as the query term and click on the ‘Advanced’ button.

Note that with my tagging system I could simply do a search for all items that have these tags: ‘Polemical utopia’ OR ‘Projective positive’ OR ‘Projective neutral’ OR ‘Entertaining positive’ OR ‘Entertaining neutral’. (To create multiple ‘OR’ statements for tags, hold the Option key when clicking on the “+” button to add a new predicate, select ‘Any’ instead of ‘All’ for the new item, then select ‘Tag’ and type the first letter of the new tag predicate and select the correct one.) (ASIDE: Note that if group tagging is enabled, this is a convenient way to create a smart group that lists the contents of multiple groups.)
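In plain logic, an ‘Any’ block of tag predicates is just an OR over a list of tags. A small Python sketch of that idea, assuming nothing about DEVONthink's internals; the second title is a placeholder, not a real Vonnegut work.

```python
NON_PESSIMISTIC = [
    "Polemical utopia",
    "Projective positive",
    "Projective neutral",
    "Entertaining positive",
    "Entertaining neutral",
]


def matches_any(item_tags, wanted):
    """True if the item carries at least one of the wanted tags ('Any')."""
    return any(tag in item_tags for tag in wanted)


# Tags per item; "hypothetical story" is an invented placeholder entry.
items = {
    "Harrison Bergeron": {"Polemical dystopia"},
    "hypothetical story": {"Entertaining neutral"},
}

hits = [title for title, tags in items.items() if matches_any(tags, NON_PESSIMISTIC)]
```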

The resulting list would show all the items that were tagged as Vonnegut writings that are non-pessimistic about science and technology.

But that search didn’t illustrate my technique for emulating a NOT operator for tags.

Let’s do that in the next step.

  1. Create a new group and name it ‘Vonnegut Non-Pessimistic Entertainment’. Select All the items in the above search results and replicate them to the group you just created.

  2. Open a new Search window. Click on the ‘Reset’ button if necessary, to reset the ‘Advanced’ button. Leave the query field blank. Click on ‘Advanced’ and enter a search that’s ‘Any’ (equivalent to the OR operator) for these multiple Tag predicates: ‘Polemical’ OR ‘Projective’.

  3. If there were any search results listed, create a new group and name it ‘Vonnegut Non-Pessimistic Polemical or Projective’. Select the search results from the Search window and move them to the new group. What remains in the group from the previous step is now equivalent to having applied a NOT operator to the last ‘Advanced’ tag search: it contains only items that have an ‘Entertaining’ tag.

Apologies for the long post, but I wanted to cover some of the difficulties and issues about classification/tagging, use of the full Search window and the Advanced button and use of the Option key when entering multiple predicates.

Bottom line: You can’t currently create a smart group that excludes a tag. But you can replicate the list of items in that smart group to a new group, search in that group for the items that carry one or more tags, and move those items out of the group.

Note that when I use hierarchical tags (which can be tricky), I never tag items directly to the top level (but they will automatically be included there).
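The parenthetical above, that items tagged with a subgroup are automatically included at the top level, can be sketched like this. It is an illustration of the idea only; the `PARENT` table is an assumption built from the tag names in my example scheme, not something read out of DEVONthink.

```python
# Hypothetical parent table for part of the example tag hierarchy.
PARENT = {
    "Polemical utopia": "Polemical",
    "Polemical dystopia": "Polemical",
    "Projective positive": "Projective",
    "Projective negative": "Projective",
    "Projective neutral": "Projective",
}


def effective_tags(direct_tags):
    """Direct tags plus every ancestor implied by the hierarchy."""
    result = set()
    for tag in direct_tags:
        while tag is not None:
            result.add(tag)
            tag = PARENT.get(tag)  # None once the top level is reached
    return result


tags = effective_tags({"Polemical dystopia"})
found_by_parent_search = "Polemical" in tags
```

This is why a search for the parent tag ‘Polemical’ OR ‘Projective’ is enough to catch every item tagged with one of their subgroups.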

Caveat: I almost never tag new content as it is entered, as I think that’s entirely too much work. But when I’m working on a project, I may use tags to help me deal with the notes and documents used for that project – and often delete the tags when the project is completed.

Hi Bill,

Thanks for your detailed explanation.

However, I must say that for a non-native English speaker your examples about literature are quite puzzling :frowning:

Sorry about that. Yes, all those publications are in English. I picked those genres of writing just as an example of how one might try to come up with a tagging scheme, however incomplete and unsatisfactory – and then use those tags in DEVONthink.

I read a lot of science fiction and speculation about the future and I’ve met Aldous Huxley and Kurt Vonnegut. Vonnegut was a lot more fun than Huxley. :slight_smile:

Sure, it’s your choice and it’s a topic that you seem to know very well.

I just wanted to say that as an explanatory example it was not very well suited IMHO, as I was quite confused after reading it. :frowning:

Well, I’m still struggling with my database and finally decided to rebuild, remove all indexed files and start fresh again. :frowning:

Hope it will go better this time, but I think I’ve learnt a lot about DT Pro in the last days.
That’s good but it also hurts very much, as time is very short at the moment and fiddling with “the data” keeps me away from the real work.