Tag Case Sensitivity

llcckk · March 19, 2011, 2:00pm

Hi,

Is there a way to turn off the case sensitivity setting for tag labels?

Thanks,

Phil

cgrunenberg · March 22, 2011, 8:37am

No, that’s not possible.

llcckk · April 2, 2011, 1:39pm

Christian,

Curious: what is the thinking behind making tag labels case sensitive? It doesn’t really make sense to me.

What is the difference between “receipts”, “Receipts”, “RECEIPTS” etc.?

Phil

Bill_DeVille · April 2, 2011, 4:31pm

As your example demonstrates, case sensitivity allows the user to create a tagging system in which case conveys additional information. A text string such as your example can be ‘coded’ by the user as multiple tags, depending on the case of each character. Thus, depending on the user’s system of case assignments, your example term could have variants specific to multiple cost centers, for example. How many cost centers could be case-specified by the string ‘receipts’?

So although all the documents in my example are receipts, arbitrary assignments of those receipts to cost centers based on character case add more information content to that 8-character string.

Think of such an approach as analogous to the use of the Dewey Decimal System for coding the holdings in a library.

Cryptic? Yes, but smart groups can then be created to list receipts by tag for each cost center, or in the aggregate.
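The arithmetic behind that question can be made concrete. Each of the 8 letters in ‘receipts’ can independently be upper or lower case, so the string has 2^8 = 256 case spellings. The Python sketch below enumerates them and then shows the two kinds of filter described above: an exact-case match for one cost center and a case-insensitive match for the aggregate. The documents, tag spellings, and function names are hypothetical illustrations, not DEVONthink’s smart-group API.

```python
from itertools import product


def case_variants(word: str) -> list[str]:
    """Every upper/lower-case spelling of word, in a fixed order."""
    # Each character contributes its lower- and upper-case form; letters give
    # 2 choices, so an all-letter word of length n has 2**n spellings.
    choices = [(ch.lower(), ch.upper()) for ch in word]
    spellings = ("".join(combo) for combo in product(*choices))
    # dict.fromkeys drops repeats (digits and symbols have only one case).
    return list(dict.fromkeys(spellings))


print(len(case_variants("receipts")))  # 256

# Hypothetical documents tagged with case-coded variants of "receipts".
documents = {
    "doc1.pdf": {"Receipts"},
    "doc2.pdf": {"rEceipts"},
    "doc3.pdf": {"Receipts", "travel"},
}


def cost_center_group(docs: dict[str, set[str]], tag: str) -> list[str]:
    # Exact, case-sensitive match: one spelling stands for one cost center.
    return sorted(name for name, tags in docs.items() if tag in tags)


def aggregate_group(docs: dict[str, set[str]], word: str) -> list[str]:
    # Case-insensitive match: any spelling of the word counts.
    key = word.casefold()
    return sorted(name for name, tags in docs.items()
                  if any(t.casefold() == key for t in tags))


print(cost_center_group(documents, "Receipts"))  # ['doc1.pdf', 'doc3.pdf']
print(aggregate_group(documents, "receipts"))    # ['doc1.pdf', 'doc2.pdf', 'doc3.pdf']
```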

I probably won’t make much use of such an approach. But if I were working with documents for an archeological dig, or analyzing a collection of Medieval manuscripts, I might appreciate the ability to extend the information content of a descriptive term used as a tag.

sjk · April 2, 2011, 7:00pm

Perhaps unnecessarily, for most people?

lamp · April 10, 2011, 11:00pm

IMHO, case sensitivity for tags is unnecessary. It causes confusion when users see duplicate tags.

This is inconsistent with DT2’s case-insensitive search. (Then again, PDF Find in DT2 is case sensitive unless you turn on “Ignore Case”.) In engineering and science, there are many times when you need to search for acronyms, which are in CAPITAL letters. While the Boolean search enhancement in DT2 is nice, the removal of case-sensitive search in DT2 is a BIG feature regression and has caused a lot of grief for users.

So my opinion is that tagging should be case insensitive, while search should be case sensitive (with a knob to turn on case insensitivity).

sjk · April 11, 2011, 3:46am

Yep, that is what my response to Bill was intended to suggest.

That would definitely be my personal preference, until convincing enough counterarguments/reasons persuade me otherwise.

It reminds me of Brian Tiemann’s 2001 piece “On Unix File System’s Case Sensitivity” (written when my migration from Solaris to OS X started), which so eloquently expressed my eventual sentiments on that matter:

It’s taken me a long time to come to terms with the appropriateness of a case-preserving, case-insensitive filesystem, but I’ve done it.

IMO, mandatory reading for anyone who (still) wants to seriously debate case preservation/sensitivity topics in different contexts.
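Applied to tags, a case-preserving, case-insensitive scheme like the one proposed above could look roughly like this minimal Python sketch: lookups compare case-folded keys, while the first spelling a user types is kept for display. `TagRegistry` is a hypothetical name, and this is not how DEVONthink actually stores tags.

```python
from typing import Optional


class TagRegistry:
    """Case-insensitive tag lookup that preserves the first spelling entered.

    Hypothetical sketch of the behavior proposed in this thread.
    """

    def __init__(self) -> None:
        # Case-folded key -> spelling shown to the user.
        self._tags: dict[str, str] = {}

    def add(self, label: str) -> str:
        """Register a tag and return the spelling actually stored."""
        # setdefault keeps an existing spelling: adding "RECEIPTS" after
        # "Receipts" returns "Receipts" instead of creating a duplicate.
        return self._tags.setdefault(label.casefold(), label)

    def find(self, label: str) -> Optional[str]:
        """Stored spelling for any case variant of label, or None."""
        return self._tags.get(label.casefold())

    def labels(self) -> list[str]:
        # Sort case-insensitively so the list reads alphabetically.
        return sorted(self._tags.values(), key=str.casefold)


registry = TagRegistry()
registry.add("Receipts")
registry.add("RECEIPTS")
print(registry.labels())          # ['Receipts']
print(registry.find("receipts"))  # Receipts
```

The trade-off is the one raised at the end of the thread: under this scheme ‘RET’ and ‘Ret’ would collapse into a single tag.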

Bill_DeVille · April 12, 2011, 5:19am

I don’t think the issues of case sensitivity/insensitivity/preservation in a filesystem have much relevance here.

No more, perhaps, than the question of why Germans start the first line of a message after the salutation in lower case, while English speakers capitalize the first letter of the first line.

Or than why Archy the cockroach always typed Mehitabel the cat’s name in lower case. (Actually, that was because he couldn’t jump on the shift key and the letter key simultaneously.)

In DEVONthink 1.x searches were case sensitive. I was initially taken aback by Christian’s move to case insensitivity for searches in DEVONthink 2.x. But by building the internal Concordance of words as case insensitive, DEVONthink 2.x is more memory efficient and searches are much faster than in version 1.x.

True, that doesn’t let one search for “John Smith” while excluding “john smith” or “JOHN SMITH”. But case insensitivity is more inclusive than case sensitivity, so I’ll find all documents that reference that individual, regardless of the use of case.
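One plausible reading of the case-insensitive Concordance, sketched in Python: every word is case-folded before it goes into the index, so ‘John’, ‘JOHN’, and ‘john’ share a single entry, which is where a memory saving would come from. DEVONthink’s actual index format is not described in this thread; the names below are hypothetical, and the search is a simple all-words (AND) match rather than a phrase search.

```python
import re
from collections import defaultdict

WORD = re.compile(r"[^\W_]+")  # runs of letters/digits; punctuation is skipped


def build_concordance(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each case-folded word to the names of the documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, text in docs.items():
        for word in WORD.findall(text):
            index[word.casefold()].add(name)  # one entry per word, not per spelling
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Documents containing every word of the query, ignoring case."""
    terms = [w.casefold() for w in WORD.findall(query)]
    if not terms:
        return set()
    hits = set(index.get(terms[0], set()))
    for term in terms[1:]:
        hits &= index.get(term, set())
    return hits


docs = {
    "a.txt": "Letter from John Smith",
    "b.txt": "JOHN SMITH memo",
    "c.txt": "john smith, again",
    "d.txt": "Jane Smith",
}
index = build_concordance(docs)
print(sorted(search(index, "John Smith")))  # ['a.txt', 'b.txt', 'c.txt']
```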

DEVONthink 2.x is focused on the characters that make up words, and so (in European languages) on alphanumeric characters, to the exclusion of other characters such as punctuation marks or special symbols like “§”, the section sign used in legal documents. Excluding punctuation marks makes DEVONthink searches faster and more memory efficient.

True, that means that I couldn’t use DEVONthink searches to analyze an author’s usage of semicolons versus commas. And lawyers may be surprised that a DEVONthink search cannot distinguish between “§14.56” and “$14.56” or “#14.56”. But it will find all instances of “14.56”.
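To illustrate why ‘§14.56’, ‘$14.56’, and ‘#14.56’ become indistinguishable, here is a hypothetical Python tokenizer that keeps only runs of letters and digits. The rule that a period between two such runs stays inside the token (so ‘14.56’ survives as one term) is an assumption made for this sketch, not DEVONthink’s documented behavior.

```python
import re

# Hypothetical rule: a token is a run of letters/digits, and a single period
# between two such runs stays inside the token, so "14.56" is one term.
TOKEN = re.compile(r"[^\W_]+(?:\.[^\W_]+)*")


def tokens(text: str) -> list[str]:
    """Split text into search terms, dropping punctuation and symbols."""
    return TOKEN.findall(text)


for query in ("§14.56", "$14.56", "#14.56"):
    print(query, tokens(query))  # the token list is ['14.56'] every time

print(tokens("See § 14.56; then stop."))  # ['See', '14.56', 'then', 'stop']
```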

Should there be user preferences to switch between case sensitivity and case insensitivity in searches, or to include or exclude punctuation marks or other special characters? To do that would not be as simple as it might seem, as the internal Concordance would have to be recreated each time different options were selected, and memory requirements could be significantly increased. There might also be consequences for the Classify and See Also features, probably not for the better.

Rather than screw up Christian’s very fast search routines (and other features as well), perhaps a simple Find routine that could toggle between case sensitivity and insensitivity, and that could ‘see’ non-alphanumeric characters in documents, would suffice. It would be much slower in a big database, but it could supplement searches for special needs. Then I could compare the relative frequencies of commas and semicolons in the writings of two authors.
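A slower Find routine of that kind would not need an index at all: it can scan the raw text, so punctuation stays visible and case sensitivity becomes a simple switch. A minimal Python sketch (function names are hypothetical):

```python
import re


def find_all(text: str, needle: str, ignore_case: bool = False) -> list[int]:
    """Offsets of every non-overlapping occurrence of needle, punctuation included."""
    if not needle:
        raise ValueError("needle must not be empty")
    flags = re.IGNORECASE if ignore_case else 0
    # re.escape makes characters such as '$', '.', or '§' match literally.
    return [m.start() for m in re.finditer(re.escape(needle), text, flags)]


def count_marks(text: str, marks: str = ",;") -> dict[str, int]:
    """How often each punctuation mark occurs, e.g. commas versus semicolons."""
    return {mark: text.count(mark) for mark in marks}


sample = "John Smith met JOHN SMITH; john smith left."
print(find_all(sample, "John Smith"))                    # [0]
print(find_all(sample, "John Smith", ignore_case=True))  # [0, 15, 27]
print(count_marks("a, b, c; d."))                        # {',': 2, ';': 1}
```

Scanning every document on each query is what makes this slower than an index lookup, which is why it would only supplement the regular search.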

Unlike searches, tags are case sensitive. “Purple” and “purple” are not the same tag and might have different assigned meanings. That doesn’t bother me, as I’m likely to seize on the potential to use case as a means of coding a little family of tags that will be clustered together by alphabetical sort in the Tags view.
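Whether a family of case-coded tags clusters together depends on how the list is sorted. In the Python sketch below, a plain code-point sort puts every capitalized tag before every lowercase one, while sorting on a case-folded key (with the original spelling as a tiebreaker) keeps ‘PURPLE’, ‘Purple’, and ‘purple’ adjacent. Which rule DEVONthink’s Tags view uses is an assumption not confirmed in this thread; the tag names are illustrative.

```python
tags = ["purple", "apple", "Purple", "PURPLE", "Banana"]

# Case-sensitive tags: the three spellings of "purple" are distinct tags.
print(len(set(tags)))  # 5

# Plain code-point sort: all capitalized tags come before all lowercase ones.
print(sorted(tags))  # ['Banana', 'PURPLE', 'Purple', 'apple', 'purple']

# Case-folded key, original spelling as tiebreaker: the family stays together.
clustered = sorted(tags, key=lambda t: (t.casefold(), t))
print(clustered)  # ['apple', 'Banana', 'PURPLE', 'Purple', 'purple']
```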

True, if I imported tags from other sources, variations in case might well result in duplicates all over the place.
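Duplicates of that kind are easy to detect before they spread. A small Python sketch, assuming the imported tags are available as a list of strings, groups them by their case-folded form and reports every group with more than one spelling (`case_duplicates` is a hypothetical helper):

```python
from collections import defaultdict


def case_duplicates(tags: list[str]) -> dict[str, list[str]]:
    """Group tags that differ only in case; keep only groups with 2+ spellings."""
    groups: dict[str, set[str]] = defaultdict(set)
    for tag in tags:
        groups[tag.casefold()].add(tag)
    return {key: sorted(spellings)
            for key, spellings in groups.items() if len(spellings) > 1}


imported = ["receipts", "Receipts", "RECEIPTS", "travel", "Travel", "tax"]
print(case_duplicates(imported))
# {'receipts': ['RECEIPTS', 'Receipts', 'receipts'], 'travel': ['Travel', 'travel']}
```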

That won’t bother me. I’m a confirmed curmudgeon regarding tags created by other people, or even by myself at different times. I refuse to use tags created by others.

Many years ago I worked with a system of searching millions of documents by means of descriptors that had been assigned to them. There are three criteria that should be met to adequately tag a document:

The descriptors should be apt, well-chosen terms that convey what the document is about;
The descriptors should be comprehensive, identifying the major ideas, events, places, etc. that may make the document useful for various purposes;
The descriptors should be consistent, so that the same term is applied to a given topic in all documents.

I trained staff to translate technical information requests into search queries based on the descriptor terms assigned to documents by federal agency personnel. I visited those federal agencies to observe the procedures used by their personnel to assign descriptors to documents.

That system of describing documents so that they could be found for various purposes worked to a degree, but with severe limitations.

The aptness of descriptors often reflects the familiarity of the assigner with the subject matter, and all too often contains subjective elements.

The completeness of the descriptors reflects the ability of the person making the assignments to recognize the potentially important bits of information in the document, and frequently is affected by pressures of workload or thinking about an upcoming vacation, resulting in leaving out potentially useful descriptors.

The consistency of application of descriptors (above the level of the most trivially obvious ones) tended to be terrible. Different individuals were inconsistent with each other, and the same individual often treated similar documents inconsistently over time.

I’m pretty familiar with the scientific and technical disciplines that go into my reference collections, as well as with the laws and regulations and policy issues that are also included.

Long ago, I gave up on the idea of systematically tagging content as I add it to a database. It’s not just that I feel that it would take a lot of time and effort to do a reasonably good job on most new items, but that I also found that when I take on a new project, I’ll evaluate the importance of a reference quite differently than I would have for a previous project. I’ll often have a very different perspective about the same bits of information.

So I limit classification of new content to placing it into an appropriate database, and then into at least a general group that’s appropriate.

I don’t import any existing tags associated with that new content. As I say, I’m a curmudgeon. I don’t have faith in other people’s assignments of descriptors. I don’t find them useful. I don’t want them cluttering up my databases.

Tagging becomes a useful tool when I tackle a project and want to pull together documents and notes that are important in various ways.

But when I finish that project, I’ll usually throw away the tags that I had created to help me work on that project.

If I show you the Tags view of my databases, the right column listing the tags I’ve created will be empty, or contain only a few tags – unless I’m working on a project at that time. Some of those tag names may look strange, as I sometimes use case variations to code permutations of a descriptor.

But that’s just the way I think and work. Feel free to consider me an eccentric, and use tagging in any way that is useful to you.

rolfschmolling · April 12, 2011, 7:50am

Hi,

Just to chime in: I keep all my tags in lowercase. That means I am able to name groups (which then become tags of a different kind/role) in uppercase to organize my stuff.

That “stuff” means material gathered or organized for my PhD project. Lowercase tagging then enables searching across the file structure while at the same time browsing and actively organizing material, links and connections, timelines/chronicles, etc. prior to WRITING (sorry, couldn’t resist that one; the capitals emphasize both its difficulty and its importance).

My problem is that of many buckets: a very old FileMaker database (FileMaker 6.0) with a lot of transcribed material (some fields in many entries are several pages long); a BibDesk database for citation management (I write in LaTeX), with an attached papers folder indexed by DTPO; the Finder with its file structure augmented by OpenMeta tagging (Leap, HoudahSpot), again all in lowercase; some timelines written in Word, back when I still used it; etc. I am trying to organize things around DTPO, though all material I will use in my text needs to be a BibTeX/BibDesk entry too. Luckily, BibDesk allows me to use file URLs to link to sources within DTPO’s texts.

I am not looking forward to Lion, since it will do away with Rosetta (think FileMaker X.?; MS Office?)

I wonder if there is a way to move stuff from FileMaker into a DTPO database while preserving all information/metadata (not in the strictest sense of the word). I also wonder if DTPO is as fast as FileMaker at searching and displaying material. Does anybody have experience with this?

I’d need to convert the data into RTF files: the content (transcribed information AND notes, which are in one field) in the body of the text; the title of the source book/article etc., together with its location (archival material, library material), in the title of the RTF file as well as at the beginning of the file (like the “notes” template); and the keywords converted into tags.
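A rough Python sketch of that conversion follows. It assumes the FileMaker records have first been exported as a CSV file with columns named `Title`, `Location`, `Content`, and `Keywords` (hypothetical names; adjust them and the export encoding to the real data). It writes one minimal RTF file per record, with title and location at the top and the keywords on a visible line; turning those keywords into actual DEVONthink tags would be a separate step that is not shown.

```python
import csv
import re
from pathlib import Path


def rtf_escape(text: str) -> str:
    """Escape text for RTF: control characters, line breaks, tabs, non-ASCII."""
    # Normalize line endings; FileMaker is often said to export in-field
    # returns as vertical tabs (an assumption worth checking on real data).
    text = text.replace("\r\n", "\n").replace("\r", "\n").replace("\v", "\n")
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)        # RTF special characters need a backslash
        elif ch == "\n":
            out.append("\\par\n")
        elif ch == "\t":
            out.append("\\tab ")
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # \uN takes a signed 16-bit value; characters outside the BMP are
            # written as UTF-16 surrogate pairs, each followed by a '?' fallback.
            data = ch.encode("utf-16-le")
            for i in range(0, len(data), 2):
                code = int.from_bytes(data[i:i + 2], "little", signed=True)
                out.append(f"\\u{code}?")
    return "".join(out)


def build_rtf(title: str, location: str, content: str, keywords: str) -> str:
    """Assemble one minimal RTF document: bold title, location, keywords, body."""
    return (
        "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Helvetica;}}\\f0 "
        "\\b " + rtf_escape(title) + "\\b0\\par\n"
        + rtf_escape(location) + "\\par\n"
        + "Keywords: " + rtf_escape(keywords) + "\\par\\par\n"
        + rtf_escape(content) + "\n}"
    )


def safe_filename(title: str) -> str:
    # '/', ':' and '\' can clash with path separators; replace them and trim.
    return re.sub(r"[/:\\]", "-", title).strip()[:80]


def convert(csv_path: str, out_dir: str, encoding: str = "utf-8") -> int:
    """Write one RTF file per CSV row; return the number of files written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    count = 0
    with open(csv_path, newline="", encoding=encoding) as fh:
        for n, row in enumerate(csv.DictReader(fh), start=1):
            title = row.get("Title") or ""
            rtf = build_rtf(title, row.get("Location") or "",
                            row.get("Content") or "", row.get("Keywords") or "")
            name = safe_filename(title) or "untitled"
            # The numeric prefix keeps records with identical titles apart.
            (out / f"{n:04d} {name}.rtf").write_text(rtf, encoding="ascii")
            count += 1
    return count
```

For an old Mac export, a call such as `convert("export.csv", "rtf_out", encoding="mac_roman")` (file names hypothetical) may be closer to what FileMaker 6 produces; since `rtf_escape` turns everything into ASCII, the output files can always be written as plain ASCII.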

Best regards,

Rolf

PS: Why? Why is the forum throwing me out literally almost every time I write a longer entry???

adoolittle · January 6, 2012, 6:45pm

I vote for a case sensitivity option as well. There is a big difference between “RET” (Registered EEG Technician) and “Ret” (Retired). You wouldn’t believe how many people are retired. A lot more than there are Registered EEG Technicians.