On tags and groups - some real world insights

Johannes · December 11, 2009, 7:15am

(This is an updated re-post from the internal beta forum)

Over the last few days I tried to unleash the full power of tags on my main database. I think it was a wise decision in the long run to go for the group=tag approach. It is so powerful and flexible.

So if tags are groups why do I not stick with groups but use tags? Basically it is the interface that makes the difference (even with some limitations in pb8). The toolbar makes it much faster to replicate a document to several groups, the AppleScript property tags is easier to handle than asking for parents and Smart Searches can be focused with tags. And finally I like that replicates within the tags group don’t render the document name red. So most of my documents have black names again, giving the red font indication back to those files only that are “real” replicates. Tags are a great tool.

But the new tool calls for well-thought-out decisions. The main question is: which groups should be excluded from tagging, which should become tags only (=move to the tags-group), and which groups should work as tags too?

The simple road (exclude all groups from tagging and tag independently from them) was not so compelling to me, so I started to find a more holistic approach.

This is still a work in progress, but I would like to share some of my results so far.

Basically it boils down to this:

A group is where something lives/belongs to.
A tag describes what something is.

Here are my details:
I am an author and use the database for collecting notes on characters/places/things/events, for outlining the story and for holding the written text itself.

I exclude groups from tagging, if they are „only“ part of my work structure and don’t add anything to the nature of the content. A description of a person might live in a chapter-group in my outline (where I expect the person to appear), but that does not qualify that chapter-group to be a tag (because it does not say anything about the content of the description). For the same reason I exclude from tagging the groups (=chapters) with the final text and all groups that organise the more general notes for rewrite.

On the other hand I include every group that would help to describe the content (nature) of the documents inside. So all my geographically nested groups that hold everything about locations are used for tagging. It adds meaning if the character description gets the tag of the town where he/she lives (and the country that town belongs to). Using the already existing classification avoids duplicating the same group hierarchy in the tags group. In this case belonging and being are the same. A person living in Germany is German (even if he/she might consider him/herself very un-German).

But some terms/groups have to be doubled. Because not all items in my top level group „places“ are actually places (some are people who live in the places), I have to exclude that top level group „places“ from tagging and create a tag „place“ within the tags group (I think it is no accident that one is plural and one is singular). All documents that actually describe a location get this tag. People who live in a location are stored in those location groups, but are tagged with „people“.

I avoid tagging groups that are themselves used for tagging, because such a group would pass its own tags on to its children.

An indication for tags that should not be in the special tags group seems to be the sort order of a group: if the content of a group needs to be in a certain manual order (the document “lives” at a certain position), it is better kept out of the tagging domain.
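The rule above (a document picks up the names of its enclosing groups as tags, unless a group is excluded from tagging) can be sketched roughly as follows. This is only an illustration in Python under stated assumptions: the `Group` class, the `exclude_from_tagging` flag and the `inherited_tags` helper are hypothetical names, not DEVONthink's actual API, and the town "Berlin" is a made-up example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Group:
    """Hypothetical stand-in for a database group (not DEVONthink's API)."""
    name: str
    parent: Optional["Group"] = None
    exclude_from_tagging: bool = False


def inherited_tags(location: Group) -> list[str]:
    """Collect the names of all enclosing groups that may act as tags."""
    tags = []
    group = location
    while group is not None:
        if not group.exclude_from_tagging:
            tags.append(group.name)
        group = group.parent
    return tags


# "places" also holds people, so it is excluded from tagging.
places = Group("places", exclude_from_tagging=True)
germany = Group("Germany", parent=places)
berlin = Group("Berlin", parent=germany)  # made-up example town

# A character description stored in the town group gets the town and
# country as tags, plus the explicit tag "people" from the tags group.
character_tags = inherited_tags(berlin) + ["people"]
print(character_tags)
```

The excluded top-level group contributes nothing, which is why a separate tag such as „place“ has to be created by hand for documents that really describe a location.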

Johannes

korm · December 11, 2009, 9:36am

I am a tagging skeptic, mainly because (for the data characteristics of my databases) the work involved seems greater than the benefit. I am also not fond of the DT design decision that tags=groups. (For me, tags are metadata and metadata is not physical structure. Groups are physical structure.)

But, the design is water under the bridge. The clarification that Johannes posted above (in fact, the whole post) is quite helpful in this respect:

If the tags=group constraint is confusing, it is easy to unwind. Make all groups excluded from tagging unless you have an explicit reason (as explained by Johannes) to include the group.

I believe there is a tendency to expect more from tags than tags can deliver UNLESS one first has a carefully considered rationale and set of personal decision rules about the use of tagging. In the environment where I have found tagging to be essential and unavoidable (large-scale datasets for photography) it is my experience that once the investment in tagging starts, it becomes increasingly difficult to unwind. This is because each tag adds another degree of freedom to your understanding of your data. Such complexity can collapse into confusion.

Being a skeptic, I’d first consider whether comments, keywords, flags, highlights, or any of the other metadata tools available in DT are useful for you. If you need more description or elaboration, then consider tagging. But make some notes (in DT, of course) about where you are going before you start. The use cases posted by Johannes and others in these forums are going to be helpful. (We have some notable pack rats here - you know who you are - and I hope one of them is collecting a set of how-to-tag topic links that will be helpful in this respect.)

And then there are the shortcomings (or benefits? Jury’s out.) of DT’s design and the tagging tools provided thus far and maybe later. For example, it dawned on me today that the tags=groups design introduced strong consequences to the sequence in which things are done in a database. Interactions that are subtle, were not there before, and might prove difficult to unwind. IMHO, as it is today, tagging in DT does not materially differentiate it from the competition. But, neither does DT need to be better at tagging than anyone else. The other powers of the product are allure enough for me.

Usable_Thought · December 11, 2009, 12:48pm

To me the power of tags - specifically tags that can be seen & used both inside and outside of DT, e.g. OpenMeta tags - has to do with allowing reuse of the files in question - not just inside of/outside of DT, but also, when outside of DT, via multiple applications & workflows.

This is because, as Johannes wrote & korm quoted, “a tag describes what something is.” Therefore the document in question can be a single entity, inside a very flat filing hierarchy. For many projects this is not needed but for some it may be very useful.

Historically, one of the criticisms of database-driven tools such as DevonThink is that they are too much like walled or moated castles. On top of the existing Spotlight transparency, Tags has the potential to obviate this critique - to effectively remove the wall or moat as an obstacle to flexible usage.

rolfschmolling · December 15, 2009, 9:17am

Please forgive me for quoting a post from another thread, but this is important to me:

http://www.devon-technologies.com/scripts/userforum/viewtopic.php?f=4&t=9843&p=45959#p45959

rolfschmolling:

…there are a number of issues here (and at stake for me):

I’d lobby for a much wider support of openmeta-(meta-)data within DTP(O), meaning a live inclusion of meta-data when indexing (especially when indexing) external content within DTP(O)-databases. I want this additional layer of information I have added and use within the Finder environment. I want to enhance this with information I gather/keep/organize within DTP, e.g. notes, links, annotations etc. linked to files kept in Finder. I want to be able to look at data from different perspectives, from within DTP, (hopefully) helped by the built-in AI and from Finder (e.g. tags-view/Folder-structure of my material). Currently this is not implemented; an implementation would make DTP(O) so much more powerful. Openmeta supports this and the API is available, so this should not be too difficult to put in. I’d suggest linking this to a user decision based on a preference.

Data in DTP-database is already searchable by spotlight, so why not attach (openmeta-)tags to the relevant cache-files which identify items in the database (or indexed into…)? There would be the need to re-write this metadata every time the cache/index is re-written by DTP(O) but I believe this is not such a performance-problem.

Leap and Yep offer perspectives to look at my data/files which are different from DTP but far superior in connection with (openmeta)-tags. The current implementation of tags-view/search etc. in DTP is far from powerful and I am eagerly awaiting (and expect) improvements. Still, live resemblance/inclusion of openmeta-data within DTP would make it possible to use this superior perspective.

The export of data as a one-time-only operation to get openmeta-metadata attached to files makes – in my opinion – sense only when finishing a project and then exporting to an archive…

It is true, metadata-support by Apple is somewhat lacking, SnowLeopard broke some part of openmeta and that led to a change in namespace where openmeta-tags are kept. This change is – to my knowledge - currently supported by apps made by ironic e.g. Deep, Yep, and Leap (latest releases) as well as DefaultFolderX (DFX). Some other applications like Punakea or Tags have not yet followed. This is not a very big problem since there is a mechanism built in to – live – convert different flavours of openmeta-tags to the current/latest version.

So an important question to the development-team at Devon (I know Bill is not part of it): did you implement the latest version of openmeta into the im/export-mechanism?

Disclaimer: These thoughts are work-in-progress, I have yet to grasp the full potential of the DTP(O)-way of conversion of Group-structure into tags (but bought the upgrade yesterday…)

Best regards,

Rolf

Horace_Greeley · December 15, 2009, 9:55am

I would not at all mind Openmeta support that is more complete as Rolf has stated, at the same time I would be extremely unhappy if Devonthink forced the Openmeta “standard” into all my files, without being explicitly told to do this. I very much do not want Devonthink to start putting Openmeta into my file system without my asking it to please!

Openmeta does not seem to be a very solid or well designed “standard”, it is using private address space reserved by Apple and can be wiped out with any system upgrade Apple releases. In addition to this, the standard so far is Ironic Apps and Default Folder X, maybe one or two more small indie apps that I have missed. This is not exactly a very big “standard” for anything and if I truly wanted Openmeta everywhere I would use Leap. Options are very good but please do not force me to use this if I do not want it!

korm · December 15, 2009, 10:17am

No flavor of tagging in either the Mac universe or cross-platform is “standard”, or is likely to ever be standard. Ironic’s approach is overly self-serving and lacks the market leverage to coerce widespread acceptance; and their approach is too technically flawed to have a long lifespan. As noted by others in this thread, Apple could kill it with a tweak of the file architecture and not blink.

For avid tagistas, DT’s tagging architecture is half-hearted and odd, but it is commercially conservative and might have a certain wisdom from a long-term data-viability perspective. What matters is “my data is my data, don’t mess with it” and DT would best avoid exporting files following any particular tagging concept without asking. “Export to Files and Folders…” should offer options or have multiple flavors, or DT should provide in-built scripts, to do “export tags as Spotlight comments?”, “export tags as Open Meta tags?”, or both, or none.

rolfschmolling · December 16, 2009, 11:10am

Hi,

you both are making some valid points here:

I am lobbying for a wider/more extensive support of openmeta-tags in DTP(O), based on my personal needs. I suppose most helpful would be the ability to decide if and how tags are written out based on which group/folder they belong to, as user-customizable as possible.

I disagree with your opinion of openmeta and its implementation; it is far superior to previous tagging-concepts based on spotlight-comments, for example.

Regards,

Rolf

s.hoffman · December 16, 2009, 1:28pm

OpenMeta data from Mailtags messages does seem to be imported and working for me, as does preservation of this data when I export files. Any additional flexibility would naturally be good, but I would rate improving the tagging at a much higher level of importance than Openmeta.

Have you read the openmeta development group lately? There is a sense of abandonment, no road-map for the future and many user and developer complaints. I truly enjoy what Ironic software does with their user interface, all of their programs look very good, they are unfortunately not so good for dealing with large volumes of files. Not so good is a kind way of saying near-useless. Leap is also the only program so far which has managed to kernel panic Snow Leopard for me when going crazy trying to reindex all of my hard drives and update from whatever old openmeta was doing, to the new one.

I am very unimpressed with the current implementation and functionality of what Ironic software is doing and would rate fixing the smart groups and bringing them to iTunes 9 levels and fixing the overall tagging to be more reasonable as much higher priorities.

This is only my opinion but I am a person who loved Yep when it came out, I truly can’t stand Yep 2 and Leap is not usable anymore for me unless I make it only look at a very small directory of files. For system wide use it is a nightmare.

I do not appear to be alone with my problems. Read the openmeta group and it is almost nothing but complaints recently. Openmeta does not appear to be so open, it is open only in the sense that it is a marketing tactic for Ironic Software.

groups.google.com/group/openmeta

jwiegley · December 18, 2009, 12:58am

I don’t think that is correct, actually, because both tags and groups are essentially the same idea – just that one is inherently hierarchical and one is not.

A group defines the taxonomy of an item, with replicants allowing for multiple, orthogonal taxonomies. Thus, one taxonomy could describe the item’s category, another its identity, another its type, etc.

Tags are just another taxonomy, it is just not hierarchical. You could have a folder called “Tags” with a bunch of groups, and there’d be no difference between that and current tags. What I like about DTP’s new tagging interface is that you don’t have to use drag-and-drop, or navigate huge menus, to replicate your items into this “tags” group. They’ve made the Tags group a top-level entity, and replication into its member items something that can be achieved very fluidly.

What this tells me is that we’re not talking about definition here really, but interface. It doesn’t matter if something is part of a “tag” or a “group”, but how it got there and how you undo it.

John

MDAnderson · December 18, 2009, 6:12am

I know Christian posted how tags work, then your explanation is something of a mix between technical and use with tags being simpler to make than replicants and groups. I understand there is really no difference between replicants and groups and tags, but when all of that is tangled up into tags, I become very confused about what is where and how it got to be whatever it is.

Johannes’ explanation makes perfect sense to me from a use perspective even if it’s not exactly technically correct. Your explanation confuses me more, instead of less, even if it is technically right. I don’t care how it does whatever it’s doing, only that it works and makes sense to me.

cycheney · December 31, 2009, 7:47pm

Just posting in order to subscribe to this thread. [I don’t see, in any threads, the subscribe button mentioned in the forum faq.]

KeithKendrick · January 4, 2010, 10:54pm

I continue to be surprised at skepticism about tagging. Please tell me what I am missing. The metaphor I think about involves a personal information librarian or personal assistant. Imagine that this personal assistant is always nearby when you have some information that you want added to your collection - thoughts, ideas, questions, articles you wrote, articles someone else wrote, a bill from the Vet., a slide deck from a recent presentation, …

The reason to keep any of that stuff is that you might someday want to see it again, read it again, think it over, prepare your taxes, … Because you have a personal assistant - who happens to be very talented at information management kinds of things - you don’t have to worry about the information being available to you in the future; nor do you care to know the details of the filing system that he/she uses.

This is where my confusion about groups comes in. Ideally, you want to give the information - in various sized and shaped pieces - to your personal assistant and then forget about it until the moment that you want to see it again. Can you actually imagine devising the filing system for your assistant and then telling him/her where to place various duplicates or replicates? Isn’t that what we are doing when we maintain “physical” folder structures to store files? There are situations where the tools are such that an approach like that makes good sense - the school or neighborhood library managing physical copies of books, for example. But, that kind of thing has its limits. And I would suggest that we should treat the management of electronic information as if it will breach the limits of such a management model - because in many cases it will.

Back to the personal assistant. I can easily imagine that he/she might suggest to you that you rattle off a few “keywords” when you hand in a piece of information to be managed. Isn’t that what tags are? And, that he/she might have some clever ideas as time goes, and that he/she might share those ideas with you (like a “Usage” thread in a user-forum) so that you can help them evolve the system over time.

Finally, one day you call your personal assistant and say something like "I was having a fascinating discussion with some friends at dinner over the weekend. Would you please send me links to all of the Malcolm Gladwell articles that we have in our database for the past 3 years. And will you send me a list of references to our collection of information having to do with: creativity, design, object-oriented thinking, and organizational learning?" Using a set of nicely put together smart-groups, the personal assistant assembles a report that contains the requested references and emails it to you. That last part was facilitated by a clever use of tags. I can’t even imagine all the work involved in collecting and accessing a large amount of information like that using groups organized in a 2-D “structure” - like the physical library, that approach will become inadequate much more quickly than electronic databases accessed using advanced searching techniques and algorithms.

I see no reason to be skeptical of tags, what am I missing?

Thanks in advance for any feedback!

Keith

Johannes · January 4, 2010, 11:14pm

The name of this personal assistant is AI.
According to my experience with DTP the AI would deliver at least 90% of the documents you need in your scenario without bothering about tags at all.
Maybe that is the reason of tag skepticism of some people here.

Johannes

jwiegley · January 4, 2010, 11:55pm

For me, tags are not so much about organizing data as they are a kind of “persistent, named label”. Like, for identifying a related group of documents without having to actually relate them hierarchically.

The ability to do intersections and unions is pretty much required, though, otherwise tags are more of a plaything. Once you get a really serious number of groups going, the tag view in DT becomes useless, because it turns into a sorted list of thousands of group names.

So far I like Punakea and NiftyBox for their tag browsing capabilities, but Punakea displays woefully little information about what it finds, and NiftyBox is dying.

John

KeithKendrick · January 5, 2010, 12:55am

The smart-groups allow some ability to do the union and intersection searches, although I suspect that will be further enhanced. And, it would be great to see easier access to that search interface - like a button in the toolbar.

KeithKendrick · January 5, 2010, 1:11am

Thanks, Johannes, maybe what I’m missing is a thorough relationship with the AI capabilities in DTPO. I do think, though, that we - as a culture - should be careful about assigning higher and higher degrees of responsibility to the technology and the designers of the technology. While I think that we should keep making great advances in that area, I think it would be wise to stay well engaged as users of the technologies. It seems to me that tags are a good thing in this regard. In the personal assistant metaphor, I need to participate in the design and use of the interface between myself and the personal assistant - which is what I see as the development and implementation of good tagging habits. Having said all that, I’m only beginning to explore what it means to put those ideas into practice!

Here is a thought I had recently - I’d be interested to hear your feedback. It seemed like I might usefully create a smart-group called “design books” that had all of the files tagged with “design” and “book”. I’m thinking that all of my pieces of information - bookmarks, web archives, articles, receipts, notes, etc. are simply stored in a group called “Library”. When I create the smart-group called “Design Books”, that also would simply be stored in the “Library”. I might never actually browse the “Library” or the “Tags” group; instead I would browse the results of searches. The successes there would be directly related to the quality of my tagging (whatever that exactly means!). What would be the AI way to think about it? Am I right that the AI way involves developing and maintaining a group structure with lots of branching and replication? If so, don’t you see that as a bigger management strain than the addition of some tags every time a new file is added to the library? And, one thing I like about tags is that it seems pretty easy to add new tags if you find a file in your Library that is of interest and use but the quality of the tagging could stand some improvement. Thanks again for your comments!
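To make the “design books” idea concrete, here is a rough sketch in Python of what such a smart-group boils down to: an intersection of tags (files tagged “design” and “book”), with a union shown for comparison. It is only an illustration and says nothing about how DT implements smart-groups; the `library` contents and the helper names `with_all_tags` and `with_any_tag` are made up for the example.

```python
# Hypothetical flat "Library": document name -> set of tags.
library = {
    "Design handbook": {"design", "book"},
    "Design article": {"design", "article"},
    "Novel": {"book"},
    "Vet bill": {"receipt"},
}


def with_all_tags(docs, *tags):
    """Intersection: documents that carry every one of the given tags."""
    wanted = set(tags)
    return sorted(name for name, doc_tags in docs.items() if wanted <= doc_tags)


def with_any_tag(docs, *tags):
    """Union: documents that carry at least one of the given tags."""
    wanted = set(tags)
    return sorted(name for name, doc_tags in docs.items() if wanted & doc_tags)


# The "design books" smart-group: tagged with both "design" and "book".
design_books = with_all_tags(library, "design", "book")
# A broader search: anything tagged "design" or "book".
design_or_book = with_any_tag(library, "design", "book")

print(design_books)
print(design_or_book)
```

In this model the documents never move; only the query changes, which is the point of browsing search results rather than the “Library” itself.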

kewms · January 5, 2010, 7:45am

KeithKendrick:

Thanks, Johannes, maybe what I’m missing is a thorough relationship with the AI capabilities in DTPO. I do think, though, that we - as a culture - should be careful about assigning higher and higher degrees of responsibility to the technology and the designers of the technology. While I think that we should keep making great advances in that area, I think it would be wise to stay well engaged as users of the technologies. It seems to me that tags are a good thing in this regard. In the personal assistant metaphor, I need to participate in the design and use of the interface between myself and the personal assistant - which is what I see as the development and implementation of good tagging habits. Having said all that, I’m only beginning to explore what it means to put those ideas into practice!

I’ve actually worked with a human personal assistant. I don’t “tag” things for her to file, I tell her to “put it with other items like this.” Which is more or less exactly the way the DTP AutoClassify function works.

More to the point, my biggest problem with tagging is scalability. Every tagging scheme I’ve tried becomes unworkable for systems with more than a few hundred items or a few dozen tags. My largest current DTP database has thousands of items and millions of words. Beyond broad topical headings – which are handled quite well by the existing group mechanism – how would I even begin to tag all that data? Even if I could come up with useful tags, I’m certainly not going to invest the time to retrospectively tag the material that’s already there, but tagging is useless unless it’s universal. So I’d ultimately have to rely on an AI-like mechanism to do the tagging for me anyway. But if the AI is good enough to assign tags, it’s good enough to find the material without them.

I see some applications for temporary ad hoc tags, for instance for material relevant to a particular project. On the other hand, DTP’s improved support for multiple databases makes it pretty easy to just replicate such files to a project-specific database.

But overall, no, I don’t see any particular value in making things harder for myself by using tags rather than relying on the AI. While I enjoy a nice walk as much as anyone, I don’t hesitate to drive when that’s the best way to get there.

Katherine

Johannes · January 5, 2010, 8:04am

True.

As far as I know the AI is drawing its conclusions from word statistics/relations in the document itself and in documents it is grouped with. So it depends on human interaction and you will improve the AI by grouping documents nicely. Whether you do this by a sophisticated group hierarchy or by a sophisticated tagging strategy does not matter as groups and tags are identical from the technical side. Whether you tag a file with “book design” or drag it into a group “book design” the result is the same: The AI knows that the documents sharing the same group/tag are related. The difference is the UI (and maybe the mental approach to each). The tagging interface is already a very nice alternative to replicating. Instead of mousing through endless hierarchies of groups I simply type a few characters and hit return. There are some limitations in the UI in the present version (pb8), but I expect to see improvements here with each new release.

Johannes