What is meant by Metadata?

Cassady · June 25, 2014, 7:53am

Hello all,

Not wanting to derail this thread (What does the future hold for DEVONthink?), I figured I would pop something up here…

This is a sincere question, btw!

Many posts from several users have requested the ability to add custom metadata to documents, which had me thinking: what exactly is meant by this?

Metadata, being ‘data about data’, is obviously a very broad term/concept/field, so I figured it might be useful (and interesting) if we could pop up some more concrete examples of what different requirements would be.

Or am I missing the point – are the requests specifically that requirements should not be an issue, since the user should have the ability to create a completely unique set of meta-data for themselves?

If the latter – Scrivener offers something like this. And I’m actually embarrassed to mention how much time I’ve spent staring at the ‘edit custom meta’ box, trying to work out what I would use…

It’s almost like the first few times I sat down in front of a PC and opened this new search engine called Google, which was connected to this (almost) new thing called the Internet – there were so many possibilities, I almost didn’t know what to start searching for…

Not sure if anyone will bite at this – since it might require sharing too much information?

I’m also being a bit sneaky, since I’m hoping that someone pops up some examples of things I could start looking at this side, to possibly include elsewhere…

Oh - and one more thing – can a list of “custom metadata” not simply be inserted into the Spotlight Comments section, that would virtually do exactly what is being asked for? Or are there problems with that approach?

korm · June 25, 2014, 9:49am

The simplest explanation of metadata with respect to files (documents) or groups (folders) is that it comprises property-value pairs. Let’s say you have a collection of documents describing your sightings made while out in the field studying birds. You could tag those documents with color names (“red”,“grey”,“osage”), but those tags are vague because they do not answer the question “red what?”. “Red wings?” “Red wattles?”. So, if you are using a database or other data structure that supports custom metadata, you might create a property named “wing_color” and assign values to wing_color – your property-value pairs might be wing_color:red, wing_color:brown, and so on. The first part of each of those pairs, the property, is static. The second part, the value, is dynamic – meaning you insert the value for that property for that document using some sort of property editor.
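To make the "red what?" problem concrete, here is a minimal Python sketch, assuming documents are modelled as plain dictionaries with a free-form `tags` set and a `meta` dictionary of property-value pairs. The document names and the `wattle_color` property are hypothetical; this is not how DEVONthink stores anything internally.

```python
# Hypothetical bird-sighting documents: free-form tags plus property-value metadata.
sightings = [
    {"name": "Sighting A", "tags": {"red", "grey"},
     "meta": {"wing_color": "red", "wattle_color": "grey"}},
    {"name": "Sighting B", "tags": {"red"},
     "meta": {"wing_color": "brown", "wattle_color": "red"}},
]

# A tag cannot answer "red what?", so both documents match.
tagged_red = [d["name"] for d in sightings if "red" in d["tags"]]

# The property-value pair wing_color:red answers it, so only one document matches.
red_wings = [d["name"] for d in sightings
             if d["meta"].get("wing_color") == "red"]

print(tagged_red)  # ['Sighting A', 'Sighting B']
print(red_wings)   # ['Sighting A']
```

The property (`wing_color`) is fixed; only the value changes from document to document, which is what makes the query precise.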

The columns in a DEVONthink database are properties. The content of each column/document intersection in the matrix is the value. “Comment” is a property, “fix the grammar in this document” is a value assigned to “comment” for that document.

In many cases, the database program will allow you to control what kind of values go into a property. If you have a property “sighting_date” for your ornithology database, you would want to ensure that that property contains only valid dates, so you could not insert the value “french toast” into “sighting_date”.
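A typed property can be sketched as a setter that refuses anything that does not parse as a date. This assumes ISO-format input (`YYYY-MM-DD`) and a hypothetical `set_sighting_date` helper; a real database would enforce the type in its own property editor.

```python
from datetime import date

def set_sighting_date(meta: dict, value: str) -> None:
    """Store value under 'sighting_date' only if it parses as an ISO date."""
    try:
        meta["sighting_date"] = date.fromisoformat(value)
    except ValueError:
        raise ValueError(f"sighting_date must be a valid date, got {value!r}") from None

meta = {}
set_sighting_date(meta, "2014-06-25")  # accepted and stored as a real date
try:
    set_sighting_date(meta, "french toast")  # rejected; meta is left unchanged
except ValueError as err:
    print(err)
```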

A difference between metadata, in the form of property-value pairs as explained above, and tags is that tags are freeform. There is no “intelligence” to tags (or comments, for that matter). In a very broad sense, tags are also property-value pairs. “Tag” is the property, and whatever you stuff into a tag is the value.

In the case of DEVONthink, I suspect what most users who want “custom metadata” want is the ability for the database to let you add whatever properties you want to define, and then add values to those properties. I don’t read many comments in the forum from readers who also want to constrain the data types for those properties, though that would be reasonable. For example, you could define custom properties of type “date”, “string” or “numeric”, or properties that are constrained to certain values: “red, blue and green are valid, but osage is not”.
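User-defined properties with both a data type and an optional list of allowed values could look roughly like the following. The `SCHEMA` dictionary, the `count` property and the `validate` function are hypothetical illustrations of the idea, not a DEVONthink feature.

```python
from datetime import date

# Hypothetical user-defined schema: property name -> (type, allowed values or None).
SCHEMA = {
    "wing_color": (str, {"red", "blue", "green"}),
    "sighting_date": (date, None),
    "count": (int, None),
}

def validate(prop: str, value) -> bool:
    """Return True if value fits the property's declared type and allowed set."""
    if prop not in SCHEMA:
        return False  # the user never defined this property
    expected_type, allowed = SCHEMA[prop]
    if not isinstance(value, expected_type):
        return False
    return allowed is None or value in allowed

print(validate("wing_color", "red"))    # True
print(validate("wing_color", "osage"))  # False
print(validate("count", "three"))       # False
```

Leaving `allowed` as `None` gives an open property of a fixed type; supplying a set turns it into a pick list.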

You’ll need to write your own story for how you would use such a feature if DEVONthink supported it. You could look into a product called Tinderbox, in which documents can be defined with literally thousands of custom property-value pairs (in Tinderbox, these are called “attributes”). I don’t imagine anyone would want DEVONthink to be as extensible as that, however.

Cassady · June 25, 2014, 11:10am

korm - that was very insightful. Plenty of food for thought there.

And the indirect result (for me at least) was also a revisiting of my tag-regime…

So big thanks for this.
And I agree (FWIW) – based on how you’ve explained things, the ability to constrain entries to a particular type would be very useful…

Expanding on this a bit – how might one go about this in the context of DTPO’s interface?

a.) An additional “View” option/window - i.e. after Tags?
b.) A sidebar ‘inspector’, like in Path Finder (or the See Also sidebar), that pulls up the metadata of selected files?
c.) A floating window, perhaps? This would presumably work like Finder's Quick Look or the DTPO Info tab does now, live-updating as different files are selected in the background?

Will think a bit more about the above…

korm · June 25, 2014, 12:02pm

I’m sure there are lots of clever ways to do this – and trust that Christian and his team would do something delightful if they choose to add a custom metadata feature.

ostwaldj · June 25, 2014, 8:34pm

Academics in the humanities (non-programmers at least) tend to think of metadata like the fields in bibliographic software or a library catalog database: a book (the “data” being the content of the book) has data about the data (“meta-data”), i.e. an Author, Year Published, Publisher… which is separate from the content of the book itself. In historical research, if I have a transcript of a letter, the metadata could include the bibliographic info (the book it was published in, the book editor, date published…), and it should be separate from the content of the letter, and also (in a pure sense) separate from whatever tags or keywords I might assign the content of that letter. Those tags/keywords for a single file are much more ephemeral, whereas the metadata in this scheme shouldn’t really change once assigned. That usage may differ from how programmers use the term, but it’s somewhat compatible with at least some of the operating system's metadata properties (Author, Title).

As one of the people who requested the feature: I come from relational-database land (MS Access), where you can indeed define valid formats for a field’s values and create any number of fields. For example, in the case of historical research, I’d ideally want a variety of date “fields” for each document: the system’s Date created, added and modified fields for each file of course, but also additional date fields like Date document written, Date document received, Date of event discussed in the letter… They really need to be separate fields to keep them straight (a date is not a date…), rather than just throwing a bunch of number strings into the Spotlight Comments or the title of the document. Hopefully the software is smart enough to know that they are actual dates and not just text or plain numbers. Check out my blog http://www.jostwald.wordpress.com for more detail on what historians would like to see.
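Keeping several distinct, typed date fields per document, separate from content and tags, might be modelled like this. The `LetterRecord` class, its field names and the sample values are hypothetical; the point is only that real date fields support comparison and arithmetic, which date strings buried in comments or titles do not.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

@dataclass
class LetterRecord:
    title: str
    # Bibliographic metadata, kept apart from the letter's content and its tags.
    author: str
    recipient: str
    date_written: Optional[date] = None
    date_received: Optional[date] = None
    date_of_event: Optional[date] = None
    tags: set = field(default_factory=set)  # ephemeral, freely edited keywords

# Hypothetical sample record.
letter = LetterRecord(
    title="Letter 1",
    author="John", recipient="Fred",
    date_written=date(1702, 5, 3),
    date_received=date(1702, 5, 17),
)

# Because these are real dates, arithmetic works, e.g. days in transit.
print((letter.date_received - letter.date_written).days)  # 14
```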

As for how to implement it, I’d really want these fields to (at least) display as columns in three-pane view, like the system meta data fields currently can. That way you can sort documents by the various dates. For example, even if DT’s AI finds a whole bunch of documents relating to the same topic, I want to examine them over time, by location, by author, by recipient… (i.e. sort them by column). I don’t think you can sort results by group, and the current Tag column setup with all the tags displayed in one column (and no way to control the order of the tags) is of almost no use if you’re using multiple tags.
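Sorting by metadata columns, first by one field and then by another, is essentially a multi-key sort. The sketch below assumes search results carrying hypothetical `place` and `date_written` fields; it illustrates the behaviour being asked for rather than anything DEVONthink currently does.

```python
from datetime import date

# Hypothetical search results, each with custom metadata "columns".
results = [
    {"title": "Letter C", "place": "Paris",  "date_written": date(1701, 3, 1)},
    {"title": "Letter A", "place": "London", "date_written": date(1702, 1, 9)},
    {"title": "Letter B", "place": "London", "date_written": date(1700, 7, 4)},
]

# Sort chronologically, like clicking a date column header.
by_date = sorted(results, key=lambda r: r["date_written"])

# Sort by place, then by date within each place.
by_place_then_date = sorted(results, key=lambda r: (r["place"], r["date_written"]))

print([r["title"] for r in by_date])             # ['Letter B', 'Letter C', 'Letter A']
print([r["title"] for r in by_place_then_date])  # ['Letter B', 'Letter A', 'Letter C']
```

A single Tag column holding many values cannot serve as a sort key this way, which is the limitation described above.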

Stepping back a bit: as I describe in my blog post Organizing with Devonthink, I think Devonthink gives you six different ways to categorize things: 1) DT groups; 2) DT tags; 3) system metadata file properties like Author, Subject, Keyword; 4) Spotlight Comment; 5) naming conventions; and 6) putting some kind of keywords in the file’s content. The problem is that I need another type of organizational scheme, because each of these either is intended to serve a specific function or has serious limitations.
We want to use #1 groups for topics/subjects for the AI, which should also rule out #6 (storing source information in the content): I want the AI to find things by content, and not be confused by the fact that two different letters came from the same source, information I already know. It’s not clear whether the AI would work well with several different ‘layers’ of groups, and it would be more confusing for the user: one layer for geography (Geog group1=France, Geog group2=England…), one layer for chronology (Year group1=1700, Year group2=1701…), one for topic (I currently have a few thousand groups just for topics), one for authors (Au group1=John, Au group2=Fred…), one for recipient (Recpt group1=John, Recpt group2=Fred…)…, with each document replicated in each set of groups. Practically, I don’t need to use the AI for the bibliographic info anyway.
We’ve been told #2 tags should not be used for bibliographic information (because they may be the only place where some records are stored). I do it anyway, but I’d be happy to switch them to metadata fields if those metadata fields were more easily editable (and usable).
#5 naming conventions get too long to read (in search results, columns and smart groups) when you start tacking prefixes on. Ideally the title of the file will summarize the content so you can see that summary in your search results rather than having to read through every file. And you can’t sort by more than the first prefix (again the sorting problem - we want to categorize our results after using the AI, and sometimes without using the AI at all).
That leaves #4 Spotlight Comments doing a lot of work, but many academics have multiple types of keywords/metadata: bibliographic info, keyword info on geography and chronology of the author and the recipient and the subject under discussion, and of course the topics in groups so the AI can work… The bibliographic info and other metadata can’t all fit in the Spotlight Comments (plus, same sorting problem as with Tag column in 3-pane view).
The #3 file system metadata properties (Author, Subject…) would be perfect because they are numerous, short, distinct fields, and some of them even have the proper names, like Author… But they are barely editable within DT and with AppleScript, not to mention that you can’t use most of those fields in most document types anyway (only email files, which can’t be created in DT). I’d want basic ‘metadata’ fields like author, recipient, date and place, and a few custom fields would be great as well.
So when people say they want “customizable metadata”, they’re likely thinking of a different way to categorize the data, akin to the OS’s metadata fields, but more controllable in DT. I’d be happy if they were native to DT and not reliant on OS X; that’s already the case with tags and groups anyway.
Sorry to take so long, but I’ve written longer!

Cassady · June 26, 2014, 9:40am

Really enjoyed that - and the same could be said of your blog! Some useful usage scenarios there…

Your examples in your June 22 post "The Deceptive Nature of Note-Taking" [and the promised post on DTPO, which I will be sure to read!] raised some interesting points…

Coming back to what I’m busy with – and I’m certainly not sure to what extent this will be applicable to many (if any) others – but an off-shoot of working with custom meta-data would be improved ‘linking’ between documents… In other words, building independent links between different documents, based on different metadata, but integrated with the editable metadata of those particular documents…

Going to pop up something in the “What the future holds for DTPO” thread about this – seems to have received a fair number of views, so here’s hoping the Developers have flagged it!

alanshutko · June 26, 2014, 6:25pm

One specific type of metadata I would like to have or be able to define: expiration date.

I store various financial records in DTPO, and I would like to automatically delete them.
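An expiration-date property would make the cleanup a simple date comparison. This is a sketch assuming a hypothetical `expires` property on each record; it only reports what would be deleted rather than deleting anything.

```python
from datetime import date

# Hypothetical financial records with a user-defined "expires" property.
records = [
    {"name": "2012 bank statement", "expires": date(2014, 1, 31)},
    {"name": "2013 tax return",     "expires": date(2020, 4, 15)},
    {"name": "Old utility bill",    "expires": date(2014, 6, 1)},
]

def expired(records, today):
    """Return the records whose expiration date is earlier than `today`."""
    return [r for r in records if r["expires"] < today]

for r in expired(records, today=date(2014, 6, 26)):
    print("would delete:", r["name"])
```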

Bill_DeVille · June 26, 2014, 6:37pm

You could add a prefix or suffix to a document Name to accomplish that objective. For example, add the expiration date as a Name prefix in the format YYYYMMDD.

You could then do a Name search for the string, e.g., 201409* to find all the documents with an expiration date of September, 2014. If the expiration date had been added as a prefix to the existing document Name, a sort by Name will sort the search result documents by expiration date. It would then be easy to select those to be deleted.
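The prefix trick works because a zero-padded YYYYMMDD string sorts alphabetically in the same order as the dates it encodes. A small sketch with made-up document names, using Python's `fnmatchcase` to mimic a `201409*` wildcard Name search:

```python
from fnmatch import fnmatchcase

# Made-up document names with an expiration-date prefix in YYYYMMDD format.
names = [
    "20150131 Credit card statement",
    "20140915 Phone bill",
    "20140902 Electric bill",
    "20141201 Insurance notice",
]

# Wildcard search for everything expiring in September 2014.
september = [n for n in names if fnmatchcase(n, "201409*")]
print(september)  # ['20140915 Phone bill', '20140902 Electric bill']

# Year, month, day order plus zero padding: alphabetical order is chronological order.
print([n[:8] for n in sorted(names)])  # ['20140902', '20140915', '20141201', '20150131']
```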

alanshutko · June 26, 2014, 8:53pm

You are absolutely correct. That’s a really hackish solution and clutters up my names, so I haven’t done it (and don’t really want to). I’d rather make a feature request for metadata to put it into.

Bill_DeVille · June 26, 2014, 9:49pm

I prefer “kludge” to “hack”.

As an old geezer who did some bleeding-edge research back in the days before there was much funding for research, and before the equipment needed to do work in new areas could be bought off a shelf, I had to put together lab equipment from whatever was available in the storeroom, the local hardware store and perhaps Radio Shack. So most of my lab equipment was kludged. But it worked.

I’ve always thought that was useful experience in life. Very often, in computer software and other venues, one encounters situations in which what’s available to work with wasn’t designed explicitly to do what needs to be done. That’s when I figure out a kludge to get the job done. :}

Yes, kludges can be messy and involve extra steps.

Christian noted some time ago on the forum that user-generated metadata will become available in a future generation of DEVONthink. I’m looking forward to that day, as I’d prefer not to get involved with messes and extra steps for an issue like yours. Meanwhile, I’ve got things to do and problems to solve. Kludges still come in handy.