How do you associate additional data with a document in DT?

Frederiko · April 27, 2013, 2:35pm

Perhaps one of the gurus here can help me.

I need to attach bits of data to documents (especially PDFs) after they are in my database: typically things such as document reference codes, the duration of the events occurring in the document, participant reference codes and the like. I often have to pull out sorted tables of subsets of this information.

At present I get this information in a CSV file and use a Python script to put it into data fields in Scrivener for each document. It’s also possible in Scrivener to produce a table sorted by these data fields, which is incredibly useful for collation and subsetting.
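For readers wondering what the CSV step looks like, here is a minimal Python sketch that loads such a file into per-document key-value records. The column names (`docno`, `duration`, `participant`) and the sample rows are hypothetical, since the post doesn’t give the file’s layout, and the step that writes the values into Scrivener’s fields is not shown.

```python
import csv
import io

# Hypothetical sample of the kind of CSV described above; real column names may differ.
SAMPLE_CSV = """docno,duration,participant
12343,3,P-017
12344,5,P-002
"""

def load_metadata(csv_text: str, key_field: str = "docno") -> dict[str, dict[str, str]]:
    """Return {document key: {field: value}} for every row in the CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    records: dict[str, dict[str, str]] = {}
    for row in reader:
        key = row[key_field]
        # Every other column becomes a metadata field for that document.
        records[key] = {k: v for k, v in row.items() if k != key_field}
    return records

if __name__ == "__main__":
    for docno, fields in load_metadata(SAMPLE_CSV).items():
        print(docno, fields)
```

Keeping everything as strings at this stage is deliberate; converting durations or dates to numbers is easier once you know which fields need it.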

Scrivener is brilliant and incredibly flexible at what it can do, but at 12,000 documents it is beginning to fall over regularly, so I think I need to move to a real database. I would also really like to be able to take advantage of DT’s more advanced syncing and its web server.

My problem is that I can’t find any way to associate bits of information with individual documents in DT.

I tried tags, but that doesn’t really work: the tag cloud gets cluttered with thousands of tags that will only ever appear once, and the usefulness of tags for concepts is lost.

Could someone please give me some hints about how I might tackle this problem?

Much appreciated, Frederiko

korm · April 27, 2013, 5:00pm

Frederiko · April 27, 2013, 6:37pm

Thank you for your help Korm.

Your idea will work (even if it’s a bit clumsy). Now I just need to write scripts to automate the process. From my digging around, it seems that DT’s database format is more opaque and considerably less hackable than Scrivener’s XML, although its AppleScript API is pretty powerful.

Regards, Frederiko

AccordionNoir · May 20, 2013, 11:58pm

No fair, the “answer” to this question, which I wonder about too, has disappeared. I’ll keep looking.

Frederiko · May 22, 2013, 3:52pm

AccordionNoir,

The answer given by Korm was that one should use the Annotation template (Data -> New from Template -> Annotation) to create a note linked to that document. The Devonthink manual says the following about the template under the section “Pre-fabricated templates”:

“Annotation: Adds an annotation for a selected document. The annotation is placed next to the selected document (and replicated to all other instances of the document if necessary) and contains a link to the annotated item. If possible, this template links to the current page of a PDF document. If there is already an annotation for the selected document, a second one is not created, but the existing annotation is opened.” (p. 138)

The entry which Korm removed did a nice job of explaining it together with a neat screen shot.

Effectively, an annotation is just an RTF document with the same name as the annotated document, with ‘Annotation’ inserted after the name and a link to the document placed in the text. Despite what the quoted text from the manual says, there does not seem to be any underlying programmatic link to the document itself (apart from the link in the text). So, for example, where you have more than one document with the same name (a common scenario for me), it is not immediately clear which annotation applies to which document.
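To illustrate why matching by name alone breaks down, here is a small Python sketch that pairs annotations with documents purely by name, which is all you have if the only connection is the name and the link text. The suffix `" Annotation"` is an assumption about what the template produces; check it against your own version. Document IDs are hypothetical stand-ins for whatever unique identifier you have.

```python
from collections import defaultdict

# Assumed naming pattern: the template appends this suffix to the document's name.
ANNOTATION_SUFFIX = " Annotation"

def match_by_name(documents, annotation_names):
    """Pair annotations with documents by name only.

    documents: iterable of (doc_id, name) pairs.
    Returns (unique, ambiguous): annotation -> one doc_id, annotation -> list of doc_ids.
    """
    ids_by_name = defaultdict(list)
    for doc_id, name in documents:
        ids_by_name[name].append(doc_id)

    unique, ambiguous = {}, {}
    for ann in annotation_names:
        if not ann.endswith(ANNOTATION_SUFFIX):
            continue  # not an annotation under the assumed pattern
        base = ann[: -len(ANNOTATION_SUFFIX)]
        candidates = ids_by_name.get(base, [])
        if len(candidates) == 1:
            unique[ann] = candidates[0]
        elif len(candidates) > 1:
            ambiguous[ann] = candidates  # same name, several documents
    return unique, ambiguous
```

Any annotation that lands in `ambiguous` can only be resolved by opening it and following the link in its text.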

To be fair, this template is described as an annotation template, and that is exactly what it does. It wasn’t designed to be a source of pseudo-tags. This is why I described the solution proposed by Korm as clumsy for my purpose (no offense was intended towards Korm).

This solution did not work for me (without the prospect of some very convoluted programming) because I need key-value fields that I can work with programmatically. This is commonly referred to as metadata. For example, every document I work with has certain attributes that must be associated with it, such as a document number, a duration and a date. In a key-value store these would typically be stored as separate metadata pairs such as ‘docno=12343’, ‘duration=3’ or ‘date=2013/12/01’.

Presently I have been working with Scrivener, which allows very extensive and flexible metadata to be associated with each document; this can then be used to generate flexible indexes or be fed to another programme such as Aeon Timeline.

I have been trying an approach different from the one suggested by Korm: I have been experimenting with building pseudo-tags in Spotlight comments, so that a comment contains a string of text such as ‘docno=12343#duration=3#date=2013/12/01’.
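A minimal Python sketch of reading and writing this comment format, assuming `#` separates the pairs, `=` separates key from value, and neither character ever appears inside a value. How the comment string is read from or written back to DEVONthink (for example via AppleScript) is not shown.

```python
def parse_comment(comment: str) -> dict[str, str]:
    """Turn 'docno=12343#duration=3' into {'docno': '12343', 'duration': '3'}."""
    fields: dict[str, str] = {}
    for part in comment.split("#"):
        part = part.strip()
        if not part:
            continue  # tolerate empty segments such as a trailing '#'
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"segment without '=': {part!r}")
        fields[key.strip()] = value.strip()
    return fields

def format_comment(fields: dict[str, str]) -> str:
    """Inverse of parse_comment; keys are written in sorted order for stable output."""
    return "#".join(f"{k}={fields[k]}" for k in sorted(fields))
```

Writing keys in a fixed order means two comments with the same data are byte-identical, which makes duplicates and diffs easy to spot.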

This approach has a number of disadvantages but is still better than the annotation template for my purposes. Ultimately I can’t make it work for my needs but maybe you can.

The disadvantages of using Spotlight comments:

It imposes no structure and is very prone to error by those entering the data. Typically the metadata for a document in my case will be entered by an intern. I am dealing with thousands of documents, so the potential for user errors, when the tag name has to be typed as well, is enormous.
It’s just much harder to deal with programmatically. I extract a lot of the information for the pseudo-tags programmatically from the documents so that they can be categorised.
You cannot sort visually on a pseudo-tag or easily construct search queries around them.
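The first and third problems can be partly mitigated outside DEVONthink: check each comment against a small schema before it goes in, and do the sorting in a script afterwards. A Python sketch, assuming the `key=value#key=value` format above; the schema (field names and types) is hypothetical and would follow whatever fields a project actually uses.

```python
from datetime import datetime

# Hypothetical schema: field name -> function that converts and validates the raw string.
SCHEMA = {
    "docno": int,
    "duration": int,
    "date": lambda s: datetime.strptime(s, "%Y/%m/%d").date(),
}

def parse_and_validate(comment: str) -> dict:
    """Parse 'k=v#k=v', convert each value per SCHEMA, reject unknown or missing fields."""
    record = {}
    for part in filter(None, (p.strip() for p in comment.split("#"))):
        key, sep, raw = part.partition("=")
        key = key.strip()
        if not sep or key not in SCHEMA:
            raise ValueError(f"unknown or malformed field: {part!r}")
        record[key] = SCHEMA[key](raw.strip())  # int()/strptime raise ValueError on bad input
    missing = SCHEMA.keys() - record.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return record

comments = [
    "docno=12344#duration=10#date=2013/12/01",
    "docno=12343#duration=3#date=2013/11/15",
]
records = [parse_and_validate(c) for c in comments]
# Numeric sort by duration, which a plain text sort of the comments cannot give.
by_duration = sorted(records, key=lambda r: r["duration"])
```

A typo such as `duraton=3` is rejected at entry time instead of silently creating a new pseudo-field.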

After many hours of experimenting with Python and AppleScript to try to simulate key-value metadata, I am afraid I am close to throwing in the towel. I am really sorry about this, because in so many ways Devonthink is a deep and fantastic product and a better fit for what I do than Scrivener. I just can’t do without document metadata.

I am really hoping that key-value metadata will be added in the near future; then I could really take my use of Devonthink to another level. I hope a dev is reading this!

Frederiko

BLUEFROG · May 22, 2013, 4:37pm

Why can’t you add the duration and date as Tags? And if you don’t want so many unique Tags for document numbers, why not prefix or suffix the filename with the document number?

I am sure there are many documents whose duration would be “duration=3”, especially if the notation is that coarse a value. (It would be different if the duration was “03:17:29”). And I’m sure there are documents that would be related to the same date.

Adding the document number to the name would provide easy searching and be good for at-a-glance information.
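A Python sketch of the filename-prefix idea, assuming a hypothetical convention of a zero-padded document number, then ` - `, then the original name. Zero-padding keeps names in numeric order even in a plain alphabetical sort.

```python
import re
from typing import Optional

PREFIX_RE = re.compile(r"^(\d+) - (.+)$")  # hypothetical "<docno> - <name>" convention

def add_docno(name: str, docno: int, width: int = 6) -> str:
    """Prefix a name with a zero-padded document number."""
    return f"{docno:0{width}d} - {name}"

def split_docno(name: str) -> tuple[Optional[int], str]:
    """Return (docno, original name); docno is None if the name has no prefix."""
    m = PREFIX_RE.match(name)
    if m is None:
        return None, name
    return int(m.group(1)), m.group(2)
```

For example, `add_docno("Witness statement.pdf", 12343)` gives `012343 - Witness statement.pdf`, and `split_docno` recovers both parts.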

korm · May 22, 2013, 5:45pm

I suggest reading Seth Bunsen’s approach to document naming (Naming and Searching Files) - I don’t suggest Seth’s specific structure, but the compartmentalization of his naming approach which could be adapted (IMO) to meet these requirements, with the benefit of easy search as well as the programmatic access to the name attribute. That can then be parsed with regular expressions, Applescript, etc.
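To show what parsing a compartmentalized name with regular expressions might look like, here is a Python sketch using a made-up pattern `YYYY-MM-DD_docno_dN_title` (date, document number, duration, title, separated by underscores). This is not Seth’s scheme, just an illustration; the named groups turn each compartment into a key-value pair.

```python
import re
from typing import Optional

# Hypothetical naming scheme, e.g. "2013-12-01_12343_d3_Site visit notes"
NAME_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})_(?P<docno>\d+)_d(?P<duration>\d+)_(?P<title>.+)$"
)

def parse_name(name: str) -> Optional[dict[str, str]]:
    """Return the name's compartments as a dict, or None if it doesn't follow the scheme."""
    m = NAME_RE.match(name)
    return m.groupdict() if m else None
```

Names that fail to match return `None`, which doubles as a cheap check for documents that were named inconsistently.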

sjk · May 22, 2013, 6:59pm

Bill_DeVille wrote:

User-assignable “fields” are planned for a future generation of DEVONthink.

Is that what you mean by “key value metadata”?

BLUEFROG · May 22, 2013, 7:42pm

In a simple fashion you can add key-value metadata (though it is string based). For example, there is nothing stopping you from entering a Tag of “duration:3”. I have similar test Tags for dates as well.

Note that this does get in the way of numerical sorting, unless the notation is reversed to “3:duration”, or the values are coerced or split on the delimiter.
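A quick Python illustration of the sorting problem with string tags like `duration:3`, and of the “split by delimiters” fix. A plain string sort compares character by character, so `duration:10` comes before `duration:3`; splitting on `:` and converting the value to a number restores numeric order.

```python
tags = ["duration:10", "duration:3", "duration:25"]

# Plain string sort: after "duration:", the characters '1' < '2' < '3' decide the order.
print(sorted(tags))  # ['duration:10', 'duration:25', 'duration:3']

def tag_value(tag: str) -> int:
    """Split 'key:value' on the first ':' and return the value as an integer."""
    _key, _, value = tag.partition(":")
    return int(value)

print(sorted(tags, key=tag_value))  # ['duration:3', 'duration:10', 'duration:25']
```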

Bill_DeVille · May 22, 2013, 11:31pm

Perhaps because of my experience in doing cutting edge research in science in the past, I’m a great fan of kludges, accomplishing what needs to be done by adapting what’s available. (By definition, the equipment to perform research in new areas is very unlikely to be found in a catalog. Hence the need to kludge.)

Let’s take the issue of adding user-defined metadata ‘fields’ (key-value metadata) to documents in a database. DEVONthink doesn’t create a proprietary document type that allows that. And there’s no universal system to do that that works for all the filetypes one might encounter in a database.

But there are some easy to implement kludges that mimic this, often well enough to satisfy the user’s objectives.

For example, Jim mentioned approaches to the Name of a document that might be useful.

Suppose I want to attach citation information to documents in a form that’s searchable, and will apply to documents of any filetype.

I can do that in a rich text note that’s linked to the referenced document. I can define metadata ‘fields’ such as TITLE, AUTHOR, SUBJECT, PUBLISHER, etc.

Example: I create text ‘placeholders’ for the entry of my metadata, like this:
TITLE (followed by a Space)
AUTHOR (followed by a Space)
SUBJECT (followed by a Space)
PUBLISHER (followed by a Space)
etc.

To avoid the drudgery of typing this for each document on which I wish to apply this scheme, I can create a Template document that holds the text, and create a new note from the template, or simply copy/paste from it into another note such as an Annotation note already linked to the referenced document.

Suppose I’ve got digital copies of the works of Aldous Huxley in a database. One of those would have this citation information:
TITLE Brave New World
AUTHOR Aldous Huxley
SUBJECT dystopian science fiction
and so on. (Filling out the forms would often be done by copy/paste.)
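Because the ‘fields’ are just lines of text, they are also easy to pull back out with a script, for example from exported notes. A Python sketch, assuming each field sits on its own line as an upper-case name, a space, then the value, and that field names come from a known list (here the placeholder names from the template above):

```python
FIELDS = ("TITLE", "AUTHOR", "SUBJECT", "PUBLISHER")  # placeholder names from the template

def parse_note(text: str) -> dict[str, str]:
    """Collect 'FIELD value' lines from a note; other lines and empty fields are ignored."""
    record: dict[str, str] = {}
    for line in text.splitlines():
        name, _, value = line.strip().partition(" ")
        if name in FIELDS and value.strip():
            record[name] = value.strip()
    return record

note = """TITLE Brave New World
AUTHOR Aldous Huxley
SUBJECT dystopian science fiction
PUBLISHER"""
```

Here `parse_note(note)` returns the three filled-in fields and skips `PUBLISHER`, which was left blank.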

Now I can search for all the items of which Aldous Huxley is the author, in this way:
In the search query field, type 2 quote marks, then place the cursor between the quote marks. Type author aldous huxley to enter the criterion. As the criterion is defined by the quote marks as an exact string, only those documents that contain that string will be found.
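The effect of the quoted search can be approximated in a script, for instance over exported notes: the words of the phrase must appear in order, case-insensitively. The Python sketch below is only an approximation for illustration and says nothing about how DEVONthink’s own search engine works; it allows any run of whitespace between words and uses word boundaries so that a longer word is not matched by accident.

```python
import re

def phrase_pattern(phrase: str) -> re.Pattern:
    """Case-insensitive pattern: the phrase's words in order, separated by any whitespace."""
    words = [re.escape(w) for w in phrase.split()]
    return re.compile(r"\b" + r"\s+".join(words) + r"\b", re.IGNORECASE)

def find_notes(notes: dict[str, str], phrase: str) -> list[str]:
    """Return the sorted names of notes whose text contains the phrase."""
    pattern = phrase_pattern(phrase)
    return sorted(name for name, text in notes.items() if pattern.search(text))

notes = {
    "Brave New World": "TITLE Brave New World\nAUTHOR Aldous Huxley",
    "Island": "TITLE Island\nAUTHOR  Aldous Huxley",
    "1984": "TITLE Nineteen Eighty-Four\nAUTHOR George Orwell",
}
```

`find_notes(notes, "author aldous huxley")` finds both Huxley notes, including the one with a double space after `AUTHOR`.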

As needed, I might add some other tricks, such as making the note’s Creation Date the date of publication of the book. As that’s a sortable characteristic, I could then sort Huxley’s publications by date. Or search for Huxley’s publications before or after a given date, or within a specified date range.
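Outside DEVONthink, the same trick amounts to filtering and sorting records on a date. A Python sketch with a hand-written list of (title, date) pairs standing in for notes whose Creation Date was set to the publication date; only the years are meaningful, and month and day are placeholders.

```python
from datetime import date

# Stand-ins for notes whose Creation Date was set to the publication date.
# Only the years are meaningful; 1 January is a placeholder.
notes = [
    ("Island", date(1962, 1, 1)),
    ("Brave New World", date(1932, 1, 1)),
    ("Crome Yellow", date(1921, 1, 1)),
    ("Ape and Essence", date(1948, 1, 1)),
]

def published_between(items, start: date, end: date) -> list[str]:
    """Titles dated within [start, end] inclusive, oldest first."""
    in_range = [(title, d) for title, d in items if start <= d <= end]
    return [title for title, _ in sorted(in_range, key=lambda item: item[1])]
```

For instance, `published_between(notes, date(1930, 1, 1), date(1950, 12, 31))` gives Brave New World, then Ape and Essence.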

Because I’m using free form text ‘fields’, I’m not limited to the few metadata fields available for PDFs. I can have as many sets of metadata characteristics as I wish, for whatever purposes for which they are useful. And of course I can also use other criteria and filters in searches in combination with them. If I want, I could sort Huxley’s writings by word count.

Even assuming a future generation of DEVONthink that provides for user-created metadata fields, my text-based kludges might be preferred for some purposes, as they can be exported from the database.

korm · May 23, 2013, 12:24am

Amen. They can be exported and they won’t die when some proprietary software that created the “kludge/metadata” dies. Or Apple decides to change the filesystem and kills things like OpenMeta. (Don’t shoot me Jim! I’m just saying…)

AccordionNoir · May 23, 2013, 1:19am

I think for me as a new user, what I might find helpful is a user-interface for such a text-based metadata system.

As I commented over here: Reference Management

Could there be an easy way to enter, retrieve, and edit basic metadata which was then stored in a text file linked to each primary document? The user-created script solutions do some of this, but I wish there was a basic solution that “came out of the box.” It’s not a bad thing for me to spend a week figuring out how to start setting up my database, but it could be easier.

I wish there was a GUI to enter my basic metadata about a PDF while I’m reading the info off the PDF. I’d love to experiment with the best way to do this, but jumping back and forth between two open files seems clumsy when I’m on my laptop. I wish I could split the screen or have a little “add annotations” window like the Get Info one. Something like that would be nice.

“How do I add a bit of information to these documents?” does seem to be a reasonable question several people are facing. It sounds like it’s not simple to do this because there’s no universal way to include that information in the original file.

I’m supportive of the obvious good of non-proprietary formats. When I exported from Scrivener to Devon, I found that Scrivener kept all its metadata in readable text files. This was good; I need to figure out how to make it useful in Devon, but at least it’s openly available to me.

If Devon could integrate this file-based “Kludge” in a way that uses exportable text-file formats, but makes it easily editable within the DevonThink window for each linked item (for users who want to use a GUI method) that might help people solve this issue. I think that would make things quite easy for me.

I’d be happy to beta-test!

Frederiko · May 23, 2013, 9:32am

First of all, thank you for all the advice and references! I will keep plugging away at a set of AppleScripts to manage custom metadata and will post them if I can make them more general.

I am with AccordionNoir on this one. I think custom document metadata is so fundamental to a document database that it shouldn’t have to be a kludge.

I like the way Scrivener gets this aspect right, and I think it’s a model that would fit quite nicely into Devonthink’s layout and structure:

This is the custom metadata pane (like Devonthink’s Info pane) for a document:

I also like the way the metadata fields can then be added to an outline view for sorting purposes. The really neat thing about this view is that it can be exported as a CSV file for preparing indexes or summaries.

Scrivener, like Devonthink, also has an auto summary generator but the neat thing is that it becomes part of the document metadata as the synopsis.

(To be clear, this is not a criticism of DT. I love its AppleScript API, much more flexible UI, sorter, robustness and stability, the web server … and so on. It’s very deep and I discover more and more as I burrow away. Nevertheless, I think Scrivener has a lot of good ideas which would work well in the DT paradigm.)

Frederiko

Frederiko · May 23, 2013, 9:54am

It’s easy to get the data out of Scrivener, either in the way you suggest (through the export mechanism) or, more flexibly, by opening its package. Scrivener has a very open, readable XML format: open the Scrivener package and then open the .scrivx file in a text editor (such as TextWrangler, which will recognise it as XML). How to use DT and Scrivener together is the subject of some long threads (and convoluted solutions) in the Scrivener forum, which are worth searching for.
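For the package route, Python’s standard `xml.etree.ElementTree` can read a .scrivx file. The element names in this sketch (`BinderItem`, `Title`, `MetaData`, `CustomMetaData`, `MetaDataItem`, `FieldID`, `Value`) and the tiny inline sample are assumptions made for illustration, not a documented schema. Open your own .scrivx in a text editor and adjust the names to match what you actually find.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for a .scrivx file; verify real element names first.
SAMPLE = """<ScrivenerProject>
  <Binder>
    <BinderItem ID="7" Type="PDF">
      <Title>Witness statement</Title>
      <MetaData>
        <CustomMetaData>
          <MetaDataItem><FieldID>docno</FieldID><Value>12343</Value></MetaDataItem>
          <MetaDataItem><FieldID>duration</FieldID><Value>3</Value></MetaDataItem>
        </CustomMetaData>
      </MetaData>
    </BinderItem>
  </Binder>
</ScrivenerProject>"""

def custom_metadata(xml_text: str) -> list[tuple[str, str, dict[str, str]]]:
    """Return (item ID, title, {field: value}) for each binder item with custom metadata."""
    root = ET.fromstring(xml_text)
    rows = []
    for item in root.iter("BinderItem"):
        # Only this item's own fields, not those of any nested child items.
        fields = {
            md.findtext("FieldID", default=""): md.findtext("Value", default="")
            for md in item.findall("./MetaData/CustomMetaData/MetaDataItem")
        }
        if fields:
            rows.append((item.get("ID"), item.findtext("Title", default=""), fields))
    return rows
```

Results are keyed by item ID rather than title because, as noted earlier in the thread, several documents can share the same name.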