Storing key-value pairs against documents in DevonThink

Hi

I really like DEVONthink’s custom metadata and am using it extensively for storing data about specific documents. I have a use-case that I think may not be supported yet, and wanted to see if anyone has any ideas about how to do it.

I have a smart rule that analyses thousands of statement of work documents and determines the top three skills required for that SOW. For example, it might come up with these combinations of skills and scores:

Containerisation: 60%
Serverless: 30%
Databases: 10%

There could be, say, 50 different skills to choose from.

I’m wondering how I could do this with custom metadata. Theoretically I could (by hand) create new custom fields for every skill, setting each to a number->percentage type. But these fields only relate to a small fraction of my documents (SOWs) and there are so many that it would be quite unwieldy to create everything as custom fields.

This key-value data, would be represented pretty well in a JSON format or as a relational (lookup) field. I wondered if a) there’s a field type I could use to store this data or b) if there are any plans to add such a function.

Finally - are there any workarounds to store this data? I thought about maybe creating a sheet for each file and storing this; but I’m not sure if that would allow for easy searching / filtering in DEVONthink.

I also thought about using a completely separate database - perhaps PanoramaX. This would store the document ID, plus the metadata. This would hopefully allow searching / filtering via the database, and would allow complex metadata on a document basis (so different types of document could have different fields or key-value pairs).

One other workaround, for this particular use-case; is to have 3 fields “Skill1”, “Skill2”, “Skill3” and and other 3 fields for the percentage associated with each skill. But it feels inflexible and clunky…

You could create 50 or so number-valued custom metadata fields. Then you can search for documents where “Containerisation is at least 30% and Database not more than 50%” with a query like mdcontainerisation>30 mddatabase<=50.
(Make sure to not put quotes before the operators!).

Why would you do that “by hand” if you apparently already have a smart rule that knows about the skills?

Perhaps your data structure needs adjusting? If you have “so many”, it might also be difficult to search for them (or aren’t you envisaging searching for them?).

So, for JSON, you’d create the property names automagically? Then you could do the same thing for custom metadata. And having a huge number of possibly unknown property names doesn’t make much difference to having a huge humber of custom metadata fields.

Since JSON is just a string, you can of course store it as a custom metadata field, say “My JSON SOW values”. But searching/filtering that like you could do with separate MD fields will not work unless you read all the data with a script and then work on the JavaScript object after JSON.parse()ing it.

Your requirements are blissfully imprecise. Perhaps you can clarify what you want to do with that data? I don’t see how using a database with a column for each skill helps if there are “so many” of them.

Yes. In a relational database, I’d use a table with fields SKILL, PERCENTAGE, ID where ID is the UUID of your DT document. Then you could enter whatever name you want for SKILL, even “so many”. You’d have to make sure that you only enter three values per document, but you can use SQL to query your data, group it etc. SQLite comes to mind, which is installed with macOS anyway.

But I still think you should pare down the number of skills considerably. Otherwise, searching for them becames a PITA.

1 Like

I’ve shilled enough for DEVONthink on the Tinderbox forum that it’s only fair I return the favour.

The two applications work surprisingly well together, and TB is ideal for mixing note ‘prototypes’ with custom metadata fields; you can also tell TB to ‘watch’ a DEVONthink group and automatically import and link those documents.

DEVONthink for document storage, analysis and management; Tinderbox for complex logic, metadata, ‘objectification’ of imported/linked documents.

2 Likes

Are the exact numbers really that critical?

If you simplified this to simply adding those skills as tags it would be a much easier project.

1 Like

Thanks so much for the detailed and insightful reply.

You could create 50 or so number-valued custom metadata fields. Then you can search for documents where “Containerisation is at least 30% and Database not more than 50%”

Great - yes; that is exactly the kind of thing I’d like to do.

Why would you do that “by hand” if you apparently already have a smart rule that knows about the skills?

I mean, defining the fields — as far as I know, there’s no programatic way to do that; it has to be done in the Data section of settings. If it could be done via a script that woudl be much easier! But yeah - setting the fields per document would be done programatically in the smart rule.

Perhaps your data structure needs adjusting? If you have “so many”, it might also be difficult to search for them (or aren’t you envisaging searching for them?).

Maybe; but I guess it’s more of a general point about having large amounts of metadata and wanting to have key-value pairs. Skills are an example, where you have a pair of skill plus percentage; but I might want to do the same with other attributes. For example, a list of line items and hours. These examples might work if adding a new field every time there’s a new skill, but it becomes unweildy - especially if the smart rule (AI) discovers a new skill and wants to add it to the list; it has no way to do this without the user opening the Settings → Data panel and adding it manually.

So, for JSON, you’d create the property names automagically? Then you could do the same thing for custom metadata. And having a huge number of possibly unknown property names doesn’t make much difference to having a huge humber of custom metadata fields.

Yeah - similar to what you’d do in a document database. In the case of the skills, you’d have a “skills” property, which would be an array of skill objects (each of which has a skill and a percentage. Like this:

{
"SOW": "1234",
"Skills": [ 
  {"Skill": "Containers", "Score": 0.5}, 
  {"Skill": "Serverless", "Score": 0.3}, 
  {"Skill": "Networking", "Score": 0.2} 
  ]
}

And then ideally you could do a query using something like JSONata or SQL in order to filter on SOWs that have containerisation>50% and so on.

Your requirements are blissfully imprecise. Perhaps you can clarify what you want to do with that data? I don’t see how using a database with a column for each skill helps if there are “so many” of them.

Mainly searching and filtering. I wouldn’t have a column for each skill (as that would be analogous to adding them each as a custom field in DT) but rather would have a documents table, with a row per document and a skills table, with a row per skill. Then a document-skills table, which allows a many-to-many relationship linking documents to skills, and includes the percentage score.

That’s a good idea re: SQLLite, and potentially I could script this with AppleScript. I think the only downside would then be the lack of ability to filter in DT. Unless one day DT can support querying an external datasource to get a list of documentIDs to display or something like that.

If you simplified this to simply adding those skills as tags it would be a much easier project.

That’s a good idea… And perhaps a good simplification for this specific use-case.

One nice addition to DT would be the ability to query tags that look like this:

containerisation:0.5

In other words - have tags that are key-value pairs.

… A bit similar to how Omnifocus does it.

Then you could still perhaps query all documents that have a “containerisation” tag with a value of 0.5 or more.

TB is ideal for mixing note ‘prototypes’ with custom metadata fields

That’s a brilliant idea; thanks. I do have TinderBox, but found it a bit scary having it import/watch large DT groups. If it can watch a smart group (for example, a group of all documents with a category of SOW) then I think that could be ideal. I’ll give that a go!

Perhaps I could even still use Smart Rules in DEVONthink (with AI) to produce the skill scores as json and output it into a multi-line text field in DT. Then perhaps something on the TB side could parse that and produce the correct metadata.

How are you going to use this data?

If your use case is fairly limited, I would suggest storing the key-value pairs as plain text in the Finder Comments. At any time you can use a script to generate a list of documents related to the skill “Serverless” (or any other skill name), sorted by its percentage score.

This is not perfect, but it does not require maintaining a large number of custom metadata fields. Simple storage, complex retrieval.

1 Like

I think you can set a note (also a group) in TB to monitor anything with a URL in DT; you may even be able to fudge some AppleScript to carry any properties over.

Which means you should be able to pull the URL of a smart group in DT.

I just tested this, rather than do what I need to do, by pointing TB at a DT group - then I set a note’s ‘DEVONthinkGroup’ to the URL of the group copied from DT.

It pulled everything over (which is the first time I’d seen images in TB, which was exciting for me!) - it carried over the ‘Tags’ from DT (as well as links and source links) - tags might give you a neat way to filter them. You could then assign DT ‘prototypes’ to items that enter in that group, and build some agents to do the logic and automated collection of matching items.

A script can do that with addCustomMetadata: If the data field is not there, it gets created. In DT4, you can even define a data type for the field.

Yes. Provided that you know “containerrisation” exists as a field in your JSONdata in the first place. That’s what I was talking about previously: If you do not structure your data, you can’t use it in queries. And

how would you know about this “new skill”? And if you don’t know about it, how can you ask for SOWs requiring 10% of this skill? And are there really an unlimited number of skills in this IT world?

If you insist, yes. In my opinion, there are either ways to mess up a workday then creating shell scripts to send to sqlite3 and then splitting up the output of that in AS. At the very least, I’d use JavaScript.

Don’t hold your breath.

You’re trying to bend a tool to your requirements. Perhaps finding the right tool or thinking again about the requirements is a more promising approach.

For example: Do you really need a percentage value if you’re only using the three most relevant skills per document anyway? If the percentages are not necessary, you could go for tags and query with DT. Or, sacrilege: Put all your data in a database and use SQL. Then you can store as many attributes and key-value pairs as you like and you have a very strong query language at your disposal.

I doubt that. Smart groups are nothing you can “watch”, they are simply logical constructs within DT. Same for groups – you’re thinking about folders in the file system, but (smart) groups are not those!

1 Like

Hi Davem - thanks!

So I’m able to get this to work great with a regular DT group. I simply paste the URL into the DEVONthinkGroup attribute, and it pulls in the notes fine.

However, I can’t get this to work if I paste in the URL of a smart group, such as this:

x-DEVONthink-smartgroup://02B25645-E933-45F9-BEA0-F6E3BDB3D1A5

… When I do that, I get a single note within my note, called note from DEVONthink, and inside, it says: “trouble fetching notes in 02B25645-E933-45F9-BEA0-F6E3BDB3D1A5”

… So I’m not sure if TB can watch or import from smart groups.

My SOWS are spread across many groups and sub-groups, though I suppose I could replicate them into a real group, potentially, rather than use a smart group…

You might be right about smart groups, which is unfortunate.

DT offers replicants into a group, though, and there’s the opportunity to use a tag as a glue between them. But also this is to layer a hack on a fudge on a cognitive burden.

Apologies for a premature frisson of excitement and potential. I think there may be sone use here, but probably for future projects when you can start with a cleaner sheet and be mindful of tying the two things together.

Thanks, Dave! I tried duplicating my smart group into a smart rule, which replicates all matching documents to a new group, and that actually works great. I can now import / “watch” the group in TB.

So now all my SOWs are in TB - and hopefully I can build something flexible enough to do what I need. There’s still going to be a few rough edges; but hopefully there’s a lot of room for getting something that works.

I may also try the relational database approach; as I think it should be relatively trivial to track all matching docs in the DB via their identifier, and then build a skills database quite easily in that.

Thanks for the help!

1 Like

Years ago, I developed and used a “skills database” as you are writing about. Used a relational database which worked a treat as the topic of “skills, people, and procedures” is all about relationships–“one-to-many”, “many-to-one”, “many-to-many”. I’d do the same today if I had the need again and even consider putting documents related to skills inside the database or if not, links to them in DEVONthink–maybe both. If other users of the tool involved, the former works better, probably. Just my two bits.

2 Likes