Generated Metadata Field Type

bkonia · May 11, 2023, 7:08pm

Currently, all the metadata field types require manual user input to populate the field. However, it would be valuable to have a Generated field type that would autopopulate from a pre-defined string template, which could include placeholders.

Thus, you could create a template, combining multiple placeholders into a single field that could be displayed as a column in any view. This would work the same way that placeholders work in Smart Rules. However, instead of using a Smart Rule to populate a metadata field based on event triggers, the field would populate automatically. Therefore, it would always reflect the current information. A secondary benefit is that this would be a readonly field, so there’d be no danger of the user manually editing its contents.

DTLow · May 11, 2023, 7:15pm

Applescript can populate the field

bkonia · May 11, 2023, 8:31pm

Yes, of course, and so can Smart Rules. My point is that with a generated field type, this could happen automatically, without having to rely on any scripting, rules, triggers, etc… A generated field type would always contain up-to-date information, without requiring the user to do anything special.

In many database engines, like MySQL, you have triggers, which can execute scripts to update fields. However, MySQL also has a generated field type, which is less powerful than scripting, but far more convenient. Generated fields in MySQL can be used to create composites of other fields in the same record. For example, you could create a “Full Name” field which would combine the First Name and Last Name fields with a space in between.

The same principles apply in DT. Scripting is great for complex tasks, but for simple tasks, like combining two fields for display purposes, generated fields would be much simpler, and the field would update automatically without having to wait for a trigger or execute a script. It would also be readonly, so unlike with scripting, there would be no danger of the user accidentally overwriting a value in a generated field.

cgrunenberg · May 12, 2023, 6:25am

Sounds like another & more generic approach to solve this problem:

Noted.

bkonia · May 12, 2023, 1:01pm

Agreed. I still think Group Name should be a standard metadata field. However, this approach would allow users to create anything they want, by combining existing placeholders.

BLUEFROG · May 12, 2023, 1:01pm

What’s the problem with scripting or smart rules to elicit the behavior?

Also, in your example, where is the First Name and Last Name metadata coming from to created this unified Full Name attribute? You essentially seem to be asking for three bits of metadata to be automatically created here.

bkonia · May 12, 2023, 1:52pm

Several problems:

Scripts depend on a trigger to execute. The user has to perform some action and DT has to catch that action, then update a field according to the script or the rule. This usually works OK, but from a backend development perspective, it’s not good system design. DT is doing a lot of extra work to update a field, when it already has the information stored in other fields. With a generated field, DT would simply output information that already exists in the record. No triggers, scripting, or rules needed.
Triggers are unreliable. With the group name rule, for instance, the user has to think of every possible scenario that would require updating the group name metadata. In the screenshot you posted, you selected four triggers: On Import, On Moving, On Classifying, On Demand. Those cover most of the scenarios, but what happens if the user renames the group itself? Suddenly, the group name metadata will be out-of-sync with the actual group name. Like I said, using scripting and rules for this use case isn’t good system design. With a generated field, it’s guaranteed to always be accurate.
The existing metadata fields were designed for user input. Sure, you can populate a text field with the group name using a rule, but it’s still just a text field that happens to contain the group name. Nothing prevents the user from typing something else in that field. A generated field would be readonly, so there’d be no possibility of the user accidentally overwriting the rule action.

In summary, the existing approach of using a rule to update a field with placeholders is a workaround for something that should be built-in to DT.

I know DT doesn’t have these fields built-in, and I’m not suggesting it should create something that doesn’t already exist. That was just a generalized example of how generated fields work in MySQL and other database engines. However, here’s an example of how it could be used in DT:

Suppose you created custom fields for First Name and Last Name. This would make sense, if you wanted to store your contacts in DT, right? So when entering your contacts, you’d enter First Name and Last Name into separate fields. However, when viewing your contacts, you might just want to view their full names in one column, instead of separate columns for first and last name. Therefore, you could create a generated field called Full Name, which would use the First Name and Last Name placeholders, separated by a space. See how much simpler this would be than creating a rule to regenerate the Full Name field each time you update the record? It also avoids wasting space storing redundant data. You don’t need to store the full name as a separate field because DT can generate it on-the-fly from the First Name and Last Name fields.

BLUEFROG · May 12, 2023, 3:29pm

but what happens if the user renames the group itself? Suddenly, the group name metadata will be out-of-sync with the actual group name. Like I said, using scripting and rules for this use case isn’t good system design. With a generated field, it’s guaranteed to always be accurate.

And what if that group (and its subgroups) contain 1000, 10000, or 100000+ items? Think of the overhead in reconciling the metadata on all those files. I understand what you’re saying but I don’t believe it would work at scale. While it could be argued that most people don’t have 100,000 items in a group, we have to plan for extremes. And imagine the effects if something was renamed accidently then re-renamed in quick succession. I imagine instant or frequent beachballing for users with very large databases.

Also, it isn’t SQL (or sqlite3, mySQL, etc.) under the hood. So while something like you’re suggesting may be possible in an SQL-based database, how DEVONthink works is very different.

bkonia · May 12, 2023, 3:48pm

Generated fields would be thousands of times faster and more efficient than rules or scripts. The difference with generated fields is that the information already exists in other fields within the same record. So when you talk about “reconciling” metadata, it makes no sense, because there’s nothing to reconcile. With a generated field, you’re simply displaying existing data according to a specified template. For example, with the Group Name placeholder, DT already knows the group name, which is why the placeholder exists in the first place. It has the information in the record already, we’re just telling it how to display that information. There’s zero additional overhead, unlike with rules and scripts, which create tons of overhead.

I know DT isn’t SQL-based, but SQL is just a language for querying a database. DT’s database model is irrelevant to this discussion, because fundamentally, all databases function the same. A database contains records which can be viewed/updated, etc… When you use a process like a script or rule to update a record, that’s always going to be thousands of times slower and less efficient than simply reformatting existing data within a record for display purposes.

chrillek · May 12, 2023, 4:28pm

Perhaps. But in order to be useful, you’d need some kind of language to create these calculated fields (that’s what they are in MySQL). DT doesn’t have that language (unless you seriously envisagé AppleScript or JavaScript for that, which makes me wonder why such fields would be “thousand times faster”). And, not much DRI here in constructing something from data we already have. That’s one of the reasons, the MySQL developers were very reluctant to add these calculated fields.

And all that for a visual effect?

What fundament do you mean? Bits and bytes, yes. Apart from that, I beg to differ. Relational databases (which DT very much is not) are different beasts from hierarchical and NoSQL databases. Obscuring the differences behind a common term doesn’t really help here.

Certainly not in “all databases”. Frontend stuff like firstName + ' ' + lastName (Javascript) is not shower than firstName || ' ' || lastName (SQL). Why would it be?

Btw, MySQL has virtual and physical calculated fields. The first ones are created each time they are queried, and only the second ones are really part of the table (taking up space there)

bkonia · May 12, 2023, 5:03pm

It’s nonsensical to claim that DT has no “language” because obviously DT was written in a language, most likely Swift. You seem to be confusing the database language, which is irrelevant to this discussion, with the application language. Generated fields would be displayed at the application level, in the same way that DT displays any of the existing fields. The only difference is you’d be able to access placeholders and combine them if you wish. Example in PHP:

echo $first_name . " " . $last_name;

Combining the First Name and Last Name fields into a single full name field for display purposes literally takes no more time (perhaps a nanosecond longer) than outputting the individual fields without combining them. The processing time required to combine fields using string operators is negligible, even over thousands of records.

Databases are all fundamentally the same in the same way that all cars are fundamentally the same. A database exists to store and retrieve information. It’s obvious and common sense that outputting information from existing fields is going to be way faster than running a process that has to perform an update operation each time something changes.

I never said it was. Those are identical operations and both would execute almost instantaneously. The existing process of updating metadata fields using scripts and rules, however, is WAY slower, because it has to go through a trigger/action process. It has to listen for certain triggers, then calculate the new value, then write that value to the database. This is much slower than simply reformatting existing data when it’s displayed.

The difference in performance between stored and generated virtual fields is generally very small. The main benefit of stored virtual fields is that they can be indexed. So, if you need to search for data in a virtual field, the stored approach would be the way to go. Also, MySQL allows for complex calculations in virtual fields, which could theoretically be slower. DT only supports string operations, which are always going to be extremely fast. The DT developers could implement this feature using either approach, or even give the user the option, as MySQL does. The generated field approach would be the fastest and least overhead, since it’s not writing anything to the database. The stored virtual field approach would be slightly slower since it has to store the value in the database each time it’s updated. However, it would still be way faster and way more convenient than using scripts or rules.

cgrunenberg · May 12, 2023, 7:10pm

A read-only custom metadata field that just uses the already existing placeholders would be no big deal, the code is already there and this doesn’t affect the database/index/sync or overall performance.

chrillek · May 13, 2023, 7:13am

And that would presumably work using string concatenation, like first name last name. But what about things like amount/(1-VAT) (for net amount) or yearsof(today - birthday) (for age)? For these things one would need at least operators and some functions…

cgrunenberg · May 13, 2023, 7:29am

That’s a completely different request, IMHO. At least as far as I understood the initial request.

chrillek · May 13, 2023, 11:08am

Right, the OP said „string template“. But why stop there in generating data…