Tags and AI or Machine Learning

BlueKnight · September 18, 2019, 1:41am

As I use DEVONthink more and more, I have started thinking about how AI could help not only categorize documents but also tag them.

Machine learning is already built in, and the software already does a lot of this, but some things would need to be created.

Here are some ideas.

The concordance engine could take the words and their relationships, remove the common words, and come up with tagging recommendations.
The software could use the AI engine to find similar items, look at the tags they currently use, and recommend those.
Use the AI to combine the two above based on similarities.
Have a tab with recommendations and check boxes to select the tags to use.
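
The first three ideas can be sketched outside DEVONthink in plain Python. This is only an illustration under stated assumptions, not how DEVONthink's engine works: the stop-word list, the `similar_docs` input (pairs of similarity score and tags, standing in for whatever the AI engine reports as similar items), and the way the two signals are weighted are all made up for the example.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption); a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "for", "on", "with"}

def frequent_terms(text, limit=5):
    """Concordance-style candidates: the most frequent words minus common words."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return counts.most_common(limit)

def neighbor_tags(similar_docs):
    """Score tags already used on similar items, weighted by similarity."""
    scores = Counter()
    for similarity, tags in similar_docs:
        for tag in tags:
            scores[tag.lower()] += similarity
    return scores

def recommend(text, similar_docs, limit=5):
    """Combine both signals into one ranked list of tag suggestions."""
    combined = Counter()
    terms = frequent_terms(text, limit=20)
    top = terms[0][1] if terms else 1
    for word, count in terms:
        combined[word] += count / top  # normalize word frequency to the range 0..1
    combined.update(neighbor_tags(similar_docs))  # Counter.update adds the scores
    return [tag for tag, _ in combined.most_common(limit)]
```

The ranked list returned by `recommend` is what a tab with check boxes could display, so nothing is applied until the user picks from it.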

Of course this is just a start as this can be extended tremendously by utilizing already existing ML text processing libraries, etc.

ryanjamurphy · September 18, 2019, 2:01am

Your ideas take the notion a bit further, but I have a smart rule set up to auto-tag based on concordance thanks to a script written by the DEVONtech team. I wrote it up a bit here: https://github.com/ryanjamurphy/augcog/tree/master/DEVONthink%203%20Autotagging

BlueKnight · September 18, 2019, 2:22am

Thank you for providing that information. But with Machine Learning you can go quite a bit further.

I am actually writing some ML / AI code now for a project of my own (not for DevonThink) that tags news items using ML and a full-text search of each article to come up with accurate tags.

Considering this is theoretically already done in DevonThink with their internal AI model for documents, if the information could just be surfaced and expanded this would be one amazing feature.

ngan · September 18, 2019, 3:29am

A primitive attempt here: Tagger V2. Two unique features, half-useful, but still interesting (I think).
I’m not a programmer and have no knowledge of AI or ML. I am just trying to use the “see-also” to do the heavy lifting. I think whether auto-tagging can be done successfully in DT depends on (1) the objective/nature of the tagging, (2) the word count of the items within the database, (3) the consistency of the existing groups, or (4) something as simple as the naming of the files.
A very systematic naming system can achieve high-quality auto-tagging simply by word match.
See-also tends to give relatively good-quality suggestions based on common tags.
A database with very consistent and distinctive groups may help to narrow down a more targeted set of common tags (only extract the common tags from those see-also documents that are within the higher-scored classify-to group).
Unless frequency count is the main criterion for tagging, I think concordance is too “raw” to be used for perception- or interpretation-based tagging, because concordance in itself doesn’t construct schemata. See-also and classify are the processed products of concordance, so they may be more useful.
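
The narrowing step described above (common tags taken only from see-also documents inside the best classify group) could look roughly like this in Python. It is a sketch, not the Tagger V2 script: the dictionary keys `"group"` and `"tags"`, the `top_group` argument, and the 60% threshold are assumptions chosen for illustration.

```python
from collections import Counter

def common_tags(see_also, top_group=None, min_share=0.6):
    """Return tags shared by at least `min_share` of the similar documents.

    `see_also` is a list of dicts with the hypothetical keys "group" and "tags".
    If `top_group` is given, only documents in that group are considered.
    """
    docs = [d for d in see_also if top_group is None or d["group"] == top_group]
    if not docs:
        return []
    # Count each tag once per document, even if it is listed twice.
    counts = Counter(tag for d in docs for tag in set(d["tags"]))
    threshold = min_share * len(docs)
    return sorted(tag for tag, n in counts.items() if n >= threshold)
```

Restricting to one group shrinks the pool of documents, so only tags that are common within that group survive the threshold.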

Just a very non-professional reading of auto-tagging, so u need to forgive my naive comment!

BlueKnight · September 18, 2019, 4:00am

@ngan the menu item is nice, but it is not utilizing any AI. It does bring up a menu item that lets you look at and tag things, which is a great idea.

I think that a combination of your script and what @ryanjamurphy has with the concordance would be a great start. I might play around with what you have and what ryan has to see if I can combine them, but not tonight (laugh).

But ultimately you want to not only do the lookups as you are doing, and concordance top words as @ryanjamurphy is doing, but to actually apply the AI logic across documents and use the actual words within the documents to create a tag list.

Currently concordance works within a single document, but it can be applied to multiple documents in the list for comparison. It can also be applied against documents that the AI identifies as similar.
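
One common way to compare word lists across several documents is TF-IDF, which favors words that are frequent in one document but rare in the others. The sketch below is a generic textbook version in plain Python, offered as an assumption about what "applying concordance across documents" could mean. It does not describe DevonThink's internal model, and the function names are made up.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def tfidf_terms(docs, limit=3):
    """For each document, rank words that are frequent there but rare elsewhere.

    `docs` maps a document name to its text.
    """
    tokenized = {name: tokenize(text) for name, text in docs.items()}
    doc_freq = Counter()
    for words in tokenized.values():
        doc_freq.update(set(words))  # number of documents containing each word
    n_docs = len(docs)
    result = {}
    for name, words in tokenized.items():
        counts = Counter(words)
        scores = {
            w: (c / len(words)) * math.log(n_docs / doc_freq[w])
            for w, c in counts.items()
        }
        # Highest score first; ties are broken alphabetically.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        result[name] = [w for w in ranked if scores[w] > 0][:limit]
    return result
```

Words that occur in every document score zero and are dropped, which is roughly the "remove the common things" step without needing a hand-made stop-word list.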

There are a lot of documents out there on machine learning and tagging; there are Perl scripts, etc., and Google has even written up AutoML info. This is more what I am talking about than what exists now.

If I am correct, this data already exists within DevonThink to some degree and can probably be utilized or enhanced with all the available libraries (open source) that are out there.

ngan · September 18, 2019, 4:09am

I attempt to use the results of the see-also documents. So definitely not AI, but perhaps an attempt to use the results of DT’s AI…

Hope you will post your results in this forum (success or not) at the end of your work, since it’s interesting, especially if it builds on the initial work of the forum members who contributed their knowledge.

ngan · September 18, 2019, 4:26am

I don’t know DT’s client base, but I speculate 95% won’t use see-also or classify, so that won’t be a priority for … perhaps forever. So it’s you or no one

atdnorth · November 4, 2019, 12:28am

I have a large number of journal articles that I store in DT. I use the “Download Bibliographic Information” in the Smart Rules to get the title of the article and metadata (author, journal, etc).

I also store these articles in a flat structure. Many articles have several tags. I don’t want to store the article in a group because then I would need to put thought into what group (of several different options) the article belongs in.

If I have a hundred articles related to a topic (e.g. a tag), I can then use the Tags “explorer” to easily filter down to a subset of documents based on the tag I choose. That is a great new feature of DT3.

However, my big struggle (and time sink) is tagging the documents in the first place. As mentioned above, I would love to have tag suggestions with the “See also and Classify”. This will save a huge amount of time by presenting tags that are similar to other documents that the AI engine finds. It could even list tag suggestions by rank.

The current “Groups” in “See also and Classify” is not that useful to me because I store my documents in a flat structure. I remember reading that DT treats tags and groups the same way (e.g. a group is a tag or vice versa). I have no idea how hard an effort it would be to include this new functionality, but it certainly would nicely complement the “See also and classify”, the tag “explorer”, and the efficiency of workflow within DT.

Thanks for your consideration.

jongilizwe · November 4, 2019, 12:49am

I wrote a semi-smart keyword suggestion script for Bibdesk a few years ago; it might be adaptable. Basically it comes up with a list based on the assumptions that (1) if a keyword is in the title, it’s probably appropriate, (2) the same author writes about similar topics, (3) the same journal publishes on similar topics, and (4) I think it falls back to place names if nothing else about the article matches. I tested it, and about 50% of the time it suggested at least some keywords I wanted (seeded with a bibliography of 8,500 or so items). Mostly, though, it removes the initial resistance created by an empty field.
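
The first three assumptions translate fairly directly into a scoring function. The Python below is a reconstruction from the description only, not the original Bibdesk script: the field names, the weights (3, 2, 1), and the function name are assumptions, and the place-name fallback is left out.

```python
from collections import Counter

def suggest_keywords(new_item, library, limit=5):
    """Suggest keywords for `new_item` from an already-tagged `library`.

    Items are dicts with the hypothetical fields "title", "authors",
    "journal", and (for library items) "keywords".
    """
    scores = Counter()
    title = new_item["title"].lower()
    known = {kw for item in library for kw in item["keywords"]}

    # 1. A known keyword that appears in the title is probably appropriate.
    for kw in known:
        if kw.lower() in title:
            scores[kw] += 3

    for item in library:
        # 2. The same author tends to write about similar topics.
        if set(item["authors"]) & set(new_item["authors"]):
            for kw in item["keywords"]:
                scores[kw] += 2
        # 3. The same journal tends to publish on similar topics.
        if item["journal"] == new_item["journal"]:
            for kw in item["keywords"]:
                scores[kw] += 1

    # Highest score first; ties are broken alphabetically.
    ranked = sorted(scores, key=lambda kw: (-scores[kw], kw))
    return ranked[:limit]
```

Because the output is only a ranked list, it fills the empty field with candidates to accept or reject rather than tagging anything on its own.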

rkaplan · November 4, 2019, 1:28am

Can someone help with an example of how this works?

Do you run the Script on a PDF file and it gets the metadata from the PDF?

atdnorth · November 4, 2019, 11:24am

I don’t know technically how the DT solution works, but my guess is that DT finds the DOI in the PDF and uses it to look up the article in a journal metadata service.

Here’s my smart rule setup.

atdnorth · November 4, 2019, 11:30am

@cgrunenberg - is anything like this being considered for DT3? (if you are able to reveal)

I have a queue of over 1000 reference journal articles to tag. I generally would do this in small batches to get through it, but if there is possibly something in the works for the future, I would hold off and use my time most wisely.

cgrunenberg · November 4, 2019, 11:59am

It’s considered (like most requests) but not planned in the near future.

ryanjamurphy · November 4, 2019, 2:02pm

Have you tried the auto-tagging scripts I mentioned above?

If so, what do they not do that you’re looking for?

atdnorth · November 4, 2019, 2:51pm

I did try it out. A couple of reasons it does not work for my intended goals:

My preference is the ability to see the tag suggestions prior to applying the tags to the document itself
I would like tag suggestions to be based on the tags that have been applied to other similar documents (which requires the DT AI engine - aka See also and classify)

BlueKnight · November 15, 2019, 11:55pm

After using Ryan’s script, I have to say that it works pretty well.
The other thing that I have learned is the following:

You can right-click on any word in the document and add it as a tag. That is a useful feature.
Any of the words in the concordance list can be clicked on as well to add to the list, which is especially handy since you can click and organize the words.

So my workflow is the following:

Use Ryan’s script to get general tagging.
Remove some of the tags.
Look at the tags with Concordance, or, if I am reading the document and not just saving it, tag as I go in the document.

Oh and of course I have a Smart rule to clear all the Empty Tags once in a while.

ryanjamurphy · November 16, 2019, 5:26pm

Just a tiny correction: while I might be its biggest advocate, the script was put together by DT staff! Credit where credit’s due.

And indeed, that’s roughly my workflow as well.

I am thinking about some other applications, though. For instance, I’ve recently been paying more attention to my blog, which has a particular focus (and therefore only really needs a certain collection of tags). It should be trivial—though tedious—to create a list of those tags and let the auto-tagging script identify and add them whenever I author a new post. That is, run the script as normal, but filter the options by a preset list such that only those preset tags are added if they come up.

Then, I’ve been thinking about a fun extension of that concept. Say I have a tag “Education”, and I write a lot of things about that topic, but a lot of what I write doesn’t feature the exact term “Education”. Instead, things like “University” and “Academy” and “Learning” come up.

It shouldn’t be difficult to develop a list of those related topics. Then, the auto-tagging script can use them to “seed” the Education tag whenever they come up. This way I don’t have a blog with a couple dozen education-related tags—I have just the one tag, but the system can automatically tag posts that relate even if that tag isn’t mentioned.
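
The preset list and the "seed" words fit into one small lookup table. This Python sketch is not part of the DEVONtech script: the `SEED_WORDS` table and function name are hypothetical, the Education words come from the example above, and the second entry is made up to show that several preset tags can coexist.

```python
# Hypothetical mapping: one preset tag and the words that should trigger it.
SEED_WORDS = {
    "Education": {"education", "university", "academy", "learning"},
    "Design": {"design", "prototype", "usability"},  # made-up second entry
}

def canonical_tags(candidate_words, seeds=SEED_WORDS):
    """Map raw candidate words onto the preset tags and drop everything else."""
    found = {w.lower() for w in candidate_words}
    # A preset tag is applied if any of its trigger words came up.
    return sorted(tag for tag, triggers in seeds.items() if found & triggers)
```

The candidate words would come from whatever the auto-tagging script proposes. Anything not in the table is discarded, so the blog only ever receives tags from the preset list.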

pete31 · November 17, 2019, 9:17pm

Very interesting and thanks for (drawing attention to) these scripts!

I would like to use the scripts to tag a database of PDFs, but with a concordance that excludes all German words except nouns.

So far I’ve found that it’s possible to add words to ExcludedWords in /Users/user/Library/Preferences/com.devon-technologies.think3.plist and afterwards reload the plist with defaults read "/Users/user/Library/Preferences/com.devon-technologies.think3.plist" in Terminal.

Does anyone have links to German corpora of verbs, adjectives, etc.?

BLUEFROG · November 18, 2019, 1:02am

So far I’ve found that it’s possible to add words to ExcludedWords in /Users/user/Library/Preferences/com.devon-technologies.think3.plist and afterwards reload the plist with defaults read "/Users/user/Library/Preferences/com.devon-technologies.think3.plist" in Terminal.

This smells like trouble. I would see if @cgrunenberg has any advice on what you’re proposing.

cgrunenberg · November 18, 2019, 11:52am

It’s definitely not recommended.