Thank you for providing that information. But with Machine Learning you can go quite a bit further.
I am actually writing some ML / AI things now for a project I am working on: tagging news items using ML and a full-text search of each article to come up with accurate tags (not for DevonThink, but for my own project).
Considering this is theoretically already done in DevonThink with their internal AI model for documents, if the information could just be surfaced and expanded this would be one amazing feature.
A primitive attempt here: Tagger V2. Two unique features, half-useful, but still interesting (I think).
I’m neither a programmer nor do I have knowledge of AI or ML. I am just trying to use the “see-also” to do the heavy lifting. I think whether auto-tagging can be done successfully in DT depends on (1) the objective/nature of the tagging, (2) the word count of the items within the database, (3) the consistency of existing groups, or (4) something as simple as the naming of the files.
A very systematic naming system can achieve high-quality auto-tagging simply by word match.
See-also tends to give relatively good suggestions based on common tags.
A database with very consistent and distinctive groups may help to narrow down a more targeted set of common tags (extract the common tags only from those see-also documents that are within the higher-scored classify-to group).
Unless frequency count is the main criterion for tagging, I think concordance is too “raw” to be used for perception- or interpretation-based tagging, because concordance in itself doesn’t construct schemata. See-also and classify are the processed products of concordance, so they may be more useful.
Just a very non-professional reading of auto-tagging, so you’ll need to forgive my naive comment!
@ngan the menu item is nice, but it is not utilizing any AI. It does bring up a menu item and lets you look and tag things, which is a great idea.
I think that a combination of your script and what @ryanjamurphy has with the concordance would be a great start. I might play around with what you have and what ryan has to see if I can combine them, but not tonight (laugh).
But ultimately you want to not only do the lookups as you are doing, and concordance top words as @ryanjamurphy is doing, but to actually apply the AI logic across documents and use the actual words within the documents to create a tag list.
Currently concordance works within a single document, but it can be applied to multiple documents in the list for comparison. It can also be applied against documents with similarities as identified by the AI.
There are a lot of documents out there on machine learning and tagging; there are Perl scripts etc., and Google has even written up AutoML info. This is more what I am talking about vs. what exists now.
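To make the multi-document idea concrete, here is a rough Python sketch (not DT code, and not how DT's concordance actually works). It counts words per document and keeps only the words that show up in several documents. The function names, the tiny stopword list, and the `min_docs` threshold are all my own assumptions for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for", "on"}

def concordance(text):
    """Count word occurrences in one document, ignoring stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)

def shared_top_words(texts, min_docs=2, top_n=5):
    """Return words found in at least min_docs documents, ranked by total count."""
    totals = Counter()    # total occurrences across all documents
    doc_freq = Counter()  # number of documents each word appears in
    for text in texts:
        counts = concordance(text)
        totals.update(counts)
        doc_freq.update(counts.keys())
    shared = [(w, totals[w]) for w in totals if doc_freq[w] >= min_docs]
    # Highest total first; ties broken alphabetically so the order is stable.
    shared.sort(key=lambda item: (-item[1], item[0]))
    return [w for w, _ in shared[:top_n]]
```

Feeding it the plain text of a handful of similar documents would give a short list of candidate tags that the documents have in common, rather than the top words of any single one.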
If I am correct, this data already exists within DevonThink to some degree and can probably be utilized or enhanced with all the available libraries (open source) that are out there.
I attempt to use the results of the see-also documents. So definitely not AI, but perhaps an attempt to use the results of DT’s AI…
Hope you will post your results in this forum (success or not) at the end of your work, because it’s interesting, especially if it is based on the initial work of the members of this forum who contributed their knowledge.
I don’t know DT’s client base, but I speculate 95% won’t use see-also or classify, so that won’t be a priority for … perhaps forever. So it’s you or no one.
I have a large number of journal articles that I store in DT. I use the “Download Bibliographic Information” in the Smart Rules to get the title of the article and metadata (author, journal, etc).
I also store these articles in a flat structure. Many articles have several tags. I don’t want to store the article in a group because then I would need to put thought into what group (of several different options) the article belongs in.
If I have a hundred articles related to a topic (i.e. a tag), I can then use the Tags “explorer” to easily filter down to a subset of documents based on the tag I choose. That is a great new feature of DT3.
However, my big struggle (and time sink) is tagging the documents in the first place. As mentioned above, I would love to have tag suggestions with the “See also and Classify”. This will save a huge amount of time by presenting tags that are similar to other documents that the AI engine finds. It could even list tag suggestions by rank.
The current “Groups” in “See also and Classify” is not that useful to me because I store my documents in a flat structure. I remember reading that DT treats tags and groups the same way (e.g. a group is a tag or vice versa). I have no idea how hard an effort it would be to include this new functionality, but it certainly would nicely complement the “See also and classify”, the tag “explorer”, and the efficiency of workflow within DT.
I wrote a semi-smart keyword suggestion script for Bibdesk a few years ago, and it might be adaptable. Basically it comes up with a list on the assumptions that (1) if a keyword is in the title, it’s probably appropriate, (2) the same author writes about similar topics, (3) the same journal publishes on similar topics, and (4) I think it falls back to place names if nothing else about the article matches. I tested it, and about 50% of the time it had at least some keywords I wanted (seeded with an 8,500 or so item bibliography). Mostly, though, it removes the initial resistance created by an empty field.
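For anyone who wants to try the same heuristics, here is a minimal Python sketch of the first three assumptions. It is not the original Bibdesk script. The dictionary layout (`title`, `author`, `journal`, `keywords`) and the function name are assumptions for illustration, the title check is a naive substring match, and the place-name fallback is left out.

```python
from collections import Counter

def suggest_keywords(item, library, known_keywords):
    """Suggest keywords from title matches, then same-author/same-journal items."""
    suggestions = []
    title = item["title"].lower()
    # 1. A known keyword that appears in the title is probably appropriate.
    for kw in sorted(known_keywords):
        if kw.lower() in title:
            suggestions.append(kw)
    # 2 and 3. Collect keywords from items by the same author or in the same journal.
    votes = Counter()
    for other in library:
        if other["author"] == item["author"] or other["journal"] == item["journal"]:
            votes.update(other["keywords"])
    # Most frequently used keywords first; ties broken alphabetically.
    for kw, _ in sorted(votes.items(), key=lambda kv: (-kv[1], kv[0])):
        if kw not in suggestions:
            suggestions.append(kw)
    return suggestions
```

Title matches come first because they are the most direct evidence; the author and journal votes only add keywords that the title did not already produce.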
I don’t know technically how the DT solution works, but my guess is that DT finds the DOI in the PDF and uses it as a lookup key for a journal metadata service.
@cgrunenberg - is anything like this being considered for DT3? (if you are able to reveal)
I have a queue of over 1000 reference journal articles to tag. I generally would do this in small batches to get through it, but if there is possibly something in the works for the future, I would hold off and use my time most wisely.
I did try it out. A couple of reasons it does not work for my intended goals:
My preference is the ability to see the tag suggestions prior to applying the tags to the document itself
I would like tag suggestions to be based on the tags that have been applied to other similar documents (which requires the DT AI engine - aka See also and classify)
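To illustrate what I mean by ranked suggestions, here is a small Python sketch. It assumes the similar documents and their similarity scores have already been obtained somehow (for example, copied out of the See Also list). The `tags` and `score` fields and the function name are hypothetical and are not part of any DT interface I know of.

```python
from collections import defaultdict

def rank_tag_suggestions(similar_docs, existing_tags=()):
    """Rank candidate tags by the summed similarity of the documents carrying them."""
    scores = defaultdict(float)
    for doc in similar_docs:
        for tag in doc["tags"]:
            # Skip tags the document being tagged already has.
            if tag not in existing_tags:
                scores[tag] += doc["score"]
    # Highest score first; ties broken alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

The result is only a list of (tag, score) pairs, so nothing is applied to the document until the suggestions have been reviewed.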
Just a tiny correction: while I might be its biggest advocate, the script was put together by DT staff! Credit where credit’s due.
And indeed, that’s roughly my workflow as well.
I am thinking about some other applications, though. For instance, I’ve recently been paying more attention to my blog, which has a particular focus (and therefore only really needs a certain collection of tags). It should be trivial—though tedious—to create a list of those tags and let the auto-tagging script identify and add them whenever I author a new post. That is, run the script as normal, but filter the options by a preset list such that only those preset tags are added if they come up.
Then, I’ve been thinking about a fun extension of that concept. Say I have a tag “Education”, and I write a lot of things about that topic, but a lot of what I write doesn’t feature the exact term “Education”. Instead, things like “University” and “Academy” and “Learning” come up.
It shouldn’t be difficult to develop a list of those related topics. Then, the auto-tagging script can use them to “seed” the Education tag whenever they come up. This way I don’t have a blog with a couple dozen education-related tags—I have just the one tag, but the system can automatically tag posts that relate even if that tag isn’t mentioned.
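A rough sketch of the seeding idea in Python, just to show how little is needed. This is not the existing auto-tagging script; the `SEED_WORDS` table and the function name are made up for illustration. The keys of the table double as the preset list of allowed tags.

```python
import re

# Hypothetical preset: each allowed tag maps to the words that should trigger it.
SEED_WORDS = {
    "Education": {"education", "university", "academy", "learning"},
}

def tags_for_post(text, seed_words=SEED_WORDS):
    """Return the preset tags whose seed words occur anywhere in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    # A tag applies if the post shares at least one word with its seed set.
    return sorted(tag for tag, seeds in seed_words.items() if words & seeds)
```

A post that mentions “university” but never “education” would still come back with the single Education tag, and words outside the table never become tags at all.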
Very interesting and thanks for (drawing attention to) these scripts!
I would like to use the scripts to tag a database of PDFs, but with a concordance that excludes all German words except nouns.
So far I’ve found that it’s possible to add words to ExcludedWords in /Users/user/Library/Preferences/com.devon-technologies.think3.plist and afterwards reload the plist with defaults read "/Users/user/Library/Preferences/com.devon-technologies.think3.plist" in Terminal.
Does anyone have links to German corpora of verbs, adjectives, etc.?
This smells like trouble. I would see if @cgrunenberg has any advice on what you’re proposing.