Extracting more statistics from DevonThink

I’m wondering if the community has any suggestions on the best way to get more extensive statistics out of DevonThink and items stored within it.

In particular, I’m looking for the easiest way to do things like:

  1. Frequency of occurrence of a Tag, by year, month, week, day (or other time parameter).
  2. Other date-based summaries for Searches
  3. Things I’m not thinking of but might be useful
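As a rough illustration of the time buckets in point 1, here is a minimal JavaScript sketch. JavaScript is used because that is the language of the script further down this thread. It assumes the creation dates have already been pulled out of DEVONthink as `Date` objects. `periodKey` and `countByPeriod` are hypothetical helpers, not DEVONthink functions. Weeks use ISO 8601 numbering (Monday start), so a date in early January can land in the previous year's last week.

```javascript
// Hypothetical helpers: map a date to a bucket label and count dates per bucket.
// All calculations use the local time zone of the machine running the script.
const pad = n => String(n).padStart(2, "0");

// ISO 8601 week: weeks start on Monday and week 1 contains the year's first Thursday.
function isoWeek(date) {
  const d = new Date(date.getFullYear(), date.getMonth(), date.getDate());
  const dayNum = d.getDay() || 7;          // getDay() is 0 for Sunday; ISO uses 7
  d.setDate(d.getDate() + 4 - dayNum);     // jump to the Thursday of the same week
  const yearStart = new Date(d.getFullYear(), 0, 1);
  // Math.round absorbs the one-hour shift when a DST change falls in between.
  const daysSinceJan1 = Math.round((d - yearStart) / 86400000);
  return { year: d.getFullYear(), week: Math.floor(daysSinceJan1 / 7) + 1 };
}

function periodKey(date, unit) {
  const y = date.getFullYear(), m = pad(date.getMonth() + 1), day = pad(date.getDate());
  switch (unit) {
    case "year": return `${y}`;
    case "month": return `${y}-${m}`;
    case "day": return `${y}-${m}-${day}`;
    case "week": {
      const { year, week } = isoWeek(date);
      return `${year}-W${pad(week)}`;
    }
    default: throw new Error(`unknown unit: ${unit}`);
  }
}

// Count how many dates fall into each bucket of the chosen unit.
function countByPeriod(dates, unit) {
  const counts = new Map();
  for (const date of dates) {
    const key = periodKey(date, unit);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}
```

Switching the granularity is then just a matter of passing `"year"`, `"month"`, `"week"` or `"day"` to `countByPeriod`.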

I know I can create a series of Saved Searches, but that gets cumbersome. What I’m ideally looking for is a way to perform a search and have the results displayed as a statistical summary, grouped by user-selected parameters, rather than as a list of the matching items.

For example, I have a few dozen tags and I’d like to know how common certain tags are on a month-by-month basis. The goal is to spot new or interesting trends over time.
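For the month-by-month tag question, the counting itself is small once the data is out of DEVONthink. This sketch assumes each record has already been reduced to a hypothetical `{ created: Date, tags: string[] }` object. In a script you would fill these from each record's creation date and tags. `tagCountsByMonth` and `toRows` are illustrative names. The rows come out in a "long" month/tag/count layout that is easy to paste into a spreadsheet.

```javascript
// Sketch: count tag occurrences per month.
// Each record is assumed to be a plain object { created: Date, tags: string[] }.
function monthKey(date) {
  return `${date.getFullYear()}-${String(date.getMonth() + 1).padStart(2, "0")}`;
}

function tagCountsByMonth(records) {
  const table = new Map(); // "YYYY-MM" -> Map(tag -> count)
  for (const { created, tags } of records) {
    const month = monthKey(created);
    if (!table.has(month)) table.set(month, new Map());
    const row = table.get(month);
    for (const tag of tags) row.set(tag, (row.get(tag) || 0) + 1);
  }
  return table;
}

// Flatten to [month, tag, count] rows, sorted by month and then by tag.
function toRows(table) {
  const rows = [];
  for (const month of [...table.keys()].sort()) {
    const row = table.get(month);
    for (const tag of [...row.keys()].sort()) rows.push([month, tag, row.get(tag)]);
  }
  return rows;
}
```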

Similarly, I’d love some sort of frequency histogram (or even just the numerical equivalent that I can dump into Numbers myself for graphing) based on a defined search: when did a certain phrase first occur, and how often did it occur after that by day or week (assuming the Created Date is accurate), etc.
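A first occurrence plus a histogram that includes empty months could be sketched like this, assuming you already have the creation dates of the items that match one search. `monthlyHistogram` is a hypothetical name, and the CSV output is only meant for importing into Numbers or a similar app. Filling in zero months matters for graphing; otherwise gaps silently disappear from the chart.

```javascript
// Sketch: first occurrence plus a month-by-month histogram (including empty
// months) for the creation dates of the items matching one search.
function monthlyHistogram(dates) {
  const key = d => `${d.getFullYear()}-${String(d.getMonth() + 1).padStart(2, "0")}`;
  if (dates.length === 0) return { first: null, csv: "month,count\n" };

  const counts = new Map();
  for (const d of dates) counts.set(key(d), (counts.get(key(d)) || 0) + 1);

  // reduce instead of Math.min(...times), which can overflow the call stack on huge arrays
  const times = dates.map(d => d.getTime());
  const first = new Date(times.reduce((a, b) => Math.min(a, b)));
  const last = new Date(times.reduce((a, b) => Math.max(a, b)));

  // Walk month by month from the first match to the last so empty months appear as 0.
  const lines = ["month,count"];
  const cursor = new Date(first.getFullYear(), first.getMonth(), 1);
  while (cursor <= last) {
    lines.push(`${key(cursor)},${counts.get(key(cursor)) || 0}`);
    cursor.setMonth(cursor.getMonth() + 1);
  }
  return { first, csv: lines.join("\n") + "\n" };
}
```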

I know I can manually create searches for each month, but it would be great if there were some way to view this, or even to create “virtual groups” that function like Smart Groups but have rules/conditions applied to them for sorting purposes. Think of a Smart Group that not only matches what I’m searching for, but also creates sub-smart-groups based on any of the usual parameters DevonThink can search by.

It may be that this already exists and I’m just overlooking it, but any suggestions would be appreciated!

You can create a global smart group to show ordinary tags modified in a certain timeframe, e.g., This Week.

This could also be a local smart group specific to a database.

No, there are no histograms or graphs of tag data. It’s the first such request.

Thanks for the suggestion. I did see that as a potential avenue, but my particular need would be pretty complicated to sort out with this approach.

Fair enough that I’m the only person who’s ever wanted this sort of statistical sub-categorisation of my DevonThink data. The graphing is easy enough to handle via Numbers or some other app, of course, but GETTING the statistics is the part that is going to be time-consuming for me.

My need is the ability to quickly answer questions like:

“How many DevonThink items do I have that contain the word ArgyBargy, broken down by month, for the last 10 years?”

To answer that at the moment, I believe I’d have to create Smart Groups for the items created in each month of every year I have data for. I could then search each Smart Group for ArgyBargy and get the answer I want.

Fundamentally, what I was hoping for was a way to take a Smart Group and sort/categorise the matches by things like Tag, by the Created year/month/day/hour, or by any of the other parameters that appear in the drop-downs, like the Date Modified / Kind / Item ones in your example.

I’ll continue to play around to see if there’s another way to do what I need. Thanks!

Maybe you should consider writing your own app (or hiring a programmer) for this sort of thing. At first glance, it seems to me that Python (or R) has all the string-analysis and computational/statistical features you’d need to analyse the texts. It might not be that complicated. Others will have their own favourite technologies and methods, of course. Export the files you wish to analyse (if they are currently Imported into DEVONthink) to the macOS file system to avoid the complexity of interrogating DEVONthink. They can, of course, be re-Indexed into DEVONthink so they remain part of the DEVONthink world while you do your analysis coding outside it.

You could script it. But I wouldn’t expect it to be fast, depending on the size of your databases and documents. Something like this:

```javascript
(() => {
  const app = Application('DEVONthink 3');
  const matches = app.search('ArgyBargy');
  /* matches contains all records where the word "ArgyBargy" is found (case-insensitive) */
  const thisYear = new Date().getFullYear(); /* getYear() is deprecated and returns year - 1900 */
  const last10years = matches.filter(m => thisYear - m.creationDate().getFullYear() < 10);
  /* matching records created this year or in the nine years before it */
  const monthGrouping = [];
  for (let month = 0; month <= 11; month++) { /* in JavaScript, 0 is the first month */
    monthGrouping.push(last10years.filter(r => r.creationDate().getMonth() === month));
    console.log(`month ${month + 1} has ${monthGrouping[month].length} documents`);
  }
})();
```

In my test case, though, it was blazingly fast. The script relies heavily on the filter method for arrays. filter loops over every element of an array and tests whether it meets the supplied condition; the elements that do are collected into the new array that filter returns.

In the script, the first call to filter builds the array last10years, containing only the matching records whose creationDate falls within the last 10 years. The second filter, inside the for loop over the months, simply picks out the records whose creationDate falls in the loop's current month (in any of those years).
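Note that grouping only by `getMonth()` adds up, for example, every January across the whole period. If you want a per-year, per-month breakdown, one option is a single pass over the dates with a `Map`, which also avoids calling `creationDate()` once per month for every record. This is a sketch, not tested against DEVONthink: it assumes the dates were fetched once, for example with `matches.map(m => m.creationDate())` in the JXA script above, and `countByYearMonth` is a hypothetical helper.

```javascript
// Sketch: count creation dates per year *and* month in a single pass.
// `dates` is assumed to be an array of Date objects fetched once from DEVONthink.
function countByYearMonth(dates, years = 10, now = new Date()) {
  const thisYear = now.getFullYear();
  const counts = new Map(); // "YYYY-MM" -> count
  for (const d of dates) {
    // Keep only the current year and the (years - 1) years before it.
    if (thisYear - d.getFullYear() >= years) continue;
    const key = `${d.getFullYear()}-${String(d.getMonth() + 1).padStart(2, "0")}`;
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  // [key, count] pairs in chronological order ("YYYY-MM" strings sort correctly).
  return [...counts.entries()].sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
}
```

Passing `now` explicitly is only there to make the function easy to check; in a script you would normally leave it at its default.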

Disclaimer: The same thing is certainly possible in other programming languages like AppleScript and Python. I simply used JS because I’m at home there.


Thank you very much for the idea. I think this will end up being the way I go, with a twist.

In looking through the AppleScript dictionary for DevonThink, as well as a few examples from this and other forums, I think the quickest way to reach my goal is to use AppleScript (or Python or something) to tell DevonThink to create a series of Smart Groups, each with a search predicate that includes items created on or after midnight on the 1st of a month and before the 1st of the next month.
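The month boundaries for such smart groups are easy to generate in a loop. This JavaScript sketch only computes half-open ranges [start, end) and labels. Turning each range into an actual smart group and search predicate is left out, since that depends on DEVONthink's scripting dictionary and search syntax. `monthRanges` is a hypothetical name. Using "on or after the 1st, before the next 1st" means an item created exactly at midnight on the 1st is counted in one month only.

```javascript
// Sketch: list half-open month ranges [start, end) for a span of years,
// i.e. "created >= 1st of the month at midnight and < 1st of the next month".
// Creating the corresponding smart groups is not shown; the label format is arbitrary.
function monthRanges(firstYear, lastYear) {
  const ranges = [];
  for (let year = firstYear; year <= lastYear; year++) {
    for (let month = 0; month < 12; month++) {
      ranges.push({
        label: `${year}-${String(month + 1).padStart(2, "0")}`,
        start: new Date(year, month, 1),   // local midnight on the 1st
        end: new Date(year, month + 1, 1), // month 12 rolls over to January of year + 1
      });
    }
  }
  return ranges;
}
```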

Each of these Smart Groups could then be placed inside the Group that already contains that year's items.

I’m a heavy MailMate user, and its Statistics mode is roughly what I’m trying to replicate here. I had around a million messages piled up in my IMAP archives, so I decided to do something about that and moved most years into various Databases within DevonThink.

However, from time to time, I still need to go back and check the frequency of messages, usually by month, based upon messages containing certain content, tags, senders, recipients, subjects, etc.

I just have to decide whether it’s worth learning AppleScript to automate the process, or committing myself to a couple of hours of boredom doing it manually.

That would give you at least 120 smart groups (12 months × 10 years). For me that would be unwieldy. But then, I don’t have a million messages…

I’d rather use a script that can be parametrized with the search term(s).
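A parametrized version might look roughly like this. `statsForTerms` is a hypothetical function that takes the search terms plus a `search` callback returning the creation dates for one term. In JXA that callback could be something like `term => app.search(term).map(r => r.creationDate())`, but treat that as an untested assumption. Passing the search in keeps the counting logic testable without DEVONthink.

```javascript
// Count matches per month for each search term.
// `search(term)` is an assumed callback that returns an array of Date objects
// (the creation dates of the records matching `term`).
function statsForTerms(terms, search) {
  const result = {};
  for (const term of terms) {
    const perMonth = {};
    for (const d of search(term)) {
      const key = `${d.getFullYear()}-${String(d.getMonth() + 1).padStart(2, "0")}`;
      perMonth[key] = (perMonth[key] || 0) + 1;
    }
    result[term] = perMonth;
  }
  return result;
}

// Usage with made-up dates standing in for DEVONthink results:
const fakeResults = {
  ArgyBargy: [new Date(2019, 2, 1), new Date(2019, 2, 9), new Date(2020, 6, 4)],
};
const stats = statsForTerms(["ArgyBargy", "Nothing"], term => fakeResults[term] || []);
console.log(JSON.stringify(stats));
// {"ArgyBargy":{"2019-03":2,"2020-07":1},"Nothing":{}}
```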

It doesn’t have to be AppleScript. JavaScript works as well. Perhaps Python, too.