DT4 - Generate a file name by examining content, using internal DT methods, NO external AI

Hi there,

I searched for a function which derives a meaningful name based on the content of a record in DT’s manual, but found none. Personally, I would prefer to NOT use an external AI for this. Is there something usable already in DT4? Something I could use in Smart Rules.

Something more enhanced than

Proposed Name: A suggested name derived from a document's title or from the first line of the document if no title is present. (%recordProposedName%)

I am aware of Experimenting with OpenAI API for automatic classification and renaming - #14 by rkaplan.

Background for my question:
I scan my documents; the scanner places them in a WebDAV directory, and Hazel loads them from there into my INBOX. The filenames are meaningless.

What is “meaningful” in this context? Your request is so general that it’s impossible to give a meaningful answer.

From my perspective, the placeholder %chatSuggestedTitle% is what OP is looking for.
This could be achieved by using a local LLM (which is still external from DT’s point of view).
However, the degree of “meaningfulness” might need to be evaluated.

Sorry for not being specific about what I expect the filename to contain. When I read the file name, I want to get an idea about …

  • the type of document - e.g. invoice, order, email
  • the subject - research about XY, order for product XY
  • the “counterparty” - company XY, doctor, laboratory XY, research group XY

I thought about using a local LLM for this, but it is quite slow. I hoped for something fast that gives results in about 1-2 s per document. Running mistral-nemo or gemma2 on top of ollama takes 30-60 s or more per document.

I’m about to archive old documents of mine. Having “2025-01-01-scan.pdf” does not help a lot when looking for files using the filename. Besides that, I found that having a good filename improved my search results in DEVONthink.

And an email can’t be an order or invoice? I’m not trying to piss you off here, but I think that at least this requirement is not fully thought through. Not sure about the other two, either.

I’d probably go from “type” (order, invoice, account statement) to “counterpart” and “subject”. Trying to use regular expressions and some rules in the process.

But all that requires some kind of regularity in the documents. You could, for example, define a list of companies and tags for those. Then work on the files with the same tag.
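
To make the rule-based idea a bit more concrete, here is a minimal sketch in Python, not a DEVONthink feature and not anyone's actual script. It assumes you already have the document's plain text (for example after OCR). The company list, the keywords and the function name `propose_name` are all made up for illustration, and the "subject" part is not covered because it is hard to get from fixed rules.

```python
import re

# Hypothetical lookup tables: replace with your own counterparties and keywords.
COUNTERPARTIES = {
    "acme gmbh": "ACME",
    "city laboratory": "CityLab",
}
DOC_TYPES = {
    "invoice": re.compile(r"\binvoice\b", re.IGNORECASE),
    "order": re.compile(r"\border\b", re.IGNORECASE),
}


def propose_name(text, fallback="unsorted"):
    """Build 'counterparty_type' from plain text using fixed rules only."""
    lowered = text.lower()
    # First known company name that appears anywhere in the text.
    counterparty = next(
        (label for needle, label in COUNTERPARTIES.items() if needle in lowered),
        None,
    )
    # First document type whose keyword pattern matches.
    doc_type = next(
        (name for name, pattern in DOC_TYPES.items() if pattern.search(text)),
        None,
    )
    parts = [p for p in (counterparty, doc_type) if p]
    return "_".join(parts) if parts else fallback
```

Documents that match no rule get the fallback name, so they are easy to collect and handle separately.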

If you want to have it faster via AI, you will need to use an external service. I had the same experience with local LLMs when experimenting with Paperless.
And if you go down that road (not saying you should, just if you did), you could use AI in smart rules to fill in everything you are looking for and add it to your filename.

However, you can just as well go the way that @chrillek has pointed out and set up rules based on regularities. And finally, you can mix and match: build these rules for your standard documents and use AI (maybe a local LLM would then be sufficient?) for the non-standard ones. It is up to you.

One of the advantages I learned to appreciate with DT.
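
The mix-and-match approach could look roughly like the sketch below: cheap rules run first, and the slow namer (a local LLM, an external service, whatever you pick) is only called when no rule matches. This is plain Python and only an illustration. `name_with_fallback`, `acme_invoice_rule` and the `slow_namer` parameter are hypothetical names, and how the slow namer actually talks to a model is left open.

```python
from typing import Callable, Iterable, Optional, Tuple

# A namer takes document text and returns a name, or None if it cannot decide.
Namer = Callable[[str], Optional[str]]


def acme_invoice_rule(text: str) -> Optional[str]:
    """Hypothetical example rule for one well-known, regular document kind."""
    lowered = text.lower()
    if "acme" in lowered and "invoice" in lowered:
        return "ACME_invoice"
    return None


def name_with_fallback(
    text: str, rules: Iterable[Namer], slow_namer: Namer
) -> Tuple[Optional[str], str]:
    """Try the fast rules in order; call the slow namer only if all of them fail."""
    for rule in rules:
        name = rule(text)
        if name:
            return name, "rule"
    return slow_namer(text), "fallback"
```

The second element of the result says which path produced the name, which helps to see how many documents still end up on the slow path.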

, but I think that at least this requirement is not fully thought through

Thanks for pointing that out! You’re right. While composing the post, I was uncertain whether to include the term “type” in the name at all.

“counterpart” and “subject”.

Seems to be “enough”.

But all that requires some kind of regularity in the documents. You could, for example, define a list of companies and tags for those.

Based on other posts you wrote, I built this script:

A minute is pretty slow. What’s your hardware? For this task a smaller parameter model would be fine.
