DT4: Generate a file name by examining the content with internal DT methods (NO external AI)

MarvinMarvelouis · April 18, 2025, 10:53am

Hi there,

I searched DT’s manual for a function that derives a meaningful name from the content of a record, but found none. Personally, I would prefer NOT to use an external AI for this. Is there something usable already in DT4, something I could use in Smart Rules?

Something more advanced than:

Proposed Name: A suggested name derived from a document's title or from the first line of the document if no title is present. (%recordProposedName%)

I am aware of the thread “Experimenting with OpenAI API for automatic classification and renaming” (#14 by rkaplan).

Background for my question:
I scan my documents; the scanner places them in a WebDAV directory, from which Hazel loads them into my INBOX. The file names are meaningless.

chrillek · April 18, 2025, 11:05am

What is “meaningful” in this context? Your request is so general that it’s impossible to give a meaningful answer.

Connor · April 18, 2025, 11:54am

From my perspective, the placeholder %chatSuggestedTitle% is what OP is looking for.
This could be achieved by using a local LLM (which is still external from DT’s point of view).
However, the degree of “meaningfulness” might need to be evaluated.

MarvinMarvelouis · April 18, 2025, 12:28pm

Sorry for not being specific about what I expect the file name to contain. When I read the file name, I want to get an idea of …

the type of document - e.g. invoice, order, email
the subject - research about XY, order for product XY
the “counterparty” - company XY, doctor, laboratory XY, research group XY

I thought about using a local LLM for this, but it is quite slow. I hoped for something fast that gives results in about 1-2 s per document. Running mistral-nemo or gemma2 on top of ollama takes 30-60 s or more per document.

I’m about to archive old documents of mine. A name like “2025-01-01-scan.pdf” does not help much when looking for files by file name. Besides that, I have found that good file names improve my search results in DEVONthink.
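To make the naming scheme above (type, counterparty, subject) concrete, here is a minimal Python sketch that assembles a file name from fields that are assumed to be already known. It is a standalone illustration, not DEVONthink functionality: the helper names `sanitize` and `build_file_name`, the field order, and the " - " separator are all hypothetical choices.

```python
import re
from datetime import date


def sanitize(part: str) -> str:
    """Make one name component file-name safe: drop characters that are
    problematic on macOS/Windows and collapse runs of whitespace."""
    part = re.sub(r'[\\/:*?"<>|]', "", part)
    return re.sub(r"\s+", " ", part).strip()


def build_file_name(doc_date: date, doc_type: str, counterparty: str,
                    subject: str, ext: str = "pdf") -> str:
    """Hypothetical scheme: date - type - counterparty - subject.ext.
    Empty components are skipped so the name never contains '- -'."""
    parts = [doc_date.isoformat()] + [
        sanitize(p) for p in (doc_type, counterparty, subject)
    ]
    parts = [p for p in parts if p]
    return " - ".join(parts) + "." + ext


print(build_file_name(date(2025, 1, 1), "Invoice", "Company XY",
                      "Order for product XY"))
```

Skipping empty fields keeps the scheme usable even when, as discussed below, the “type” component turns out to be unreliable and is left out.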

chrillek · April 18, 2025, 12:39pm

And an email can’t be an order or invoice? I’m not trying to piss you off here, but I think that at least this requirement is not fully thought through. Not sure about the other two, either.

I’d probably go from “type” (order, invoice, account statement) to “counterpart” and “subject”, using regular expressions and some rules in the process.

But all that requires some kind of regularity in the documents. You could, for example, define a list of companies and tags for those. Then work on the files with the same tag.
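The rule-based approach described here could look roughly like the following Python sketch: a hand-maintained table of counterparts and a table of document-type patterns, matched against the document text with regular expressions. This is a standalone illustration of the idea, not DEVONthink scripting (inside DT you would presumably do this in a smart rule script instead). The company names, patterns, and function names are hypothetical. Because the first matching rule wins, the order of the entries matters; for example, an email that also mentions “invoice” would be classified as an invoice.

```python
import re
from typing import Optional

# Hypothetical rule tables; in practice you would maintain these yourself,
# e.g. one entry per company you regularly receive documents from.
COUNTERPARTS = {
    "ACME": re.compile(r"\bACME\s+(GmbH|Inc\.?)", re.IGNORECASE),
    "Dr Example": re.compile(r"\bDr\.?\s+Example\b", re.IGNORECASE),
}

# Dicts keep insertion order, so earlier entries take precedence.
DOC_TYPES = {
    "Invoice": re.compile(r"\b(invoice|rechnung)\b", re.IGNORECASE),
    "Order": re.compile(r"\border\s+(confirmation|no\.?|number)", re.IGNORECASE),
    "Account statement": re.compile(r"\baccount\s+statement\b", re.IGNORECASE),
}


def first_match(table: dict, text: str) -> Optional[str]:
    """Return the label of the first pattern that matches, or None."""
    for label, pattern in table.items():
        if pattern.search(text):
            return label
    return None


def classify(text: str) -> dict:
    """Detect type and counterpart; None means 'no rule matched'."""
    return {
        "type": first_match(DOC_TYPES, text),
        "counterpart": first_match(COUNTERPARTS, text),
    }


print(classify("ACME GmbH\nInvoice no. 4711"))
```

The detected labels could then be turned into tags, so that documents sharing a tag can be processed together as suggested above.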

Connor · April 18, 2025, 12:47pm

If you want faster results with AI, you will need to use an external service. I had the same experience with local LLMs when experimenting with Paperless.
And if you go down that road (not saying you should, just if you did), you could use AI in smart rules to fill in everything you are looking for and add it to your file name.

However, you can also go the way @chrillek has outlined: setting up rules based on regularities. Finally, you can mix and match, for example by building these rules for your standard documents and using AI (maybe a local LLM would then be sufficient?) for the non-standard ones. It is up to you.

That is one of the advantages I have learned to appreciate with DT.
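The mix-and-match idea can be sketched as a small dispatcher: cheap, deterministic rules run first, and the slow path (for example a local LLM) is only invoked for documents no rule recognizes. In this Python sketch everything is hypothetical: `propose_name`, the example rule, and `slow_fallback`, which is only a placeholder for an LLM call and simply returns the first non-empty line of the text.

```python
from typing import Callable, Optional

Rule = Callable[[str], Optional[str]]


def propose_name(text: str, rules: list[Rule],
                 fallback: Callable[[str], str]) -> tuple[str, str]:
    """Try the rules in order; call the (slow) fallback only if none
    produced a name. Returns (name, source) so you can review which
    path was taken for each document."""
    for rule in rules:
        name = rule(text)
        if name:
            return name, "rule"
    return fallback(text), "fallback"


# Hypothetical example rule, for illustration only.
def acme_invoice_rule(text: str) -> Optional[str]:
    if "ACME" in text and "Invoice" in text:
        return "Invoice - ACME"
    return None


def slow_fallback(text: str) -> str:
    # Placeholder for an LLM call: use the first non-empty line instead.
    for line in text.splitlines():
        if line.strip():
            return line.strip()[:60]
    return "Untitled"


print(propose_name("ACME\nInvoice 4711", [acme_invoice_rule], slow_fallback))
```

With most documents handled by rules, the 30-60 s per-document cost of a local model would only apply to the remaining non-standard ones.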

MarvinMarvelouis · April 18, 2025, 12:58pm

but I think that at least this requirement is not fully thought through

Thanks for pointing that out! You’re right. While composing the post, I was uncertain whether to include the term “type” in the name at all.

“counterpart” and “subject”.

That seems to be “enough”.

But all that requires some kind of regularity in the documents. You could, for example, define a list of companies and tags for those.

Based on other posts you wrote, I built this script:

cornchip · April 18, 2025, 8:09pm

A minute is pretty slow. What’s your hardware? For this task, a model with fewer parameters would be fine.

MarvinMarvelouis · April 29, 2025, 4:07pm

MacBook Pro M2

Sorry for the delay. I was on vacation.