OpenAI ChatGPT for automatic generation of matching filenames

syntagm · September 15, 2023, 12:33am

With GPT4 at least, when instructed in the system prompt with something like “use the language of the content for the language of the filename”, it handles it correctly, but I didn’t do too many tests because I like to have all my stuff in English (and Japanese)

Have you tinkered with different quantizations? I think for llama2 using the chat models is always going to have very poor results, better to use the normal text (or instruct) models, but even then I just couldn’t get it to reliably do what I wanted…

Anyway I’ll create a new thread with the scripts that I used

But just using ChatGPT over llama gave a million times better results with little prompt tinkering that I kinda just wanna roll with that for now haha

Another case I would looove to try sometime would be automatic grouping and classification. So find something common in all the documents and then group them into logical groups, similar to the auto-group feature DT had in the past.

Was thinking multiple passes for this, like:

Generate a short summary, or a bunch of keywords and store it as annotation
Chain all the short summaries/keywords together
Send to ChatGPT for analysis

Problem is that the context window is just way too small for bigger text blobs, and esp on gpt4 that gets very expensive. The 3.5-16k one is better but still expensive just for grouping some files, so need to do a couple of passes to make it as compact as possible first

If llama2 can be fine-tuned for document naming and grouping (which I’m sure it can), that would be perfect, no more worrying about cost