DT4: AI tags are a mix of German and English even when the document is only in English

chrisgve · June 30, 2025, 5:32am

I have uploaded a few papers to my library (sorry, I don’t have all the links anymore, but here are the titles):

“Alignment Faking in Large Language Models”, tags (gpt-4.1-nano):

Alignment-Faking
Große Sprachmodelle
KI-Ausrichtung
Modellverhalten
Verstärkendes Lernen

“Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs”, tags (gpt-4.1-mini):

AI alignment
Alignment-Faking
emergent misalignment
fine-tuning
Große Sprachmodelle
ICML
Jan Betley, Daniel Tan, Niels Warncke, Anna Sztyber-Betley, Xuchan Bao, Martín Soto, Nathan Labenz, Owain Evans
KI-Ausrichtung
language models
Machine Learning
model safety
Modellverhalten
New
Verstärkendes Lernen

I’ve tried to check if there’s a setting incorrectly set in my config, but I couldn’t find any.

My macOS is set to English as the default language, with French (Switzerland) as the secondary language, in case that has anything to do with my observations.

cgrunenberg · June 30, 2025, 6:38am

How exactly did you add the tags? Did the documents have any tags initially, or were multiple options enabled in Settings > Files > Tags? Because Data > Tags > Add Chat suggestions to documents adds only up to 5 tags, whereas batch processing depends on your prompt.

chrisgve · June 30, 2025, 7:39am

Good question. I’ve checked using Preview: the first document has no keywords, while the second has two, “ICML” and “Machine Learning”.

Now I’ve reset the tags for both documents, and I tried:

data → tags → convert hashtags to tags: nothing for both
data → tags → convert keywords to tags: nothing for the first, second comes with “ICML” and “Machine Learning”
data → tags → convert properties to tags: nothing for the first, second comes with the list of authors of the paper
data → tags → assign existing tags: for the first I get “as”, “the”, “The”, “This”, “we”, and same for the second
data → tags → add chat suggestions: for the first I get “Alignment-Faking”, “Grosse Sprachmodelle”, “KI-Ausrichtung”, “Modellverhalten”, “Verstaerkendes Lernen”, and for the second I now get “AI alignment”, “emergent misalignment”, “fine tuning”, “language model”, “model safety”

I’m using gpt-4.1-mini.

So there is a difference now: the second document no longer gets German tags, but the first document still does.

I hope this helps.

cgrunenberg · June 30, 2025, 7:46am

Is the first document a public one that you could share (or its link)?

chrisgve · June 30, 2025, 7:50am

Alignment faking in large language models.pdf (3.1 MB)

They are both public; I attached the first one here.

cgrunenberg · June 30, 2025, 9:38am

Thank you for the document! It’s actually not an issue of AI but of the detection of the document’s language. The next release will make this more reliable.