Question
- What do you think, what can we as DT users do to tackle prompt injection attacks on our DT database?
- Where can DT support us - in the future? Are there any counter measures already in place?
Ideas
- Detect documents where text is extremely small - 1px
- Detect documents where text is white on white or foreground = background color
Attack vectors
- I came across my mail archive (besides bookmarks) as main attack vector. I understand that tackling prompt injections is quite challenging.
- But also other documents coming from the “outside world”
Conceptual background
Taken from the book “GenAI security” which itself is based on the work of Joseph Thacker, p.47f, translated with Claude Opus 4
Imagine an LLM as a box with two slots. Inside this box sits a man. The man’s only way to communicate with the outside world is through two slots in the box. Through one slot, he receives scraps of paper with words, symbols, and numbers. For every piece of paper he receives through the first slot, he is compelled to throw out a written piece of paper through the other slot in the box. The entire training of this man in the box is designed to throw out the most useful piece of paper with as little deliberation time as possible through the second slot, by considering the symbols on the first piece of paper.
-
It is not possible to distinguish 100% between system instructions and user inputs. Everything the man in the box receives is a piece of paper.
-
The man in the box does not have the ability to completely refuse to answer. A piece of paper must be output on the other side.
-
The man in the box has a predefined, limited amount of resources for his responses. The attacker can expand their resources as needed.
-
The man in the box has only limited time to output the piece of paper on the other side. The attacker outside the box has unlimited time to formulate a piece of paper for input.
-
The man in the box is trained to be particularly useful and helpful. Security plays little to no role in his training and is usually “bolted on” afterwards.
Prompt injection by perplexity.ai
A prompt injection is a security vulnerability where an attacker crafts malicious inputs that override or manipulate instructions given to an AI, especially large language models (LLMs), causing unintended or harmful behaviour. This happens because the model cannot reliably separate trusted developer instructions from user input, allowing attackers to inject commands or bypass safeguards using natural language
Background
I started reading GenAI Security. Sichere Chatbots und Agentensysteme | Rheinwerk Verlag a German book about security of GenAI apps some time ago.
While reading the chapter about prompt injections I asked myself where this can hit me. I came across mails in my DT mail archive.
I moved some more old mails from the last 20 years into DT and stumbled across an infectious mail from 2014 with an *.exe attachment.