A novel cyberattack technique is emerging - one that exploits Large Language Model (LLM) hallucinations, using them as an attack vector to install malware on developers' systems.
But let's take a step back - the basics of this attack resemble a technique known as typosquatting, where an attacker publishes a malicious package whose name closely resembles a legitimate one - say, a Python package named 'panda' instead of 'pandas'. A single typo might then lead a developer to install and execute malware on their machine.
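To make the lookalike-name idea concrete, here is a minimal sketch of a check that flags a requested package name when it is close to, but not identical to, a name you already trust. The `KNOWN_PACKAGES` list, the `closest_known_package` helper, and the `0.8` similarity cutoff are all assumptions chosen for illustration; only `difflib.get_close_matches` comes from the Python standard library.

```python
import difflib

# Hypothetical allowlist; in practice this would come from your own dependency policy.
KNOWN_PACKAGES = ["pandas", "numpy", "requests", "flask"]


def closest_known_package(name, known=KNOWN_PACKAGES, cutoff=0.8):
    """Return the trusted package that `name` closely resembles, or None.

    An exact match is treated as the trusted package itself and returns None.
    """
    normalized = name.strip().lower()
    if normalized in known:
        return None
    # get_close_matches ranks candidates by difflib's similarity ratio (0.0 to 1.0).
    matches = difflib.get_close_matches(normalized, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    for candidate in ("panda", "pandas", "django"):
        print(candidate, "->", closest_known_package(candidate))
```

A check like this only catches names that resemble something on the list, so it would not flag an entirely invented package name.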
Now, the same principle can be applied to AI-generated code.
LLMs generate code by predicting the most likely next token based on patterns in their training data. As a result, a model can "hallucinate" all or part of a response: output that appears plausible but is incorrect, such as an import of a package that does not exist.
According to research conducted by Vulcan Cyber, hallucinated package names are surprisingly common - around 20% of Node.js code snippets generated using ChatGPT contained an unregistered package, while for Python that number rises to around 35%.
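As a rough illustration of what "contained an unregistered package" means in practice, the sketch below extracts the top-level imports from a generated Python snippet and asks a lookup function whether each name is registered. It is not the researchers' methodology. The helper names are hypothetical, and the default lookup assumes PyPI's public JSON endpoint (`https://pypi.org/pypi/<name>/json`) answers with HTTP 404 for unknown projects. It also assumes the import name matches the distribution name, which is not always true (for example, `import yaml` is provided by `PyYAML`).

```python
import ast
import sys
import urllib.error
import urllib.request


def imported_top_level_names(source):
    """Collect the top-level module names imported by a Python snippet."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                names.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module.split(".")[0])
    return names


def exists_on_pypi(name, timeout=10):
    """Return True if PyPI's JSON endpoint knows `name`, False on HTTP 404."""
    url = f"https://pypi.org/pypi/{name}/json"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # other HTTP errors are not evidence either way


def unregistered_imports(source, exists=exists_on_pypi):
    """Return the sorted imported names for which `exists` reports False."""
    # Standard-library modules are skipped (the set is available on Python 3.10+).
    stdlib = getattr(sys, "stdlib_module_names", frozenset())
    candidates = imported_top_level_names(source) - set(stdlib)
    return sorted(name for name in candidates if not exists(name))
```

The `exists` parameter is injectable so the check can be run against a cached index or a private registry instead of the network. Note that a name which does exist is not thereby safe: an attacker may already have registered it.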
An attacker can use these hallucinations to their advantage, even though they are often inconsistent between responses, either by running thousands of ChatGPT queries and registering every non-existent package name that appears, or by concentrating on names built from low-entropy tokens, which the model is more likely to hallucinate repeatedly.
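The notion of entropy here can be made concrete with a small calculation. The sketch below is a simplification: instead of per-token probabilities, which are internal to the model, it computes the Shannon entropy of the package names observed across repeated responses to the same prompt. The `name_entropy` helper and the sample names are hypothetical. A value near zero means the same name keeps recurring, which is what makes a hallucination predictable enough to matter, and also what a defender auditing a model's suggestions would want to look for.

```python
import math
from collections import Counter


def name_entropy(observed_names):
    """Shannon entropy, in bits, of the empirical distribution of names."""
    counts = Counter(observed_names)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one observation")
    # H = sum over names of p * log2(1 / p), with p = count / total.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


if __name__ == "__main__":
    # Hypothetical names collected from repeated answers to one prompt.
    consistent = ["fastjsonx"] * 8
    scattered = ["fastjsonx", "quickjson", "jsonfastlib", "pyjsonspeed"] * 2
    print(name_entropy(consistent))  # every answer agrees
    print(name_entropy(scattered))   # four names, equally likely
```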
This opens the door to a novel form of supply chain attack, exploiting the trust developers place in the AI's responses.
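The attack chain is quite simple: the attacker collects package names that an LLM recommends but that do not exist in the registry, publishes malicious packages under those names, and waits for a developer to receive the same recommendation and install the package.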
Mitigation Strategies
References