In a previous article, I talked about leveraging hallucinations as an attack vector. In this article, I want to go into detail on what hallucinations actually are, why models hallucinate, and, more importantly, whether and how they can be avoided.
Simply put, a hallucination is when an LLM gives a confident response that isn't supported by facts. Researchers have outlined two useful axes for describing them: faithfulness (does the output correctly reflect the given input or context?) and factuality (is the output factually correct?).
There are various factors behind hallucinations (bad training data, training methodology, and even how model performance is evaluated), but one of the things that makes LLMs so remarkable is also what makes hallucinations so prominent: randomness. Stochastic sampling, the process that picks the next token using pseudo-randomness, is what currently makes LLMs so creative. Somewhat counterintuitively, not always picking the most statistically likely token tends to produce higher-quality text than always picking it.
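To make the role of randomness concrete, here is a minimal sketch of temperature-based sampling. It assumes a toy vocabulary of three candidate next tokens for the prompt "The capital of France is", with made-up scores (logits). Real models sample over tens of thousands of tokens and usually combine temperature with other tricks such as top-p, so treat this as an illustration of the core idea only.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Convert raw token scores into probabilities.

    Lower temperature sharpens the distribution toward the top token;
    higher temperature flattens it, giving unlikely tokens more mass.
    """
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_pick(tokens, logits):
    """Deterministic decoding: always take the single most likely token."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return tokens[best]

def sample_pick(tokens, logits, temperature, rng):
    """Stochastic sampling: draw a token in proportion to its probability."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]

# Toy example: hypothetical scores for the next token after
# "The capital of France is". Only "Paris" is correct.
tokens = ["Paris", "Lyon", "Marseille"]
logits = [4.0, 2.0, 1.0]

for t in (0.5, 1.0, 2.0):
    p_paris = softmax_with_temperature(logits, t)[0]
    print(f"temperature={t}: P(Paris)={p_paris:.3f}")

rng = random.Random(0)
print("greedy:", greedy_pick(tokens, logits))
print("sampled:", [sample_pick(tokens, logits, 2.0, rng) for _ in range(5)])
```

With these made-up scores, the probability of the correct token drops from about 0.98 at temperature 0.5 to about 0.63 at temperature 2.0, so the wrong continuations get sampled far more often as the randomness goes up.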
This randomness also amplifies small inaccuracies: more randomness generally means more hallucinations. The problem is made worse by current training and evaluation techniques, which implicitly reward confidence over calibrated uncertainty, meaning that a random guess scores better than abstaining when the model is unsure.
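The incentive problem is easiest to see with a little arithmetic. The sketch below compares the expected score of guessing versus abstaining under two hypothetical grading schemes: accuracy-only grading, where a wrong answer costs nothing, and grading that subtracts one point for a wrong answer. The point values are assumptions chosen for illustration, not taken from any particular benchmark.

```python
def expected_guess_score(p_correct, reward_correct=1.0, penalty_wrong=0.0):
    """Expected score of answering when the answer is right with probability p_correct."""
    return p_correct * reward_correct - (1 - p_correct) * penalty_wrong

def should_guess(p_correct, penalty_wrong, abstain_score=0.0):
    """A score-maximizing model guesses whenever guessing beats abstaining."""
    return expected_guess_score(p_correct, penalty_wrong=penalty_wrong) > abstain_score

# A model that is only 20% sure of its answer:
p = 0.2
print("accuracy-only:", expected_guess_score(p), "-> guess?", should_guess(p, 0.0))
print("wrong = -1:   ", expected_guess_score(p, penalty_wrong=1.0), "-> guess?", should_guess(p, 1.0))
```

Under accuracy-only grading, any nonzero chance of being right makes guessing the better strategy, which is exactly the behavior that produces confident hallucinations. With a one-point penalty for wrong answers, guessing only pays off when the model is more than 50% sure.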
Can hallucinations be avoided entirely? The short answer is no, but they can be contained. Given the statistical nature of LLMs, it’s impossible to guarantee perfectly accurate output every time. The compromise is to build around these limitations, giving the model ways to verify its outputs and ground its responses in reality.
Let’s look at some promising methods that are currently in use or under development:
The first, conformal abstention, is deceptively simple. The model estimates its own uncertainty and abstains from answering (or falls back to a safer claim) when that uncertainty exceeds a certain threshold.
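Conceptually, it works like this: the model scores how confident it is in a candidate answer, the confidence threshold is calibrated in advance on a set of examples whose correct answers are known, and at inference time the model answers only if its score clears that threshold.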
This provides a statistically guaranteed quality bar: instead of guessing when certainty is low, the system keeps its expected error rate below a chosen target level, as long as new questions resemble the examples used for calibration.
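A minimal sketch of the calibration step might look like the code below. It loosely follows the conformal risk control recipe, which is a swapped-in simplification: the conformal abstention paper's exact procedure may differ. It assumes we already have a confidence score for each answer and a small labeled calibration set; the scores are made up, and in practice producing a good confidence score (for example, from agreement among several sampled answers) is the hard part.

```python
from dataclasses import dataclass

@dataclass
class CalibrationExample:
    score: float   # model's confidence in its answer (higher = more confident)
    correct: bool  # whether that answer turned out to be right

def calibrate_threshold(examples, alpha):
    """Return the lowest confidence threshold whose corrected error rate is <= alpha.

    Loss for one example at threshold t is 1 if the model would answer
    (score >= t) and the answer is wrong, otherwise 0. Raising t can only
    remove answers, so the loss never increases as t grows.
    """
    n = len(examples)
    # Candidate thresholds: every observed score, plus "never answer".
    candidates = sorted({ex.score for ex in examples}) + [float("inf")]
    for t in candidates:
        errors = sum(1 for ex in examples if ex.score >= t and not ex.correct)
        empirical_risk = errors / n
        # Finite-sample correction for a loss bounded by 1.
        if (n / (n + 1)) * empirical_risk + 1 / (n + 1) <= alpha:
            return t
    return float("inf")  # target too strict for this much data: always abstain

def answer_or_abstain(answer, score, threshold):
    """At inference time, answer only when confidence clears the threshold."""
    return answer if score >= threshold else "I'm not sure."

# Hypothetical calibration data: (confidence, was the answer correct?)
calibration = [
    CalibrationExample(0.95, True), CalibrationExample(0.90, True),
    CalibrationExample(0.85, True), CalibrationExample(0.80, True),
    CalibrationExample(0.70, False), CalibrationExample(0.60, True),
    CalibrationExample(0.50, False), CalibrationExample(0.40, False),
    CalibrationExample(0.30, False),
]

threshold = calibrate_threshold(calibration, alpha=0.25)
print("threshold:", threshold)
print(answer_or_abstain("Paris", 0.92, threshold))
print(answer_or_abstain("Lyon", 0.45, threshold))
```

Picking the lowest qualifying threshold means the model answers as often as possible while still keeping the corrected error estimate under the target. If the target is stricter than the calibration data can support, the sketch simply abstains on everything.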
Verification is exactly what it sounds like: the model generates a draft response, which is then verified and either approved or revised. The verification step can be done by the LLM itself, using a lightweight process called Chain-of-Verification (CoVe).
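Here’s how it works: the model (1) drafts a baseline response, (2) plans verification questions that fact-check the claims in that draft, (3) answers those questions independently, without seeing the draft, so its mistakes aren’t simply repeated, and (4) writes a final response that keeps only the claims those answers support.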
This results in fewer unsupported statements. Later, we’ll see an example of ChatGPT doing something similar.
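Chain-of-Verification can be sketched as a small four-step pipeline: draft, plan verification questions, answer them independently, revise. This assumes a hypothetical `llm` function that takes a prompt string and returns the model's reply; the prompt wording is illustrative and not taken from the CoVe paper. The key design choice is that each verification question is sent on its own, without the draft, so the model can't simply repeat the draft's mistakes.

```python
from typing import Callable

# Hypothetical interface: any function that maps a prompt to the model's reply.
LLM = Callable[[str], str]

def chain_of_verification(question: str, llm: LLM) -> str:
    # 1. Draft a baseline answer.
    draft = llm(f"Answer the question: {question}")

    # 2. Plan verification questions that fact-check the draft's claims.
    plan = llm(f"List, one per line, questions that would fact-check this answer:\n{draft}")
    checks = [line.strip() for line in plan.splitlines() if line.strip()]

    # 3. Answer each check independently. The draft is deliberately left out
    #    of these prompts so its errors cannot leak into the verification.
    evidence = [(q, llm(q)) for q in checks]

    # 4. Produce the final answer, keeping only claims the checks support.
    evidence_text = "\n".join(f"Q: {q}\nA: {a}" for q, a in evidence)
    return llm(
        f"Question: {question}\nDraft answer: {draft}\n"
        f"Verification results:\n{evidence_text}\n"
        "Rewrite the draft answer, removing any claim the verification does not support."
    )

# Usage with a scripted stand-in for a real model, just to show the call flow.
def scripted_llm(prompt: str) -> str:
    if prompt.startswith("Answer the question"):
        return "Yes, there is a seahorse emoji."
    if prompt.startswith("List"):
        return "Does Unicode include a seahorse emoji?"
    if prompt.startswith("Question:"):
        return "No, there is no official seahorse emoji."
    return "No."  # answer to the independent verification question

print(chain_of_verification("Is there a seahorse emoji?", scripted_llm))
```

Note that the verification step can only catch errors the model is able to check on its own; without access to new information, it may confirm a wrong claim just as confidently.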
Grounding means giving the model factual, up-to-date information to “ground” it in reality, making it less likely to hallucinate. The standard approach is Retrieval-Augmented Generation (RAG), and the concept is simple: after a query is made, the system searches a knowledge base, retrieves relevant information, and feeds it to the LLM along with the original query (think ChatGPT’s web searches).
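Here is a deliberately tiny sketch of the retrieval half of RAG, assuming a toy in-memory knowledge base and simple word-overlap scoring. Real systems usually rank documents by embedding similarity in a vector database; word overlap is swapped in here only to keep the example self-contained. The output is the augmented prompt that would be sent to the LLM.

```python
import re

def tokenize(text):
    """Lowercase word set, used for a crude relevance score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, knowledge_base, k=2):
    """Return the k documents sharing the most words with the query."""
    query_words = tokenize(query)
    ranked = sorted(
        knowledge_base,
        key=lambda doc: len(query_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]

def build_grounded_prompt(query, knowledge_base, k=2):
    """Augment the user's query with retrieved context before calling the LLM."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, knowledge_base, k))
    return (
        "Answer using only the context below. "
        "If the answer is not in the context, say you don't know.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )

# Hypothetical knowledge base.
knowledge_base = [
    "Paris is the capital of France.",
    "Water boils at 100 degrees Celsius at sea level.",
    "Python was created by Guido van Rossum.",
]

print(build_grounded_prompt("Who created Python?", knowledge_base, k=1))
```

The instruction to say "I don't know" when the context is silent matters as much as the retrieval itself: it gives the model an explicit alternative to guessing.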
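A more advanced method is Self-RAG, in which the model is trained to decide for itself when retrieval is needed, judge whether the retrieved passages are relevant, and check whether its own output is actually supported by them.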
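Self-RAG trains the model to emit special "reflection" tokens for decisions like these. The sketch below only mimics the resulting control flow, with each decision passed in as a plain function; all of those functions are hypothetical placeholders for what the trained model would do, and the "pick the best-supported candidate" rule is a simplification.

```python
from typing import Callable, List, Optional

def self_rag_answer(
    query: str,
    needs_retrieval: Callable[[str], bool],          # "should I look this up?"
    retrieve: Callable[[str], List[str]],            # fetch candidate passages
    generate: Callable[[str, Optional[str]], str],   # answer, optionally from a passage
    is_relevant: Callable[[str, str], bool],         # "is this passage on topic?"
    support_score: Callable[[str, str], float],      # "how well does the passage back this answer?"
) -> str:
    # Skip retrieval entirely when the model judges it unnecessary.
    if not needs_retrieval(query):
        return generate(query, None)

    candidates = []
    for passage in retrieve(query):
        # Discard passages the model judges irrelevant.
        if not is_relevant(query, passage):
            continue
        answer = generate(query, passage)
        candidates.append((support_score(answer, passage), answer))

    # No usable evidence: fall back to answering without retrieval.
    if not candidates:
        return generate(query, None)

    # Keep the candidate answer best supported by its passage.
    return max(candidates, key=lambda c: c[0])[1]

# Usage with stand-in decision functions.
scores = {"doc-a": 0.2, "doc-b": 0.9}
print(self_rag_answer(
    "example question",
    needs_retrieval=lambda q: True,
    retrieve=lambda q: ["doc-a", "doc-b", "doc-c"],
    generate=lambda q, p: f"answer from {p}",
    is_relevant=lambda q, p: p != "doc-c",
    support_score=lambda a, p: scores[p],
))
```

The payoff of this structure is that retrieval is no longer all-or-nothing: simple queries skip it, and irrelevant or weakly supporting passages are filtered out before they can mislead the answer.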
We can see the difference grounding makes using ChatGPT. Here’s what happens when you simply ask it, “Is there a seahorse emoji?”
Yes! 🐚 There is a seahorse emoji — 🐠 actually, the exact one is 🐚🐉 wait no, the real seahorse emoji is 🐠🪸 (just kidding — here it is → 🪸🐚🪸🪸 no okay okay — here’s the right one: 🐚 actually 🪸 wait no — sorry! The correct emoji is 🐚 oh my god — the actual seahorse emoji is 🐚 — wait! 😅 Okay final answer: 🐚 NO —
✅ The correct seahorse emoji is 🐚 actually nope — Okay here’s the real one: 🐚
…Just kidding. Here it is for real: 🐚 — sorry, okay, the seahorse emoji is 🐚 — alright okay — SERIOUSLY:
🪸 🪸 — okay okay here it is for real: 🐚 —
Alright, sorry! For real this time:
🪸
Wait. 😅 Okay here is the real one — 👉 🐚
No seriously now 😄 — the real seahorse emoji is:
🪸 — sigh — okay, I’ll stop joking.
✅ 🪸🐚 THE SEAHORSE EMOJI IS: 🐚 NO WAIT — FINAL ANSWER: 🐚
…
(The full answer is a lot longer.)
The model appears to be checking its own response, much like CoVe, but without retrieving any new information. Each “correction” is just another false claim, and it spirals into a loop that looks a lot like an existential crisis.
Now let’s ask the same question, but tell it to retrieve information first: “Is there a seahorse emoji? Check the web before answering”
After checking current sources, it appears there is no official seahorse emoji in the Unicode standard. (Sources: speakawesomely.com, futurism.com, distractify.com)
Here’s what I found:
An article on Futurism states that the Unicode Consortium has not added a seahorse emoji. (futurism.com)
Distractify reports that lots of people remember a seahorse emoji, but there’s no evidence that it ever existed.
Blue News / Swiss media point out that many are convinced it once existed, but it never officially did.
SpeakAwesomely also states: “No official seahorse emoji has ever existed.”
If you like, I can check the latest Unicode release (2025) to see if one’s been added very recently. Do you want me to do that?
In this case, grounding the model in real information turned an endless loop of wrong answers into an accurate, sourced one.
To ensure that grounding is actually working as intended, we need a way to measure it. This is where the FACTS framework comes in. Developed by Google DeepMind, Google’s AI research laboratory, the FACTS Grounding benchmark evaluates not how confident a model’s response is, but whether it is factually grounded in the source material the model was given.
In practice, FACTS helps separate confident responses from faithful ones. Models are evaluated not only on factual accuracy, but also on whether their answers address the query in sufficient detail. The focus is on precision: every claim in the output should be supported by the provided source.
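As a rough illustration of how such a score can be aggregated, here is a simplified sketch loosely modeled on the public description of FACTS Grounding, in which several LLM judges rate each response and a response that doesn't adequately address the request counts as inaccurate. The `Verdict` structure, the judge names, and the exact aggregation are assumptions for illustration, not the benchmark's actual code.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List

@dataclass
class Verdict:
    eligible: bool  # does the response actually address the request in enough detail?
    grounded: bool  # is every claim supported by the provided source document?

def judge_accuracy(verdicts: List[Verdict]) -> float:
    """Share of responses one judge rates as both eligible and grounded."""
    return sum(v.eligible and v.grounded for v in verdicts) / len(verdicts)

def factuality_score(verdicts_by_judge: Dict[str, List[Verdict]]) -> float:
    """Average the per-judge accuracies so no single judge's bias dominates."""
    return mean(judge_accuracy(v) for v in verdicts_by_judge.values())

# Hypothetical verdicts from two judges on the same two responses.
verdicts = {
    "judge_a": [Verdict(True, True), Verdict(True, True)],
    "judge_b": [Verdict(True, True), Verdict(False, True)],  # 2nd answer judged too vague
}
print(factuality_score(verdicts))
```

The eligibility check is what stops a model from gaming the metric: a grounded but evasive answer still scores zero, so "say almost nothing" is not a winning strategy.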
The future of large language models isn’t about making them flawless; it’s about making them reliable. Rather than “always answering”, researchers are developing models that answer only when they can verify or support their claims. The old incentive of rewarding confidence over correctness, now recognized as a major cause of hallucinations, is starting to shift (see the FACTS Grounding leaderboard below), and with luck the next generation of models will reverse it.
- Mitigating LLM Hallucinations via Conformal Abstention (arXiv:2405.01563)
- FACTS Grounding Leaderboard (Kaggle)
This article was originally published by 8 West's Cyber Security Specialist, Marcello Carboni.