OpenAI Research Explores Causes of LLM Hallucinations and Potential Solutions

2025-10-13


In a recent research paper, OpenAI highlighted that the tendency of large language models (LLMs) to generate hallucinations stems from standard training and evaluation methods that reward guessing rather than acknowledging uncertainty. According to the study, this insight could pave the way for new techniques to reduce hallucinations and build more reliable AI systems, although there is no universal agreement on how to define hallucinations.


OpenAI researchers argue that hallucinations are not mysterious; rather, they originate as errors made during the pretraining phase, when models cannot distinguish false statements from factual ones because they are exposed only to positive examples. The researchers further note that such errors would persist even if all pretraining data were labeled as true or false.


We observe that today's primary evaluations overwhelmingly penalize uncertainty, so the core issue lies in misaligned evaluation practices. Suppose Model A is a properly aligned model that accurately conveys uncertainty and never hallucinates, while Model B is similar but never expresses uncertainty and always "guesses" when unsure. Under 0-1 scoring, the basis of most current benchmarks, Model B would outperform Model A.
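
To make the incentive concrete, here is a minimal sketch of the expected per-question score under binary 0-1 grading for a question the model is genuinely unsure about. The probabilities are illustrative assumptions, not figures from the paper:

```python
# Minimal sketch of the incentive problem under 0-1 (binary accuracy) grading.
# The probabilities below are illustrative assumptions, not figures from the paper.

def expected_binary_score(p_correct_if_guessing: float, abstains: bool) -> float:
    """1 point for a correct answer, 0 for a wrong answer or 'I don't know'."""
    if abstains:
        return 0.0                       # an honest "I don't know" earns nothing
    return p_correct_if_guessing         # a guess occasionally gets lucky

# Model A abstains when unsure; Model B always guesses, here with 25% odds.
model_a = expected_binary_score(0.25, abstains=True)   # -> 0.0
model_b = expected_binary_score(0.25, abstains=False)  # -> 0.25

print(f"Model A (abstains): {model_a:.2f}")
print(f"Model B (guesses):  {model_b:.2f}")  # B tops the leaderboard despite hallucinating
```

Any nonzero chance of a lucky guess beats abstaining under this rubric, which is precisely the incentive the researchers describe.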


Based on this insight, OpenAI researchers conclude that reducing hallucinations requires rethinking how models are evaluated. One proposed approach is to penalize confident errors more heavily than expressions of uncertainty, while giving partial credit to models that appropriately acknowledge what they do not know. Although this idea has garnered attention, the OpenAI team advocates an even stronger position:


Simply adding some new uncertainty tests is insufficient. Widely used accuracy-based evaluations need updating so that their scoring does not encourage guessing. If the main scoreboard continues rewarding lucky guesses, models will keep learning to guess. Fixing the scoreboard could expand the adoption of hallucination-reduction techniques, both newly developed and previously studied.
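
One way to "fix the scoreboard" along these lines is a confidence-thresholded rubric: a correct answer earns 1 point, "I don't know" earns 0, and a wrong answer loses t/(1-t) points for a stated threshold t. The sketch below illustrates that idea; the threshold and confidence values are assumptions for demonstration, not OpenAI's exact rubric:

```python
# Sketch of a grading scheme that penalizes confident errors instead of rewarding
# lucky guesses. Threshold and confidence values are illustrative assumptions.

def expected_penalized_score(p_correct: float, t: float, abstains: bool) -> float:
    """Correct = +1, 'I don't know' = 0, wrong = -t / (1 - t)."""
    if abstains:
        return 0.0
    wrong_penalty = t / (1.0 - t)
    return p_correct - (1.0 - p_correct) * wrong_penalty

t = 0.75  # confidence threshold announced in the grading instructions (assumed value)
for p_correct in (0.25, 0.60, 0.90):
    guess = expected_penalized_score(p_correct, t, abstains=False)
    better = "guess" if guess > 0.0 else "say 'I don't know'"
    print(f"confidence {p_correct:.2f}: guessing expects {guess:+.2f} -> better to {better}")

# The expected value of guessing crosses zero exactly at p_correct == t, so answering
# only pays off above the stated confidence threshold; blind guessing is now punished.
```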


Indeed, results from OpenAI researchers indicate success in reducing hallucinations: GPT-5-thinking-mini shows a 26% error rate, down from 75% for o4-mini. However, as meshugaas noted on Hacker News, this also implies that "more than half of the responses would end with 'I don't know.'" As they put it, "nobody would use something like that."


While OpenAI researchers express confidence in mitigating hallucinations, they acknowledge that no consensus exists regarding their precise definition due to their multifaceted nature.


This optimism is tempered by critics who challenge the anthropomorphization of LLMs. On Hacker News, didibus emphasized the marketing motivations behind labeling LLM errors as hallucinations and suggested, "If you stop anthropomorphizing them and return to their actual nature as predictive models, it's not surprising that predictions can sometimes be wrong."


At one end of the LLM hallucination debate stands Rebecca Parsons, Chief Technology Officer at ThoughtWorks. As reported by Martin Fowler, she believes that LLM hallucinations are not bugs but features:


All LLMs do is generate hallucinations—we just find some of them useful.


As a final perspective on the LLM hallucination debate, Gary Marcus emphasizes that although LLMs mimic the structure of human language, they lack any grounding in reality: because they have only a superficial understanding of their own outputs, they cannot fact-check what they produce.