Los Alamos method detects AI vision hallucinations

The Prelim Attention Score tool helps detect whether a model’s output is grounded in the image or driven too strongly by its own generated text.

Vision-language models are AI systems that combine image analysis with large language models. These widely used systems have a persistent problem: hallucinations, or outputs that describe objects inconsistent with, or absent from, the input image.

Los Alamos National Laboratory researchers have developed the Prelim Attention Score (PAS), a tool that helps detect whether a model’s output is grounded in the image or driven too strongly by its own generated text.

“The PAS is a real-time, plug-and-play metric that acts as an internal monitor for the AI,” said Manish Bhattarai, a Los Alamos computer scientist. “The system works with major existing vision-language models and requires minimal additional computational overhead, making it an efficient way to detect potential hallucinations. PAS achieves state-of-the-art accuracy in catching hallucinations, offering developers a practical path toward safer and more trustworthy multimodal AI systems.”

Most commonly used vision-language models are autoregressive, meaning they generate each new token (roughly a word or word fragment) based partly on the tokens they have already produced. PAS monitors a vision-language model’s prediction of each token, which lets it identify where the model is drawing its information from and where hallucinations are likely to occur. It then reports a score that alerts users to the possible presence of hallucinations in the output.
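To make the autoregressive loop concrete, here is a minimal sketch in Python. It is a toy illustration, not the Los Alamos code: `generate`, `toy_next_token` and the lookup table are hypothetical names invented for this example, and a real vision-language model would condition on image features and the full context rather than on a small table.

```python
from typing import Callable, List


def generate(
    next_token: Callable[[List[str]], str],
    prompt: List[str],
    max_new: int,
    stop: str = "<eos>",
) -> List[str]:
    """Generate tokens one at a time, feeding each new token back into the context."""
    context = list(prompt)
    generated: List[str] = []
    for _ in range(max_new):
        token = next_token(context)  # prediction depends on everything so far
        if token == stop:
            break
        generated.append(token)
        context.append(token)  # the model's own output becomes part of its input
    return generated


def toy_next_token(context: List[str]) -> str:
    """Hypothetical stand-in for a model: picks the next token from the last one."""
    table = {"a": "dog", "dog": "on", "on": "grass", "grass": "<eos>"}
    return table.get(context[-1], "<eos>")


if __name__ == "__main__":
    print(generate(toy_next_token, ["a"], max_new=10))
```

The line that appends each token back to `context` is the feedback path the article describes: once a token is produced, every later prediction can lean on it, whether or not the image supports it.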

A useful screen for practical applications

Many autoregressive vision-language models are built on transformer architectures, a class of deep-learning neural networks that use attention patterns to weigh information as they generate an output. The Los Alamos research team examined how these models attend to the image, the text prompt and the model’s own preliminary generated words.

When integrated into a vision-language model workflow, PAS can run alongside the model as it handles a request. For object mentions in the model’s response to an image and text input, PAS computes an attention-based score that indicates how strongly the model relied on its own previously generated words. The closer the score is to zero, the less likely it is that the model has produced a hallucination.
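The idea of an attention-based score can be illustrated with a short Python sketch. This is not the published PAS formula, which the article does not give. It is a simplified stand-in that assumes one row of attention weights laid out as image tokens, then prompt tokens, then previously generated tokens, and measures the share of attention falling on the last group. The function name, the layout and the 0.5 threshold are all assumptions made for illustration.

```python
from typing import Sequence


def attention_share_on_generated(
    attn_row: Sequence[float],
    n_image: int,
    n_prompt: int,
) -> float:
    """Share of one token's attention mass placed on previously generated tokens.

    Assumed layout of attn_row: [image tokens | prompt tokens | generated tokens].
    """
    if n_image < 0 or n_prompt < 0 or n_image + n_prompt > len(attn_row):
        raise ValueError("segment sizes do not fit the attention row")
    total = sum(attn_row)
    if total <= 0:
        raise ValueError("attention row must have positive total mass")
    generated_mass = sum(attn_row[n_image + n_prompt:])
    return generated_mass / total


# Hypothetical threshold, chosen only for this example.
THRESHOLD = 0.5

if __name__ == "__main__":
    # Three image tokens, one prompt token, two previously generated tokens.
    grounded = [0.30, 0.30, 0.20, 0.10, 0.05, 0.05]
    text_driven = [0.05, 0.05, 0.05, 0.05, 0.40, 0.40]
    for name, row in [("grounded", grounded), ("text_driven", text_driven)]:
        share = attention_share_on_generated(row, n_image=3, n_prompt=1)
        print(name, round(share, 2), "flag" if share > THRESHOLD else "ok")
```

In this toy version a share near zero means the token drew almost entirely on the image and the prompt, which matches the article's description that lower scores indicate a lower likelihood of hallucination. How the actual tool aggregates attention across layers and heads, and how it sets its thresholds, is not specified here.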

“By understanding the way a vision-language model pays attention to preliminary information, PAS can help identify the exact instance where a model begins to over-rely on its own words,” said Xuan Nhat Hoang, a Los Alamos intern. “Our tool reads signals the AI is already producing, representing a low-overhead way to help ensure that information is reliable and useful.”

PAS could be employed in scenarios where images, documents, diagrams and text are analyzed by vision-language AI models. For instance, it could eventually support reliability checks in settings such as medical imaging, scientific document analysis, engineering diagrams, remote sensing and other mission-relevant visual workflows where unsupported visual claims could affect downstream decisions.

The Los Alamos team is presenting PAS at the prestigious Computer Vision and Pattern Recognition 2026 conference, sponsored by the IEEE and Computer Vision Foundation, in Denver this month.

Funding: This work was supported by the Laboratory Directed Research and Development program at Los Alamos.

LA-UR-26-24473