
New approach detects adversarial attacks in multimodal AI systems

Topological signatures key to revealing attacks, identifying origins of threats

July 30, 2025

In this representation of the adversarial threat detection framework, vibrant filaments carry incoming text and image icons into a central node, while a faceted topological shield composed of glowing simplices deflects a dark, glitchy mass on the right. The composition emphasizes the contrast between clean data flows and adversarial interference. Credit: DALL-E image by Manish Bhattarai.

New vulnerabilities have emerged with the rapid advancement and adoption of multimodal foundation models, significantly expanding the potential for cybersecurity attacks. Researchers at Los Alamos National Laboratory have put forward a novel framework that identifies adversarial threats to foundation models — artificial intelligence approaches that seamlessly integrate and process text and image data. This work empowers system developers and security experts to better understand model vulnerabilities and reinforce resilience against ever more sophisticated attacks.

“As multimodal models grow more prevalent, adversaries can exploit weaknesses through either text or visual channels, or even both simultaneously,” said Manish Bhattarai, a computer scientist at Los Alamos. “AI systems face escalating threats from subtle, malicious manipulations that can mislead or corrupt their outputs, and attacks can result in misleading or toxic content that looks like genuine output from the model. When taking on increasingly complex and difficult-to-detect attacks, our unified, topology-based framework uniquely identifies threats regardless of their origin.”

Multimodal AI systems excel at integrating diverse data types by embedding text and images into a shared high-dimensional space, aligning image concepts with their textual counterparts (for example, the word “circle” with a circular shape). However, this alignment capability also introduces unique vulnerabilities. As these models are increasingly deployed in high-stakes applications, adversaries can exploit them through text or visual inputs — or both — using imperceptible perturbations that disrupt alignment and potentially produce misleading or harmful outcomes.
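To make the alignment idea concrete, the sketch below shows how a shared embedding space lets a text representation and an image representation be compared directly through cosine similarity. The vectors are made-up placeholders rather than the output of any particular model, and the example is purely illustrative: a perturbation that nudges one embedding away from its partner erodes the alignment score the system depends on.

import numpy as np

def cosine_similarity(a, b):
    # Alignment score between two embeddings in the shared space (1.0 = perfectly aligned).
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings for the word "circle" and an image of a circle
# (placeholder vectors, not produced by any real model).
text_vec = np.array([0.80, 0.10, 0.58])
image_vec = np.array([0.75, 0.15, 0.62])
print(cosine_similarity(text_vec, image_vec))      # well-aligned pair, close to 1.0

# A perturbation applied to the image embedding disrupts that alignment.
perturbed_vec = image_vec + np.array([0.0, 0.4, -0.3])
print(cosine_similarity(text_vec, perturbed_vec))  # noticeably lower score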

Defense strategies for multimodal systems have remained relatively unexplored, even as these models are increasingly used in sensitive domains where they can be applied to complex national security topics and contribute to modeling and simulation. Building on the team’s earlier purification strategy, which neutralized adversarial noise in attacks on image-centered models, the new approach detects the signature and origin of adversarial attacks on today’s advanced artificial intelligence models.

A novel topological approach

The Los Alamos team’s solution harnesses topological data analysis, a mathematical discipline focused on the “shape” of data, to uncover these adversarial signatures. When an attack disrupts the geometric alignment of text and image embeddings, it creates a measurable distortion. The researchers developed two pioneering techniques, dubbed “topological-contrastive losses,” to quantify these topological differences with precision, effectively pinpointing the presence of adversarial inputs.
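The paper’s exact topological-contrastive losses are not spelled out in this article, but the general idea of quantifying a topological distortion can be sketched in a few lines. The illustrative example below assumes the open-source ripser package for persistent homology and uses random vectors as stand-ins for text-image embeddings; it simply compares a topological summary of a known-clean embedding cloud with that of a suspect batch.

import numpy as np
from ripser import ripser  # third-party persistent-homology library (assumed installed)

def total_persistence(points, maxdim=1):
    # Sum the lifetimes (death - birth) of all finite topological features
    # in the point cloud, up to homology dimension `maxdim`.
    diagrams = ripser(points, maxdim=maxdim)["dgms"]
    total = 0.0
    for dgm in diagrams:
        finite = dgm[np.isfinite(dgm[:, 1])]  # drop the single infinite H0 bar
        total += float(np.sum(finite[:, 1] - finite[:, 0]))
    return total

# Random stand-ins for joint text-image embeddings (illustration only).
rng = np.random.default_rng(0)
reference = rng.normal(size=(200, 32))                        # known-clean batch
suspect = reference + rng.normal(scale=0.1, size=(200, 32))   # possibly tampered batch

# A large gap between the two topological summaries is a warning sign of distortion.
gap = abs(total_persistence(reference) - total_persistence(suspect))
print(f"topological gap: {gap:.3f}")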

“Our algorithm accurately uncovers the attack signatures, and when combined with statistical techniques, can detect malicious data tampering with remarkable precision,” said Minh Vu, a Los Alamos postdoctoral fellow and lead author on the team’s paper. “This research demonstrates the transformative potential of topology-based approaches in securing the next generation of AI systems and sets a strong foundation for future advancements in the field.”
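As a rough sketch of how such a score might be combined with a statistical test (the specific statistics used in the paper are not described in this article), one could calibrate a detection threshold on topological distortion scores from known-clean inputs and flag anything that lands in the extreme tail:

import numpy as np

# Hypothetical per-input topological distortion scores for a clean calibration set
# (random data for illustration; real scores would come from the detector itself).
rng = np.random.default_rng(1)
clean_scores = rng.gamma(shape=2.0, scale=1.0, size=5000)

# Place the alarm threshold at the 99th percentile of clean scores, which accepts
# roughly a 1% false-positive rate on benign inputs.
threshold = np.percentile(clean_scores, 99)

def looks_adversarial(score):
    # Flag any input whose topological distortion exceeds the calibrated threshold.
    return score > threshold

print(looks_adversarial(threshold * 0.5))  # typical clean input -> False
print(looks_adversarial(threshold * 2.0))  # strongly distorted input -> True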

The framework’s effectiveness was rigorously validated using the Venado supercomputer at Los Alamos. Installed in 2024, the machine uses chips that combine a central processing unit with a graphics processing unit to address high-performance computing and giant-scale artificial intelligence applications. The team tested the framework against a broad spectrum of known adversarial attack methods across multiple benchmark datasets and models. The results were unequivocal: the topological approach consistently and significantly outperformed existing defenses, offering a more reliable and resilient shield against threats.

The team presented the work, “Topological Signatures of Adversaries in Multimodal Alignments,” at the International Conference on Machine Learning.

Funding: This work was supported by the Laboratory Directed Research and Development program and the Institutional Computing Program at Los Alamos.

LA-UR-25-26886

Contact

Public Affairs | media_relations@lanl.gov

Topics
  • Artificial Intelligence
  • Science, Technology & Engineering