Neutron star mergers contribute to AI training models
Collaborative effort uses Laboratory simulation to train AI for scientific discovery
December 11, 2024
Neutron star merger simulations developed at Los Alamos National Laboratory are making important contributions to a collaborative initiative, Polymathic AI, which is training artificial intelligence models to help drive scientific discoveries in seemingly disparate fields. The simulations, which track the aftermath of some of the most energetic events in the universe, contribute unique data to a foundation model dataset that can help train AI models to make predictions relevant to fields such as astrophysics, biology, acoustics, chemistry, fluid dynamics and more.
“The Polymathic AI project is focused on foundation models, where you take an artificial intelligence model and train it on as much information as you possibly can in some space,” said Jonah Miller, astrophysicist at Los Alamos. “While some AI models are built on text and language, the scientific foundation model is built on datasets from simulations. Training the network on as much information as possible from physics simulations leads to it picking up on underlying trends that can be useful in other applications.”
Miller contributed his neutron star merger simulations to one of the two datasets that Polymathic AI has released. Known as “The Well,” the dataset contains numerical simulations of biological systems, fluid dynamics, acoustic scattering, supernova explosions and other complicated processes, including Miller’s specialty, neutron star mergers. These mergers occur when two neutron stars, after spending billions of years in binary orbit, collide and leave behind a black hole surrounded by hot, neutron-rich material, which powers a gamma-ray burst, an incredibly energetic release of high-energy photons.
That violent process produces many of the heavy elements found in the universe. The radioactive decay of some of those newly formed heavy elements powers an optical-to-infrared afterglow, called a kilonova, which can be seen from Earth.
Employing simulation data to make useful predictions
The equations used in understanding neutron star mergers are difficult to solve, even with supercomputers. But when AI is able to detect general trends, such as the conservation of mass or the conservation of energy, it can use what it has learned to help researchers make predictions in specific instances instead of running expensive and time-consuming simulations. Each of Miller’s simulations took three weeks on 300 cores of a Los Alamos supercomputer; a trained foundation model or neural network could supplement those expensive calculations.
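For a rough sense of scale, the quoted figures of three weeks on 300 cores can be converted into core-hours. The short Python sketch below is illustrative only: the function name `core_hours` is hypothetical, and it assumes every core was busy for the full wall-clock time, which the article does not state.

```python
# Back-of-the-envelope compute cost of one simulation, using the
# figures quoted in the article (three weeks, 300 cores).
# Assumption: all cores are fully used for the entire run.

DAYS_PER_WEEK = 7
HOURS_PER_DAY = 24


def core_hours(weeks: float, cores: int) -> float:
    """Return core-hours for a run of `weeks` wall-clock weeks on `cores` cores."""
    return weeks * DAYS_PER_WEEK * HOURS_PER_DAY * cores


per_simulation = core_hours(weeks=3, cores=300)
print(f"{per_simulation:,.0f} core-hours per simulation")
# prints "151,200 core-hours per simulation"
```

Under that assumption, each simulation costs roughly 150,000 core-hours, which is the kind of expense a trained model could help offset.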
“The benefit of using AI in this way is that the approach picks up things we might not know ourselves,” Miller said. “A foundation model could offer predictions that help save simulations, and also help inform better simulations going forward. After all, the laws of physics are universal, and the way we write our computer codes relies on certain rules of mathematics. Foundation models can likely pick up on those laws and rules.”
Datasets available for free download
The Well is one of two open-source training datasets released to the public. It is available to download for free from the Flatiron Institute and is accessible on Hugging Face, a platform hosting AI models and datasets. The Polymathic AI team provides more information about the dataset in a paper accepted for publication at NeurIPS, a leading machine learning conference, in Vancouver, Canada.
The second dataset, known as the Multimodal Universe, contains hundreds of millions of astronomical observations and measurements, such as portraits of galaxies taken by NASA’s James Webb Space Telescope and measurements of our galaxy’s stars made by the European Space Agency’s Gaia spacecraft. The datasets represent 115 terabytes from dozens of sources for the scientific community to use to train AI models.
Publication: “The Well: a Large-Scale Collection of Diverse Physics Simulations for Machine Learning.” NeurIPS. December 2024.
Funding: This work is supported by the Laboratory Directed Research and Development program at Los Alamos.
LA-UR-24-33021
Contact
Public Affairs | media_relations@lanl.gov