Extreme Scale Computing, Co-design
- Sharon Mikkelson
- Theory, Simulation, and Computation
To address the increasingly complex problems of the modern world, scientists at Los Alamos are pushing the scale of computing to the extreme, forming partnerships with other national laboratories and industry to develop supercomputers that can achieve “exaflop” speeds—that is, a quintillion (a million trillion) calculations per second. To put such speed in perspective, it is equivalent to 50 million laptops all working together at the same time. Researchers are also developing the interacting components of a computational system as a whole. This approach, known as computational co-design, may facilitate revolutionary designs in the next generation of supercomputers.
- Extreme-Scale Computer Development and Operation
- Data Science at Scale
- Coupled Computational Physics Applications and Simulations at Scale
- Computational Co-design
- Complex Networks
- Next-Generation File Systems
- Advanced Digital Libraries
Since the late 1940s, scientists at Los Alamos have pushed the scale of computing to the extreme. To support nuclear weapons design, Los Alamos scientists created the Monte Carlo method of computing. In 1948, work began on MANIAC, one of the first electronic, digital computers. In 2008, Los Alamos unveiled the Roadrunner supercomputer, which was the first to break the “petaflop” barrier.
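The essence of the Monte Carlo method, estimating a quantity by averaging over many random samples, can be shown in a few lines of Python (a minimal modern illustration, of course, not the original weapons calculations):

```python
import random

def estimate_pi(samples=1_000_000):
    """Estimate pi by sampling random points in the unit square and
    counting the fraction that land inside the quarter circle."""
    inside = sum(1 for _ in range(samples)
                 if random.random()**2 + random.random()**2 <= 1.0)
    return 4.0 * inside / samples

print(estimate_pi())  # converges toward 3.14159... as samples grow
```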
Working with other national laboratories and industry, researchers are developing supercomputers that can perform a quintillion (a million trillion) calculations per second, known as the exascale. Such speed will enable scientists to analyze the extremely large datasets and high-rate data streams needed to address problems in national security, cybersecurity, energy security, global climate modeling, astrophysics, and biology.
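For scale, one exaflops is 10^18 floating-point operations per second; the 50-million-laptop comparison above implies roughly 20 gigaflops per laptop, a back-of-the-envelope assumption shown in this short Python check:

```python
EXAFLOPS = 1e18        # one exaflops: 10**18 floating-point operations per second
LAPTOP_FLOPS = 20e9    # assumed peak for a typical laptop (~20 gigaflops)

# Roughly how many such laptops equal one exascale machine?
print(f"{EXAFLOPS / LAPTOP_FLOPS:,.0f} laptops")   # -> 50,000,000 laptops
```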
Computational co-design involves developing the interacting components of a computational system as a whole. This approach can produce significantly better, perhaps even revolutionary, designs for the next generation of extreme-scale computers. The work focuses primarily on co-optimizing constraints related to physics, methods, implementation, and architecture.
Technologies and Applications: Emerging, Developed, or Potential
- Deployed Luna, a supercomputer designed to support work for the Directed Stockpile Work Program, including the B61 Life-Extension Program. This supercomputer has a total of 24,640 processors for a combined peak capability of 539.1 teraflops. Such speed means that users can obtain faster turnarounds on their calculations and can run higher-fidelity simulations, improving the end results. Such features are particularly advantageous when performing weapons safety calculations.
- Worked with Sandia National Laboratories under the Advanced Computing at Extreme Scale (ACES) partnership to design and develop the supercomputer Cielo (Spanish for “sky”), which was built by Cray Inc. Cielo can perform more than one quadrillion floating-point operations per second. Cielo supports work related to stockpile stewardship.
- Helped to develop Roadrunner (in collaboration with IBM and the National Nuclear Security Administration), the world’s first petaflop supercomputer. Los Alamos scientists have used Roadrunner not only for efforts in stockpile stewardship but also to make scientific breakthroughs in areas such as materials, astronomy, and laser plasma science.
- Established in 2011 a Co-design Summer School that brought together a small, multidisciplinary team of students from various universities to focus on problems in computational co-design. The goal of the summer school is to encourage qualified students to work in computational co-design as the world approaches the exascale era.
- Created CoCoMANS (Computational Co-design for Multiscale Applications in the Natural Sciences), a project designed to forge a qualitatively new predictive-science capability that exploits evolving high-performance computer architectures for application areas including materials, plasmas, and climate. The project simultaneously evolves science, methods, software, and hardware in an integrated computational co-design process.
- Designed Cruft, a suite of molecular dynamics proxy applications developed to explore co-design opportunities with hardware vendors and scientists. The suite lets researchers exercise different elements of a molecular dynamics application and explore the effects of hardware and software changes on relative performance (a simplified flavor of such a kernel appears in the molecular dynamics sketch following this list). Cruft was developed to help guide design decisions for codes written to support the Exascale Co-design Center for Materials in Extreme Environments.
- Used the Roadrunner supercomputer to run PetaVision, a computational code designed to model the human visual system. PetaVision mimics more than one billion visual neurons and trillions of synapses (the neuron-update sketch following this list suggests what such a simulation involves). Matching human performance in simulating human vision was previously beyond the reach of computers. Roadrunner makes such simulation possible, which may one day enable “smart” cameras capable of recognizing danger or automobile autopilot systems that could take over driving duties if a driver became incapacitated in heavy traffic.
- Collaborating with Oak Ridge and Sandia national laboratories under the Scientific Partnership for Extreme Scale Computing (SPEC) to develop the hardware and software needed to achieve an exaflops computing system, one able to perform a quintillion (a million trillion) calculations per second. SPEC’s goal is to have a prototype developed by 2015 and an operational supercomputer by 2018.
- Collaborating with EMC Corporation to support the Department of Energy’s Exascale Initiative, which aims to boost high-performance computing to the exascale, a thousand times faster than current petascale capabilities. One effort is the development of an open-source, extremely scalable data-management middleware library known as PLFS (Parallel Log Structured File System), which will be used on computing platforms ranging from small clusters to the largest supercomputers in the world (a toy illustration of the log-structured idea appears after this list).
- Working to develop a framework for hardware-software co-design as a formally posed optimization problem. Although the optimization framework will apply to multiple problem domains, scientists will use molecular dynamics, an exemplar of the need for computational scaling, as the target application. Scientists on this project view co-design as search and selection from a vast space of hardware and software designs that map to performance metrics. The main components of the objective function are run time (or computational rate), problem size, simulated time duration, energy use, and hardware cost (a schematic of such an objective appears in the final sketch after this list).
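To give a flavor of what a molecular dynamics proxy application isolates, the sketch below implements a naive pairwise-force kernel (an illustrative stand-in assuming a Lennard-Jones potential, not the Cruft code itself):

```python
import numpy as np

def lj_forces(pos, eps=1.0, sigma=1.0):
    """Naive O(N^2) Lennard-Jones force kernel: the kind of inner loop a
    molecular dynamics proxy app isolates so its memory-access and
    floating-point behavior can be studied on different hardware."""
    n = len(pos)
    forces = np.zeros_like(pos)
    for i in range(n):
        for j in range(i + 1, n):
            r = pos[i] - pos[j]
            r2 = np.dot(r, r)
            inv6 = (sigma**2 / r2)**3          # (sigma/r)**6
            f = 24.0 * eps * (2.0 * inv6**2 - inv6) / r2 * r
            forces[i] += f
            forces[j] -= f
    return forces

# A proxy app would run kernels like this at varying problem sizes and
# data layouts to compare performance across candidate architectures.
pos = np.random.rand(64, 3) * 10.0
print(lj_forces(pos).shape)  # (64, 3)
```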
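To suggest what mimicking neurons means computationally for a code like PetaVision, the following sketch steps a toy layer of leaky integrate-and-fire neurons (a textbook model assumed here for illustration; PetaVision’s actual algorithms and scale are far richer):

```python
import numpy as np

def step_neurons(v, inputs, leak=0.9, threshold=1.0):
    """One update of a toy leaky integrate-and-fire layer: membrane
    potentials decay, accumulate input, and spike past threshold."""
    v = leak * v + inputs
    spikes = v >= threshold
    v[spikes] = 0.0            # reset the neurons that fired
    return v, spikes

v = np.zeros(1000)             # a thousand neurons; PetaVision ran over a billion
for _ in range(100):
    v, spikes = step_neurons(v, np.random.rand(1000) * 0.2)
print(int(spikes.sum()), "neurons fired on the final step")
```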
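The log-structured idea behind PLFS can be caricatured in a few lines: each writer appends to its own log and records an index mapping logical file offsets to log positions, so many processes never contend over one shared file (a toy model only; PLFS’s real API and on-disk format differ):

```python
class ToyLogStructuredFile:
    """Toy model of the log-structured idea: per-writer append-only logs
    plus an index mapping logical offsets back to (writer, log offset)."""
    def __init__(self, n_writers):
        self.logs = [bytearray() for _ in range(n_writers)]
        self.index = []  # (logical_offset, length, writer, log_offset)

    def write(self, writer, logical_offset, data):
        self.index.append((logical_offset, len(data), writer, len(self.logs[writer])))
        self.logs[writer] += data          # append-only: no seek contention

    def read(self, logical_offset, length):
        out = bytearray(length)
        for off, n, w, log_off in self.index:       # replay index entries
            lo = max(off, logical_offset)
            hi = min(off + n, logical_offset + length)
            if lo < hi:
                out[lo - logical_offset:hi - logical_offset] = \
                    self.logs[w][log_off + lo - off:log_off + hi - off]
        return bytes(out)

f = ToyLogStructuredFile(n_writers=2)
f.write(0, 0, b"hello ")     # process 0 writes the first region
f.write(1, 6, b"world")      # process 1 writes a later region concurrently
print(f.read(0, 11))         # b'hello world'
```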
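Finally, the co-design objective described in the last item above might be schematized as a weighted combination of its named components; the weights, normalization, and candidate values below are placeholders, not the project’s actual formulation:

```python
def codesign_objective(design, w_time=1.0, w_energy=0.5, w_cost=0.25):
    """Score one (hardware, software) design point; lower is better.
    Run time, energy, and cost are normalized by the work accomplished
    (problem size times simulated time duration)."""
    work = design["problem_size"] * design["simulated_time"]
    return (w_time * design["run_time"] / work
            + w_energy * design["energy_joules"] / work
            + w_cost * design["hardware_cost"] / work)

# Co-design as search: evaluate candidate design points, keep the best.
candidates = [
    {"problem_size": 1e9, "simulated_time": 1e-6, "run_time": 3600,
     "energy_joules": 5e8, "hardware_cost": 2e6},
    {"problem_size": 1e9, "simulated_time": 1e-6, "run_time": 5400,
     "energy_joules": 3e8, "hardware_cost": 1e6},
]
best = min(candidates, key=codesign_objective)
print(best["run_time"])
```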
LANL Facilities Used
- Strategic Computing Complex. Also known as the Nicholas C. Metropolis Center for Modeling and Simulation, this complex houses supercomputers that support the calculation, modeling, simulation, and visualization of complex nuclear weapons data in support of the Stockpile Stewardship Program. The complex includes a Data Visualization Corridor, which enables scientists to view the models and simulations created by the supercomputers. The Data Visualization Corridor includes a Powerwall Theater and a five-sided CAVE Immersive Laboratory, as well as desktop visualization and collaboration capabilities.
- Exascale Co-design Center for Materials in Extreme Environments. Exascale computing presents an enormous opportunity for solving some of today’s most pressing problems, including clean energy production, nuclear reactor lifetime extension, and nuclear stockpile aging. At its core, each of these problems requires the prediction of material response to extreme environments. This center’s objective is to establish the interrelationship between software and hardware required for materials simulation at the exascale while developing a multiphysics simulation framework for modeling materials subjected to extreme mechanical and radiation environments.
- Information Science & Technology Institute. This institute covers a range of fields, including information science and technology, computer science, computational science, and applied mathematics. Topic areas include extreme-scale data management, high-performance computing, data-intensive computing, computational co-design, reliability and resilience at scale, algorithms and methods (including informatics), and multicore and hybrid computing.
Contacts
- Manuel Vigil (082371): Cielo Project Director
- Gary Grider (106273): ACES Co-director
- Timothy Germann (147622): Exascale Co-design Center for Materials in Extreme Environments
- Allen McPherson (107908): Computer Science Lead for the Exascale Co-design Center for Materials in Extreme Environments
- Paul Dotson (091371): Acting Associate Director for Theory, Simulation, and Computation
- Sharon Mikkelson (098252): Communications for Theory, Simulation, and Computation
- Susan Seestrom (092217): Associate Director for Experimental Physical Sciences
- Karen Kippen (192051): Communications for Experimental Physical Sciences
Sponsors, Funding Sources, or Agencies
- Department of Homeland Security
- Department of Defense
- Department of Energy Office of Advanced Scientific Computing Research
- Department of Energy Office of Science
During the 1950s, Los Alamos built MANIAC, one of the world’s first electronic, digital computers. MANIAC was used to carry out calculations for hydrogen bomb research, as well as studies in thermodynamics, simulations applying the Monte Carlo method, and early attempts to decode DNA.
Housed at the Los Alamos Strategic Computing Complex, the Cielo supercomputer runs the largest and most demanding workloads related to modeling and simulation. Cielo’s primary purpose is to perform calculations for the Stockpile Stewardship Program.
The Powerwall Theater inside the Los Alamos Strategic Computing Complex enables researchers to view the complex models and simulations they have created using some of the world’s fastest supercomputers.