Workshops - Wednesday, October 13
Delta Force Exascale: Runtime and Tools Requirements for the Programming Models of the Future
Organizing Committee:
Richard Barrett, Sandia National Laboratories
Ben Bergen, Los Alamos National Laboratory
So far, most exascale discussions of programming models have focused on how to make exascale computing more accessible to domain scientists. However, when these systems first come online, the user group will be limited to a fairly small number of elite application developers working on very agile codes. These efforts are of interest to the larger HPC community because they will help establish the tools and techniques used to tame the exascale landscape as computing at this scale becomes more common. This workshop will focus on defining the basic runtime and tools requirements for these first-wave invaders, as well as the demarcation zone between system-level and application responsibilities for handling issues such as fault tolerance, scheduling, and communication.
7:30 – 8:30 | Breakfast
8:30 – 9:05 | Welcome and Introduction, Ben Bergen
9:05 – 9:40 | “Programming Models,” Mike Houston
9:40 – 10:00 | Discussion
10:00 – 10:30 | Coffee Break
10:30 – 11:05 | “Hardware Trends,” Peter Hofstee
11:05 – 11:40 | “Hardware for Future Software Needs,” Zach Baker
11:40 – 12:00 | Discussion
12:00 – 1:30 | Lunch Break (on your own)
1:30 – 2:05 | “Runtime Systems,” Ron Brightwell
2:05 – 2:40 | “Runtime Systems,” Jean-Marie Verdune
2:40 – 3:00 | Discussion
3:00 – 3:30 | Coffee Break
3:30 – 4:05 | “Exascale Tools,” Sameer Shende
4:05 – 4:40 | “Exascale Tools,” David Montoya
4:40 – 5:10 | Discussion and Recap
Exascale Co-Design for Materials in Extremes
Organizing Committee:
Tim Germann, Los Alamos National Laboratory
Jim Belak, Lawrence Livermore National Laboratory
Sriram Swaminarayan, Los Alamos National Laboratory
Scott Futral, Lawrence Livermore National Laboratory
Exascale computing presents an enormous opportunity for solving some of today’s most pressing problems, including producing clean energy, extending nuclear reactor lifetimes, and understanding nuclear stockpile aging. At its core, each of these problems requires predicting material response to extreme environments. The purpose of this workshop is to discuss the role of co-design in establishing the inter-relationship between software and hardware required for materials simulation at the exascale. In particular, we will discuss the research components needed to create a multiphysics exascale simulation framework for modeling materials subjected to extreme mechanical and radiation environments. The ultimate goal is to develop a UQ-driven adaptive physics refinement method in which coarse-scale simulations spawn sub-scale direct numerical simulations as needed. This task-based approach leverages the extensive concurrency and heterogeneity expected at exascale while enabling fault tolerance within applications. A key step in the co-design process is the creation of benchmark codes that stress all aspects of the exascale design. This half-day workshop will bring together participants with the expertise in computer science, applied math, and computational materials science required to achieve this goal.
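As a concrete sketch of the adaptive refinement idea described above, the Python fragment below (our illustration, not code from the co-design center; coarse_step, fine_scale_model, uq_error, and the placeholder physics are all invented) shows a coarse-scale step that spawns sub-scale tasks only where a UQ error estimate exceeds a threshold:

# Hypothetical sketch of UQ-driven adaptive physics refinement: a coarse-scale
# solver spawns fine-scale tasks only where the uncertainty estimate is too high.
from concurrent.futures import ProcessPoolExecutor

UQ_THRESHOLD = 0.05  # spawn a sub-scale direct numerical simulation above this

def fine_scale_model(cell_state):
    """Stand-in for an expensive sub-scale direct numerical simulation."""
    return cell_state * 0.99  # placeholder physics

def uq_error(cell_state):
    """Stand-in for an uncertainty-quantification error estimate."""
    return abs(cell_state) % 0.1

def coarse_step(cells, pool):
    # Submit fine-scale tasks only for cells whose UQ estimate is too coarse;
    # the task pool absorbs the resulting irregular concurrency.
    futures = {i: pool.submit(fine_scale_model, c)
               for i, c in enumerate(cells) if uq_error(c) > UQ_THRESHOLD}
    return [futures[i].result() if i in futures else c
            for i, c in enumerate(cells)]

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        print(coarse_step([0.01, 0.42, 0.73, 0.06], pool))

Because each sub-scale task is independent, a failed task can simply be resubmitted, which is one way a task-based design can enable fault tolerance within the application.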
Topics may include:
- Computational co-design
- Current and future programming models
- Domain-specific languages
- Scale-bridging techniques
- Uncertainty quantification methodologies and concepts
- Scalable tools, including visualization
- Performance modeling and simulation
- Vendor interaction
7:30 – 8:30 | Breakfast
8:30 | Welcome & Introduction
8:35 | “Overview and vision for the Exascale Co-Design Center,” Tim Germann (LANL)
8:50 | “Using Domain-Specific Languages to Enable Innovative Hardware and Software,” Pat Hanrahan (Stanford)
9:20 | “CoOperative Parallelism programming model,” David Jefferson (LLNL)
9:40 | “Novel Algorithms in Computational Materials Science: Enabling Adaptive Sampling,” Nathan Barton (LLNL)
10:00 | Coffee Break
10:30 | “Related task parallelism programming model,” Paul Henning (LANL)
10:50 | “Structural Simulation Toolkit (SST),” Jim Ang/Arun Rodrigues (SNL)
11:10 | “Performance modeling and analysis,” Philip Roth (ORNL)
11:20 | “Emerging Architectures,” Kyle Spafford (ORNL)
11:30 | “Exascale Data Analysis and Visualization for the Multi-scale Materials Co-design Center,” Jim Ahrens (LANL)
12:00 – 1:30 | Lunch Break (on your own)
1:30 – 3:00 | Exascale Co-design Center organizational meeting (invitation only)
3:00 | Coffee Break
3:30 – 6:00 | Exascale Co-design Center organizational meeting (invitation only)
Hardware Trends and SW/HW Co-Tuning Opportunities
http://lph.ece.utexas.edu/merez/LACSS2010/HWSWWorkshop
Organizing Committee:
Mattan Erez, University of Texas at Austin
Reaching exascale will involve significant changes to the underlying system components, such as the processor, memory, and interconnect. These changes include new technology as well as continued advances in, and improved efficiency of, known techniques. For example, emerging non-volatile memory can potentially be integrated into the main memory system rather than used only for solid-state disks, and integrated optical interconnects can significantly change communication tradeoffs. At the same time, new opportunities are emerging for improving the efficiency of the processor architecture itself and of both on-chip and off-chip electrical links. In this workshop we will focus on trends in hardware components, projections of future capabilities and constraints, and the implications for applications. We will also discuss opportunities for co-tuning software and hardware and the potential for new paradigms. The goals of the workshop are to present predictions of where hardware is heading and to identify the potential problems and opportunities this new technology creates for applications and software.
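As a toy illustration of what co-tuning means in practice (ours, not the workshop's), the sketch below jointly searches invented hardware and software parameter spaces against a made-up cost model; a real co-tuning flow would replace the analytic model with simulator runs or hardware measurements:

# Purely illustrative software/hardware co-tuning: rather than tuning code for
# a fixed machine, search the joint hardware/software parameter space.
from itertools import product

HW_SPACE = {"dram_channels": (2, 4, 8), "link_gbps": (25, 50, 100)}
SW_SPACE = {"tile": (16, 32, 64)}

def cost_model(dram_channels, link_gbps, tile):
    """Toy energy-delay estimate; every coefficient here is invented."""
    time = 1.0 / (dram_channels * tile) + 10.0 / link_gbps
    energy = 0.5 * dram_channels + 0.02 * link_gbps + 0.001 * tile
    return time * energy

best = min(product(HW_SPACE["dram_channels"], HW_SPACE["link_gbps"],
                   SW_SPACE["tile"]),
           key=lambda p: cost_model(*p))
print("best (dram_channels, link_gbps, tile):", best)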
7:30 – 8:30 | Breakfast
8:15 – 8:30 | Welcome and Introduction, Mattan Erez
8:30 – 9:05 | Single-Chip Heterogeneous Computing: Does the Future Include Custom Logic, FPGAs, and GPGPUs?, James C. Hoe
9:05 – 9:40 | From GPU Computing to Exascale: Technology Trends, Brucek Khailany
9:40 – 10:00 | Minipanel: Processor Trends
10:00 – 10:30 | Coffee Break
10:30 – 11:05 | Sustainable Silicon: Energy-Efficient VLSI Interconnects, Patrick Chiang
11:05 – 11:40 | Optical Interconnects for Exascale Systems, Moray McLaren
11:40 – 12:00 | Minipanel: Interconnect Trends
12:00 – 1:30 | Lunch Break (on your own)
1:30 – 2:05 | Low-power/Low-voltage Computing, Shih-Lien Lu
2:05 – 2:40 | Processors have evolved, why haven't main memories?, Al Davis
2:40 – 3:00 | Minipanel: On- and Off-Chip Memories
3:00 – 3:30 | Coffee Break
3:30 – 4:00 | Quick Recap, Mattan Erez
Resilience Summit 2010
http://www.csm.ornl.gov/srt/conferences/ResilienceSummit/2010/
Workshop general co-chairs:
Stephen L. Scott
Computer Science and Mathematics Division
Oak Ridge National Laboratory
Chokchai (Box) Leangsuksun
eXtreme Computing Research Group
Louisiana Tech University
Program co-chairs:
Christian Engelmann
Computer Science and Mathematics Division
Oak Ridge National Laboratory
Program committee:
Sean Blanchard, Los Alamos National Laboratory
Jim Brandt, Sandia National Laboratories, USA
Greg Bronevetsky, Lawrence Livermore National Laboratory
Franck Cappello, UIUC-INRIA Joint Laboratory on PetaScale Computing
Nathan DeBardeleben, Advanced Computing Systems Program, DoD
Ann Gentile, Sandia National Laboratories
Recent trends in high-performance computing (HPC) systems have clearly indicated that future increases in performance, beyond those resulting from improvements in single-processor performance, will be achieved through corresponding increases in system scale, i.e., through a significantly larger component count. As the raw computational performance of the world's fastest HPC systems grows from today's petascale to next-generation exascale capability and beyond, the number of computational, networking, and storage components will rise from the ten to one hundred thousand compute nodes of today's systems to several hundred thousand compute nodes and more in the foreseeable future. This substantial growth in system scale, and the resulting component count, poses a challenge for HPC system and application software with respect to fault tolerance and resilience.
Furthermore, recent experiences on extreme-scale HPC systems with non-recoverable soft errors, i.e., bit flips in memory, cache, registers, and logic, have added another major source of concern. The probability of such errors grows not only with system size but also with increasing architectural vulnerability caused by employing accelerators, such as FPGAs and GPUs, and by shrinking nanometer technology. Reactive fault tolerance technologies, such as checkpoint/restart, are unable to handle high failure rates due to their associated overheads, while proactive resilience technologies, such as preemptive migration, simply fail because random soft errors cannot be predicted. Moreover, soft errors may even remain undetected, resulting in silent data corruption.
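A back-of-the-envelope model shows why checkpoint/restart overhead becomes untenable at scale. The Python sketch below (our illustration, not part of the workshop program; the per-node MTBF and checkpoint cost are assumed values) uses Young's first-order approximation for the optimal checkpoint interval, t_opt = sqrt(2 * C * M), where C is the checkpoint cost and M is the system mean time between failures:

# Back-of-the-envelope model of checkpoint/restart overhead at scale,
# using Young's approximation for the optimal checkpoint interval.
import math

def checkpoint_overhead(nodes, node_mtbf_h=5.0e4, ckpt_cost_h=0.25):
    """Estimate the fraction of machine time lost to checkpointing.

    nodes       -- component count; system MTBF is node_mtbf_h / nodes
    node_mtbf_h -- assumed per-node mean time between failures (hours)
    ckpt_cost_h -- assumed time to write one checkpoint (hours)
    """
    system_mtbf = node_mtbf_h / nodes
    interval = math.sqrt(2.0 * ckpt_cost_h * system_mtbf)  # Young's formula
    # Overhead = checkpoint time per interval + expected lost work per failure.
    return ckpt_cost_h / interval + interval / (2.0 * system_mtbf)

for n in (1_000, 10_000, 100_000):
    print(f"{n:>7} nodes: {checkpoint_overhead(n):6.1%} overhead")

With these assumed numbers, the overhead fraction climbs from about 10% at one thousand nodes to 100% at one hundred thousand, at which point the machine does nothing but checkpoint and recover.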
The goal of the Resilience Summit is to bring together experts in fault tolerance and resilience for high-performance computing from national laboratories and universities to present their achievements and to discuss the challenges ahead. A secondary goal is to raise awareness in the HPC community of existing solutions, ongoing and planned work, and future research and development needs. The workshop program consists of a series of invited talks by experts and a round-table discussion.
7:30 – 8:30 | Breakfast
8:30 – 10:00 | Welcome and Introduction, Stephen L. Scott, Oak Ridge National Laboratory, USA
"Hard Data on Soft Errors: A Global-Scale Assessment of GPGPU Memory Soft Error Rates", Imran Haque, Stanford University, USA
"Soft Errors, Silent Data Corruption, and Exascale Computing", Sarah E. Michalak, Los Alamos National Laboratory, USA
10:00 – 10:30 | Coffee Break
10:30 – 12:00 | "Scalable HPC System Monitoring", Christian Engelmann, Oak Ridge National Laboratory, USA
"Scalable HPC Monitoring and Analysis for Understanding and Automated Response", Jim Brandt, Sandia National Laboratories, USA
"Mining Event Log Patterns in HPC Systems", Ana Gainaru, University of Illinois at Urbana-Champaign, USA
12:00 – 1:30 | Lunch Break (on your own)
1:30 – 3:00 | "Integrating Fault Tolerance into the Monte Carlo Application Toolkit", Rob Aulwes, Los Alamos National Laboratory, USA
"HPC Rejuvenation and GPGPU Checkpoint Model", Chokchai (Box) Leangsuksun, Louisiana Tech University, USA
"An Uncoordinated Checkpoint Protocol for Send-deterministic HPC Applications", Amina Guermouche, INRIA, France
3:00 – 3:30 | Coffee Break
3:30 – 5:00 | "VolpexMPI: Robust Execution of MPI Applications through Process Replication", Edgar Gabriel, University of Houston, USA
Discussion: "The Future of HPC Resilience - Research Challenges and Opportunities", Stephen L. Scott, Oak Ridge National Laboratory, USA, and Chokchai (Box) Leangsuksun, Louisiana Tech University, USA
Closing, Stephen L. Scott, Oak Ridge National Laboratory, USA
