Workshops - Wednesday, October 15
Presentation slides are now available within each abstract by clicking on the corresponding link.
Los Alamos Computer Science Symposium, October 14-15, Santa Fe, New Mexico
Workshop Chairs: Ben Bergen, Los Alamos National Laboratory
Computer architecture development is currently undergoing something of a Cambrian explosion, with many different approaches competing to sustain the performance gains historically delivered by growing transistor counts and Moore's Law. One thing that is clear is that developers of algorithms and numerical methods will need to identify greater concurrency to efficiently utilize the computing resources of the future. It also seems clear that, at least for the time being and in the absence of a clear winner, developers will need to be able to adapt their codes to run on multiple, and likely very different, architectures. Consequently, researchers in all areas of scientific computing are struggling to design strategies that can address these issues while still offering extensibility and preserving the intellectual and financial investments that go into developing scientific codes.
This workshop will continue the dialog that has been developing over the past several years in the HPC community. Some specific questions that we will attempt to address are:
This will be a full-day workshop, with invited speakers followed by an informal gathering where beer, wine, and light appetizers will be served to stimulate open dialog between the participants. Please join us for a lively discussion of the ideas that will help see us through this time of change.
Organizers: Adolfy Hoisie (Los Alamos) and Jeff Hollingsworth (Maryland)
Building extreme-scale parallel systems and applications that can achieve high performance is a dauntingly difficult task. Today's systems have complex processors, deep memory hierarchies and heterogeneous interconnects requiring careful scheduling of an application's operations, data access and communication to achieve a significant fraction of potential performance. Furthermore, the large number of components in extreme-scale parallel systems makes failures inevitable; therefore, achieving fault-tolerance in hardware and/or system software becomes an integral part of the performance landscape.
In addition to "classical" performance considerations, the notion of high productivity of systems at scale is now of paramount importance. Productivity encompasses availability, fault tolerance, ease of use, upward portability (including performance portability), programming environments, as well as code development time. A related workshop on programming models for hybrid and heterogeneous systems will also occur at LACSS.
Given this multi-disciplinary mix of performance and productivity, this workshop will examine their interplay across system architecture, networks, applications, and system software design. The invited speakers will not only cover these areas, but will also address the state of the art in methodologies for performance analysis and optimization, including benchmarking, modeling, tools development, tuning and steering, as well as metrics for productivity.
The invited speakers will include representatives from academia, national laboratories, funding agencies, and R&D groups at computer vendors.
HPC Resiliency Summit: Workshop on Resiliency for Petascale HPC
Recent trends in high-performance computing (HPC) systems have clearly indicated that future increases in performance, beyond those resulting from improvements in single-processor performance, will be achieved through corresponding increases in system scale, i.e., through a significantly larger component count. As the raw computational performance of the world's fastest HPC systems grows from today's tera-scale to next-generation peta-scale capability and beyond, the number of computational, networking, and storage components will grow from the ten to one hundred thousand compute nodes of today's systems to several hundred thousand compute nodes and more in the foreseeable future. This substantial growth in system scale, and the resulting component count, poses a challenge for HPC system and application software with respect to fault tolerance and resilience.
Furthermore, recent experiences on extreme-scale HPC systems with non-recoverable soft errors, i.e., bit flips in memory, caches, registers, and logic, have added another major source of concern. The probability of such errors grows not only with system size, but also with increasing architectural vulnerability caused by employing accelerators, such as FPGAs and GPUs, and by shrinking nanometer technology. Reactive fault-tolerance technologies, such as checkpoint/restart, are unable to handle high failure rates due to their associated overheads, while proactive resiliency technologies, such as preemptive migration, simply fail because random soft errors cannot be predicted. Moreover, soft errors may even remain undetected, resulting in silent data corruption.
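The checkpoint/restart overhead argument can be made concrete with Young's well-known first-order model of the optimal checkpoint interval. The short sketch below is purely illustrative and is not part of the workshop program; the checkpoint cost and MTBF values in it are hypothetical, chosen only to show how the fraction of machine time lost to checkpointing and rework grows as system scale drives the mean time between failures (MTBF) down.

    # Illustrative sketch only. Young's first-order approximation:
    #   tau_opt ~= sqrt(2 * C * M)
    # where C is the time to write one checkpoint and M is the system MTBF.
    # All numbers below are hypothetical.
    import math

    def optimal_interval(checkpoint_cost_s: float, mtbf_s: float) -> float:
        """Young's approximation to the optimal checkpoint interval, in seconds."""
        return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)

    def waste_fraction(checkpoint_cost_s: float, mtbf_s: float) -> float:
        """Approximate fraction of machine time lost to checkpointing plus rework after failures."""
        tau = optimal_interval(checkpoint_cost_s, mtbf_s)
        return checkpoint_cost_s / tau + tau / (2.0 * mtbf_s)

    if __name__ == "__main__":
        checkpoint_cost = 600.0  # hypothetical: 10 minutes to write one checkpoint
        for mtbf_hours in (24.0, 8.0, 1.0):  # MTBF shrinking as component counts grow
            mtbf_s = mtbf_hours * 3600.0
            print(f"MTBF {mtbf_hours:5.1f} h: "
                  f"checkpoint every {optimal_interval(checkpoint_cost, mtbf_s) / 60.0:6.1f} min, "
                  f"waste ~{100.0 * waste_fraction(checkpoint_cost, mtbf_s):4.1f}%")

With these hypothetical inputs, the lost fraction rises from roughly 12% at a 24-hour MTBF to well over half the machine at a one-hour MTBF, which is why purely reactive checkpoint/restart becomes untenable at high failure rates.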
The goal of the Workshop on Resiliency for Petascale HPC is to bring together experts in the area of fault tolerance and resiliency for high-performance computing from national laboratories and universities to present their achievements and to discuss the challenges ahead. The secondary goal is to raise awareness in the HPC community about existing solutions, ongoing and planned work, and future research and development needs. The workshop program consists of a series of invited talks by experts and a round table discussion.
Workshop general co-chairs:
- Chokchai (Box) Leangsuksun
With the scale of simulations now at the trillion-particle frontier, there are many questions that need to be confronted aside from how to make the simulation bigger and/or faster. As the dynamic range of simulations increases, physical processes that could be ignored earlier must now be included. In fact, the inherent multi-scale nature of many dynamical problems is only now becoming treatable as the dynamic range of the underlying codes has improved by several orders of magnitude. (Additionally, error controls must be proportionally tightened.) These advances, in many cases, have been due not only to improvements in hardware, but also to the development of new methodologies, such as acceleration techniques.
This workshop aims to bring together researchers in this nascent area to share ideas, experiences, and plans for the future. The workshop will consist of a small number of invited talks followed by an open discussion session wherein all attendees are strongly encouraged to participate.
*All workshop participants need to register for the symposium.