USRC LANL Staff

Meet the USRC LANL staff.

Sean Blanchard 

Sean Blanchard has spent the last 20 years troubleshooting, designing, and building some of the largest supercomputers on the planet. He has worked at every level, from pushing bits in the BIOS through operating system internals, fast parallel filesystems, and runtime systems to writing parallel scientific applications. Sean hates black boxes and opens every one he finds. Before engineering computers, Sean was an experimental nuclear physicist who opened protons to see what was inside. In recent years he has leveraged that experience to study the behavior of large-scale computer systems in radiation fields from cosmic rays. He has master's degrees in nuclear physics and electrical engineering and is currently pursuing a PhD in computer engineering in order to collect all the degrees.

 

Lei Cao

MarFS development.

 

HB Chen

Software-defined networks, distributed machine learning, universal namespace, and HPC virtualization and containerization.

 

Rusty Davis

Rusty Davis is a staff scientist working on resilience and virtualization. He attended Clemson University, where he earned bachelor's and master's degrees in Computer Science. His research interests include artificial intelligence, software development, resilience, and data analytics. Rusty is working on the DECAF-FSEFI fault injector and the Charliecloud container system.

 

Nathan DeBardeleben

Nathan is a senior research scientist at LANL and is the Co-Executive Director for Technical Operations of the USRC. His research focuses on resilience and reliability of supercomputers, particularly from a hardware and systems perspective.

Nathan joined LANL in 2004 after completing his PhD in computer engineering at Clemson University with a focus on parallel computing. He was a founding member of the DOE Resilience Council, an early technical organization that shone a light on the need for more reliable software and hardware at the extreme scales DOE was targeting. Nathan collaborates with several universities, mentors junior staff and students, and leads a team of researchers.

 

Chris DeJager

Developing the MarFS file system, acceptance testing, and supporting the Pavilion testing tool.

 

Andy Dubois

HPC advanced/future technology research, HPC networking I/O fault injection and algorithm acceleration, FPGA design, HW/SW co-design.

 

Hugh Greenberg

Hugh participated in the design and implementation of the Linux Noise Detective. The Linux Noise Detective is a Linux kernel module and a GUI that collect process data directly from the kernel (on multiple cluster nodes simultaneously) and analyze the data to determine the sources of system noise. He also participated in the design and development of the XGet file transfer software. XGet scalably transfers files to nodes within a cluster by building a tree of participants and delegating serving duties to optimal slave nodes. He participated in the development of the XCPU cluster management system as well. XCPU keeps the state of the cluster distributed across all nodes, allowing easy configuration of hot-spare management nodes and graceful failover that does not require canceling running jobs if the head node fails.

Hugh's work at USRC is to perform system software research intended for exascale-class supercomputers and beyond.

 

Terry Grové

Terry Grové is a staff scientist and software developer involved in a variety of HPC-related projects. He attended Coastal Carolina University, majoring in Computer Science with an emphasis on Software Development and minoring in Mathematics. His research interests include artificial intelligence, software development, software design, resilience, fault tolerance, and data analytics. Terry is leading the development efforts for DECAF-FSEFI, a soft error fault injector designed to test and profile the resilience of applications. He is also involved in development for Slurm, creating and maintaining the USRC website, and a variety of other projects. In his personal time, Terry works on a range of software development projects and is a hobbyist game and mobile app developer.

 

Brett Holman

Brett is a scientist working in High Performance Computing on the Infrastructure-Network team. He joined LANL in June 2018. His work focuses on high-speed networking, particularly InfiniBand and Omni-Path.

 

Rodney Howeedy

Rodney is a scientist with the LANL High Performance Computing Archive Storage team. His interests include data warehouse design, database performance tuning, metadata design, software automation, and filesystem characterization. Rodney's industry experience includes database, data warehouse, and system architecture as well as software Application Programming Interface design for web data interchange with databases.

Rodney collaborates on High Performance Storage System software projects at LANL in conjunction with other national labs and software vendors. He joined LANL in June of 2018 after working for several Original Equipment Manufacturer hardware and software companies across several industry sectors.

 

 

Jeff Inman

Software development for Campaign Storage, storage-system cost-optimization modeling, GPU algorithms, HW acceleration, and future HW/SW system architectures.

 

Latchesar Ionkov

Lucho co-developed the v9fs filesystem, which is now a standard part of the Linux kernel distribution. His previous work includes the CellFS programming model and the XCPU and XCPU2 process-management systems, which addressed issues of large-scale system complexity, resiliency, and manageability.

At USRC, Lucho works on scalable system software and accelerated access to application data.

 

Jeff Kuehn 

HPC Strategic Partnership Programs

 

Mike Lang

Michael has been working with UNIX systems for over twenty years. From 1999 to 2010 he was a member of LANL's Performance and Architecture Lab (PAL), focusing on the performance of large-scale systems. Currently he is the team leader for Ultrascale Systems Research, focusing on resilient, scalable systems software for large-scale systems. He received his MS in Electrical Engineering and his BS in Computer Engineering from the University of New Mexico (UNM).

 

Josip Loncaric 

HPC Technology Futures Lead

 

Lena Lopatina 

Consult Team

 

Dominic Manno

Storage systems and system software

 

Laura Monroe

Laura is a researcher in resilience and novel computing techniques, especially probabilistic computing. Her current interest is the design of algorithms and systems that address the expected increase in hardware fault rates in a probabilistic manner. Another interest is the application of discrete mathematics to the design and understanding of computing systems. She also led the production visualization effort at LANL for many years, and was the originator and project leader of the recent redesign and redeployment of the LANL visualization corridor, encompassing the computing systems, networking, and display systems used for LANL ASC large-scale visualization. She served on the design teams for the Cielo and Trinity supercomputers and was one of the designers of the Viewmaster visualization compute cluster. She has published in the areas of probabilistic computing and algorithms, resilience, error-correcting codes, virtual reality, and visualization. She received her Ph.D. in Mathematics and Computer Science in the field of error-correcting codes, working with Dr. Vera Pless.

 

David Montoya

Dave works at the intersection of applications and architectures. HPC software environments encompass what is needed by users, developers, and system staff. Workflow characterization and quantification, together with captured performance metrics, are used to map those needs and to set direction for that community as well as for vendor architecture efforts. Dave is also involved in cross-lab programming environment open-source projects, monitoring efforts, and university projects.

 

Elisabeth (Lissa) Moore

Lissa is an applied machine learning researcher and data scientist working on the resilience and fault-tolerance team. At USRC, her work spans using statistical relational models for fault characterization and mitigation as well as developing anomaly detection techniques for large-scale monitoring of supercomputing facilities. Before joining USRC, Lissa contributed to quantum algorithms for machine learning at LANL’s Center for Nonlinear Studies. Her background, including work on social network analysis with the Human Language Technology group at MIT Lincoln Laboratory and a short time at a startup back in Massachusetts, is primarily in the development and application of probabilistic graphical models to new relational and/or temporal domains. Lissa received her MS in Computer Science from the University of Massachusetts Amherst and her BA, also in Computer Science, from Amherst College.

Visit Lissa's LANL profile

 

Dave Morton

LANL HPC Design Group Leader

 

Paul Peltz

Paul Peltz works in the HPC Design group as a System Integrator. He is responsible for deploying HPC systems into production and solving difficult system problems as they come up. His current focus is evolving the system software stack to use more modern tools that automate administrative workflows, in order to improve the efficiency of LANL's system administrators.

 

Howard Pritchard 

Howard Pritchard is a researcher in HPC network software. He is actively involved in the Open MPI project and the OpenFabrics Interfaces Working Group. He is also involved in the OpenSHMEM community and leads a project to combine this programming model with the Habanero asynchronous task-based runtime. Before joining USRC and LANL, Howard was a Principal Engineer at Cray Inc., where he worked on the design and implementation of various components of the Cray XE and XC network software stack.

 

Brad Settlemyer 

Brad Settlemyer is a storage systems researcher and systems programmer specializing in high performance computing. He received his Ph.D. in computer engineering from Clemson University in 2009 and works as a research scientist in Los Alamos National Laboratory's HPC Design group. He has published papers on emerging storage systems, long-distance data movement, network modeling, and storage system algorithms.

 

Terry Tarnowsky

Consult Team

 

Scott White

Tooling and processes to support next-generation HPC filesystems and storage administration.

 

Lowell Wofford

High Performance Computing