Los Alamos National Laboratory
Information Science and Technology Institute (ISTI)
Implementing and fostering collaborative research, workforce and program development, and technical exchange

Parallel Computing Summer Research Internship

Creates next-generation leaders in HPC research and applications development

2021 Parallel Computing Summer Research Internship

June 7 - August 14, 2021

Applications for the Summer 2021 internship are now under review. The application review process will be completed by February 8th. Applicants will be notified by email of their selection status by mid-February.

The Parallel Computing Summer Research Internship is an intense 10-week program aimed at providing students with a solid foundation in modern high performance computing (HPC) topics integrated with research on real problems encountered in large-scale scientific codes.

Note: Due to the ongoing pandemic, the Summer 2021 internship will be held remotely. Student projects will still be offered, along with a series of career-enrichment talks and virtual social hours.

Program Overview

During the 10-week program, students will receive training and lectures on modern topics in HPC and software development, including:

  • Parallel programming
  • Programming models
  • Algorithms
  • Hardware architecture and its impact on code design choices
  • High-quality software development in collaborative environments
  • Visualization and workflow

Students will collaborate in teams to identify and investigate different computational problems within the scientific focus area, and implement solutions guided by mentors with scientific and computational expertise. By working on cutting-edge HPC hardware, students will gain hands-on experience and learn how to effectively communicate their work through posters and oral presentations. 

Download the 2021 Brochure (pdf)

Application Guidelines

This highly selective program is designed for upper-division undergraduates through early graduate students from all STEM fields. Recent graduates from an accredited U.S. university may also qualify as a post-baccalaureate or post-masters if they are within a year or two of their degree. As a general guideline, students should have moderate experience with a compiled scientific computing language, such as C, C++, or Fortran, and with the Linux operating system. Applicants must be enrolled at an accredited U.S. university and have U.S. work authorization. As part of the application process, please provide the following documentation:

  • Current resume (Please state citizenship in your application)
  • Unofficial transcript
  • Letter of intent describing your:  
    • Research interests and experience
    • Computational/computing experience
    • Interest in the program
    • Overall strengths and goals


Fellowship Stipend

Participants will receive a fellowship stipend, with the amount determined by your current academic rank. The stipend will be paid in three installments over the course of the summer. You will be responsible for covering your own travel, food, and housing. Housing is in short supply in Los Alamos during the summer, but we will do our best to provide resources to help you find housing.


2021 Projects

For the 2021 internship, there will be eight different projects for students to choose from. Students will work in pairs and will be assigned mentors who will oversee their project. The mentors are all established scientists at Los Alamos National Laboratory. For more detailed information, please click on the topic names below.

CP2K molecular dynamics

Mentor: Pavel Dub (C-IIAC), pdub@lanl.gov

Students: 1-2

There are two possible projects:

  1. Improving parallel scalability for the CP2K ab initio molecular dynamics code (https://www.cp2k.org/), which uses MPI and OpenMP. The current parallel scalability is poor for both MPI and OpenMP.
  2. Parallelizing the qfactor code across multiple nodes (https://github.com/edyounis/qfactor)

The project will require profiling each application and developing a parallel strategy for one or both applications. For the OpenMP implementation, proper process placement and affinity should be examined. Technologies that will be involved are profiling, serial performance, vectorization, MPI for distributed parallelism, and OpenMP for threading parallelism using shared memory.

GPU Acceleration of a High-Order Compressible Navier-Stokes Code

Mentors: Jon Baltzer (CCS-2), jbaltzer@lanl.gov and Daniel Livescu (CCS-2), livescu@lanl.gov

Students: 2

CFDNS is a high-order accurate Navier-Stokes code, currently being applied to a multitude of turbulent flows. We are implementing GPU acceleration in a subset of this FORTRAN code targeting compressible isotropic turbulence. The summer project will include code benchmarking, evaluating hot spots, and GPU acceleration of selected hot spots.

Technologies involved will include profiling the code with the goal of identifying and off-loading intensive calculations to the GPU. The plan is to use OpenMP, primarily with the IBM XL compiler, for much of the offloaded code, with GPU libraries and/or CUDA kernels where appropriate. Given the time constraints of the summer program, we intend to improve a focused subset of the code.

Particle beam dynamics

Mentor: Hoby Rakotoarivelo (T-5), hoby@lanl.gov

Students: 1

This project involves a particle beam dynamics simulation code. It tackles a fundamental problem related to nonlinear beam dynamics from radiation self-fields that underpins many particle accelerator design issues in ultra-bright beam applications. It supports two levels of parallelism with MPI and Kokkos to leverage multi-GPU nodes. The goal of the student project is to improve the current performance and optimize the code for pre-exascale systems such as NERSC's Perlmutter.

Technologies that will be used include profiling, MPI for distributed parallelism and Kokkos for GPU kernels.


MPAS-Ocean optimization

Mentors: Mark Petersen (CCS-2), mpetersen@lanl.gov, Luke Van Roekel (T-3), lvanroekel@lanl.gov, Matt Turner (T-3), mturner@lanl.gov

Students: 2-3

There are four potential projects involving the Model for Prediction Across Scales-Ocean (MPAS-Ocean). MPAS-Ocean is the ocean component of the Energy Exascale Earth System Model (E3SM), a global climate model developed by the DOE. The potential projects are:

  1. Advection optimization on GPUs: MPAS-Ocean uses Flux Corrected Transport (FCT) advection on an unstructured mesh. This project would extract the advection kernel (potentially using KGen) and optimize it for GPU computations.
  2. Mixed precision on GPUs: Look at the impact of using mixed precision (some portions of the model in single precision, others in double) on both the accuracy and performance of MPAS-O running on GPUs.
  3. Data access and loop order optimizations: Many of the loops in MPAS-O have common factor calculations outside of the innermost loops. This provides better performance on CPUs (less computation), but performance on GPUs is better for tightly nested loops. We want to look at the impact on performance of both approaches on both CPUs and GPUs. MPAS-O loops also have indirect accessing of data within the loops, where the arrays are not accessed contiguously in memory. We want to look at the performance benefit of pre-gathering the data to create regular access patterns. The project would evaluate performance trade-offs on CPUs versus GPUs, and across different architectures.
  4. Lagrangian particle optimization on CPUs and GPUs: MPAS-Ocean currently includes in-situ Lagrangian particles, which require a large portion of the compute time. Lagrangian particles are useful to analyze mixing, track water masses, visualize currents, and compare to ARGO float observations. As particles move across processor domain partitions, they must be handed off and change ownership from one processor to the next. This is currently implemented as a linked list, which turned out to have poor performance. The student would compare this to a more traditional array structure on CPUs, and then write a GPU implementation of the particle advection computations.

Background: All E3SM components were designed to use variable resolution horizontal meshes for enhanced high-resolution regions within a global low-resolution mesh. These models are built for scalability and performance on large clusters with MPI and OpenMP, and we are transitioning to run parts on GPUs. The students will work together, but each will be responsible for particular development and testing. All projects would include the development of ocean test cases and performance profiling.

Accelerating parallel Monte Carlo simulations for statistical physics

Mentor: Ying Wai Li (CCS-7), yingwaili@lanl.gov

Students: 2

Scientific background: Monte Carlo simulations are deemed "embarrassingly parallel" because they can employ multiple random walkers to scale up and achieve strong scaling. In this project, we will instead focus on improving the weak scaling, i.e., how to make use of more processors to simulate larger system sizes. This provides two sub-projects that offer flexibility to suit different students' interests.

Potential projects:

  1. Portable parallel Monte Carlo simulations on many-core processors and GPUs

The students will implement parallel domain decomposition algorithms and optimize them on many-core processors (such as ARM or Intel Xeon processors) and GPUs using portable programming models, so that the code can run on all these architectures. I would like to examine the C++ threading library in the new C++ Standard (C++14, C++17, or later), and compare its performance and portability to Kokkos, OpenMP, or CUDA.

  2. Combining machine-learned material models in Monte Carlo simulations

Better weak scaling can also be achieved if a machine-learned model is available to act as a computationally inexpensive surrogate model. In this project, the students will first implement an existing neural-network model using the C++ binding that the PyTorch library provides, and combine it with an existing C++ Monte Carlo code. We will profile, examine, and optimize the threading and/or GPU utilization of the neural network kernels. If time permits (or depending on students' interest), the students can perform large-scale Monte Carlo simulations (parallelized over MPI) to understand the physics of alloys.

MPI Optimization project

Mentors: Bob Robey (XCP-2), brobey@lanl.gov; Patrick Bridges (University of New Mexico), patrickb@unm.edu; Anthony Skjellum (University of Tennessee at Chattanooga), Tony-Skjellum@utc.edu; Amanda Bienz (University of New Mexico), bienz@unm.edu; Howard Pritchard (HPC-ENV), howardp@lanl.gov; Sam Gutierrez (HPC-ENV), samuel@lanl.gov; and application specialists

Students: PSAAP, possibly 1-2 students

This project will look at optimizations in unstructured halo communication for distributed computing with MPI. The communication will use MPI_Type_indexed types and communication calls to move the operation from hand-coded implementations in applications to the MPI library. The summer work will focus on the application side, identifying similar functionality and adapting it to take advantage of future performance improvements in MPI.

The project will also consider other MPI optimizations that might impact application performance. This includes structured halo communication, data migration, and asynchronous reductions.

Gas and solid dynamic simulations on GPUs

Mentor: Nathaniel Morgan (XCP-4), nmorgan@lanl.gov

Students: 1-3

This project focuses on performance portability of gas and solid dynamics codes using the Kokkos library. This project will allow summer students to acquire experience and a greater understanding of compressible material dynamics, advanced methods and models, modern software development practices, and software portability across computer architectures (e.g., CPUs and GPUs). There are two potential research topics.

  1. The first research project will focus on creating a high-order Eulerian compressible gas dynamics solver using Kokkos.
  2. The second research project will focus on incorporating Kokkos into a grain-structure-aware solid constitutive model that is coupled to a Lagrangian compressible solid dynamics code.

Eulerian numerical methods solve the governing physics equations using a fixed mesh, while Lagrangian numerical methods use a mesh that moves with the flow.

Parallelize interface model

Mentor: Jon Reisner (XCP-4), reisner@lanl.gov

Students: 1-2

This project will involve the parallel exploration of a newly developed time-dependent mesh and interface model within a general multi-scale methodology. Students will examine various parallel aspects of the multi-scale model, including scaling and comparison against experimental data. Students are also encouraged to document any weaknesses in the current formulation and to suggest new parallel approaches and/or changes to the overall method.