
Parallel Computing Summer Research Internship

Creating next-generation leaders in HPC research and applications development



Los Alamos National Laboratory

2022 Parallel Computing Summer Research Internship

June 6 - August 12, 2022


The Parallel Computing Summer Research Internship is an intensive 10-week program that aims to provide students with a solid foundation in modern high performance computing (HPC) topics, integrated with research on real problems encountered in large-scale scientific codes.

Note: Due to the ongoing pandemic, it has not yet been determined whether the Summer 2022 internship will be held in person or remotely. Student projects will still be offered, along with a series of career-enrichment talks and social hours.

Program Overview

During the 10-week program, students will receive training and lectures on modern topics in HPC and software development, including:

  • Parallel programming
  • Programming models
  • Algorithms
  • Hardware architecture and its impact on code design choices
  • High-quality software development in collaborative environments
  • Visualization and workflow

Students will collaborate in teams to identify and investigate different computational problems within the scientific focus area, and implement solutions guided by mentors with scientific and computational expertise. By working on cutting-edge HPC hardware, students will gain hands-on experience and learn how to effectively communicate their work through posters and oral presentations. 

Projects

For the 2022 internship, there are 11 projects for students to choose from. Students will work individually or in pairs, as indicated in each listing, and will be guided by mentors who oversee the project. The mentors are all established scientists at Los Alamos National Laboratory. For more detailed information, please see the project descriptions below.

MPI Parallelization of Python DNS code

Mentor: Daniel Israel (dmi1@lanl.gov)

Students: 2

This project involves parallelizing a Python direct numerical simulation (DNS) code to use a pencil domain decomposition in place of its current slab decomposition.
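As a rough illustration of the target data layout (a minimal sketch assuming mpi4py, with illustrative grid sizes rather than anything from the actual DNS code), a pencil decomposition distributes the grid over a two-dimensional process grid instead of splitting along a single axis:

```python
# Sketch: x-pencil decomposition of a 3D grid using a 2D Cartesian communicator.
# Grid sizes and names are illustrative, not taken from the project's DNS code.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
nx, ny, nz = 256, 256, 256                   # hypothetical global grid

# A slab decomposition splits along z only, capping the run at nz ranks.
# A pencil decomposition splits along y and z, allowing up to ny*nz ranks.
dims = MPI.Compute_dims(comm.Get_size(), 2)  # 2D process grid
cart = comm.Create_cart(dims, periods=[True, True])
py, pz = cart.Get_coords(cart.Get_rank())

# Each rank owns an x-pencil: the full x extent, a block of y, and a block of z.
ly, lz = ny // dims[0], nz // dims[1]
pencil = np.empty((nx, ly, lz))
print(f"rank {comm.Get_rank()}: y-block {py}, z-block {pz}, local shape {pencil.shape}")
```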

Parallel mesh generation for E3SM

Mentor: Darren Engwirda, T-3 (dengwirda@lanl.gov)

Students: 1

The generation of complex, high-resolution unstructured meshes for the E3SM climate model is a computationally challenging task, requiring efficient and scalable algorithms to build, optimize, and manipulate unstructured grids and geometric datasets from global to coastal scales. This summer project will focus on improving performance through parallelism and algorithmic optimization, addressing (a) the implementation of multi-threaded mesh optimization kernels within the jigsaw mesh generation library, and (b) algorithmic improvements to E3SM's mesh manipulation and initialization workflows to employ asymptotically fast approaches. Development will involve a mixture of C++ and Python and will target multi-core architectures.

Comparison of B-grid and C-grid formulations for sea ice dynamics in MPAS-Seaice

Mentors: Giacomo Capodaglio (gcapodaglio@lanl.gov) and Mark Petersen (mpetersen@lanl.gov), CCS-2

Students: 1

In MPAS-Seaice, the dynamics are currently discretized on a B-grid, where the velocity components are defined at the vertices of the mesh cells. To facilitate coupling with MPAS-Ocean, we are currently developing a C-grid variational formulation in which the velocity components are instead defined at the cell edges. This summer project will consist of comparing the C-grid formulation against the B-grid in terms of spatial convergence, theoretical consistency, computational cost, and quality of the solution. The student will design and run test cases and discuss the similarities and differences between the two approaches in the final presentation. Work is primarily in Python.
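For the convergence part of the comparison, a typical workflow is to measure errors against a reference solution on a sequence of refined meshes and estimate the observed order of accuracy. A minimal sketch (with placeholder error values, not MPAS-Seaice output):

```python
# Sketch: estimating the observed order of spatial convergence from errors
# measured on successively refined meshes. All numbers below are placeholders.
import numpy as np

cell_sizes = np.array([100e3, 50e3, 25e3, 12.5e3])       # nominal mesh spacings (m)
l2_errors  = np.array([4.0e-2, 1.1e-2, 2.9e-3, 7.6e-4])  # hypothetical L2 errors

# Fit log(error) = p*log(h) + c; the slope p is the observed convergence order.
p, _ = np.polyfit(np.log(cell_sizes), np.log(l2_errors), 1)
print(f"observed convergence order: {p:.2f}")

# Pairwise rates between consecutive refinements are also commonly reported.
rates = np.log(l2_errors[:-1] / l2_errors[1:]) / np.log(cell_sizes[:-1] / cell_sizes[1:])
print("pairwise rates:", np.round(rates, 2))
```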

Shared memory parallelization of sea level model

Mentor: Matt Hoffman, T-3 (mhoffman@lanl.gov)

Students: 1

To enable projections of regional sea level changes, E3SM will be incorporating a new sea level model that solves for a spatially varying sea level (i.e., geoidal height) based on viscoelastic solid Earth deformation and adjustments to the Earth's gravitational field from changing ice sheet, glacier, terrestrial water, and ocean loads. The existing sea level model is a serial Fortran90 code that uses spherical harmonic methods.

This summer project will parallelize the sea level model using OpenMP by adding threading to loops in the existing code.  The student will then quantify performance improvements at a range of thread counts for a series of different model resolutions.  The student will also explore potential memory limitations and possible solutions when the threaded sea level model is run coupled to an ice sheet model that makes use of MPI parallelism.
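A small sketch of the kind of scaling analysis described above, using hypothetical wall-clock timings at several thread counts (the numbers are placeholders, not sea level model measurements):

```python
# Sketch: speedup and parallel efficiency from timings collected at several
# OpenMP thread counts. The runtimes below are placeholders.
threads  = [1, 2, 4, 8, 16]
runtimes = [812.0, 420.0, 221.0, 123.0, 74.0]   # hypothetical wall-clock seconds

for n, t in zip(threads, runtimes):
    speedup = runtimes[0] / t
    efficiency = speedup / n
    print(f"{n:2d} threads: speedup {speedup:5.2f}x, efficiency {efficiency:6.1%}")
```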

Why do explosions look like earthquakes, and what can earthquakes tell us about explosions?

Mentors: Carene Larmat (carene@lanl.gov) and Ting Chen (tchen@lanl.gov), EES-17

Both earthquakes and explosions generate seismic waves that can be studied to discriminate between them. Geological structures control the propagation of seismic waves and affect our ability to analyze seismic signals to recover the properties of the source. Data collected from explosions show a partition of seismic energy between compressive and shear motion that is unexpectedly close to that of earthquakes. The goals of this project are to parallelize the data analysis and the generation of Earth models, as well as to run HPC modeling of wave propagation in Rock Valley.

Continuous integration across multiple architectures with different parallelization strategies

Mentors: Scott Luedtke (sluedtke@lanl.gov, XCP-6) & Brian Albright (balbright@lanl.gov, XTD-PRI)

The Vector Particle-in-Cell (VPIC) code's recent port to the Kokkos performance portability framework has given rise to pressing continuous integration (CI) challenges. To ensure good performance across many platforms, Kokkos changes the data layout, parallelization, and computation schemes based on what will perform best on a given architecture. Thus, while the physics should remain the same, the parallel numerics change on a platform-by-platform basis, making it difficult or impossible for a single developer to test code changes thoroughly. The goal of this project is to develop an automated CI testing suite that tests correctness across multiple parallel computation schemes with conservation tests, physics problems with analytical solutions, and possibly even order-of-accuracy tests via the method of manufactured solutions (Riva, Fabio, Carrie F. Beadle, and Paolo Ricci. "A methodology for the rigorous verification of particle-in-cell simulations." Physics of Plasmas 24, no. 5 (2017): 055703). Additional CI tests that increase code coverage of non-physics parts of the code (MPI, output) will be highly valuable.
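As one illustration of the kind of check such a suite might contain (a sketch only; the harness function, field names, and tolerance are hypothetical, not VPIC's actual interfaces), a conservation test runs a short problem on each target configuration and asserts that a conserved quantity drifts by less than a tolerance:

```python
# Sketch of a CI-style conservation test: run a short simulation on a target
# configuration, then assert that total energy drifts by less than a tolerance.
# `run_short_simulation` and its returned fields are hypothetical stand-ins for
# however a real harness would launch the code and parse its diagnostics.

def run_short_simulation(platform: str) -> dict:
    # Placeholder returning canned numbers so the sketch runs as-is.
    return {"initial_energy": 1.0000000, "final_energy": 1.0000004}

def test_energy_conservation(platform: str, rel_tol: float = 1e-5) -> None:
    diag = run_short_simulation(platform)
    drift = abs(diag["final_energy"] - diag["initial_energy"]) / abs(diag["initial_energy"])
    assert drift < rel_tol, f"energy drift {drift:.2e} exceeds {rel_tol:.0e} on {platform}"

for platform in ["cpu-serial", "cpu-openmp", "gpu-cuda"]:
    test_energy_conservation(platform)
print("conservation tests passed")
```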

 

Modeling the dynamics of solids on GPU and CPU architectures

Mentor: Nathaniel Morgan (nmorgan@lanl.gov), XCP-4 (Continuum Models and Numerical Methods)

This project focuses on the performance portability of an advanced solid dynamics code that accounts for mesoscale physics in continuum-scale simulations. The students will gain hands-on experience and a greater understanding of solid dynamics, advanced numerical methods and models, modern software development practices, and the Kokkos C++ performance portability library, which enables codes to run on both GPU and CPU architectures with a single implementation.

Efficient and correct parallel partitioning of a problem

Mentors: Laura Monroe (lmonroe@lanl.gov) and possibly Terry Grove, HPC-DES

Students: 1-2

During FY20, the ASC BML Inexact Computing project demonstrated, by example on real problems, that a bad ordering and grouping of a summation can give rise to substantial incorrectness in the results. In particular, divide-and-conquer, although efficient and in very common use, yields a great deal of incorrectness under certain conditions. From this investigation, based on two example codes, we developed heuristics for correct calculation.

We have since developed a new partitioning algorithm that optimizes both correctness and performance. It allows much more flexibility in partitioning than divide-and-conquer, which should greatly improve the correctness of the results, and we have proven that even its worst case is nearly as efficient as divide-and-conquer. The new algorithm is also easy to implement. We propose to demonstrate this method with real parallel calculations: first with a simple summation over many terms, assessing the correctness and speed of the parallel calculation for different sets of numbers to be summed, and then, if possible, on a real code, with the same assessment.
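As a small concrete illustration of why the grouping of a summation matters (this is not the project's new partitioning algorithm, and the data are contrived), left-to-right and divide-and-conquer groupings of the same single-precision data generally give different results, and both differ from a high-accuracy reference:

```python
# Sketch: the grouping of a floating-point summation changes the result.
# Contrived single-precision data; not one of the project's example codes.
import math
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(scale=1e4, size=200_000).astype(np.float32)

def sequential_sum(a):
    # Left-to-right accumulation in float32.
    s = np.float32(0.0)
    for v in a:
        s = np.float32(s + v)
    return s

def pairwise_sum(a):
    # Divide-and-conquer grouping in float32.
    if a.size <= 64:
        return sequential_sum(a)
    mid = a.size // 2
    return np.float32(pairwise_sum(a[:mid]) + pairwise_sum(a[mid:]))

reference = math.fsum(float(v) for v in x)   # exactly rounded reference in double
for name, value in [("sequential", sequential_sum(x)), ("pairwise", pairwise_sum(x))]:
    print(f"{name:10s}: {float(value):.4f}   abs. error vs reference: {abs(float(value) - reference):.4f}")
```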

Implementing Pipeline Mode and Low-Latency Mode in Ptychography using MPI and OpenCL

Mentor: Kevin Mertes (kmmertes@lanl.gov), C-PCS

Students: 2 (US Citizens)

Ptychography is a coherent x-ray imaging technique used to resolve the internal structure of materials and devices in 2D and 3D at the nanometer scale.  Iterative techniques are used to reconstruct the complex-valued transmissivity and complex-valued probe from the real-valued intensities of diffraction patterns.  

For high-throughput scenarios, where we stream batches of diffraction patterns to the reconstruction software, we need a scheduler capable of analyzing the capabilities of the various nodes in a heterogeneous cluster. The scheduler will need to coordinate the different sub-tasks in the reconstruction software (i.e., loading data over the network, pre-processing data on the CPU, and reconstructing data on the GPU). We envision a pipeline that moves the data through each sub-task in a coordinated fashion to fully utilize all hardware in a node. If the node has multiple CPUs and GPUs, we expect multiple datasets to be processed simultaneously.
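A toy sketch of that pipeline idea, with thread-based stages connected by bounded queues (the stage bodies and timings are placeholders, not the actual reconstruction software):

```python
# Sketch: a pipeline in which each stage runs in its own thread and hands
# batches to the next stage through a bounded queue, so different batches are
# in flight in different stages at the same time. time.sleep() stands in for
# real work (network I/O, CPU pre-processing, GPU reconstruction).
import queue
import threading
import time

def stage(name, inbox, outbox, work_seconds):
    while True:
        batch = inbox.get()
        if batch is None:             # sentinel: shut down and pass it on
            if outbox is not None:
                outbox.put(None)
            break
        time.sleep(work_seconds)      # placeholder for this stage's real work
        print(f"{name} finished batch {batch}")
        if outbox is not None:
            outbox.put(batch)

q_loaded, q_prepped = queue.Queue(maxsize=2), queue.Queue(maxsize=2)
threads = [
    threading.Thread(target=stage, args=("pre-process", q_loaded, q_prepped, 0.02)),
    threading.Thread(target=stage, args=("reconstruct", q_prepped, None, 0.05)),
]
for t in threads:
    t.start()

for batch in range(5):                # the "load" stage feeds batches in
    q_loaded.put(batch)
q_loaded.put(None)
for t in threads:
    t.join()
```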

For low-latency operation, we need multiple nodes to work on one dataset to return a reconstruction in the shortest amount of time. This means splitting the data among multiple nodes and/or multiple GPUs to scatter the data among many compute units, gather the results, and assemble them into a complete reconstruction. The solution will need to run on a single node with multiple GPUs and/or a cluster with many individual GPUs. Profiling the software to find bottlenecks will be expected.
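A minimal sketch of that scatter/gather pattern, assuming mpi4py and a placeholder per-chunk reconstruction step:

```python
# Sketch: split one dataset across MPI ranks, process each chunk locally, and
# gather the pieces back on rank 0. The inverse FFT is only a stand-in for the
# real per-chunk reconstruction kernel, and the data are random placeholders.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

if rank == 0:
    patterns = np.random.rand(size * 8, 128, 128)   # hypothetical diffraction data
    chunks = np.array_split(patterns, size)
else:
    chunks = None

local = comm.scatter(chunks, root=0)                # distribute chunks to ranks
local_result = np.fft.ifft2(local)                  # placeholder reconstruction
pieces = comm.gather(local_result, root=0)          # collect partial results

if rank == 0:
    reconstruction = np.concatenate(pieces)
    print("assembled reconstruction with shape", reconstruction.shape)
```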

Parallelize a finite-element and discrete-element computational fluid dynamics code with OpenACC

Mentors: Jon Reisner (reisner@lanl.gov, XCP-4) and Bob Robey (brobey@lanl.gov, XCP-2)

Students: 2 (US Citizens)

The proposed work scope is to incorporate OpenACC directives into LANL's HOSS code, with the goal of enabling the model to run efficiently on GPUs. HOSS combines finite-element and discrete-element methods to simulate material deformation and fracture. As part of this effort, timings on CPUs will also be collected to illustrate the comparative advantage afforded by the move to GPUs.

Using a parallelized coastal surface and subsurface coupled hydrologic model to predict coastal saltwater intrusion under the influence of coastal sediment transport and landscape evolution

Project description: Saltwater intrusion is an important threat to coastal ecosystem stability. Current studies predict saltwater intrusion by assuming a static coastal landscape; however, the coastal landscape is highly dynamic in response to sediment erosion and deposition. In this project, we hope to use the parallelized hydrologic model Advanced Terrestrial Simulator (ATS) to simulate saltwater intrusion with a dynamic coastal landscape.

Expected project outcomes: Students will gain modeling experience solving water-related problems with parallel computing. This is a cutting-edge research project that can significantly improve our understanding of coastal saltwater intrusion under future climate change. I expect at least one high-impact research publication from this study.