Los Alamos National Laboratory
Information Science and Technology Institute (ISTI)
Implementing and fostering collaborative research, workforce and program development, and technical exchange

2018 Project Descriptions

Creates next-generation leaders in Machine Learning for Scientific Applications

Contacts  

  • Program Lead: Diane Oyen
  • Program Co-Lead: Youzuo Lin
  • Program Co-Lead: Nick Lubbers
  • Program Co-Lead: Boian Alexandrov
  • Professional Staff Assistant: Melony Kosgei
  • Professional Staff Assistant: Nickole Aguilar Garcia

Scalable Optimization Methods for Dictionary Learning

[Scalable Dictionary Learning]

Sparse coding and dictionary learning have proved highly successful in a wide range of machine learning and related applications. The computational cost of the associated optimization problems, however, has limited the application of these approaches to large-scale problems. We are interested in developing scalable algorithms, including but not limited to stochastic gradient descent methods, that would extend the size of problems to which dictionary learning approaches can be applied. We are seeking a graduate student whose current research is complementary to these goals, so that the fellowship will enable the candidate to explore these issues with the LANL mentors while advancing their studies.
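To make the stochastic-gradient idea concrete, here is a minimal pure-Python sketch of online dictionary learning: each signal is sparse-coded with a few ISTA iterations, and the atoms then take one gradient step on that single sample. This is an illustration under stated assumptions, not the project's algorithm. The function names, the penalty `lam`, the step sizes, and the unit-ball constraint on atoms are all choices made for this example.

```python
import math


def soft_threshold(value, threshold):
    """Shrink a scalar toward zero by `threshold` (proximal step for the L1 penalty)."""
    return math.copysign(max(abs(value) - threshold, 0.0), value)


def reconstruct(atoms, code):
    """Return sum_k code[k] * atoms[k] as a plain list."""
    dim = len(atoms[0])
    return [sum(c * atom[i] for c, atom in zip(code, atoms)) for i in range(dim)]


def sparse_code(atoms, signal, lam=0.1, step=0.1, iters=100):
    """Approximate argmin_a 0.5*||D a - x||^2 + lam*||a||_1 with ISTA iterations.

    Assumption: `step` is small enough for the dictionary at hand; 0.1 is
    adequate for a handful of atoms with norm at most 1.
    """
    code = [0.0] * len(atoms)
    for _ in range(iters):
        residual = [r - s for r, s in zip(reconstruct(atoms, code), signal)]
        grad = [sum(a * r for a, r in zip(atom, residual)) for atom in atoms]
        code = [soft_threshold(c - step * g, step * lam) for c, g in zip(code, grad)]
    return code


def dictionary_sgd_step(atoms, signal, code, lr=0.05):
    """One stochastic gradient step on the atoms for a single signal.

    Any atom whose norm exceeds 1 is rescaled back onto the unit ball.
    """
    residual = [r - s for r, s in zip(reconstruct(atoms, code), signal)]
    updated = []
    for c, atom in zip(code, atoms):
        moved = [a - lr * c * r for a, r in zip(atom, residual)]
        norm = math.sqrt(sum(a * a for a in moved))
        scale = 1.0 / norm if norm > 1.0 else 1.0
        updated.append([a * scale for a in moved])
    return updated


def train(atoms, signals, epochs=20):
    """Alternate sparse coding and one atom update per signal (one sample at a time)."""
    for _ in range(epochs):
        for signal in signals:
            code = sparse_code(atoms, signal)
            atoms = dictionary_sgd_step(atoms, signal, code)
    return atoms
```

Because each update touches only one signal, memory use does not grow with the size of the data set, which is the property that makes stochastic methods attractive for large problems.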

Explainable Machine Learning

[Explain ML]

Many successful machine learning techniques act as black boxes and do not necessarily provide the explanations required for making critical decisions. We are looking for graduate students interested in developing novel ML algorithms that combine the benefits of explicability and predictability. On the technical side, we will explore training from correlated samples, models that allow easy a posteriori inference, and models that incorporate explainable hidden variables. It is advantageous if applicants have experience working on topics relevant to graphical models, high-dimensional statistics, machine learning, and statistical physics. The fellowship can be used by the successful candidate to continue their current studies, share their work with other students in the program, and explore new directions and applications with their LANL mentors and other students.

Machine Learning in the Wild

[Wild ML]

Adoption of machine learning approaches in many scientific and national security applications will require algorithms that are robust to changing environments, including adversarial environments; adaptable to new yet similar inputs, outputs, and objective functions; and that allow an analyst to verify models and results. We are seeking students to address one or more technical challenges such as lifelong learning, transfer learning, domain adaptation, interactive machine learning, and/or interpretable machine learning to aid in the application of machine learning algorithms “in the wild”. The fellowship can be used by the successful candidate to continue their current studies, share their work with other students in the program, and explore new directions and applications with their LANL mentors and other students.

Biologically-inspired machine learning

[Bio ML]

Much of deep learning requires large, labeled training sets, but much can be accomplished using biologically-inspired unsupervised training procedures that seek to learn hierarchical representations directly from raw, unlabeled data. This project will apply such unsupervised training techniques to problems such as monaural separation of audio into vocal and instrumental tracks, tumor classification based on pathology slides and gene expression data, prediction of video and other spatiotemporal sequences, and related problems as dictated by student interest.

Characterization of High Performance Computing Sensor Data

[Characterization of HPC]

Extremely large computing facilities require a variety of monitoring strategies to ensure that the facility is running normally and to detect undesirable behavior modes. One way in which these large machines are monitored is through a variety of environmental sensors placed on the hardware, measuring quantities such as temperatures and voltages. On the largest machines, the daily output of these sensors alone can amount to terabytes of data. Not only is the data massive, but the relationships between its features are not analytically understood, and the sensors themselves can be finicky, leading to a variety of missing data problems as well. The goal of this project is to either use existing machine learning methods, or develop new methods if appropriate, to better understand and characterize the variety of behavior modes present in the HPC environmental sensor data. The focus of a summer project could be on statistical modeling of available data, dealing with missing data, and/or incorporating domain knowledge about spatial and temporal relationships between hardware components.

Event Forecasting

We are seeking a graduate student who would like to explore machine learning approaches to forecast events using traditional (e.g., satellite imagery, climate, demographic information) and non-traditional (e.g., Twitter, news articles, Google searches, Wikipedia) data sources. A forecast estimates the relative likelihoods of events occurring, which is different from predicting specific events. For example, some of the “events” one may forecast include disease incidence, school shootings, and social unrest. We are open to exploring other potential areas that could be forecast.

Interactive Text Mining

We are seeking students in computational linguistics, computer science, statistics, applied math, and/or machine learning for an exciting summer project that pushes the edges of what is achievable with text mining. We are developing a suite of algorithms and workflows for text mining that combines recent breakthroughs in natural language processing (NLP), topic extraction and modeling, text categorization, machine learning, and human interaction, with the goal of demonstrating a measurable improvement over the state of the art on a defined problem and data sets. Experience with Python, related machine learning and NLP libraries, and/or C/C++ is expected.

Hyperspectral Remote Sensing Image Analysis

[Hyperspectral Imagery Analysis]

This project will investigate ML approaches in hyperspectral remote sensing image analysis. Our driving interest is target and anomaly detection, and a key aspect of both problems is the characterization of the non-target background. Background models have traditionally used simple distributions (e.g., the Gaussian is popular), though kernel and graph-based approaches have also been considered. Machine learning is an attractive approach because this is a problem with a lot of data (every pixel in the image [or image archive] is potentially a sample) and not a lot of theory. There are two parts to this project. The first is a regression approach to background estimation, in which the background spectrum at a pixel is estimated as a function, learned from data, of the pixels in the local neighborhood. The more accurately the background is estimated, the more effective traditional target detection algorithms will be. The second part of the project is to use matched-pair machine learning to build a detector directly; here the power of ML is used for the full target detection problem, incorporating not only the variability of the background but also the nonlinear interaction of the target with the background.
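The regression approach to background estimation can be sketched in a heavily simplified form. The example below assumes a single-band image stored as a list of lists, and it learns one linear function (slope and intercept) that maps the mean of the 8 surrounding pixels to the center pixel. A real hyperspectral pixel is a full spectrum and the learned function would be far richer, so the function names and the one-feature model here are illustrative assumptions only.

```python
def neighbor_mean(image, row, col):
    """Mean of the 8 pixels surrounding (row, col); the center pixel is excluded."""
    values = [image[r][c]
              for r in range(row - 1, row + 2)
              for c in range(col - 1, col + 2)
              if (r, c) != (row, col)]
    return sum(values) / len(values)


def fit_background_model(image):
    """Least-squares fit of center = slope * neighbor_mean + intercept over interior pixels."""
    xs, ys = [], []
    for row in range(1, len(image) - 1):
        for col in range(1, len(image[0]) - 1):
            xs.append(neighbor_mean(image, row, col))
            ys.append(image[row][col])
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    var_x = sum((x - x_bar) ** 2 for x in xs)
    cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = cov_xy / var_x if var_x > 0 else 0.0
    return slope, y_bar - slope * x_bar


def residual_map(image, slope, intercept):
    """Observed value minus estimated background at each interior pixel."""
    return {(row, col): image[row][col] - (slope * neighbor_mean(image, row, col) + intercept)
            for row in range(1, len(image) - 1)
            for col in range(1, len(image[0]) - 1)}
```

A pixel whose residual is large relative to its neighbors is a candidate anomaly. The center pixel is deliberately left out of its own background estimate so that a target cannot explain itself away.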

Towards automated geologic feature detection using low-altitude UAS imagery analyses

[UAS Imagery Analysis]

Unmanned airborne systems (UAS) equipped with high-resolution cameras provide an efficient, non-invasive means of documenting areas of interest. With adequate ground control and repeat data acquisition over time, these datasets can provide valuable information on spatial and temporal georeferenced change. Analyses of these imagery datasets, however, are time-consuming and require significant human effort. We seek machine learning tools to automatically detect, identify, and discriminate geologic features, including fractures/fracture networks and varying geologic materials, using RGB imagery plus digital elevation models. While machine learning tools can ultimately be applied to smaller spatial regions of our data collection, we aim to automate these detection tools across an orthoimagery dataset representing an area of approximately 200 m x 300 m. Familiarity with geoscience is a plus but not required.

Imagery Analysis Techniques for Geosciences Applications

[Geoscience Imagery Analysis]

In geosciences, various types of imagery measurements are recorded to interpret geological features. For instance, in seismology, 2D seismograms are used to understand subsurface structure and earthquakes; in remote sensing, multispectral and hyperspectral images are captured to understand surface geological features. A common issue across these applications is that certain valuable signals can be very small and buried in a noisy environment. What makes it even more challenging is the lack of labeled data sets, because these small signals are difficult to identify even for domain experts.

Through this project, we will develop imagery analysis techniques to address this issue. We will use seismic imagery as the test problem, but we will develop a general framework extensible to many other domains. Our strategy is to divide and conquer: we will separate the imagery contents into different components according to their statistical information, and then further analyze each of the components. Our algorithms will be built upon several existing imagery analysis techniques such as robust PCA, supervised dictionary learning, and sparse coding.

Fluid Flow Pathways in Fracture Networks

[Fracture Networks]

Fractures are the primary pathway for fluid flow through the subsurface in low-permeability porous media such as granites and shales. Field and laboratory experiments on flow through fracture networks indicate that flow channeling is a common feature of fractured subsurface systems, which strongly suggests the existence of primary flow pathways. We seek to identify these primary flow paths using machine learning techniques combined with a graph representation of the network. In these systems, the uncertainty surrounding the topological properties that dominate system behavior cannot be characterized at the macroscale because of the large computational cost of high-fidelity simulations. By representing the fracture network as a graph based on physical, geometric, and topological properties of the network, we can reduce the computational burden to a point where machine learning techniques can be efficiently applied. The goal is to use training data based on particle trajectories through these networks, in conjunction with the graph representation, to classify fractures and paths as those that do or do not participate in the flow. For given network characteristics and boundary conditions, we will rank the importance of physical, geometric, and topological features in order to predict the reduced domain on which flow and transport can be simulated without sacrificing the accuracy of key upscaled observables.
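As a rough illustration of the graph representation, the sketch below treats each fracture as a node, each intersection between fractures as an edge, and assigns every edge a weight standing in for resistance to flow. It then finds the least-resistance route from an inflow boundary to an outflow boundary with Dijkstra's algorithm. The node names, the weights, and the use of a shortest-path search are assumptions made for this example. It is a simple graph baseline, not the machine learning classifier described above.

```python
import heapq


def least_resistance_path(graph, source, target):
    """Dijkstra's algorithm on a dict-of-dicts graph: graph[u][v] = edge weight."""
    best = {source: 0.0}
    previous = {}
    queue = [(0.0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == target:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry, a cheaper route was already found
        for neighbor, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(queue, (new_cost, neighbor))
    if target not in best:
        return None, float("inf")
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return path[::-1], best[target]


# Hypothetical network: three fractures between an inflow and an outflow boundary.
fracture_graph = {
    "inflow": {"f1": 1.0, "f2": 4.0},
    "f1": {"inflow": 1.0, "f2": 1.0, "f3": 2.0},
    "f2": {"inflow": 4.0, "f1": 1.0, "outflow": 5.0},
    "f3": {"f1": 2.0, "outflow": 1.0},
    "outflow": {"f2": 5.0, "f3": 1.0},
}
path, resistance = least_resistance_path(fracture_graph, "inflow", "outflow")
```

Fractures that appear on such low-resistance routes are natural candidates for the "participates in flow" class, and graph features of this kind could serve as inputs to a learned classifier.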

Extracting precursors that characterize the eruption dynamics of CO2-driven cold-water geysers using machine learning

[Eruption Dynamics of Geysers]

Thermally driven geysers (such as those in Yellowstone) are characterized by frequent eruptions of liquid water and steam. Another subsurface system capable of producing periodic eruptions (similar to thermal geysers) is the CO2-driven cold-water geyser. These geysers erupt for over 24 h at a time with relatively high-velocity CO2-driven discharge from wellbores. Growing interest in geologic carbon storage has brought attention to CO2-driven cold-water geysers because of their similarity to the high-velocity wellbore leakage process. In CO2-driven cold-water geysers, CO2 (gas) evolves through the pressure reduction (flashing) of CO2-rich fluids. Once the internal pressure of CO2 (aqueous) becomes greater than that of the surrounding fluid, CO2 separates from the fluid, causing bubbles to nucleate, grow, and coalesce. The hydrostatic pressure reduction resulting from the increasing CO2 gas volume fraction enhances the expansion of CO2 bubbles, leading to the eruption. The goal of this summer project is to identify a set of precursors for understanding the eruption dynamics from time-series signals (seismic and/or acoustic) using machine learning. Specifically, we plan to extract/decompose the signals that characterize the periodic eruption events from noisy data sets through time-series feature engineering and source separation methods. This decomposition of signals (from sensors close to the geyser as well as from sensors far away from it) into independent components (pre-eruption signatures and anthropogenic activities) can help in better understanding the eruption times of CO2-driven cold-water geysers. For the success of the proposed summer AML project, we need a student with a background in Python programming (so that they can add value to existing ML code), time-series analysis or signal processing, and blind source separation methods or PCA/ICA/NMF/SVD methods.

Hierarchical comprehensive feature analysis of time-series signals to understand earthquakes using machine learning

[Time-series Analysis for Earthquakes]

Time-series signals are central to identifying the state of a dynamical system. They are ubiquitous in many areas related to geosciences, such as the acoustic/seismic signals that characterize earthquakes. Predicting the timing of earthquakes from time-series signals is central to many early-warning systems. However, earthquake forecasting is a very hard problem, because earthquake recurrence is not constant for a given fault. Moreover, the earthquake fault process is poorly understood. To gain new insights into the fault physics of earthquakes, LANL scientists (Paul Johnson and co-workers) have recently shown in laboratory settings that, by applying new developments in machine learning (ML), they were able to identify hidden signals that precede earthquakes. From a bird's-eye view, the ML approach uses a user-defined moving time window to obtain features from time-series signals and then filters them to predict the remaining time before the next failure event. The goal of this summer project is to identify an optimal moving time window, beyond which it is very difficult to identify small events in the time-series signals. This optimal moving time window can provide insight into the physics of earthquakes (dissipation of energy in a chunk of time). For the success of the proposed summer AML project, we need a student with a background in Python programming (so that they can add value to existing ML code), time-series analysis or signal processing, and machine learning methods such as Random Forests/Support Vector Machines.
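The moving time-window step can be illustrated with a minimal sketch. It slides a fixed-length window over a signal and computes a few summary statistics per window. These feature vectors are what a regressor (for example, a Random Forest) would then map to time remaining before failure. The particular statistics, the `stride` parameter, and the function name are assumptions chosen for illustration, and the regression step itself is omitted.

```python
import statistics


def window_features(signal, window, stride=1):
    """Slide a fixed-length window over `signal` and summarize each window."""
    features = []
    for start in range(0, len(signal) - window + 1, stride):
        chunk = signal[start:start + window]
        features.append({
            "start": start,                              # index of the first sample
            "mean": statistics.fmean(chunk),             # average level
            "variance": statistics.pvariance(chunk),     # signal energy about the mean
            "peak": max(abs(value) for value in chunk),  # largest excursion
        })
    return features


# Hypothetical signal: a quiet stretch followed by an oscillating burst.
signal = [0.0, 0.0, 0.0, 0.0, 1.0, -1.0, 1.0, -1.0]
features = window_features(signal, window=4, stride=4)
```

Repeating the extraction for several values of `window` and comparing downstream prediction quality is one straightforward way to study how the choice of window length affects what the model can detect.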

Machine learning solutions to revealing the hidden seismicity of Mars

[Seismicity of Mars]

The study of seismic signals recorded on Earth has led to discoveries such as the existence of the liquid outer core of our planet and the confirmation of tectonics, along with an understanding of extreme events such as earthquakes and tsunamis. Today seismology plays an important role in geopolitics, giving us evidence of nuclear tests, and in economics, giving us insight into the dynamics of geophysical reservoirs and systems. However, another surprising area is emerging where seismology is foreseen to bring unique contributions: space. Locked inside celestial bodies such as Mars, the Moon, and asteroids are structures and materials that, if revealed, will tell us a lot about the past and dynamics of our solar system and the amount of in-situ resources available for possible future exploration.

What seismology aims to study is, in essence, hidden. As a consequence, the paradigm of instrumentation for planetary seismology is demanding: complex and sensitive devices are required, for mostly unknown results to be recorded in complex conditions. The InSight (Interior exploration using Seismic Investigations, Geodesy and Heat Transport) mission, planned for November 2018, is addressing this issue with an innovative initiative. A blind test is proposed in which candidates are invited to develop techniques to detect and characterize the seismicity of Mars as well as the noise conditions of the planet (Clinton et al. 2017). Machine learning techniques are particularly well-suited to this task, where the noise level is unknown and varying (one source of noise is the wind responding to the diurnal cycle of Mars). Specifics of the project are: (a) two continuous synthetic waveform series are provided, mimicking the expected records of a short-period and a Very Broad-Band seismograph over one Earth year of InSight operation; (b) the synthetic catalog contains both marsquakes and impacts, which have different source mechanisms; (c) the sources of noise modeled in the tests are the seismometer instrument itself, changing atmospheric pressure causing ground deformation, temperature changes, and wind-induced solar panel vibrations; (d) single-station event-location techniques specifically developed for InSight may be preferred to the multi-station techniques classically used on Earth; (e) training is possible thanks to web-based modeling tools (Instaseis; Van Driel et al. 2015), with the complication that the 1D model used to generate the synthetic waveforms is one of 14 candidates and is not disclosed. Success in the test will be measured by the number of seismic sources correctly detected and located.

Machine Learning of First-Principles Particle Simulation Data on Particle Acceleration during Magnetic Reconnection

[Particle Acceleration]

Particle acceleration during magnetic reconnection is a major unsolved problem in magnetospheric science and solar physics. The accelerated particles in Earth’s magnetosphere and solar flares damage satellites and threaten human activity in space. Recent massively parallel particle simulations (such as LANL’s VPIC simulations on petascale computers) have made much progress on this problem. However, current insight into particle acceleration mechanisms is limited to analyzing only a few hundred particles manually picked from among the trillions of particles in the largest VPIC simulations. This project will focus on using machine learning to automatically classify the trajectories of a large number of particles, based on which we will identify characteristic acceleration patterns and discover new acceleration mechanisms. The candidate is expected to be familiar with machine learning methods and be willing to apply their expertise to large-scale numerical simulations.

Neural networks applied to atmospheric source location

[Atmospheric Source Location]

LANL has developed a convolutional neural network code that can locate and quantify gas sources by ingesting meteorological field data and in situ ambient gas concentration data downwind of the source. The system has been integrated with a laser-based sensor on a BeagleBone board and has successfully located methane leaks on well pads at 10-100 m scales for ARPA-E. The system is trained using model simulations followed by on-site calibration using controlled releases. Subsequently, it improves with time, learning from each successful hit, which depends on the local topography and meteorology. The project is at the forefront of data-model mining and interdisciplinary science to solve a real-world methane leak problem.

Our student project will push our system to larger distances of 0.1-10 km using field data sets and some web-based atmospheric back-trajectory calculations. We will also examine the ability to distinguish methane leaks from natural gas sources versus ruminant or landfill sources by using ethane as a specific marker. We will also explore extending our application to mobile platforms such as a Google car, UAV, and/or robotic submarine. The student will learn about neural networks, meteorological parameters, and gas sensors. The key requirements will be familiarity with merging data sets, analysis of input and output, excitement, and the drive to solve problems using interdisciplinary approaches.

Pattern discovery and prediction in observations of seismic events from historical data using graph approaches

[Seismic Event Graphs]

Seismic observations of P-waves and S-waves from earthquakes or mining explosions at recording stations are commonly recorded by seismic network operators and published as "seismic bulletins." Over time, stations in a recording network may be removed, and the ability of the network to fully record a seismic event may be diminished. Seismic activity tends to be constant through time, however, and historical observations of seismic events at a recording station may be used to predict the observations of a new "similar" event that would have been made at that station, even after it has been removed.

In this project, we will build a graph database of historical seismic observations at recording stations and their associations to seismic events. We will then use graph theoretical and clustering approaches to identify families of similar events. Finally, we will attempt to predict observations of new seismic events that are similar to those previously recorded. Results may be compared to predictions obtained through high-fidelity earth models and knowledge of seismic propagation physics.
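The prediction idea can be sketched with a nearest-neighbor lookup standing in for the graph and clustering machinery. Each event is represented as a mapping from station name to an observed quantity (here, a hypothetical P-wave travel time in seconds). The most similar historical event, judged on the stations both events share, lends its observation at the removed station. The distance measure, station names, and data layout are all assumptions made for this example.

```python
def event_distance(event_a, event_b):
    """Mean absolute difference over the stations that recorded both events."""
    shared = set(event_a) & set(event_b)
    if not shared:
        return float("inf")
    return sum(abs(event_a[s] - event_b[s]) for s in shared) / len(shared)


def predict_observation(new_event, history, station):
    """Borrow the value at `station` from the most similar historical event that has it."""
    candidates = [event for event in history if station in event]
    if not candidates:
        return None
    nearest = min(candidates, key=lambda event: event_distance(new_event, event))
    if event_distance(new_event, nearest) == float("inf"):
        return None  # no station in common with any candidate
    return nearest[station]


# Hypothetical bulletin entries; STA3 has since been removed from the network.
history = [
    {"STA1": 10.0, "STA2": 12.0, "STA3": 15.0},
    {"STA1": 20.0, "STA2": 18.0, "STA3": 25.0},
]
new_event = {"STA1": 10.5, "STA2": 11.5}
predicted = predict_observation(new_event, history, "STA3")
```

In a graph database the same lookup becomes a traversal: from the new event to its stations, to the historical events observed at those stations, and on to the removed station's records.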

Accelerating Electronic Structure Theory with Deep Neural Networks

[Electronic Structure Theory]

Electronic structure theory (EST) plays a principal role in the computational study of chemistry, physics, and materials science, enabling diverse fields such as drug discovery, biophysics simulation, and materials design. However, EST methods are computationally expensive, usually scaling as O(N^3) or worse in the system size N. Machine learning models such as Deep Neural Networks (DNNs) can operate in as little as O(N) time, and can thus provide fast approximation and/or preprocessing for EST. In this project, the student(s) will integrate with current LANL work to develop improvements to graph-based DNN methodologies and/or develop further applications of DNNs to EST. Previous experience with EST, Python programming, and/or DNNs is highly valued.
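The practical meaning of the scaling gap can be shown with a small arithmetic sketch. It computes how much the cost of an O(N^3) method and an O(N) method grows when the system size is multiplied by a given factor. Asymptotic notation hides constant factors, so these are growth ratios rather than predicted run times, and the helper name is an assumption for this example.

```python
def cost_growth(size_factor, exponent):
    """Factor by which the cost of an O(N**exponent) method grows when N grows by size_factor."""
    return size_factor ** exponent


for factor in (2, 10, 100):
    cubic = cost_growth(factor, 3)
    linear = cost_growth(factor, 1)
    print(f"N x{factor}: cubic cost x{cubic}, linear cost x{linear}, gap x{cubic // linear}")
```

The gap between the two grows as the square of the size factor, which is why a linear-scaling surrogate becomes more valuable as systems get larger.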