Los Alamos National Laboratory
Information Science and Technology Institute (ISTI)
Implementing and fostering collaborative research, workforce and program development, and technical exchange

2019 Project Descriptions

Creates next-generation leaders in Machine Learning for Scientific Applications

Contacts  

  • Program Lead: Diane Oyen
  • Program Co-Lead: Youzuo Lin
  • Program Co-Lead: Nick Lubbers
  • Program Co-Lead: Boian Alexandrov
  • Professional Staff Assistant: Melony Kosgei
  • Professional Staff Assistant: Nickole Aguilar Garcia

Nonnegative Tensor Factorization for Machine Learning

Unsupervised Machine Learning (ML) methods aim to extract sets of hidden (latent) features from uncategorized datasets. Unsupervised ML methods include classical neural networks, clustering, various autoencoders, and contemporary blind source separation (BSS) techniques based on matrix factorization. Tensor (i.e., multidimensional array) factorization methods are the natural extension of matrix factorization to the decomposition of high-dimensional datasets; they provide meaningful links among the various low-dimensional features hidden in different dimensions of the data tensor. A limitation shared by most factorization techniques is the difficulty of relating the extracted latent factors and subspaces to physically interpretable quantities. Nonnegative factorization overcomes this limitation: nonnegativity yields a collection of strictly additive features that are parts of the data and are therefore amenable to simple, meaningful interpretation. We are looking for graduate students interested in applying and developing novel ML algorithms based on nonnegative tensor factorization and tensor networks. We will explore latent features buried in various data, such as pictures, text, and computer simulations of biological molecules, that naturally incorporate explainable hidden variables, features, and topics.

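
As a concrete illustration of the nonnegative factorization idea above, the sketch below fits a nonnegative CP (PARAFAC) model to a 3-way NumPy array using Lee-Seung-style multiplicative updates, which is one common nonnegative tensor factorization algorithm among several. It is a minimal sketch, assuming a dense tensor, a fixed rank, and a squared-error objective; the function names `ntf_cp` and `reconstruct` are hypothetical and are not taken from any project code.

```python
import numpy as np

def ntf_cp(X, rank, n_iter=3000, eps=1e-12, seed=0):
    """Nonnegative CP factorization of a 3-way tensor (hypothetical helper).

    Approximates X[i, j, k] ~= sum_r A[i, r] * B[j, r] * C[k, r] with
    A, B, C >= 0, using multiplicative updates on the squared Frobenius error.
    """
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.random((I, rank))
    B = rng.random((J, rank))
    C = rng.random((K, rank))
    for _ in range(n_iter):
        # Each factor is multiplied by (data term) / (model term); both are
        # nonnegative, so the factors never become negative.
        A *= np.einsum("ijk,jr,kr->ir", X, B, C) / (A @ ((B.T @ B) * (C.T @ C)) + eps)
        B *= np.einsum("ijk,ir,kr->jr", X, A, C) / (B @ ((A.T @ A) * (C.T @ C)) + eps)
        C *= np.einsum("ijk,ir,jr->kr", X, A, B) / (C @ ((A.T @ A) * (B.T @ B)) + eps)
    return A, B, C

def reconstruct(A, B, C):
    # Sum of rank-1 outer products a_r (x) b_r (x) c_r.
    return np.einsum("ir,jr,kr->ijk", A, B, C)

# Usage: factor a synthetic, exactly rank-3 nonnegative tensor.
rng = np.random.default_rng(1)
A0, B0, C0 = rng.random((10, 3)), rng.random((8, 3)), rng.random((6, 3))
X = reconstruct(A0, B0, C0)
A, B, C = ntf_cp(X, rank=3)
rel_err = np.linalg.norm(X - reconstruct(A, B, C)) / np.linalg.norm(X)
print(f"relative reconstruction error: {rel_err:.3e}")
```

Because each update multiplies a factor by a ratio of nonnegative quantities, the factors stay nonnegative without any explicit projection, which is what keeps the extracted components additive "parts" of the data.
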
Machine Learning for Analyzing Scientific Images

Students will develop and apply computer vision and machine learning for automating the understanding of technical content contained in images. Computer vision, especially through the use of machine learning methods, has dramatically improved the ability to detect objects in images and to semantically segment images for automated scene understanding. However, these advances have not yet automated the understanding of information contained in hand-drawn figures, technical diagrams, and imagery produced for scientific inquiry. Students in this project will focus on one or more of the following areas:

  1. Develop computer vision algorithms for extracting information from drawings, technical diagrams and scientific plots.
  2. Develop computer vision algorithms for shape matching across a variety of image types including photographs, drawings, diagrams, scanned historic documents, etc.
  3. Develop machine learning and computer vision algorithms for image analysis (classification, object detection, instance segmentation) that require very little or no labeled training data (zero-shot, one-shot, and transfer learning; or domain adaptation); and allow end-users to quickly customize machine learning models to novel problems (model selection, interactive learning, and workflow automation).
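
For the low-label setting in item 3, one simple baseline is nearest-prototype classification: average the embeddings of the few labeled examples of each class, then assign each new image to the most similar class. The sketch below assumes image embeddings have already been computed by some pretrained feature extractor (not shown); the synthetic vectors, the class names, and the helper `nearest_prototype` are hypothetical stand-ins, not part of any project pipeline.

```python
import numpy as np

def l2_normalize(v, eps=1e-12):
    # Scale each row to unit length so dot products become cosine similarities.
    return v / (np.linalg.norm(v, axis=-1, keepdims=True) + eps)

def nearest_prototype(support, support_labels, queries):
    """Label each query with the class whose prototype is most cosine-similar.

    support: (n_support, dim) embeddings of the few labeled examples
    support_labels: (n_support,) class labels
    queries: (n_queries, dim) embeddings of unlabeled images
    """
    support = l2_normalize(np.asarray(support, dtype=float))
    queries = l2_normalize(np.asarray(queries, dtype=float))
    labels = np.asarray(support_labels)
    classes = np.unique(labels)
    # One prototype per class: the re-normalized mean of its support embeddings.
    protos = l2_normalize(np.stack([support[labels == c].mean(axis=0) for c in classes]))
    sims = queries @ protos.T  # shape (n_queries, n_classes)
    return classes[sims.argmax(axis=1)], sims

# Usage with synthetic 64-D "embeddings" (hypothetical): one labeled example per class.
rng = np.random.default_rng(0)
names = np.array(["plot", "diagram", "photo"])
centers = rng.normal(size=(3, 64))
support = centers + 0.1 * rng.normal(size=centers.shape)
query_idx = rng.integers(0, 3, size=30)
queries = centers[query_idx] + 0.3 * rng.normal(size=(30, 64))
pred, _ = nearest_prototype(support, names, queries)
accuracy = np.mean(pred == names[query_idx])
print(f"one-shot accuracy on synthetic queries: {accuracy:.2f}")
```

Adding a new class only requires one labeled example and its embedding, with no retraining, which is one reason prototype methods are a common starting point for one-shot and few-shot tasks.
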

Active Learning Applied to Fluid Flow in Nanoscale Porous Media

Fifty percent of U.S. drinking water and 85% of the world's energy come from the Earth's subsurface, which often consists of porous materials. As such, fluid flow in porous media plays an important role in several geophysical problems of great environmental impact, including groundwater and pollutant transport, hydrocarbon extraction, and carbon sequestration.

A variety of macroscale models exist for simulating fluid flow (such as lattice Boltzmann and Navier-Stokes models), but these are known to break down in many natural materials such as shale. These materials contain a large number of nanoscale pores, not much larger than the molecules that make up the fluid. In nanoscale pores, more expensive simulation techniques such as molecular dynamics can resolve fluid flow under confinement. However, pore configurations are myriad, and molecular dynamics simulations are too expensive to run for every possible geometry. An open problem is determining which pore geometries need to be studied with atomistic simulation.

This project will build machine learning models of when and how fluid flow is modified by the presence of nanoscale structures. Active Learning will be used to query for new data by spawning molecular dynamics simulations, building datasets of the pore geometries and flow conditions needed to describe the behavior of fluids in large, complex porous media. A main research goal is to understand which ML models, active learning techniques, and uncertainty quantification schemes are most effective. We anticipate that participants will find the best results through a creative synthesis of existing ML algorithms with physical expectations, so-called physics-informed machine learning.
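
The query loop described above can be sketched as uncertainty sampling with a bootstrap committee: fit several surrogate models to resampled data, then run the next expensive simulation where they disagree most. In this minimal sketch, `expensive_simulation` is a hypothetical, cheap analytic stand-in for a molecular dynamics run, the scalar "pore descriptor" and candidate pool are invented for illustration, and polynomial ridge regression stands in for whatever surrogate and uncertainty quantification scheme the project would actually use.

```python
import numpy as np

def expensive_simulation(x):
    # Hypothetical stand-in for a molecular dynamics run: maps a scalar pore
    # descriptor to a flow response. Purely illustrative, not a physical model.
    return np.sin(3.0 * x) + 0.5 * x

def fit_ridge(x, y, degree=3, lam=1e-3):
    # Polynomial ridge regression; lam > 0 keeps the solve well-posed even
    # when a bootstrap sample contains repeated points.
    phi = np.vander(x, degree + 1)
    return np.linalg.solve(phi.T @ phi + lam * np.eye(degree + 1), phi.T @ y)

def ensemble_predict(x_train, y_train, x_query, rng, n_models=10):
    # Bootstrap committee: the spread (std) of member predictions is the uncertainty.
    preds = []
    for _ in range(n_models):
        idx = rng.integers(0, len(x_train), size=len(x_train))
        w = fit_ridge(x_train[idx], y_train[idx])
        preds.append(np.vander(x_query, len(w)) @ w)
    preds = np.stack(preds)
    return preds.mean(axis=0), preds.std(axis=0)

rng = np.random.default_rng(0)
pool = np.linspace(0.0, 2.0, 201)          # hypothetical candidate geometries
x_train = rng.choice(pool, size=4, replace=False)
y_train = expensive_simulation(x_train)

for step in range(10):
    _, std = ensemble_predict(x_train, y_train, pool, rng)
    std[np.isin(pool, x_train)] = -np.inf   # never re-run a completed simulation
    x_next = pool[np.argmax(std)]           # query where the committee disagrees most
    x_train = np.append(x_train, x_next)
    y_train = np.append(y_train, expensive_simulation(x_next))

print(f"simulated {len(x_train)} geometries: {np.sort(x_train).round(2)}")
```

Masking already-simulated candidates avoids spending a second expensive run on the same geometry; a real workflow would also need a stopping rule, for example a threshold on the remaining committee disagreement.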

Preferred candidates will have an interest in, and prior experience with, Active Learning. Prior experience with fluid modeling or molecular dynamics is an additional bonus. A modest background in physical sciences and programming is expected.

Scientific Machine Learning for Geoscience Applications

Machine learning has become a powerful tool for data analysis. While machine learning has achieved unprecedented success in conventional AI tasks, applying it to scientific analysis problems, including those in the subsurface, is both more challenging and more exciting. Such problems range from solving inverse problems and monitoring changes in geologic formations due to fluid injection to detecting small but useful signatures in large remotely sensed imagery datasets. Compared with conventional AI domains, scientific domains such as the subsurface pose several unique challenges: restrictions on the measurement process, a lack of annotated data, and the need for domain knowledge. In this project, students will have the opportunity to work with lab scientists on one of the following topics.

  1. Data-Driven Inverse Problems

    Many subsurface problems can be formulated as inverse problems. Solving these inverse problems with conventional methods can be challenging because of limited data coverage. In this work, students will develop machine learning algorithms to infer subsurface features from geophysical measurements. Various techniques from computer vision, data mining, and machine learning have proven effective for such problems.

    Reference
    Zhongping Zhang, Yue Wu, Zheng Zhou, Youzuo Lin, “VelocityGAN: Data-Driven Full-Waveform Inversion Using Conditional Adversarial Networks,” IEEE Winter Conference on Applications of Computer Vision (WACV), 2019.

  2. Multi-Physics Data Analysis

    The subsurface is full of complexity and uncertainty. A single type of geophysical measurement may not contain enough information to characterize the subsurface and its uncertainty. To address this issue, disparate geophysical datasets have been acquired and fused to infer subsurface properties. In this work, students will have access to various types of geophysical datasets and will develop machine learning algorithms to extract useful features and infer subsurface properties.

    Reference
    Yue Wu, Youzuo Lin, Zheng Zhou, David Chas Bolton, Ji Liu, Paul Johnson, “DeepDetect: A Cascaded Region-based Densely Connected Network for Seismic Event Detection,” IEEE Transactions on Geoscience and Remote Sensing, 2018.

  3. Aerial Imagery Analysis

    Aerial (or airborne) imagery has wide applications in land-use planning, environmental studies, surveillance, and subsurface monitoring. Compared with natural images, high-resolution aerial imagery can be much more challenging to analyze because of its high spatial resolution and large pixel volumes. In this work, students will work with domain experts to develop machine-learning-based image analysis techniques for airborne imagery.

    Reference
    Panfeng Li, Youzuo Lin, Emily Schultz-Fellenz, “Encoded Hourglass Network for Semantic Segmentation of High-Resolution Aerial Imagery,” arXiv preprint arXiv:1810.12813, 2018.
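
For the data-driven inverse problems topic, the core idea of learning an inverse map from many simulated (model, data) pairs, rather than inverting each measurement from scratch, can be shown with a toy linear example. This is a minimal sketch under loudly stated assumptions: a hypothetical random linear operator `G` stands in for the wave physics, smooth 1-D profiles stand in for subsurface models, and a ridge-regression map replaces the deep networks (such as VelocityGAN) in the references above. It is not the method of any cited work.

```python
import numpy as np

rng = np.random.default_rng(0)
n_model, n_data = 50, 15  # more unknowns than measurements: under-determined

# Hypothetical linear forward operator standing in for a physics simulator.
G = rng.normal(size=(n_data, n_model)) / np.sqrt(n_model)

# Hypothetical prior: smooth 1-D profiles built by Gaussian-smoothing white noise.
grid = np.arange(n_model)
S = np.exp(-0.5 * ((grid[:, None] - grid[None, :]) / 4.0) ** 2)

def sample_models(n):
    return rng.normal(size=(n, n_model)) @ S.T  # each row is one smooth model

def forward(models, noise=0.01):
    return models @ G.T + noise * rng.normal(size=(len(models), n_data))

# Training pairs generated with the forward model.
m_train = sample_models(2000)
d_train = forward(m_train)

# Learn a linear inverse map W by ridge regression: m ~= d @ W.
lam = 1e-3
W = np.linalg.solve(d_train.T @ d_train + lam * np.eye(n_data), d_train.T @ m_train)

# Compare with the minimum-norm (pseudo-inverse) solution on unseen models.
m_test = sample_models(500)
d_test = forward(m_test)
err_learned = np.linalg.norm(d_test @ W - m_test) / np.linalg.norm(m_test)
err_pinv = np.linalg.norm(d_test @ np.linalg.pinv(G).T - m_test) / np.linalg.norm(m_test)
print(f"relative error  learned: {err_learned:.3f}  pseudo-inverse: {err_pinv:.3f}")
```

Because the learned map is fit to examples drawn from the smooth prior, it can fill in parts of the model that the 15 measurements do not constrain directly, whereas the minimum-norm pseudo-inverse sets those parts to zero.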