# 2020 Project Descriptions

Creates next-generation leaders in Machine Learning for Scientific Applications

## Contacts

**Program Lead**- Diane Oyen

**Program Co-Lead**- Youzuo Lin

**Program Co-Lead**- Nick Lubbers

**Program Co-Lead**- Boian Alexandrov

**Professional Staff Assistant**- Elizabeth Grossman

## Contact Us

Students will develop and apply computer vision and machine learning for automating the understanding and analysis of technical content contained in images. Computer vision, especially through the use of machine learning methods, has dramatically improved the ability to detect objects in images and semantically segment images to automate scene understanding. However, these advances have not yet automated the understanding of information contained in hand-drawn figures, technical diagrams, mathematical equations, data plots, and other images conveying technical information. Additionally, scientific imaging technologies such as advanced radiography and microscopy are used in a variety of applications at LANL, and typically produce images with unique challenges and application-specific features of interest. Students in this project will focus on one of the following areas:

1) Develop computer vision algorithms for extracting information from drawings, technical diagrams and scientific plots.

2) Develop graph and manifold learning methods for shape analysis, using learning-based approaches that operate in non-Euclidean spaces such as graphs and manifolds. Methods will be applied to technical diagrams, sketches, and other images conveying technical information to solve problems of pairwise correspondence, similarity, and retrieval, or to construct or estimate a central shape from a collection of shapes and determine the variability of images relative to that central shape.

3) Develop machine learning algorithms for consistent characterization and quantification of scientific imagery. Students will help develop solutions to feature delineation, segmentation and contour completion in low-contrast and noisy imagery. Students will also be encouraged to find ways to re-target (or transfer) solutions to new datasets with a small amount of training data and/or user interaction.

Subsurface imaging turns geophysical data into actionable information. The technique has been widely used in geophysical exploration to understand site geology, stratigraphy, and rock quality. Subsurface imaging is usually posed as an inverse problem. However, solving these inverse problems is notoriously challenging because they are ill-posed and computationally expensive. On the other hand, with advances in machine learning and computing, and the availability of more and better data, there has been notable progress in solving such problems. In our recent work [1, 2], we developed end-to-end data-driven subsurface imaging techniques. Our methods yield encouraging results when test data and training data share similar statistical characteristics. However, performance can degrade when test data lies "further away from" the training data, which happens fairly often in real applications. In this project, students will work with lab scientists to explore how to improve the robustness and generalization ability of data-driven subsurface imaging techniques.

**References:**

[1] Yue Wu and Youzuo Lin, "InversionNet: An Efficient and Accurate Data-driven Full Waveform Inversion," IEEE Transactions on Computational Imaging, 2019 (accepted).

[2] Zhongping Zhang, Yue Wu, Zheng Zhou, and Youzuo Lin, "VelocityGAN: Data-Driven Full-Waveform Inversion Using Conditional Adversarial Networks," IEEE Winter Conf. on Applications of Computer Vision (WACV), 2019.
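The ill-posedness mentioned in the subsurface imaging description can be illustrated on a toy problem. The sketch below is not full-waveform inversion and is not the method of [1] or [2]; it assumes a small linear forward operator (a Hilbert matrix, chosen only because it is badly conditioned) and compares a direct solve with Tikhonov (ridge) regularization. The noise level and the regularization weight `lam` are illustrative assumptions.

```python
import numpy as np

# Toy forward operator: a 10x10 Hilbert matrix (assumed here only because
# it is severely ill-conditioned; it is not a seismic forward model).
n = 10
i, j = np.indices((n, n))
A = 1.0 / (i + j + 1)

# Synthetic "true model" and slightly noisy observations.
x_true = np.ones(n)
rng = np.random.default_rng(0)
y = A @ x_true + 1e-6 * rng.normal(size=n)

# Naive inversion: tiny measurement noise is amplified enormously.
x_naive = np.linalg.solve(A, y)

# Tikhonov regularization: minimize ||A x - y||^2 + lam * ||x||^2.
lam = 1e-8  # illustrative value, not tuned
x_reg = np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ y)

naive_error = np.linalg.norm(x_naive - x_true)
reg_error = np.linalg.norm(x_reg - x_true)
print(f"naive error:       {naive_error:.3e}")
print(f"regularized error: {reg_error:.3e}")
```

The regularized solution trades a small bias for a large reduction in noise amplification, which is the same trade-off that prior knowledge or learned models are meant to address in the nonlinear setting.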

Machine learning is making waves in the study of molecules and materials. Physics models that provide properties of atomistic systems are applied in many fields, such as the design of new medicines and new materials. The correct physics for describing these interactions is given by quantum mechanics (QM). The equations of QM are highly accurate and transferable since they describe the underlying interactions of not just atoms but also electrons. However, modeling electrons is computationally expensive, often scaling as O(N^3) or worse in the number of electrons for a given atomic system. This severely limits the application of QM methods to the dynamical simulation of atomistic systems, which can easily contain thousands to millions of atoms. Prior to ML, “force field” approximations using a restrictive functional form were the most common approach to faster, linear-scaling methods. While these can capture the physics of specific systems, they tend to lack transferability to new systems; the models often must be refit from application to application.

Machine learning methods have provided a new avenue for predicting the properties of atomistic systems. Since these machine learning models provide a highly flexible functional form, the models can be fit to large datasets of highly diverse systems, and are capable of making accurate and general predictions that transfer well to larger and more complex systems. The best models incorporate physical constraints, for example, rotation and translation invariance.
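As a minimal sketch of what rotation and translation invariance means in practice, the example below builds a simple descriptor from sorted interatomic distances and checks that it does not change when the whole structure is rotated and shifted. The function names and the choice of descriptor are assumptions made for illustration; they are not the models used in this project.

```python
import numpy as np

def pairwise_distance_descriptor(positions):
    """Sorted list of all interatomic distances (hypothetical descriptor)."""
    diffs = positions[:, None, :] - positions[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    upper = np.triu_indices(len(positions), k=1)  # each pair once
    return np.sort(dists[upper])

def random_rotation(rng):
    """Random proper rotation matrix built from a QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))   # fix the sign ambiguity of QR
    if np.linalg.det(q) < 0:      # ensure det = +1 (no reflection)
        q[:, 0] = -q[:, 0]
    return q

rng = np.random.default_rng(0)
positions = rng.normal(size=(5, 3))   # 5 atoms at random 3D coordinates

rotation = random_rotation(rng)
shift = rng.normal(size=3)
moved = positions @ rotation.T + shift  # rigidly rotate, then translate

original_descriptor = pairwise_distance_descriptor(positions)
moved_descriptor = pairwise_distance_descriptor(moved)
print(np.allclose(original_descriptor, moved_descriptor))
```

A model that only sees such a descriptor cannot distinguish a structure from its rotated or translated copy, so the constraint is satisfied by construction rather than learned from data. Sorting the distances also makes this particular descriptor independent of atom ordering.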

Selected candidates will develop new methods and applications in one or more of our focus areas:

- Potential energy and charge models

- Structure-property relationships

- Active Learning & diverse data selection

- Uncertainty Quantification for Neural Networks

- Physics Informed Machine Learning

- Efficient GPU kernels for atomistic machine learning

Preferred candidates will have interest and/or experience with:

- Atomistic simulation

- Chemistry, Materials Science, or Molecular Biophysics

- Python programming

- Active Learning

- Neural Network architecture design

- Physics Informed Machine Learning

Unsupervised Machine Learning (ML) methods aim to extract sets of hidden (latent) features from uncategorized datasets. Unsupervised ML methods include classical neural networks, clustering, various autoencoders, and contemporary blind source separation (BSS) techniques based on matrix factorization. Tensor (i.e., multidimensional array) factorization methods are the natural extension of matrix factorization to the decomposition of high-dimensional datasets, and they provide meaningful links among the low-dimensional features hidden in different dimensions of the data tensor. A limitation shared by most factorization techniques is the difficulty of relating the extracted latent factors and subspaces to physically interpretable quantities. Nonnegative factorization overcomes this limitation: the nonnegativity constraint leads to a collection of strictly additive features that are parts of the data and hence are amenable to simple and meaningful interpretation. We are looking for graduate students interested in applying and developing novel ML algorithms based on nonnegative tensor factorization and tensor networks. We will explore latent features buried in various data, such as pictures, text, and computer simulations of biological molecules, that naturally incorporate explainable hidden variables, features, and topics.
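To make the idea of additive, nonnegative features concrete, here is a minimal sketch of nonnegative matrix factorization (the matrix case, not the tensor methods of this project) using the standard Lee-Seung multiplicative updates for the Frobenius-norm objective. The function name `nmf`, the synthetic data, and the iteration count are assumptions chosen for illustration.

```python
import numpy as np

def nmf(X, rank, n_iter=1000, eps=1e-10, seed=0):
    """Approximate nonnegative X (n x m) as W @ H with W, H >= 0."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep every entry nonnegative,
        # because each factor is multiplied by a nonnegative ratio.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Synthetic data: a mixture of 3 nonnegative latent features.
rng = np.random.default_rng(1)
W_true = rng.random((20, 3))
H_true = rng.random((3, 15))
X = W_true @ H_true

W, H = nmf(X, rank=3)
relative_error = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
print(f"relative reconstruction error: {relative_error:.3e}")
```

Because `W` and `H` contain no negative entries, each column of `X` is reconstructed by adding features together, never by cancelling them, which is what makes the factors readable as parts of the data.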