AI for Materials: From Discovery to Production
Tuesday, October 6 - Wednesday, October 7, 2020 EDT
The New York Academy of Sciences
This symposium takes a broad perspective on applying artificial intelligence (AI) to materials simulation, materials synthesis, and the translation of research into high-volume industrial production, covering the application of AI throughout the entire life cycle of new materials. It will bring together materials scientists, industry experts, and AI researchers to shape future research directions, identify urgent issues in this rising field, and foster interdisciplinary collaboration.
Facebook AI Research
Georgia Institute of Technology
UC Berkeley/Lawrence Berkeley National Laboratory
Brookhaven National Laboratory/Stony Brook University
Oak Ridge National Laboratory
The New York Academy of Sciences
October 06, 2020
Opening Remarks: New York Academy of Sciences
Session 1: Physics and Causality in Machine Learning
Keynote: Symplectic Recurrent Neural Networks
We propose Symplectic Recurrent Neural Networks (SRNNs) as learning algorithms that capture the dynamics of physical systems from observed trajectories. An SRNN models the Hamiltonian function of the system by a neural network and furthermore leverages symplectic integration, multi-step training and initial state optimization to address the challenging numerical issues associated with Hamiltonian systems. We show SRNNs succeed reliably on complex and noisy Hamiltonian systems. We also show how to augment the SRNN integration scheme in order to handle stiff dynamical systems such as bouncing billiards.
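As a rough illustration of why symplectic integration matters for Hamiltonian systems, the sketch below compares a leapfrog (symplectic) step with explicit Euler on a unit harmonic oscillator. It is not the SRNN itself: the analytic gradient `grad_potential` stands in for the gradient of a learned Hamiltonian network, and all function names here are assumptions made for this example.

```python
def grad_potential(q):
    # Assumption: V(q) = q**2 / 2, a stand-in for a learned Hamiltonian's gradient.
    return q


def energy(q, p):
    # Total energy H(q, p) = p**2 / 2 + q**2 / 2 of the unit harmonic oscillator.
    return 0.5 * p * p + 0.5 * q * q


def leapfrog(q, p, dt, steps):
    # Symplectic leapfrog: half kick, full drift, half kick.
    for _ in range(steps):
        p -= 0.5 * dt * grad_potential(q)
        q += dt * p
        p -= 0.5 * dt * grad_potential(q)
    return q, p


def explicit_euler(q, p, dt, steps):
    # Non-symplectic baseline: both updates use the old state.
    for _ in range(steps):
        q, p = q + dt * p, p - dt * grad_potential(q)
    return q, p
```

Starting from `q = 1.0, p = 0.0` with `dt = 0.1`, the leapfrog energy stays close to its initial value over long rollouts, while the explicit Euler energy grows steadily. That drift is the kind of numerical issue the abstract refers to.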
Microscopy in the Age of AI: Accelerating Imaging, Improving Discovery and Providing New Pathways to See and Manipulate Matter
Microscopy enables in-depth qualitative and quantitative characterization of materials at scales ranging from mesoscopic to atomic and even sub-atomic. Its structural, chemical and functional imaging of a vast array of materials has propelled our understanding of materials science in the past decades. With modern machine learning methods becoming more prevalent, and with current electron and scanning probe microscopes generating large, multidimensional datasets that are difficult to analyze with classical methods, the intersection of advanced data analytics and microscopy presents a natural melding of fields.
In this talk, I will discuss our use and development of AI methods that can improve microscopy. I will discuss methodologies, including Bayesian inference and deep learning, that accelerate functional imaging by orders of magnitude, improve signal detection in noisy environments, and solve challenging inverse problems. I will further highlight how the use of universal function approximators can enable new understanding of materials descriptors.
Finally, microscopy is not merely a characterization tool - both electron and scanning probe microscopy provide platforms for manipulation of matter. Harnessing this requires automatic determination of parameters to reliably control and manipulate matter under noisy imaging conditions, a task highly suited for reinforcement learning agents. I will discuss our recent algorithmic progress on this front.
Advancing Photonics with Machine Learning
Discovering unconventional optical designs via machine learning promises to advance on-chip circuitry, imaging, sensing, energy, and quantum information technology. In this talk, photonic design approaches and emerging material platforms will be discussed, showcasing machine-learning-assisted topology optimization for thermophotovoltaic metasurface designs and machine-learning-enabled quantum optical measurements.
Machine Learning for Scarce Material Classes: Understanding and Predicting Metal-Insulator Transition Compounds
Metal-insulator transition (MIT) compounds are of high technological interest, with potential applications in low-power computation, transistors, and glass coatings, and of broad general scientific interest. Yet identifying and optimizing new thermally driven MIT materials with required performance metrics is exceedingly challenging owing to the complicated interplay of microscopic electronic and atomic degrees of freedom. This interplay also makes it difficult for high-throughput, first-principles-based search schemes to find potential MIT candidates. At the same time, no experimental materials database includes resistivity classifications of materials relevant to MIT compounds, owing to the complexity of accurately interpreting transport data.
Here, we present the first database of MIT materials based on reliable, experimental data, build new features relevant to the MIT problem, and train a machine learning model on the resulting data. This work led to the identification of the interplay of two atomic scale features not normally associated with MIT materials, and we explain their importance to changes in the electronic structure within a few candidate compounds. We further demonstrate the use of our online MIT-classifier tool, which can be used by non-domain scientists to estimate the probability that their material is an MIT, Metal, or Insulator compound, utilizing crystallographic information alone as input.
Data-Driven Discovery of Novel Electrostrictive Materials
The aggregation of materials data in experimental and simulation-based databases makes it possible to discover materials with exceptional properties through programmatic search methods. In particular, combining physics-based modeling and intuition with programmatic queries of these databases makes it possible to quickly short-list materials that exhibit significant functional responses to external fields. These candidates can potentially improve existing technologies significantly. The databases also provide an opportunity to mine data and train machine learning models that predict materials properties with a high level of accuracy. This work focuses on programmatic data mining and machine learning frameworks using supervised regression (Random Forest Regression), clustering, and anomaly detection. In particular, I focus on the discovery of novel electrostrictive materials that exhibit a significant strain response in the presence of external electric fields, using the Materials Project database. This project has yielded a list of novel electrostrictive materials, including PCl3, BI3, BCl3, YBr3, and a number of other metal halides, whose properties qualify them as potential high-strain-response electrostrictors. These materials are being evaluated via high-accuracy density functional theory calculations. In addition, a unique machine learning framework has been developed for predicting elastic and dielectric properties of insulating materials, which, if successful, will bypass the need to perform expensive calculations to determine these properties in the future. This work has significant implications for the development of novel electrostrictive devices and serves as a benchmark study for developing machine learning frameworks in materials science.
Interactive Poster Session
Session 2: Data Infrastructures for Materials Science
Keynote: Polymer Informatics: Current Status & Critical Next Steps
The Materials Genome Initiative (MGI) has heralded a sea change in the philosophy of materials design. In an increasing number of applications, the successful deployment of novel materials has benefited from the use of computational, experimental and informatics methodologies. Here, we describe the role played by computational and experimental data generation and capture, polymer fingerprinting, machine-learning based property prediction models, and algorithms for designing polymers meeting target property requirements. These efforts have culminated in the creation of an online Polymer Informatics platform (https://www.polymergenome.org) to guide ongoing and future polymer discovery and design [1-3]. Challenges that remain will be examined, and systematic steps that may be taken to extend the applicability of such informatics efforts to a wide range of technological domains will be discussed. These include strategies to deal with the data bottleneck, new methods to represent polymer morphology and processing conditions, and the applicability of emerging AI algorithms for materials design.
Identifying “The Genes” of Materials Properties
The talk starts with a brief description of the “NOMAD concept”, https://nomad-lab.eu/, in particular its “FAIR and beyond” storing and sharing of materials data. This includes a sketch of the NOMAD OASIS, a self-contained platform for data storage, FAIRification, and an artificial-intelligence toolkit. For the latter, I will also address a frequent “big-data misconception”, noting that adding more data will not necessarily improve the learning of materials properties because often, most data are simply irrelevant for the target. Specifically, in materials science and engineering we are typically looking for materials with statistically exceptional properties and performance. Therefore, I will emphasize the “critical role of interpretable, descriptive parameters”. These descriptors characterize the actuators, facilitators, or obstructers of materials properties and functions and may be considered as analogs to genes in biology. Two methods aiming at the identification of these “genes” will be discussed: SISSO (sure independence screening and sparsifying operator) and Subgroup Discovery. Furthermore, I will present a method for detecting domains of applicability (DA) of machine-learning models, showing that different models have different DAs with distinctive features and notably improved performance.
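To make the screening idea behind SISSO more concrete, the sketch below builds a small pool of candidate descriptors from primary features and ranks them by absolute Pearson correlation with the target. It is only a toy version of the screening step under simplifying assumptions: the operator set (products and ratios), the helper names `build_candidates` and `screen`, and the example data are all hypothetical, and the sparsifying-operator stage of the actual method is not shown.

```python
import math


def pearson(x, y):
    # Pearson correlation; assumes neither input is constant.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def build_candidates(primary):
    # Combine primary features pairwise with two simple operators.
    # Assumes all feature values are nonzero so the ratios are defined.
    out = dict(primary)
    names = list(primary)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            out[f"{a}*{b}"] = [u * v for u, v in zip(primary[a], primary[b])]
            out[f"{a}/{b}"] = [u / v for u, v in zip(primary[a], primary[b])]
    return out


def screen(candidates, target, keep):
    # Keep the `keep` candidates most correlated (in magnitude) with the target.
    ranked = sorted(
        candidates,
        key=lambda name: abs(pearson(candidates[name], target)),
        reverse=True,
    )
    return ranked[:keep]
```

With primary features `a` and `b` and a target that happens to equal `a / b`, `screen(build_candidates(primary), target, 1)` returns the ratio descriptor, which is the sense in which a compact, interpretable expression can be recovered from a large candidate pool.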
Hard Fought Lessons on Open Data and Code Sharing and the Terra Infirma of Ground Truth
The use of artificial intelligence (AI) or machine learning (ML) in the physical sciences has exploded over the past 5 to 10 years. In that time frame several truly remarkable discoveries have been made, including the discovery of new phase change materials, amorphous alloys, and catalysts. The continued success of these methods relies upon the availability of open data, metadata and scientific code conforming to the findable, accessible, interoperable and reusable (F.A.I.R.) guidelines. Here I will briefly discuss successes and failures at NIST in collaboration with NREL, SLAC and the University of Maryland to create the first multi-institution combinatorial dataset and code repository to comply with F.A.I.R. guidelines. I will also touch on other examples of successful experimental data sharing such as the NREL HTEM database and the JCAP materials experiment and analysis database. But even scientifically sound AI models built from open data sets can only be as trusted as the labels and values contained within them. The second part of my talk will focus on the tenuousness of ground truth and the need for an honest and open accounting for experimental uncertainties within our data sets. One example to be discussed is phase attribution from X-ray diffraction data. This will drive home the difficulties in forming unanimous expert consensus and how a consensus with variance impacts the perceived performance of ML model attributions.
Linking Literature Data Extraction with Domain Specific Materials Informatics
Orbital Graph Convolutional Neural Network for Material Property Prediction
Material representations that are compatible with machine learning models play a key role in developing models that exhibit high accuracy for property prediction. Atomic orbital interactions are one of the important factors that govern the properties of crystalline materials, and from them the local chemical environments of atoms are inferred. Therefore, to develop robust machine learning models for material property prediction, it is imperative to include features representing such chemical attributes. Here, we propose the Orbital Graph Convolutional Neural Network (OGCNN), a crystal graph convolutional neural network framework that includes atomic orbital interaction features and learns material properties in a robust way. In addition, we embedded an encoder-decoder network into the OGCNN, enabling it to learn important features among basic atomic (elemental) features, orbital-orbital interactions, and topological features. We examined the performance of this model on a broad range of crystalline material data to predict different properties. We benchmarked the performance of the OGCNN model against that of: 1) the crystal graph convolutional neural network (CGCNN), 2) other state-of-the-art descriptors for material representations including the Many-body Tensor Representation (MBTR) and the Smooth Overlap of Atomic Positions (SOAP), and 3) other conventional regression machine learning algorithms where different crystal featurization methods have been used. We find that OGCNN significantly outperforms them. The OGCNN model with high predictive accuracy can be used to discover new materials among the immense phase and compound spaces of materials.
Materials Design and Discovery with Autonomous Systems
Autonomous research platforms are becoming more prominent in materials science, aiming to reduce the time it takes to discover, optimize, synthesize and characterize new technological materials. These platforms entail coupling of automation systems for parallel execution of computational or physical experiments with artificial intelligence (AI) based decision-making systems, and hence require a concerted, multidisciplinary effort to design, build and maintain. In this talk, I will highlight our recent progress on development of such autonomous systems for materials research and provide examples in areas of on-demand cloud-computing based discovery and predictive synthesis of inorganic compounds.
Day 1 Wrap Up/Day 2 Preview
Close of Day 1
October 07, 2020
Session 3: AI in Materials Production and Industry
Keynote Address: Materials Design in Electronics Industry: Application of Materials Informatics and Cloud Computing Environment to the Design of Organic Carrier Transport Materials
In addition to the boost in CPU power resulting from the progress of semiconductor technology, the recent expansion of the cloud computing environment is having a huge impact on materials design based on computational chemistry by drastically increasing the number of candidate molecules that can be calculated within a reasonable timeframe. Furthermore, rapid progress in the area of materials informatics (MI) is accelerating the prediction of material properties; a prediction can now be made within milliseconds by MI, as compared to hours or even days by conventional computational methods. This progress has enabled massive screening of millions of materials that might show desired properties. Recent AI-related technologies are also being introduced to materials development in the form of various proposals to realize inverse materials design.
In this talk, results of our trials to introduce such progress to materials design in the electronics industry will be presented for the case of the design of organic carrier transport materials such as heteroacenes. Results from screening a quarter of a million such materials using the cloud computing environment will be discussed, along with the results of benchmark studies of various methods of inverse materials design, such as the junction-tree neural network.
Importance of High Accuracy Simulations and Closed-Loop Pilot Feedback for AI Driven Materials Discovery
AI driven materials discovery has the potential to significantly shorten the development timeline of specialty chemicals and materials, which can often take >10 years from ideation to commercial use in mass production. While AI techniques are well suited for incremental improvements of existing materials or formulations, they tend to struggle with the development of truly novel materials due to limited or non-existent data. In industries that have exceptionally complex manufacturing processes, such as the electronics industry, this is further compounded by the unpredictable influence that processes and equipment used in mass production can have on a material's performance. As a result, >99.9% of new materials typically fail mass production qualification in the electronics industry, often for reasons that cannot be well predicted or quantified.
In this talk, we will discuss our approach to overcoming the “industrial use gap” in AI driven materials discovery through both more accurate simulations using quantum computing methods and direct closed-loop feedback from testing of new materials in pilot production.
Science-Aware Machine Learning for Hybrid-Autonomous Systems
Increasingly, data-science tools are applied to materials research to increase learning rates and improve learning outcomes. Historically, researchers used rule-based simulations (so-called “expert systems”) or learning-by-doing (e.g., active, or reinforcement, learning). In this talk, my team and I will introduce hybrid model-based/model-free ML frameworks (so-called “science-aware frameworks”) that both incorporate domain knowledge and are responsive to experimental realities. We conclude by drawing parallels between the evolution of science-aware ML frameworks in materials research and other fields.
Panel Discussion: Automating Production from Laboratory to Factory
A Mobile Robotic Chemist
Technologies such as batteries, biomaterials and heterogeneous catalysts have functions that are defined by mixtures of molecular and mesoscale components. As yet, this multi-length-scale complexity cannot be fully captured by atomistic simulations, and the design of such materials from first principles is still rare. Likewise, experimental complexity scales exponentially with the number of variables, restricting most searches to narrow areas of materials space. Robots can assist in experimental searches but their widespread adoption in materials research is challenging because of the diversity of sample types, operations, instruments and measurements required. Here we use a mobile robot to search for improved photocatalysts for hydrogen production from water. The robot operated autonomously over eight days, performing 688 experiments within a ten-variable experimental space, driven by a batched Bayesian search algorithm. This autonomous search identified photocatalyst mixtures that were six times more active than the initial formulations, selecting beneficial components and deselecting negative ones. Our strategy uses a dexterous free-roaming robot, automating the researcher rather than the instruments. This modular approach could be deployed in conventional laboratories for a range of research problems beyond photocatalysis.
Persistent Homology Advances Interpretable Machine Learning For Nanoporous Materials
Machine learning for nanoporous materials design and discovery has materialized as a promising alternative to more time-consuming experiments and simulations. Typically, domain experts select specific features as the model input. Creating a universal materials representation with strong performance across multiple prediction tasks is much more complicated. Moreover, this is often at odds with understanding how the prediction from the model relates to the material structure itself. Here, we use persistent homology to construct holistic feature representations to describe the structure of materials. We show that these representations can also be augmented with other generic features such as word embeddings from natural language processing to capture chemical information. We demonstrate our approaches on multiple metal-organic framework datasets by predicting a variety of different gas adsorption targets in different conditions. Our results show considerable improvement in both accuracy and transferability across targets compared to models constructed from commonly used manually-curated features. As our persistent homology-based features are direct representations of the material structure, this approach allows us to concurrently pinpoint the location and size of the porous frameworks that correlate best to adsorption at different pressures, thereby further contributing to our atomic level understanding of structure-property relationships to guide materials design.
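For readers unfamiliar with persistent homology, the sketch below computes its simplest case: the zero-dimensional persistence of a point cloud, where every point starts as its own connected component at scale 0 and components merge as the distance threshold grows. This is an illustration only, not the authors' featurization, which would also need higher-dimensional features to describe pores and channels. The function name `h0_persistence` and the example coordinates are assumptions.

```python
import math
from itertools import combinations


def h0_persistence(points):
    # Returns the sorted "death" scales at which connected components merge.
    # All components are born at scale 0; one component never dies.
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Process pairwise distances in increasing order (Kruskal-style).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj        # two components merge at this scale
            deaths.append(length)  # so one of them "dies" here
    return deaths
```

For two well-separated clusters of atoms, most deaths occur at short bond-like distances and a single late death marks the gap between the clusters. A list of such birth and death scales is the kind of structural summary that can then be vectorized and passed to a regression model.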
Outlook for Artificial Intelligence and Machine Learning at the NSLS-II
The National Synchrotron Light Source II (NSLS-II) at Brookhaven National Laboratory (BNL) is the newest light source in the US Department of Energy (DOE) complex, delivering unprecedented brightness to advanced beamlines and employing the latest detectors and beamline instrumentation. As a scientific user facility, the NSLS-II typically hosts several thousand onsite visitors a year, producing petabytes of data. With the number of beamlines still expanding and data rates increasing, the curation and analysis of such large datasets present a daunting challenge.
In this contribution, the current and future plans for employing artificial intelligence and machine learning (AI/ML) methods at the NSLS-II will be presented. These include areas such as fault detection and recovery, optimization of source and beamline configurations, automation of data collection, and streamlining the pipeline from measurement to insight via advanced analysis methods. With the added challenges originating from the COVID-19 pandemic and the associated transition from a predominantly onsite user model to one focused on remote-access for the foreseeable future, the development of these tools is made even more critical. The overall strategy and direction of the NSLS-II facility in relation to AI/ML is presented, highlighting particularly the efforts and challenges as they relate to the greater community of large-scale user facilities.