1. Machine learning for materials design:
opportunities, challenges, and methods
Anubhav Jain
Energy Technologies Area
Lawrence Berkeley National Laboratory
Berkeley, CA
Energy Probe workshop, May 13, 2019
2. Almost every technology could be improved with better materials!
• Batteries
– Stable and high-energy electrodes, solid-state electrolytes
• Thermal energy storage & conversion
– High-zT thermoelectrics, high-heat-capacity liquids
• Photovoltaics
– Improved efficiency of absorber, reduced degradation in coatings, controlling ion migration in front glass, lifetime of organic / hybrid materials
3. Typically, both new materials discovery and optimization take decades
• Often, materials are known for several decades before their functional applications are known
– MgB2 sat on lab shelves for 50 years before its identification as a superconductor in 2001
• Even after discovery, optimization and commercialization still take decades
Materials data from: Eagar T., King M. Technology Review 1995
4. Some opportunities for accelerating materials design using machine learning techniques
Accelerated materials design:
• ML surrogates for expt / comp.
• “Self-driving laboratories”
• Opportunities in natural language processing
5. ML surrogates for experiments and computation: background
• Experiments are generally time-consuming and labor-intensive
– Days to months to get measurements, with a large investment of researcher time
– Not too long ago, one essentially needed to do everything experimentally
6. ML surrogates for experiments and computation: background
• Computations can be faster and require less researcher time
– Today, some materials design problems can be modeled in the computer [1]
– But CPU time is still a major issue
[1] Jain, A., Shin, Y. & Persson, K. A. Computational predictions of energy materials using density functional theory. Nature Reviews Materials 1, 15004 (2016).
7. ML surrogates for experiments and computation: background
• Machine learning can be the fastest of all and could play a major role in supporting experiments and computation, e.g., to identify the most promising regions of chemical space before any computation / theory is done
8. Example application: machine learning as a surrogate for DFT computations
The ML model can be 5-6 orders of magnitude faster!
Potential to run ~1 million tests for the price of 1
1. Smith, J. S., Isayev, O. & Roitberg, A. E. ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost. Chemical Science 8, 3192–3203 (2017).
2. Aspuru-Guzik, A. & Persson, K. Materials Acceleration Platform: Accelerating Advanced Energy Materials Discovery by Integrating High-Throughput Methods with Artificial Intelligence.
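The surrogate workflow can be sketched in a few lines: pay for a handful of expensive evaluations, fit a cheap model to them, screen a large candidate pool with the cheap model, and confirm only the top few with the expensive method. This is a minimal illustration, not the ANI-1 approach: `expensive_property` is a made-up one-dimensional stand-in for a DFT calculation, and the Gaussian-kernel smoother is an assumed placeholder for a real ML model.

```python
import math
import random


def expensive_property(x):
    """Hypothetical stand-in for a costly DFT calculation or experiment."""
    return math.sin(3.0 * x) + 0.5 * x


def fit_kernel_surrogate(xs, ys, bandwidth=0.1):
    """Return a cheap predictor: Gaussian-kernel weighted average of training data."""
    def predict(x):
        weights = [math.exp(-((x - xi) ** 2) / (2.0 * bandwidth ** 2)) for xi in xs]
        return sum(w * y for w, y in zip(weights, ys)) / sum(weights)
    return predict


rng = random.Random(0)

# 1) Pay for a small number of expensive evaluations (30 points on [0, 2]).
train_x = [2.0 * i / 29 for i in range(30)]
train_y = [expensive_property(x) for x in train_x]
surrogate = fit_kernel_surrogate(train_x, train_y)

# 2) Screen many candidates using only the cheap surrogate.
candidates = [rng.uniform(0.0, 2.0) for _ in range(10000)]
ranked = sorted(candidates, key=surrogate, reverse=True)

# 3) Confirm only the shortlist with the expensive method.
shortlist = ranked[:5]
confirmed = [(x, expensive_property(x)) for x in shortlist]
best_x, best_y = max(confirmed, key=lambda pair: pair[1])
print(f"best candidate x={best_x:.3f}, confirmed value={best_y:.3f}")
```

Here 10,000 candidates are screened for the cost of 35 expensive calls (30 for training, 5 for confirmation). The benefit in practice depends on how well the surrogate generalizes to the candidates being screened.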
9. Example from our group: developing and testing surrogate models over diverse materials data problems
(paper in preparation)
10. Some opportunities for accelerating materials design using machine learning techniques
Accelerated materials design:
• ML surrogates for expt / comp.
• “Self-driving laboratories”
• Opportunities in natural language processing
11. “Self-driving” laboratories: background
• Typically, the researcher chooses which materials to perform experiments on (or to compute)
• Advantage: draws on the researcher's domain expertise (potentially decades of knowledge)
• Potential issues:
– Bias (exploring near already-known systems)
– Time (it takes time to decide what to study)
12. “Self-driving” laboratories: background
• In a “self-driving” laboratory, an algorithm chooses the next experiment/computation and performs it automatically
• “Active learning” ML
• At each stage, the algorithm balances exploration and exploitation
Gubernatis, J. E. & Lookman, T. Machine learning in materials design and discovery: Examples from the present and suggestions for the future. Phys. Rev. Materials 2, 120301 (2018).
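The loop described above (choose the next experiment, run it, update, repeat) can be sketched as follows. This is a deliberately crude illustration under stated assumptions: `run_experiment` is a hypothetical stand-in for an automated measurement, the prediction is simply the value of the nearest measured point, and distance to that point serves as a rough proxy for uncertainty. Real systems typically use a statistical model such as a Gaussian process instead; `kappa` is an assumed knob that trades exploitation against exploration.

```python
def run_experiment(x):
    """Hypothetical stand-in for an automated measurement or computation."""
    return -(x - 0.7) ** 2


def acquisition(x, measured, kappa):
    """Score a candidate: predicted value (exploitation) + kappa * novelty (exploration)."""
    nearest_x, nearest_y = min(measured, key=lambda pair: abs(pair[0] - x))
    predicted = nearest_y          # crude nearest-neighbor prediction
    novelty = abs(nearest_x - x)   # distance as a crude uncertainty proxy
    return predicted + kappa * novelty


candidates = [i / 100 for i in range(101)]   # design space: 0.00 .. 1.00
measured = [(0.0, run_experiment(0.0))]      # a single seed measurement
budget = 15                                  # number of experiments we can afford

for _ in range(budget):
    tried = {x for x, _ in measured}
    untried = [x for x in candidates if x not in tried]
    # The algorithm, not the researcher, picks the next experiment.
    next_x = max(untried, key=lambda x: acquisition(x, measured, kappa=1.0))
    measured.append((next_x, run_experiment(next_x)))

best_x, best_y = max(measured, key=lambda pair: pair[1])
print(f"best x after {budget} experiments: {best_x:.2f} (value {best_y:.4f})")
```

With a larger `kappa` the loop spreads measurements across the design space; with a smaller one it refines around the current best. Only 16 of the 101 candidates are ever measured.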
13. Example application: shape-memory alloys with low transition temperature and hysteresis
Using an adaptive design strategy, one can reduce the number of measurements needed to find all Pareto-optimal shape-memory alloys
Gubernatis, J. E. & Lookman, T. Machine learning in materials design and discovery: Examples from the present and suggestions for the future. Phys. Rev. Materials 2, 120301 (2018).
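For readers unfamiliar with the term, a candidate is Pareto-optimal when no other candidate is at least as good in every objective and strictly better in at least one. The sketch below identifies that set for two objectives that are both minimized. The numbers are invented for illustration and are not data from the cited work.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one (minimization)."""
    no_worse = all(ai <= bi for ai, bi in zip(a, b))
    strictly_better = any(ai < bi for ai, bi in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points):
    """Return the points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Made-up (transition temperature, hysteresis) pairs in arbitrary units.
alloys = [(10.0, 5.0), (12.0, 3.0), (15.0, 2.5), (11.0, 6.0), (16.0, 3.0), (9.0, 9.0)]
front = pareto_front(alloys)
print(front)
```

An adaptive design strategy aims to locate this front while measuring only a fraction of the candidates. The brute-force check here assumes every candidate has already been measured.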
14. Example from our group: Rocketsled for automated computational searches
Rocketsled can help find optimal solutions using far fewer computations overall (less CPU) and can be parallelized over supercomputers (less time)
Dunn, A., Brenneck, J. & Jain, A. Rocketsled: a software library for optimizing high-throughput computational searches. J. Phys. Mater. 2, 034002 (2019).
15. Some opportunities for accelerating materials design using machine learning techniques
Accelerated materials design:
• ML surrogates for expt / comp.
• “Self-driving laboratories”
• Opportunities in natural language processing
16. Natural language processing: background
• Most materials science data and knowledge exists only in unstructured formats (e.g., as text in journal publications)
• Can we make use of knowledge in text format?
17. Example: synthesis planning based on text mining
1. Kim, E. et al. Data Descriptor: Machine-learned and codified synthesis parameters of oxide materials. Scientific Data 1–9 (2017).
2. Kim, E. et al. Materials Synthesis Insights from Scientific Literature via Text Extraction and Machine Learning. Chemistry of Materials acs.chemmater.7b03500 (2017).
18. Example from our group: using NN to predict “gaps” in materials discoveries
Using word2vec on a database of 3 million materials science abstracts, we can predict which words should co-occur with one another.
This can be used to predict materials that should be studied for functional applications (“gaps” in the research literature)
Tshitoyan V., Dagdelen J., Weston L., Dunn A., Rong Z., Kononova O., Persson K., Ceder G., Jain A. Unsupervised word embeddings capture latent knowledge from materials science literature. Accepted / in press, Nature
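One common way to query word embeddings, assumed here for illustration, is to rank candidate words by cosine similarity to a target word such as an application keyword. The sketch below does not train word2vec; the three-dimensional vectors and the names `material_A`, `material_B`, and `material_C` are invented placeholders, whereas real embeddings are learned from text and have hundreds of dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_candidates(target, candidates, embeddings):
    """Order candidate words by similarity to the target word, most similar first."""
    scores = {name: cosine_similarity(embeddings[target], embeddings[name])
              for name in candidates}
    return sorted(scores, key=scores.get, reverse=True)


# Invented toy vectors; these are NOT outputs of a trained model.
embeddings = {
    "thermoelectric": [0.9, 0.1, 0.2],
    "material_A": [0.8, 0.2, 0.1],
    "material_B": [0.1, 0.9, 0.3],
    "material_C": [0.7, 0.0, 0.6],
}

ranking = rank_candidates(
    "thermoelectric", ["material_A", "material_B", "material_C"], embeddings
)
print(ranking)
```

A material that ranks highly against an application word but has never been reported for that application in the literature would be a candidate "gap" in the sense used above.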
19. Challenges
• Data availability
– Typical materials data sets range from about a dozen examples to a few thousand; it is rare to have 100,000 data points
– No standard data sets to build models on (no analogue of, e.g., ImageNet)
20. Challenges
• Data heterogeneity
– There is no single data type (e.g., image data, spectral data, graph data)
– Different materials problems have their own data types, often ones unfamiliar in computer science (e.g., periodic crystal structures)
21. Challenges
• ML model extrapolation
– Almost all industry ML focuses on interpolation-type problems (data on almost all representative examples is in place)
– Materials science requires extrapolation of very complex physics
– Standard cross-validation is likely insufficient (e.g., is cluster-based cross-validation better?)
– ML interpretability would build confidence in extrapolation
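The cluster-based cross-validation idea can be sketched as a leave-one-cluster-out split: each fold holds out an entire cluster, so the model is always tested on a kind of material it has not seen during training. The cluster labels below (chemical families) are hypothetical. In practice clusters might come from a clustering algorithm or from domain knowledge.

```python
from collections import defaultdict


def leave_one_cluster_out(cluster_labels):
    """Yield (held_out_cluster, train_indices, test_indices), one fold per cluster."""
    members = defaultdict(list)
    for index, label in enumerate(cluster_labels):
        members[label].append(index)
    for label in sorted(members):
        test = members[label]
        train = [i for i, other in enumerate(cluster_labels) if other != label]
        yield label, train, test


# Hypothetical cluster assignment for six samples.
labels = ["oxide", "oxide", "sulfide", "nitride", "sulfide", "oxide"]
folds = list(leave_one_cluster_out(labels))

for held_out, train, test in folds:
    print(f"hold out {held_out}: train on {train}, test on {test}")
```

Unlike random k-fold splitting, no cluster ever appears on both sides of a split, which gives a more honest estimate of extrapolation error. scikit-learn provides comparable group-aware splitters such as `LeaveOneGroupOut` and `GroupKFold`.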
22. Some relevant groups at LBNL
• Kristin Persson (ESDR) – materials databases, ML
• Shyam Dwaraknath (ESDR) – ML for characterization
• Juli Mueller (CRD) – active learning
• Dani Ushizima (CRD) – classifying materials image data
• Tess Smidt (CRD) – crystal structure models for ML
• Emory Chan (MSD) – automated experiments
• Colin Ophus (MSD) – TEM image labeling
• Gerbrand Ceder (MSD) – text mining / NLP of synthesis