Machine learning for materials design:
opportunities, challenges, and methods
Anubhav Jain
Energy Technologies Area
Lawrence Berkeley National Laboratory
Berkeley, CA
Energy Probe workshop, May 13, 2019
Almost every technology could be improved with better materials!
• Batteries
– Stable and high-energy electrodes, solid-state electrolytes
• Thermal energy storage & conversion
– High zT thermoelectrics, high heat capacity liquids
• Photovoltaics
– Improved efficiency of absorber, reduced degradation in coatings, controlling ion migration in front glass, lifetime of organic / hybrid materials
Typically, both new materials discovery and optimization take decades
• Often, materials are known for several decades before their functional applications are known
– MgB2 sat on lab shelves for 50 years before its identification as a superconductor in 2001
• Even after discovery, optimization and commercialization still take decades
Materials data from: Eagar T., King M. Technology Review 1995
Some opportunities for accelerating materials design using machine learning techniques
Accelerated materials design:
• ML surrogates for expt / comp.
• “Self-driving laboratories”
• Opportunities in natural language processing
ML surrogates for experiments and computation: background
• Experiments are generally time-consuming and labor-intensive
– Days to months to get measurements, with a large investment of researcher time
– Not too long ago, one essentially needed to do everything experimentally
• Computations can be faster and require less researcher time
– Today, some materials design problems can be modeled in the computer [1]
– But CPU time is still a major issue
• Machine learning can be the fastest of all and could play a major role in supporting experiments and computation, e.g., to identify the most promising regions of chemical space prior to even computation / theory
[1] Jain, A., Shin, Y. & Persson, K. A. Computational predictions of energy materials using density functional theory. Nature Reviews Materials 1, 15004 (2016).
Example application: machine learning as a surrogate for DFT computations
The ML model can be 5-6 orders of magnitude faster!
Potential to run ~1 million tests for the price of 1
1. Smith, J. S., Isayev, O. & Roitberg, A. E. ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost. Chemical Science 8, 3192–3203 (2017).
2. Aspuru-Guzik, A. & Persson, K. Materials Acceleration Platform: Accelerating Advanced Energy Materials Discovery by Integrating High-Throughput Methods with Artificial Intelligence.
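The surrogate idea can be sketched in a few lines. This is only an illustration under stated assumptions: `expensive_calculation` is a hypothetical stand-in for a DFT run, and the k-nearest-neighbour regressor is a deliberately simple substitute for the neural network potential cited on this slide. The point it shows is the workflow: pay for a small number of expensive evaluations, then screen a much larger candidate list with the cheap model.

```python
import math

def expensive_calculation(x):
    """Hypothetical stand-in for a costly DFT run: a smooth 1D energy curve."""
    return (x - 1.0) ** 2 + 0.3 * math.sin(3.0 * x)

def fit_surrogate(xs, ys, k=3):
    """Return a k-nearest-neighbour predictor with inverse-distance weights."""
    data = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(data, key=lambda p: abs(p[0] - x))[:k]
        # An exact match on a training point returns the stored value.
        for xi, yi in nearest:
            if xi == x:
                return yi
        weights = [1.0 / abs(xi - x) for xi, _ in nearest]
        return sum(w * yi for w, (_, yi) in zip(weights, nearest)) / sum(weights)

    return predict

# "Expensive" training set: 21 evaluations on a coarse grid in [0, 2].
train_x = [i * 0.1 for i in range(21)]
train_y = [expensive_calculation(x) for x in train_x]
surrogate = fit_surrogate(train_x, train_y)

# Cheap screening: 2001 candidates scored by the surrogate alone.
candidates = [i * 0.001 for i in range(2001)]
best = min(candidates, key=surrogate)
```

In practice the most promising candidates found this way would still be passed back to the expensive method for confirmation.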
Example from our group: developing and testing surrogate models over diverse materials data problems
(paper in preparation)
“Self-driving” laboratories: background
• Typically, the researcher chooses which materials to run experiments on (or to compute)
• Advantage: draws on the researcher's domain expertise (potentially decades of knowledge)
• Potential issues:
– Bias (exploring near already known systems)
– Time (takes time to think of what to study)
• In a “self-driving” laboratory, an algorithm chooses the next experiment/computation and performs it automatically
• “Active learning” ML
• At each stage, the algorithm balances exploration and exploitation
Gubernatis, J. E. & Lookman, T. Machine learning in materials design and discovery: Examples from the present and suggestions for the future. Phys. Rev. Materials 2, 120301 (2018).
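A minimal sketch of such a loop follows. Everything in it is an assumption made for illustration: `run_experiment` is a hypothetical objective standing in for a real measurement, the surrogate is a plain nearest-neighbour lookup, and the score is an upper-confidence-bound-style heuristic (predicted value plus `kappa` times distance to the nearest known point). It is not the method used in the cited work; it only shows how one scoring rule can trade exploitation against exploration.

```python
import math

def run_experiment(x):
    """Hypothetical stand-in for a measurement or computation (to be maximised)."""
    return math.exp(-((x - 0.7) ** 2) / 0.02) + 0.5 * math.exp(-((x - 0.2) ** 2) / 0.01)

def choose_next(pool, measured, kappa):
    """Pick the untested candidate with the best exploitation + exploration score."""
    def score(x):
        nearest_x = min(measured, key=lambda m: abs(m - x))
        predicted = measured[nearest_x]       # exploitation: value of the nearest result
        uncertainty = abs(nearest_x - x)      # exploration: distance from known data
        return predicted + kappa * uncertainty
    untested = [x for x in pool if x not in measured]
    return max(untested, key=score)

def self_driving_search(pool, budget, kappa=2.0):
    """Run `budget` experiments, letting the scoring rule choose each one."""
    measured = {pool[0]: run_experiment(pool[0])}   # one seed experiment
    for _ in range(budget - 1):
        x = choose_next(pool, measured, kappa)
        measured[x] = run_experiment(x)             # "perform" it automatically
    return measured

pool = [i / 100 for i in range(101)]                # 101 candidate compositions
results = self_driving_search(pool, budget=15)
best_x = max(results, key=results.get)
```

A larger `kappa` pushes the search toward untested regions; a smaller one keeps it close to the best result found so far.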
Example application: shape-memory alloys with low transition temperature and hysteresis
Using an adaptive design strategy, one can reduce the number of measurements needed to find all Pareto-optimal shape-memory alloys
Gubernatis, J. E. & Lookman, T. Machine learning in materials design and discovery: Examples from the present and suggestions for the future. Phys. Rev. Materials 2, 120301 (2018).
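For readers unfamiliar with the term, a candidate is Pareto-optimal when no other candidate is at least as good on every objective and strictly better on one. The sketch below assumes both objectives (transition temperature and hysteresis) are to be minimised, as the slide title suggests; the five alloy data points are invented for illustration.

```python
def pareto_front(points):
    """Return the points not dominated by any other point (all objectives minimised)."""
    def dominates(a, b):
        no_worse = all(ai <= bi for ai, bi in zip(a, b))
        strictly_better = any(ai < bi for ai, bi in zip(a, b))
        return no_worse and strictly_better
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical (transition temperature in K, thermal hysteresis in K) for five alloys.
alloys = [(320.0, 12.0), (300.0, 15.0), (310.0, 9.0), (330.0, 8.0), (325.0, 14.0)]
front = pareto_front(alloys)
```

Here `(320.0, 12.0)` and `(325.0, 14.0)` drop out because `(310.0, 9.0)` beats them on both objectives.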
Example from our group: Rocketsled for automated computational searches
Rocketsled can help find optimal solutions using far fewer computations overall (less CPU time) and can be parallelized over supercomputers (less time)
Dunn, A., Brenneck, J. & Jain, A. Rocketsled: a software library for optimizing high-throughput computational searches. J. Phys. Mater. 2, 034002 (2019).
Natural language processing: background
• Most materials science data and knowledge exists only in unstructured format (e.g., as text in journal publications)
• Can we make use of knowledge in text format?
Example: synthesis planning based on text mining
1. Kim, E. et al. Data Descriptor: Machine-learned and codified synthesis parameters of oxide materials. Scientific Data 1–9 (2017).
2. Kim, E. et al. Materials Synthesis Insights from Scientific Literature via Text Extraction and Machine Learning. Chemistry of Materials acs.chemmater.7b03500 (2017).
Example from our group: using NN to predict “gaps” in materials discoveries
Using word2vec on a database of 3 million materials science abstracts, we can predict which words should co-occur with one another.
This can be used to predict materials that should be studied for functional applications (“gaps” in the research literature)
Tshitoyan V., Dagdelen J., Weston L., Dunn A., Rong Z., Kononova O., Persson K., Ceder G., Jain A. Unsupervised word embeddings capture latent knowledge from materials science literature. Accepted / in press, Nature
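One way to picture the “gap” search is as a ranking problem over word vectors. The sketch below is an assumption-laden illustration, not the published pipeline: it does not train word2vec, the 3-dimensional vectors and the `material_A` to `material_D` names are invented, and cosine similarity is assumed as the closeness measure. It ranks materials that have not yet been reported with an application word by how close their vectors lie to that word.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical 3-dimensional embeddings (invented for illustration only).
embeddings = {
    "thermoelectric": [0.9, 0.1, 0.2],
    "material_A": [0.8, 0.2, 0.1],
    "material_B": [0.85, 0.05, 0.3],
    "material_C": [0.7, 0.3, 0.3],
    "material_D": [0.1, 0.9, 0.2],
}
# Hypothetical: materials already reported alongside the application word.
already_studied = {"material_A", "material_B"}

def rank_gaps(application, embeddings, already_studied):
    """Rank not-yet-studied words by similarity to the application word."""
    target = embeddings[application]
    candidates = [w for w in embeddings
                  if w != application and w not in already_studied]
    return sorted(candidates, key=lambda w: cosine(embeddings[w], target), reverse=True)

gaps = rank_gaps("thermoelectric", embeddings, already_studied)
```

The top of the returned list is the material whose usage in text most resembles the application word despite never appearing with it.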
Challenges
• Data availability
– Typical materials data sets range from about a dozen examples to a few thousand; it is rare to have 100,000 data points
– No standard data sets to build models on (no equivalent of, e.g., ImageNet)
• Data heterogeneity
– There is no single data type (e.g., image data, spectral data, graph data)
– Different materials problems have their own data types, often ones unfamiliar in computer science (e.g., periodic crystal structures)
• ML model extrapolation
– Almost all industry ML focuses on interpolation-type problems (data on almost all representative examples is in place)
– Materials science requires extrapolation of very complex physics
– Standard cross-validation is likely insufficient (e.g., is cluster-based cross-validation better?)
– ML interpretability would build confidence in extrapolation
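The cluster-based cross-validation idea can be sketched as a leave-one-cluster-out split: every test fold contains a whole group of related materials that the model never saw in training, so the score reflects extrapolation rather than interpolation. The cluster labels below (chemical families) are a hypothetical example; in practice they might come from a clustering algorithm or from domain knowledge.

```python
from collections import defaultdict

def leave_one_cluster_out(cluster_labels):
    """Yield (held_out_label, train_indices, test_indices), one split per cluster."""
    by_cluster = defaultdict(list)
    for index, label in enumerate(cluster_labels):
        by_cluster[label].append(index)
    for label, test_idx in by_cluster.items():
        train_idx = [i for i, other in enumerate(cluster_labels) if other != label]
        yield label, train_idx, test_idx

# Hypothetical cluster assignment for six data points.
labels = ["oxide", "oxide", "sulfide", "nitride", "sulfide", "oxide"]
splits = list(leave_one_cluster_out(labels))
```

A random k-fold split of the same data would usually place oxides in both the training and test sets, which tends to make the model look better at extrapolation than it is.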
Some relevant groups at LBNL
• Kristin Persson (ESDR) – materials databases, ML
• Shyam Dwaraknath (ESDR) – ML for characterization
• Juli Mueller (CRD) – active learning
• Dani Ushizima (CRD) – classifying materials image data
• Tess Smidt (CRD) – crystal structure models for ML
• Emory Chan (MSD) – automated experiments
• Colin Ophus (MSD) – TEM image labeling
• Gerbrand Ceder (MSD) – text mining / NLP of synthesis

Natural language processing for extracting synthesis recipes and applications...Natural language processing for extracting synthesis recipes and applications...
Natural language processing for extracting synthesis recipes and applications...
 
Accelerating New Materials Design with Supercomputing and Machine Learning
Accelerating New Materials Design with Supercomputing and Machine LearningAccelerating New Materials Design with Supercomputing and Machine Learning
Accelerating New Materials Design with Supercomputing and Machine Learning
 
DuraMat CO1 Central Data Resource: How it started, how it’s going …
DuraMat CO1 Central Data Resource: How it started, how it’s going …DuraMat CO1 Central Data Resource: How it started, how it’s going …
DuraMat CO1 Central Data Resource: How it started, how it’s going …
 
The Materials Project
The Materials ProjectThe Materials Project
The Materials Project
 
Evaluating Chemical Composition and Crystal Structure Representations using t...
Evaluating Chemical Composition and Crystal Structure Representations using t...Evaluating Chemical Composition and Crystal Structure Representations using t...
Evaluating Chemical Composition and Crystal Structure Representations using t...
 
Perspectives on chemical composition and crystal structure representations fr...
Perspectives on chemical composition and crystal structure representations fr...Perspectives on chemical composition and crystal structure representations fr...
Perspectives on chemical composition and crystal structure representations fr...
 
Machine Learning Platform for Catalyst Design
Machine Learning Platform for Catalyst DesignMachine Learning Platform for Catalyst Design
Machine Learning Platform for Catalyst Design
 
Assessing Factors Underpinning PV Degradation through Data Analysis
Assessing Factors Underpinning PV Degradation through Data AnalysisAssessing Factors Underpinning PV Degradation through Data Analysis
Assessing Factors Underpinning PV Degradation through Data Analysis
 
Extracting and Making Use of Materials Data from Millions of Journal Articles...
Extracting and Making Use of Materials Data from Millions of Journal Articles...Extracting and Making Use of Materials Data from Millions of Journal Articles...
Extracting and Making Use of Materials Data from Millions of Journal Articles...
 
The Status of ML Algorithms for Structure-property Relationships Using Matb...
The Status of ML Algorithms for Structure-property Relationships Using Matb...The Status of ML Algorithms for Structure-property Relationships Using Matb...
The Status of ML Algorithms for Structure-property Relationships Using Matb...
 
Progress Towards Leveraging Natural Language Processing for Collecting Experi...
Progress Towards Leveraging Natural Language Processing for Collecting Experi...Progress Towards Leveraging Natural Language Processing for Collecting Experi...
Progress Towards Leveraging Natural Language Processing for Collecting Experi...
 

Último

Thyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate ProfessorThyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate Professormuralinath2
 
FAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceFAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceAlex Henderson
 
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptxTHE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptxANSARKHAN96
 
Dr. E. Muralinath_ Blood indices_clinical aspects
Dr. E. Muralinath_ Blood indices_clinical  aspectsDr. E. Muralinath_ Blood indices_clinical  aspects
Dr. E. Muralinath_ Blood indices_clinical aspectsmuralinath2
 
Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.Silpa
 
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxPSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxSuji236384
 
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptxClimate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptxDiariAli
 
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryAlex Henderson
 
Atp synthase , Atp synthase complex 1 to 4.
Atp synthase , Atp synthase complex 1 to 4.Atp synthase , Atp synthase complex 1 to 4.
Atp synthase , Atp synthase complex 1 to 4.Silpa
 
Cyanide resistant respiration pathway.pptx
Cyanide resistant respiration pathway.pptxCyanide resistant respiration pathway.pptx
Cyanide resistant respiration pathway.pptxSilpa
 
development of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusdevelopment of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusNazaninKarimi6
 
CYTOGENETIC MAP................ ppt.pptx
CYTOGENETIC MAP................ ppt.pptxCYTOGENETIC MAP................ ppt.pptx
CYTOGENETIC MAP................ ppt.pptxSilpa
 
Genetics and epigenetics of ADHD and comorbid conditions
Genetics and epigenetics of ADHD and comorbid conditionsGenetics and epigenetics of ADHD and comorbid conditions
Genetics and epigenetics of ADHD and comorbid conditionsbassianu17
 
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIACURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIADr. TATHAGAT KHOBRAGADE
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .Poonam Aher Patil
 
GBSN - Microbiology (Unit 3)Defense Mechanism of the body
GBSN - Microbiology (Unit 3)Defense Mechanism of the body GBSN - Microbiology (Unit 3)Defense Mechanism of the body
GBSN - Microbiology (Unit 3)Defense Mechanism of the body Areesha Ahmad
 
300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptxryanrooker
 
Zoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfZoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfSumit Kumar yadav
 

Último (20)

Thyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate ProfessorThyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate Professor
 
Clean In Place(CIP).pptx .
Clean In Place(CIP).pptx                 .Clean In Place(CIP).pptx                 .
Clean In Place(CIP).pptx .
 
FAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceFAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical Science
 
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptxTHE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
 
Dr. E. Muralinath_ Blood indices_clinical aspects
Dr. E. Muralinath_ Blood indices_clinical  aspectsDr. E. Muralinath_ Blood indices_clinical  aspects
Dr. E. Muralinath_ Blood indices_clinical aspects
 
Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.
 
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxPSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptxClimate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
 
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
 
Atp synthase , Atp synthase complex 1 to 4.
Atp synthase , Atp synthase complex 1 to 4.Atp synthase , Atp synthase complex 1 to 4.
Atp synthase , Atp synthase complex 1 to 4.
 
Cyanide resistant respiration pathway.pptx
Cyanide resistant respiration pathway.pptxCyanide resistant respiration pathway.pptx
Cyanide resistant respiration pathway.pptx
 
development of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusdevelopment of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virus
 
CYTOGENETIC MAP................ ppt.pptx
CYTOGENETIC MAP................ ppt.pptxCYTOGENETIC MAP................ ppt.pptx
CYTOGENETIC MAP................ ppt.pptx
 
Genetics and epigenetics of ADHD and comorbid conditions
Genetics and epigenetics of ADHD and comorbid conditionsGenetics and epigenetics of ADHD and comorbid conditions
Genetics and epigenetics of ADHD and comorbid conditions
 
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIACURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .
 
GBSN - Microbiology (Unit 3)Defense Mechanism of the body
GBSN - Microbiology (Unit 3)Defense Mechanism of the body GBSN - Microbiology (Unit 3)Defense Mechanism of the body
GBSN - Microbiology (Unit 3)Defense Mechanism of the body
 
300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx
 
Zoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfZoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdf
 

Machine learning for materials design: opportunities, challenges, and methods

  • 1. Machine learning for materials design: opportunities, challenges, and methods. Anubhav Jain, Energy Technologies Area, Lawrence Berkeley National Laboratory, Berkeley, CA. Energy Probe workshop, May 13, 2019
  • 2. Almost every technology could be improved with better materials! • Batteries – stable and high-energy electrodes, solid-state electrolytes • Thermal energy storage & conversion – high-zT thermoelectrics, high heat capacity liquids • Photovoltaics – improved efficiency of absorber, reduced degradation in coatings, controlling ion migration in front glass, lifetime of organic / hybrid materials
  • 3. Typically, both new materials discovery and optimization take decades • Often, materials are known for several decades before their functional applications are known – MgB2 sat on lab shelves for 50 years before its identification as a superconductor in 2001 • Even after discovery, optimization and commercialization still take decades. Materials data from: Eagar T., King M. Technology Review 1995
  • 4. Some opportunities for accelerating materials design using machine learning techniques. Accelerated materials design: ML surrogates for expt / comp.; “self-driving laboratories”; opportunities in natural language processing
  • 5. ML surrogates for experiments and computation: background • Experiments are generally time-consuming and labor-intensive – Days to months to get measurements, with a large investment of researcher time – Not too long ago, one essentially needed to do everything experimentally
  • 6. ML surrogates for experiments and computation: background • Computations can be faster and require less researcher time – Today, some materials design problems can be modeled in the computer [1] – But CPU time is still a major issue. [1] Jain, A., Shin, Y. & Persson, K. A. Computational predictions of energy materials using density functional theory. Nature Reviews Materials 1, 15004 (2016).
  • 7. ML surrogates for experiments and computation: background • Machine learning can be the fastest of all and could play a major role in supporting experiments and computation, e.g., to identify the most promising regions of chemical space even before computation / theory
  • 8. Example application: machine learning as a surrogate for DFT computations. The ML model can be 5-6 orders of magnitude faster! Potential to run ~1 million tests for the price of 1. References: 1. Smith, J. S., Isayev, O. & Roitberg, A. E. ANI-1: an extensible neural network potential with DFT accuracy at force field computational cost. Chemical Science 8, 3192–3203 (2017). 2. Aspuru-Guzik, A. & Persson, K. Materials Acceleration Platform: Accelerating Advanced Energy Materials Discovery by Integrating High-Throughput Methods with Artificial Intelligence.
  • 9. Example from our group: developing and testing surrogate models over diverse materials data problems (paper in preparation)
  • 10. Some opportunities for accelerating materials design using machine learning techniques. Accelerated materials design: ML surrogates for expt / comp.; “self-driving laboratories”; opportunities in natural language processing
  • 11. “Self-driving” laboratories: background • Typically, the researcher chooses which materials to perform experiments on (or to compute) • Advantage: draws on the researcher's domain expertise (potentially decades of knowledge) • Potential issues: – Bias (exploring near already known systems) – Time (it takes time to think of what to study)
  • 12. “Self-driving” laboratories: background • In a “self-driving” laboratory, an algorithm chooses the next experiment/computation and performs it automatically • “Active learning” ML • At each stage, the algorithm balances exploration and exploitation. Reference: Gubernatis, J. E. & Lookman, T. Machine learning in materials design and discovery: Examples from the present and suggestions for the future. Phys. Rev. Materials 2, 120301 (2018).
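As a rough illustration of the exploration/exploitation balance on slide 12, the loop below picks the next "experiment" by scoring each untested candidate as prediction plus a weighted uncertainty. This is a minimal sketch under stated assumptions, not the method of the cited reference: `hidden_experiment`, the nearest-neighbor surrogate, and the `kappa` weight are all hypothetical stand-ins chosen to keep the example dependency-free.

```python
import math


def hidden_experiment(x):
    """Stand-in for an expensive measurement (hypothetical objective)."""
    return math.sin(3 * x) + 0.5 * x


def predict(x, measured):
    """Toy surrogate: return the value of the nearest measured point and
    use the distance to that point as a crude uncertainty estimate."""
    nearest_x, nearest_y = min(measured, key=lambda p: abs(p[0] - x))
    return nearest_y, abs(nearest_x - x)


def run_campaign(candidates, n_steps, kappa=1.0):
    """Choose experiments one at a time by maximizing
    score = prediction + kappa * uncertainty.
    A large kappa favors exploration; a small kappa favors exploitation."""
    remaining = list(candidates)
    first = remaining.pop(0)  # seed the campaign with one measurement
    measured = [(first, hidden_experiment(first))]
    for _ in range(n_steps):
        def score(x):
            mean, uncertainty = predict(x, measured)
            return mean + kappa * uncertainty

        best = max(remaining, key=score)
        remaining.remove(best)
        measured.append((best, hidden_experiment(best)))
    return measured


candidates = [i / 50 for i in range(101)]  # 101 candidate points on [0, 2]
history = run_campaign(candidates, n_steps=15, kappa=2.0)
best_x, best_y = max(history, key=lambda p: p[1])
print(f"best x = {best_x:.2f}, y = {best_y:.3f} after {len(history)} measurements")
```

In practice the surrogate would be a trained ML model with a principled uncertainty estimate; the loop structure (fit, score, measure, repeat) is the part that carries over.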
  • 13. Example application: shape-memory alloys with low transition temperature and hysteresis. Using an adaptive design strategy, one can reduce the number of measurements needed to find all Pareto-optimal shape-memory alloys. Reference: Gubernatis, J. E. & Lookman, T. Machine learning in materials design and discovery: Examples from the present and suggestions for the future. Phys. Rev. Materials 2, 120301 (2018).
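For readers unfamiliar with the term, "Pareto-optimal" on slide 13 means that no other candidate is at least as good in every objective and strictly better in one. A minimal sketch follows, assuming both objectives (transition temperature and hysteresis) are to be minimized; the alloy values are invented purely for illustration.

```python
def dominates(a, b):
    """True if a is at least as good as b in every objective and strictly
    better in at least one (all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Return the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (transition temperature in K, hysteresis in K) pairs
alloys = [
    (320.0, 12.0),
    (300.0, 15.0),
    (310.0, 9.0),
    (330.0, 8.0),
    (305.0, 20.0),
    (340.0, 8.5),
]
front = pareto_front(alloys)
print(front)  # [(300.0, 15.0), (310.0, 9.0), (330.0, 8.0)]
```

An adaptive design strategy aims to identify this front while measuring only a fraction of the candidate set, rather than evaluating every point as done here.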
  • 14. Example from our group: Rocketsled for automated computational searches. Rocketsled can help find optimal solutions using far fewer computations overall (less CPU), parallelized over supercomputers (less time). Reference: Dunn, A., Brenneck, J. & Jain, A. Rocketsled: a software library for optimizing high-throughput computational searches. J. Phys. Mater. 2, 034002 (2019).
  • 15. Some opportunities for accelerating materials design using machine learning techniques. Accelerated materials design: ML surrogates for expt / comp.; “self-driving laboratories”; opportunities in natural language processing
  • 16. Natural language processing: background • Most materials science data and knowledge exists only in unstructured formats (e.g., as text in journal publications) • Can we make use of knowledge in text format?
  • 17. Example: synthesis planning based on text mining. References: 1. Kim, E. et al. Data Descriptor: Machine-learned and codified synthesis parameters of oxide materials. Scientific Data 1–9 (2017). 2. Kim, E. et al. Materials Synthesis Insights from Scientific Literature via Text Extraction and Machine Learning. Chemistry of Materials (2017), doi:10.1021/acs.chemmater.7b03500.
  • 18. Example from our group: using NN to predict “gaps” in materials discoveries. Using word2vec on a database of 3 million materials science abstracts, we can predict which words should co-occur with one another. This can be used to predict materials that should be studied for functional applications (“gaps” in the research literature). Reference: Tshitoyan V., Dagdelen J., Weston L., Dunn A., Rong Z., Kononova O., Persson K., Ceder G., Jain A. Unsupervised word embeddings capture latent knowledge from materials science literature. Accepted / in press, Nature
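One common way to query word embeddings for the kind of "gap" described on slide 18 is to rank material words by cosine similarity to an application word and skip materials already associated with it. The sketch below assumes that approach; the three-dimensional vectors and the `already_studied` set are made up for illustration, whereas real embeddings would be trained on the abstract corpus and have far more dimensions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Made-up toy embeddings (hypothetical values, not trained vectors)
embeddings = {
    "thermoelectric": [0.9, 0.1, 0.2],
    "Bi2Te3": [0.8, 0.2, 0.1],
    "PbTe": [0.7, 0.1, 0.3],
    "LiCoO2": [0.1, 0.9, 0.2],
    "NaCl": [0.1, 0.2, 0.9],
}

# Hypothetical: materials the literature already links to the application
already_studied = {"Bi2Te3"}


def rank_candidates(application, embeddings, exclude):
    """Rank words by similarity to the application word, most similar first,
    skipping the application word itself and anything in `exclude`."""
    target = embeddings[application]
    scores = {
        word: cosine(target, vector)
        for word, vector in embeddings.items()
        if word != application and word not in exclude
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranked = rank_candidates("thermoelectric", embeddings, already_studied)
for word, score in ranked:
    print(f"{word}: {score:.3f}")
```

With these toy vectors the top-ranked unexplored candidate is "PbTe", simply because its vector was constructed to point in nearly the same direction as "thermoelectric".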
  • 19. Challenges • Data availability – Typical materials data sets range from about a dozen examples to a few thousand; it is rare to have 100,000 data points – No standard data sets to build models on (analogous to, e.g., ImageNet)
  • 20. Challenges • Data heterogeneity – There is no single data type (e.g., image data, spectral data, graph data) – Different materials problems have their own data types, often ones unknown in computer science (e.g., periodic crystal structures)
  • 21. Challenges • ML model extrapolation – Almost all industry ML focuses on interpolation-type problems (data on almost all representative examples is in place) – Materials science requires extrapolation of very complex physics – Standard cross-validation is likely insufficient (e.g., is cluster-based cross-validation better?) – ML interpretability would build confidence in extrapolation
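To make the cross-validation point on slide 21 concrete, the sketch below compares two ways of splitting the same toy data: interleaved folds, where every test point has close neighbors in the training set, and cluster-based folds, where a whole region is held out so the model must extrapolate. Everything here is an illustrative assumption: the linear toy data, the 1-nearest-neighbor model, and the cluster definition (blocks of ten consecutive x values) are hypothetical and chosen only to keep the example small.

```python
def one_nn_predict(x, train):
    """Predict y for x using the single nearest training point."""
    return min(train, key=lambda point: abs(point[0] - x))[1]


def grouped_cv_mae(data, groups):
    """Hold out one group at a time, predict its points from the rest,
    and return the mean absolute error over all held-out points."""
    errors = []
    for held_out in sorted(set(groups)):
        train = [d for d, g in zip(data, groups) if g != held_out]
        test = [d for d, g in zip(data, groups) if g == held_out]
        errors += [abs(one_nn_predict(x, train) - y) for x, y in test]
    return sum(errors) / len(errors)


# Toy data set: y = 2x for x = 0..29
data = [(x, 2.0 * x) for x in range(30)]

# Interleaved folds: each test point has neighbors in the training set
interleaved_groups = [i % 3 for i in range(len(data))]

# Cluster-based folds: each fold is a contiguous block of x values
cluster_groups = [x // 10 for x, _ in data]

mae_interleaved = grouped_cv_mae(data, interleaved_groups)
mae_cluster = grouped_cv_mae(data, cluster_groups)
print(f"interleaved CV MAE: {mae_interleaved:.2f}")
print(f"cluster-based CV MAE: {mae_cluster:.2f}")
```

The cluster-based error is several times larger than the interleaved one for the same model and data, which is the sense in which standard cross-validation can overstate how well a model will do on genuinely new chemistries.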
  • 22. Some relevant groups at LBNL • Kristin Persson (ESDR) – materials databases, ML • Shyam Dwaraknath (ESDR) – ML for characterization • Juli Mueller (CRD) – active learning • Dani Ushizima (CRD) – classifying materials image data • Tess Smidt (CRD) – crystal structure models for ML • Emory Chan (MSD) – automated experiments • Colin Ophus (MSD) – TEM image labeling • Gerbrand Ceder (MSD) – text mining / NLP of synthesis