SlideShare una empresa de Scribd logo
1 de 55
Descargar para leer sin conexión
Automating materials science workflows with
pymatgen, FireWorks, and atomate
Anubhav Jain
Energy Technologies Area
Lawrence Berkeley National Laboratory
Berkeley, CA
ASE/Fireworks workshop 2021
Slides (already) posted to hackingmaterials.lbl.gov
2
“Civilization advances by extending the
number of important operations which we
can perform without thinking about them.”
- Alfred North Whitehead
Outline
3
① Overview of vision and goals
② Case studies of usage
③ Current implementation
④ Future developments
⑤ Getting started
A “black-box” view of performing a calculation
4
“something”
Results!
researcher
What is the
GGA-PBE elastic
tensor of GaAs?
Unfortunately, the inside of the “black box”
is usually tedious and “low-level”
5
lots of tedious,
low-level work…
Results!
researcher
What is the
GGA-PBE elastic
tensor of GaAs?
Input file flags
SLURM format
how to fix ZPOTRF?
q set up the structure coordinates
q write input files, double-check all
the flags
q copy to supercomputer
q submit job to queue
q deal with supercomputer
headaches
q monitor job
q fix error jobs, resubmit to queue,
wait again
q repeat process for subsequent
calculations in workflow
q parse output files to obtain results
q copy and organize results, e.g., into
Excel
All the low-level steps can lead to errors!
Let’s take a look at two alternate universes:
It is too easy to end up in
the wrong timeline!
6
you have coffee
copy files from
previous simulation
edit 5 lines
run simulation,
analyze data
forget coffee
copy files from
previous simulation
edit 4 lines
but forget
LHFCALC=F
run simulation,
looks fine at first,
in a month you
discover it was wrong
1
2
you
Today, it is difficult to learn and apply several computational
procedures due to steep learning curve
Because of the multiple low-level steps and subtleties that all
need to be done correctly to apply computational methods,
there is often a single group “expert” for each technique
7
“Alice knows how to do charged defect calculations.”
“Bob is the one who can properly converge GW runs.”
“Olga has all the scripts for phonon calculations.”
What would be a better way?
8
“something”
Results!
researcher
What is the
GGA-PBE elastic
tensor of GaAs?
What would be a better way?
9
Results!
researcher
What is the
GGA-PBE elastic
tensor of GaAs?
Workflows to run
q band structure
q surface energies
ü elastic tensor
q Raman spectrum
q QH thermal expansion
q spin-orbit coupling
Ideally the method should scale to millions of calculations
10
Results!
researcher
Start with all binary
oxides, replace O->S,
run several different
properties
Workflows to run
ü band structure
ü surface energies
ü elastic tensor
q Raman spectrum
q QH thermal expansion
q spin-orbit coupling
Atomate’s vision is to make it easy, automatic, and flexible to
generate data with existing simulation packages
11
Results!
researcher
Run many different
properties of many
different materials!
Outline
12
① Overview of vision and goals
② Case studies of usage
③ Current implementation
④ Future developments
⑤ Getting started
Example: The Materials Project database
13
Jain*, Ong*, Hautier, Chen, Richards, Dacek, Cholia, Gunter, Skinner, Ceder, and
Persson, APL Mater., 2013, 1, 011002. *equal contributions
The Materials Project (http://www.materialsproject.org)
Free
>200,000 registered users
around the world
>150,000 compounds calculated
multiple properties / compound
Data includes:
• thermodynamic props.
• electronic band structure
• aqueous stability (E-pH)
• elasticity tensors
• piezoelectric tensors
• electrolyte molecules
•much more
~1 billion CPU-hours invested
14
Example: Open Catalyst Project
>1.2 million DFT
adsorbate relaxations
>250 million single
point DFT calculations
15
Example: many open data sets generated with high-
throughput computing
Outline
16
① Overview of vision and goals
② Case studies of usage
③ Current implementation
④ Future developments
⑤ Getting started
Steps for theory &
simulation
17
material
generation
simulation
procedure
calculation
&
execution
data
analysis
(crystal, surface/interface,
molecule, etc...)
simulation parameters
and sequence (workflow)
job management,
execution, error recovery
databases, plotting,
analysis, insight
Software toolkits and frameworks for theory / calculation
18
(automatic materials
science workflows*)
Custodian
(calculation error
recovery)
(materials analysis
framework)
Base packages
Derived packages
(workflow definition &
execution)
(materials data mining)
GASpy*
(automatic surface
workflows)
*Ulissi group
Other tools like ASE, luigi, etc.
*mainly
VASP, some
LAMMPS
(auto optimization on
HPC resources)
Software stack for
theory & simulation
19
material
generation
simulation
procedure
calculation
&
execution
data
analysis
Custodian
Steps for theory &
simulation
20
material
generation
simulation
procedure
calculation
&
execution
data
analysis
(crystal, surface/interface,
molecule, etc...)
simulation parameters
and sequence (workflow)
job management,
execution, error recovery
databases, plotting,
analysis, insight
Step 1: materials
generation
21
material
generation
• Pymatgen can help generate
models for:
– crystal structures
– molecules
– systems (surfaces, interfaces, etc.)
• Tools include:
– order-disorder (shown at right) and
SQS
– interstitial finding
– surface / slab generation
– structure matching and analysis
– get structures from Materials Project
• Won’t cover materials generation
in this presentation!
Example: Order-disorder
resolve partial or mixed
occupancies into a fully
ordered crystal structure
(e.g., mixed oxide-fluoride site
into separate oxygen/fluorine)
Steps for theory &
simulation
22
material
generation
simulation
procedure
calculation
&
execution
data
analysis
(crystal, surface/interface,
molecule, etc...)
simulation parameters
and sequence (workflow)
job management,
execution, error recovery
databases, plotting,
analysis, insight
Atomate contains a library of simulation procedures
23
VASP-based
• band structure
• spin-orbit coupling
• hybrid functional
calcs
• elastic tensor
• piezoelectric tensor
• Raman spectra
• NEB
• GIBBS method
• QH thermal
expansion
• AIMD
• ferroelectric
• surface adsorption
• work functions
• NMR spectra
• Bader charges
• Magnetic
orderings
• SCAN functionals
Other
• BoltzTraP
• FEFF method
• Q-Chem
In progress
• AMSET e-
transport
• HiPhive for
phonons
Mathew, K. et al Atomate: A high-level interface to generate, execute, and analyze
computational materials science workflows, Comput. Mater. Sci. 139 (2017) 140–152.
Each simulation procedure in atomate is composed of
multiple levels of detail / abstraction
24
Workflow – complete set of calculations to get a materials property
Firework – one step in the Workflow (typically one DFT calculation)
Firetask – one step in a Firework
Starting with just a crystal
structure, this workflow performs
four calculations to get an
optimized structure, optimized
charge density, and band
structure on two types of grids
(uniform and line)
Workflow parameters can be customized at
multiple levels of detail
25
1. Workflows have
various high-level
options
2. Fireworks also
have options / flags
(not shown)
3. Firetasks have
most detailed
number of options /
flags (not shown)
Example 1: “VASP input set”
controls the rules that set DFT
parameters (pseudopotentials,
cutoffs, grid densities, etc) via
pymatgen
Example II: If “stability_check” is
enabled, the later parts of the
workflow are skipped if the
structure is determined unstable to
save computer time on
uninteresting structures
You can build workflows from scratch or reuse components
to assemble workflows
Multiple workflows are built with the same
components stacked together in different ways
26
These two workflows reuse almost
all the same code between the
two!
atomate allows you to leverage the prior efforts and
knowledge of many researchers
27
K. Mathew J. Montoya S. Dwaraknath A. Faghaninia
All past and present knowledge, from everyone in the group,
everyone previously in the group, and our collaborators,
about how to run calculations
M. Aykol
S.P. Ong
B. Bocklund T. Smidt
H. Tang I.H. Chu M. Horton J. Dagdalen B. Wood
Z.K. Liu J. Neaton K. Persson A. Jain
+
Steps for theory &
simulation
28
material
generation
simulation
procedure
calculation
&
execution
data
analysis
(crystal, surface/interface,
molecule, etc...)
simulation parameters
and sequence (workflow)
job management,
execution, error recovery
databases, plotting,
analysis, insight
FireWorks allows you to write your workflow once and
execute (almost) anywhere
29
• Execute workflows locally
or at a supercomputing
center
• Queue systems supported
– PBS
– SGE
– SLURM
– IBM LoadLeveler
– NEWT (a REST-based API at
NERSC)
– Cobalt (Argonne LCF)
• Cloud based services (user-
generated)
– https://github.com/CovertLa
b/borealis
Dashboard with status of all jobs
30
Job provenance and automatic metadata storage
31
what machine
what time
what directory
what was the output
when was it queued
when did it start running
when was it completed
Detect and rerun failures
• All kinds of failures can be detected and rerun
– Soft failures (job quits with error code)
– hard failures (computing center goes down)
– human errors
32
Customize job priorities
• Within workflow, or between workflows
• Completely flexible and can be modified /
updated whenever you want
33
Track output files remotely
• Can bring up the last few lines of each of your
output files – and can be combined with queries
/ filters
34
Now seems like a
good time to bring
up the last few lines
of the OUTCAR of
all failed jobs...
FireWorks workflow software: the main ideas
• FireWorks is used to execute Workflow objects at
supercomputing centers
• It is very well-suited to high-throughput applications
and has been used to execute millions of jobs and is
well tested
• There are many features and advantages to using
FireWorks as your workflow manager, which has
made it one of the most popular scientific workflow
software in use today
35
Steps for theory &
simulation
36
material
generation
simulation
procedure
calculation
&
execution
data
analysis
(crystal, surface/interface,
molecule, etc...)
simulation parameters
and sequence (workflow)
job management,
execution, error recovery
databases, plotting,
analysis, insight
Atomate – builders
framework
37
“Builders” start with base
collections in a database and
create higher-level collections
that summarize information or
add metadata
Examples of analyses
38
phase diagrams
Pourbaix diagrams
diffusivity from MD
band structure analysis
Doing simulations programmatically can become second
nature – and very fast / automatic / scalable
39
material
generation
simulation
procedure
calculation
&
execution
data
analysis
Custodian
Results!
researcher
What is the
GGA-PBE elastic
tensor of GaAs?
software infrastructure
Outline
40
① Vision of atomate
② History of atomate
③ Current implementation
④ Future developments
⑤ Getting started
• After using atomate for ~4-5 years, we realize there are still things
that are awkward to do. For example:
– Data flow between workflows requires manual thinking to manage
– Reusing workflow components with slight modifications often requires code
duplication
• We decided to add a library on top of FireWorks called “jobflow”
that helps make workflows more programmable
– https://github.com/materialsproject/jobflow
• We suggest:
– Try using jobflow when building on top of FireWorks (no need to use
pymatgen or any other specific codes)
– If wanting to use atomate, be on the lookout for atomate2 which will use
jobflow
– We are also working with the ABINIT team (G.M. Rignanese) to incorporate
these new ideas into ABINIT workflows
41
Atomate v2 – a “jobflow” layer to FireWorks
42
Jobflow idea #1: More explicit data dependencies
Can refer to output of a job before the
data is ever created. Makes chaining of
workflows and Fireworks much easier
Automatically infer
workflow diagram if desired
(based on data
dependencies)
43
Jobflow idea #2: Maker classes to enhance code reusability
• Most of “atomate” (and all the various software tools) are
largely programmatic, not interactive / GUI-driven
• The goal of atomate was always to work up to a visual
interface
– Stage 1: Exploring results using a web interface rather than by
database queries
– Stage 2: Submitting pre-defined workflow templates using a web
interface
– Stage 3: Designing and submitting custom workflows using a web
interface
• Currently working on stage 1
44
Atomate GUI
45
Atomate GUI prototype
Done by UCB undergrad
But really requires a dedicated, focused effort
Outline
46
① Vision of atomate
② History of atomate
③ Current implementation
④ Future developments
⑤ Getting started
• Papers: good general overview / vision but of no
practical help
• Online MP Workshop videos and tutorials: good way to
get started if you have at least some knowledge, but
usually not very deep. A “zero to one” situation
• Code documentation: Most comprehensive way to
learn, although some experience probably necessary
• Help channels: Great if you already started and run
into problems
47
What resources are available to learn?
• Resources on http://workshop.materialsproject.org
• Videos on MP channel of Youtube:
– https://www.youtube.com/channel/UC6pqY-__Nu-
mKkv0LMP8FJQ
48
Online MP workshop and videos
Online documentation
• Online documentation is the most comprehensive
writeup
– www.pymatgen.org
– https://materialsproject.github.io/fireworks/
– https://materialsproject.github.io/custodian/
– https://hackingmaterials.github.io/atomate/
• The online documentation includes installation, examples,
tutorials, and descriptions of how to use the code
• If you want to do “everything”, suggest starting with
atomate and going from there
49
Help lists
• A help forum for each code is at
https://discuss.matsci.org
50
Note: matsci.org now
includes help forums
for ~25 different
codes!
Including LAMMPS,
ASE, NOMAD, and
many more
Questions? Comments?
A big thank you to all the contributors of the various
codes!
• Pymatgen: 182 contributors
– https://github.com/materialsproject/pymatgen/graphs/contributors
• Fireworks: 46 contributors
– https://github.com/materialsproject/fireworks/graphs/contributors
• Atomate: 36 contributors
– https://github.com/hackingmaterials/atomate/graphs/contributors
Funding through the U.S. Department of Energy, Basic
Energy Sciences, Materials Science Division
51
Slides (already) posted to hackingmaterials.lbl.gov
52
Appendix slides
Network security issues
53
LAUNCHPAD
(MongoDB)
FIREWORKER
(computing resource)
LAUNCHPAD
(MongoDB)
FIREWORKER
(computing resource)
LAUNCHPAD
(MongoDB)
FIREWORKER
(computing resource)
LaunchPad and FireWorker within
the same network firewall
à Works great
LaunchPad and FireWorker
separated by firewall, BUT login
node of FireWorker is open to
MongoDB connection
à Works great if you have a MOM
node type structure
à Otherwise “offline” mode is a non-
ideal but viable option
LaunchPad and FireWorker
separated by firewall, no
communication allowed
à Doesn’t work!
2
4
6
0 250 500 750 1000
# Jobs
Jobs/second
command
mlaunch
rlaunch
1 workflow 5 workflows
0.0
0.2
0.4
0.6
0.0
0.2
0.4
0.6
1
client
8
clients
2
0
0
4
0
0
6
0
0
8
0
0
1
0
0
0
2
0
0
4
0
0
6
0
0
8
0
0
1
0
0
0
Number of tasks
Seconds
per
task
Workflow pattern
pairwise
parallel
reduce
sequence
Performance issues
• Tests indicate the FireWorks can handle a throughput of
about 6-7 jobs finishing per second
• Overhead is 0.1-1 sec per task
• Recently changes might enhance speed, but not tested
54
Job packing issues
• Computing center issues
– Almost all computing centers limit the number of
“mpirun”-style commands that can be executed
within a single job
– Typically, this sets a limit to the degree of job packing
that can be achieved
– Currently, no good solution; may need to work on
“hacking” the MPI communicator. e.g., “wraprun” is
one effort at Oak Ridge.
55

Más contenido relacionado

La actualidad más candente

Software tools for calculating materials properties in high-throughput (pymat...
Software tools for calculating materials properties in high-throughput (pymat...Software tools for calculating materials properties in high-throughput (pymat...
Software tools for calculating materials properties in high-throughput (pymat...Anubhav Jain
 
Machine learning for materials design: opportunities, challenges, and methods
Machine learning for materials design: opportunities, challenges, and methodsMachine learning for materials design: opportunities, challenges, and methods
Machine learning for materials design: opportunities, challenges, and methodsAnubhav Jain
 
Available methods for predicting materials synthesizability using computation...
Available methods for predicting materials synthesizability using computation...Available methods for predicting materials synthesizability using computation...
Available methods for predicting materials synthesizability using computation...Anubhav Jain
 
Combining density functional theory calculations, supercomputing, and data-dr...
Combining density functional theory calculations, supercomputing, and data-dr...Combining density functional theory calculations, supercomputing, and data-dr...
Combining density functional theory calculations, supercomputing, and data-dr...Anubhav Jain
 
Software tools, crystal descriptors, and machine learning applied to material...
Software tools, crystal descriptors, and machine learning applied to material...Software tools, crystal descriptors, and machine learning applied to material...
Software tools, crystal descriptors, and machine learning applied to material...Anubhav Jain
 
Efficient methods for accurately calculating thermoelectric properties – elec...
Efficient methods for accurately calculating thermoelectric properties – elec...Efficient methods for accurately calculating thermoelectric properties – elec...
Efficient methods for accurately calculating thermoelectric properties – elec...Anubhav Jain
 
Materials Design in the Age of Deep Learning and Quantum Computation
Materials Design in the Age of Deep Learning and Quantum ComputationMaterials Design in the Age of Deep Learning and Quantum Computation
Materials Design in the Age of Deep Learning and Quantum ComputationKAMAL CHOUDHARY
 
Software tools to facilitate materials science research
Software tools to facilitate materials science researchSoftware tools to facilitate materials science research
Software tools to facilitate materials science researchAnubhav Jain
 
Progress in Natural Language Processing of Materials Science Text
Progress in Natural Language Processing of Materials Science TextProgress in Natural Language Processing of Materials Science Text
Progress in Natural Language Processing of Materials Science Textaimsnist
 
Machine Learning in Chemistry: Part II
Machine Learning in Chemistry: Part IIMachine Learning in Chemistry: Part II
Machine Learning in Chemistry: Part IIJon Paul Janet
 
Computational Discovery of Two-Dimensional Materials, Evaluation of Force-Fie...
Computational Discovery of Two-Dimensional Materials, Evaluation of Force-Fie...Computational Discovery of Two-Dimensional Materials, Evaluation of Force-Fie...
Computational Discovery of Two-Dimensional Materials, Evaluation of Force-Fie...KAMAL CHOUDHARY
 
Classical force fields as physics-based neural networks
Classical force fields as physics-based neural networksClassical force fields as physics-based neural networks
Classical force fields as physics-based neural networksaimsnist
 
PPT thesis defense_nikhil
PPT thesis defense_nikhilPPT thesis defense_nikhil
PPT thesis defense_nikhilNikhil Jain
 
TMS workshop on machine learning in materials science: Intro to deep learning...
TMS workshop on machine learning in materials science: Intro to deep learning...TMS workshop on machine learning in materials science: Intro to deep learning...
TMS workshop on machine learning in materials science: Intro to deep learning...BrianDeCost
 
The Materials Project: overview and infrastructure
The Materials Project: overview and infrastructureThe Materials Project: overview and infrastructure
The Materials Project: overview and infrastructureAnubhav Jain
 
Density Functional and Dynamical Mean-Field Theory (DFT+DMFT) method and its ...
Density Functional and Dynamical Mean-Field Theory (DFT+DMFT) method and its ...Density Functional and Dynamical Mean-Field Theory (DFT+DMFT) method and its ...
Density Functional and Dynamical Mean-Field Theory (DFT+DMFT) method and its ...ABDERRAHMANE REGGAD
 
Non-equilibrium molecular dynamics with LAMMPS
Non-equilibrium molecular dynamics with LAMMPSNon-equilibrium molecular dynamics with LAMMPS
Non-equilibrium molecular dynamics with LAMMPSAndrea Benassi
 
OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphs
OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge GraphsOBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphs
OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphsdgarijo
 

La actualidad más candente (20)

Software tools for calculating materials properties in high-throughput (pymat...
Software tools for calculating materials properties in high-throughput (pymat...Software tools for calculating materials properties in high-throughput (pymat...
Software tools for calculating materials properties in high-throughput (pymat...
 
Machine learning for materials design: opportunities, challenges, and methods
Machine learning for materials design: opportunities, challenges, and methodsMachine learning for materials design: opportunities, challenges, and methods
Machine learning for materials design: opportunities, challenges, and methods
 
Available methods for predicting materials synthesizability using computation...
Available methods for predicting materials synthesizability using computation...Available methods for predicting materials synthesizability using computation...
Available methods for predicting materials synthesizability using computation...
 
Combining density functional theory calculations, supercomputing, and data-dr...
Combining density functional theory calculations, supercomputing, and data-dr...Combining density functional theory calculations, supercomputing, and data-dr...
Combining density functional theory calculations, supercomputing, and data-dr...
 
Software tools, crystal descriptors, and machine learning applied to material...
Software tools, crystal descriptors, and machine learning applied to material...Software tools, crystal descriptors, and machine learning applied to material...
Software tools, crystal descriptors, and machine learning applied to material...
 
Efficient methods for accurately calculating thermoelectric properties – elec...
Efficient methods for accurately calculating thermoelectric properties – elec...Efficient methods for accurately calculating thermoelectric properties – elec...
Efficient methods for accurately calculating thermoelectric properties – elec...
 
Materials Design in the Age of Deep Learning and Quantum Computation
Materials Design in the Age of Deep Learning and Quantum ComputationMaterials Design in the Age of Deep Learning and Quantum Computation
Materials Design in the Age of Deep Learning and Quantum Computation
 
Software tools to facilitate materials science research
Software tools to facilitate materials science researchSoftware tools to facilitate materials science research
Software tools to facilitate materials science research
 
Progress in Natural Language Processing of Materials Science Text
Progress in Natural Language Processing of Materials Science TextProgress in Natural Language Processing of Materials Science Text
Progress in Natural Language Processing of Materials Science Text
 
Machine Learning in Chemistry: Part II
Machine Learning in Chemistry: Part IIMachine Learning in Chemistry: Part II
Machine Learning in Chemistry: Part II
 
Computational Discovery of Two-Dimensional Materials, Evaluation of Force-Fie...
Computational Discovery of Two-Dimensional Materials, Evaluation of Force-Fie...Computational Discovery of Two-Dimensional Materials, Evaluation of Force-Fie...
Computational Discovery of Two-Dimensional Materials, Evaluation of Force-Fie...
 
Classical force fields as physics-based neural networks
Classical force fields as physics-based neural networksClassical force fields as physics-based neural networks
Classical force fields as physics-based neural networks
 
PPT thesis defense_nikhil
PPT thesis defense_nikhilPPT thesis defense_nikhil
PPT thesis defense_nikhil
 
TMS workshop on machine learning in materials science: Intro to deep learning...
TMS workshop on machine learning in materials science: Intro to deep learning...TMS workshop on machine learning in materials science: Intro to deep learning...
TMS workshop on machine learning in materials science: Intro to deep learning...
 
Approximations in DFT
Approximations in DFTApproximations in DFT
Approximations in DFT
 
Introduction to DFT Part 2
Introduction to DFT Part 2Introduction to DFT Part 2
Introduction to DFT Part 2
 
The Materials Project: overview and infrastructure
The Materials Project: overview and infrastructureThe Materials Project: overview and infrastructure
The Materials Project: overview and infrastructure
 
Density Functional and Dynamical Mean-Field Theory (DFT+DMFT) method and its ...
Density Functional and Dynamical Mean-Field Theory (DFT+DMFT) method and its ...Density Functional and Dynamical Mean-Field Theory (DFT+DMFT) method and its ...
Density Functional and Dynamical Mean-Field Theory (DFT+DMFT) method and its ...
 
Non-equilibrium molecular dynamics with LAMMPS
Non-equilibrium molecular dynamics with LAMMPSNon-equilibrium molecular dynamics with LAMMPS
Non-equilibrium molecular dynamics with LAMMPS
 
OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphs
OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge GraphsOBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphs
OBA: An Ontology-Based Framework for Creating REST APIs for Knowledge Graphs
 

Similar a Automating materials science workflows with pymatgen, FireWorks, and atomate

Atomate: a tool for rapid high-throughput computing and materials discovery
Atomate: a tool for rapid high-throughput computing and materials discoveryAtomate: a tool for rapid high-throughput computing and materials discovery
Atomate: a tool for rapid high-throughput computing and materials discoveryAnubhav Jain
 
Atomate: a high-level interface to generate, execute, and analyze computation...
Atomate: a high-level interface to generate, execute, and analyze computation...Atomate: a high-level interface to generate, execute, and analyze computation...
Atomate: a high-level interface to generate, execute, and analyze computation...Anubhav Jain
 
Materials Project computation and database infrastructure
Materials Project computation and database infrastructureMaterials Project computation and database infrastructure
Materials Project computation and database infrastructureAnubhav Jain
 
Software tools for high-throughput materials data generation and data mining
Software tools for high-throughput materials data generation and data miningSoftware tools for high-throughput materials data generation and data mining
Software tools for high-throughput materials data generation and data miningAnubhav Jain
 
Software tools, crystal descriptors, and machine learning applied to material...
Software tools, crystal descriptors, and machine learning applied to material...Software tools, crystal descriptors, and machine learning applied to material...
Software tools, crystal descriptors, and machine learning applied to material...Anubhav Jain
 
Malo Denielou - No shard left behind: Dynamic work rebalancing in Apache Beam
Malo Denielou - No shard left behind: Dynamic work rebalancing in Apache BeamMalo Denielou - No shard left behind: Dynamic work rebalancing in Apache Beam
Malo Denielou - No shard left behind: Dynamic work rebalancing in Apache BeamFlink Forward
 
Challenges on Distributed Machine Learning
Challenges on Distributed Machine LearningChallenges on Distributed Machine Learning
Challenges on Distributed Machine Learningjie cao
 
Wait-free data structures on embedded multi-core systems
Wait-free data structures on embedded multi-core systemsWait-free data structures on embedded multi-core systems
Wait-free data structures on embedded multi-core systemsMenlo Systems GmbH
 
The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...
The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...
The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...t_ivanov
 
Advances in Scientific Workflow Environments
Advances in Scientific Workflow EnvironmentsAdvances in Scientific Workflow Environments
Advances in Scientific Workflow EnvironmentsCarole Goble
 
Overview of Scientific Workflows - Why Use Them?
Overview of Scientific Workflows - Why Use Them?Overview of Scientific Workflows - Why Use Them?
Overview of Scientific Workflows - Why Use Them?inside-BigData.com
 
FireWorks overview
FireWorks overviewFireWorks overview
FireWorks overviewAnubhav Jain
 
Ehsan parallel accelerator-dec2015
Ehsan parallel accelerator-dec2015Ehsan parallel accelerator-dec2015
Ehsan parallel accelerator-dec2015Christian Peel
 
Flink Forward SF 2017: Malo Deniélou - No shard left behind: Dynamic work re...
Flink Forward SF 2017: Malo Deniélou -  No shard left behind: Dynamic work re...Flink Forward SF 2017: Malo Deniélou -  No shard left behind: Dynamic work re...
Flink Forward SF 2017: Malo Deniélou - No shard left behind: Dynamic work re...Flink Forward
 

Similar a Automating materials science workflows with pymatgen, FireWorks, and atomate (20)

Atomate: a tool for rapid high-throughput computing and materials discovery
Atomate: a tool for rapid high-throughput computing and materials discoveryAtomate: a tool for rapid high-throughput computing and materials discovery
Atomate: a tool for rapid high-throughput computing and materials discovery
 
Atomate: a high-level interface to generate, execute, and analyze computation...
Atomate: a high-level interface to generate, execute, and analyze computation...Atomate: a high-level interface to generate, execute, and analyze computation...
Atomate: a high-level interface to generate, execute, and analyze computation...
 
Materials Project computation and database infrastructure
Materials Project computation and database infrastructureMaterials Project computation and database infrastructure
Materials Project computation and database infrastructure
 
Software tools for high-throughput materials data generation and data mining
Software tools for high-throughput materials data generation and data miningSoftware tools for high-throughput materials data generation and data mining
Software tools for high-throughput materials data generation and data mining
 
Software tools, crystal descriptors, and machine learning applied to material...
Software tools, crystal descriptors, and machine learning applied to material...Software tools, crystal descriptors, and machine learning applied to material...
Software tools, crystal descriptors, and machine learning applied to material...
 
MAVRL Workshop 2014 - Python Materials Genomics (pymatgen)
MAVRL Workshop 2014 - Python Materials Genomics (pymatgen)MAVRL Workshop 2014 - Python Materials Genomics (pymatgen)
MAVRL Workshop 2014 - Python Materials Genomics (pymatgen)
 
Malo Denielou - No shard left behind: Dynamic work rebalancing in Apache Beam
Malo Denielou - No shard left behind: Dynamic work rebalancing in Apache BeamMalo Denielou - No shard left behind: Dynamic work rebalancing in Apache Beam
Malo Denielou - No shard left behind: Dynamic work rebalancing in Apache Beam
 
Challenges on Distributed Machine Learning
Challenges on Distributed Machine LearningChallenges on Distributed Machine Learning
Challenges on Distributed Machine Learning
 
Wait-free data structures on embedded multi-core systems
Wait-free data structures on embedded multi-core systemsWait-free data structures on embedded multi-core systems
Wait-free data structures on embedded multi-core systems
 
2017 nov reflow sbtb
2017 nov reflow sbtb2017 nov reflow sbtb
2017 nov reflow sbtb
 
Intro_2.ppt
Intro_2.pptIntro_2.ppt
Intro_2.ppt
 
Intro.ppt
Intro.pptIntro.ppt
Intro.ppt
 
Intro.ppt
Intro.pptIntro.ppt
Intro.ppt
 
The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...
The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...
The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...
 
04 open source_tools
04 open source_tools04 open source_tools
04 open source_tools
 
Advances in Scientific Workflow Environments
Advances in Scientific Workflow EnvironmentsAdvances in Scientific Workflow Environments
Advances in Scientific Workflow Environments
 
Overview of Scientific Workflows - Why Use Them?
Overview of Scientific Workflows - Why Use Them?Overview of Scientific Workflows - Why Use Them?
Overview of Scientific Workflows - Why Use Them?
 
FireWorks overview
FireWorks overviewFireWorks overview
FireWorks overview
 
Ehsan parallel accelerator-dec2015
Ehsan parallel accelerator-dec2015Ehsan parallel accelerator-dec2015
Ehsan parallel accelerator-dec2015
 
Flink Forward SF 2017: Malo Deniélou - No shard left behind: Dynamic work re...
Flink Forward SF 2017: Malo Deniélou -  No shard left behind: Dynamic work re...Flink Forward SF 2017: Malo Deniélou -  No shard left behind: Dynamic work re...
Flink Forward SF 2017: Malo Deniélou - No shard left behind: Dynamic work re...
 

Más de Anubhav Jain

Discovering advanced materials for energy applications: theory, high-throughp...
Discovering advanced materials for energy applications: theory, high-throughp...Discovering advanced materials for energy applications: theory, high-throughp...
Discovering advanced materials for energy applications: theory, high-throughp...Anubhav Jain
 
Applications of Large Language Models in Materials Discovery and Design
Applications of Large Language Models in Materials Discovery and DesignApplications of Large Language Models in Materials Discovery and Design
Applications of Large Language Models in Materials Discovery and DesignAnubhav Jain
 
An AI-driven closed-loop facility for materials synthesis
An AI-driven closed-loop facility for materials synthesisAn AI-driven closed-loop facility for materials synthesis
An AI-driven closed-loop facility for materials synthesisAnubhav Jain
 
Best practices for DuraMat software dissemination
Best practices for DuraMat software disseminationBest practices for DuraMat software dissemination
Best practices for DuraMat software disseminationAnubhav Jain
 
Best practices for DuraMat software dissemination
Best practices for DuraMat software disseminationBest practices for DuraMat software dissemination
Best practices for DuraMat software disseminationAnubhav Jain
 
Natural Language Processing for Data Extraction and Synthesizability Predicti...
Natural Language Processing for Data Extraction and Synthesizability Predicti...Natural Language Processing for Data Extraction and Synthesizability Predicti...
Natural Language Processing for Data Extraction and Synthesizability Predicti...Anubhav Jain
 
Machine Learning for Catalyst Design
Machine Learning for Catalyst DesignMachine Learning for Catalyst Design
Machine Learning for Catalyst DesignAnubhav Jain
 
Natural language processing for extracting synthesis recipes and applications...
Natural language processing for extracting synthesis recipes and applications...Natural language processing for extracting synthesis recipes and applications...
Natural language processing for extracting synthesis recipes and applications...Anubhav Jain
 
Accelerating New Materials Design with Supercomputing and Machine Learning
Accelerating New Materials Design with Supercomputing and Machine LearningAccelerating New Materials Design with Supercomputing and Machine Learning
Accelerating New Materials Design with Supercomputing and Machine LearningAnubhav Jain
 
DuraMat CO1 Central Data Resource: How it started, how it’s going …
DuraMat CO1 Central Data Resource: How it started, how it’s going …DuraMat CO1 Central Data Resource: How it started, how it’s going …
DuraMat CO1 Central Data Resource: How it started, how it’s going …Anubhav Jain
 
The Materials Project
The Materials ProjectThe Materials Project
The Materials ProjectAnubhav Jain
 
Evaluating Chemical Composition and Crystal Structure Representations using t...
Evaluating Chemical Composition and Crystal Structure Representations using t...Evaluating Chemical Composition and Crystal Structure Representations using t...
Evaluating Chemical Composition and Crystal Structure Representations using t...Anubhav Jain
 
Perspectives on chemical composition and crystal structure representations fr...
Perspectives on chemical composition and crystal structure representations fr...Perspectives on chemical composition and crystal structure representations fr...
Perspectives on chemical composition and crystal structure representations fr...Anubhav Jain
 
Discovering and Exploring New Materials through the Materials Project
Discovering and Exploring New Materials through the Materials ProjectDiscovering and Exploring New Materials through the Materials Project
Discovering and Exploring New Materials through the Materials ProjectAnubhav Jain
 
The Materials Project: Applications to energy storage and functional materia...
The Materials Project: Applications to energy storage and functional materia...The Materials Project: Applications to energy storage and functional materia...
The Materials Project: Applications to energy storage and functional materia...Anubhav Jain
 
Machine Learning Platform for Catalyst Design
Machine Learning Platform for Catalyst DesignMachine Learning Platform for Catalyst Design
Machine Learning Platform for Catalyst DesignAnubhav Jain
 
Applications of Natural Language Processing to Materials Design
Applications of Natural Language Processing to Materials DesignApplications of Natural Language Processing to Materials Design
Applications of Natural Language Processing to Materials DesignAnubhav Jain
 
Assessing Factors Underpinning PV Degradation through Data Analysis
Assessing Factors Underpinning PV Degradation through Data AnalysisAssessing Factors Underpinning PV Degradation through Data Analysis
Assessing Factors Underpinning PV Degradation through Data AnalysisAnubhav Jain
 
The Status of ML Algorithms for Structure-property Relationships Using Matb...
The Status of ML Algorithms for Structure-property Relationships Using Matb...The Status of ML Algorithms for Structure-property Relationships Using Matb...
The Status of ML Algorithms for Structure-property Relationships Using Matb...Anubhav Jain
 
Progress Towards Leveraging Natural Language Processing for Collecting Experi...
Progress Towards Leveraging Natural Language Processing for Collecting Experi...Progress Towards Leveraging Natural Language Processing for Collecting Experi...
Progress Towards Leveraging Natural Language Processing for Collecting Experi...Anubhav Jain
 

Más de Anubhav Jain (20)

Discovering advanced materials for energy applications: theory, high-throughp...
Discovering advanced materials for energy applications: theory, high-throughp...Discovering advanced materials for energy applications: theory, high-throughp...
Discovering advanced materials for energy applications: theory, high-throughp...
 
Applications of Large Language Models in Materials Discovery and Design
Applications of Large Language Models in Materials Discovery and DesignApplications of Large Language Models in Materials Discovery and Design
Applications of Large Language Models in Materials Discovery and Design
 
An AI-driven closed-loop facility for materials synthesis
An AI-driven closed-loop facility for materials synthesisAn AI-driven closed-loop facility for materials synthesis
An AI-driven closed-loop facility for materials synthesis
 
Best practices for DuraMat software dissemination
Best practices for DuraMat software disseminationBest practices for DuraMat software dissemination
Best practices for DuraMat software dissemination
 
Best practices for DuraMat software dissemination
Best practices for DuraMat software disseminationBest practices for DuraMat software dissemination
Best practices for DuraMat software dissemination
 
Natural Language Processing for Data Extraction and Synthesizability Predicti...
Natural Language Processing for Data Extraction and Synthesizability Predicti...Natural Language Processing for Data Extraction and Synthesizability Predicti...
Natural Language Processing for Data Extraction and Synthesizability Predicti...
 
Machine Learning for Catalyst Design
Machine Learning for Catalyst DesignMachine Learning for Catalyst Design
Machine Learning for Catalyst Design
 
Natural language processing for extracting synthesis recipes and applications...
Natural language processing for extracting synthesis recipes and applications...Natural language processing for extracting synthesis recipes and applications...
Natural language processing for extracting synthesis recipes and applications...
 
Accelerating New Materials Design with Supercomputing and Machine Learning
Accelerating New Materials Design with Supercomputing and Machine LearningAccelerating New Materials Design with Supercomputing and Machine Learning
Accelerating New Materials Design with Supercomputing and Machine Learning
 
DuraMat CO1 Central Data Resource: How it started, how it’s going …
DuraMat CO1 Central Data Resource: How it started, how it’s going …DuraMat CO1 Central Data Resource: How it started, how it’s going …
DuraMat CO1 Central Data Resource: How it started, how it’s going …
 
The Materials Project
The Materials ProjectThe Materials Project
The Materials Project
 
Evaluating Chemical Composition and Crystal Structure Representations using t...
Evaluating Chemical Composition and Crystal Structure Representations using t...Evaluating Chemical Composition and Crystal Structure Representations using t...
Evaluating Chemical Composition and Crystal Structure Representations using t...
 
Perspectives on chemical composition and crystal structure representations fr...
Perspectives on chemical composition and crystal structure representations fr...Perspectives on chemical composition and crystal structure representations fr...
Perspectives on chemical composition and crystal structure representations fr...
 
Discovering and Exploring New Materials through the Materials Project
Discovering and Exploring New Materials through the Materials ProjectDiscovering and Exploring New Materials through the Materials Project
Discovering and Exploring New Materials through the Materials Project
 
The Materials Project: Applications to energy storage and functional materia...
The Materials Project: Applications to energy storage and functional materia...The Materials Project: Applications to energy storage and functional materia...
The Materials Project: Applications to energy storage and functional materia...
 
Machine Learning Platform for Catalyst Design
Machine Learning Platform for Catalyst DesignMachine Learning Platform for Catalyst Design
Machine Learning Platform for Catalyst Design
 
Applications of Natural Language Processing to Materials Design
Applications of Natural Language Processing to Materials DesignApplications of Natural Language Processing to Materials Design
Applications of Natural Language Processing to Materials Design
 
Assessing Factors Underpinning PV Degradation through Data Analysis
Assessing Factors Underpinning PV Degradation through Data AnalysisAssessing Factors Underpinning PV Degradation through Data Analysis
Assessing Factors Underpinning PV Degradation through Data Analysis
 
The Status of ML Algorithms for Structure-property Relationships Using Matb...
The Status of ML Algorithms for Structure-property Relationships Using Matb...The Status of ML Algorithms for Structure-property Relationships Using Matb...
The Status of ML Algorithms for Structure-property Relationships Using Matb...
 
Progress Towards Leveraging Natural Language Processing for Collecting Experi...
Progress Towards Leveraging Natural Language Processing for Collecting Experi...Progress Towards Leveraging Natural Language Processing for Collecting Experi...
Progress Towards Leveraging Natural Language Processing for Collecting Experi...
 

Último

Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsSérgio Sacani
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...RohitNehra6
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPirithiRaju
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSarthak Sekhar Mondal
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptxanandsmhk
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksSérgio Sacani
 
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisDiwakar Mishra
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...ssifa0344
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)Areesha Ahmad
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)Areesha Ahmad
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTSérgio Sacani
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfrohankumarsinghrore1
 
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINChromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINsankalpkumarsahoo174
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.Nitya salvi
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfSumit Kumar yadav
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Lokesh Kothari
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )aarthirajkumar25
 
fundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomologyfundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomologyDrAnita Sharma
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfSumit Kumar yadav
 

Último (20)

Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdf
 
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINChromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 
fundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomologyfundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomology
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdf
 

Automating materials science workflows with pymatgen, FireWorks, and atomate

  • 1. Automating materials science workflows with pymatgen, FireWorks, and atomate Anubhav Jain Energy Technologies Area Lawrence Berkeley National Laboratory Berkeley, CA ASE/Fireworks workshop 2021 Slides (already) posted to hackingmaterials.lbl.gov
  • 2. 2 “Civilization advances by extending the number of important operations which we can perform without thinking about them.” - Alfred North Whitehead
  • 3. Outline 3 ① Overview of vision and goals ② Case studies of usage ③ Current implementation ④ Future developments ⑤ Getting started
  • 4. A “black-box” view of performing a calculation 4 “something” Results! researcher What is the GGA-PBE elastic tensor of GaAs?
  • 5. Unfortunately, the inside of the “black box” is usually tedious and “low-level” 5 lots of tedious, low-level work… Results! researcher What is the GGA-PBE elastic tensor of GaAs? Input file flags SLURM format how to fix ZPOTRF? q set up the structure coordinates q write input files, double-check all the flags q copy to supercomputer q submit job to queue q deal with supercomputer headaches q monitor job q fix error jobs, resubmit to queue, wait again q repeat process for subsequent calculations in workflow q parse output files to obtain results q copy and organize results, e.g., into Excel
  • 6. All the low-level steps can lead to errors! Let’s take a look at two alternate universes: It is too easy to end up in the wrong timeline! 6 you have coffee copy files from previous simulation edit 5 lines run simulation, analyze data forget coffee copy files from previous simulation edit 4 lines but forget LHFCALC=F run simulation, looks fine at first, in a month you discover it was wrong 1 2 you
  • 7. Today, it is difficult to learn and apply several computational procedures due to steep learning curve Because of the multiple low-level steps and subtleties that all need to be done correctly to apply computational methods, there is often a single group “expert” for each technique 7 “Alice knows how to do charged defect calculations.” “Bob is the one who can properly converge GW runs.” “Olga has all the scripts for phonon calculations.”
  • 8. What would be a better way? 8 “something” Results! researcher What is the GGA-PBE elastic tensor of GaAs?
  • 9. What would be a better way? 9 Results! researcher What is the GGA-PBE elastic tensor of GaAs? Workflows to run q band structure q surface energies ü elastic tensor q Raman spectrum q QH thermal expansion q spin-orbit coupling
  • 10. Ideally the method should scale to millions of calculations 10 Results! researcher Start with all binary oxides, replace O->S, run several different properties Workflows to run ü band structure ü surface energies ü elastic tensor q Raman spectrum q QH thermal expansion q spin-orbit coupling
  • 11. Atomate’s vision is to make it easy, automatic, and flexible to generate data with existing simulation packages 11 Results! researcher Run many different properties of many different materials!
  • 12. Outline 12 ① Overview of vision and goals ② Case studies of usage ③ Current implementation ④ Future developments ⑤ Getting started
  • 13. Example: The Materials Project database 13 Jain*, Ong*, Hautier, Chen, Richards, Dacek, Cholia, Gunter, Skinner, Ceder, and Persson, APL Mater., 2013, 1, 011002. *equal contributions The Materials Project (http://www.materialsproject.org) Free >200,000 registered users around the world >150,000 compounds calculated multiple properties / compound Data includes: • thermodynamic props. • electronic band structure • aqueous stability (E-pH) • elasticity tensors • piezoelectric tensors • electrolyte molecules •much more ~1 billion CPU-hours invested
  • 14. 14 Example: Open Catalyst Project >1.2 million DFT adsorbate relaxations >250 million single point DFT calculations
  • 15. 15 Example: many open data sets generated with high- throughput computing
  • 16. Outline 16 ① Overview of vision and goals ② Case studies of usage ③ Current implementation ④ Future developments ⑤ Getting started
  • 17. Steps for theory & simulation 17 material generation simulation procedure calculation & execution data analysis (crystal, surface/interface, molecule, etc...) simulation parameters and sequence (workflow) job management, execution, error recovery databases, plotting, analysis, insight
  • 18. Software toolkits and frameworks for theory / calculation 18 (automatic materials science workflows*) Custodian (calculation error recovery) (materials analysis framework) Base packages Derived packages (workflow definition & execution) (materials data mining) GASpy* (automatic surface workflows) *Ulissi group Other tools like ASE, luigi, etc. *mainly VASP, some LAMMPS (auto optimization on HPC resources)
  • 19. Software stack for theory & simulation 19 material generation simulation procedure calculation & execution data analysis Custodian
  • 20. Steps for theory & simulation 20 material generation simulation procedure calculation & execution data analysis (crystal, surface/interface, molecule, etc...) simulation parameters and sequence (workflow) job management, execution, error recovery databases, plotting, analysis, insight
  • 21. Step 1: materials generation 21 material generation • Pymatgen can help generate models for: – crystal structures – molecules – systems (surfaces, interfaces, etc.) • Tools include: – order-disorder (shown at right) and SQS – interstitial finding – surface / slab generation – structure matching and analysis – get structures from Materials Project • Won’t cover materials generation in this presentation! Example: Order-disorder resolve partial or mixed occupancies into a fully ordered crystal structure (e.g., mixed oxide-fluoride site into separate oxygen/fluorine)
  • 22. Steps for theory & simulation 22 material generation simulation procedure calculation & execution data analysis (crystal, surface/interface, molecule, etc...) simulation parameters and sequence (workflow) job management, execution, error recovery databases, plotting, analysis, insight
  • 23. Atomate contains a library of simulation procedures 23 VASP-based • band structure • spin-orbit coupling • hybrid functional calcs • elastic tensor • piezoelectric tensor • Raman spectra • NEB • GIBBS method • QH thermal expansion • AIMD • ferroelectric • surface adsorption • work functions • NMR spectra • Bader charges • Magnetic orderings • SCAN functionals Other • BoltzTraP • FEFF method • Q-Chem In progress • AMSET e- transport • HiPhive for phonons Mathew, K. et al Atomate: A high-level interface to generate, execute, and analyze computational materials science workflows, Comput. Mater. Sci. 139 (2017) 140–152.
  • 24. Each simulation procedure in atomate is composed of multiple levels of detail / abstraction 24 Workflow – complete set of calculations to get a materials property Firework – one step in the Workflow (typically one DFT calculation) Firetask – one step in a Firework Starting with just a crystal structure, this workflow performs four calculations to get an optimized structure, optimized charge density, and band structure on two types of grids (uniform and line)
  • 25. Workflow parameters can be customized at multiple levels of detail 25 1. Workflows have various high-level options 2. Fireworks also have options / flags (not shown) 3. Firetasks have most detailed number of options / flags (not shown) Example 1: “VASP input set” controls the rules that set DFT parameters (pseudopotentials, cutoffs, grid densities, etc) via pymatgen Example II: If “stability_check” is enabled, the later parts of the workflow are skipped if the structure is determined unstable to save computer time on uninteresting structures
  • 26. You can build workflows from scratch or reuse components to assemble workflows Multiple workflows are built with the same components stacked together in different ways 26 These two workflows reuse almost all the same code between the two!
  • 27. atomate allows you to leverage the prior efforts and knowledge of many researchers 27 K. Mathew J. Montoya S. Dwaraknath A. Faghaninia All past and present knowledge, from everyone in the group, everyone previously in the group, and our collaborators, about how to run calculations M. Aykol S.P. Ong B. Bocklund T. Smidt H. Tang I.H. Chu M. Horton J. Dagdalen B. Wood Z.K. Liu J. Neaton K. Persson A. Jain +
  • 28. Steps for theory & simulation 28 material generation simulation procedure calculation & execution data analysis (crystal, surface/interface, molecule, etc...) simulation parameters and sequence (workflow) job management, execution, error recovery databases, plotting, analysis, insight
  • 29. FireWorks allows you to write your workflow once and execute (almost) anywhere 29 • Execute workflows locally or at a supercomputing center • Queue systems supported – PBS – SGE – SLURM – IBM LoadLeveler – NEWT (a REST-based API at NERSC) – Cobalt (Argonne LCF) • Cloud based services (user- generated) – https://github.com/CovertLa b/borealis
  • 30. Dashboard with status of all jobs 30
  • 31. Job provenance and automatic metadata storage 31 what machine what time what directory what was the output when was it queued when did it start running when was it completed
  • 32. Detect and rerun failures • All kinds of failures can be detected and rerun – Soft failures (job quits with error code) – hard failures (computing center goes down) – human errors 32
  • 33. Customize job priorities • Within workflow, or between workflows • Completely flexible and can be modified / updated whenever you want 33
  • 34. Track output files remotely • Can bring up the last few lines of each of your output files – and can be combined with queries / filters 34 Now seems like a good time to bring up the last few lines of the OUTCAR of all failed jobs...
  • 35. FireWorks workflow software: the main ideas • FireWorks is used to execute Workflow objects at supercomputing centers • It is very well-suited to high-throughput applications and has been used to execute millions of jobs and is well tested • There are many features and advantages to using FireWorks as your workflow manager, which has made it one of the most popular scientific workflow software in use today 35
  • 36. Steps for theory & simulation 36 material generation simulation procedure calculation & execution data analysis (crystal, surface/interface, molecule, etc...) simulation parameters and sequence (workflow) job management, execution, error recovery databases, plotting, analysis, insight
  • 37. Atomate – builders framework 37 “Builders” start with base collections in a database and create higher-level collections that summarize information or add metadata
  • 38. Examples of analyses 38 phase diagrams Pourbaix diagrams diffusivity from MD band structure analysis
  • 39. Doing simulations programmatically can become second nature – and very fast / automatic / scalable 39 material generation simulation procedure calculation & execution data analysis Custodian Results! researcher What is the GGA-PBE elastic tensor of GaAs? software infrastructure
  • 40. Outline 40 ① Vision of atomate ② History of atomate ③ Current implementation ④ Future developments ⑤ Getting started
  • 41. • After using atomate for ~4-5 years, we realize there are still things that are awkward to do. For example: – Data flow between workflows requires manual thinking to manage – Reusing workflow components with slight modifications often requires code duplication • We decided to add a library on top of FireWorks called “jobflow” that helps make workflows more programmable – https://github.com/materialsproject/jobflow • We suggest: – Try using jobflow when building on top of FireWorks (no need to use pymatgen or any other specific codes) – If wanting to use atomate, be on the lookout for atomate2 which will use jobflow – We are also working with the ABINIT team (G.M. Rignanese) to incorporate these new ideas into ABINIT workflows 41 Atomate v2 – a “jobflow” layer to FireWorks
  • 42. 42 Jobflow idea #1: More explicit data dependencies Can refer to output of a job before the data is ever created. Makes chaining of workflows and Fireworks much easier Automatically infer workflow diagram if desired (based on data dependencies)
  • 43. 43 Jobflow idea #2: Maker classes to enhance code reusability
  • 44. • Most of “atomate” (and all the various software tools) are largely programmatic, not interactive / GUI-driven • The goal of atomate was always to work up to a visual interface – Stage 1: Exploring results using a web interface rather than by database queries – Stage 2: Submitting pre-defined workflow templates using a web interface – Stage 3: Designing and submitting custom workflows using a web interface • Currently working on stage 1 44 Atomate GUI
  • 45. 45 Atomate GUI prototype Done by UCB undergrad But really requires a dedicated, focused effort
  • 46. Outline 46 ① Vision of atomate ② History of atomate ③ Current implementation ④ Future developments ⑤ Getting started
  • 47. • Papers: good general overview / vision but of no practical help • Online MP Workshop videos and tutorials: good way to get started if you have at least some knowledge, but usually not very deep. A “zero to one” situation • Code documentation: Most comprehensive way to learn, although some experience probably necessary • Help channels: Great if you already started and run into problems 47 What resources are available to learn?
  • 48. • Resources on http://workshop.materialsproject.org • Videos on MP channel of Youtube: – https://www.youtube.com/channel/UC6pqY-__Nu- mKkv0LMP8FJQ 48 Online MP workshop and videos
  • 49. Online documentation • Online documentation is the most comprehensive writeup – www.pymatgen.org – https://materialsproject.github.io/fireworks/ – https://materialsproject.github.io/custodian/ – https://hackingmaterials.github.io/atomate/ • The online documentation includes installation, examples, tutorials, and descriptions of how to use the code • If you want to do “everything”, suggest starting with atomate and going from there 49
  • 50. Help lists • A help forum for each code is at https://discuss.matsci.org 50 Note: matsci.org now includes help forums for ~25 different codes! Including LAMMPS, ASE, NOMAD, and many more
  • 51. Questions? Comments? A big thank you to all the contributors of the various codes! • Pymatgen: 182 contributors – https://github.com/materialsproject/pymatgen/graphs/contributors • Fireworks: 46 contributors – https://github.com/materialsproject/fireworks/graphs/contributors • Atomate: 36 contributors – https://github.com/hackingmaterials/atomate/graphs/contributors Funding through the U.S. Department of Energy, Basic Energy Sciences, Materials Science Division 51 Slides (already) posted to hackingmaterials.lbl.gov
  • 53. Network security issues 53 LAUNCHPAD (MongoDB) FIREWORKER (computing resource) LAUNCHPAD (MongoDB) FIREWORKER (computing resource) LAUNCHPAD (MongoDB) FIREWORKER (computing resource) LaunchPad and FireWorker within the same network firewall à Works great LaunchPad and FireWorker separated by firewall, BUT login node of FireWorker is open to MongoDB connection à Works great if you have a MOM node type structure à Otherwise “offline” mode is a non- ideal but viable option LaunchPad and FireWorker separated by firewall, no communication allowed à Doesn’t work!
  • 54. 2 4 6 0 250 500 750 1000 # Jobs Jobs/second command mlaunch rlaunch 1 workflow 5 workflows 0.0 0.2 0.4 0.6 0.0 0.2 0.4 0.6 1 client 8 clients 2 0 0 4 0 0 6 0 0 8 0 0 1 0 0 0 2 0 0 4 0 0 6 0 0 8 0 0 1 0 0 0 Number of tasks Seconds per task Workflow pattern pairwise parallel reduce sequence Performance issues • Tests indicate the FireWorks can handle a throughput of about 6-7 jobs finishing per second • Overhead is 0.1-1 sec per task • Recently changes might enhance speed, but not tested 54
  • 55. Job packing issues • Computing center issues – Almost all computing centers limit the number of “mpirun”-style commands that can be executed within a single job – Typically, this sets a limit to the degree of job packing that can be achieved – Currently, no good solution; may need to work on “hacking” the MPI communicator. e.g., “wraprun” is one effort at Oak Ridge. 55