Anubhav Jain
FireWorks workflow software:
An introduction
LLNL meeting | November 2016
Energy & Environmental Technologies
Berkeley Lab
Slides available at www.slideshare.net/anubhavster
¡ Built w/Python+MongoDB. Open-source, pip-installable:
§ http://pythonhosted.org/FireWorks/
§ Very easy to install, most people can run first tutorial within 30 minutes of
starting
¡ At least 100 million CPU-hours used; everyday production use by
3 large DOE projects (Materials Project, JCESR, JCAP) as well as
many materials science research groups
¡ Also used for graphics processing, machine learning, multiscale
modeling, and document processing (but not by us)
¡ #1 Google hit for “Python workflow software”
§ still behind Pegasus, Kepler, Taverna, Trident,
for “scientific workflow software”
2
http://xkcd.com/927/
3
¡ Partly, we had trouble learning and using other people’s
workflow software
§ Today, I think the situation is much better
§ For example, Pegasus in 2011 gave no instructions to a
general user on how to install/use/deploy it apart from a
super-complicated user manual
§ Today, Pegasus takes more care to show you how to use it on
their web page
§ Other tools like Swift (Argonne) are also providing tutorials
¡ Partly, the other workflow software wasn’t what we were
looking for
§ Other software emphasized completing a fixed workload
quickly rather than fluidly adding, subtracting, reprioritizing,
searching, etc. workflows over long time periods
4
http://www3.canisius.edu/~grandem/animalshabitats/animals.jpg
5
¡ Millions of small jobs, each at least a minute long
¡ Small amount of inter-job parallelism (“bundling”) (e.g. <1000
jobs); any amount of intra-job parallelism
¡ Failures are common; need persistent status
§ like UPS packages, database is a necessity
¡ Very dynamic workflows
§ i.e. workflows that can modify themselves intelligently and act like
researchers that submit extra calculations as needed
¡ Collisions/duplicate detection
§ people submitting the same workflow, or perhaps have some steps in
common
¡ Runs on a laptop or a supercomputer
¡ Not “extreme” or record-breaking applications
¡ Can be installed/learned/used by yourself without help/support,
by a normal scientist rather than a “workflow expert”
¡ Python-centric
6
¡ Features
¡ Potential issues
¡ Conclusion
¡ Appendix slides
§ Implementation
§ Getting started
§ Advanced usage
7
LAUNCHPAD
FW 1
FW 2
FW 3 FW 4
ROCKET LAUNCHER /
QUEUE LAUNCHER
Directory 1 Directory 2
8
?
You can scale without human effort
Easily customize what gets run where
9
¡ PBS
¡ SGE
¡ SLURM
¡ IBM LoadLeveler
¡ NEWT (a REST-based API at NERSC)
¡ Cobalt (Argonne LCF, initial runs of ~2
million CPU-hours successful)
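Queue support is configured through a queue adapter file rather than code. A minimal sketch for SLURM, following the format in the FireWorks queue tutorial (the launch command path and all values here are illustrative, not a definitive configuration):

```yaml
_fw_name: CommonAdapter
_fw_q_type: SLURM
rocket_launch: rlaunch -w /path/to/my_fworker.yaml singleshot
nodes: 1
walltime: '00:30:00'
queue: regular
job_name: fw_job
```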
10
11
No job left behind!
12
what machine
what time
what directory
what was the output
when was it queued
when did it start running
when was it completed
LAUNCH
¡ both job details (scripts+parameters) and
launch details are automatically stored
13
¡ Soft failures, hard failures, human errors
§ “lpad rerun -s FIZZLED”
§ “lpad detect_unreserved --rerun” OR
§ “lpad detect_lostruns --rerun”
14
Xiaohui can be replaced by
digital Xiaohui,
programmed into FireWorks
15
16
Each box below represents a FireTask, and each series of boxes with the same color represents a single Firework.

Green (initial structure relaxation run):
1. Generate relaxation VASP input files from initial structure
2. Run VASP calculation with Custodian
3. Insert results into database
4. Set up AIMD simulation using final relaxed structure

Blue (AIMD simulation):
1. Generate AIMD VASP input files from relaxed structure
2. Run VASP calculation with Custodian with Walltime Handler
3. Convergence reached? If yes, done. If no, dynamically add a continuation AIMD Firework that starts from the previous run. Multiple parallel AIMD Fireworks can also be added dynamically (e.g., different INCAR configs, temperatures, etc.)

Red (insert AIMD run into db):
1. Insert AIMD simulation results into database
2. Transfer AIMD calculation output to specified final location
17
¡ Submitting millions of jobs
§ Easy to lose track of what was done before
¡ Multiple users submitting jobs
¡ Sub-workflow duplication
Duplicate job detection: if two workflows contain an identical step, ensure that the step is only run once and relevant information is still passed
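The core idea of duplicate detection can be sketched in a few lines. This is an illustrative toy, not FireWorks' actual implementation (FireWorks uses configurable duplicate finders matched against the FW spec in MongoDB): fingerprint each step's spec, and reuse the stored result when the same fingerprint appears again.

```python
import hashlib
import json

def spec_fingerprint(spec):
    # canonical JSON (sorted keys) -> stable hash for one workflow step
    return hashlib.sha256(json.dumps(spec, sort_keys=True).encode()).hexdigest()

completed = {}  # fingerprint -> stored result of an already-run step

def run_step(spec, runner):
    fp = spec_fingerprint(spec)
    if fp in completed:
        return completed[fp]  # duplicate step: reuse result, skip the run
    result = runner(spec)
    completed[fp] = result
    return result
```

Two workflows submitting a step with an identical spec would then execute it only once, with the second workflow receiving the stored result.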
18
¡ Within workflow, or between workflows
¡ Completely flexible and can be modified
whenever you want
19
Now seems like a
good time to bring
up the last few lines
of the OUTCAR of all
failed jobs...
20
¡ Keep queue full with jobs
¡ Pack jobs automatically (to a point)
21
22
¡ Keep queue full with jobs
¡ Pack jobs automatically (to a point)
¡ Lots of care put into
documentation and
tutorials
§ Many strangers and
outsiders have
independently used it w/o
support from us
¡ Built in tasks
§ run BASH/Python scripts
§ file transfer (incl. remote)
§ write/copy/delete files
23
¡ No direct funding for FWS – certainly not a multimillion dollar project
¡ Mitigating longevity concerns:
§ FWS is open-source so the existing code will always be there
§ FWS never required explicit funding for development / enhancement
§ FWS has a distributed user and developer community, shielding it from a single point of
failure
§ Several multimillion dollar DOE projects and many research groups including my own
depend critically on FireWorks. Funding for basic improvements/bugfixes is certainly
going to be there if really needed.
¡ Mitigating support concerns:
§ No funding does mean limited support for external users
§ Support mechanisms favor solving problems broadly (e.g., better code, better
documentation) versus working one-on-one with potential users to solve their problems
and develop single-serving “workarounds”
§ BUT there is a free support list, and if you look, you will see that even specific individual
concerns are handled quickly and efficiently:
▪ https://groups.google.com/forum/#!forum/fireworkflows
§ In fact, I have yet to see proof of better user support from well-funded projects:
▪ Compare against: http://mailman.isi.edu/pipermail/pegasus-users/
▪ Compare against: https://lists.apache.org/list.html?users@taverna.apache.org
▪ Compare against: http://swift-lang.org/support/index.php (no results in any search?)
24
¡ Features
¡ Potential issues
¡ Conclusion
¡ Appendix slides
§ Implementation
§ Getting started
§ Advanced usage
25
26
LAUNCHPAD
(MongoDB)
FIREWORKER
(computing resource)
LaunchPad and FireWorker within the same network firewall
→ Works great
LaunchPad and FireWorker separated by firewall, BUT login node of FireWorker is open to MongoDB connection
→ Works great if you have a MOM node type structure
→ Otherwise “offline” mode is a non-ideal but viable option
LaunchPad and FireWorker separated by firewall, no communication allowed
→ Doesn’t work!
[Benchmark plots: jobs finished per second vs. number of jobs (up to 1000) for the mlaunch and rlaunch commands, for 1 and 5 workflows; seconds per task vs. number of tasks for the pairwise, parallel, reduce, and sequence workflow patterns, with 1 client and 8 clients]
¡ Tests indicate that FireWorks can handle a throughput of
about 6-7 jobs finishing per second
¡ Overhead is 0.1-1 sec per task
¡ Recent changes might enhance speed, but are not yet tested
27
¡ Computing center issues
§ Almost all computing centers limit the number
of “mpirun”-style commands that can be
executed within a single job
§ Typically, this sets a limit to the degree of job
packing that can be achieved
§ Currently, no good solution; may need to work
on “hacking” the MPI communicator. e.g.,
“wraprun” is one effort at Oak Ridge.
28
¡ Features
¡ Potential issues
¡ Conclusion
¡ Appendix slides
§ Implementation
§ Getting started
§ Advanced usage
29
¡ If you are curious, just try spending 1 hour with
FireWorks
§ http://pythonhosted.org/FireWorks
§ If you’re not intrigued after an hour, try something else
¡ If you need help, contact the support list:
§ https://groups.google.com/forum/#!forum/fireworkflows
¡ If you want to read up on FireWorks, there is a paper
– but this is no substitute for trying it
§ “FireWorks: a dynamic workflow system designed for high-
throughput applications”. Concurr. Comput. Pract. Exp. 27,
5037–5059 (2015).
§ Please cite this if you use FireWorks
30
¡ Features
¡ Potential issues
¡ Conclusion
¡ Appendix slides
§ Implementation
§ Getting started
§ Advanced usage
31
FW 1 Spec
FireTask 1
FireTask 2
FW 2 Spec
FireTask 1
FW 3 Spec
FireTask 1
FireTask 2
FireTask 3
FWAction
32
from fireworks import Firework, Workflow, LaunchPad, ScriptTask
from fireworks.core.rocket_launcher import rapidfire
# set up the LaunchPad and reset it (first time only)
launchpad = LaunchPad()
launchpad.reset('', require_password=False)
# define the individual FireWorks and Workflow
fw1 = Firework(ScriptTask.from_str('echo "To be, or not to be,"'))
fw2 = Firework(ScriptTask.from_str('echo "that is the question:"'))
wf = Workflow([fw1, fw2], {fw1:fw2}) # set of FWs and dependencies
# store workflow in LaunchPad
launchpad.add_wf(wf)
# pull all jobs and run them locally
rapidfire(launchpad)
33
fws:
- fw_id: 1
  spec:
    _tasks:
    - _fw_name: ScriptTask
      script: echo 'To be, or not to be,'
- fw_id: 2
  spec:
    _tasks:
    - _fw_name: ScriptTask
      script: echo 'that is the question:'
links:
  1:
  - 2
metadata: {}
(this is YAML, a bit prettier for humans
but less pretty for computers)
The same JSON document will
produce the same result on
any computer (with the same
Python functions).
34
fws:
- fw_id: 1
  spec:
    _tasks:
    - _fw_name: ScriptTask
      script: echo 'To be, or not to be,'
- fw_id: 2
  spec:
    _tasks:
    - _fw_name: ScriptTask
      script: echo 'that is the question:'
links:
  1:
  - 2
metadata: {}
Just some of your search
options:
• simple matches
• match in array
• greater than/less than
• regular expressions
• match subdocument
• Javascript function
• MapReduce…
All for free, and all on the native workflow format!
(this is YAML, a bit prettier for humans
but less pretty for computers)
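These search options come for free because the workflow format is a plain document. As a toy illustration of searching that format in memory (in production, the queries run directly in MongoDB; this is just a sketch of the document structure being queried):

```python
# the same workflow shown above, as a Python dict (JSON-equivalent)
wf = {
    "fws": [
        {"fw_id": 1, "spec": {"_tasks": [
            {"_fw_name": "ScriptTask", "script": "echo 'To be, or not to be,'"}]}},
        {"fw_id": 2, "spec": {"_tasks": [
            {"_fw_name": "ScriptTask", "script": "echo 'that is the question:'"}]}},
    ],
    "links": {1: [2]},
}

def find_fws(wf, pred):
    # return the ids of all Fireworks whose document satisfies pred
    return [fw["fw_id"] for fw in wf["fws"] if pred(fw)]

# "simple match": every FW whose first task is a ScriptTask
print(find_fws(wf, lambda fw: fw["spec"]["_tasks"][0]["_fw_name"] == "ScriptTask"))
# -> [1, 2]
```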
35
36
¡ Theme: Worker machine pulls a job & runs it
¡ Variation 1:
§ different workers can be configured to pull different
types of jobs via config + MongoDB
¡ Variation 2:
§ worker machines sort the jobs by a priority key and
pull the matching job with the highest priority
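The two variations above can be sketched as a worker-side pull loop. This is illustrative only: FireWorks performs this filtering and sorting server-side in MongoDB, and the flat "category"/"priority" keys here are simplified stand-ins for the real FW spec fields.

```python
# pending jobs visible to workers (toy stand-in for the LaunchPad)
jobs = [
    {"fw_id": 1, "category": "dft", "priority": 2},
    {"fw_id": 2, "category": "dft", "priority": 8},
    {"fw_id": 3, "category": "ml", "priority": 9},
]

def pull_job(jobs, worker_category):
    # variation 1: only jobs matching this worker's configuration
    matching = [j for j in jobs if j["category"] == worker_category]
    # variation 2: of those, take the highest-priority job
    return max(matching, key=lambda j: j["priority"], default=None)

print(pull_job(jobs, "dft")["fw_id"])  # -> 2 (highest-priority "dft" job)
```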
37
Queue launcher
(running on login node or crontab)
thruput job
thruput job
thruput job
thruput job
thruput job
Job wakes up
when PBS runs it
Grabs the latest
job description
from an external
DB
Runs the job based
on DB description
38
¡ Multiple processes pull and run jobs simultaneously
§ It is all the same thing, just sliced* different ways!
Independent processes:
Query Job -> job A (mol a) -> update DB
Query Job -> job B (mol b) -> update DB
Query Job -> job X (mol x) -> update DB
1 large job:
mpirun -> Node 1
mpirun -> Node 2
mpirun -> Node n
*get it? wink wink
39
because jobs
are JSON, they
are completely
serializable!
40
¡ Features
¡ Potential issues
¡ Conclusion
¡ Appendix slides
§ Implementation
§ Getting started
§ Advanced usage
41
input_array: [1, 2, 3]
1. Sum input array
2. Write to file
3. Pass result to next job
input_array: [4, 5, 6]
1. Sum input array
2. Write to file
3. Pass result to next job
input_data: [6, 15]
1. Sum input data
2. Write to file
3. Pass result to next job
-------------------------------------
1. Copy result to home dir
6 15
from fireworks import FireTaskBase, FWAction

class MyAdditionTask(FireTaskBase):
    _fw_name = "My Addition Task"

    def run_task(self, fw_spec):
        input_array = fw_spec['input_array']
        m_sum = sum(input_array)
        print("The sum of {} is: {}".format(input_array, m_sum))
        with open('my_sum.txt', 'a') as f:
            f.write(str(m_sum) + '\n')
        # store the sum; push the sum to the input_array of the next Firework
        return FWAction(stored_data={'sum': m_sum},
                        mod_spec=[{'_push': {'input_array': m_sum}}])
See also: http://pythonhosted.org/FireWorks/guide_to_writing_firetasks.html
input_array: [1, 2, 3]
1.  Sum input array
2.  Write to file
3.  Pass result to next job
input_array: [4, 5, 6]
1.  Sum input array
2.  Write to file
3.  Pass result to next job
input_data: [6, 15]
1.  Sum input data
2.  Write to file
3.  Pass result to next job
-------------------------------------
1.  Copy result to home dir
6 15!
from fireworks import Firework, Workflow, LaunchPad, FWorker, FileTransferTask
from fireworks.core.rocket_launcher import rapidfire

# set up the LaunchPad and reset it
launchpad = LaunchPad()
launchpad.reset('', require_password=False)

# create a Workflow consisting of two AdditionTask FWs + file transfer
fw1 = Firework(MyAdditionTask(), {"input_array": [1, 2, 3]}, name="pt 1A")
fw2 = Firework(MyAdditionTask(), {"input_array": [4, 5, 6]}, name="pt 1B")
fw3 = Firework([MyAdditionTask(),
                FileTransferTask({"mode": "cp", "files": ["my_sum.txt"], "dest": "~"})],
               name="pt 2")
wf = Workflow([fw1, fw2, fw3], {fw1: fw3, fw2: fw3}, name="MAVRL test")
launchpad.add_wf(wf)

# launch the entire Workflow locally
rapidfire(launchpad, FWorker())
¡ lpad get_wflows -d more
¡ lpad get_fws -i 3 -d all
¡ lpad webgui
¡ Also rerun features
See all reporting at official docs:
http://pythonhosted.org/FireWorks
¡ There are a ton in the documentation and tutorials,
just try them!
§ http://pythonhosted.org/FireWorks
¡ I want an example of running VASP!
§ https://github.com/materialsvirtuallab/fireworks-vasp
§ https://gist.github.com/computron/
▪ look for “fireworks-vasp_demo.py”
§ Note: demo is only a single VASP run
§ multiple VASP runs require passing directory names
between jobs
▪ currently you must do this manually
▪ in future, perhaps build into FireWorks
¡ If you can copy commands from a web page
and type them into a Terminal, you possess the
skills needed to complete the FireWorks tutorials
§ BUT: for long-term use, highly suggested you learn
some Python
¡ Go to:
§ http://pythonhosted.org/FireWorks
§ or Google “FireWorks workflow software”
¡ NERSC-specific instructions & notes:
§ https://pythonhosted.org/FireWorks/installation_notes.html
47
¡ Features
¡ Potential issues
¡ Conclusion
¡ Appendix slides
§ Implementation
§ Getting started
§ Advanced usage
48
¡ Say you have a FWS database with many different
job types, and want to run different jobs types on
different machines
¡ You have three options:
1. Set the “_fworker” variable in the FW itself. Only the
FWorker(s) with the matching name will run the job.
2. Set the “_category” variable in the FW itself. Only the
FWorker(s) with the matching categories will run the job.
3. Set the “query” parameter in the FWorker. You can set
any Mongo query on the FW to decide what jobs this
FWorker will run. e.g., jobs with certain parameter
ranges.
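Option 3, for example, can be expressed in the worker's configuration file (my_fworker.yaml); the name, category, and query values below are illustrative, not prescriptive:

```yaml
name: gpu_worker
category: dft_jobs
query: '{"spec.parameter": {"$lte": 100}}'
```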
49
¡ Both Trackers and BackgroundTasks will run a process in
the background of your main FW.
¡ A Tracker is a quick way to monitor the first or last few
lines of a file (e.g., an output file) during job execution. It is
also easy to set up: just set the “_tracker” variable in the
FW spec with the details of which files you want to
monitor.
§ This allows you to track output files of all your jobs using the
database.
§ For example, one command will let you view the output files of
all failed jobs – all without logging into any machines!
¡ A BackgroundTask will run any FireTask in a separate
Process from the main task. There are built-in parameters
to help.
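A spec fragment enabling a tracker might look like the following (the current FireWorks docs spell the spec key as the plural “_trackers”, and the filename here is a hypothetical example):

```yaml
spec:
  _trackers:
  - filename: vasp.out
    nlines: 25
```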
50
¡ Sometimes, the specific Python code that you
need to execute (FireTask) depends on what
machine you are running on
¡ A solution to this is FW_env
¡ Each Worker configuration can set its own “env”
variable, which is accessible to the Firework at runtime
under the “_fw_env” key
¡ The same job will see different values of
“_fw_env” depending on where it’s running, and
use this to execute the workflow
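A minimal sketch of resolving a machine-specific command through “_fw_env” (the “vasp_cmd” key and the command strings are hypothetical examples, not part of FireWorks itself):

```python
# a task can look up machine-specific settings from _fw_env at runtime;
# each FireWorker's configured "env" dict appears under this key
def get_vasp_cmd(fw_spec):
    return fw_spec.get("_fw_env", {}).get("vasp_cmd", "vasp_std")

print(get_vasp_cmd({"_fw_env": {"vasp_cmd": "srun -n 64 vasp"}}))  # machine A
print(get_vasp_cmd({}))  # machine with no env configured: falls back to default
```

The same workflow document thus runs unchanged on every machine, with only the worker configuration differing.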
51
¡ Normally, a workflow stops proceeding when a
FireWork fails, or “fizzles”.
§ at this point, a user might change some backend code and
rerun the failed job
¡ Sometimes, you want a child FW to run even if one
or more parents have “fizzled”.
§ For example, the child FW might inspect the parent,
determine a cause of failure, and initiate a “recovery
workflow”
¡ To enable a child to run, set the
“_allow_fizzled_parents” key in the spec to True
§ FWS also creates a “_fizzled_parents” key in that FW
spec that becomes available when the parents fail and
contains details about the parent FW
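A minimal spec fragment for such a recovery child (following the key described above):

```yaml
spec:
  _allow_fizzled_parents: true
```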
52
¡ You might want some statistics on FWS jobs:
§ daily, weekly, monthly reports over certain periods for
how many Workflows/FireWorks/etc. completed
§ identify days when there were many job failures, perhaps
associated with a computing center outage
§ grouping FIZZLED jobs by a key in the spec, e.g. to get
stats on what job types failed most often
¡ All this is possible with the reporting package; type
“lpad report -h” for more information
¡ You can also introspect to find common factors in job
failures; type “lpad introspect -h” for more information
53
Accelerating New Materials Design with Supercomputing and Machine LearningAccelerating New Materials Design with Supercomputing and Machine Learning
Accelerating New Materials Design with Supercomputing and Machine Learning
 
DuraMat CO1 Central Data Resource: How it started, how it’s going …
DuraMat CO1 Central Data Resource: How it started, how it’s going …DuraMat CO1 Central Data Resource: How it started, how it’s going …
DuraMat CO1 Central Data Resource: How it started, how it’s going …
 
The Materials Project
The Materials ProjectThe Materials Project
The Materials Project
 
Evaluating Chemical Composition and Crystal Structure Representations using t...
Evaluating Chemical Composition and Crystal Structure Representations using t...Evaluating Chemical Composition and Crystal Structure Representations using t...
Evaluating Chemical Composition and Crystal Structure Representations using t...
 
Perspectives on chemical composition and crystal structure representations fr...
Perspectives on chemical composition and crystal structure representations fr...Perspectives on chemical composition and crystal structure representations fr...
Perspectives on chemical composition and crystal structure representations fr...
 
Discovering and Exploring New Materials through the Materials Project
Discovering and Exploring New Materials through the Materials ProjectDiscovering and Exploring New Materials through the Materials Project
Discovering and Exploring New Materials through the Materials Project
 
The Materials Project: Applications to energy storage and functional materia...
The Materials Project: Applications to energy storage and functional materia...The Materials Project: Applications to energy storage and functional materia...
The Materials Project: Applications to energy storage and functional materia...
 
The Materials Project: A Community Data Resource for Accelerating New Materia...
The Materials Project: A Community Data Resource for Accelerating New Materia...The Materials Project: A Community Data Resource for Accelerating New Materia...
The Materials Project: A Community Data Resource for Accelerating New Materia...
 
Machine Learning Platform for Catalyst Design
Machine Learning Platform for Catalyst DesignMachine Learning Platform for Catalyst Design
Machine Learning Platform for Catalyst Design
 

Último

%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
masabamasaba
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
masabamasaba
 

Último (20)

MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
 
WSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go Platformless
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdf
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
 
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
 
tonesoftg
tonesoftgtonesoftg
tonesoftg
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
 
%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK Software
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
 

FireWorks overview

¡ Collisions/duplicate detection
§ people submitting the same workflow, or workflows that have some steps in common
¡ Runs on a laptop or a supercomputer
¡ Not aimed at “extreme” or record-breaking applications
¡ Can be installed/learned/used by yourself without help or support, by a normal scientist rather than a “workflow expert”
¡ Python-centric

¡ Features
¡ Potential issues
¡ Conclusion
¡ Appendix slides
§ Implementation
§ Getting started
§ Advanced usage

[Diagram: Fireworks (FW 1–4) stored in the LaunchPad are pulled by the Rocket Launcher / Queue Launcher and executed, each in its own directory (Directory 1, Directory 2, …)]

¡ You can scale without human effort
¡ Easily customize what gets run where

Supported queue systems:
¡ PBS
¡ SGE
¡ SLURM
¡ IBM LoadLeveler
¡ NEWT (a REST-based API at NERSC)
¡ Cobalt (Argonne LCF; initial runs of ~2 million CPU-hours successful)

No job left behind!

¡ Both job details (scripts + parameters) and launch details are automatically stored for each launch:
§ what machine
§ what time
§ what directory
§ what was the output
§ when was it queued
§ when did it start running
§ when was it completed

¡ Soft failures, hard failures, human errors
§ “lpad rerun -s FIZZLED”, or
§ “lpad detect_unreserved --rerun”, or
§ “lpad detect_lostruns --rerun”

Xiaohui can be replaced by digital Xiaohui, programmed into FireWorks

Example dynamic workflow (AIMD simulation). Each box in the diagram is a FireTask, and each series of boxes with the same color is a single Firework (green: initial structure relaxation; blue: AIMD simulation; red: insert AIMD run into the database):
¡ Generate relaxation VASP input files from the initial structure; run VASP with Custodian; insert results into the database
¡ Set up the AIMD simulation using the final relaxed structure: generate AIMD VASP input files, run VASP with Custodian with a walltime handler, insert AIMD simulation results into the database
¡ If convergence is not reached, dynamically add a continuation AIMD Firework that starts from the previous run; repeat until converged
¡ Once convergence is reached, transfer the AIMD calculation output to the specified final location
¡ Multiple parallel AIMD Fireworks can also be added dynamically, e.g. different INCAR configs, temperatures, etc.

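The convergence loop above is the essence of a dynamic workflow: a running job inspects its own output and appends follow-up jobs. A minimal pure-Python sketch of that decision logic (this mirrors the pattern, not the actual FireWorks API; the job dicts, field names, and the 1e-4 threshold are hypothetical):

```python
def continuation_jobs(md_output, temperature, max_segments=10):
    """Decide what to add after an AIMD segment finishes.

    Returns a list of new job descriptions (empty list = stop).
    This is the role FWAction(additions=...) plays in FireWorks.
    """
    if md_output["energy_drift"] < 1e-4:
        # converged: final step is to copy results to their destination
        return [{"task": "transfer_output", "src": md_output["final_dir"]}]
    if md_output["segment"] >= max_segments:
        # give up rather than loop forever
        return []
    # not converged: queue another AIMD segment continuing from this one
    return [{"task": "run_aimd", "temperature": temperature,
             "segment": md_output["segment"] + 1,
             "restart_from": md_output["final_dir"]}]
```

The key point is that the workflow graph is not fixed up front: each segment decides at runtime whether to extend the chain.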
¡ Submitting millions of jobs
§ easy to lose track of what was done before
¡ Multiple users submitting jobs
¡ Sub-workflow duplication
¡ Duplicate job detection: if two workflows contain an identical step, ensure that the step is only run once and the relevant information is still passed

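One way to see how exact-match duplicate detection can work: serialize each step's spec canonically and compare fingerprints, so that two specs with the same content match regardless of key order. This is a sketch of the idea, not the FireWorks implementation:

```python
import hashlib
import json

def spec_fingerprint(spec):
    """Canonical fingerprint of a job spec: two workflow steps with the
    same fingerprint are candidates for duplicate matching."""
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

# same content, different key order -> same fingerprint
a = {"structure": "mp-149", "task": "relax", "params": {"encut": 520}}
b = {"task": "relax", "params": {"encut": 520}, "structure": "mp-149"}
```

With fingerprints stored in the database, a newly submitted step can be checked against all previously run steps with a single indexed lookup.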
¡ Within a workflow, or between workflows
¡ Completely flexible and can be modified whenever you want

Now seems like a good time to bring up the last few lines of the OUTCAR of all failed jobs...

¡ Keep the queue full with jobs
¡ Pack jobs automatically (to a point)

¡ Lots of care put into documentation and tutorials
§ many strangers and outsiders have independently used it without support from us
¡ Built-in tasks
§ run BASH/Python scripts
§ file transfer (incl. remote)
§ write/copy/delete files

¡ No direct funding for FWS (certainly not a multimillion-dollar project)
¡ Mitigating longevity concerns:
§ FWS is open-source, so the existing code will always be there
§ FWS never required explicit funding for development/enhancement
§ FWS has a distributed user and developer community, shielding it from a single point of failure
§ Several multimillion-dollar DOE projects and many research groups, including my own, depend critically on FireWorks; funding for basic improvements/bugfixes will certainly be there if really needed
¡ Mitigating support concerns:
§ No funding does mean limited support for external users
§ Support mechanisms favor solving problems broadly (e.g., better code, better documentation) versus working one-on-one with potential users to solve their problems and develop single-serving “workarounds”
§ BUT there is a free support list, and if you look, you will see that even specific individual concerns are handled quickly and efficiently:
▪ https://groups.google.com/forum/#!forum/fireworkflows
§ In fact, I have yet to see proof of better user support from well-funded projects:
▪ Compare against: http://mailman.isi.edu/pipermail/pegasus-users/
▪ Compare against: https://lists.apache.org/list.html?users@taverna.apache.org
▪ Compare against: http://swift-lang.org/support/index.php (no results in any search?)

Connectivity scenarios between the LaunchPad (MongoDB) and a FireWorker (computing resource):
¡ LaunchPad and FireWorker within the same network firewall → works great
¡ LaunchPad and FireWorker separated by a firewall, BUT the login node of the FireWorker is open to MongoDB connections → works great if you have a MOM-node-type structure; otherwise “offline” mode is a non-ideal but viable option
¡ LaunchPad and FireWorker separated by a firewall, no communication allowed → doesn’t work!

[Plots: benchmarks of jobs finishing per second (mlaunch vs. rlaunch, 1 vs. 5 workflows) and seconds of overhead per task for different workflow patterns (pairwise, parallel, reduce, sequence) with 1 and 8 clients]
¡ Tests indicate that FireWorks can handle a throughput of about 6–7 jobs finishing per second
¡ Overhead is 0.1–1 sec per task
¡ Recent changes might enhance speed, but this has not been tested

¡ Computing center issues
§ almost all computing centers limit the number of “mpirun”-style commands that can be executed within a single job
§ typically, this sets a limit on the degree of job packing that can be achieved
§ currently, no good solution; may need to work on “hacking” the MPI communicator (e.g., “wraprun” is one effort at Oak Ridge)

¡ If you are curious, just try spending 1 hour with FireWorks
§ http://pythonhosted.org/FireWorks
§ if you’re not intrigued after an hour, try something else
¡ If you need help, contact the support list:
§ https://groups.google.com/forum/#!forum/fireworkflows
¡ If you want to read up on FireWorks, there is a paper (but this is no substitute for trying it)
§ “FireWorks: a dynamic workflow system designed for high-throughput applications”. Concurr. Comput. Pract. Exp. 27, 5037–5059 (2015).
§ Please cite this if you use FireWorks

[Diagram: each Firework consists of a spec plus one or more FireTasks (e.g., FW 1: FireTasks 1–2; FW 2: FireTask 1; FW 3: FireTasks 1–3); a FireTask returns an FWAction]

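The execution model in the diagram can be sketched in a few lines of plain Python: tasks run in sequence against the spec, and each returns an action that can update the spec for later tasks or add new Fireworks. These classes are minimal stand-ins to illustrate the flow, not the real FireWorks objects:

```python
class FWAction:
    """Stand-in for FireWorks' FWAction (sketch, not the real class)."""
    def __init__(self, stored_data=None, update_spec=None, additions=None):
        self.stored_data = stored_data or {}
        self.update_spec = update_spec or {}
        self.additions = additions or []

def run_firework(spec, tasks):
    """Run one Firework: execute its FireTasks in sequence.

    Each task sees the (possibly updated) spec; its returned action can
    store output data, modify the spec, or append new Fireworks.
    """
    stored, new_fws = {}, []
    for task in tasks:
        action = task(spec) or FWAction()
        stored.update(action.stored_data)
        spec = {**spec, **action.update_spec}
        new_fws.extend(action.additions)
    return stored, new_fws
```

This is the whole contract between tasks and the engine: everything a task wants to communicate flows through its returned action.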
```python
from fireworks import Firework, Workflow, LaunchPad, ScriptTask
from fireworks.core.rocket_launcher import rapidfire

# set up the LaunchPad and reset it (first time only)
launchpad = LaunchPad()
launchpad.reset('', require_password=False)

# define the individual FireWorks and Workflow
fw1 = Firework(ScriptTask.from_str('echo "To be, or not to be,"'))
fw2 = Firework(ScriptTask.from_str('echo "that is the question:"'))
wf = Workflow([fw1, fw2], {fw1: fw2})  # set of FWs and dependencies

# store workflow in LaunchPad
launchpad.add_wf(wf)

# pull all jobs and run them locally
rapidfire(launchpad)
```

The same workflow as YAML (a bit prettier for humans, but less pretty for computers):

```yaml
fws:
- fw_id: 1
  spec:
    _tasks:
    - _fw_name: ScriptTask
      script: echo 'To be, or not to be,'
- fw_id: 2
  spec:
    _tasks:
    - _fw_name: ScriptTask
      script: echo 'that is the question:'
links:
  1:
  - 2
metadata: {}
```

The same JSON document will produce the same result on any computer (with the same Python functions).

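The portability claim follows from the workflow being nothing but a JSON document. A quick sketch, writing the same workflow as a plain Python dict and showing that it survives serialization round trips unchanged (the field layout follows the YAML above):

```python
import json

# the workflow above as a plain dict -- the same kind of document
# that gets stored in MongoDB
wf = {
    "fws": [
        {"fw_id": 1, "spec": {"_tasks": [
            {"_fw_name": "ScriptTask",
             "script": "echo 'To be, or not to be,'"}]}},
        {"fw_id": 2, "spec": {"_tasks": [
            {"_fw_name": "ScriptTask",
             "script": "echo 'that is the question:'"}]}},
    ],
    "links": {"1": [2]},
    "metadata": {},
}

# serialize deterministically; any machine deserializing this sees
# exactly the same workflow
serialized = json.dumps(wf, sort_keys=True)
```

Because the on-disk/in-database format is just JSON, no special runtime state needs to travel with the workflow.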
(The same YAML workflow document as on the previous slide.) Just some of your search options:
• simple matches
• match in array
• greater than / less than
• regular expressions
• match subdocument
• JavaScript function
• MapReduce…
All for free, and all on the native workflow format!

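To make the "all for free" point concrete, here is a tiny evaluator for a subset of MongoDB's query syntax (exact match, dotted paths, $gt/$lt) applied to a workflow document. This is only an illustration of what the queries look like; in practice MongoDB evaluates them server-side:

```python
def matches(doc, query):
    """Evaluate a small subset of MongoDB query syntax against a dict."""
    for path, cond in query.items():
        # walk dotted paths like "spec.encut" into nested dicts
        val = doc
        for key in path.split("."):
            if not isinstance(val, dict) or key not in val:
                return False
            val = val[key]
        if isinstance(cond, dict):
            if "$gt" in cond and not val > cond["$gt"]:
                return False
            if "$lt" in cond and not val < cond["$lt"]:
                return False
        elif val != cond:
            return False
    return True
```

For example, `{"spec.encut": {"$gt": 400}}` finds every Firework whose spec has an energy cutoff above 400, with no schema or query layer to write.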
¡ Theme: a worker machine pulls a job & runs it
¡ Variation 1:
§ different workers can be configured to pull different types of jobs via config + MongoDB
¡ Variation 2:
§ worker machines sort the jobs by a priority key and pull the matching job with the highest priority

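Both variations amount to a filter plus a sort on each pull. A pure-Python sketch of that behavior (not the FireWorks internals; the `_category` and `_priority` key names mirror the FW spec conventions, the rest is illustrative):

```python
def pull_next_job(ready_jobs, worker_category=None):
    """What a worker does on each pull: keep only the READY jobs it is
    allowed to run, then take the one with the highest priority."""
    eligible = [j for j in ready_jobs
                if worker_category is None
                or j.get("_category") == worker_category]
    if not eligible:
        return None  # nothing for this worker right now
    return max(eligible, key=lambda j: j.get("_priority", 0))
```

In the real system this filter-and-sort happens as a MongoDB query, so thousands of workers can pull concurrently against one LaunchPad.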
The queue launcher (running on a login node or via crontab) keeps the queue filled with throughput jobs; each throughput job:
¡ wakes up when PBS runs it
¡ grabs the latest job description from an external DB
¡ runs the job based on the DB description

¡ Multiple processes pull and run jobs simultaneously
§ it is all the same thing, just sliced* different ways!
[Diagram: one large job (mpirun → Node 1 … Node n) vs. independent processes, each doing “query job → run job A/B/…/X → update DB” on its own molecule (mol a, mol b, … mol x)]
*get it? wink wink

Because jobs are JSON, they are completely serializable!

Example workflow: two parallel addition jobs feed a final sum-and-copy job.
¡ FW 1 (input_array: [1, 2, 3]): 1. sum input array, 2. write to file, 3. pass result (6) to next job
¡ FW 2 (input_array: [4, 5, 6]): 1. sum input array, 2. write to file, 3. pass result (15) to next job
¡ FW 3 (input_data: [6, 15]): 1. sum input data, 2. write to file, 3. pass result; then copy result to home dir

The FireTask for the addition steps (FW spec: input_array: [1, 2, 3] → sum input array, write to file, pass result to next job):

```python
from fireworks import FireTaskBase, FWAction

class MyAdditionTask(FireTaskBase):
    _fw_name = "My Addition Task"

    def run_task(self, fw_spec):
        input_array = fw_spec['input_array']
        m_sum = sum(input_array)
        print("The sum of {} is: {}".format(input_array, m_sum))

        with open('my_sum.txt', 'a') as f:
            f.write(str(m_sum) + '\n')

        # store the sum; push the sum to the input array of the next sum
        return FWAction(stored_data={'sum': m_sum},
                        mod_spec=[{'_push': {'input_array': m_sum}}])
```

See also: http://pythonhosted.org/FireWorks/guide_to_writing_firetasks.html

Wiring the three FWs into the workflow shown above:

```python
# set up the LaunchPad and reset it
launchpad = LaunchPad()
launchpad.reset('', require_password=False)

# create a Workflow consisting of MyAdditionTask FWs + file transfer
fw1 = Firework(MyAdditionTask(), {"input_array": [1, 2, 3]}, name="pt 1A")
fw2 = Firework(MyAdditionTask(), {"input_array": [4, 5, 6]}, name="pt 1B")
fw3 = Firework([MyAdditionTask(),
                FileTransferTask({"mode": "cp", "files": ["my_sum.txt"],
                                  "dest": "~"})], name="pt 2")
wf = Workflow([fw1, fw2, fw3], {fw1: fw3, fw2: fw3}, name="MAVRL test")
launchpad.add_wf(wf)

# launch the entire Workflow locally
rapidfire(launchpad, FWorker())
```

¡ lpad get_wflows -d more
¡ lpad get_fws -i 3 -d all
¡ lpad webgui
¡ Also rerun features
See all reporting at the official docs: http://pythonhosted.org/FireWorks

¡ There are a ton in the documentation and tutorials, just try them!
§ http://pythonhosted.org/FireWorks
¡ I want an example of running VASP!
§ https://github.com/materialsvirtuallab/fireworks-vasp
§ https://gist.github.com/computron/
▪ look for “fireworks-vasp_demo.py”
§ note: the demo is only a single VASP run
§ multiple VASP runs require passing directory names between jobs
▪ currently you must do this manually
▪ in the future, perhaps build this into FireWorks

¡ If you can copy commands from a web page and type them into a Terminal, you possess the skills needed to complete the FireWorks tutorials
§ BUT: for long-term use, it is highly suggested that you learn some Python
¡ Go to:
§ http://pythonhosted.org/FireWorks
§ or Google “FireWorks workflow software”
¡ NERSC-specific instructions & notes:
§ https://pythonhosted.org/FireWorks/installation_notes.html

¡ Say you have a FWS database with many different job types and want to run different job types on different machines
¡ You have three options:
1. Set the “_fworker” variable in the FW itself. Only the FWorker(s) with the matching name will run the job.
2. Set the “_category” variable in the FW itself. Only the FWorker(s) with the matching categories will run the job.
3. Set the “query” parameter in the FWorker. You can set any Mongo query on the FW to decide what jobs this FWorker will run, e.g. jobs with certain parameter ranges.

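The three options combine as a simple eligibility check on each pull. A pure-Python sketch (not the FireWorks internals; the worker dict layout is illustrative, and the "query" check is simplified to exact field matches rather than full Mongo syntax):

```python
def worker_can_run(fw, worker):
    """Decide whether a worker may run a Firework, combining the three
    targeting options: _fworker name, _category, and a worker query."""
    spec = fw["spec"]
    # option 1: the FW names a specific worker
    if "_fworker" in spec and spec["_fworker"] != worker["name"]:
        return False
    # option 2: the worker only accepts a category
    if worker.get("category") and spec.get("_category") != worker["category"]:
        return False
    # option 3: the worker's custom query must match the FW
    for field, want in worker.get("query", {}).items():
        if spec.get(field) != want:
            return False
    return True
```

A job is run by the first worker for which all three checks pass, which is how one shared database can serve heterogeneous machines.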
¡ Both Trackers and BackgroundTasks will run a process in the background of your main FW.
¡ A Tracker is a quick way to monitor the first or last few lines of a file (e.g., an output file) during job execution. It is also easy to set up: just set the “_trackers” variable in the FW spec with the details of what files you want to monitor.
§ this allows you to track the output files of all your jobs using the database
§ for example, one command will let you view the output files of all failed jobs, all without logging into any machines!
¡ A BackgroundTask will run any FireTask in a separate process from the main task. There are built-in parameters to help.

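What a tracker stores is small: just the tail of each monitored file, refreshed periodically and written into the job's database record. A sketch of that tail-reading step in plain Python (illustrative, not the FireWorks Tracker class):

```python
from collections import deque

def track_file(path, num_lines=4):
    """Return the last few lines of a monitored output file, the way a
    tracker snapshot would record them."""
    with open(path) as f:
        # deque with maxlen keeps only the final num_lines lines
        return list(deque(f, maxlen=num_lines))
```

Storing only the tail keeps the database record tiny while still capturing the part of the output file (e.g., the end of an OUTCAR) that usually explains a failure.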
¡ Sometimes, the specific Python code that you need to execute (FireTask) depends on what machine you are running on
¡ A solution to this is FW_env
¡ Each worker configuration can set its own “env” variable, which is accessible to the Firework when running via the “_fw_env” key
¡ The same job will see different values of “_fw_env” depending on where it’s running, and can use this to execute the workflow

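Inside a task, using FW_env is just a dictionary lookup on the spec, with a fallback for machines whose worker config sets nothing. A minimal sketch (the `vasp_cmd` key and default command are hypothetical, not part of FireWorks):

```python
def resolve_command(fw_spec, default_cmd="vasp_std"):
    """Pick the machine-specific binary from the worker-provided _fw_env.

    The same Firework sees a different _fw_env depending on which
    worker launched it, so the workflow itself stays machine-agnostic.
    """
    return fw_spec.get("_fw_env", {}).get("vasp_cmd", default_cmd)
```

For example, a worker config at one center might set `vasp_cmd` to an `srun`-wrapped binary while a laptop worker leaves it unset, yet both run the identical workflow document.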
¡ Normally, a workflow stops proceeding when a Firework fails, or “fizzles”
§ at this point, a user might change some backend code and rerun the failed job
¡ Sometimes, you want a child FW to run even if one or more parents have “fizzled”
§ for example, the child FW might inspect the parent, determine a cause of failure, and initiate a “recovery workflow”
¡ To enable a child to run, set the “_allow_fizzled_parents” key in the spec to True
§ FWS also creates a “_fizzled_parents” key in that FW spec that becomes available when the parents fail and contains details about the parent FW

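A recovery child typically just branches on the failure details injected into its spec. A sketch of that decision (illustrative only: the error strings and the exact layout of the `_fizzled_parents` entries are hypothetical here):

```python
def plan_recovery(fw_spec):
    """A child task that runs even when a parent fizzled: inspect the
    _fizzled_parents info and decide what to do next."""
    parents = fw_spec.get("_fizzled_parents", [])
    if not parents:
        return "proceed"  # all parents completed normally
    errors = [p.get("error", "") for p in parents]
    if any("walltime" in e for e in errors):
        # a recoverable failure: resubmit with a longer walltime
        return "resubmit_with_more_time"
    return "flag_for_human"
```

This is the pattern behind automatic "recovery workflows": the child is an ordinary Firework whose logic happens to consume failure metadata.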
¡ You might want some statistics on FWS jobs:
§ daily, weekly, monthly reports over certain periods for how many Workflows/FireWorks/etc. completed
§ identify days when there were many job failures, perhaps associated with a computing center outage
§ grouping FIZZLED jobs by a key in the spec, e.g. to get stats on what job types failed most often
¡ All this is possible with the reporting package; type “lpad report -h” for more information
¡ You can also introspect to find common factors in job failures; type “lpad introspect -h” for more information