SlideShare una empresa de Scribd logo
1 de 18
Descargar para leer sin conexión
Building workflows with Spotify's
Workshop for e-Infrastructures for Massively Parallel Sequencing
Uppsala, Jan 19-20, 2015
Samuel Lampa
UU/Dept of Pharm. Biosci. & BILS
samuel.lampa@bils.se
samuel.lampa.co
Luigi Short Facts
●
Batch (No streaming)
●
Both Command line and Hadoop execution
●
Does not replace Hive, Pig, Cascading etc
- instead used to stitch many such jobs together
●
Powers 1000:s of jobs every day at Spotify
●
Communication / synchronization between tasks via “targets” (Normal file,
HDFS, database …)
●
Dependencies hard coded in tasks
●
Pull based (ask for task, and it figures out dependencies)
●
Main features: Scheduling, Dependency Graph,
Basic multi-node support (via central planner daemon)
●
github.com/spotify/luigi
●
Easy to install: pip install luigi [tornado]
A Basic Luigi Task
A Basic Luigi Task
Calling tasks from the commandline
Defining workflows: Default way
Dependencies: Default way
TaskA()
TaskB()
TaskBA() TaskBB()
TaskBAA() TaskBBB()TaskBAB() TaskBBA()
Parameters: Default way
TaskA()
TaskB()
TaskBA() TaskBB()
TaskBAA() TaskBBB()TaskBAB() TaskBBA()
DependencyParameter
Growing list of parameters
Dependencies and parameters:
A better way
AWorkflowTask()
TaskA()
TaskB()
TaskBA() TaskBB()
TaskBAA() TaskBBB()TaskBAB() TaskBBA()
Supporting separate network definition
Using this functionality in tasks
Using this new functionality in workflows, and wrapping whole workflows in Tasks
Unit testing Luigi tasks is easy
Lessons Learned (April 2014)
●
Only “pull” or “call/return” semantics, no “push”
or “auto triggering by inputs”
●
Not fully separate workflow
●
Multiple inputs / outputs somewhat error-prone
●
Dynamic typed language
●
Check out for max processes limits
●
No real distribution of compute or data
(only common scheduling)
Lessons Learned (Update Jan 2015)
●
Only “pull” or “call/return” semantics, no “push” or “auto triggering
by inputs”
●
Not fully separate workflow def.
– We found a way around this.
●
Multiple inputs / outputs somewhat error-prone
– We're looking into solving this too.
●
Dynamic typed language
– We've found that unit testing is easy.
●
Check out for max processes limits
●
No real distribution of compute or data
(only common scheduling)
– We have found ourselves running everything through SLURM anyway
Thank you!

Más contenido relacionado

Destacado

Building a unified data pipeline in Apache Spark
Building a unified data pipeline in Apache SparkBuilding a unified data pipeline in Apache Spark
Building a unified data pipeline in Apache SparkDataWorks Summit
 
Approximate nearest neighbor methods and vector models – NYC ML meetup
Approximate nearest neighbor methods and vector models – NYC ML meetupApproximate nearest neighbor methods and vector models – NYC ML meetup
Approximate nearest neighbor methods and vector models – NYC ML meetupErik Bernhardsson
 
(CMP310) Data Processing Pipelines Using Containers & Spot Instances
(CMP310) Data Processing Pipelines Using Containers & Spot Instances(CMP310) Data Processing Pipelines Using Containers & Spot Instances
(CMP310) Data Processing Pipelines Using Containers & Spot InstancesAmazon Web Services
 
Engineering a robust(ish) data pipeline with Luigi and AWS Elastic Map Reduce
Engineering a robust(ish) data pipeline with Luigi and AWS Elastic Map ReduceEngineering a robust(ish) data pipeline with Luigi and AWS Elastic Map Reduce
Engineering a robust(ish) data pipeline with Luigi and AWS Elastic Map ReduceAaron Knight
 
Luigi PyData presentation
Luigi PyData presentationLuigi PyData presentation
Luigi PyData presentationElias Freider
 
Workflow Engines for Hadoop
Workflow Engines for HadoopWorkflow Engines for Hadoop
Workflow Engines for HadoopJoe Crobak
 
Building a Data Pipeline from Scratch - Joe Crobak
Building a Data Pipeline from Scratch - Joe CrobakBuilding a Data Pipeline from Scratch - Joe Crobak
Building a Data Pipeline from Scratch - Joe CrobakHakka Labs
 

Destacado (7)

Building a unified data pipeline in Apache Spark
Building a unified data pipeline in Apache SparkBuilding a unified data pipeline in Apache Spark
Building a unified data pipeline in Apache Spark
 
Approximate nearest neighbor methods and vector models – NYC ML meetup
Approximate nearest neighbor methods and vector models – NYC ML meetupApproximate nearest neighbor methods and vector models – NYC ML meetup
Approximate nearest neighbor methods and vector models – NYC ML meetup
 
(CMP310) Data Processing Pipelines Using Containers & Spot Instances
(CMP310) Data Processing Pipelines Using Containers & Spot Instances(CMP310) Data Processing Pipelines Using Containers & Spot Instances
(CMP310) Data Processing Pipelines Using Containers & Spot Instances
 
Engineering a robust(ish) data pipeline with Luigi and AWS Elastic Map Reduce
Engineering a robust(ish) data pipeline with Luigi and AWS Elastic Map ReduceEngineering a robust(ish) data pipeline with Luigi and AWS Elastic Map Reduce
Engineering a robust(ish) data pipeline with Luigi and AWS Elastic Map Reduce
 
Luigi PyData presentation
Luigi PyData presentationLuigi PyData presentation
Luigi PyData presentation
 
Workflow Engines for Hadoop
Workflow Engines for HadoopWorkflow Engines for Hadoop
Workflow Engines for Hadoop
 
Building a Data Pipeline from Scratch - Joe Crobak
Building a Data Pipeline from Scratch - Joe CrobakBuilding a Data Pipeline from Scratch - Joe Crobak
Building a Data Pipeline from Scratch - Joe Crobak
 

Más de Samuel Lampa

Using Flow-based programming to write tools and workflows for Scientific Comp...
Using Flow-based programming to write tools and workflows for Scientific Comp...Using Flow-based programming to write tools and workflows for Scientific Comp...
Using Flow-based programming to write tools and workflows for Scientific Comp...Samuel Lampa
 
Linked Data for improved organization of research data
Linked Data  for improved organization  of research dataLinked Data  for improved organization  of research data
Linked Data for improved organization of research dataSamuel Lampa
 
How to document computational research projects
How to document computational research projectsHow to document computational research projects
How to document computational research projectsSamuel Lampa
 
Reproducibility in Scientific Data Analysis - BioScience Seminar
Reproducibility in Scientific Data Analysis - BioScience SeminarReproducibility in Scientific Data Analysis - BioScience Seminar
Reproducibility in Scientific Data Analysis - BioScience SeminarSamuel Lampa
 
Batch import of large RDF datasets into Semantic MediaWiki
Batch import of large RDF datasets into Semantic MediaWikiBatch import of large RDF datasets into Semantic MediaWiki
Batch import of large RDF datasets into Semantic MediaWikiSamuel Lampa
 
SciPipe - A light-weight workflow library inspired by flow-based programming
SciPipe - A light-weight workflow library inspired by flow-based programmingSciPipe - A light-weight workflow library inspired by flow-based programming
SciPipe - A light-weight workflow library inspired by flow-based programmingSamuel Lampa
 
Vagrant, Ansible and Docker - How they fit together for productive flexible d...
Vagrant, Ansible and Docker - How they fit together for productive flexible d...Vagrant, Ansible and Docker - How they fit together for productive flexible d...
Vagrant, Ansible and Docker - How they fit together for productive flexible d...Samuel Lampa
 
iRODS Rule Language Cheat Sheet
iRODS Rule Language Cheat SheetiRODS Rule Language Cheat Sheet
iRODS Rule Language Cheat SheetSamuel Lampa
 
AddisDev Meetup ii: Golang and Flow-based Programming
AddisDev Meetup ii: Golang and Flow-based ProgrammingAddisDev Meetup ii: Golang and Flow-based Programming
AddisDev Meetup ii: Golang and Flow-based ProgrammingSamuel Lampa
 
First encounter with Elixir - Some random things
First encounter with Elixir - Some random thingsFirst encounter with Elixir - Some random things
First encounter with Elixir - Some random thingsSamuel Lampa
 
Profiling go code a beginners tutorial
Profiling go code   a beginners tutorialProfiling go code   a beginners tutorial
Profiling go code a beginners tutorialSamuel Lampa
 
Flow based programming an overview
Flow based programming   an overviewFlow based programming   an overview
Flow based programming an overviewSamuel Lampa
 
Python Generators - Talk at PySthlm meetup #15
Python Generators - Talk at PySthlm meetup #15Python Generators - Talk at PySthlm meetup #15
Python Generators - Talk at PySthlm meetup #15Samuel Lampa
 
The RDFIO Extension - A Status update
The RDFIO Extension - A Status updateThe RDFIO Extension - A Status update
The RDFIO Extension - A Status updateSamuel Lampa
 
My lightning talk at Go Stockholm meetup Aug 6th 2013
My lightning talk at Go Stockholm meetup Aug 6th 2013My lightning talk at Go Stockholm meetup Aug 6th 2013
My lightning talk at Go Stockholm meetup Aug 6th 2013Samuel Lampa
 
Hooking up Semantic MediaWiki with external tools via SPARQL
Hooking up Semantic MediaWiki with external tools via SPARQLHooking up Semantic MediaWiki with external tools via SPARQL
Hooking up Semantic MediaWiki with external tools via SPARQLSamuel Lampa
 
Thesis presentation Samuel Lampa
Thesis presentation Samuel LampaThesis presentation Samuel Lampa
Thesis presentation Samuel LampaSamuel Lampa
 
3rd Proj. Update: Integrating SWI-Prolog for Semantic Reasoning in Bioclipse
3rd Proj. Update: Integrating SWI-Prolog for Semantic Reasoning in Bioclipse3rd Proj. Update: Integrating SWI-Prolog for Semantic Reasoning in Bioclipse
3rd Proj. Update: Integrating SWI-Prolog for Semantic Reasoning in BioclipseSamuel Lampa
 
2nd Proj. Update: Integrating SWI-Prolog for Semantic Reasoning in Bioclipse
2nd Proj. Update: Integrating SWI-Prolog for Semantic Reasoning in Bioclipse2nd Proj. Update: Integrating SWI-Prolog for Semantic Reasoning in Bioclipse
2nd Proj. Update: Integrating SWI-Prolog for Semantic Reasoning in BioclipseSamuel Lampa
 

Más de Samuel Lampa (19)

Using Flow-based programming to write tools and workflows for Scientific Comp...
Using Flow-based programming to write tools and workflows for Scientific Comp...Using Flow-based programming to write tools and workflows for Scientific Comp...
Using Flow-based programming to write tools and workflows for Scientific Comp...
 
Linked Data for improved organization of research data
Linked Data  for improved organization  of research dataLinked Data  for improved organization  of research data
Linked Data for improved organization of research data
 
How to document computational research projects
How to document computational research projectsHow to document computational research projects
How to document computational research projects
 
Reproducibility in Scientific Data Analysis - BioScience Seminar
Reproducibility in Scientific Data Analysis - BioScience SeminarReproducibility in Scientific Data Analysis - BioScience Seminar
Reproducibility in Scientific Data Analysis - BioScience Seminar
 
Batch import of large RDF datasets into Semantic MediaWiki
Batch import of large RDF datasets into Semantic MediaWikiBatch import of large RDF datasets into Semantic MediaWiki
Batch import of large RDF datasets into Semantic MediaWiki
 
SciPipe - A light-weight workflow library inspired by flow-based programming
SciPipe - A light-weight workflow library inspired by flow-based programmingSciPipe - A light-weight workflow library inspired by flow-based programming
SciPipe - A light-weight workflow library inspired by flow-based programming
 
Vagrant, Ansible and Docker - How they fit together for productive flexible d...
Vagrant, Ansible and Docker - How they fit together for productive flexible d...Vagrant, Ansible and Docker - How they fit together for productive flexible d...
Vagrant, Ansible and Docker - How they fit together for productive flexible d...
 
iRODS Rule Language Cheat Sheet
iRODS Rule Language Cheat SheetiRODS Rule Language Cheat Sheet
iRODS Rule Language Cheat Sheet
 
AddisDev Meetup ii: Golang and Flow-based Programming
AddisDev Meetup ii: Golang and Flow-based ProgrammingAddisDev Meetup ii: Golang and Flow-based Programming
AddisDev Meetup ii: Golang and Flow-based Programming
 
First encounter with Elixir - Some random things
First encounter with Elixir - Some random thingsFirst encounter with Elixir - Some random things
First encounter with Elixir - Some random things
 
Profiling go code a beginners tutorial
Profiling go code   a beginners tutorialProfiling go code   a beginners tutorial
Profiling go code a beginners tutorial
 
Flow based programming an overview
Flow based programming   an overviewFlow based programming   an overview
Flow based programming an overview
 
Python Generators - Talk at PySthlm meetup #15
Python Generators - Talk at PySthlm meetup #15Python Generators - Talk at PySthlm meetup #15
Python Generators - Talk at PySthlm meetup #15
 
The RDFIO Extension - A Status update
The RDFIO Extension - A Status updateThe RDFIO Extension - A Status update
The RDFIO Extension - A Status update
 
My lightning talk at Go Stockholm meetup Aug 6th 2013
My lightning talk at Go Stockholm meetup Aug 6th 2013My lightning talk at Go Stockholm meetup Aug 6th 2013
My lightning talk at Go Stockholm meetup Aug 6th 2013
 
Hooking up Semantic MediaWiki with external tools via SPARQL
Hooking up Semantic MediaWiki with external tools via SPARQLHooking up Semantic MediaWiki with external tools via SPARQL
Hooking up Semantic MediaWiki with external tools via SPARQL
 
Thesis presentation Samuel Lampa
Thesis presentation Samuel LampaThesis presentation Samuel Lampa
Thesis presentation Samuel Lampa
 
3rd Proj. Update: Integrating SWI-Prolog for Semantic Reasoning in Bioclipse
3rd Proj. Update: Integrating SWI-Prolog for Semantic Reasoning in Bioclipse3rd Proj. Update: Integrating SWI-Prolog for Semantic Reasoning in Bioclipse
3rd Proj. Update: Integrating SWI-Prolog for Semantic Reasoning in Bioclipse
 
2nd Proj. Update: Integrating SWI-Prolog for Semantic Reasoning in Bioclipse
2nd Proj. Update: Integrating SWI-Prolog for Semantic Reasoning in Bioclipse2nd Proj. Update: Integrating SWI-Prolog for Semantic Reasoning in Bioclipse
2nd Proj. Update: Integrating SWI-Prolog for Semantic Reasoning in Bioclipse
 

Último

Zoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfZoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfSumit Kumar yadav
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)Areesha Ahmad
 
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...Scintica Instrumentation
 
Thyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate ProfessorThyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate Professormuralinath2
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)Areesha Ahmad
 
300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptxryanrooker
 
The Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxThe Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxseri bangash
 
Human genetics..........................pptx
Human genetics..........................pptxHuman genetics..........................pptx
Human genetics..........................pptxSilpa
 
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort ServiceCall Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort Serviceshivanisharma5244
 
COMPUTING ANTI-DERIVATIVES (Integration by SUBSTITUTION)
COMPUTING ANTI-DERIVATIVES(Integration by SUBSTITUTION)COMPUTING ANTI-DERIVATIVES(Integration by SUBSTITUTION)
COMPUTING ANTI-DERIVATIVES (Integration by SUBSTITUTION)AkefAfaneh2
 
pumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit flypumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit flyPRADYUMMAURYA1
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .Poonam Aher Patil
 
Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.Silpa
 
Stages in the normal growth curve
Stages in the normal growth curveStages in the normal growth curve
Stages in the normal growth curveAreesha Ahmad
 
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryAlex Henderson
 
FAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceFAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceAlex Henderson
 
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....muralinath2
 
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIACURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIADr. TATHAGAT KHOBRAGADE
 
Digital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxDigital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxMohamedFarag457087
 

Último (20)

Zoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfZoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdf
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)
 
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
 
Thyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate ProfessorThyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate Professor
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx
 
The Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxThe Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptx
 
Human genetics..........................pptx
Human genetics..........................pptxHuman genetics..........................pptx
Human genetics..........................pptx
 
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort ServiceCall Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
 
COMPUTING ANTI-DERIVATIVES (Integration by SUBSTITUTION)
COMPUTING ANTI-DERIVATIVES(Integration by SUBSTITUTION)COMPUTING ANTI-DERIVATIVES(Integration by SUBSTITUTION)
COMPUTING ANTI-DERIVATIVES (Integration by SUBSTITUTION)
 
pumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit flypumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit fly
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .
 
Clean In Place(CIP).pptx .
Clean In Place(CIP).pptx                 .Clean In Place(CIP).pptx                 .
Clean In Place(CIP).pptx .
 
Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.
 
Stages in the normal growth curve
Stages in the normal growth curveStages in the normal growth curve
Stages in the normal growth curve
 
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
 
FAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceFAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical Science
 
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
 
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIACURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
 
Digital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxDigital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptx
 

Building Scientific Workflows with Spotify's Luigi

  • 1. Building workflows with Spotify's Workshop for e-Infrastructures for Massively Parallel Sequencing Uppsala, Jan 19-20, 2015 Samuel Lampa UU/Dept of Pharm. Biosci. & BILS samuel.lampa@bils.se samuel.lampa.co
  • 2. Luigi Short Facts ● Batch (No streaming) ● Both Command line and Hadoop execution ● Does not replace Hive, Pig, Cascading etc - instead used to stitch many such jobs together ● Powers 1000:s of jobs every day at Spotify ● Communication / synchronization between tasks via “targets” (Normal file, HDFS, database …) ● Dependencies hard coded in tasks ● Pull based (ask for task, and it figures out dependencies) ● Main features: Scheduling, Dependency Graph, Basic multi-node support (via central planner daemon) ● github.com/spotify/luigi ● Easy to install: pip install luigi [tornado]
  • 3.
  • 6. Calling tasks from the commandline
  • 8. Dependencies: Default way TaskA() TaskB() TaskBA() TaskBB() TaskBAA() TaskBBB()TaskBAB() TaskBBA()
  • 9. Parameters: Default way TaskA() TaskB() TaskBA() TaskBB() TaskBAA() TaskBBB()TaskBAB() TaskBBA() DependencyParameter
  • 10. Growing list of parameters
  • 11. Dependencies and parameters: A better way AWorkflowTask() TaskA() TaskB() TaskBA() TaskBB() TaskBAA() TaskBBB()TaskBAB() TaskBBA()
  • 14. Using this new functionality in workflows, and wrapping whole workflows in Tasks
  • 15. Unit testing Luigi tasks is easy
  • 16. Lessons Learned (April 2014) ● Only “pull” or “call/return” semantics, no “push” or “auto triggering by inputs” ● Not fully separate workflow ● Multiple inputs / outputs somewhat error-prone ● Dynamic typed language ● Check out for max processes limits ● No real distribution of compute or data (only common scheduling)
  • 17. Lessons Learned (Update Jan 2015) ● Only “pull” or “call/return” semantics, no “push” or “auto triggering by inputs” ● Not fully separate workflow def. – We found a way around this. ● Multiple inputs / outputs somewhat error-prone – We're looking into solving this too. ● Dynamic typed language – We've found that unit testing is easy. ● Check out for max processes limits ● No real distribution of compute or data (only common scheduling) – We have found ourselves running everything through SLURM anyway