Getting
to Know
AIRFLOW
Rosie Hoyem
PyMNtos
04/27/2017
Me.
Data Scientist
Web Developer
Landlord
Cyclist
Traveler
rosiehoyem@gmail.com
rosiehoyem.com
0.
Airflow
huh?
???
Airflow
In a
Nutshell
⊙Data Engineering tool
⊙Pimped out Flask app
⊙Useful for building
functional data pipelines and
automating workflow
1.
Why do I
care?
It’s Popular.
History of
Airflow
2014
Maxime
Beauchemin
began building
a tool at Airbnb
in October of
2014
2016
Airflow entered
incubation as
an Apache
project
Now
Officially used
by dozens of
companies
large and small
Who
Already
uses it?
Airbnb [@mistercrunch, @artwr]
Agari [@r39132]
allegro.pl [@kretes]
AltX [@pedromduarte]
Apigee [@btallman]
Astronomer [@schnie]
Auth0 [@sicarul]
BandwidthX [@dineshdsharma]
Bellhops
BlaBlaCar [@puckel & @wmorin]
Bloc [@dpaola2]
BlueApron [@jasonjho & @matthewdavidhauser]
Blue Yonder [@blue-yonder]
Celect [@superdosh & @chadcelect]
Change.org [@change, @vijaykramesh]
Children's Hospital of Philadelphia Division of
Genomic Diagnostics [@genomics-geek]
City of San Diego [@MrMaksimize, @andrell81 &
@arnaudvedy]
Clairvoyant [@shekharv]
Clover Health [@gwax & @vansivallab]
Chartboost [@cgelman & @dclubb]
Cotap [@maraca & @richardchew]
Digital First Media [@duffn & @mschmo &
@seanmuth]
Easy Taxi [@caique-lima & @WesleyBatista]
FreshBooks [@DinoCow]
Gentner Lab [@neuromusic]
Glassdoor [@syvineckruyk]
HelloFresh [@tammymendt & @davidsbatista &
@iuriinedostup]
Holimetrix [@thibault-ketterer]
Hootsuite
IFTTT [@apurvajoshi]
iHeartRadio [@yiwang]
ING
Jampp
Kiwi.com [@underyx]
Kogan.com [@geeknam]
Lemann Foundation [@fernandosjp]
LendUp [@lendup]
liligo [@tromika]
LingoChamp [@haitaoyao]
Lucid [@jbrownlucid & @kkourtchikov]
Lumos Labs [@rfroetscher & @zzztimbo]
Lyft [@SaurabhBajaj]
Madrone [@mbreining & @scotthb]
Markovian [@al-xv, @skogsbaeck, @waltherg]
Mercadoni [@demorenoc]
MiNODES [@dice89, @diazcelsa]
MFG Labs
mytaxi [@mytaxi]
Nerdwallet
OfferUp
OneFineStay [@slangwald]
Open Knowledge International [@vitorbaptista]
PayPal [@jhsenjaliya]
Postmates [@syeoryn]
Sense360 [@kamilmroczek]
Shopkick [@shopkick]
Sidecar [@getsidecar]
SimilarWeb [@similarweb]
SmartNews [@takus]
Spotify [@znichols]
Stackspace
Stripe [@jbalogh]
Thumbtack [@natekupp]
T2 Systems [@unclaimedpants]
Vente-Exclusive.com [@alexvanboxel]
Vnomics [@lpalum]
WePay [@criccomini & @mtagle]
WeTransfer [@jochem]
Whistle Labs [@ananya77041]
WiseBanyan
Wooga
Xoom [@gepser & @omarvides]
Yahoo!
Zapier [@drknexus & @statwonk]
Zendesk
Zenly [@cerisier & @jbdalido]
99 [@fbenevides, @gustavoamigo & @mmmaia]
GovTech GDS [@chrissng & @datagovsg]
Gusto [@frankhsu]
Handshake [@mhickman]
Handy [@marcintustin / @mtustin-handy]
Qubole [@msumit]
2
What is it?
A Brief Overview
Before
Airflow,
there
was...
Cron Jobs.
(And a hodge-podge of other tools
people would duct tape together.)
What’s a Cron Job you say?
Cron
cron is a Linux utility
which schedules a
command or script on
your server to run
automatically at a
specified time and
date.
Schedule
Jobs
Cron Job
A cron job is the
scheduled task itself.
Cron jobs can be
very useful to
automate repetitive
tasks.
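
To make the cron idea concrete, here is a small sketch that takes one crontab entry apart. A standard entry has five schedule fields (minute, hour, day of month, month, day of week) followed by the command to run. The script path and the helper name `parse_crontab_line` are made up for illustration; this only shows the layout of an entry and is not how cron itself is implemented.

```python
# Field names of a standard five-field crontab schedule, in order.
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def parse_crontab_line(line):
    """Split one crontab entry into its schedule fields and its command."""
    parts = line.split(None, 5)  # five schedule fields, then the command
    if len(parts) != 6:
        raise ValueError("expected five schedule fields followed by a command")
    schedule = dict(zip(CRON_FIELDS, parts[:5]))
    return schedule, parts[5]


# Hypothetical entry: run a (made-up) script every day at 02:30.
schedule, command = parse_crontab_line(
    "30 2 * * * /usr/local/bin/nightly_export.sh --full"
)
print(schedule)
print(command)
```

Note what is missing from the entry: nothing says which other job has to finish first. That gap is what the rest of the talk is about.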
Airflow
DAGs as CODE
Directed Acyclic
Graph
Config file that outlines HOW to carry out a workflow
Contains a
collection of
tasks
Determines the
order in which tasks
will be executed
Determines
when they will
be executed
OPERATORS
Operators are the building blocks of workflows
Action
Performs an action, or
tells another system to
perform an action
(e.g., PythonOperator)
Transfer
Moves data from one
system to another
(e.g., RedshiftToS3Transfer)
Sensor
Keeps running until a
certain criterion is met
(e.g., S3KeySensor)
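
The sensor idea ("keep running until a certain criterion is met") can be sketched in plain Python. This is a toy illustration, not Airflow's sensor code: the function `wait_until`, its parameters, and the in-memory stand-in for an S3 bucket are all assumptions made for this sketch.

```python
import time


def wait_until(condition, poke_interval=0.01, timeout=1.0):
    """Call `condition` until it returns True or `timeout` seconds pass.

    Returns the number of checks made. Raises TimeoutError if the
    criterion is never met.
    """
    deadline = time.monotonic() + timeout
    checks = 0
    while True:
        checks += 1
        if condition():
            return checks
        if time.monotonic() >= deadline:
            raise TimeoutError("criterion not met within %.2f seconds" % timeout)
        time.sleep(poke_interval)


# Stand-in for "has the key appeared in the bucket yet?": an in-memory set.
available_keys = set()
attempts = {"count": 0}


def key_present():
    attempts["count"] += 1
    if attempts["count"] == 3:  # simulate the file landing before the third check
        available_keys.add("exports/2017-04-27.csv")
    return "exports/2017-04-27.csv" in available_keys


checks = wait_until(key_present)
print(checks)  # 3
```

The timeout matters: a sensor that waits forever on a file that never arrives blocks everything downstream of it.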
Let’s
review
some
concepts
Operators
Classes provided by
Airflow. Building blocks
of DAGs.
DAGS
Directed Acyclic
Graphs. Specialized
config files for series of
tasks.
Tasks
Tasks are connected via
directed edges that
represent an
"execute_after"
relationship.
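
The "execute_after" relationship can be tried out without installing Airflow. The sketch below uses Python's standard-library `graphlib` (Python 3.9+) on the example from the speaker notes: when task A finishes, run B and C; when B finishes, run D and E. It only shows how directed edges determine run order; it is not how Airflow's scheduler is written, and the `waves` grouping is an assumption of this sketch.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Map each task to the set of tasks it must execute after (its upstream tasks).
execute_after = {
    "A": set(),
    "B": {"A"},
    "C": {"A"},
    "D": {"B"},
    "E": {"B"},
}

sorter = TopologicalSorter(execute_after)
sorter.prepare()  # raises graphlib.CycleError if the graph is not acyclic

waves = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())  # tasks whose upstream tasks are all done
    waves.append(ready)
    sorter.done(*ready)

print(waves)  # [['A'], ['B', 'C'], ['D', 'E']]
```

Tasks that land in the same wave do not depend on each other, so they could run in parallel. Add an edge from E back to A and `prepare()` raises `CycleError`, which is the "acyclic" part of the name doing its job.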
Life
Stream
Example
Rails
Application
Airflow Process
Manager
PostgreSQL
Data Store
3
Let’s Try It.
4
Final
Thoughts.
What’s It
Good For
It Can:
⊙Schedule complex
chains of tasks
⊙Manage
dependencies
between tasks
⊙Define complex
relations even in a
large distributed
environment
It Can’t:
⊙Store your data
⊙Clean your house
⊙Feed your pets while
you are gone on
vacation (yet)
Competitors
Luigi
Came out of Spotify
Simpler in scope
More object oriented
*Complementary to
Airflow?
Pachyderm
Containerized data
pipeline framework
Azkaban
Created at LinkedIn
Batch workflow job
scheduler to run
Hadoop jobs
“Airflow provides a load of
functionality, but like any
popular, fast-moving
project, the documentation
gap can be a challenge to
adoption.”
Thanks!
Any questions?

Editor's notes

  1. functional pipelines
  2. Biggest unicorns: Spotify, Lyft, Airbnb, Stripe.
  3. A DAG is a finite directed graph with no directed cycles. Graph: vertices and edges. Acyclic: no cycles. Directed: edges go in one direction, with a beginning and an end.
  4. PythonOperator executes Python code. RedshiftToS3Transfer runs an UNLOAD command to S3 as a CSV with headers. S3KeySensor waits for a key (a file-like instance on S3) to be present in an S3 bucket. S3 is a key/value store, so it does not support folders; the path is just the key of a resource.
  5. When task A finishes, run both tasks B and C; when B finishes, run tasks D and E.