Hadoop WorkFlow 
Scheduler / 
Automation Engine 
Azkaban & Oozie 
Praveen Thirukonda 
Senior Associate 
Data & Analytics 
Orange County, CA 
09/11/2014
© 2014 KPMG LLP, a Delaware limited liability partnership and the U.S. member firm of the KPMG network of independent member firms 
affiliated with KPMG International Cooperative (“KPMG International”), a Swiss entity. All rights reserved. INTERNAL USE ONLY. Not for 
distribution to clients unless the technical and policy review requirements of Tax Services Manual section 23.7 are satisfied. 
What is a workflow? 
- A workflow is a Directed Acyclic Graph 
(DAG) of “jobs” where each job has one 
or more inputs and outputs. 
- A workflow scheduler helps us manage
the coordination among the various
jobs.
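As a rough illustration of the DAG idea, the sketch below models a workflow as a mapping from each job to the jobs it depends on, and asks Python's standard `graphlib` module (Python 3.9+) for a valid run order. The job names are hypothetical and are not taken from the slides.

```python
from graphlib import TopologicalSorter

# Hypothetical job names; each job maps to the set of jobs it depends on.
workflow = {
    "extract": set(),
    "clean": {"extract"},
    "aggregate": {"clean"},
    "report": {"aggregate"},
}


def run_order(dag):
    """Return one valid execution order for the jobs in the DAG."""
    return list(TopologicalSorter(dag).static_order())


print(run_order(workflow))
# → ['extract', 'clean', 'aggregate', 'report']
```

Because the graph is acyclic, such an order always exists; a real scheduler adds timing, retries, and monitoring on top of this ordering.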
When do we need a workflow scheduler? 
- In a data pipeline, batch jobs need to be
scheduled to run periodically.
- They also typically have intricate 
dependency chains—for example, 
dependencies on various data extraction 
processes or previous steps. 
- Larger processes might have 50 or 60 
steps, of which some might run in 
parallel and others must wait for the 
output of earlier steps.
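The mix of parallel and waiting steps can be sketched as "waves": every job in a wave has all of its dependencies finished, so the jobs in that wave could run at the same time. This is a minimal sketch using the standard `graphlib` module (Python 3.9+); the step names are hypothetical.

```python
from graphlib import TopologicalSorter


def parallel_waves(dag):
    """Group jobs into waves; every job in a wave can run at the same time."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        # Jobs whose dependencies have all been marked done.
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        # Mark the whole wave as finished so the next wave becomes ready.
        sorter.done(*ready)
    return waves


# Hypothetical steps: two independent extractions feed a join, then a load.
steps = {
    "extract_a": set(),
    "extract_b": set(),
    "join": {"extract_a", "extract_b"},
    "load": {"join"},
}

print(parallel_waves(steps))
# → [['extract_a', 'extract_b'], ['join'], ['load']]
```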
Azkaban
What is Azkaban? 
- “cron on steroids” 
- A workflow scheduler can be seen as a 
combination of the cron and make Unix 
utilities combined with a friendly UI.
What is Azkaban? 
- Azkaban was implemented at LinkedIn to 
solve the problem of Hadoop job 
dependencies. 
- Azkaban resolves the ordering through
job dependencies and provides an
easy-to-use web user interface to maintain
and track your workflows.
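To make the dependency idea concrete, the sketch below parses the text of two hypothetical Azkaban-style `.job` files (plain key=value properties) and builds a dependency map from their `dependencies` property. The file names and commands are made up for illustration, and the parser is a simplification that does not cover every feature of Azkaban's job format.

```python
# Contents of two hypothetical Azkaban-style .job files (key=value properties).
JOB_FILES = {
    "extract.job": "type=command\ncommand=echo extract\n",
    "load.job": "type=command\ncommand=echo load\ndependencies=extract\n",
}


def parse_job(text):
    """Parse key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def dependency_map(job_files):
    """Map each job name to the jobs listed in its 'dependencies' property."""
    deps = {}
    for filename, text in job_files.items():
        name = filename.removesuffix(".job")
        raw = parse_job(text).get("dependencies", "")
        deps[name] = [d.strip() for d in raw.split(",") if d.strip()]
    return deps


print(dependency_map(JOB_FILES))
# → {'extract': [], 'load': ['extract']}
```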
An image is worth a thousand words.
Apache Oozie
What is Apache Oozie? 
- Similar to Azkaban. 
- Whereas Azkaban uses a series of 
Properties files, Oozie uses an XML file. 
- Oozie supports workflow submission through
a Java API and the command line, in addition
to a browser interface and a REST API.
- Oozie is part of our Hortonworks 
environment in our cluster.
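For a feel of the XML style, the sketch below reads a heavily simplified Oozie-style workflow definition and follows its `ok` transitions from `start` to `end`. The XML is illustrative only: the XML namespace and the action bodies (for example a MapReduce or Hive action) are omitted, and the workflow and action names are hypothetical, so it is not a deployable Oozie workflow.

```python
import xml.etree.ElementTree as ET

# Simplified, illustrative workflow definition (namespace and action bodies omitted).
WORKFLOW_XML = """
<workflow-app name="demo-wf">
  <start to="extract"/>
  <action name="extract">
    <ok to="load"/>
    <error to="fail"/>
  </action>
  <action name="load">
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail"><message>Workflow failed</message></kill>
  <end name="end"/>
</workflow-app>
"""


def success_path(xml_text):
    """Return the action names visited when every action succeeds."""
    root = ET.fromstring(xml_text.strip())
    # Map each action to the node it transitions to on success.
    ok_target = {
        action.get("name"): action.find("ok").get("to")
        for action in root.findall("action")
    }
    end_name = root.find("end").get("name")
    path = []
    node = root.find("start").get("to")
    while node != end_name:
        path.append(node)
        node = ok_target[node]
    return path


print(success_path(WORKFLOW_XML))
# → ['extract', 'load']
```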
Advantages of using a workflow scheduler 
- Lets you easily manage dependencies among
the various tasks.
- Scheduling of workflows.
- Monitor the progress of your workflow with
a nice interface.
- Email alerts on failures and successes.
- Retrying of failed jobs.
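The retry and alerting behavior listed above can be sketched in a few lines. This is a minimal illustration, not how Azkaban or Oozie implement it: `run_with_retries` and its `notify` callback are hypothetical names, and the "alert" here is a message appended to a list rather than an email.

```python
def run_with_retries(job, retries, notify):
    """Run job(); retry up to `retries` extra times, then report the outcome."""
    attempts = 0
    while True:
        attempts += 1
        try:
            result = job()
        except Exception as exc:
            if attempts > retries:
                # Out of retries: send a failure alert and re-raise.
                notify(f"FAILED after {attempts} attempts: {exc}")
                raise
        else:
            notify(f"SUCCEEDED after {attempts} attempts")
            return result


# Hypothetical job that fails twice before succeeding.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient error")
    return "ok"


messages = []
print(run_with_retries(flaky, retries=3, notify=messages.append))
print(messages)
```

In a real scheduler the number of retries and the alert recipients are usually set in the job or workflow configuration rather than in code.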
Application of a workflow scheduler 
- A real-life example of how and where you
might use a workflow scheduler in your
big data system architecture.
Thank you 
Presentation by Praveen Thirukonda
© 2014 KPMG LLP, a Delaware limited liability partnership and 
the U.S. member firm of the KPMG network of independent 
member firms affiliated with KPMG International Cooperative 
(“KPMG International”), a Swiss entity. All rights reserved. 
The KPMG name, logo and “cutting through complexity” are 
registered trademarks or trademarks of KPMG International.


Editor's notes

  1. Got raw data from a car, then cleaned and preprocessed it on EC2 machines. Based on the amount of data, spun up EMR instances, copied the data to them, ran MapReduce jobs, and ran Hive scripts (which were created dynamically). Then used Sqoop to copy the final processed output to a PostgreSQL database, shut down the EMR instances, and ran cleanup operations.
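The pipeline in this note could be expressed as a dependency graph and run in order, roughly as sketched below. The step names are hypothetical labels for the stages described in the note, and the actions are stubs that only record their name; no EC2, EMR, Hive, or Sqoop calls are made.

```python
from graphlib import TopologicalSorter

# Hypothetical step names for the stages in the note; each maps to its prerequisites.
PIPELINE = {
    "preprocess_on_ec2": set(),
    "start_emr": {"preprocess_on_ec2"},
    "copy_to_emr": {"start_emr"},
    "run_mapreduce": {"copy_to_emr"},
    "run_hive": {"run_mapreduce"},
    "sqoop_to_postgres": {"run_hive"},
    "shutdown_emr": {"sqoop_to_postgres"},
    "cleanup": {"shutdown_emr"},
}


def run_pipeline(dag, actions):
    """Run each step's action in dependency order; return the completed steps."""
    completed = []
    for step in TopologicalSorter(dag).static_order():
        actions[step]()
        completed.append(step)
    return completed


# Stub actions that only log the step name instead of doing real work.
log = []
stub_actions = {step: (lambda s=step: log.append(s)) for step in PIPELINE}

for name in run_pipeline(PIPELINE, stub_actions):
    print(name)
```

A workflow scheduler would replace the stubs with real jobs and add what the loop lacks: scheduling, retries, alerts, and a UI for tracking progress.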