Apache Falcon
DevOps
Sanjeev Tripurari
Tech Lead Operations@inmobi
Falcon is a feed processing and feed management system that makes it easier for
end consumers to onboard their feed processing and feed management onto Hadoop
clusters.
http://falcon.apache.org
What’s on GRID
/user/sanjeev
/user/mohit
/user/Iliyas
/projects/meetup
/projects/support
/data/stream/click
/data/stream/beacon
Basic Components
Falcon
• Prism
• Server
• Client
ActiveMQ
Oozie
Hadoop
What’s in it for DevOps
Cluster
NameNode, JT, Oozie, ActiveMQ, Colo
Feed
Data, DataPath, Lifetime, Retention, Owner, Replication
Process
Job, Queue, Priority, Parallelism, Input, Output, Workflow
Basic Environment Setup
Prism (across UK and US)
UK: Server, Oozie, ActiveMQ, HDFS - MR1/2
US: Server, Oozie, ActiveMQ, HDFS - MR1/2
Logical Setup
Prism
UK: uk-clusterAlpha, uk-clusterBeta
US: us-clusterGamma
Falcon Entity Operation
Command:
falcon entity -submit -type [cluster/feed/process] -file cluster-definition.xml
falcon entity -list -type [cluster/feed/process]
Cluster
• Submit
• Delete
falcon entity -type [feed/process] -name [processname/feedname] -[OPTIONS]
Feed/Process OPTIONS
• schedule
• status
• list
• touch
• dependency
• definition
• update
• delete
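As a minimal sketch, the entity commands above can be assembled programmatically before being handed to a shell or a scheduler. The helper name `entity_command` is hypothetical and not part of Falcon; it only uses the flags listed on this slide, and it builds the argument list without running anything.

```python
from typing import List, Optional

ENTITY_TYPES = {"cluster", "feed", "process"}
# Actions listed on the slide; each one becomes a "-<action>" flag.
ACTIONS = {"submit", "list", "schedule", "status", "touch",
           "dependency", "definition", "update", "delete"}


def entity_command(entity_type: str, action: str,
                   name: Optional[str] = None,
                   file: Optional[str] = None) -> List[str]:
    """Build (but do not run) a `falcon entity` argument list."""
    if entity_type not in ENTITY_TYPES:
        raise ValueError(f"unknown entity type: {entity_type}")
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    cmd = ["falcon", "entity", "-type", entity_type, f"-{action}"]
    if name is not None:
        cmd += ["-name", name]
    if file is not None:
        cmd += ["-file", file]
    return cmd


if __name__ == "__main__":
    print(" ".join(entity_command("cluster", "submit", file="uk-clusterAlpha.xml")))
    print(" ".join(entity_command("process", "dependency", name="falcon-sanjeev-process")))
```

Keeping the command as a list rather than a string avoids quoting problems if it is later passed to `subprocess.run`.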
Falcon Cluster
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cluster name="uk-clusterAlpha" description="" colo="uk" xmlns="uri:falcon:cluster:0.1">
<interfaces>
<interface type="readonly" endpoint="hftp://nn.cluster.my.com:50070" version="0.20.2-cdh3u3"/>
<interface type="write" endpoint="hdfs://nn..cluster.my.com:8020" version="0.20.2-cdh3u3"/>
<interface type="execute" endpoint="jt.cluster.my.com:8021" version="0.20.2-cdh3u3"/>
<interface type="workflow" endpoint="http://oozie..cluster.my.com:11000/oozie/" version="3.1.6"/>
<interface type="messaging" endpoint="tcp://amq..cluster.my.com:61616?daemon=true" version="5.4.3"/>
</interfaces>
<locations>
<location name="staging" path="/store/falcon/staging"/>
<location name="temp" path="/tmp"/>
<location name="working" path="/store/falcon/working"/>
</locations>
<properties>
<property name="colo.name" value="uk"/>
</properties>
</cluster>
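Before submitting a cluster entity it can help to check which interfaces the file actually declares. The sketch below uses only Python's standard XML parser. The expected set is simply the five interface types shown on this slide, not a statement about Falcon's schema, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"c": "uri:falcon:cluster:0.1"}
# Interface types that appear on the slide (an assumption, not the XSD).
EXPECTED = {"readonly", "write", "execute", "workflow", "messaging"}

# Deliberately incomplete sample so the check has something to report.
CLUSTER_XML = """
<cluster name="uk-clusterAlpha" description="" colo="uk" xmlns="uri:falcon:cluster:0.1">
  <interfaces>
    <interface type="readonly" endpoint="hftp://nn.cluster.my.com:50070" version="0.20.2-cdh3u3"/>
    <interface type="execute" endpoint="jt.cluster.my.com:8021" version="0.20.2-cdh3u3"/>
  </interfaces>
  <locations>
    <location name="staging" path="/store/falcon/staging"/>
  </locations>
</cluster>
"""


def declared_interfaces(xml_text: str) -> dict:
    """Map each declared interface type to its endpoint."""
    root = ET.fromstring(xml_text.strip())
    return {node.get("type"): node.get("endpoint")
            for node in root.findall("c:interfaces/c:interface", NS)}


def missing_interfaces(xml_text: str) -> list:
    """Return the expected interface types that the entity does not declare."""
    return sorted(EXPECTED - set(declared_interfaces(xml_text)))


if __name__ == "__main__":
    print(declared_interfaces(CLUSTER_XML))
    print("missing:", missing_interfaces(CLUSTER_XML))
```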
Falcon Feed
<feed description="input feed" name="uk-inputfeed" xmlns="uri:falcon:feed:0.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<groups>input</groups>
<frequency>hours(1)</frequency>
<late-arrival cut-off="hours(6)" />
<clusters>
<cluster name="uk-clusterAlpha" type="source">
<validity start="2015-02-20T18:00Z" end="2015-02-23T00:00Z"/>
<retention limit="hours(24)" action="delete" />
</cluster>
</clusters>
<locations>
<location type="data" path="/user/sanjeev/falcon/input/${YEAR}/${MONTH}/${DAY}/${HOUR}" />
</locations>
<ACL owner="sanjeev" group="users" permission="0x755" />
<schema location="/none" provider="none" />
</feed>
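The `${YEAR}/${MONTH}/${DAY}/${HOUR}` variables in the feed location are filled in per feed instance. The sketch below is a plain-Python illustration of that substitution, with a hypothetical `expand_path` helper; it does not call Falcon and handles only the four variables used in this feed.

```python
from datetime import datetime, timezone


def expand_path(template: str, instance: datetime) -> str:
    """Substitute the date variables of a feed location for one instance time."""
    values = {
        "YEAR": f"{instance.year:04d}",
        "MONTH": f"{instance.month:02d}",
        "DAY": f"{instance.day:02d}",
        "HOUR": f"{instance.hour:02d}",
    }
    for key, value in values.items():
        template = template.replace("${" + key + "}", value)
    return template


if __name__ == "__main__":
    template = "/user/sanjeev/falcon/input/${YEAR}/${MONTH}/${DAY}/${HOUR}"
    instance = datetime(2015, 2, 20, 18, 0, tzinfo=timezone.utc)
    print(expand_path(template, instance))
```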
Falcon Feed
<feed description="output feed" name="uk-outputfeed" xmlns="uri:falcon:feed:0.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<groups>output</groups>
<frequency>hours(1)</frequency>
<late-arrival cut-off="hours(6)" />
<clusters>
<cluster name="uk-clusterAlpha" type="source">
<validity start="2015-02-20T18:00Z" end="2015-02-23T00:00Z"/>
<retention limit="hours(24)" action="delete" />
</cluster>
</clusters>
<locations>
<location type="data" path="/user/sanjeev/falcon/output/${YEAR}/${MONTH}/${DAY}/${HOUR}" />
</locations>
<ACL owner="sanjeev" group="users" permission="0x755" />
<schema location="/none" provider="none" />
</feed>
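To reason about what `<retention limit="hours(24)" action="delete" />` implies for HDFS usage, the following sketch lists which instance times would fall outside the limit at a given moment. It assumes retention is measured from the instance time to "now", which is a simplification; the helper names are hypothetical, and `months(N)` is intentionally not handled.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import List


def parse_frequency(expr: str) -> timedelta:
    """Convert expressions such as hours(24) or minutes(30) to a timedelta."""
    match = re.fullmatch(r"(minutes|hours|days)\((\d+)\)", expr.strip())
    if match is None:
        raise ValueError(f"unsupported frequency expression: {expr!r}")
    unit, count = match.group(1), int(match.group(2))
    return timedelta(**{unit: count})


def expired(instances: List[datetime], limit: str, now: datetime) -> List[datetime]:
    """Return the instances strictly older than the retention limit."""
    cutoff = now - parse_frequency(limit)
    return [t for t in instances if t < cutoff]


if __name__ == "__main__":
    utc = timezone.utc
    instances = [
        datetime(2015, 2, 20, 18, 0, tzinfo=utc),
        datetime(2015, 2, 21, 0, 0, tzinfo=utc),
        datetime(2015, 2, 21, 12, 0, tzinfo=utc),
    ]
    now = datetime(2015, 2, 22, 0, 0, tzinfo=utc)
    for t in expired(instances, "hours(24)", now):
        print("eligible for delete:", t.isoformat())
```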
Falcon Process
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<process name="falcon-sanjeev-process" xmlns="uri:falcon:process:0.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<clusters>
<cluster name="uk-clusterAlpha">
<validity start="2015-02-20T18:00Z" end="2015-02-23T00:00Z"/>
</cluster>
</clusters>
<parallel>1</parallel>
<frequency>hours(1)</frequency>
<timezone>UTC</timezone>
<inputs>
<input end="today(18,0)" start="today(18,0)" feed="uk-inputfeed" name="input" />
</inputs>
<outputs>
<output instance="now(0,0)" feed="uk-outputfeed" name="output" />
</outputs>
<properties>
<property name="fileTime" value="${formatTime(dateOffset(instanceTime(), 1, 'DAY'), 'yyyy-MMM-dd')}"/>
<property name="user" value="${user()}"/>
<property name="baseTime" value="${today(0,0)}"/>
</properties>
<workflow engine="oozie" path="/user/sanjeev/falcon/workflow" />
<retry policy="periodic" delay="minutes(10)" attempts="3" />
</process>
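The validity window and `hours(1)` frequency together determine how many process instances get scheduled. The sketch below enumerates them, assuming the validity end is exclusive (an assumption worth checking against the Falcon version in use). `parse_utc` and `instance_times` are hypothetical helper names.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterator


def parse_utc(stamp: str) -> datetime:
    """Parse timestamps written as in the entity XML, e.g. 2015-02-20T18:00Z."""
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%MZ").replace(tzinfo=timezone.utc)


def instance_times(start: str, end: str, step: timedelta) -> Iterator[datetime]:
    """Yield nominal instance times from start up to, but not including, end."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    current, stop = parse_utc(start), parse_utc(end)
    while current < stop:
        yield current
        current += step


if __name__ == "__main__":
    times = list(instance_times("2015-02-20T18:00Z", "2015-02-23T00:00Z",
                                timedelta(hours=1)))
    print(len(times), "instances")
    print("first:", times[0].isoformat())
    print("last: ", times[-1].isoformat())
```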
Oozie Workflow
<workflow-app xmlns="uri:oozie:workflow:0.3" name="fs-workflow">
<start to="fs-cmds"/>
<action name="fs-cmds">
<fs>
<mkdir path='${output}'/>
</fs>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>
What’s on HDFS
Input Feed: /user/sanjeev/falcon/input/2015/02/20/00
Input Feed: /user/sanjeev/falcon/input/2015/02/20/18
Output Feed: /user/sanjeev/falcon/output/
Workflow: /user/sanjeev/falcon/workflow/workflow.xml
falcon entity -type cluster -submit -file uk-clusterAlpha.xml
falcon entity -type feed -submit -file uk-inputfeed.xml
falcon entity -type feed -submit -file uk-outputfeed.xml
falcon entity -type process -submitAndSchedule -file falcon-sanjeev-process.xml
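The commands above follow a dependency order: the cluster first, then the feeds that reference it, then the process that references both. As a sketch, that order can be derived with a topological sort from Python's standard library (`graphlib`, Python 3.9+). The dependency map is written by hand from the entity definitions on the earlier slides, and `submission_order` is a hypothetical name.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Each entity maps to the entities it references (hand-written from the XML).
DEPENDENCIES: Dict[str, Set[str]] = {
    "uk-clusterAlpha": set(),
    "uk-inputfeed": {"uk-clusterAlpha"},
    "uk-outputfeed": {"uk-clusterAlpha"},
    "falcon-sanjeev-process": {"uk-clusterAlpha", "uk-inputfeed", "uk-outputfeed"},
}


def submission_order(dependencies: Dict[str, Set[str]]) -> List[str]:
    """Return entities so that every entity comes after what it references."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for entity in submission_order(DEPENDENCIES):
        print(entity)
```

The relative order of the two feeds is not significant; only the cluster-before-feeds and feeds-before-process constraints matter.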
Typical Production Process and Workflow
(process) process-click-convert
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<process name="process-click-convert" xmlns="uri:falcon:process:0.1">
<clusters>
<cluster name="uk-clusterAlpha">
<validity start="2015-01-15T00:00Z" end="2100-01-01T00:00Z"/>
</cluster>
<cluster name="us-clusterGamma">
<validity start="2015-01-15T00:30Z" end="2100-01-01T00:00Z"/>
</cluster>
</clusters>
<parallel>2</parallel>
<order>FIFO</order>
<frequency>minutes(30)</frequency>
<timezone>UTC</timezone>
<inputs>
<input name="Input" feed="feed-click-stream" start="now(0,-30)" end="now(0,-1)"/>
</inputs>
<outputs>
<output name="Output" feed="feed-click-convert" instance="now(0,-30)"/>
</outputs>
<properties>
<property name="queueName" value="stream"/>
<property name="jobPriority" value="NORMAL"/>
</properties>
<workflow path="/projects/support/click/conversion" lib="/projects/support/lib"/>
</process>
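The input range `start="now(0,-30)" end="now(0,-1)"` selects the half hour before each run. The sketch below works that out for one instance, assuming `now(hours, minutes)` is an offset from the nominal instance time. `el_now` and `input_window` are hypothetical helpers that only mimic the arithmetic; they are not Falcon's expression evaluator.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple


def el_now(instance: datetime, hours: int, minutes: int) -> datetime:
    """Offset the nominal instance time by the given hours and minutes."""
    return instance + timedelta(hours=hours, minutes=minutes)


def input_window(instance: datetime) -> Tuple[datetime, datetime]:
    """Resolve start=now(0,-30) and end=now(0,-1) for one process instance."""
    return el_now(instance, 0, -30), el_now(instance, 0, -1)


if __name__ == "__main__":
    run = datetime(2015, 1, 15, 1, 0, tzinfo=timezone.utc)
    start, end = input_window(run)
    print("instance:", run.isoformat())
    print("input from", start.isoformat(), "to", end.isoformat())
    print("output instance:", el_now(run, 0, -30).isoformat())
```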
(workflow) /projects/support/click/conversion/workflow.xml
<workflow-app xmlns='uri:oozie:workflow:0.3' name='click-conversion'>
<start to='click-convert' />
<action name='click-convert'>
<java>
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<prepare>
<delete path="${Output}"/>
<delete path="${wf:conf('Output.stats')}"/>
<delete path="${wf:conf('Output.tmp')}"/>
</prepare>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>${queueName}</value>
</property>
<property>
<name>mapred.job.priority</name>
<value>${jobPriority}</value>
</property>
</configuration>
<main-class>com.my.cluster.io.Driver</main-class>
<arg>-inputpath</arg><arg>${Input}</arg>
<arg>-outputpath</arg><arg>${Output}</arg>
<arg>-statspath</arg><arg>${wf:conf("Output.stats")}</arg>
<arg>-stagingpath</arg><arg>${wf:conf("Output.tmp")}</arg>
</java>
<ok to="end" />
<error to="fail" />
</action>
<kill name="fail">
<message>Workflow failed, error
message[${wf:errorMessage(wf:lastErrorNode())}]
</message>
</kill>
<end name='end' />
</workflow-app>
Falcon Instance Operation
Command:
falcon instance -type [feed/process] -[status/list]
falcon instance -type [feed/process] -name [processname/feedname] {-start YYYY-MM-DDTHH:MMZ -end YYYY-MM-DDTHH:MMZ} -[OPTIONS]
Feed/Process OPTIONS
• status
• list
• logs
• kill
• rerun
• suspend
• resume
Monitoring
• Falcon CLI
• Oozie CLI
• ActiveMQ
• falcon entity -type process -name falcon-sanjeev-process -dependency
(cluster) uk-clusterAlpha
(feed) uk-inputfeed - [Input]
(feed) uk-outputfeed - [Output]
• falcon instance -type process -name falcon-sanjeev-process -start 2015-02-20T18:00Z -end 2015-02-23T00:00Z -status
Consolidated Status: SUCCEEDED
Instances:
Instance Cluster SourceCluster Status Start End Details Log
-----------------------------------------------------------------------------------------------
2015-02-20T18:00Z uk-clusterAlpha - SUCCEEDED 2015-02-20T18:00Z 2015-02-20T18:01Z - http://oozie..cluster.my.com:11000/oozie/?job=0229074-150205100814135-oozie-oozi-W
Monitoring Dashboard
• https://github.com/ajayyadav/falcon-dashboard
Onboarding Pipeline
• Group all processes
• Minutely, Hourly, Daily, Weekly, Monthly
• Group related feeds
• Verify that all process jars and workflows are pushed to the cluster
• Verify ownership of all feed and process directories
• Verify that owners have job-scheduling access roles on the particular cluster
• Validate the feeds
• Submit and schedule the feeds, so that retention and replication are in place
• Dry-run the process schedule
• Submit and schedule the process
• Document the feed SLA, HDFS usage, and retention period for monitoring
• Document the process SLA, to observe delays
Challenges
• Tightly integrated with Oozie
• Monitoring and onboarding need to be streamlined
• Real-time changes to schedule time and queues
Advantages
• Development is very aggressive
• The industry is adopting it quickly
• Once onboarded, focus is needed only on the set of critical processes
• Easy shutdown and upgrade, as all running jobs are managed by Oozie
• DevOps can easily set up and manage data
Thank You

Más contenido relacionado

La actualidad más candente

Hive ACID Apache BigData 2016
Hive ACID Apache BigData 2016Hive ACID Apache BigData 2016
Hive ACID Apache BigData 2016alanfgates
 
Oracle 21c: New Features and Enhancements of Data Pump & TTS
Oracle 21c: New Features and Enhancements of Data Pump & TTSOracle 21c: New Features and Enhancements of Data Pump & TTS
Oracle 21c: New Features and Enhancements of Data Pump & TTSChristian Gohmann
 
Real-time Big Data Analytics Engine using Impala
Real-time Big Data Analytics Engine using ImpalaReal-time Big Data Analytics Engine using Impala
Real-time Big Data Analytics Engine using ImpalaJason Shih
 
SQL 2012 AlwaysOn Availability Groups for SharePoint 2013 - SharePoint Connec...
SQL 2012 AlwaysOn Availability Groups for SharePoint 2013 - SharePoint Connec...SQL 2012 AlwaysOn Availability Groups for SharePoint 2013 - SharePoint Connec...
SQL 2012 AlwaysOn Availability Groups for SharePoint 2013 - SharePoint Connec...Michael Noel
 
Making MySQL highly available using Oracle Grid Infrastructure
Making MySQL highly available using Oracle Grid InfrastructureMaking MySQL highly available using Oracle Grid Infrastructure
Making MySQL highly available using Oracle Grid InfrastructureIlmar Kerm
 
Apache Falcon : 22 Sept 2014 for Hadoop User Group France (@Criteo)
Apache Falcon : 22 Sept 2014 for Hadoop User Group France (@Criteo)Apache Falcon : 22 Sept 2014 for Hadoop User Group France (@Criteo)
Apache Falcon : 22 Sept 2014 for Hadoop User Group France (@Criteo)Cedric CARBONE
 
Sql Server 2008 New Programmability Features
Sql Server 2008 New Programmability FeaturesSql Server 2008 New Programmability Features
Sql Server 2008 New Programmability Featuressqlserver.co.il
 
Changes in WebLogic 12.1.3 Every Administrator Must Know
Changes in WebLogic 12.1.3 Every Administrator Must KnowChanges in WebLogic 12.1.3 Every Administrator Must Know
Changes in WebLogic 12.1.3 Every Administrator Must KnowBruno Borges
 
Why Upgrade to Oracle Database 12c?
Why Upgrade to Oracle Database 12c?Why Upgrade to Oracle Database 12c?
Why Upgrade to Oracle Database 12c?DLT Solutions
 
ORDS - Oracle REST Data Services
ORDS - Oracle REST Data ServicesORDS - Oracle REST Data Services
ORDS - Oracle REST Data ServicesJustin Michael Raj
 
Editioning use in ebs
Editioning use in  ebsEditioning use in  ebs
Editioning use in ebspasalapudi123
 
SQL Server Alwayson for SharePoint HA/DR Step by Step Guide
SQL Server Alwayson for SharePoint HA/DR Step by Step GuideSQL Server Alwayson for SharePoint HA/DR Step by Step Guide
SQL Server Alwayson for SharePoint HA/DR Step by Step GuideLars Platzdasch
 
A Second Look at Oracle RAC 12c
A Second Look at Oracle RAC 12cA Second Look at Oracle RAC 12c
A Second Look at Oracle RAC 12cLeighton Nelson
 
PDB Provisioning with Oracle Multitenant Self Service Application
PDB Provisioning with Oracle Multitenant Self Service ApplicationPDB Provisioning with Oracle Multitenant Self Service Application
PDB Provisioning with Oracle Multitenant Self Service ApplicationLeighton Nelson
 
Data Pipeline Management Framework on Oozie
Data Pipeline Management Framework on OozieData Pipeline Management Framework on Oozie
Data Pipeline Management Framework on OozieShareThis
 
Oracle Database 12c Release 2 - New Features On Oracle Database Exadata Expr...
Oracle Database 12c Release 2 - New Features On Oracle Database Exadata  Expr...Oracle Database 12c Release 2 - New Features On Oracle Database Exadata  Expr...
Oracle Database 12c Release 2 - New Features On Oracle Database Exadata Expr...Alex Zaballa
 
Cloudera Impala Source Code Explanation and Analysis
Cloudera Impala Source Code Explanation and AnalysisCloudera Impala Source Code Explanation and Analysis
Cloudera Impala Source Code Explanation and AnalysisYue Chen
 
Expose your data as an api is with oracle rest data services -spoug Madrid
Expose your data as an api is with oracle rest data services -spoug MadridExpose your data as an api is with oracle rest data services -spoug Madrid
Expose your data as an api is with oracle rest data services -spoug MadridVinay Kumar
 
SPSSac2014 - SharePoint Infrastructure Tips and Tricks for On-Premises and Hy...
SPSSac2014 - SharePoint Infrastructure Tips and Tricks for On-Premises and Hy...SPSSac2014 - SharePoint Infrastructure Tips and Tricks for On-Premises and Hy...
SPSSac2014 - SharePoint Infrastructure Tips and Tricks for On-Premises and Hy...Michael Noel
 
SQL AlwaysON for SharePoint HA/DR on Azure Global Azure Bootcamp 2017 Eisenac...
SQL AlwaysON for SharePoint HA/DR on Azure Global Azure Bootcamp 2017 Eisenac...SQL AlwaysON for SharePoint HA/DR on Azure Global Azure Bootcamp 2017 Eisenac...
SQL AlwaysON for SharePoint HA/DR on Azure Global Azure Bootcamp 2017 Eisenac...Lars Platzdasch
 

La actualidad más candente (20)

Hive ACID Apache BigData 2016
Hive ACID Apache BigData 2016Hive ACID Apache BigData 2016
Hive ACID Apache BigData 2016
 
Oracle 21c: New Features and Enhancements of Data Pump & TTS
Oracle 21c: New Features and Enhancements of Data Pump & TTSOracle 21c: New Features and Enhancements of Data Pump & TTS
Oracle 21c: New Features and Enhancements of Data Pump & TTS
 
Real-time Big Data Analytics Engine using Impala
Real-time Big Data Analytics Engine using ImpalaReal-time Big Data Analytics Engine using Impala
Real-time Big Data Analytics Engine using Impala
 
SQL 2012 AlwaysOn Availability Groups for SharePoint 2013 - SharePoint Connec...
SQL 2012 AlwaysOn Availability Groups for SharePoint 2013 - SharePoint Connec...SQL 2012 AlwaysOn Availability Groups for SharePoint 2013 - SharePoint Connec...
SQL 2012 AlwaysOn Availability Groups for SharePoint 2013 - SharePoint Connec...
 
Making MySQL highly available using Oracle Grid Infrastructure
Making MySQL highly available using Oracle Grid InfrastructureMaking MySQL highly available using Oracle Grid Infrastructure
Making MySQL highly available using Oracle Grid Infrastructure
 
Apache Falcon : 22 Sept 2014 for Hadoop User Group France (@Criteo)
Apache Falcon : 22 Sept 2014 for Hadoop User Group France (@Criteo)Apache Falcon : 22 Sept 2014 for Hadoop User Group France (@Criteo)
Apache Falcon : 22 Sept 2014 for Hadoop User Group France (@Criteo)
 
Sql Server 2008 New Programmability Features
Sql Server 2008 New Programmability FeaturesSql Server 2008 New Programmability Features
Sql Server 2008 New Programmability Features
 
Changes in WebLogic 12.1.3 Every Administrator Must Know
Changes in WebLogic 12.1.3 Every Administrator Must KnowChanges in WebLogic 12.1.3 Every Administrator Must Know
Changes in WebLogic 12.1.3 Every Administrator Must Know
 
Why Upgrade to Oracle Database 12c?
Why Upgrade to Oracle Database 12c?Why Upgrade to Oracle Database 12c?
Why Upgrade to Oracle Database 12c?
 
ORDS - Oracle REST Data Services
ORDS - Oracle REST Data ServicesORDS - Oracle REST Data Services
ORDS - Oracle REST Data Services
 
Editioning use in ebs
Editioning use in  ebsEditioning use in  ebs
Editioning use in ebs
 
SQL Server Alwayson for SharePoint HA/DR Step by Step Guide
SQL Server Alwayson for SharePoint HA/DR Step by Step GuideSQL Server Alwayson for SharePoint HA/DR Step by Step Guide
SQL Server Alwayson for SharePoint HA/DR Step by Step Guide
 
A Second Look at Oracle RAC 12c
A Second Look at Oracle RAC 12cA Second Look at Oracle RAC 12c
A Second Look at Oracle RAC 12c
 
PDB Provisioning with Oracle Multitenant Self Service Application
PDB Provisioning with Oracle Multitenant Self Service ApplicationPDB Provisioning with Oracle Multitenant Self Service Application
PDB Provisioning with Oracle Multitenant Self Service Application
 
Data Pipeline Management Framework on Oozie
Data Pipeline Management Framework on OozieData Pipeline Management Framework on Oozie
Data Pipeline Management Framework on Oozie
 
Oracle Database 12c Release 2 - New Features On Oracle Database Exadata Expr...
Oracle Database 12c Release 2 - New Features On Oracle Database Exadata  Expr...Oracle Database 12c Release 2 - New Features On Oracle Database Exadata  Expr...
Oracle Database 12c Release 2 - New Features On Oracle Database Exadata Expr...
 
Cloudera Impala Source Code Explanation and Analysis
Cloudera Impala Source Code Explanation and AnalysisCloudera Impala Source Code Explanation and Analysis
Cloudera Impala Source Code Explanation and Analysis
 
Expose your data as an api is with oracle rest data services -spoug Madrid
Expose your data as an api is with oracle rest data services -spoug MadridExpose your data as an api is with oracle rest data services -spoug Madrid
Expose your data as an api is with oracle rest data services -spoug Madrid
 
SPSSac2014 - SharePoint Infrastructure Tips and Tricks for On-Premises and Hy...
SPSSac2014 - SharePoint Infrastructure Tips and Tricks for On-Premises and Hy...SPSSac2014 - SharePoint Infrastructure Tips and Tricks for On-Premises and Hy...
SPSSac2014 - SharePoint Infrastructure Tips and Tricks for On-Premises and Hy...
 
SQL AlwaysON for SharePoint HA/DR on Azure Global Azure Bootcamp 2017 Eisenac...
SQL AlwaysON for SharePoint HA/DR on Azure Global Azure Bootcamp 2017 Eisenac...SQL AlwaysON for SharePoint HA/DR on Azure Global Azure Bootcamp 2017 Eisenac...
SQL AlwaysON for SharePoint HA/DR on Azure Global Azure Bootcamp 2017 Eisenac...
 

Similar a Apache Falcon - Sanjeev Tripurari

UCS Management APIs A Technical Deep Dive
UCS Management APIs A Technical Deep DiveUCS Management APIs A Technical Deep Dive
UCS Management APIs A Technical Deep DiveCisco DevNet
 
Introduction to es bs mule
Introduction to es bs   muleIntroduction to es bs   mule
Introduction to es bs muleAchyuta Lakshmi
 
Breaking SAP portal (HackerHalted)
Breaking SAP portal (HackerHalted)Breaking SAP portal (HackerHalted)
Breaking SAP portal (HackerHalted)ERPScan
 
Presto anatomy
Presto anatomyPresto anatomy
Presto anatomyDongmin Yu
 
The Four Principles of Atlassian Performance Tuning
The Four Principles of Atlassian Performance TuningThe Four Principles of Atlassian Performance Tuning
The Four Principles of Atlassian Performance TuningAtlassian
 
SecZone 2011: Scrubbing SAP clean with SOAP
SecZone 2011: Scrubbing SAP clean with SOAPSecZone 2011: Scrubbing SAP clean with SOAP
SecZone 2011: Scrubbing SAP clean with SOAPChris John Riley
 
[245] presto 내부구조 파헤치기
[245] presto 내부구조 파헤치기[245] presto 내부구조 파헤치기
[245] presto 내부구조 파헤치기NAVER D2
 
Breaking SAP portal (DeepSec)
Breaking SAP portal (DeepSec)Breaking SAP portal (DeepSec)
Breaking SAP portal (DeepSec)ERPScan
 
High-Performance Hibernate - JDK.io 2018
High-Performance Hibernate - JDK.io 2018High-Performance Hibernate - JDK.io 2018
High-Performance Hibernate - JDK.io 2018Vlad Mihalcea
 
Easy Enterprise Integration Patterns with Apache Camel, ActiveMQ and ServiceMix
Easy Enterprise Integration Patterns with Apache Camel, ActiveMQ and ServiceMixEasy Enterprise Integration Patterns with Apache Camel, ActiveMQ and ServiceMix
Easy Enterprise Integration Patterns with Apache Camel, ActiveMQ and ServiceMixelliando dias
 
Breaking SAP portal (HashDays)
Breaking SAP portal (HashDays)Breaking SAP portal (HashDays)
Breaking SAP portal (HashDays)ERPScan
 
SAP (in)security: Scrubbing SAP clean with SOAP
SAP (in)security: Scrubbing SAP clean with SOAPSAP (in)security: Scrubbing SAP clean with SOAP
SAP (in)security: Scrubbing SAP clean with SOAPChris John Riley
 
Apache Falcon _ Hadoop User Group France 22-sept-2014
Apache Falcon _ Hadoop User Group France 22-sept-2014Apache Falcon _ Hadoop User Group France 22-sept-2014
Apache Falcon _ Hadoop User Group France 22-sept-2014Modern Data Stack France
 
Apache Falcon at Hadoop Summit Europe 2014
Apache Falcon at Hadoop Summit Europe 2014Apache Falcon at Hadoop Summit Europe 2014
Apache Falcon at Hadoop Summit Europe 2014Seetharam Venkatesh
 
vJUG - The JavaFX Ecosystem
vJUG - The JavaFX EcosystemvJUG - The JavaFX Ecosystem
vJUG - The JavaFX EcosystemAndres Almiray
 

Similar a Apache Falcon - Sanjeev Tripurari (20)

UCS Management APIs A Technical Deep Dive
UCS Management APIs A Technical Deep DiveUCS Management APIs A Technical Deep Dive
UCS Management APIs A Technical Deep Dive
 
Immutant
ImmutantImmutant
Immutant
 
Introduction to es bs mule
Introduction to es bs   muleIntroduction to es bs   mule
Introduction to es bs mule
 
Breaking SAP portal (HackerHalted)
Breaking SAP portal (HackerHalted)Breaking SAP portal (HackerHalted)
Breaking SAP portal (HackerHalted)
 
Presto anatomy
Presto anatomyPresto anatomy
Presto anatomy
 
2014-10-30 Taverna 3 status
2014-10-30 Taverna 3 status2014-10-30 Taverna 3 status
2014-10-30 Taverna 3 status
 
The Four Principles of Atlassian Performance Tuning
The Four Principles of Atlassian Performance TuningThe Four Principles of Atlassian Performance Tuning
The Four Principles of Atlassian Performance Tuning
 
SecZone 2011: Scrubbing SAP clean with SOAP
SecZone 2011: Scrubbing SAP clean with SOAPSecZone 2011: Scrubbing SAP clean with SOAP
SecZone 2011: Scrubbing SAP clean with SOAP
 
Performance tests with Gatling
Performance tests with GatlingPerformance tests with Gatling
Performance tests with Gatling
 
[245] presto 내부구조 파헤치기
[245] presto 내부구조 파헤치기[245] presto 내부구조 파헤치기
[245] presto 내부구조 파헤치기
 
Breaking SAP portal (DeepSec)
Breaking SAP portal (DeepSec)Breaking SAP portal (DeepSec)
Breaking SAP portal (DeepSec)
 
Load testing with Blitz
Load testing with BlitzLoad testing with Blitz
Load testing with Blitz
 
Hadoop Oozie
Hadoop OozieHadoop Oozie
Hadoop Oozie
 
High-Performance Hibernate - JDK.io 2018
High-Performance Hibernate - JDK.io 2018High-Performance Hibernate - JDK.io 2018
High-Performance Hibernate - JDK.io 2018
 
Easy Enterprise Integration Patterns with Apache Camel, ActiveMQ and ServiceMix
Easy Enterprise Integration Patterns with Apache Camel, ActiveMQ and ServiceMixEasy Enterprise Integration Patterns with Apache Camel, ActiveMQ and ServiceMix
Easy Enterprise Integration Patterns with Apache Camel, ActiveMQ and ServiceMix
 
Breaking SAP portal (HashDays)
Breaking SAP portal (HashDays)Breaking SAP portal (HashDays)
Breaking SAP portal (HashDays)
 
SAP (in)security: Scrubbing SAP clean with SOAP
SAP (in)security: Scrubbing SAP clean with SOAPSAP (in)security: Scrubbing SAP clean with SOAP
SAP (in)security: Scrubbing SAP clean with SOAP
 
Apache Falcon _ Hadoop User Group France 22-sept-2014
Apache Falcon _ Hadoop User Group France 22-sept-2014Apache Falcon _ Hadoop User Group France 22-sept-2014
Apache Falcon _ Hadoop User Group France 22-sept-2014
 
Apache Falcon at Hadoop Summit Europe 2014
Apache Falcon at Hadoop Summit Europe 2014Apache Falcon at Hadoop Summit Europe 2014
Apache Falcon at Hadoop Summit Europe 2014
 
vJUG - The JavaFX Ecosystem
vJUG - The JavaFX EcosystemvJUG - The JavaFX Ecosystem
vJUG - The JavaFX Ecosystem
 

Último

How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 

Último (20)

How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand

Apache Falcon - Sanjeev Tripurari

  • 2. Falcon is a feed processing and feed management system aimed at making it easier for end consumers to onboard their feed processing and feed management on Hadoop clusters. http://falcon.apache.org
  • 4. Basic Components
      • Falcon
          • Prism
          • Server
          • Client
      • ActiveMQ
      • Oozie
      • Hadoop
  • 5. What’s in it for DevOps
      • Cluster: NameNode, JT (JobTracker), Oozie, ActiveMQ, Colo
      • Feed: Data, DataPath, Lifetime, Retention, Owner, Replication
      • Process: Job, Queue, Priority, Parallelism, Input, Output, Workflow
  • 6. Basic Environment Setup
      • Prism
      • UK: Server, Oozie, ActiveMQ, HDFS - MR1/2
      • US: Server, Oozie, ActiveMQ, HDFS - MR1/2
  • 8. Falcon Entity Operation
      Command:
        falcon entity -submit -type [cluster/feed/process] -file cluster-definition.xml
        falcon entity -list -type [cluster/feed/process]
      Cluster
        • Submit
        • Delete
      falcon entity -type [feed/process] -name [processname/feedname] -[OPTIONS]
      Feed/Process OPTIONS
        • schedule
        • status
        • list
        • touch
        • dependency
        • definition
        • update
        • delete
  • 9. Falcon Cluster
      <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
      <cluster name="uk-clusterAlpha" description="" colo="uk" xmlns="uri:falcon:cluster:0.1">
        <interfaces>
          <interface type="readonly" endpoint="hftp://nn.cluster.my.com:50070" version="0.20.2-cdh3u3"/>
          <interface type="write" endpoint="hdfs://nn..cluster.my.com:8020" version="0.20.2-cdh3u3"/>
          <interface type="execute" endpoint="jt.cluster.my.com:8021" version="0.20.2-cdh3u3"/>
          <interface type="workflow" endpoint="http://oozie..cluster.my.com:11000/oozie/" version="3.1.6"/>
          <interface type="messaging" endpoint="tcp://amq..cluster.my.com:61616?daemon=true" version="5.4.3"/>
        </interfaces>
        <locations>
          <location name="staging" path="/store/falcon/staging"/>
          <location name="temp" path="/tmp"/>
          <location name="working" path="/store/falcon/working"/>
        </locations>
        <properties>
          <property name="colo.name" value="uk"/>
        </properties>
      </cluster>
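Before submitting a cluster definition it can help to confirm that every interface is declared. The following is a minimal sketch, not part of Falcon: it assumes the five interface types shown on the slide are the ones you expect, and the helper names and the `example.com` endpoints in the sample are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the cluster definition on the slide.
CLUSTER_NS = {"c": "uri:falcon:cluster:0.1"}

# Assumption: the five interface types shown on the slide are the expected set.
EXPECTED_INTERFACES = {"readonly", "write", "execute", "workflow", "messaging"}


def cluster_interfaces(xml_text):
    """Return {interface type: endpoint} for a cluster definition."""
    root = ET.fromstring(xml_text)
    return {
        iface.get("type"): iface.get("endpoint")
        for iface in root.findall("c:interfaces/c:interface", CLUSTER_NS)
    }


def missing_interfaces(xml_text):
    """Return the expected interface types that the definition does not declare."""
    return sorted(EXPECTED_INTERFACES - set(cluster_interfaces(xml_text)))


# Hypothetical, deliberately incomplete definition used only for illustration.
SAMPLE = """<cluster name="uk-clusterAlpha" description="" colo="uk" xmlns="uri:falcon:cluster:0.1">
  <interfaces>
    <interface type="write" endpoint="hdfs://nn.example.com:8020" version="0.20.2-cdh3u3"/>
    <interface type="workflow" endpoint="http://oozie.example.com:11000/oozie/" version="3.1.6"/>
  </interfaces>
</cluster>"""

print(missing_interfaces(SAMPLE))
```

A check like this only catches omissions; it does not verify that the endpoints are reachable.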
  • 10. Falcon Feed
      <feed description="input feed" name="uk-inputfeed" xmlns="uri:falcon:feed:0.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
        <groups>input</groups>
        <frequency>hours(1)</frequency>
        <late-arrival cut-off="hours(6)" />
        <clusters>
          <cluster name="uk-clusterAlpha" type="source">
            <validity start="2015-02-20T18:00Z" end="2015-02-23T00:00Z"/>
            <retention limit="hours(24)" action="delete" />
          </cluster>
        </clusters>
        <locations>
          <location type="data" path="/user/sanjeev/falcon/input/${YEAR}/${MONTH}/${DAY}/${HOUR}" />
        </locations>
        <ACL owner="sanjeev" group="users" permission="0x755" />
        <schema location="/none" provider="none" />
      </feed>
  • 11. Falcon Feed
      <feed description="output feed" name="uk-outputfeed" xmlns="uri:falcon:feed:0.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
        <groups>output</groups>
        <frequency>hours(1)</frequency>
        <late-arrival cut-off="hours(6)" />
        <clusters>
          <cluster name="uk-clusterAlpha" type="source">
            <validity start="2015-02-20T18:00Z" end="2015-02-23T00:00Z"/>
            <retention limit="hours(24)" action="delete" />
          </cluster>
        </clusters>
        <locations>
          <location type="data" path="/user/sanjeev/falcon/output/${YEAR}/${MONTH}/${DAY}/${HOUR}" />
        </locations>
        <ACL owner="sanjeev" group="users" permission="0x755" />
        <schema location="/none" provider="none" />
      </feed>
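To see which HDFS directories an hourly feed covers, the path template can be expanded for each instance in the validity window. This is an illustrative sketch only: the helper names are hypothetical, and treating the validity end as exclusive is an assumption rather than something stated on the slides.

```python
from datetime import datetime, timedelta

# Path template and validity window copied from the input feed on the slide.
FEED_PATH = "/user/sanjeev/falcon/input/${YEAR}/${MONTH}/${DAY}/${HOUR}"


def expand_feed_path(template, instance):
    """Substitute the ${YEAR}/${MONTH}/${DAY}/${HOUR} variables for one instance time."""
    parts = {"YEAR": "%Y", "MONTH": "%m", "DAY": "%d", "HOUR": "%H"}
    for name, fmt in parts.items():
        template = template.replace("${" + name + "}", instance.strftime(fmt))
    return template


def hourly_instances(start, end):
    """Yield hourly instance times from start (inclusive) to end (assumed exclusive)."""
    current = start
    while current < end:
        yield current
        current += timedelta(hours=1)


start = datetime(2015, 2, 20, 18, 0)
end = datetime(2015, 2, 23, 0, 0)
paths = [expand_feed_path(FEED_PATH, t) for t in hourly_instances(start, end)]

print(paths[0])    # first directory in the window
print(len(paths))  # number of hourly instances in the window
```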
  • 12. Falcon Process
      <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
      <process name="falcon-sanjeev-process" xmlns="uri:falcon:process:0.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
        <clusters>
          <cluster name="uk-clusterAlpha">
            <validity start="2015-02-20T18:00Z" end="2015-02-23T00:00Z"/>
          </cluster>
        </clusters>
        <parallel>1</parallel>
        <frequency>hours(1)</frequency>
        <timezone>UTC</timezone>
        <inputs>
          <input end="today(18,0)" start="today(18,0)" feed="uk-inputfeed" name="input" />
        </inputs>
        <outputs>
          <output instance="now(0,0)" feed="uk-outputfeed" name="output" />
        </outputs>
        <properties>
          <property name="fileTime" value="${formatTime(dateOffset(instanceTime(), 1, 'DAY'), 'yyyy-MMM-dd')}"/>
          <property name="user" value="${user()}"/>
          <property name="baseTime" value="${today(0,0)}"/>
        </properties>
        <workflow engine="oozie" path="/user/sanjeev/falcon/workflow" />
        <retry policy="periodic" delay="minutes(10)" attempts="3" />
      </process>
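The `today(...)` and `now(...)` expressions in the process definition select feed instances relative to the process instance time. The sketch below models them under the assumption that `now(h, m)` offsets from the instance time and `today(h, m)` offsets from midnight of the instance's day; that is my reading of Falcon's expression language and should be checked against the Falcon documentation. The Python functions are illustrative stand-ins, not Falcon code.

```python
from datetime import datetime, timedelta


def now(instance, hours, minutes):
    """Assumed semantics: offset from the process instance time."""
    return instance + timedelta(hours=hours, minutes=minutes)


def today(instance, hours, minutes):
    """Assumed semantics: offset from midnight of the instance's day."""
    midnight = instance.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + timedelta(hours=hours, minutes=minutes)


# A process instance at 05:00 UTC on 2015-02-21.
instance = datetime(2015, 2, 21, 5, 0)

input_instance = today(instance, 18, 0)   # input start/end: today(18,0)
output_instance = now(instance, 0, 0)     # output instance: now(0,0)

print(input_instance.isoformat())
print(output_instance.isoformat())
```

Under these assumptions, every hourly run on a given day reads the same 18:00 input instance and writes to the directory for its own hour.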
  • 13. Oozie Workflow
      <workflow-app xmlns="uri:oozie:workflow:0.3" name="fs-workflow">
        <start to="fs-cmds"/>
        <action name="fs-cmds">
          <fs>
            <mkdir path='${output}'/>
          </fs>
          <ok to="end"/>
          <error to="fail"/>
        </action>
        <kill name="fail">
          <message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
        </kill>
        <end name="end"/>
      </workflow-app>
  • 14. What’s on HDFS
      Input Feed: /user/sanjeev/falcon/input/2015/02/20/00
      Input Feed: /user/sanjeev/falcon/input/2015/02/20/18
      Output Feed: /user/sanjeev/falcon/output/
      Workflow: /user/sanjeev/falcon/workflow/workflow.xml
      falcon entity -type cluster -submit -file uk-clusterAlpha.xml
      falcon entity -type feed -submit -file uk-inputfeed.xml
      falcon entity -type feed -submit -file uk-outputfeed.xml
      falcon entity -type process -submitAndSchedule -file falcon-sanjeev-process.xml
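The submission order on this slide follows the entity dependencies: the cluster first, then the feeds that reference it, then the process that references both. A small sketch that builds the same command lines in that order is shown below; the function name is hypothetical, it only uses flags that appear on the slide, and it prints the commands rather than running them.

```python
import shlex


def submission_commands(cluster_file, feed_files, process_file):
    """Build falcon CLI argument lists in dependency order: cluster, feeds, process."""
    commands = [["falcon", "entity", "-type", "cluster", "-submit", "-file", cluster_file]]
    for feed_file in feed_files:
        commands.append(["falcon", "entity", "-type", "feed", "-submit", "-file", feed_file])
    commands.append(
        ["falcon", "entity", "-type", "process", "-submitAndSchedule", "-file", process_file]
    )
    return commands


commands = submission_commands(
    "uk-clusterAlpha.xml",
    ["uk-inputfeed.xml", "uk-outputfeed.xml"],
    "falcon-sanjeev-process.xml",
)

for command in commands:
    print(shlex.join(command))
```

Keeping the commands as argument lists makes it straightforward to hand them to `subprocess.run` later without shell quoting issues.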
  • 15. Typical Production Process and Workflow
      (process) process-click-convert
      <?xml version="1.0" encoding="UTF-8" standalone="yes"?>
      <process name="process-click-convert" xmlns="uri:falcon:process:0.1">
        <clusters>
          <cluster name="uk-clusterAlpha">
            <validity start="2015-01-15T00:00Z" end="2100-01-01T00:00Z"/>
          </cluster>
          <cluster name="us-clusterGamma">
            <validity start="2015-01-15T00:30Z" end="2100-01-01T00:00Z"/>
          </cluster>
        </clusters>
        <parallel>2</parallel>
        <order>FIFO</order>
        <frequency>minutes(30)</frequency>
        <timezone>UTC</timezone>
        <inputs>
          <input name="Input" feed="feed-click-stream" start="now(0,-30)" end="now(0,-1)"/>
        </inputs>
        <outputs>
          <output name="Output" feed="feed-click-convert" instance="now(0,-30)"/>
        </outputs>
        <properties>
          <property name="queueName" value="stream"/>
          <property name="jobPriority" value="NORMAL"/>
        </properties>
        <workflow path="/projects/support/click/conversion" lib="/projects/support/lib"/>
      </process>
      (workflow) /projects/support/click/conversion/workflow.xml
      <workflow-app xmlns='uri:oozie:workflow:0.3' name='click-conversion'>
        <start to='click-convert' />
        <action name='click-convert'>
          <java>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
              <delete path="${Output}"/>
              <delete path="${wf:conf('Output.stats')}"/>
              <delete path="${wf:conf('Output.tmp')}"/>
            </prepare>
            <configuration>
              <property>
                <name>mapred.job.queue.name</name>
                <value>${queueName}</value>
              </property>
              <property>
                <name>mapred.job.priority</name>
                <value>${jobPriority}</value>
              </property>
            </configuration>
            <main-class>com.my.cluster.io.Driver</main-class>
            <arg>-inputpath</arg><arg>${Input}</arg>
            <arg>-outputpath</arg><arg>${Output}</arg>
            <arg>-statspath</arg><arg>${wf:conf("Output.stats")}</arg>
            <arg>-stagingpath</arg><arg>${wf:conf("Output.tmp")}</arg>
          </java>
          <ok to="end" />
          <error to="fail" />
        </action>
        <kill name="fail">
          <message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
        </kill>
        <end name='end' />
      </workflow-app>
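The workflow refers to names such as `${Input}`, `${Output}`, `${queueName}` and `${jobPriority}` that are declared in the process definition. A rough cross-check is sketched below: it lists simple `${name}` references in a workflow that the process does not declare as an input, output or property. The helper names and the trimmed sample documents are hypothetical, and the sketch makes no claim about which additional variables Falcon or Oozie supply on their own, so names left over (here `jobTracker` and `nameNode`) need a manual look rather than being treated as errors.

```python
import re
import xml.etree.ElementTree as ET

PROCESS_NS = {"p": "uri:falcon:process:0.1"}

# Matches plain references like ${Output}; EL calls like ${wf:conf('x')} are skipped.
SIMPLE_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def names_declared_by_process(process_xml):
    """Collect the input, output and property names declared in a process definition."""
    root = ET.fromstring(process_xml)
    names = set()
    for path in ("p:inputs/p:input", "p:outputs/p:output", "p:properties/p:property"):
        names.update(element.get("name") for element in root.findall(path, PROCESS_NS))
    return names


def undeclared_variables(process_xml, workflow_xml):
    """Return workflow variables that the process definition does not declare."""
    referenced = set(SIMPLE_VAR.findall(workflow_xml))
    return sorted(referenced - names_declared_by_process(process_xml))


# Trimmed-down versions of the slide's process and workflow, for illustration only.
PROCESS = """<process name="process-click-convert" xmlns="uri:falcon:process:0.1">
  <inputs><input name="Input" feed="feed-click-stream" start="now(0,-30)" end="now(0,-1)"/></inputs>
  <outputs><output name="Output" feed="feed-click-convert" instance="now(0,-30)"/></outputs>
  <properties>
    <property name="queueName" value="stream"/>
    <property name="jobPriority" value="NORMAL"/>
  </properties>
</process>"""

WORKFLOW = """<java>
  <job-tracker>${jobTracker}</job-tracker>
  <name-node>${nameNode}</name-node>
  <prepare><delete path="${Output}"/><delete path="${wf:conf('Output.tmp')}"/></prepare>
  <arg>${Input}</arg><arg>${queueName}</arg><arg>${jobPriority}</arg>
</java>"""

print(undeclared_variables(PROCESS, WORKFLOW))
```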
  • 16. Falcon Instance Operation
      Command:
        falcon instance -type [feed/process] -[status/list]
        falcon instance -type [feed/process] -name [processname/feedname] {-start YYYY-MM-DDTHH:MMZ -end YYYY-MM-DDTHH:MMZ} -[OPTIONS]
      Feed/Process OPTIONS
        • status
        • list
        • logs
        • kill
        • rerun
        • suspend
        • resume
  • 17. Monitoring
      • Falcon CLI
      • Oozie CLI
      • ActiveMQ
      • falcon entity -type process -name falcon-sanjeev-process -dependency
          (cluster) uk-clusterAlpha
          (feed) uk-inputfeed - [Input]
          (feed) uk-outputfeed - [Output]
      • falcon instance -type process -name falcon-sanjeev-process -start 2015-02-20T18:00Z -end 2015-02-23T00:00Z -status
          Consolidated Status: SUCCEEDED
          Instances:
          Instance           Cluster          SourceCluster  Status     Start              End                Details  Log
          -----------------------------------------------------------------------------------------------
          2015-02-20T18:00Z  uk-clusterAlpha  -              SUCCEEDED  2015-02-20T18:00Z  2015-02-20T18:01Z  -        http://oozie..cluster.my.com:11000/oozie/?job=0229074-150205100814135-oozie-oozi-W
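For monitoring scripts, the instance status listing can be turned into records. The sketch below assumes the whitespace-separated, eight-column layout shown on the slide; real CLI output may differ between Falcon versions, and the function name and the `example.com` log URL are hypothetical.

```python
import re

# Column names as printed in the status listing on the slide.
COLUMNS = ["Instance", "Cluster", "SourceCluster", "Status", "Start", "End", "Details", "Log"]

# Instance rows begin with a timestamp such as 2015-02-20T18:00Z.
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}Z$")


def parse_instance_status(output):
    """Parse instance rows from a status listing into a list of dicts."""
    records = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= len(COLUMNS) and TIMESTAMP.match(fields[0]):
            records.append(dict(zip(COLUMNS, fields[:len(COLUMNS)])))
    return records


# Sample listing in the slide's layout, with a placeholder log URL.
SAMPLE_OUTPUT = """Consolidated Status: SUCCEEDED
Instances:
Instance Cluster SourceCluster Status Start End Details Log
-----------------------------------------------------------
2015-02-20T18:00Z uk-clusterAlpha - SUCCEEDED 2015-02-20T18:00Z 2015-02-20T18:01Z - http://oozie.example.com:11000/oozie/?job=0000000-000000000000000-oozie-oozi-W
"""

records = parse_instance_status(SAMPLE_OUTPUT)
not_succeeded = [r["Instance"] for r in records if r["Status"] != "SUCCEEDED"]
print(records[0]["Status"], not_succeeded)
```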
  • 19. Onboarding Pipeline
      • Group all processes by frequency: minutely, hourly, daily, weekly, monthly
      • Group related feeds
      • Verify that all process jars and workflows are pushed to the cluster
      • Verify ownership of all feed and process directories
      • Verify that owners have job scheduling access roles in the particular cluster
      • Validate the feeds
      • Submit and schedule the feeds, so retention and replication are in place
      • Dry-run the process schedule
      • Submit and schedule the process
      • Document the feed SLA, HDFS usage, and retention period for monitoring
      • Document the process SLA, to observe delays
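The first onboarding step, grouping processes by schedule, can be sketched from the `<frequency>` values in the process definitions. The mapping below is an assumption made for illustration: it accepts the `minutes(n)`, `hours(n)`, `days(n)` and `months(n)` forms, and it treats a multiple of seven days as "weekly", which is a convention chosen here rather than anything Falcon defines.

```python
import re
from collections import defaultdict

# Frequency strings as they appear in process definitions, e.g. hours(1), minutes(30).
FREQUENCY = re.compile(r"^(minutes|hours|days|months)\(([1-9]\d*)\)$")

UNIT_GROUPS = {"minutes": "minutely", "hours": "hourly", "days": "daily", "months": "monthly"}


def frequency_group(frequency):
    """Map a frequency string to one of the onboarding groups."""
    match = FREQUENCY.match(frequency.strip())
    if match is None:
        raise ValueError(f"unrecognized frequency: {frequency!r}")
    unit, count = match.group(1), int(match.group(2))
    # Assumption: a whole number of weeks expressed in days counts as weekly.
    if unit == "days" and count % 7 == 0:
        return "weekly"
    return UNIT_GROUPS[unit]


def group_processes(frequencies):
    """Group process names by schedule, given {process name: frequency string}."""
    groups = defaultdict(list)
    for name, frequency in sorted(frequencies.items()):
        groups[frequency_group(frequency)].append(name)
    return dict(groups)


groups = group_processes({
    "falcon-sanjeev-process": "hours(1)",
    "process-click-convert": "minutes(30)",
})
print(groups)
```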
  • 20. Challenges
      • Tightly integrated with Oozie
      • Monitoring and onboarding need to be streamlined
      • Real-time changes to schedule time and queues
    Advantages
      • Development is very aggressive
      • The industry is adopting it quickly
      • Once onboarded, focus is needed only on the set of critical processes
      • Easy shutdown and upgrade, as all the running jobs are managed by Oozie
      • DevOps can easily set up and manage data