Big Data Computations Using Elastic Data Processing in OpenStack Cloud
Sergey Lukjanov (Mirantis)
Alexander Ignatov (Mirantis)
Trevor McKay (Red Hat)
Agenda
• OpenStack Data Processing Overview
• EDP Architecture & Technical Concepts
• Live Demo
OpenStack Data Processing: Sahara
Mission: To provide a scalable data processing stack and associated management interfaces.
• provision and operate Hadoop clusters
• schedule and operate Hadoop jobs
Hadoop - Big Data Platform
© http://hortonworks.com/hadoop/yarn/
Trends
http://www.google.com/trends/
Architecture overview
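[Diagram: Sahara architecture (the slide still uses the project's former name, Savanna). Labelled components: Horizon with Savanna pages, Savanna Python client, REST API, Auth, Keystone, cluster configuration manager, resources orchestration manager, job manager, job sources, data sources, data access layer, Swift, vendor plugins, and four Hadoop VMs. Other OpenStack services shown: Heat, Nova, Glance, Cinder, Neutron, Trove DB.]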
Sahara status
• Official integrated OpenStack project
• Supported Hadoop distros:
  • Vanilla Apache Hadoop
  • Hortonworks Data Platform
  • Intel Distribution
  • Cloudera Distribution (in blueprint)
• Included in OpenStack distros:
  • RDO - openstack.redhat.com
  • Mirantis OpenStack - software.mirantis.com
Contributors
Elastic Data Processing
• EDP - API for executing MapReduce jobs on Hadoop clusters (similar to AWS EMR)
• Supported data sources: Swift, HDFS, Ceph
• Supported job types: Java actions, MapReduce, MapReduce.Streaming, Pig, Hive
• Oozie for Hadoop job workflow management
• Supports both Hadoop 1 & 2
• Job execution on transient clusters
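The bullets above can be read as a small validation rule: a job has one of the listed types, and its input and output live in one of the listed data source kinds. A minimal Python sketch of that rule follows. The `JobRequest` class, its field names, and the `validate` helper are hypothetical and are not part of Sahara's API; only the job type and data source names come from the slide ("Java" stands for the slide's "Java actions").

```python
from dataclasses import dataclass
from typing import List

# Names taken from the slide; the identifiers themselves are illustrative.
SUPPORTED_JOB_TYPES = {"Java", "MapReduce", "MapReduce.Streaming", "Pig", "Hive"}
SUPPORTED_DATA_SOURCES = {"swift", "hdfs", "ceph"}


@dataclass(frozen=True)
class JobRequest:
    """Hypothetical description of one EDP job submission."""
    job_type: str
    input_kind: str
    output_kind: str
    transient_cluster: bool = False  # True: the cluster exists only for this job


def validate(request: JobRequest) -> List[str]:
    """Return a list of problems; an empty list means the request is acceptable."""
    problems = []
    if request.job_type not in SUPPORTED_JOB_TYPES:
        problems.append(f"unsupported job type: {request.job_type}")
    for kind in (request.input_kind, request.output_kind):
        if kind.lower() not in SUPPORTED_DATA_SOURCES:
            problems.append(f"unsupported data source: {kind}")
    return problems
```

For example, `validate(JobRequest("Pig", "swift", "hdfs"))` returns an empty list, while an unlisted job type or data source produces one message per problem.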
EDP Use Cases
• Simplified task execution. You don't need to know Hadoop!
• Bursty workloads: ad hoc queries that need significant resources only for a short period
• Utilization of free IaaS capacity for Hadoop tasks
EDP - Data Sources
[Diagram: Sahara EDP, a cluster of four Hadoop VMs, and Swift holding the job's INPUT and OUTPUT.]
Example data source URLs:
swift://some_container/INPUT
swift://some_container/OUTPUT
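The URLs on this slide follow a `swift://<container>/<object path>` shape. As a rough illustration of how such a URL breaks down, here is a minimal Python sketch using only the standard library. `SwiftLocation` and `parse_swift_url` are hypothetical names chosen for this example and do not reflect how Sahara itself parses data source URLs.

```python
from typing import NamedTuple
from urllib.parse import urlparse


class SwiftLocation(NamedTuple):
    container: str
    object_path: str


def parse_swift_url(url: str) -> SwiftLocation:
    """Split swift://<container>/<object path> into its two parts."""
    parsed = urlparse(url)
    if parsed.scheme != "swift":
        raise ValueError(f"not a swift URL: {url}")
    if not parsed.netloc:
        raise ValueError(f"missing container in: {url}")
    # The container sits in the authority position, the object in the path.
    return SwiftLocation(parsed.netloc, parsed.path.lstrip("/"))
```

With the slide's own example, `parse_swift_url("swift://some_container/INPUT")` yields the container `some_container` and the object path `INPUT`.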
EDP - Job Binaries
[Diagram: Sahara EDP with job binaries stored in the Sahara DB (internal-db://script.pig) and in Swift (swift://some_container/mapreduce.jar).]
1. Pig, Hive scripts
2. Executable Jar files
3. Pluggable binaries and libraries
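The two example URLs suggest that the scheme tells EDP where a job binary is stored. The sketch below shows that dispatch in plain Python. The `BACKENDS` table and the `binary_backend` function are assumptions made for illustration; only the two schemes and the two storage locations are taken from the slide.

```python
from urllib.parse import urlparse

# Scheme -> storage location, as shown on the slide.
BACKENDS = {"internal-db": "Sahara DB", "swift": "Swift"}


def binary_backend(url: str) -> str:
    """Name the storage location of a job binary, based on its URL scheme."""
    scheme = urlparse(url).scheme
    try:
        return BACKENDS[scheme]
    except KeyError:
        raise ValueError(f"unknown job binary scheme in: {url}") from None
```

Keeping the mapping in a table means a further storage location would only need one more entry rather than another branch.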
EDP - Job Execution. Step 1
[Diagram: Sahara (EDP; DB: Jar, Pig) and Swift (INPUT; Jar, Pig).]
EDP - Job Execution. Step 2
[Diagram: Sahara (EDP; DB: Jar, Pig), Swift (INPUT; Jar, Pig), and a Hadoop cluster (three Hadoop VMs, JobTracker, Oozie).]
EDP - Job Execution. Step 3
[Diagram: Sahara (EDP; DB: Jar, Pig), Swift (INPUT; Jar, Pig), and a Hadoop cluster (three Hadoop VMs, JobTracker, Oozie), with the label "Execute a job".]
EDP - Job Execution. Step 4
[Diagram: Sahara (EDP; DB: Jar, Pig), Swift (INPUT; Jar, Pig), and a Hadoop cluster (three Hadoop VMs, JobTracker, Oozie).]
EDP - Job Execution. Step 5
[Diagram: Sahara (EDP; DB: Jar, Pig), Swift (INPUT; Jar, Pig), and a Hadoop cluster (three Hadoop VMs, JobTracker, Oozie), with a workflow.xml.]
workflow.xml:
1. Job-specific configurations
2. URLs to binaries
3. URLs for data sources
4. Credentials
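To make the four workflow.xml ingredients more concrete, here is a hedged Python sketch that assembles a minimal Oozie workflow with a single Pig action. It is not the workflow Sahara generates: the function name `build_pig_workflow`, the action and workflow names, the `INPUT`/`OUTPUT` parameter names, and the idea of passing credentials as ordinary configuration properties are all assumptions for illustration. The element names follow the Oozie workflow schema for a Pig action.

```python
import xml.etree.ElementTree as ET


def build_pig_workflow(script, input_url, output_url, configs):
    """Build a minimal Oozie workflow document with one Pig action."""
    app = ET.Element("workflow-app",
                     {"xmlns": "uri:oozie:workflow:0.2", "name": "edp-sketch"})
    ET.SubElement(app, "start", {"to": "run-job"})

    action = ET.SubElement(app, "action", {"name": "run-job"})
    pig = ET.SubElement(action, "pig")
    ET.SubElement(pig, "job-tracker").text = "${jobTracker}"
    ET.SubElement(pig, "name-node").text = "${nameNode}"

    # Items 1 and 4: job-specific configuration and (assumed) credentials,
    # both written as name/value properties.
    configuration = ET.SubElement(pig, "configuration")
    for name, value in configs.items():
        prop = ET.SubElement(configuration, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value

    # Item 2: the binary to run.
    ET.SubElement(pig, "script").text = script
    # Item 3: data source URLs, passed here as Pig parameters.
    ET.SubElement(pig, "param").text = f"INPUT={input_url}"
    ET.SubElement(pig, "param").text = f"OUTPUT={output_url}"

    ET.SubElement(action, "ok", {"to": "end"})
    ET.SubElement(action, "error", {"to": "fail"})
    kill = ET.SubElement(app, "kill", {"name": "fail"})
    ET.SubElement(kill, "message").text = "Job failed"
    ET.SubElement(app, "end", {"name": "end"})
    return ET.tostring(app, encoding="unicode")
```

Calling it with `"script.pig"`, the two `swift://some_container/...` URLs from the earlier slide, and a small dictionary of properties returns the XML as a string, which can then be inspected or written to a file named workflow.xml.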
EDP - Job Execution. Step 6
[Diagram: Sahara (EDP; DB: Jar, Pig), Swift (INPUT; OUTPUT; Jar, Pig), and a Hadoop cluster (three Hadoop VMs, JobTracker, Oozie), with a workflow.xml and the label "Data Processing".]
workflow.xml:
1. Job-specific configurations
2. URLs to binaries
3. URLs for data sources
4. Credentials
EDP - Job Execution. Step 7
[Diagram: Sahara (EDP; DB: Jar, Pig), Swift (INPUT; OUTPUT; Jar, Pig), and a Hadoop cluster (three Hadoop VMs, JobTracker, Oozie), with a workflow.xml and the label "Data Processing".]
workflow.xml:
1. Job-specific configurations
2. URLs to binaries
3. URLs for data sources
4. Credentials
EDP BigPetStore Demo
BigPetStore is now part of Apache Bigtop
• Test/demo laboratory for all things Hadoop
• Actively developed with integration testing
• Generates and processes data of arbitrary size
• git clone git://git.apache.org/bigtop.git
• Filed under bigtop/bigtop-bigpetstore
EDP BigPetStore Demo
What are we going to do?
• Generate 1M records of pet supply purchases
• Clean the data (“dirty CSV”)
• Extract cumulative counts by state
• Demonstrate Sahara EDP objects:
  • Job Binaries
  • Jobs (Java and Pig)
  • Data Sources
EDP BigPetStore Sample Data
Generated Data (first job)
$ hadoop fs -cat bigpetstore/gen/part-r-00000 | more
BigPetStore,storeCode_AK,1 deanna,booker,Sun Jan 18 20:50:06 GMT+00:00 1970,7.5,cat-food
BigPetStore,storeCode_AK,10 erica,buck,Thu Dec 25 16:29:28 GMT+00:00 1969,10.5,dog-food
Cleaned Data (second job)
$ hadoop fs -cat bigpetstore/clean/part-m-00000 | more
BigPetStore storeCode_AK 1 deanna booker Sun Jan 18 20:50:06 GMT+00:00 1970 7.5 cat-food
BigPetStore storeCode_AK 10 erica buck Thu Dec 25 16:29:28 GMT+00:00 1969 10.5 dog-food
EDP BigPetStore Sample Data
Summed Data For Products by State (3rd job)
$ hadoop fs -cat bigpetstore/analyze_rel/part-r-00000 | more
US-AK cat-food 24837
US-AK dog-food 24994
US-AK fuzzy-collar 25145
US-AK antelope-caller 25024
US-AZ cat-food 25106
US-AZ dog-food 25064
US-AZ leather-collar 24870
US-AZ snake-bite ointment 24960
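The third job's output is a count of purchases per state and product. The sketch below reproduces that aggregation in plain Python over records in the generated (comma-separated) format shown on the previous slide. It stands in for the demo's Hadoop jobs and is not the BigPetStore code. The field positions and the `storeCode_AK` to `US-AK` mapping are assumptions inferred from the sample lines, and `count_by_state_and_product` is a hypothetical name.

```python
from collections import Counter


def count_by_state_and_product(lines):
    """Count purchases per (state, product) from generated CSV records."""
    counts = Counter()
    for line in lines:
        fields = line.strip().split(",")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        # Assumed layout: second field is the store code, last is the product.
        state = "US-" + fields[1].split("_", 1)[-1]  # storeCode_AK -> US-AK
        product = fields[-1]
        counts[(state, product)] += 1
    return counts
```

Feeding it the two generated sample records gives one purchase each for `("US-AK", "cat-food")` and `("US-AK", "dog-food")`; on the full data set the same grouping would produce totals of the kind listed above.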
What Next for EDP
Potential Areas for Development within EDP
• Pluggable Job Execution Model
  • Allows Sahara to run jobs with additional execution engines
  • Current Oozie offerings become one of multiple options
• Expand Capabilities via Oozie
  • Support upload of user-written Oozie workflows
  • Support for coordinated jobs
• Enhanced Usability
  • Better Error Reporting
  • User Experience (UI, CLI, API)
Please send us your feedback! Ideas are always welcome.
• #openstack-sahara on freenode
• openstack-dev@lists.openstack.org with [openstack-dev][sahara] in the subject
Design Summit Sessions
7 Sessions: Thursday 1:30 - Friday 10:30
http://goo.gl/lQXtUS
Q&A
Thank you!

Atlanta OpenStack Summit: Technical Deep Dive: Big Data Computations Using Elastic Data Processing in OpenStack Cloud
