Atlanta OpenStack Summit: Technical Deep Dive: Big Data Computations Using Elastic Data Processing in OpenStack Cloud
1. Big Data Computations Using Elastic Data Processing in OpenStack Cloud
Sergey Lukjanov (Mirantis)
Alexander Ignatov (Mirantis)
Trevor McKay (Red Hat)
2. Agenda
• OpenStack Data Processing Overview
• EDP Architecture & Technical Concepts
• Live Demo
4. OpenStack Data Processing: Sahara
Mission: To provide a scalable data processing stack and associated management interfaces.
• provision and operate Hadoop clusters
• schedule and operate Hadoop jobs
11. Elastic Data Processing
• EDP - API for executing MapReduce jobs on Hadoop clusters (similar to AWS EMR)
• Supported data sources: Swift, HDFS, Ceph
• Supported job types: Java actions, MapReduce, MapReduce.Streaming, Pig, Hive
• Oozie for Hadoop job workflow management
• Supports both Hadoop 1 & 2
• Job executions on transient clusters
12. EDP Use Cases
• Simplified task execution. You don’t need to know Hadoop!
• Bursty workloads: ad-hoc queries requiring significant resources for only a short period of time
• Utilization of free IaaS capacity for Hadoop tasks
13. EDP - Data Sources
[Diagram: Sahara EDP runs jobs on a cluster of Hadoop VMs, with INPUT and OUTPUT held in Swift]
• INPUT: swift://some_container/INPUT
• OUTPUT: swift://some_container/OUTPUT
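The data source URLs above follow a scheme://container/object layout. As a minimal sketch (not Sahara code; the helper name split_swift_url is hypothetical), such a URL can be split into its container and object path with the Python standard library:

```python
from urllib.parse import urlparse


def split_swift_url(url):
    """Split swift://container/object/path into (container, object_path).

    Hypothetical helper for illustration only; it is not part of Sahara.
    """
    parsed = urlparse(url)
    if parsed.scheme != "swift":
        raise ValueError(f"not a swift URL: {url}")
    # The container sits where a host name would normally be,
    # and the object path follows the first slash.
    return parsed.netloc, parsed.path.lstrip("/")


container, obj = split_swift_url("swift://some_container/INPUT")
print(container, obj)
```

The same split applies to the OUTPUT URL; only the object path differs.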
14. EDP - Job Binaries
[Diagram: Sahara EDP retrieves job binaries from Swift or from the Sahara DB]
• internal-db://script.pig
• swift://some_container/mapreduce.jar
1. Pig, Hive scripts
2. Executable Jar files
3. Pluggable binaries and libraries
16. EDP - Job Execution. Step 2
[Diagram: Sahara (EDP; DB: Jar, Pig), Swift (INPUT; Jar, Pig), and a cluster of Hadoop VMs running JobTracker and Oozie]
17. EDP - Job Execution. Step 3
[Diagram: Sahara (EDP; DB: Jar, Pig), Swift (INPUT; Jar, Pig), and a cluster of Hadoop VMs running JobTracker and Oozie]
• Execute a job
18. EDP - Job Execution. Step 4
[Diagram: Sahara (EDP; DB: Jar, Pig), Swift (INPUT; Jar, Pig), and a cluster of Hadoop VMs running JobTracker and Oozie]
19. EDP - Job Execution. Step 5
[Diagram: Sahara (EDP; DB: Jar, Pig), Swift (INPUT; Jar, Pig), and a cluster of Hadoop VMs running JobTracker and Oozie]
workflow.xml contains:
1. Job-specific configurations
2. URLs to binaries
3. URLs for data sources
4. Credentials
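To make the four workflow.xml ingredients concrete, here is a rough sketch that builds a small Oozie workflow with one Pig action. It follows the public Oozie workflow schema, but it is not the file Sahara actually generates: the function name, node names, host addresses, and the example.swift.* property names are all hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

OOZIE_NS = "uri:oozie:workflow:0.2"


def build_pig_workflow(name, script, job_tracker, name_node, configs, params):
    """Return a one-action Oozie workflow as an XML string (illustrative only)."""
    app = ET.Element("workflow-app", {"xmlns": OOZIE_NS, "name": name})
    ET.SubElement(app, "start", {"to": "job-node"})
    action = ET.SubElement(app, "action", {"name": "job-node"})
    pig = ET.SubElement(action, "pig")
    ET.SubElement(pig, "job-tracker").text = job_tracker
    ET.SubElement(pig, "name-node").text = name_node
    # 1 and 4: job-specific configurations and credentials become properties.
    conf = ET.SubElement(pig, "configuration")
    for key, value in configs.items():
        prop = ET.SubElement(conf, "property")
        ET.SubElement(prop, "name").text = key
        ET.SubElement(prop, "value").text = value
    # 2: the job binary (here a Pig script) that the action runs.
    ET.SubElement(pig, "script").text = script
    # 3: data source URLs are passed to the script as parameters.
    for key, value in params.items():
        ET.SubElement(pig, "param").text = f"{key}={value}"
    ET.SubElement(action, "ok", {"to": "end"})
    ET.SubElement(action, "error", {"to": "fail"})
    kill = ET.SubElement(app, "kill", {"name": "fail"})
    ET.SubElement(kill, "message").text = "Workflow failed"
    ET.SubElement(app, "end", {"name": "end"})
    return ET.tostring(app, encoding="unicode")


workflow_xml = build_pig_workflow(
    name="edp-sketch",
    script="script.pig",
    job_tracker="jobtracker.example:8021",    # placeholder address
    name_node="hdfs://namenode.example:8020",  # placeholder address
    configs={
        "example.swift.username": "demo",      # hypothetical property name
        "example.swift.password": "secret",    # hypothetical property name
    },
    params={
        "INPUT": "swift://some_container/INPUT",
        "OUTPUT": "swift://some_container/OUTPUT",
    },
)
print(workflow_xml)
```

Keeping credentials in the per-job configuration rather than on the cluster is one plausible reason they appear in the workflow at all; the slides do not say how Sahara handles this internally.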
20. EDP - Job Execution. Step 6
[Diagram: Sahara (EDP; DB: Jar, Pig), Swift (INPUT, OUTPUT; Jar, Pig), and a cluster of Hadoop VMs running JobTracker and Oozie; the Hadoop VMs perform Data Processing]
workflow.xml contains:
1. Job-specific configurations
2. URLs to binaries
3. URLs for data sources
4. Credentials
21. EDP - Job Execution. Step 7
[Diagram: Sahara (EDP; DB: Jar, Pig), Swift (INPUT, OUTPUT; Jar, Pig), and a cluster of Hadoop VMs running JobTracker and Oozie; the Hadoop VMs perform Data Processing]
workflow.xml contains:
1. Job-specific configurations
2. URLs to binaries
3. URLs for data sources
4. Credentials
23. EDP BigPetStore Demo
BigPetStore is now part of Apache Bigtop
• Test/demo laboratory for all things Hadoop
• Actively developed with integration testing
• Generates and processes data of arbitrary size
• git clone git://git.apache.org/bigtop.git
• Filed under bigtop/bigtop-bigpetstore
24. EDP BigPetStore Demo
What are we going to do?
• Generate 1M records of pet supply purchases
• Clean the data (“dirty CSV”)
• Extract cumulative counts by state
• Demonstrates Sahara EDP objects: Job Binaries, Jobs (Java and Pig), Data Sources
25. EDP BigPetStore Sample Data
Generated Data (first job)
$ hadoop fs -cat bigpetstore/gen/part-r-00000 | more
BigPetStore,storeCode_AK,1 deanna,booker,Sun Jan 18 20:50:06 GMT+00:00 1970,7.5,cat-food
BigPetStore,storeCode_AK,10 erica,buck,Thu Dec 25 16:29:28 GMT+00:00 1969,10.5,dog-food
Cleaned Data (second job)
$ hadoop fs -cat bigpetstore/clean/part-m-00000 | more
BigPetStore storeCode_AK 1 deanna booker Sun Jan 18 20:50:06 GMT+00:00 1970 7.5 cat-food
BigPetStore storeCode_AK 10 erica buck Thu Dec 25 16:29:28 GMT+00:00 1969 10.5 dog-food
26. EDP BigPetStore Sample Data
Summed Data For Products by State (3rd job)
$ hadoop fs -cat bigpetstore/analyze_rel/part-r-00000 | more
US-AK cat-food 24837
US-AK dog-food 24994
US-AK fuzzy-collar 25145
US-AK antelope-caller 25024
US-AZ cat-food 25106
US-AZ dog-food 25064
US-AZ leather-collar 24870
US-AZ snake-bite ointment 24960
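For readers who want to see what the per-state summary amounts to, the sketch below counts records per state and product in plain Python. It is a stand-in for illustration, not the demo's actual Pig job, and it assumes the comma-separated layout of the generated data shown earlier (store code in the second field, product in the last field). The function name is hypothetical.

```python
from collections import Counter


def count_by_state_and_product(lines):
    """Count records per (state, product) from generated BigPetStore lines."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split(",")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        # "storeCode_AK" -> "AK"; the product is the last field.
        state = fields[1].split("_", 1)[-1]
        product = fields[-1]
        counts[("US-" + state, product)] += 1
    return counts


sample = [
    "BigPetStore,storeCode_AK,1 deanna,booker,Sun Jan 18 20:50:06 GMT+00:00 1970,7.5,cat-food",
    "BigPetStore,storeCode_AK,10 erica,buck,Thu Dec 25 16:29:28 GMT+00:00 1969,10.5,dog-food",
]
for (state, product), n in sorted(count_by_state_and_product(sample).items()):
    print(state, product, n)
```

On a single machine this works only for small inputs; the demo runs the equivalent aggregation on a Hadoop cluster through EDP.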
27. What Next for EDP
Potential Areas for Development within EDP
• Pluggable Job Execution Model
  • Allows Sahara to run jobs with additional execution engines
  • Current Oozie offerings become one of multiple options
• Expand Capabilities via Oozie
  • Support upload of user-written Oozie workflows
  • Support for coordinated jobs
• Enhanced Usability
  • Better Error Reporting
  • User Experience (UI, CLI, API)
Please send us your feedback! Ideas are always welcome.
• #openstack-sahara on freenode
• openstack-dev@lists.openstack.org with [openstack-dev][sahara] subject