Big Data Architecture on Cloud Computing Infrastructure
Reza Bakhshayeshi
About me
• Reza Bakhshayeshi
• MSc. Information Technology – Computer Networks
• 7 years of experience in Cloud Computing research
• 3 years of experience in industry
• Email: bakhshayeshi.reza@gmail.com
Agenda
• Cloud Computing
• Introduction to OpenStack
• Why OpenStack
• What is Sahara?
• Sahara Architecture
• Lab Session
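Cloud Computing
Five Essential Characteristics
• Based on the NIST definition: on-demand self-service, broad network
access, resource pooling, rapid elasticity, and measured service.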
Service Offering Models
• Software as a Service (SaaS)
• Platform as a Service (PaaS)
• Infrastructure as a Service (IaaS)
Introduction to OpenStack
• OpenStack began in 2010 as a joint project of Rackspace Hosting
and NASA.
• OpenStack is a free and open-source software platform for cloud
computing, mostly deployed as infrastructure as a service (IaaS).
Why OpenStack?
• OpenStack elevates your business to the cloud.
OpenStack is a scalable, open-source cloud computing
platform.
• Composed of a modular, scalable, and flexible set of
utilities, it provides clients with value, efficiency, and
agility.
Why OpenStack?
• Open-source; the technology is supported by a large
community of developers.
• Tried and tested by large businesses.
• Interoperability and open-source APIs allow admins
to manage hybrid IT environments without an
additional overhead layer.
OpenStack By Numbers
What size organizations use OpenStack?
Increase Maturity in Deployments
OpenStack Architecture
What is Sahara?
• The basic idea comes from Amazon Elastic MapReduce (EMR).
• Sahara’s mission is to provide a scalable data processing
stack and associated management interfaces.
• Provision and operate data processing clusters
• Schedule and operate data processing jobs
• Data Processing ~ Hadoop, Spark, Storm, etc.
What is Sahara?
• Sahara aims to provide users with a simple means to
provision Hadoop, Spark, and Storm clusters by
specifying several parameters, such as:
o Version
o Cluster topology
o Hardware node details, and more
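As a rough illustration of the parameters listed above, the sketch below models a cluster request as plain Python data. The class and field names (`NodeGroup`, `ClusterSpec`, `flavor`, `processes`) and the example values are hypothetical and chosen for this sketch only; they are not Sahara's actual schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NodeGroup:
    """Hypothetical description of one group of identical nodes."""
    name: str
    flavor: str            # hardware details, e.g. an OpenStack flavor name
    count: int             # how many VMs of this kind the topology needs
    processes: List[str]   # services that run on every node of the group


@dataclass
class ClusterSpec:
    """Hypothetical cluster request: version + topology + hardware."""
    plugin: str
    version: str
    node_groups: List[NodeGroup] = field(default_factory=list)

    def total_nodes(self) -> int:
        # The cluster size is the sum of all node group counts.
        return sum(group.count for group in self.node_groups)


spec = ClusterSpec(
    plugin="vanilla",
    version="2.7.1",
    node_groups=[
        NodeGroup("master", "m1.large", 1, ["namenode", "resourcemanager"]),
        NodeGroup("worker", "m1.medium", 3, ["datanode", "nodemanager"]),
    ],
)
print(spec.total_nodes())
```

Keeping version, topology, and hardware in one small structure mirrors the idea that a user only describes the cluster and leaves provisioning to the service.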
Use Cases
• Fast provisioning of data processing clusters on
OpenStack for development and quality assurance (QA).
• Utilization of unused compute power from a general
purpose OpenStack IaaS cloud.
• “Analytics as a Service” for ad-hoc or bursty analytic
workloads (similar to AWS EMR).
Key Features
• Designed as an OpenStack component.
• Managed through a REST API, with a user interface (UI)
available as part of the OpenStack Dashboard.
• Predefined configuration templates with the ability to
modify parameters.
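To make the "managed through a REST API" point concrete, here is a minimal sketch that only builds an HTTP request object and never sends it. The host name, port, URL layout, project ID, token, and payload fields are all placeholders assumed for illustration; consult the Sahara API reference for the real endpoint and schema.

```python
import json
import urllib.request

# Assumptions for illustration only: this base URL, the project ID in it,
# and the payload field names are placeholders, not values from the slides.
SAHARA_URL = "http://controller.example:8386/v1.1/demo-project"


def build_create_cluster_request(name: str, template_id: str,
                                 token: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a new cluster."""
    payload = {"name": name, "cluster_template_id": template_id}
    return urllib.request.Request(
        url=f"{SAHARA_URL}/clusters",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Auth-Token": token},
        method="POST",
    )


request = build_create_cluster_request("qa-cluster", "template-123", "dummy-token")
print(request.get_method(), request.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would actually send it; that step is left out because it needs a live endpoint and a valid token.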
Key Features
• Support for a variety of data processing frameworks:
o Multiple Hadoop vendor distributions
o Apache Spark and Storm
o Pluggable system of Hadoop installation engines
o Integration with vendor-specific management tools,
such as Apache Ambari and Cloudera Management
Console
Key Features - Provision Cluster
• Create/Terminate Cluster
o Heat API/Nova Direct API
o Neutron/Nova Network
o Floating IP Management
o Anti-affinity
• Cluster Scaling
o Add Node/Remove Node
• Support Plugins
o Vanilla/Hortonworks Data Platform/Cloudera/Spark/MapR
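The anti-affinity idea can be pictured with a toy placement function: each instance of a given process must land on a different physical host. This is only a simplified model with hypothetical names (`place_with_anti_affinity`, the host and instance labels); in a real deployment the placement decision is made by the OpenStack scheduler, not by code like this.

```python
from typing import Dict, List


def place_with_anti_affinity(instances: List[str],
                             hosts: List[str]) -> Dict[str, str]:
    """Assign each instance to a distinct host, or fail if hosts run out."""
    if len(instances) > len(hosts):
        raise ValueError("anti-affinity needs at least one host per instance")
    # zip pairs the i-th instance with the i-th host, so no host is reused.
    return dict(zip(instances, hosts))


placement = place_with_anti_affinity(
    ["datanode-1", "datanode-2", "datanode-3"],
    ["host-a", "host-b", "host-c", "host-d"],
)
print(placement)
```

Spreading instances this way means the loss of one physical host removes at most one copy of the process.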
Key Features - Elastic Data Processing
• Supported Job Types
o Hive/Pig/MapReduce/MapReduce Streaming/Java/Spark/Shell/HBase
• Supported Data Locality
o Rack/Hypervisor/Swift
• Data Source
o Internal: Ephemeral Disk/Cinder
o External: Swift
• Run Job in Transient Cluster
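Data locality at the rack and hypervisor level can be sketched as a simple ranking: a worker on the same hypervisor as the data is preferred, then one in the same rack, then any other. The `Node` type, the ranking values, and the example topology are assumptions made for this sketch, not Sahara or Hadoop internals.

```python
from typing import List, NamedTuple


class Node(NamedTuple):
    name: str
    host: str   # hypervisor that runs the VM
    rack: str   # rack that contains the hypervisor


def locality_rank(worker: Node, data: Node) -> int:
    """Lower is better: 0 = same hypervisor, 1 = same rack, 2 = elsewhere."""
    if worker.host == data.host:
        return 0
    if worker.rack == data.rack:
        return 1
    return 2


def pick_worker(workers: List[Node], data: Node) -> Node:
    # min() returns the first worker with the best (lowest) rank.
    return min(workers, key=lambda worker: locality_rank(worker, data))


data_node = Node("datanode-1", host="host-a", rack="rack-1")
workers = [
    Node("worker-1", host="host-c", rack="rack-2"),
    Node("worker-2", host="host-b", rack="rack-1"),
    Node("worker-3", host="host-a", rack="rack-1"),
]
print(pick_worker(workers, data_node).name)
```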
Sahara and OpenStack
Distros
• Vanilla Apache Hadoop: 2.6.0, 2.7.1
• Hortonworks Data Platform (HDP): 2.2, 2.3
• Cloudera (CDH): 5.3.x, 5.4.x
• MapR: 4.0.x, 5.0.x
• Vanilla Apache Spark: 1.0.0, 1.3.1
• Vanilla Apache Storm: 0.9.2
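The version list above lends itself to a small lookup table. In the sketch below the versions are copied from the list, while the dictionary keys and the `is_supported` helper are hypothetical names chosen for illustration; a trailing `.x` is treated as "any patch level", which is an assumption about how the list is meant to be read.

```python
# Versions copied from the list above; the short plugin keys are
# hypothetical names used only in this sketch.
SUPPORTED_VERSIONS = {
    "vanilla": ["2.6.0", "2.7.1"],
    "hdp": ["2.2", "2.3"],
    "cdh": ["5.3.x", "5.4.x"],
    "mapr": ["4.0.x", "5.0.x"],
    "spark": ["1.0.0", "1.3.1"],
    "storm": ["0.9.2"],
}


def is_supported(plugin: str, version: str) -> bool:
    """Return True if the version matches a listed one ('.x' = any patch)."""
    for listed in SUPPORTED_VERSIONS.get(plugin, []):
        if listed.endswith(".x"):
            # "5.4.x" becomes the prefix "5.4." and matches "5.4.2".
            if version.startswith(listed[:-1]):
                return True
        elif version == listed:
            return True
    return False


print(is_supported("cdh", "5.4.2"), is_supported("vanilla", "2.5.0"))
```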
Work Flow
Fast Cluster Provisioning
Select Hadoop Version → Select Base Image w/ Hadoop → Define Cluster Configuration → Provision Cluster → Operate Cluster → Terminate Cluster
• Provide the details of the Hadoop configuration, such as size, topology, and others
• Sahara will provision VMs, then install and configure Hadoop
• Supports scaling the cluster to add/remove nodes
Analytics as a Service using Elastic Data Processing
Select Hadoop Version → Configure Jobs → Set Limit for Cluster → Execute Jobs → Get the Result
• Choose the type of job: pig, hive, jar-file, etc.
• Select input and output data locations (Swift support)
• The cluster will be removed automatically after job completion
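The "cluster is removed automatically after the job" behaviour maps naturally onto a context manager. The sketch below is an in-memory simulation under stated assumptions: `transient_cluster`, `run_job`, the cluster name, and the `swift://` locations are hypothetical, and nothing here talks to OpenStack.

```python
from contextlib import contextmanager

events = []  # records the simulated lifecycle, in order


@contextmanager
def transient_cluster(name, max_nodes):
    """Provision a simulated cluster and always terminate it afterwards."""
    events.append(f"provision {name} (limit {max_nodes} nodes)")
    try:
        yield name
    finally:
        # Runs even if the job raises, mirroring automatic removal.
        events.append(f"terminate {name}")


def run_job(cluster, job_type, input_url, output_url):
    """Pretend to execute a job and return where the result was written."""
    events.append(f"run {job_type} on {cluster}: {input_url} -> {output_url}")
    return output_url


with transient_cluster("edp-demo", max_nodes=4) as cluster:
    result = run_job(cluster, "pig", "swift://demo/input", "swift://demo/output")

print(events)
```

Putting termination in a `finally` block is the design point: the cluster goes away whether the job succeeds or fails.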
Pattern 1: Internal - HDFS Only
OpenStack supports creating HDFS on Cinder or on
ephemeral disk. Ephemeral disk gives better data
processing performance; Cinder persists the data,
at lower performance.
Pattern 2: External - Swift
OpenStack uses Swift as a data source to store input
and output data. The benefit is that data can be
processed directly and persisted via Swift.
(Diagram elements: Collector Agent, Collecting Data, Data Stream, HDFS, Cinder, Ephemeral Disk, Swift, MapReduce, OpenStack Virtual Clusters)
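One way to picture the two patterns is to classify a data location by its URL scheme. The helper name `data_pattern` and the example URLs are assumptions made for this sketch; only the split between internal HDFS and external Swift comes from the slide.

```python
from urllib.parse import urlparse


def data_pattern(url: str) -> str:
    """Map a data location to one of the two patterns described above."""
    scheme = urlparse(url).scheme
    if scheme == "hdfs":
        return "Pattern 1: internal (HDFS on Cinder or ephemeral disk)"
    if scheme == "swift":
        return "Pattern 2: external (Swift)"
    raise ValueError(f"unsupported data source scheme: {scheme!r}")


print(data_pattern("hdfs://namenode/logs/2016"))
print(data_pattern("swift://demo/input"))
```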
Architecture
OpenStack + Sahara notes
• CPU:
o Estimated virtualization overhead (KVM): < 10%
• Isolated networks on OpenStack nodes
• Scheduler hints passed by Sahara – place VMs on the same hosts
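The overhead estimate can be turned into a quick capacity calculation. The sketch treats the 10% figure as a worst-case bound, so the result is a lower bound on usable CPU; the function name and the 32-core example are assumptions for illustration.

```python
def effective_cores(physical_cores: int, overhead: float = 0.10) -> float:
    """Lower bound on usable cores when overhead is at most `overhead`."""
    if not 0.0 <= overhead < 1.0:
        raise ValueError("overhead must be in the range [0, 1)")
    return physical_cores * (1.0 - overhead)


# Hypothetical host with 32 physical cores.
print(effective_cores(32))
```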
Lab Session
Questions?
