Hadoop and OpenStack
Matthew Farrellee, @spinningmatt, Red Hat
Sumit Mohanty, @smohanty, Hortonworks
What is OpenStack?
OpenStack is a cloud operating system that controls large pools of compute, storage, and networking resources throughout a datacenter, all managed through a dashboard that gives administrators control while empowering their users to provision resources through a web interface.
An ecosystem of projects
● Compute - Nova
● Networking - Neutron
● Object Storage - Swift
● Block Storage - Cinder
● Identity - Keystone
● Image Service - Glance
● Dashboard - Horizon
● Telemetry - Ceilometer
● Orchestration - Heat
● Data Processing - Sahara
Sahara is combining use cases
Trends
Google Trends chart: Hadoop, EC2, OpenStack
www.google.com/trends/explore#q=hadoop,ec2,openstack
EC2 beta: Aug 25, 2006 (http://aws.typepad.com/aws/2006/08/amazon_ec2_beta.html)
Data analysis is hard
Data analysis is hard...
● Come up w/ a relevant question
○ The question you answer won’t be the question you set out to ask
○ Mine: Can I predict doctor specialty from what procedures they perform?
● Find the data
○ Tons of it, little consistency, unknown origin, hidden in silos, hoarded
○ Data w/o a dictionary is worse than code w/o
Data analysis is hard...
● Data usability
○ Acceptable license? (Even for Gov’t sets)
■ Mine: Metadata copyrighted by AMA!
○ Private data is often highly protected, with no or only a narrow DMZ
● Explore and clean
○ Two of the oldest people in the medical profession working with Medicare:
○ Stephen Glasser graduated in 1773
○ Cheryl Palma graduated in 1776
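The "Explore and clean" step is where errors like these surface. A minimal sketch of one such sanity check, assuming the records are plain Python dicts with a hypothetical `graduation_year` field and an assumed cutoff of 1900 for a plausible graduation year:

```python
from datetime import date

# Assumed plausibility window: nobody practicing today graduated before 1900.
EARLIEST_PLAUSIBLE_YEAR = 1900

# Record layout is hypothetical; the first two rows mirror the slide's examples.
records = [
    {"name": "Stephen Glasser", "graduation_year": 1773},
    {"name": "Cheryl Palma", "graduation_year": 1776},
    {"name": "Hypothetical Provider", "graduation_year": 1998},
]


def implausible_graduations(rows, earliest=EARLIEST_PLAUSIBLE_YEAR, latest=None):
    """Return the rows whose graduation year falls outside [earliest, latest]."""
    if latest is None:
        latest = date.today().year  # a graduation in the future is also suspect
    return [row for row in rows if not earliest <= row["graduation_year"] <= latest]


for row in implausible_graduations(records):
    print(f"suspect: {row['name']} graduated in {row['graduation_year']}")
```

A check like this only flags suspects; deciding whether 1773 is a typo for 1973 or an unknown-value placeholder still needs the data dictionary.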
Data analysis is hard...
● You got some answer to a question you approximately asked
● You must refine the question and process
● Repeat
This is hard enough without having to manage tools and infrastructure!
Sahara’s goal
Make managing Hadoop+ infrastructure and tools so simple that they get out of your way
Sahara provides
● Apache Hadoop cluster and workload management
○ Cluster - construct and manage the lifecycle of a Hadoop cluster
○ Workload - workflow for big data processing with Hadoop (AWS EMR-like)
● Through a Python library, REST API, web UI, and command line interface
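As a rough illustration of the REST API route, here is a minimal sketch that builds (without sending) a cluster-creation request. It assumes Sahara's v1.1 API on its default port 8386, a Keystone token passed in the `X-Auth-Token` header, and the body field names shown; the host name, IDs, and plugin version are hypothetical, so check everything against the API reference for your release.

```python
import json
import urllib.request

# Assumptions: Sahara's v1.1 REST API on its default port 8386 and the
# request-body field names below; the host name is hypothetical.
SAHARA_ENDPOINT = "http://controller.example.com:8386/v1.1"


def build_create_cluster_request(endpoint, project_id, token, cluster):
    """Build (but do not send) a POST that asks Sahara to create a cluster."""
    url = f"{endpoint}/{project_id}/clusters"
    body = json.dumps(cluster).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "X-Auth-Token": token,  # a Keystone token obtained beforehand
        },
    )


# Hypothetical IDs throughout; a real call needs values from your cloud.
cluster = {
    "name": "demo-cluster",
    "plugin_name": "hdp",
    "hadoop_version": "2.0.6",
    "cluster_template_id": "CLUSTER-TEMPLATE-UUID",
    "default_image_id": "IMAGE-UUID",
    "user_keypair_id": "my-keypair",
}

req = build_create_cluster_request(SAHARA_ENDPOINT, "PROJECT-ID", "TOKEN", cluster)
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would actually submit it.
```

The Python client library and the CLI wrap calls of this shape, so the same cluster description can be reused whichever interface you pick.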
Sahara’s architecture
[Architecture diagram; labels only, connections not recoverable]
● Sahara Service: REST API, Cluster Configuration Manager, Vendors Plugins, Resources Orchestration Manager, Data Access Layer, Data Sources, Job Sources, Job Manager
● Clients: Sahara Python Client, Horizon (Sahara Pages)
● OpenStack services: Keystone Auth, Heat, Nova, Glance, Cinder, Neutron, Swift, Trove DB
● Provisioned: Hadoop VMs
Sahara’s features
● Plugin mechanism - distro choice
● Cluster scaling - elasticity
● Swift integration - data storage
● Cinder integration - persistent HDFS
● Network management with Nova and Neutron
● Anti-affinity - place selected services on separate physical hosts
● Data locality with Swift
● Repeatable cluster creation w/ template mechanism
● http://docs.openstack.org/developer/sahara/userdoc/features.html
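Sahara leaves the actual placement of instances to Nova; the toy greedy placer below is not Sahara's code, only a sketch of the anti-affinity constraint itself: instances running any listed process never share a compute host, and among eligible hosts the least-loaded one wins. Host and instance names are hypothetical.

```python
from collections import defaultdict


def place_instances(instances, hosts, anti_affinity):
    """Toy greedy placement that honors anti-affinity.

    instances: list of (instance_name, set_of_processes)
    hosts: list of compute host names
    anti_affinity: processes that must never appear twice on one host
    Returns {instance_name: host}; raises ValueError when no host qualifies.
    """
    placed = defaultdict(set)  # host -> anti-affinity processes already on it
    load = {host: 0 for host in hosts}  # instances per host
    placement = {}
    for name, processes in instances:
        guarded = processes & anti_affinity
        eligible = [h for h in hosts if not (placed[h] & guarded)]
        if not eligible:
            raise ValueError(f"no host can take {name} without breaking anti-affinity")
        host = min(eligible, key=lambda h: load[h])  # least-loaded eligible host
        placement[name] = host
        placed[host] |= guarded
        load[host] += 1
    return placement


# Hypothetical cluster: one master and three DataNode workers on three hosts.
instances = [
    ("master", {"namenode", "resourcemanager"}),
    ("worker-1", {"datanode", "nodemanager"}),
    ("worker-2", {"datanode", "nodemanager"}),
    ("worker-3", {"datanode", "nodemanager"}),
]
hosts = ["host-a", "host-b", "host-c"]
print(place_instances(instances, hosts, anti_affinity={"datanode"}))
```

Adding a fourth DataNode worker with only three hosts makes the function raise instead of silently doubling up, which is the cost of anti-affinity: it needs at least as many hosts as instances running a guarded process.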
Storage considerations
● Swift
○ Input/output through Swift HCFS plugin
○ Intermediate data stored in HDFS on cluster
○ Data locality when co-locating Swift and nova-compute
● HDFS
○ Local (long-lived cluster) and remote (data copied in)
● HDFS backed by ephemeral disk or Cinder
○ Ephemeral - /var/lib/nova/instances on compute host
○ Cinder - persistent block devices attached to instances
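Whichever backing store is chosen, HDFS stores each block several times, so raw disk is not usable capacity. A back-of-the-envelope sketch, assuming the HDFS default replication factor of 3 and ignoring OS, log, and reserved non-DFS space; the disk sizes are hypothetical:

```python
def usable_hdfs_gb(datanodes, storage_gb_per_node, replication=3):
    """Approximate usable HDFS capacity in GB.

    Every block is stored `replication` times (HDFS's dfs.replication defaults
    to 3), so usable space is roughly raw space divided by the replication
    factor. OS, log, and reserved non-DFS space are ignored here.
    """
    if datanodes < 1 or replication < 1:
        raise ValueError("need at least one DataNode and replication >= 1")
    if replication > datanodes:
        # HDFS would leave such blocks under-replicated; treat it as a setup error.
        raise ValueError("replication factor exceeds the number of DataNodes")
    return datanodes * storage_gb_per_node / replication


# Hypothetical sizing for four DataNodes: a flavor with 160 GB of ephemeral
# disk versus two 500 GB Cinder volumes attached to each instance.
ephemeral = usable_hdfs_gb(4, 160)
cinder = usable_hdfs_gb(4, 2 * 500)
print(f"ephemeral: ~{ephemeral:.0f} GB usable, deleted along with the instances")
print(f"cinder:    ~{cinder:.0f} GB usable, on persistent block devices")
```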
Sahara’s plugin architecture
● This is important!
● It’s where Hadoop distribution vendors integrate their management software
● It’s how users pick different software versions
● Currently: Vanilla (reference impl. w/ Apache versions), HDP (via Ambari), IDH (via Intel Manager), and Spark (w/ minimal CDH)
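To make the plugin idea concrete, here is a deliberately simplified, hypothetical plugin contract and registry. The class and method names are illustrative assumptions, not Sahara's actual plugin SPI; the point is that the core service drives a generic lifecycle (check version, configure, start) while each vendor plugin supplies the distribution-specific steps. The version strings are illustrative too.

```python
from abc import ABC, abstractmethod


class ProvisioningPlugin(ABC):
    """Hypothetical, simplified plugin contract (not Sahara's actual plugin SPI)."""

    name = ""

    @abstractmethod
    def get_versions(self):
        """Return the Hadoop versions this plugin can deploy."""

    @abstractmethod
    def configure_cluster(self, cluster):
        """Apply distribution-specific configuration to the cluster."""

    @abstractmethod
    def start_cluster(self, cluster):
        """Start the distribution's services once configuration is done."""


PLUGINS = {}


def register(plugin_cls):
    """Class decorator: make a plugin selectable by its name."""
    PLUGINS[plugin_cls.name] = plugin_cls()
    return plugin_cls


@register
class ToyVanillaPlugin(ProvisioningPlugin):
    name = "vanilla"

    def get_versions(self):
        return ["1.2.1", "2.3.0"]  # illustrative version strings

    def configure_cluster(self, cluster):
        cluster["status"] = "Configuring"

    def start_cluster(self, cluster):
        cluster["status"] = "Active"


def provision(plugin_name, version, cluster):
    """Generic lifecycle the core service drives for whichever plugin was picked."""
    plugin = PLUGINS[plugin_name]  # KeyError for an unknown plugin
    if version not in plugin.get_versions():
        raise ValueError(f"{plugin_name} cannot deploy version {version}")
    plugin.configure_cluster(cluster)
    plugin.start_cluster(cluster)
    return cluster


print(provision("vanilla", "2.3.0", {"name": "demo"}))
```

A vendor plugin such as HDP would register its own subclass whose configure and start steps call out to its management software (Ambari, in HDP's case), without the core lifecycle changing.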
HDP Plugin Overview
● Full support for all Sahara functionality
● Nova and Neutron networking
● Cluster Scaling
● Scale Up
● Swift Integration
● Cinder Support
● Data Locality
● EDP
● Apache Ambari REST APIs used for cluster provisioning
● Monitoring/Management of clusters via Ambari
● Full support for multiple HDP stacks
● HDP pre-installed or generic VM images
HDP Plugin Stack Support
HDP 1.3 (Available)
● NameNode
● Secondary NameNode
● DataNode
● HDFS
● ZooKeeper
● Ambari Server/Agent
● HCatalog
● Sqoop
● Job Tracker
● Task Tracker
● MapReduce
● Hive
● MySQL
● Pig
● WebHCat Server
● Oozie
● Ganglia
● Nagios
● HBase
HDP 2.0 (Available)
● History Server
● MapReduce 2 / YARN
● Resource Manager
● YARN Client
HDP 2.1 (Coming Soon!)
● Storm
● Falcon
HDP 2.1+ (Roadmap)
● Solr
● Cascading
Ambari Blueprints
● Two primary goals of Ambari Blueprints
○ Ability to export a complete description of a running cluster
○ Provide API-based cluster installation from a self-contained cluster description
● Blueprints contain cluster topology and configuration information
● Enables interesting use cases spanning physical and virtual environments, including OpenStack/Sahara
Blueprint API
1. BLUEPRINT: POST /blueprints/my-blueprint
2. CLUSTER INSTANCE: POST /clusters/MyCluster
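The two calls can be sketched in Python with the standard library. This builds the requests without sending them and assumes an Ambari server on its default port 8080 with the API rooted at `/api/v1`, HTTP basic auth, and the `X-Requested-By` header Ambari expects on modifying requests; the host name and credentials are hypothetical, and the blueprint payload is trimmed for brevity (a real one needs a complete component list).

```python
import base64
import json
import urllib.request

# Assumptions: Ambari's default port 8080, API root /api/v1, HTTP basic auth,
# and the X-Requested-By header on modifying requests. Host and credentials
# are hypothetical.
AMBARI_API = "http://ambari.example.com:8080/api/v1"


def ambari_post(path, payload, user="admin", password="admin"):
    """Build (but do not send) an authenticated JSON POST to the Ambari API."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        AMBARI_API + path,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Authorization": f"Basic {token}", "X-Requested-By": "ambari"},
    )


# Step 1: the blueprint describes topology and configuration, but no hosts.
# Trimmed: a working blueprint lists every component the cluster needs.
blueprint = {
    "Blueprints": {"stack_name": "HDP", "stack_version": "2.0"},
    "host_groups": [
        {"name": "uber-host",
         "components": [{"name": "NAMENODE"}, {"name": "DATANODE"}],
         "cardinality": "1"},
    ],
}
# Step 2: the cluster instance maps real hosts onto the blueprint's host groups.
cluster_instance = {
    "blueprint": "my-blueprint",
    "host_groups": [{"name": "uber-host", "hosts": [{"fqdn": "c6401.ambari.apache.org"}]}],
}

steps = [
    ambari_post("/blueprints/my-blueprint", blueprint),
    ambari_post("/clusters/MyCluster", cluster_instance),
]
for req in steps:
    print(req.get_method(), req.full_url)  # send in this order via urllib.request.urlopen
```

Keeping hosts out of the blueprint is what lets the same description be reused across physical machines and Sahara-provisioned VMs.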
Example: Single-Node Definitions
BLUEPRINT
{
  "configurations" : [
    {
      "hdfs-site" : {
        "dfs.namenode.name.dir" : "/hadoop/nn"
      }
    }
  ],
  "host_groups" : [
    {
      "name" : "uber-host",
      "components" : [
        { "name" : "NAMENODE" },
        { "name" : "SECONDARY_NAMENODE" },
        { "name" : "DATANODE" },
        { "name" : "HDFS_CLIENT" },
        { "name" : "RESOURCEMANAGER" },
        { "name" : "NODEMANAGER" },
        { "name" : "YARN_CLIENT" },
        { "name" : "HISTORYSERVER" },
        { "name" : "MAPREDUCE2_CLIENT" }
      ],
      "cardinality" : "1"
    }
  ],
  "Blueprints" : {
    "blueprint_name" : "single-node-hdfs-yarn",
    "stack_name" : "HDP",
    "stack_version" : "2.0"
  }
}
CLUSTER INSTANCE
{
  "blueprint" : "single-node-hdfs-yarn",
  "host_groups" : [
    {
      "name" : "uber-host",
      "hosts" : [
        {
          "fqdn" : "c6401.ambari.apache.org"
        }
      ]
    }
  ]
}
Description
• Single-node cluster
• Use HDP 2.0 Stack
• HDFS + YARN + MR2
• Everything on c6401
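Because the cluster instance refers to host groups by name, it is easy to mismatch the two documents. A small pre-flight check, sketched under the assumption that `cardinality` should be treated as an exact host count (Ambari itself may be more lenient), using trimmed copies of the example definitions:

```python
def check_cluster_instance(blueprint, instance):
    """Return a list of mismatches; an empty list means the instance fits."""
    problems = []
    expected_name = blueprint["Blueprints"]["blueprint_name"]
    if instance.get("blueprint") != expected_name:
        problems.append(f"instance references {instance.get('blueprint')!r}, not {expected_name!r}")
    groups = {g["name"]: g for g in blueprint["host_groups"]}
    assigned = set()
    for group in instance["host_groups"]:
        name = group["name"]
        assigned.add(name)
        if name not in groups:
            problems.append(f"unknown host group {name!r}")
            continue
        cardinality = groups[name].get("cardinality")
        if cardinality is not None and len(group["hosts"]) != int(cardinality):
            problems.append(
                f"host group {name!r} has {len(group['hosts'])} host(s), cardinality is {cardinality}"
            )
    for name in sorted(set(groups) - assigned):
        problems.append(f"host group {name!r} has no hosts")
    return problems


# Trimmed copies of the single-node example: only the fields the check reads.
blueprint = {
    "Blueprints": {"blueprint_name": "single-node-hdfs-yarn", "stack_name": "HDP", "stack_version": "2.0"},
    "host_groups": [{"name": "uber-host", "cardinality": "1"}],
}
instance = {
    "blueprint": "single-node-hdfs-yarn",
    "host_groups": [{"name": "uber-host", "hosts": [{"fqdn": "c6401.ambari.apache.org"}]}],
}
print(check_cluster_instance(blueprint, instance) or "cluster instance matches blueprint")
```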
Demo - youtu.be/vmry_kXqn4c
● http://jayunit100.github.io/bigpetstore/slides
● Bigpetstore
○ A full-stack Hadoop application
○ Uses the main players in the Hadoop ecosystem to demonstrate a single domain
○ Just accepted into the Bigtop project!
● Come by the Red Hat booth - G18
Q&A
● Status - Integrated for Juno (Oct 2014)
● Distro - RDO (Fedora/RHEL/CentOS), RHEL OSP 5, ...
● Home - https://launchpad.net/sahara
● Docs - http://docs.openstack.org/developer/sahara
● Code - https://github.com/openstack/*sahara*
● Email - openstack-dev w/ [sahara]
● IRC - #openstack-sahara on freenode

Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Último

My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 

Último (20)

My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 

Hadoop and OpenStack

Hadoop and OpenStack
Matthew Farrellee, @spinningmatt, Red Hat
Sumit Mohanty, @smohanty, Hortonworks

OpenStack is a cloud operating system that controls large pools of compute, storage, and networking resources throughout a datacenter, all managed through a dashboard that gives administrators control while empowering their users to provision resources through a web interface.

An ecosystem of projects
● Compute - Nova
● Networking - Neutron
● Object Storage - Swift
● Block Storage - Cinder
● Identity - Keystone
● Image Service - Glance
● Dashboard - Horizon
● Telemetry - Ceilometer
● Orchestration - Heat
● Data Processing - Sahara

Trends
[Chart: Google Trends interest in Hadoop, EC2, and OpenStack, www.google.com/trends/explore#q=hadoop,ec2,openstack]
EC2 beta: Aug 25, 2006 (http://aws.typepad.com/aws/2006/08/amazon_ec2_beta.html)

Data analysis is hard...
● Come up with a relevant question
  ○ The question you answer won’t be the question you set out to ask
  ○ Mine: Can I predict a doctor's specialty from the procedures they perform?
● Find the data
  ○ Tons of it, with little consistency, unknown origin, hidden in silos, hoarded
  ○ Data w/o a dictionary is worse than code w/o

Data analysis is hard...
● Data usability
  ○ Acceptable license? (Even for government data sets)
    ■ Mine: Metadata copyrighted by the AMA!
  ○ Private data is often highly protected, with no or only a narrow DMZ
● Explore and clean
  ○ Two of the oldest people in the medical profession working with Medicare:
  ○ Stephen Glasser graduated in 1773
  ○ Cheryl Palma graduated in 1776

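The "Explore and clean" step often starts with plain plausibility checks that catch graduation years like the two above. A minimal Python sketch, assuming a hypothetical `Provider` record holding only a name and a graduation year, and an illustrative plausible range of 1900 to 2014 (neither the record layout nor the bounds come from the talk):

```python
from dataclasses import dataclass


# Hypothetical record layout: the slide only reports a name and a graduation year.
@dataclass
class Provider:
    name: str
    graduation_year: int


def implausible_graduation(providers, earliest=1900, latest=2014):
    """Return providers whose graduation year lies outside [earliest, latest].

    The default bounds are illustrative assumptions, not values from the talk.
    """
    return [p for p in providers
            if not earliest <= p.graduation_year <= latest]


records = [
    Provider("Stephen Glasser", 1773),
    Provider("Cheryl Palma", 1776),
    Provider("Hypothetical Provider", 1995),  # made-up, well-formed row
]

for p in implausible_graduation(records):
    print(f"suspect: {p.name} graduated in {p.graduation_year}")
```

A check like this only flags rows for a human to look at; deciding whether 1773 is a typo for 1973 or a placeholder is still part of the cleaning work.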
Data analysis is hard...
● You got some answer to a question you approximately asked
● You must refine the question and process
● Repeat
This is hard enough without having to manage tools and infrastructure!

Sahara’s goal
Make managing Hadoop+ infrastructure and tools so simple that they get out of your way

Sahara provides
● Apache Hadoop cluster and workload management
  ○ Cluster - construct and manage the lifecycle of a Hadoop cluster
  ○ Workload - workflow for big data processing with Hadoop (AWS EMR-like)
● Access through a Python library, a REST API, a web UI, and a command-line interface

Sahara’s features
● Plugin mechanism - distro choice
● Cluster scaling - elasticity
● Swift integration - data storage
● Cinder integration - persistent HDFS
● Network management with Nova and Neutron
● Anti-affinity - place services on separate physical hardware
● Data locality with Swift
● Repeatable cluster creation with a template mechanism
● http://docs.openstack.org/developer/sahara/userdoc/features.html

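Templates, anti-affinity, and scaling fit together roughly as follows. This is a hypothetical, simplified model: the `NodeGroupTemplate` and `ClusterTemplate` classes, their fields, the process names, and the first-fit placement rule are assumptions for illustration, not Sahara's real data model or scheduler. A cluster template expands into instances, instances that share an anti-affinity process land on different hosts, and scaling only changes a node group's count before placement is recomputed.

```python
from dataclasses import dataclass, field


# Hypothetical, simplified stand-ins for node group and cluster templates;
# class and field names are illustrative, not Sahara's data model.
@dataclass
class NodeGroupTemplate:
    name: str
    processes: list[str]
    count: int


@dataclass
class ClusterTemplate:
    name: str
    node_groups: list[NodeGroupTemplate]
    anti_affinity: list[str] = field(default_factory=list)


def place(template: ClusterTemplate, hosts: list[str]) -> dict[str, str]:
    """Expand a cluster template into instances and assign each to a host.

    Instances that run the same anti-affinity process never share a host
    (first fit); all other instances are spread round-robin.
    """
    used = {proc: set() for proc in template.anti_affinity}
    placement: dict[str, str] = {}
    next_rr = 0
    for ng in template.node_groups:
        aa_procs = [p for p in ng.processes if p in used]
        for i in range(ng.count):
            instance = f"{template.name}-{ng.name}-{i}"
            if aa_procs:
                free = [h for h in hosts
                        if all(h not in used[p] for p in aa_procs)]
                if not free:
                    raise ValueError(
                        f"not enough hosts to honor anti-affinity for {instance}")
                host = free[0]
                for p in aa_procs:
                    used[p].add(host)
            else:
                host = hosts[next_rr % len(hosts)]
                next_rr += 1
            placement[instance] = host
    return placement


def scale(template: ClusterTemplate, group: str, new_count: int) -> None:
    """Elasticity: resize one node group; placement is recomputed afterwards."""
    for ng in template.node_groups:
        if ng.name == group:
            ng.count = new_count
            return
    raise KeyError(group)


tmpl = ClusterTemplate(
    name="demo",
    node_groups=[
        NodeGroupTemplate("master", ["namenode", "resourcemanager"], 1),
        NodeGroupTemplate("worker", ["datanode", "nodemanager"], 3),
    ],
    anti_affinity=["datanode"],
)
hosts = ["host-a", "host-b", "host-c"]
print(place(tmpl, hosts))

scale(tmpl, "worker", 4)  # a fourth datanode cannot get a host of its own
try:
    place(tmpl, hosts)
except ValueError as err:
    print(err)
```

Because the template is just data, recreating or rescaling the same cluster is repeatable. The anti-affinity rule is what turns "add a worker" into a capacity question about physical hosts.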
Storage considerations
● Swift
  ○ Input/output through the Swift HCFS plugin
  ○ Intermediate data stored in HDFS on the cluster
  ○ Locality when co-locating Swift and nova-compute
● HDFS
  ○ Local (long-lived cluster) and remote (copy in)
● HDFS backed by ephemeral disk or Cinder
  ○ Ephemeral - /var/lib/nova/instances on the compute host
  ○ Cinder - persistent block devices attached to instances

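With the Swift HCFS plugin, job input and output are addressed as `swift://<container>.<service>/<object>` URLs, the form used by Hadoop's OpenStack Swift filesystem. The sketch below parses such URLs and spells out where a job's data lives according to the slide. The container names and the `.sahara` service suffix in the example URLs are assumptions about a typical setup, and `describe_job_storage` is a hypothetical helper.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class SwiftLocation:
    container: str
    service: str
    obj: str


def parse_swift_url(url: str) -> SwiftLocation:
    """Split swift://<container>.<service>/<object> into its parts."""
    parsed = urlparse(url)
    if parsed.scheme != "swift":
        raise ValueError(f"not a swift:// URL: {url}")
    # The container is everything before the first dot in the authority part;
    # the remainder names the Swift service (provider) configured for Hadoop.
    container, dot, service = parsed.netloc.partition(".")
    if not dot or not container or not service:
        raise ValueError(f"expected <container>.<service> in {url}")
    return SwiftLocation(container, service, parsed.path.lstrip("/"))


def describe_job_storage(input_url: str, output_url: str) -> list[str]:
    """Hypothetical helper: where a job's data lives, per the slide above."""
    src = parse_swift_url(input_url)
    dst = parse_swift_url(output_url)
    return [
        f"read {src.obj} from Swift container {src.container}",
        "keep intermediate data in HDFS on the cluster",
        f"write {dst.obj} to Swift container {dst.container}",
    ]


for step in describe_job_storage("swift://logs.sahara/2014/input.csv",
                                 "swift://results.sahara/2014/out"):
    print(step)
```

Keeping only input and output in Swift lets the cluster be short-lived. The trade-off is that locality depends on Swift storage nodes sitting next to nova-compute.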
Sahara’s plugin architecture
● This is important!
● It’s where Hadoop distribution vendors integrate their management software
● It’s how users pick different software versions
● Currently: Vanilla (reference implementation with Apache versions), HDP (via Ambari), IDH (via Intel Manager), and Spark (with minimal CDH)

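The idea behind the plugin mechanism can be shown as a small registry. Vendors implement a common contract, and users choose a plugin plus a version. This is a hypothetical, simplified interface: `ProvisioningPlugin`, `register`, `create_cluster`, and `HdpPlugin` are illustrative names, not Sahara's real plugin base class or methods. The only detail taken from the talk is that HDP 1.3 and 2.0 are the available stacks.

```python
from abc import ABC, abstractmethod


class ProvisioningPlugin(ABC):
    """Hypothetical, simplified plugin contract; not Sahara's real base class."""

    name = ""

    @abstractmethod
    def versions(self) -> list[str]:
        """Software versions a user may pick for this plugin."""

    @abstractmethod
    def provision(self, version: str, node_groups: dict[str, list[str]]) -> str:
        """Hand the cluster layout to the vendor's management tooling."""


_REGISTRY: dict[str, ProvisioningPlugin] = {}


def register(plugin: ProvisioningPlugin) -> None:
    _REGISTRY[plugin.name] = plugin


def create_cluster(plugin_name: str, version: str,
                   node_groups: dict[str, list[str]]) -> str:
    plugin = _REGISTRY.get(plugin_name)
    if plugin is None:
        raise ValueError(f"unknown plugin {plugin_name!r}")
    if version not in plugin.versions():
        raise ValueError(f"{plugin_name} does not support version {version!r}")
    return plugin.provision(version, node_groups)


class HdpPlugin(ProvisioningPlugin):
    name = "hdp"

    def versions(self) -> list[str]:
        return ["1.3", "2.0"]  # the stacks the slides list as available

    def provision(self, version, node_groups):
        # A real HDP plugin would drive Ambari here; this sketch only reports.
        return f"HDP {version}: {len(node_groups)} node group(s) via Ambari"


register(HdpPlugin())
print(create_cluster("hdp", "2.0", {"master": ["NAMENODE"], "worker": ["DATANODE"]}))
```

The core code never needs to know how a given distribution is installed. It only checks that the requested plugin and version exist before delegating.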
HDP Plugin Overview
● Full support for all Sahara functionality
● Nova and Neutron networking
● Cluster scaling
● Scale up
● Swift integration
● Cinder support
● Data locality
● EDP
● Apache Ambari REST APIs used for cluster provisioning
● Monitoring/management of clusters via Ambari
● Full support for multiple HDP stacks
● HDP pre-installed or generic VM images

HDP Plugin Stack Support
● HDP 1.3 (Available): NameNode, Secondary NameNode, DataNode, HDFS, ZooKeeper, Ambari Server/Agent, HCatalog, Sqoop, Job Tracker, Task Tracker, MapReduce, Hive, MySQL, Pig, WebHCat Server, Oozie, Ganglia, Nagios, HBase
● HDP 2.0 (Available): History Server, MapReduce 2 / YARN, Resource Manager, YARN Client
● HDP 2.1 (Coming soon!): Storm, Falcon
● HDP 2.1+ (Roadmap): SOLR, Cascading

Ambari Blueprints
● Two primary goals of Ambari Blueprints
  ○ Ability to export a complete description of a running cluster
  ○ Provide API-based cluster installations from a self-contained cluster description
● Blueprints contain cluster topology and configuration information
● Enables interesting use cases between physical and virtual, including OpenStack/Sahara

Example: Single-Node Definitions
Description
● Single-node cluster
● Use HDP 2.0 stack
● HDFS + YARN + MR2
● Everything on c6401

BLUEPRINT
```json
{
  "configurations" : [
    { "hdfs-site" : { "dfs.namenode.name.dir" : "/hadoop/nn" } }
  ],
  "host_groups" : [
    {
      "name" : "uber-host",
      "components" : [
        { "name" : "NAMENODE" },
        { "name" : "SECONDARY_NAMENODE" },
        { "name" : "DATANODE" },
        { "name" : "HDFS_CLIENT" },
        { "name" : "RESOURCEMANAGER" },
        { "name" : "NODEMANAGER" },
        { "name" : "YARN_CLIENT" },
        { "name" : "HISTORYSERVER" },
        { "name" : "MAPREDUCE2_CLIENT" }
      ],
      "cardinality" : "1"
    }
  ],
  "Blueprints" : {
    "blueprint_name" : "single-node-hdfs-yarn",
    "stack_name" : "HDP",
    "stack_version" : "2.0"
  }
}
```

CLUSTER INSTANCE
```json
{
  "blueprint" : "single-node-hdfs-yarn",
  "host_groups" : [
    { "name" : "uber-host", "hosts" : [ { "fqdn" : "c6401.ambari.apache.org" } ] }
  ]
}
```

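The two single-node definitions are used in two steps. First the blueprint is registered, then a cluster is created from it by mapping each host group to real hosts. The Python sketch below checks that the cluster instance matches its blueprint and builds, but does not send, the two requests. The endpoint paths (`POST /api/v1/blueprints/<name>` and `POST /api/v1/clusters/<name>`) and the `X-Requested-By` header follow Ambari's Blueprints REST API as we understand it, so verify them against the Ambari documentation for your version. The server URL `http://ambari.example.com:8080`, the cluster name `demo`, and the rule that treats `cardinality` as an exact host count are assumptions.

```python
import json
from urllib.request import Request

COMPONENTS = ["NAMENODE", "SECONDARY_NAMENODE", "DATANODE", "HDFS_CLIENT",
              "RESOURCEMANAGER", "NODEMANAGER", "YARN_CLIENT",
              "HISTORYSERVER", "MAPREDUCE2_CLIENT"]

BLUEPRINT = {
    "configurations": [{"hdfs-site": {"dfs.namenode.name.dir": "/hadoop/nn"}}],
    "host_groups": [{"name": "uber-host",
                     "components": [{"name": c} for c in COMPONENTS],
                     "cardinality": "1"}],
    "Blueprints": {"blueprint_name": "single-node-hdfs-yarn",
                   "stack_name": "HDP", "stack_version": "2.0"},
}

CLUSTER = {
    "blueprint": "single-node-hdfs-yarn",
    "host_groups": [{"name": "uber-host",
                     "hosts": [{"fqdn": "c6401.ambari.apache.org"}]}],
}


def validate(blueprint: dict, cluster: dict) -> None:
    """Check a cluster instance against its blueprint before submitting it.

    This sketch treats "cardinality" as an exact host count (an assumption).
    """
    name = blueprint["Blueprints"]["blueprint_name"]
    if cluster["blueprint"] != name:
        raise ValueError(f"cluster uses {cluster['blueprint']!r}, not {name!r}")
    groups = {g["name"]: g for g in blueprint["host_groups"]}
    for hg in cluster["host_groups"]:
        group = groups.get(hg["name"])
        if group is None:
            raise ValueError(f"unknown host group {hg['name']!r}")
        expected = int(group.get("cardinality", len(hg["hosts"])))
        if len(hg["hosts"]) != expected:
            raise ValueError(
                f"{hg['name']!r} expects {expected} host(s), got {len(hg['hosts'])}")


def ambari_requests(server: str, cluster_name: str,
                    blueprint: dict, cluster: dict) -> list[Request]:
    """Build (but do not send) the two POSTs: register blueprint, create cluster."""
    validate(blueprint, cluster)
    headers = {"Content-Type": "application/json", "X-Requested-By": "ambari"}
    bp_name = blueprint["Blueprints"]["blueprint_name"]
    return [
        Request(f"{server}/api/v1/blueprints/{bp_name}",
                data=json.dumps(blueprint).encode(), headers=headers, method="POST"),
        Request(f"{server}/api/v1/clusters/{cluster_name}",
                data=json.dumps(cluster).encode(), headers=headers, method="POST"),
    ]


for req in ambari_requests("http://ambari.example.com:8080", "demo",
                           BLUEPRINT, CLUSTER):
    print(req.get_method(), req.full_url)
```

Actually sending these requests also requires Ambari credentials, which the sketch leaves out. Because the blueprint is self-contained, the same document can drive a physical install or a Sahara-provisioned virtual cluster.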
Demo - youtu.be/vmry_kXqn4c
● http://jayunit100.github.io/bigpetstore/slides
● Bigpetstore
  ○ A full-stack Hadoop application
  ○ Uses the main players in the Hadoop ecosystem
  ○ To demonstrate a single domain
  ○ Just accepted into the Bigtop project!
● Come by the Red Hat booth - G18

Q&A
● Status - Integrated for Juno (Oct 2014)
● Distro - RDO (Fedora/RHEL/CentOS), RHEL OSP 5, ...
● Home - https://launchpad.net/sahara
● Docs - http://docs.openstack.org/developer/sahara
● Code - https://github.com/openstack/*sahara*
● Email - openstack-dev mailing list, with [sahara] in the subject
● IRC - #openstack-sahara on freenode