Apache Ambari BOF - OpenStack - Hadoop Summit 2013
- 1. © Hortonworks Inc. 2013
Hadoop + OpenStack Integration Roadmap
Himanshu Bari, Sr. Product Manager
hbari@hortonworks.com
June 28th, 2013
- 2. © Hortonworks Inc. 2013
Disclaimer
• This document may contain product features and technology directions
that are under development or may be under development in the future.
• Technical feasibility, market demand, user feedback, and the Apache
Software Foundation community development process can all affect
timing and final delivery.
• This document’s description of these features and technology
directions does not represent a contractual commitment from
Hortonworks to deliver these features in any generally available
product.
• Product features and technology directions are subject to change, and
must not be included in contracts, purchase orders, or sales
agreements of any kind.
- 4. © Hortonworks Inc. 2013
Big Data & Cloud
Intersection point → 2013
Big Data & Cloud are top priorities for CIOs
- 5. © Hortonworks Inc. 2013
OpenStack is an open source cloud management platform (Apache License)
• Glance: Image Service
• Keystone: Identity Service
• Horizon
• Quantum
• Nova
• Cinder: Block Store
• Swift: Object Store
• Ceilometer: Metering
• Heat: Orchestration
• Integrated multi-hypervisor & guest OS support
- 6. © Hortonworks Inc. 2013
OpenStack has overtaken Amazon AWS in market awareness…
Source: Google Trends
- 7. © Hortonworks Inc. 2013
Maturing quickly with broad support:
• Pushed by 150+ vendors
• Millions of dollars in venture capital
• Early adoption across all verticals
- 8. © Hortonworks Inc. 2013
Why Hadoop & OpenStack?
Hadoop provides a greenfield use case
• Net new workload
• Needs scale-out infrastructure
• Shared platform
OpenStack provides the perfect cloud platform
• Operational agility
• Supports scale-out architecture
• Deployment choice across public & private clouds
1. Open source communities provide the fastest path to innovation
2. Open source is changing the game, as its economics and accessibility accelerate cloud & big data market trends
3. Both are attracting major ecosystem players: IBM, Red Hat, HP, Rackspace, etc.
Marries two of the largest open source movements
- 9. © Hortonworks Inc. 2013
Accelerate Adoption of Hadoop on OpenStack
• The leading contributor to Apache Hadoop
• The leading system integrator for OpenStack
• The leading contributor to OpenStack
Apache Hadoop… the killer app for OpenStack
- 10. © Hortonworks Inc. 2013
Collaborating on Project Savanna
Project Savanna: automate deployment of Apache Hadoop on OpenStack
[Diagram: Savanna (elastic Hadoop controller) and Ambari (Hadoop management) deploying a Hadoop cluster on OpenStack infrastructure, with Swift storage]
1. Cluster templates: deploy preconfigured Hadoop clusters in seconds from Horizon or Ambari
2. HDFS-Swift connectors: move data between HDFS and Swift object storage
3. Simplified elasticity
- 12. © Hortonworks Inc. 2013
Project Savanna design goals
• Focus on API-driven tight integration
• Hide Hadoop complexity through APIs
• "It Just Works" experience
• Fully leverage virtualization
• Scalability, reliability, performance
- 13. © Hortonworks Inc. 2013
Problems driving use cases
Cluster requirements vary by business unit, data type & analytics use case:
• Business units: Finance, Compliance, IT, Marketing
• Data types: Web, Mobile, Sensor
• Analytics: Interactive, Batch
• Environments: Dev, QA, Prod
Resulting problems:
• Operational nightmare of supporting multiple cluster flavors
• Lack of agility
• Underutilized resources
• Maintenance complications
• Can't migrate from public to private cloud
- 14. © Hortonworks Inc. 2013
Provisioning related use cases
- Frequent dev/test/staging cluster provision requests
- Migrations from staging to prod and vice versa
- Reduce operator error in cluster provisioning
- Migrate away from Amazon EMR for ad hoc analytics
requests to support experimentation
- 15. © Hortonworks Inc. 2013
Simplified provisioning (Phase-1 / Phase-2)
Template based provisioning
• Cluster template, e.g. a QA cluster
• Node template:
  a. Resource based, e.g. node.Large
  b. Function based, e.g. node.NameNode
• Use as is: single-click provisioning
• Modify: update VM resource allocation, service-to-VM mapping and service config, then provision and/or save the template
Hadoop as a service (job flow based provisioning)
• Pick job type (including Cascading, streaming & custom jar)
• Upload data to Swift
• Get results in Swift
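The template-based flow above, with a cluster template that references node templates and is either used as is or modified before provisioning, can be sketched as plain data. This is a minimal illustration under stated assumptions. `NodeTemplate`, `ClusterTemplate`, `expand` and the example flavors (`m1.large`, `m1.medium`) are hypothetical names. They are not Savanna's or Ambari's actual template format.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class NodeTemplate:
    # Hypothetical node template: a VM flavor plus the Hadoop processes on that VM.
    name: str
    flavor: str
    processes: Tuple[str, ...]


@dataclass(frozen=True)
class ClusterTemplate:
    # Hypothetical cluster template: how many VMs of each node template to boot.
    name: str
    node_counts: Dict[str, int]
    config: Dict[str, str] = field(default_factory=dict)


def expand(cluster: ClusterTemplate,
           node_templates: Dict[str, NodeTemplate]) -> List[Tuple[str, str, Tuple[str, ...]]]:
    """Return one (vm_name, flavor, processes) entry per VM the cluster template asks for."""
    vms = []
    for tmpl_name, count in sorted(cluster.node_counts.items()):
        tmpl = node_templates[tmpl_name]
        for i in range(count):
            vms.append((f"{cluster.name}-{tmpl_name}-{i}", tmpl.flavor, tmpl.processes))
    return vms


# Function-based node templates; flavor names are example values only.
NODE_TEMPLATES = {
    "namenode": NodeTemplate("namenode", "m1.large", ("namenode", "jobtracker")),
    "datanode": NodeTemplate("datanode", "m1.medium", ("datanode", "tasktracker")),
}

qa = ClusterTemplate("qa", {"namenode": 1, "datanode": 3})
as_is = expand(qa, NODE_TEMPLATES)                                # "use as is": 4 VMs
bigger = replace(qa, node_counts={"namenode": 1, "datanode": 5})  # "modify": new template
modified = expand(bigger, NODE_TEMPLATES)                         # 6 VMs
```

The templates are frozen, so "modify" produces a new template with `dataclasses.replace`. The saved original therefore stays reusable.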
- 17. © Hortonworks Inc. 2013
Swift object store support
Phase-1: Read/write data from/to Swift object stores
• Option-1: Copy data from Swift to HDFS, run MapReduce and copy results back to Swift
• Option-2: Run MapReduce directly on top of Swift (output data still needs to be copied from HDFS to Swift)
Phase-2: Bug fixes & optimizations
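The two options differ only in where the job reads its input. A small sketch can make the data movement explicit by generating the command sequence for each option.

Several pieces are assumptions:
* The HDFS staging paths, the `swift://container.service/path` URIs, the jar and the class names are all hypothetical.
* `hadoop distcp <src> <dst>` and `hadoop jar <jar> <mainClass> <args>` are standard Hadoop commands.
* Swift URIs only resolve with a Swift filesystem connector configured.

```python
from typing import List

# Hypothetical HDFS staging directories used by this sketch.
HDFS_IN = "hdfs:///tmp/job/input"
HDFS_OUT = "hdfs:///tmp/job/output"


def job_plan(option: int, swift_in: str, swift_out: str, jar: str, main_class: str) -> List[str]:
    """Return the shell commands for one MapReduce run under Option-1 or Option-2."""
    if option == 1:
        # Option-1: copy input Swift -> HDFS, run the job on HDFS, copy results back to Swift.
        return [
            f"hadoop distcp {swift_in} {HDFS_IN}",
            f"hadoop jar {jar} {main_class} {HDFS_IN} {HDFS_OUT}",
            f"hadoop distcp {HDFS_OUT} {swift_out}",
        ]
    if option == 2:
        # Option-2: the job reads straight from Swift; as stated above,
        # output still lands in HDFS and is copied to Swift afterwards.
        return [
            f"hadoop jar {jar} {main_class} {swift_in} {HDFS_OUT}",
            f"hadoop distcp {HDFS_OUT} {swift_out}",
        ]
    raise ValueError("option must be 1 or 2")


plan = job_plan(1, "swift://logs.myprovider/2013/06", "swift://results.myprovider/wordcount",
                "wordcount.jar", "org.example.WordCount")
for cmd in plan:
    print(cmd)
```

Option-2 saves one full copy of the input. The output copy remains in both plans.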
- 18. © Hortonworks Inc. 2013
Elasticity related use cases
- Commission a new node or decommission a node for
maintenance
- For dev/test/staging clusters: automatically vary
cluster data & compute capacity based on tenant,
workload, time of day, resource utilization etc.
- Automatically vary compute capacity for production
clusters
- 19. © Hortonworks Inc. 2013
Elasticity (Phase-1 / Phase-2)
Node elasticity (compute and/or data):
• Manual
• Rule based: handle variable workloads, e.g. alter the cluster's compute node count for peak/off-peak hours
Cluster life (Swift or HDFS used for storage):
• Long lived: best for predictable workloads
• Short lived: job flow based clusters for ad hoc analysis; best for Dev/QA use
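A rule-based policy for peak/off-peak hours boils down to two steps: pick a target compute node count from the clock, then diff it against the current count. This is a minimal sketch. The window boundaries, node counts and function names are hypothetical, and a real controller would also weigh utilization and tenant rules.

```python
from datetime import time


def target_compute_nodes(now: time, peak_start: time = time(8), peak_end: time = time(20),
                         peak_nodes: int = 20, off_peak_nodes: int = 5) -> int:
    """Rule: run peak_nodes compute nodes inside the peak window, off_peak_nodes outside it."""
    if peak_start <= peak_end:
        in_peak = peak_start <= now < peak_end
    else:
        # Window wraps past midnight, e.g. 22:00-06:00 for nightly batch jobs.
        in_peak = now >= peak_start or now < peak_end
    return peak_nodes if in_peak else off_peak_nodes


def scaling_delta(current_nodes: int, target_nodes: int) -> int:
    """Positive: compute nodes to commission; negative: nodes to decommission."""
    return target_nodes - current_nodes


delta = scaling_delta(5, target_compute_nodes(time(9, 30)))  # 15 nodes to commission
```

Only compute capacity is varied here. Removing data nodes would also require HDFS decommissioning, so that their blocks are re-replicated first.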
- 20. © Hortonworks Inc. 2013
Multi-tenancy related use cases
- Improve server utilization by creating a common
server pool for Hadoop and non-Hadoop workloads
- Simplify maintenance & upgrade testing with the
ability to run multiple Hadoop clusters with different
versions on the same server pool
- Support varying SLAs based on tenant and workload
through resource isolation provided by VMs
- Simplify chargeback/showback
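Because each tenant's Hadoop nodes are VMs, showback can be computed from VM usage records alone. This is a minimal sketch. The record shape `(tenant, flavor, vm_hours)` and the per-flavor rates are hypothetical example values, with costs kept in integer cents to avoid floating-point rounding.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def showback(usage: Iterable[Tuple[str, str, int]], cents_per_hour: Dict[str, int]) -> Dict[str, int]:
    """Sum each tenant's cost in cents from (tenant, vm_flavor, vm_hours) records."""
    totals: Dict[str, int] = defaultdict(int)
    for tenant, flavor, hours in usage:
        totals[tenant] += hours * cents_per_hour[flavor]
    return dict(totals)


RATES = {"m1.medium": 10, "m1.large": 20}   # example rates, in cents per VM-hour
usage = [
    ("finance", "m1.large", 10),
    ("finance", "m1.medium", 24),
    ("marketing", "m1.medium", 5),
]
print(showback(usage, RATES))   # {'finance': 440, 'marketing': 50}
```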
- 21. © Hortonworks Inc. 2013
Multi-tenancy
Phase-1
• Access isolation
  • Single sign-on for Ambari & HUE through Keystone integration
  • Dedicated Ambari & HUE instance per cluster per tenant
• Resource isolation
  • CPU and memory isolation through VMs
  • Ability to pin a Hadoop VM to a given set of physical hosts to enable per-tenant physical host isolation
• Version isolation
  • Choice of Hadoop versions for tenants
Phase-2
• Access isolation
  • Single Ambari instance per tenant (multi-cluster support with Ambari)
  • Keystone enhancements to support Hadoop job-flow-level RBAC for Hadoop as a service
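Pinning a tenant's Hadoop VMs to a given set of physical hosts can be modeled as a placement function that only considers that tenant's hosts. This is a toy sketch. `place_vms`, the pinning map and the host names are hypothetical, and it does not reproduce how OpenStack's scheduler actually enforces host restrictions.

```python
from typing import Dict, List, Tuple


def place_vms(requests: List[Tuple[str, str]], tenant_hosts: Dict[str, List[str]]) -> Dict[str, str]:
    """Place each (vm, tenant) request on the least-loaded host pinned to that tenant."""
    load: Dict[str, int] = {}
    placement: Dict[str, str] = {}
    for vm, tenant in requests:
        hosts = tenant_hosts.get(tenant)
        if not hosts:
            raise ValueError(f"no physical hosts pinned to tenant {tenant!r}")
        # Least-loaded pinned host first; ties broken by host name for repeatable results.
        host = min(hosts, key=lambda h: (load.get(h, 0), h))
        placement[vm] = host
        load[host] = load.get(host, 0) + 1
    return placement


PINNING = {"finance": ["host-1", "host-2"], "web": ["host-3"]}
placement = place_vms(
    [("fin-vm1", "finance"), ("fin-vm2", "finance"), ("fin-vm3", "finance"), ("web-vm1", "web")],
    PINNING,
)
```

A finance VM can never land on `host-3`, so the two tenants share no physical host.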
- 23. © Hortonworks Inc. 2013
Savanna logical architecture
• Horizon + Savanna UI
• Savanna controller (API): configuration, elasticity, orchestration, plugin manager
• HDP Savanna plugin (API): Hadoop provisioning, Ambari template management
• Hadoop cluster: Ambari + API
• OpenStack infrastructure: network, storage, security, compute
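The controller's plugin manager can be pictured as a registry that routes a cluster request to a provisioning plugin such as the HDP plugin. This is a minimal sketch. `ProvisioningPlugin`, `PluginManager`, `FakeHdpPlugin`, `provision` and their methods are hypothetical names, not Savanna's actual plugin interface. The fake plugin records calls instead of driving Ambari.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class ProvisioningPlugin(ABC):
    """Hypothetical plugin interface; not Savanna's actual plugin SPI."""
    name: str

    @abstractmethod
    def versions(self) -> List[str]: ...

    @abstractmethod
    def configure_cluster(self, cluster: str) -> None: ...

    @abstractmethod
    def start_cluster(self, cluster: str) -> None: ...


class PluginManager:
    """Registry the controller uses to route a request to the right plugin."""

    def __init__(self) -> None:
        self._plugins: Dict[str, ProvisioningPlugin] = {}

    def register(self, plugin: ProvisioningPlugin) -> None:
        self._plugins[plugin.name] = plugin

    def get(self, name: str) -> ProvisioningPlugin:
        try:
            return self._plugins[name]
        except KeyError:
            raise LookupError(f"no plugin registered as {name!r}") from None


class FakeHdpPlugin(ProvisioningPlugin):
    """Stand-in for the HDP plugin: records calls instead of talking to Ambari."""
    name = "hdp"

    def __init__(self) -> None:
        self.calls: List[tuple] = []

    def versions(self) -> List[str]:
        return ["1.3"]

    def configure_cluster(self, cluster: str) -> None:
        self.calls.append(("configure", cluster))

    def start_cluster(self, cluster: str) -> None:
        self.calls.append(("start", cluster))


def provision(manager: PluginManager, plugin_name: str, cluster: str) -> ProvisioningPlugin:
    # Controller-side orchestration: look up the plugin, then configure and start.
    plugin = manager.get(plugin_name)
    plugin.configure_cluster(cluster)
    plugin.start_cluster(cluster)
    return plugin
```

Keeping Hadoop-distribution logic behind one interface is what lets the controller stay generic while an HDP plugin delegates the cluster work to Ambari.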
- 24. © Hortonworks Inc. 2013
Provisioning workflow overview
1. Horizon sends a cluster request to the Savanna controller + HDP OpenStack plugin.
2. Nova provisions vanilla VMs from a Glance VM image (either OS only, or preloaded with HDP bits).
3. The HDP plugin installs Ambari.
4. The HDP plugin passes the cluster template to Ambari.
5. Ambari configures all services and starts the cluster.
Resulting Hadoop cluster: Ambari server, HUE, NameNode (NN), JobTracker (JT), DataNodes (DN) …
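The workflow's ordering, and the one branch introduced by the choice of VM image, can be written down as a small step generator.

This is a sketch under assumptions:
* The step wording and `provisioning_steps` are hypothetical.
* The extra package-install step for an OS-only image is an assumption inferred from the "OS only OR preloaded with HDP bits" choice.

```python
from typing import List


def provisioning_steps(node_count: int, image_has_hdp_bits: bool) -> List[str]:
    """Ordered steps for one cluster request, following the provisioning workflow."""
    steps = [
        "horizon: submit cluster request to the Savanna controller",
        f"nova: boot {node_count} vanilla VMs from the glance image",
        "hdp plugin: install Ambari",
        "hdp plugin: pass the cluster template to Ambari",
    ]
    if not image_has_hdp_bits:
        # Assumption: an OS-only image needs HDP packages installed before configuration.
        steps.append("ambari: install HDP packages on every VM")
    steps.append("ambari: configure all services and start the cluster")
    return steps


for step in provisioning_steps(4, image_has_hdp_bits=False):
    print(step)
```

A preloaded image trades larger images for faster cluster start-up, since the package-install step disappears.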
- 25. © Hortonworks Inc. 2013
Ambari based cluster templates
Preconfigured information across all clusters using this template:
• HDP stack information
  - Services & components & packages
  - Description
  - Package dependencies
• Hadoop topology
  - Component / host group mapping
• Hadoop configuration
  - All Hadoop configuration for the cluster (hundreds of parameters and their values)
Per-cluster pluggable data:
- User names
- Passwords
- Host names
- Host VM flavors (CPU/Mem)
- Node count per host group
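The split between shared template content and per-cluster pluggable data amounts to a merge. The template is deep-copied and the per-cluster values are filled in, which yields one concrete cluster spec. This is a minimal sketch. The dictionary layout and `instantiate` are hypothetical and are not Ambari's actual template format. The component names and `dfs.replication` are standard Hadoop/Ambari identifiers used as examples.

```python
import copy
from typing import Any, Dict

# Hypothetical template layout (stack, topology, configuration), not Ambari's format.
TEMPLATE: Dict[str, Any] = {
    "stack": {"name": "HDP", "version": "1.3"},
    "host_groups": {
        "master": ["NAMENODE", "JOBTRACKER"],
        "slave": ["DATANODE", "TASKTRACKER"],
    },
    "config": {"dfs.replication": "3"},
}


def instantiate(template: Dict[str, Any], per_cluster: Dict[str, Any]) -> Dict[str, Any]:
    """Merge a shared template with one cluster's pluggable data into a cluster spec."""
    groups = template["host_groups"]
    missing = set(groups) - set(per_cluster["node_count"])
    if missing:
        raise ValueError(f"no node count for host groups: {sorted(missing)}")
    spec = copy.deepcopy(template)  # never mutate the shared template
    spec["admin_user"] = per_cluster["admin_user"]
    spec["hosts"] = {
        group: {
            "flavor": per_cluster["flavor"][group],
            "count": per_cluster["node_count"][group],
            "components": list(components),
        }
        for group, components in groups.items()
    }
    return spec


cluster_a = instantiate(TEMPLATE, {
    "admin_user": "admin",
    "flavor": {"master": "m1.large", "slave": "m1.medium"},
    "node_count": {"master": 1, "slave": 4},
})
```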
- 26. © Hortonworks Inc. 2013
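Swift object store support (HADOOP-8545)
• MapReduce, Pig & Hive access Swift through an HDFS + Swift bridge that supports create, read, write, delete, mkdir, ls, mv & stat, with Keystone handling authentication
• Data lives in Swift stores (Swift store-1 … Swift store-n), each holding containers (Container-1, Container-2)
• A directory Dir containing file1, file2 and file3 is stored as the objects Dir/file1, Dir/file2 and Dir/file3
• Input data is read from Swift containers, and output results are written back to them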
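Swift containers hold a flat namespace of object names, so a filesystem bridge has to emulate directory operations such as `ls` and `mv` on top of name prefixes like `Dir/file1`. This is a minimal sketch. `list_dir`, `rename_dir` and the sample object names are hypothetical, and this is not the HADOOP-8545 code.

```python
from typing import Iterable, List


def list_dir(object_names: Iterable[str], path: str) -> List[str]:
    """Emulate `ls path` over a flat Swift container by grouping object names on '/'."""
    prefix = path.rstrip("/") + "/"
    children = set()
    for name in object_names:
        if name.startswith(prefix):
            # Keep only the first path segment below `path` (a file or pseudo-directory).
            children.add(name[len(prefix):].split("/", 1)[0])
    return sorted(children)


def rename_dir(object_names: Iterable[str], src: str, dst: str) -> List[str]:
    """Emulate `mv src dst` by rewriting every object name under src to live under dst.

    Against a real Swift store this means one copy plus one delete per object,
    so unlike an HDFS rename it is neither atomic nor constant-time.
    """
    src_prefix = src.rstrip("/") + "/"
    dst_prefix = dst.rstrip("/") + "/"
    return sorted(
        dst_prefix + name[len(src_prefix):] if name.startswith(src_prefix) else name
        for name in object_names
    )


container = ["Dir/file1", "Dir/file2", "Dir/file3", "Dir/logs/part-0"]
print(list_dir(container, "Dir"))   # ['file1', 'file2', 'file3', 'logs']
```

This difference matters for MapReduce. Jobs that commit output by renaming a temporary directory pay a per-object cost on Swift.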
- 27. © Hortonworks Inc. 2013
Hadoop virtualization extensions (HVE)
• Account for the additional ‘node group’ layer so replicas do not end up on VMs in the same hypervisor
• Available in HDP 1.3; work in progress to enable in HDP 2.0 (YARN & HDFS)
Topology: Data center → Rack-1, Rack-2; each rack → Node group-1, Node group-2; each node group → VM1, VM2
Node-group-aware policies:
- Replica (place, choose & remove) policies
- Balancer policies
- Task placement & container allocation (YARN)
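The node-group rule can be illustrated with a simplified replica chooser over the rack → node group → VM topology. It loosely follows the HDFS default placement:
* the first replica goes on the writer;
* the second goes on another rack;
* the third goes on the second replica's rack.

On top of that it adds the HVE constraint that no two replicas share a node group, meaning a hypervisor host.

This is a hypothetical sketch (`choose_replicas`, the tuple encoding and names). It is not HVE's actual block placement implementation.

```python
from typing import Callable, List, Tuple

# (rack, node_group, vm); a node group stands for one physical hypervisor host.
Node = Tuple[str, str, str]


def choose_replicas(writer: Node, nodes: List[Node], replication: int = 3) -> List[Node]:
    """Simplified node-group-aware placement: no two replicas share a node group."""
    chosen: List[Node] = [writer]

    def pick(prefer: Callable[[Node], bool]) -> bool:
        for node in sorted(nodes):
            # (rack, node_group) identifies a hypervisor; skip any already holding a replica.
            fresh_group = all(node[:2] != c[:2] for c in chosen)
            if fresh_group and prefer(node):
                chosen.append(node)
                return True
        return False

    while len(chosen) < replication:
        if len(chosen) == 1:
            # 2nd replica: prefer a rack other than the writer's.
            def prefer(n: Node) -> bool:
                return n[0] != writer[0]
        else:
            # Later replicas: prefer the 2nd replica's rack, in a different node group.
            second_rack = chosen[1][0]

            def prefer(n: Node) -> bool:
                return n[0] == second_rack
        # Fall back to any unused node group if the preferred rack has none left.
        if not (pick(prefer) or pick(lambda n: True)):
            break  # fewer distinct node groups than the replication factor
    return chosen


# Topology from the slide: 2 racks x 2 node groups x 2 VMs.
NODES = [(r, g, v) for r in ("rack-1", "rack-2")
         for g in ("node-group-1", "node-group-2") for v in ("vm1", "vm2")]
replicas = choose_replicas(("rack-1", "node-group-1", "vm1"), NODES)
```

Without the node-group check, the third replica could land on `("rack-2", "node-group-1", "vm2")`. That VM shares a hypervisor with the second replica, so a single host failure would take out two copies.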