1. Page 1 © Hortonworks Inc. 2014
Discover HDP 2.2:
Apache HBase with YARN & Slider for Fast NoSQL Access
Hortonworks. We do Hadoop.
2. Page 2 © Hortonworks Inc. 2014
Speakers
Justin Sears
Hortonworks Product Marketing Manager
Carter Shanklin
Hortonworks Director of Product Management & PM for Apache HBase in Hortonworks Data Platform
Enis Soztutar
Hortonworks Engineer, Apache HBase Committer & PMC Member
3. Page 3 © Hortonworks Inc. 2014
Agenda
• Introduction to Apache HBase
• New HBase Innovation in HDP 2.2
– HBase HA
– Support for rolling upgrades
– HBase on YARN using Apache Slider
• Q & A
We’ll move quickly:
• Attendee phone lines are muted
• Text any questions to Enis Soztutar using Webex chat
• Questions answered at the end
• Unanswered questions and answers in upcoming blog post
4. Page 4 © Hortonworks Inc. 2014
Big Data, Hadoop & Data Center Re-platforming
Business Drivers
• From reactive analytics to proactive interactions
• Insights that drive competitive advantage & optimal returns
Financial Drivers
• Cost of data systems, as % of IT spend, continues to grow
• Cost advantages of commodity hardware & open source software
Technical Drivers
• Data is growing exponentially & existing systems are overwhelmed
• Predominantly driven by NEW types of data that can inform analytics
There is an inequitable balance between vendor and customer in the market
5. Page 5 © Hortonworks Inc. 2014
Hadoop Value: New Types of Data
• Clickstream: Capture and analyze website visitors' data trails and optimize your website
• Sensors: Discover patterns in data streaming automatically from remote sensors and machines
• Server Logs: Research logs to diagnose process failures and prevent security breaches
• Sentiment: Understand how your customers feel about your brand and products, right now
• Geographic: Analyze location-based data to manage operations where they occur
• Unstructured: Understand patterns in files across millions of web pages, emails, and documents
6. Page 6 © Hortonworks Inc. 2014
A Shift from Reactive to Proactive Interactions
HDP and Hadoop allow organizations to use data to shift interactions from Reactive (Post Transaction) to Proactive (Pre Decision):
• A shift in Advertising: From mass branding …to 1x1 Targeting
• A shift in Financial Services: From Educated Investing …to Automated Algorithms
• A shift in Healthcare: From mass treatment …to Designer Medicine
• A shift in Retail: From static branding …to Real-time Personalization
• A shift in Telco: From break then fix …to repair before break
7. Page 7 © Hortonworks Inc. 2014
Enterprise Goals for the Modern Data Architecture
• Consolidate siloed data sets, structured and unstructured
• Central data set on a single cluster
• Multiple workloads across batch, interactive, and real time
• Central services for security, governance, and operation
• Preserve existing investment in current tools and platforms
• Single view of the customer, product, supply chain
[Figure: Modern Data Architecture in three layers. SOURCES: existing systems (CRM, ERP, Other), Clickstream, Web & Social, Geolocation, Sensor & Machine, Server Logs, Unstructured. DATA SYSTEM: RDBMS, EDW, and MPP alongside YARN: Data Operating System on HDFS (Hadoop Distributed File System), nodes 1 to N, running Batch, Interactive, and Real-Time workloads. APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications.]
8. Page 8 © Hortonworks Inc. 2014
YARN Transformed Hadoop & Opened a New Era
YARN: The Architectural Center of Hadoop
• Common data platform, many applications
• Support multi-tenant access & processing
• Batch, interactive & real-time use cases
[Figure: Batch, interactive & real-time data access on YARN: Data Operating System (Cluster Resource Management) over HDFS (Hadoop Distributed File System). Engines: Script (Pig, on Tez), SQL (Hive, on Tez), Java/Scala (Cascading, on Tez), Stream (Storm), Search (Solr), NoSQL (HBase and Accumulo, each on Slider), In-Memory (Spark), Others (ISV Engines).]
9. Page 9 © Hortonworks Inc. 2014
YARN Extends Hadoop to Other Data Center Leaders
YARN: The Architectural Center of Hadoop
• Common data platform, many applications
• Support multi-tenant access & processing
• Batch, interactive & real-time use cases
• Supports 3rd-party ISV tools (ex. SAS, Syncsort, Actian, etc.)
YARN Ready Applications
Facilitates ongoing innovation and enterprise adoption via ecosystem of new and existing “YARN Ready” solutions
[Figure: Batch, interactive & real-time data access on YARN: Data Operating System (Cluster Resource Management) over HDFS (Hadoop Distributed File System). Engines: Script (Pig, on Tez), SQL (Hive, on Tez), Java/Scala (Cascading, on Tez), Stream (Storm), Search (Solr), NoSQL (HBase and Accumulo, each on Slider), In-Memory (Spark), Others (ISV Engines).]
10. Page 10 © Hortonworks Inc. 2014
Enterprise Hadoop: Central Set of Services
Enables Apache Hadoop to be an Enterprise Data Platform with centralized services for:
• Governance: Load data and manage according to policy
• Operations: Deploy and effectively manage the platform
• Security: Provide layered approach to security through Authentication, Authorization, Accounting, and Data Protection
Everything that plugs into Hadoop inherits these services
[Figure: GOVERNANCE, SECURITY, and OPERATIONS services surrounding batch, interactive & real-time data access on YARN: Data Operating System (Cluster Resource Management) over HDFS (Hadoop Distributed File System). Operations: Provision, Manage & Monitor (Ambari, Zookeeper); Scheduling (Oozie). Engines: Script (Pig), SQL (Hive), Java/Scala (Cascading), Stream (Storm), Search (Solr), NoSQL (HBase, Accumulo), In-Memory (Spark), Others (ISV Engines), with Tez and Slider as intermediate layers.]
11. Page 11 © Hortonworks Inc. 2014
Hortonworks Data Platform 2.2
HDP Delivers Enterprise Hadoop
YARN is the architectural center of HDP
• Common data set across all applications
• Batch, interactive & real-time workloads
• Multi-tenant access & processing
Provides comprehensive enterprise capabilities
• Governance
• Security
• Operations
Enables broad ecosystem adoption
• ISVs can plug directly into Hadoop
The widest range of deployment options
• Linux & Windows
• On premises & cloud
[Figure: HDP 2.2 stack. GOVERNANCE: Data Workflow, Lifecycle & Governance (Falcon, Sqoop, Flume, Kafka, NFS, WebHDFS). BATCH, INTERACTIVE & REAL-TIME DATA ACCESS on YARN: Data Operating System (Cluster Resource Management) over HDFS (Hadoop Distributed File System): Script (Pig, on Tez), SQL (Hive, on Tez), Java/Scala (Cascading, on Tez), Stream (Storm), Search (Solr), NoSQL (HBase and Accumulo, each on Slider), In-Memory (Spark), Others (ISV Engines). SECURITY: Authentication, Authorization, Audit, Data Protection (Storage: HDFS; Resources: YARN; Access: Hive; Pipeline: Falcon; Cluster: Ranger; Cluster: Knox). OPERATIONS: Provision, Manage & Monitor (Ambari, Zookeeper); Scheduling (Oozie). Deployment Choice: Linux, Windows, On-Premises, Cloud.]
12. Page 12 © Hortonworks Inc. 2014
Hortonworks Data Platform 2.2
HDP Delivers Enterprise Hadoop
[Figure: repeat of the HDP 2.2 stack and bullet points from the previous slide, with NoSQL (HBase, Accumulo) on Slider listed separately from the other data access engines.]
13. Page 13 © Hortonworks Inc. 2014
Introduction to Apache HBase
14. Page 14 © Hortonworks Inc. 2014
What Is Apache HBase?
• Flexible Schema
• Extreme Low Latency
• SQL and NoSQL Interfaces
• Store and Process Petabytes of Data
• Scale out on Commodity Servers
• Integrated with YARN
• 100% Open Source
[Figure: HBase RegionServers running on YARN: Data Operating System across nodes 1 to N, over HDFS (Permanent Data Storage). Callouts: Flexible Schema; Extreme Low Latency; Directly Integrated with Hadoop.]
15. Page 15 © Hortonworks Inc. 2014
New in HDP 2.2: HBase HA
16. Page 16 © Hortonworks Inc. 2014
HBase HA: 3 Levels of Protection
RegionServer          Primary Keys (Read Write)   Standby Keys (Read Only)
HBase RegionServer 1  1-100                       101-200, 201-300
HBase RegionServer 2  101-200                     201-300, 301-400
HBase RegionServer 3  201-300                     301-400, 1-100
HBase RegionServer 4  301-400                     1-100, 101-200
HDFS (3 Copies of All Data, Available to all RegionServers)
1. HBase keys are range partitioned across servers. A node failure affects 1 key range; the rest remain available.
2. HBase HA stores read-only copies in separate RegionServers. Data can still be read if a node fails.
3. 3 copies of all data are stored in HDFS. Data from failed nodes is automatically recovered on other nodes.
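The first two levels of protection can be illustrated with a small model of the layout on this slide. This is only a sketch of the placement pattern in the diagram, not HBase code: the names `KEY_RANGES`, `SERVERS`, `build_layout`, and `access` are invented for illustration, and the model covers just the window between a node failing and HDFS-based recovery (level 3) reassigning its key range.

```python
# Illustrative model of the slide's key-range layout; none of this is HBase API.
KEY_RANGES = [(1, 100), (101, 200), (201, 300), (301, 400)]
SERVERS = ["RegionServer 1", "RegionServer 2", "RegionServer 3", "RegionServer 4"]
STANDBYS_PER_RANGE = 2  # each server on the slide holds two read-only ranges


def build_layout():
    """Map each key range to its primary server and its read-only standby servers."""
    layout = {}
    n = len(SERVERS)
    for i, key_range in enumerate(KEY_RANGES):
        # On the slide, server i holds ranges i+1 and i+2 as standby copies,
        # so range i is held read-only by servers i-1 and i-2 (wrapping around).
        standbys = [SERVERS[(i - k) % n] for k in range(1, STANDBYS_PER_RANGE + 1)]
        layout[key_range] = {"primary": SERVERS[i], "standbys": standbys}
    return layout


def find_range(key):
    """Return the (low, high) key range that contains the given key."""
    for low, high in KEY_RANGES:
        if low <= key <= high:
            return (low, high)
    raise KeyError(key)


def access(key, failed=frozenset()):
    """Return (readers, writer): live servers that can read the key, and the
    live primary that can write it (None while the primary is down)."""
    entry = build_layout()[find_range(key)]
    writer = entry["primary"] if entry["primary"] not in failed else None
    candidates = [entry["primary"], *entry["standbys"]]
    readers = [server for server in candidates if server not in failed]
    return readers, writer


if __name__ == "__main__":
    # RegionServer 2 fails: keys 101-200 lose their writer but stay readable.
    print(access(150, failed={"RegionServer 2"}))
    # Keys in other ranges are unaffected for writes.
    print(access(50, failed={"RegionServer 2"}))
```

In this model a single node failure removes write access for exactly one key range while every key stays readable from at least one standby copy, which is the behavior points 1 and 2 describe.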
17. Page 17 © Hortonworks Inc. 2014
Comparing HBase HA Phase 1 Versus 2
Item                               HA Phase 1 / HDP 2.1   HA Phase 2 / HDP 2.2
Data Staleness                     > 30s                  Near Zero
HA in Scans                        Unsupported            Supported
Region Split/Merge                 Disabled               Supported
META Table Highly Available        Unsupported            Supported
HBCK check for common HA problems  Unsupported            Supported
18. Page 18 © Hortonworks Inc. 2014
New in HDP 2.2: Rolling Upgrades
19. Page 19 © Hortonworks Inc. 2014
Rolling Upgrade Goals
Zero downtime upgrades
Roll forward and roll backward
Update clients and servers independently
20. Page 20 © Hortonworks Inc. 2014
HBase Rolling Upgrade: Component Overview
• New Package Format: Install multiple versions of Hadoop software on a single node or cluster.
• hdp-select Utility: Choose the component version you want, roll forward or backward.
• Decoupled Clients and Servers: Upgrade servers independently of clients.
21. Page 21 © Hortonworks Inc. 2014
HBase Rolling Upgrade: Directory Layout
Directory Layout: /usr/hdp
Within /usr/hdp/current:
[root@cluster1 current]# pwd
/usr/hdp/current
[root@cluster1 current]# ls -l | grep hbase
lrwxrwxrwx. 1 root root 27 Dec 6 22:57 hbase-client -> /usr/hdp/2.2.0.0-1995/hbase
lrwxrwxrwx. 1 root root 27 Dec 6 22:57 hbase-master -> /usr/hdp/2.2.0.0-1995/hbase
lrwxrwxrwx. 1 root root 27 Dec 6 22:57 hbase-regionserver -> /usr/hdp/2.2.0.0-1995/hbase
Multiple versions of the HDP stack:
[root@cluster1 hdp]# pwd
/usr/hdp
[root@cluster1 hdp]# ls -l
drwxr-xr-x. 19 root root 4096 Nov 15 07:26 2.2.0.0-1995
drwxr-xr-x. 2 root root 4096 Dec 7 01:22 2.2.0.1-2217
drwxr-xr-x. 2 root root 4096 Dec 6 22:57 current
22. Page 22 © Hortonworks Inc. 2014
HBase Rolling Upgrade: Upgrade One Component
hdp-select
[root@cluster1 hdp]# hdp-select status | grep hbase
hbase-client - 2.2.0.0-1995
hbase-master - 2.2.0.0-1995
hbase-regionserver - 2.2.0.0-1995
Upgrade Servers Before Clients
[root@cluster1 hdp]# hdp-select set hbase-master 2.2.0.1-2217
[root@cluster1 current]# pwd
/usr/hdp/current
[root@cluster1 current]# ls -l | grep hbase
lrwxrwxrwx. 1 root root 27 Dec 6 22:57 hbase-client -> /usr/hdp/2.2.0.0-1995/hbase
lrwxrwxrwx. 1 root root 27 Dec 7 02:23 hbase-master -> /usr/hdp/2.2.0.1-2217/hbase
lrwxrwxrwx. 1 root root 27 Dec 6 22:57 hbase-regionserver -> /usr/hdp/2.2.0.0-1995/hbase
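The observable effect of the commands above (per-component symlinks under `current` that are re-pointed at another installed version) can be reproduced in a throwaway directory. The sketch below is not how `hdp-select` is implemented, which the slides do not show; the functions `make_layout`, `status`, and `set_version` are hypothetical, and swapping the link through a temporary name plus rename is a design choice of this sketch, made so that the link never disappears mid-switch on POSIX systems.

```python
import os
import tempfile


def make_layout(root, versions, components):
    """Create <root>/<version>/hbase directories and <root>/current/<component>
    symlinks that all point at the first version."""
    for version in versions:
        os.makedirs(os.path.join(root, version, "hbase"))
    os.makedirs(os.path.join(root, "current"))
    for component in components:
        os.symlink(os.path.join(root, versions[0], "hbase"),
                   os.path.join(root, "current", component))


def status(root):
    """Return {component: version} by reading each symlink under <root>/current."""
    current = os.path.join(root, "current")
    result = {}
    for component in sorted(os.listdir(current)):
        target = os.readlink(os.path.join(current, component))
        # target looks like <root>/<version>/hbase; the version is the parent name
        result[component] = os.path.basename(os.path.dirname(target))
    return result


def set_version(root, component, version):
    """Re-point one component link at another installed version."""
    target = os.path.join(root, version, "hbase")
    if not os.path.isdir(target):
        raise FileNotFoundError(f"version {version} is not installed under {root}")
    link = os.path.join(root, "current", component)
    tmp_link = link + ".tmp"
    os.symlink(target, tmp_link)
    os.replace(tmp_link, link)  # rename over the old link in a single step


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        make_layout(root, ["2.2.0.0-1995", "2.2.0.1-2217"],
                    ["hbase-client", "hbase-master", "hbase-regionserver"])
        set_version(root, "hbase-master", "2.2.0.1-2217")  # servers before clients
        for component, version in status(root).items():
            print(f"{component} - {version}")
```

Because each component has its own link, rolling backward is the same operation with the older version string, and the client link can stay on the old version while the servers move.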
23. Page 23 © Hortonworks Inc. 2014
Rolling Upgrade Contracts
Rolling Upgrade works for minor upgrades.
• Example: HDP 2.2.0 to HDP 2.2.1.
Wire compatibility guaranteed between clients and servers.
Binary compatibility guaranteed, e.g. for coprocessors.
Data format compatibility guaranteed.
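A pre-flight check for the "minor upgrades" contract might look like the following. This is a sketch under a stated assumption: the slide gives only the example HDP 2.2.0 to HDP 2.2.1, so the code interprets "minor upgrade" as "both versions share the same first two components". The helper names are hypothetical, and the build suffix (such as `-1995`) is ignored.

```python
def parse_hdp_version(version):
    """Split 'A.B.C[.D][-build]' into a tuple of ints, ignoring any '-build' suffix."""
    numeric = version.split("-", 1)[0]
    return tuple(int(part) for part in numeric.split("."))


def is_rolling_upgrade_candidate(current, target):
    """Assumed rule: same first two components (e.g. both 2.2.x), and the
    numeric versions differ. Direction is not checked, because the deck
    lists both roll forward and roll backward as goals."""
    cur = parse_hdp_version(current)
    tgt = parse_hdp_version(target)
    return cur[:2] == tgt[:2] and cur != tgt


if __name__ == "__main__":
    print(is_rolling_upgrade_candidate("2.2.0", "2.2.1"))  # the slide's example
    print(is_rolling_upgrade_candidate("2.1.0", "2.2.0"))  # crosses the 2.1 / 2.2 line
```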
24. Page 24 © Hortonworks Inc. 2014
Rolling Upgrade Benefits
• Upgrade with zero downtime.
• Roll forward and roll backward.
• Instant switchover / restart preserve data locality when upgrading HBase.
• Update servers and clients independently.
25. Page 25 © Hortonworks Inc. 2014
New in HDP 2.2: HBase on YARN via Slider
26. Page 26 © Hortonworks Inc. 2014
Deploying HBase with Slider
What is it?
• Deploy HBase into the Hadoop cluster using YARN.
Benefit                Details
Simplified Deployment  No need to deploy HBase or its configuration to individual cluster nodes.
Lifecycle Management   Start / stop / process management handled automatically.
Multitenancy           Different users can run HBase clusters within one Hadoop cluster.
Multiple Versions      Run different versions of HBase (e.g. 0.98 and 1.0) on the same cluster.
Elasticity             Cluster size is a parameter and easily changed.
Co-located Analytics   HBase resource usage is known to YARN, so nodes running HBase will not be used as heavily to satisfy MapReduce or Tez jobs.
27. Page 27 © Hortonworks Inc. 2014
HBase / Slider Sample
Configure HBase settings in appConfig.json and resources.json
Sample Slider Command:
• slider create mycluster --template appConfig.json --resources resources.json
{
  "schema": "http://example.org/specification/v2.0.0",
  "metadata": {
  },
  "global": {
    "site.hbase-site.hbase.hstore.flush.retries.number": "120",
    "site.hbase-site.hbase.client.keyvalue.maxsize": "10485760",
    "site.hbase-site.hbase.hstore.compactionThreshold": "3",
    "site.hbase-site.hbase.rootdir": "${DEFAULT_DATA_DIR}/data",
    "site.hbase-site.hbase.stagingdir": "${DEFAULT_DATA_DIR}/staging",
    "site.hbase-site.hbase.regionserver.handler.count": "60",
    ...
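Every HBase setting in the sample carries the prefix `site.hbase-site.` and a string value, so the visible part of the file can be generated from a plain dictionary of hbase-site properties. The sketch below covers only the keys shown on the slide; the sample is truncated, so a real appConfig.json will contain further entries that are not reproduced here, and resources.json is not shown at all. The function name `build_app_config` is hypothetical.

```python
import json

SITE_PREFIX = "site.hbase-site."  # prefix seen on every HBase key in the sample


def build_app_config(hbase_site):
    """Wrap plain hbase-site properties in the appConfig.json layout from the
    slide. Values are stored as strings, as in the sample."""
    return {
        "schema": "http://example.org/specification/v2.0.0",
        "metadata": {},
        "global": {SITE_PREFIX + name: str(value)
                   for name, value in hbase_site.items()},
    }


if __name__ == "__main__":
    # Only the settings visible on the slide; the original file continues past them.
    settings = {
        "hbase.hstore.flush.retries.number": 120,
        "hbase.client.keyvalue.maxsize": 10485760,
        "hbase.hstore.compactionThreshold": 3,
        "hbase.rootdir": "${DEFAULT_DATA_DIR}/data",
        "hbase.stagingdir": "${DEFAULT_DATA_DIR}/staging",
        "hbase.regionserver.handler.count": 60,
    }
    print(json.dumps(build_app_config(settings), indent=2))
```

Keeping the settings in a dictionary makes it easier to maintain several HBase instances with different tuning on one Hadoop cluster, which is the multitenancy case the previous slide describes.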
29. Page 29 © Hortonworks Inc. 2014
Thank you!
Learn more at:
hortonworks.com/hadoop/hbase/
Register for the last Discover HDP 2.2 Webinar
Hortonworks.com/webinars