At Yahoo!, HBase has been running as a hosted multi-tenant service since 2013. In a single HBase cluster we have around 30 tenants running various types of workloads (i.e., batch, near real-time, ad-hoc, etc.). Typically such a deployment would cause tenant workloads to negatively affect each other because of resource contention (disk, CPU, network, cache thrashing, etc.). Using RegionServer Groups, we are able to designate a dedicated subset of RegionServers in a cluster to host only the tables of a given tenant (HBASE-6721).
Most HBase deployments use HDFS as their distributed filesystem, which in turn does not guarantee that a region's data is locally available to the hosting RegionServer. This poses a problem when providing isolation, since HDFS data blocks may have to be read remotely from a different tenant's host, contending for disk and network resources. Favored Nodes addresses this problem by hinting to HDFS which DataNodes should store a region's data and by assigning regions only to those favored RegionServers (HBASE-15531).
We will walk through these features, explaining our motivation and how they work, as well as our experiences running these multi-tenant clusters. These features will be available in Apache HBase 2.0.
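As a sketch of how a tenant gets a dedicated subset of RegionServers, the snippet below uses the RSGroupAdminClient API that ships with the rsgroup coprocessor (it assumes the RSGroupAdminEndpoint coprocessor is enabled on the master); the group name, server address, and table name are hypothetical.

    import java.util.Collections;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.net.Address;
    import org.apache.hadoop.hbase.rsgroup.RSGroupAdminClient;

    public class TenantGroupSetup {
      public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create())) {
          RSGroupAdminClient rsGroups = new RSGroupAdminClient(conn);
          // Create a dedicated group for the tenant (hypothetical group name).
          rsGroups.addRSGroup("tenant_a");
          // Move a RegionServer into the group (hypothetical host:port).
          rsGroups.moveServers(
              Collections.singleton(Address.fromString("rs1.example.com:16020")), "tenant_a");
          // Pin the tenant's table to the group so only these servers host its regions.
          rsGroups.moveTables(
              Collections.singleton(TableName.valueOf("tenant_a_events")), "tenant_a");
        }
      }
    }

The same steps are available in the HBase shell as add_rsgroup, move_servers_rsgroup, and move_tables_rsgroup.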
15. RSGroups @ Y!
• Per Group configurations
• hbase-site.xml
• hbase-env.sh
• System Group
• Isolate system tables
• Rolling Upgrade/Restart Per Group
• Different strategies for Balance per Group
• Alerting/Monitoring Per Group
• Namespace Integration (sketch below)
• Users run DDL on their own tables in a sandbox
• Table and Region Quotas
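A minimal sketch of the namespace integration and quota bullets above, assuming the namespace quota keys from HBASE-8410 and an hbase.rsgroup.name namespace property for the group binding; the namespace name and limits are made up.

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.NamespaceDescriptor;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;

    public class TenantNamespaceSetup {
      public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = conn.getAdmin()) {
          NamespaceDescriptor ns = NamespaceDescriptor.create("tenant_a")
              // Bind the namespace to its RegionServer group (assumed property name).
              .addConfiguration("hbase.rsgroup.name", "tenant_a")
              // Quotas: cap how many tables and regions the tenant can create in its sandbox.
              .addConfiguration("hbase.namespace.quota.maxtables", "20")
              .addConfiguration("hbase.namespace.quota.maxregions", "500")
              .build();
          admin.createNamespace(ns);
        }
      }
    }

Tables the tenant then creates inside this namespace land only on the group's servers, and DDL can be delegated to the tenant within that sandbox.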
17. Overview
▪ HDFS
› File-level block placement hint on file creation (sketch below)
› Client passes a set of preferred hosts to replicate data to
› preferred hosts => “Favored Nodes” or hints
▪ HBase
› Region-level block placement hint
› Select 3 favored nodes for each region - primary, secondary, tertiary
› Constraint: Favored Nodes on 2 racks (where possible)
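On the HDFS side, the hint is the create() overload on DistributedFileSystem that accepts an array of favored-node addresses. A hedged sketch with hypothetical DataNode hosts, port, and file path; HBase does the equivalent internally when writing HFiles, picking the region's three favored nodes across two racks where possible.

    import java.net.InetSocketAddress;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.permission.FsPermission;
    import org.apache.hadoop.hdfs.DistributedFileSystem;

    public class FavoredNodeCreate {
      public static void main(String[] args) throws Exception {
        // Assumes fs.defaultFS points at an HDFS cluster.
        Configuration conf = new Configuration();
        DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);
        // The three favored nodes for a region: primary, secondary, tertiary
        // (hypothetical DataNode hosts and data-transfer port).
        InetSocketAddress[] favoredNodes = new InetSocketAddress[] {
            new InetSocketAddress("dn1.rack1.example.com", 9866),
            new InetSocketAddress("dn2.rack1.example.com", 9866),
            new InetSocketAddress("dn3.rack2.example.com", 9866)
        };
        // File-level block placement hint: HDFS tries to place replicas on these hosts.
        try (FSDataOutputStream out = dfs.create(
            new Path("/hbase/example/hfile"),
            FsPermission.getFileDefault(),
            true,                                        // overwrite
            conf.getInt("io.file.buffer.size", 4096),    // buffer size
            (short) 3,                                   // replication
            dfs.getDefaultBlockSize(),                   // block size
            null,                                        // no progress callback
            favoredNodes)) {
          out.write(new byte[] {1, 2, 3});
        }
      }
    }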
18. Motivation
▪ Data Locality
▪ Performance
▪ Network utilization
▪ Datanode isolation
▪ Previous work from FB and Community
› HBASE-4755 (HBase based block placement in DFS)
19. Enabling Favored Nodes
▪ HBase
› Use the favored node balancer (config sketch below)
› Setup tool for creating FN for existing regions
▪ HDFS
› Set “dfs.namenode.replication.considerLoad” to false
› Recommend disabling HDFS balancer
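The settings above, sketched as code for clarity; in practice the HBase property goes into hbase-site.xml and the HDFS property into hdfs-site.xml. The balancer class name assumed here is the FavoredStochasticBalancer introduced by HBASE-15531 for HBase 2.0.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class FavoredNodeConfig {
      public static Configuration favoredNodeConf() {
        Configuration conf = HBaseConfiguration.create();
        // HBase: plug in the favored-node aware stochastic balancer.
        conf.set("hbase.master.loadbalancer.class",
            "org.apache.hadoop.hbase.master.balancer.FavoredStochasticBalancer");
        // HDFS: do not let the NameNode skip favored-node hints because a DataNode looks loaded.
        conf.setBoolean("dfs.namenode.replication.considerLoad", false);
        // Operational step, not a config key: keep the HDFS balancer disabled so it does not
        // move blocks away from the favored DataNodes.
        return conf;
      }
    }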
22. Favored Node Balancers
▪ FavoredStochasticBalancer
› Assigns only to FN of a region (user tables)
› New Candidate Generators (FNLocality and FNLoad)
› Recommended same cost for load and locality generators
› Future – Work with Region Replicas
› Future - WALs
▪ FavoredRSGroupLoadBalancer
› Uses FavoredStochasticBalancer
› Recommended minimum 4 nodes per group
› Generates FN within the group's servers
23. Region Split and Merge
▪ Splits (sketch below)
› Each daughter inherits 2 FN from parent
› One FN is randomly generated
› Locality vs Distribution
› FN within rsgroup servers (if enabled)
▪ Merge
› Inherited from one of the parents
› Preserve locality
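A simplified sketch of the split behaviour in the Splits bullets above, not the exact HBase implementation: each daughter keeps two of the parent's favored nodes for locality and gets one freshly generated node for distribution, chosen from the rsgroup's servers when rsgroups are enabled.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Random;
    import org.apache.hadoop.hbase.ServerName;

    public class DaughterFavoredNodes {
      private static final Random RANDOM = new Random();

      // parentFn: the parent's 3 favored nodes; groupServers: the servers of the region's rsgroup.
      static List<ServerName> daughterFavoredNodes(List<ServerName> parentFn,
          List<ServerName> groupServers) {
        List<ServerName> daughterFn = new ArrayList<>();
        // Inherit two favored nodes from the parent to preserve locality.
        daughterFn.add(parentFn.get(0));
        daughterFn.add(parentFn.get(1));
        // Generate the third favored node randomly within the group, avoiding duplicates,
        // trading a little locality for better distribution.
        List<ServerName> candidates = new ArrayList<>(groupServers);
        candidates.removeAll(daughterFn);
        daughterFn.add(candidates.get(RANDOM.nextInt(candidates.size())));
        return daughterFn;
      }
    }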
24. Distribution
▪ Replica count distribution across favored nodes (FNReplica)
▪ Why is it important?
› Balancer assigns only to FN
› RegionServer crashes
› Uniform load
▪ Sample replica load for a group from production
SN=Rack1_RS1  Primary=695  Secondary=19   Tertiary=11   Total=725
SN=Rack1_RS2  Primary=142  Secondary=398  Tertiary=185  Total=725
SN=Rack2_RS1  Primary=93   Secondary=376  Tertiary=256  Total=725
SN=Rack2_RS1  Primary=36   Secondary=173  Tertiary=514  Total=723
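The breakdown above is just a count of how often each server appears in the primary, secondary, and tertiary slot of the regions' favored-node lists; a small bookkeeping sketch, taking the favored-node assignment map as a hypothetical input:

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import org.apache.hadoop.hbase.ServerName;
    import org.apache.hadoop.hbase.client.RegionInfo;

    public class FnReplicaLoad {
      // counts[server] = {primary, secondary, tertiary} favored-node counts for that server.
      static Map<ServerName, int[]> replicaLoad(Map<RegionInfo, List<ServerName>> favoredNodes) {
        Map<ServerName, int[]> counts = new HashMap<>();
        for (List<ServerName> fn : favoredNodes.values()) {
          for (int pos = 0; pos < fn.size(); pos++) {
            counts.computeIfAbsent(fn.get(pos), s -> new int[3])[pos]++;
          }
        }
        return counts;
      }
    }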
25. Modifying Distribution
▪ Spread FN across all region servers
▪ redistribute (sketch below):
› Balance of FNReplicas
› Also used when adding new servers
› Only one FN is changed per region; constraint: 2 FN retain >= 80% locality
› Current assignment not changed
› Overloaded servers -> underloaded servers
▪ complete_redistribute:
› Round robin generation of FNReplicas
› Locality is lost and regions reassigned
▪ removeFN - Decommissioning a favored node
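A rough sketch of the redistribute move described above, not the actual tool: one favored node of a region is swapped from an overloaded server to an underloaded one, while the other two favored nodes stay put so most of the locality, and the current assignment, is preserved.

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.hbase.ServerName;

    public class Redistribute {
      // Replace exactly one favored node of a region; the region itself is not reassigned.
      // overloaded: the FN to move away from; underloaded: the FN to move to.
      static List<ServerName> moveOneFavoredNode(List<ServerName> currentFn,
          ServerName overloaded, ServerName underloaded) {
        List<ServerName> newFn = new ArrayList<>(currentFn);
        int idx = newFn.indexOf(overloaded);
        if (idx >= 0 && !newFn.contains(underloaded)) {
          // The two untouched favored nodes keep the bulk of the region's data local,
          // which is what the ">= 80% locality on 2 FN" constraint is about.
          newFn.set(idx, underloaded);
        }
        return newFn;
      }
    }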
26. Adding servers - redistribute
[Diagram: RS Group A (RS1-RS5 co-located with DN1-DN5), shown before and after a new node is added; redistribute spreads favored nodes onto the new server]
27. Decommissioning a node - removeFN
[Diagram: RS Group A (RS1-RS5 co-located with DN1-DN5), shown before and after a node is decommissioned; removeFN moves its favored nodes to the remaining servers]
35. Monitoring/Operations
▪ hbck checks various factors
› No FN or incorrect FN
› Regions with dead FN
› Out-of-rsgroup favored nodes
› System tables
▪ Check dead FN (tool, JMX)
▪ Master UI - RIT indicates when all FN dead
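A hedged sketch of the dead-favored-node check behind the hbck/JMX tooling mentioned above; the favored-node assignments and the live server list are passed in as hypothetical inputs, and servers are compared by host and port to avoid start-code mismatches.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;
    import java.util.stream.Collectors;
    import org.apache.hadoop.hbase.ServerName;
    import org.apache.hadoop.hbase.client.RegionInfo;

    public class DeadFavoredNodeCheck {
      // Regions whose favored nodes are all dead end up stuck in transition, because the
      // balancer assigns user regions only to their favored nodes.
      static List<RegionInfo> regionsWithAllFnDead(
          Map<RegionInfo, List<ServerName>> favoredNodes, Set<ServerName> liveServers) {
        Set<String> live = liveServers.stream()
            .map(sn -> sn.getHostname() + ":" + sn.getPort())
            .collect(Collectors.toSet());
        List<RegionInfo> stuck = new ArrayList<>();
        for (Map.Entry<RegionInfo, List<ServerName>> e : favoredNodes.entrySet()) {
          boolean anyAlive = e.getValue().stream()
              .anyMatch(fn -> live.contains(fn.getHostname() + ":" + fn.getPort()));
          if (!anyAlive) {
            stuck.add(e.getKey());
          }
        }
        return stuck;
      }
    }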
36. Production Experience
▪ Steady increase in data locality (percentfileslocal)
▪ Redistribute runs once a day for all groups
› FN distribution more or less equally spread across group nodes
› Adding 10% servers to an rsgroup – equal distribution
▪ FN hints not chosen when DN in decommission
› DFSClient logs warning when hints not chosen, NN logs too
› Sometimes a DN takes a long time to decommission
› HDFS rolling upgrades or system updates cause DN downtime
▪ Regions in transition due to FN
› All FN dead (missed alert)
› Non-rsgroup servers as FN (bug in code)
38. Data Locality - Rolling Restart
▪ Region Count varies, but locality is preserved across multiple rolling restarts
[Chart: percentfileslocal and regioncount over multiple balanced rolling restarts]
39. Data growth
• Same set of tenants across 2 racks
[Charts: storefilesize growth, 0 to 4 TB, before and after Favored Nodes was enabled]
40. Network Utilization
▪ Cluster level writeRequestRate – Before and After FN (3x increase)
[Charts: before Favored Nodes / after Favored Nodes]
Editor's notes
Even if Chaos Monkey tests are run on the cluster, locality is still retained.
This is not a 100% balanced cluster, but is a heavily used one with 2.4k regions per server that’s equally balanced to start with.
The same set of tenants was spread across 2 racks. One tenant is storage heavy and the others are not.
After FN was enabled, we can see that the disk used by the storage-heavy tenant increased, and it decreased for the other tenants in the same rack.
The overall size of the regions did not change much as can be seen from the storefilesize metrics in backup slides.
After FN was enabled, we see that remote DN reads dropped significantly. There are occasional spikes, and those happen due to rolling restarts, DN decommissions, etc. Note that the occasional spikes are on the same machines and that system tables don't have favored nodes.
Cluster-level readRequestRate, captured before FN was enabled and some time after. The readRequestRate is 2x higher (more customers, more use cases).
The network graphs are at the top-of-rack level and are in MRTG format - http://oss.oetiker.ch/mrtg/doc/mrtg-logfile.en.html
Network Utilization went down after favored nodes was enabled.
This is the same cluster for which the remote DataNode reads graphs were shown. As can be inferred, the readRequestRate increased significantly after Favored Nodes was enabled.
We start with a cluster that is well balanced and continuously do rolling restarts on the servers multiple times. Locality is still preserved even though regions are not uniform and keep moving around.