At Yahoo!, HBase has been running as a hosted multi-tenant service since 2013. In a single HBase cluster we have around 30 tenants running various types of workloads (i.e., batch, near real-time, ad-hoc, etc.). Typically such a deployment would cause tenant workloads to negatively affect each other because of resource contention (disk, CPU, network, cache thrashing, etc.). Using RegionServer Groups, we are able to designate a dedicated subset of RegionServers in a cluster to host only the tables of a given tenant (HBASE-6721).
Most HBase deployments use HDFS as their distributed filesystem, which in turn does not guarantee that a region's data is locally available to the hosting RegionServer. This poses a problem when providing isolation, since HDFS data blocks may have to be read remotely from a different tenant's host, contending for disk and network resources. Favored Nodes addresses this problem by hinting to HDFS which DataNodes should store a region's data and by assigning regions only to those favored RegionServers (HBASE-15531).
We will walk through these features, explaining our motivation and how they work, as well as our experiences running these multi-tenant clusters. These features will be available in Apache HBase 2.0.
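As a sketch of how a tenant gets a dedicated subset of RegionServers, the snippet below uses the RSGroupAdminClient API that ships with the rsgroup coprocessor (it assumes the RSGroupAdminEndpoint coprocessor is enabled on the master); the group name, server address, and table name are hypothetical.

    import java.util.Collections;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.net.Address;
    import org.apache.hadoop.hbase.rsgroup.RSGroupAdminClient;

    public class TenantGroupSetup {
      public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create())) {
          RSGroupAdminClient rsGroups = new RSGroupAdminClient(conn);
          // Create a dedicated group for the tenant (hypothetical group name).
          rsGroups.addRSGroup("tenant_a");
          // Move a RegionServer into the group (hypothetical host:port).
          rsGroups.moveServers(
              Collections.singleton(Address.fromString("rs1.example.com:16020")), "tenant_a");
          // Pin the tenant's table to the group so only these servers host its regions.
          rsGroups.moveTables(
              Collections.singleton(TableName.valueOf("tenant_a_events")), "tenant_a");
        }
      }
    }

The same steps are available in the HBase shell as add_rsgroup, move_servers_rsgroup, and move_tables_rsgroup.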
15. RSGroups @ Y!
• Per Group configurations
• hbase-site.xml
• hbase-env.sh
• System Group
• Isolate system tables
• Rolling Upgrade/Restart Per Group
• Different strategies for Balance per Group
• Alerting/Monitoring Per Group
• Namespace Integration (sketch below)
• Users run DDL on their own tables in a sandbox
• Table and Region Quotas
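A minimal sketch of the namespace integration and quota bullets above, assuming the namespace quota keys from HBASE-8410 and an hbase.rsgroup.name namespace property for the group binding; the namespace name and limits are made up.

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.NamespaceDescriptor;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;

    public class TenantNamespaceSetup {
      public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = conn.getAdmin()) {
          NamespaceDescriptor ns = NamespaceDescriptor.create("tenant_a")
              // Bind the namespace to its RegionServer group (assumed property name).
              .addConfiguration("hbase.rsgroup.name", "tenant_a")
              // Quotas: cap how many tables and regions the tenant can create in its sandbox.
              .addConfiguration("hbase.namespace.quota.maxtables", "20")
              .addConfiguration("hbase.namespace.quota.maxregions", "500")
              .build();
          admin.createNamespace(ns);
        }
      }
    }

Tables the tenant then creates inside this namespace land only on the group's servers, and DDL can be delegated to the tenant within that sandbox.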
17. Overview
▪ HDFS
› File-level block placement hint on file creation (sketch below)
› Client passes a set of preferred hosts to replicate data to
› preferred hosts => “Favored Nodes” or hints
▪ HBase
› Region-level block placement hint
› Select 3 favored nodes for each region - primary, secondary, tertiary
› Constraint: Favored Nodes on 2 racks (where possible)
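On the HDFS side, the hint is the create() overload on DistributedFileSystem that accepts an array of favored-node addresses. A hedged sketch with hypothetical DataNode hosts, port, and file path; HBase does the equivalent internally when writing HFiles, picking the region's three favored nodes across two racks where possible.

    import java.net.InetSocketAddress;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.permission.FsPermission;
    import org.apache.hadoop.hdfs.DistributedFileSystem;

    public class FavoredNodeCreate {
      public static void main(String[] args) throws Exception {
        // Assumes fs.defaultFS points at an HDFS cluster.
        Configuration conf = new Configuration();
        DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);
        // The three favored nodes for a region: primary, secondary, tertiary
        // (hypothetical DataNode hosts and data-transfer port).
        InetSocketAddress[] favoredNodes = new InetSocketAddress[] {
            new InetSocketAddress("dn1.rack1.example.com", 9866),
            new InetSocketAddress("dn2.rack1.example.com", 9866),
            new InetSocketAddress("dn3.rack2.example.com", 9866)
        };
        // File-level block placement hint: HDFS tries to place replicas on these hosts.
        try (FSDataOutputStream out = dfs.create(
            new Path("/hbase/example/hfile"),
            FsPermission.getFileDefault(),
            true,                                        // overwrite
            conf.getInt("io.file.buffer.size", 4096),    // buffer size
            (short) 3,                                   // replication
            dfs.getDefaultBlockSize(),                   // block size
            null,                                        // no progress callback
            favoredNodes)) {
          out.write(new byte[] {1, 2, 3});
        }
      }
    }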
18. Motivation
▪ Data Locality
▪ Performance
▪ Network utilization
▪ Datanode isolation
▪ Previous work from FB and Community
› HBASE-4755 (HBase based block placement in DFS)
19. Enabling Favored Nodes
▪ HBase
› Use the favored node balancer (config sketch below)
› Setup tool for creating FN for existing regions
▪ HDFS
› Set “dfs.namenode.replication.considerLoad” to false
› Recommend disabling HDFS balancer
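The settings above, sketched as code for clarity; in practice the HBase property goes into hbase-site.xml and the HDFS property into hdfs-site.xml. The balancer class name assumed here is the FavoredStochasticBalancer introduced by HBASE-15531 for HBase 2.0.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class FavoredNodeConfig {
      public static Configuration favoredNodeConf() {
        Configuration conf = HBaseConfiguration.create();
        // HBase: plug in the favored-node aware stochastic balancer.
        conf.set("hbase.master.loadbalancer.class",
            "org.apache.hadoop.hbase.master.balancer.FavoredStochasticBalancer");
        // HDFS: do not let the NameNode skip favored-node hints because a DataNode looks loaded.
        conf.setBoolean("dfs.namenode.replication.considerLoad", false);
        // Operational step, not a config key: keep the HDFS balancer disabled so it does not
        // move blocks away from the favored DataNodes.
        return conf;
      }
    }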
22. Favored Node Balancers
▪ FavoredStochasticBalancer
› Assigns only to FN of a region (user tables)
› New Candidate Generators (FNLocality and FNLoad)
› Recommended same cost for load and locality generators
› Future – Work with Region Replicas
› Future - WALs
▪ FavoredRSGroupLoadBalancer
› Uses FavoredStochasticBalancer
› Recommended minimum 4 nodes per group
› Generates FN within the group's servers
23. Region Split and Merge
▪ Splits (sketch below)
› Each daughter inherits 2 FN from parent
› One FN is randomly generated
› Locality vs Distribution
› FN within rsgroup servers (if enabled)
▪ Merge
› Inherited from one of the parents
› Preserve locality
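A simplified sketch of the split behaviour in the Splits bullets above, not the exact HBase implementation: each daughter keeps two of the parent's favored nodes for locality and gets one freshly generated node for distribution, chosen from the rsgroup's servers when rsgroups are enabled.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Random;
    import org.apache.hadoop.hbase.ServerName;

    public class DaughterFavoredNodes {
      private static final Random RANDOM = new Random();

      // parentFn: the parent's 3 favored nodes; groupServers: the servers of the region's rsgroup.
      static List<ServerName> daughterFavoredNodes(List<ServerName> parentFn,
          List<ServerName> groupServers) {
        List<ServerName> daughterFn = new ArrayList<>();
        // Inherit two favored nodes from the parent to preserve locality.
        daughterFn.add(parentFn.get(0));
        daughterFn.add(parentFn.get(1));
        // Generate the third favored node randomly within the group, avoiding duplicates,
        // trading a little locality for better distribution.
        List<ServerName> candidates = new ArrayList<>(groupServers);
        candidates.removeAll(daughterFn);
        daughterFn.add(candidates.get(RANDOM.nextInt(candidates.size())));
        return daughterFn;
      }
    }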
24. Distribution
▪ Replica count distribution across favored nodes (FNReplica)
▪ Why is it important?
› Balancer assigns only to FN
› RegionServer crashes
› Uniform load
▪ Sample replica load for a group from production
SN=Rack1_RS1  Primary=695  Secondary=19   Tertiary=11   Total=725
SN=Rack1_RS2  Primary=142  Secondary=398  Tertiary=185  Total=725
SN=Rack2_RS1  Primary=93   Secondary=376  Tertiary=256  Total=725
SN=Rack2_RS1  Primary=36   Secondary=173  Tertiary=514  Total=723
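The breakdown above is just a count of how often each server appears in the primary, secondary, and tertiary slot of the regions' favored-node lists; a small bookkeeping sketch, taking the favored-node assignment map as a hypothetical input:

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import org.apache.hadoop.hbase.ServerName;
    import org.apache.hadoop.hbase.client.RegionInfo;

    public class FnReplicaLoad {
      // counts[server] = {primary, secondary, tertiary} favored-node counts for that server.
      static Map<ServerName, int[]> replicaLoad(Map<RegionInfo, List<ServerName>> favoredNodes) {
        Map<ServerName, int[]> counts = new HashMap<>();
        for (List<ServerName> fn : favoredNodes.values()) {
          for (int pos = 0; pos < fn.size(); pos++) {
            counts.computeIfAbsent(fn.get(pos), s -> new int[3])[pos]++;
          }
        }
        return counts;
      }
    }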
25. Modifying Distribution
▪ Spread FN across all region servers
▪ redistribute (sketch below):
› Balance of FNReplicas
› Also used when adding new servers
› Only one FN is changed per region; constraint: 2 FN retain >= 80% locality
› Current assignment not changed
› Overloaded servers -> underloaded servers
▪ complete_redistribute:
› Round robin generation of FNReplicas
› Locality is lost and regions reassigned
▪ removeFN - Decommissioning a favored node
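A rough sketch of the redistribute move described above, not the actual tool: one favored node of a region is swapped from an overloaded server to an underloaded one, while the other two favored nodes stay put so most of the locality, and the current assignment, is preserved.

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.hbase.ServerName;

    public class Redistribute {
      // Replace exactly one favored node of a region; the region itself is not reassigned.
      // overloaded: the FN to move away from; underloaded: the FN to move to.
      static List<ServerName> moveOneFavoredNode(List<ServerName> currentFn,
          ServerName overloaded, ServerName underloaded) {
        List<ServerName> newFn = new ArrayList<>(currentFn);
        int idx = newFn.indexOf(overloaded);
        if (idx >= 0 && !newFn.contains(underloaded)) {
          // The two untouched favored nodes keep the bulk of the region's data local,
          // which is what the ">= 80% locality on 2 FN" constraint is about.
          newFn.set(idx, underloaded);
        }
        return newFn;
      }
    }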
26. Adding servers - redistribute
[Diagram: RS Group A (RS1-RS5 co-located with DN1-DN5), shown before and after a new node is added; redistribute spreads favored nodes onto the new server]
27. Decommissioning a node - removeFN
[Diagram: RS Group A (RS1-RS5 co-located with DN1-DN5), shown before and after a node is decommissioned; removeFN moves its favored nodes to the remaining servers]
35. Monitoring/Operations
▪ hbck checks various factors
› No FN or incorrect FN
› Regions with dead FN
› Out-of-rsgroup favored nodes
› System tables
▪ Check dead FN (tool, JMX)
▪ Master UI - RIT indicates when all FN dead
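A hedged sketch of the dead-favored-node check behind the hbck/JMX tooling mentioned above; the favored-node assignments and the live server list are passed in as hypothetical inputs, and servers are compared by host and port to avoid start-code mismatches.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;
    import java.util.stream.Collectors;
    import org.apache.hadoop.hbase.ServerName;
    import org.apache.hadoop.hbase.client.RegionInfo;

    public class DeadFavoredNodeCheck {
      // Regions whose favored nodes are all dead end up stuck in transition, because the
      // balancer assigns user regions only to their favored nodes.
      static List<RegionInfo> regionsWithAllFnDead(
          Map<RegionInfo, List<ServerName>> favoredNodes, Set<ServerName> liveServers) {
        Set<String> live = liveServers.stream()
            .map(sn -> sn.getHostname() + ":" + sn.getPort())
            .collect(Collectors.toSet());
        List<RegionInfo> stuck = new ArrayList<>();
        for (Map.Entry<RegionInfo, List<ServerName>> e : favoredNodes.entrySet()) {
          boolean anyAlive = e.getValue().stream()
              .anyMatch(fn -> live.contains(fn.getHostname() + ":" + fn.getPort()));
          if (!anyAlive) {
            stuck.add(e.getKey());
          }
        }
        return stuck;
      }
    }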
36. Production Experience
▪ Steady increase in data locality (percentfileslocal)
▪ Redistribute runs once a day for all groups
› FN distribution more or less equally spread across group nodes
› Adding 10% servers to an rsgroup – equal distribution
▪ FN hints not chosen when DN in decommission
› DFSClient logs warning when hints not chosen, NN logs too
› Sometimes a DN takes a long time to decommission
› HDFS rolling upgrades or system updates cause DN downtime
▪ Regions in transition due to FN
› All FN dead (missed alert)
› Non-rsgroup servers as FN (bug in code)
38. Data Locality - Rolling Restart
▪ Region Count varies, but locality is preserved across multiple rolling restarts
[Chart: percentfileslocal and regioncount over multiple balanced rolling restarts]
39. Data growth
• Same set of tenants across 2 racks
[Charts: storefilesize growth, 0 to 4 TB, before and after Favored Nodes was enabled]
40. Network Utilization
▪ Cluster level writeRequestRate – Before and After FN (3x increase)
[Charts: before Favored Nodes / after Favored Nodes]
Editor's notes
Even if Chaos Monkey tests are run on the cluster, locality is still retained.
This is not a 100% balanced cluster, but is a heavily used one with 2.4k regions per server that’s equally balanced to start with.
The same set of tenants was spread across 2 racks. One tenant is storage heavy and the others are not.
After FN was enabled, we can see that the disk used by the storage-heavy tenant increased, and it decreased for the other tenants in the same rack.
The overall size of the regions did not change much as can be seen from the storefilesize metrics in backup slides.
After FN was enabled, we see that remote DN reads dropped significantly. There are occasional spikes, and those happen due to rolling restarts, DN decommissions, etc. Note that the occasional spikes are on the same machines and that system tables don't have favored nodes.
Cluster-level readRequestRate, captured before FN was enabled and some time after. The readRequestRate is 2x higher (more customers, more use cases).
The network graphs are at the top-of-rack level and are in MRTG format - http://oss.oetiker.ch/mrtg/doc/mrtg-logfile.en.html
Network Utilization went down after favored nodes was enabled.
This is the same cluster for which the remote DataNode reads graphs were shown. As can be inferred, the readRequestRate increased significantly after Favored Nodes was enabled.
We start with a cluster that is well balanced and continuously do rolling restarts on the servers multiple times. Locality is still preserved even though regions are not uniform and keep moving around.