ACHIEVING HBASE MULTI-TENANCY: REGIONSERVER GROUPS AND FAVORED NODES
Francis Liu & Thiruvel Thirumoolan
HBase Yahoos
HBase @ Y!
Multi-tenancy
HBase Multi-tenancy @ Y!
• ~45 Tenants
• ~940 RegionServers
• ~300k regions
• RS Peak 115k requests/sec
RegionServer Groups
• Group Membership
• Table
• RegionServer
• Coarse Isolation
• Namespace Integration
Divide and Conquer
[Diagram: RegionServers partitioned into Groups A through E]
Multi-tenancy with RegionServer Groups
• ~45 namespaces
• ~45 Region server groups
• 4 to 100s of servers
• Up to 2000+ regions per server
Provisioning a RegionServer Group
1. Create a group
hbase> add_rsgroup 'group1'
2. Add Servers
hbase> move_servers_rsgroup 'group1', ['host1:1234',....., 'hostN:1234']
3. Create a namespace
hbase> create_namespace 'yahoo', {'hbase.rsgroup.name' => 'group1'}
4. Create table in namespace
hbase> create 'yahoo:hadoop', 'f'
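The four steps above chain together a set of mappings: table to namespace, namespace to group, group to servers. A minimal Python sketch of that lookup follows. It is a toy model and not HBase code: the dictionaries, the `candidate_servers` function, the `default` group fallback, and the host names are all illustrative assumptions.

```python
# Toy model of the mappings created by the provisioning steps.
# Not HBase code: every name below is an illustrative assumption.

rsgroups = {
    "group1": {"host1:1234", "host2:1234"},
    "default": {"host9:1234"},
}
namespaces = {"yahoo": {"hbase.rsgroup.name": "group1"}}


def candidate_servers(table_name):
    """Return the set of servers a table's regions may be assigned to."""
    # "ns:table" belongs to namespace "ns"; unprefixed tables use "default".
    namespace = table_name.split(":", 1)[0] if ":" in table_name else "default"
    # A namespace without a group property falls back to the default group.
    group = namespaces.get(namespace, {}).get("hbase.rsgroup.name", "default")
    return rsgroups[group]


print(sorted(candidate_servers("yahoo:hadoop")))
```

In this model a table never sees servers outside its namespace's group, which is the coarse isolation the deck describes.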
Architecture
[Diagram: HMaster with RSGroupAdminEndpoint, RSGroupBasedLoadBalancer (wrapping a LoadBalancer and filtering by group for tables foo and bar), and RSGroupInfoManager; RSGroup table; RSGroup on ZK]
Group Metric Tag
[Diagram: RegionServers partitioned into Groups A through E]
Dead RegionServer Thresholds
Dead RegionServer Processing
▪ Per Group Queue
[Diagram: RegionServers, HBase Master, Zookeeper]
Group Aware Replication
RSGroups @ Y!
• Per Group configurations
• hbase-site.xml
• hbase-env.sh
• System Group
• Isolate system tables
• Rolling Upgrade/Restart Per Group
• Different strategies for Balance per Group
• Alerting/Monitoring Per Group
• Namespace Integration
• User run DDL on their own tables in sandbox
• Table and Region Quotas
Favored Nodes
Overview
▪ HDFS
› File level block placement hint (on file creation)
› Pass a set of preferred hosts to client to replicate data
› preferred hosts => “Favored Nodes” or hints
▪ HBase
› Region level block placement hint
› Select 3 favored nodes for each region - primary, secondary, tertiary
› Constraint: Favored Nodes on 2 racks (where possible)
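The selection rule above (three favored nodes per region, on two racks where possible) can be sketched in Python as follows. This is an illustration under stated assumptions, not the HBase implementation: `pick_favored_nodes`, the `servers_by_rack` input shape, and the random tie-breaking are hypothetical, and the real balancer may choose racks differently.

```python
import random


def pick_favored_nodes(servers_by_rack, rng=random):
    """Return [primary, secondary, tertiary], spanning two racks where possible."""
    racks = [rack for rack, servers in servers_by_rack.items() if servers]
    all_servers = [s for rack in racks for s in servers_by_rack[rack]]
    if len(all_servers) < 3:
        raise ValueError("need at least 3 servers to pick 3 favored nodes")

    primary_rack = rng.choice(racks)
    primary = rng.choice(servers_by_rack[primary_rack])

    other_racks = [rack for rack in racks if rack != primary_rack]
    if not other_racks:
        # Only one rack available: the 2-rack constraint cannot be met.
        rest = [s for s in all_servers if s != primary]
        return [primary] + rng.sample(rest, 2)

    secondary_rack = rng.choice(other_racks)
    secondary = rng.choice(servers_by_rack[secondary_rack])

    # Tertiary: prefer an unused server on one of the two racks already chosen.
    used = {primary, secondary}
    pool = [s for rack in (primary_rack, secondary_rack)
            for s in servers_by_rack[rack] if s not in used]
    if not pool:
        # Both chosen racks are exhausted: fall back to any other server.
        pool = [s for s in all_servers if s not in used]
    return [primary, secondary, rng.choice(pool)]


racks = {"rack1": ["rs1", "rs2"], "rack2": ["rs3", "rs4"]}
favored = pick_favored_nodes(racks, random.Random(0))
print(favored)
```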
Motivation
▪ Data Locality
▪ Performance
▪ Network utilization
▪ Datanode isolation
▪ Previous work from FB and Community
› HBASE-4755 (HBase based block placement in DFS)
Enabling Favored Nodes
▪ HBase
› Use Favored node balancer
› Setup tool for creating FN for existing regions
▪ HDFS
› Set “dfs.namenode.replication.considerLoad” to false
› Recommend disabling HDFS balancer
Flow
hbase:meta
RG1 RS1 RS2 RS3
Col: info:fn
Master
FN Cache
Favored Balancer
RG1 RS1 RS2 RS3
Assignment Manager
openRegion
Region Server
RG1 DN1 DN2 DN3
Flush/Compaction
Enhancements - Summary
▪ Umbrella jira HBASE-15531 (design doc)
▪ Balancer
› FavoredStochasticBalancer (HBASE-16942)
› FavoredGroupBalancer – RSGroup version (HBASE-15533)
› Splits/Merges inherit FN
▪ Admin APIs/tools
› redistribute (HBASE-18064)
› complete_redistribute (HBASE-18065)
› removeFN (HBASE-18062)
› checkFN (HBASE-18063)
› hbck (HBASE-17153)
Favored Node Balancers
▪ FavoredStochasticBalancer
› Assigns only to FN of a region (user tables)
› New Candidate Generators (FNLocality and FNLoad)
› Recommended same cost for load and locality generators
› Future – Work with Region Replicas
› Future - WALs
▪ FavoredRSGroupLoadBalancer
› Uses FavoredStochasticBalancer
› Recommended minimum 4 nodes per group
› Generated FN within the group servers
Region Split and Merge
▪ Splits
› Each daughter inherits 2 FN from parent
› One FN is randomly generated
› Locality vs Distribution
› FN within rsgroup servers (if enabled)
▪ Merge
› Inherited from one of the parents
› Preserve locality
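A small Python sketch of the inheritance rules above. The slide does not say which two favored nodes a daughter keeps or which parent a merged region inherits from, so the choices below are assumptions made for illustration (both daughters keep the primary; the merge takes the first parent). `split_daughters` and `merge_favored_nodes` are hypothetical names.

```python
import random


def split_daughters(parent_fn, group_servers, rng=random):
    """Each daughter keeps 2 of the parent's FN and gets 1 newly generated FN."""
    primary, secondary, tertiary = parent_fn
    # New FN are drawn from the rsgroup's servers, excluding the parent's FN.
    fresh = [s for s in group_servers if s not in parent_fn]
    if not fresh:
        raise ValueError("no server outside the parent's favored nodes")
    # Assumption: both daughters keep the primary so existing data stays local.
    daughter_a = [primary, secondary, rng.choice(fresh)]
    daughter_b = [primary, tertiary, rng.choice(fresh)]
    return daughter_a, daughter_b


def merge_favored_nodes(parent_a_fn, parent_b_fn):
    """The merged region inherits the FN of one parent (assumed: the first)."""
    return list(parent_a_fn)


parent = ["rs1", "rs2", "rs3"]
group = ["rs1", "rs2", "rs3", "rs4", "rs5"]
daughter_a, daughter_b = split_daughters(parent, group, random.Random(1))
print(daughter_a, daughter_b)
```

In this sketch a group with only three servers leaves no candidate for the newly generated favored node, so the function raises an error.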
Distribution
▪ Replica count distribution across favored nodes (FNReplica)
▪ Why is it important?
› Balancer assigns only to FN
› RegionServer crashes
› Uniform load
▪ Sample replica load for a group from production
SN=Rack1_RS1 Primary=695 Secondary=19 Tertiary=11 Total=725
SN=Rack1_RS2 Primary=142 Secondary=398 Tertiary=185 Total=725
SN=Rack2_RS1 Primary=93 Secondary=376 Tertiary=256 Total=725
SN=Rack2_RS1 Primary=36 Secondary=173 Tertiary=514 Total=723
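The arithmetic behind the sample can be checked with a short Python script. The rows are copied from the slide, including the repeated `Rack2_RS1` label on the fourth row; the `parse` helper is an illustrative name.

```python
import re

# Rows as shown on the slide (the fourth row repeats the label Rack2_RS1).
SAMPLE = """\
SN=Rack1_RS1 Primary=695 Secondary=19 Tertiary=11 Total=725
SN=Rack1_RS2 Primary=142 Secondary=398 Tertiary=185 Total=725
SN=Rack2_RS1 Primary=93 Secondary=376 Tertiary=256 Total=725
SN=Rack2_RS1 Primary=36 Secondary=173 Tertiary=514 Total=723
"""


def parse(sample):
    """Turn each 'key=value ...' line into a dict, converting counts to int."""
    rows = []
    for line in sample.splitlines():
        fields = dict(re.findall(r"(\w+)=(\w+)", line))
        rows.append({k: (v if k == "SN" else int(v)) for k, v in fields.items()})
    return rows


rows = parse(SAMPLE)

# Each server's total is the sum of its three roles.
for row in rows:
    assert row["Primary"] + row["Secondary"] + row["Tertiary"] == row["Total"]

# Summing each role across servers gives the same count per role.
role_totals = {
    role: sum(row[role] for row in rows)
    for role in ("Primary", "Secondary", "Tertiary")
}
print(role_totals)  # {'Primary': 966, 'Secondary': 966, 'Tertiary': 966}
```

Each role sums to 966, and the per-server totals are nearly even (725, 725, 725, 723). The primary role, however, is concentrated on one server (695 of 966).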
Modifying Distribution
▪ Spread FN across all region servers
▪ redistribute:
› Balance of FNReplicas
› Also used when adding new servers
› Only one FN is changed for a region, Constraint: 2 FN >= 80% locality
› Current assignment not changed
› Overloaded servers -> underloaded servers
▪ complete_redistribute:
› Round robin generation of FNReplicas
› Locality is lost and regions reassigned
▪ removeFN - Decommissioning a favored node
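The redistribute behaviour can be approximated by a greedy Python sketch. It follows two of the stated rules: at most one favored node changes per region, and replicas move from overloaded to underloaded servers. Everything else is an assumption. The sketch treats the first favored node as the current host and never replaces it, and it ignores the 80% locality constraint. `redistribute` here is a hypothetical function, not the HBase tool.

```python
from collections import Counter


def redistribute(region_fn, servers):
    """Change at most one FN per region, moving replicas to less loaded servers."""
    # Count FN replicas per server, including servers that hold none yet.
    load = Counter({s: 0 for s in servers})
    for fn in region_fn.values():
        load.update(fn)

    new_fn = {}
    for region, fn in region_fn.items():
        fn = list(fn)
        # Assumption: fn[0] hosts the region, so only fn[1:] may be replaced.
        donor = max(fn[1:], key=lambda s: load[s])
        receiver = min((s for s in servers if s not in fn),
                       key=lambda s: load[s], default=None)
        # Move only when it strictly narrows the gap between the two servers.
        if receiver is not None and load[donor] - load[receiver] > 1:
            fn[fn.index(donor)] = receiver
            load[donor] -= 1
            load[receiver] += 1
        new_fn[region] = fn
    return new_fn


before = {f"r{i}": ["rs1", "rs2", "rs3"] for i in range(1, 5)}
after = redistribute(before, ["rs1", "rs2", "rs3", "rs4"])  # rs4 is newly added
print(after)
```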
Adding servers - redistribute
[Diagram: RS Group A (RS1/DN1 through RS5/DN5) before and after redistribute, with a new node added]
Decommissioning a node - removeFN
[Diagram: RS Group A (RS1/DN1 through RS5/DN5) before and after removeFN, with one node being decommissioned]
Motivation (Revisited)
▪ Data Locality
▪ Performance
▪ Network utilization
▪ Datanode isolation
Data Locality - Fault Testing
▪ Locality preserved on chaos monkey tests
[Figure: percentfileslocal over time; panels "Favored Nodes" and "No Favored Nodes"]
Data Locality - Rolling Restart and Balancer
[Figure: percentfileslocal and regioncount over time; annotations: RS balanced, Rolling Restart, Favored Balancer]
Datanode Isolation – Tenant specific
▪ diskUsed% changes after FN (2 racks). Tenant #1 – Storage heavy
[Figure: two diskUsed% panels for Tenant #1 and Tenants #2, 3, marked at "FN Enabled"; captions: "diskUsed spread across" and "diskUsed tenant specific"]
Remote DN reads…
▪ Cluster-level remote DN reads are significantly lower in spite of 2x reads
[Figure: remote DN reads, before and after Favored Nodes]
HBase - Read Request Rate (Cluster level)
[Figure: read request rate, before and after Favored Nodes]
Network Utilization (Cluster level N/W traffic)
[Figure: max network traffic on the cluster (x units, axis 10 to 70), 2016-01-15 through 2017-06-02; series: Max Input, Max Output; annotations: Scheduled Maintenance, Favored Nodes Enabled]
▪ Max Network Traffic
▪ HDFS + User data
▪ 2x User traffic
Monitoring/Operations
▪ HBck checks various factors
› No FN or incorrect FN
› Regions with dead FN
› Out-of-rsgroup favored nodes
› System tables
▪ Check dead FN (tool, JMX)
▪ Master UI - RIT indicates when all FN dead
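The kinds of checks listed above can be sketched in Python. This does not reproduce hbck or the checkFN tool; `check_favored_nodes`, its inputs, and the message strings are hypothetical, chosen only to show how each condition on the slide could be tested.

```python
def check_favored_nodes(region_fn, live_servers, group_servers):
    """Return {region: [problems]} for regions whose favored nodes look wrong."""
    issues = {}
    for region, fn in region_fn.items():
        problems = []
        # No FN or incorrect FN: expect exactly 3 distinct servers.
        if len(fn) != 3 or len(set(fn)) != 3:
            problems.append("missing or incorrect FN")
        # Regions with dead FN.
        dead = [s for s in fn if s not in live_servers]
        if dead:
            problems.append("dead FN: " + ", ".join(dead))
        if fn and len(dead) == len(fn):
            problems.append("all FN dead")
        # Out-of-rsgroup favored nodes.
        outside = [s for s in fn if s not in group_servers]
        if outside:
            problems.append("FN outside rsgroup: " + ", ".join(outside))
        if problems:
            issues[region] = problems
    return issues


report = check_favored_nodes(
    {
        "r1": ["rs1", "rs2", "rs3"],   # healthy
        "r2": ["rs1", "rs2", "rs9"],   # rs9 is neither live nor in the group
        "r3": [],                      # no favored nodes recorded
    },
    live_servers={"rs1", "rs2", "rs3"},
    group_servers={"rs1", "rs2", "rs3"},
)
print(report)
```

In a setup like the one described, the "all FN dead" case is the one to alert on, since the balancer assigns a region only to its favored nodes.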
Production Experience
▪ Steady increase in data locality (percentfileslocal)
▪ Redistribute runs once a day for all groups
› FN distribution more or less equally spread across group nodes
› Adding 10% servers to an rsgroup – equal distribution
▪ FN hints not chosen when DN in decommission
› DFSClient logs warning when hints not chosen, NN logs too
› Sometimes DN takes a long time to decomm
› HDFS Rolling upgrade or system updates causes DN downtime
▪ Regions in transition due to FN
› All FN dead (missed alert)
› Non-rsgroup servers as FN (bug in code)
Data Locality - Rolling Restart
▪ Region Count varies, but locality is preserved across multiple rolling restarts
[Figure: percentfileslocal and regioncount over time; annotations: Balanced, Rolling Restart]

Data growth
• Same set of tenants across 2 racks
[Figure: two storefilesize panels (0 to 4 TB), marked at "Favored Nodes Enabled"]
Network Utilization
▪ Cluster level writeRequestRate – Before and After FN (3x increase)
[Figure: writeRequestRate, before and after Favored Nodes]

Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 

Último (20)

DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 

Achieving HBase Multi-Tenancy with RegionServer Groups and Favored Nodes

  • 5. HBase Multi-tenancy @ Y! • ~45 Tenants • ~940 RegionServers • ~300k regions • RS Peak 115k requests/sec
  • 6. RegionServer Groups • Group Membership • Table • RegionServer • Coarse Isolation • Namespace Integration
  • 7. Divide and Conquer [Diagram: RegionServers partitioned into Group A through Group E]
  • 8. Multi-tenancy with RegionServer Groups • ~45 namespaces • ~45 Region server groups • 4 to 100s of servers • Up to 2000+ regions per server
  • 9. Provisioning a RegionServer Group 1. Create a group hbase> add_rsgroup 'group1' 2. Add Servers hbase> move_servers_rsgroup 'group1', ['host1:1234',....., 'hostN:1234'] 3. Create a namespace hbase> create_namespace 'yahoo', {'hbase.rsgroup.name' => 'group1'} 4. Create table in namespace hbase> create 'yahoo:hadoop', 'f'
  • 11. Group Metric Tag [Diagram: RegionServers partitioned into Group A through Group E]
  • 13. Dead RegionServer Processing ▪ Per Group Queue [Diagram: RegionServers, HBase Master, Zookeeper]
  • 15. RSGroups @ Y! • Per Group configurations • hbase-site.xml • hbase-env.sh • System Group • Isolate system tables • Rolling Upgrade/Restart Per Group • Different strategies for Balance per Group • Alerting/Monitoring Per Group • Namespace Integration • User run DDL on their own tables in sandbox • Table and Region Quotas
  • 17. Overview ▪ HDFS › File level block placement hint (on file creation) › Pass a set of preferred hosts to client to replicate data › preferred hosts => “Favored Nodes” or hints ▪ HBase › Region level block placement hint › Select 3 favored nodes for each region - primary, secondary, tertiary › Constraint: Favored Nodes on 2 racks (where possible)
  • 18. Motivation ▪ Data Locality ▪ Performance ▪ Network utilization ▪ Datanode isolation ▪ Previous work from FB and Community › HBASE-4755 (HBase based block placement in DFS)
  • 19. Enabling Favored Nodes ▪ HBase › Use Favored node balancer › Setup tool for creating FN for existing regions ▪ HDFS › Set “dfs.namenode.replication.considerLoad” to false › Recommend disabling HDFS balancer
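  • 20. Flow [Diagram: hbase:meta holds the favored nodes of region RG1 (RS1, RS2, RS3) in column info:fn; Master components: FN Cache, Favored Balancer, Assignment Manager; openRegion for RG1 (RS1, RS2, RS3) goes to the Region Server; Flush/Compaction for RG1 targets DN1, DN2, DN3]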
  • 21. Enhancements - Summary ▪ Umbrella jira HBASE-15531 (design doc) ▪ Balancer › FavoredStochasticBalancer (HBASE-16942) › FavoredGroupBalancer – RSGroup version (HBASE-15533) › Splits/Merges inherit FN ▪ Admin APIs/tools › redistribute (HBASE-18064) › complete_redistribute (HBASE-18065) › removeFN (HBASE-18062) › checkFN (HBASE-18063) › hbck (HBASE-17153)
  • 22. Favored Node Balancers ▪ FavoredStochasticBalancer › Assigns only to FN of a region (user tables) › New Candidate Generators (FNLocality and FNLoad) › Recommended same cost for load and locality generators › Future – Work with Region Replicas › Future - WALs ▪ FavoredRSGroupLoadBalancer › Uses FavoredStochasticBalancer › Recommended minimum 4 nodes per group › Generated FN within the group servers
  • 23. Region Split and Merge ▪ Splits › Each daughter inherits 2 FN from parent › One FN is randomly generated › Locality vs Distribution › FN within rsgroup servers (if enabled) ▪ Merge › Inherited from one of the parents › Preserve locality
  • 24. Distribution ▪ Replica count distribution across favored nodes (FNReplica) ▪ Why is it important? › Balancer assigns only to FN › RegionServer crashes › Uniform load ▪ Sample replica load for a group from production SN=Rack1_RS1 Primary=695 Secondary=19 Tertiary=11 Total=725 SN=Rack1_RS2 Primary=142 Secondary=398 Tertiary=185 Total=725 SN=Rack2_RS1 Primary=93 Secondary=376 Tertiary=256 Total=725 SN=Rack2_RS1 Primary=36 Secondary=173 Tertiary=514 Total=723
  • 25. Modifying Distribution ▪ Spread FN across all region servers ▪ redistribute: › Balance of FNReplicas › Also used when adding new servers › Only one FN is changed for a region, Constraint: 2 FN >= 80% locality › Current assignment not changed › Overloaded servers -> underloaded servers ▪ complete_redistribute: › Round robin generation of FNReplicas › Locality is lost and regions reassigned ▪ removeFN - Decommissioning a favored node
  • 26. Adding servers - redistribute [Diagram: RS Group - A with RS1/DN1 through RS5/DN5, shown before and after redistribute; label: New node added]
  • 27. Decommissioning a node - removeFN [Diagram: RS Group - A with RS1/DN1 through RS5/DN5, shown before and after removeFN; label: Decommission node]
  • 28. Motivation (Revisited) ▪ Data Locality ▪ Performance ▪ Network utilization ▪ Datanode isolation
  • 29. Data Locality - Fault Testing ▪ Locality preserved on chaos monkey tests [Figures: percentfileslocal with Favored Nodes and with No Favored Nodes]
  • 30. Data Locality - Rolling Restart and Balancer [Figure: percentfileslocal and regioncount; phases: RS balanced, Rolling Restart, Favored Balancer]
  • 31. Datanode Isolation – Tenant specific ▪ diskUsed% changes after FN (2 racks). Tenant #1 – Storage heavy [Figures: diskused% for Tenant #1 and Tenants #2, 3, marked at FN Enabled; labels: "diskUsed spread across" and "diskUsed tenant specific"]
  • 32. Remote DN reads… ▪ Cluster level remote DN reads significantly less in spite of 2x reads [Figure: Before Favored Nodes vs. After Favored Nodes]
  • 33. HBase - Read Request Rate (Cluster level) [Figure: Before Favored Nodes vs. After Favored Nodes]
  • 34. Network Utilization (Cluster level N/W traffic) [Figure: max network traffic on the cluster (x units), Max Input and Max Output, 2016-01-15 to 2017-06-02; annotations: Scheduled Maintenance, Favored Nodes Enabled, Max Network Traffic, HDFS + User data, 2x User traffic]
  • 35. Monitoring/Operations ▪ HBck checks various factors › No FN or incorrect FN › Regions with dead FN › Out-of-rsgroup favored nodes › System tables ▪ Check dead FN (tool, JMX) ▪ Master UI - RIT indicates when all FN dead
  • 36. Production Experience ▪ Steady increase in data locality (percentfileslocal) ▪ Redistribute runs once a day for all groups › FN distribution more or less equally spread across group nodes › Adding 10% servers to an rsgroup – equal distribution ▪ FN hints not chosen when DN in decommission › DFSClient logs warning when hints not chosen, NN logs too › Sometimes DN takes a long time to decomm › HDFS Rolling upgrade or system updates causes DN downtime ▪ Regions in transition due to FN › All FN dead (missed alert) › Non-rsgroup servers as FN (bug in code)
  • 38. Data Locality - Rolling Restart ▪ Region Count varies, but locality is preserved across multiple rolling restarts [Figure: percentfileslocal and regioncount; phases: Balanced, Rolling Restart]
  • 39. Data growth • Same set of tenants across 2 racks [Figures: storefilesize, 0 to 4 TB, marked at Favored Nodes Enabled]
  • 40. Network Utilization ▪ Cluster level writeRequestRate – Before and After FN (3x increase) [Figure: Before Favored Nodes vs. After Favored Nodes]
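
The split rule from slide 23 (each daughter inherits 2 FN from the parent and gets one randomly generated FN) and the per-server replica counts from slide 24 can be illustrated with a small Python sketch. This is not HBase code: the function names, the choice of which two favored nodes each daughter keeps, and the server names are assumptions made for illustration only.

```python
import random
from collections import Counter

ROLES = ("primary", "secondary", "tertiary")


def split_favored_nodes(parent_fn, group_servers, rng):
    """Return favored-node lists for the two daughters of a split region.

    Each daughter keeps 2 of the parent's 3 favored nodes and receives one
    randomly picked server from the same group.
    """
    daughters = []
    # Assumption: daughter A keeps primary+secondary, daughter B keeps
    # primary+tertiary. The slides only say that 2 FN are inherited.
    for kept in ((parent_fn[0], parent_fn[1]), (parent_fn[0], parent_fn[2])):
        # The new node must differ from the two nodes the daughter keeps.
        candidates = [s for s in group_servers if s not in kept]
        daughters.append([kept[0], kept[1], rng.choice(candidates)])
    return daughters


def replica_load(region_fn):
    """Count, per server, how often it is primary/secondary/tertiary FN."""
    load = {}
    for fn in region_fn.values():
        for role, server in zip(ROLES, fn):
            load.setdefault(server, Counter())[role] += 1
    return load


if __name__ == "__main__":
    rng = random.Random(0)
    group = ["rs1", "rs2", "rs3", "rs4"]  # hypothetical 4-node rsgroup
    a, b = split_favored_nodes(["rs1", "rs2", "rs3"], group, rng)
    regions = {"daughter_a": a, "daughter_b": b}
    for server, counts in sorted(replica_load(regions).items()):
        print(
            f"SN={server} Primary={counts['primary']} "
            f"Secondary={counts['secondary']} Tertiary={counts['tertiary']} "
            f"Total={sum(counts.values())}"
        )
```

Because both daughters keep the parent's primary in this sketch, repeated splits concentrate primary replicas on one server, which is the kind of skew the sample replica load on slide 24 shows and that redistribute is meant to even out.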

Editor's notes

  1. Even if Chaos Monkey tests are run on the cluster, locality is still retained.
  2. This is not a 100% balanced cluster, but is a heavily used one with 2.4k regions per server that’s equally balanced to start with.
  3. The same set of tenants ran across 2 racks. One tenant is storage heavy and the others are not. After FN was enabled, the disk used on the storage-heavy tenant increased, and it decreased on the other tenants in the same rack. The overall size of the regions did not change much, as can be seen from the storefilesize metrics in the backup slides.
  4. After FN was enabled, remote DN reads dropped significantly. There are occasional spikes, which are due to rolling restarts, DN decommissions, etc. Note that the occasional spikes are on the same machines and that system tables don’t have favored nodes.
  5. Cluster level readRequestRate. We captured it before FN was enabled and some time after FN was enabled. The readRequestRate is 2x higher (more customers, more use cases).
  6. The network-level graphs are at the top-of-rack level and are in MRTG format - http://oss.oetiker.ch/mrtg/doc/mrtg-logfile.en.html Network utilization went down after favored nodes were enabled. This is the same cluster for which the remote Datanode reads graphs were shown. As can be inferred, the readRequestRate increased significantly after Favored Nodes were enabled.
  7. We start with a cluster that’s well balanced and continuously do rolling restarts on the servers multiple times. Locality is still preserved even if regions are not uniform and keep moving around.
  8. Cluster level readRequestRate. We captured it before FN was enabled and some time after FN was enabled. The readRequestRate is 2x higher (more customers, more use cases).