Strata + Hadoop World 2012: Apache HBase Features for the Enterprise
Apache HBase is a distributed data store that is in production today at many enterprises and sites serving large volumes of near-real-time random accesses. As Apache HBase matures, the community has augmented the system with new features that many enterprises consider to be hard requirements. We will discuss how the upcoming HBase 0.96 release addresses many of these shortcomings by introducing new features that will help the administrator monitor and control access to the system, and new mechanisms to minimize downtime due to expected and unexpected outages.
HBase is a project that solves this problem. In a sentence, HBase is an open source, distributed, sorted map modeled after Google’s BigTable. Open-source: Apache HBase is an open source project with an Apache 2.0 license. Distributed: HBase is designed to use multiple machines to store and serve data. Sorted Map: HBase stores data as a map, and guarantees that adjacent keys will be stored next to each other on disk. HBase is modeled after BigTable, a system that is used for hundreds of applications at Google.
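The "sorted map" idea in the note above can be sketched in a few lines. This is a toy, in-memory illustration only: the `SortedMap` class, its method names, and the sample row keys are all hypothetical and say nothing about how HBase lays out data on disk. It only shows why keeping keys sorted makes range scans over adjacent keys cheap.

```python
import bisect


class SortedMap:
    """Toy in-memory sorted map (hypothetical; not HBase's storage format)."""

    def __init__(self):
        self._keys = []    # row keys, always kept in sorted order
        self._values = {}  # row key -> value

    def put(self, key, value):
        # Insert new keys at their sorted position; overwrite existing ones.
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key):
        return self._values.get(key)

    def scan(self, start, stop):
        """Yield (key, value) pairs with start <= key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        for key in self._keys[lo:hi]:
            yield key, self._values[key]


table = SortedMap()
for row in ["user3", "user1", "user2", "video7"]:
    table.put(row, {"name": row.upper()})

# Adjacent keys sit next to each other, so a prefix scan is one contiguous slice.
user_rows = [key for key, _ in table.scan("user", "user~")]
```

Because the keys are sorted, the scan touches one contiguous slice of the key list rather than the whole map.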
Tested under HBase
2 generals about detection time.
Most metrics reside in the region server. Metrics for various categories (e.g. stores, compactions, flushes). Metrics per operation type (e.g. get, put, delete). Remember HDFS, JVM, and host metrics as well.
Apache HBase Features for the Enterprise. Jonathan Hsieh | @jmhsieh. Software Engineer at Cloudera / HBase PMC Member. October 2012
Who Am I? • Cloudera: • Software Engineer • Apache HBase committer / PMC • Apache Flume founder / PMC • Apache Sqoop committer / PMC • U of Washington: • Research in Distributed Systems | 10/25/12 Strata Hadoop World 2012
What is Apache HBase? Apache HBase is an open source, distributed, scalable, consistent, low latency, random access non-relational database built on Apache Hadoop. [Diagram: App, MR, ZK, HDFS]
HBase provides Low-latency Random Access • Writes: • 1-3ms, 1k-10k writes/sec per node • Reads: • 0-3ms cached, 10-30ms disk • 10-40k reads / second / node from cache • Cell size: • 0-3MB preferred • Read, write and insert data anywhere in the table • No sequential write limitations [Figure: numbered accesses scattered across table rows] | 9/23/12 Strangeloop 2012
HBase On a Cluster [Diagram: HDFS NameNodes, HBase Masters, ZooKeeper Quorum, and Slave Boxes (DN + RS), spread across Rack 1 and Rack 2]
Production Apache HBase Applications • Inbox • Storage • Web • Search • Analytics • Monitoring. More Case Studies at http://www.hbasecon.com/agenda/
Production Systems Need to Avoid Risk • Unfortunately, all things can fail. • Enterprises need to minimize risk. • Understand potential data loss scenarios • Understand potential unavailability scenarios • Must have a disaster recovery story • Downtime, data loss == risk • Let’s talk about how HBase deals with: • Risks from within the cluster • Risks from outside the cluster • Risks posed by Users • Goal: Remove or reduce negative impact of potential risks
Risks from within the cluster: Hosts and Services
Causes of HBase Downtime within the cluster • Unplanned Maintenance: • Hardware failures • Software errors • Human error • Planned Maintenance: • Upgrades • Migrations. Goal: Reduce downtime from hours to minutes to seconds.
Unplanned Downtime [Timeline: Failure Event (service still thinks we are ok) -> Detection Time -> service realizes there is a problem, starts fixing -> Recovery Time -> service is restored] • Two sources of unavailability • Detection time • Recovery time
Reduce downtime by speeding up recovery time [Timeline: Failure Event (service still thinks we are ok) -> Detection Time -> service realizes there is a problem, starts fixing -> service is restored] • Distributed log splitting (0.92) • Automated metadata repairs with hbck (0.92) • Enabling writes while recovering from failure (0.96)
Reduce downtime by speeding up detection [Timeline: Failure Event (service still thinks we are ok) -> service realizes there is a problem, starts fixing -> service is restored] • Proactively notify to recover from process failures quickly • 0.92/0.94 (all failures): Master failure detection 180s; Region Server failure detection 180s • 0.96 (some failures): Master process failure detection 0-1s; Region Server failure detection 0-1s
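As a rough worked example of the "detection time + recovery time" model from these slides, the sketch below plugs in the detection figures quoted above (180s for 0.92/0.94, up to 1s for 0.96 process failures). The recovery time is not given in the deck, so `ASSUMED_RECOVERY_S` is a purely hypothetical placeholder.

```python
def unavailability_seconds(detection_s, recovery_s):
    """Unplanned downtime modeled as detection time plus recovery time."""
    return detection_s + recovery_s


# Hypothetical value for illustration; the slides do not state a recovery time.
ASSUMED_RECOVERY_S = 60

# Detection times taken from the slide: 180s (0.92/0.94) vs. up to 1s (0.96).
downtime_old = unavailability_seconds(180, ASSUMED_RECOVERY_S)
downtime_new = unavailability_seconds(1, ASSUMED_RECOVERY_S)

# Seconds saved per failure purely from faster detection.
saved = downtime_old - downtime_new
```

Under this assumption, faster detection alone removes 179 seconds from each process failure, independent of how long recovery takes.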
Manual Problem detection: Metrics • Goal: Pinpoint root causes of problems faster • Take a baseline of your system in steady-state • Anomalies like spikes or dips from baseline can indicate problems • Ex: Slow Query Logging • Integrates with existing infrastructure via JMX or use with Ganglia, Cloudera Manager
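The baseline idea above can be sketched as a simple threshold check. This is one possible approach, not something the talk prescribes: the function name, the three-standard-deviation threshold, and the latency samples are all made up for illustration, and real metrics would come from JMX, Ganglia, or a similar system.

```python
from statistics import mean, stdev


def find_anomalies(baseline, samples, threshold=3.0):
    """Return (index, value) for samples far from the steady-state baseline.

    A sample counts as a spike or dip when it lies more than `threshold`
    sample standard deviations away from the baseline mean.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)
    return [
        (i, x) for i, x in enumerate(samples)
        if abs(x - mu) > threshold * sigma
    ]


# Hypothetical get-latency samples in milliseconds.
baseline_latency_ms = [10, 12, 11, 9, 10, 11, 12, 10]
recent_latency_ms = [11, 10, 48, 12, 1]

anomalies = find_anomalies(baseline_latency_ms, recent_latency_ms)
```

Here the 48 ms spike and the 1 ms dip are flagged, while readings close to the baseline are not.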
Metrics from all levels of the system • HBase Region Servers • Operations / sec • Get / put latencies (0.92) • Per CF metrics (0.94) • Per Region metrics (0.94) • HBase Master • RIT metrics • Replication Metrics • System/JVM • GC, RPC metrics [Diagram: hosts (CPUs, Memory, Disks, Network) running JVMs for the Master, the Region server (Region, CF), and the HDFS DataNode]
Eliminate HBase downtime [Timeline: Maintenance begins -> Maintenance Time (service remains online, slightly degraded) -> full service is restored] • Highly Available stack: HDFS (2.0) / ZK / HBase • Client cross-version wire compatibility (0.96) • Rolling Restarts • Online Schema Change (experimental)
High Availability: HBase + HDFS + ZK [Diagram: HDFS NameNodes, HBase Masters, ZooKeeper Quorum, and Slave Boxes (DN + RS), spread across Rack 1 and Rack 2]
Wire Compatibility • Reduces downtime due to planned maintenance • Wire compatibility + Extensible Data formats • Allow for forwards and backwards compatibility • Older clients can talk to newer servers and vice versa • Rolling upgrade • Upgrade a single node at a time while system runs • Allows API and changes while guaranteeing wire compatibility between different minor versions • HDFS client-server compatibility between Major Versions [Diagram: App, MR, ZK, HDFS]
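One common way an extensible data format gives forward and backward compatibility is to only ever add optional fields: old readers ignore fields they do not know, and new readers supply defaults for fields that are missing. The sketch below shows that general technique with JSON; the message layout and the `ttl_s` field are invented for illustration and are not HBase's actual wire format.

```python
import json


def encode_v1(row, value):
    """Message as written by an older client (hypothetical layout)."""
    return json.dumps({"row": row, "value": value})


def encode_v2(row, value, ttl_s):
    """Newer writers add a field but never remove or rename existing ones."""
    return json.dumps({"row": row, "value": value, "ttl_s": ttl_s})


def decode_v1(message):
    """Older reader: picks the fields it knows and ignores the rest."""
    fields = json.loads(message)
    return fields["row"], fields["value"]


def decode_v2(message):
    """Newer reader: falls back to a default when the new field is absent."""
    fields = json.loads(message)
    return fields["row"], fields["value"], fields.get("ttl_s", 0)


# An old reader handles a new message, and a new reader handles an old one.
old_reads_new = decode_v1(encode_v2("r1", "a", 30))
new_reads_old = decode_v2(encode_v1("r1", "a"))
```

With both directions working, clients and servers can be upgraded one node at a time without a flag day.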
Risks from outside the cluster: Datacenters and Disaster Recovery
External Risks
Geographically separated copies of data
Strategy: HBase-Supported Batch Backups • Export / DistCp / Import • 3 batch MR jobs • Several extra copies of data • High latency (hours) • Copy Table • 1 MR Job • Single copy of data • High latency (hours) • Incremental table copies [Diagram: Export MR Job -> DistCp MR Job -> Import MR Job; Copy Table MR Job]
Strategy: Custom Application-managed Replication • Application writes to two instances of HBase • Low Latency • Adds complexity • Inefficient [Diagram: two App instances]
Strategy: HBase replication (0.92+) • HBase asynchronously copies edit logs to other clusters. • Replication lag measured in seconds • Automatically catches up from failures. • Eventually consistent • Efficient batching • Master-slave† (0.90) • Master-master (0.92) [Diagram: logs replicated between clusters]
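The asynchronous, batched, eventually consistent behavior listed above can be illustrated with a small simulation. Everything here is a simplified model with hypothetical names (`Cluster`, `ship_edits`, `batch_size`); it is not HBase's replication code, only a sketch of why the peer lags briefly and then catches up.

```python
from collections import deque


class Cluster:
    """Toy cluster: a key-value store plus a log of edits not yet shipped."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.log = deque()

    def put(self, key, value):
        # The write is acknowledged locally; shipping to the peer happens later.
        self.data[key] = value
        self.log.append((key, value))


def ship_edits(source, sink, batch_size=2):
    """Apply up to batch_size pending edits to the sink; return how many shipped."""
    shipped = 0
    while source.log and shipped < batch_size:
        key, value = source.log.popleft()
        sink.data[key] = value  # applied directly, not re-logged on the sink
        shipped += 1
    return shipped


primary = Cluster("primary")
backup = Cluster("backup")
for i in range(3):
    primary.put(f"row{i}", i)

before = dict(backup.data)             # backup has seen nothing yet
first = ship_edits(primary, backup)    # one batch of two edits
partial = dict(backup.data)            # backup lags by one edit

# Keep shipping until the log drains; the clusters then agree.
while ship_edits(primary, backup):
    pass
```

Between batches the backup is behind the primary, which is the replication lag the slide refers to; once the log drains, both hold the same data.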
Master-Master Replication [Diagram: logs shipped between clusters] Replicating data reduces chances of data loss.
Risks from Users: “Problem exists between keyboard and chair.”
Oops… User Error [Timeline: User Error: drop ‘table’ -> Service is down! -> Recovery Time -> Service is restored, major data loss] • How do we prevent user error? • How do we recover from user error?
Prevent user mistakes: User-level Security [Timeline: User Error: drop ‘table’ -> Operation rejected, insufficient permissions.] • Authentication: • Ensure the identity of the services or users that are communicating • Access Control: • Ensure user has permission to execute table data operations
HBase User-level Security • Based on Kerberos for HBase, HDFS and ZooKeeper • Grant privileges to users • Revoke privileges from users. • Column Family and Table granularity • Confidentiality: • Ensure information is only seen by intended users. • Audit Trails: • Track which users performed particular operations
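The grant/revoke model with table and column family granularity can be sketched as follows. The `AccessController` class, its methods, and the action names are hypothetical stand-ins chosen for illustration; the sketch skips authentication entirely and only shows how a permission check could reject a destructive operation such as dropping a table.

```python
class AccessController:
    """Toy permission store keyed by (user, action, table, column family)."""

    def __init__(self):
        self._grants = set()

    def grant(self, user, action, table, family=None):
        # family=None means the grant applies to the whole table.
        self._grants.add((user, action, table, family))

    def revoke(self, user, action, table, family=None):
        self._grants.discard((user, action, table, family))

    def is_allowed(self, user, action, table, family=None):
        # A table-level grant covers every column family in that table.
        if (user, action, table, None) in self._grants:
            return True
        return family is not None and (user, action, table, family) in self._grants


acl = AccessController()
acl.grant("alice", "READ", "orders", family="info")
acl.grant("bob", "ADMIN", "orders")

alice_reads_info = acl.is_allowed("alice", "READ", "orders", "info")
alice_reads_billing = acl.is_allowed("alice", "READ", "orders", "billing")
alice_drops_table = acl.is_allowed("alice", "ADMIN", "orders")  # rejected
bob_drops_table = acl.is_allowed("bob", "ADMIN", "orders")

acl.revoke("bob", "ADMIN", "orders")
bob_after_revoke = acl.is_allowed("bob", "ADMIN", "orders")
```

In this model, an accidental drop issued by a user without the table-level grant is refused before it can cause data loss.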
Recovering from User Mistakes: Table Snapshots [Timeline: Periodic snapshot -> User Error: drop ‘table’ -> Service is down! -> restore -> Service is restored, minor data loss] • Snapshot the state of a table at a certain moment in time • Restore it or Clone it later, creating a new read write table • Export it to another cluster with minimal impact on HBase
Table Snapshots (0.96+) • Under development, slated for HBase 0.96 • Multiple snapshot flavors planned • Offline snapshots • Online snapshots • Snapshot uses • Recover from application or user error. • Application experimentation (no need to spin up another cluster for replication) • Use MR directly on snapshot files
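The snapshot, clone, and restore operations described in these two slides can be modeled with a toy in-memory store. The `TableStore` class and its method names are hypothetical, and the deep copies are a simplification for clarity; the sketch only shows the recovery flow after an accidental drop, including why edits made after the last snapshot are lost.

```python
import copy


class TableStore:
    """Toy store of tables and point-in-time snapshots (illustrative only)."""

    def __init__(self):
        self.tables = {}
        self.snapshots = {}

    def snapshot(self, table, snapshot_name):
        # Capture the table's state at this moment in time.
        self.snapshots[snapshot_name] = copy.deepcopy(self.tables[table])

    def clone(self, snapshot_name, new_table):
        # Create a new, independently writable table from a snapshot.
        self.tables[new_table] = copy.deepcopy(self.snapshots[snapshot_name])

    def restore(self, snapshot_name, table):
        # Bring a table back to the snapshotted state.
        self.tables[table] = copy.deepcopy(self.snapshots[snapshot_name])

    def drop(self, table):
        del self.tables[table]


store = TableStore()
store.tables["orders"] = {"row1": "a"}
store.snapshot("orders", "orders-nightly")  # periodic snapshot

store.tables["orders"]["row2"] = "b"        # written after the snapshot
store.drop("orders")                        # user error: drop 'table'

store.restore("orders-nightly", "orders")   # row2 is the minor data loss
store.clone("orders-nightly", "orders-experiment")
store.tables["orders-experiment"]["row9"] = "z"  # clone is independent
```

The restored table contains only what existed at snapshot time, which matches the "minor data loss" shown on the timeline, and writes to the clone do not affect the original table.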