HBaseCon 2012 | You’ve got HBase! How AOL Mail Handles Big Data


The AOL Mail Team will discuss our implementation of HBase for two large-scale applications: an anti-abuse mechanism and a user-visible API. We will provide an overview of how and why HBase and Hadoop were incorporated into the massive and diverse technology stack that is the nearly 20-year-old AOL Mail system, and the history of how we took our HBase/Hadoop apps through our traditional process of design, development, QA, and production. We will explain how our practical approach to HBase has evolved over time, and we will discuss our lessons learned and some of the techniques and tools developed via our iterative dev/QA and operational processes. We will explain the pain points we have experienced with erratic usage and edge cases, and how we address problems when we run across them.


You’ve got HBase
How AOL Mail handles Big Data

Presented at HBaseCon
May 22, 2012

The AOL Mail System
Over 15 years old
Constantly evolving
10,000+ hosts
70+ Million mailboxes
50+ Billion emails
A technology stack that runs the gamut


What that means…
Lots of data
Lots of moving parts
Tight SLAs
Mature system + Young software = Tough marriage
We don’t buy “commodity” hardware
Ingrained Dev/QA/Prod product lifecycle
Somewhat “version locked” to tried-and-true platforms
Expect service outages to be quickly mitigated by our NOC w/out waiting for an on-call


So where does HBase fit?
It’s a component, not the foundation
Currently used in two places
Being evaluated for more
It will remain a tool in our diverse Big Data arsenal


An “Activity Profiler”
Watches for particular behaviors
Designed and built in 6/2010
Originally “vanilla” Hadoop 0.20.2 + HBase 0.90.2
Currently CDH3
1.4+ Million Events/min
60x 24TB (raw) DataNodes w/ local RegionServers
15x application hosts
Is an internal-only tool
Used by automated anti-abuse systems
Leveraged by data analysts for ad hoc queries/MapRed


An “Activity Profiler”


Why the “Event Catcher” layer?
Has to “speak the language” of our existing systems
Easy to plug an HBase translator into existing data feeds
Hard to modify the infrastructure to speak HBase

Flume was too young at the time


Why batch load via MapRed?
Real time is not currently a requirement
Allows filtering at different points
Allows us to “trigger” events
Designed before coprocessors

Early data integrity issues necessitated “replaying”
Missing append support early on
Holes in the Meta table
Long splits and GC pauses caused client timeouts

Can sample data into a “sandbox” for job development
Makes pig, hive, and other MapRed easy and stable
We keep the raw data around as well
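
The sandbox sampling mentioned on this slide is not described in detail in the talk. A minimal sketch of one way it could work, assuming each raw log line begins with a tab-separated account key (a hypothetical format) and that a deterministic, hash-based sample is acceptable:

```python
import hashlib


def in_sample(key: str, percent: int) -> bool:
    """Return True for roughly `percent`% of keys, always the same ones."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < percent


def sample_lines(lines, percent=1):
    """Yield only the log lines whose leading key falls in the sample."""
    for line in lines:
        key = line.split("\t", 1)[0]  # hypothetical: key is the first field
        if in_sample(key, percent):
            yield line
```

Hashing the key rather than drawing random numbers keeps every event for a sampled account together, which tends to make sandbox jobs behave more like production ones.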


HBase and MapRed can live in harmony
Bigger than “average” hardware
36+GB RAM
8+ cores

Proper system tuning is essential
Good information on tuning Hadoop is prolific, but…
XFS > EXT
JBOD > RAID
As far as HBase is concerned…
Just go buy Lars’ book

Careful job development, optimization is key!


Contact History API
Services a member-facing API
Designed and built in 10/2010
Modeled after the previous application
Built by a different Engineering team
Used to solve a very different problem

250K+ Inserts/min
3+ Million Inserts/min during MapRed
20x 24TB (raw) DataNodes w/ local RegionServers
14x application hosts
Leverages Memcached to reduce query load on HBase

Contact History API


Where we go from here

Amusing mistakes to learn from
Exploding regions
Batch inserts via MapRed result in fast, symmetrical key space growth
Attempting to split every region at the same time is a bad idea
Turning off region splitting and using a custom “rolling region splitter” is a good idea
Take time and load into consideration when selecting regions to split

Backups, backups, backups!
You can never have too many

Large, non-splittable regions tell you things
Our key space maps to accounts
Excessively large keys equal excessively “active” accounts
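
The talk does not show the custom “rolling region splitter” itself. The selection step it describes (a few regions at a time, chosen with time and load in mind) might look roughly like the sketch below. The `RegionStats` type, the thresholds, the quiet window, and the batch size are all illustrative assumptions, and issuing the actual split request is left out.

```python
from dataclasses import dataclass


@dataclass
class RegionStats:
    name: str
    size_mb: int             # on-disk size of the region
    requests_per_sec: float  # current load on the region


def pick_regions_to_split(regions, hour, max_size_mb=4096, max_rps=500.0,
                          quiet_hours=range(1, 5), batch=2):
    """Pick a small batch of oversized, lightly loaded regions.

    Returns an empty list outside the quiet window, so splits never
    compete with peak traffic.
    """
    if hour not in quiet_hours:
        return []
    candidates = [r for r in regions
                  if r.size_mb > max_size_mb and r.requests_per_sec <= max_rps]
    # Largest regions first; the rest wait for a later pass.
    candidates.sort(key=lambda r: r.size_mb, reverse=True)
    return [r.name for r in candidates[:batch]]
```

Running a pass like this on a schedule spreads splits out over time instead of letting every region hit its size limit at once.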


Next-generation model


Thanks!


Editor's notes

Introduce myself: I am Chris Niemira, a Systems Administrator with AOL. I run a number of Hadoop and HBase clusters, along with numerous other components of the AOL Mail system. I spend my days doing work that ranges from system patches, code installs and troubleshooting, to capacity planning, performance and bottleneck analysis, and kernel tuning. I do a little engineering, a little design work, an on-call rotation, and every once in a while I get to play with Hadoop/HBase.
The AOL Mail System has been around for a long time, and went through a major re-architecture between 2010 and 2011. It’s not a 15-year-old code base, and we evolve it constantly. We service over 70 million mailboxes in the AOL Mail environment today. That includes supporting our paying members, in addition to free accounts. Of course, member experience is our #1 priority. We have all kinds of tools in our proverbial utility belt, as we believe in trying to use the right tool for the right job.
It means we’re reasonably large. But we’ve also been operating “at scale” for a long time now. While we have been doing “Big Data” for a lot of years now, we got to our current size by operating a certain way: Rigid quality and change controls, lots of documentation, emphasis on uptime. As we have shifted toward being more agile, we have had to be careful with unproven technologies. HBase, for all the buzz, is still pretty young and error-prone. Some of the realities for dealing with a production Hadoop/HBase system would seemingly require a departure from our traditional mentality. Like everyone, we require stability and robustness of our production applications, but our way of getting there has had to change. Above all, however, we must still take care of our customers, so it’s a balancing act for us.
So HBase is one of the tools we’ve added to our kit in the last few years that’s still proving itself. We’ve got two applications running and we’ve identified a few other places where it’s a good candidate. This isn’t to say that we are not using it for important things, but it’s not at the core of our system. We’ve managed to build a relatively stable platform over time. There’s a lot of scripted recovery, and a lot of proactive monitoring in our environment, and for the most part when there are problems, they are mitigated or resolved without even the involvement of an admin.
AOL Mail first started looking into Hadoop and HBase back in mid-2010. Other business units in our company had been working with Hadoop for a while before then, and a little intra-company consulting convinced us to give HBase a try. This system is one component of our anti-abuse strategy. I can’t reveal exactly what it does, but I can tell you a bit about how the HBase stuff happens. In addition to the 60-node cluster and the application servers, there’s the ancillary junk, which includes NameNodes (2x), HMasters (2x), and Zookeepers (3x). The app hosts and Zookeepers, which are currently physicals, are being switched to virtual devices in our internal cloud.
This is what the application looks like. The “Service Layer” comprises various components within the AOL Mail system. They speak their own protocols and send messages to an “Event Catcher”, which decodes the stream and writes a log to local disk. That log is imported into Hadoop (and can optionally be sampled to a development sandbox at the same time) and then further cooked via MapRed, which ultimately outputs rows into HBase and can send triggers to external applications. One thing we can do at this point (not illustrated) is populate a memcache, which may be used by client apps to reduce load on some HBase queries.
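
To make the flow concrete, here is a minimal sketch of the decode, filter, aggregate, and trigger steps. It is not the real system: the `account|event_type` wire format, the plain dict standing in for HBase rows, and the trigger threshold are all assumptions for illustration.

```python
from collections import Counter


def decode(raw: str):
    """Decode one hypothetical 'account|event_type' message; None if malformed."""
    parts = raw.strip().split("|")
    if len(parts) != 2 or not all(parts):
        return None
    return parts[0], parts[1]


def batch_process(log_lines, trigger_threshold=3):
    """Aggregate a batch of logged events into rows plus triggers.

    Rows map (account, event_type) -> count and stand in for HBase puts.
    A trigger is raised for any key whose count reaches the threshold.
    """
    counts = Counter()
    for line in log_lines:
        event = decode(line)
        if event is not None:  # filtering point: drop malformed input
            counts[event] += 1
    rows = dict(counts)
    triggers = sorted(k for k, n in counts.items() if n >= trigger_threshold)
    return rows, triggers
```

Because the batch job works from a log on disk, the same input can be replayed after a failure, which is the property the “replaying” bullet on the batch-load slide relies on.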
The real answer is that when we first started, we couldn’t make streaming a million and a half rows a minute work out with the HBase we had two years ago. At the time, it was easier for us to build the batch loader, which has proven to have a few interesting advantages. Our next-generation model will rely on HBase itself being more stable, and will heavily leverage coprocessors to do a lot of what we’re doing now with MapReduce.
A big obstacle for us is getting MapReduce and HBase to play nicely together. From what I’ve seen, bigger hardware is starting to become more popular for running HBase, and we believe it’s essential. We’ve floated between an 8 and 16 GB heap for the RegionServer. For this application, I believe we’re currently using 16. Getting GC tuning and the IPC timeouts in HBase/Zookeeper correct is critically important. System tuning is also very important. Depending on which flavor of Linux you’re running, the stock configuration may be completely inappropriate for the needs of an HBase/Hadoop complex. In particular, look at the kernel’s I/O scheduler and VM settings.
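
As a starting point for the kernel checks mentioned here, a small script can report the active I/O scheduler and the `vm.swappiness` value. This is a sketch only: the warning thresholds are assumptions, not settings from the talk, and the right values depend on your hardware and workload.

```python
def active_io_scheduler(scheduler_line: str) -> str:
    """Extract the active scheduler from a /sys/block/<dev>/queue/scheduler line.

    The kernel marks the active one with brackets, e.g. 'noop [deadline] cfq'.
    """
    for token in scheduler_line.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return scheduler_line.strip()


def tuning_warnings(scheduler_line: str, swappiness: int):
    """Return human-readable warnings; both checks are illustrative assumptions."""
    warnings = []
    if active_io_scheduler(scheduler_line) == "cfq":
        warnings.append("I/O scheduler is cfq; consider testing deadline or noop")
    if swappiness > 10:
        warnings.append("vm.swappiness is %d; a low value is commonly suggested"
                        % swappiness)
    return warnings
```

On a Linux host, the inputs would come from reading `/sys/block/sda/queue/scheduler` (with `sda` as an example device name) and `/proc/sys/vm/swappiness`.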
This application was built a short while after we started our trial-by-fire with HBase on the previous application. It was a different development team with input from the engineers working on the previously discussed application. This application has the same “event catcher” layer for the same reasons, but it has always written directly to HBase. We import data into a “raw” table and then process that table with MapReduce writing the output into a “cooked” table. There’s a much lower number of events here, but it spikes up significantly during the MapReduce phase. It’s exactly the same class of hardware with the same ancillary junk as the previous app. Most of the query load is actually farmed out of memcache.
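
The notes do not say how the cache is populated or consulted. One common arrangement is a cache-aside read path, sketched below with plain dicts standing in for the memcached client and the HBase table; the class name and its interface are hypothetical.

```python
class ContactHistoryReader:
    """Cache-aside reads: try the cache, fall back to the store, then fill."""

    def __init__(self, cache, store):
        self.cache = cache    # stand-in for a memcached client
        self.store = store    # stand-in for an HBase table lookup
        self.store_reads = 0  # counts how often the backing store is hit

    def get(self, account):
        value = self.cache.get(account)
        if value is None:
            self.store_reads += 1
            value = self.store.get(account)
            if value is not None:
                self.cache[account] = value  # fill the cache for next time
        return value
```

Repeated lookups for the same account then hit the backing store only once, which is the effect described above of most query load being served from memcache.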
Yes, this is a relatively straightforward design.
Exploding tables might be a better name for this, since it’s an across-the-board sort of thing. Backups, of course, are obvious. We’ve run into three catastrophic data loss events, actually once each with three different clusters. The first was during a burn-in phase for the Contact History application I described earlier. At that time, the data it had accumulated over the week or so that it had been running wasn’t considered essential, so we were able to truncate and move along. Another time, for a separate plain Hadoop cluster, an unintentionally malicious user actually managed to delete my backups and corrupt the NameNode’s edit log. Luckily that data was restorable from another source. The last time was with the Activity Profiler application. Basically, having data backups saved the day.
This is our working model for a next-generation HBase system. It is currently being prototyped with the cooperation of our Engineering and Operations teams. The key design concept is to allow for a great deal of flexibility and re-use, and it centers around this idea of installing a fairly dynamic rules engine at both the event collection and event storage layers. Hopefully we will be presenting it soon.
