Cloud Connect 2012, Big Data @ Netflix

Big Data @ Netflix

Using Big Data to Grow our Business
& Retain our Customers.

Jerome Boulon
Lead Architect, Hadoop Big Data Infrastructure

February 15, 2012
jboulon@netflix.com

Big Data @ Netflix
Offline analysis:
•  Honu: a scalable log analysis system used to gain business insights from:
–  Error logs (unstructured logs)
–  Statistical logs & performance logs
–  etc.

Online analysis:
•  Cassandra for all online activities and user-facing data
–  A/B testing (test allocation, metadata)
–  Service-level configuration
–  etc.

Overview
Data collection pipeline: Application → Collectors
Data processing pipeline: M/R, Hive

Honu - Structured Log API
Using Annotations
•  Convert a Java class to a Hive table dynamically
•  Add/remove columns
•  Supported Java types:
–  All primitives
–  Map
–  Any object, via its toString method

Using the Key/Value API
•  Produces the same result as annotations
•  Avoids unnecessary object creation
•  Fully dynamic
•  Thread-safe
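A minimal Java sketch of the annotation approach, assuming a hypothetical `@HiveColumn` field annotation (Honu's real annotation names are not shown in the slides): reflection over the annotated fields yields a Hive `CREATE TABLE` statement, with primitives mapped to Hive types, `Map` to `MAP<STRING,STRING>`, and any other object stored as `STRING` via `toString()`. Columns are sorted by name only to keep the output deterministic, because `getDeclaredFields()` does not guarantee declaration order.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical marker for fields that become Hive columns (not Honu's actual API).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface HiveColumn {
    String name() default ""; // empty means "use the field name"
}

// Example event whose fields mirror the table columns shown in the slides.
class PlaybackEvent {
    @HiveColumn int movieId;
    @HiveColumn long customerId;
    @HiveColumn long timestamp;
    @HiveColumn String hostname;
    String internalNote; // not annotated, so it never becomes a column
}

final class HiveSchema {
    private HiveSchema() {}

    // Maps a Java field type to a Hive column type.
    static String hiveType(Class<?> t) {
        if (t == int.class || t == Integer.class) return "INT";
        if (t == long.class || t == Long.class) return "BIGINT";
        if (t == short.class || t == Short.class) return "SMALLINT";
        if (t == byte.class || t == Byte.class) return "TINYINT";
        if (t == float.class || t == Float.class) return "FLOAT";
        if (t == double.class || t == Double.class) return "DOUBLE";
        if (t == boolean.class || t == Boolean.class) return "BOOLEAN";
        if (Map.class.isAssignableFrom(t)) return "MAP<STRING,STRING>"; // values assumed to be strings
        return "STRING"; // char, String and any other object (stored via toString())
    }

    // Builds a CREATE TABLE statement from the annotated fields of a class.
    static String createTable(String table, Class<?> eventClass) {
        // getDeclaredFields() has no guaranteed order, so sort columns by name.
        Map<String, String> columns = new TreeMap<>();
        for (Field f : eventClass.getDeclaredFields()) {
            HiveColumn col = f.getAnnotation(HiveColumn.class);
            if (col == null) continue;
            String name = col.name().isEmpty() ? f.getName() : col.name();
            columns.put(name, hiveType(f.getType()));
        }
        List<String> defs = new ArrayList<>();
        columns.forEach((name, type) -> defs.add("`" + name + "` " + type));
        return "CREATE TABLE " + table + " (" + String.join(", ", defs) + ")";
    }
}
```

With these definitions, `HiveSchema.createTable("MyTable", PlaybackEvent.class)` returns ``CREATE TABLE MyTable (`customerId` BIGINT, `hostname` STRING, `movieId` INT, `timestamp` BIGINT)``; `internalNote` is skipped because it carries no annotation.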

Honu, what you get:

log.logEvent(myObject)  →  Hive table with columns: movieId, customerId, timestamp, hostname

SELECT customerId, count(1) FROM MyTable GROUP BY customerId;
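The Key/Value API described earlier can be sketched the same way. In this minimal Java version, `KeyValueLogger`, `put` and `emit` are hypothetical names: each thread fills its own reusable value array in the table's column order, so no per-event class has to be defined or instantiated and concurrent threads never share mutable state. Rows are tab-separated here for readability (Hive's default text field delimiter is Ctrl-A, `\001`), and unset columns are written as `\N`, Hive's default null marker.

```java
import java.util.List;

// Hypothetical key/value logging API (names are illustrative, not Honu's).
final class KeyValueLogger {
    private final List<String> columns;        // Hive column order
    private final ThreadLocal<String[]> values; // one reusable row per thread

    KeyValueLogger(List<String> columns) {
        this.columns = List.copyOf(columns);
        int width = this.columns.size();
        this.values = ThreadLocal.withInitial(() -> new String[width]);
    }

    // Sets one column of the current thread's pending event; objects are stored via toString().
    KeyValueLogger put(String key, Object value) {
        int index = columns.indexOf(key);
        if (index < 0) throw new IllegalArgumentException("unknown column: " + key);
        values.get()[index] = value == null ? null : value.toString();
        return this;
    }

    // Serializes the pending event as one row and clears it for the next event.
    // A real client would hand the row to a collector instead of returning it.
    String emit() {
        String[] row = values.get();
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < row.length; i++) {
            if (i > 0) line.append('\t');
            line.append(row[i] == null ? "\\N" : row[i]);
            row[i] = null;
        }
        return line.toString();
    }
}
```

For example, with a logger built over the columns `movieId, customerId, timestamp, hostname`, calling `put("customerId", 42L).put("movieId", 7).put("hostname", "host-1").emit()` returns `7\t42\t\N\thost-1`.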

December 2009
–  POC for streaming analysis
–  Single AWS zone
–  1 application
–  60 million events/day
–  50 clients
–  Small Hadoop cluster
–  1 Map/Reduce job
–  1 table
(Diagram: Application, Collectors, M/R, Oracle)

Feb 2012
40+ billion events/day
8+ tables with 1+ TB/day
100+ smaller tables
Self-serve:
→ No DBA
→ No pre-provisioning
→ Fully integrated with Hive
- Multi-region deployments
- Transparent to our engineers
- Streaming-based solution
- Zero configuration
- 7000+ clients
- Built-in: Netflix Hive warehouse
- Fail-over
- Load balancing

→ One central data warehouse
→ Hourly/daily reports
→ Data retention/expiration

Traceability & Performance Analysis
•  Track service-level calls
–  Instrument the low-level HTTP client
–  Call graph
–  Request processing vs. perceived latency
–  Payload marshalling/unmarshalling
- duration, size, etc.
–  Service result
- status, error code, exception, etc.
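A minimal Java sketch of call tracing, assuming a hypothetical `TracedClient` wrapper around whatever performs the request (in the slides, the instrumented low-level HTTP client). It measures wall-clock duration, records the outcome and any exception in a hypothetical `CallTrace` record, and hands the record to a sink; payload size and marshalling time are left out to keep it short.

```java
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical trace record for one outbound service call.
record CallTrace(String service, long durationMicros, String status, String error) {}

// Hypothetical wrapper that traces every call it makes.
final class TracedClient {
    private final Consumer<CallTrace> sink; // e.g. a Honu log event in production

    TracedClient(Consumer<CallTrace> sink) {
        this.sink = sink;
    }

    <T> T call(String service, Supplier<T> request) {
        long start = System.nanoTime();
        String status = "OK";
        String error = null;
        try {
            return request.get();
        } catch (RuntimeException e) {
            status = "FAILED";
            error = e.getClass().getSimpleName() + ": " + e.getMessage();
            throw e; // tracing must not swallow the failure
        } finally {
            long micros = (System.nanoTime() - start) / 1_000;
            sink.accept(new CallTrace(service, micros, status, error));
        }
    }
}
```

Recording in `finally` guarantees one trace per call whether the request succeeds or throws, while the caller still sees the original exception.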

Diagnostic Information
•  Collect latency information for all external operations
•  If latency exceeds a threshold, log to Honu:
–  AWS region & zone
–  Instance
–  Service details
•  Open a Jira ticket and attach the diagnostic info
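The threshold rule can be sketched as follows; `LatencyMonitor` and `InstanceInfo` are hypothetical names, and the region, zone and instance ID are simply passed in (on EC2 they could be read from instance metadata). A slow operation yields a diagnostic map that a caller could log to Honu and attach to a ticket; a fast one yields nothing.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical description of where this process runs.
record InstanceInfo(String region, String zone, String instanceId) {}

// Hypothetical monitor implementing "if latency > threshold, log diagnostics".
final class LatencyMonitor {
    private final long thresholdMillis;
    private final InstanceInfo instance;

    LatencyMonitor(long thresholdMillis, InstanceInfo instance) {
        this.thresholdMillis = thresholdMillis;
        this.instance = instance;
    }

    // Returns diagnostic fields only when the operation exceeded the threshold.
    Optional<Map<String, String>> check(String service, String operation, long latencyMillis) {
        if (latencyMillis <= thresholdMillis) {
            return Optional.empty();
        }
        Map<String, String> diagnostic = new LinkedHashMap<>();
        diagnostic.put("region", instance.region());
        diagnostic.put("zone", instance.zone());
        diagnostic.put("instance", instance.instanceId());
        diagnostic.put("service", service);
        diagnostic.put("operation", operation);
        diagnostic.put("latencyMillis", Long.toString(latencyMillis));
        diagnostic.put("thresholdMillis", Long.toString(thresholdMillis));
        return Optional.of(diagnostic);
    }
}
```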

Mix Offline and Online Data
Offline data
- Fire & forget
- Scale to very large volumes
- Cost-effective

Specific conditions
- Online data availability is not mandatory
- If it exists, the data could be useful online
- Only a subset is useful online
- Ready to pay a little bit more

Special collectors
- All data goes to Hive
- A subset goes to a real-time system
- Still cost-effective

Customer support
- Browsing history
- Historical & non-critical actions

Debug
- Push validation
- Root cause analysis
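A minimal Java sketch of a special collector, assuming bounded in-memory queues stand in for the Hive path and the real-time system: every event is offered to the offline queue, and only events matching a predicate are also offered to the real-time queue. `offer()` never blocks, so the sender can fire and forget; when the real-time queue is full the online copy is dropped and counted, consistent with online availability not being mandatory. `SpecialCollector` and its methods are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Predicate;

// Hypothetical collector that fans events out to an offline and a real-time path.
final class SpecialCollector {
    private final BlockingQueue<String> offline;   // stands in for the Hive pipeline
    private final BlockingQueue<String> realtime;  // stands in for the real-time system
    private final Predicate<String> neededOnline;  // selects the online subset
    private final AtomicLong droppedRealtime = new AtomicLong();

    SpecialCollector(int offlineCapacity, int realtimeCapacity, Predicate<String> neededOnline) {
        this.offline = new ArrayBlockingQueue<>(offlineCapacity);
        this.realtime = new ArrayBlockingQueue<>(realtimeCapacity);
        this.neededOnline = neededOnline;
    }

    // Never blocks; returns false only if the offline queue itself is full.
    boolean collect(String event) {
        boolean accepted = offline.offer(event);
        if (neededOnline.test(event) && !realtime.offer(event)) {
            droppedRealtime.incrementAndGet(); // the online copy is best-effort
        }
        return accepted;
    }

    int offlineSize() { return offline.size(); }
    int realtimeSize() { return realtime.size(); }
    long droppedRealtime() { return droppedRealtime.get(); }
}
```

Only the subset selected by the predicate costs anything extra, which is one way to keep the mixed setup "still cost-effective".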

Honu Realtime usages
•  Movie playback experience
–  Video quality
–  Network issues
•  Errors summary
–  Error tracking per service
–  Error tracking per device
•  Customer support
–  Historical usage
–  Last activity
•  Launch reports
–  Push validation
–  Root cause analysis

Honu Realtime - Architecture
(Diagram: realtime data collection pipeline. Components: Application, Collectors, Realtime system with realtime access, M/R)

A/B Testing
Test: an experiment in which several competing behaviors are implemented and compared.

Cell: one of the different experiences within a test that are compared against each other.

Allocation: a customer-specific assignment to a cell within a test.
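A minimal Java sketch of sticky allocation, assuming a weighted random draw the first time a customer is seen in a test, stored so that later lookups return the same cell. The slides do not say how Netflix chooses cells; this weighted draw is a common technique swapped in for illustration, the in-memory map stands in for the online allocation store (Cassandra, per the earlier slide), and `CellAllocator` is a hypothetical name.

```java
import java.util.Map;
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sticky cell allocator; the map stands in for the online allocation store.
final class CellAllocator {
    private final Map<String, Integer> allocations = new ConcurrentHashMap<>();
    private final Random random;

    CellAllocator(Random random) {
        this.random = random;
    }

    // Returns the customer's cell (numbered from 1), drawing one on first use.
    int allocate(int testId, long customerId, double[] cellWeights) {
        String key = testId + ":" + customerId;
        return allocations.computeIfAbsent(key, k -> pickCell(cellWeights));
    }

    // Weighted random draw: cell i+1 is chosen with probability weights[i] / sum(weights).
    private int pickCell(double[] weights) {
        double total = 0;
        for (double w : weights) {
            total += w;
        }
        double r = random.nextDouble() * total;
        for (int i = 0; i < weights.length; i++) {
            r -= weights[i];
            if (r < 0) {
                return i + 1;
            }
        }
        return weights.length; // fallback for rounding error or all-zero weights
    }
}
```

Storing the allocation, rather than recomputing it, keeps a customer's experience stable for the life of the test even if cell weights change later.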

Online data:
- Cell allocation: > 1 billion records
- Test config: 1 entry/test/customer

Tracking information (example):
- 1 M customers per test
- 8 tracking events per day
- 100 tests = 800 M events/day
- 3 months = 72 B events
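The tracking volumes can be checked step by step, assuming the 8 tracking events per day are per customer and that 3 months is about 90 days:

```latex
\begin{aligned}
10^{6}\ \tfrac{\text{customers}}{\text{test}} \times 8\ \tfrac{\text{events}}{\text{customer} \cdot \text{day}} \times 100\ \text{tests} &= 8 \times 10^{8}\ \text{events/day} \\
8 \times 10^{8}\ \text{events/day} \times 90\ \text{days} &= 7.2 \times 10^{10} = 72\ \text{billion events}
\end{aligned}
```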

A/B Testing - Architecture
Online data
- Customer test allocation
- Metadata about the test, e.g.:
  - Start/end date
  - UI directives
  - Logging directives

Offline data
- Test tracking, e.g.:
  - Retention
  - Engagement metrics

Beacon Server

User behavior
- Client-side interactions
- Search/Play/Stop/Pause
- Ajax calls
Device monitoring
- Heartbeat
- Status & key metrics
(Diagram: multiple beacons sent to the Beacon Server)
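Beacons are commonly sent as small HTTP requests whose query string carries the data. Assuming that format (the slides do not specify it), a minimal Java sketch of the server-side decoding step turns a query string into a key/value event ready to be logged; `BeaconParser` and the parameter names used in the usage note are hypothetical.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical decoder for beacon query strings such as "type=heartbeat&deviceId=tv-42".
final class BeaconParser {
    private BeaconParser() {}

    // Splits on '&' and '=', URL-decodes both sides, and keeps parameter order.
    static Map<String, String> parse(String query) {
        Map<String, String> event = new LinkedHashMap<>();
        if (query == null || query.isEmpty()) {
            return event;
        }
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) continue;
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1); // bare flag -> empty value
            event.put(URLDecoder.decode(key, StandardCharsets.UTF_8),
                      URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return event;
    }
}
```

For instance, `BeaconParser.parse("type=heartbeat&status=playing%20ok")` yields a map with `type` set to `heartbeat` and `status` set to `playing ok`, which could then be handed to a key/value logger.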

BI Integration
Three main technologies

•  Teradata (Data center)
•  Hive (Cloud)
•  Cassandra (Cloud)

Hive ß à BI
–  Dimension tables (daily export from Teradata)
–  Hourly/Daily Hive summary queries
–  Hourly/Daily export from Hive to BI
•  Queries runs in the cloud
•  Aggregated result goes back to our BI solution

Cassandra à BI

•  Use Cassandra backups to run analytics
•  Export SSTable to Hadoop
•  Pig to:
–  Parse SSTable
–  Extract/Group required information
•  Load the result back to Teradata

jboulon@gmail.com
www.linkedin.com/in/jboulon
