2. Introduction
‣ How We Arrived at NoSQL: A Crash Course
‣ Collecting Data (Scribe)
‣ Storing and Analyzing Data (Hadoop)
‣ Rapid Learning over Big Data (Pig)
‣ And More: Cassandra, HBase, FlockDB
3. My Background
‣ Studied Mathematics and Physics at Harvard, Physics at Stanford
‣ Tropos Networks (city-wide wireless): mesh routing algorithms, GBs of data
‣ Cooliris (web media): Hadoop and Pig for analytics, TBs of data
‣ Twitter: Hadoop, Pig, HBase, Cassandra, machine learning, visualization, social graph analysis, soon to be PBs of data
4. Introduction
‣ How We Arrived at NoSQL: A Crash Course
‣ Collecting Data (Scribe)
‣ Storing and Analyzing Data (Hadoop)
‣ Rapid Learning over Big Data (Pig)
‣ And More: Cassandra, HBase, FlockDB
10. Data, Data Everywhere
‣ Twitter users generate a lot of data
‣ Anybody want to guess?
‣ 7 TB/day (2+ PB/yr)
‣ 10,000 CDs/day
‣ 5 million floppy disks
‣ 300 GB while I give this talk
‣ And doubling multiple times per year
12. Syslog?
‣ Started with syslog-ng
‣ As our volume grew, it didn’t scale
‣ Resources overwhelmed
‣ Lost data
13. Scribe
‣ Surprise! FB had the same problem, so they built and open-sourced Scribe
‣ Log collection framework over Thrift
‣ You write log lines, with categories
‣ It does the rest
16. Scribe
‣ Runs locally; reliable in a network outage
‣ Nodes only know their downstream writer; hierarchical, scalable
‣ Pluggable outputs
[Diagram: FE nodes fan in to Agg nodes, which write to pluggable outputs such as File or HDFS]
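To make "you write log lines, with categories" concrete: Scribe's published Thrift interface is Log(list&lt;LogEntry&gt;), where a LogEntry is a category plus a message. A minimal client sketch follows, assuming Python bindings generated from scribe.thrift; the exact module and constructor names come from the Thrift-generated code and may differ in your build.

    #!/usr/bin/env python
    # Minimal Scribe client sketch: send one log line, tagged with a category.
    # Assumes classes generated by `thrift --gen py scribe.thrift`.
    from thrift.transport import TSocket, TTransport
    from thrift.protocol import TBinaryProtocol
    from scribe import scribe  # Thrift-generated client and LogEntry struct

    socket = TSocket.TSocket(host="localhost", port=1463)  # default Scribe port
    transport = TTransport.TFramedTransport(socket)
    protocol = TBinaryProtocol.TBinaryProtocol(transport,
                                               strictRead=False, strictWrite=False)
    client = scribe.Client(iprot=protocol, oprot=protocol)

    transport.open()
    entry = scribe.LogEntry(category="frontend", message="GET /timeline 200\n")
    result = client.Log(messages=[entry])  # Scribe buffers and forwards the rest
    transport.close()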
17. Scribe at Twitter
‣ Solved our problem, opened new vistas
‣ Currently 30 different categories logged from multiple sources
‣ FE: JavaScript, Ruby on Rails
‣ Middle tier: Ruby on Rails, Scala
‣ Backend: Scala, Java, C++
18. Scribe at Twitter
‣ We’ve contributed to it as we’ve used it
‣ Improved logging, monitoring, writing to HDFS, compression
‣ Continuing to work with FB on patches
‣ GSoC project! Help make it more awesome.
• http://github.com/traviscrawford/scribe
• http://wiki.developers.facebook.com/index.php/User:GSoC
19. Introduction
‣ How We Arrived at NoSQL: A Crash Course
‣ Collecting Data (Scribe)
‣ Storing and Analyzing Data (Hadoop)
‣ Rapid Learning over Big Data (Pig)
‣ And More: Cassandra, HBase, FlockDB
23. How do you store 7TB/day?
‣ Single machine?
‣ What’s HD write speed?
‣ ~80 MB/s
‣ 24.3 hours to write 7 TB
‣ Uh oh.
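The 24.3-hour figure is just the back-of-envelope division:

    # Back-of-envelope check: writing 7 TB at ~80 MB/s sequential disk speed
    seconds = 7e12 / 80e6  # bytes / (bytes per second) = 87,500 s
    print(seconds / 3600)  # ~24.3 hours -- longer than the day that produced it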
25. Where do I put 7TB/day?
‣ Need a cluster of machines
‣ ... which adds new layers of complexity
27. Hadoop
‣ Distributed file system
‣ Automatic replication, fault tolerance
‣ MapReduce-based parallel computation
‣ Key-value based computation interface allows for wide applicability
28. Hadoop
‣ Open source: top-level Apache project
‣ Scalable: Y! has a 4000 node cluster
‣ Powerful: sorted 1TB of random integers in 62 seconds
‣ Easy packaging: free Cloudera RPMs
29. MapReduce Workflow
[Diagram: Inputs → Map tasks → Shuffle/Sort → Reduce tasks → Outputs]
‣ Challenge: how many tweets per user, given the tweets table?
‣ Input: key=row, value=tweet info
‣ Map: output key=user_id, value=1
‣ Shuffle: sort by user_id
‣ Reduce: for each user_id, sum
‣ Output: user_id, tweet count
‣ With 2x machines, runs 2x faster
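For illustration, here is that tweets-per-user job as a hypothetical pair of Hadoop Streaming scripts (streaming comes up again later in this deck); the tab-separated layout with user_id in the first column is an assumption, not Twitter's actual schema.

    #!/usr/bin/env python
    # mapper.py -- emit (user_id, 1) for every tweet record on stdin
    import sys

    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")  # assumed: tab-separated rows
        user_id = fields[0]                     # assumed: user_id in column 1
        print(user_id + "\t1")

    #!/usr/bin/env python
    # reducer.py -- rows arrive sorted by key (the shuffle); sum per user_id
    import sys

    current, count = None, 0
    for line in sys.stdin:
        user_id, n = line.rstrip("\n").split("\t")
        if user_id != current:
            if current is not None:
                print(current + "\t" + str(count))
            current, count = user_id, 0
        count += int(n)
    if current is not None:
        print(current + "\t" + str(count))

    # Run: hadoop jar hadoop-streaming.jar -input /tweets -output /counts \
    #        -mapper mapper.py -reducer reducer.py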
37. Two Analysis Challenges
‣ 1. Compute friendships in Twitter’s social graph
‣ grep, awk? No way.
‣ Data is in MySQL... self join on an n-billion row table?
‣ n,000,000,000 x n,000,000,000 = ?
‣ I don’t know either.
38. Two Analysis Challenges
‣ 2. Large-scale grouping and counting?
‣ select count(*) from users? Maybe...
‣ select count(*) from tweets? Uh...
‣ Imagine joining them...
‣ ... and grouping...
‣ ... and sorting...
42. Back to Hadoop
‣ Didn’t we have a cluster of machines?
‣ Hadoop makes it easy to distribute the calculation
‣ Purpose-built for parallel computation
‣ Just a slight mindset adjustment
‣ But a fun and valuable one!
43. Analysis at scale
‣ Now we’re rolling
‣ Count all tweets: 12 billion, 5 minutes
‣ Hit FlockDB in parallel to assemble social graph aggregates
‣ Run PageRank across users to calculate reputations
49. But...
‣ Analysis typically in Java
‣ “I need less Java in my life, not more.”
‣ Single-input, two-stage data flow is rigid
‣ Projections, filters: custom code
‣ Joins are lengthy, error-prone
‣ n-stage jobs hard to manage
‣ Exploration requires compilation!
50. Introduction
‣ How We Arrived at NoSQL: A Crash Course
‣ Collecting Data (Scribe)
‣ Storing and Analyzing Data (Hadoop)
‣ Rapid Learning over Big Data (Pig)
‣ And More: Cassandra, HBase, FlockDB
51. Pig
‣ High-level language
‣ Transformations on sets of records
‣ Process data one step at a time
‣ Easier than SQL?
52. Why Pig?
‣ Because I bet you can read the following script.
58. Pig Democratizes Large-scale Data Analysis
‣ The Pig version is:
‣ 5% of the code
‣ 5% of the time
‣ Within 25% of the execution time
61. One Thing I’ve Learned
‣ It’s easy to answer questions
‣ It’s hard to ask the right questions
‣ Value the system that promotes innovation, iteration
‣ More minds contributing = more value from your data
66. The Hadoop Ecosystem at Twitter
‣ Running Cloudera’s free distro, CDH2 and Hadoop 0.20.1
‣ Heavily modified Scribe writing LZO-compressed to HDFS
‣ LZO: fast, splittable compression, ideal for HDFS*
‣ Data either as flat files (logs) or in protocol buffer format (newer logs, structured data, etc)
‣ Libs for reading/writing/more open-sourced as elephant-bird**
‣ Some Java-based MapReduce, some HBase, Hadoop streaming
‣ Most analysis, and most interesting analyses, done in Pig
‣ * http://www.github.com/kevinweil/hadoop-lzo
‣ ** http://www.github.com/kevinweil/elephant-bird
76. Counting Big Data
‣ standard counts, min, max, std dev
‣ How many requests do we serve in a day?
‣ What is the average latency? The 95th-percentile latency?
‣ Group by response code. What is the hourly distribution?
‣ How many searches happen each day on Twitter?
‣ How many unique queries, how many unique users?
‣ What is their geographic distribution?
77. Counting Big Data
‣ Where are users querying from? The API, the front page, their profile page, etc.?
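As a toy illustration of the first few questions above (not Twitter's pipeline; at this scale the real versions run as MapReduce/Pig jobs), assuming hypothetical log lines like "2010-04-21T20:50:14 200 123" (ISO timestamp, response code, latency in ms):

    #!/usr/bin/env python
    # Toy single-machine version of the counting questions above.
    import sys
    from collections import Counter

    latencies, hourly = [], Counter()
    for line in sys.stdin:
        ts, code, ms = line.split()
        latencies.append(int(ms))
        hourly[(ts[11:13], code)] += 1  # (hour of day, response code)

    latencies.sort()
    print("requests:", len(latencies))
    print("avg latency (ms):", sum(latencies) / len(latencies))
    print("p95 latency (ms):", latencies[int(0.95 * (len(latencies) - 1))])
    for (hour, code), n in sorted(hourly.items()):
        print(hour, code, n)  # hourly distribution, grouped by response code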
85. Correlating Big Data
‣ probabilities, covariance, influence
‣ How does usage differ for mobile users?
‣ How about for users with 3rd party desktop clients?
‣ Cohort analyses
‣ Site problems: what goes wrong at the same time?
‣ Which features get users hooked?
‣ Which features do successful users use often?
‣ Search corrections, search suggestions
‣ A/B testing
86. Correlating Big Data
‣ What is the correlation between users with registered phones and users that tweet?
92. Research on Big Data
‣ prediction, graph analysis, natural language
‣ What can we tell about a user from their tweets?
‣ From the tweets of those they follow?
‣ From the tweets of their followers?
‣ From the ratio of followers/following?
‣ What graph structures lead to successful networks?
‣ User reputation
99. Research on Big Data
‣ prediction, graph analysis, natural language
‣ Sentiment analysis
‣ What features get a tweet retweeted?
‣ How deep is the corresponding retweet tree?
‣ Long-term duplicate detection
‣ Machine learning
‣ Language detection
‣ ... the list goes on.
100. Research on Big Data
‣ How well can we detect bots and other non-human tweeters?
101. Introduction
‣ How We Arrived at NoSQL: A Crash Course
‣ Collecting Data (Scribe)
‣ Storing and Analyzing Data (Hadoop)
‣ Rapid Learning over Big Data (Pig)
‣ And More: Cassandra, HBase, FlockDB
102. HBase
‣ BigTable clone on top of HDFS
‣ Distributed, column-oriented, no datatypes
‣ Unlike the rest of the Hadoop stack, designed for low latency
‣ Importantly, data is mutable
106. HBase at Twitter
‣ We began building real products based on Hadoop
‣ People search
‣ Old version: offline process on a single node
‣ New version: complex user calculations, hit extra services in real time, custom indexing
‣ Underlying data is mutable
‣ Mutable layer on top of HDFS --> HBase
110. People Search
‣ Import user data into HBase
‣ Periodic MapReduce job reading from HBase
‣ Hits FlockDB, multiple other internal services in the mapper
‣ Custom partitioning
‣ Data sucked across to a sharded, replicated, horizontally scalable, in-memory, low-latency Scala service
‣ Build a trie, do case folding/normalization, suggestions, etc
‣ See http://www.slideshare.net/al3x/building-distributed-systems-in-scala for more
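A minimal sketch of the "trie + case folding + suggestions" idea from the last two bullets (illustrative only, not the Scala service itself):

    # Toy prefix index: a trie over case-folded names that returns suggestions.
    class Trie:
        def __init__(self):
            self.children = {}
            self.name = None  # original-cased name, set at end of a word

        def insert(self, name):
            node = self
            for ch in name.casefold():  # normalization via case folding
                node = node.children.setdefault(ch, Trie())
            node.name = name

        def suggest(self, prefix):
            node = self
            for ch in prefix.casefold():
                if ch not in node.children:
                    return []
                node = node.children[ch]
            out, stack = [], [node]
            while stack:  # collect every completion under the prefix node
                n = stack.pop()
                if n.name:
                    out.append(n.name)
                stack.extend(n.children.values())
            return out

    index = Trie()
    for name in ["Kevin", "kevinweil", "Kelly"]:
        index.insert(name)
    print(index.suggest("KE"))  # all three names match the folded prefix "ke"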
111. HBase
‣ More products now being built on top of it
‣ Flexible, easy to connect to MapReduce/Pig
117. HBase vs Cassandra
‣ “Their origins reveal their strengths and weaknesses”
‣ HBase built on top of batch-oriented system, not low latency
‣ Cassandra built from ground up for low latency
‣ HBase easy to connect to batch jobs as input and output
‣ Cassandra not so much (but we’re working on it)
‣ HBase has a SPOF in the NameNode
119. HBase vs Cassandra
‣ Your mileage may vary
‣ At Twitter: HBase for analytics, analysis, dataset generation
‣ Cassandra for online systems
‣ As with all NoSQL systems: strengths in different situations
120. FlockDB
‣ Realtime, distributed social graph store
‣ NOT optimized for data mining
‣ Note: the following slides largely come from @nk’s more complete talk at http://www.slideshare.net/nkallen/q-con-3770885
121. FlockDB
‣ Realtime, distributed social graph store
‣ NOT optimized for data mining
‣ Who follows who (nearly 8 orders of magnitude!)
‣ Intersection/set operations
‣ Cardinality
‣ Temporal index
122. Set operations?
‣ This tweet needs to be delivered to people who follow both @aplusk (4.7M followers) and @foursquare (53K followers)
123. Original solution
‣ MySQL table of (source_id, destination_id) rows, e.g.:
      source_id   destination_id
      20          12
      29          12
      34          16
‣ Indices on source_id and destination_id
‣ Couldn’t handle write throughput
‣ Indices too large for RAM
124. Next Try
‣ MySQL still
‣ Denormalized
‣ Byte-packed
‣ Chunked
‣ Still temporally ordered
125. Next Try
‣ Problems
‣ O(n) deletes
‣ Data consistency challenges
‣ Inefficient intersections
‣ All of these manifested strongly for huge users like @aplusk or @lancearmstrong
126. FlockDB
‣ MySQL still underneath (like PNUTS from Y!)
‣ Partitioned by user_id; Gizzard handles sharding/partitioning
‣ Edges stored in both directions, indexed by (src, dest)
‣ Denormalized counts stored

      Forward                                 Backward
      source_id  destination_id  updated_at   destination_id  source_id  updated_at
      20         12              20:50:14     12              20         20:50:14
      20         13              20:51:32     12              32         20:51:32
      20         16                           12              16
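A toy model of why both directions are stored: with a backward (follower) index per user, the slide-122 delivery question becomes a set intersection. Hypothetical code, not FlockDB's:

    # Toy in-memory version of the forward/backward edge indexes above.
    from collections import defaultdict

    forward = defaultdict(set)   # source_id -> users they follow
    backward = defaultdict(set)  # destination_id -> their followers

    def follow(src, dst):
        forward[src].add(dst)
        backward[dst].add(src)

    for src, dst in [(20, 12), (29, 12), (34, 16), (29, 16)]:
        follow(src, dst)

    # Deliver only to people who follow BOTH user 12 and user 16
    # (the @aplusk-and-@foursquare case): an intersection of follower sets.
    print(backward[12] & backward[16])  # {29}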
130. FlockDB Timings
‣ Counts: 1ms
‣ Temporal Query: 2ms
‣ Writes: 1ms for journal, 16ms for durability
‣ Full walks: 100 edges/ms
131. FlockDB is Open Source
‣ We will maintain a community at
‣ http://www.github.com/twitter/flockdb
‣ http://www.github.com/twitter/gizzard
‣ See Nick Kallen’s QCon talk for more
‣ http://www.slideshare.net/nkallen/q-con-3770885
138. Cassandra
‣ Why Cassandra, for Twitter?
‣ Old/current: vertically, horizontally partitioned MySQL
‣ All kinds of caching layers, all application managed
‣ Alter table impossible, leads to bitfields, piggyback tables
‣ Hardware intensive, error prone, etc
‣ Not to mention, we hit MySQL write limits sometimes
‣ First goal: move all tweets to Cassandra
143. Cassandra
‣ Why Cassandra, for Twitter?
‣ Decentralized, fault-tolerant
‣ Flexible schema
‣ Elastic
‣ High write throughput
‣ First goal: move all tweets to Cassandra
148. Eventually Consistent?
‣ Twitter is already eventually consistent
‣ Your system may be even worse
‣ Ryan’s new term: “potential consistency”
‣ Do you have write-through caching?
‣ Do you ever have MySQL replication failures?
‣ There is no automatic consistency repair there, unlike Cassandra
‣ http://www.slideshare.net/ryansking/scaling-twitter-with-cassandra
153. Rolling out Cassandra
‣ 1. Integrate Cassandra alongside MySQL
‣ 100% reads/writes to MySQL
‣ Dynamic switches for % dark reads/writes to Cassandra
‣ 2. Turn up traffic to Cassandra
‣ 3. Find something that’s broken, set switch to 0%
‣ 4. Fix it
‣ 5. GOTO 2
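The loop above, as a hypothetical sketch (the store calls are stand-in stubs, not Twitter's client code, and DARK_READ_PCT is an invented name for the dynamic switch): MySQL stays the source of truth while a tunable fraction of reads is mirrored to Cassandra and the mirrored result is thrown away, so breakage never reaches users.

    # Hypothetical dark-read switch; mysql_read/cassandra_read are stubs.
    import random

    DARK_READ_PCT = 0.10  # the runtime switch: set to 0.0 when step 3 hits

    def mysql_read(tweet_id):      # stub for the authoritative store
        return {"id": tweet_id, "text": "..."}

    def cassandra_read(tweet_id):  # stub for the store being burned in
        return {"id": tweet_id, "text": "..."}

    def read_tweet(tweet_id):
        result = mysql_read(tweet_id)  # users always get the MySQL answer
        if random.random() < DARK_READ_PCT:
            try:
                shadow = cassandra_read(tweet_id)  # mirrored, result discarded
                if shadow != result:
                    print("dark-read mismatch:", tweet_id)
            except Exception as exc:
                print("dark-read failure:", tweet_id, exc)
        return result

    print(read_tweet(42))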
154. Cassandra for Realtime Analytics
‣ Starting a project around realtime analytics
‣ Cassandra as the backing store
‣ Using, developing, testing Digg’s atomic incr patches
‣ More soon.
155. That was a lot of slides
‣ Thanks for sticking with me.
156. Questions? Follow me at twitter.com/kevinweil