98. Getting Records
Note: prices (high write volume data) come from elsewhere, not the SQL db.

ids = redis_sort_results.map { |id| id.to_i }
bonds = Bond.find(ids)
bond_ids_to_bond = {}
bonds.each do |bond|
  bond_ids_to_bond[bond.id] = bond
end
results = ids.map do |id|
  bond_ids_to_bond[id]
end
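Bond.find doesn't guarantee the Redis sort order, which is why the lookup hash above rebuilds it. If ActiveSupport is available (an assumption, not shown on the slide), the same mapping can be written more compactly:

# same lookup table and ordered results, via ActiveSupport's index_by
bond_ids_to_bond = Bond.find(ids).index_by(&:id)
results = ids.map { |id| bond_ids_to_bond[id] }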
99. Getting From Redis
redis.hset("bonds|2", "values", data.to_json)
raw_json = redis.sort("bond_ids", However, then you have
to worry about keeping
the t wo data stores in
:get => "bonds|*->bid_price", sync. We’ll talk about it
later
:get => "bonds|*->values")
results = raw_json.map do |json|
DataObject.new(JSON.parse(json))
end
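A minimal sketch of one way to keep the two stores in sync, using an ActiveRecord save callback (the class shape and the values_for_redis helper are assumptions, not from the slides):

class Bond < ActiveRecord::Base
  after_save :write_to_redis

  # mirror the row into the Redis hash whenever it changes
  def write_to_redis
    $redis.hset("bonds|#{id}", "values", values_for_redis.to_json)
  end
end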
105. Use a List
O(1) to add
O(start + n) to read a range
N = 500
size = redis.lpush("bond_trades|1", trade_id)
# roll the index
redis.rpop("bond_trades|1") if size > N
# get results
redis.lrange("bond_trades|1", 0, 49)
107. Using a List
redis.lpush("bond_trades|1|2011-05-19-10",
trade_id)
redis.lrange("bond_trades|1|2011-05-19-10",
0, -1)
results = redis.pipelined do
redis.lrange("bond_trades|1|2011-05-19-10",
0, -1)
redis.lrange("bond_trades|1|2011-05-19-09",
0, -1)
end.flatten
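If more hourly buckets are needed, the same pipelined read can be built programmatically; a sketch (the six-hour window and the loop are assumptions, the key format follows the slide):

# read the last 6 hourly buckets for bond 1 in one round trip
hours = (0...6).map { |n| (Time.now - n * 3600).strftime("%Y-%m-%d-%H") }
results = redis.pipelined do
  hours.each do |hour|
    redis.lrange("bond_trades|1|#{hour}", 0, -1)
  end
end.flatten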
108. Rolling the Index
# when something trades
redis.sadd("bonds_traded|2011-05-19-10", bond_id)

# cron task to remove old data
traded_ids = redis.smembers("bonds_traded|2011-05-19-10")
keys = traded_ids.map do |id|
  "bond_trades|#{id}|2011-05-19-10"
end
keys << "bonds_traded|2011-05-19-10"
redis.del(*keys)
109. Using a Sorted Set
O(log(n)) writes
O(log(n) + M) reads

# time-based rolling index using a sorted set
redis.zadd("bond_trades|1", Time.now.to_i, trade_id)
# last 20 trades
redis.zrevrange("bond_trades|1", 0, 19)
# trades in the last hour
redis.zrevrangebyscore("bond_trades|1", "+inf", 1.hour.ago.to_i)
110. Rolling the Index
# cron task to roll the index
bond_ids = redis.smembers("bond_ids")
remove_since_time = 24.hours.ago.to_i
redis.pipelined do
  bond_ids.each do |id|
    redis.zremrangebyscore("bond_trades|#{id}", "-inf", remove_since_time)
  end
end
111. Or Roll on Read or Write
redis.zadd("bond_trades|1", Time.now.to_i, trade_id)
redis.zremrangebyscore("bond_trades|1", "-inf", 24.hours.ago.to_i)
112. Indexing N Values
redis.zadd("highest_follower_counts", 2300, 20)
redis.zadd("lowest_follower_counts", 2300, 20)
# rolling the indexes
# keep the lowest
size = redis.zcard("lowest_follower_counts")
redis.zremrangebyrank("lowest_follower_counts",
N, -1) if size > N
# keep the highest
size = redis.zcard("highest_follower_counts")
redis.zremrangebyrank("highest_follower_counts",
0, size - N) if size > N
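Reading the capped indexes back is just a range query; a small sketch, assuming the member is an id and the score a follower count:

# top 10 by follower count (highest scores first)
top = redis.zrevrange("highest_follower_counts", 0, 9, :with_scores => true)
# bottom 10 (lowest scores first)
bottom = redis.zrange("lowest_follower_counts", 0, 9, :with_scores => true)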
126. Restore Each Index
With a list you have to run this while not writing new data; a set can be made to run while writing new data.

time_int = redis.get("last_bond_trade_indexed").to_i
index_time = Time.at(time_int)
trades = Trade.where(
  "created_at > :index_time AND created_at <= :now",
  {:index_time => index_time, :now => Time.now})
trades.each do |trade|
  trade.index_in_redis
end
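To make the catch-up restartable, one approach (an assumption, not shown on the slide) is to capture the upper bound before querying and advance the checkpoint afterwards:

now = Time.now
trades = Trade.where(
  "created_at > :index_time AND created_at <= :now",
  {:index_time => index_time, :now => now})
trades.each { |trade| trade.index_in_redis }
# the next run picks up from here
redis.set("last_bond_trade_indexed", now.to_i)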
129. Once keys are sharded, set intersection, union, and diff won't work across servers.
SORT won't work unless all those keys fall on the same server.
Easy to scale (consistent hashing).
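A rough sketch of routing keys to shards; this uses a plain modulo hash rather than true consistent hashing and is purely illustrative (hosts and the shard_for helper are assumptions):

require "zlib"
require "redis"

SHARDS = [Redis.new(:host => "redis1"),
          Redis.new(:host => "redis2"),
          Redis.new(:host => "redis3")]

# pick a shard by hashing the key; SORT, SINTER, SUNION, and SDIFF
# only work when every key they touch lands on the same shard
def shard_for(key)
  SHARDS[Zlib.crc32(key) % SHARDS.size]
end

shard_for("bond_trades|1").lpush("bond_trades|1", trade_id)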
133. Index the minimum to keep the memory footprint down.
Use rolling indexes; don't keep more shit in memory than you need.
Users won't page through 20 pages of results, so don't store that many.
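For example, a per-bond trade list can be trimmed on every write so it never grows past what users will actually look at; a minimal sketch (the cap of 200 is an arbitrary assumption):

# keep only the newest 200 trades for this bond
redis.lpush("bond_trades|1", trade_id)
redis.ltrim("bond_trades|1", 0, 199)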