4. 4
Facts
• Company founded in 2001
• 350+ employees worldwide
• 170M unique visitors per month
• 100K unique visitors per month on spilgames.com
5. 5
Geographic Reach
170 Million Monthly Active Users (*)
(*) Source: Google Analytics, December 2011
• Over 40 localized portals in 19 languages
• Focus on casual and social games
• 170M MAU (30M YoY growth)
• Over 40M registered users
15. 15
Startup
• Databases maintained by Systems Engineering
• No focus on performance, structure or backups
• Looking only one or two weeks into the future
16. 16
Lessons
• Multiple migrations to new hardware
• Ping-pong on Master-Master setups
• Lack of insight into performance issues
[Diagram label: Read+write]
17. 17
Professionalize
• Plan ahead up to three months
• Improve database platform
• Reduce number of repetitive tasks
  • Write them down step by step (wiki)
  • Automate where possible
• Improve monitoring
  • A single monitoring system is not enough!
• Forecast growth
  • Week / Month / Year
  • Look back and evaluate!
• Extend department
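As an illustration of the "Forecast growth" bullet, here is a minimal sketch of two simple projections: constant absolute growth and constant percentage growth. The function names are hypothetical, and the input figures (170M MAU, 30M YoY growth) are borrowed from the Geographic Reach slide only as sample data; the deck does not say which forecasting method was actually used.

```python
def yoy_rate(current: float, yearly_growth: float) -> float:
    """Relative growth over the past year, given the absolute increase."""
    previous = current - yearly_growth
    return yearly_growth / previous


def forecast_linear(current: float, yearly_growth: float, years: int) -> list:
    """Assume the same absolute increase every year."""
    return [current + yearly_growth * y for y in range(1, years + 1)]


def forecast_compound(current: float, rate: float, years: int) -> list:
    """Assume the same percentage increase every year."""
    return [current * (1 + rate) ** y for y in range(1, years + 1)]


# Sample data (millions of monthly active users).
linear = forecast_linear(170, 30, 3)
compound = forecast_compound(170, yoy_rate(170, 30), 3)
```

Comparing such a projection with the real numbers afterwards is the "Look back and evaluate!" step.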
18. 18
LDAP isn’t suitable for the web
• Scaling the LDAP platform
• LDAP replaced by MySQL-based solution (with help from Percona)
19. 19
Cloning
• LVM snapshot method
  • Took 4 hours on average with manual intervention
• Innobackupex + netcat + tar + script = quick cloning
  • Takes about 1 hour per 100GB
  • Foolproof
  • Can be run on active masters (if necessary)
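The "Innobackupex + netcat + tar + script" recipe can be sketched as a small helper that builds the two shell pipelines and estimates the duration from the "1 hour per 100GB" figure. This is an assumption about how such a script might look, not the script used at Spil Games; the host name, port and data directory are hypothetical, and the `nc -l PORT` form depends on the netcat variant installed (some need `nc -l -p PORT`).

```python
import shlex


def clone_commands(target_host: str, port: int, target_datadir: str):
    """Return (receiver_cmd, sender_cmd) as shell strings.

    Receiver runs on the new machine, sender on the source database host.
    """
    # tar needs -i to read the tar stream produced by innobackupex.
    receiver = f"nc -l {port} | tar -xi -C {shlex.quote(target_datadir)}"
    # --stream=tar writes the backup to stdout instead of a directory.
    sender = f"innobackupex --stream=tar ./ | nc {shlex.quote(target_host)} {port}"
    return receiver, sender


def estimated_hours(size_gb: float, hours_per_100gb: float = 1.0) -> float:
    """Rough duration based on the slide's '1 hour per 100GB' figure."""
    return size_gb / 100.0 * hours_per_100gb


receiver, sender = clone_commands("new-db-host", 9999, "/var/lib/mysql")
print(receiver)
print(sender)
print(estimated_hours(250), "hours for 250GB")
```

The restored copy still has to be prepared with `innobackupex --apply-log` before MySQL can start on it.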
20. 20
Improve monitoring
• Different monitoring systems give different insights
  • Different angles/metrics/purposes
• Early problem detection
  • Signal abnormal use which could cause outage
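One way to "signal abnormal use" is to compare the latest value of a metric with its recent history. The sketch below flags a value that lies more than `k` standard deviations from the mean; the rule, the threshold and the sample numbers are illustrative assumptions, not what the monitoring systems in the talk actually do.

```python
from statistics import mean, stdev


def is_abnormal(history, latest, k=3.0):
    """Flag `latest` if it deviates more than k standard deviations
    from the mean of `history` (needs at least two samples)."""
    if len(history) < 2:
        return False
    mu = mean(history)
    sigma = stdev(history)
    return abs(latest - mu) > k * sigma


# Hypothetical samples, e.g. connections per minute.
recent = [100, 102, 98, 101, 99]
print(is_abnormal(recent, 103))
print(is_abnormal(recent, 150))
```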
21. 21
Growing pains
• Uneven growth:
  • Active master handling all write requests
  • Vertical scaling
• Write only
  • More writes than reads
• SOA problems
  • Connection spawning
  • Open file descriptors
22. 22
Outgrowing our startup phase
• Anticipate more than one year in advance
• Acknowledge shortcomings/problems, look for solutions or alternatives
• Don't commit to one single solution!
• Be flexible!
• Plan for capacity per instance, not for growth alone!
• Start thinking globally!
25. 25
Functional sharding
• Natural growth
  • Grown out of necessity for more functionality
  • Adding functionality means more interaction
• Separation of database function
  • Profiles
  • Highscores
  • Comments
  • User Generated Content
  • etc.
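Functional sharding as described here amounts to a routing table from function to database. A minimal sketch, with hypothetical host names:

```python
# Hypothetical mapping: one database (cluster) per function.
FUNCTION_TO_DATABASE = {
    "profiles": "db-profiles",
    "highscores": "db-highscores",
    "comments": "db-comments",
    "user_generated_content": "db-ugc",
}


def database_for(function: str) -> str:
    """Return the database that owns all data for the given function."""
    try:
        return FUNCTION_TO_DATABASE[function]
    except KeyError:
        raise ValueError(f"no database configured for function {function!r}")


print(database_for("highscores"))
```

Every query for one function lands on the same database, so a function that grows faster than the others cannot be spread over more machines this way.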
26. 26
Advantages
• KISS
• Problem isolation
Disadvantages
• Uneven growth
• Difference in query patterns
• No data consistency
• No clear ownership of data
• Capacity planning on total number of reads/writes
• Horizontal scaling is difficult
28. 28
Bucket model
• What is the bucket model?
  • It is an abstraction layer between the database and the data model
  • Each record has one unique owner attribute (GID)
  • The GID (Global IDentifier) identifies different data types
  • Different buckets per function
  • Attributes contain record data
  • Attributes do not have to correspond to schema
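A minimal sketch of the bucket model as a data structure: records are owned by a GID, grouped in one bucket per function, and carry free-form attributes. The class and field names are hypothetical, and storing exactly one record per GID per bucket is a simplification made for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Record:
    gid: int                      # the unique owner attribute
    bucket: str                   # the function, e.g. "profile"
    attributes: Dict[str, Any] = field(default_factory=dict)  # schema-free data


class BucketStore:
    """In-memory stand-in for the layer between data model and database."""

    def __init__(self) -> None:
        self._buckets: Dict[str, Dict[int, Record]] = {}

    def put(self, record: Record) -> None:
        self._buckets.setdefault(record.bucket, {})[record.gid] = record

    def get(self, bucket: str, gid: int) -> Optional[Record]:
        return self._buckets.get(bucket, {}).get(gid)


store = BucketStore()
store.put(Record(gid=42, bucket="profile", attributes={"nickname": "alice"}))
# A new attribute needs no schema change in this layer.
store.put(Record(gid=42, bucket="highscore", attributes={"game": "x", "score": 9000}))
```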
29. 29
Advantages
• Flexibility
• Database backend independent
• Seamless schema changes and upgrades
• Sharded on both functional and GID level
• Even distribution of queries possible
• Capacity planning on number and type of entities
• Asynchronous writes possible
• Transparent data migration
Disadvantages
• Harder to find data
• At least two lookups needed!
• Data warehousing needs a different approach
30. 30
How do GIDs work?
• Globally sharded on GID
• (local) GID Lookup
[Diagram labels: GID lookup, Shard 1, Shard 2, "Persis stora" (truncated)]
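The GID lookup can be pictured as a directory that maps each GID to a shard, which is why at least two lookups are needed per read. The sketch below uses in-memory dictionaries as stand-ins for the lookup table and the shards; all names are hypothetical and this is not the actual implementation.

```python
class GidDirectory:
    """Maps each GID to a shard and reads/writes through that mapping."""

    def __init__(self, shard_names):
        self._shards = {name: {} for name in shard_names}
        self._gid_to_shard = {}

    def assign(self, gid, shard):
        if shard not in self._shards:
            raise KeyError(f"unknown shard: {shard}")
        self._gid_to_shard[gid] = shard

    def shard_of(self, gid):
        return self._gid_to_shard[gid]

    def write(self, gid, data):
        self._shards[self._gid_to_shard[gid]][gid] = data

    def read(self, gid):
        shard = self._gid_to_shard[gid]   # lookup 1: which shard owns the GID
        return self._shards[shard][gid]   # lookup 2: the data itself

    def migrate(self, gid, new_shard):
        """Move a GID's data and repoint the lookup; callers are unaffected."""
        if new_shard not in self._shards:
            raise KeyError(f"unknown shard: {new_shard}")
        old_shard = self._gid_to_shard[gid]
        if gid in self._shards[old_shard] and old_shard != new_shard:
            self._shards[new_shard][gid] = self._shards[old_shard].pop(gid)
        self._gid_to_shard[gid] = new_shard


directory = GidDirectory(["shard1", "shard2"])
directory.assign(42, "shard1")
directory.write(42, {"nickname": "alice"})
directory.migrate(42, "shard2")
```

Because callers only ever ask the directory, moving a GID to another shard does not change how it is read.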
33. 33
Master-Master Sharding
• Each cluster of two masters will contain two shards
• Data is written interleaved
• HA for both shards
• No warmup needed
  • Both masters active and “warmed up”
• Slave added for backups and Datawarehouse
[Diagram labels: SSP, Shard 1, Shard 2]
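One possible reading of "two shards per cluster of two masters" is that each master takes the writes for one shard and stands by for the other. The sketch below routes writes on that assumption and fails over when a master is marked down; the class, host names and shard names are hypothetical.

```python
class MasterMasterCluster:
    """Two masters, two shards; each master is preferred for one shard."""

    def __init__(self, master_a: str, master_b: str):
        self.masters = (master_a, master_b)
        # Assumed layout: shard1 is written on master A, shard2 on master B.
        self.preferred = {"shard1": master_a, "shard2": master_b}
        self.down = set()

    def mark_down(self, master: str) -> None:
        self.down.add(master)

    def mark_up(self, master: str) -> None:
        self.down.discard(master)

    def master_for(self, shard: str) -> str:
        preferred = self.preferred[shard]
        if preferred not in self.down:
            return preferred
        a, b = self.masters
        other = b if preferred == a else a
        if other in self.down:
            raise RuntimeError("both masters are down")
        return other  # already active for its own shard, so already warm


cluster = MasterMasterCluster("master-a", "master-b")
print(cluster.master_for("shard1"), cluster.master_for("shard2"))
```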
34. 34
How are we implementing this SSP?
• Erlang cluster with many workers
• Every GID has its own worker process
• (Inter)cluster communication
• (Near) linear scalability
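The "one worker per GID" idea can be mimicked in Python with a registry that creates a worker with its own mailbox on first use. This only illustrates the structure; it is single-threaded, has none of Erlang's process isolation or distribution, and all names are hypothetical.

```python
from collections import deque


class GidWorker:
    """Owns the state of one GID and handles its messages in order."""

    def __init__(self, gid):
        self.gid = gid
        self.mailbox = deque()
        self.state = {}

    def send(self, key, value):
        self.mailbox.append((key, value))

    def run(self):
        """Process all queued messages; return how many were handled."""
        handled = 0
        while self.mailbox:
            key, value = self.mailbox.popleft()
            self.state[key] = value
            handled += 1
        return handled


class WorkerRegistry:
    """Creates a worker the first time a GID is seen, then reuses it."""

    def __init__(self):
        self._workers = {}

    def worker_for(self, gid):
        if gid not in self._workers:
            self._workers[gid] = GidWorker(gid)
        return self._workers[gid]


registry = WorkerRegistry()
worker = registry.worker_for(42)
worker.send("nickname", "alice")
worker.send("score", 9000)
handled = worker.run()
```

Since all messages for a GID pass through a single worker, updates to that GID are applied one at a time and in order.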
35. 35
Advantages
• Erlang node caching
• Multiple backend connectors
  • MySQL library
  • HandlerSocket
  • Any other connectors if needed
• Connection pooling
Disadvantages
• NOT SEXY? ( http://spil.com/notsexy )