Building a backend for a successful social game is always challenging: it needs to serve over 1 million users per day, who generate 10,000 HTTP requests per second or more, and the vast majority of those requests change persistent state. With a conventional technology stack, that leads to over 50,000 database writes per second. Over the last two years, half a dozen teams at Wooga have set out to build backends for social games, each trying to improve on previous solutions. Each team was able to leverage the experience gained by other teams but was free to choose its own technology stack and hosting environment. The teams also operated their games themselves, DevOps style. This talk traces that evolution of backends: starting out with a simple LAMP stack, first replacing PHP with Ruby, then replacing relational databases with NoSQL databases, and ending up with stateful application servers built on Erlang/OTP, and more. We will discuss the limitations and problems we faced in live operation and show how later teams improved on the overall design.
3. Our games all look the same
Flash client Backend
Wednesday, March 7, 2012
4. Our games all look the same
Flash client
Game Session
Asynch. Communication
5. Our games all look the same
Backend
State Changes
Validation
Persistence
9. But the scale is interesting
14 billion requests / month
>100,000 DB operations / second
>50,000 DB updates / second
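A quick back-of-the-envelope check of the monthly figure, as a minimal sketch. The 30-day month and the helper name `average_per_second` are assumptions made here for illustration; the slides only give the monthly total.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_per_second(requests_per_month: float, days_per_month: int = 30) -> float:
    """Average request rate implied by a monthly total (assumes a 30-day month)."""
    return requests_per_month / (days_per_month * SECONDS_PER_DAY)


avg_rps = average_per_second(14e9)
print(f"average requests / second: {avg_rps:,.0f}")
```

Averaged over the whole month this comes to roughly 5,400 requests per second. Traffic is not flat across the day, so peaks of 10,000 requests per second or more are plausible on top of that average.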
10. 2 Developers to do it all
Typical team setup
4 product managers
4 artists
4 frontend engineers
2 backend engineers
- design, implementation, operation
16. Oct 2009 Jan 2010
Oct 2010
17. Oct 2009 Jan 2010
Oct 2010 Aug 2011
18. Architecture Evolution at Wooga
The Start
The Next Step
Best of Two Worlds
Company Values
19. Oct 2009: 1st team wanted good code quality
Good code quality
Easy to understand
Easy to test
Easy to refactor
21. Evolution I: Use Ruby (on Rails)
Oct 2009
22. A basic setup using sharding worked fine
[Diagram: 1 load balancer (lb), 9 app servers, 2 MySQL servers, 2 slaves]
23. 250K daily users
Life was good
[Chart: daily users, Apr 2010 to Oct 2011; y-axis 0 to 2,000,000]
24. 250K daily users
Life was good NO MORE
[Chart: daily users, Apr 2010 to Oct 2011; y-axis 0 to 2,000,000]
29. Welcome to 6 weeks of pain!
Heavy optimizations were necessary
Numerous small fixes regarding DB config
More shards
Even more shards
Splitting the model to get more shards
30. Early sharding hell: 8 masters and 8 slaves
[Diagram: 1 load balancer (lb), 18 app servers, 8 MySQL masters, 8 slaves]
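Why "more shards" hurts can be illustrated with a small sketch. It assumes users are assigned to shards by `user_id % shard_count`; the slides do not say how the team actually picked a shard, so the scheme and the function names below are hypothetical.

```python
def shard_for(user_id: int, shard_count: int) -> int:
    """Pick a shard by simple modulo (an assumed scheme, for illustration only)."""
    return user_id % shard_count


def moved_fraction(user_ids, old_count: int, new_count: int) -> float:
    """Fraction of users whose shard changes when the shard count changes."""
    user_ids = list(user_ids)
    moved = sum(
        1 for u in user_ids
        if shard_for(u, old_count) != shard_for(u, new_count)
    )
    return moved / len(user_ids)


users = range(100_000)
frac_2_to_4 = moved_fraction(users, 2, 4)
frac_4_to_8 = moved_fraction(users, 4, 8)
print(frac_2_to_4, frac_4_to_8)
```

Under this assumed scheme, every doubling of the shard count relocates about half of all users' rows, which is one reason repeated resharding on a live system is painful.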
31. At 500K daily users we were at a dead end
[Chart: daily users, Apr 2010 to Oct 2011; y-axis 0 to 2,000,000]
33. Jan 2010: Meanwhile at the 2nd team
Don’t break the bank
Make it faster
Make it cheaper
Make it simpler
35. Evolution II: Use Redis as main database
Jan 2010
Oct 2009
36. If MySQL is a truck
Fast enough
Disk based
Robust
37. If MySQL is a truck, Redis is a race car
Super fast
RAM based
Fragile
38. Bare metal for low latency!
[Diagram: 1 load balancer (lb), 7 app servers, 2 Redis servers, disk (S3)]
39. How could we apply that knowledge?
[Chart: daily users, Apr 2010 to Oct 2011; y-axis 0 to 2,000,000]
48. Big and static data in MySQL, rest goes to Redis
MySQL: 256 GB data, 10% writes
Redis: 60 GB data, 50% writes
Photo: http://www.flickr.com/photos/erix/245657047/
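The split can be pictured as a thin routing layer in front of two stores. This is only a sketch: both backends are plain dictionaries standing in for MySQL and Redis, and the class name `HybridStore` and the data kinds `message_archive` and `farm_state` are made up for illustration.

```python
class HybridStore:
    """Routes each kind of data to one of two backends.

    Both backends are plain dicts standing in for MySQL and Redis;
    no real database client is used in this sketch.
    """

    def __init__(self, static_kinds):
        self.static_kinds = set(static_kinds)  # kinds treated as big and static
        self.mysql_like = {}   # stand-in for the disk-based store
        self.redis_like = {}   # stand-in for the RAM-based store

    def backend_for(self, kind):
        """Big, static kinds go to the disk-based store, everything else to RAM."""
        return self.mysql_like if kind in self.static_kinds else self.redis_like

    def put(self, kind, user_id, value):
        self.backend_for(kind)[(kind, user_id)] = value

    def get(self, kind, user_id):
        return self.backend_for(kind).get((kind, user_id))


store = HybridStore(static_kinds={"message_archive"})  # hypothetical kind
store.put("message_archive", 42, ["welcome!"])
store.put("farm_state", 42, {"coins": 10})
```

The point of the split is that the write-heavy, smaller data set fits in RAM, while the large, rarely written data stays on disk.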
49. One team saved the other one
[Chart: daily users, Apr 2010 to Oct 2011; y-axis 0 to 2,000,000]
52. We now have more than 2 million users / day
[Chart: daily users, Apr 2010 to Oct 2011; y-axis 0 to 2,000,000; annotation: AWS outage in Ireland]
53. 10 single points of failure - no fun at all!
[Diagram: 2 load balancers (lb), 39 app servers, 5 MySQL servers, 5 Redis servers, 10 slaves]
73. Stateful servers are not as hard as you think
[Diagram: one server holding four sessions, with S3 behind it]
79. Stateful servers are not as hard as you think
[Diagram: three servers, each holding four sessions, with S3 behind them]
81. With a stateful server the DB is used less
[Bar chart: database operations / sec, Ruby stateless vs. Erlang stateful; y-axis 0 to 30,000; one bar labeled 700]
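A toy comparison of why holding the session in memory cuts database traffic. This is a sketch, not the team's implementation: `CountingStore` is a hypothetical stand-in for the persistent store, the "game logic" just increments a counter, and periodic snapshots or crash recovery are not modeled.

```python
class CountingStore:
    """In-memory stand-in for a persistent store that counts its operations."""

    def __init__(self):
        self.data = {}
        self.operations = 0

    def load(self, user_id):
        self.operations += 1
        return dict(self.data.get(user_id, {"coins": 0}))

    def save(self, user_id, state):
        self.operations += 1
        self.data[user_id] = dict(state)


def stateless_request(store, user_id):
    """Stateless server: every request loads and saves the state."""
    state = store.load(user_id)
    state["coins"] += 1
    store.save(user_id, state)


class Session:
    """Stateful server: load once, keep state in memory, save on close."""

    def __init__(self, store, user_id):
        self.store = store
        self.user_id = user_id
        self.state = store.load(user_id)

    def handle_request(self):
        self.state["coins"] += 1

    def close(self):
        self.store.save(self.user_id, self.state)


stateless_store = CountingStore()
for _ in range(100):
    stateless_request(stateless_store, user_id=1)

stateful_store = CountingStore()
session = Session(stateful_store, user_id=1)
for _ in range(100):
    session.handle_request()
session.close()

print(stateless_store.operations, stateful_store.operations)
```

In this toy model 100 requests cost 200 store operations without a session and 2 with one, while both end in the same persisted state.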
83. Deploying with a stateful server
In order to bring up a new version
Just deploy it
Hot code replacement is great!
86. There are even more advantages
Faster than Ruby (5,000 rps / node)
- CPU bound
Very few SPOFs
- ... and those are easy to recover
Transactional logic
- Invariants instead of explicit error handling
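One possible reading of "invariants instead of explicit error handling", sketched under assumptions: the action mutates a copy of the state and states what must hold afterwards; if an invariant fails, the copy is discarded and the old state stays in place. The names `require`, `apply_action`, and `buy_tree` are hypothetical, and Python stands in for the Erlang code the talk refers to.

```python
import copy


class InvariantViolation(Exception):
    """Raised when a state invariant does not hold after an action."""


def require(condition, message):
    if not condition:
        raise InvariantViolation(message)


def buy_tree(state, price):
    """Game logic without defensive checks up front, only an invariant."""
    state["coins"] -= price
    state["trees"] += 1
    require(state["coins"] >= 0, "coins must not go negative")


def apply_action(state, action, *args):
    """Run the action on a copy; keep the old state if an invariant fails."""
    candidate = copy.deepcopy(state)
    try:
        action(candidate, *args)
    except InvariantViolation:
        return state, False
    return candidate, True


state = {"coins": 10, "trees": 0}
state, ok_first = apply_action(state, buy_tree, 8)
state, ok_second = apply_action(state, buy_tree, 8)
print(state, ok_first, ok_second)
```

The second purchase fails its invariant, so the state after the first purchase is kept unchanged, which gives all-or-nothing behavior per request.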
100. Architecture Evolution at Wooga
The Start: Ruby
The Next Step: Erlang
Best of Two Worlds
Company Values
102. Aug 2011: 4th team wanted both
Erlang is great
Concurrency, robustness
Great for operation
Ruby is great
Concise, expressive, testable
Great for development
105. Evolution IV: The best out of two worlds
Aug 2011
Oct 2010
Jan 2010
Oct 2009
106. The basic setup looks exactly like before
[Diagram: three servers, each holding four sessions, with S3 behind them]
118. Example model in Ruby
Easily unit testable
Minimal amount of code
122. Bringing 2 worlds together
[Diagram: one server with sessions, a sender, and a receiver, next to five workers]
127. Game state
Game state is split into multiple parts
user, map, fruit_trees etc.
Erlang does not care about content
Serialized Ruby objects
Erlang does know mapping of state parts to URLs
Mapping provided by Ruby on startup
133. Looking back at the game action
Mapping of state parts to game actions
Worker knows mapping
Worker pushes mapping to Erlang on startup
Erlang can query mapping if needed
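The division of labor described above can be sketched as follows. Everything here is an illustrative assumption: Python stands in for both the Ruby worker and the Erlang session, JSON stands in for serialized Ruby objects, and the action name `harvest_tree` and the mapping `ACTION_PARTS` are hypothetical.

```python
import json

# Hypothetical mapping the worker would push on startup:
# game action -> names of the state parts it needs.
ACTION_PARTS = {"harvest_tree": ["user", "fruit_trees"]}


def harvest_tree(parts):
    """Game logic, which lives on the worker side."""
    parts["fruit_trees"]["ripe"] -= 1
    parts["user"]["fruit"] += 1
    return parts


HANDLERS = {"harvest_tree": harvest_tree}


def worker_handle(action, blobs):
    """Worker side: deserialize parts, run game logic, serialize the results."""
    parts = {name: json.loads(blob) for name, blob in blobs.items()}
    updated = HANDLERS[action](parts)
    return {name: json.dumps(value) for name, value in updated.items()}


class Session:
    """Session side: stores opaque blobs and never inspects their content."""

    def __init__(self, blobs, action_parts):
        self.blobs = dict(blobs)
        self.action_parts = action_parts

    def handle(self, action):
        # Send only the parts this action is mapped to.
        needed = {name: self.blobs[name] for name in self.action_parts[action]}
        self.blobs.update(worker_handle(action, needed))
        return sorted(needed)


session = Session(
    {
        "user": json.dumps({"fruit": 0}),
        "map": json.dumps({"items": []}),
        "fruit_trees": json.dumps({"ripe": 3}),
    },
    ACTION_PARTS,
)
sent = session.handle("harvest_tree")
print(sent)
```

Because the session only needs the mapping, it can forward the right parts without understanding them, and the `map` part is never sent for this action.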
134. NICE!
http://www.flickr.com/photos/aigle_dore/
135. Architecture Evolution at Wooga
The Start: Ruby
The Next Step: Erlang
Best of Two Worlds
Company Values
136. Each new game brought us innovation
Aug 2011
Oct 2010
Jan 2010
Oct 2009
137. We’ve learned to value
Small teams
over
big teams
138. We’ve learned to value
Collaboration
over
competition
139. We’ve learned to value
Generalists
over
specialists
140. We’ve learned to value
Effort reduction
over
cost reduction
141. We’ve learned to value
Innovation
over
risk mitigation
142. A good value system
We’ve learned to value
Small teams over Big teams
Collaboration over Competition
Generalists over Specialists
Effort reduction over Cost reduction
Innovation over Risk mitigation