This presentation will explain how Spil Games has grown in a short time from an internet startup to a global online gaming company and how we currently are building a global cross datacenter storage solution with MySQL as its backend.
The first part will contain a short summary of where we started with our database engineering department (look ahead at most one week in time), to a more professionalized department (look ahead and plan three to four months in time) to currently growing out of the startup phase (look ahead and plan more than one year in time). This will be illustrated with some examples of the growing pains we encountered with scaling, replication and high availability and leading up to the conclusion that we need to acknowledge our problems and shortcomings to actually be able to overcome them.
The second part of the presentation will contain a comparison of our old architecture against the new architecture. In this new architecture we take into account that failure of a complete datacenter is certain to occur sometime and strive to give our users the best possible experience, even in worst case when data is inaccessible. We also introduce asynchronous calls which enable us to fire and forget most of our writes. The architecture is being built with MySQL, handler sockets, Erlang and Memcache as its building blocks.
4. Facts
• Company founded in 2001
• 350+ employees world wide
• 170M unique visitors per month
• 100K unique visitors per month on spilgames.com
4
5. Geographic Reach
170 Million Monthly Active Users(*)
• Over 40 localized portals in 19 languages
• Focus on casual and social games
• 170M MAU per month (30M YoY growth)
• Over 40M registered users
Source: (*) Google Analytics, December 2011
5
15. Startup
• Databases maintained by Systems Engineering
• No focus on performance, structure or backups
• Looking only one or two weeks into the future
15
16. Lessons
• Multiple migrations to new hardware
• Ping-pong on Master-Master setups
• Lack of insight into performance issues
Read+write
16
17. Professionalize
• Plan ahead up to three months
• Improve database platform
• Reduce number of repetitive tasks
• Write them down step by step (wiki)
• Automate where possible
• Improve monitoring
• A single monitoring system is not enough!
• Forecast growth
• Week / Month / Year
• Look back and evaluate!
• Extend department
17
18. LDAP isn’t suitable for the web
• Scaling the LDAP platform
• LDAP replaced by MySQL based solution (with help
from Percona)
DMM MMM
18
19. Cloning
• LVM snapshot method
• took 4 hours on average with manual intervention
• Innobackupex + netcat + tar + script = quick cloning
• Takes about 1 hour per 100GB
• Foolproof
• Can be run on active masters (if necessary)
19
20. Improve monitoring
• Different monitoring systems give different insights
• Different angles/metrics/purposes
• Early problem detection
• Signal abnormal use which could cause outage
20
21. Growing pains
• Uneven growth:
• Active master handling all write requests
• Vertical scaling
• Write only
• More writes than reads
• SOA problems
• Connection spawning
• Open file descriptors
21
22. Outgrowing our startup phase
• Anticipate more than one year in advance
• Acknowledge shortcomings/problems, look for
solutions or alternatives
• Don't commit to one single solution!
• Be flexible!
• Plan for capacity per instance, not for growth alone!
• Start thinking globally!
22
25. Functional sharding
• Natural growth
• Grown out of necessity for more functionality
• Adding functionality means more interaction
• Separation of database function
• Profiles
• Highscores
• Comments
• User Generated Content
• etc
25
26. Advantages
• KISS
• Problem isolation
Disadvantages
• Uneven growth
• Difference in query patterns
• No data consistency
• No clear ownership of data
• Capacity planning on total number of reads/writes
• Horizontal scaling is difficult
26
28. Bucket model
• What is the bucket model?
• It is an abstraction layer between the database and
the datamodel
• Each record has one unique owner attribute (GID)
• The GID (Global IDentifier) identifies different data
types
• Different buckets per function
• Attributes contain record data
• Attributes do not have to correspond to schema
28
29. Advantages
• Flexibility
• Database backend independent
• Seamless schema changes and upgrades
• Sharded on both functional and GID level
• Even distribution of queries possible
• Capacity planning on number and type of entities
• Asynchronous writes possible
• Transparent data migration
Disadvantages
• Harder to find data
• At least two lookups needed!
• Datawarehousing needs a different approach
29
30. How do GIDs work?
• Globally sharded on GID
• (local) GID Lookup
GID
lookup
Persis
stor
Shard 1 Shard 2
30
32. Bucket mapping and migration
Current functional
shards
LEGACY
Legacy API
adapter
SPAPI
Read only
New GID based
New
SSP shards
Application
Read + write
32
33. Master-Master Sharding
• Each cluster of two masters will contain two shards
• Data is written interleaved
• HA for both shards
• No warmup needed
• Both masters active and “warmed up”
• Slave added for backups and Datawarehouse
SSP
Shard 1 Shard 2
33
34. How are we implementing this SSP?
• Erlang cluster with many workers
• Every GID has its own worker process
• (Inter)cluster communication
• (Near) linear scalability
34
35. Advantages
• Erlang node caching
• Multiple backend connectors
• MySQL library
• Handlersockets
• Any other connectors if needed
• Connection pooling
Disadvantages
• NOT SEXY? ( http://spil.com/notsexy )
35
Welcome everybody!My name is Art van Scheppingen and I’m from Spil Games from the Netherlands.I’m the head of the Database Engineering department.Not many of you have heard from Spil Games before, but I hope at the end of my presentation you all will have a good impression what this company is all about.
So what is my presentation all about?It will consist of three major parts:First I’m going to give you a small introduction on what Spil Games is doingIn the second part I will tell you about the growing pains we faced in my departmentIn the third part I will tell you a bit more on our exciting new project!At the end of the presentation there will be room for questions.
So who are we? Who is Spil Games?It would be best by telling you some facts about our company.
We do about 170M unique visitors per monthBut our corporate website spilgames.com receives about 100 thousand unique visitors per month, so that may be the reason our name is not widely known.
Where do these 170M users come from?World wide we have over 40 localized portals in 19 languagesOn these portals we mainly have casual and social gamesApart from the 170 million users we also have 40 million registered users who create a whole lot of interactions to our databases. Fortunately it is not all direct2db!So how do we keep that 170 million users interested?
We divided our portals into three major brands: Girls, Teens and FamilyOn the left you can see our Girls brand: it is pink, targeted at Girls in the age of 8 to 12 featuring mainly dress-up and cooking games. 60% of the girls between 8 and 12 in the US visit GGG.comIn the middle you see our Teens brand which is targeted at children in the age between 10 and 16. The games are targeted at this age groupOn the right you see our Family brand which is targeted at Families, mothers and children playing together. It therefore has the largest variety in games.The games on these portals are created in our own inhouse game studio, or licensed from partners or made together in a partnership. I will show some examples of the games featured on our portals
Our most popular social game in the United States is Pet Party were you can keep your own virtual pet.We also have other social games like Forest Story where you can run your own tribeOr racing games like Lose the Heat where you can race against others and try to lose the policeAn ever popular title is Uphill Rush which already received its third sequelOur most popular game ever is: Bubble Shooter. We have been featuring this game since the beginning.So we have inhouse game studios who create our own games, partnerships with social gaming studios and 1500 licensed games so far!
Our most popular social game in the United States is Pet Party were you can keep your own virtual pet.We also have other social games like Forest Story where you can run your own tribeOr racing games like Lose the Heat where you can race against others and try to lose the policeAn ever popular title is Uphill Rush which already received its third sequelOur most popular game ever is: Bubble Shooter. We have been featuring this game since the beginning.So we have inhouse game studios who create our own games, partnerships with social gaming studios and 1500 licensed games so far!
Our most popular social game in the United States is Pet Party were you can keep your own virtual pet.We also have other social games like Forest Story where you can run your own tribeOr racing games like Lose the Heat where you can race against others and try to lose the policeAn ever popular title is Uphill Rush which already received its third sequelOur most popular game ever is: Bubble Shooter. We have been featuring this game since the beginning.So we have inhouse game studios who create our own games, partnerships with social gaming studios and 1500 licensed games so far!
Our most popular social game in the United States is Pet Party were you can keep your own virtual pet.We also have other social games like Forest Story where you can run your own tribeOr racing games like Lose the Heat where you can race against others and try to lose the policeAn ever popular title is Uphill Rush which already received its third sequelOur most popular game ever is: Bubble Shooter. We have been featuring this game since the beginning.So we have inhouse game studios who create our own games, partnerships with social gaming studios and 1500 licensed games so far!
Our most popular social game in the United States is Pet Party were you can keep your own virtual pet.We also have other social games like Forest Story where you can run your own tribeOr racing games like Lose the Heat where you can race against others and try to lose the policeAn ever popular title is Uphill Rush which already received its third sequelOur most popular game ever is: Bubble Shooter. We have been featuring this game since the beginning.So we have inhouse game studios who create our own games, partnerships with social gaming studios and 1500 licensed games so far!
Our most popular social game in the United States is Pet Party were you can keep your own virtual pet.We also have other social games like Forest Story where you can run your own tribeOr racing games like Lose the Heat where you can race against others and try to lose the policeAn ever popular title is Uphill Rush which already received its third sequelOur most popular game ever is: Bubble Shooter. We have been featuring this game since the beginning.So we have inhouse game studios who create our own games, partnerships with social gaming studios and 1500 licensed games so far!
This graph shows the overall growth of Spil Games.The last row shows increasing number of employees over the years. The middle shows the number of monthly active users which parallels the growth iin the number of employees and databases. Our growth has been quite consistent over the past years: on all sides we have more than 30% YoY growthThis rate of growth makes it inevitable that the number of database servers will also increase.
Stabilizing the platform meant upgrading from MySQL 5.0 to Percona 5.1, getting backups in orderForecasting is nothing compared to the talk Baron Schwartz presented yesterday
LDAP was first designed to do multiple master cluster. What seemed to be a good idea turned out to be your worst nightmare.389 directory server can only handle 2 replication threads max. Above that it becomes unstable in response times and queues. Thanks to our own DMM implementation machines were kicked out of the pool even though they were perfectly fine.Tree structure design to solve replication problem, but lag and SPOF was the downsideSetup of instance slow: 30 hours for 7M to 8M profilesSharding was not an option as there was a unique on username and email address“Hardware is cheap”, but investing 40k is a pretty bad deal.LDAP replacement was going from 11 servers to 2 serversEventually we could have done it ourselves, but politically it could have taken ages. Consultancy can solve your problem then.
Functionally shard a generic database machine and split off one (or more) functionalities required a lot of handworkAlso (terribly) out of sync machines were eating up a lot of timeLive migrations to new hardware were a pain
If you have monitoring from all angles you can detect abnormal usage earlier. These problems will get worse over time and might even bring down your infrastructure at worse timing ever!Also application monitoring tools, like Newrelic, give insight in code deployments and database metrics from the application point of view
Hyscor uses 1 active master and 3 read slavesConnections can spawn very quickly within a SOA architectureFeedar doing persistent write of Memcache within MySQL and never select any data from MySQL
Why start thinking global? We grow so fast we can not keep all databases located in one datacenter.Also we want to bring the data as close as possible to the end user. Why? Imagine services like ‘search as you type’ that suffer from latency.
After servingSpil Games for many years our old architecture on the left is getting overcomplex. Therefore we are now building our much more structured new architecture on which we hope to rely upon in the upcoming years
Horizontal scaling: everything of a function in one database, so sharding on, for instance uid, requires a large overhaul of all existing service
So what does our new model look like then? We call it the Spil Storage platform and it looks like this in overview. As you can see it consists of three layers and I will tell you more about those layers in detail. Our most important component is the bucket model.
How does this bucket model work?This graph shows you the three layers: the owner router will make sure you get to the correct data owner pipeline. This is important because to create atomicity we need a strict separation of ownership of data. Within this pipeline the bucket can be implemented in many different ways: the bucker may be a persistent bucket, but it could also be a ram only bucket. Then the bucket itself will contain the cache between the call and the data storage layer at the bottom. This data storage layer contains the individual shards, in this case on a MySQL InnoDB platform but it could very well be any other storage platform.