In this talk I describe the internal structure of the game server that supports the games of the Vorodezh ITT division. What you can learn from the talk:
- which technologies we chose for game server development (spoiler: Vert.X, Hazelcast, Postgres, Kafka, Prometheus + Grafana, Consul, Photon Cloud);
- how we use them (spoiler: not all of them for their intended purpose);
- how we deploy updates;
- some interesting mistakes that we caught while working with Vert.X and Hazelcast.
6. IN THE BEGINNING THERE WAS SILENCE AND DARKNESS
• 2007
• Pure Java SE
• Proprietary protocol
• Proprietary API
• Hibernate, GWT, Trove collections, Protobuf …
7. INTERESTING, BUT NOT EFFECTIVE
• Code generation, bytecode modification on the fly
• No documentation
• No best-practice examples
• Stack Overflow Driven Development
• Need a feature? Build it yourself
• Hard to maintain
8. WHAT DO WE USE?
• PostgreSQL
• Photon Cloud
• Kafka
• Hazelcast
• Vert.X
• Prometheus + Grafana
10. PHOTON CLOUD
• «Rooms» for the game
• Data centers at X locations on every continent (the penguins are offended)
• Dynamic hardware reservation
• Built for games, but useful beyond them (e.g. text/voice/video chat)
14. HAZELCAST
In-memory data grid
• Integrates with Vert.X
• Clustering out of the box
• Load balancing
• Horizontal scaling
• Handles nodes joining and failing
• Distributed data structures
• Queries
16. VERT.X
• Open-source framework for building distributed, event-driven applications on the JVM
• Verticle – single-threaded service
• Distributed message bus
• Asynchrony
• Parallelism
• Polyglot (multiple languages)
18. OPERATION EXECUTOR
• Declare the entities the operation works on (and the lock level needed to access each)
• Acquire the required locks
• Retrieve the required entities from Hazelcast
• Execute the operation code
• Write the changed entities back to Hazelcast
• Release the locks
• Run the callback when the operation finishes
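The steps above can be sketched in plain Java. This is a minimal illustration, not the server's actual code: a `ConcurrentHashMap` and `ReentrantLock`s stand in for Hazelcast's distributed map and locks, the class name `OperationExecutorSketch` and the `player:N` keys are hypothetical, and lock levels (read vs. write) are left out for brevity. Sorting the keys before locking is an addition of this sketch, chosen so that every operation takes its locks in the same order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Consumer;

// Hypothetical sketch: JDK maps and locks stand in for Hazelcast.
final class OperationExecutorSketch {
    private final Map<String, Long> store = new ConcurrentHashMap<>();          // stand-in for a distributed map
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>(); // stand-in for per-key locks

    void put(String key, long value) { store.put(key, value); }
    Long get(String key) { return store.get(key); }

    /** Runs one operation over the declared keys, then invokes the callback. */
    void execute(Iterable<String> keys, Consumer<Map<String, Long>> operation, Runnable callback) {
        // 1. Declare the entities; sorting gives every operation the same lock order.
        TreeSet<String> ordered = new TreeSet<>();
        keys.forEach(ordered::add);
        List<ReentrantLock> taken = new ArrayList<>();
        try {
            // 2. Acquire the required locks.
            for (String key : ordered) {
                ReentrantLock lock = locks.computeIfAbsent(key, k -> new ReentrantLock());
                lock.lock();
                taken.add(lock);
            }
            // 3. Retrieve the entities into a local working copy (missing keys start at 0).
            Map<String, Long> working = new HashMap<>();
            for (String key : ordered) {
                working.put(key, store.getOrDefault(key, 0L));
            }
            // 4. Execute the operation code against the working copy.
            operation.accept(working);
            // 5. Write back only the entities whose value changed.
            working.forEach((k, v) -> {
                if (!v.equals(store.get(k))) store.put(k, v);
            });
        } finally {
            // 6. Release the locks in reverse order of acquisition.
            for (int i = taken.size() - 1; i >= 0; i--) taken.get(i).unlock();
        }
        // 7. Run the callback once the operation has finished.
        callback.run();
    }

    public static void main(String[] args) {
        OperationExecutorSketch executor = new OperationExecutorSketch();
        executor.put("player:1", 100L);
        executor.put("player:2", 0L);
        executor.execute(Arrays.asList("player:2", "player:1"),
                w -> { w.put("player:1", w.get("player:1") - 30); w.put("player:2", w.get("player:2") + 30); },
                () -> System.out.println("done: " + executor.get("player:1") + " / " + executor.get("player:2")));
    }
}
```

In this sketch the callback runs only when the operation completes normally; if the operation throws, the locks are still released and the exception propagates to the caller.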
19. DEADLOCK WHILE BLOCKING
• Operation 1 takes a lock on resource A
• Operation 2 tries to take a lock on resource A and waits
• Operation 1 tries to take a lock on resource B
• … and, unfortunately, deadlock
22. AUTHORS' REASONS
The original reason executeBlocking defaults to ordered=true (it's more general than just Hazelcast usage) is something like this:
Imagine you have a web application and request #1 comes in - this request inserts some data into a database (e.g. add to shopping basket)
Immediately after this, request #2 comes in - this selects the same data from the database (e.g. view shopping basket)
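The trade-off behind ordered execution can be shown without Vert.X at all. The sketch below is an assumption-laden model, not Vert.X code: a single-thread executor stands in for an ordered blocking queue (tasks run strictly in submission order), a two-thread pool stands in for unordered execution, and `OrderedBlockingSketch` is a hypothetical name. It reflects one reading of the deadlock slide: a task that blocks while waiting for a lock stalls every task queued behind it, including the one that would release the lock.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical model of ordered vs. unordered blocking execution.
final class OrderedBlockingSketch {
    /** Returns true if both tasks finished within the timeout. */
    static boolean run(ExecutorService executor, long timeoutMillis) {
        CountDownLatch released = new CountDownLatch(1); // "the lock has been released"
        CountDownLatch finished = new CountDownLatch(2); // both tasks completed
        // Task A: blocks until the lock it needs is released.
        executor.execute(() -> {
            try {
                released.await();
                finished.countDown();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        // Task B: would release the lock, but it was submitted after A.
        executor.execute(() -> {
            released.countDown();
            finished.countDown();
        });
        try {
            return finished.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            executor.shutdownNow(); // interrupt a stuck task so the JVM can exit
        }
    }

    public static void main(String[] args) {
        // Serial queue: B never starts because A never returns.
        System.out.println("ordered finished:   " + run(Executors.newSingleThreadExecutor(), 200));
        // Two workers: B runs alongside A and unblocks it.
        System.out.println("unordered finished: " + run(Executors.newFixedThreadPool(2), 2000));
    }
}
```

The serial queue is what gives the shopping-basket guarantee (request #1 completes before request #2 starts); the price is that a blocked task holds up everything behind it.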
23. VERT.X STRESS TESTING
• We built a test server that performs a single operation
• We built a client that sends that one operation to the server
• We tested performance in different configurations
• Everything was very bad…
• The DB??? We removed the database interaction
• Everything was still very bad…
• We kept testing performance in different configurations
• Everything was very bad… but sometimes good…
30. «RACE» INSIDE VERT.X WHEN CREATING A CHANNEL
• Vert.X takes locks internally when a new channel is created
• If you create many channels at the same time, the code spends most of its time waiting on those locks
• We created one channel per user, so with a large number of users channel creation was very slow
• We changed the channel scheme to one channel per message type
• We fixed channel creation in Vert.X
• We created a pull request to make Vert.X work with Prometheus
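The change of channel scheme can be illustrated with a toy message bus. Everything here is hypothetical (the class `ChannelSchemeSketch`, the addresses `user.N`, `chat` and `move`), and the `synchronized` registration only models the lock contention described above; it is not Vert.X's implementation. The point is that with one channel per message type the number of registrations stays constant as users grow, because the user ID travels inside the message.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical toy bus: registration is serialized, sending is a lookup.
final class ChannelSchemeSketch {
    private final Map<String, BiConsumer<Long, String>> handlers = new HashMap<>();
    private int registrations;

    // Each registration takes the same lock, so many registrations queue up.
    synchronized void register(String address, BiConsumer<Long, String> handler) {
        handlers.put(address, handler);
        registrations++;
    }

    synchronized int registrations() { return registrations; }

    // Delivers a payload to the handler of an address; the user ID is part of the message.
    void send(String address, long userId, String payload) {
        BiConsumer<Long, String> handler;
        synchronized (this) { handler = handlers.get(address); }
        if (handler != null) handler.accept(userId, payload);
    }

    public static void main(String[] args) {
        int users = 10_000;

        // Old scheme: one channel per user.
        ChannelSchemeSketch perUser = new ChannelSchemeSketch();
        for (long id = 1; id <= users; id++) {
            perUser.register("user." + id, (userId, payload) -> { });
        }

        // New scheme: one channel per message type, shared by all users.
        ChannelSchemeSketch perType = new ChannelSchemeSketch();
        Map<Long, String> lastMessage = new HashMap<>();
        perType.register("chat", (userId, payload) -> lastMessage.put(userId, payload));
        perType.register("move", (userId, payload) -> lastMessage.put(userId, payload));
        perType.send("chat", 42L, "hello");

        System.out.println("per-user registrations: " + perUser.registrations());
        System.out.println("per-type registrations: " + perType.registrations());
    }
}
```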
39. STRUCTURES IN HAZELCAST
Inside a game shard (player information), keyed by PlayerID (1, 2, 3):
• Player description
• Player items
• Player quests
• Player counters
• etc.
40. STRUCTURES IN HAZELCAST
Shared across shards (for example, a tournament)
• PlayerID (1, 2, 3) → Battle history
• Bucket № (1, 2, 3) → Players, Scores, Attempts
• PlayerID → Bucket №: 1 → 1, 2 → 1, 3 → 2
• Players counter: 3
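One way a players counter could drive the PlayerID → bucket mapping is sketched below. The formula and the bucket size of 2 are assumptions inferred from the example values on the slide (players 1 and 2 in bucket 1, player 3 in bucket 2, counter at 3); the talk does not state them. `AtomicLong` and `ConcurrentHashMap` stand in for Hazelcast's `IAtomicLong` and distributed map, and `TournamentBucketsSketch` is a hypothetical name.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: JDK types stand in for Hazelcast structures.
final class TournamentBucketsSketch {
    private final int bucketSize;                                             // assumed, not given in the talk
    private final AtomicLong playersCounter = new AtomicLong();               // stand-in for IAtomicLong
    private final Map<Long, Long> bucketByPlayer = new ConcurrentHashMap<>(); // PlayerID -> bucket number

    TournamentBucketsSketch(int bucketSize) { this.bucketSize = bucketSize; }

    /** Registers a player once and returns the bucket number (1-based). */
    long register(long playerId) {
        return bucketByPlayer.computeIfAbsent(playerId, id -> {
            long n = playersCounter.incrementAndGet(); // n-th registered player
            return (n - 1) / bucketSize + 1;           // fill buckets in order
        });
    }

    long playersCounter() { return playersCounter.get(); }

    public static void main(String[] args) {
        TournamentBucketsSketch tournament = new TournamentBucketsSketch(2);
        for (long playerId = 1; playerId <= 3; playerId++) {
            System.out.println("player " + playerId + " -> bucket " + tournament.register(playerId));
        }
        System.out.println("players counter: " + tournament.playersCounter());
    }
}
```

Under this scheme the counter alone decides which bucket is filled next, so if it ever restarts from zero, new players land in buckets that are already full.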
41. REPEATABLE COUNTER
• We built a tournament; it worked fine for a month, with players registering and playing.
• A ticket came in from support: users were seeing twice as many players (x2) in the tournament buckets.
• We fixed the counter.
• We checked the code for errors: nothing suspicious.
• We remembered that some time earlier we had shut down an old node.
• We read the documentation…
47. REPEATABLE COUNTER
• An IAtomicLong has only 2 copies
• When both copies were on the old nodes being shut down, the counter was lost
• Shutting down a node with kill -9 is a bad idea!
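The failure can be reproduced with a toy model. This is not Hazelcast code and does not claim to mirror its replication algorithm: the class `TwoCopyCounterSketch` and the node names are hypothetical, and the only facts taken from the talk are that the value lives in two copies and that the nodes holding them were stopped with kill -9. The model assumes a graceful shutdown hands the copy over before the node leaves, while a hard kill does not.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Toy model only: a counter whose value is kept on at most two live nodes.
final class TwoCopyCounterSketch {
    private final Set<String> live = new LinkedHashSet<>();
    private final Map<String, Long> copies = new LinkedHashMap<>(); // node -> its copy of the value

    TwoCopyCounterSketch(String... nodes) {
        for (String node : nodes) live.add(node);
        rebalance(0L);
    }

    // Drops copies on dead nodes and tops up to two copies using `value`.
    private void rebalance(long value) {
        copies.keySet().retainAll(live);
        for (String node : live) {
            if (copies.size() >= 2) break;
            copies.putIfAbsent(node, value);
        }
    }

    // Value of any surviving copy, or 0 when no copy survived.
    long get() { return copies.isEmpty() ? 0L : copies.values().iterator().next(); }

    long incrementAndGet() {
        long next = get() + 1;
        copies.replaceAll((node, old) -> next);
        return next;
    }

    void join(String node) {
        live.add(node);
        rebalance(get());
    }

    // Graceful shutdown: the value is handed over before the node leaves.
    void shutdownGracefully(String node) {
        long value = get();
        live.remove(node);
        rebalance(value);
    }

    // kill -9 of several nodes at once: no hand-over, a copy survives only if its node does.
    void kill(String... nodes) {
        for (String node : nodes) live.remove(node);
        copies.keySet().retainAll(live);
        rebalance(get()); // get() is 0 if both copies died: the counter starts over
    }

    public static void main(String[] args) {
        TwoCopyCounterSketch counter = new TwoCopyCounterSketch("old1", "old2");
        counter.join("new1");
        counter.join("new2");
        for (int i = 0; i < 3; i++) counter.incrementAndGet();
        counter.kill("old1", "old2");
        System.out.println("next value after kill -9: " + counter.incrementAndGet());
    }
}
```

In this model, killing both old nodes makes the next increment return 1 again, which is how already-filled buckets could receive a second set of players; shutting the same nodes down one at a time gracefully keeps the sequence going.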