3. Introduction (of me)
- Game developer from Kanagawa
- Food fighter in Tsukuba (vs. Ran Ran, Yume-ya, Claret, etc.)
4. Problem
- Developing and launching an online game is difficult (infrastructure cost, development difficulty)
- As a result, the online game ecosystem loses variety
5. Project purpose
Provide an online game development environment in which:
- Coding: easy and fun
- Cost: cheap
- Robustness: available
- Scalability: available
Make the total cost of starting an online game very low, and make development easy and fun.
6. Related work: Project Darkstar (a.k.a. Sun Game Server)
Open source online game server framework written in Java by Sun Microsystems.
- Coding: developers need intermediate-level Java programming knowledge.
- Cost: high. Developers need to prepare their own infrastructure.
- Robustness, scalability: supposed to be available, but never verified by an actual service.
7. Related work: BigWorld
Proprietary server framework by BigWorld Technology.
- Coding: developers only need to learn Python. Coding is easy, but it easily causes severe problems (it is easy to generate frequent database access without noticing).
- Cost: high. Users need to prepare their own infrastructure.
- Robustness, scalability: supposed to be available, but actually NOT. (It has hung many times in production environments.)
8. Problems of previous solutions
- Coding: programming is still difficult (it requires intermediate-level programming knowledge or comes with big side effects)
- Cost: no attention is paid to reducing infrastructure cost (neither framework can host multiple applications, so neither can be offered as PaaS)
9. Lua
- Lightweight, reflective, imperative and (possibly) functional programming language, mainly designed for embedding into C programs
- Powerful reflection feature called the metatable (looks like C++ operator overloading)
- Flexible feature called the environment table, which provides namespaces
- Every data structure is described by one data type called the table (a kind of associative array)
(See Wikipedia for more features ;-))
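To make the metatable and table features concrete, here is a minimal sketch in plain Lua. The `Vec` type and its fields are invented for illustration only and have nothing to do with pfm.

```lua
-- Hypothetical 2D vector type, used only to illustrate metatables.
local Vec = {}
Vec.__index = Vec          -- method lookup falls back to the Vec table

-- __add is the metamethod Lua calls for the + operator,
-- much like operator+ in C++.
Vec.__add = function(a, b)
  return setmetatable({ x = a.x + b.x, y = a.y + b.y }, Vec)
end

function Vec.new(x, y)
  -- The object itself is an ordinary table (associative array).
  return setmetatable({ x = x, y = y }, Vec)
end

function Vec:length_squared()
  return self.x * self.x + self.y * self.y
end

local v = Vec.new(1, 2) + Vec.new(3, 4)
print(v.x, v.y, v:length_squared())
```

The same mechanism (`__index`, `__newindex`, `__call`) is what lets a framework intercept field access and function calls on a table.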
10. Terms (1)
- Servant (servant node): a node that does the actual job in a distributed computing system. The word 'worker' may be more familiar, but in these slides 'worker' means a worker thread running on each node, so 'servant' is used with the same meaning.
- VM (luaVM): Lua byte code interpreter.
- Fiber: a thread (called a coroutine in Lua) that cooperatively executes 1 Lua function call. It suspends itself by yielding explicitly.
- KVS: distributed key-value store. The focus here is on master-servant type key-value stores like kumo-fs (made in Tsukuba :P).
11. Terms (2)
- Object: a KVS record that is given a Lua script binding so that developers can access its data from their programs. An object behaves as a Lua table in scripts.
- Object ID: pfm creates KVS records by itself, so it needs to generate unique keys on its own. The object ID is such a self-assigned unique key.
- Object method: a Lua function object (functions are first-class objects in Lua) that is associated with an object as one of its table elements.
12. pfm
- Programming framework for the scripting language Lua on a distributed key-value store (KVS), written in C/C++.
- Developers describe interactions between KVS records in Lua scripts, with automatic asynchronous RPC between luaVMs.
- Can host multiple applications even when only 1 node is available.
14. How a method call becomes an RPC
- Lua syntax sugar makes these three forms equivalent:
    v = object:func(arg1, arg2, ..., argN)
    v = object.func(object, arg1, arg2, ..., argN)
    v = object["func"](object, arg1, arg2, ..., argN)
- pfm modifies the behavior of [...]: if object is stored on a remote node, object["func"] is not the actual value but an RPC context.
- pfm modifies the behavior of (...): func, object, arg1, arg2, ..., argN are sent to the remote node, and the calling fiber yields execution to another fiber.
- When the RPC reply comes back, the fiber is resumed and v receives the result.
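The call path above can be imitated in plain Lua with a metatable and a coroutine. This is only a sketch under assumptions: `remote_store`, `make_proxy` and `run_fiber` are hypothetical names, the "remote node" is just a local table, and pfm's real implementation lives in C/C++.

```lua
local unpack = table.unpack or unpack   -- Lua 5.1 and 5.2+ compatibility

-- Stand-in for an object that lives on another node (hypothetical).
local remote_store = {
  obj42 = {
    data = 10,
    add = function(self, n) return self.data + n end,
  },
}

-- Proxy for a remote object. Indexing it ([...]) returns an RPC context
-- instead of the real function; calling the context ((...)) yields.
local function make_proxy(object_id)
  return setmetatable({}, {
    __index = function(_, method_name)
      return setmetatable({}, {
        __call = function(_, _self, ...)
          -- Hand the request to the scheduler and suspend this fiber.
          return coroutine.yield({
            id = object_id, method = method_name, args = { ... },
          })
        end,
      })
    end,
  })
end

-- Tiny scheduler: runs one fiber and answers each RPC it yields.
local function run_fiber(fn, ...)
  local co = coroutine.create(fn)
  local results = { coroutine.resume(co, ...) }
  while coroutine.status(co) == "suspended" do
    local request = results[2]
    local target = remote_store[request.id]
    -- "Remote" execution; the reply then resumes the fiber.
    local reply = target[request.method](target, unpack(request.args))
    results = { coroutine.resume(co, reply) }
  end
  assert(results[1], results[2])
  return results[2]
end

local result = run_fiber(function()
  local object = make_proxy("obj42")
  return object:add(5)   -- looks like a local call, runs as an "RPC"
end)
print(result)
```

The script author writes an ordinary method call; the yield and resume are hidden behind the proxy, which is the effect the slide describes.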
15. Execution flow of nested RPCs
    function player:func(target)
      local n = target:foo()
      local r = target:bar(n)
      return r
    end
    function player:foo()
      return self.data
    end
    function player:bar(n)
      return self.item:baz(n)
    end
    function item:baz(n)
      return self.data + n
    end
Fibers: Fiber1 on Host 1 (player:func), Fiber2 on Host 2 (target:foo, target:bar), Fiber3 on Host 3 (item:baz).
1. Fiber1 calls target:foo() and yields.
2. Host 2 starts fiber2 (foo), which returns self.data.
3. Fiber1 resumes with n, calls target:bar(n), and yields again.
4. Host 2 starts fiber2 (bar). The fibers that run foo and bar may be different.
5. fiber2 (bar) calls self.item:baz(n) and yields.
6. Host 3 starts fiber3 (baz), which returns self.data + n.
7. Fiber2 resumes and returns the result of self.item:baz(n).
8. Fiber1 resumes, stores the result in r (local r = target:bar(n)), and returns r.
16. Environment table
    function func()
      local v = global_variable
      return v
    end
- Each function can have its own table, called the environment table, which replaces the real global table (global namespace).
- Switching the namespace lets the same name refer to a different body.
- Every application hosted by a pfm luaVM has its own environment table. When a fiber is created, pfm attaches the environment table of the application that the RPC call is dispatched for. Thus a pfm luaVM can support multiple applications on 1 node.
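A small sketch of the idea in plain Lua: the same source is loaded twice, each time with a different environment table, so each "application" sees its own `global_variable`. The helper `load_in_env` and the two application tables are assumptions for illustration; the sketch uses `setfenv` on Lua 5.1 and the `env` argument of `load` on later versions.

```lua
-- Load a chunk of source so that its globals live in `env`.
local function load_in_env(source, env)
  if setfenv then
    -- Lua 5.1: attach the environment table to the compiled chunk.
    local chunk = assert(loadstring(source))
    return setfenv(chunk, env)
  end
  -- Lua 5.2 and later: pass the environment to load().
  return assert(load(source, "app", "t", env))
end

local source = "function func() local v = global_variable return v end"

-- One environment table per hosted application (hypothetical apps).
local app_a = { global_variable = "value from app A" }
local app_b = { global_variable = "value from app B" }

load_in_env(source, app_a)()   -- defines func inside app_a
load_in_env(source, app_b)()   -- defines func inside app_b

print(app_a.func())
print(app_b.func())
print(func)                    -- the real global table is untouched
```

Both applications define a "global" named func without clashing, because each definition lands in its own table.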
24. pfm
- Coding: easy and fun. Lua is familiar to game developers, and inter-VM RPC hides the difficulty of multi-threaded and distributed programming from them.
- Cost: cheap (pfm can run as PaaS). Users do not need to prepare their own infrastructure.
- Robustness, scalability: pfm is based on a KVS, so it is supposed to fail over and to scale to some level.
25. Login/logout on pfm
- A KVS is usually used as the backend of a network service, so all nodes are trusted.
- In pfm, however, a KVS servant node is also the frontend of the service, so many untrusted nodes connect to servant nodes.
- Therefore each connected node must be authenticated.
32. Login sequence
1. The client sends a login request with its account and world ID.
2. The servant node forwards it to the master node (duplicate login check).
3. If there is no error, the master node returns the object ID. On first login, a new one is assigned.
4. The actual authentication is performed on each servant node, for scaling.
5. Each servant node knows where the object exists from the object ID and the consistent hash, and sends a load/create query to that node.
6. That node returns the load/create result (the loaded player object).
7. If load/create succeeds, the node that the client accessed first retains a copy of the object, and the client accesses pfm only through RPC requests to this object. Such an object, associated with a client node, is called a 'player object'.
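Step 5 relies on every servant node computing the same owner for an object ID. The following is a minimal consistent-hash sketch, not pfm's code: pfm uses libconhash with a red-black tree, while this version uses a toy string hash and a linear scan, and the node names, replica count and sample object ID are made up.

```lua
-- Toy 32-bit string hash (illustrative only, not the one libconhash uses).
local function hash(key)
  local h = 0
  for i = 1, #key do
    h = (h * 31 + key:byte(i)) % 4294967296
  end
  return h
end

-- Place `replicas` virtual points per node on the ring, sorted by position.
local function new_ring(nodes, replicas)
  local ring = {}
  for _, node in ipairs(nodes) do
    for r = 1, replicas do
      ring[#ring + 1] = { point = hash(node .. "#" .. r), node = node }
    end
  end
  table.sort(ring, function(a, b) return a.point < b.point end)
  return ring
end

-- The owner is the first point at or after the key's hash, wrapping around.
local function locate(ring, object_id)
  local h = hash(object_id)
  for _, entry in ipairs(ring) do
    if entry.point >= h then return entry.node end
  end
  return ring[1].node
end

local nodes = { "servant1", "servant2", "servant3" }
local ring = new_ring(nodes, 8)
print(locate(ring, "0011223344550000000f4242"))
```

Because only the points of a removed node disappear from the ring, objects owned by the other nodes keep their owner when a node leaves, which is why the lookup can be done locally on every servant.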
33. Generating object IDs
- An object ID is the MAC address (6 bytes) + an auto-increment value kept by each node (6 bytes).
- During initialization, the generator loads the current auto-increment value from a file and writes a 'fault flag'.
- On normal finalization, the generator removes the 'fault flag' from the file.
- If the fault flag already exists during initialization, the generator assumes an abnormal shutdown may have happened, so it adds a big value (1M) to the auto-increment value.
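The generator logic can be sketched as follows. Assumptions to note: the `state` table stands in for the file on disk, the counter is persisted only on a clean close, IDs are rendered as 24 hex characters, and the names `open_generator`, `next_id` and `close` are hypothetical.

```lua
local FAULT_SKIP = 1000000   -- the "big value (1M)" added after a crash

-- `state` plays the role of the persisted file (assumption for this sketch).
local function open_generator(mac_hex, state)
  local counter = state.counter or 0
  if state.fault_flag then
    -- Previous run did not shut down cleanly: the stored counter may be
    -- stale, so jump ahead to avoid reissuing IDs.
    counter = counter + FAULT_SKIP
  end
  state.fault_flag = true    -- cleared again only by close()

  local gen = {}
  function gen.next_id()
    counter = counter + 1
    -- 6-byte MAC followed by the 6-byte counter, both in hex.
    return mac_hex .. string.format("%012x", counter)
  end
  function gen.close()
    state.counter = counter
    state.fault_flag = nil
  end
  return gen
end

local state = {}
local gen = open_generator("001122334455", state)
local first = gen.next_id()
gen.close()                                   -- clean shutdown

local crashed = open_generator("001122334455", state)
local second = crashed.next_id()              -- never closed: simulated crash

local recovered = open_generator("001122334455", state)
local third = recovered.next_id()
print(first, second, third)
```

In this sketch the skip is only safe if a crashed run issued fewer than 1M IDs since the last persisted value; how often the real generator writes the counter is not stated on the slide.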
34. Naming conventions
- In pfm, users can change the behavior of an RPC by naming the called function according to certain conventions.
- This is similar to how Ruby on Rails (RoR) ActiveRecord infers function and variable names from record relationships.
35. Convention #1: _{function name}
e.g. object:_protected_routine()
- An RPC whose procedure name starts with '_' can be called only from a trusted node. (Currently only client nodes are untrusted.)
- Diagram: Fiber1 on Host 1 (not trusted, e.g. a client node) and Fiber3 on Host 3 (trusted, e.g. a servant node) both call object:_protected_routine() on Fiber2 on Host 2, the servant node where the RPC target object exists.
- Call from Host 1: NG, because Host 1 is not trusted (a client node may be a cheater). An error is returned.
- Call from Host 3: OK, because Host 3 is a trusted node (servant nodes are prepared by the service provider). The result is returned.
36. Convention #2: notify_{function name}
e.g. object:notify_chat(msg)
- An RPC whose procedure name starts with 'notify_' is automatically understood by the system as a call to the procedure named by removing 'notify_' from the original name, without waiting for the reply.
- Diagram: Fiber1 on Host 1 calls object:notify_chat(msg); Fiber2 on Host 2 executes object:chat(msg). Fiber1 does not wait for the reply (its execution continues). The reply does come back, but it is ignored.
37. Convention #3: client_{function name}
e.g. object:client_open_ui(url)
- An RPC whose procedure name starts with 'client_' is automatically understood by the system as a call to the client node procedure named by removing 'client_' from the original name.
- Diagram: Fiber1 on Host 1 calls object:client_open_ui(url) on Fiber2 on Host 2. If the target object is a player object (attached to a session), Host 2 forwards open_ui(url) to Fiber3 on Host 3 (the client node), then forwards the reply of open_ui(url) back to Fiber1.
38. Convention #4: combination
- Conventions can be combined.
- e.g. notify_client_open_ui: call the client procedure open_ui without waiting for the reply.
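Conventions #1 to #4 amount to parsing prefixes off the procedure name. The sketch below shows one way this could be done; `parse_procname`, `may_call` and the field names are hypothetical, and the order in which pfm itself checks prefixes is not stated on the slides.

```lua
-- Strip 'notify_' and 'client_' prefixes (in any order) and record
-- what each one means for the call.
local function parse_procname(procname)
  local call = { no_reply = false, to_client = false }
  local stripped = true
  while stripped do
    stripped = false
    if procname:sub(1, 7) == "notify_" then
      call.no_reply, procname, stripped = true, procname:sub(8), true
    elseif procname:sub(1, 7) == "client_" then
      call.to_client, procname, stripped = true, procname:sub(8), true
    end
  end
  call.name = procname
  -- Convention #1: a leading '_' marks a protected procedure.
  call.protected = procname:sub(1, 1) == "_"
  return call
end

-- Protected procedures may be called only from trusted nodes.
local function may_call(call, caller_is_trusted)
  return caller_is_trusted or not call.protected
end

local combined = parse_procname("notify_client_open_ui")
print(combined.name, combined.no_reply, combined.to_client)

local protected = parse_procname("_protected_routine")
print(may_call(protected, false), may_call(protected, true))
```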
39. Convention #5: user-defined convention
- Users can define conventions like #1 to #3 by defining a Lua function that receives the target object and the function name, and returns a new function to be executed by the fiber.
- Diagram: Fiber1 on Host 1 calls object:funcname(...). On Host 2, the hook returns a new function if some convention rule applies; otherwise it returns the original function, and Fiber2 executes whichever was returned.
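40. Convention #5: user-defined convention
Example of a convention checker: if the function name starts with 'broadcast_', call the RPC whose procedure name is the original name without 'broadcast_' on every member variable of the target object.
    function hook_check_convention(procname, obj)
      -- Plain-text search; s == 1 means the name starts with the prefix
      -- (Lua strings are 1-indexed).
      local s, e = string.find(procname, "broadcast_", 1, true)
      if s == 1 then
        -- The rest of the name, after the prefix.
        local funcbody = string.sub(procname, e + 1)
        -- Replacement function: call funcbody on every member object.
        return function(object, ...)
          local k, v = pfm.next(object)
          while k do
            if pfm.typeof(v) then
              v[funcbody](v, ...)  -- pass v itself as self
            end
            k, v = pfm.next(object, k)
          end
        end
      end
      -- No convention matched: return the original function.
      return obj[procname]
    end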
41. Future of pfm
- Robustness (data replication, failover)
- Support RPC over HTTP (to work cooperatively with web services)
- Support cooperation with Unity3D (a game development IDE)
42. Implementation plan: replication
Diagram: a fiber (related to an object) updates the cache and refers to data; the object is updated when fiber execution finishes.
- Implement a fiber-local cache. When a fiber reads or writes object data, the actual object data does not change; the fiber-local cache stores the current value instead.
- After fiber execution finishes, the fiber-local cache updates the actual object values all at once.
- This may cause update conflicts. For online games, however, the order of some data changes can be ignored, so the basic policy is to write the cached values as they are.
- For data changes that must update the master data immediately, the plan is to prepare a programming convention similar to the volatile keyword in C/C++.
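A minimal sketch of the planned fiber-local cache, using a proxy table whose writes are buffered and applied at the end. The names `with_fiber_cache` and `commit` are hypothetical, and the sketch does not handle assigning nil or detect conflicts between fibers.

```lua
-- Wrap `object` so that writes go to a private cache until commit().
local function with_fiber_cache(object)
  local cache = {}
  local view = setmetatable({}, {
    -- Reads prefer the fiber's own pending value.
    __index = function(_, key)
      local value = cache[key]
      if value ~= nil then return value end
      return object[key]
    end,
    -- Writes never touch the real object directly.
    __newindex = function(_, key, value)
      cache[key] = value
    end,
  })
  -- Called once when the fiber finishes: apply all changes at once.
  local function commit()
    for key, value in pairs(cache) do
      object[key] = value
    end
  end
  return view, commit
end

local player = { hp = 10, name = "hero" }
local view, commit = with_fiber_cache(player)

view.hp = view.hp - 3
local hp_during_fiber = player.hp   -- real object is still unchanged here
commit()
print(hp_during_fiber, player.hp)
```

If two fibers commit the same field, the later commit simply wins, which matches the "write the cached values as they are" policy described above.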
43. Terms (3)
- Message ID: with asynchronous RPC, pfm must distinguish which reply belongs to which RPC, so it sends each RPC command with an ID that is incremented in round-robin fashion. This ID is called the message ID.
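The role of the message ID can be sketched as a table of pending calls keyed by ID. The wrap-around limit `MAX_ID` and the function names are assumptions; the slides do not state the ID width pfm uses.

```lua
local MAX_ID = 65535   -- assumed wrap-around point for the round-robin ID

local function new_rpc_table()
  local next_id = 0
  local pending = {}   -- message ID -> callback waiting for the reply
  local rpc = {}

  -- Register an outgoing call and return the ID to put in the packet.
  function rpc.send(on_reply)
    next_id = next_id % MAX_ID + 1
    pending[next_id] = on_reply
    return next_id
  end

  -- Route an incoming reply to the call that is waiting for it.
  function rpc.on_reply(msgid, ...)
    local callback = pending[msgid]
    if not callback then return false end   -- unknown or duplicate reply
    pending[msgid] = nil
    callback(...)
    return true
  end

  return rpc
end

local rpc = new_rpc_table()
local got = {}
local id_a = rpc.send(function(value) got.a = value end)
local id_b = rpc.send(function(value) got.b = value end)

-- Replies may arrive in any order; the ID tells them apart.
rpc.on_reply(id_b, "reply for b")
rpc.on_reply(id_a, "reply for a")
print(got.a, got.b)
```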
44. Implementation plan: RPC failover
Diagram: servants A and B (holding player objects) send RPCs to servant1 over the active connection; servant1 replicates the object to servant2, to which A and B hold a stand-by connection.
- Each RPC packet sent from servant A or B to servant1 is kept in the sender node's memory until servant1 replies.
- Once an RPC is processed, servant1 updates the 'last processed message ID from servant A or B', recorded together with the remote address. This value is also sent to servant2 (the replication host) in the replication packet.
45. Implementation plan: RPC failover
- After servant1 fails, servants A and B start using servant2 as the primary servant node, and resend to servant2 the packets that servant1 never replied to.
- On receiving a resent RPC packet from servant A or B, servant2 compares the message ID in the packet with the last processed message ID it has recorded. If the resent packet's message ID is greater, the packet is processed; otherwise it is discarded.
46. Implementation plan: RPC failover
- After the resend finishes, servants A and B send their RPC packets to servant2, and servant2 starts replicating to the next node.
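The failover plan on slides 44 to 46 can be simulated in a few lines. This is a sketch under assumptions: replication is modeled by handing the same packet to the stand-by node, message ID wrap-around is ignored, and all names (`new_servant`, `handle`, `send`) are hypothetical.

```lua
-- A servant remembers, per sender, the last message ID it processed.
local function new_servant()
  local servant = { last_processed = {}, applied = {} }
  function servant.handle(sender, msgid, payload)
    if msgid <= (servant.last_processed[sender] or 0) then
      return false               -- already processed: discard
    end
    servant.last_processed[sender] = msgid
    servant.applied[#servant.applied + 1] = payload
    return true
  end
  return servant
end

local servant1, servant2 = new_servant(), new_servant()

-- Sender side (servant A): keep each packet until its reply arrives.
local unreplied = {}
local function send(msgid, payload)
  unreplied[msgid] = payload
  if servant1.handle("A", msgid, payload) then
    servant2.handle("A", msgid, payload)   -- replication packet
  end
end

send(1, "move")
unreplied[1] = nil          -- reply for message 1 arrived
send(2, "attack")           -- processed and replicated, but the reply is lost
unreplied[3] = "heal"       -- queued, never reached servant1 before it failed

-- servant1 is gone: resend everything unreplied to servant2, in ID order.
local outcome = {}
for msgid = 2, 3 do
  outcome[msgid] = servant2.handle("A", msgid, unreplied[msgid])
end
print(outcome[2], outcome[3], table.concat(servant2.applied, ","))
```

Message 2 is dropped by servant2 because it was already applied through replication, while message 3 is processed for the first time, so each RPC takes effect exactly once.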
47. pfm is currently powered by
- Lua 5.1.4 with the Coco patch (to provide 'true' yield) and a byte code portability patch by me
- tokyocabinet 1.4.33 (fast DBM)
- msgpack 0.4.1 (binary serialization format; the implementation is specialized for streaming serialization, where record boundaries are unknown)
- libconhash (consistent hash library that provides fast node search using a red-black tree)
- SFMT (SIMD-oriented Fast Mersenne Twister) with a multi-thread patch by me
48. Things I want to solve
- It seems that at most about 1000 nodes can be assigned to 1 application.
- A Lua fiber uses a lot of memory (36K+ per fiber). Can this be reduced?
- A msgpack implementation better suited to pfm's purpose.
- Change the name pfm to something else LOL (how about yue? It means 'moon' in Chinese, as 'lua' does in Portuguese, and it sounds like a moe character).
Developer: a game developer who develops an online game by using pfm.
Servant node: in computer science, 'worker node' may be the more familiar term. In these slides, however, 'worker' means an OS thread on each servant node, so 'servant' is used in the sense of 'worker'. A servant node is thus a node in a distributed computing system that provides the actual service to clients.