Process Design and Polymorphism: Lessons Learnt from Development of Kai
1. Process Design and Polymorphism: Lessons Learnt from Development of Kai
takemaru
08.11.20
2. Outline
Reviewing Dynamo and Kai
Kai: an Open Source Implementation of Amazon’s Dynamo
Features and Mechanism
Process Design for Better Performance
Based on Calling Sequence and Process State
Polymorphism in Actor Model
Two approaches to implement polymorphism
3. Dynamo: Features
Key-value store
Distributed hash table
High scalability
No master, peer-to-peer
Large-scale clusters, on the order of 1,000 nodes
Fault tolerant
Even if an entire data center fails
Still meets latency requirements in that case
[Figure: get(“cart/joe”) is issued against a cluster in which joe's data is held on three nodes]
4. Dynamo: Partitioning
Consistent Hashing
Nodes and keys are positioned at their hash values
MD5 (128 bits)
Each key is stored on the N nodes that follow its position on the ring
[Figure: hash ring (N=3) running from 0 to 2^128, with hash(node1) to hash(node6) and hash(cart/joe) placed on it]
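The placement rule on this slide can be sketched in a few lines of Erlang. This is a minimal illustration under stated assumptions, not Kai's code: the module name `ring_sketch` and both function names are made up for the example, and virtual nodes (which Dynamo-style rings usually add) are left out.

```erlang
-module(ring_sketch).
-export([position/1, preference_list/3]).

%% Position of a term on the ring: its MD5 digest read as a 128-bit integer.
position(Term) ->
    <<Pos:128>> = erlang:md5(term_to_binary(Term)),
    Pos.

%% The N nodes that follow Key clockwise on the ring, wrapping around
%% from the largest position back to the smallest one.
preference_list(Key, Nodes, N) ->
    KeyPos = position(Key),
    Ring = lists:sort([{position(Node), Node} || Node <- Nodes]),
    %% Before: nodes placed ahead of the key; After: nodes at or past it.
    {Before, After} = lists:partition(fun({Pos, _}) -> Pos < KeyPos end, Ring),
    [Node || {_, Node} <- lists:sublist(After ++ Before, N)].
```

For example, `ring_sketch:preference_list("cart/joe", [node1, node2, node3, node4, node5, node6], 3)` returns the three nodes responsible for the key. Which three depends on the MD5 values, so no particular result is claimed here.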
5. Dynamo: Membership
Gossip-based protocol
Spreads membership like a rumor
Membership contains the node list and change history
Each node exchanges membership with a randomly chosen node every second
A node updates its membership if it receives a more recent one
Advantages
Robust; no one can prevent a rumor from spreading
Exponentially rapid spread
[Figure: detecting node failure on the hash ring; the membership change “node2 is down” spreads from node to node in the form of gossip]
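The "update if more recent" step can be sketched as a merge of two membership tables. The representation below is an assumption made for the example (a map from node to `{Status, Version}`, with a higher version meaning a more recent change); the slides do not say how Dynamo or Kai encode the change history, and `gossip_sketch` is a hypothetical module name.

```erlang
-module(gossip_sketch).
-export([merge/2]).

%% Membership is modelled as a map Node => {Status, Version}.
%% This shape is an assumption for illustration only.
%% merge/2 folds the remote table into the local one, keeping
%% whichever entry carries the higher version for each node.
merge(Local, Remote) ->
    maps:fold(fun keep_newer/3, Local, Remote).

keep_newer(Node, {_Status, RemoteVsn} = Entry, Acc) ->
    case maps:find(Node, Acc) of
        %% Local entry is at least as recent: keep it.
        {ok, {_, LocalVsn}} when LocalVsn >= RemoteVsn -> Acc;
        %% Unknown node or older local entry: take the remote one.
        _ -> Acc#{Node => Entry}
    end.
```

Because the merge only ever moves entries forward, exchanging tables in either direction, or repeatedly, converges on the same membership.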
6. Dynamo: get/put Operations
Client
1. Sends a request to any Dynamo node
The request is forwarded to the coordinator
Coordinator: one of the nodes associated with the key
Coordinator
1. Chooses N nodes by using consistent hashing
2. Forwards the request to the N nodes
3. Waits for responses from R nodes (get) or W nodes (put), or times out
4. Checks replica versions if the operation is a get
5. Sends a response to the client
[Figure: get/put operations for N,R,W = 3,2,2; requests and responses flow between the client, the coordinator, and the three nodes holding joe's data]
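Steps 2 and 3 of the coordinator can be sketched as "send to all replicas, then wait for a quorum". This is a simplified sketch, not Kai's implementation: replicas are plain local processes instead of remote nodes, and the module name `quorum_sketch` and the message shapes `{Ref, From, Request}` and `{Ref, Reply}` are assumptions for the example.

```erlang
-module(quorum_sketch).
-export([request/4]).

%% Send Request to every replica process, then wait until Quorum of them
%% (R for a get, W for a put) have answered. Gives up when Timeout
%% milliseconds pass without a new answer arriving.
request(Replicas, Request, Quorum, Timeout) ->
    Ref = make_ref(),
    [Pid ! {Ref, self(), Request} || Pid <- Replicas],
    gather(Ref, Quorum, Timeout, []).

%% Enough answers collected: return them in arrival order.
gather(_Ref, 0, _Timeout, Acc) ->
    {ok, lists:reverse(Acc)};
gather(Ref, Remaining, Timeout, Acc) ->
    receive
        %% Only replies tagged with this request's reference are counted.
        {Ref, Reply} -> gather(Ref, Remaining - 1, Timeout, [Reply | Acc])
    after Timeout ->
        {error, timeout}
    end.
```

With N,R,W = 3,2,2, a get would call `request(ThreeReplicas, Get, 2, Timeout)` and return as soon as two replicas have answered. The third reply arrives later and stays unread in the caller's mailbox in this sketch.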
7. Kai: Overview
Kai
Open source implementation of Amazon’s Dynamo
Named after my origin
The name OpenDynamo had already been taken by a project not related to Amazon’s Dynamo
Written in Erlang
memcache API
Found at http://sourceforge.net/projects/kai/
9. Two approaches to improve software performance
Process the work with fewer CPU resources
Solved by introducing better algorithms
e.g. binary search is used instead of linear search
Use all CPU resources
An issue identified in multi-core environments
Solved by rearranging the process-to-core mapping
e.g. a heavy process and a light process running on multiple cores
[Figure: a heavy process and a light process on the 1st and 2nd cores; one core is idle (“boring…”)]
10. Two approaches to improve software performance
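[Figure: the heavy process and the light process on the 1st and 2nd cores; both cores are busy (“work! work!”)]
The latter approach, using all CPU resources, is the one discussed in this talk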
11. Diagram Convention
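Procedural programming
Procedures are called by a process
[Figure: process foo (ellipse) calls module bar (rectangle)]
Legend: ellipse = process; rectangle = not a process
%% process foo (gen_server callbacks abbreviated)
-module(foo).
-behaviour(gen_server).
ok(State) ->
    {reply, bar:ok(), State}.   % ordinary function call into module bar
%% module bar (not a process)
-module(bar).
ok() ->
    ok.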
12. Diagram Convention, cont’d
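Actor model
Two processes interact with each other
[Figure: process foo (ellipse) sends a message to process bar (ellipse)]
Legend: ellipse = process; rectangle = not a process
%% process foo (gen_server callbacks abbreviated)
-module(foo).
-behaviour(gen_server).
ok(State) ->
    {reply, bar:ok(), State}.        % bar:ok() now sends a message to process bar
%% process bar (gen_server callbacks abbreviated)
-module(bar).
-behaviour(gen_server).
ok(State) ->
    {reply, ok, State}.              % runs inside process bar
ok() ->
    gen_server:call(?MODULE, ok).    % client side: synchronous call to process bar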
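The slide abbreviates the gen_server callbacks. For reference, a complete minimal version of process bar might look like the following. The `start_link/0` function, the local registration under the module name, and the `no_state` placeholder are additions for this example and do not come from the slides.

```erlang
-module(bar).
-behaviour(gen_server).

-export([start_link/0, ok/0]).
-export([init/1, handle_call/3, handle_cast/2]).

%% Start the process and register it locally under the module name,
%% so that ok/0 can reach it via ?MODULE.
start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% Client API: runs in the caller's process and blocks until bar replies.
ok() ->
    gen_server:call(?MODULE, ok).

%% Callbacks: run inside process bar.
init([]) ->
    {ok, no_state}.

handle_call(ok, _From, State) ->
    {reply, ok, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.
```

After `bar:start_link()`, any process can call `bar:ok()`. The caller is suspended until process bar has handled the message, which is the blocking behaviour the following slides examine.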
13. Design Rules in Erlang
Some rules on process design:
“Assign exactly one parallel process to each true concurrent activity in the system”
“Each process should only have one role”
from Program Development Using Erlang - Programming Rules and Conventions
14. Processes in getting/putting data
[Figure: processes on Node 1 and Node 2. Labels: Client, memcache, coordinator, rpc, store (data). The coordinator chooses a node that is responsible for the requested key.]
For clarity, details of the architecture are omitted.
15. Sequence in getting/putting data
[Sequence diagram across Node 1 and Node 2. Participants: Client, memcache, coordinator, rpc, coordinator, store. Annotation: “Blocked…”]
16. Sequence in getting/putting data, cont’d
[Sequence diagram across Node 1 and Node 2. Participants: Client, another client, memcache, coordinator, rpc, coordinator, store. Annotation: “Blocked…”]
17. Sequence in getting/putting data, cont’d
[Sequence diagram across Node 1 and Node 2. Participants: Client, another client, memcache, coordinator, rpc, coordinator, store. Annotation for the other client's request: “Accepted and… Blocked!”]
18. Design Rules in Erlang, again
Another rule on process design:
“Use many processes”
With many processes, the load is spread almost uniformly across processors by statistical effect
from Chapter 20 of Programming Erlang
19. Processes in getting/putting data, again
[Figure: processes on Node 1 and Node 2. Labels: Client, memcache, rpc, coordinator, store (data), with several memcache, rpc, and coordinator processes shown.]
Spawning multiple coordinator processes
For clarity, details of the architecture are omitted.
20. Sequence in getting/putting data, again
[Sequence diagram across Node 1 and Node 2. Participants: Client, another client, memcache, coordinator, rpc, coordinator, store. Annotation for the other client's request: “Accepted and…”]
21. Sequence in getting/putting data, again
[Sequence diagram across Node 1 and Node 2. Participants: Client, another client, memcache, coordinator, rpc, coordinator, store. Annotation for the other client's request: “Accepted and… Spawned”]
This works, but it looks like overkill.
Concurrency is already produced by the memcache module.
Why should the coordinator produce it again?
22. Design Rules in Erlang, final version
Another rule on process design:
“Don’t spawn stateless processes”
Stateless modules are called as procedures from concurrent processes instead
Introduced by me
23. Processes in getting/putting data, final version
[Figure: processes on Node 1 and Node 2. Labels: Client, memcache, rpc, store (data), coordinator. The coordinator is not spawned.]
For clarity, details of the architecture are omitted.
24. Sequence in getting/putting data, final version
[Sequence diagram across Node 1 and Node 2. Participants: Client, another client, memcache/coordinator, rpc, coordinator, store. Annotation for the other client's request: “Accepted and processed”]
25. Lessons Learnt
Process design based on calling sequence and process state
Externally called module
Must be spawned to produce concurrency
Runs as multiple processes if needed
e.g. TCP listening process, timer process
Stateless module
No need to be spawned
e.g. coordinator of Kai
Stateful module
Should be spawned for state consistency
Runs as multiple processes if possible
If it runs as a single process, it must be a terminal one (it never calls other processes synchronously) to avoid blocking
e.g. database process, socket pool
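The first two lessons can be illustrated together: the externally called side spawns one process per request, and the stateless logic is called from those processes as a plain function. The sketch below is a toy under stated assumptions, not Kai's coordinator: `stateless_sketch` and its functions are hypothetical names, and `erlang:phash2/2` stands in for the real key-to-node routing.

```erlang
-module(stateless_sketch).
-export([route/2, handle_concurrently/2]).

%% Stateless "coordinator": a pure function of its arguments, so there is
%% nothing to spawn. (phash2-based routing is a stand-in for illustration.)
route(Key, Nodes) ->
    lists:nth(erlang:phash2(Key, length(Nodes)) + 1, Nodes).

%% The externally called side produces the concurrency: one process per
%% request, each calling route/2 as an ordinary procedure.
handle_concurrently(Keys, Nodes) ->
    Parent = self(),
    Refs = [begin
                Ref = make_ref(),
                spawn(fun() -> Parent ! {Ref, Key, route(Key, Nodes)} end),
                Ref
            end || Key <- Keys],
    %% Collect one answer per request, in request order.
    [receive {Ref, Key, Node} -> {Key, Node} end || Ref <- Refs].
```

No request waits behind another inside `route/2`, because no single process owns it. A stateful module such as a store would still need its own process to keep its state consistent.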
26. Process Relationship in Kai
[Figure: process relationships in Kai. Labels: memcache, rpc, coordinator, store (data), hash (members, buckets), connection (sockets), version (current version), vclock, config (configs).]
Legend: ellipse = process; rectangle = not a process; state is shown next to each stateful process
Relationships between stateful processes and other modules are omitted.
27. Process Relationship in Kai, cont’d
[Figure: process relationships in Kai, continued. Labels: membership, sync (with timer), rpc, store (data), hash (members, buckets), connection (sockets), version (current version), vclock, config (configs).]
Legend: ellipse = process; rectangle = not a process
Relationships between stateful processes and other modules are omitted.
28. Process Relationship in Kai, cont’d
The “lessons learnt” are mostly satisfied
Externally called modules are spawned
As multiple processes if needed
e.g. kai_rpc, kai_memcache, kai_sync, kai_membership
Stateless modules are not spawned
e.g. kai_coordinator
Stateful modules are spawned
e.g. kai_config, kai_version, kai_store, kai_hash, kai_connection
However, some of them are NOT terminal ones
e.g. the connection module calls the config process, which makes it a potential bottleneck
The “lessons learnt” can point out potential bottlenecks
Yes, the connection module is exactly such a case!
29. Advanced Issues
More concurrency may be introduced if needed
Design rules from “Lessons Learnt” can be applied locally
e.g. the coordinator produces N-way concurrency for its asynchronous calls
Compare with Web application servers in the MVC model
                             | Web application servers  | Process design from “Lessons Learnt”
Concurrency is produced by   | Web servers, e.g. Apache | Externally called modules
Application is controlled by | Controller of MVC        | Stateless modules
State is managed by          | Model of MVC             | Stateful modules
30. Advanced Issues, cont’d
Does blocking never occur in pure functional programming?
In Kai, the process that receives a request waits for the data to be returned
In pure functional programming, would another process handle the data to be returned?
Not straightforward for me…
[Figure: Kai (Req., Blocking, Res.) compared with a possible pure functional programming model (Req. and socket, Res. and socket)]
32. What’s Polymorphism?
“a programming language feature that allows values of different data types to be handled using a uniform interface”
from Wikipedia
33. Polymorphism in Java
Interface (includes no implementation)
interface Animal {
    void bark();
}
Implementation class
class Dog implements Animal {
    public void bark() {
        System.out.println("Bow wow");
    }
}
34. Polymorphism in Java, cont’d
Abstract class (includes some implementation)
abstract class Animal {
    public void bark() {
        System.out.println(this.yap());
    }
    abstract String yap();
}
Concrete class
class Dog extends Animal {
    String yap() {
        return "Bow wow";
    }
}
35. Polymorphism Inspired by Interface
Interface module
Initializes the implementation module with a name
Calls the process by that name
Implementation module
Spawns a process and registers it under the given name
Implements the actual logic, which is called from the interface module
37. Polymorphism Inspired by Interface, cont’d
How to use
animal:start_link(dog),
animal:bark(). % Bow wow
38. Polymorphism Inspired by Interface, cont’d
Example in Kai
Provides two types of local storage with a single interface
Interface module
kai_store
Implementation modules
kai_store_ets: uses ets (in-memory storage)
kai_store_dets: uses dets (disk storage)
See the actual code for details
39. Polymorphism Inspired by Abstract Class
Abstract module
Defines abstract functions by using the behaviour mechanism
Spawns a process and stores the name of the concrete module
Implements the base logic
Calls the process
Concrete module
Implements the callbacks, which are called from the abstract module
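A minimal sketch of such an abstract module, using the `animal` and `yap` names from the Java example, could look like this. It is an illustration under assumptions rather than the code from the talk: the choice of gen_server, the local registration, and `bark/0` returning the sound instead of printing it are all decisions made for this example. Any concrete module that exports `yap/0` (such as a `dog` module declaring `-behaviour(animal)`) can be plugged in.

```erlang
-module(animal).
-behaviour(gen_server).

-export([start_link/1, bark/0]).
-export([init/1, handle_call/3, handle_cast/2]).

%% "Abstract function": every concrete module must export yap/0.
-callback yap() -> string().

%% Spawn the process and keep the concrete module's name as its state.
start_link(Mod) ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, Mod, []).

%% Uniform interface: callers never name the concrete module.
bark() ->
    gen_server:call(?MODULE, bark).

init(Mod) ->
    {ok, Mod}.

%% Base logic lives here once; only yap/0 is delegated to the concrete module.
handle_call(bark, _From, Mod) ->
    {reply, Mod:yap(), Mod}.

handle_cast(_Msg, Mod) ->
    {noreply, Mod}.
```

The process handling and the base logic are written once in `animal`. A new kind of animal only has to supply `yap/0`, and the compiler warns if a module declaring `-behaviour(animal)` does not export it.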
41. Polymorphism Inspired by Abstract Class, cont’d
How to use
Same as the interface example
%% concrete module
-module(dog).
-behaviour(animal).
-export([yap/0]).
yap() ->
    "Bow wow".
%% usage
animal:start_link(dog),
animal:bark(). % Bow wow
42. Polymorphism Inspired by Abstract Class, cont’d
Example in Kai
Provides two types of TCP listening processes with a single interface
Abstract module (behaviour)
kai_tcp_server
Concrete modules
kai_rpc: listens for RPC calls from other Kai nodes
kai_memcache: listens for requests from memcache clients
See the actual code for details
43. Lessons Learnt
Two approaches to implement polymorphism
Inspired by interface
Simple
Not efficient
The actual logic has to be implemented in each child
Inspired by abstract class
The Erlang way
Efficient
The abstract module can be shared by its children
45. Outline
Reviewing Dynamo and Kai
Kai: an Open Source Implementation of Amazon’s Dynamo
Features and Mechanism
Process Design for Better Performance
Process Design Based on Calling Sequence and Process State
Polymorphism in Actor Model
Two approaches to implement polymorphism
46. Kai: Roadmap
1. Initial implementation (May 2008)
1,000 lines
2. Current status
2,200 lines
Following tickets done
Module                | Task
kai_coordinator       | Requests from clients will be routed to coordinators
kai_version           | Vector clocks
kai_store             | Persistent storage
kai_rpc, kai_memcache | Process pool
kai_connection        | Connection pool