3. R2D2 in a nutshell
[Diagram: a Client wants to send the request profile?id=123. Three services (Profile Service, Inbox Service, Ads Service) are each backed by several boxes labeled Server for Resource “foo”. Zookeeper sits between the client and the servers, and a "???" marks the question of which servers the request goes to.]
With Zookeeper, the client:
• Listens to the profile zookeeper node
• Gets a list of the URIs of the servers where profile is hosted
• Gets notified if a server leaves or joins the cluster
• Chooses one server to send the request to
4. Agenda
R2D2 Architecture
How information is stored and organized in zookeeper
How R2D2 does load balancing and graceful degradation
Partitioning and sticky routing
Miscellaneous D2 use cases at LinkedIn:
- Redlining
- Cluster variants
Q&A
6. What is rest.li?
Open source Java REST framework. Go to http://rest.li
7. What is D2?
Primarily a name server and traffic router
The global “address book” is stored in zookeeper
We store the back-up in the local filesystem
Definitions:
D2 Cluster represents a collection of identical servers that host one or many D2 services
D2 Service represents a service
D2 Uri represents a server’s address and weight
8. How is D2 information organized and stored?
/ Root
/d2
/d2/clusters
/d2/clusters/clusterA
/d2/clusters/clusterB
/d2/services
/d2/services/serviceA1
/d2/services/serviceA2
/d2/services/serviceB
/d2/uris
/d2/uris/clusterA
/d2/uris/clusterB
/d2/uris/clusterB/ephemeralNode1
/d2/uris/clusterB/ephemeralNode2
Cluster Properties (stored on each cluster node):
- Partition configuration
- Etc.
Service Properties (stored on each service node):
- Cluster = clusterA
- Load-balancer configuration
- Degrader configuration
- Strategy configuration
- Etc.
Uri Properties (stored on each ephemeral node):
- Machine URI
- Weight
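The layout above can be read as a three-step lookup: service name, then the cluster named in the service properties, then the URIs announced under that cluster. Below is a minimal Java sketch of that lookup, assuming an in-memory map in place of zookeeper. The class `AddressBookSketch`, its method names and the sample hosts are hypothetical and are not D2's API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the zookeeper tree; not D2's real API.
class AddressBookSketch {
    // "/d2/services/<service>" -> name of the cluster that hosts it
    private final Map<String, String> serviceNodes = new HashMap<>();
    // "/d2/uris/<cluster>" -> (machine URI -> weight), one entry per ephemeral node
    private final Map<String, Map<String, Double>> uriNodes = new HashMap<>();

    void putService(String service, String cluster) {
        serviceNodes.put("/d2/services/" + service, cluster);
    }

    void announce(String cluster, String machineUri, double weight) {
        uriNodes.computeIfAbsent("/d2/uris/" + cluster, k -> new HashMap<>())
                .put(machineUri, weight);
    }

    // Service name -> cluster name -> sorted list of announced machine URIs.
    List<String> resolve(String service) {
        String cluster = serviceNodes.get("/d2/services/" + service);
        if (cluster == null) {
            throw new IllegalArgumentException("unknown service: " + service);
        }
        Map<String, Double> uris = uriNodes.getOrDefault(
                "/d2/uris/" + cluster, Collections.<String, Double>emptyMap());
        List<String> result = new ArrayList<>(uris.keySet());
        Collections.sort(result);
        return result;
    }
}
```

A caller would register `serviceA1` against `clusterA`, let each server announce itself under `clusterA`, and then call `resolve("serviceA1")` to get the candidate servers.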
9. How is zookeeper initialized?
[Diagram: Config file -> Zookeeper, labeled D2Config.java. A ServiceA1 Client and a ClusterA Server are both connected to Zookeeper.]
/ Root
/d2
/d2/clusters
/d2/clusters/clusterA
/d2/clusters/clusterB
/d2/clusters/clusterC
/d2/services
/d2/services/serviceA1
/d2/services/serviceA2
/d2/services/serviceA3
/d2/uris
/d2/uris/clusterA
/d2/uris/clusterA/ephemeralNode1
10. D2 Load Balancer
Client-side load balancer
Client keeps track of the state
2 Strategies to use:
- Random
- Degrader
11. How does the degrader load balancer work?
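[Animated diagram, Periods 1 to 3: a Client sends traffic to Server 1 and Server 2. For each server it tracks Total Call Count, Latency and points; for the cluster it tracks total call count, average latency and drop rate. The mode label alternates between LOAD_BALANCE and CALL_DROPPING.]
Period 1: Server 1 and Server 2 each have Total Call Count 0, Latency 0 ms and 100 points. Cluster total call count 0, cluster average latency 0 ms, cluster drop rate 0.0.
Period 2: Server 1 handles 100 calls at 4900 ms and Server 2 handles 100 calls at 100 ms. Cluster average latency is 2500 ms. Server 1 is reduced to 61 points.
Period 3 (CALL_DROPPING): Server 1 handles 67 calls at 4900 ms and Server 2 handles 133 calls at 3000 ms. Cluster average latency is 3636.5 ms. Cluster drop rate rises to 0.2.
LB Configuration:
Latency Low Water Mark: 500 ms
Latency High Water Mark: 2000 ms
Min Call Count: 10
Notice: The number of points doesn’t change because we are in CALL_DROPPING mode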
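The cluster average latencies shown on this slide (2500 ms and 3636.5 ms) are consistent with a mean of the per-server latencies weighted by call count. The sketch below reproduces that arithmetic. The class and method names are hypothetical, and whether D2 computes the figure exactly this way is an assumption.

```java
// Hypothetical helper; reproduces the arithmetic on the slide, not D2's code.
class ClusterLatencySketch {
    // Average of per-server latencies, weighted by how many calls each server handled.
    static double clusterAverageLatencyMs(long[] callCounts, double[] latenciesMs) {
        if (callCounts.length != latenciesMs.length) {
            throw new IllegalArgumentException("one latency per server is required");
        }
        long totalCalls = 0;
        double weightedSum = 0.0;
        for (int i = 0; i < callCounts.length; i++) {
            totalCalls += callCounts[i];
            weightedSum += callCounts[i] * latenciesMs[i];
        }
        // With no calls in the period there is nothing to average.
        return totalCalls == 0 ? 0.0 : weightedSum / totalCalls;
    }

    public static void main(String[] args) {
        // 100 calls at 4900 ms and 100 calls at 100 ms
        System.out.println(clusterAverageLatencyMs(new long[] {100, 100}, new double[] {4900, 100}));
        // 67 calls at 4900 ms and 133 calls at 3000 ms
        System.out.println(clusterAverageLatencyMs(new long[] {67, 133}, new double[] {4900, 3000}));
    }
}
```

Both results exceed the 2000 ms high water mark with far more than the minimum 10 calls, which is the condition under which the slide shows the cluster degrading.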
12. How does the degrader recover from a bad state?
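[Animated diagram, Periods N to N+3: a Client sends traffic to Server 1 and Server 2 and tracks the same per-server and cluster stats as on the previous slide. The mode label alternates between LOAD_BALANCE and CALL_DROPPING.]
Period N: Server 1 and Server 2 each have Total Call Count 0, Latency 0 ms and 1 point. Cluster total call count 0, cluster average latency 0 ms, cluster drop rate 1.0.
Values shown as the animation advances:
- Points per server: 1, then 2, then 37
- Cluster drop rate: 1.0, then 0.8, then 0.6
- Server 1: 15 calls at 150 ms; Server 2: 20 calls at 200 ms; cluster total call count 35, cluster average latency 178.6 ms
- Server 1: 50 calls at 200 ms; Server 2: 50 calls at 200 ms; cluster total call count 100, cluster average latency 200 ms
Notice: We’re in recovery mode because we choke all traffic, so we will try recovering regardless of call stats
LB Configuration:
Latency Low Water Mark: 500 ms
Latency High Water Mark: 2000 ms
Min Call Count: 10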
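The drop rates on these two slides move in steps of 0.2 (0.0 to 0.2 when latency is high, 1.0 to 0.8 to 0.6 during recovery). The following Java sketch shows one rule that would produce those numbers. The step size, the rule itself and all names are assumptions inferred from the slide values, not D2's actual implementation, which also weighs error rate and outstanding calls.

```java
// Hypothetical rule inferred from the slide values; not D2's actual algorithm.
class DropRateSketch {
    static final double LOW_WATER_MARK_MS = 500.0;   // from the LB configuration on the slide
    static final double HIGH_WATER_MARK_MS = 2000.0; // from the LB configuration on the slide
    static final long MIN_CALL_COUNT = 10;           // from the LB configuration on the slide
    static final double STEP = 0.2;                  // assumption: matches 0.0 -> 0.2 and 1.0 -> 0.8 -> 0.6

    static double nextDropRate(double current, long clusterCallCount, double clusterAvgLatencyMs) {
        // Recovery: when every call is dropped there are no stats, so step down regardless.
        if (current >= 1.0) {
            return Math.max(0.0, current - STEP);
        }
        // Too few calls to trust the latency figure: leave the drop rate alone.
        if (clusterCallCount < MIN_CALL_COUNT) {
            return current;
        }
        if (clusterAvgLatencyMs > HIGH_WATER_MARK_MS) {
            return Math.min(1.0, current + STEP);
        }
        if (clusterAvgLatencyMs < LOW_WATER_MARK_MS) {
            return Math.max(0.0, current - STEP);
        }
        return current; // between the water marks: hold steady
    }

    public static void main(String[] args) {
        System.out.println(nextDropRate(0.0, 200, 3636.5)); // degrading cluster
        System.out.println(nextDropRate(1.0, 0, 0.0));      // recovery with no traffic
    }
}
```

Holding the rate steady between the two water marks is a design choice of this sketch that avoids oscillating when latency hovers near a single threshold.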
13. A few more extra details
Min call count is reduced depending on how degraded the state is
It’s not just latency; we also consider error rate and number of outstanding calls
We can use many types of latency:
- AVERAGE
- 90%
- 95%
- 99%
We can set different low/high water marks for the cluster vs. for individual nodes
14. Call Dropping vs Load Balancing
Call Dropping Mode | Load Balancing Mode
Affects the entire cluster | Affects only individual machines in the cluster
Purpose: graceful degradation | Purpose: load balancing traffic
Drop Rate | Points
Hints: Latency | Hints: individual node latency, error rate, #outstanding calls
15. Partitioning and Sticky Routing
D2 supports partitioning of clusters
- Range partitioning
- Hash partitioning (MD5 or Modulo)
- Use regex to extract key from URI to determine where a request should go
Sticky routing within partition is also supported
- Use regex to extract key from URI (same as above)
- Use consistent hash ring
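As an illustration of the partitioning options listed above, here is a minimal Java sketch that extracts a key from a request URI with a regex and maps it to a partition by modulo or by range. The regex, the URI shape `/profile?id=123` and all class and method names are hypothetical; D2's real partition configuration is not shown in these slides.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of key extraction and partition selection; not D2's API.
class PartitionSketch {
    // Assumed key pattern: captures the numeric id in e.g. "/profile?id=123".
    private static final Pattern KEY_PATTERN = Pattern.compile("id=(\\d+)");

    static long extractKey(String requestUri) {
        Matcher matcher = KEY_PATTERN.matcher(requestUri);
        if (!matcher.find()) {
            throw new IllegalArgumentException("no partition key in " + requestUri);
        }
        return Long.parseLong(matcher.group(1));
    }

    // Hash partitioning, modulo flavour: key mod number of partitions.
    static int moduloPartition(long key, int partitionCount) {
        return (int) Math.floorMod(key, (long) partitionCount);
    }

    // Range partitioning: partition i covers [start + i * size, start + (i + 1) * size).
    static int rangePartition(long key, long keyRangeStart, long partitionSize, int partitionCount) {
        long index = (key - keyRangeStart) / partitionSize;
        if (key < keyRangeStart || index >= partitionCount) {
            throw new IllegalArgumentException("key " + key + " is outside every partition");
        }
        return (int) index;
    }

    public static void main(String[] args) {
        long key = extractKey("/profile?id=123");
        System.out.println("modulo partition: " + moduloPartition(key, 4));
        System.out.println("range partition: " + rangePartition(key, 0, 100, 4));
    }
}
```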
16. Consistent Hash Ring
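[Diagram: a hash ring that wraps around from Integer.MAX_VALUE to Integer.MIN_VALUE, with marks at 100, 0 and -100. Points for app1.foo.com, app2.foo.com and app3.foo.com are placed on the ring, and a request for “foo” is hashed onto it.]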
17. Miscellaneous D2 use cases
Redlining: Measure max capacity of server
Use real traffic
Don’t have to worry about mutable operations
[Diagram: the consistent hash ring from the previous slide, with points for app1.foo.com, app2.foo.com and app3.foo.com.]
18. Miscellaneous D2 use cases
What if there are different requirements from different clients?
Let’s say we have a service called profile.
- For clients who can only view profile, we want them to go to read-only cluster
- For clients who can edit profile, we want them to go to read-write cluster.
Use the cluster variant technique
A cluster variant allows changing the D2 Service’s namespace to get around the restriction that a zookeeper node’s name must be unique.
19. Miscellaneous D2 use cases
[Diagram: a View Client and an Edit Client each send a request for profile. The View Client resolves profile under /d2/services and reaches the readonly Server; the Edit Client resolves profile under /d2/profileClusterVariant and reaches the readwrite Server.]
/ Root
/d2
/d2/clusters
/d2/clusters/readonly
/d2/clusters/readwrite
/d2/services
/d2/services/profile (Service Properties: Cluster = readonly)
/d2/profileClusterVariant
/d2/profileClusterVariant/profile (Service Properties: Cluster = readwrite)
/d2/uris
/d2/uris/readonly
/d2/uris/readonly/ephemeralNode1 (readonly Server)
/d2/uris/readwrite
/d2/uris/readwrite/ephemeralNode1 (readwrite Server)
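The cluster variant idea on this slide amounts to looking up the same service name under a different base path. Below is a minimal Java sketch with an in-memory map standing in for zookeeper. The class `ClusterVariantSketch` and its methods are hypothetical, and how a real D2 client is told which base path to use is not covered here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of cluster variants; not D2's API.
class ClusterVariantSketch {
    // Full service node path -> cluster named in that node's service properties.
    private final Map<String, String> serviceNodes = new HashMap<>();

    void putServiceNode(String basePath, String service, String cluster) {
        serviceNodes.put(basePath + "/" + service, cluster);
    }

    // The same service name resolves to a different cluster depending on the base path.
    String clusterFor(String basePath, String service) {
        String cluster = serviceNodes.get(basePath + "/" + service);
        if (cluster == null) {
            throw new IllegalArgumentException("no node " + basePath + "/" + service);
        }
        return cluster;
    }

    public static void main(String[] args) {
        ClusterVariantSketch tree = new ClusterVariantSketch();
        tree.putServiceNode("/d2/services", "profile", "readonly");
        tree.putServiceNode("/d2/profileClusterVariant", "profile", "readwrite");
        System.out.println("view client -> " + tree.clusterFor("/d2/services", "profile"));
        System.out.println("edit client -> " + tree.clusterFor("/d2/profileClusterVariant", "profile"));
    }
}
```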
20. Q&A
Questions?
Email me at: osumampouw@linkedin.com
Check out http://rest.li and https://github.com/linkedin/rest.li for more info
We’re hiring!
Step back several years: LinkedIn had a small code base and a small binary.
It was easy to scale up: just add more servers.
One binary became too big to be deployed on a single server, so it was split into multiple binaries. This was the birth of specialized services and Service Oriented Architecture (SOA).
When a service wants to talk to another service, we have to wire in the address of the load balancer for that cluster.
Now we have hundreds of services, and manually wiring in load balancer addresses is error prone and slow. Imagine that as a developer you have to ask around for the IP address of each load balancer.
Load balancers are expensive and introduce an extra network hop.
Imagine you have a client
Machines can leave and join a cluster at any given time. D2 has a server side and a client side.
Zookeeper is a distributed service used to maintain the state of a system. It is fault tolerant: even if a few of the servers inside zookeeper die, we’re still OK.
Zookeeper is similar to a file system that also provides a way to publish and subscribe to changes on a znode.
Servers announce their addresses to zookeeper.
Point: zookeeper is not involved in sending every request
Open source Java REST framework currently being used at LinkedIn.
This is how it works:
Application sits on top of rest.li layer
Rest.li sits on top of R2D2.
D2 finds the services that rest.li creates, load balances traffic from clients to servers and also provides graceful degradation.
R2 handles the request/response interaction between the server and clients. R2 is asynchronous and is implemented using netty/jetty.
R2D2 is independent of rest.li. D2 can be used outside of rest.li as a name resolver and load balancer.
There are 3 different constructs that we use to store information for D2.
A D2 Cluster is made up of identical nodes. No first class or second class nodes. No master/slave. No ACL. D2 is ideal for a trusted middle tier layer (simple to understand).
Each D2 Uri will create a new client abstraction for sending traffic into
A URI node is a zookeeper ephemeral node. Cluster and service nodes are zookeeper permanent (persistent) nodes.
Point: cluster properties and service properties are rarely updated and almost static. That’s why they are stored in permanent zookeeper nodes.
Some restrictions:
ZKFSUtil sets d2 config writer to write to /services, /clusters, /uris
/d2 path is configurable
Once the client listens, it keeps the global information in its internal storage. After it receives the information it doesn’t need to contact zookeeper again; zookeeper publishes updates to the D2Client’s internal state.
If the ClusterA Server dies, zookeeper automatically removes its ephemeral node, so the ServiceA1 client will know that it can’t send requests to the ClusterA Server.
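The client-side behaviour described in these notes can be pictured as a local cache that is only touched when zookeeper publishes a change. The sketch below simulates that with plain method calls. The class `UriCacheSketch` and its callback names are hypothetical, and no real ZooKeeper client API is used.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical client-side cache; the callbacks stand in for zookeeper notifications.
class UriCacheSketch {
    private final Set<String> liveUris = new TreeSet<>();

    // Invoked when an ephemeral node appears under the cluster's uris node.
    synchronized void onServerJoined(String machineUri) {
        liveUris.add(machineUri);
    }

    // Invoked when an ephemeral node disappears, e.g. because the server died.
    synchronized void onServerLeft(String machineUri) {
        liveUris.remove(machineUri);
    }

    // Requests are routed from this local copy; zookeeper is not asked per request.
    synchronized Set<String> snapshot() {
        return new TreeSet<>(liveUris);
    }

    public static void main(String[] args) {
        UriCacheSketch cache = new UriCacheSketch();
        cache.onServerJoined("http://clusterA-server1:8080");
        cache.onServerJoined("http://clusterA-server2:8080");
        cache.onServerLeft("http://clusterA-server1:8080");
        System.out.println(cache.snapshot());
    }
}
```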
Imagine we have a client and a cluster that consists of 2 servers
The client keeps track of call statistics for each server.
We update the statistics on a 5-second interval.
Talk about the initial state (min call count is there to reduce flapping).
We have 2 modes of operation: CALL_DROPPING and LOAD_BALANCE.
CALL_DROPPING changes the cluster’s drop rate. LOAD_BALANCE directs traffic to healthier machines.
Explain why there are 2 different modes: there are 2 types of problem that can affect a service, cluster-wide problems vs. individual node problems.
CALL_DROPPING is for the cluster -> problem with downstream services
LOAD_BALANCE is for an individual node -> problem with a particular server
Why does call dropping mode affect the entire cluster? Because we don’t want to double penalize a bad client (reducing its number of points while also increasing the drop rate for that particular client).
From request URI we will compute which partition it belongs to.
We use regex to extract the key.
Extra attribute: “D2-KeyMapper-TargetHost” : URI (but must be part of the cluster)
“D2-Hint-TargetService” to override URI
Imagine we have 3 servers. This is how the client views the servers.
MD5 hash the URI. For each URI we’ll create 100 points.
The number of points is based on weight of the node * number of points per weight (configurable)
The reason we use a consistent hash ring is that servers can join or leave the cluster at any time. With a consistent hash ring, we’re guaranteed that only about 1/n of the requests will be reshuffled if the cluster membership changes.
Redlining means performance testing a server so we know the maximum capacity it can handle.
We can use real production traffic without worrying about requests that mutate data.
So we’ve talked about how R2D2 works: how we discover services and how we load balance traffic. For more information you can check http://rest.li.
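The ring described in these notes can be sketched in Java with a sorted map: each server contributes weight times points-per-weight entries, and a request key is routed to the first point at or after its hash, wrapping around at the end. The class `HashRingSketch`, the way point names are derived (`uri#i`) and the use of the first four MD5 bytes as the hash are assumptions for illustration, not D2's implementation.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical consistent hash ring; illustrates the idea, not D2's code.
class HashRingSketch {
    private final TreeMap<Integer, String> ring = new TreeMap<>();
    private final int pointsPerWeight;

    HashRingSketch(int pointsPerWeight) {
        this.pointsPerWeight = pointsPerWeight;
    }

    // First 4 bytes of the MD5 digest, read as a signed int (MIN_VALUE..MAX_VALUE).
    static int hash(String value) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(value.getBytes(StandardCharsets.UTF_8));
            return ((d[0] & 0xFF) << 24) | ((d[1] & 0xFF) << 16) | ((d[2] & 0xFF) << 8) | (d[3] & 0xFF);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is required on every Java platform", e);
        }
    }

    // Number of points = weight * points per weight.
    void addServer(String uri, int weight) {
        for (int i = 0; i < weight * pointsPerWeight; i++) {
            ring.put(hash(uri + "#" + i), uri);
        }
    }

    void removeServer(String uri) {
        ring.values().removeIf(uri::equals);
    }

    // Walk clockwise from the key's hash to the next point, wrapping around if needed.
    String serverFor(String key) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no servers on the ring");
        }
        Map.Entry<Integer, String> entry = ring.ceilingEntry(hash(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }
}
```

With `new HashRingSketch(100)` and three servers of weight 1, removing one server only moves the keys that were mapped to that server; keys mapped to the other two stay where they were, which is what makes the routing sticky.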