2. Introduction
• Published in July 2015
• by Bruce Maggs
• Vice President of Research at Akamai
• Professor at Duke University
• Theoretical + practical use of algorithms in a CDN
• Link
3. Overview
• Authoritative name server chooses Cluster
• Stable allocation
• Next, an edge server is chosen within the cluster (DNS resolution ends here)
• Consistent hashing
• Which objects to cache and which not
• Bloom filter
• Find best route from edge to origin
• Multi-commodity flow problem
• Fault tolerance, to make the system resilient to failures
• Leader election algorithm
4. Stable allocation
• Global load balancing is the process of mapping clients to server clusters
• Clients are represented as map-unit tuples: <client IP address prefix, traffic type>
• For example: <1.2.3.4/24, video>, <1.2.3.4/24, web>
• Properties: demand (flash crowds), traffic type, IP range
• A cluster is a set of servers
• A cluster is usually deployed within a particular ISP or autonomous system
• Properties
• Latency, packet loss, throughput, capacity (CPU, RAM, ...)
• The task of global load balancing is to match map units to clusters
• Each map unit has an ordered preference list of clusters
• Example: <1.2.3.4/24, video> => C1, C2, ... CN
• Each cluster has a preference list of map units
• Example: C1 => <1.2.3.4/24, video>, as C1 is deployed in the ISP where 1.2.3.4 is local
• Algorithm used: Gale-Shapley (next slide)
(Figure: how the mapping should look)
5. Gale-shapley algorithm
• Introduced by Gale and Shapley in 1962
• Applied to "Stable marriage problem"
• Finds a stable matching between N men and N women
• Input: each man and each woman ranks the other side in order of preference
• Ex: Nick -> Judy, Mary, Suzy, ...
• Output: N stable (man, woman) pairs
• Reference:
• WIKI
• Code example
• Details from Lecture Notes
• Youtube animation
• How it works: next slide
6. How it works?
(Figure: preference lists of the men and women)
1. Every man proposes to his first-choice woman
2. Each woman tentatively accepts her most preferred proposer
3. Rejected, m2 now proposes to w4
4. w4 rejects m4 and accepts m2
5. Now m4 proposes to his next choice, and so on
Finally we have a 4x4 match
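The proposal rounds above can be sketched in a few lines of Python; the preference lists here are illustrative, not the ones from the slide's figure:

```python
# Minimal Gale-Shapley sketch: men propose in preference order,
# women tentatively accept and trade up when a better proposer arrives.
def gale_shapley(men_prefs, women_prefs):
    # rank[w][m] = position of m in w's list (lower = more preferred)
    rank = {w: {m: i for i, m in enumerate(prefs)}
            for w, prefs in women_prefs.items()}
    free = list(men_prefs)                    # men without a partner
    next_choice = {m: 0 for m in men_prefs}   # next woman each man proposes to
    engaged = {}                              # woman -> man
    while free:
        m = free.pop()
        w = men_prefs[m][next_choice[m]]
        next_choice[m] += 1
        if w not in engaged:
            engaged[w] = m                    # w tentatively accepts
        elif rank[w][m] < rank[w][engaged[w]]:
            free.append(engaged[w])           # w trades up; old partner is free
            engaged[w] = m
        else:
            free.append(m)                    # w rejects m; he proposes again
    return {m: w for w, m in engaged.items()}

men = {"m1": ["w1", "w2"], "m2": ["w1", "w2"]}
women = {"w1": ["m1", "m2"], "w2": ["m2", "m1"]}
print(gale_shapley(men, women))  # {'m1': 'w1', 'm2': 'w2'}
```

In the CDN setting, men correspond to map units and women to clusters (with partial preference lists and capacities, per slide 7).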
7. Implementation challenges
• Unequal numbers of map units and clusters
• 10 million map units vs. 1,000 clusters
• Partial preference lists
• To optimize, we filter the preference lists; for example, a Boston map unit does not need Tokyo clusters in its list
• Resource trees
• Used to capture cluster resources in Tree structure
• More
• Load-Balancing in Content-Delivery Networks [ppt]
8. Consistent Hashing
• Consistent hashing is used to balance the load within a
single cluster
• An object is mapped to the next bucket (server) that appears in clockwise order on the unit circle
• Flash crowd for a single object
• Map a popular object to the next K servers (K is a function of the object's popularity)
• A content provider's objects are grouped by serial number and hashed to the same bucket
• This lets a client reuse the same connection for multiple objects on the same web page
• e.g. a212.g.akamai.net
• References:
• WIKI
• CODE
• Research paper
• Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot
Spots on the World Wide Web
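A minimal sketch of the scheme above, assuming MD5 as the ring hash; the server names and the popularity-driven choice of K are illustrative:

```python
# Consistent hashing on a unit circle: servers are hashed onto the ring,
# an object is served by the next K distinct servers clockwise from its hash.
import bisect
import hashlib

def h(key):
    # Map a string to a point on the "circle" (a large integer range).
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, servers):
        self.points = sorted((h(s), s) for s in servers)

    def lookup(self, obj, k=1):
        # Walk clockwise from the object's hash, collecting the next
        # k distinct servers (k grows with the object's popularity).
        keys = [p for p, _ in self.points]
        i = bisect.bisect(keys, h(obj)) % len(self.points)
        chosen = []
        while len(chosen) < min(k, len(self.points)):
            s = self.points[i][1]
            if s not in chosen:
                chosen.append(s)
            i = (i + 1) % len(self.points)
        return chosen

ring = Ring(["server1", "server2", "server3", "server4"])
print(ring.lookup("serial-212", k=2))  # two servers for a popular object
```

Because servers are placed by hash, adding or removing one server moves only the objects adjacent to it on the circle, which is the property that makes this scheme attractive for load balancing within a cluster.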
9. Consistent Hashing in practice
(Figure: a resolving name server looks up images.example.com, which is a DNS CNAME for a212.g.akamai.net. The authoritative name server chooses a cluster for the map unit <IP, general> with the Gale-Shapley algorithm. Within Cluster A (servers 1-4), consistent hashing on serial number 212 selects K servers according to the object's popularity, e.g. K=2. The DNS response returns server 1 and server 2 in random order.)
10. Bloom filter
• A probabilistic data structure designed to tell you, rapidly and memory-efficiently, whether an element is in a set
• The price paid for efficiency: the answer is either "definitely not in the set" or "possibly in the set" (with some false-positive probability)
• Browser example: detect whether a given URL is malicious
• Input: one million malicious URLs, approx. 25 MB
• With a 1% false-positive rate, the 25 MB is reduced to 1.14 MB
• DEMO
• http://billmill.org/bloomfilter-tutorial/
• Use case
• Weak-password checks, HBase, Cassandra, Bitcoin (wallet sync), ...
11. How it works
• Input data is hashed K times (with different salts) and the resulting positions are set in a bit array of size N
• The optimal K is roughly proportional to the ratio of the bit-array size to the number of elements you expect to store (K ≈ (N/n) ln 2)
• Calculator: http://hur.st/bloomfilter?n=1000000&p=0.01
• Example (figure on the right)
• K = 3
• Java implementation
• Wiki
• https://en.wikipedia.org/wiki/Bloom_filter
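A minimal sketch of the structure described above; simulating the K hash functions by salting SHA-256 is an assumption made for illustration:

```python
# Minimal Bloom filter: K salted hashes set K bits on insert;
# a query answers "definitely not present" or "possibly present".
import hashlib

class BloomFilter:
    def __init__(self, n_bits, k):
        self.bits = [False] * n_bits
        self.k = k

    def _positions(self, item):
        # Derive k bit positions by hashing the item with k salts.
        for salt in range(self.k):
            d = hashlib.sha256(f"{salt}:{item}".encode()).hexdigest()
            yield int(d, 16) % len(self.bits)

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = True

    def __contains__(self, item):
        # Any unset bit means the item was definitely never added.
        return all(self.bits[p] for p in self._positions(item))

bf = BloomFilter(n_bits=1000, k=3)
bf.add("http://evil.example/a")
print("http://evil.example/a" in bf)   # True
print("http://good.example/b" in bf)   # almost certainly False
```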
12. Bloom filter in CDN
• Cache summarization
• More space-efficient than storing a list of URLs
• Squid example (http://www.squid-cache.org/)
• Distributes cache summaries to nodes (digests)
• When an object is evicted from the cache, we use:
• "Counting Bloom filters"
• Each bit is replaced by a small counter: increment on insert, decrement on delete
• Cache filtering
• "One hit wonders"
• About 3/4 of web objects are accessed only once over a two-day period
• Cache on second hit rule
• Avoids caching one-hit wonders
• Cache only when accessed second time during certain period
• Two Bloom filters: primary and secondary
• The primary holds recently seen objects, the secondary the previous period's
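The cache-on-second-hit rule can be sketched as follows; plain sets stand in for the two Bloom filters, and the rotation period is an illustrative assumption:

```python
# Cache-on-second-hit with two rotating filters: an object is cached
# only if it was already seen in the current or previous window.
class SecondHitFilter:
    def __init__(self, rotate_every=1000):
        self.primary, self.secondary = set(), set()
        self.count, self.rotate_every = 0, rotate_every

    def should_cache(self, url):
        self.count += 1
        if self.count % self.rotate_every == 0:
            # Age out old entries: secondary keeps the previous window.
            self.secondary, self.primary = self.primary, set()
        seen = url in self.primary or url in self.secondary
        self.primary.add(url)
        return seen  # cache only on the second access within the window

f = SecondHitFilter()
print(f.should_cache("/a.jpg"))  # False: first hit, one-hit wonders skipped
print(f.should_cache("/a.jpg"))  # True: second hit, now worth caching
```

In production the sets would be Bloom filters, trading a small false-positive rate (an occasional object cached on its first hit) for a large memory saving.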
13. Benefits of Cache on second Hit
• Experiment conducted by a research scientist at Akamai
• On 47 production servers
• "Cache on second hit" enabled between March 14 and April 24
• When Cache On Second Hit is ON
• Hit Rate increased from 74% to 83% (right figure)
• Disk Usage decreased by 44% (left figure)
15. Overlay routing
• Most web objects are uncacheable or cacheable only for a short time
• Short-term cache
• Stock chart, weather data
• Non-cachable:
• Live streaming, Private data, Tele conference
• We need an algorithm to construct an overlay that provides efficient communication between edge servers and the origin
• Input:
• Client demand: Edge -> origin,
• Real-time network measurements between
servers (latency, loss, bandwidth)
• Output: Set of paths with high availability
and performance
(Figure: how overlay routing looks)
16. Multi-commodity Flow
• The multi-commodity flow problem is a network flow problem with
multiple commodities (flow demands) between different source and sink
nodes.
• Key Aspects for modeling
• Multipath transport
• Live video mode (replicate & collect & recover)
• Web delivery
• Role of overlay servers
• Replicate, encode/decode
• Cost function
• For dynamic web content, latency is the important cost
• For live streaming, throughput
• Capacity
• Server / Network resources
• Optimized transport protocols
• TCP/IP optimizations
17. Algorithmic Solutions
• Dynamic web content
• All-pairs shortest path problem
• Task: select the path with the best performance without violating capacity constraints
• Graph: Floyd-Warshall algorithm
• Reference:
• Network Flows: Theory, Algorithms and Applications
• Live videos
• Mixed-integer programming problem
• Reference:
• Designing Overlay Multicast networks For Streaming
• Algorithms for Constructing Overlay Networks for Live Streaming
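For the dynamic-content case, the all-pairs shortest path step can be sketched with Floyd-Warshall; the 4-node latency matrix is illustrative, and capacity constraints are omitted for brevity:

```python
# Floyd-Warshall: all-pairs shortest paths over a matrix of direct
# edge costs (here, latencies in ms); INF means no direct link.
INF = float("inf")

def floyd_warshall(cost):
    n = len(cost)
    d = [row[:] for row in cost]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]  # relaying via node k is cheaper
    return d

# Direct latencies between 4 overlay nodes; node 0 is an edge, node 3 the origin.
cost = [
    [0,   10,  INF, 100],
    [10,  0,   20,  INF],
    [INF, 20,  0,   30],
    [100, INF, 30,  0],
]
d = floyd_warshall(cost)
print(d[0][3])  # 60: overlay path 0 -> 1 -> 2 -> 3 beats the direct 100 ms link
```

This illustrates why overlay routing helps: the best edge-to-origin path often relays through intermediate overlay servers rather than taking the direct Internet route.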
18. Benefits of Overlay routing
• Experiment 1 (Consistent benefits)
• Origin server in Dallas, US (North America)
• Agents deployed around the world
• 38 KB of uncacheable content
• Experiment 2 (Catastrophe Insurance)
• In April 2010, an Internet outage hit the SEA-ME-WE 4 cable
• The cable was repaired from April 25 to April 29
• Origin in Boston, Agents in Asia
(Figure 1, Figure 2)
19. Leader Election (nmon case)
• Key concepts that underlie leader election
• Failure model
• Candidate set
• Health
• Electoral process
• Similar to nmon (broadcasting health to others)
• Outcome requirements
• At-least-one, at-most-one
20. Used Algorithm
• Raft (an alternative to Paxos)
• Consensus & Leader election algorithm
• Awesome explanation
• Github
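A full Raft implementation is far too long for a slide, so here is a deliberately simplified leader-election sketch (not Raft itself): given a shared view of candidate health, every node deterministically picks the lowest-ID healthy candidate, satisfying the at-least-one / at-most-one outcome as long as all nodes agree on the health view:

```python
# Simplified leader election: all nodes apply the same deterministic
# rule to the same health view, so they all elect the same leader.
def elect(candidates, healthy):
    alive = sorted(c for c in candidates if healthy.get(c, False))
    return alive[0] if alive else None  # None: no healthy candidate exists

health = {"node1": False, "node2": True, "node3": True}
print(elect(["node1", "node2", "node3"], health))  # node2
```

Raft adds what this sketch omits: terms, randomized election timeouts, and majority voting, so that nodes with *inconsistent* health views still never elect two leaders in the same term.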
21. Future work
• Experimental
• Cache on second-hit Challenges
• Regression cases
• Can be fixed with a custom header: X-Cache-On-First-Hit
• TE-Chunked, Pseudo-chunked
• Only disk
• Should be applied as patch, not SAM