2. What is Gossiping?
• Spread of information in a random manner
• Some examples:
– Human gossiping
– Epidemic diseases
– Physical phenomena: wildfire, diffusion, etc.
– Computer viruses and worms
3. Gossiping in Computer Science
• Term first coined by Demers et al (1987)
• Some applications of gossip protocols
– Peer Sampling
– Data Aggregation
– Clustering
– Information Dissemination (Multicast, Pub/Sub)
– Overlay/topology maintenance
– Failure detection?
5. Today’s Focus
• Theoretical angle for Gossip-based protocols
[Allavena et al PODC 2005]
– Probability of partitioning
– Time till partitioning
– Bounds on in-degree
– Essential elements of gossiping
– Simulation results
• Cyclon [Voulgaris et al]
• Scamp [Ganesh et al]
• NewsCast [Jelasity et al]
6. Membership Service
• Full Membership
– Complete knowledge at each node
– Random subset used for gossiping
– Not scalable
– Hard to maintain
• Partial Membership
– Random subset at each node
– Gossip partners chosen from local view
7. View Selection
[Diagram: node u collects the views of its neighbours (e.g. s with view {s, p, r} and t with view {t, q, r}) into a list L1, and the nodes that requested u's view (e.g. v) into a list L2; u's new view is then sampled from L1 and L2, with L2 weighted by w.]
8. Essential Elements of Gossiping
• Mixing: Construct a list L1 consisting of the local views of the nodes in node u's local view
– Guarantees non-partitioning
– “Pull” based
• Reinforcement: Construct a list L2 consisting of the nodes that requested the local view of u
– Balances the network
– Removes old, possibly dead edges and adds new edges
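To make the two elements concrete, here is a minimal Python sketch of one view update at node u, assuming a weight parameter w for the reinforcement list and simple list-based views; the names and details are illustrative, not the exact protocol of [Allavena et al PODC 2005]:

import random

def gossip_round(pulled_views, requesters, k, w=0.1):
    """One view update at node u (sketch).
    pulled_views -- views received from the nodes u pulled ("mixing")
    requesters   -- nodes that pulled u's view this round ("reinforcement")
    k            -- target view size; w -- weight of the reinforcement list (assumed parameter)
    """
    L1 = [n for view in pulled_views for n in view]   # mixing: concatenation of neighbours' views
    L2 = list(requesters)                             # reinforcement: who asked for u's view

    new_view = []
    while len(new_view) < k and (L1 or L2):
        # pick from L2 with probability w, otherwise from L1
        src = L2 if (L2 and (not L1 or random.random() < w)) else L1
        cand = src.pop(random.randrange(len(src)))
        if cand not in new_view:                      # keep the view duplicate-free
            new_view.append(cand)
    return new_view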
9. Partitioning and Size Estimate
• A and B partition iff x = 1 and y = 0 (x and y are defined on the next slide)
• Partitioning is least likely when x = y
• Goal of the protocol is to maintain this balance
10. Size Estimates
• Idea:
– x is the estimate of the size of A made by nodes in A
– y is the estimate of the size of A made by nodes in B
– Assuming edges were drawn uniformly at random, the expected value of both x and y is |A|/n
• Mixing:
– Agreeing on the estimates x and y ensures no partition (even if x and y are not accurate)
• Reinforcement:
– Brings the estimates x and y to the correct value
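As one concrete reading of these definitions, here is a small sketch (the helper and its arguments are assumptions, not from the paper) that computes x and y from the local views of a partition (A, B):

def size_estimates(views, A):
    """x = average, over nodes in A, of the fraction of their view entries lying in A;
       y = the same fraction averaged over the remaining nodes (the set B).
       views: dict mapping each node to its local view (list of neighbours)."""
    A = set(A)
    B = set(views) - A

    def frac_in_A(node):
        view = views[node]
        return sum(1 for v in view if v in A) / len(view)

    x = sum(frac_in_A(n) for n in A) / len(A)
    y = sum(frac_in_A(n) for n in B) / len(B)
    return x, y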
11. K-regularity
• View Size: k
• Number of nodes: n
• Fraction of nodes in partition: γ
• |A|= γn ≤ |B|
• #edges from A to B: (1-x)γkn
• #edges from B to A: y (1-γ)kn
• Number of edges in the A-B cut (using x = y):
– (1-x)γkn + x(1-γ)kn = (γ + x(1-2γ))kn
– ≥ γkn (since x ≥ 0 and γ ≤ ½)
12. Time Till Partitioning
• View Size: k
• Number of nodes: n
• Fraction of nodes in partition: γ
• Churn rate: μ (μn nodes leave and join)
• Claim: Expected time before a partition of size
γ happens ≈ 2γkn
– As long as μ≪γkn
13. Iterations until Partitioning
• Simulation results from [Allavena et al PODC 2005]: with 100,000 nodes, view sizes of 17, a fanout of 3 and a loosely synchronised system, the maximum in-degree was always below 4.5 times that of a random graph, and the standard deviation was not more than 3.2 times larger than that of a random graph
– These values improve with increased fanout, but even a fanout of 2 gives satisfactory performance
• To match the theoretical results about partitioning and churn, simulations evaluated the number of iterations until partitioning
[Figure 4 from the paper: number of iterations until partitioning vs log10 of the number of nodes; number of nodes n, view size k = log n, churn n/32]
14. View Size vs Time until Partition
Number of nodes: n
View size: k = log n
Churn: n/32
15. Simplified Model for Proof
– Single randomly chosen element from view is
replaced instead of whole views
– Assumption: The out-edges of nodes of A are
identically distributed and same applies to B
– a = #edges from A to A
– c = #edges from A to B
– b = #edges from B to A
– d = #edges from B to B
17. In-Degree Analysis
• Load balancing requires balance in in-degree
distribution
• In-degree is governed by the way edges are created, copied and destroyed
• Copying some edges more than others causes variability in in-degree
• A node that lives longer is expected to have a higher in-degree
• Solution: Increase reinforcement and keep track
of timestamps like in Cyclon
• Simulation: max in-degree < 4.5 times that of a random graph, standard deviation < 3.2 times
18. Discussion
• Are these theoretical guarantees practically
useful?
• Goal is not to provide failure detection
19. Cyclon
• Consists of the same elements as suggested by [Allavena et al PODC 2005]
• The analysis of [Allavena et al PODC 2005] holds for Cyclon
• Major differences:
– Timestamps
– Shuffling
20. Basic Shuffling
• Select a random subset of l neighbors (1 ≤ l ≤ c) from P’s
own cache, and a random peer, Q, within this subset,
where l is a system parameter, called shuffle length.
• Replace Q’s address with P’s address.
• Send the updated subset to Q.
• Receive from Q a subset of no more than l of Q’s neighbors.
• Discard entries pointing to P, and entries that are already in
P’s cache.
• Update P’s cache to include all remaining entries, by
– firstly using empty cache slots (if any), and
– secondly replacing entries among the ones originally sent to Q.
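A minimal sketch of the basic shuffle above, from the initiating peer P's point of view; the send/receive helpers and the exact replacement bookkeeping are assumptions for illustration:

import random

def basic_shuffle(P, cache, c, l, send, receive):
    """One basic shuffle initiated by peer P (sketch).
    cache: P's neighbour list (at most c entries); l: shuffle length, 1 <= l <= c.
    send(peer, entries) and receive(peer) are assumed network helpers."""
    subset = random.sample(cache, min(l, len(cache)))   # random subset of l neighbours
    Q = random.choice(subset)                           # random peer within that subset
    send(Q, [P if e == Q else e for e in subset])       # replace Q's address with P's, send to Q
    incoming = receive(Q)                               # no more than l of Q's neighbours

    incoming = [e for e in incoming if e != P and e not in cache]  # discard P and known entries
    replaceable = list(subset)                          # entries originally sent to Q
    for e in incoming:
        if len(cache) < c:                              # firstly: use empty cache slots
            cache.append(e)
        elif replaceable:                               # secondly: replace entries sent to Q
            cache[cache.index(replaceable.pop())] = e
    return cache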
22. Enhanced Shuffling
• Increase by one the age of all neighbors.
• Select neighbor Q with the highest age among all neighbors, and l −
1 other random neighbors.
• Replace Q’s entry with a new entry of age 0 and with P’s address.
• Send the updated subset to peer Q.
• Receive from Q a subset of no more than l of its own entries.
• Discard entries pointing at P and entries already contained in P’s
cache.
• Update P’s cache to include all remaining entries, by firstly using empty cache slots (if any), and secondly replacing entries among the ones sent to Q.
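Relative to the basic shuffle, the main changes are the age bookkeeping and the choice of Q; a sketch of just that selection step, assuming cache entries are (address, age) pairs:

import random

def select_shuffle_targets(cache, l):
    """Enhanced-shuffle neighbour selection (sketch)."""
    cache = [(addr, age + 1) for addr, age in cache]    # increase the age of all neighbours
    Q = max(cache, key=lambda entry: entry[1])          # neighbour with the highest age
    rest = [e for e in cache if e is not Q]
    others = random.sample(rest, min(l - 1, len(rest))) # l-1 other random neighbours
    return cache, Q, others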
24. Number of Clusters
• Fig. 7 from the Cyclon paper: (a) number of disjoint clusters after removing a large percentage of nodes, showing that the overlay does not break into two or more disjoint clusters unless a major percentage of the nodes is removed; (b) number of nodes not belonging to the largest cluster (log scale), showing that in the first steps of clustering only a few nodes are separated from the main cluster, which still connects the grand majority of the nodes
• The number of clusters decreases as node removal approaches 100% because the total number of surviving nodes becomes too small
• These graphs show considerable robustness to node failures, especially considering that in the early stages of clustering very few nodes are outside the largest cluster, i.e. most nodes are still connected in a single cluster
27. SCAMP
• Partial knowledge of the membership: local
view
• Fanout automatically set to the size of the local view
• Fanout evolves naturally with the size of the group
– Size of local views converges towards C·log(n)
28. Join (Subscription)
• Subscription is sent to a random member
• The subscription is then forwarded through the overlay: at each node it is kept with probability P = 1/(size of view) and forwarded on with probability 1-P
[Diagram: new node s subscribes to a random member; the subscription is forwarded along nodes 1, 2, 3, each keeping s with probability P = 1/(size of view) or forwarding it with probability 1-P]
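A rough sketch of the forwarding decision illustrated above, based only on the rule stated on this slide (keep with probability P = 1/(size of view), otherwise forward); the full SCAMP subscription algorithm has additional steps not shown here:

import random

def handle_subscription(local_view, new_node, forward):
    """On receiving a forwarded subscription for new_node (sketch of the slide's rule)."""
    P = 1.0 / len(local_view)                         # P = 1 / size of view
    if new_node not in local_view and random.random() < P:
        local_view.append(new_node)                   # keep the subscriber locally
    else:
        forward(random.choice(local_view), new_node)  # with probability 1-P, forward onwards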
30. Load Balancing
• Indirection:
– Forward the subscription instead of handling the request
• Lease associated with each subscription
• Periodically nodes have to re-subscribe
– Nodes having failed permanently will time out
– Re-balance the partial views
31. Unsubscription
[Diagram: node 0 unsubscribes by sending Unsub(0) together with its local view [1, 4, 5]; nodes x, y, z, which had 0 in their local views, replace that entry with a member of 0's view, e.g. x: (8, 9, 0) → (8, 9, 4) and y: (7, 3, 0) → (7, 3, 5)]
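A minimal sketch of the replacement rule in the diagram; the function name and arguments are assumptions:

import random

def handle_unsubscription(local_view, leaving, leaving_view):
    """On Unsub(leaving): replace the entry pointing at the leaving node
    with a member of the leaving node's own view (sketch)."""
    if leaving in local_view:
        candidates = [n for n in leaving_view if n not in local_view]
        if candidates:
            local_view[local_view.index(leaving)] = random.choice(candidates)
        else:
            local_view.remove(leaving)   # nothing new to substitute: drop the entry
    return local_view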
32. Degree
• System modelled as a random directed graph
• D(N) = average out-degree for an N-node system
• A subscription adds D(N)+1 directed arcs, so
– (N+1)·D(N+1) = N·D(N) + D(N) + 1, i.e. D(N+1) = D(N) + 1/(N+1)
• Solution of this recursion:
– D(N) = D(1) + 1/2 + 1/3 + … + 1/N ≈ log(N)
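A quick numerical check of this recursion (not from the SCAMP paper), iterating D(N+1) = D(N) + 1/(N+1) and comparing with log N:

import math

def average_degree(N, D1=1.0):
    """Iterate D(N+1) = D(N) + 1/(N+1), i.e. (N+1)*D(N+1) = N*D(N) + D(N) + 1."""
    D = D1
    for i in range(2, N + 1):
        D += 1.0 / i
    return D

for N in (1000, 200000, 500000):
    print(N, round(average_degree(N), 2), round(math.log(N), 2))
# D(N) grows like log(N) (plus the constant D(1)); log(200000) ~ 12.2 and log(500000) ~ 13.1,
# matching the view-size peaks on slide 33.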
33. Distribution of view size
[Figure: histograms of view size (number of nodes vs view size, 0 to 50) for a 200,000-node system and a 500,000-node system; the marked values log(200,000) ≈ 12.2 and log(500,000) ≈ 13.12 show the view-size distributions centred around log(N)]
34. Reliability: 5000 node system
[Figure: reliability (0.9 to 1.0) vs number of failures (0 to 2500) for SCAMP, compared with gossip using global membership knowledge at fanout 8 and fanout 9]
35. NewsCast
• Goal: aggregate information
– in a large and dynamic distributed environment
– in a robust and dependable manner
36. Idea
• Gets news from the application, timestamps it, and adds the local peer's address to the cache entry
• Picks a random peer from the addresses in the cache
– Sends all cache entries to this peer
– Receives all cache entries from that peer
• Passes cache entries (containing news items) on to the application
• Merges the old cache with the received cache
– Keeps at most the C most recent cache entries
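A minimal sketch of one such exchange, assuming cache entries are (address, timestamp, news) tuples and simple send/receive helpers; this is illustrative, not the exact NewsCast implementation:

import random
import time

def newscast_exchange(self_addr, cache, news_item, send, receive, C=20):
    """One NewsCast exchange at self_addr (sketch); C is the cache size."""
    cache.append((self_addr, time.time(), news_item))          # timestamp own news, add own address
    peer = random.choice([a for a, _, _ in cache if a != self_addr])  # random peer from the cache
    send(peer, cache)                                           # send all cache entries
    received = receive(peer)                                    # receive the peer's cache entries

    freshest = {}
    for addr, ts, news in cache + received:                     # merge, keeping the newest entry per peer
        if addr not in freshest or ts > freshest[addr][1]:
            freshest[addr] = (addr, ts, news)
    cache[:] = sorted(freshest.values(), key=lambda e: e[1], reverse=True)[:C]  # keep the C most recent
    return cache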
37. Aggregation
• Each node ni maintains a single number xi
• Every node ni selects a random node nk and sends its value xi to nk
• nk responds with the aggregate (e.g. max(xi, xk)) of the incoming value and its own value
• Aggregate values converge to the overall value “exponentially” when the aggregate is an average function, and “super-exponentially” when it is a maximum function
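A toy, synchronous simulation of this aggregation scheme with max as the aggregate (the round structure is an assumption, for illustration only):

import random

def aggregate_round(values, combine=max):
    """One synchronous gossip-aggregation round: every node contacts a random peer
    and both adopt the combined value (push-pull, sketch)."""
    nodes = list(values)
    for i in nodes:
        k = random.choice([n for n in nodes if n != i])
        values[i] = values[k] = combine(values[i], values[k])
    return values

values = {i: random.random() for i in range(1000)}
target = max(values.values())
rounds = 0
while any(v != target for v in values.values()):
    values = aggregate_round(values)
    rounds += 1
print("max reached everywhere after", rounds, "rounds")   # typically only a handful of rounds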
[Note on slide 24, from the Cyclon paper: the graph for the experiment with cache size 100 is practically a flat line; that is, for 100,000 nodes and cache size 100 the overlay created is so robust that no matter how many nodes are removed, the remaining ones stay connected in a single cluster.]
[Note from the Cyclon paper on the in-degree distribution in a converged 100,000-node overlay (basic shuffling, enhanced shuffling, and an overlay where each node has c randomly chosen outgoing links): enhanced shuffling does a significantly better job of spreading the links evenly across all nodes. For cache size 20, 80.31% of the nodes have an in-degree of 20 ± 5%; for cache size 50, 93.95% have an in-degree of 50 ± 5%. The respective percentages for basic shuffling are 36.22% and 38.47%.]
[Note on the SCAMP degree recursion of slide 32: this is an average-case analysis; in reality there are noise terms in the recurrence, since we pick a node whose degree is only approximately D(N). Proving the argument correct in the presence of this noise requires controlling the variance of that noise (and invoking the martingale convergence theorem).]
[Figure note for the aggregation experiment of slide 37: maximum-finding protocol, N = 10^5; points are averages of 50 runs; the standard deviation is not shown as it is several orders of magnitude lower than the average.]