Load balancing is impossible

Load balancing is something most of us assume is a solved problem. But the idea that load balancing is “solved” could not be further from the truth. If you use multiple load balancers, the problem is even worse. Most of us use “random” or “round-robin” techniques, which have certain advantages but are highly inefficient. Others use more complex algorithms like “least-conns,” which can be more efficient but have horrific edge cases. “Consistent hashing” is a very useful technique but only applies to certain problems.

There are several factors, both in theory and in practice, that make efficient load balancing an exceptionally hard problem, including Poisson request arrival times, exponentially distributed response latency, and oscillations when data is shared between multiple load balancers. Luckily, techniques and algorithms have been developed that can make life better. Tyler McMullen explains some of the ways we can do better than “random,” “round-robin,” and naive “least-conns,” even with distributed load balancers.

1. LOAD BALANCING IS IMPOSSIBLE
2. Tyler McMullen, tyler@fastly.com, @tbmcmullen
3. WHAT IS LOAD BALANCING?
4. [DIAGRAM DESCRIBING LOAD BALANCING]
5. [ALLEGORY DESCRIBING LOAD BALANCING]
6. Why load balance? Three major reasons, the least of which is balancing load. Abstraction: treat many servers as one, single entry point, simplification. Failure: transparent failover, recover seamlessly, simplification. Balancing load: spread the load efficiently across servers.
7. RANDOM: THE INGLORIOUS DEFAULT AND BANE OF MY EXISTENCE
8. What's good about random? • Simplicity • Few edge cases • Easy failover • Works identically when distributed
9. What's bad about random? • Latency • Especially long-tail latency • Usable capacity
10. BALLS-INTO-BINS
11. If you throw m balls into n bins, what is the maximum load of any one bin?
12. Uniform random assignment of m requests to n servers:

    import numpy as np
    import numpy.random as nr

    n = 8     # number of servers
    m = 1000  # number of requests

    bins = [0] * n
    for chosen_bin in nr.randint(0, n, m):
        bins[chosen_bin] += 1

    print(bins)
    # [129, 100, 134, 113, 117, 136, 148, 123]
13. The same, but each request carries a uniformly distributed weight instead of counting as 1:

    import numpy as np
    import numpy.random as nr

    n = 8     # number of servers
    m = 1000  # number of requests

    bins = [0] * n
    for weight in nr.uniform(0, 2, m):
        chosen_bin = nr.randint(0, n)
        bins[chosen_bin] += weight

    print(bins)
    # [133.1, 133.9, 144.7, 124.1, 102.9, 125.4, 114.2, 121.3]
14. How do you model request latency?
15. Log-normal distribution. 50th: 0.6, 75th: 1.2, 95th: 3.1, 99th: 6.0, 99.9th: 14.1. Mean: 1.0.
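One way to eyeball percentiles like these with numpy (an illustrative sketch, not from the deck; it uses the mean-1.0 rescaling that the next slides introduce, so the exact figures may differ slightly from the ones quoted above):

    import math
    import numpy as np
    import numpy.random as nr

    mu, sigma = 0.0, 1.15
    lognorm_mean = math.e ** (mu + sigma ** 2 / 2)

    # Draw a large sample and rescale it so the mean is 1.0.
    samples = nr.lognormal(mu, sigma, 1000000) / lognorm_mean
    for pct in (50, 75, 95, 99, 99.9):
        print(pct, np.percentile(samples, pct))
    print("mean", samples.mean())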
16. Random assignment with log-normally distributed request weights, normalized to a mean of 1.0 (nr, n, m, and bins as defined in the slide 12 snippet):

    import math

    mu = 0.0
    sigma = 1.15
    lognorm_mean = math.e ** (mu + sigma ** 2 / 2)
    desired_mean = 1.0

    def normalize(value):
        return value / lognorm_mean * desired_mean

    for weight in nr.lognormal(mu, sigma, m):
        chosen_bin = nr.randint(0, n)
        bins[chosen_bin] += normalize(weight)

    # [128.7, 116.7, 136.1, 153.1, 98.2, 89.1, 125.4, 130.4]
17. The same, with a small fixed per-request baseline cost added:

    import math

    mu = 0.0
    sigma = 1.15
    lognorm_mean = math.e ** (mu + sigma ** 2 / 2)
    desired_mean = 1.0
    baseline = 0.05

    def normalize(value):
        return (value / lognorm_mean * (desired_mean - baseline) + baseline)

    for weight in nr.lognormal(mu, sigma, m):
        chosen_bin = nr.randint(0, n)
        bins[chosen_bin] += normalize(weight)

    # [100.7, 137.5, 134.3, 126.2, 113.5, 175.7, 101.6, 113.7]
18. THIS IS WHY PERFECTION IS IMPOSSIBLE
19. WHY IS THAT A PROBLEM?
20. WHAT EFFECT DOES IT HAVE?
21. [Charts: random simulation vs. actual distribution]
22. The probability of a single resource request avoiding the 99th percentile is 99%. The probability of all N resource requests in a page avoiding the 99th percentile is 0.99^N. 0.99^69 ≈ 49.9%.
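A quick check of that arithmetic (a sketch, not from the deck; 69 is the per-page resource count used on the slide):

    p_single = 0.99                # chance one request misses the 99th-percentile tail
    n_requests = 69                # resources on a typical page
    print(p_single ** n_requests)  # ~0.4998: roughly a coin flip per page load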
23. SO WHAT DO WE DO ABOUT IT?
24. [Charts: random simulation vs. JSQ simulation]
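The deck compares random assignment against join-shortest-queue (JSQ) but doesn't show JSQ code. A minimal sketch in the style of the earlier snippets might look like this, with accumulated work per bin standing in for instantaneous queue length (an assumption, since the talk's actual simulation isn't shown):

    import math
    import numpy.random as nr

    n, m = 8, 1000
    mu, sigma = 0.0, 1.15
    lognorm_mean = math.e ** (mu + sigma ** 2 / 2)

    bins = [0.0] * n
    for weight in nr.lognormal(mu, sigma, m):
        chosen_bin = bins.index(min(bins))   # send each request to the least-loaded bin
        bins[chosen_bin] += weight / lognorm_mean

    print(bins)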
25. LET'S THROW A WRENCH INTO THIS... DISTRIBUTED LOAD BALANCING AND WHY IT MAKES EVERYTHING HARDER
26. DISTRIBUTED RANDOM IS EXACTLY THE SAME
27. DISTRIBUTED JOIN-SHORTEST-QUEUE IS A NIGHTMARE
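The slides don't include code for the distributed case, but the abstract's mention of oscillation when balancers share data suggests the failure mode. A toy sketch (an assumption for illustration, not the talk's model): several balancers work from the same periodically refreshed snapshot of queue lengths, so between refreshes they all dogpile whichever server looked shortest:

    import numpy.random as nr

    n, m, staleness = 8, 1000, 50     # staleness: requests between snapshot refreshes
    bins = [0.0] * n
    snapshot = list(bins)

    for i, weight in enumerate(nr.lognormal(0.0, 1.15, m)):
        if i % staleness == 0:
            snapshot = list(bins)                   # balancers refresh their shared view
        chosen_bin = snapshot.index(min(snapshot))  # everyone picks the same "shortest" bin
        bins[chosen_bin] += weight

    print(bins)  # bursts pile onto one server until the next refresh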
28. The random-assignment simulation again, for comparison (identical to slide 17):

    import math

    mu = 0.0
    sigma = 1.15
    lognorm_mean = math.e ** (mu + sigma ** 2 / 2)
    desired_mean = 1.0
    baseline = 0.05

    def normalize(value):
        return (value / lognorm_mean * (desired_mean - baseline) + baseline)

    for weight in nr.lognormal(mu, sigma, m):
        chosen_bin = nr.randint(0, n)
        bins[chosen_bin] += normalize(weight)

    # [100.7, 137.5, 134.3, 126.2, 113.5, 175.7, 101.6, 113.7]
29. Randomized JSQ: pick two bins at random and send the request to the less-loaded of the two:

    import math

    mu = 0.0
    sigma = 1.15
    lognorm_mean = math.e ** (mu + sigma ** 2 / 2)
    desired_mean = 1.0
    baseline = 0.05

    def normalize(value):
        return (value / lognorm_mean * (desired_mean - baseline) + baseline)

    for weight in nr.lognormal(mu, sigma, m):
        a = nr.randint(0, n)
        b = nr.randint(0, n)
        chosen_bin = a if bins[a] < bins[b] else b
        bins[chosen_bin] += normalize(weight)

    # [130.5, 131.7, 129.7, 132.0, 131.3, 133.2, 129.9, 132.6]
30. Random: [100.7, 137.5, 134.3, 126.2, 113.5, 175.7, 101.6, 113.7], standard deviation 22.9. Randomized JSQ: [130.5, 131.7, 129.7, 132.0, 131.3, 133.2, 129.9, 132.6], standard deviation 1.18.
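Those standard deviations can be verified directly from the printed bins (a quick check, not from the deck):

    import numpy as np

    random_bins = [100.7, 137.5, 134.3, 126.2, 113.5, 175.7, 101.6, 113.7]
    two_choice_bins = [130.5, 131.7, 129.7, 132.0, 131.3, 133.2, 129.9, 132.6]

    print(np.std(random_bins))      # ~22.9
    print(np.std(two_choice_bins))  # ~1.18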
31. [Charts: random simulation vs. JSQ simulation vs. randomized JSQ simulation]
32. ANOTHER CRAZY IDEA
33. MY PROBLEM WITH THIS APPROACH... A DIFFERENCE OF PERSPECTIVE
34. WRAP UP
35. THANKS, BYE. tyler@fastly.com @tbmcmullen
