Wednesday, August 21, 13
37. 29
Expected failures of a traditional HA system are catastrophic
System not designed to be distributed
Failure forces it to be distributed
Cannot take distributed failure conditions into account
Best case scenario: complete failure
38. 30
Option #2
Take a distributed system and make the right tradeoffs
48. 38
Reliability is the likelihood that a given component or system will be functioning when needed, as measured over a given period of time.
Availability is the percentage of times that a given system will be functioning as required.
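To make the two definitions concrete, here is a minimal sketch (not from the talk). Availability is estimated as the fraction of a measurement window in which the system was up. Reliability is modeled with a constant failure rate, R(t) = exp(-t / MTBF), which is an assumption chosen for illustration. The function names and all numbers are hypothetical.

```python
import math
from typing import Sequence


def availability(window_hours: float, outage_hours: Sequence[float]) -> float:
    """Fraction of the window during which the system was functioning."""
    downtime = sum(outage_hours)
    return (window_hours - downtime) / window_hours


def reliability(t_hours: float, mtbf_hours: float) -> float:
    """Probability of running t_hours without a failure.

    Assumes a constant failure rate (exponential model); real
    components may not follow it.
    """
    return math.exp(-t_hours / mtbf_hours)


if __name__ == "__main__":
    # Hypothetical 30-day window (720 h) with two outages.
    print(f"availability: {availability(720, [1.5, 0.3]):.4%}")
    # Hypothetical component with an MTBF of 1000 h, asked to run for 24 h.
    print(f"reliability over 24 h: {reliability(24, 1000):.4f}")
```

The two numbers answer different questions: a system can be highly available yet unreliable (many short outages), or reliable yet poorly available (one rare but very long outage).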
60. 45
Greenfield
1. Focus on the service, not the server
2. Identify & tear apart stateless and stateful parts of your application
3. Make stateful parts redundant using distributed data stores
4. Know the dependencies of your system and the impact of failure
5. Use micro-services to make dependencies explicit
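Point 4 can be sketched as a small graph walk: once dependencies are written down explicitly, the impact of a single failure is the set of services that reach the failed one, directly or transitively. The service names and the `DEPENDS_ON` map below are hypothetical, and this is only one possible way to model it.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical service graph: each service maps to the services it calls.
DEPENDS_ON: Dict[str, List[str]] = {
    "web": ["auth", "catalog"],
    "auth": ["user-db"],
    "catalog": ["catalog-db", "search"],
    "search": [],
    "user-db": [],
    "catalog-db": [],
}


def impacted_by(failed: str, depends_on: Dict[str, List[str]]) -> Set[str]:
    """Return every service that directly or transitively depends on `failed`."""
    # Invert the graph: dependency -> services that call it.
    dependents: Dict[str, List[str]] = {name: [] for name in depends_on}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(service)

    # Breadth-first walk upward from the failed service.
    impacted: Set[str] = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for caller in dependents.get(current, []):
            if caller not in impacted:
                impacted.add(caller)
                queue.append(caller)
    return impacted


if __name__ == "__main__":
    print(sorted(impacted_by("user-db", DEPENDS_ON)))
```

A failure deep in the graph (a data store) tends to have the widest impact, which is one reason to make the stateful parts redundant first.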
66. 46
Legacy
1. Cloud instances != servers
2. Plan to reduce mean time to recovery (MTTR)
3. "We're HA, we're all good." -> Wrong.
4. Think about stateful vs stateless parts of your application and work piece by piece
5. Be creative about trade-offs: many apps that run on more than one server have some type of common backend (e.g. NFS)
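Why point 2 matters can be shown with the standard steady-state relation, availability = MTBF / (MTBF + MTTR): with the failure rate held fixed, every reduction in recovery time translates directly into less downtime. The sketch below uses hypothetical numbers and function names; it is not from the talk.

```python
def steady_state_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Long-run fraction of time the system is up: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_per_year_hours(availability: float) -> float:
    """Expected hours of downtime in a 365-day year."""
    return (1.0 - availability) * 365 * 24


if __name__ == "__main__":
    mtbf = 500.0  # hypothetical: one failure roughly every three weeks
    for mttr in (4.0, 1.0, 0.25):  # hypothetical recovery times in hours
        a = steady_state_availability(mtbf, mttr)
        print(
            f"MTTR {mttr:>5.2f} h -> availability {a:.4%}, "
            f"about {downtime_per_year_hours(a):.1f} h down per year"
        )
```

The failures still happen at the same rate in all three cases; only the time to recover changes.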