The document discusses high-availability and scalability approaches for enterprise systems. Traditionally, enterprises rely on highly reliable hardware, small clusters, and standby data centers. However, this approach is expensive, and systems can still fail. The document proposes using virtualization, stateless servers, and NoSQL databases to achieve high availability and scalability at lower cost. Servers are provisioned as "Phoenix servers" that can easily be recreated if they fail, without relying on expensive, highly reliable hardware. Data is distributed across servers for redundancy and fine-grained scaling. This software-based approach provides better performance and availability than traditional hardware-centric methods.
High Availability and Scalability: Too Expensive! Architectures for Future Enterprise Systems
1. High Availability and Scalability: Too Expensive! – Architectures for Future Enterprise Systems
Eberhard Wolff
Freelance Consultant / Trainer
Head Technology Advisory Board adesso AG
Eberhard Wolff - @ewolff
19. • Failing systems do not impact users
• Failing systems are just restarted
• Restarts happen automatically
• Systems run in different data centers
• e.g. eu-west-1a / b / c
21. What It Takes…
• Virtualization
• +API to start new servers
• Watchdog to detect failed servers
• Redundant data centers if needed
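The ingredients above (an API to start servers, plus a watchdog) can be sketched as follows — a minimal, hypothetical watchdog loop where `start_server` and `check_health` are stand-ins for real cloud calls such as EC2 `RunInstances` and a health-check endpoint:

```python
class Watchdog:
    """Detects failed servers and replaces them (Phoenix-server style)."""

    def __init__(self, start_server, check_health):
        # start_server() provisions a fresh server and returns its id;
        # check_health(server_id) returns True if the server responds.
        # Both are assumed callbacks, standing in for real cloud APIs.
        self.start_server = start_server
        self.check_health = check_health
        self.servers = []

    def ensure_capacity(self, desired):
        # Drop servers that fail the health check, then start
        # replacements until the desired count is reached again.
        self.servers = [s for s in self.servers if self.check_health(s)]
        while len(self.servers) < desired:
            self.servers.append(self.start_server())
        return self.servers
```

Because failed servers are never repaired, only replaced, the same loop covers crashes, hung processes, and whole-data-center failures.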
22. Can be implemented in your datacenter!
I have none.
So I used the Amazon Cloud
50. Problem: Estimates & Scaling
• Performance is hard to estimate
• Coarse-grained scaling
• Backfires
51. True Story
• Initial estimate wrong
• Just need a little more
• Cluster: two servers
• Add one
• About 50% higher costs
• Ordering / installing a server takes time
• Bad performance until server delivered
52. Problem: Load Peaks
• Business has load peaks
• e.g. events that people register for
• Need to have enough hardware for load peaks
• Costly
53. Problem: Testing
• Testing needs production-like infrastructure
• Prohibitive costs
• Only needed during tests
56. What You Have Just Seen
• System tunes itself depending on load
• Same approach as for availability
• + Watchdog for load
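The "watchdog for load" idea can be sketched as a simple scaling rule — thresholds and the one-server-at-a-time step are illustrative assumptions, not the talk's actual configuration:

```python
def desired_capacity(current_servers, avg_load, low=0.3, high=0.7):
    """Load-based scaling rule: add a server when average load is
    above `high`, remove one when it falls below `low`. The thresholds
    are illustrative; real setups would also debounce over time."""
    if avg_load > high:
        return current_servers + 1
    if avg_load < low and current_servers > 1:
        return current_servers - 1
    return current_servers
```

Combined with the availability watchdog, the same machinery that replaces failed servers also grows and shrinks the fleet with the load peaks described earlier.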
57. Easy to create a new server ✔
Redundancy in software ✔
Reliably reproducible ✔
Stateless ?
58. Stateless
• Stateless web servers: best practice
• Some Java frameworks don’t follow this approach
• Can store the HTTP session externally
• e.g. RDBMS, NoSQL, cache
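A minimal sketch of externalized session state: the dict backend here is a stand-in for Redis, a NoSQL store, or an RDBMS, and the class names are hypothetical — the point is only that no session lives inside the web server process:

```python
import json
import uuid

class ExternalSessionStore:
    """Keeps HTTP session state outside the web server, so any server
    can be killed and recreated (Phoenix-style) without losing sessions.
    The dict is a placeholder for an external store such as Redis."""

    def __init__(self, backend=None):
        self.backend = backend if backend is not None else {}

    def create(self, data):
        session_id = str(uuid.uuid4())
        # Serialize, as a real external store would require.
        self.backend[session_id] = json.dumps(data)
        return session_id

    def load(self, session_id):
        raw = self.backend.get(session_id)
        return json.loads(raw) if raw is not None else None
```

Any freshly started server that points at the same backend can serve an existing session, which is what makes the web tier safe to restart at will.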
60. Databases
• Often assumed to be just “fast and scalable”
• Large scale doable, e.g. data warehouses
• Often use the traditional approach
• Cluster with two nodes
• Highly available hardware
68. Replicas & Shards
• Easy to understand
• But: coarse-grained scaling
• Adding another shard means
• Moving lots of data
• Adding quite a few servers
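The coarse-grained-scaling problem can be made concrete with naive modulo sharding — a common textbook scheme, used here only as an illustration: when the shard count changes, most keys land on a different shard and must be moved.

```python
import zlib

def shard_for(key, num_shards):
    """Naive hash sharding: shard = hash(key) mod shard count.
    crc32 is used only because it is deterministic across runs."""
    return zlib.crc32(key.encode()) % num_shards

def moved_keys(keys, old_shards, new_shards):
    """Count keys whose shard assignment changes when the shard
    count changes - with modulo sharding this is most of them."""
    return sum(1 for k in keys
               if shard_for(k, old_shards) != shard_for(k, new_shards))
```

Going from 4 to 5 shards re-homes roughly 80% of all keys, which is exactly the "moving lots of data" cost the slide warns about.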
69.–71. Amazon Dynamo Model
[Diagram, slides 69–71: four servers (A–D), each storing replicas of several shards (Shard1–Shard4), so every shard lives on more than one server. Slide 71 adds a new server to the ring, which takes over individual shard replicas instead of forcing a full re-shard.]
72. Amazon Dynamo Model
• Published in the Dynamo paper
• Implementations: Riak, Cassandra etc.
• Fine-grained scaling
• Can immediately write to a new node
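The fine-grained scaling comes from consistent hashing. A much-simplified sketch in the spirit of the Dynamo paper (no virtual nodes or replication, just key placement on a ring; all names are illustrative):

```python
import bisect
import hashlib

def _hash(value):
    # Deterministic hash onto a ring of 2**32 positions.
    return int(hashlib.md5(value.encode()).hexdigest(), 16) % (2 ** 32)

class HashRing:
    """Simplified consistent-hash ring: each key belongs to the first
    node found walking clockwise from the key's ring position."""

    def __init__(self, nodes):
        self.ring = sorted((_hash(n), n) for n in nodes)

    def node_for(self, key):
        positions = [p for p, _ in self.ring]
        i = bisect.bisect(positions, _hash(key)) % len(self.ring)
        return self.ring[i][1]
```

Adding a node only reassigns the keys on the arc the new node takes over — every relocated key moves to the new node and nothing else shuffles, in contrast to the modulo-sharding example where most keys move.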
73. Hardware
• Not highly reliable
• Scales by distributing load across servers
• No NAS, SAN, RAID…
• As cheap as it gets