(SDD423) Elastic Load Balancing Deep Dive and Best Practices | AWS re:Invent 2014

© 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
November 13, 2014 | Las Vegas
Elastic Load Balancing
Deep Dive & Best Practices
David Brown, Director, Software Engineering

Elastic Load Balancing automatically distributes incoming application traffic across multiple Amazon EC2 instances.

Secure
Elastic
Integrated
Cost Effective

Load Balancer used to route incoming requests to multiple EC2 instances.
[Diagram: ELB routing to three EC2 instances]

EC2-Classic
Load balance over classic EC2 instances.
Support for public IP addresses only.
No control over the load balancer security group.
EC2-VPC
Load balance over EC2 instances within a VPC.
Support for both public and private IP addresses.
Full control over the load balancer security group.
Tightly integrated into the associated VPC and subnets.

Architecture
[Diagram: Amazon Route 53, ELB nodes in the ELB VPC, and EC2 instances in the customer VPC, across us-west-1a and us-west-1b]

TCP/SSL
Incoming client connection bound to server connection
No header modification
Proxy Protocol prepends source and destination IP and ports to request
Round robin algorithm used for request routing
HTTP/HTTPS
Connection terminated at the load balancer and pooled to the server
Headers may be modified
X-Forwarded-For header contains client IP address
Least outstanding requests algorithm used for request routing
Sticky session support available
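
The two routing algorithms named above can be compared with a minimal sketch. This is an illustration only, not the load balancer's actual implementation; the class names and the instance IDs (`i-a`, `i-b`, `i-c`) are hypothetical.

```python
from itertools import cycle


class RoundRobinRouter:
    """Hands out back ends in a fixed rotation, ignoring their current load."""

    def __init__(self, backends):
        self._rotation = cycle(list(backends))

    def pick(self):
        return next(self._rotation)


class LeastOutstandingRouter:
    """Picks the back end with the fewest in-flight requests."""

    def __init__(self, backends):
        self.outstanding = {backend: 0 for backend in backends}

    def pick(self):
        # min() returns the first back end among ties, in insertion order.
        backend = min(self.outstanding, key=self.outstanding.get)
        self.outstanding[backend] += 1
        return backend

    def complete(self, backend):
        # Call once the back end has finished responding.
        self.outstanding[backend] -= 1


rr = RoundRobinRouter(["i-a", "i-b", "i-c"])
rr_picks = [rr.pick() for _ in range(4)]

lor = LeastOutstandingRouter(["i-a", "i-b", "i-c"])
first_three = [lor.pick() for _ in range(3)]  # one request lands on each instance
lor.complete("i-b")                           # i-b finishes first
next_pick = lor.pick()                        # so i-b receives the next request
```

Round robin keeps rotating regardless of how busy each instance is, while least outstanding requests steers new work toward whichever instance has finished its earlier requests.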

Health checks allow for traffic to be shifted away from failed instances

[Diagram: ELB routing to three EC2 instances]
Health checks ensure that request traffic is shifted away from a failed instance.
Health Checks

Support for TCP and HTTP health checks.
Customize the frequency and failure thresholds.
Must return a 2xx response.
Consider the depth and accuracy of your health checks.
Health Checks
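
How failure thresholds turn individual check results into a healthy or unhealthy state can be sketched as follows. This is a simplified model, not the service's own logic: the class name and the default thresholds (2 failures, 3 successes) are assumptions chosen for illustration, and a timed-out check is represented as `None`.

```python
class HealthTracker:
    """Flips an instance's state only after enough consecutive contrary results."""

    def __init__(self, unhealthy_threshold=2, healthy_threshold=3):
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy_threshold = healthy_threshold
        self.healthy = True
        self._streak = 0  # consecutive results that contradict the current state

    def record(self, status_code):
        # A check passes only on a 2xx response; None models a timeout.
        passed = status_code is not None and 200 <= status_code < 300
        if passed == self.healthy:
            self._streak = 0
            return self.healthy
        self._streak += 1
        threshold = self.healthy_threshold if passed else self.unhealthy_threshold
        if self._streak >= threshold:
            self.healthy = passed
            self._streak = 0
        return self.healthy


tracker = HealthTracker()
states = [tracker.record(code) for code in [200, 500, None, 200, 200, 200]]
```

With these assumed thresholds the instance is marked unhealthy after the second consecutive failure and returns to service only after three consecutive passing checks.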

Idle timeouts allow for connections to be closed by the load balancer when no longer in use.

Length of time that an idle connection should be kept open.
For both client and back-end connections.
Defaults to 60 seconds but can be set between 1 and 3,600 seconds.
Timeouts should decrease as you go up the stack.
Idle Timeouts
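
A small helper can enforce the range quoted above before a value is applied. This is a sketch under assumptions: the function name is hypothetical, and the returned dictionary follows the `ConnectionSettings` / `IdleTimeout` attribute shape that the boto3 `elb` client accepts in `modify_load_balancer_attributes`. Nothing is sent to AWS here.

```python
DEFAULT_IDLE_TIMEOUT = 60   # seconds, the default stated above
MIN_IDLE_TIMEOUT = 1
MAX_IDLE_TIMEOUT = 3600


def idle_timeout_attributes(seconds=DEFAULT_IDLE_TIMEOUT):
    """Validate an idle timeout and build the attribute payload for it."""
    if isinstance(seconds, bool) or not isinstance(seconds, int):
        raise TypeError("idle timeout must be a whole number of seconds")
    if not MIN_IDLE_TIMEOUT <= seconds <= MAX_IDLE_TIMEOUT:
        raise ValueError(
            f"idle timeout must be between {MIN_IDLE_TIMEOUT} and "
            f"{MAX_IDLE_TIMEOUT} seconds, got {seconds}"
        )
    return {"ConnectionSettings": {"IdleTimeout": seconds}}


default_attributes = idle_timeout_attributes()
short_attributes = idle_timeout_attributes(15)
```

The resulting dictionary could then be passed as `LoadBalancerAttributes` alongside a load balancer name.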

[Diagram: example idle timeouts (15s, 9s, 3s) along the request path through the ELB, the EC2 instances, and Amazon S3, Amazon RDS, and Amazon SWF]
Idle Timeouts

Using multiple Availability Zones

Multiple Availability Zones
[Diagram: Amazon Route 53, two ELB nodes in the ELB VPC, and two EC2 instances in the customer VPC, across us-west-1a and us-west-1b]

Multiple Availability Zones
[Diagram: Amazon Route 53, two ELB nodes in the ELB VPC, and a single EC2 instance in the customer VPC, across us-west-1a and us-west-1b]

Always associate two or more subnets in different zones with the load balancer

Using multiple Availability Zones does bring a few challenges.

[Chart: request count over time]
Traffic Imbalances

Imbalanced Instance Capacity
[Diagram: Amazon Route 53 and two ELB nodes in the ELB VPC, with an unequal number of EC2 instances per zone in the customer VPC, across us-west-1a and us-west-1b]

Cross-Zone Load Balancing
[Diagram: Amazon Route 53 and two ELB nodes in the ELB VPC, each routing to EC2 instances in both us-west-1a and us-west-1b in the customer VPC]

[Chart: request count over time, with cross-zone load balancing enabled]
Traffic Imbalances

Load balancer absorbs impact of DNS caching.
Eliminates imbalances in back-end instance utilization.
Requests distributed evenly across multiple Availability Zones.
Check connection limits before enabling.
No additional bandwidth charge for cross-zone traffic.
Cross-Zone Load Balancing
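
The effect on back-end utilization can be worked through with a short calculation. This is a sketch under idealized assumptions: DNS splits traffic evenly between zones, every instance is healthy, and the function name and the one-versus-three instance layout are hypothetical.

```python
def per_instance_share(instances_per_zone, cross_zone):
    """Return the fraction of total traffic each instance in a zone receives."""
    zone_share = 1.0 / len(instances_per_zone)  # even DNS split between zones
    total_instances = sum(instances_per_zone.values())
    shares = {}
    for zone, count in instances_per_zone.items():
        if cross_zone:
            # Every load balancer node routes to every instance in every zone.
            shares[zone] = 1.0 / total_instances
        else:
            # Each node routes only to the instances in its own zone.
            shares[zone] = zone_share / count
    return shares


layout = {"us-west-1a": 1, "us-west-1b": 3}
without_cross_zone = per_instance_share(layout, cross_zone=False)
with_cross_zone = per_instance_share(layout, cross_zone=True)
```

In this layout the lone instance in us-west-1a carries half of all traffic without cross-zone load balancing, and a quarter with it enabled, the same as every other instance.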

Each load balancer domain may contain multiple records.
Round robin used to balance traffic between Availability Zones.
DNS records will change over time; never target IP addresses directly.
After being removed from DNS, IP addresses are drained and quarantined for up to 7 days.
Understanding DNS

DNS caching by clients and ISPs can often cause clients to target a specific IP address or stop resolving at all.
Register a wildcard CNAME or ALIAS within Amazon Route 53.
// Create a wildcard CNAME or ALIAS in Route 53.
*.example.com ALIAS … elb-12345.us-east-1.elb.amazonaws.com
*.example.com CNAME elb-12345.us-east-1.elb.amazonaws.com
// Prepend random content for each lookup made by the application.
PROMPT> dig +short 25a8ade5-6557-4a54-a60e-8f51f3b195d1.example.com
192.0.2.1
192.0.2.2
DNS Optimization
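
Generating the random prefix inside an application might look like the following. This is a minimal sketch that assumes a wildcard record for `example.com` already exists; the function name is hypothetical and no DNS lookup is performed here.

```python
import uuid


def randomized_hostname(wildcard_domain="example.com"):
    """Return a unique hostname under the wildcard record.

    A fresh name for each lookup means intermediate resolvers are unlikely
    to hold a cached answer for it.
    """
    return f"{uuid.uuid4()}.{wildcard_domain}"


first = randomized_hostname()
second = randomized_hostname()
```

Each generated name can then be handed to the application's usual resolver or HTTP client in place of the fixed hostname.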

SSL Offloading
Support for both SSL and HTTPS is provided.
Support for the latest ciphers and protocols, including Elliptic Curve ciphers and Perfect Forward Secrecy.
Ability to fully customize ciphers and protocols to be used by each load balancer.
SSL Negotiation Suites provided to remove complexity of selecting ciphers and protocols.

SSL Negotiation Policies
Provide selection of ciphers and protocols that adhere to the latest industry best practices.
Balance security best practices with clients' ability to negotiate a connection; generated using traffic to Amazon.com.
Released on a regular cadence or when new vulnerabilities are published.
Default for all new load balancers.

POODLE Mitigation
Within 24 hours, 62% of load balancers migrated to the latest SSL Negotiation Policy, disabling SSLv3.

“@awscloud Thank-you #AWS for making it so easy to prevent #sslv3 #poodleattack Only took about 3 clicks of my mouse.”
@granticini

13 CloudWatch metrics provided for each load balancer.
Provide detailed insight into the health of the load balancer and application stack.
CloudWatch alarms can be configured to notify or take action should any metric go outside of the acceptable range.
All metrics provided at 1-minute granularity.
Amazon CloudWatch Metrics

HealthyHostCount
The count of the number of healthy instances in each Availability Zone.
The most common cause of unhealthy hosts is health checks exceeding the allocated timeout.
Test by making repeated requests to the back-end instance from another EC2 instance.
View at the zonal dimension.

Latency
Measures the time elapsed in seconds after the request leaves the load balancer until the response is received.
Test by sending requests to the back-end instance from another instance.
The min, average, and max CloudWatch statistics provide upper and lower bounds for latency.
Debug individual requests using Access Logs.

SurgeQueue and Spillovers
Count of the number of requests that could not be sent to back-end instances.
Queue up to 1024 requests per load balancer node, after which 503 errors will be returned.
Often caused by not being able to open connections to the back-end instance.
Normally a sign of an under-scaled application.
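
The relationship between the surge queue and spillovers can be modeled in a few lines. This is a simplified illustration rather than the load balancer's implementation; the class and method names are hypothetical, and only the 1024-request limit and the 503 response come from the slide.

```python
from collections import deque

SURGE_QUEUE_LIMIT = 1024  # requests per load balancer node, per the slide


class LoadBalancerNode:
    """Holds requests that cannot yet be sent to a back-end instance."""

    def __init__(self, limit=SURGE_QUEUE_LIMIT):
        self.limit = limit
        self.surge_queue = deque()
        self.spillover_count = 0

    def accept(self, request):
        """Queue the request, or return 503 once the surge queue is full."""
        if len(self.surge_queue) >= self.limit:
            self.spillover_count += 1
            return 503
        self.surge_queue.append(request)
        return None

    def dispatch(self):
        """Hand the oldest queued request to a back end that has freed up."""
        return self.surge_queue.popleft() if self.surge_queue else None


node = LoadBalancerNode()
results = [node.accept(request_id) for request_id in range(1030)]
queued = len(node.surge_queue)
oldest = node.dispatch()
```

With no back-end capacity available, the first 1024 requests wait in the queue and the remaining six are counted as spillovers.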

CloudWatch and AutoScaling
All load balancer metrics can be used for AutoScaling.
Allows you to scale dynamically based on the load balancer's view of the application.
Consider all metrics when using AutoScaling; scaling on a single metric may miss resource contention that shows up in another.
You may be at peak multiple times a day.

Provide detailed information on each request processed by the load balancer.
Includes request time, client IP address, latencies, request path, and server responses.
Delivered to an Amazon S3 bucket every 5 or 60 minutes.
Access Logs

Access Logs
[Diagram: ELB nodes in the ELB VPC delivering logs to Amazon S3]
Logs indexed by date but include the IP address of the load balancer node itself.

• timestamp
• elb name
• client:port
• backend:port
• request_processing_time
• backend_processing_time
• response_processing_time
• elb_status_code
• backend_status_code
• received_bytes
• sent_bytes
• “request”
2014-02-15T23:39:43.945958Z my-test-loadbalancer 192.168.131.39:2817 10.0.0.1:80 0.000073 0.001048 0.000057 200 200 0 29 "GET http://www.example.com:80/ HTTP/1.1"
Access Logs
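
An entry in this format can be split into its twelve fields with the standard library. This is a minimal sketch: the tuple field names are labels chosen here to mirror the list above, the sample line uses a well-formed back-end address of `10.0.0.1:80`, and real log files may carry additional trailing fields that this parser ignores.

```python
import shlex
from collections import namedtuple

AccessLogEntry = namedtuple(
    "AccessLogEntry",
    [
        "timestamp", "elb_name", "client", "backend",
        "request_processing_time", "backend_processing_time",
        "response_processing_time", "elb_status_code",
        "backend_status_code", "received_bytes", "sent_bytes", "request",
    ],
)


def parse_access_log_line(line):
    """Split one log line into fields, keeping the quoted request intact."""
    tokens = shlex.split(line)  # honors the double quotes around the request
    if len(tokens) < len(AccessLogEntry._fields):
        raise ValueError(f"expected at least 12 fields, got {len(tokens)}")
    return AccessLogEntry(*tokens[: len(AccessLogEntry._fields)])


sample = (
    "2014-02-15T23:39:43.945958Z my-test-loadbalancer 192.168.131.39:2817 "
    "10.0.0.1:80 0.000073 0.001048 0.000057 200 200 0 29 "
    '"GET http://www.example.com:80/ HTTP/1.1"'
)
entry = parse_access_log_line(sample)
backend_seconds = float(entry.backend_processing_time)
```

All values are kept as strings so that callers decide which ones to convert, as with the back-end processing time above.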

“Everything fails all the time”
Werner Vogels, CTO, Amazon.com

Mitigation
Isolation
Restore Redundancy

Mitigation
All load balancers scaled to handle the loss of a single Availability Zone.
Amazon Route 53 health checks shift traffic away from the failed Availability Zone.
Completed within 150 seconds.
No other external or control plane dependencies.

Isolation
Other zones must remain unaffected.
Avoid dependencies between zones.
Be careful of work generated as a result of the event.
Operating at reduced capacity but stable.

Constant Work
Health checkers and edge locations perform the same volume of activity whether endpoints are healthy or unhealthy.
[Chart: system activity over time, with time to react]
Work on Failure
When nothing is failing, volume of API calls is zero. When failure occurs, volume of API calls spikes.
[Chart: system activity over time, with time to react]
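
The difference between the two patterns can be shown by counting work per interval. This is an illustrative model only, with hypothetical function names and endpoint names; "work" here simply means the number of endpoint states that get pushed or acted upon in each tick.

```python
def constant_work(health_by_tick):
    """Push the full health table every tick, whether or not anything changed."""
    return [len(health) for health in health_by_tick]


def work_on_failure(health_by_tick):
    """Act only on endpoints whose health changed since the previous tick."""
    work = []
    previous = None
    for health in health_by_tick:
        if previous is None:
            work.append(0)  # nothing to compare against on the first tick
        else:
            changed = sum(
                1 for name in health if health[name] != previous.get(name)
            )
            work.append(changed)
        previous = health
    return work


all_healthy = {"a": True, "b": True, "c": True}
two_failed = {"a": False, "b": False, "c": True}
ticks = [all_healthy, all_healthy, two_failed, two_failed]

constant_profile = constant_work(ticks)
failure_profile = work_on_failure(ticks)
```

The constant profile stays flat through the failure, while the work-on-failure profile sits at zero and then spikes at the moment the system is already under stress.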

Restore Redundancy
Restoring the system back to full capacity.
Avoid putting additional load on the system by rushing this step.
Ensure that recovered resources are left in a consistent state.
Fully recovered when done.

