As anyone using AWS can tell you, there are good parts and there are bad ones. If you come from a datacenter background, you are most definitely not in Kansas anymore, and we had our share of learning experiences as a result.
This is the story of the pitfalls we encountered and how, through architecture, convention and common sense, we built an infrastructure that is "Always Up" from the end user's perspective and economical to build, scale and operate.
The talk will focus on leveraging the strong, economical parts of AWS while avoiding the weak, expensive ones. I'll give a breakdown of the pain points, how we managed them, and how we avoided accidentally painting ourselves into a corner.
For many companies starting today, success is defined by large traffic or user numbers; if you are one of those companies, these lessons will very likely save you significant operational headaches.
16. AWS OUTAGE = YOUR OUTAGE
http://it.mario.wikia.com/wiki/Lakitu
Thursday 22 August 13
17. RESILIENCE @ SCALE
Embrace Failure: Hardware will fail. Humans will make errors.
Nature will produce thunderstorms.
http://blabitcanada.com/category/twitter-2/
18. DEFINE 'AVAILABLE'
Things will break, so choose your degraded state.
http://libcom.org/library/occupied-wall-street-some-tactical-thoughts-malcolm-harris
19. BASIC API CALL
3 potential points of failure
20. FALLBACK PATTERNS
The cost of resilience should be accuracy or latency
http://redis.io/
http://memcached.org/
http://varnish-cache.org/
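One way to read "the cost of resilience should be accuracy or latency" is a fallback that serves the last known good value when the primary call fails. The sketch below is a minimal in-process illustration, not the speaker's implementation: the class name `StaleCacheFallback` and the `fetch` callable are hypothetical, and in practice the last good values would live in something like Redis, memcached or Varnish rather than a Python dict.

```python
import time
from typing import Callable, Dict, Tuple


class StaleCacheFallback:
    """Serve the last known good value when the primary call fails.

    Hypothetical helper: trades accuracy (possibly stale data) for availability.
    """

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch  # assumed primary data source (API, DB, ...)
        # key -> (value, timestamp of the last successful fetch)
        self._last_good: Dict[str, Tuple[str, float]] = {}

    def get(self, key: str) -> Tuple[str, bool]:
        """Return (value, is_stale)."""
        try:
            value = self._fetch(key)
        except Exception:
            # Primary failed: degrade to stale data if we have any.
            if key in self._last_good:
                return self._last_good[key][0], True
            raise  # nothing to fall back to
        self._last_good[key] = (value, time.time())
        return value, False
```

The caller receives an `is_stale` flag, so the degraded state is an explicit choice rather than a silent one.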
27. MANY SMALL NODES VERSUS
A FEW LARGER NODES
The benefits of the many outweigh the benefits of the few
http://www.stealingfaith.com/2012/07/08/throw-off-the-tiny-ropes/
28. DATABASES
CAP Theorem applies.
Your choice: sacrifice availability or consistency. Orange is a lie.
RDBMS
BigTable Based
Master / Slave based
CouchDB
Dynamo Based
http://ferd.ca/beating-the-cap-theorem-checklist.html
29. SIMPLE STORAGE SERVICE
S3: Arguably AWS' best feature
http://www.iwallpaper.us/gold-star-fo-christmas-wallpaper-140/
http://aws.amazon.com/s3/
https://forums.aws.amazon.com/message.jspa?messageID=182919#182919
30. CACHE WHAT YOU CAN
HTTP Responses, DB Queries, User content
Browsers have caches too!
http://cruncht.com/95/drupal-caching/
http://redis.io/
http://memcached.org/
http://varnish-cache.org/
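Caching DB query results can be as simple as a time-to-live (TTL) lookup in front of the query. The following is a minimal in-process sketch under stated assumptions: `TTLCache` and `get_or_compute` are hypothetical names, and a shared cache such as Redis or memcached would normally replace the dict.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Tiny in-process TTL cache (illustrative only)."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        # key -> (expiry time, cached value)
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get_or_compute(self, key: str, compute: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[0]:
            return entry[1]  # fresh hit: skip the expensive call
        value = compute()  # miss or expired: run the query again
        self._entries[key] = (now + self._ttl, value)
        return value
```

A short TTL keeps data reasonably fresh while still absorbing repeated identical queries.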
31. CLIENT SIDE STORAGE
Keep a copy of your users' data locally
http://www.w3.org/2001/tag/2010/09/ClientSideStorage.html
http://www.wired.com/gadgetlab/2012/03/badass-gadget-ammo-lunch-box/
32. USE ELASTIC LOAD BALANCERS
They will save you more than once
http://wallpapers5.com/wallpaper/Balance-Green-Tree-Frog/
33. USE GLOBAL LOAD BALANCING
Fail over to the closest data center on region failure
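The selection logic behind "fail over to the closest data center" can be sketched as a nearest-first lookup that skips unhealthy regions. This is an illustration only: the `NEAREST_FIRST` ordering is an assumed example, `pick_region` is a hypothetical helper, and a real setup would delegate this decision to a DNS-level global load balancer with health checks.

```python
from typing import Dict, List, Optional, Set

# Assumed preference table: for each client region, candidates nearest first.
# The ordering here is illustrative, not measured.
NEAREST_FIRST: Dict[str, List[str]] = {
    "us-east-1": ["us-east-1", "us-west-2", "eu-west-1"],
    "us-west-2": ["us-west-2", "us-east-1", "eu-west-1"],
    "eu-west-1": ["eu-west-1", "us-east-1", "us-west-2"],
}


def pick_region(client_region: str, healthy: Set[str]) -> Optional[str]:
    """Return the nearest healthy region, or None if every candidate is down."""
    for candidate in NEAREST_FIRST.get(client_region, []):
        if candidate in healthy:
            return candidate
    return None
```

When the client's home region is healthy it is always chosen first, so failover only changes routing during an outage.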
34. SHOUT OUT: DYN
DNS for Bit.ly, Quora, Twitter, Wikia, Fastly, etc.
http://dyn.com
35. USE IAM ROLES FOR ACCESS
Humans make mistakes, including your humans
36. COST @ SCALE
Scaling without breaking the bank
http://mgx.com/blogs/wp-content/uploads/2013/07/piggybank.jpg
37. EMR + SPOT INSTANCES
On demand rate: $0.165 / hour
http://aws.amazon.com/ec2/spot-instances/
38. AMAZON REDSHIFT
Economical Business Intelligence
Scales with data size
http://www.flitemedia.com/music.php
http://aws.amazon.com/redshift
http://www.tableausoftware.com/
39. AMAZON GLACIER
"Tapes for the Cloud Era"
Writes vastly cheaper than reads
http://aws.amazon.com/glacier/
http://www.gorp.com/parks-guide/glacier-national-park-outdoor-pp2-guide-cid350021.html
40. AWS SIMPLE EMAIL SERVICE
Dealing with email is boring and time consuming
http://aws.amazon.com/ses/
http://bfsdaniels.copycop.com/blog/all-about-printing/hypertargeting-with-direct-mail/
41. AWS SIMPLE QUEUE SERVICE
Excellent for latency insensitive, small volume queues
http://www.toledoblade.com/Retail/2013/01/13/Disney-s-magic-bracelet-new-key-to-its-kingdom.html
http://aws.amazon.com/sqs/
http://colby.id.au/benchmarking-sqs
43. AWS DYNAMO DB
Excellent for small keys & high read rates
at known & consistent IOPS
http://hlbike.en.ecplaza.net/2.jpg
http://aws.amazon.com/dynamodb/
44. MAXIMIZE IOPS
RAID 0 Ephemeral drives
Use m1.xlarge or c1.xlarge, or use SSDs if you need >20k IOPS
http://calculator.s3.amazonaws.com/calc5.html
http://blog.scalyr.com/2012/10/16/a-systematic-look-at-ec2-io/
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/InstanceStorage.html#disk-performance
45. RED FLAGS
Anti-patterns to watch out for
http://grandprix247.com/2012/09/03/spa-pile-up-renews-focus-on-formula-1-safety-matters/
46. PROVISIONED IOPS EBS
Ephemeral storage on c1/m1.xlarge or SSD is better
If you must: m*large or c1.xlarge for dedicated NIC
http://www.slideshare.net/AmazonWebServices/ebs-mongo-dbwebinarfinal-nn
http://techblog.netflix.com/2012/07/benchmarking-high-performance-io-with.html
http://navidoo.ru/interest/Nasha_jizn/17676.html
47. AWS DYNAMO DB
Red flag: using it for high write rates or
large/variable keys
http://aws.amazon.com/dynamodb/
http://www.walltowall.co.uk/program/standing-tall-worlds-tallest-people_93.aspx
48. HIGH IO/DISK/RAM NODES
Use them deliberately
http://elledecoration.co.za/2010/07/gigantic/
49. AWS CLOUDWATCH
Metric collection, Amazon style
Cost prohibitive & resolution too low
http://www.flickr.com/photos/65683080@N08/6893582132/
http://aws.amazon.com/cloudwatch/
50. LOWER COST PER METRIC
Use Graphite & StatsD
http://graphite.wikidot.com/
https://github.com/etsy/statsd
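Part of why StatsD is cheap per metric is that a metric is a single small UDP datagram in the form `name:value|type`. The sketch below shows that wire format using only the standard library. The helper names `format_metric` and `send_metric` are hypothetical, and the host and port are assumptions (8125 is the StatsD default); a maintained client library would be the usual choice in production.

```python
import socket


def format_metric(name: str, value: float, metric_type: str) -> bytes:
    """Build a StatsD datagram: 'c' = counter, 'ms' = timer, 'g' = gauge."""
    return f"{name}:{value}|{metric_type}".encode("ascii")


def send_metric(payload: bytes, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget over UDP; assumes a StatsD daemon listens on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


if __name__ == "__main__":
    # Example metric names are made up for illustration.
    send_metric(format_metric("api.requests", 1, "c"))
    send_metric(format_metric("api.latency", 320, "ms"))
```

Because UDP is connectionless, a slow or missing metrics daemon does not hold up the request path.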
51. HOSTED ALTERNATIVES
Circonus: All the insights you ever wanted
StackDriver: Optimized for AWS
http://circonus.com
http://stackdriver.com
52. AWS CLOUDFORMATION
Templatize your entire stack
Harder to use as complexity increases
http://aws.amazon.com/cloudwatch/
http://fullnfenil7.blogspot.com/2012/05/amazing-cloud-shapes-photos.html#.UhKrZmRgZHg
53. RDS FOR ANALYTICS/REPORTS
Paying OLTP prices for BI usage
Sharding is only a matter of time
http://nerds.airbnb.com/redshift-performance-cost
http://business901.com/blog1/understanding-your-customer-problem/