When you're handling big data, you eventually reach a point where a “one size fits all” approach no longer works. To get the results you want, however, you don’t have to spend big money on fire-breathing hardware or expensive software. AWS offers a broad array of open and commercial database choices, from do-it-yourself deployments to fully managed services that handle scaling, and gives you powerful tools to choose the right architecture. You can choose from MySQL, RDS, Oracle, SQL Server, MongoDB, DynamoDB, Cassandra, ElastiCache, Redis, and SimpleDB, and our customers use them for different use cases. Each has different strengths, and this session highlights when you would want to choose each, with examples of how we use each to solve our big data challenges and why we made those decisions. We profile some of the choices available to you (MySQL, RDS, ElastiCache, Redis, Cassandra, MongoDB, and DynamoDB) and present three customer case studies on RDS, ElastiCache, and DynamoDB.
2. AWS Database Options and Decision Factors
Best Practice Tips and Techniques
• Optimizing for Manageability and Scale (Edmodo)
• Optimizing for App Velocity and Scale (Obama for America)
• Leveraging YesSQL and NoSQL (BrandVerity)
Q&A
4. Easily and rapidly analyze petabytes of data
1/10 the cost of traditional data warehouses
Automated deployment & administration
Compatible with popular BI tools
5. Amazon Redshift
[Diagram. Components shown: common BI tools, JDBC/ODBC, leader node, 10GigE mesh, compute nodes]
Choose from 16TB local disk / 128 GB RAM or 2TB local disk / 16GB RAM nodes
Configure up to 100 nodes for up to 1.6 PB
Data stored in columnar format for 10X I/O efficiencies and fast queries
Query with standard SQL and JDBC/ODBC
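The sizing and columnar-storage points above can be illustrated with a few lines of arithmetic. This is a toy sketch only: the helper name `cluster_capacity_tb` and the sample records are made up for illustration, decimal units (1 PB = 1000 TB) are assumed, and nothing here reflects how Redshift is implemented internally.

```python
def cluster_capacity_tb(nodes, disk_tb_per_node):
    """Raw local-disk capacity in TB (hypothetical helper)."""
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    return nodes * disk_tb_per_node


# 100 of the 16 TB nodes, in decimal units (1 PB = 1000 TB).
capacity_pb = cluster_capacity_tb(100, 16) / 1000

# The same three made-up records laid out by row and by column.
rows = [
    ("2012-11-01", "US", 120),
    ("2012-11-01", "DE", 45),
    ("2012-11-02", "US", 80),
]
columns = {
    "day": [r[0] for r in rows],
    "country": [r[1] for r in rows],
    "clicks": [r[2] for r in rows],
}

# Summing one column: a row layout reads every field of every record,
# while a column layout reads only the values of that one column.
values_read_row_layout = sum(len(r) for r in rows)
values_read_column_layout = len(columns["clicks"])
total_clicks = sum(columns["clicks"])
```

With three fields per record the column layout reads a third of the values for this query; the gap widens as tables get wider.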
16. Should I use SQL or NoSQL?
Should I use MySQL on EC2 or RDS?
Should I use MongoDB, Cassandra, or DynamoDB?
Should I use Redis, Memcache, or ElastiCache?
17. What are my scale and latency needs?
What are my transactional and consistency needs?
What are my read/write, storage and IOPS needs?
What are my time to market and server control needs?
18. Factors | SQL | NoSQL
Application | App with complex business logic? | Web app with lots of users?
Transactions | Complex txns, joins, updates? | Simple data model, updates, queries?
Scale | Developer managed | Automatic, on-demand scaling
Performance | Developer architected | Consistent, high performance at scale
Availability | Architected for fail-over | Seamless and transparent
Core Skills | SQL + Java/Ruby/Python/PHP | NoSQL + Java/Ruby/Python/PHP
Best of both worlds: Possible to use SQL and NoSQL models in one app
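A minimal sketch of using both models in one app. Everything here is a stand-in chosen so the example runs anywhere: Python's built-in `sqlite3` plays the role of a SQL engine such as MySQL on RDS, a plain dict plays the role of a key-value NoSQL store such as DynamoDB, and the table names, function names, and sample data are hypothetical.

```python
import sqlite3

# Relational side: sqlite3 stands in for a SQL engine such as MySQL on RDS.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         customer_id INTEGER NOT NULL REFERENCES customers(id),
                         total_cents INTEGER NOT NULL);
""")

# A multi-statement transaction: all inserts commit together or not at all.
with db:
    db.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada')")
    db.execute("INSERT INTO orders (customer_id, total_cents) VALUES (1, 2500)")
    db.execute("INSERT INTO orders (customer_id, total_cents) VALUES (1, 1200)")


def customer_totals(conn):
    """Join and aggregate: the kind of query the SQL column is suited to."""
    return conn.execute(
        "SELECT c.name, SUM(o.total_cents) FROM customers c "
        "JOIN orders o ON o.customer_id = c.id GROUP BY c.id"
    ).fetchall()


# Key-value side: a dict stands in for a NoSQL store such as DynamoDB.
sessions = {}


def save_session(session_id, data):
    sessions[session_id] = dict(data)  # simple put by key


def load_session(session_id):
    return sessions.get(session_id)  # simple get by key, no joins
```

The split follows the table above: data that needs joins and multi-row transactions goes to the relational side, while high-volume lookups by a single key go to the key-value side.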
19. Factors | Do it Yourself (DIY) | Fully Managed
Replication | Granular, app managed | Transparent and configured
Monitoring | Specific agents and custom | Automated and API driven
Security | Root access, custom configs | Hardened by the service
Resources | Requires more DBA resources and time | Requires less DBA resources and time
Time to market | Sophistication vs. speed | Rapid iteration
Core Skills | Systems, databases, monitoring | Applications, User focused
Best of both worlds: Possible to manage different tiers differently
20. Amazon RDS is a fully managed SQL database service.
Choice of Database engines
Simple to deploy and scale
Reliable and cost effective
Without any operational burden.
21. Focus on the “innovation”
• Schema design
• Query construction
• Query optimization
Off load the “administration”
• Migration
• Backup and recovery
• Patching
• Configuration
• Software upgrades
• Storage upgrades
• Frequent server upgrades
• Hardware crash
22. Multiple databases per instance
Standard user accounts
Connect and query using common MySQL tools & drivers
Tune engine parameters
Import and export data using standard MySQL tools (mysqldump)
Diagnostics
Native MySQL replication
SSL for encryption over the wire
Monitor metrics
Not available: shell, super user, or direct file system access (Think security!)
23. ElastiCache is a fully managed Memcache caching service.
Easy to set up and operate
Scale cache clusters with push button ease
Ultra fast response time for read scaling
Without any operational burden.
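Read scaling with a cache usually follows a cache-aside pattern: check the cache first, and only go to the database on a miss. The sketch below is an illustration under stated assumptions, not the ElastiCache or Memcached client API: a dict stands in for the cache cluster, and the class name `CacheAside`, its parameters, and the `db_reads` counter are hypothetical.

```python
import time


class CacheAside:
    """Check the cache first; fall back to the database on a miss."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_db      # function that reads one key from the DB
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}               # key -> (expires_at, value); cache stand-in
        self.db_reads = 0              # counts how often the database was hit

    def get(self, key):
        entry = self._cache.get(key)
        if entry is not None and entry[0] > self._clock():
            return entry[1]            # cache hit: no database read
        value = self._load(key)        # miss or expired entry
        self.db_reads += 1
        self._cache[key] = (self._clock() + self._ttl, value)
        return value

    def invalidate(self, key):
        """Call after a write so the next read refetches from the database."""
        self._cache.pop(key, None)
```

Repeated reads of the same key within the TTL cost one database read; the trade-off is that readers may see a stale value until the entry expires or is invalidated.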
24. Amazon DynamoDB is a fully managed NoSQL database service.
Store and retrieve any amount of data
Scale throughput to millions of IO
Single digit millisecond latencies
Without any operational burden.
25. Manage tables: CreateTable, UpdateTable, DeleteTable, DescribeTable, ListTables
“Select”, “insert”, “update” items: PutItem, GetItem, UpdateItem, DeleteItem
Bulk select or update (max 1MB): BatchGetItem, BatchWriteItem
Query specific items OR scan the full table: Query, Scan
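To make the difference between the item operations, Query, and Scan concrete, here is a toy in-memory model of a table with a hash key and a range key. It is not the DynamoDB client API: the class `ToyTable` and its method names are hypothetical and only mirror the operation names listed above, and the sample keys are made up.

```python
class ToyTable:
    """In-memory model of a table addressed by hash key + range key."""

    def __init__(self):
        self._items = {}  # hash key -> {range key: item}

    def put_item(self, hash_key, range_key, item):
        # Insert, or overwrite an existing item with the same keys.
        self._items.setdefault(hash_key, {})[range_key] = dict(item)

    def get_item(self, hash_key, range_key):
        return self._items.get(hash_key, {}).get(range_key)

    def delete_item(self, hash_key, range_key):
        self._items.get(hash_key, {}).pop(range_key, None)

    def query(self, hash_key, range_from=None, range_to=None):
        """Touch only one hash key; return its items ordered by range key."""
        part = self._items.get(hash_key, {})
        return [
            part[r]
            for r in sorted(part)
            if (range_from is None or r >= range_from)
            and (range_to is None or r <= range_to)
        ]

    def scan(self, predicate):
        """Visit every item in the table and filter afterwards."""
        return [
            item
            for part in self._items.values()
            for item in part.values()
            if predicate(item)
        ]
```

For example, with a user ID as the hash key and a date string as the range key, `query("user1", "2012-11-02")` reads only that user's items from that date on, while `scan(...)` has to walk the whole table no matter how few items match.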
26. So, what are the tips and techniques for
successful deployments?
27. Educates millions of students
Reaches millions of citizens
Analyzes billions of Ads
[Diagram. Services shown: Amazon EC2, Amazon DynamoDB, Amazon ElastiCache, Amazon RDS, Amazon S3]
31. Learning 101
• Largest, fastest growing social platform for education
• Secure learning network for teachers and students
• Browser, iOS, Android
• Free for teachers and students
32. Stats 101
• 100,000 schools
• 14 million users
• 7 million new users in the last year
• 1 million visits daily
33. [Architecture diagram. Components shown: Amazon Route 53, Amazon CloudFront, Amazon S3, Elastic Load Balancer, web instances in an Auto Scaling group, Amazon CloudWatch, cache instances, MySQL RDS DB instances with read replicas, Availability Zone]
34. DBA 101
• Restore from snapshot
• Replica creation
• Parameter tuning
• Metrics collection
• Know your app/data
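Read replicas only help if the application actually sends its reads to them. One common way to do that is a small router in the data access layer; the sketch below is an assumption about how such a router could look, not Edmodo's implementation. The class name `ReadWriteRouter` and the endpoint strings are hypothetical.

```python
from itertools import cycle


class ReadWriteRouter:
    """Send writes to the master and spread reads across read replicas."""

    WRITE_VERBS = ("INSERT", "UPDATE", "DELETE", "REPLACE")

    def __init__(self, master, replicas):
        self.master = master
        # Fall back to the master if no replica has been created yet.
        self._readers = cycle(list(replicas) or [master])

    def endpoint_for(self, sql):
        """Pick an endpoint from the first keyword of the statement."""
        parts = sql.split()
        verb = parts[0].upper() if parts else ""
        if verb in self.WRITE_VERBS:
            return self.master
        return next(self._readers)  # round-robin over the replicas
```

Because MySQL replication is asynchronous, a replica can lag behind the master, so reads that must see a write that just happened are safer sent to the master. This is part of what "know your app/data" means in practice.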
35. Educates millions of students
Reaches millions of citizens
Analyzes billions of Ads
Speaker: Jay Edwards
37. Me.
• Twitter: First dedicated DBA
• OFA: Lead Database Engineer
• PalominoDB: CTO & VP/Operations
38. Obama for America.
• Technically sophisticated for a campaign
• Not “web-scale”
• Hockey-stick++ growth
• Downtime hurts. A lot…really, really, really a lot.
41. Problems!
• You always need more databases
• OFA had 24+ schemas & 100+ RDS instances
• You never have enough DBAs
• OFA had 1 – 2 x 0.5 fulltime MySQL DBAs
42. Why RDS?
• Makes operational issues very easy
• Need more replicas? BAM!
• Upsize hardware? KAPOW!
• Point in time restore? BIF!
43. Why not RDS?
• Hardware cap (vertical v. horizontal)
• Sophisticated use-cases
• Frequent topology changes
• Multi-region replication (on their roadmap)
• DBAs need busy work
44. Educates millions of students
Reaches millions of citizens
Analyzes billions of Ads
Speaker: Andy Skalet
53. • Managed services let you focus on creating value
• Amazon S3 - Very robust, handles large items, but you filter
• Amazon DynamoDB - Extremely fast, scalable, good value
• Must cast your problem as kvs or key + range
• Amazon RDS - MySQL, without the headaches
• Amazon ElastiCache - As memcached, fast kvs for small data
• Multi column queries on big data?
• Looking forward to the AWS solution
54. Thank you
Free Tier
aws.amazon.com/dynamodb
aws.amazon.com/rds
aws.amazon.com/elasticache
raghavas@amazon
55. We are sincerely eager to hear your feedback on this presentation and on re:Invent. Please fill out an evaluation form when you have a chance.