With AWS you can choose the right database for the right job. Given the myriad of choices, from relational databases to non-relational stores, this session profiles some of the options available to you (MySQL, RDS, ElastiCache, Redis, Cassandra, MongoDB and DynamoDB), with details on real-world deployments from customers using Amazon RDS, ElastiCache and DynamoDB.
AWS Summit Berlin 2013 - Understanding database options on AWS
1. Jan Borch - AWS Solutions Architect
Understanding Database Options on AWS
Jan Borch
#awssummit
Berlin
2. We want to make it easy for you to start
1. Zero to Application in ____ Minutes
2. Zero to Millions of users in ____ Days
3. Zero to “Profits!” ASAP
7. Spectrum of options on AWS
Axes: SQL to NoSQL, Do-it-yourself to Fully Managed
SQL, do-it-yourself: MySQL, Oracle, SQL Server, PostgreSQL, your favorite RDBMS
SQL, fully managed: RDS
- MySQL
- Oracle
- SQL Server
8. Spectrum of options on AWS
Axes: SQL to NoSQL, Do-it-yourself to Fully Managed
NoSQL, do-it-yourself: MongoDB, Cassandra, Redis, Memcached, …
NoSQL, fully managed: Amazon DynamoDB, Amazon ElastiCache
9. Thinking about the questions
Should I use SQL or NoSQL?
Should I use MySQL on EC2 or RDS?
Should I use MongoDB, Cassandra, or DynamoDB?
Should I use Redis, Memcached, or ElastiCache?
10. Actually, thinking about the right questions
What are my scale and latency needs?
What are my transactional and consistency needs?
What are my read/write, storage and IOPS needs?
What are my time to market and server control needs?
19. Why Managed Databases?
backup & recovery, data load & unload
performance tuning
scripting & coding
security planning
install, upgrade, patch and migrate
documentation, licensing & training
Chart percentages shown: 25%, 40%, 5%, 5%
differentiated effort increases the uniqueness of an application
20. We believe in choice
One size does not fit all
Traditional Apps
Relational DB Needs
High Performance, High Scale Data Warehouses
New Web Apps
Massive Scalability
Amazon RDS
Amazon ElastiCache
Amazon DynamoDB
Amazon Redshift
22. Amazon Relational Database Service
Amazon RDS
RDS is a fully managed relational database service that is simple to deploy, easy to scale, reliable and cost-effective
44. Reserve IOPS for reads and writes.
Scale up or down at any time.
Provisioned throughput.
45. Pay per capacity unit
READ Capacity Units = Size of item (KB) x reads per second
Consistent reads: $0.0065 for 50 read units
Eventually consistent reads: $0.0065 for 100 read units
WRITE Capacity Units = Size of item (KB) x writes per second
$0.0065 for 10 write units
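As a rough illustration of the formulas on this slide, the arithmetic can be sketched in Python. The prices and block sizes are the ones quoted above; the slide does not state the billing period, so the results are "per billing period". Rounding the item size up to a whole KB and charging in whole blocks are assumptions of this sketch, and the function names are made up for illustration.

```python
import math

# Price and block sizes as quoted on the slide.
PRICE_PER_BLOCK = 0.0065
UNITS_PER_BLOCK = {
    "consistent_read": 50,
    "eventually_consistent_read": 100,
    "write": 10,
}


def capacity_units(item_size_kb, ops_per_second):
    """Slide formula: size of item (KB) x operations per second.

    Assumption: the item size is rounded up to a whole KB first.
    """
    return math.ceil(item_size_kb) * ops_per_second


def cost(units, kind):
    """Cost of `units` capacity units for one billing period.

    Assumption: capacity is charged in whole blocks of 50, 100 or 10 units.
    """
    blocks = math.ceil(units / UNITS_PER_BLOCK[kind])
    return blocks * PRICE_PER_BLOCK


# Example: 2 KB items, 100 operations per second.
units = capacity_units(item_size_kb=2, ops_per_second=100)
costs = {kind: round(cost(units, kind), 4) for kind in UNITS_PER_BLOCK}
print(units, costs)
```

With these numbers, eventually consistent reads cost half as much as consistent reads for the same workload, and writes cost five times as much as consistent reads.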
50. Data modeling
Table
id = 100  date = 2012-05-16-09-00-10  total = 25.00
id = 101  date = 2012-05-15-15-00-11  total = 35.00
id = 101  date = 2012-05-16-12-00-10  total = 100.00
id = 102  date = 2012-03-20-18-23-10  total = 20.00
id = 102  date = 2012-03-20-18-23-10  total = 120.00
51. Data modeling
Item
id = 100  date = 2012-05-16-09-00-10  total = 25.00
id = 101  date = 2012-05-15-15-00-11  total = 35.00
id = 101  date = 2012-05-16-12-00-10  total = 100.00
id = 102  date = 2012-03-20-18-23-10  total = 20.00
id = 102  date = 2012-03-20-18-23-10  total = 120.00
52. Data modeling
Attributes
id = 100  date = 2012-05-16-09-00-10  total = 25.00
id = 101  date = 2012-05-15-15-00-11  total = 35.00
id = 101  date = 2012-05-16-12-00-10  total = 100.00
id = 102  date = 2012-03-20-18-23-10  total = 20.00
id = 102  date = 2012-03-20-18-23-10  total = 120.00
53. Items are indexed by primary and secondary keys
Primary keys can be composite
Secondary keys are local to the table
Indexing
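To make the table / item / attribute vocabulary and the composite primary key concrete, here is a minimal in-memory sketch in Python. It is not the DynamoDB API: the helper names are hypothetical, and it assumes `id` is the hash part of the key and `date` the range part, as the data modeling slides suggest. The `local_index` helper is one possible reading of "secondary keys are local to the table": an alternative sort order over the items that share a hash key.

```python
# Table: a collection of items. Item: one record.
# Attribute: a name/value pair inside an item (id, date, total).
orders = {}  # toy table, keyed by the composite primary key (hash, range)


def put_item(table, item, hash_key="id", range_key="date"):
    """Store an item under its composite primary key."""
    table[(item[hash_key], item[range_key])] = item


def get_item(table, hash_value, range_value):
    """Look up exactly one item by its full primary key."""
    return table.get((hash_value, range_value))


def local_index(table, hash_value, attribute):
    """Items sharing one hash key, ordered by another attribute."""
    items = [it for (h, _), it in table.items() if h == hash_value]
    return sorted(items, key=lambda it: it[attribute])


put_item(orders, {"id": 100, "date": "2012-05-16-09-00-10", "total": 25.00})
put_item(orders, {"id": 101, "date": "2012-05-15-15-00-11", "total": 35.00})
put_item(orders, {"id": 101, "date": "2012-05-16-12-00-10", "total": 100.00})

order = get_item(orders, 101, "2012-05-16-12-00-10")
by_total = [it["total"] for it in local_index(orders, 101, "total")]
print(order, by_total)
```

Because the key is the pair (id, date), two items with the same id can coexist as long as their dates differ.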
58. Programming DynamoDB.
Small but perfectly formed API.
Manage tables: CreateTable, UpdateTable, DeleteTable, DescribeTable, ListTables
“Select”, “insert”, “update” items: PutItem, GetItem, UpdateItem, DeleteItem
Bulk select or update (max 1MB): BatchGetItem, BatchWriteItem
Query specific items OR scan the full table: Query, Scan
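The difference between the item operations, Query and Scan can be sketched with a toy in-memory table in Python. This is only an illustration of the access patterns, not the DynamoDB client API; the class and method names are invented here and only loosely mirror the operation names on the slide.

```python
from collections import defaultdict


class ToyTable:
    """In-memory stand-in for a table with a hash key and a range key."""

    def __init__(self, hash_key="id", range_key="date"):
        self.hash_key = hash_key
        self.range_key = range_key
        self._partitions = defaultdict(dict)  # hash value -> {range value: item}

    def put_item(self, item):
        self._partitions[item[self.hash_key]][item[self.range_key]] = item

    def get_item(self, hash_value, range_value):
        return self._partitions.get(hash_value, {}).get(range_value)

    def delete_item(self, hash_value, range_value):
        return self._partitions.get(hash_value, {}).pop(range_value, None)

    def query(self, hash_value):
        """Touch only the items under one hash key, in range-key order."""
        partition = self._partitions.get(hash_value, {})
        return [partition[key] for key in sorted(partition)]

    def scan(self, predicate=lambda item: True):
        """Visit every item in the table, then filter."""
        return [item
                for partition in self._partitions.values()
                for item in partition.values()
                if predicate(item)]


table = ToyTable()
table.put_item({"id": 100, "date": "2012-05-16-09-00-10", "total": 25.00})
table.put_item({"id": 101, "date": "2012-05-16-12-00-10", "total": 100.00})
table.put_item({"id": 101, "date": "2012-05-15-15-00-11", "total": 35.00})

queried = [item["date"] for item in table.query(101)]
scanned = sorted(item["total"] for item in table.scan(lambda it: it["total"] > 30))
print(queried, scanned)
```

The query only looks at the items stored under id 101, while the scan has to walk every item before the filter is applied, which is why scans get more expensive as the table grows.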
59. Query patterns
Retrieve all items by hash key.
Range key conditions:
==, <, >, >=, <=, begins with, between.
Counts. Top and bottom n values.
Paged responses.
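The query patterns above all rely on range keys being kept in sorted order under a hash key. A small Python sketch of that idea follows, using a sorted list of date strings as the range keys of one hypothetical hash key. The helper names and the paging convention (return the last key seen, or None when done) are assumptions made for illustration, not the service's actual behaviour.

```python
from bisect import bisect_left, bisect_right

# Range keys stored under a single hash key, kept sorted.
dates = sorted([
    "2012-05-16-12-00-10",
    "2012-03-20-18-23-10",
    "2012-05-16-09-00-10",
    "2012-05-15-15-00-11",
])


def between(keys, low, high):
    """Range keys k with low <= k <= high."""
    return keys[bisect_left(keys, low):bisect_right(keys, high)]


def begins_with(keys, prefix):
    """Range keys that start with `prefix`."""
    start = bisect_left(keys, prefix)
    end = start
    while end < len(keys) and keys[end].startswith(prefix):
        end += 1
    return keys[start:end]


def page(keys, limit, start_after=None):
    """Return one page plus the key to resume from (None on the last page)."""
    start = 0 if start_after is None else bisect_right(keys, start_after)
    chunk = keys[start:start + limit]
    last_key = chunk[-1] if start + limit < len(keys) else None
    return chunk, last_key


may_until_16th_10h = between(dates, "2012-05-01", "2012-05-16-10")
on_the_16th = begins_with(dates, "2012-05-16")
count = len(on_the_16th)             # "Counts"
bottom_2, top_2 = dates[:2], dates[-2:]   # "Top and bottom n values"
first_page, resume_key = page(dates, limit=2)
second_page, end_marker = page(dates, limit=2, start_after=resume_key)
print(may_until_16th_10h, count, first_page, second_page)
```

Sorting makes each of these a slice of a contiguous key range, which is the reason such conditions are only offered on the range key.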
63. OLTP <-> OLAP
SELECT ProductID, Name
FROM Products
WHERE ProductID = 1234;
SELECT ProductID, count(*)
FROM Page_Hits
WHERE hour in (12,13)
GROUP BY ProductID
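Both queries from this slide can be tried against an in-memory SQLite database from Python. The table definitions and sample rows below are invented for illustration, and an ORDER BY is added to the second query only to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE Page_Hits (ProductID INTEGER, hour INTEGER);
INSERT INTO Products VALUES (1234, 'Widget'), (5678, 'Gadget');
INSERT INTO Page_Hits VALUES (1234, 12), (1234, 13), (5678, 12), (5678, 9);
""")

# OLTP style: fetch one row through the primary key index.
oltp_row = conn.execute(
    "SELECT ProductID, Name FROM Products WHERE ProductID = 1234"
).fetchone()

# OLAP style: read many rows and aggregate them.
olap_rows = conn.execute(
    "SELECT ProductID, count(*) FROM Page_Hits "
    "WHERE hour IN (12, 13) GROUP BY ProductID ORDER BY ProductID"
).fetchall()

print(oltp_row)
print(olap_rows)
conn.close()
```

The first query returns a single row by key, while the second has to read every matching page hit before it can produce its counts.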
64. OLTP <-> OLAP
Transactional Processing
• Transactional context
– Get order total
• Latency
• Indexed access
• Random IO
• Disk Seek times
Analytical Processing
• Global context
– Daily revenue report
• Throughput
• Full table scans
• Sequential IO
• Disk Transfer rates
65. Amazon Redshift
Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service
66. Fast and powerful
Dramatically Reduce I/O
• Direct-attached storage
• Large data block sizes
• Column data store
• Data compression
• Zone maps
Parallelize and Distribute Everything
• MPP
• Load
• Query
• Resize
• Backup
• Restore
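Of the I/O-reduction techniques listed on this slide, zone maps are perhaps the easiest to illustrate. The Python sketch below shows the general idea only: keep the minimum and maximum value of each block of a column, and skip blocks whose range cannot match the filter. It says nothing about how Redshift implements this internally, and the block size and sample data are made up.

```python
BLOCK_SIZE = 4  # values per block; unrealistically small, for illustration


def build_zone_map(column, block_size=BLOCK_SIZE):
    """Split one column into blocks and record (min, max) per block."""
    blocks = [column[i:i + block_size]
              for i in range(0, len(column), block_size)]
    zone_map = [(min(block), max(block)) for block in blocks]
    return blocks, zone_map


def count_in_range(blocks, zone_map, low, high):
    """Count values in [low, high], reading only blocks that may match."""
    matches = 0
    blocks_read = 0
    for block, (block_min, block_max) in zip(blocks, zone_map):
        if block_max < low or block_min > high:
            continue  # the zone map proves this block has no match
        blocks_read += 1
        matches += sum(low <= value <= high for value in block)
    return matches, blocks_read


# A hypothetical "hour" column, stored roughly in sorted order.
hours = [0, 1, 1, 2, 3, 4, 5, 5, 9, 10, 11, 12, 12, 13, 13, 14, 20, 21, 22, 23]
blocks, zone_map = build_zone_map(hours)
matches, blocks_read = count_in_range(blocks, zone_map, 12, 13)
print(matches, blocks_read, len(blocks))
```

Here a filter like `hour in (12, 13)` reads 2 of the 5 blocks. The benefit depends on the column being stored in roughly sorted order, so that each block covers a narrow range of values.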
67. Fully Managed
Protect Operations
• Redshift data is always encrypted
• Continuously backed up to S3
• Automatic node recovery
• Transparent disk failure
Simplify Provisioning
• Create a cluster in minutes
• Automatic OS and software patching
• Scale up to 1.6PB with a few clicks and no downtime