2. AWS databases and analytics
Broad and deep portfolio, purpose-built for builders
Data Lake
S3/Glacier
Glue
(ETL & Data Catalog)
Data Movement
Database Migration Service | Snowball | Snowmobile | Kinesis Data Firehose | Kinesis Data Streams
Non-Relational Databases
DynamoDB
ElastiCache
(Redis, Memcached)
Neptune
(Graph)
Analytics
DW | Big Data Processing | Interactive
Redshift EMR Athena
Kinesis
Analytics
Elasticsearch
Service
Real-time
Relational Databases
RDS
Aurora
Business Intelligence & Machine Learning
QuickSight SageMaker Comprehend
7. Relational data
• Divide data among tables
• Highly structured
• Relationships established via
keys enforced by the system
• Data accuracy and consistency
Patient
* Patient ID
First Name
Last Name
Gender
DOB
* Doctor ID
Visit
* Visit ID
* Patient ID
* Hospital ID
Date
* Treatment ID
Medical Treatment
* Treatment ID
Procedure
How Performed
Adverse Outcome
Contraindication
Doctor
* Doctor ID
First Name
Last Name
Medical Specialty
* Hospital Affiliation
Hospital
* Hospital ID
Name
Address
Rating
8. Relational data
// Doctors affiliated with Mercy hospital
Patient
* Patient ID
First Name
Last Name
Gender
DOB
* Doctor ID
Visit
* Visit ID
* Patient ID
* Hospital ID
Date
* Treatment ID
Medical Treatment
* Treatment ID
Procedure
How Performed
Adverse Outcome
Contraindication
Doctor
* Doctor ID
First Name
Last Name
Medical Specialty
* Hospital Affiliation
Hospital
* Hospital ID
Name
Address
Rating
SELECT
d.first_name, d.last_name
FROM
doctor as d,
hospital as h
WHERE
d.hospital = h.hospital_id
AND h.name = ‘Mercy';
// Number of patient visits each doctor
completed last week
SELECT
d.first_name, d.last_name, count(*)
FROM
visit as v,
hospital as h,
doctor as d
WHERE
v.hospital_id = h.hospital_id
AND h.hospital_id = d.hospital
AND v.t_date > date_trunc('week’,
CURRENT_TIMESTAMP - interval '1 week')
GROUP BY
d.first_name, d.last_name;
9. Managed services transform operations
Power, HVAC, net
Rack & stack
Server maintenance
OS patches
DB software patches
Database backups
High Availability
DB software installs
OS installation
Scaling
Operating
Databases
in AWS
App optimization
you
Power, HVAC, net
Rack & stack
Server maintenance
OS patches
DB software patches
Database backups
Scaling
High Availability
DB software installs
OS installation
you
App optimization
Operating
Databases
in the Old World
10. Moving to open source
database engines
+
Commercial-grade performance and reliability?
11. Scale compute
and storage with a
few clicks; minimal
downtime for your
application
Automatic Multi-AZ
data replication;
automated backup,
snapshots, and
failover
Data encryption at
rest and in transit;
industry compliance
and assurance
programs
Running many databases with Amazon RDS
Managed Relational Database Service with choice
Managed & Automated
Deploy and maintain
hardware, OS, and DB
software; built-in
monitoring
Performant & scalable Available & durable Secure & compliant
12. Key Amazon RDS Features
Managed Relational Database Service with choice
Amazon RDS
Configuration
Improve
Availability
Increase
Throughput
Reduce
Latency
Push-Button Scaling
Multi AZ
Read Replicas
Provisioned IOPS
Read ReplicasPush-Button Scaling
Provisioned IOPS
Region
Multi-AZ
availability
zone
availability
zone
13. Fully compatible with PostgreSQL
and MySQL,
with 3x – 5x the throughput
Storage volume striped across
hundreds of storage nodes
distributed over 3 different
availability zones
Six copies of data on SSD, two
copies in each availability zone, to
protect against AZ+1 failures
Continuous backup to Amazon S3
(built for 99.999999999%
durability)
Master Replica Replica Replica
Availability
Zone 1
Availability
Zone 2
Availability
Zone 3
Large relational databases with Amazon Aurora
Scale-out, distributed, multi-tenant architecture
14. Large relational databases with Amazon Aurora
Scale-out, distributed, multi-tenant architecture
• Your data is replicated 6 ways
across 3 AZs
• Storage grows up to
64 TB* seamlessly
• Up to 15 Aurora Replicas with
instant crash recovery
AZ 1 AZ 2 AZ 3
Virtualized, cross-AZ storage layer
Aurora
Aurora Aurora Aurora Aurora
Size for the peak load
-or-
Continuously monitor and
manually scale up/down
15. Aurora Serverless . . .
Responds to your application load
automatically
• Scale capacity with no downtime
• Multi-tenant proxy is highly
available
• Scale target has warm buffer
pool
• Shuts down when not in use
16. Aurora is used by ¾ of the top 100 AWS customers
Amazon Aurora customer adoption
Fastest growing service in AWS history
17. AWS Database Migration Service
Migrating
Databases to AWS
150,000+
Databases migrated
Migrate between on-premises and AWS
Migrate between databases
Data replication for zero-downtime migration
Automated schema conversion
18. Two fundamental areas of focus
“Lift and shift” existing
apps to the cloud
Quickly build new
apps in the cloud
19. A one size fits all database doesn’t fit anyone
Modern Applications Need Purpose-Built Databases
Users: 1M+
Data volume: TB–PB–EB
Locality: Global
Performance: Milliseconds–microseconds
Request Rate: Millions
Access: Mobile, IoT, devices
Scale: Up-out-in
Economics: Pay as you go
Developer Access: Instant API access
Relational Key-value Document
In-memory Graph Search
20. AWS purpose-built strategy
The right tool for the right job
Relational
Non-Relational
Aurora RDS
ElastiCacheDynamoDB
Key-value Document
Neptune
Graph
21. Data models and common use cases
Relational Key-value Document In-memory Graph Search
Referential
integrity, ACID
transactions,
schema-on-write
Low-latency,
key look-ups with
high throughput
and fast ingestion
of data
Indexing and
storing
documents with
support
for query on
any attribute
Microseconds
latency,
key-based
queries, and
specialized
data structures
Creating and
navigating
relations
between data
easily
and quickly
Indexing and
searching
semistructured
logs and data
ERP, medical records,
CRM, finance
Real-time bidding,
shopping cart, IoT device
tracking
Content management,
personalization, mobile
Leaderboards, real-time
analytics, caching
Fraud detection, social
networking,
recommendation engine
Product catalog,
help/FAQs, full-text
Amazon Aurora
Amazon RDS
Amazon Redshift
Amazon
DynamoDB
Amazon
DynamoDB
Amazon
DocumentDB
Amazon
ElastiCache for
Redis &
Memcached
Amazon Neptune Amazon
Elasticsearch
Service
22. NoSQL vs. SQL for a new app: how to choose?
• Want simplest possible DB
management?
• Want app to manage DB integrity?
• Need joins, transactions, frequent
table scans?
• Want DB engine to manage DB
integrity?
• Team has SQL skills?
Amazon DynamoDB Amazon RDS
24. Key-value data
• Simple key value
pairs
• Partitioned by
keys
• Resilient to failure
• High throughput,
low-latency reads
and writes
• Consistent
performance at
scale
Gamers
Primary Key Attributes
GamerTag Level Points High Score Plays
Hammer57 21 4050 483610 1722
FluffyDuffy 5 1123 10863 43
Lol777313 14 3075 380500 1307
Jam22Jam 20 3986 478658 1694
ButterZZ_55 7 1530 12547 66
… … … … …
PUT {
TableName:"Gamers",
Item: {
"GamerTag":"Hammer57",
"Level":21,
"Points":4050,
"Score":483610,
"Plays":1722
} }
GET {
TableName:"Gamers",
Key: {
"GamerTag":"Hammer57“,
“ProjectionExpression“:”Points”
} }
25. Gamers
Primary Key
Attributes
Gamer Tag Type
Hammer57
Rank
Level Points Tier
87 4050 Elite
Status
Health Progress
90 30
Weapon
Class Damage Range
Taser 87% 50
FluffyDuffy
Rank
Level Points Tier
5 1072 Trainee
Status
Health Progress
37 8
Key-value data
// Status of Hammer57
GET {
TableName:"Gamers",
Key: {
"GamerTag":"Hammer57",
"Type":"Status” } }
// Return all Hammer57
QUERY {
TableName:“Gamers”,
KeyConditionExpression:"GamerTag = :a",
ExpressionAttributeValues: {
":a”:”Hammer57” } }
26. Amazon DynamoDB
Fully-managed nonrelational database for any scale
Secure
Encryption at rest and transit
Fine-grained access control
PCI, HIPAA, FIPS140-2 eligible
High performance
Fast, consistent performance
Virtually unlimited throughput
Virtually unlimited storage
Fully managed
Maintenance-free
Serverless
Auto scaling
Backup and restore
Global tables Global Tables
High-performance, globally
distributed applications
Multi-region redundancy
and resiliency
Easy to set up and no application
rewrites required
27. Amazon DocumentDB
Fast, scalable, highly available, fully managed MongoDB-compatible database service
Secure and compliant
Simple
and fully managed
Same code, drivers, and tools
you use with MongoDB
Millions of requests per
second, millisecond latency
2x throughput of
managed MongoDB services
Deeply integrated with
AWS services
28. Managed services for open source software
Redis, Memcached, Elasticsearch, Apache Hadoop, etc.
Fully managed
AWS manages all hardware
and software setup,
configuration, monitoring
Extreme performance
In-memory data store and cache
for sub-millisecond response times
Easily scalable
Non-disruptive scaling
up and down to
meet changing
demands
Amazon ElastiCache
Open and Secure
Direct access to open-source APIs
Secure access with VPC
Amazon Elasticsearch Service
Apache Hadoop Ecosystem
19 open-source frameworks
Low costs with S3 storage and Spot
Amazon EMR
29. Graph data
• Relationships are first-class objects
• Vertices connected by Edges
Vertex
PURCHASED PURCHASED
FOLLOWS
PURCHASED
KNOWS
PRODUCT
SPORT
FOLLOWS
Edge
30. Amit
Graph use case
Do you know…
Customers who also follow
sports purchased…
gremlin> g.V().has(‘name’,’sara’).as(‘customer’).out(‘follows’).in(‘follows’).out(‘purchased’)
where(neq(‘customer’)).dedup().by(‘name’).properties('name')
KNOW
S
PURCHASED PURCHASED
FOLLOWS
PURCHASED
KNOWS
PRODUCT
SPORT
FOLLOWS
Bill
FOLLOWS
Sara
// Identify a friend in common and
make a recommendation
gremlin> g.V().has('name','mary').as(‘start’).
both('knows').both('knows’).
where(neq(‘start’)).
dedup().by('name').properties('name')
Kevin
Mary
31. Highly connected data best represented in
a graph
Relational model
Foreign keys used to represent relationships
Queries can involve nesting & complex joins
Performance can degrade as datasets grow
Graph model
Relationships are first-order citizens
Write queries that navigate the graph
Results returned quickly, even on large datasets
32. Amazon Neptune
Fully managed graph database
Fast & Scalable ReliableFlexible
Store billions of relationships;
query with millisecond
latency
Six replicas of your
data across three AZs
with full backup and
restore
Build powerful
queries with
Gremlin and SPARQL
Supports Apache
TinkerPop & W3C
RDF graph models
Gremlin
SPARQL
Open Standards