In this session, we will take you through the different database services you can choose from on AWS. We will take a look at the workings of each one, from Amazon RDS for relational databases to Amazon QLDB for ledger databases.
AWS SSA Webinar 32 - Getting Started with databases on AWS: Choosing the right DB
1. Cobus Bernard
Sr Developer Advocate
Amazon Web Services
Getting Started with Databases on AWS:
Choosing the right DB
@cobusbernard
cobusbernard
cobusbernard
CobusCloud
2. © 2020, Amazon Web Services, Inc. or its Affiliates.
Agenda
History of Databases
Why so many?
Relational Data
NoSQL
Q & A
3.
1970 1980 1990 2000
Oracle DB2
SQL Server
MySQL
PostgreSQL
DynamoDB
Redis
MongoDB
Elasticsearch
Neptune
Cassandra
Access
Aurora
2010
Timestream
QLDB
Amazon
DocumentDB
4.
Modern apps create new requirements
Users: 1 million+
Data volume: TB → PB → EB
Locality: Global
Performance: Milliseconds → microseconds
Request rate: Millions
Access: Web, mobile, IoT, devices
Scale: Up-down, out-in
Economics: Pay for what you use
Developer access: No assembly required
Examples: social media, ride hailing, media streaming, dating
5.
User search history: Amazon DynamoDB
• Massive data volume
• Need quick lookups for personalized search
Session state: Amazon ElastiCache
• In-memory store for sub-millisecond fetch
Relational data: Amazon RDS
• Referential integrity
• Primary transactional database
6.
300M total users
7B exercises per month
• DynamoDB: 31B items tracking language exercises
• Aurora: primary transactional DB (user data)
• ElastiCache: instant access to common words and phrases
7.
Data categories and common use cases
• Relational: referential integrity, ACID transactions, schema-on-write. Use cases: lift and shift, ERP, CRM, finance.
• Key value: low-latency key lookups with high throughput and fast ingestion of data. Use cases: real-time bidding, shopping cart, social.
• Document: indexing and storing documents with support for query on any attribute. Use cases: content management, personalization, mobile.
• In-memory: microsecond latency, key-based queries, and specialized data structures. Use cases: leaderboards, real-time analytics, caching.
• Graph: creating and navigating data relations easily and quickly. Use cases: fraud detection, social networks, recommendation engines.
• Search: indexing and searching semistructured logs and data. Use cases: product catalog, help and FAQs, full text.
• Time series: collect, store, and process data sequenced by time. Use cases: IoT applications, event tracking.
• Ledger: complete, immutable, and verifiable history of all changes to application data. Use cases: systems of record, supply chain, healthcare, registrations, financial.
8.
AWS: Purpose-built databases
• Relational: Amazon RDS and Amazon Aurora (commercial and community engines)
• Key value: Amazon DynamoDB
• Document: Amazon DocumentDB
• In-memory: Amazon ElastiCache (Redis and Memcached)
• Graph: Amazon Neptune
• Search: Amazon Elasticsearch Service
• Time series: Amazon Timestream
• Ledger: Amazon Quantum Ledger Database
9.
Relational data
10.
Traditional SQL
• TCP-based wire protocol
• Well known, widely used
• Common drivers (e.g., JDBC)
• Frequently used with ORMs
• Scale UP individual instances
• Scale OUT with read replicas
• Sharding at the application level
• Many flavors, but a very similar language

INSERT INTO users (id, first_name, last_name)
VALUES (1, 'Cobus', 'Bernard');

SELECT col1, col2, col3
FROM table1
WHERE col4 = 1 AND col5 = 2
GROUP BY col1
HAVING count(*) > 1
ORDER BY col2;
11.
Relational model
Data model
• Data is stored in rows and tables
• Data is normalized
• Strict schema
• Relationships established via keys, enforced by the system
• Data accuracy and consistency
• Complex queries to extract and reshape data on demand
Patient
* Patient ID
First Name
Last Name
Gender
DOB
* Doctor ID
Visit
* Visit ID
* Patient ID
* Hospital ID
Date
* Treatment ID
MedicalTreatment
* Treatment ID
Procedure
How Performed
Adverse Outcome
Contraindication
Doctor
* Doctor ID
First Name
Last Name
Medical Specialty
* Hospital Affiliation
Hospital
* Hospital ID
Name
Address
Rating
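The "relationships enforced by the system" point can be demonstrated with a small sketch using SQLite (an illustration, not from the slides): a cut-down Doctor/Patient pair of tables where the database itself rejects a row that references a doctor that does not exist.

```python
import sqlite3

# In-memory database; foreign key enforcement is off by default in SQLite
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("CREATE TABLE doctor (doctor_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""CREATE TABLE patient (
    patient_id INTEGER PRIMARY KEY,
    name TEXT,
    doctor_id INTEGER REFERENCES doctor(doctor_id))""")

conn.execute("INSERT INTO doctor VALUES (1, 'Dr. Smith')")
conn.execute("INSERT INTO patient VALUES (10, 'Sue', 1)")   # valid reference

# The system enforces the relationship: an unknown doctor_id is rejected
try:
    conn.execute("INSERT INTO patient VALUES (11, 'Bob', 999)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

This is referential integrity in miniature: the schema, not the application, guarantees that every patient row points at a real doctor.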
12.
Relational model
Query model: SQL
SELECT
  d.first_name, d.last_name, count(*)
FROM
  visit AS v,
  hospital AS h,
  doctor AS d
WHERE
  v.hospital_id = h.hospital_id
  AND h.hospital_id = d.hospital
  AND v.t_date > date_trunc('week',
    CURRENT_TIMESTAMP - interval '1 week')
GROUP BY
  d.first_name, d.last_name;
13.
Amazon Relational Database Service (RDS)
Managed relational database service with a choice of six popular database engines
Easy to administer Available and durable Highly scalable Fast and secure
No need for infrastructure
provisioning, installing, and
maintaining DB software
Automatic Multi-AZ data replication;
automated backup, snapshots,
failover
Scale database compute
and storage with a few
clicks with no app
downtime
SSD storage and guaranteed
provisioned I/O; data
encryption at rest and in transit
14.
Amazon Aurora ascendant: How we designed a cloud-native
relational database
allthingsdistributed.com
16.
Amazon Aurora
MySQL and PostgreSQL-compatible relational database built for the cloud
Performance and availability of commercial-grade databases at 1/10th the cost
Performance
and scalability
Availability
and durability
Highly secure Fully managed
5x throughput of standard MySQL
and 3x of standard PostgreSQL;
scale-out up to
15 read replicas
Fault-tolerant, self-healing storage;
six copies of data
across three Availability Zones;
continuous backup to Amazon S3
Network isolation,
encryption at rest/transit
Managed by RDS:
No hardware provisioning, software
patching, setup, configuration, or
backups
17.
NoSQL vs SQL
18.
SQL vs NoSQL
• SQL: optimized for storage / NoSQL: optimized for compute
• SQL: normalized, relational / NoSQL: denormalized, hierarchical
• SQL: ad hoc queries / NoSQL: instantiated views
• SQL: scales vertically / NoSQL: scales horizontally
• SQL: good for OLAP / NoSQL: built for OLTP at scale
19.
Key-Value DBs
20.
Key-value data
• Simple key-value pairs
• Partitioned by keys
• Resilient to failure
• High-throughput, low-latency reads and writes
• Consistent performance at scale
[Diagram: a table split across multiple partitions; highly partitionable data]
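"Partitioned by keys" can be sketched in a few lines (an illustration of the general idea, not DynamoDB's actual partitioning scheme): hash the key, take it modulo the partition count, and every read and write for that key deterministically routes to the same partition.

```python
import hashlib

NUM_PARTITIONS = 4
partitions = [dict() for _ in range(NUM_PARTITIONS)]

def partition_for(key: str) -> int:
    """Deterministically map a key to a partition by hashing it."""
    digest = hashlib.sha256(key.encode()).hexdigest()
    return int(digest, 16) % NUM_PARTITIONS

def put(key, value):
    partitions[partition_for(key)][key] = value

def get(key):
    return partitions[partition_for(key)].get(key)

put("Hammer57", {"Level": 87, "Tier": "Elite"})
put("FluffyDuffy", {"Level": 5, "Tier": "Trainee"})
print(get("Hammer57"))
```

Because the hash spreads keys evenly, adding partitions spreads load horizontally, which is why this model scales out so well.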
21.
Gamers table (primary key: GamerTag + Type; remaining attributes vary by item):
• Hammer57 / Rank: Level 87, Points 4050, Tier Elite
• Hammer57 / Status: Health 90, Progress 30
• Hammer57 / Weapon: Class Taser, Damage 87%, Range 50
• FluffyDuffy / Rank: Level 5, Points 1072, Tier Trainee
• FluffyDuffy / Status: Health 37, Progress 8

// Status of Hammer57
GET {
  TableName: "Gamers",
  Key: {
    "GamerTag": "Hammer57",
    "Type": "Status" } }

// Return all Hammer57 items (key condition expressions are used with Query, not Scan)
QUERY {
  TableName: "Gamers",
  KeyConditionExpression: "GamerTag = :a",
  ExpressionAttributeValues: {
    ":a": "Hammer57" } }
Key-value data
22.
Amazon DynamoDB
Fast and flexible key value database service for any scale
Comprehensive
security
Encrypts all data by default
and fully integrates with AWS
Identity and Access
Management for robust
security
Performance at scale
Consistent, single-digit millisecond
response times at any scale; build
applications with virtually unlimited
throughput
Global database for global
users and apps
Build global applications with fast
access to local data by easily
replicating tables across multiple
AWS Regions
Serverless
No server provisioning, software
patching, or upgrades; scales up
or down automatically;
continuously backs up your data
23.
Document DBs
24.
Document databases
• Data is stored in JSON-like documents
• Documents map naturally to how humans model data
• Flexible schema and indexing
• Expressive query language built for documents (ad hoc queries and aggregations)
JSON documents are first-class objects of the database
{
id: 1,
name: "sue",
age: 26,
email: "sue@example.com",
promotions: ["new user", "5%", "dog lover"],
memberDate: 2018-2-22,
shoppingCart: [
{product:"abc", quantity:2, cost:19.99},
{product:"edf", quantity:3, cost: 2.99}
]
}
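A toy illustration of "query on any attribute" (plain Python, not a real document database API): because documents carry their own structure, you can filter on a top-level field or a nested field without a predeclared schema, and documents in the same collection need not have the same fields.

```python
# A tiny collection of JSON-like documents with no fixed schema
users = [
    {"id": 1, "name": "sue", "age": 26,
     "shoppingCart": [{"product": "abc", "quantity": 2}]},
    {"id": 2, "name": "bob", "age": 31},  # no shoppingCart at all
]

def find(collection, predicate):
    """Ad hoc query: return documents matching an arbitrary predicate."""
    return [doc for doc in collection if predicate(doc)]

# Query on a top-level attribute
adults = find(users, lambda d: d.get("age", 0) > 30)

# Query on a nested attribute that only some documents have
with_abc = find(users, lambda d: any(
    item["product"] == "abc" for item in d.get("shoppingCart", [])))

print([d["name"] for d in adults], [d["name"] for d in with_abc])
```

Real document databases do the same thing declaratively and back it with indexes, so these lookups stay fast as the collection grows.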
25.
Evolution of document databases
• JSON became the de facto data interchange format
• Friction arose when converting JSON to the relational model
• Object-relational mappers (ORMs) were created to help with this friction
• Document databases solved the problem
[Diagram: JSON flows from the client to the app, which must convert it to the relational model for the database]
27.
Use cases for document data
• Mobile
• Retail and marketing
• User profiles
• Catalog
• Content management
• Personalization
28.
Amazon DocumentDB
Fast, scalable, highly available, fully managed MongoDB-compatible database service
Fully Managed
Managed by AWS:
No hardware provisioning,
software patching, setup,
configuration, or backups
Fast
Millions of requests per second,
millisecond latency
MongoDB-compatible
Compatible with MongoDB
Community Edition 3.6. Use the
same drivers and tools
Reliable
Six replicas of your data across
three AZs with full backup and
restore
29.
In-Memory stores
30.
In-memory
• No persistence; data is held in memory
• Microsecond performance
• Simple commands for manipulating in-memory data structures
• Strings, hashes, lists, sets, and sorted sets
[Diagram: a traditional database serves queries through a query processor, buffer pool (memory), storage engine, and disk; an in-memory store serves get/put APIs directly from memory, taking latency from milliseconds to microseconds (roughly 10x faster)]
31.
In-memory ops
set a "hello"          // Set key "a" with a string value and no expiration
OK
get a                  // Get value for key "a"
"hello"
get b                  // Get value for key "b" results in a miss
(nil)
set b "Good-bye" EX 5  // Set key "b" with a string value and a 5-second expiration
OK
get b                  // Get value for key "b"
"Good-bye"
// wait >= 5 seconds
get b                  // Key has expired, nothing is returned
(nil)
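The SET/GET/EX semantics above can be sketched in a few lines of Python (an illustration only, not Redis itself): each key stores a value plus an optional expiry timestamp, and a read past the expiry returns nothing.

```python
import time

class TTLStore:
    """Minimal in-memory key-value store with per-key expiration,
    mimicking Redis SET ... EX semantics."""
    def __init__(self):
        self._data = {}  # key -> (value, expires_at or None)

    def set(self, key, value, ex=None):
        expires_at = time.monotonic() + ex if ex is not None else None
        self._data[key] = (value, expires_at)
        return "OK"

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None  # miss -> (nil)
        value, expires_at = entry
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._data[key]  # lazy expiration on read
            return None
        return value

store = TTLStore()
store.set("a", "hello")           # no expiration
store.set("b", "Good-bye", ex=5)  # expires after 5 seconds
print(store.get("a"), store.get("b"))
```

Real caches add eviction policies (LRU and friends) on top of this, but the core idea, everything in memory with cheap expiry, is what buys the microsecond reads.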
32.
Amazon ElastiCache
Redis and Memcached compatible, in-memory data store and cache
Secure and reliable
Network isolation, encryption at rest/in transit, HIPAA, PCI, FedRAMP, Multi-AZ, and automatic failover
Redis & Memcached
compatible
Fully compatible with open source
Redis and Memcached
Easily scalable
Scale writes and reads with sharding
and replicas
Extreme performance
In-memory data store and cache
for microsecond response times
33.
Searching data
34.
Search: Full text search
35.
Search
_search?q=house
{
"hits": {
"total": 85,
"max_score": 6.6137657,
"hits": [{
"_index": "movies",
"_type": "movie",
"_id": "tt0077975",
"_score": 6.6137657,
"_source": {
"directors": [ "John Landis" ],
"release_date": "2020-08-11T00:00:00Z",
"rating": 7.5,
"genres": [ "Comedy", "Romance" ],
"image_url": "http://ia.jpg",
"plot": "In a webinar in 2020, the beard gets cut!",
"title": "Animal House",
"rank": 527,
"running_time_secs": 6540,
"actors": [ "John Belushi","Karen Allen","Tom Hulce" ],
"year": 1978, "id": "tt0077975"
}
}]
}
},
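Under the hood, full-text engines like Elasticsearch build an inverted index: a map from each term to the documents that contain it, so a query term resolves to matching documents without scanning everything. A minimal sketch (illustrative only; the movie entries besides Animal House are made up for the example):

```python
from collections import defaultdict

movies = [
    {"id": "tt0077975", "title": "Animal House"},
    {"id": "doc-002", "title": "The Long Road"},   # hypothetical entry
    {"id": "doc-003", "title": "House of Cards"},  # hypothetical entry
]

# Build the inverted index: lowercase term -> set of document ids
index = defaultdict(set)
for movie in movies:
    for term in movie["title"].lower().split():
        index[term].add(movie["id"])

def search(q):
    """Return ids of documents whose title contains the query term."""
    return sorted(index.get(q.lower(), set()))

print(search("house"))
```

Production engines extend this with tokenization, stemming, and relevance scoring (the `_score` field in the response above), but the inverted index is the core data structure.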
36.
Amazon Elasticsearch Service
Fully managed, reliable, and scalable Elasticsearch service
Easy to use Scalable Highly available Secure
Deploy a production-ready
Elasticsearch
cluster in minutes
Resize your cluster
with a few clicks
or a single API call
Replicate across
AZs, with monitoring and
automated self-healing
Deploy into a VPC
and restrict access
using security groups
and IAM policies
37.
Graph data
38.
Graph
Use cases
• Social networking
• Recommendation engines
• Fraud detection
• Knowledge graphs
• Life sciences
• Network / IT operations
[Diagram: customer and product/sport vertices connected by FOLLOWS, KNOWS, and PURCHASED edges]
39.
Graph use case
// Product recommendation to a user
gremlin> g.V().has('name','cobus').as('customer').
  out('follows').in('follows').out('purchased').
  where(neq('customer')).dedup().values('name')

// Identify a friend in common and make a recommendation
gremlin> g.V().has('name','mary').as('start').
  both('knows').both('knows').
  where(neq('start')).
  dedup().by('name').properties('name')
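The "friend in common" traversal can be sketched over a plain adjacency map (illustrative Python, not Gremlin; the names mirror the speaker's example, where Mary knows Amit and Amit knows Kevin): walk two hops along `knows` edges, exclude the starting vertex, and deduplicate.

```python
# Undirected 'knows' edges as an adjacency map
knows = {
    "mary": {"amit"},
    "amit": {"mary", "kevin"},
    "kevin": {"amit"},
}

def friends_of_friends(start):
    """Two hops along 'knows' edges, excluding the start vertex."""
    one_hop = knows.get(start, set())
    two_hop = set()
    for person in one_hop:
        two_hop |= knows.get(person, set())
    return sorted(two_hop - {start})

print(friends_of_friends("mary"))  # Kevin is the suggested connection
```

A graph database runs exactly this kind of pointer-chasing natively; expressing the same two-hop traversal in SQL requires self-joins that are hard to optimize and tune.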
40.
Amazon Neptune
Fully managed graph database
Easy
Build powerful queries easily
with Gremlin and SPARQL
Fast
Query billions of relationships with
millisecond latency
Open
Supports Apache TinkerPop & W3C
RDF graph models
Reliable
Six replicas of your data across
three AZs with full backup and
restore
41.
Time series
42.
Time series data
What is time series data? A sequence of data points recorded over a time interval.
What is special about a time-series database? Time is the single primary axis of the data model.
43.
Time series use cases
• Application events
• IoT sensor readings
• DevOps data
[Chart: humidity (% water vapor) readings over time: 91.0, 94.0, 86.0, 93.0]
44.
Building with time-series data is challenging
Relational databases: inefficient time-series data processing; unnatural for time-series data; rigid schema inflexible for fast-moving time-series data.
Existing time-series databases: difficult to maintain high availability; difficult to scale; limited data lifecycle management.
45.
Amazon Timestream
Fast, scalable, fully managed time-series database
1,000x faster and 1/10th the cost
of relational databases
Collect data at the rate of
millions of inserts per second
(10M/second)
Trillions of
daily events
Adaptive query processing
engine maintains steady,
predictable performance
Time-series analytics
Built-in functions for
interpolation, smoothing, and
approximation
Serverless
Automated setup, configuration,
server provisioning, software
patching
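The "built-in functions for interpolation" bullet refers to filling gaps in a series. A minimal linear-interpolation sketch in plain Python (illustrative only, not the Timestream API; the humidity numbers are taken from the chart two slides back):

```python
def interpolate(points, t):
    """Linearly interpolate a value at time t from sorted (time, value) samples.

    t must lie within the sampled range.
    """
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if t0 <= t <= t1:
            if t1 == t0:
                return v0
            # Linear blend between the two surrounding samples
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    raise ValueError("t outside sampled range")

# Humidity readings with a gap between t=10 and t=30
humidity = [(0, 91.0), (10, 94.0), (30, 86.0), (40, 93.0)]
print(interpolate(humidity, 20))
```

Doing this per-series in SQL takes window functions and self-joins (the speaker notes mention hundreds of lines of code); a time-series database exposes it as a single built-in function.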
46.
Ledgers & journals
47.
Common customer use cases
Ledgers with centralized control
Healthcare
Verify and track hospital
equipment inventory
Manufacturers
Track distribution of a
recalled product
HR & Payroll
Track changes to an
individual's profile
Government
Track vehicle title
history
48.
Challenges with building ledgers
RDBMS with audit tables: custom audit functionality built with triggers or stored procedures is hard to build and difficult to maintain, and it is impossible to verify, since there is no way to verify changes made to data by system administrators.
Blockchain: adds unnecessary complexity; hard to use and slow.
49.
Ledger database concepts
A ledger (L) comprises a journal (J) plus the current state and history of the data (C | H). The journal determines the current state and history.
50. ยฉ 2020, Amazon Web Services, Inc. or its Affiliates.
Amazon Quantum Ledger Database (QLDB)
Fully managed ledger database
Track and verify the history of all changes made to your application's data
Immutable
Maintains a sequenced record of all changes to your data, which cannot be deleted or modified; you can query and analyze the full history
Cryptographically verifiable
Uses cryptography to generate a secure output file of your data's history
Easy to use
Lets you use familiar database capabilities like SQL APIs for querying the data
Highly scalable
Executes 2-3x as many transactions as ledgers in common blockchain frameworks
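The "immutable and cryptographically verifiable" idea can be sketched as a hash chain (a simplified illustration, not QLDB's actual journal format): each entry records the hash of the previous entry, so altering any earlier entry breaks verification of everything after it.

```python
import hashlib, json

def entry_hash(entry):
    """Stable hash of a journal entry (canonical JSON encoding)."""
    return hashlib.sha256(
        json.dumps(entry, sort_keys=True).encode()).hexdigest()

def append(journal, change):
    """Append a change, chaining it to the previous entry's hash."""
    prev = entry_hash(journal[-1]) if journal else "0" * 64
    journal.append({"change": change, "prev_hash": prev})

def verify(journal):
    """Recompute the chain; False if any earlier entry was altered."""
    prev = "0" * 64
    for entry in journal:
        if entry["prev_hash"] != prev:
            return False
        prev = entry_hash(entry)
    return True

# Hypothetical vehicle-title history (cf. the government use case above)
journal = []
append(journal, {"vehicle": "car-1", "owner": "Sue"})
append(journal, {"vehicle": "car-1", "owner": "Bob"})
print(verify(journal))   # chain is intact

journal[0]["change"]["owner"] = "Mallory"  # tamper with history
print(verify(journal))   # tampering is detected
```

This is why a ledger database can prove history to an auditor: a system administrator can overwrite a row, but cannot do so without invalidating the chain.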
51.
Purpose-built
52.
AWS: Purpose-built databases
• Relational: Amazon RDS and Amazon Aurora (commercial and community engines)
• Key value: Amazon DynamoDB
• Document: Amazon DocumentDB
• In-memory: Amazon ElastiCache (Redis and Memcached)
• Graph: Amazon Neptune
• Search: Amazon Elasticsearch Service
• Time series: Amazon Timestream
• Ledger: Amazon Quantum Ledger Database
53.
twitch.tv/aws – Mon/Fri @ 11am SAST Bean Streaming
twitch.tv/aws – Thu @ 12:00 SAST AWS Africa Office Hours
youtube.com/c/CobusCloud
youtube.com/c/Ruptwelve
bit.ly/notC_notD (Watch the recorded weekly sessions)
54. Thank you!
© 2020, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Cobus Bernard
Sr Developer Advocate
Amazon Web Services
@cobusbernard
cobusbernard
cobusbernard
CobusCloud
Editor's Notes
Let's take a look back at how database technology has evolved over the years.
In the 1970s and 1980s, we had relational databases, and then in the 1990s we added open source options like MySQL and PostgreSQL.
Then, from around the mid 2000s to the current day, we have seen significant growth in specialized databases that differ from the relational model. I don't think it is a coincidence that these new databases emerged at the same time cloud was taking off, as customers were beginning to build internet-scale apps that demanded functionality, performance, and scale for many different use cases in the same application.
If you look across the data categories, there are a number of different ways to model data. Each of those models has a distinct purpose, and each lends itself well to a particular set of use cases.
If you look at our offerings and how they align to these categories, our database strategy is fairly simple: we want to ensure you as developers have the very best purpose-built databases in each of these categories so that you never have to sacrifice scale, performance, or functionality.
Today we're going to talk about Amazon DocumentDB - a fully managed MongoDB-compatible database service designed from the ground up to be fast, scalable, and highly available.
RDS makes it easy to set up, operate, and scale a relational database in the cloud.
It provides cost-efficient and resizable capacity
Automating time-consuming administration tasks
- hardware provisioning, database setup, patching and backups.
Frees your time to work on apps.
Available in several DB instance types (optimized for memory, performance, or I/O)
Familiar engines (Aurora, PostgreSQL, MySQL, MariaDB, Oracle Database, and SQL Server)
Amazon Aurora is a MySQL and PostgreSQL-compatible relational database built for the cloud,
Aurora 5x faster than standard MySQL and 3x standard PostgreSQL
Provides security, availability, and reliability of commercial databases at 1/10th the cost.
automates time-consuming administration tasks like hardware provisioning, database setup, patching, and backups.
Amazon Aurora features a distributed, fault-tolerant, self-healing storage system that auto-scales up to 64TB per database instance. It delivers high performance and availability with up to 15 low-latency read replicas, point-in-time recovery, continuous backup to Amazon S3, and replication across three Availability Zones (AZs).
The first consideration that needs to be made when selecting a database is the characteristics of the data you are looking to leverage. If the data has a simple tabular structure, like an accounting spreadsheet, then the relational model could be adequate. Data such as geo-spatial, engineering parts, or molecular modeling, on the other hand, tends to be very complex.
The purpose of DynamoDB is to provide consistent single-digit millisecond latency for any scale of workloads.
fast and flexible NoSQL database service for all applications that need consistent, single-digit millisecond latency at any scale.
fully managed cloud database and supports both document and key-value store models.
flexible data model, reliable performance, and automatic scaling of throughput capacity, makes it a great fit for mobile, web, gaming, ad tech, IoT, and many other applications.
Backup Notes
1/ Performance at scale - Consistent, single-digit millisecond response times at any scale. Build applications with virtually unlimited throughput and storage, backed by a service level agreement for reliability.
2/Serverless - No hardware provisioning, software patching, or upgrades. Scales up or down automatically to accommodate your performance needs. Optimize costs by paying for only the resources you use. Protect your data with on-demand and continuous backups with no downtime.
3/Comprehensive Security- Encrypts all data by default and fully integrates with AWS Identity and Access Management for robust security. Get oversight of your tables by using integrated monitoring on audit logs with AWS CloudTrail, and network isolation with Amazon Virtual Private Cloud.
4/Modern applications - A database for serverless applications that includes AWS Lambda integration. Supports ACID transactions for business-critical applications. Build global applications with fast access to local data by easily replicating tables across multiple AWS Regions.
----
https://www.allthingsdistributed.com/2018/06/purpose-built-databases-in-aws.html
As I have talked about before, one of the reasons why we built Amazon DynamoDB was that Amazon was pushing the limits of what was a leading commercial database at the time, and we were unable to sustain the availability, scalability, and performance needs that our growing Amazon.com business demanded. We found that about 70 percent of our operations were key-value lookups, where only a primary key was used and a single row would be returned. With no need for referential integrity and transactions, we realized these access patterns could be better served by a different type of database.
This doesn't mean relational databases do not provide utility in present-day development and are not available, scalable, or high performance. The opposite is true. In fact, this has been proven by our customers, as Amazon Aurora remains the fastest growing service in AWS history. What we experienced at Amazon.com was using a database beyond its intended purpose. That learning is at the heart of this blog post: databases are built for a purpose, and matching the use case with the database will help you write high-performance, scalable, and more functional applications faster.
Data is stored in JSON-like documents and JSON documents are first-class objects within the database โ documents are not a data type or a value, they are the key design point of the database
Document databases have flexible schema and make the representation of hierarchical and semi-structured data easy โ they also enable powerful indexing to make the querying to such documents fast
Documents map naturally to object-oriented programming, which makes the flow of data with your app to persistent layer easier
Expressive query languages built for documents that enable ad hoc queries and aggregations across documents
Together Document databases help developers build applications faster and iterate quickly
If you look at the evolution of document databases, what was the need that document databases filled in this new world?
At some point, JSON became the de facto standard for data interchange and data modeling within applications
Using JSON in the application and then trying to map JSON to relational introduced friction and complication
Object Relational Mappers (ORMs) were created to help with this friction, but there were complications with performance and functionality
A crop of document databases popped up to really solve the problem.
Let's take user profiles for example. Let's say that Susan plays a new game called ExploidingSnails; we can easily add that information to her profile. Notice here that we didn't have to design a complicated schema or create any new tables, we simply added a new set of fields in our document.
Similarly, we can add an array for the promotions that Susan has achieved.
The document model really enables developers to evolve their applications quickly over time and build applications faster.
Content management โ if you think about a lot of the content that we read today from news articles, to blogs, to recipes, to patient records, all of that data lends itself really well to a document model
Catalogs โ being able to record output of machine learning experiments, inventory descriptions, output of pharmaceutical trials, mapping of content and meta data in S3 โ we see a broad set of customers that use documents for these use cases
MongoDB compatibility
fast, scalable, highly available, and fully managed document database
Hard to build HA, scalable to multi-TB, 100k's of reads + writes / sec
- MongoDB clusters hard at scale
DocDB designed for perf, scale, and avail
Implements Apache 2.0 open source MongoDB 3.6 API
- So use existing drivers + tools
uses a distributed, fault-tolerant, self-healing storage system that auto-scales up to 64 TB per database cluster.
----
Other notes I might include depending on time when I do a run through.
DocumentDB uses a MongoDB compatible APIs, so can be suitable for those customers that don't want to use proprietary AWS APIs of DynamoDB or that already run unmanaged MongoDB and want to easily switch to a managed DB.
There are many differences between DynamoDB and DocumentDB. A few main ones which affect which of the DBs will suit a particular use case:
- DynamoDB doesn't have any aggregation abilities (sum, average, group by, ...); MongoDB (and so DocumentDB) has such support.
- DocumentDB doesn't use sharding, so a DB can only grow up to 64 TB and a single table can grow to 32 TB. DynamoDB tables can grow to much more than that. Also, the absence of sharding in DocumentDB means that the max amount of parallel queries is limited, while with DynamoDB it is only limited by the amount of capacity units you provision to your tables.
- MongoDB (and so DocumentDB) is much more flexible regarding the abilities to store and query different types of data, specifically complex nested structures.
- Under significant load, DynamoDB should be able to provide lower and more consistent response times, which if needed can be decreased even more with DAX.
- Items in DynamoDB are limited to 400 KB; with DocumentDB the limit is 16 MB.
- DocumentDB doesn't currently provide cross-region replication abilities, while DynamoDB provides it with global tables.
What does an in-memory DB give you: query by key with microsecond latency, performance that can't be achieved with disk-based stores.
Improve perf using fast in memory data stores
Key value in memory DB (persisted only via snapshot in redis)
Redis-compatible in-memory service
availability, reliability and performance suitable for the most demanding applications.
15 node cluster available, up to 6.1 TiB of in-memory data.
Memcached - a widely adopted memory object caching system.
Fully mgd: hardware provisioning, software patching, setup, config, monitoring, failure recovery, and backups (redis)
Easily scalable: scale out, in and up to meet demands. Write and memory scaling with sharding. Replicas for read scaling.
Use cases: Leaderboards, real-time analytics, caching
Financial services, ecommerce, web, and mobile applications have use cases such as leaderboards, session stores, and real-time analytics that require microsecond response times and can have large spikes in traffic coming at any time. We built Amazon ElastiCache, offering Memcached and Redis, to serve low latency, high throughput workloads, such as McDonald's, that cannot be served with disk-based data stores.
----
Don't think I have time to look at the below, but will nail the in-memory use case instead.
Memcached:
You need the simplest model possible.
You need to run large nodes with multiple cores or threads.
You need the ability to scale out and in, adding and removing nodes as demand on your system increases and decreases.
You need to cache objects, such as a database.
Redis:
You need complex data types, such as strings, hashes, lists, sets, sorted sets, and bitmaps.
You need to sort or rank in-memory datasets.
You need persistence of your key store.
You need to replicate your data from the primary to one or more read replicas for read intensive applications.
You need automatic failover if your primary node fails.
You need publish and subscribe (pub/sub) capabilities to inform clients about events on the server.
You need backup and restore capabilities.
You need to support multiple databases.
My favorite scenario here is going to a bank/credit card website to notify them you will be out of the country. Given all the menu options, it's nearly impossible to find by visual inspection, so I used to search the help field.
You can tell when you type "travel international" how well the experience has been architected by the responses that come back.
Working backwards from our customers and these challenges, we built Amazon DocumentDB to be a fast, scalable, fully-managed, and MongoDB-compatible AWS database service.
What does Graph give you: Quickly and easily create and navigate relationships between data
A graph database's purpose is to make it easy to build and run applications that work with highly connected datasets.
Typical use cases for a graph database include social networking, recommendation engines, fraud detection, and knowledge graphs.
Relationships are first-class objects - They have attributes that can be queried
Simple diagram we have Verticies, customers, categories (Product and sport)
Edges are the connections between these nodes and they can have attributes
- This is what you are querying
e.g. on next page
Builds out:
1) Product recommendation
- Bill and Kevin follow sport and purchased product
2+3) Sara signs up to the app and follows sport
4) goes to the shop to make a purchase
5) we make the recommendation
6) this is the gremlin query for that.
7) FRIEND IN COMMON
8) Mary knows Amit
9+10) we suggest she might know Kevin
11) Gremlin for this query.
Better than complex SQL which is hard to optimize and tune
Point again about how complex this would be in a different service like relational DB, but simple in graph.
Neptune fast, reliable, fully-managed graph database service
Purpose-built, high-performance graph database engine optimized for storing billions of relationships and querying the graph with milliseconds latency.ย
Supports graph models Property Graph and W3C's RDF, and their respective query languages Apache TinkerPop Gremlin and SPARQL,
Reliable: 99.99% availability - 6 copies across 3 AZs
Backups in S3
< 30 seconds failover
Customers can choose the best model for their application needs.
Devs like Property Graphs because the model is somewhat familiar from relational databases,
and TinkerPop Gremlin provides a way to quickly traverse property graphs.
Devs like RDF as it provides flexibility for modeling complex information, and there are a lot of existing public-domain data sets available in RDF, including Wikidata.
----
Other notes not for preso at the moment.
Amazon Neptune is a fully-managed graph database service. Neptune supports both the Property Graph model and the Resource Description Framework (RDF), giving you the choice of two graph APIs: TinkerPop and RDF/SPARQL. Current Neptune users are building knowledge graphs, making in-game offer recommendations, and detecting fraud. For example, Thomson Reuters is helping their customers navigate a complex web of global tax policies and regulations by using Neptune.
Challenges in trying to build these workloads
Relational: you don't need a rigid schema, might want to change attributes on the fly or start collecting a new attribute from a sensor now, and you struggle to interpolate missing data points. This is hard to do in SQL (around 800 lines of code vs. a built-in function in a time-series DB).
Existing time-series DBs:
Scale constraints
Start to purge data
Or could turn off ingest
Very large data volumes:
Want a policy to move data from in-memory to warm to cold storage, which is cheap.
Hard to manage that lifecycle; it should be just a policy, not a full-time job.
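The lifecycle policy idea can be illustrated with a tiny sketch. The tier names and horizons below are hypothetical, not a Timestream API; the point is that tiering is something you declare as a rule, instead of running a migration job yourself.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention policy: keep data in memory for 1 hour, in warm
# storage for 30 days, then move it to cheap cold storage. A managed
# time-series service applies rules like this automatically.
POLICY = [
    ("memory", timedelta(hours=1)),
    ("warm",   timedelta(days=30)),
]

def tier_for(sample_time, now):
    """Return which storage tier a sample belongs to, based on its age."""
    age = now - sample_time
    for tier, horizon in POLICY:
        if age <= horizon:
            return tier
    return "cold"

now = datetime(2020, 6, 1, tzinfo=timezone.utc)
print(tier_for(now - timedelta(minutes=5), now))  # memory
print(tier_for(now - timedelta(days=3), now))     # warm
print(tier_for(now - timedelta(days=90), now))    # cold
```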
Amazon Timestream is a purpose-built, time-series database designed specifically for collecting, storing, and analyzing time series data.
At the very core of Amazon Timestream, time isn't just an attribute, but rather the single primary axis of the data model. This allows for simplification and specialization across the database.
1/ 1,000X faster, and at 1/10th the cost of relational databases
2/ Trillions of daily events
3/ Analytics optimized for time series data
4/ Serverless
Amazon Timestream is available for preview today.
---
Need to sprinkle in notes from below:
1/ 1,000X faster, and at 1/10th the cost of relational databases - Amazon Timestream can collect fast-moving time-series data from multiple sources at the rate of millions of inserts per second (10M/second). Amazon Timestream organizes data by time intervals, reducing the amount of data that needs to be scanned to answer a query. Amazon Timestream also executes inserts and queries in separate processing tiers, which eliminates resource contention and improves performance. [Note: This claim is based on internal benchmarks and applies to both queries and inserts. The service team is comfortable making the claim.]
2/ Trillions of daily events - With its speed and purpose-built architecture, Amazon Timestream is capable of processing trillions of events daily. This opens up the door to more IoT devices, more sensor reads, and ultimately a larger data set to make smarter decisions on time series and machine data. Amazon Timestream's adaptive query processing engine and data retention policies adjust the query performance and storage capacity to maintain steady, predictable performance at the lowest possible cost as your data grows over time.
3/ Analytics optimized for time series data - Analyzing time-series data with Amazon Timestream is easy, with built-in functions for interpolation, smoothing, and approximation that can be used to identify trends, patterns, and anomalies. For example, a smart home device manufacturer can use Amazon Timestream to collect motion or temperature data from the device sensors in a home, interpolate that data to identify the time ranges without any motion in the home, and alert consumers to take actions such as turning off the lights or turning down the heat to save energy during times when no one is in the house.
4/ Serverless - With Amazon Timestream, there are no servers to manage. As your application needs change, Amazon Timestream automatically scales up or down to adjust capacity and performance. Amazon Timestream takes care of time-consuming tasks such as server provisioning, software patching, setup, and configuration so you can focus on building your applications. In addition, you can set policies to automate the retention and tiering of how data is stored, which can significantly reduce your manual effort, storage requirements, and cost.
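As a rough illustration of what a built-in interpolation function saves you (the function and readings below are invented for the example, not a Timestream API), linearly interpolating a missing sensor value is a one-liner in a time-series engine but substantial work in plain SQL:

```python
def interpolate(points, t):
    """Linear interpolation of a missing reading at time t.
    `points` is a sorted list of (timestamp, value) pairs. Time-series
    databases ship this as a built-in; in plain SQL it takes far more work."""
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if t0 <= t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    raise ValueError("t is outside the recorded range")

readings = [(0, 20.0), (60, 22.0), (120, 21.0)]  # (seconds, degrees C)
print(interpolate(readings, 30))   # 21.0
print(interpolate(readings, 90))   # 21.5
```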
Track every application change, cryptographically verifiable.
Have a central trust authority.
E.g. use cases:
- People don't ask for a ledger DB; instead they want:
Data that is immutable
Cryptographically verifiable
Manufacturing: trust the lineage of the data, that it hasn't changed
Health care: want to verify and track hospital equipment
We built this several years ago for our own services; only recently have customers been asking for it.
Challenges in building a ledger in RDBMS + Blockchain
RDBMS
Hard to build an audit table: stored procs, triggers; and what if something changes in the audit table itself, how do we record that...
Even with an audit table, it's hard to verify that an admin didn't log in and change it.
Blockchain: some need distributed consensus, but some customers don't; they need a cryptographically verifiable way to track and verify changes, and don't want to set up a distributed blockchain.
Create ledger
Serverless
Key component is journal
When you create a transaction it is written to the journal and stored in a block; once written, it can't be changed.
If you made a mistake you need to write a new record; you can't undo, e.g. write a new keeper.
Maintains current state and history.
Think of a bank: debit, credit, debit, credit, etc. The current state is the current balance.
History, e.g. the last 30 days of account activity.
There is a table, created by default, to query this history.
A ledger comprises current state (C), history (H), and the journal (J).
J determines C and H, so you can destroy C and H and rebuild them from J.
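The journal idea can be sketched as a hash-chained, append-only log. This is an illustration of the concept only, not QLDB's actual implementation; the `Journal` class and its records are invented for the example. It shows why the journal alone is enough to rebuild current state, and why tampering with history is detectable.

```python
import hashlib
import json

# Minimal sketch of a hash-chained, append-only journal (the "J" above),
# from which the current state (C) can be rebuilt by replay.
class Journal:
    def __init__(self):
        self.blocks = []  # append-only; never modified in place

    def append(self, record):
        prev = self.blocks[-1]["hash"] if self.blocks else "0" * 64
        payload = json.dumps(record, sort_keys=True)
        digest = hashlib.sha256((prev + payload).encode()).hexdigest()
        self.blocks.append({"prev": prev, "record": record, "hash": digest})

    def verify(self):
        """Recompute the whole chain; any tampering breaks a link."""
        prev = "0" * 64
        for b in self.blocks:
            payload = json.dumps(b["record"], sort_keys=True)
            expected = hashlib.sha256((prev + payload).encode()).hexdigest()
            if b["prev"] != prev or b["hash"] != expected:
                return False
            prev = b["hash"]
        return True

    def current_state(self):
        """Replay the journal: the last write per key wins (the 'C' view)."""
        state = {}
        for b in self.blocks:
            state[b["record"]["key"]] = b["record"]["value"]
        return state

j = Journal()
j.append({"key": "balance", "value": 100})  # credit
j.append({"key": "balance", "value": 70})   # debit: a new record, no undo
print(j.current_state())                    # {'balance': 70}
print(j.verify())                           # True
j.blocks[0]["record"]["value"] = 999        # tamper with history...
print(j.verify())                           # False: the chain is broken
```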
transparent, immutable, and cryptographically verifiable ledger database
For apps that act as a "system of record".
Eliminates the need to build complex audit functionality within your relational database, or rely on the ledger capabilities within blockchain frameworks.
Built with tried + tested tech used extensively in AWS and Amazon.com
Immutable
Cryptographically Verifiable
Transparent
Fast
Highly scalable
Easy to use
---
Need to sprinkle notes from the below detail into above notes:
3/ Immutable - QLDB uses an append-only immutable journal to maintain a sequenced and permanent record of each data change, which cannot be deleted or modified.
4/ Cryptographically Verifiable - All changes are cryptographically chained and verifiable. QLDB provides APIs that allow you to verify the integrity of data to ensure that there have been no unintended modifications.
5/ Transparent - Customers have full visibility into their entire data lineage. Customers can look back and easily track and analyze each change for the data's entire history.
6/ Fast - Execute 2x more transactions. Unlike ledgers in blockchain frameworks, QLDB's transactions don't require distributed consensus to execute. This allows QLDB to easily scale up and execute twice as many transactions as ledgers in common blockchain frameworks.
7/ Highly scalable - With QLDB, there are no worries about provisioning capacity or configuring read and write limits. Simply create a ledger, define tables, and QLDB automatically scales to support the demands of your application.
8/ Easy to use - QLDB provides familiar database capabilities to get you started quickly. With QLDB you use a familiar SQL query language to interact with the database, and there are no blockchain-specific programming APIs to learn.
If you look at our offerings and how they align to these categories, our database strategy is fairly simple, we want to ensure you as developers have the very best purpose-built databases in each of these categories so that you never have to sacrifice on scale, performance, and functionality.
Today we're going to talk about Amazon DocumentDB - a fully managed, MongoDB-compatible database service designed from the ground up to be fast, scalable, and highly available.