In this session, we will take a look at Amazon DynamoDB and how you can get started building applications with it. We will look at table design and common access patterns, and compare it to a relational database.
To fully appreciate the need for NoSQL, let’s start by looking at how much data volume has grown: roughly 90% of the world’s data was generated in just the last few years.
1 TB vs. 40 PB… to put that into perspective, businesses that used to run multi-TB databases have exploded into multi-PB databases.
As data volume increased, we started innovating data processing systems that could scale to process it.
https://techjury.net/stats-about/big-data-statistics/
Note to presenters: the intent of this slide is to convey two different ideas:
1. The variety of databases needed for different data types.
2. AWS playing an active role in this rapidly changing industry. Use the animation in the order included, along with the notes below.
===============================================================
Different forms of data exist beyond the relational data that traditional relational databases can handle.
We have data that can be represented as key-value pairs, connected data that is best represented as a graph, documents, columnar data that is best suited for aggregations, and so on.
If you observe AWS, our database services have evolved according to the different data needs in the industry. Dynamo was described in a key-value store whitepaper in the fall of 2007. Later that year, we released SimpleDB, a managed NoSQL database service. By January 2012, AWS released DynamoDB, a NoSQL database service offering that provides seamless scaling and single-digit millisecond latency.
At the end of 2017, we released Neptune in preview, a managed service for graph databases. As you can see, we are constantly improving our managed database offerings to meet industry demand and lead data type trends.
Today we will focus on DynamoDB, which is our enterprise level managed NoSQL offering.
Let’s talk about scaling databases.
Relational – Data is normalized. To enable joins, you are tied to a single partition and a single system, so performance depends on the hardware specs of the primary server. To improve performance, you optimize, then move to a bigger box; you may still run out of headroom. You create read replicas; you will still run out. This is scaling UP, and it has a ceiling.
NoSQL – NoSQL databases were designed specifically to overcome these scalability issues. They scale data “out” across distributed clusters of low-cost hardware, delivering high throughput and low latency.
Therefore, using NoSQL, businesses can scale virtually without limit.
Explain partitions here
Use high-cardinality attributes. These are attributes that have distinct values for each item, such as email, employeeid, customerid, sessionid, orderid, and so on.
Use composite attributes. Try to combine more than one attribute to form a unique key if that meets your access pattern. For example, consider an orders table with customerid+productid+countrycode as the partition key and order_date as the sort key (see the sketch after these tips).
Items are stored based on the partition key. Avoid hot partition keys so your requests are distributed across partitions/shards. While adaptive capacity will enable you to continue writing to a partition without being throttled (as long as you are under the table limit), throttling can occur if a single partition receives more than 3,000 RCUs or 1,000 WCUs.
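To illustrate the composite-key tip above, here is a minimal boto3 sketch; the "Orders" table and attribute names are illustrative assumptions, not part of this deck:

import boto3

dynamodb = boto3.resource("dynamodb")
orders = dynamodb.Table("Orders")

customer_id, product_id, country_code = "C123", "P456", "US"
orders.put_item(
    Item={
        # High-cardinality composite partition key spreads writes evenly
        # across partitions.
        "pk": f"{customer_id}#{product_id}#{country_code}",
        "order_date": "2019-06-01",  # sort key
        "status": "PENDING",
    }
)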
Generic product catalog. Table relationships are normalized.
A product could be a book – say, the Harry Potter series – where there’s a 1:1 relationship. Or it could be a movie.
You can imagine the types of queries you’d have to execute: show me all the movies starring a given actor; show me the entire product catalog. This is resource intensive – you have to perform complex joins.
** With NoSQL, you have to ask: how will the application access the data?
Optimize for the costlier asset: no joins, just a select over hierarchical structures, designed with the access patterns in mind.
Via duplication of data (trading storage to optimize for compute), it is fast.
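As a sketch of “no joins, just a select”: in a hypothetical single-table "ProductCatalog" design, a product and all of its related items share one partition key, so one query replaces a multi-table join. The table name and key are assumptions for illustration:

import boto3
from boto3.dynamodb.conditions import Key

catalog = boto3.resource("dynamodb").Table("ProductCatalog")

# One query returns the product plus its related items, pre-joined by the
# data model instead of at read time.
response = catalog.query(
    KeyConditionExpression=Key("product_id").eq("B00EXAMPLE")
)
for item in response["Items"]:
    print(item)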
Businesses are starting to see scalability problems with relational databases. I once had a customer say they topped out with relational at around 3,000 requests per second and had to scale up to bigger hardware.
With NoSQL, we have a technology that can easily scale to hundreds of nodes, or even thousands, and the scalability bottleneck goes away.
Excellent for OLTP applications that scale: real-time data access, fast, low latency – the user cannot wait.
==
They store data in a denormalized, hierarchical view, which makes it faster and easier to access the data.
SQL is good for OLAP (and maybe HTAP), and NoSQL (at least DynamoDB) is best for OLTP at any scale. DynamoDB’s completely serverless nature and decoupling of compute make it a great choice for tiny workloads and massive workloads alike, with incredibly elastic scaling between the extremes.
We are seeing DynamoDB used in several types of applications and workloads. Here are some sample use cases, and we will look at a few of our customer workloads in the next few slides.
If you look at these use cases, you will see they have two things in common: they all need low latency, and they need to be able to scale seamlessly.
Take chat messages, for example: a messenger application needs to provide a real-time experience to end users and must be able to accommodate huge volumes.
IoT sensor data is very similar: accommodate huge volumes of data from the sensors and make the data available for real-time needs. Same with social media feeds.
Let’s look into the basics of DynamoDB next, and then we will see how some of our customers are taking advantage of DynamoDB’s capabilities to build these workloads.
With DynamoDB, what you are getting is:
-- A fully managed service that you can start using with just a few clicks in the AWS console
-- The ability to create a table that is highly scalable and gives you consistent performance at any scale
-- High availability and durability
-- Security: access control using fine-grained policies
-- Integration with other AWS services like Lambda and Redshift, enabling you to architect applications that automatically react to data changes
-- As with other AWS services, you only pay for what you use
Key takeaway: using DynamoDB, customers get consistent, single-digit millisecond latency at any scale.
Now let’s take a look at these capabilities in detail.
============== Notes for the speaker only – don’t say the below at this time ================
DynamoDB supports both document and key-value store models, and offers a range of features including global secondary indexes, fine-grained access control via AWS Identity and Access Management, support for event-driven programming, and more.
==
Fast, Consistent Performance
Amazon DynamoDB is designed to deliver consistent, fast performance at any scale for all applications. Average service-side latencies are typically single-digit milliseconds. As your data volumes grow and application performance demands increase, Amazon DynamoDB uses automatic partitioning and SSD technologies to meet your throughput requirements and deliver low latencies at any scale.
Fully Managed
Amazon DynamoDB is a fully managed cloud NoSQL database service – you simply create a database table, set your throughput, and let the service handle the rest. You no longer need to worry about database management tasks such as hardware or software provisioning, setup and configuration, software patching, operating a reliable, distributed database cluster, or partitioning data over multiple instances as you scale.
Flexible
Amazon DynamoDB supports both document and key-value data structures, giving you the flexibility to design the best architecture that is optimal for your application.
Highly Scalable
When creating a table, simply specify how much request capacity you require. If your throughput requirements change, simply update your table's request capacity using the AWS Management Console or the Amazon DynamoDB APIs. Amazon DynamoDB manages all the scaling behind the scenes, and you are still able to achieve your prior throughput levels while scaling is underway.
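A minimal boto3 sketch of such a capacity update via the API; the table name and values are placeholders:

import boto3

client = boto3.client("dynamodb")
client.update_table(
    TableName="Orders",
    ProvisionedThroughput={
        "ReadCapacityUnits": 200,   # new RCU target
        "WriteCapacityUnits": 100,  # new WCU target
    },
)
# The table remains fully available while DynamoDB rebalances behind the scenes.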
Event Driven Programming
Amazon DynamoDB integrates with AWS Lambda to provide triggers, which enable you to architect applications that automatically react to data changes.
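A minimal sketch of such a trigger: a Lambda handler receiving the standard DynamoDB Streams event shape. The business logic is a placeholder:

# Lambda function wired to a DynamoDB stream.
def handler(event, context):
    for record in event["Records"]:
        if record["eventName"] == "INSERT":
            new_image = record["dynamodb"]["NewImage"]
            # React to the data change here, e.g. send a notification.
            print("New item:", new_image)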
Fine-grained Access Control
Amazon DynamoDB integrates with AWS Identity and Access Management (IAM) for fine-grained access control for users within your organization. You can assign unique security credentials to each user and control each user's access to services and resources.
Those of you who are involved in spinning up and managing your own servers surely realize how resource intensive it is to manage your own infrastructure. It is easy to underestimate the cost and complexity of maintaining it: you have to think about power, cooling, OS maintenance, and patching. Now imagine managing a 1,000-node cluster – this becomes very resource intensive.
Amazon EC2 is an AWS service that provides resizable compute capacity in the cloud. Hosting your database instance on an EC2 instance takes away some of the overhead, but you still need to think about scalability and availability.
This is the value that is built into DynamoDB. With DynamoDB, you get an easy-to-use database. You don’t have to spin up any servers. You can easily design serverless scalable applications with DynamoDB. You get scalability and multi-AZ replication without designing a distributed system. You get ongoing security upgrades, software improvements, cost reduction efforts, monitoring…without any effort at all.
DDB is a fully managed service; you get all of that benefit built into it. We built DynamoDB to just work, so you can focus on your app.
In any business, as you scale up, you need a way to easily scale to meet the traffic and to get consistent, predictable latency at any scale.
You also need a way to scale down as your business needs change. DynamoDB was designed to offer consistent, predictable, single-digit millisecond latency at any scale, and you only pay for what you use. No limit on throughput. No limit on size – petabytes of data, any number of items.
The latency characteristics of DynamoDB are under 10 milliseconds and highly consistent.
Most importantly, the data is durable in DynamoDB, constantly replicated across multiple data centers and persisted to SSD storage.
Predictable Performance
This is obviously something that’s important and valuable in any industry, whether it’s powering the New York Times recommendation engine, storing and retrieving game data for the game Fruit Ninja, or powering queries and fast data retrieval for Major League Baseball Advanced Media. Predictable performance at scale is a must-have for many web apps, and DynamoDB was designed specifically to deliver on this.
DynamoDB is built for high availability (a 99.99% availability SLA) and durability.
All “writes” are persisted to SSD storage and replicated across three Availability Zones.
Reads can be configured to be “strongly” or “eventually” consistent. There is no latency tradeoff with either configuration; however, the read capacity used is different – a strongly consistent read consumes twice the capacity of an eventually consistent one. We will talk about read and write capacity units in a few slides.
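A quick boto3 sketch of the two read modes, on a hypothetical "CustomerOrders" table:

import boto3

table = boto3.resource("dynamodb").Table("CustomerOrders")

# Eventually consistent (the default): 1 RCU covers two 4 KB reads per second.
table.get_item(Key={"customer_id": "C123"})

# Strongly consistent: same latency profile, twice the read capacity used.
table.get_item(Key={"customer_id": "C123"}, ConsistentRead=True)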
Now let’s see how the three-way replication looks.
Here we are inserting an item into the “CustomerOrders” table.
The item is replicated to three different Availability Zones; the hash of the CustomerId value determines which partition it lands on.
DynamoDB can back up your data with per-second granularity and restore to any single second from the time PITR was enabled, up to the prior 35 days.
EMPHASIZE: COMPLETELY AUTOMATED
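A minimal boto3 sketch of enabling PITR and restoring to a chosen second; the table names and timestamp are placeholders:

from datetime import datetime

import boto3

client = boto3.client("dynamodb")

# Enable point-in-time recovery on the table.
client.update_continuous_backups(
    TableName="CustomerOrders",
    PointInTimeRecoverySpecification={"PointInTimeRecoveryEnabled": True},
)

# Restore to a specific second into a new table.
client.restore_table_to_point_in_time(
    SourceTableName="CustomerOrders",
    TargetTableName="CustomerOrders-restored",
    RestoreDateTime=datetime(2019, 6, 1, 12, 30, 0),
)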
The following diagram illustrates how adaptive capacity works. The example table is provisioned with 400 write-capacity units (WCUs) evenly shared across four partitions, allowing each partition to sustain up to 100 WCUs per second. Partitions 1, 2, and 3 each receive write traffic of 50 WCU/sec. Partition 4 receives 150 WCU/sec. This hot partition can accept write traffic while it still has unused burst capacity, but eventually it will throttle traffic that exceeds 100 WCU/sec.
DynamoDB adaptive capacity responds by increasing partition 4's capacity so that it can sustain the higher workload of 150 WCU/sec without being throttled.
This feature used to have a delay before kicking in, but it is now also instantaneous since the May 23rd, 2019 update.
https://aws.amazon.com/about-aws/whats-new/2019/05/amazon-dynamodb-adaptive-capacity-is-now-instant/
99.99% availability
Talk about how there are no background ops included in the burst capacity calculation.
In almost all cases, only exceeding a partition’s max IOPS (or extended overuse of burst capacity) will lead to throttles.
Node, .NET, Python, Java SDKs with Go in the works
At table creation time you can specify:
Keys: partition/sort
WCU: write capacity unit (one 1 KB write per second)
RCU: read capacity unit (one strongly consistent 4 KB read per second, or two eventually consistent reads)
For example, reading a 6 KB item with strong consistency costs 2 RCUs; writing it costs 6 WCUs.
The size of the table automatically increases as you add more items. There is a 400 KB limit on item size; this is a hard limit.
As more items are added and the size increases, the table is partitioned automatically for you. The size and provisioned capacity of the table are distributed equally across all partitions, and new partitions are added when either the capacity or the size exceeds the formula above (historically documented as partitions = max(table size / 10 GB, RCU / 3,000 + WCU / 1,000), rounded up).
Here’s a screenshot that shows how you can configure DynamoDB auto scaling from the console. The same functionality is available through the CLI and SDKs as well.
You can choose to auto scale either one or both of them. Similar to min/max instances in an Auto Scaling group, you can specify minimum and maximum read/write capacity.
Additionally, you can choose to apply the same settings to global secondary indexes.
And of course, you have complete control: you grant DynamoDB permission to scale the provisioned capacity on your behalf through an IAM role. You can choose an existing role or create a new role that allows the scaling operations.
If you are familiar with auto scaling of EC2 instances, this is very similar to that: you take advantage of auto scaling to provision capacity when your application’s needs demand it, while avoiding unnecessary over-provisioning.
While in the EC2 case you control the number of instances in a fleet (an Auto Scaling group), here for DynamoDB the two knobs you are controlling are read capacity and write capacity – completely independent of each other and based on the target utilizations you set.
As a managed and automatic feature, this takes the guesswork out of provisioning capacity. Let’s see how it looks when done from the console.
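For reference, the same settings can be expressed through the Application Auto Scaling API. A sketch, assuming a hypothetical "Orders" table and a 70% read-utilization target (write capacity is registered the same way):

import boto3

autoscaling = boto3.client("application-autoscaling")

# Register min/max bounds for the table's read capacity.
autoscaling.register_scalable_target(
    ServiceNamespace="dynamodb",
    ResourceId="table/Orders",
    ScalableDimension="dynamodb:table:ReadCapacityUnits",
    MinCapacity=5,
    MaxCapacity=500,
)

# Attach a target-tracking policy that scales toward 70% utilization.
autoscaling.put_scaling_policy(
    PolicyName="OrdersReadScaling",
    ServiceNamespace="dynamodb",
    ResourceId="table/Orders",
    ScalableDimension="dynamodb:table:ReadCapacityUnits",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 70.0,  # keep consumed/provisioned RCU near 70%
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "DynamoDBReadCapacityUtilization"
        },
    },
)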
On-demand capacity mode can scale much faster than auto scaling a provisioned table. Auto scaling rules monitor utilization of the table and scale it much like EC2 instances, which can take minutes. For spiky and unpredictable workloads, we suggest using on-demand mode for DynamoDB.
Dynamo is very cost effective.
Every month you get 25 GB of storage, enough capacity for about 200 million requests (25 write and 25 read capacity units), 2.5 million read requests from DynamoDB Streams, and global tables in up to 2 regions – for free. EVERY MONTH, not just the first 12 months after signing up.
You pay for capacity and storage independently.
Additionally, you get the benefit of auto scaling for your provisioned throughput, so you are not paying for unused provisioned capacity.
TTL is a concept that effectively helps you manage your table size without having to pay item-deletion charges (a quick sketch follows after this list).
And lastly, we have cost allocation tagging, which allows you to keep track of DynamoDB-related expenses, including costs for tables, indexes, global tables, etc., so you can keep tabs on your expenses for a given project or a given team.
I have a few additional slides about auto scaling, TTL and cost tagging that we will cover in a little bit.
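Here is the promised TTL sketch: enable TTL on a hypothetical "Sessions" table, then write items carrying an epoch-seconds expiry attribute. Names are assumptions:

import time

import boto3

client = boto3.client("dynamodb")
client.update_time_to_live(
    TableName="Sessions",
    TimeToLiveSpecification={"Enabled": True, "AttributeName": "expires_at"},
)

sessions = boto3.resource("dynamodb").Table("Sessions")
sessions.put_item(
    Item={
        "session_id": "abc123",
        # DynamoDB deletes the item after this time, at no deletion cost.
        "expires_at": int(time.time()) + 24 * 60 * 60,
    }
)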
Data is stored in tables. Think of the table as the “database.”
Within a table we have “items.”
We have our first item. An item has attributes: 5 in this case.
Now let’s see what the next item is. Here you can see there are only two attributes.
As we add more items to the table, we can see that attributes can vary between items; each item can have a different set of attributes than the other items (as with any NoSQL database).
You can also see the primary key, or the partition key. It uniquely identifies each item and also determines how data is partitioned and stored. The partition key is mandatory.
Optional sort key – with it you have a composite key. Sort keys help create 1:many relationships and are useful in range queries.
Let’s consider an orders table. The partition key could be customer_id, and the sort key could be order_id, so one customer maps to many orders.
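A boto3 sketch of that 1:many pattern: one customer (partition key), many orders (sort key). It assumes, purely for illustration, that order IDs are date-prefixed so begins_with gives a range query:

import boto3
from boto3.dynamodb.conditions import Key

orders = boto3.resource("dynamodb").Table("Orders")

# Fetch one customer's 2019 orders in a single query.
response = orders.query(
    KeyConditionExpression=Key("customer_id").eq("C123")
    & Key("order_id").begins_with("2019-")
)
print(response["Items"])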
Sometimes you query data using the primary key, but sometimes you might need to query by an attribute that is not your primary/secondary key.
Let’s say we want to find a customer’s fulfilled orders. We would have to query for all of that customer’s orders and then look for fulfilled ones in the results – not very efficient with large tables. But we have “LSIs” to help us out. We can create an LSI with the same partition key (customer_id) and a different sort key (fulfilled). Now your query can be based on the keys of the LSI. Fast and efficient.
The LSI is collocated on the same partition as the item in the table, so this gives us consistency. When an item is updated, the LSI is updated, and then the write is ack’d.
The LSI is partitioned by the same partition key as the parent table, with a different sort key.
In the index, you can choose to have just the keys, project in other attributes, or include all attributes – depending on what attributes you want returned with the query.
There is a 10 GB storage limit per partition key value when a table has an LSI. Note that LSIs are local to the partition key.
And LSIs use the RCU/WCU of the original table.
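Putting the LSI pieces together, a boto3 sketch of creating the orders table with the “fulfilled” LSI described above; the index name, projection, and capacities are placeholder assumptions:

import boto3

client = boto3.client("dynamodb")
client.create_table(
    TableName="Orders",
    AttributeDefinitions=[
        {"AttributeName": "customer_id", "AttributeType": "S"},
        {"AttributeName": "order_id", "AttributeType": "S"},
        {"AttributeName": "fulfilled", "AttributeType": "S"},
    ],
    KeySchema=[
        {"AttributeName": "customer_id", "KeyType": "HASH"},
        {"AttributeName": "order_id", "KeyType": "RANGE"},
    ],
    LocalSecondaryIndexes=[
        {
            "IndexName": "fulfilled-index",
            "KeySchema": [
                {"AttributeName": "customer_id", "KeyType": "HASH"},  # same partition key
                {"AttributeName": "fulfilled", "KeyType": "RANGE"},   # alternate sort key
            ],
            "Projection": {"ProjectionType": "KEYS_ONLY"},
        }
    ],
    # LSIs share the table's capacity, so only the table is provisioned.
    ProvisionedThroughput={"ReadCapacityUnits": 10, "WriteCapacityUnits": 10},
)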
Taking the concept of indexes a little further.
Some applications might need to perform many kinds of queries, using a variety of different attributes as query criteria – which doesn’t fit the existing partition key/sort key.
In this case you define a GSI.
Global secondary indexes – think of them as parallel, or secondary, tables.
A GSI can have a partition key that is different from the table’s. It can also have an alternate sort key.
Example: customers, orders, and date ranges. Partition the GSI by customer_id with order_date as the sort key, and query a customer’s orders for a date range.
Note: When you create a GSI, you must specify read and write capacity units for the expected workload on that index.
Similar to an LSI, you can choose to have just the keys, project in other attributes, or include all attributes – depending on what attributes you want returned with the query.
Think of this as a parallel table asynchronously populated by DynamoDB. Eventually consistent. GSI updates typically happen within a second.
Throughput for the GSI is important – it determines how soon the GSI will be updated.
1 Table update = 0, 1 or 2 GSI updates
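To make the mechanics concrete, a boto3 sketch of adding the date-range GSI above to an existing table (something an LSI cannot do); the index name, keys, and capacities are placeholders:

import boto3

client = boto3.client("dynamodb")
client.update_table(
    TableName="Orders",
    AttributeDefinitions=[
        {"AttributeName": "customer_id", "AttributeType": "S"},
        {"AttributeName": "order_date", "AttributeType": "S"},
    ],
    GlobalSecondaryIndexUpdates=[
        {
            "Create": {
                "IndexName": "customer-date-index",
                "KeySchema": [
                    {"AttributeName": "customer_id", "KeyType": "HASH"},
                    {"AttributeName": "order_date", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "ALL"},
                # Unlike an LSI, a GSI carries its own capacity.
                "ProvisionedThroughput": {
                    "ReadCapacityUnits": 10,
                    "WriteCapacityUnits": 10,
                },
            }
        }
    ],
)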
Customers often ask whether they should use an LSI or a GSI. When should you use each?
There is more flexibility with a GSI.
With a local secondary index, there is a limit on item collection sizes: for every distinct partition key value, the total size of all table and index items cannot exceed 10 GB.
You can have only 5 LSIs but 20 GSIs; moreover, with GSIs you have the flexibility to create them after the table is created, while LSIs must be created when the table is defined.
An LSI can be modeled as a GSI.
If the data size in an item collection can exceed 10 GB (for example, many orders for a customer_id), a GSI is the only choice, because LSIs limit the data size in a particular partition.
If eventual consistency is okay for your scenario, use a GSI – it works for 99% of the scenarios out there.
Amazon’s path from Relational Databases to NoSQL reflects the journey many customers are now taking.
Amazon.com, the online retail business, runs on one of the world’s largest web infrastructures. Back in 2004, Amazon.com was using relational Oracle databases and was unable to scale them; maintenance and administration were painful. To keep Amazon.com highly scalable under all the incoming traffic, an internal project investigated options, asking: “If availability, durability, and scalability are the priority, what would the database look like?” This resulted in a whitepaper that described what that database should look like. The paper paved the way for many of the NoSQL technologies out there today, and it was also the beginning of DynamoDB.
Database as a Swiss Army knife – hundreds of applications built on RDBMS; poor scalability (Q4 was a pain); poor availability; exorbitantly high costs for hardware, software, and administration.
Dynamo = replicated DHT with consistency management.
A specialist tool with limited query capability and simpler consistency.
Problem: it required significant effort to maintain.
DynamoDB was designed to deliver consistently high performance at any scale:
Predictable Performance
Massively Scalable
Fully Managed
Low Cost
Now consider Prime Day 2017 to see how far we came.
Amazon DynamoDB requests from Alexa, the Amazon.com sites, and the Amazon fulfillment centers totaled 3.34 trillion, peaking at 12.9 million per second. According to the team, the extreme scale, consistent performance, and high availability of DynamoDB met the needs of Prime Day without breaking a sweat.
By Prime Day 2019, Amazon DynamoDB was supporting multiple high-traffic sites and systems including Alexa, the Amazon.com sites, and all 442 Amazon fulfillment centers. Across the 48 hours of Prime Day, these sources made 7.11 trillion calls to the DynamoDB API, peaking at 45.4 million requests per second.
If you are familiar with tagging AWS resources for keeping track of expenses, this screen will look familiar.
Here’s a screenshot of a daily report using the cost allocation tags.
You can clearly identify which services cost how much on each day. This feature allows you to see what you are spending on your tables, indexes, etc.
For example, you can tag DynamoDB tables for different environments using env=DEV, env=TEST, env=PROD, etc. Here the name of the tag is ‘env’ and the possible values are DEV/TEST/PROD, etc.
Then you can clearly see your spend for each environment.
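A small boto3 sketch of applying that ‘env’ tag through the API; the table name is a placeholder:

import boto3

client = boto3.client("dynamodb")

# Look up the table's ARN, then attach the cost allocation tag.
table_arn = client.describe_table(TableName="Orders")["Table"]["TableArn"]
client.tag_resource(
    ResourceArn=table_arn,
    Tags=[{"Key": "env", "Value": "DEV"}],  # likewise TEST, PROD, ...
)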
You can download and run DynamoDB on your local machine for Dev/Test.
-- No API charges
-- No Data transfer charges.
Except for the endpoint, applications that run against the downloadable version of DynamoDB will work with the DynamoDB web service – only minor changes are needed to point your application at the production DynamoDB endpoint.
http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DynamoDBLocal.html
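A sketch of that endpoint switch in boto3; port 8000 is the downloadable version’s default, and the dummy credentials are only needed locally:

import boto3

# DynamoDB Local: everything is the same except the endpoint.
local = boto3.resource(
    "dynamodb",
    endpoint_url="http://localhost:8000",
    region_name="us-west-2",
    aws_access_key_id="local",
    aws_secret_access_key="local",
)
print(list(local.tables.all()))

# Production: drop endpoint_url and use real credentials.
# prod = boto3.resource("dynamodb", region_name="us-west-2")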
https://amazon-dynamodb-labs.com/