Amazon DynamoDB is a fully managed NoSQL database service for applications that need consistent, single-digit millisecond latency at any scale. This talk explores DynamoDB capabilities and benefits in detail and discusses how to get the most out of your DynamoDB database. We go over schema design best practices with DynamoDB across multiple use cases, including gaming, AdTech, IoT, and others. We also explore designing efficient indexes, scanning, and querying, and go into detail on a number of recently released features, including JSON document support, Streams, Time-to-Live (TTL), and more.
2. Plan
Amazon DynamoDB
o Foundations
o Tables
o Indexes
o Partitioning
New Features
o TTL
o VPC Endpoints
o Auto Scaling
o DAX
Dating Website
o DAX
o GSIs
Serverless IoT
o TTL
o Streams
o DAX
Getting Started
o Developer Resources
4. NoSQL foundations
Data models: Key Value, Graph, Document, Column-family
Key-value example: 0000 {“Texas”}, 0001 {“Illinois”}, 0002 {“Oregon”}
Dynamo: Amazon’s Highly Available Key-value Store
Timeline: Fall 2007, June 2009, January 2012
5. What (some) customers store in NoSQL DBs
Market Orders
Tokenization (PHI, Credit Cards)
Chat Messages
User Profiles (Mobile)
IoT Sensor Data (& device status!)
File Metadata
Social Media Feeds
6. DataXu’s Attribution Store
Architecture diagram components: AWS Direct Connect, Amazon DynamoDB, Amazon RDS, AWS Data Pipeline, AWS IAM, Amazon SNS, Amazon CloudWatch, EMR Job, Amazon EC2, Amazon S3 Bucket
Data sources: 1st Party Data, 3rd Party Data
“Attribution” is the marketing term of art for allocating full or partial credit to individual advertisements that eventually lead to a purchase or other desired consumer interaction.
8. Amazon DynamoDB
Highly available
Consistent, single-digit millisecond latency at any scale
Fully managed
Secure
Integrates with AWS Lambda, Amazon Redshift, and more
9. Elastic is the new normal
Chart: ConsumedCapacityUnits over time for Write Capacity Units and Read Capacity Units, with peaks marked >200% and >300% increase from baseline
12. Local secondary indexes
• Alternate sort key attribute
• Index is local to a partition key
• 10 GB max per partition key, i.e. LSIs limit the # of sort keys!
Index layouts shown:
A1 (partition key) | A3 (sort key) | A2, A4, A5
A1 (partition key) | A4 (sort key) | A2, A3, A5
A1 (partition key) | A5 (sort key) | A2, A3, A4
13. Global secondary indexes
• Alternate partition (+sort) key
• Index is across all table partition keys
• Can be added or removed anytime
• RCUs/WCUs provisioned separately for GSIs
Projection options shown:
ALL: A3 (partition key) | A1 (table key) | A2, A4, A7
INCLUDE A2: A3 (partition key) | A1 (table key) | A2
KEYS_ONLY: A3 (partition key) | A1 (table key)
14. Data types
Type | DynamoDB Type
String | String
Integer, Float | Number
Timestamp | Number or String
Blob | Binary
Boolean | Bool
Null | Null
List | List
Set | Set of String, Number, or Binary
Map | Map
15. Table creation options
CreateTable
Required:
TableName (unique to account and Region)
PartitionKey, Type: AttributeName [S,N,B]
Provisioned Reads: 1+ (per second)
Provisioned Writes: 1+ (per second)
Optional:
SortKey, Type: AttributeName [S,N,B]
LSI Schema
GSI Schema (Provisioned Reads: 1+, Provisioned Writes: 1+)
Key types: String, Number, Binary ONLY
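A minimal sketch of the required and optional CreateTable fields, written as a plain Python function that only builds the request dictionary. The key names `CustomerId` and `OrderId` and the capacity values are illustrative assumptions, not taken from the slides.

```python
def build_create_table_params(table_name, partition_key, sort_key=None,
                              read_units=1, write_units=1):
    """Build a CreateTable request. Keys are (name, type) pairs, type in S, N, B."""
    keys = [(partition_key, "HASH")]
    if sort_key is not None:            # the sort key is optional
        keys.append((sort_key, "RANGE"))
    for (name, attr_type), _ in keys:
        if attr_type not in ("S", "N", "B"):
            raise ValueError(f"key {name!r} must be String, Number, or Binary")
    if read_units < 1 or write_units < 1:
        raise ValueError("provisioned reads and writes must be 1 or more")
    return {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": name, "AttributeType": attr_type}
            for (name, attr_type), _ in keys
        ],
        "KeySchema": [
            {"AttributeName": name, "KeyType": role}
            for (name, _), role in keys
        ],
        "ProvisionedThroughput": {
            "ReadCapacityUnits": read_units,
            "WriteCapacityUnits": write_units,
        },
    }


# Assumed key names and capacity, for illustration only.
params = build_create_table_params(
    "CustomerOrdersTable", ("CustomerId", "N"), ("OrderId", "N"), 5, 5)
```

If boto3 is installed and credentials are configured, a dictionary of this shape could be passed as `boto3.client("dynamodb").create_table(**params)`.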
16. Provisioned capacity
Read Capacity Unit (RCU)
1 RCU returns 4KB of data for strongly consistent reads, or double the data at the same cost for eventually consistent reads
Write Capacity Unit (WCU)
1 WCU writes 1KB of data, and each item consumes 1 WCU minimum
Capacity is per second, rounded up to the next whole number
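The capacity rules above can be turned into a small calculator. This is a sketch under the stated rules only (4KB per RCU, 1KB per WCU, eventually consistent reads at half cost, totals rounded up); the function names and the example item sizes are assumptions.

```python
import math


def read_units_per_item(item_bytes, strongly_consistent=True):
    """RCUs consumed by one read: one unit per 4 KB block, halved if eventually consistent."""
    blocks = max(1, math.ceil(item_bytes / 4096))
    return blocks if strongly_consistent else blocks / 2


def write_units_per_item(item_bytes):
    """WCUs consumed by one write: one unit per 1 KB block, minimum 1."""
    return max(1, math.ceil(item_bytes / 1024))


def provisioned_units(units_per_item, items_per_second):
    """Capacity is per second, rounded up to the next whole number."""
    return math.ceil(units_per_item * items_per_second)


# Ten strongly consistent reads per second of a 6 KB item: 2 RCUs each.
strong_rcu = provisioned_units(read_units_per_item(6144), 10)
# The same workload with eventually consistent reads costs half as much.
eventual_rcu = provisioned_units(read_units_per_item(6144, False), 10)
# Three writes per second of a 1.5 KB item: 2 WCUs each.
wcu = provisioned_units(write_units_per_item(1536), 3)
```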
17. Horizontal sharding
Host 1 Host 99 Host n
~Each new host brings compute, storage, and network bandwidth~
CustomerOrdersTable
19. Partitioning
CustomerOrdersTable keyspace: Hash.MIN = 0, Hash.MAX = FF, with partition boundaries at 00, 55, AA, FF
Before any split, Partitions A, B, and C each hold 33.33 % of the keyspace and 33.33 % of the provisioned capacity.
Split for partition size
The desired size of a partition is 10GB* and when a partition surpasses this it can split.
Diagram (partition split due to partition size): over time, Partitions A and B keep 33.33 % of keyspace and provisioned capacity each, while Partition C is replaced by Partitions D and E with 16.66 % of keyspace and 16.66 % of provisioned capacity each.
Split for provisioned capacity
The desired capacity of a partition is expressed as: 3w + 1r < 3000*
Where w = WCU & r = RCU
Diagram (partition split due to capacity increase): over time, the three partitions become six (Partitions A through F), each with 16.66 % of keyspace and 16.66 % of provisioned capacity.
*=subject to change
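The two split rules can be combined into a rough partition estimate. This is a sketch only: it takes the 10GB and 3w + 1r < 3000 figures from the slide (both marked subject to change), treats the boundary as inclusive for simplicity, and assumes capacity is divided evenly across partitions as the diagrams suggest. The function names are hypothetical.

```python
import math


def estimate_partitions(rcu, wcu, table_size_gb,
                        capacity_limit=3000, size_limit_gb=10):
    """Estimate the partition count from provisioned capacity and table size."""
    by_capacity = math.ceil((3 * wcu + rcu) / capacity_limit)
    by_size = math.ceil(table_size_gb / size_limit_gb)
    return max(1, by_capacity, by_size)


def per_partition_share(rcu, wcu, partitions):
    """Assume provisioned capacity is spread evenly over the partitions."""
    return rcu / partitions, wcu / partitions


# 5000 RCUs and 500 WCUs on an 8 GB table: capacity drives the count.
by_capacity_example = estimate_partitions(5000, 500, 8)
# 100 RCUs and 100 WCUs on a 45 GB table: size drives the count.
by_size_example = estimate_partitions(100, 100, 45)
```

The per-partition share is what matters for throttling: a table with plenty of total capacity can still throttle if one partition receives more than its share.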
20. Partitioning
3-way replication
Data is replicated to three Availability Zones by design
CustomerOrdersTable has three partitions, each with 1000 RCUs and 100 WCUs:
Partition A | hash range 00:0 to 54:∞ | Host A (Availability Zone A), Host E (Availability Zone B), Host H (Availability Zone C)
Partition B | hash range 55:0 to A9:∞ | Host B (Availability Zone A), Host F (Availability Zone B), Host I (Availability Zone C)
Partition C | hash range AA:0 to FF:∞ | Host C (Availability Zone A), Host G (Availability Zone B), Host J (Availability Zone C)
Example item: OrderId: 1, CustomerId: 1, ASIN: [B00X4WHP5E]
Hash(1) = 7B, which falls in Partition B
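As an illustration of how a hashed key lands on a partition and its three replicas, the sketch below looks up a hash value in the ranges from the slide. The actual hash function is internal to DynamoDB, so the example starts from the hash value given on the slide (7B) rather than computing one; the function and variable names are assumptions.

```python
from bisect import bisect_left

# Inclusive upper bound of each partition's hash range, as drawn on the slide:
# A = 00..54, B = 55..A9, C = AA..FF
UPPER_BOUNDS = [0x54, 0xA9, 0xFF]
PARTITIONS = ["A", "B", "C"]

# One replica per Availability Zone (A, B, C), as drawn on the slide.
REPLICA_HOSTS = {
    "A": ["Host A", "Host E", "Host H"],
    "B": ["Host B", "Host F", "Host I"],
    "C": ["Host C", "Host G", "Host J"],
}


def partition_for_hash(hash_value):
    """Return the partition whose hash range contains hash_value (0x00..0xFF)."""
    if not 0x00 <= hash_value <= 0xFF:
        raise ValueError("hash value outside the keyspace")
    return PARTITIONS[bisect_left(UPPER_BOUNDS, hash_value)]


partition = partition_for_hash(0x7B)   # Hash(1) = 7B on the slide
hosts = REPLICA_HOSTS[partition]
```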
21. DynamoDB Streams
Ordered stream of item changes
Exactly once, strictly ordered by key
Highly durable, scalable
24-hour retention
Sub-second latency
Compatible with Kinesis Client Library
Shards have a lineage and automatically close after time or when the associated DynamoDB partition splits
Diagram: (1) updates to an Amazon DynamoDB table (Partitions A, B, C) appear in (2) the shards of a DynamoDB Streams stream, which (3) KCL workers in an Amazon Kinesis Client Library application read with GetRecords
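Because changes for a given key arrive in order, a consumer can replay them to maintain a copy of the table. The sketch below applies a batch of stream records to an in-memory dictionary. It assumes the stream is configured to include new item images, and the `Status` attribute in the sample records is made up for illustration.

```python
def apply_stream_records(view, records):
    """Replay stream records, in order, onto a dict keyed by the item's key."""
    for record in records:
        change = record["dynamodb"]
        # Keys arrive as typed values such as {"N": "1"}; keep the raw value.
        key = tuple(sorted(
            (name, next(iter(value.values())))
            for name, value in change["Keys"].items()))
        if record["eventName"] == "REMOVE":
            view.pop(key, None)
        else:  # INSERT or MODIFY
            view[key] = change["NewImage"]
    return view


sample_records = [
    {"eventName": "INSERT", "dynamodb": {
        "Keys": {"OrderId": {"N": "1"}},
        "NewImage": {"OrderId": {"N": "1"}, "Status": {"S": "NEW"}}}},
    {"eventName": "MODIFY", "dynamodb": {
        "Keys": {"OrderId": {"N": "1"}},
        "NewImage": {"OrderId": {"N": "1"}, "Status": {"S": "SHIPPED"}}}},
    {"eventName": "INSERT", "dynamodb": {
        "Keys": {"OrderId": {"N": "2"}},
        "NewImage": {"OrderId": {"N": "2"}, "Status": {"S": "NEW"}}}},
    {"eventName": "REMOVE", "dynamodb": {"Keys": {"OrderId": {"N": "2"}}}},
]

view = apply_stream_records({}, sample_records)
```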
22. Time-To-Live (TTL)
Removes data that is no longer relevant
An epoch timestamp marking when an item can be deleted by a background process, without consuming any provisioned capacity
Example item in table CustomerActiveOrder: OrderId: 1, CustomerId: 1, MyTTL: 1492641900
Diagram components: TTL job, Amazon DynamoDB table, DynamoDB Streams, Amazon Kinesis, Amazon Redshift
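A short sketch of how a TTL attribute such as `MyTTL` could be computed and checked. The 90-day retention period and the helper names are assumptions for illustration; the attribute value is simply seconds since the Unix epoch.

```python
from datetime import datetime, timedelta, timezone


def ttl_epoch(created_at, days=90):
    """Epoch seconds at which an item created at created_at may be deleted."""
    return int((created_at + timedelta(days=days)).timestamp())


def is_expired(item, now, ttl_attribute="MyTTL"):
    """True if the item's TTL has passed; items without the attribute never expire."""
    return item.get(ttl_attribute, float("inf")) <= now.timestamp()


# The slide's example value, 1492641900, is 2017-04-19 22:45:00 UTC.
created = datetime(2017, 4, 19, 22, 45, tzinfo=timezone.utc)
expires_at = ttl_epoch(created, days=90)

order = {"OrderId": 1, "CustomerId": 1, "MyTTL": 1492641900}
expired_later = is_expired(order, datetime(2017, 4, 20, tzinfo=timezone.utc))
expired_earlier = is_expired(order, datetime(2017, 4, 19, tzinfo=timezone.utc))
```

Since deletion happens in the background at some point after the timestamp passes, a reader that must never see expired items may still want a check like `is_expired` on its side.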
23. Time-To-Live (TTL)
TTL items identifiable in DynamoDB Streams
Configuration protected by AWS Identity and Access Management (IAM), auditable with AWS CloudTrail
Eventual deletion, free to use
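A sketch of how a stream consumer might tell TTL deletions apart from deletions made by the application. It relies on the `userIdentity` field that DynamoDB attaches to stream records for service-initiated deletes; the sample records are made up for illustration.

```python
def is_ttl_delete(record):
    """True if a stream record is a REMOVE performed by the DynamoDB TTL process."""
    identity = record.get("userIdentity") or {}
    return (record.get("eventName") == "REMOVE"
            and identity.get("type") == "Service"
            and identity.get("principalId") == "dynamodb.amazonaws.com")


sample_records = [
    # Deleted by the application: no service identity attached.
    {"eventName": "REMOVE", "dynamodb": {"Keys": {"OrderId": {"N": "1"}}}},
    # Deleted by TTL.
    {"eventName": "REMOVE",
     "userIdentity": {"type": "Service", "principalId": "dynamodb.amazonaws.com"},
     "dynamodb": {"Keys": {"OrderId": {"N": "2"}}}},
    {"eventName": "INSERT", "dynamodb": {"Keys": {"OrderId": {"N": "3"}}}},
]

ttl_deleted_keys = [r["dynamodb"]["Keys"] for r in sample_records if is_ttl_delete(r)]
```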
26. DynamoDB in the VPC
VPC Endpoints
o DynamoDB in the VPC
o IAM resource policy restricted
DAX
o Microseconds latency in-memory cache
o Millions of requests per second
o Fully managed, highly available
o Role-based access control
o No IGW or VPC endpoint required
Diagram: web app servers in security groups inside private subnets in Availability Zone #1 and Availability Zone #2, AWS Lambda, DAX, and a VPC endpoint
27. DynamoDB Accelerator (DAX)
Private IP, Client-side Discovery
Supports AWS Java SDK on launch, with more AWS SDKs to come
Cluster based, Multi-AZ
Separate Query and Item cache
29. DynamoDB key choice
To get the most out of DynamoDB throughput, create tables where the
partition key has a large number of distinct values, and values are
requested fairly uniformly, as randomly as possible.
Amazon DynamoDB Developer Guide
30. Elements of even access
Heat map: Partitions against Time, shaded by Heat
1. Key choice: high key cardinality
2. Uniform access: access is evenly spread over the keyspace
31. Elements of even access
3. Requests arrive evenly spaced in time
36. What causes throttling?
If sustained throughput goes beyond provisioned throughput on a partition
A throttle comes from a partition
37. What causes throttling?
If sustained throughput goes beyond provisioned throughput on a partition
In Amazon CloudWatch, if consumed capacity is well under provisioned and throttling occurs, it must be “partition throttling”
38. What causes throttling?
If sustained throughput goes beyond provisioned throughput on a partition
Disable retries, write your own retry code, and log all throttled or returned keys
Top Items
• Fire TV Stick
• Echo Dot – Black
• Amazon Fire TV
• Amazon Echo – Black
• Fire HD 8
• Echo Dot – White
• Kindle Paperwhite
• Fire Tablet with Alexa
• Fire HD 8 Tablet with A…
• Fire HD 8 Tablet with A…
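The advice to write your own retry code and log throttled keys might look like the sketch below. `Throttled` is a hypothetical stand-in for whatever throughput-exceeded error the client raises once its built-in retries are disabled, and `flaky_get` simulates a hot key; neither is part of any SDK.

```python
import random
import time
from collections import Counter


class Throttled(Exception):
    """Hypothetical stand-in for a provisioned-throughput-exceeded error."""


throttled_keys = Counter()   # how often each key was throttled


def call_with_retry(operation, key, max_attempts=5, base_delay=0.05,
                    sleep=time.sleep):
    """Call operation(key), retrying with jittered exponential backoff on throttles."""
    for attempt in range(max_attempts):
        try:
            return operation(key)
        except Throttled:
            throttled_keys[key] += 1            # log the hot key
            if attempt == max_attempts - 1:
                raise
            sleep(random.uniform(0, base_delay * 2 ** attempt))


attempts = {"count": 0}


def flaky_get(key):
    """Simulated read that is throttled twice before succeeding."""
    attempts["count"] += 1
    if attempts["count"] <= 2:
        raise Throttled(key)
    return {"title": key}


item = call_with_retry(flaky_get, "Fire TV Stick", sleep=lambda seconds: None)
hottest = throttled_keys.most_common(1)
```

Sorting the counter with `most_common` gives the kind of "top items" list that points at the keys behind partition throttling.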
40. Dating Website
DESIGN PATTERNS: DynamoDB Accelerator and GSIs
Online dating website running on AWS
Users have people they like, and conversely people who like them
Hourly batch job matches users
Data stored in Likes and Matches tables
41. Schema Design Part 1
LIKES
Requirements:
1. Get all people I like
2. Get all people that like me
3. Expire likes after 90 days
Likes table: user_id_self (Partition key) | user_id_other (sort key) | MyTTL (TTL attribute) | … Attribute N
GSI_Other: user_id_other (Partition key) | user_id_self (sort key)
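The first two requirements map onto two queries: one against the table and one against the index. The sketch below only builds the Query request parameters; it assumes user ids are stored as strings, and the function names are hypothetical.

```python
def people_i_like(user_id):
    """Query the Likes table by its partition key, user_id_self."""
    return {
        "TableName": "Likes",
        "KeyConditionExpression": "user_id_self = :u",
        "ExpressionAttributeValues": {":u": {"S": user_id}},
    }


def people_who_like_me(user_id):
    """Query GSI_Other, where user_id_other is the partition key."""
    return {
        "TableName": "Likes",
        "IndexName": "GSI_Other",
        "KeyConditionExpression": "user_id_other = :u",
        "ExpressionAttributeValues": {":u": {"S": user_id}},
    }


outgoing = people_i_like("alice")
incoming = people_who_like_me("alice")
```

With boto3 installed, dictionaries of this shape could be passed to `boto3.client("dynamodb").query(**outgoing)`. The third requirement needs no query at all, since the `MyTTL` attribute handles expiry.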
42. Schema Design Part 2
MATCHES
Requirements:
1. Get my matches
Matches table: event_id (Partition key) | timestamp (sort key) | UserIdLeft (GSI left) | UserIdRight (GSI right) | Attribute N
GSI Left: UserIdLeft (Partition key) | event_id (Table key) | timestamp (Table key) | UserIdRight
GSI Right: UserIdRight (Partition key) | event_id (Table key) | timestamp (Table key) | UserIdLeft
43. Matchmaking
Requirements:
1. Get all new likes every hour
2. For each like, get the other user’s likes
3. Store matches in matches table
Diagram: matchmaking servers in an Auto Scaling group (security group, public subnet, Availability Zone) read the LIKES table, Partition 1 through Partition N
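The three matchmaking requirements amount to finding mutual likes. In this sketch a dictionary lookup stands in for the per-user query against the Likes table, and the user names are invented; it shows the matching logic only, not the batch job itself.

```python
def find_matches(new_likes, likes_by_user):
    """Return the pairs of users who like each other.

    new_likes: iterable of (liker, liked) pairs from the last hour.
    likes_by_user: dict mapping a user to the set of users they like.
    """
    matches = set()
    for liker, liked in new_likes:
        # Step 2: fetch the other user's likes and look for the liker in them.
        if liker in likes_by_user.get(liked, set()):
            matches.add(tuple(sorted((liker, liked))))
    return matches


likes_by_user = {
    "alice": {"bob"},
    "bob": {"alice"},
    "carol": {"alice"},
}
new_likes = [("alice", "bob"), ("carol", "alice")]

# Step 3 would store each of these pairs in the matches table.
matches = find_matches(new_likes, likes_by_user)
```

Note that step 2 issues one read per new like, all within the same hourly window, which is the access pattern that concentrates load in time.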
44. Matchmaking
Requirements:
1. Get all new likes every hour
2. For each like, get the other user’s likes
3. Store matches in matches table
Diagram: the same matchmaking servers reading the LIKES table, Partition 1 through Partition N, now marked THROTTLE!
45. Matchmaking
Requirements:
1. Get all new likes every hour
2. For each like, get the other user’s likes
3. Store matches in matches table
Even Access:
1. Key choice: High key cardinality
2. Uniform access: access is evenly spread over the keyspace
3. Time: requests arrive evenly spaced in time
46. Matchmaking
Requirements:
0. Write like to like table, then query by user id to warm cache, then queue for batch processing
1. Get all new likes every hour
2. For each like, get the other user’s likes
3. Store matches in matches table
Diagram: matchmaking servers in an Auto Scaling group (security group, public subnet, Availability Zone) and the LIKES table, Partition 1 through Partition N, with DAX added in its own security group
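Step 0 (write, then query, then queue) can be illustrated with a toy read-through cache. This is not the DAX client and does not model DAX expiry or consistency rules; in particular, dropping the cached entry on write is a simplifying assumption of the sketch. It only shows why querying right after the write lets the later batch job read from the cache.

```python
from collections import deque


class ReadThroughCache:
    """Toy query cache in front of a dict 'table'; counts reads that reach the table."""

    def __init__(self, table):
        self.table = table          # user_id -> set of liked user ids
        self.cache = {}
        self.table_reads = 0

    def put_like(self, user_id_self, user_id_other):
        self.table.setdefault(user_id_self, set()).add(user_id_other)
        self.cache.pop(user_id_self, None)   # assumption: drop the stale entry

    def query_likes(self, user_id):
        if user_id not in self.cache:        # cache miss: read the table
            self.table_reads += 1
            self.cache[user_id] = frozenset(self.table.get(user_id, ()))
        return self.cache[user_id]


def record_like(store, queue, user_id_self, user_id_other):
    store.put_like(user_id_self, user_id_other)   # write like to like table
    store.query_likes(user_id_self)               # query by user id to warm cache
    queue.append((user_id_self, user_id_other))   # queue for batch processing


store = ReadThroughCache({})
queue = deque()
record_like(store, queue, "alice", "bob")
reads_after_write = store.table_reads

# Later, the batch job's query is served from the cache, not the table.
likes = store.query_likes("alice")
reads_after_batch = store.table_reads
```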
47. Dating Website
DESIGN PATTERNS: DynamoDB Accelerator and GSIs
Takeaways:
Keep DAX warm by querying after writing
Use GSIs for many-to-many relationships
48. Serverless IoT
DESIGN PATTERNS: TTL, DynamoDB Streams, and DAX
Single DynamoDB table for storing sensor data
Tiered storage to archive old events to Amazon S3
Data stored in data table
49. Schema design
DATA
Requirements:
1. Get all events for a device
2. Archive old events after 90 days
Data table: DeviceId (Partition key) | EventEpoch (sort key) | MyTTL (TTL attribute) | … Attribute N
USERDEVICES
Requirements:
1. Get all devices for a user
UserDevices table: UserId (Partition key) | DeviceId (sort key) | Attribute 1 … Attribute N
References
50. Serverless IoT
Single DynamoDB table for storing sensor data
Tiered storage to archive old events to Amazon S3
Data stored in data table
Example DATA item: DeviceId: 1, EventEpoch: 1492641900, MyTTL: 1492736400
Diagram: on expiry, items pass from Amazon DynamoDB through Amazon DynamoDB Streams to AWS Lambda, which writes them to an Amazon S3 bucket; the USERDEVICES table is also shown
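A sketch of the step a Lambda function might perform when an expired event reaches it: turning the deleted item's old image into an object key and JSON body for the archive. It assumes the stream is configured to include old images, handles only string and number attributes, and uses a made-up key layout; the upload itself is left out.

```python
import json


def plain_value(attribute_value):
    """Convert a typed value such as {"N": "1"} or {"S": "x"} to a Python value."""
    (type_code, value), = attribute_value.items()
    if type_code == "N":
        return int(value) if value.lstrip("-").isdigit() else float(value)
    return value


def archive_object(record, prefix="archive"):
    """Return (object_key, json_body) for the item removed in this stream record."""
    old_image = record["dynamodb"]["OldImage"]
    item = {name: plain_value(value) for name, value in old_image.items()}
    # Hypothetical layout: one object per event, grouped by device.
    key = f"{prefix}/device={item['DeviceId']}/{item['EventEpoch']}.json"
    return key, json.dumps(item, sort_keys=True)


expired_record = {
    "eventName": "REMOVE",
    "userIdentity": {"type": "Service", "principalId": "dynamodb.amazonaws.com"},
    "dynamodb": {"OldImage": {
        "DeviceId": {"N": "1"},
        "EventEpoch": {"N": "1492641900"},
        "MyTTL": {"N": "1492736400"},
    }},
}

object_key, body = archive_object(expired_record)
```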
51. Serverless IoT
Throttling: noisy sensor produces data at a rate several times greater than others
DATA table: Partition A, Partition B, Partition C, Partition D
52. Data
Keyspace: Hash.MIN = 0, Hash.MAX = FF, with partition boundaries at 00, 3F, 7F, BF, FF
Partitions A, B, C, and D each hold 25.0 % of the keyspace and 25.0 % of the provisioned capacity
53. Data
Keyspace: Hash.MIN = 0, Hash.MAX = FF, with partition boundaries at 00, 3F, 7F, BF, FF
Partitions A, B, C, and D each hold 25.0 % of the keyspace and 25.0 % of the provisioned capacity
Even Access:
1. Key choice: High key cardinality
2. Uniform access: access is evenly spread over the keyspace
3. Time: requests arrive evenly spaced in time
54. Serverless IoT
Requirements:
0. Capable of dynamically sharding to overcome throttling
1. Single DynamoDB table for storing sensor data
2. Tiered storage to archive old events to S3
3. Data stored in data table
55. Schema Design
Naïve Sharding
A sharding scheme where the number of shards is not predefined, and will grow over time but never contract. Contrast with a fixed shard count.
SHARD
Requirements:
1. Get shard count for given device
2. Always grow the count of shards
Shard table: DeviceId (Partition key) | ShardCount (Range: 0..1,000)
DATA
Requirements:
1. Get all events for a device
2. Archive old events after 90 days
Data table: DeviceId (Partition key) | EventEpoch (sort key) | MyTTL (TTL attribute) | … Attribute N
56. Serverless IoT: Naïve sharding
Request path:
1. Read ShardCount from Shard table
2. Write to a random shard
3. If throttled, review shard count
SHARD item: DeviceId: 1, ShardCount: 10
DATA item: DeviceId_ShardId: 1_3, EventEpoch: 1492641900, MyTTL: 1492736400 (Expiry)
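The request path above can be sketched as follows. `Throttled` and the injected `put_item` callable are hypothetical stand-ins for a real client, the shard ids are assumed to run from 0 to ShardCount - 1, and doubling the count on a throttle is just one possible way to "review" it. The count only ever grows, capped at the slide's range of 1,000.

```python
import random


class Throttled(Exception):
    """Hypothetical stand-in for a provisioned-throughput-exceeded error."""


def write_event(device_id, event, shard_counts, put_item,
                max_shards=1000, rng=random):
    """Write an event to a random shard of the device; grow shards on throttle."""
    count = shard_counts.get(device_id, 1)      # 1. read ShardCount
    shard_id = rng.randrange(count)             # 2. pick a random shard
    key = f"{device_id}_{shard_id}"             #    e.g. DeviceId_ShardId = 1_3
    try:
        put_item(key, event)
    except Throttled:
        # 3. review the shard count: grow it, never shrink it
        shard_counts[device_id] = min(count * 2, max_shards)
        raise                                   # the caller decides when to retry
    return key


def shard_keys(device_id, shard_counts):
    """All partition keys to query when reading every event for a device."""
    return [f"{device_id}_{i}" for i in range(shard_counts.get(device_id, 1))]


shard_counts = {"1": 10}
written = []
key = write_event("1", {"EventEpoch": 1492641900}, shard_counts,
                  lambda k, e: written.append(k), rng=random.Random(0))
```

The trade-off is on the read side: getting all events for a device now means querying every key returned by `shard_keys` and merging the results.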
57. Serverless IoT
Pick a random shard to write data to
1. SHARD item: DeviceId: 1, ShardCount: 10
2. DATA item: DeviceId_ShardId: 1_Rand(0,10), EventEpoch: 1492641900, MyTTL: 1492736400
DATA table: Partition A, Partition B, Partition C, Partition D
58. Serverless IoT
Single DynamoDB table for storing sensor data
Tiered storage to archive old events to Amazon S3
Data stored in data table
Capable of dynamically sharding to overcome throttling
Example DATA item: DeviceId: 1, EventEpoch: 1492641900, MyTTL: 1492736400 (Expiry)
Example SHARD item: DeviceId: 1, ShardCount: 10
Diagram components: DATA, SHARD, and USERDEVICES tables, DAX, DynamoDB Streams, AWS Lambda, Amazon Kinesis Firehose, Amazon S3 bucket
59. Serverless IoT
DESIGN PATTERNS: TTL, DynamoDB Streams, and DAX
Takeaways:
Use naïve write sharding to dynamically expand shards
Use DAX for hot reads, especially from AWS Lambda
Use TTL to create tiered storage