AWS re:Invent 2017 - Advanced Design Patterns for Amazon DynamoDB

© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS re:INVENT
Advanced Design Patterns for Amazon DynamoDB
Rick Houlihan – Senior Practice Manager, NoSQL
AWS Professional Services
DAT403
November 29, 2017

What to Expect from the Session
• When to use NoSQL and why
• Brief overview of Amazon DynamoDB
• Hot keys, design considerations
• NoSQL data modeling
• Normalized versus unstructured schema
• Common NoSQL design patterns
• Time series, write-sharding, MVCC, etc.
• Amazon DynamoDB in the serverless ecosystem

What is a database?
“A place to put stuff my app needs.” – Average Developer

Technology adoption and the hype curve

Why NoSQL?
SQL vs. NoSQL
Optimized for storage vs. optimized for compute
Normalized/relational vs. denormalized/hierarchical
Ad hoc queries vs. instantiated views
Scale vertically vs. scale horizontally
Good for OLAP vs. built for OLTP at scale

Amazon DynamoDB
• Fully managed NoSQL
• Document or key-value
• Scales to any workload
• Fast and consistent
• Access control
• Event-driven programming

Table
• A table holds items; each item is a set of attributes
• Partition key (mandatory): key-value access pattern; determines data distribution
• Sort key (optional): models 1:N relationships; enables rich query capabilities:
  all items for a key; =, <, >, >=, <=; "begins with"; "between"; sorted results; counts; top/bottom N values
  ("contains" and "in" are available as filter conditions, not as key conditions)
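To make the sort key capabilities concrete, here is a minimal in-memory sketch (plain Python, no AWS SDK) of the items that share one partition key, kept in sort key order. The class and method names are hypothetical; the point is that "between", "begins with", and top/bottom N all reduce to reading one contiguous range of a sorted collection.

```python
from bisect import bisect_left, bisect_right


class ItemCollection:
    """Illustrative stand-in for all items under one partition key,
    ordered by sort key (not an AWS SDK class)."""

    def __init__(self, items):
        self._items = sorted(items)                # (sort_key, value) pairs
        self._keys = [key for key, _ in self._items]

    def between(self, low, high):
        # Sort key BETWEEN low AND high, inclusive on both ends.
        return self._items[bisect_left(self._keys, low):bisect_right(self._keys, high)]

    def begins_with(self, prefix):
        # Keys that share a prefix are contiguous once sorted.
        start = bisect_left(self._keys, prefix)
        end = start
        while end < len(self._keys) and self._keys[end].startswith(prefix):
            end += 1
        return self._items[start:end]

    def top_n(self, n, descending=True):
        # Similar in spirit to a Query with a limit, newest-first or oldest-first.
        ordered = self._items[::-1] if descending else self._items
        return ordered[:n]


orders = ItemCollection([
    ("2017-11-01#A", 1),
    ("2017-11-15#B", 2),
    ("2017-12-01#C", 3),
])
```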

NoSQL data modeling

It’s all about aggregations…
Document management, process control, social network, data trees, IT monitoring

SQL vs. NoSQL design pattern

Design patterns and best practices

DynamoDB Streams and AWS Lambda

Triggers
A Lambda function attached to the table's stream can notify on changes, compute item/table-level metrics, and feed Amazon CloudSearch or Amazon Kinesis Firehose.
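A minimal sketch of such a trigger, assuming the standard DynamoDB Streams event shape (`Records`, each with `eventName` and `dynamodb.Keys`). It only tallies INSERT/MODIFY/REMOVE records and collects changed keys; publishing to CloudWatch, CloudSearch, or Firehose is left out, and the sample event is made up for illustration.

```python
from collections import Counter


def handler(event, context=None):
    """Sketch of a Lambda function subscribed to a DynamoDB stream.

    Tallies INSERT / MODIFY / REMOVE records (simple item/table-level
    metrics) and collects the changed keys for a downstream notifier.
    """
    counts = Counter()
    changed_keys = []
    for record in event.get("Records", []):
        counts[record["eventName"]] += 1
        changed_keys.append(record["dynamodb"]["Keys"])
    # A real function might publish these to CloudWatch or forward them to
    # CloudSearch / Kinesis Firehose; this sketch just returns them.
    return {"counts": dict(counts), "changed_keys": changed_keys}


sample_event = {
    "Records": [
        {"eventName": "INSERT", "dynamodb": {"Keys": {"Id": {"S": "1"}}}},
        {"eventName": "MODIFY", "dynamodb": {"Keys": {"Id": {"S": "1"}}}},
        {"eventName": "INSERT", "dynamodb": {"Keys": {"Id": {"S": "2"}}}},
    ]
}
result = handler(sample_event)
```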

Write sharding
Handling high-velocity writes

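Scaling bottlenecks
Votes table: provision 200,000 WCUs, spread over Partition 1 … Partition N at 1,000 WCUs each
Voters write every vote to Candidate A or Candidate B, so all writes land on two partition keys, and each key is capped at the 1,000 WCUs of the partition that holds it.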

Queue-based load leveling
Components: SQS, workers, dashboard
• Write incoming votes to SQS
• Scale workers as needed to process the queue
• Avoid key pressure
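A minimal sketch of the worker side, using Python's `queue.Queue` as a stand-in for SQS (an assumption; no SQS API is used). Each pass drains up to `batch_size` votes and collapses them into one increment per candidate, which is how a worker fleet relieves pressure on the hot keys.

```python
import queue
from collections import Counter


def drain_votes(q, batch_size):
    """Worker sketch: take up to batch_size votes per pass, collapse them into
    one increment per candidate, and return the writes that would be issued."""
    writes = []
    while not q.empty():
        batch = Counter()
        for _ in range(batch_size):
            try:
                batch[q.get_nowait()] += 1
            except queue.Empty:
                break
        writes.append(dict(batch))  # one update per candidate, not per vote
    return writes


votes = queue.Queue()
for candidate in ["A"] * 5 + ["B"] * 3:
    votes.put(candidate)
writes = drain_votes(votes, batch_size=4)
```

Eight votes become three item updates here; a larger batch size trades latency for fewer writes.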

Write sharding
Votes table partition keys: Candidate A_1 … Candidate A_8 and Candidate B_1 … Candidate B_8
Each voter's write goes to one shard of the candidate's key.

Write sharding
Votes table shards: Candidate A_1 … Candidate A_5 and Candidate B_1 … Candidate B_7
Voter insert: “CandidateA_” + rand(0, 10)

Shard aggregation
Votes table shards: Candidate A_1 … Candidate A_5 and Candidate B_1 … Candidate B_7 (voters keep writing to the shards)
Periodic process: 1. Sum the Candidate A shards. 2. Store the result (Candidate A, Total: 2.5M).
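A minimal sketch of random write sharding plus the periodic rollup, using a dict in place of the Votes table. `SHARDS = 10` and the `CandidateA_total` rollup key are assumptions for illustration; `randrange(SHARDS)` yields suffixes 0 through 9.

```python
import random
from collections import defaultdict

SHARDS = 10  # assumed shard count; tune to the write rate you need


def cast_vote(table, candidate, rng):
    # Random suffix spreads one candidate's writes over SHARDS partition keys.
    key = f"{candidate}_{rng.randrange(SHARDS)}"
    table[key] += 1  # stands in for an atomic counter update on that item


def aggregate(table, candidate):
    # Periodic process: 1. sum every shard, 2. store the total.
    total = sum(table.get(f"{candidate}_{i}", 0) for i in range(SHARDS))
    table[f"{candidate}_total"] = total  # hypothetical key for the rollup item
    return total


rng = random.Random(7)
votes = defaultdict(int)
for _ in range(1000):
    cast_vote(votes, "CandidateA", rng)
total = aggregate(votes, "CandidateA")
```

Readers query the rollup item instead of fanning out across shards on every request.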

Query a single sharded item efficiently
Votes table shards: Candidate A_1 … Candidate A_5 and Candidate B_1 … Candidate B_7
Voter insert: “CandidateA_” + hash(SSN)
Because the suffix is derived from the voter's SSN instead of chosen at random, the shard that holds a given voter's item can be recomputed and read directly.
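A minimal sketch of calculated (hash-based) sharding, assuming 10 shards and SHA-256 as the stable hash; the slide does not name a hash function. Python's built-in `hash()` is avoided because it is salted per process.

```python
import hashlib

SHARDS = 10  # assumed shard count


def shard_for(voter_id: str) -> int:
    # Stable hash: Python's built-in hash() is salted per process, so it
    # would not map the same SSN to the same shard across runs.
    digest = hashlib.sha256(voter_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % SHARDS


def vote_key(candidate: str, voter_id: str) -> str:
    return f"{candidate}_{shard_for(voter_id)}"


table = {}  # (partition key, sort key) -> item
for voter in ["111-11-1111", "222-22-2222", "333-33-3333"]:
    table[(vote_key("CandidateA", voter), voter)] = {"voter": voter}

# Reading one voter back needs no scatter: recompute the same key.
item = table[(vote_key("CandidateA", "222-22-2222"), "222-22-2222")]
```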

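Calculating partition counts (reads)
Items per partition key × average item size ÷ RCU size × requests per second ÷ partition max RCU:
$$\frac{100{,}000 \times 0.2\ \text{KB}}{4\ \text{KB/RCU}} \times \frac{10\ \text{requests/s}}{3000\ \text{RCU/partition}} \approx 17\ \text{partitions}$$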

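Calculating partition counts (writes)
Items per second × WCUs per item (item size rounded up to the next 1 KB) ÷ partition max WCU:
$$\frac{100{,}000\ \text{items/s} \times \lceil \text{itemSize} / 1\ \text{KB} \rceil\ \text{WCU}}{1000\ \text{WCU/partition}} = 100\ \text{partitions (items up to 1 KB)}$$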
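The two calculations above as small helper functions. This is a sketch that uses the per-partition limits shown on the slides (3,000 RCU and 1,000 WCU) and treats the average item size as representative; rounding up to a whole partition is the only addition.

```python
import math

PARTITION_MAX_RCU = 3000  # per-partition limits used on the slides
PARTITION_MAX_WCU = 1000


def read_partitions(items, avg_item_kb, requests_per_sec):
    # Total KB read per request, converted to 4 KB RCUs, times request rate.
    rcu = items * avg_item_kb / 4 * requests_per_sec
    return math.ceil(rcu / PARTITION_MAX_RCU)


def write_partitions(items_per_sec, avg_item_kb):
    # Each write is billed in 1 KB units, rounded up per item.
    wcu = items_per_sec * math.ceil(avg_item_kb)
    return math.ceil(wcu / PARTITION_MAX_WCU)
```

`read_partitions(100_000, 0.2, 10)` reproduces the ~17 from the read slide, and `write_partitions(100_000, 0.2)` the 100 from the write slide.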

• Increase throughput with concurrency
• Consider RCU/WCU per key, item size, and request rate
• Shard write-heavy partition keys
Use when your write workload is not horizontally scalable.

Time-based workflows
Processing the entire table efficiently

Finding expired items
Active Tickets_Table (current table, hot data; RCUs = 10000, WCUs = 10000)
Event_id (partition key), Timestamp, GSIKey = Rand(0-N), … Attribute N
Expired Tickets GSI
GSIKey (partition key), Timestamp (sort key)
Archive Table (cold data; RCUs = 100, WCUs = 1)
Event_id (partition key), Timestamp (sort key), Attribute1 … Attribute N
Scatter-query the GSI for expired tickets and use TTL to archive them.
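A minimal in-memory sketch of the scatter query, assuming `N = 4` GSI shards and that each ticket's GSIKey was chosen at write time (for example with `random.randrange(N)`). Each shard is a sorted list of `(timestamp, event_id)` and stands in for one "GSIKey = k AND Timestamp < now" query; the results are then gathered.

```python
from bisect import bisect_left

N = 4  # hypothetical number of GSI shards (the slide's Rand(0-N))


def build_gsi(tickets):
    """tickets: iterable of (event_id, timestamp, gsi_key). Returns
    {gsi_key: sorted [(timestamp, event_id)]}, mimicking the Expired
    Tickets GSI (partition key GSIKey, sort key Timestamp)."""
    gsi = {k: [] for k in range(N)}
    for event_id, ts, gsi_key in tickets:
        gsi[gsi_key].append((ts, event_id))
    for rows in gsi.values():
        rows.sort()
    return gsi


def expired(gsi, now):
    # Scatter: one range query per shard; gather: merge the results.
    out = []
    for k in range(N):
        rows = gsi[k]
        cut = bisect_left(rows, (now,))  # first row with timestamp >= now
        out.extend(event_id for _, event_id in rows[:cut])
    return sorted(out)


gsi = build_gsi([("e1", 100, 0), ("e2", 200, 1), ("e3", 300, 2),
                 ("e4", 50, 3), ("e5", 250, 0)])
```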

Find items efficiently with AWS Lambda
• Use a write-sharded GSI to selectively query the entire table
• Create a Lambda “stored procedure” to process items
• Migrate data between tables with TTL/Streams/Lambda
Use when there is a need to query all items on the table selectively.
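A minimal sketch of the TTL/Streams/Lambda migration step. It assumes the stream includes old images and relies on TTL deletions appearing as REMOVE records whose `userIdentity` is the DynamoDB service principal. `archive_put` is a hypothetical callable standing in for a write to the archive table.

```python
def is_ttl_delete(record):
    # TTL expirations arrive as REMOVE events made by the DynamoDB service.
    identity = record.get("userIdentity") or {}
    return (record.get("eventName") == "REMOVE"
            and identity.get("type") == "Service"
            and identity.get("principalId") == "dynamodb.amazonaws.com")


def make_handler(archive_put):
    """archive_put: hypothetical callable that writes one item image to the
    archive table (for example, a thin wrapper around PutItem)."""
    def handler(event, context=None):
        archived = 0
        for record in event.get("Records", []):
            if is_ttl_delete(record):
                archive_put(record["dynamodb"]["OldImage"])
                archived += 1
        return {"archived": archived}
    return handler


ttl_record = {
    "eventName": "REMOVE",
    "userIdentity": {"type": "Service", "principalId": "dynamodb.amazonaws.com"},
    "dynamodb": {"OldImage": {"Event_id": {"S": "e1"}}},
}
user_delete = {"eventName": "REMOVE", "dynamodb": {"OldImage": {"Event_id": {"S": "e2"}}}}
archive = []
out = make_handler(archive.append)({"Records": [ttl_record, user_delete]})
```

User-initiated deletes are skipped, so only expired items move to the cold table.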

Product catalog
Popular items (read)

Scaling bottlenecks
Product Catalog table: Partition 1 … Partition 50, 2,000 RCUs each
Shoppers all request Product A and Product B, so their reads concentrate on the few partitions that hold those keys.
SELECT Id, Description, ...
FROM ProductCatalog
WHERE Id='POPULAR_PRODUCT'

DynamoDB Accelerator (DAX)
Your applications → DynamoDB Accelerator → DynamoDB
Features
• Fully managed, highly available: handles all software management; fault tolerant, with replication across multiple AZs within a region
• DynamoDB API compatible: seamlessly caches DynamoDB API calls; no application rewrites required
• Write-through: DAX handles caching for writes
• Flexible: configure DAX for one table or many
• Scalable: scales out to any workload with up to 10 read replicas
• Manageability: fully integrated AWS service (Amazon CloudWatch, tagging for DynamoDB, AWS Console)
• Security: Amazon VPC, AWS IAM, AWS CloudTrail, AWS Organizations
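To illustrate the read-through and write-through behaviour listed above, here is a toy cache in front of a dict that stands in for the table. It only mimics the caching semantics; it is not the DAX client API, and the class name is hypothetical.

```python
class WriteThroughCache:
    """Illustrative read-through/write-through cache in front of a
    key-value store (a dict standing in for a DynamoDB table)."""

    def __init__(self, store):
        self.store = store
        self.cache = {}
        self.hits = 0
        self.misses = 0

    def get_item(self, key):
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        value = self.store.get(key)      # read-through on a miss
        if value is not None:
            self.cache[key] = value
        return value

    def put_item(self, key, value):
        self.store[key] = value          # write to the table first...
        self.cache[key] = value          # ...then keep the cache in step


table = {"POPULAR_PRODUCT": {"Description": "v1"}}
dax_like = WriteThroughCache(table)
for _ in range(3):
    dax_like.get_item("POPULAR_PRODUCT")
dax_like.put_item("POPULAR_PRODUCT", {"Description": "v2"})
```

Repeated reads of a hot key are served from the cache, which is what takes the load off the partition holding it.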

Targeting queries
Query filters, composite keys, and sparse indexes

Secondary index
Multi-value sorts and filters (partition key: Opponent, sort key: Date)
Opponent  Date        GameId  Status       Host
Alice     2014-10-02  d9bl3   DONE         David
Carol     2014-10-08  o2pnb   IN_PROGRESS  Bob
Bob       2014-09-30  72f49   PENDING      Alice
Bob       2014-10-03  b932s   PENDING      Carol
Bob       2014-10-03  ef9ca   IN_PROGRESS  David
Query: Opponent = Bob

Secondary index
Approach 1: Query filter
Read all of Bob's games, newest first, then filter on Status. Bob's IN_PROGRESS game (ef9ca) is still read from the index and then filtered out.
SELECT * FROM Game
WHERE Opponent='Bob'
ORDER BY Date DESC
FILTER ON Status='PENDING'

Approach 2: Composite key
Status + Date = StatusDate
Status       Date        StatusDate
DONE         2014-10-02  DONE_2014-10-02
IN_PROGRESS  2014-10-08  IN_PROGRESS_2014-10-08
IN_PROGRESS  2014-10-03  IN_PROGRESS_2014-10-03
PENDING      2014-09-30  PENDING_2014-09-30
PENDING      2014-10-03  PENDING_2014-10-03

Secondary index (partition key: Opponent, sort key: StatusDate)
Opponent  StatusDate              GameId  Host
Alice     DONE_2014-10-02         d9bl3   David
Carol     IN_PROGRESS_2014-10-08  o2pnb   Bob
Bob       IN_PROGRESS_2014-10-03  ef9ca   David
Bob       PENDING_2014-09-30      72f49   Alice
Bob       PENDING_2014-10-03      b932s   Carol

Query the secondary index for Bob's pending games; only the two PENDING items are read:
SELECT * FROM Game
WHERE Opponent='Bob'
AND StatusDate BEGINS_WITH 'PENDING'
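A small in-memory comparison of the two approaches on the game data above. It counts how many of Bob's items each approach reads. The composite version matches on the prefix `PENDING_` (with the separator) so that one status cannot accidentally prefix another; that is a small refinement of the slide's `BEGINS_WITH 'PENDING'`.

```python
from bisect import bisect_left

GAMES = [  # (Opponent, Date, GameId, Status, Host), as in the tables above
    ("Alice", "2014-10-02", "d9bl3", "DONE", "David"),
    ("Carol", "2014-10-08", "o2pnb", "IN_PROGRESS", "Bob"),
    ("Bob", "2014-09-30", "72f49", "PENDING", "Alice"),
    ("Bob", "2014-10-03", "b932s", "PENDING", "Carol"),
    ("Bob", "2014-10-03", "ef9ca", "IN_PROGRESS", "David"),
]


def query_with_filter(opponent, status):
    """Approach 1: read every item in the partition, then filter.
    Returns (items_read, matching GameIds)."""
    read = [g for g in GAMES if g[0] == opponent]
    return len(read), [g[2] for g in read if g[3] == status]


def query_composite(opponent, status):
    """Approach 2: sort key StatusDate = Status + '_' + Date, so a
    begins-with key condition reads only the matching items."""
    keys = sorted((f"{g[3]}_{g[1]}", g[2]) for g in GAMES if g[0] == opponent)
    prefix = f"{status}_"
    start = bisect_left(keys, (prefix,))
    hits = []
    for sort_key, game_id in keys[start:]:
        if not sort_key.startswith(prefix):
            break
        hits.append(game_id)
    return len(hits), hits
```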

Sparse indexes
Game-scores-table (partition key: Id)
Id  User   Game  Score  Date        Award
1   Bob    G1    1300   2012-12-23
2   Bob    G1    1450   2012-12-23
3   Jay    G1    1600   2012-12-24
4   Mary   G1    2000   2012-10-24  Champ
5   Ryan   G2    123    2012-03-10
6   Jones  G2    345    2012-03-20
Award-GSI (partition key: Award)
Award  Id  User  Score
Champ  4   Mary  2000
Scan sparse GSIs: only items that have an Award attribute appear in the GSI, so a scan of the index reads just those items.
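A minimal sketch of why the Award-GSI is sparse. An item is projected into a secondary index only if it carries the index key attribute, so "build the GSI" reduces to filtering on attribute presence. The helper name is hypothetical.

```python
SCORES = [
    {"Id": 1, "User": "Bob", "Game": "G1", "Score": 1300, "Date": "2012-12-23"},
    {"Id": 2, "User": "Bob", "Game": "G1", "Score": 1450, "Date": "2012-12-23"},
    {"Id": 3, "User": "Jay", "Game": "G1", "Score": 1600, "Date": "2012-12-24"},
    {"Id": 4, "User": "Mary", "Game": "G1", "Score": 2000, "Date": "2012-10-24",
     "Award": "Champ"},
    {"Id": 5, "User": "Ryan", "Game": "G2", "Score": 123, "Date": "2012-03-10"},
    {"Id": 6, "User": "Jones", "Game": "G2", "Score": 345, "Date": "2012-03-20"},
]


def build_sparse_gsi(items, key_attr, projected):
    # Items without the GSI key attribute are simply absent from the index.
    return [{key_attr: item[key_attr], **{a: item[a] for a in projected}}
            for item in items if key_attr in item]


award_gsi = build_sparse_gsi(SCORES, "Award", ["Id", "User", "Score"])
```

A scan of `award_gsi` touches one item instead of six.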

• Concatenate attributes to form useful secondary index keys (for example, Status + Date)
• Take advantage of sparse indexes
• Replace filters with indexes
Use when you want to optimize a query as much as possible.

Vertical Partitioning
Large items
Filters vs. indexes
M:N modeling—inbox and outbox

Workflow management app: Reports table
David's Pending view:
SELECT *
FROM Reports
WHERE Owner = 'David'
AND State = 'Pending'
ORDER BY Date DESC
LIMIT 50
David's Processed view:
SELECT *
FROM Reports
WHERE Owner = 'David'
AND State = 'Processed'
ORDER BY Date DESC
LIMIT 50

Reports table: large and small attributes mixed (many more report items)
Owner  StateDate             Document
David  Pending#2014-10-02    …
… many more Reports for David …
David  Processed#2014-10-03  …
Alice  Pending#2014-09-28    …
Alice  Pending#2014-10-01    …
David's inbox: 50 items × 256 KB each (large attachments)
SELECT *
FROM Reports
WHERE Owner = 'David'
AND State = 'Pending'
ORDER BY Date DESC
LIMIT 50

Computing read query cost
Items evaluated by query × average item size × conversion ratio × eventually consistent reads:
$$50 \times 256\ \text{KB} \times \frac{1\ \text{RCU}}{4\ \text{KB}} \times \frac{1}{2} = 1600\ \text{RCU}$$
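The same arithmetic as a pair of helpers, under the stated rounding assumptions. A Query sums the sizes of the items it evaluates and rounds up to 4 KB once. BatchGetItem rounds each item up to 4 KB separately. Eventually consistent reads cost half.

```python
import math

RCU_KB = 4  # one RCU covers a strongly consistent read of up to 4 KB


def query_rcu(item_sizes_kb, eventually_consistent=True):
    # Query: total size of all evaluated items, rounded up to 4 KB once.
    rcu = math.ceil(sum(item_sizes_kb) / RCU_KB)
    return rcu / 2 if eventually_consistent else rcu


def batch_get_rcu(item_sizes_kb, eventually_consistent=True):
    # BatchGetItem: each item is rounded up to 4 KB on its own.
    rcu = sum(math.ceil(size / RCU_KB) for size in item_sizes_kb)
    return rcu / 2 if eventually_consistent else rcu
```

Fifty 256 KB reports cost 1,600 RCU either way. Fifty 128-byte summaries cost 1 RCU through a Query, which matches the "Query Pending-GSI: 1 RCU" figure in the vertical-partitioning layout.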

Separate the bulk data
Pending-GSI
Owner  StateDate             Summary  ReportID
David  Pending#2014-10-02    …        afed
David  Processed#2014-10-03  …        3kf8
Alice  Processed#2014-09-28  …        9d2b
Alice  Processed#2014-10-01  …        ct7r
Reports table
ReportID  Body
9d2b      …
3kf8      …
ct7r      …
afed      …
Reading David's reports:
1. Query Pending-GSI: 1 RCU (50 sequential items at 128 bytes)
2. BatchGetItem the report bodies: 1600 RCU (50 separate items at 256 KB)
Uniformly distributes large item reads

Messaging app
Reports table → Pending GSI (David's Pending view) and Processed GSI (David's Processed view)

• Reduce one-to-many item sizes
• Configure secondary index projections
• Use GSIs to model M:N relationships
• Distribute large items
Use when querying many large items at once.

Advanced data modeling

Multi-version concurrency
Transactions in NoSQL

How OLTP apps use data
• Mostly hierarchical structures
• Entity-driven workflows
• Data spread across tables
• Requires complex queries
• Primary driver for ACID

Emulating ACID transactions
Item versions (many more item partitions):
ItemID (PK)  Version (SK)  CurVer  Attrs
1            v0            2       …
1            v1                    …
1            v2                    …
1            v3                    …
Overwrite the v0 item to commit changes.
Transaction:
COPY Item1.v0 -> Item1.v3 IF Item1.v3 == NULL
UPDATE Item1.v3 SET Attr1 += 1
UPDATE Item1.v3 SET Attr2 = …
UPDATE Item1.v3 SET Attr3 = …
COPY Item1.v3 -> Item1.v0 SET CurVer = 3
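A toy model of the version-item pattern. Conditional writes are emulated with explicit checks, and the class is hypothetical rather than an SDK type. `begin` copies v0 to the next version only if that version does not exist yet. `commit` overwrites v0 only if no one else has committed since the transaction began, which is the optimistic check that keeps concurrent writers from clobbering each other.

```python
class VersionedItem:
    """Toy item partition: sort key v0 holds the committed copy plus CurVer;
    v1, v2, ... hold working versions."""

    def __init__(self, attrs):
        self.rows = {"v0": dict(attrs, CurVer=0)}

    def begin(self):
        # COPY v0 -> v(N+1) IF v(N+1) does not exist yet.
        new_ver = self.rows["v0"]["CurVer"] + 1
        key = f"v{new_ver}"
        if key in self.rows:
            raise RuntimeError(f"{key} already claimed by another writer")
        self.rows[key] = {k: v for k, v in self.rows["v0"].items() if k != "CurVer"}
        return new_ver

    def update(self, ver, **changes):
        self.rows[f"v{ver}"].update(changes)

    def commit(self, ver):
        # COPY v(N) -> v0 SET CurVer = N, only if nobody committed first.
        if self.rows["v0"]["CurVer"] != ver - 1:
            raise RuntimeError("v0 changed since this transaction began")
        self.rows["v0"] = dict(self.rows[f"v{ver}"], CurVer=ver)


item = VersionedItem({"Attr1": 10, "Attr2": "a"})
ver = item.begin()
item.update(ver, Attr1=item.rows[f"v{ver}"]["Attr1"] + 1, Attr2="b")
item.commit(ver)
```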

Versioning item sets
Get all items for a product assembly (order item picker):
SELECT *
FROM Assemblies
WHERE ID=1
Assemblies table (many more assemblies)
ID  PartID  Vendor  Bin
1   1       ACME    27Z19
1   2       ACME    39M97
1   3       ACME    75B25

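Updates are problematic
Assemblies table: old items (PartID 1-3) and new items (PartID 4-6) under the same ID
ID  PartID  Vendor  Bin
1   1       ACME    27Z19
1   2       ACME    39M97
1   3       ACME    75B25
1   4       UAC     53G56
1   5       UAC     64B17
1   6       UAC     48J19
The order item picker's query (SELECT * FROM Assemblies WHERE ID=1) now returns old and new items together.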

Creating partition “locks”
Assemblies table
ID  PartID  Vendor  Bin
1   1       ACME    27Z19
1   2       ACME    39M97
1   3       ACME    75B25
1   4       UAC     53G56
1   5       UAC     64B17
1   6       UAC     48J19
Metadata item:
ID  PartID  ItemList                Lock
1   0       {i:[[1,2,3],[4,5,6]]}   0
• Use a metadata item for versioning
  SET #ItemList.i = list_append(:newList, #ItemList.i), #Lock = :locked
• Obtain locks with conditional writes
  --condition-expression "#Lock = :unlocked"
• Remove old items or add version info to the sort key
  Query composite keys for specific versions
• Readers can determine how to handle “fat” reads
• Add additional metadata to manage transactional workflow as needed
  Current state, breadcrumbs, timeout, etc.
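A toy version of the metadata item, with the conditional write emulated by an explicit check; the class and method names are hypothetical. A writer acquires the lock only if `Lock` is 0, writes the new part items, then records the new part list and releases the lock. Readers see the newest list plus a flag telling them an update is in flight. This sketch appends new versions at the end of `ItemList`, as in the metadata table. Note that the slide's `list_append(:newList, #ItemList.i)` would place the new list first, so pick one convention and apply it consistently.

```python
class AssemblyMetadata:
    """Toy metadata item (ID=1, PartID=0) with ItemList and Lock attributes."""

    def __init__(self, parts):
        self.item = {"ItemList": [list(parts)], "Lock": 0}

    def acquire(self):
        # Emulates an update conditioned on "#Lock = :unlocked".
        if self.item["Lock"] != 0:
            return False
        self.item["Lock"] = 1
        return True

    def publish(self, new_parts):
        # After the new part items are written: record them and release the lock.
        if self.item["Lock"] != 1:
            raise RuntimeError("publish called without holding the lock")
        self.item["ItemList"].append(list(new_parts))
        self.item["Lock"] = 0

    def current_parts(self):
        # Newest version, plus whether an update is in progress.
        return self.item["ItemList"][-1], self.item["Lock"] == 1


meta = AssemblyMetadata([1, 2, 3])
```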

• Use item partitions to manage transactional workflows
• Manage versioning across items with metadata
• Tag attributes to maintain multiple versions
• Code your app to recognize when updates are in progress
• Implement app layer error handling and recovery logic
Use when transactional writes across items are required.

Geo-hashing
Table: FleetID (PK), Location (SK), GSIKey = Rand(0-N), GUID
Location values: 0201200 …, 0213211 …, 0233321 …, 0320011 …
• Break down map area into a grid
• Continue to subdivide grid sections to narrow down search areas
• Use range queries to find nearby items
• Configure write-sharded GSI to query across partitions
GSI: GSIKey 0-N (PK), Location (SK), FleetID, … GUID
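A minimal sketch of the grid idea. It assumes a quadtree-style encoding, because the slide does not specify one. Each level halves the latitude/longitude box and emits a digit 0-3, so every cell's key is a prefix of all keys inside it, and "nearby" becomes a begins-with range query on the sorted Location keys.

```python
from bisect import bisect_left


def grid_key(lat, lon, depth):
    """Quadtree-style cell id: each level emits 2*north + east (0-3)."""
    lat_lo, lat_hi, lon_lo, lon_hi = -90.0, 90.0, -180.0, 180.0
    digits = []
    for _ in range(depth):
        lat_mid = (lat_lo + lat_hi) / 2
        lon_mid = (lon_lo + lon_hi) / 2
        north = lat >= lat_mid
        east = lon >= lon_mid
        digits.append(str(2 * north + east))
        lat_lo, lat_hi = (lat_mid, lat_hi) if north else (lat_lo, lat_mid)
        lon_lo, lon_hi = (lon_mid, lon_hi) if east else (lon_lo, lon_mid)
    return "".join(digits)


def nearby(sorted_keys, cell_prefix):
    # Range query: every location key that begins with the cell id.
    start = bisect_left(sorted_keys, cell_prefix)
    out = []
    for key in sorted_keys[start:]:
        if not key.startswith(cell_prefix):
            break
        out.append(key)
    return out


locations = sorted(grid_key(lat, lon, 6) for lat, lon in [(10, 10), (11, 11), (-10, -10)])
```

Shortening the prefix widens the search area, which is the "continue to subdivide" step in reverse.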

Hierarchical data
Composite key modeling

Hierarchical data structures as items
• Use composite sort key to define a hierarchy
• Highly selective queries with sort conditions
• Reduce query complexity
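A minimal sketch of a hierarchy encoded in a composite sort key. The `country#state#city#office` layout and the sample values are hypothetical. Each level of the hierarchy becomes a prefix, so one begins-with condition selects a whole subtree.

```python
from bisect import bisect_left

# Hypothetical sort keys: country#state#city#office
LOCATIONS = sorted([
    "USA#CA#San Francisco#Office1",
    "USA#WA#Seattle#Office1",
    "USA#WA#Seattle#Office2",
    "USA#WA#Spokane#Office1",
    "CAN#BC#Vancouver#Office1",
])


def begins_with(keys, prefix):
    # Sorted keys sharing a prefix form one contiguous range.
    start = bisect_left(keys, prefix)
    out = []
    for key in keys[start:]:
        if not key.startswith(prefix):
            break
        out.append(key)
    return out
```

Ending each prefix with the `#` separator keeps "USA#WA#Seattle" from also matching a hypothetical "USA#WA#SeattleNorth".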

… or as documents (JSON)
• JSON data types (M, L, BOOL, NULL)
• Document SDKs available
• Nested attributes can be indexed only by using DynamoDB Streams or AWS Lambda
• 400 KB maximum item size (limits hierarchical data structures)
Items (primary key, then attributes):
BookID: type Book, title Ringworld, author Larry Niven, genre Science Fiction, publisher Ballantine, datePublished Oct-70, ISBN 0-345-02046-4
AlbumID: type Album, title Dark Side of the Moon, artist Pink Floyd, genre Progressive Rock, Attributes:
{ label: "Harvest", studio: "Abbey Road", published: "3/1/73", producer: "Pink Floyd", tracks: [{title: "Speak to Me", length: "1:30", music: "Mason", vocals: "Instrumental"}, {title: "Breathe", length: "2:43", music: "Waters, Gilmour, Wright", vocals: "Gilmour"}, {title: "On the Run", length: "3:30", music: "Gilmour, Waters", vocals: "Instrumental"}]}
MovieID: type Movie, title Idiocracy, genre Scifi Comedy, writer Mike Judge, Attributes:
{ producer: "20th Century Fox", actors: [{ name: "Luke Wilson", dob: "9/21/71", character: "Joe Bowers", image: "img2.jpg"}, { name: "Maya Rudolph", dob: "7/27/72", character: "Rita", image: "img1.jpg"}, { name: "Dax Shepard", dob: "1/2/75", character: "Frito Pendejo", image: "img3.jpg"}]}
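To show how a nested document maps onto the JSON data types above, here is a minimal converter from Python values to DynamoDB's typed attribute-value format (S, N, BOOL, NULL, M, L). It is a sketch that omits sets and binary; SDK document clients do this conversion for you.

```python
from decimal import Decimal


def to_attribute_value(value):
    """Map a Python value to DynamoDB's typed JSON (S, N, BOOL, NULL, M, L)."""
    if value is None:
        return {"NULL": True}
    if isinstance(value, bool):          # check bool before int: bool subclasses int
        return {"BOOL": value}
    if isinstance(value, (int, float, Decimal)):
        return {"N": str(value)}         # numbers travel as strings
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, dict):
        return {"M": {k: to_attribute_value(v) for k, v in value.items()}}
    if isinstance(value, (list, tuple)):
        return {"L": [to_attribute_value(v) for v in value]}
    raise TypeError(f"unsupported type: {type(value).__name__}")


album_attrs = {"label": "Harvest", "tracks": [{"title": "Speak to Me", "length": "1:30"}]}
encoded = to_attribute_value(album_attrs)
```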

… or as nested sets
• Store data trees on a table as an array of leaf items
• Non-leaf items define start/end index values
• Query for leaf items by category using index ranges
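A minimal sketch of the nested-set layout described above, with a hypothetical catalog tree. Leaves are numbered in depth-first order, and each non-leaf node stores the start/end index of the leaves beneath it. Fetching a category's leaves is then a single BETWEEN-style range read.

```python
def index_tree(tree, root):
    """tree: dict node -> list of children. Returns (leaves, ranges): leaves in
    depth-first order (list position = index value) and, for each non-leaf
    node, its inclusive (start, end) leaf index range."""
    leaves = []
    ranges = {}

    def walk(node):
        children = tree.get(node, [])
        if not children:
            leaves.append(node)
            return
        start = len(leaves)
        for child in children:
            walk(child)
        ranges[node] = (start, len(leaves) - 1)

    walk(root)
    return leaves, ranges


def leaves_in(category, leaves, ranges):
    start, end = ranges[category]
    return leaves[start:end + 1]  # index BETWEEN start AND end


catalog = {
    "Catalog": ["Books", "Music"],
    "Books": ["SciFi", "Mystery"],
    "SciFi": ["book-1", "book-2"],
    "Mystery": ["book-3"],
    "Music": ["album-1"],
}
leaves, ranges = index_tree(catalog, "Catalog")
```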

Adjacency lists and materialized graphs
• Partition table on node ID, add edges to define adjacency list
• Define a default edge for every node type to describe the node itself
• Use partitioned GSIs to query large nodes (dates, places, etc.)
• Use DynamoDB Streams/Lambda/EMR for graph query projections
  • Neighbor entity state
  • Subtree aggregations
  • Breadth first search
  • Node ranking
Table (primary key: Node, Type; attributes: Target, Data, GSIkey, Projection)
Node 1, Person: Target 1, Data Jason Bourne, GSIkey HASH(Person.Data), Projection: edge/spanning tree rollups
Node 2, Date: Target 2, Data 20170418, GSIkey HASH(Data), Projection: summary stats
Node 3, Place: Target 3, Data Finland
Node 4, Person: Target 4, Data John Doe, GSIkey HASH(Person.Data), Projection: edge/spanning tree rollups
GSI keyed on GSIkey (0-N) and Data (attributes: Target, Type, Node, Projection)
Jason Bourne: Target 1, Person, node 1
John Doe: Target 4, Person, node 4
20170418: Target 2; Birthdate edges from nodes 1 and 4; Date node 2
Finland: Target 3; Birthplace edges from nodes 1 and 4; Place node 3
GSI keyed on GSIkey (0-N) and Type (attributes: Target, Data, Node, Projection)
Person: node 1 (Jason Bourne), node 4 (John Doe)
Birthdate: Target 2 (20170418), from nodes 1 and 4
Birthplace: Target 3 (Finland), from nodes 1 and 4
Date: node 2 (20170418)
Place: node 3 (Finland)
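A minimal sketch of the adjacency-list layout using the nodes above. Each item is keyed by (Node, Type) and carries a Target and Data. A table query on one Node returns its default edge plus its outgoing edges. A GSI-style view keyed on (Type, Data) answers the reverse question, such as who was born on 20170418. The helper names are hypothetical.

```python
from collections import defaultdict

# (Node, Type) is the primary key; Target and Data are attributes.
ITEMS = [
    {"Node": 1, "Type": "Person", "Target": 1, "Data": "Jason Bourne"},
    {"Node": 1, "Type": "Birthdate", "Target": 2, "Data": "20170418"},
    {"Node": 1, "Type": "Birthplace", "Target": 3, "Data": "Finland"},
    {"Node": 2, "Type": "Date", "Target": 2, "Data": "20170418"},
    {"Node": 3, "Type": "Place", "Target": 3, "Data": "Finland"},
    {"Node": 4, "Type": "Person", "Target": 4, "Data": "John Doe"},
    {"Node": 4, "Type": "Birthdate", "Target": 2, "Data": "20170418"},
    {"Node": 4, "Type": "Birthplace", "Target": 3, "Data": "Finland"},
]


def edges_of(node):
    # Table query on Node: the default edge plus every outgoing edge.
    return [(i["Type"], i["Target"]) for i in ITEMS if i["Node"] == node]


def build_inverted_index():
    # GSI-style view keyed on (Type, Data): which nodes hold this edge.
    index = defaultdict(list)
    for item in ITEMS:
        index[(item["Type"], item["Data"])].append(item["Node"])
    return index


inverted = build_inverted_index()
```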

Audible eBook sync service
• Allows users to save session state for Audible eBooks
• Maintains mappings per user for eBooks and audio products
• Spiky load patterns require significant overprovisioning
• Large number of access patterns

Access patterns (https://w.amazon.com/index.php/WfV/Services/DBMigration)
#  Use case  Access pattern
1 CompanionMapping getCompanionMappingsByAsin
2 CompanionMapping getCompanionMappingsByEbookAndAudiobookContentId
3 CompanionMapping getCompanionMappingsFromCache
4 CompanionMapping getCompanionMappings
5 CompanionMapping getCompanionMappingsAvailable
6 AcrInfo getACRInfo
7 AcrInfo getACRs
8 AcrInfo getACRInfos
9 AcrInfo getACRInfosbySKU
10 AudioProduct getAudioProductsForACRs
11 AudioProduct getAudioProducts
12 AudioProduct deleteAudioProductsMatchingSkuVersions
13 AudioProduct getChildAudioProductsForSKU
14 Product getProductInfoByAsins
15 Product getParentChildDataByParentAsins
16 AudioFile getAudioFilesForACR
17 AudioFile getAudioFilesForChildACR
18 AudioFile getAudioFilesByParentAsinVersionFormat
19 AudioFile getAudioFiles
20 AudioFile getAudioFilesForChildAsin

Primary table
Primary key: PK, SK (SK is also the GSI-3 key); other attributes vary by item
PK ABOOKACR1
  SK v0#ABOOKACR1: GSI-1 = ABOOK-ASIN1, GSI-2 = ABOOK-SKU1
  SK EBOOKACR1: GSI-1 = SyncFileAcr, GSI-2 = ABOOK-ASIN1
  SK ABOOKACR1#TRACK#1: GSI-1, GSI-2
  SK ABOOKACR1#TRACK#2: GSI-1, GSI-2
PK EBOOKACR1
  SK EBOOKACR1: GSI-1 = EBOOK-SKU1, EBookAsin = ASIN

Indexes
GSI-1 (partition key: projected attributes)
ABOOK-ASIN1: ABOOKACR1 with ABOOKACR1-v1, ABOOKACR1#TRACK#1, ABOOKACR1#TRACK#2
SyncFileAcr: ABOOKACR1 with MAP-EBOOKACR1
EBOOK-SKU1: ABOOKACR1 with EBOOKACR1
GSI-2 (partition key: projected attributes)
ABOOK-ASIN1: ABOOKACR1 with MAP-EBOOKACR1
ABOOK-SKU1: ABOOKACR1 with ABOOKACR1-v1, ABOOKACR1#TRACK#1, ABOOKACR1#TRACK#2
GSI-3 (partition key: projected attributes)
V0#ABOOKACR1: ABOOKACR1, ABOOKACR1-v1
EBOOKACR1: ABOOKACR1, MAP-EBOOKACR1
ABOOKACR1#TRACK#1: ABOOKACR1, ABOOKACR1#TRACK#1
ABOOKACR1#TRACK#2: ABOOKACR1, ABOOKACR1#TRACK#2
EBOOKACR1: ABOOKACR1, EBOOKACR1

Query conditions (access patterns: https://w.amazon.com/index.php/WfV/Services/DBMigration)
Each entry: # use case access pattern: lookup parameters; index; key condition; filter condition
1 CompanionMapping getCompanionMappingsByAsin: audiobookAsin/ebookSku; GSI-2; GSI-2 = ABOOK-ASIN1; none
2 CompanionMapping getCompanionMappingsByEbookAndAudiobookContentId: ebookAcr/sku,version,format or audiobookAcr/asin,version,format; GSI-3 on TargetACR attribute or primary key on table; GSI-3 = MAP-EBOOKACR1; version = v and format = f
3 CompanionMapping getCompanionMappingsFromCache: ebookAcr/sku,version,format or audiobookAcr/asin,version,format; GSI-3 on TargetACR attribute or primary key on table; GSI-3 = MAP-EBOOKACR1; version = v and format = f
4 CompanionMapping getCompanionMappings: syncfileAcr, ebookAcr?, audiobookAcr?; GSI-1; GSI-1 = SyncFileAcr; none
5 CompanionMapping getCompanionMappingsAvailable: ebookAcr, audiobookAcr; primary key on table; Acr = ABOOKACR1 and TargetACR begins_with "MAP-"; none
6 AcrInfo getACRInfo: acr; primary key on table; Acr = ABOOKACR1 and TargetACR begins_with "ABOOKACR1-v"; none
7 AcrInfo getACRs: acr / asin,version,format; primary key on table; Acr = ABOOKACR1; version = v and format = f
8 AcrInfo getACRInfos: acr; primary key on table; Acr = ABOOKACR1 and TargetACR begins_with "ABOOKACR1"; none
9 AcrInfo getACRInfosbySKU: sku; GSI-2; GSI-2 = ABOOK-SKU1; none
10 AudioProduct getAudioProductsForACRs: acr; primary key on table; Acr = ABOOKACR1 and TargetACR begins_with "ABOOKACR1"; none
11 AudioProduct getAudioProducts: sku, version, format; GSI-2; GSI-2 = ABOOK-SKU1; version = v and format = f
12 AudioProduct deleteAudioProductsMatchingSkuVersions: sku, version; GSI-2; GSI-2 = ABOOK-SKU1; version = v
13 AudioProduct getChildAudioProductsForSKU: sku; GSI-2; GSI-2 = ABOOK-SKU1; none
14 Product getProductInfoByAsins: asin; GSI-1; GSI-1 = ABOOK-ASIN1; none
15 Product getParentChildDataByParentAsins: asin; GSI-1; GSI-1 = ABOOK-ASIN1; none
16 AudioFile getAudioFilesForACR: acr; primary key on table; Acr = ABOOKACR1 and TargetACR begins_with "ABOOKACR1#"; none
17 AudioFile getAudioFilesForChildACR: acr, parent_asin; primary key on table; Acr = ABOOKACR1; version = v and format = f
18 AudioFile getAudioFilesByParentAsinVersionFormat: parent_asin, version, format; GSI-1; GSI-1 = ABOOK-ASIN1; version = v and format = f
19 AudioFile getAudioFiles: sku, version, format; GSI-2; GSI-2 = ABOOK-SKU1; version = v and format = f
20 AudioFile getAudioFilesForChildAsin: asin, parent_asin, version, format; GSI-1; GSI-1 = ABOOK-ASIN1; version = v and format = f
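One way to keep a list of access patterns like this executable is to map each pattern to the parameters of a low-level Query request. The sketch below builds those parameter dicts (TableName, IndexName, KeyConditionExpression, ExpressionAttributeNames, ExpressionAttributeValues) without calling AWS. The table name `AudibleSync` is a hypothetical placeholder. The attribute and index names (Acr, TargetACR, GSI-2) are taken from the table above and may not match the real schema. Filter conditions such as "version = v and format = f" would be added as a FilterExpression in the same way.

```python
TABLE = "AudibleSync"  # hypothetical table name


def query_params(key_attr, key_value, index=None, sort_attr=None, sort_prefix=None):
    # Placeholders (#pk, :pk) also cover attribute names like "GSI-2" that
    # cannot appear literally in an expression.
    params = {
        "TableName": TABLE,
        "KeyConditionExpression": "#pk = :pk",
        "ExpressionAttributeNames": {"#pk": key_attr},
        "ExpressionAttributeValues": {":pk": {"S": key_value}},
    }
    if index:
        params["IndexName"] = index
    if sort_attr and sort_prefix is not None:
        params["KeyConditionExpression"] += " AND begins_with(#sk, :sk)"
        params["ExpressionAttributeNames"]["#sk"] = sort_attr
        params["ExpressionAttributeValues"][":sk"] = {"S": sort_prefix}
    return params


PATTERNS = {
    "getCompanionMappingsAvailable": lambda acr: query_params(
        "Acr", acr, sort_attr="TargetACR", sort_prefix="MAP-"),
    "getAudioFilesForACR": lambda acr: query_params(
        "Acr", acr, sort_attr="TargetACR", sort_prefix=f"{acr}#"),
    "getACRInfosbySKU": lambda sku: query_params("GSI-2", sku, index="GSI-2"),
}
```

The resulting dict could be passed to a low-level client's query call. Keeping the mapping in one place makes it easy to check that each of the 20 patterns resolves to a single key condition.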

Thank you!
