4. Local secondary index (LSI)
§Alternate range key attribute
§Index is local to a hash key (or partition)
A1
(hash)
A3
(range)
A2
(table key)
A1
(hash)
A2
(range)
A3 A4 A5
LSIs A1
(hash)
A4
(range)
A2
(table key)
A3
(projected)
Table
KEYS_ONLY
INCLUDE A3
A1
(hash)
A5
(range)
A2
(table key)
A3
(projected)
A4
(projected)
ALL
10 GB max per hash key,
i.e. LSIs limit the # of ran
ge keys!
5. Global secondary index (GSI)
§Alternate hash (+range) key
§Index is across all table hash keys (partitions)
A1
(hash)
A2 A3 A4 A5
GSIs A5
(hash)
A4
(range)
A1
(table key)
A3
(projected)
Table
INCLUDE A3
A4
(hash)
A5
(range)
A1
(table key)
A2
(projected)
A3
(projected) ALL
A2
(hash)
A1
(table key) KEYS_ONLY
RCUs/WCUs provisioned se
parately for GSIs
Online indexing
6. How do GSI updates work?
Table
Primary ta
ble
Primary ta
ble
Primary ta
ble
Primary ta
ble
Global Seco
ndary Index
Client
2. Asynchronous
update (in progress)
If GSIs don’t have enough write capacity, table writes will be throttled!
7. LSI or GSI?
§LSI can be modeled as a GSI
§If data size in an item collection > 10 GB, use GSI
§If eventual consistency is okay for your scenario, use GSI!
9. Scaling
§Throughput
§ Provision any amount of throughput to a table
§Size
§ Add any number of items to a table
§ Max item size is 400 KB
§ LSIs limit the number of range keys due to 10 GB limit
§Scaling is achieved through partitioning
10. Throughput
§Provisioned at the table level
§ Write capacity units (WCUs) are measured in 1 KB per second
§ Read capacity units (RCUs) are measured in 4 KB per second
§ RCUs measure strictly consistent reads
§ Eventually consistent reads cost 1/2 of consistent reads
§Read and write throughput limits are independent
WCURCU
15. Getting the most out of DynamoDB throughput
“To get the most out of
DynamoDB throughput, create
tables where the hash key
element has a large number of
distinct values, and values are
requested fairly uniformly, as
randomly as possible.”
—DynamoDB Developer Guide
§Space: access is evenly
spread over the key-
space
§Time: requests arrive
evenly spaced in time
16. What causes throttling?
§ If sustained throughput goes beyond provisioned throughput per partition
§ Non-uniform workloads
§ Hot keys/hot partitions
§ Very large bursts
§ Mixing hot data with cold data
§ Use a table per time period
§ From the example before:
§ Table created with 5000 RCUs, 500 WCUs
§ RCUs per partition = 1666.67
§ WCUs per partition = 166.67
§ If sustained throughput > (1666 RCUs or 166 WCUs) per key or partition, DynamoDB may
throttle requests
§ Solution: Increase provisioned throughput
18. 1:1 relationships or key-values
§Use a table or GSI with a hash key
§Use GetItem or BatchGetItem API
§Example: Given an SSN or license number, get
attributes
Users Table
Hash key Attributes
SSN = 123-45-6789 Email = johndoe@nowhere.com, License = TDL25478134
SSN = 987-65-4321 Email = maryfowler@somewhere.com, License = TDL78309234
Users-Email-GSI
Hash key Attributes
License = TDL78309234 Email = maryfowler@somewhere.com, SSN = 987-65-4321
License = TDL25478134 Email = johndoe@nowhere.com, SSN = 123-45-6789
19. 1:N relationships or parent-children
§Use a table or GSI with hash and range key
§Use Query API
Example:
§ Given a device, find all readings between epoch X, Y
Device-measurements
Hash Key Range key Attributes
DeviceId = 1 epoch = 5513A97C Temperature = 30, pressure = 90
DeviceId = 1 epoch = 5513A9DB Temperature = 30, pressure = 90
20. N:M relationships
§Use a table and GSI with hash and range key elements
switched
§Use Query API
Example: Given a user, find all games. Or given a game,
find all users.
User-Games-Table
Hash Key Range key
UserId = bob GameId = Game1
UserId = fred GameId = Game2
UserId = bob GameId = Game3
Game-Users-GSI
Hash Key Range key
GameId = Game1 UserId = bob
GameId = Game2 UserId = fred
GameId = Game3 UserId = bob
21. Documents (JSON)
§ New data types (M, L, BOOL,
NULL) introduced to support
JSON
§ Document SDKs
§ Simple programming model
§ Conversion to/from JSON
§ Java, JavaScript, Ruby, .NET
§ Cannot index (S,N) elements
of a JSON object stored in M
§ Only top-level table attributes
can be used in LSIs and
GSIs without
Streams/Lambda
JavaScript DynamoDB
string S
number N
boolean BOOL
null NULL
array L
object M
25. Time series tables
Events_table_2015_April
Event_id
(Hash key)
Timestamp
(range key)
Attribute1 …. Attribute N
Events_table_2015_March
Event_id
(Hash key)
Timestamp
(range key)
Attribute1 …. Attribute N
Events_table_2015_Feburary
Event_id
(Hash key)
Timestamp
(range key)
Attribute1 …. Attribute N
Events_table_2015_January
Event_id
(Hash key)
Timestamp
(range key)
Attribute1 …. Attribute N
RCUs = 1000
WCUs = 100
RCUs = 10000
WCUs = 10000
RCUs = 100
WCUs = 1
RCUs = 10
WCUs = 1
Current table
Older tables
Hot dataCold data
Don’t mix hot and cold data; archive cold data to Amazon S3
26. Important when:
Use a table per time period
§Pre-create daily, weekly, monthly tables
§Provision required throughput for current table
§Writes go to the current table
§Turn off (or reduce) throughput for older tables
Dealing with time series data
33. GameId Date Host Opponent Status
d9bl3 2014-10-02 David Alice DONE
72f49 2014-09-30 Alice Bob PENDING
o2pnb 2014-10-08 Bob Carol IN_PROGRESS
b932s 2014-10-03 Carol Bob PENDING
ef9ca 2014-10-03 David Bob IN_PROGRESS
Games Table
Multiplayer online game data
Hash key
34. Query for incoming game requests
§DynamoDB indexes provide hash and range
§What about queries for two equalities and a range?
SELECT * FROM Game
WHERE Opponent='Bob‘
AND Status=‘PENDING'
ORDER BY Date DESC
(hash)
(range)
(?)
35. Secondary Index
Opponent Date GameId Status Host
Alice 2014-10-02 d9bl3 DONE David
Carol 2014-10-08 o2pnb IN_PROGRESS Bob
Bob 2014-09-30 72f49 PENDING Alice
Bob 2014-10-03 b932s PENDING Carol
Bob 2014-10-03 ef9ca IN_PROGRESS David
Approach 1: Query filter
BobHash key Range key
36. Secondary Index
Approach 1: query filter
Bob
Opponent Date GameId Status Host
Alice 2014-10-02 d9bl3 DONE David
Carol 2014-10-08 o2pnb IN_PROGRESS Bob
Bob 2014-09-30 72f49 PENDING Alice
Bob 2014-10-03 b932s PENDING Carol
Bob 2014-10-03 ef9ca IN_PROGRESS David
SELECT * FROM Game
WHERE Opponent='Bob'
ORDER BY Date DESC
FILTER ON Status='PENDING'
(filtered out)
38. Important when:
Use query filter
§Send back less data “on the wire”
§Simplify application code
§Simple SQL-like expressions
§ AND, OR, NOT, ()
Your index isn’t entirely selective
40. Secondary Index
Approach 2: composite key
Opponent StatusDate GameId Host
Alice DONE_2014-10-02 d9bl3 David
Carol IN_PROGRESS_2014-10-08 o2pnb Bob
Bob IN_PROGRESS_2014-10-03 ef9ca David
Bob PENDING_2014-09-30 72f49 Alice
Bob PENDING_2014-10-03 b932s Carol
Hash key Range key
41. Opponent StatusDate GameId Host
Alice DONE_2014-10-02 d9bl3 David
Carol IN_PROGRESS_2014-10-08 o2pnb Bob
Bob IN_PROGRESS_2014-10-03 ef9ca David
Bob PENDING_2014-09-30 72f49 Alice
Bob PENDING_2014-10-03 b932s Carol
Secondary Index
Approach 2: composite key
Bob
SELECT * FROM Game
WHERE Opponent='Bob'
AND StatusDate BEGINS_WITH 'PENDING'
43. Sparse indexes
Id
(Hash)
User Game Score Date Award
1 Bob G1 1300 2012-12-23
2 Bob G1 1450 2012-12-23
3 Jay G1 1600 2012-12-24
4 Mary G1 2000 2012-10-24 Champ
5 Ryan G2 123 2012-03-10
6 Jones G2 345 2012-03-20
Game-scores-table
Award
(Hash)
Id User Score
Champ 4 Mary 2000
Award-GSI
Scan sparse hash GSIs
44. Important when:
Replace filter with indexes
§Concatenate attributes to form useful
§secondary index keys
§Take advantage of sparse indexes
You want to optimize a query as much
as possible
Status + Date
47. DDB 사용현황
§HIT 메인 게임DB
• 27 Tables
• Various Alarms
• Flexible Dashboards
48. DDB, 이런 분에게 추천합니다.
§Capacity 산정
§Monitoring
§백업 및 복원
§Tips
https://i.ytimg.com/vi/C6U2M7ZkZI8/maxresdefault.jpg
49. Capacity 산정
§ 서비스 전 읽기/쓰기에 대한 패턴 테스트를 꼭 수행하세요.
§ 모든 패턴이 읽혀지진 않지만, 병목의 중심이 되는 대략적인 내용은
파악할 수 있습니다.
§ 동시 사용자 증가 속도를 꼭 염두에 두셔야 합니다. 대응하려는
순간 이미 서비스 불가일 수 있습니다.
Ø 2분 안에 800,000 증가
Ø 초당 RCUs 변환 시 13,000 증가
시간
읽
기
호
출
수
50. Capacity 변경
§웹콘솔, AWS앱, CLI 등으로 쉽게 Capacity 변경이
가능합니다.
§다만, 즉시 적용은 되지 않습니다. AWS에도 작업 시간을
주세요.
§대량의 RCUs, WCUs 변경이 필요한 경우는 미리
준비하는 것이 좋습니다.
52. 백업 및 복원
§ Export / Import 개념입니다.
§ Data Pipeline 을 사용하세요.
§ CLI 에 친숙해 지세요. 반복적인 작업을 쉽게 구현할 수
있습니다.
53. 소소한 Tip
§Read / Write 캐시 레이어 도입을 고려하세요.
• Retry 시 지수 백오프 알고리즘을 도입하세요.
• Data 용량계획을 고려하세요.
http://cfile30.uf.tistory.com/image/221DF544538C11D62866E9
54. DDB 를 사용해보니…
§전통적인 RDB와 비교하여 손댈 것이 거의 없습니다.
§AWS Document 에 근거하여 개발하셔야 합니다.
§운영 시 Cloud Watch 및 CLI 가 많은 도움이 됩니다.