DynamoDB
1. DynamoDB: Data Example

Table: 'Waldo-Scores'

userId   date        value  unlockedAchievements
hadr-fb  18-07-2012  72     ['10 days', '2 levels day']
hadr-fb  19-07-2012  1      None
hadr-fb  20-07-2012  56789  ['top 10 progress']

Table: 'Players'

Id    platform  Name     JoinDate    Score
hadr  fb        Hadrien  31-02-2011  10 457
hadr  G+        Hadrien  18-07-2012  357
pior  fb        Pior     12-12-2012  18 951
2. Data types (lean...)

Types
  single
    string (UTF-8)
    number (between 10^-128 and 10^+126)
  set
    string (UTF-8)
    number
Constraints
  no "embedded documents"
  no complex types (dates, ...)
3. Dimensioning 1/2: Big picture

Units
  accesses/s × roundUp(size in KB) × items
  provisioning
  updates are... constraining
Storage
  tables are "elastic"
  64 KB max per item
  overhead = 100 bytes per item
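The unit arithmetic above can be sketched in a few lines of Python (a minimal sketch assuming the per-KB rounding described here; the helper name is illustrative, not an AWS API):

```python
import math

def required_units(accesses_per_s, item_size_kb):
    """Provisioned throughput = accesses/s * roundUp(item size in KB)."""
    return math.ceil(accesses_per_s * math.ceil(item_size_kb))

# 50 reads/s of 2.5 KB items -> 50 * roundUp(2.5) = 150 units
print(required_units(50, 2.5))
```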
4. Dimensioning 2/2: Traps and constraints

TRAPS:
  Units are divided among the partitions.
  Bigger tables often mean higher throughput. Divide tables?
CONSTRAINTS for throughput:
  absolute
    min 5
    max 10 000
    only 1 table in UPDATING state at a time
  increase
    min 10%
    max 100%
  decrease
    min 10%
    max once a day
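The first trap above can be made concrete (a toy sketch of the even-split assumption, not an AWS API): provisioned units are divided among partitions, so a hot key only ever sees its partition's share.

```python
def per_partition_units(provisioned_units, partitions):
    """Each partition only gets an even share of the table's provisioned units."""
    return provisioned_units / partitions

# A 1000-unit table spread over 10 partitions can only serve
# 100 units/s against any single partition (e.g. one hot hash key).
print(per_partition_units(1000, 10))
```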
5. Integrated Service 1/3: IAM

API level
table level (except for "ListTables")

Example: "fair" Scores table use

{
  "Statement": [{
    "Effect": "Allow",
    "Action": ["dynamodb:DeleteItem", "dynamodb:PutItem",
               "dynamodb:UpdateItem", "dynamodb:GetItem",
               "dynamodb:Query"],
    "Resource":
      "arn:aws:dynamodb:<region>:<account>:table/Scores"
  }]
}
7. Integrated Service 3/3: EMR

out of the scope of this presentation
basically, Hive integrated with DynamoDB => HiveQL
use cases:
  custom index generation
  export to S3 (backup, data removal)
  data analysis / aggregation
8. Data access 1/3: GetItem

Fastest: primary key(s)
0-1 item
Cost = 1 unit

Example: the 'Hadrien' Player of the 'fb' platform

import boto

conn = boto.connect_dynamodb()
table = conn.get_table('Players')
item = table.get_item(
    hash_key='hadr',
    range_key='fb',
)
9. Data access 2/3: Query

Fast
primary key
range key conditions: =, <, >, <=, >=, startsWith
0+ item(s)
Cost = 1 unit per returned item

Example: all 'Waldo-Scores' of the 'hadr-fb' Player

table = conn.get_table('Waldo-Scores')
items = table.query(
    hash_key='hadr-fb',
    # range_key_condition=...
)
10. Data access 3/3: Scan

Slooooow
filter on any key
tests the WHOLE table!
0+ item(s)
Cost = 1/2 unit per scanned KB! => starvation risk
Use case: fetching a full (small) table, e.g. 'powerups'

Example: all days where 'hadr-*' did better than 100

from boto.dynamodb.condition import BEGINS_WITH, GT

table = conn.get_table('Waldo-Scores')
items = table.scan(
    scan_filter={
        'userId': BEGINS_WITH('hadr-'),
        'value': GT(100),
    })
11. Performance considerations: non-indexed data 1/2

De-normalisation
  Ex: the Waldo-Scores and Players tables :)
  big picture: data duplication to fit each viewpoint's needs
12. Performance considerations: non-indexed data 2/2

Scan
  sloooooow (sequential)
  (bad) unit consumption (sequential)
EMR
  scales (less slow :p)
  (better) unit consumption (parallel)
TL;DR
  Index your data!
13. Eventual vs strong consistency

write => propagation ~ 1 s
read => may not be up to date...

Consistency  Applications  Cost (units)  Performance
strong       critical      1 per KB      good
eventual     aware         1/2 per KB    maximal
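The cost column above can be expressed directly (a minimal sketch using the 1-per-KB strong / half-per-KB eventual pricing from the table; the function name is illustrative):

```python
import math

def read_cost(item_size_kb, consistent):
    """Read units consumed: 1 per KB strongly consistent, 1/2 eventually."""
    kb = math.ceil(item_size_kb)
    return kb if consistent else kb / 2

print(read_cost(3.2, consistent=True))   # strong read of a 3.2 KB item
print(read_cost(3.2, consistent=False))  # same item, eventually consistent
```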
14. Critical/specific applications

Redundancy/backup
  managed => no need
  "~ snapshot" => EMR + S3
~ Transactions
  conditional operations (idempotent)
  atomic counters (NOT idempotent, but strongly consistent)
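The "~ transactions" point can be illustrated with an in-memory sketch of conditional-write semantics (purely illustrative, not the DynamoDB API): a write only succeeds when the expected current value matches, which is why a blind retry cannot apply the same change twice.

```python
class ConditionalStore:
    """Toy key-value store with DynamoDB-style conditional writes."""
    def __init__(self):
        self._items = {}

    def conditional_put(self, key, new_value, expected):
        """Write new_value only if the current value equals `expected`.
        Returns True on success, False if the condition failed."""
        if self._items.get(key) != expected:
            return False
        self._items[key] = new_value
        return True

    def get(self, key):
        return self._items.get(key)

store = ConditionalStore()
store.conditional_put('hadr-fb', 72, expected=None)        # first write
ok = store.conditional_put('hadr-fb', 73, expected=72)     # succeeds
retry = store.conditional_put('hadr-fb', 73, expected=72)  # retry is rejected
print(ok, retry, store.get('hadr-fb'))
```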