How GumGum Migrated from Cassandra to Amazon DynamoDB (DAT345) - AWS re:Invent 2018
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
How GumGum Migrated from Cassandra to Amazon DynamoDB
Anirban Roy
Lead Engineer
GumGum
DAT345
Agenda
Introduction
Background
Alternatives and comparison
About the data
Migration strategy
Observations and benefits
Q&A
Introduction: Background
High traffic with surges
90% of our traffic involves our programmatic partners
Low response time
Maintaining low latency is key to revenue
Introduction: The problem
Apache Cassandra
We used to run 106 nodes of i3.2xlarge instances on AWS
Scaling
Required adding nodes manually to the cluster
Introduction: The problem
Data center outages
Revenue loss
Engineering fatigue
Alternatives
More than 225 NoSQL databases available
(source: nosql-database.org)
Alternatives
GumGum's blog post: https://techblog.gumgum.com/articles/moving-to-amazon-dynamodb-from-hosted-cassandra
Benchmarking
DynamoDB
• YCSB benchmarked
• Loaded ~20 million items (~22 GB)
Apache Cassandra
• Achieved ~125,000 reads per second and ~40,000 writes per second
• ~3-5ms read latency
GumGum's blog post: https://techblog.gumgum.com/articles/moving-to-amazon-dynamodb-from-hosted-cassandra
YCSB: https://github.com/brianfrankcooper/YCSB
Behavioral targeting data
DMP partners
DSP partners
Cookie syncing
30-day TTL
Contextual targeting data
GumGum Metadata Store (replicated across all four GumGum data centers)
• Image URL: 30 days to one year TTL for images
• Page URL: seven days to one year TTL for pages
[Diagram labels: GumGum TaPas (NLP), GumGum Vertex (CV), ECS spot (x3), Vertex spot node (x3), images_metadata, pages_metadata]
Behavioral targeting data migration
Migration involved the following:
• Data volume is considerably bigger
• No ETL operation required for migration
• WRITE -> WAIT -> READ approach
• Exploit the fact that TTL is short (30 days - WAIT phase)
[Diagram labels: Visitors keyspace, visitors, Ad server (x3)]
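The WRITE -> WAIT -> READ approach can be sketched as a three-phase switch. This is a minimal illustration and not GumGum's code: it assumes writes go to both stores during the migration, and the names `Phase`, `VisitorStore`, and `MigratingStore` are hypothetical. In-memory dicts stand in for Cassandra and DynamoDB.

```python
from enum import Enum


class Phase(Enum):
    WRITE = 1  # start writing to the new store; reads stay on the old one
    WAIT = 2   # keep writing until one full TTL window (30 days) has passed
    READ = 3   # every live record now exists in the new store; switch reads


class VisitorStore:
    """In-memory stand-in for a key-value table (hypothetical)."""

    def __init__(self):
        self.rows = {}

    def put(self, key, value):
        self.rows[key] = value

    def get(self, key):
        return self.rows.get(key)


class MigratingStore:
    """Routes reads and writes according to the current migration phase."""

    def __init__(self, old, new):
        self.old, self.new, self.phase = old, new, Phase.WRITE

    def put(self, key, value):
        # Assumption: the old store keeps receiving writes until cutover.
        self.old.put(key, value)
        self.new.put(key, value)

    def get(self, key):
        store = self.new if self.phase is Phase.READ else self.old
        return store.get(key)


old, new = VisitorStore(), VisitorStore()
store = MigratingStore(old, new)
store.put("visitor-1", {"segments": ["sports"]})
before = store.get("visitor-1")  # served by the old store
store.phase = Phase.READ         # flipped after the 30-day WAIT phase
after = store.get("visitor-1")   # served by the new store
```

Because every record expires within 30 days, anything still alive after the WAIT phase must have been written during it, which is why no bulk copy is needed.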
Contextual targeting data migration
Extract data -> Transform data -> Load data
[Diagram labels: Cassandra keyspace with images_metadata and pages_metadata as the source; images_metadata and pages_metadata as the destination tables]
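The transform step of such an extract/transform/load run might look like the sketch below. It relies on one known detail: DynamoDB TTL reads an item attribute holding the expiry time as Unix epoch seconds, whereas a Cassandra TTL is a remaining duration. Everything else is an assumption for illustration: the row keys (`url`, `metadata`, `ttl_seconds`), the attribute name `expires_at`, and the functions `transform_row` and `etl` are hypothetical, and a list stands in for the destination table.

```python
import time


def transform_row(row, now=None):
    """Turn one extracted Cassandra row into a DynamoDB-style item.

    `row` is a dict with hypothetical keys: "url", "metadata", and
    "ttl_seconds" (remaining time to live as read from Cassandra).
    """
    now = int(time.time()) if now is None else now
    return {
        "url": row["url"],                        # assumed partition key
        "metadata": row["metadata"],
        "expires_at": now + row["ttl_seconds"],  # absolute expiry, epoch seconds
    }


def etl(rows, load, now=None):
    """Extract -> transform -> load, skipping rows that have already expired."""
    loaded = 0
    for row in rows:
        if row["ttl_seconds"] <= 0:
            continue
        load(transform_row(row, now))
        loaded += 1
    return loaded


target = []  # stand-in for the destination table
rows = [
    {"url": "https://example.com/a.jpg", "metadata": {"safe": True}, "ttl_seconds": 86400},
    {"url": "https://example.com/old.jpg", "metadata": {}, "ttl_seconds": 0},
]
count = etl(rows, target.append, now=1_543_000_000)
```

Unlike the behavioral data, these tables carry TTLs of up to one year, so waiting for the old data to expire is not practical and the remaining lifetime has to be carried over per item.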
Caching: DAX or Memcached
When using Memcached
[Diagram labels: GumGum ad servers (Ad server x3), Memcached node (x2), NoSQL store]
When using DAX (only with DynamoDB)
[Diagram labels: GumGum ad servers (Ad server x3), AWS DAX with DAX node (x2)]
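The difference between the two caching layouts can be sketched as follows. With Memcached the application typically manages the cache itself (cache-aside), while DAX sits in front of DynamoDB as a read-through cache so the caller talks to one endpoint. The classes and functions below (`Table`, `cache_aside_get`, `ReadThroughCache`) are hypothetical stand-ins, not the Memcached or DAX client APIs.

```python
class Table:
    """Stand-in for the NoSQL store; counts reads so cache hits are visible."""

    def __init__(self, rows):
        self.rows, self.reads = dict(rows), 0

    def get(self, key):
        self.reads += 1
        return self.rows.get(key)


def cache_aside_get(cache, table, key):
    """Memcached style: the ad server checks the cache, then fills it on a miss."""
    value = cache.get(key)
    if value is None:
        value = table.get(key)
        if value is not None:
            cache[key] = value
    return value


class ReadThroughCache:
    """DAX style: the caller talks only to the cache, which loads misses itself."""

    def __init__(self, table):
        self.table, self.items = table, {}

    def get(self, key):
        if key not in self.items:
            self.items[key] = self.table.get(key)
        return self.items[key]


table = Table({"visitor-1": "sports"})

memcached = {}  # a dict standing in for a Memcached node
first = cache_aside_get(memcached, table, "visitor-1")   # miss: reads the table
second = cache_aside_get(memcached, table, "visitor-1")  # hit: no table read
reads_after_cache_aside = table.reads

dax = ReadThroughCache(table)
third = dax.get("visitor-1")   # miss: the cache reads the table
fourth = dax.get("visitor-1")  # hit
reads_after_read_through = table.reads
```

In the read-through variant the ad server code contains no cache-fill logic at all, which is the main operational difference between the two diagrams.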
Data replication requirements
• Behavioral targeting
  • Data must be replicated between the US East and US West data centers
  • Global replication is not required
• Contextual targeting
  • Data must be replicated across all four GumGum data centers
  • Global tables were used to achieve replication
While the behavioral targeting data migration was being developed, DynamoDB did not yet support replication.
Data replication architecture: Master-Master
Modified dynamodb-cross-region-library to perform master-master replication. Changes can be found at https://github.com/awslabs/dynamodb-cross-region-library/pull/53
[Diagram labels: AWS Cloud; AWS Region US East 1 with a VPC, Auto scaling, and two replicators; AWS Region US West 2 with a VPC, Auto scaling, and two replicators]
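One problem any master-master setup has to solve is the replication loop: a change copied from East to West must not be copied back to East. The sketch below shows one common way to handle it, tagging each write with its region of origin. This is an assumption for illustration only and is not taken from the linked pull request; `RegionTable` and `replicate` are hypothetical names, and lists stand in for the tables' change streams.

```python
class RegionTable:
    """Stand-in for one regional table plus its change stream."""

    def __init__(self, region):
        self.region, self.items, self.stream = region, {}, []

    def put(self, key, value, origin=None):
        origin = origin or self.region  # local writes originate here
        self.items[key] = {"value": value, "origin": origin}
        self.stream.append((key, value, origin))


def replicate(source, target):
    """Drain source's change stream into target, forwarding only local writes."""
    copied = 0
    while source.stream:
        key, value, origin = source.stream.pop(0)
        if origin != source.region:
            continue  # arrived via replication; forwarding it would loop
        target.put(key, value, origin=origin)
        copied += 1
    return copied


east, west = RegionTable("us-east-1"), RegionTable("us-west-2")
east.put("visitor-1", "sports")
east_to_west = replicate(east, west)  # copies the local write
west_to_east = replicate(west, east)  # skips it: its origin is us-east-1
```

The sketch ignores conflict resolution for concurrent writes to the same key in both regions, which a production replicator also has to address.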
Benefits: Performance
• 4-5ms read latency
• No throttles
• Zero outages so far
• Fewer timeouts than Cassandra
Benefits: Cost
• Cassandra hosting cost
  • 80 i3.2xlarge instances
  • Total hosting cost: 0.624 USD/hour x 24 x 365 x 80 = 437,299.20 USD per year
• DynamoDB running cost
  • Per month = ~450 USD/day x 30 = ~13,500 USD
  • Estimated annual cost = 13,500 x 12 = 162,000 USD
• % Saving
  • {(437,299.20 - 162,000) x 100} / 437,299.20 = 62.95%
65-70%
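The cost comparison can be reproduced with a few lines of arithmetic. The sketch below uses only the figures on the slide and takes the annual DynamoDB cost as the ~13,500 USD monthly figure times 12, which is what yields 162,000 USD. The variable names are illustrative.

```python
HOURLY_RATE_USD = 0.624    # i3.2xlarge hourly price used on the slide
INSTANCES = 80
DYNAMODB_DAILY_USD = 450   # approximate daily DynamoDB spend from the slide

cassandra_annual = HOURLY_RATE_USD * 24 * 365 * INSTANCES
dynamodb_monthly = DYNAMODB_DAILY_USD * 30
dynamodb_annual = dynamodb_monthly * 12
saving_pct = (cassandra_annual - dynamodb_annual) * 100 / cassandra_annual

print(f"Cassandra: {cassandra_annual:,.2f} USD per year")
print(f"DynamoDB:  {dynamodb_annual:,} USD per year")
print(f"Saving:    {saving_pct:.2f}%")
```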
Operational stats
2 TB data
16.2 billion items
~8 million reads per minute
All at <3ms read latency
But wait - There’s more about DynamoDB
A list of all DynamoDB sessions, workshops, and chalk talks
• Migrating Apache Cassandra to DynamoDB
• What’s new with DynamoDB
• Purpose-built databases in AWS
• DynamoDB service level agreement
• Adaptive capacity
• Point-in-time recovery (PITR)
• Global tables
Thank you!
Anirban Roy
LinkedIn: anirban51roy