© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
How GumGum Migrated from Cassandra to Amazon DynamoDB
Anirban Roy
Lead Engineer
GumGum
DAT345
Agenda
Introduction
Background
Alternatives and comparison
About the data
Migration strategy
Observations and benefits
Q&A
Introduction: Background
High traffic with surges: 90% of our traffic involves our programmatic partners
Low response time: maintaining low latency is key to revenue
Introduction: The problem
Apache Cassandra: we used to run 106 nodes of i3.2xlarge instances on AWS
Scaling: required adding nodes to the cluster manually
Introduction: The problem
Data center outages
Revenue loss
Engineering fatigue
Alternatives
More than 225 NoSQL databases are available (source: nosql-database.org)
GumGum's blog post: https://techblog.gumgum.com/articles/moving-to-amazon-dynamodb-from-hosted-cassandra
Benchmarking
DynamoDB
• Benchmarked with YCSB
• Loaded ~20 million items (~22 GB)
Apache Cassandra
• Achieved ~125,000 reads per second and ~40,000 writes per second
• ~3-5 ms read latency
GumGum's blog post: https://techblog.gumgum.com/articles/moving-to-amazon-dynamodb-from-hosted-cassandra
YCSB: https://github.com/brianfrankcooper/YCSB
Behavioral targeting data
Sources: DMP partners and DSP partners, via cookie syncing
TTL: 30 days
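A 30-day TTL like this maps onto DynamoDB's Time to Live feature, which expects a Number attribute holding the expiry time as Unix epoch seconds; DynamoDB deletes expired items in the background some time after that moment. The sketch below is a minimal illustration of building such an item. The attribute names `visitor_id`, `segments`, and `expires_at` are hypothetical, not GumGum's actual schema.

```python
import time
from typing import List, Optional

# Hypothetical attribute name: when Time to Live is enabled on a table you
# choose which Number attribute DynamoDB treats as the expiry timestamp.
TTL_ATTRIBUTE = "expires_at"
THIRTY_DAYS_SECONDS = 30 * 24 * 60 * 60  # 2,592,000 seconds


def build_visitor_item(visitor_id: str, segments: List[str],
                       now: Optional[float] = None) -> dict:
    """Build a behavioral-targeting item that expires 30 days after it is written.

    DynamoDB TTL expects the expiry as a Unix epoch timestamp in seconds.
    """
    written_at = int(time.time() if now is None else now)
    return {
        "visitor_id": visitor_id,   # hypothetical partition key
        "segments": segments,       # hypothetical payload from partner syncing
        TTL_ATTRIBUTE: written_at + THIRTY_DAYS_SECONDS,
    }


item = build_visitor_item("v-123", ["sports", "travel"], now=1_543_622_400)
print(item[TTL_ATTRIBUTE])  # prints 1546214400
```

Such a dict could be passed to boto3's `Table.put_item(Item=item)`. Because the expiry is recomputed on every write, a record that keeps getting re-synced keeps getting a fresh 30-day lease.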
GumGum Metadata Store (replicated across all four GumGum data centers)
Contextual targeting data, keyed by image URL and page URL
• images_metadata: TTL of 30 days to one year for images
• pages_metadata: TTL of seven days to one year for pages
Producers: GumGum TaPas (NLP) and GumGum Vertex (CV), running on ECS spot instances and Vertex spot nodes
Behavioral targeting data migration
The migration involved the following:
• The data volume is considerably bigger
• No ETL operation was required for the migration
• WRITE -> WAIT -> READ approach
• Exploit the short TTL: the 30-day TTL serves as the WAIT phase
Diagram: ad servers and the visitors table in the Visitors keyspace
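One way to read the WRITE -> WAIT -> READ approach: the ad servers write every record to both stores, wait one full TTL so that every record still alive has been written to the new store at least once (anything older has expired in the old store), and then switch reads over. The sketch below is a hypothetical in-memory illustration of that reading, with plain dicts standing in for the Cassandra and DynamoDB clients and an injectable clock so the cutover can be simulated.

```python
import time
from typing import Callable, Dict, Optional

# The WAIT phase lasts one full TTL of the behavioral data (30 days).
WAIT_SECONDS = 30 * 24 * 60 * 60


class DualWriteStore:
    """Hypothetical WRITE -> WAIT -> READ migration wrapper.

    WRITE: every write goes to both the old and the new store.
    WAIT:  reads stay on the old store until one TTL has passed since dual
           writes began; in the real old store, any record not rewritten
           in that window has expired.
    READ:  after the wait, reads are served from the new store.
    Plain dicts stand in for the Cassandra and DynamoDB clients.
    """

    def __init__(self, old: Dict[str, dict], new: Dict[str, dict],
                 dual_write_started_at: float,
                 clock: Callable[[], float] = time.time):
        self.old = old
        self.new = new
        self.cutover_at = dual_write_started_at + WAIT_SECONDS
        self.clock = clock

    def write(self, key: str, value: dict) -> None:
        self.old[key] = value  # keep the old store current during WAIT
        self.new[key] = value

    def read(self, key: str) -> Optional[dict]:
        store = self.new if self.clock() >= self.cutover_at else self.old
        return store.get(key)


# Usage with a fake clock so the cutover can be simulated instantly.
fake_now = [0.0]
store = DualWriteStore(old={"legacy": {"seg": 1}}, new={},
                       dual_write_started_at=0.0, clock=lambda: fake_now[0])
store.write("v1", {"seg": 2})
print(store.read("legacy"))   # prints {'seg': 1}: still reading the old store
fake_now[0] = WAIT_SECONDS
print(store.read("legacy"))   # prints None: never rewritten, so it would have expired
print(store.read("v1"))       # prints {'seg': 2}: served from the new store
```

The appeal of this design is that it needs no bulk copy: the short TTL does the work that an ETL job would otherwise do.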
Contextual targeting data migration
Extract data -> Transform data -> Load data: the images_metadata and pages_metadata tables were moved from the Cassandra keyspace into images_metadata and pages_metadata tables in DynamoDB
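The transform step of this ETL flow has to reconcile two TTL models: Cassandra stores a relative TTL per cell, while DynamoDB Time to Live expects an absolute epoch-seconds attribute. The sketch below is a hypothetical transform, assuming the extract query read Cassandra's `TTL(...)` value alongside each row. The column names `url`, `labels`, `ttl_remaining` and the `expires_at` attribute are illustrative, not GumGum's schema.

```python
from typing import Any, Dict, Iterable, List, Optional

TTL_ATTRIBUTE = "expires_at"  # hypothetical DynamoDB TTL attribute name


def transform_row(row: Dict[str, Any], extracted_at: int) -> Optional[Dict[str, Any]]:
    """Turn one extracted Cassandra row into a DynamoDB item.

    Assumes the extract step read the remaining TTL alongside the data, e.g.
        SELECT url, labels, TTL(labels) AS ttl_remaining FROM images_metadata;
    Cassandra's TTL() returns the seconds left before a cell expires, or null
    when the cell has no TTL. Column names here are hypothetical.
    """
    remaining = row.get("ttl_remaining")
    if remaining is not None and remaining <= 0:
        return None  # defensive: nothing left to live, so skip the load
    # A Cassandra set<text> arrives as a set; store it as a sorted list.
    item: Dict[str, Any] = {"url": row["url"], "labels": sorted(row["labels"])}
    if remaining is not None:
        # Convert "seconds remaining" into the absolute epoch-seconds
        # timestamp that DynamoDB Time to Live expects.
        item[TTL_ATTRIBUTE] = extracted_at + int(remaining)
    return item


def transform_all(rows: Iterable[Dict[str, Any]], extracted_at: int) -> List[Dict[str, Any]]:
    """Transform a batch of rows, dropping any that should not be loaded."""
    items = (transform_row(r, extracted_at) for r in rows)
    return [i for i in items if i is not None]


rows = [
    {"url": "https://example.com/a.jpg", "labels": {"dog", "cat"}, "ttl_remaining": 86_400},
    {"url": "https://example.com/b.html", "labels": ["news"], "ttl_remaining": None},
]
print(transform_all(rows, extracted_at=1_000))
```

Anchoring the expiry to the extraction time keeps each migrated item's remaining lifetime the same as it was in Cassandra, rather than restarting the 30-day to one-year window at load time.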
Caching: DAX or Memcached
• When using DAX (only with DynamoDB): GumGum ad servers -> AWS DAX nodes -> DynamoDB
• When using Memcached: GumGum ad servers -> Memcached nodes -> NoSQL store
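The two cache placements differ in who owns the caching logic. DAX exposes the DynamoDB API and acts as a read-through, write-through cache, so application code stays the same. A Memcached tier is typically used cache-aside: the application checks the cache, falls back to the store on a miss, and populates the cache itself. A minimal sketch of that cache-aside logic follows; the dict is a stand-in for a Memcached client and `load_from_store` is a hypothetical callable for the NoSQL read.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class CacheAside:
    """Minimal cache-aside reader, standing in for the Memcached setup.

    The in-memory dict is a hypothetical stand-in for a Memcached client, and
    `load_from_store` for a read against the NoSQL store.
    """

    def __init__(self, load_from_store: Callable[[str], Optional[Any]],
                 ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._load = load_from_store
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)
        self.misses = 0

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is not None and entry[0] > self._clock():
            return entry[1]  # cache hit, still fresh
        self.misses += 1
        value = self._load(key)  # miss or stale entry: go to the backing store
        if value is not None:
            self._entries[key] = (self._clock() + self._ttl, value)
        return value


backing = {"page": "meta"}
fake_now = [0.0]
cache = CacheAside(backing.get, ttl_seconds=10, clock=lambda: fake_now[0])
cache.get("page")   # miss: loaded from the store
cache.get("page")   # hit: served from the cache
fake_now[0] = 11
cache.get("page")   # entry expired: reloaded from the store
print(cache.misses)  # prints 2
```

The per-entry expiry bounds how stale a cached value can get; with DAX the equivalent item-cache TTL is configured on the DAX side instead of in application code.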
Data replication requirements
• Behavioral targeting
  • Data must be replicated between the US East and US West data centers
  • Global replication is not required
• Contextual targeting
  • Data must be replicated across all four GumGum data centers
  • Global Tables was used to achieve replication
While the behavioral targeting data migration was being developed, DynamoDB did not yet support replication.
Data replication architecture: Master-Master
Modified dynamodb-cross-region-library to perform Master-Master replication. Changes can be found at https://github.com/awslabs/dynamodb-cross-region-library/pull/53
Diagram: within the AWS Cloud, AWS Region US East 1 and AWS Region US West 2 each have a VPC containing an Auto Scaling group of replicator instances
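The slide does not spell out how master-master writes avoid bouncing endlessly between regions or how conflicting writes are settled. A common pattern, shown here as a hypothetical in-memory sketch and not the code in the linked pull request, tags every write with its originating region, forwards only locally originated changes, and resolves conflicts with last-writer-wins on a timestamp (the reconciliation rule DynamoDB Global Tables also uses), with the region name as a deterministic tie-breaker. The `origin` and `updated_at` attributes and the list standing in for a change stream such as DynamoDB Streams are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Record:
    key: str
    value: str
    updated_at: float   # writer's timestamp, used for last-writer-wins
    origin: str         # region where the write was first made (hypothetical attribute)


class RegionReplica:
    """One region's table plus the replicator that consumes its change stream."""

    def __init__(self, region: str):
        self.region = region
        self.table: Dict[str, Record] = {}
        self.stream: List[Record] = []  # stand-in for a change stream such as DynamoDB Streams

    def local_write(self, key: str, value: str, ts: float) -> None:
        self.apply(Record(key, value, ts, origin=self.region))

    def apply(self, rec: Record) -> None:
        current = self.table.get(rec.key)
        # Last-writer-wins; ties broken by region name so both sides agree.
        if current is not None and (current.updated_at, current.origin) >= (rec.updated_at, rec.origin):
            return
        self.table[rec.key] = rec
        self.stream.append(rec)

    def replicate_to(self, peer: "RegionReplica") -> None:
        """Forward only locally originated changes, so records never ping-pong."""
        for rec in self.stream:
            if rec.origin == self.region:
                peer.apply(rec)
        self.stream.clear()


east, west = RegionReplica("us-east-1"), RegionReplica("us-west-2")
east.local_write("v1", "a", ts=1.0)
west.local_write("v1", "b", ts=2.0)   # concurrent, later write in the other region
east.replicate_to(west)
west.replicate_to(east)
east.replicate_to(west)               # the echoed record is not forwarded back
print(east.table["v1"].value, west.table["v1"].value)  # prints b b
```

Filtering on origin is what prevents loops; without it, each replicator would re-emit every record it received and the two regions would trade the same write forever.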
Benefits: performance
• 4-5 ms read latency
• No throttles
• Zero outages so far
• Fewer timeouts than Cassandra
Benefits: Cost
• Cassandra hosting cost
  • 80 i3.2xlarge instances
  • Total annual hosting cost: $0.624 per hour x 24 hours x 365 days x 80 instances = $437,299.20
• DynamoDB running cost
  • Per month: ~$450 per day x 30 days = ~$13,500
  • Estimated annual cost: $13,500 x 12 = $162,000
• Savings
  • {(437,299.2 - 162,000) x 100} / 437,299.2 = 62.95%
65-70%
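The savings arithmetic on this slide can be reproduced as below. It assumes $0.624 is the hourly price per i3.2xlarge instance and ~450 USD is the daily DynamoDB spend, and it uses the slide's monthly figure of ~13,500 USD for the annual estimate (14,100 x 12 would give 169,200, not 162,000). Like the slide, it counts only instance hours on the Cassandra side.

```python
HOURS_PER_YEAR = 24 * 365


def annual_instance_cost(hourly_rate: float, instances: int) -> float:
    """Annual cost of running `instances` machines around the clock."""
    return hourly_rate * HOURS_PER_YEAR * instances


def savings_percent(old_cost: float, new_cost: float) -> float:
    """Percentage saved by moving from old_cost to new_cost."""
    return (old_cost - new_cost) * 100 / old_cost


cassandra = annual_instance_cost(0.624, 80)   # 80 i3.2xlarge instances
dynamodb_monthly = 450 * 30                   # ~450 USD per day for 30 days
dynamodb = dynamodb_monthly * 12              # annualized

print(round(cassandra, 2), dynamodb, round(savings_percent(cassandra, dynamodb), 2))
# prints 437299.2 162000 62.95
```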
Operational stats
• 2 TB of data
• 16.2 billion items
• ~8 million reads per minute
• All at <3 ms read latency
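These operational stats imply a few derived figures: the average item size, the per-second read rate, and the read capacity units (RCUs) that rate would consume if every read reached the table (cache hits from DAX or Memcached would lower this). The sketch uses DynamoDB's documented sizing rule: one RCU covers one strongly consistent read per second of an item up to 4 KB, and an eventually consistent read costs half. Reading "2 TB" as 2 x 10^12 bytes and treating the average item as representative are assumptions.

```python
import math

TOTAL_BYTES = 2 * 10**12      # "2 TB", read as decimal terabytes (assumption)
ITEM_COUNT = 16.2e9           # 16.2 billion items
READS_PER_MINUTE = 8e6        # ~8 million reads per minute

avg_item_bytes = TOTAL_BYTES / ITEM_COUNT   # ~123 bytes per item on average
reads_per_second = READS_PER_MINUTE / 60    # ~133,333 reads per second


def read_capacity_units(reads_per_second: float, item_bytes: float,
                        strongly_consistent: bool) -> float:
    """RCUs needed if every read reached the table (no cache hits).

    One RCU covers one strongly consistent read per second of an item up to
    4 KB; an eventually consistent read costs half of that.
    """
    units_per_read = math.ceil(item_bytes / 4096)
    if not strongly_consistent:
        units_per_read = units_per_read / 2
    return reads_per_second * units_per_read


print(round(avg_item_bytes), round(reads_per_second))  # prints 123 133333
print(round(read_capacity_units(reads_per_second, avg_item_bytes,
                                strongly_consistent=False)))  # prints 66667
```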
But wait, there's more about DynamoDB
A list of all DynamoDB sessions, workshops, and chalk talks
• Migrating Apache Cassandra to DynamoDB
• What’s new with DynamoDB
• Purpose-built databases in AWS
• DynamoDB service level agreement
• Adaptive capacity
• Point-in-time recovery (PITR)
• Global tables
Thank you!
Anirban Roy
LinkedIn: anirban51roy
How GumGum Migrated from Cassandra to Amazon DynamoDB (DAT345) - AWS re:Invent 2018
