Cassandra at Bazaarvoice - EmoDB
Confidential and Proprietary. © 2013 Bazaarvoice, Inc.
EmoDB
Store your feelings here
www.bazaarvoice.com
SaaS software that collects and displays user-generated content, crunches analytics, and extracts insights.
Thousands of clients
Hundreds of millions of pieces of content
Hundreds of millions of unique visitors per month
Tens of billions of pageviews per month
Austin-based company founded in 2005
Engineering offices: Austin, San Francisco, New York
Fahd Siddiqui
Senior Software Engineer, Data Infrastructure
Bazaarvoice
linkedin.com/in/fahdsiddiqui
fahd.siddiqui@bazaarvoice.com
$ whoami
Global Monthly Unique Visitors [figure]
Monthly stats as of July 2013
16B
1B
480M
250M
118M
3M
4000
2500
Review impressions
Pageviews (37k rps)
Unique users
Products in catalog
Total reviews
Monthly new reviews
Customer implementations
Servers
95 Engineers
Data Infrastructure
Goals for EmoDB
Store just about anything in a flexible way
Support “Universal Content Type” – store any content type without any re-architecture
Watch for data change events
Expose a RESTful API
Multi-master, multi-data-center, fault-tolerant, horizontally scalable reads and writes
EmoDB Overview
System of Record
Databus
Queue Service
Blob Store
… all backed by Cassandra
SoR - tables
What is an Emo Table?
It is a bucket that contains JSON documents. Tables are cheap to create, and you may create as many as you want, e.g., review:testcustomer
Offers a way to fetch any particular row by ID, and
Complete table scans, which use splits for parallel scans
SoR - tables
Create a table
$ curl -s -XPUT -H "Content-Type: application/json" \
  "http://localhost:8080/sor/1/_table/review:testcustomer?options=placement:'ugc_global:ugc'&audit=comment:'initial+provisioning',host:aws-tools-02" \
  --data-binary '{"type":"review","client":"TestCustomer"}' | jsonpp
{
"success": true
}
Store a document
$ curl -s -XPUT -H "Content-Type: application/json" \
  "http://localhost:8080/sor/1/review:testcustomer/demo1?audit=comment:'initial+submission',host:aws-submit-09" \
  --data-binary '{"author":"Bob","title":"Best Ever!","rating":5}' | jsonpp
{
"success": true
}
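The same “Store a document” call can be issued from Java. This is a minimal sketch, assuming Java 11+ and its built-in java.net.http client; the class name SorRequests and the helper buildStoreRequest are hypothetical, and the host, table, key, and audit values are simply copied from the curl example. It only builds the request; sending it is left to the caller.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

class SorRequests {
    /**
     * Builds, but does not send, a PUT that stores one JSON document, mirroring the
     * "Store a document" curl call. Hypothetical helper; path layout copied from that example.
     */
    static HttpRequest buildStoreRequest(String baseUrl, String table, String key,
                                         String audit, String json) {
        // URL-encode the audit value so its quotes, colons, commas, and spaces survive intact.
        String query = "audit=" + URLEncoder.encode(audit, StandardCharsets.UTF_8);
        URI uri = URI.create(baseUrl + "/sor/1/" + table + "/" + key + "?" + query);
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }
}
```

Encoding the audit value with URLEncoder sidesteps the shell-quoting pitfalls of a raw URL; a space becomes `+`, which matches the curl example. The request can then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`.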
SoR – rows
A row is composed of deltas
Writers append deltas, and readers resolve the deltas to produce the resolved object
Compaction occurs once data has been replicated to all data centers
Because of this, EmoDB is not a good fit for systems with a high update-to-create ratio
Data Access in EmoDB
3 ways to read data out of EmoDB
Lookup by primary key
Bulk extract (scan)
Change feed (using EmoDB databus)
What’s missing?
WHERE, JOIN, GROUP BY: anything other than primary key lookup
Use another indexing system, such as Elasticsearch or Solr, for complex queries
SoR – Data storage challenges
Problem 1:
Need a way to cheaply create tens of thousands of “tables”
As of Cassandra 1.1, each column family (CF) needs at least 1 MB of memory on every node
That is far too much overhead to dedicate a CF to each user-defined table
Hint: We’ll use only one Column Family to store all tables
SoR – Data storage challenges
Problem 2 (once Problem 1 is solved):
Need to scan an entire table so it can be indexed by Polloi (Elasticsearch)
Require a way to split tables into shards that can each be scanned sequentially
Shards for each table should be fully distributed over the Cassandra cluster
SoR – Data storage
Solution to both Problem 1 and 2:
The row key byte buffer begins with a 9-byte “table prefix”:
Byte 0: 8-bit shard identifier
Bytes 1–8: 64-bit table UUID
followed by the N-byte, UTF-8-encoded content key
The shard identifier is the bottom 8 bits of the 32-bit Murmur3 hash of (table UUID | content key)
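The key layout above can be sketched in Java as follows. This is an illustration under stated assumptions, not EmoDB's code: it carries a hand-written MurmurHash3 x86_32 (seed 0 assumed) so it needs no libraries, and it assumes “(table UUID | content key)” means the 8 big-endian table-UUID bytes followed by the UTF-8 key bytes. The class name RowKeys is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class RowKeys {
    /** Standard MurmurHash3 x86_32, written out so the sketch is self-contained. */
    static int murmur3_32(byte[] data, int seed) {
        final int c1 = 0xcc9e2d51, c2 = 0x1b873593;
        int h = seed;
        int blocks = data.length / 4;
        for (int i = 0; i < blocks; i++) {
            int j = i * 4;  // little-endian 4-byte block
            int k = (data[j] & 0xff) | (data[j + 1] & 0xff) << 8
                    | (data[j + 2] & 0xff) << 16 | (data[j + 3] & 0xff) << 24;
            k *= c1; k = Integer.rotateLeft(k, 15); k *= c2;
            h ^= k;
            h = Integer.rotateLeft(h, 13);
            h = h * 5 + 0xe6546b64;
        }
        int k = 0;
        int tail = blocks * 4;
        switch (data.length & 3) {  // remaining 1-3 bytes; cases fall through intentionally
            case 3: k ^= (data[tail + 2] & 0xff) << 16;
            case 2: k ^= (data[tail + 1] & 0xff) << 8;
            case 1: k ^= data[tail] & 0xff;
                k *= c1; k = Integer.rotateLeft(k, 15); k *= c2;
                h ^= k;
        }
        h ^= data.length;  // final avalanche (fmix32)
        h ^= h >>> 16; h *= 0x85ebca6b;
        h ^= h >>> 13; h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    /** Shard byte, then the 64-bit table id, then the UTF-8 content key. */
    static byte[] rowKey(long tableUuid, String contentKey) {
        byte[] keyBytes = contentKey.getBytes(StandardCharsets.UTF_8);
        // Assumption: the hash input is the 8 big-endian UUID bytes followed by the key bytes.
        byte[] hashInput = ByteBuffer.allocate(8 + keyBytes.length)
                .putLong(tableUuid).put(keyBytes).array();
        byte shard = (byte) murmur3_32(hashInput, 0);  // casting keeps the bottom 8 bits
        return ByteBuffer.allocate(9 + keyBytes.length)
                .put(shard).putLong(tableUuid).put(keyBytes).array();
    }
}
```

Because the shard byte is derived from a hash of the key, rows of one table land on roughly uniform shard values, while the table id in bytes 1 to 8 keeps each shard's rows for that table contiguous.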
SoR – Scanning Table
The shard identifier spreads a table’s content across the cluster to avoid hotspots (EmoDB uses ByteOrderedPartitioner)
All content for a table can be fetched in parallel using 2^8 = 256 range queries
There you have it: a single CF offering range scans over segments (tables) that are fully distributed over the cluster!
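Because the shard byte comes first, each of the 256 shard values yields one contiguous key range per table. A minimal sketch, assuming the 9-byte prefix layout described above and ByteOrderedPartitioner's unsigned byte-by-byte key ordering, computes the half-open boundaries for those 256 range queries. The names ShardRanges and KeyRange are hypothetical, and issuing the actual Cassandra queries is out of scope.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ShardRanges {
    /** Half-open row-key range [start, end); end == null means "no upper bound". */
    static final class KeyRange {
        final byte[] start;
        final byte[] end;
        KeyRange(byte[] start, byte[] end) { this.start = start; this.end = end; }
    }

    /** The 9-byte table prefix for one shard: shard byte followed by the 64-bit table id. */
    static byte[] prefix(int shard, long tableUuid) {
        return ByteBuffer.allocate(9).put((byte) shard).putLong(tableUuid).array();
    }

    /**
     * Adds one to the prefix, treated as an unsigned big-endian number. Since every row key
     * is at least 9 bytes long, [prefix, increment(prefix)) holds exactly the keys starting
     * with prefix. Returns null when the prefix is all 0xFF bytes.
     */
    static byte[] increment(byte[] prefix) {
        byte[] next = Arrays.copyOf(prefix, prefix.length);
        for (int i = next.length - 1; i >= 0; i--) {
            if (next[i] != (byte) 0xFF) {
                next[i]++;
                return next;
            }
            next[i] = 0;  // carry into the byte to the left
        }
        return null;
    }

    /** One range per shard value: 2^8 = 256 independent range queries for a table. */
    static List<KeyRange> rangesForTable(long tableUuid) {
        List<KeyRange> ranges = new ArrayList<>();
        for (int shard = 0; shard < 256; shard++) {
            byte[] start = prefix(shard, tableUuid);
            ranges.add(new KeyRange(start, increment(start)));
        }
        return ranges;
    }
}
```

Each range can be scanned by a separate worker, and since the 256 ranges sit at different points of the byte-ordered ring, the work naturally spreads over the cluster.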
SoR – Scanning Table
Table UUIDs also solved another problem for us
Multiple tables can now be stored in the same CF
Because tables are identified by UUID, we can DROP a table and CREATE a new one with the same name
Dropped tables are deleted lazily, which is especially important in an eventually consistent world
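The name-to-UUID indirection can be modeled in memory to show why DROP followed by CREATE is safe. In this hypothetical sketch (class TableCatalog), a random 64-bit id stands in for the table UUID and a map stands in for the shared column family. Dropping a table only unlinks its name, a re-created table gets a fresh id, and the old rows stay invisible until a lazy purge removes them.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Random;
import java.util.Set;

/** Hypothetical in-memory model of name -> UUID table mapping with lazy cleanup. */
class TableCatalog {
    private final Map<String, Long> liveTables = new HashMap<>();        // name -> current UUID
    private final Map<Long, Map<String, String>> rows = new HashMap<>(); // UUID -> (key -> doc), stands in for the single CF
    private final Random random = new Random();

    long create(String name) {
        if (liveTables.containsKey(name)) throw new IllegalStateException("exists: " + name);
        long uuid;
        do { uuid = random.nextLong(); } while (rows.containsKey(uuid));  // fresh id, even for a reused name
        liveTables.put(name, uuid);
        rows.put(uuid, new HashMap<>());
        return uuid;
    }

    /** DROP only unlinks the name; the old rows remain until purgeDropped() runs. */
    void drop(String name) {
        if (liveTables.remove(name) == null) throw new IllegalStateException("no such table: " + name);
    }

    void put(String table, String key, String doc) { rows.get(uuidOf(table)).put(key, doc); }

    String get(String table, String key) { return rows.get(uuidOf(table)).get(key); }

    /** Lazily deletes rows whose UUID no longer belongs to a live table; returns rows removed. */
    int purgeDropped() {
        Set<Long> live = new HashSet<>(liveTables.values());
        int removed = 0;
        for (Iterator<Map.Entry<Long, Map<String, String>>> it = rows.entrySet().iterator(); it.hasNext(); ) {
            Map.Entry<Long, Map<String, String>> e = it.next();
            if (!live.contains(e.getKey())) {
                removed += e.getValue().size();
                it.remove();
            }
        }
        return removed;
    }

    private long uuidOf(String table) {
        Long uuid = liveTables.get(table);
        if (uuid == null) throw new IllegalStateException("no such table: " + table);
        return uuid;
    }
}
```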
SoR – Parallel Scan
Call getSplits() to get a list of split identifiers
Then scan the data in each split in parallel by calling getSplit()
Java:
Collection<String> getSplits(String table, int desiredRecordsPerSplit);
Iterator<Map<String, Object>> getSplit(String table, String split, @Nullable String fromKeyExclusive, long limit, ReadConsistency consistency);
SoR – Parallel Scan
Java code sample
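A minimal sketch of the parallel scan described above, written against an interface that mirrors the two listed signatures. Assumptions: the interface name SplitScanner, the ReadConsistency constants, the page size of 1000, the 10,000 records per split, and the use of a “~id” field as the resume key are all hypothetical, and @Nullable is dropped to keep the sketch dependency-free.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the ReadConsistency type named in the signature.
enum ReadConsistency { STRONG, WEAK }

// Mirrors the two methods listed above; the interface name is hypothetical.
interface SplitScanner {
    Collection<String> getSplits(String table, int desiredRecordsPerSplit);

    Iterator<Map<String, Object>> getSplit(String table, String split,
            String fromKeyExclusive, long limit, ReadConsistency consistency);
}

class ParallelScan {
    private static final long PAGE_SIZE = 1000;

    /** Scans every split of a table concurrently; returns the number of documents visited. */
    static long scanTable(SplitScanner store, String table, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicLong visited = new AtomicLong();
        try {
            List<Future<?>> tasks = new ArrayList<>();
            for (String split : store.getSplits(table, 10_000)) {
                tasks.add(pool.submit(() -> {
                    String lastKey = null;  // null = start at the beginning of the split
                    long pageCount;
                    do {
                        pageCount = 0;
                        Iterator<Map<String, Object>> page =
                                store.getSplit(table, split, lastKey, PAGE_SIZE, ReadConsistency.STRONG);
                        while (page.hasNext()) {
                            Map<String, Object> doc = page.next();
                            visited.incrementAndGet();          // real code would index or transform doc here
                            lastKey = (String) doc.get("~id");  // assumed: documents expose their key as "~id"
                            pageCount++;
                        }
                    } while (pageCount == PAGE_SIZE);           // a short page means the split is exhausted
                }));
            }
            for (Future<?> task : tasks) {
                task.get();  // rethrows any failure from a worker
            }
        } finally {
            pool.shutdown();
        }
        return visited.get();
    }
}
```

Each split is paged independently by passing the last key seen as fromKeyExclusive, so a slow split does not hold up the others.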
SoR - Deltas
Documents are stored as a sequence of deltas
Readers evaluate the deltas in order to produce the document
Documents are created, updated, and deleted by writing deltas
Weak consistency – no document-level locking
SoR - Deltas
Concurrent writes at t2 and t3 would typically cause a replication conflict
But since each delta specifies only the fields it modifies, the deltas merge together cleanly and produce the desired result
No synchronous cross-data-center communication is required for concurrent modification
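A minimal sketch of why the t2 and t3 deltas merge cleanly: if each delta carries only the fields it changes and readers apply deltas in timestamp order, every replica reaches the same document no matter which delta it received first. The Delta class, the numeric timestamps (standing in for EmoDB's own delta ordering), and the DELETE marker are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class FieldDeltas {
    /** Hypothetical marker meaning "remove this field". */
    static final Object DELETE = new Object();

    /** A delta names only the fields it changes; ts stands in for EmoDB's delta ordering (assumption). */
    static final class Delta {
        final long ts;
        final Map<String, Object> fields;

        Delta(long ts, Map<String, Object> fields) {
            this.ts = ts;
            this.fields = fields;
        }
    }

    /** Applies deltas in timestamp order, regardless of the order in which a replica received them. */
    static Map<String, Object> resolve(Collection<Delta> deltas) {
        List<Delta> sorted = new ArrayList<>(deltas);
        sorted.sort((a, b) -> Long.compare(a.ts, b.ts));
        Map<String, Object> doc = new TreeMap<>();
        for (Delta d : sorted) {
            for (Map.Entry<String, Object> field : d.fields.entrySet()) {
                if (field.getValue() == DELETE) {
                    doc.remove(field.getKey());
                } else {
                    doc.put(field.getKey(), field.getValue());
                }
            }
        }
        return doc;
    }
}
```

For example, with t1 = {author: Bob, rating: 5}, t2 = {status: approved} written in one data center and t3 = {rating: 4} written in the other, both data centers resolve to {author=Bob, rating=4, status=approved}, whichever of t2 or t3 arrives first.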
SoR - Deltas
Recursive, pattern matching approach
Operations available for:
Setting a value
Deleting a value
Updating a value for a key in a map
No operation for modifying a list
Model a list as a map instead
Time UUIDs are good candidates for the list keys
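Modeling a list as a map keyed by time UUIDs lets writers in different data centers append concurrently without touching each other's keys, while readers recover the order from the timestamp embedded in each key. java.util.UUID can read the timestamp of a version-1 UUID but cannot generate one, so this sketch builds one by hand following RFC 4122. The clock-sequence and node values are caller-supplied placeholders, and the class name TimeUuidList is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.UUID;

class TimeUuidList {
    // 100-ns intervals between the UUID epoch (1582-10-15) and the Unix epoch (1970-01-01).
    private static final long UUID_EPOCH_OFFSET = 0x01B21DD213814000L;

    /** Builds an RFC 4122 version-1 (time-based) UUID from a Unix time in milliseconds. */
    static UUID timeUuid(long unixMillis, int clockSeq, long node) {
        long ts = unixMillis * 10_000 + UUID_EPOCH_OFFSET;
        long msb = (ts & 0xFFFFFFFFL) << 32              // time_low
                | ((ts >>> 32) & 0xFFFFL) << 16          // time_mid
                | 0x1000L                                // version 1
                | ((ts >>> 48) & 0x0FFFL);               // time_hi
        long lsb = (0x8000L | (clockSeq & 0x3FFFL)) << 48  // RFC 4122 variant + clock sequence
                | (node & 0xFFFFFFFFFFFFL);                // 48-bit node id
        return new UUID(msb, lsb);
    }

    /** Reads a map-modeled list back in order, sorting keys by their embedded timestamp. */
    static List<String> asList(Map<UUID, String> list) {
        List<UUID> keys = new ArrayList<>(list.keySet());
        keys.sort((a, b) -> {
            int byTime = Long.compare(a.timestamp(), b.timestamp());
            return byTime != 0 ? byTime : a.compareTo(b);  // tie-break on the full UUID
        });
        List<String> values = new ArrayList<>();
        for (UUID key : keys) {
            values.add(list.get(key));
        }
        return values;
    }
}
```

An append is then just a map put under a fresh time UUID, which fits the “Updating a value for a key in a map” operation; two concurrent appends touch different keys, so they never conflict.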
SoR - Deltas
Literal – a “smash” operation that replaces the value
Delete – removes the value
Map – updates the values of individual keys in a map
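The recursive, pattern-matching approach can be sketched with one small class per delta kind. These are hypothetical types, not EmoDB's classes: Literal replaces the value, Delete removes it, and MapDelta applies nested deltas to individual keys while leaving the rest of the map untouched.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical delta: transforms the current value (null = absent) into a new one (null = deleted). */
interface Delta {
    Object apply(Object current);
}

/** Literal: "smash" the current value, replacing it outright. */
final class Literal implements Delta {
    private final Object value;
    Literal(Object value) { this.value = value; }
    public Object apply(Object current) { return value; }
}

/** Delete: remove the value entirely. */
final class Delete implements Delta {
    public Object apply(Object current) { return null; }
}

/** Map: apply a nested delta to each named key, keeping all other keys (the recursive case). */
final class MapDelta implements Delta {
    private final Map<String, Delta> entries;
    MapDelta(Map<String, Delta> entries) { this.entries = entries; }

    @SuppressWarnings("unchecked")
    public Object apply(Object current) {
        Map<String, Object> result = new TreeMap<>();
        if (current instanceof Map) {
            result.putAll((Map<String, Object>) current);  // a non-map value is replaced by a map
        }
        for (Map.Entry<String, Delta> e : entries.entrySet()) {
            Object updated = e.getValue().apply(result.get(e.getKey()));  // recurse into the child value
            if (updated == null) {
                result.remove(e.getKey());
            } else {
                result.put(e.getKey(), updated);
            }
        }
        return result;
    }
}
```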
SoR - Deltas
Conditional
Perform a delta conditionally
Designed to help resolve the most common concurrent-write conflict situations
Simple and reliable
E.g., mark a review “approved” only if moderation hasn’t begun
SoR – Deltas
Other types of conditions
Equal, Intrinsic, Is, Map, And, Or, Not, Constant
E.g., {..,"type":or("product","category"),"client":"TestCustomer"}
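Conditions compose naturally as predicates. The sketch below is hypothetical (class Conditions, methods eq, or, and, not, map, alwaysTrue) and covers only a subset of the listed types; Intrinsic, Is, and Constant are left out. A bare value stands for an equality test, so the example above can be written as map(Map.of("type", or("product", "category"), "client", "TestCustomer")).

```java
import java.util.Arrays;
import java.util.Map;
import java.util.Objects;
import java.util.function.Predicate;

/** Hypothetical condition builders modeled on the condition types listed above. */
final class Conditions {
    private Conditions() {}

    /** Equal: the value equals the given constant. */
    static Predicate<Object> eq(Object expected) {
        return v -> Objects.equals(v, expected);
    }

    /** Lets a bare value stand for an equality test, as in or("product","category"). */
    @SuppressWarnings("unchecked")
    private static Predicate<Object> lift(Object part) {
        return part instanceof Predicate ? (Predicate<Object>) part : eq(part);
    }

    static Predicate<Object> or(Object... parts) {
        return v -> Arrays.stream(parts).anyMatch(p -> lift(p).test(v));
    }

    static Predicate<Object> and(Object... parts) {
        return v -> Arrays.stream(parts).allMatch(p -> lift(p).test(v));
    }

    static Predicate<Object> not(Object part) {
        return lift(part).negate();
    }

    static Predicate<Object> alwaysTrue() {
        return v -> true;
    }

    /** Map: the value is a map whose listed keys each satisfy their own condition. */
    static Predicate<Object> map(Map<String, ?> fieldConditions) {
        return v -> v instanceof Map && fieldConditions.entrySet().stream()
                .allMatch(e -> lift(e.getValue()).test(((Map<?, ?>) v).get(e.getKey())));
    }
}
```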
SoR - Deltas
Read-Modify-Write
Read the original state
Compute the new version
Either the write succeeds, or
the write eventually conflicts, and the databus fires an event so the application can detect the conflict and retry the write
SoR - Deltas
Data center A: T1, conditional T3
Data center B: T1, T2
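The data-center scenario above can be replayed with a small model of conditional deltas. In this hypothetical sketch, data center A reads the document at ~version 1 and writes T3 conditioned on ~version still being 1, while data center B concurrently writes T2. Before replication, A resolves T1, T3 and sees its change. Once T2 arrives, every replica resolves T1, T2, T3; the condition on T3 is now false, so T3 becomes a no-op, which is the conflict the application detects through the databus and retries. The class, the field values, and the rule that ~version counts only applied deltas are simplifying assumptions.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ConditionalResolve {
    /** Hypothetical delta: timestamp, optional expected ~version, and fields to set. */
    static final class Delta {
        final long ts;
        final Integer ifVersion;  // null = unconditional
        final Map<String, Object> set;

        Delta(long ts, Integer ifVersion, Map<String, Object> set) {
            this.ts = ts;
            this.ifVersion = ifVersion;
            this.set = set;
        }
    }

    /** Resolves deltas in timestamp order; a delta whose condition is false becomes a no-op. */
    static Map<String, Object> resolve(Collection<Delta> deltas) {
        List<Delta> sorted = new ArrayList<>(deltas);
        sorted.sort((a, b) -> Long.compare(a.ts, b.ts));
        Map<String, Object> doc = new TreeMap<>();
        int version = 0;  // simplification: counts applied deltas only
        for (Delta d : sorted) {
            if (d.ifVersion != null && d.ifVersion != version) {
                continue;  // condition failed: skip this delta
            }
            doc.putAll(d.set);
            version++;
        }
        doc.put("~version", version);
        return doc;
    }
}
```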
SoR - Deltas
Compaction
For efficiency, older deltas are compacted and replaced by a single delta, a “compaction” record
Ensures intrinsics such as ~version and ~firstUpdateAt are maintained
Compaction happens opportunistically, whenever documents are read
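A minimal sketch of the compaction idea, assuming deltas that only set fields: folding the older deltas into one record that also carries ~version and ~firstUpdateAt must produce exactly the same resolved document as replaying every delta. The classes, and the caller's choice of which deltas count as “older” (for example, those already replicated everywhere), are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class Compaction {
    /** Hypothetical delta that only sets fields. */
    static final class Delta {
        final long ts;
        final Map<String, Object> set;
        Delta(long ts, Map<String, Object> set) { this.ts = ts; this.set = set; }
    }

    /** A compaction record: resolved content plus the intrinsics that must survive compaction. */
    static final class Compacted {
        final Map<String, Object> content;
        final long version;        // ~version: how many deltas have been folded in
        final Long firstUpdateAt;  // ~firstUpdateAt: timestamp of the oldest delta (null if none)
        Compacted(Map<String, Object> content, long version, Long firstUpdateAt) {
            this.content = content;
            this.version = version;
            this.firstUpdateAt = firstUpdateAt;
        }
    }

    /** Resolves an optional compaction record followed by the remaining deltas, in timestamp order. */
    static Map<String, Object> resolve(Compacted base, List<Delta> deltas) {
        Map<String, Object> doc = new TreeMap<>();
        long version = 0;
        Long first = null;
        if (base != null) {
            doc.putAll(base.content);
            version = base.version;
            first = base.firstUpdateAt;
        }
        List<Delta> sorted = new ArrayList<>(deltas);
        sorted.sort((a, b) -> Long.compare(a.ts, b.ts));
        for (Delta d : sorted) {
            doc.putAll(d.set);
            version++;
            if (first == null) first = d.ts;
        }
        doc.put("~version", version);
        if (first != null) doc.put("~firstUpdateAt", first);
        return doc;
    }

    /** Replaces base plus the given older deltas with a single compaction record. */
    static Compacted compact(Compacted base, List<Delta> older) {
        Map<String, Object> resolved = resolve(base, older);
        long version = (Long) resolved.remove("~version");
        Long first = (Long) resolved.remove("~firstUpdateAt");
        return new Compacted(resolved, version, first);
    }
}
```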
Databus
Allows applications to be notified of updates to the SoR
Requires creating a persistent subscription
Covering one table or multiple tables (selected by the values of table attributes)
The SoR “DVRs” updates for all subscriptions
Supports multiple concurrent writers and readers (poll and ack)
No ordering guarantees
To help, the SoR provides the ~version and ~signature intrinsics
Exposes a RESTful API
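Because the databus makes no ordering guarantees, a consumer can use ~version to discard stale or duplicate events. A minimal sketch, assuming each event exposes the document id and its ~version; the class name VersionGate is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical consumer-side filter for out-of-order databus events. */
class VersionGate {
    private final Map<String, Long> lastSeen = new HashMap<>();  // document id -> highest ~version processed

    /** Returns true only if this event is newer than anything already processed for the document. */
    boolean shouldProcess(String docId, long version) {
        Long prev = lastSeen.get(docId);
        if (prev != null && version <= prev) {
            return false;  // stale or duplicate event: ignore it
        }
        lastSeen.put(docId, version);
        return true;
    }
}
```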
Databus – Subscription Management
Subscribe to changes to a set of tables in the System of Record
Table filters use the same conditions as deltas
Follow events on all tables for which the condition evaluates to true
To subscribe to all tables in the SoR, omit the condition or pass ‘alwaysTrue’
Databus
Subscribe to multiple tables
Count events
Databus
Poll for events
Check for unclaimed, unacknowledged events
If events are not acknowledged, they are returned by a later poll once the claim period expires
Renew claims
Acknowledge claims
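The poll, claim, renew, and acknowledge cycle can be modeled with an in-memory map of claim expiry times. This hypothetical sketch (class ClaimQueue, with time passed in explicitly so it is easy to test) shows the key property: an event that is polled but never acknowledged becomes visible to a later poll once its claim expires, and renewing a claim pushes that expiry back.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model of databus poll/claim/renew/ack semantics. */
class ClaimQueue {
    private final LinkedHashMap<String, Long> events = new LinkedHashMap<>();  // event id -> claim expiry (0 = unclaimed)

    void publish(String eventId) {
        events.putIfAbsent(eventId, 0L);
    }

    /** Returns up to limit events that are unclaimed or whose claim expired, claiming them until now + ttl. */
    List<String> poll(long now, long ttl, int limit) {
        List<String> claimed = new ArrayList<>();
        for (Map.Entry<String, Long> e : events.entrySet()) {
            if (claimed.size() >= limit) break;
            if (e.getValue() <= now) {  // unclaimed, or the previous claim has expired
                e.setValue(now + ttl);
                claimed.add(e.getKey());
            }
        }
        return claimed;
    }

    /** Extends an outstanding claim, e.g. while slow processing is still running. */
    void renew(String eventId, long now, long ttl) {
        events.computeIfPresent(eventId, (id, expiry) -> now + ttl);
    }

    /** Acknowledged events are removed and never redelivered. */
    void ack(String eventId) {
        events.remove(eventId);
    }

    int size() {
        return events.size();
    }
}
```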
Blob Store
REST storage service for photos.
No single point of failure (data is lost only if 3 servers fail)
Sweet spot is blobs of a few MB, not GB (not designed for video)
Data replicates to all data centers,
except where replication is legally restricted
Why not Amazon S3?
Lower latency: reads and writes are always served out of the local data center
If you don't read across data centers, or you don't mind writing to buckets in multiple regions, use S3 or S3 + CloudFront
Highly Scalable Architecture
We serve traffic out of three AWS regions simultaneously
DNS Global Traffic Management sends user requests to the fastest region
Application services are all auto-scaled and self-healing
Our Cassandra-based EmoDB operates out of multiple Availability Zones, so an AZ failure doesn’t result in downtime
Cassandra replicates across all three regions
Emo/Polloi Contributors
Aaron Dixon
Ahaduzzaman Munna
Dave Barcelo
Fahd Siddiqui
John Roesler
Mark Brandt
Matt Bogner
Nate Bauernfiend
Shawn Smith
Steven Grotten
@Bazaarvoice
@BazaarvoiceDev
http://www.bazaarvoice.com/
http://blog.developer.bazaarvoice.com/
Learn more
More Related Content

Recently uploaded

Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 

Recently uploaded (20)

Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 

Featured

2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by HubspotMarius Sescu
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTExpeed Software
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsPixeldarts
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthThinkNow
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfmarketingartwork
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024Neil Kimberley
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)contently
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024Albert Qian
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsKurio // The Social Media Age(ncy)
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Search Engine Journal
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summarySpeakerHub
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next Tessa Mero
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentLily Ray
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best PracticesVit Horky
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project managementMindGenius
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...RachelPearson36
 

Featured (20)

2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPT
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage Engineerings
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
 
Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 

Cassandra at Bazaarvoice - EmoDB

  • 1. Confidential and Proprietary. © 2013 Bazaarvoice, Inc. EmoDB Store your feelings here www.bazaarvoice.com
  • 2. Confidential and Proprietary. © 2013 Bazaarvoice, Inc. SaaS serving software that collects and displays user generated content, crunches analytics, and extracts insights. Thousands of clients Hundreds of millions of pieces of content Hundreds of millions of unique visitors per month Tens of billions of pageviews per month Austin-based company founded in 2005 Austin San Francisco New YorkEngineering offices
  • 3. Confidential and Proprietary. © 2013 Bazaarvoice, Inc. Fahd Siddiqui Senior Software Engineer, Data Infrastructure Bazaarvoice linkedin.com/in/fahdsiddiqui fahd.siddiqui@bazaarvoice.com $ whoami
  • 4. Confidential and Proprietary. © 2013 Bazaarvoice, Inc. Global Monthly Unique Visitors 1B 1B 500M 1B 400M 200M 250M 450M 1B 600M
  • 5. Confidential and Proprietary. © 2013 Bazaarvoice, Inc. Monthly stats as of July 2013 16B 1B 480M 250M 118M 3M 4000 2500 Review impressions Pageviews (37k rps) Unique users Products in catalog Total reviews Monthly new reviews Customer implementations Servers 95 Engineers
  • 6. Confidential and Proprietary. © 2013 Bazaarvoice, Inc. Data Infrastructure
  • 7. Confidential and Proprietary. © 2013 Bazaarvoice, Inc. Data Infrastructure
  • 8. Confidential and Proprietary. © 2013 Bazaarvoice, Inc. Data Infrastructure
  • 9. Confidential and Proprietary. © 2013 Bazaarvoice, Inc. Data Infrastructure
  • 10. Confidential and Proprietary. © 2013 Bazaarvoice, Inc. Data Infrastructure
  • 11. Confidential and Proprietary. © 2013 Bazaarvoice, Inc. Goals for EmoDB
  • 12. Confidential and Proprietary. © 2013 Bazaarvoice, Inc. Goals for EmoDB Store in a flexible way about anything
  • 13. Confidential and Proprietary. © 2013 Bazaarvoice, Inc. Goals for EmoDB Store in a flexible way about anything Support “Universal Content Type” – store any content type without any re- architecture
  • 14. Confidential and Proprietary. © 2013 Bazaarvoice, Inc. Goals for EmoDB Store in a flexible way about anything Support “Universal Content Type” – store any content type without any re- architecture Watch for changes to data events
  • 15. Confidential and Proprietary. © 2013 Bazaarvoice, Inc. Goals for EmoDB Store in a flexible way about anything Support “Universal Content Type” – store any content type without any re- architecture Watch for changes to data events Exposes RESTful API
  • 16. Confidential and Proprietary. © 2013 Bazaarvoice, Inc. Goals for EmoDB Store in a flexible way about anything Support “Universal Content Type” – store any content type without any re- architecture Watch for changes to data events Exposes RESTful API Multi-master, multi-datacenter, fault tolerant, horizontal scale on r/w
  • 17. Confidential and Proprietary. © 2013 Bazaarvoice, Inc. EmoDB Overview System of Record Databus Queue Service Blob Store ….. Backed by Cassandra
  • 18. SoR - tables. What is an Emo table? It is a bucket of JSON documents. Tables are cheap to create, so you may create as many as you want, e.g., review:testcustomer. A table lets you fetch any particular row by ID, or do a complete table scan, which uses splits for parallel scanning.
  • 19. SoR - tables
    Create a table:
    $ curl -s -XPUT -H "Content-Type: application/json" \
        "http://localhost:8080/sor/1/_table/review:testcustomer?options=placement:'ugc_global:ugc'&audit=comment:'initial+provisioning',host:aws-tools-02" \
        --data-binary '{"type":"review","client":"TestCustomer"}' | jsonpp
    { "success": true }
    Store a document:
    $ curl -s -XPUT -H "Content-Type: application/json" \
        "http://localhost:8080/sor/1/review:testcustomer/demo1?audit=comment:'initial+submission',host:aws-submit-09" \
        --data-binary '{"author":"Bob","title":"Best Ever!","rating":5}' | jsonpp
    { "success": true }
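The "Store a document" call can also be issued from Java. This is a minimal sketch using the JDK 11 `java.net.http` client; the base URL, table, key, audit string, and body come from the curl example, while the `SorRequests` class and `storeDocument` helper are hypothetical names, not part of any EmoDB client library. `main` assumes an EmoDB server is listening on localhost:8080.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper that mirrors the "Store a document" curl call.
final class SorRequests {
    static final String BASE = "http://localhost:8080/sor/1"; // from the curl examples

    /** Builds PUT /sor/1/{table}/{key}?audit=... with a JSON body. */
    static HttpRequest storeDocument(String table, String key, String audit, String json) {
        URI uri = URI.create(BASE + "/" + table + "/" + key + "?audit=" + audit);
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpRequest req = storeDocument("review:testcustomer", "demo1",
                "comment:'initial+submission',host:aws-submit-09",
                "{\"author\":\"Bob\",\"title\":\"Best Ever!\",\"rating\":5}");
        // Requires a running server; prints the status code and the JSON response body.
        HttpResponse<String> resp = HttpClient.newHttpClient()
                .send(req, HttpResponse.BodyHandlers.ofString());
        System.out.println(resp.statusCode() + " " + resp.body());
    }
}
```

Building the request separately from sending it keeps the URL construction testable without a live server.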
  • 20. SoR – rows. A row is composed of deltas. Writers append deltas, and readers resolve the deltas to produce a resolved object. Compaction occurs once the data has been replicated to all data centers. Because of this, EmoDB is not a good fit for systems with a high update/create ratio.
  • 21. Data Access in EmoDB. There are 3 ways to read data out of EmoDB: lookup by primary key; bulk extract (scan); change feed (using the EmoDB databus). What's missing? WHERE, JOIN, GROUP BY, and anything other than primary-key lookup. Use another indexing mechanism (such as Elasticsearch or Solr) for complex queries.
  • 22. SoR – Data storage challenges. Problem 1: we need a way to cheaply create tens of thousands of “tables”. As of Cassandra 1.1, each column family (CF) needs at least 1 MB of memory on every node, which is far too much overhead to dedicate a CF to each user-defined table. Hint: we use only one column family to store all tables.
  • 23. SoR – Data storage challenges. Problem 2 (once Problem 1 is solved): we need to scan an entire table so that Polloi (Elasticsearch) can index it. This requires a way to split tables into shards that support sequential scans, and the shards for each table should be fully distributed over the Cassandra cluster.
  • 24. SoR – Data storage. Solution to both Problem 1 and Problem 2: the row key byte buffer begins with a 9-byte “table prefix”. Byte 0: 8-bit shard identifier. Bytes 1–8: 64-bit table UUID. Followed by N bytes: the UTF-8-encoded content key. The shard identifier is the bottom 8 bits of the 32-bit Murmur3 hash of (table UUID | content key).
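The key layout above can be sketched in Java. This is an illustration under stated assumptions, not EmoDB's code: the hash input "(table UUID | content key)" is assumed to be the 8 big-endian UUID bytes followed by the UTF-8 content key, and the Murmur3 seed is assumed to be 0. `RowKeys`, `shardId`, and `rowKey` are hypothetical names; `murmur3_32` is a plain implementation of standard MurmurHash3 x86_32.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Sketch of the row-key layout: [shard:1 byte][table UUID:8 bytes][content key:N bytes]. */
final class RowKeys {
    private RowKeys() {}

    /** Standard MurmurHash3 x86_32. */
    static int murmur3_32(byte[] data, int seed) {
        final int c1 = 0xcc9e2d51, c2 = 0x1b873593;
        int h1 = seed;
        int nblocks = data.length / 4;
        for (int i = 0; i < nblocks; i++) { // 4-byte little-endian blocks
            int k1 = (data[4 * i] & 0xff) | (data[4 * i + 1] & 0xff) << 8
                    | (data[4 * i + 2] & 0xff) << 16 | (data[4 * i + 3] & 0xff) << 24;
            k1 *= c1; k1 = Integer.rotateLeft(k1, 15); k1 *= c2;
            h1 ^= k1; h1 = Integer.rotateLeft(h1, 13); h1 = h1 * 5 + 0xe6546b64;
        }
        int k = 0;
        int tail = nblocks * 4;
        switch (data.length & 3) { // remaining 1-3 bytes
            case 3: k ^= (data[tail + 2] & 0xff) << 16; // fall through
            case 2: k ^= (data[tail + 1] & 0xff) << 8;  // fall through
            case 1:
                k ^= data[tail] & 0xff;
                k *= c1; k = Integer.rotateLeft(k, 15); k *= c2; h1 ^= k;
                break;
            default:
                break;
        }
        h1 ^= data.length; // finalization mix
        h1 ^= h1 >>> 16; h1 *= 0x85ebca6b; h1 ^= h1 >>> 13; h1 *= 0xc2b2ae35; h1 ^= h1 >>> 16;
        return h1;
    }

    /** Shard id = bottom 8 bits of Murmur3(UUID bytes ++ UTF-8 key); input layout and seed are assumptions. */
    static int shardId(long tableUuid, String contentKey) {
        byte[] key = contentKey.getBytes(StandardCharsets.UTF_8);
        byte[] input = ByteBuffer.allocate(8 + key.length).putLong(tableUuid).put(key).array();
        return murmur3_32(input, 0) & 0xff;
    }

    /** 9-byte table prefix followed by the content key. */
    static byte[] rowKey(long tableUuid, String contentKey) {
        byte[] key = contentKey.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(9 + key.length)
                .put((byte) shardId(tableUuid, contentKey))
                .putLong(tableUuid)
                .put(key)
                .array();
    }
}
```

Because the shard byte comes first, rows of one table land in up to 256 different regions of a byte-ordered ring, while rows within a shard stay contiguous.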
  • 25. SoR – Scanning Table. The shard identifier spreads the content of a given table across the cluster to avoid hotspots (we use ByteOrderedPartitioner). All content for a table can be fetched in parallel using 2^8 = 256 range queries. There you have it: a single CF offering range scans over segments (tables) that are fully distributed across the cluster!
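The 256 range queries can be derived directly from the 9-byte prefix. A minimal sketch, assuming the prefix layout from slide 24: for each shard, every row key of the table lies in the half-open byte range [prefix, prefix + 1), where "+ 1" treats the prefix as an unsigned big-endian number. `TableRanges` and its methods are hypothetical names; the sketch only computes the ranges and does not call any Cassandra API.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Computes one key range per shard for a table (hypothetical helper). */
final class TableRanges {
    private TableRanges() {}

    /** Half-open byte range [start, end); end == null means "to the end of the ring". */
    static final class Range {
        final byte[] start;
        final byte[] end;
        Range(byte[] start, byte[] end) { this.start = start; this.end = end; }
    }

    /** 9-byte table prefix: shard byte, then the 64-bit table UUID. */
    static byte[] prefix(int shard, long tableUuid) {
        return ByteBuffer.allocate(9).put((byte) shard).putLong(tableUuid).array();
    }

    /** Adds one to the bytes as an unsigned big-endian number; returns null on overflow. */
    static byte[] increment(byte[] b) {
        byte[] r = Arrays.copyOf(b, b.length);
        for (int i = r.length - 1; i >= 0; i--) {
            if (++r[i] != 0) return r; // no carry needed
        }
        return null;
    }

    /** One range per shard id; each could be scanned by a separate worker. */
    static List<Range> rangesFor(long tableUuid) {
        List<Range> ranges = new ArrayList<>(256);
        for (int shard = 0; shard < 256; shard++) {
            byte[] start = prefix(shard, tableUuid);
            ranges.add(new Range(start, increment(start)));
        }
        return ranges;
    }
}
```

With a byte-ordered partitioner each range is contiguous on the ring, so a scan of one range reads only that table's rows for one shard.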
  • 26. SoR – Scanning Table. Table UUIDs also solved another problem for us: multiple tables can be stored in the same CF. Because each table has its own UUID, we can DROP a table and CREATE a new one with the same name. A dropped table is deleted lazily, which is especially important in an eventually consistent world.
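Why UUIDs make DROP/CREATE safe can be shown with a tiny catalog. This is a hypothetical in-memory model, not EmoDB's table metadata store: names map to random 64-bit ids, DROP only removes the name mapping and queues the old id for a later purge, and a re-created table gets a fresh id, so stale rows keyed by the old id are never visible under the new table.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Random;
import java.util.Set;

/** Hypothetical table catalog: name -> 64-bit UUID, with lazy deletion of dropped tables. */
final class TableCatalog {
    private final Map<String, Long> liveTables = new HashMap<>();
    private final Set<Long> droppedUuids = new HashSet<>(); // rows still on disk, awaiting purge
    private final Random random;

    TableCatalog(Random random) { this.random = random; }

    /** Assigns a fresh UUID that no live or dropped table has used. */
    long create(String name) {
        if (liveTables.containsKey(name)) throw new IllegalStateException("exists: " + name);
        long uuid;
        do {
            uuid = random.nextLong();
        } while (droppedUuids.contains(uuid) || liveTables.containsValue(uuid));
        liveTables.put(name, uuid);
        return uuid;
    }

    /** Unlinks the name immediately; the old rows are only scheduled for deletion. */
    void drop(String name) {
        Long uuid = liveTables.remove(name);
        if (uuid == null) throw new IllegalStateException("no such table: " + name);
        droppedUuids.add(uuid);
    }

    Long uuidOf(String name) { return liveTables.get(name); }

    Set<Long> pendingPurge() { return droppedUuids; }
}
```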
  • 27. SoR – Parallel Scan
  • 28. SoR – Parallel Scan. Call the getSplits() method to get a list of split identifiers, then scan the data in each split in parallel by calling the getSplit() method. Java:
    Collection<String> getSplits(String table, int desiredRecordsPerSplit);
    Iterator<Map<String, Object>> getSplit(String table, String split, @Nullable String fromKeyExclusive, long limit, ReadConsistency consistency);
  • 29. SoR – Parallel Scan: Java code sample
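The slide's code sample did not survive extraction, so here is a minimal sketch of the pattern it describes. `SplitSource` is a hypothetical local interface with the two signatures from slide 28, and `ReadConsistency` is a local stand-in enum, not the EmoDB class. The split size of 10,000 is an arbitrary illustrative value, and each split is read in a single `getSplit` call rather than paged with `fromKeyExclusive`.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

/** Local stand-in for a read-consistency setting (assumption). */
enum ReadConsistency { STRONG, WEAK }

/** Hypothetical interface carrying the two SoR methods shown on the slide. */
interface SplitSource {
    Collection<String> getSplits(String table, int desiredRecordsPerSplit);

    // fromKeyExclusive may be null to start at the beginning of the split
    Iterator<Map<String, Object>> getSplit(String table, String split, String fromKeyExclusive,
                                           long limit, ReadConsistency consistency);
}

final class ParallelScan {
    private ParallelScan() {}

    /** Scans every split on a fixed thread pool; `sink` must be thread-safe. Returns the document count. */
    static long scan(SplitSource sor, String table, int threads, Consumer<Map<String, Object>> sink) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (String split : sor.getSplits(table, 10_000)) {
                Callable<Long> task = () -> {
                    long count = 0;
                    // Whole split in one call: no resume key, effectively no limit.
                    Iterator<Map<String, Object>> docs =
                            sor.getSplit(table, split, null, Long.MAX_VALUE, ReadConsistency.STRONG);
                    while (docs.hasNext()) {
                        sink.accept(docs.next());
                        count++;
                    }
                    return count;
                };
                futures.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> f : futures) total += f.get(); // waits for every split
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

A production scanner would probably page through large splits by passing the last key it saw as `fromKeyExclusive` with a bounded `limit`, so a failed worker can resume instead of restarting the split.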
  • 30. SoR - Deltas. Documents are stored as a sequence of deltas, and readers evaluate the deltas in order to produce the document. Documents are created, updated, and deleted by writing deltas. Consistency is weak: there is no document-level locking.
  • 31. SoR - Deltas
  • 32. SoR - Deltas. Normally, t2 and t3 would cause a replication conflict. But because each delta specifies only the fields it modifies, the deltas merge cleanly and produce the desired result. No synchronous cross-data-center communication is required for concurrent modification.
  • 33. SoR - Deltas. Deltas use a recursive, pattern-matching approach. Operations are available for setting a value, deleting a value, and updating the value for a key in a map. There is no operation for modifying a list; model a list as a map instead. A time UUID is a good candidate for the list keys.
  • 34. SoR - Deltas: Literal (“smash” operation), Delete, Map
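The three delta kinds and the merge behavior from slide 32 can be modeled in a few lines. This is a simplified sketch, not EmoDB's delta implementation or syntax: a `Delta` here is just a function from the previous value (null meaning absent) to the next value, and `Deltas` is a hypothetical helper. The usage below shows two concurrent map deltas, each touching a different field, resolving to the same document in either order.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Simplified delta: transforms the previous value (null = absent) into the next one. */
interface Delta {
    Object apply(Object current);
}

final class Deltas {
    private Deltas() {}

    /** Literal ("smash"): replace whatever was there. */
    static Delta literal(Object value) { return current -> value; }

    /** Delete: remove the value entirely. */
    static Delta delete() { return current -> null; }

    /** Map: apply nested deltas to individual keys and leave every other key untouched. */
    static Delta map(Map<String, Delta> entries) {
        return current -> {
            Map<String, Object> result = new LinkedHashMap<>();
            if (current instanceof Map) {
                for (Map.Entry<?, ?> e : ((Map<?, ?>) current).entrySet()) {
                    result.put(String.valueOf(e.getKey()), e.getValue());
                }
            }
            for (Map.Entry<String, Delta> e : entries.entrySet()) {
                Object updated = e.getValue().apply(result.get(e.getKey())); // recurse into the key
                if (updated == null) result.remove(e.getKey());
                else result.put(e.getKey(), updated);
            }
            return result;
        };
    }

    /** Readers resolve a document by applying its deltas in order. */
    static Object resolve(List<Delta> deltasInOrder) {
        Object doc = null;
        for (Delta d : deltasInOrder) doc = d.apply(doc);
        return doc;
    }
}
```

For example, with t1 = literal({author: Bob, title: Best Ever!, rating: 5}), t2 = map({rating: literal(4)}) and t3 = map({title: literal("Pretty good")}), resolving [t1, t2, t3] and [t1, t3, t2] gives the same document, because neither map delta overwrites the other's field.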
  • 35. SoR - Deltas. Conditional: perform a delta only if a condition holds. Designed to help resolve the most common concurrent-write conflict situations. Simple and reliable. E.g., mark a review “approved” only if moderation hasn't begun.
  • 36. SoR – Deltas. Other types of conditions: Equal, Intrinsic, Is, Map, And, Or, Not, Constant. E.g., {..,"type":or("product","category"),"client":"TestCustomer"}
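Conditions and conditional deltas compose naturally as predicates. The following is a sketch built on `java.util.function`, not EmoDB's condition language or parser: it covers Equal, Or, And, Not, Constant (`alwaysTrue`), and Map, and leaves out Intrinsic and Is. The status values "submitted" and "in-moderation" used in the usage check are hypothetical stand-ins for "moderation hasn't begun".

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

/** Hypothetical condition helpers modeled as predicates over a value. */
final class Conditions {
    private Conditions() {}

    /** Equal: the value equals a literal. */
    static Predicate<Object> equal(Object expected) { return value -> Objects.equals(value, expected); }

    /** Or: at least one sub-condition holds. */
    @SafeVarargs
    static Predicate<Object> or(Predicate<Object>... conditions) {
        return value -> {
            for (Predicate<Object> c : conditions) {
                if (c.test(value)) return true;
            }
            return false;
        };
    }

    static Predicate<Object> and(Predicate<Object> a, Predicate<Object> b) { return a.and(b); }
    static Predicate<Object> not(Predicate<Object> c) { return c.negate(); }
    static Predicate<Object> alwaysTrue() { return value -> true; } // Constant

    /** Map: the value is a map and each listed key satisfies its own condition. */
    static Predicate<Object> map(Map<String, Predicate<Object>> perKey) {
        return value -> {
            if (!(value instanceof Map)) return false;
            Map<?, ?> doc = (Map<?, ?>) value;
            for (Map.Entry<String, Predicate<Object>> e : perKey.entrySet()) {
                if (!e.getValue().test(doc.get(e.getKey()))) return false;
            }
            return true;
        };
    }

    /** Delta that sets one key and keeps the rest of the document. */
    static UnaryOperator<Object> setKey(String key, Object newValue) {
        return current -> {
            Map<Object, Object> copy = new LinkedHashMap<>();
            if (current instanceof Map) copy.putAll((Map<?, ?>) current);
            copy.put(key, newValue);
            return copy;
        };
    }

    /** Conditional delta: apply `delta` only if `condition` holds; otherwise keep the document as-is. */
    static UnaryOperator<Object> ifThen(Predicate<Object> condition, UnaryOperator<Object> delta) {
        return current -> condition.test(current) ? delta.apply(current) : current;
    }
}
```

Approving a review then becomes `ifThen(map({status: equal("submitted")}), setKey("status", "approved"))`: if a moderator has already changed the status, the delta is a no-op instead of overwriting their work.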
  • 37. SoR - Deltas. Read-Modify-Write: read the original state, then compute the new version. Either the write succeeds, or the write eventually conflicts, in which case the databus fires an event so the application can detect the conflict and retry the write.
  • 38. SoR - Deltas (diagram): Data center A: T1, then conditional delta T3. Data center B: T1, T2.
  • 39. SoR - Deltas. Compaction: for efficiency, older deltas are compacted and replaced by a single delta, a “compaction” record. Compaction ensures that intrinsics such as ~version and ~firstUpdateAt are maintained. It happens opportunistically, whenever documents are read.
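A compaction record can be thought of as "the resolved document so far, plus the intrinsics needed to keep counting". This sketch assumes shallow top-level patches (a null value deletes a key) and assumes ~version counts the deltas applied; `Compaction`, `Patch`, and `Record` are hypothetical names. The key property it illustrates is that compacting older deltas and then applying newer ones yields the same content and intrinsics as applying all of them.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of folding deltas into a single compaction record. */
final class Compaction {
    private Compaction() {}

    /** One write: a timestamp plus a shallow patch (a null value deletes that key). */
    static final class Patch {
        final long timestamp;
        final Map<String, Object> changes;
        Patch(long timestamp, Map<String, Object> changes) { this.timestamp = timestamp; this.changes = changes; }
    }

    /** Resolved content of the folded deltas plus the intrinsics they imply. */
    static final class Record {
        final Map<String, Object> content;
        final long version;       // assumed: number of deltas folded in (~version)
        final long firstUpdateAt; // earliest delta timestamp (~firstUpdateAt)
        final long lastUpdateAt;  // latest delta timestamp
        Record(Map<String, Object> content, long version, long firstUpdateAt, long lastUpdateAt) {
            this.content = content; this.version = version;
            this.firstUpdateAt = firstUpdateAt; this.lastUpdateAt = lastUpdateAt;
        }
    }

    /** Folds `patches` (in timestamp order) on top of an optional earlier compaction record. */
    static Record compact(Record base, List<Patch> patches) {
        Map<String, Object> content = new LinkedHashMap<>();
        long version = 0, first = Long.MAX_VALUE, last = Long.MIN_VALUE;
        if (base != null) {
            content.putAll(base.content);
            version = base.version;
            first = base.firstUpdateAt;
            last = base.lastUpdateAt;
        }
        for (Patch p : patches) {
            for (Map.Entry<String, Object> e : p.changes.entrySet()) {
                if (e.getValue() == null) content.remove(e.getKey());
                else content.put(e.getKey(), e.getValue());
            }
            version++;
            first = Math.min(first, p.timestamp);
            last = Math.max(last, p.timestamp);
        }
        return new Record(content, version, first, last);
    }
}
```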
  • 40. Databus. Allows applications to get notified of updates to the SoR. You must create a persistent subscription to a table, or to multiple tables selected by the value of their attributes. The SoR records (“DVRs”) updates for all subscriptions. Supports multiple concurrent writers and readers (poll and ack). There are no ordering guarantees; to help, the SoR provides ~version and ~signature. Exposes a RESTful API.
  • 41. Databus – Subscription Management. Subscribe to changes to a set of tables in the System of Record. Table filters use the same conditions as deltas: you follow events on every table for which the condition evaluates to true. To subscribe to all tables in the SoR, omit the condition or pass ‘alwaysTrue’.
  • 42. Databus: subscribe to multiple tables; count events.
  • 43. Databus. Poll for events: check for unclaimed, unacknowledged events. Events that are not acknowledged are returned again by a later poll once the claim period expires. Renew claims. Acknowledge claims.
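The poll/claim/renew/ack cycle can be illustrated with a single in-memory subscription. This is a sketch of the semantics described on the slide, not EmoDB's databus implementation or REST API: `Subscription` and its methods are hypothetical, time is passed in explicitly (milliseconds) so the behavior is deterministic, and claim ownership is not tracked.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model of one databus subscription. */
final class Subscription {
    private final Map<String, String> events = new LinkedHashMap<>(); // eventId -> payload, not yet acked
    private final Map<String, Long> claimedUntil = new HashMap<>();   // eventId -> claim expiry (ms)
    private long nextId = 0;

    String publish(String payload) {
        String id = "evt-" + (nextId++);
        events.put(id, payload);
        return id;
    }

    /** Returns up to `limit` unacked events whose claim is absent or expired, and claims them. */
    List<String> poll(long nowMs, long claimTtlMs, int limit) {
        List<String> claimed = new ArrayList<>();
        for (String id : events.keySet()) {
            if (claimed.size() == limit) break;
            Long until = claimedUntil.get(id);
            if (until == null || until <= nowMs) {
                claimedUntil.put(id, nowMs + claimTtlMs);
                claimed.add(id);
            }
        }
        return claimed;
    }

    /** Extends claims, e.g. while slow processing is still running (ownership not checked here). */
    void renew(Collection<String> ids, long nowMs, long claimTtlMs) {
        for (String id : ids) {
            if (events.containsKey(id)) claimedUntil.put(id, nowMs + claimTtlMs);
        }
    }

    /** Acknowledged events are removed for good. */
    void ack(Collection<String> ids) {
        for (String id : ids) {
            events.remove(id);
            claimedUntil.remove(id);
        }
    }

    /** Number of events still awaiting acknowledgement, claimed or not. */
    int size() { return events.size(); }
}
```

A consumer that crashes after polling simply never acks; once its claims expire, the same events come back on another reader's poll, which is why processing should be idempotent.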
  • 44. Blob Store. A REST storage service for photos. No single point of failure (data is lost only after 3 servers fail). The sweet spot is blobs of a few MB, not GB (not designed for video). Data replicates to all data centers, except where replication is legally restricted. Why not Amazon S3? Lower latency: reads and writes are always served out of the local data center. If you don't read across data centers, or you don't mind writing to buckets in multiple regions, use S3 or S3 + CloudFront.
  • 45. Highly Scalable Architecture. We serve traffic out of three AWS regions simultaneously. DNS Global Traffic Management sends user requests to the fastest region. Application services are all auto-scaled and self-healing. Our Cassandra-based EmoDB operates across multiple Availability Zones, so an AZ failure doesn't result in downtime. Cassandra replicates across all three regions.
  • 46. Emo/Polloi Contributors: Aaron Dixon, Ahaduzzaman Munna, Dave Barcelo, Fahd Siddiqui, John Roesler, Mark Brandt, Matt Bogner, Nate Bauernfiend, Shawn Smith, Steven Grotten
  • 47. Learn more: @Bazaarvoice, @BazaarvoiceDev, http://www.bazaarvoice.com/, http://blog.developer.bazaarvoice.com/