Letters from the Trenches: Lessons Learned Taking MongoDB to Production
October 17, 2013

Rick Warren
rick.warren@eharmony.com
Traditional Internet Dating Service
Unidirectional User-Defined Criteria
eHarmony Matching
Bidirectional User-Defined Criteria
eHarmony Matching: 3 Parts
1. Bidirectional User-Defined Criteria
2. Research-Based Compatibility Models
3. Machine-Learned Affinity Models

Photo credits:
Magnifying glass: andercismo @ http://www.flickr.com/photos/andercismo/
Machine learning: University of Maryland Press Releases @ http://www.flickr.com/photos/umdnews/
Application: Find Potential Matches
As fast as possible:
1. Find people who meet each other’s preferences (Bidirectional User-Defined Criteria)
2. Discard combos that violate Compatibility Models
Application: Find Potential Matches
• User attributes in MongoDB
– Replicated
– Sharded
• Data access pattern:
– Read-heavy
– Complex queries
• Java application
[Figure label: 1. Bidirectional User-Defined Criteria]
Application: Find Potential Matches
• In full production for more than 6 months
– Following several months of limited production
– Following several months of intensive dev + testing
• No production outages
• MongoDB is no longer the thing we worry about most

Lesson: Provision for Success
✓ Fit all data & indexes in memory
– MongoDB storage implemented using memory-mapped files
– Beware under-provisioned VMs
✓ Minimize field names to keep data as small as possible
– “Schema-less records” == “schema repeated millions of times”
– Morphia Java library can help with mapping
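
To make the "schema repeated millions of times" point concrete, here is a minimal sketch that computes the encoded size of one record under long and short field names. It follows the BSON layout for two value types only (UTF-8 strings and 32-bit integers); the field names and values are made up for illustration, and the `_id` field that MongoDB adds to every document is left out.

```python
def bson_size(doc):
    """Encoded BSON size in bytes for a flat dict of str and int32 values."""
    size = 4 + 1  # int32 document length prefix + trailing 0x00 terminator
    for name, value in doc.items():
        # Every element stores a type byte and the field name as a C string.
        size += 1 + len(name.encode("utf-8")) + 1
        if type(value) is str:
            # int32 length + UTF-8 bytes + trailing 0x00
            size += 4 + len(value.encode("utf-8")) + 1
        elif type(value) is int:
            size += 4  # assumes the value fits in a 32-bit integer
        else:
            raise TypeError(f"unsupported value type for field {name!r}")
    return size


# Hypothetical records: same values, different field names.
VERBOSE = {"minimumPreferredAge": 28, "maximumPreferredAge": 40,
           "relationshipStatus": "single"}
COMPACT = {"a": 28, "b": 40, "r": "single"}

if __name__ == "__main__":
    saved = bson_size(VERBOSE) - bson_size(COMPACT)
    print(f"verbose={bson_size(VERBOSE)} B, compact={bson_size(COMPACT)} B")
    print(f"saved per record: {saved} B, per million records: {saved * 1_000_000} B")
```

With these invented names the record shrinks from 86 to 33 bytes, so the names alone account for most of the stored size. A mapping layer such as Morphia lets the application keep readable property names while storing the short ones.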
Lesson: Provision for Success
Scale write ops & data volume by adding shards
Scale read ops by adding secondaries
[Diagram: two shards, each a replica set (RS) with one primary and several secondaries; ellipses indicate further shards and further secondaries]
Lesson: Be Ready to Tinker
• Many processes:
– mongod on each node, primary or secondary
– 2 MMS agents
– Plus, if sharding: mongos for each app instance, and 3 config servers
• …Each configured separately & differently
– Configuration file
– Manual commands to set up
• Less likely to have DBA support
– …and relational Best Practices may not transfer

✓ Use Puppet, Chef, or similar
– Helps with config files, command-line arguments
– Insufficient for adding secondaries, configuring indexes, etc.
✓ If scripting, use real client driver, not mongo shell
– Doesn’t handle output or errors consistently
– Can’t wait in JavaScript
✓ Train your DB/Ops team(s)
– And expect to do more yourself
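
One thing a setup script in a general-purpose language makes straightforward is waiting for a condition with an explicit timeout and a clear failure. The sketch below is a generic polling helper using only the Python standard library. The `check` callable is a hypothetical probe (for example, a function that asks a client driver whether a newly added secondary has caught up); no MongoDB API is used or implied here.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition does not become true before the deadline."""


def wait_until(check, timeout_s=30.0, interval_s=0.5,
               sleep=time.sleep, clock=time.monotonic):
    """Poll check() until it returns True; return the number of attempts.

    sleep and clock are injectable so the helper can be tested without
    real delays.
    """
    deadline = clock() + timeout_s
    attempts = 0
    while True:
        attempts += 1
        if check():
            return attempts
        if clock() >= deadline:
            raise WaitTimeout(f"condition not met after {attempts} attempt(s)")
        sleep(interval_s)
```

A script would call something like `wait_until(secondary_is_ready, timeout_s=600)` between steps, where `secondary_is_ready` is whatever probe the team writes, and treat `WaitTimeout` as a hard failure instead of continuing blindly.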
Lesson: Shadow Mode Is Your Friend
✓ Test with real production data, conditions, and queries
✓ Measure everything (MMS is a good start, but insufficient)
✓ Kill mongod instances to verify resiliency
[Diagram: “Real Events & Requests” feed both the “Real Application” and the “Shadow” Application; one path is marked with an X]
Primary school enrollment, Armenia: http://data.worldbank.org/country/armenia
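
The shadow-mode idea can be sketched as a small dispatcher: every real request is served by the production path and also mirrored to the shadow path, whose result is thrown away and whose failures are recorded but never surfaced. This is an illustrative sketch, not the deck's actual implementation; the handler callables and the `metrics` list are hypothetical stand-ins, and a real system would more likely mirror requests asynchronously.

```python
import time


def handle_with_shadow(request, real_handler, shadow_handler, metrics):
    """Serve from the real path and mirror the same request to the shadow path.

    Appends (path, elapsed_seconds, error_or_None) tuples to metrics.
    """
    start = time.perf_counter()
    response = real_handler(request)
    metrics.append(("real", time.perf_counter() - start, None))

    start = time.perf_counter()
    try:
        shadow_handler(request)  # result is deliberately discarded
        error = None
    except Exception as exc:  # shadow failures must never reach the user
        error = repr(exc)
    metrics.append(("shadow", time.perf_counter() - start, error))

    return response
```

Comparing the "real" and "shadow" entries in `metrics` gives latency and error rates under genuine production traffic before the shadow system serves anyone.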
Lesson: Be Ready to Restore Your Data
• Schemas will change
• Shard key(s) will change
– More on this later…
• You’ll experience MongoDB bugs

✓ Maintain 2nd copy in another format
– Backing source of truth?
– Backup in standard format?
– Second cluster with different version of MongoDB?
✓ Increment DB name with each reload
✓ Automate reload process, and use it

Image credit: http://tutorialphotoshopcs-putradom.blogspot.com/2012/11/create-dramatic-meteor-and-burning-city.html
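
One way to read "increment DB name with each reload" is that each reload goes into a freshly named database, so the previous copy stays available until the application is switched over. The sketch below derives the next name from the current one. The `_v<N>` naming scheme and the example names are assumptions made for illustration, not something the slides specify.

```python
import re


def next_db_name(current):
    """Return the database name to use for the next reload.

    A trailing number is incremented; a name without one gets "_v1" appended.
    """
    match = re.fullmatch(r"(.*?)(\d+)", current)
    if match is None:
        return f"{current}_v1"
    prefix, number = match.groups()
    return f"{prefix}{int(number) + 1}"
```

For example, `next_db_name("matching_v7")` gives `"matching_v8"`, and an automated reload job could call it once per run before loading the data.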
Lesson: Pick a Good Shard Key
1. Distribute Data Volume Evenly
– This is what auto-balancing does for you.
2. Multiply Query Performance
– Isolate queries to 1 shard to multiply read capacity by # of shards.
3. Distribute Workload Evenly
– Conflicts with above!
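
The second goal can be illustrated with a toy router: a query that includes the shard key goes to exactly one shard, while a query without it has to be sent to every shard. This is a deliberately simplified model (equality matches only, one key value per lookup entry); the `region` key, the shard names, and the queries are all invented for the example.

```python
def shards_touched(query, shard_key, key_to_shard):
    """Return the set of shards a router must contact for an equality query."""
    if shard_key in query:
        # Targeted query: the shard key value pins the query to one shard.
        return {key_to_shard[query[shard_key]]}
    # No shard key in the query: every shard has to be asked.
    return set(key_to_shard.values())


def total_shard_visits(queries, shard_key, key_to_shard):
    """Total work across the cluster, counted as shard visits."""
    return sum(len(shards_touched(q, shard_key, key_to_shard)) for q in queries)


# Hypothetical cluster layout and workloads.
KEY_TO_SHARD = {"north": "shard1", "south": "shard2", "east": "shard3"}
WITH_KEY = [{"region": r, "age": 30} for r in ("north", "south", "east")]
WITHOUT_KEY = [{"age": 30}] * 3

if __name__ == "__main__":
    print(total_shard_visits(WITH_KEY, "region", KEY_TO_SHARD))
    print(total_shard_visits(WITHOUT_KEY, "region", KEY_TO_SHARD))
```

In this model the three targeted queries cost three shard visits in total, while the three untargeted ones cost nine. The same model also hints at the conflict with the third goal: if most queries ask for one popular key value, the targeted queries all land on the same shard.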
Lesson: Pick a Good Shard Key
[Diagram: mongos in front of Shard 1 and Shard 2, shown alongside the same three goals as the previous slide]
Image credits:
Jessica Rabbit: http://disney.wikia.com/wiki/Jessica_Rabbit
Steve Urkel: http://celebratingtvandfilmgeeks.wordpress.com/2010/04/25/steve-urkel-the-
Lesson: Pick a Good Shard Key
DO These Things
✓ Use fields appearing in every query
✓ Choose combo that finely partitions data
✓ Measure relative load across shards
– Consider adding secondaries to loaded shard(s) ONLY

BEWARE These Things
• Include serial numbers (or similar)
• Hash fields when reads might be a problem
• Mutable fields in shard key (remove and add)
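
Measuring relative load across shards can be as simple as dividing each shard's operation count by the mean and flagging the outliers. In the sketch below, the counts, the shard names, and the 1.5x threshold are hypothetical; where the counts come from (monitoring, server statistics, application logs) is left open.

```python
def relative_load(ops_by_shard):
    """Map each shard to its load relative to the mean (1.0 == average)."""
    mean = sum(ops_by_shard.values()) / len(ops_by_shard)
    if mean == 0:
        return {shard: 0.0 for shard in ops_by_shard}
    return {shard: ops / mean for shard, ops in ops_by_shard.items()}


def hot_shards(ops_by_shard, threshold=1.5):
    """Return shards whose load is at least threshold times the mean."""
    ratios = relative_load(ops_by_shard)
    return sorted(shard for shard, ratio in ratios.items() if ratio >= threshold)
```

Given made-up counts such as `{"shard1": 900, "shard2": 300, "shard3": 300}`, only `shard1` is flagged, which is the kind of shard the slide suggests giving extra secondaries to.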
Summary
1. Provision for Success
2. Be Ready to Tinker
3. Shadow Mode Is Your Friend
4. Be Ready to Restore Your Data
5. Pick a Good Shard Key
We’re Hiring

http://www.eharmony.com/about/careers

rick.warren@eharmony.com

Editor’s notes

  1. Specifically, we’ll be talking about 5 lessons. It should take about 30 minutes.
  2. At some point, you’ll realize the data in your cluster isn’t what you need, or isn’t organized how you need it, and you’ll have to reconstruct it. In the first two cases, you could dump and reload a single cluster. But what about production changes in the meantime?
  3. The idea is for the breakdown of data across shards to reflect the same natural divisions of data you’re likely to query against.