Finding Love With MongoDB

  { name    : "Oliver Dodd",
    email   : "oliver.dodd@gmail.com",
    twitter : "01001111"
  }
Traditional Search


  Unidirectional User Defined Criteria
eHarmony Matching


  Bidirectional User Defined Criteria
Matching Overview




  Potential Match Finder
  Machine Learned Matching
  Match Delivery




Photo Credits

  Magnifying glass: andercismo @ http://www.flickr.com/photos/andercismo/
  Machine learning: University of Maryland Press Releases @ http://www.flickr.com/photos/umdnews/
  Mailman: http://www.flickr.com/photos/noizephotography/
Potential Match Generator


  •  Find candidates that meet user’s
     preferences.

  •  Ensure user doesn’t violate each
     candidate’s preferences.

  •  Discard pairings that violate Compatibility
     Models.

  •  Do this as fast as possible.
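
The first two bullets above describe a two-way check. A minimal sketch of that idea in plain JavaScript follows; the profile shape (`attrs`, `prefs`) and the function names are assumptions made for illustration, not eHarmony's data model, and the Compatibility Models step is left out.

```javascript
// Hypothetical profile shape: `attrs` holds a person's own attributes,
// `prefs` maps an attribute name to the list of values that person accepts.
function satisfies(prefs, attrs) {
  return Object.entries(prefs).every(([key, accepted]) =>
    accepted.includes(attrs[key])
  );
}

// A pairing survives only if each side meets the other's preferences.
function isMutualMatch(user, candidate) {
  return (
    satisfies(user.prefs, candidate.attrs) &&
    satisfies(candidate.prefs, user.attrs)
  );
}

function findPotentialMatches(user, candidates) {
  return candidates.filter((c) => isMutualMatch(user, c));
}

const user = {
  attrs: { smokes: false, city: "LA" },
  prefs: { smokes: [false] },
};
const pool = [
  { id: 1, attrs: { smokes: false, city: "LA" }, prefs: { city: ["LA", "SF"] } },
  { id: 2, attrs: { smokes: true, city: "LA" }, prefs: { city: ["LA"] } },
  { id: 3, attrs: { smokes: false, city: "NY" }, prefs: { city: ["NY"] } },
];

console.log(findPotentialMatches(user, pool).map((c) => c.id)); // [ 1 ]
```

Candidate 2 fails the user's preference and candidate 3's own preference rejects the user, so only candidate 1 remains.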
Legacy “Potential Match Generator”
Redesign


  Requirements for a new data store

     –  Centralized
     –  Scalable
     –  Automagical
     –  Easy to maintain
     –  Fast, multi-attribute searches
New “Potential Match Generator”
Why MongoDB?


  •  Scalability

  •  Built in sharding and replication

  •  Autobalancing

  •  Rich, complex queries
Why MongoDB?




               MongoDB is web scale.
Wins


  •  Deploy new instances on demand.
       –  No need to load a local database.


  •  Adding replicas is easy and fast.

  •  Fast queries when isolated to a shard.

  •  Flexible schema
       –  No more reloading for minor data model changes.


  •  Built-in iterative fetching.
Losses


  •  No schema = larger footprint.


  •  Traditional DBAs can’t help (without training).

  •  Aggregation queries are drastically different.

  •  Initial configuration can be a long, manual
     process.
Protips
Use Real Queries




Turn on the fire hose
    When testing or even evaluating, use production data and
    queries.	
  




                        photo by Official U.S. Navy Imagery on Flickr
Use Real Queries




Unleash the Chaos Monkey
    Kill your own mongod instances to ensure your cluster and
    applications continue to function normally.                         	
  




                    photo by dboy @ http://www.flickr.com/photos/dannyboyster/
Minimize


  Minify property names.
      –  Use Morphia for mapping in Java, or Salat in Scala
           (also good for queries, but we developed our own generic Query API).
      –  Use one or two characters per property name.


  Consider retrieving full objects from another
  collection or data store, storing only what you
  absolutely need for your queries in the search
  store.
      –  On a related note, cache full objects; cache query results only if
         your queried attributes are small in number.
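
As a rough illustration of minified property names, the sketch below renames keys before a document is stored and restores them after it is read. The `SHORT_NAMES` table and the field names are hypothetical; a mapper such as Morphia or Salat would normally handle this for you.

```javascript
// Hypothetical mapping from readable names to one- or two-character stored names.
const SHORT_NAMES = { birthYear: "by", gender: "g", postalCode: "pc" };

// Reverse lookup, used when reading documents back.
const LONG_NAMES = Object.fromEntries(
  Object.entries(SHORT_NAMES).map(([longName, shortName]) => [shortName, longName])
);

function rename(doc, names) {
  const out = {};
  for (const [key, value] of Object.entries(doc)) {
    // Keys without a mapping (for example _id) pass through unchanged.
    const mapped = Object.prototype.hasOwnProperty.call(names, key)
      ? names[key]
      : key;
    out[mapped] = value;
  }
  return out;
}

const minify = (doc) => rename(doc, SHORT_NAMES);
const expand = (doc) => rename(doc, LONG_NAMES);

const stored = minify({ birthYear: 1980, gender: "f", _id: 7 });
console.log(JSON.stringify(stored)); // {"by":1980,"g":"f","_id":7}
console.log(JSON.stringify(expand(stored))); // {"birthYear":1980,"gender":"f","_id":7}
```

Because every document repeats its property names, shorter names shrink each stored document by the same amount.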
Indexes


  When performing large, variable, multi-attribute
  searches, have a decent number of indexes.
  Cover the major types of queries and
  the worst performing outliers.

      –  What is present in every query?


      –  What are the best performing attributes when present?

      –  What should my index look like when no high performing
         attributes appear in the query?
Indexes


  Omit ranges unless they are absolutely critical;
  if needed, put them at the end.
      –  Can I replace this with an $in clause?

      –  Can this be prioritized in its own index?

      –  Should there be versions of this index with and without this
         particular attribute?

      –  Will the appearance of this attribute in the index give me any
         speed advantage over inspecting the full object?
Indexes


  Ordering is very, very important.
      –  Attributes for which a user can only have a single value
         should appear towards the top of the index.

      –  Attributes that depend on the values of another attribute
         should appear in immediate succession.

      –  Again, put ranges at the bottom. If multiple ranges are
         necessary, ensure that they appear in order of their ability to
         reduce the working set.

      The order of fields in an index should be:
          First, fields on which you will query for exact values.
          Second, fields on which you will sort.
          Finally, fields on which you will query for a range of values.
                             Eric@MongoLab - http://blog.mongolab.com/2012/06/cardinal-ins/   	
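
The ordering rule quoted above can be sketched as a small helper that assembles a compound index key pattern. The helper name and the field names are assumptions made for illustration; the resulting object is the kind of key pattern you would pass to `createIndex` in the shell.

```javascript
// Build a key pattern with exact-match fields first, sort fields second,
// and range fields last. A field is added only once.
function buildIndexSpec({ equality = [], sort = [], range = [] }) {
  const spec = {};
  const add = (field, direction) => {
    if (!Object.prototype.hasOwnProperty.call(spec, field)) {
      spec[field] = direction;
    }
  };
  for (const field of equality) add(field, 1);
  for (const [field, direction] of sort) add(field, direction);
  for (const field of range) add(field, 1);
  return spec;
}

// Hypothetical attributes: two exact-match fields, one sort field, one range field.
const spec = buildIndexSpec({
  equality: ["gender", "region"],
  sort: [["lastLogin", -1]],
  range: ["birthYear"],
});

console.log(JSON.stringify(spec));
// {"gender":1,"region":1,"lastLogin":-1,"birthYear":1}
```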
  
Indexes


  Analyze slow queries to find out what attributes
  you can capitalize on.

  When building a compound index, don’t include
  fields that only appear in $or queries as part of
  multi-attribute queries.
          db.toasters.find({
             slots: 4,
             canBagel: true,
             $or: [
               { material: "stainless-steel"},
               { price: {$lte: 50}},
             ]
          })
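
For the toaster query above, one way to apply this advice is to separate the always-present fields from the fields that occur only inside `$or` branches, and build the compound index from the first group. This is a sketch in plain JavaScript; `splitFields` is a hypothetical helper, not a MongoDB API.

```javascript
const query = {
  slots: 4,
  canBagel: true,
  $or: [{ material: "stainless-steel" }, { price: { $lte: 50 } }],
};

// Separate top-level fields from fields that appear only in $or branches.
function splitFields(q) {
  const topLevel = Object.keys(q).filter((key) => !key.startsWith("$"));
  const orOnly = (q.$or || [])
    .flatMap((branch) => Object.keys(branch))
    .filter((key) => !key.startsWith("$") && !topLevel.includes(key));
  return { topLevel, orOnly: [...new Set(orOnly)] };
}

const { topLevel, orOnly } = splitFields(query);

// Key pattern for the compound index: only the fields outside $or.
const indexSpec = Object.fromEntries(topLevel.map((field) => [field, 1]));

console.log(JSON.stringify(indexSpec)); // {"slots":1,"canBagel":1}
console.log(orOnly); // [ 'material', 'price' ]
```

Here `material` and `price` stay out of the compound index, which covers `slots` and `canBagel` only.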
Queries – Ranges


  Translate "between" queries to $in clauses when
  dealing with discrete values.

      $and: [
         {a: { $gte: 0}},
         {a: { $lte: 5}}
      ]

      becomes


      a: { $in: [0,1,2,3,4,5]}
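
The same rewrite can be generated programmatically. The helper below is a minimal sketch that assumes integer bounds; `betweenToIn` is a hypothetical name.

```javascript
// Expand an inclusive integer range into an explicit $in clause.
function betweenToIn(field, low, high) {
  if (!Number.isInteger(low) || !Number.isInteger(high) || low > high) {
    throw new RangeError("expected integer bounds with low <= high");
  }
  const values = [];
  for (let v = low; v <= high; v++) {
    values.push(v);
  }
  return { [field]: { $in: values } };
}

console.log(JSON.stringify(betweenToIn("a", 0, 5)));
// {"a":{"$in":[0,1,2,3,4,5]}}
```

This only makes sense while the list of values stays short; a wide range would produce an unwieldy `$in` array.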
Attributes - Decrease Granularity




  birthdate => birthyear

  floats => ints

  number_of_items => has_items?
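
A sketch of these three reductions applied when building the document kept in the search store. The profile fields (`height` in particular) are hypothetical and chosen only to show one float being rounded.

```javascript
// Derive a coarser search document from a full profile.
function toSearchDoc(profile) {
  return {
    birthyear: profile.birthdate.getUTCFullYear(), // birthdate => birthyear
    height: Math.round(profile.height), // float => int
    has_items: profile.number_of_items > 0, // count => boolean
  };
}

const searchDoc = toSearchDoc({
  birthdate: new Date(Date.UTC(1980, 5, 15)),
  height: 180.4,
  number_of_items: 3,
});

console.log(JSON.stringify(searchDoc));
// {"birthyear":1980,"height":180,"has_items":true}
```

Coarser values have fewer distinct possibilities, which also makes them easier to express as `$in` lists instead of ranges.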
Sharding


  •  Try to isolate queries to a particular shard.

  •  Ensure that your data and indexes can fit
     entirely in memory.

  •  If certain attributes ALWAYS appear in the
     query and, in combination, give you a large
     number of well distributed data partitions,
     consider making them the shard key.
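
One way to reason about the first and last bullets is to check whether a query pins every shard key field to an exact value. The sketch below is a simplified check under that assumption; the shard key fields `g` and `r` are hypothetical, and the real routing rules are more permissive (they also handle key prefixes, for example).

```javascript
// True when the condition is a plain value or {$eq: value}.
// Operator conditions such as {$gte: 1} or {$in: [...]} do not count.
function isExact(condition) {
  if (condition === undefined) return false;
  if (condition === null || typeof condition !== "object") return true;
  const operators = Object.keys(condition).filter((key) => key.startsWith("$"));
  return (
    operators.length === 0 ||
    (operators.length === 1 && operators[0] === "$eq")
  );
}

// Simplified test: does the query fix every field of the shard key?
function pinsShardKey(query, shardKeyFields) {
  return shardKeyFields.every((field) => isExact(query[field]));
}

const shardKeyFields = ["g", "r"]; // hypothetical minified gender and region

console.log(pinsShardKey({ g: "f", r: 12, by: { $gte: 1970 } }, shardKeyFields)); // true
console.log(pinsShardKey({ g: "f", by: { $gte: 1970 } }, shardKeyFields)); // false
console.log(pinsShardKey({ g: "f", r: { $in: [1, 2] } }, shardKeyFields)); // false
```

Running a check like this over logged production queries gives a quick estimate of how many of them a candidate shard key would isolate.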
We’re Hiring




               http://www.eharmony.com/about/careers

 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB
 

Más de MongoDB (20)

MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
 
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
 
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
 
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 MongoDB SoCal 2020: MongoDB Atlas Jump Start MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
 

Finding Love with MongoDB's Scalability

  • 1. Finding Love With MongoDB { name : "Oliver Dodd", email : "oliver.dodd@gmail.com", twitter : "01001111" }
  • 2. Traditional Search: Unidirectional User Defined Criteria
  • 3. eHarmony Matching: Bidirectional User Defined Criteria
  • 4. Matching Overview
      • Potential Match Finder
      • Machine Learned Matching
      • Match Delivery
      Photo credits:
      – Magnifying glass: andercismo @ http://www.flickr.com/photos/andercismo/
      – Machine learning: University of Maryland Press Releases @ http://www.flickr.com/photos/umdnews/
      – Mailman: http://www.flickr.com/photos/noizephotography/
  • 5. Potential Match Generator
      • Find candidates that meet user’s preferences.
      • Ensure user doesn’t violate each candidate’s preferences.
      • Discard pairings that violate Compatibility Models.
      • Do this as fast as possible.
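The first two steps of slide 5 amount to a preference check that must pass in both directions. A minimal in-memory sketch follows; the field names (`age`, `prefs.minAge`, `prefs.maxAge`) and the helper names are assumptions made for illustration, not eHarmony's data model, and the Compatibility Models step is not covered.

```javascript
// Hypothetical in-memory sketch of bidirectional matching.
// Field names (age, prefs.minAge, prefs.maxAge) are illustrative assumptions.

// True when `person` satisfies the preferences stated by `seeker`.
function satisfies(seeker, person) {
  return person.age >= seeker.prefs.minAge && person.age <= seeker.prefs.maxAge;
}

// Keep only candidates where the preference check passes in both directions.
function potentialMatches(user, candidates) {
  return candidates.filter(
    (c) => c.id !== user.id && satisfies(user, c) && satisfies(c, user)
  );
}

const user = { id: 1, age: 30, prefs: { minAge: 25, maxAge: 35 } };
const candidates = [
  { id: 2, age: 28, prefs: { minAge: 27, maxAge: 40 } }, // fits both ways
  { id: 3, age: 33, prefs: { minAge: 32, maxAge: 45 } }, // user is too young for this candidate
  { id: 4, age: 50, prefs: { minAge: 20, maxAge: 60 } }, // outside the user's own range
];

const matchedIds = potentialMatches(user, candidates).map((c) => c.id);
console.log(matchedIds); // [ 2 ]
```

A unidirectional search would have returned candidates 2 and 3; the reverse check is what removes candidate 3.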
  • 7. Redesign
      Requirements for a new data store
      – Centralized
      – Scalable
      – Automagical
      – Easy to maintain
      – Fast, multi-attribute searches
  • 8. New “Potential Match Generator”
  • 9. Why MongoDB?
      • Scalability
      • Built-in sharding and replication
      • Autobalancing
      • Rich, complex queries
  • 10. Why MongoDB? MongoDB is web scale.
  • 11. Wins
      • Deploy new instances on demand.
          – No need to load a local database.
      • Adding replicas is easy and fast.
      • Fast queries when isolated to a shard.
      • Flexible schema
          – No more reloading for minor data model changes.
      • Built-in iterative fetching.
  • 12. Losses
      • No schema = larger footprint.
      • Traditional DBAs can’t help (without training).
      • Aggregation queries are drastically different.
      • Initial configuration can be a long, manual process.
  • 14. Use Real Queries: Turn on the fire hose
      When testing or even evaluating, use production data and queries.
      Photo by Official U.S. Navy Imagery on Flickr
  • 15. Use Real Queries: Unleash the Chaos Monkey
      Kill your own mongod instances to ensure your cluster and applications continue to function normally.
      Photo by dboy @ http://www.flickr.com/photos/dannyboyster/
  • 16. Minimize
      Minify property names.
      – In Java, use Morphia for mapping or Salat in Scala (also good for queries but we developed our own generic Query API).
      – Use one or two characters per property name.
      Consider retrieving full objects from another collection or data store, storing only what you absolutely need for your queries in the search store.
      – On a related note, cache full objects; cache query results only if your queried attributes are small in number.
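Slide 16 names Morphia and Salat for the mapping. As a language-neutral illustration of the same idea, here is a minimal sketch of a name table applied on write and reversed on read. The property names and short keys are hypothetical, and the helper handles only flat documents.

```javascript
// Hypothetical mapping from readable names to one- or two-character keys.
const SHORT_NAMES = { birthYear: "by", gender: "g", postalCode: "pc" };

// Inverse table, used to expand stored documents back to readable names.
const LONG_NAMES = Object.fromEntries(
  Object.entries(SHORT_NAMES).map(([longName, shortName]) => [shortName, longName])
);

// Rename the keys of a flat document using the given table; unknown keys are kept as-is.
function renameKeys(doc, table) {
  const has = (key) => Object.prototype.hasOwnProperty.call(table, key);
  return Object.fromEntries(
    Object.entries(doc).map(([key, value]) => [has(key) ? table[key] : key, value])
  );
}

const minify = (doc) => renameKeys(doc, SHORT_NAMES);
const expand = (doc) => renameKeys(doc, LONG_NAMES);

const stored = minify({ birthYear: 1985, gender: "F", postalCode: "90404" });
console.log(stored);         // { by: 1985, g: 'F', pc: '90404' }
console.log(expand(stored)); // { birthYear: 1985, gender: 'F', postalCode: '90404' }
```

Because MongoDB stores field names in every document, shorter keys reduce the per-document footprint listed under Losses.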
  • 17. Indexes
      When performing large, variable, multi-attribute searches, have a decent number of indexes. Cover the major types of queries and the worst-performing outliers.
      – What is present in every query?
      – What are the best-performing attributes when present?
      – What should my index look like when no high-performing attributes appear in the query?
  • 18. Indexes
      Omit ranges unless they are absolutely critical; if needed, put them at the end.
      – Can I replace this with an $in clause?
      – Can this be prioritized in its own index?
      – Should there be versions of this index with and without this particular attribute?
      – Will the appearance of this attribute in the index give me any speed advantage over inspecting the full object?
  • 19. Indexes
      Ordering is very, very important.
      – Attributes for which a user can only have a single value should appear towards the top of the index.
      – Attributes that depend on the values of another attribute should appear in immediate succession.
      – Again, put ranges at the bottom. If multiple ranges are necessary, ensure that they appear in order of their ability to reduce the working set.
      “The order of fields in an index should be: First, fields on which you will query for exact values. Second, fields on which you will sort. Finally, fields on which you will query for a range of values.”
      Eric@MongoLab - http://blog.mongolab.com/2012/06/cardinal-ins/
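The ordering quoted on slide 19 (exact-match fields, then sort fields, then range fields) can be sketched as a small helper that assembles a compound index specification. The helper name and the field names are assumptions made for illustration. Only the field order comes from the slide.

```javascript
// Build a compound index spec in the order quoted on the slide:
// exact-match fields first, then sort fields, then range fields.
// Key order in the returned object is insertion order, and key order is what
// determines the field order of a compound index.
function buildIndexSpec({ equality = [], sort = [], range = [] }) {
  const spec = {};
  const seen = new Set();
  const add = (field, direction) => {
    if (seen.has(field)) return; // the first placement of a field wins
    seen.add(field);
    spec[field] = direction;
  };
  for (const field of equality) add(field, 1);
  for (const [field, direction] of sort) add(field, direction);
  for (const field of range) add(field, 1);
  return spec;
}

// Hypothetical attribute names for a profile search.
const spec = buildIndexSpec({
  equality: ["gender", "country"],
  sort: [["lastLogin", -1]],
  range: ["birthYear"],
});

console.log(JSON.stringify(spec));
// {"gender":1,"country":1,"lastLogin":-1,"birthYear":1}
```

In the mongo shell, an object of this shape is what `createIndex` takes as its key specification. The collection name and the decision about which attributes count as exact-match would come from the analysis of real queries described on the earlier slides.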
  • 20. Indexes
      Analyze slow queries to find out what attributes you can capitalize on.
      When building a compound index, don’t include fields that only appear in $or queries as part of multi-attribute queries.
          db.toasters.find({
            slots: 4,
            canBagel: true,
            $or: [
              { material: "stainless-steel" },
              { price: { $lte: 50 } }
            ]
          })
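One way to apply the `$or` advice on slide 20 is to separate the fields a query always constrains from the fields that appear only inside `$or` branches. The sketch below does this for the toaster query; the two helper functions are hypothetical, and nested operators other than a top-level `$or` are not handled.

```javascript
// The toaster query from the slide, as a plain object.
const query = {
  slots: 4,
  canBagel: true,
  $or: [{ material: "stainless-steel" }, { price: { $lte: 50 } }],
};

// Fields matched at the top level of a query document (operators such as $or are skipped).
function topLevelFields(q) {
  return Object.keys(q).filter((key) => !key.startsWith("$"));
}

// Fields that appear only inside $or branches and never at the top level.
function orOnlyFields(q) {
  const top = new Set(topLevelFields(q));
  const inOr = (q.$or || []).flatMap((branch) => topLevelFields(branch));
  return [...new Set(inOr)].filter((field) => !top.has(field));
}

console.log(topLevelFields(query)); // [ 'slots', 'canBagel' ]  -> compound index candidates
console.log(orOnlyFields(query));   // [ 'material', 'price' ]  -> left out of that index
```

Following the slide, a compound index for this query would be built on `slots` and `canBagel` only.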
  • 21. Queries – Ranges
      Translate "between" queries to $in clauses when dealing with discrete values.
          $and: [ {a: { $gte: 0}}, {a: { $lte: 5}} ]
      becomes
          a: { $in: [0,1,2,3,4,5]}
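The rewrite on slide 21 can be done mechanically for integer-valued attributes. A minimal sketch: `rangeToIn` and its `maxValues` guard are assumptions made for illustration. The guard keeps very wide ranges as ordinary range predicates instead of producing a huge `$in` list.

```javascript
// Expand an inclusive integer range into an $in clause.
// maxValues is a hypothetical guard: wider ranges stay as range predicates.
function rangeToIn(field, low, high, maxValues = 100) {
  if (!Number.isInteger(low) || !Number.isInteger(high) || low > high) {
    throw new RangeError("low and high must be integers with low <= high");
  }
  const count = high - low + 1;
  if (count > maxValues) {
    return { [field]: { $gte: low, $lte: high } };
  }
  const values = Array.from({ length: count }, (_, i) => low + i);
  return { [field]: { $in: values } };
}

console.log(JSON.stringify(rangeToIn("a", 0, 5)));
// {"a":{"$in":[0,1,2,3,4,5]}}
```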
  • 22. Attributes - Decrease Granularity
      birthdate => birthyear
      floats => ints
      number_of_items => has_items?
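The three reductions on slide 22 can be applied when a profile is written to the search store. A minimal sketch, assuming a hypothetical profile with `birthdate`, `height_cm`, and `number_of_items` fields (only the last name and the kinds of transformation come from the slide):

```javascript
// Coarsen a profile before writing it to the search store.
// Input field names are hypothetical; the three transformations mirror the slide.
function coarsen(profile) {
  return {
    birthyear: profile.birthdate.getUTCFullYear(), // birthdate => birthyear
    height_cm: Math.round(profile.height_cm),      // float => int
    has_items: profile.number_of_items > 0,        // number_of_items => has_items?
  };
}

const coarse = coarsen({
  birthdate: new Date(Date.UTC(1985, 5, 14)),
  height_cm: 172.6,
  number_of_items: 3,
});

console.log(coarse); // { birthyear: 1985, height_cm: 173, has_items: true }
```

Coarser values are discrete, which also makes them candidates for the `$in` rewrite from the Ranges slide.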
  • 23. Sharding
      • Try to isolate queries to a particular shard.
      • Ensure that your data and indexes can fit entirely in memory.
      • If certain attributes ALWAYS appear in the query and, in combination, give you a large number of well distributed data partitions, consider making them the shard key.
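A quick way to see whether candidate shard key attributes really do "ALWAYS appear in the query" is to run logged production queries through a check like the one below. This is a deliberately simplified sketch. The shard key fields are hypothetical, and it only counts a query as isolated when every shard key field has an exact-match value, which is stricter than MongoDB's own routing rules.

```javascript
// Hypothetical shard key fields, chosen for illustration only.
const SHARD_KEY = ["gender", "region"];

// Simplified check: a query counts as shard-isolated only when every shard key
// field is present with an exact-match (non-operator) value.
function isTargeted(query, shardKey = SHARD_KEY) {
  return shardKey.every((field) => {
    const value = query[field];
    return value !== undefined && (typeof value !== "object" || value === null);
  });
}

// Fraction of logged queries that would be isolated under this check.
function targetedRatio(queries, shardKey = SHARD_KEY) {
  if (queries.length === 0) return 0;
  return queries.filter((q) => isTargeted(q, shardKey)).length / queries.length;
}

const sampleQueries = [
  { gender: "F", region: "US-West", birthyear: { $in: [1985, 1986] } }, // isolated
  { gender: "F" },                                                       // region missing
  { gender: "M", region: { $in: ["US-West", "US-East"] } },              // operator on a key field
  { gender: "M", region: "US-East" },                                    // isolated
];

console.log(targetedRatio(sampleQueries)); // 0.5
```

A ratio well below 1 on real query logs suggests the attributes do not always appear together and may be a poor shard key choice by the slide's criterion.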
  • 24. We’re Hiring: http://www.eharmony.com/about/careers