SlideShare a Scribd company logo
Practical Scaling and
      Sharding
      Eliot Horowitz
      @eliothorowitz
         MongoSF
       May 24, 2011
Scaling by Optimization
• Schema Design
• Index Design
• Hardware Configuration
Horizontal Scaling
• Vertical scaling is limited
• Hard to scale vertically in the cloud
• Can scale wider than higher
Replica Sets


• One master at any time
• Programmer determines if read hits master
  or a slave
• Easy to setup to scale reads
db.people.find( { state : “NY” } ).addOption( SlaveOK )
• routed to a secondary automatically
• will use master if no secondary is available
Not Enough
• Writes don’t scale
• Reads are out of date on slaves
• RAM/Data Size doesn’t scale
Why Shard?
• Distribute write load
• Keep working set in RAM
• Consistent reads
• Preserve functionality
Sharding Design Goals
• Scale linearly
• Increase capacity with no downtime
• Transparent to the application
• Low administration to add capacity
Sharding and
         Documents
• Rich documents reduce need for joins
• No joins makes sharding solvable
Basics
• Choose how you partition data
• Convert from single replica set to sharding
  with no downtime
• Full feature set
• Fully consistent by default
Architecture
                      Shards

          mongod      mongod            mongod
                                                             ...
          mongod      mongod            mongod

          mongod      mongod            mongod
Config
Servers

mongod
                      mongos          mongos           ...
mongod

mongod
                   client   client   client   client
Typical Basic Setup
 Data Center Primary   Data Center Secondary

  S1 p=1    S1 p=1      S1 p=0

  S2 p=1    S2 p=1      S2 p=0

  S3 p=1    S3 p=1      S3 p=0


 Config 1 Config 2       Config 2

mongos mongos          mongos     mongos
Range Based




• collection is broken into chunks by range
• chunks default to 64mb or 100,000 objects
Choosing a Shard Key

• Shard key determines how data is
  partitioned
• Hard to change
• Most important performance decision
Use Case: Photos
  { photo_id : ???? , data : <binary> }
  What’s the right key?
• auto increment
• MD5( data )
• month() + MD5(data)
Initial Loading

• System start with 1 chunk
• Writes will hit 1 shard and then move
• Pre-splitting for initial bulk loading can
  dramatically improve bulk load time
Administering a
         Cluster

• Do not wait too long to add capacity
• Need capacity for normal workload + cost
  of moving data
• Stay < 70% operational capacity
Hardware
      Considerations
• Understand working set and make sure it
  can fit in RAM
• Choose appropriate sized boxes for shards
 • Too small and admin/overhead goes up
 • Too large, and you can’t add capacity
    smoothly
Download MongoDB
      http://www.mongodb.org



   and
let
us
know
what
you
think
    @eliothorowitz



@mongodb


       10gen is hiring!
http://www.10gen.com/jobs
Use Case: User Profiles
  { email : “eliot@10gen.com” ,
      addresses : [ { state : “NY” } ]
  }
• Shard by email
• Lookup by email hits 1 node
• Index on { “addresses.state” : 1 }
Use Case: Activity
          Stream
  { user_id : XXX, event_id : YYY , data : ZZZ }
• Shard by user_id
• Looking up an activity stream hits 1 node
• Writing even is distributed
• Index on { “event_id” : 1 } for deletes

More Related Content

What's hot

Hardware Provisioning for MongoDB
Hardware Provisioning for MongoDBHardware Provisioning for MongoDB
Hardware Provisioning for MongoDB
MongoDB
 
MongoDB and AWS Best Practices
MongoDB and AWS Best PracticesMongoDB and AWS Best Practices
MongoDB and AWS Best Practices
MongoDB
 
Mongo presentation conf
Mongo presentation confMongo presentation conf
Mongo presentation conf
Shridhar Joshi
 

What's hot (20)

Ops Jumpstart: MongoDB Administration 101
Ops Jumpstart: MongoDB Administration 101Ops Jumpstart: MongoDB Administration 101
Ops Jumpstart: MongoDB Administration 101
 
Securing Your MongoDB Deployment
Securing Your MongoDB DeploymentSecuring Your MongoDB Deployment
Securing Your MongoDB Deployment
 
Hardware Provisioning for MongoDB
Hardware Provisioning for MongoDBHardware Provisioning for MongoDB
Hardware Provisioning for MongoDB
 
Rpsonmongodb
RpsonmongodbRpsonmongodb
Rpsonmongodb
 
Sharding Methods for MongoDB
Sharding Methods for MongoDBSharding Methods for MongoDB
Sharding Methods for MongoDB
 
MongoDB and AWS Best Practices
MongoDB and AWS Best PracticesMongoDB and AWS Best Practices
MongoDB and AWS Best Practices
 
Using cassandra as a distributed logging to store pb data
Using cassandra as a distributed logging to store pb dataUsing cassandra as a distributed logging to store pb data
Using cassandra as a distributed logging to store pb data
 
Writing Space and the Cassandra NoSQL DBMS
Writing Space and the Cassandra NoSQL DBMSWriting Space and the Cassandra NoSQL DBMS
Writing Space and the Cassandra NoSQL DBMS
 
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
 
MongoDB Introduction talk at Dr Dobbs Conference, MongoDB Evenings at Bangalo...
MongoDB Introduction talk at Dr Dobbs Conference, MongoDB Evenings at Bangalo...MongoDB Introduction talk at Dr Dobbs Conference, MongoDB Evenings at Bangalo...
MongoDB Introduction talk at Dr Dobbs Conference, MongoDB Evenings at Bangalo...
 
Back to Basics 2017: Introduction to Sharding
Back to Basics 2017: Introduction to ShardingBack to Basics 2017: Introduction to Sharding
Back to Basics 2017: Introduction to Sharding
 
Compare DynamoDB vs. MongoDB
Compare DynamoDB vs. MongoDBCompare DynamoDB vs. MongoDB
Compare DynamoDB vs. MongoDB
 
What's new in MongoDB 2.6
What's new in MongoDB 2.6What's new in MongoDB 2.6
What's new in MongoDB 2.6
 
An Introduction to MongoDB Compass
An Introduction to MongoDB CompassAn Introduction to MongoDB Compass
An Introduction to MongoDB Compass
 
Mongo presentation conf
Mongo presentation confMongo presentation conf
Mongo presentation conf
 
MongoDB Capacity Planning
MongoDB Capacity PlanningMongoDB Capacity Planning
MongoDB Capacity Planning
 
Back to Basics Webinar 6: Production Deployment
Back to Basics Webinar 6: Production DeploymentBack to Basics Webinar 6: Production Deployment
Back to Basics Webinar 6: Production Deployment
 
MongoDB .local Bengaluru 2019: Lift & Shift MongoDB to Atlas
MongoDB .local Bengaluru 2019: Lift & Shift MongoDB to AtlasMongoDB .local Bengaluru 2019: Lift & Shift MongoDB to Atlas
MongoDB .local Bengaluru 2019: Lift & Shift MongoDB to Atlas
 
Running MongoDB 3.0 on AWS
Running MongoDB 3.0 on AWSRunning MongoDB 3.0 on AWS
Running MongoDB 3.0 on AWS
 
Shift: Real World Migration from MongoDB to Cassandra
Shift: Real World Migration from MongoDB to CassandraShift: Real World Migration from MongoDB to Cassandra
Shift: Real World Migration from MongoDB to Cassandra
 

Similar to 2011 mongo sf-scaling

Sharding with MongoDB (Eliot Horowitz)
Sharding with MongoDB (Eliot Horowitz)Sharding with MongoDB (Eliot Horowitz)
Sharding with MongoDB (Eliot Horowitz)
MongoSF
 
Introduction to Sharding
Introduction to ShardingIntroduction to Sharding
Introduction to Sharding
MongoDB
 
MongoDB in FS
MongoDB in FSMongoDB in FS
MongoDB in FS
MongoDB
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
xlight
 

Similar to 2011 mongo sf-scaling (20)

Scaling MongoDB (Mongo Austin)
Scaling MongoDB (Mongo Austin)Scaling MongoDB (Mongo Austin)
Scaling MongoDB (Mongo Austin)
 
Mongodb sharding
Mongodb shardingMongodb sharding
Mongodb sharding
 
Sharding with MongoDB (Eliot Horowitz)
Sharding with MongoDB (Eliot Horowitz)Sharding with MongoDB (Eliot Horowitz)
Sharding with MongoDB (Eliot Horowitz)
 
Sharding with MongoDB (Eliot Horowitz)
Sharding with MongoDB (Eliot Horowitz)Sharding with MongoDB (Eliot Horowitz)
Sharding with MongoDB (Eliot Horowitz)
 
MongoDB : Scaling, Security & Performance
MongoDB : Scaling, Security & PerformanceMongoDB : Scaling, Security & Performance
MongoDB : Scaling, Security & Performance
 
Lessons from lhc
Lessons from lhcLessons from lhc
Lessons from lhc
 
MongoDB
MongoDBMongoDB
MongoDB
 
Cassandra Community Webinar: From Mongo to Cassandra, Architectural Lessons
Cassandra Community Webinar: From Mongo to Cassandra, Architectural LessonsCassandra Community Webinar: From Mongo to Cassandra, Architectural Lessons
Cassandra Community Webinar: From Mongo to Cassandra, Architectural Lessons
 
John adams talk cloudy
John adams   talk cloudyJohn adams   talk cloudy
John adams talk cloudy
 
Introduction to Sharding
Introduction to ShardingIntroduction to Sharding
Introduction to Sharding
 
MongoDB in FS
MongoDB in FSMongoDB in FS
MongoDB in FS
 
Voldemort Nosql
Voldemort NosqlVoldemort Nosql
Voldemort Nosql
 
MongoDB Internals
MongoDB InternalsMongoDB Internals
MongoDB Internals
 
Running MongoDB in the Cloud
Running MongoDB in the CloudRunning MongoDB in the Cloud
Running MongoDB in the Cloud
 
Large scale computing with mapreduce
Large scale computing with mapreduceLarge scale computing with mapreduce
Large scale computing with mapreduce
 
Scaling MongoDB - Presentation at MTP
Scaling MongoDB - Presentation at MTPScaling MongoDB - Presentation at MTP
Scaling MongoDB - Presentation at MTP
 
Fixing twitter
Fixing twitterFixing twitter
Fixing twitter
 
Fixing_Twitter
Fixing_TwitterFixing_Twitter
Fixing_Twitter
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
 

More from MongoDB

More from MongoDB (20)

MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
 
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
 
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
 
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 MongoDB SoCal 2020: MongoDB Atlas Jump Start MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
 

Recently uploaded

Structuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessStructuring Teams and Portfolios for Success
Structuring Teams and Portfolios for Success
UXDXConf
 

Recently uploaded (20)

Structuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessStructuring Teams and Portfolios for Success
Structuring Teams and Portfolios for Success
 
Connecting the Dots in Product Design at KAYAK
Connecting the Dots in Product Design at KAYAKConnecting the Dots in Product Design at KAYAK
Connecting the Dots in Product Design at KAYAK
 
Motion for AI: Creating Empathy in Technology
Motion for AI: Creating Empathy in TechnologyMotion for AI: Creating Empathy in Technology
Motion for AI: Creating Empathy in Technology
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
The architecture of Generative AI for enterprises.pdf
The architecture of Generative AI for enterprises.pdfThe architecture of Generative AI for enterprises.pdf
The architecture of Generative AI for enterprises.pdf
 
Agentic RAG What it is its types applications and implementation.pdf
Agentic RAG What it is its types applications and implementation.pdfAgentic RAG What it is its types applications and implementation.pdf
Agentic RAG What it is its types applications and implementation.pdf
 
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
 
Enterprise Security Monitoring, And Log Management.
Enterprise Security Monitoring, And Log Management.Enterprise Security Monitoring, And Log Management.
Enterprise Security Monitoring, And Log Management.
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
AI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří KarpíšekAI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří Karpíšek
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Strategic AI Integration in Engineering Teams
Strategic AI Integration in Engineering TeamsStrategic AI Integration in Engineering Teams
Strategic AI Integration in Engineering Teams
 
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomSalesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
 
AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101
 
WSO2CONMay2024OpenSourceConferenceDebrief.pptx
WSO2CONMay2024OpenSourceConferenceDebrief.pptxWSO2CONMay2024OpenSourceConferenceDebrief.pptx
WSO2CONMay2024OpenSourceConferenceDebrief.pptx
 
In-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT ProfessionalsIn-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT Professionals
 
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
 
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 

2011 mongo sf-scaling

  • 1. Practical Scaling and Sharding Eliot Horowitz @eliothorowitz MongoSF May 24, 2011
  • 2. Scaling by Optimization • Schema Design • Index Design • Hardware Configuration
  • 3. Horizontal Scaling • Vertical scaling is limited • Hard to scale vertically in the cloud • Can scale wider than higher
  • 4. Replica Sets • One master at any time • Programmer determines if read hits master or a slave • Easy to setup to scale reads
  • 5. db.people.find( { state : “NY” } ).addOption( SlaveOK ) • routed to a secondary automatically • will use master if no secondary is available
  • 6. Not Enough • Writes don’t scale • Reads are out of date on slaves • RAM/Data Size doesn’t scale
  • 7. Why Shard? • Distribute write load • Keep working set in RAM • Consistent reads • Preserve functionality
  • 8. Sharding Design Goals • Scale linearly • Increase capacity with no downtime • Transparent to the application • Low administration to add capacity
  • 9. Sharding and Documents • Rich documents reduce need for joins • No joins makes sharding solvable
  • 10. Basics • Choose how you partition data • Convert from single replica set to sharding with no downtime • Full feature set • Fully consistent by default
  • 11. Architecture Shards mongod mongod mongod ... mongod mongod mongod mongod mongod mongod Config Servers mongod mongos mongos ... mongod mongod client client client client
  • 12. Typical Basic Setup Data Center Primary Data Center Secondary S1 p=1 S1 p=1 S1 p=0 S2 p=1 S2 p=1 S2 p=0 S3 p=1 S3 p=1 S3 p=0 Config 1 Config 2 Config 2 mongos mongos mongos mongos
  • 13. Range Based • collection is broken into chunks by range • chunks default to 64mb or 100,000 objects
  • 14. Choosing a Shard Key • Shard key determines how data is partitioned • Hard to change • Most important performance decision
  • 15. Use Case: Photos { photo_id : ???? , data : <binary> } What’s the right key? • auto increment • MD5( data ) • month() + MD5(data)
  • 16. Initial Loading • System start with 1 chunk • Writes will hit 1 shard and then move • Pre-splitting for initial bulk loading can dramatically improve bulk load time
  • 17. Administering a Cluster • Do not wait too long to add capacity • Need capacity for normal workload + cost of moving data • Stay < 70% operational capacity
  • 18. Hardware Considerations • Understand working set and make sure it can fit in RAM • Choose appropriate sized boxes for shards • Too small and admin/overhead goes up • Too large, and you can’t add capacity smoothly
  • 19. Download MongoDB http://www.mongodb.org and
let
us
know
what
you
think @eliothorowitz



@mongodb 10gen is hiring! http://www.10gen.com/jobs
  • 20. Use Case: User Profiles { email : “eliot@10gen.com” , addresses : [ { state : “NY” } ] } • Shard by email • Lookup by email hits 1 node • Index on { “addresses.state” : 1 }
  • 21. Use Case: Activity Stream { user_id : XXX, event_id : YYY , data : ZZZ } • Shard by user_id • Looking up an activity stream hits 1 node • Writing even is distributed • Index on { “event_id” : 1 } for deletes

Editor's Notes

  1. \n
  2. \n
  3. \n
  4. ec2 goes up to 64gb, maybe mention 256gb box here??? ($30-40k)\nmaybe can but 256gb box, but i spin up 10 ec2 64gb boxes in 10 minutes\n
  5. \n
  6. \n
  7. \n
  8. \n
  9. \n
  10. Don&amp;#x2019;t pre-emptively shard - easy to add later\n
  11. \n
  12. \n
  13. \n
  14. \n
  15. \n
  16. \n
  17. \n
  18. \n
  19. \n
  20. Hashed shard keys coming soon\n
  21. \n
  22. \n
  23. \n
  24. \n
  25. \n