SlideShare una empresa de Scribd logo
1 de 21
Sharding in 20 minutes 
Why;Who;When;Where; 
David Murphy , Mongo Master 
Lead DBA, ObjectRocket 
@dmurphy_data @objectrocket
Background 
• 16 yrs in databases, development, & system 
engineering 
• Lead DBA @ ObjectRocket 
• Mongo Master with a focus on sharding, chunks, and 
scaling mongo beyond normal means.
What does a sharded cluster look like?
Why;Who;When;Where; 
i) Why do we shard? 
ii) Who should I shard? 
iii) When do we shard? 
iv) Where is my shard key
Why do we shard? 
• Scaling out write locks
Why do we shard? 
• Scaling out write locks 
• Small dataset to search per node
Why do we shard? 
• Scaling out write locks 
• Small dataset to search per node 
• Getting more connections to the data
Why do we shard? 
• Scaling out write locks 
• Small dataset to search per node 
• Getting more connections to the data 
• More smaller node vs Scaling up to expensive 
nodes
Who should I shard? 
• Biggest Collections by Size
Who should I shard? 
• Biggest Collections by Size 
• Busiest Collection by changes
Who should I shard? 
• Biggest Collections by Size 
• Busiest Collection by changes 
• Groupings of data (example): 
• State/Country 
• UserID 
• Company 
• Category
When do I shard? 
ALWAYS as early as possible!
When do I shard? 
ALWAYS as early as possible! 
Reasons: 
• Not all commands work
When do I shard? 
ALWAYS as early as possible! 
Reasons: 
• Not all commands work 
• Future Proof - No recoding
When do I shard? 
ALWAYS as early as possible! 
Reasons: 
• Not all commands work 
• Future Proof - No recoding 
• Adding index once your live can take time 
you don’t have!
Where (and what) is my shard key? 
You have to pick your own :/ 
But there are some quick hints…
Where (and what) is my shard key? 
Sharding Quick Hints: 
• Hashed Shard keys 
Great for even disk usage 
Uses Scatter-Gathers == More Conns 
Dates,Increasing IDs , and text are great here
Where (and what) is my shard key? 
Sharding Quick Hints: 
• Hashed Shard keys 
Great for even disk usage 
Uses Scatter-Gathers == More Conns 
Dates,Increasing IDs , and text are great here 
• Non-Hashed Keys 
Use profiler_level:2 & review ALL operations 
Things you wont change only 
No Dates 
No Increasing ID numbers 
No Text
Why mongo sharding/balancing 
Modulus Sharding with MySQL: 
Hard to rebalance online 
Requires application coding to support 
Ring Topologies like Cassandra - 
Cant change schema online 
Hard to rebalance online
Further Reading 
Presentations: 
Kenny Gorman Sharding - bit.ly/1oXYDfm 
David Murphy - Adv Sharding for Operations - bit.ly/1oXYDfm 
Other Sharding MongoDB Links - bit.ly/ZTtDI1 
Picking a shard key (manual) - http://bit.ly/1ozuzMH 
Choosing a shard key - http://slidesha.re/1nBnGtq
Contact 
@dmurphy_data 
@objectrocket 
david@objectrocket.com 
https://www.objectrocket.com 
WE ARE HIRING! (DBA,DEVOPS, and more) 
https://www.objectrocket.com/careers

Más contenido relacionado

La actualidad más candente

Mongodb - Scaling write performance
Mongodb - Scaling write performanceMongodb - Scaling write performance
Mongodb - Scaling write performance
Daum DNA
 
Archmage, Pinterest’s Real-time Analytics Platform on Druid
Archmage, Pinterest’s Real-time Analytics Platform on DruidArchmage, Pinterest’s Real-time Analytics Platform on Druid
Archmage, Pinterest’s Real-time Analytics Platform on Druid
Imply
 
Hardware Provisioning for MongoDB
Hardware Provisioning for MongoDBHardware Provisioning for MongoDB
Hardware Provisioning for MongoDB
MongoDB
 

La actualidad más candente (20)

Mongodb sharding
Mongodb shardingMongodb sharding
Mongodb sharding
 
Mongo db multidc_webinar
Mongo db multidc_webinarMongo db multidc_webinar
Mongo db multidc_webinar
 
MongoDB Deployment Checklist
MongoDB Deployment ChecklistMongoDB Deployment Checklist
MongoDB Deployment Checklist
 
Mongodb - Scaling write performance
Mongodb - Scaling write performanceMongodb - Scaling write performance
Mongodb - Scaling write performance
 
Webinar: Choosing the Right Shard Key for High Performance and Scale
Webinar: Choosing the Right Shard Key for High Performance and ScaleWebinar: Choosing the Right Shard Key for High Performance and Scale
Webinar: Choosing the Right Shard Key for High Performance and Scale
 
MongoDB Capacity Planning
MongoDB Capacity PlanningMongoDB Capacity Planning
MongoDB Capacity Planning
 
Sharding
ShardingSharding
Sharding
 
Apache Druid Design and Future prospect
Apache Druid Design and Future prospectApache Druid Design and Future prospect
Apache Druid Design and Future prospect
 
Benchmarking Apache Druid
Benchmarking Apache Druid Benchmarking Apache Druid
Benchmarking Apache Druid
 
Ops Jumpstart: MongoDB Administration 101
Ops Jumpstart: MongoDB Administration 101Ops Jumpstart: MongoDB Administration 101
Ops Jumpstart: MongoDB Administration 101
 
Redis Day TLV 2018 - 10 Reasons why Redis should be your Primary Database
Redis Day TLV 2018 - 10 Reasons why Redis should be your Primary DatabaseRedis Day TLV 2018 - 10 Reasons why Redis should be your Primary Database
Redis Day TLV 2018 - 10 Reasons why Redis should be your Primary Database
 
Large partition in Cassandra
Large partition in CassandraLarge partition in Cassandra
Large partition in Cassandra
 
Webinar: Scaling MongoDB
Webinar: Scaling MongoDBWebinar: Scaling MongoDB
Webinar: Scaling MongoDB
 
Druid realtime indexing
Druid realtime indexingDruid realtime indexing
Druid realtime indexing
 
Archmage, Pinterest’s Real-time Analytics Platform on Druid
Archmage, Pinterest’s Real-time Analytics Platform on DruidArchmage, Pinterest’s Real-time Analytics Platform on Druid
Archmage, Pinterest’s Real-time Analytics Platform on Druid
 
Hardware Provisioning for MongoDB
Hardware Provisioning for MongoDBHardware Provisioning for MongoDB
Hardware Provisioning for MongoDB
 
Scaling with MongoDB
Scaling with MongoDBScaling with MongoDB
Scaling with MongoDB
 
A New MongoDB Sharding Architecture for Higher Availability and Better Resour...
A New MongoDB Sharding Architecture for Higher Availability and Better Resour...A New MongoDB Sharding Architecture for Higher Availability and Better Resour...
A New MongoDB Sharding Architecture for Higher Availability and Better Resour...
 
Programmatic Bidding Data Streams & Druid
Programmatic Bidding Data Streams & DruidProgrammatic Bidding Data Streams & Druid
Programmatic Bidding Data Streams & Druid
 
Re-Engineering PostgreSQL as a Time-Series Database
Re-Engineering PostgreSQL as a Time-Series DatabaseRe-Engineering PostgreSQL as a Time-Series Database
Re-Engineering PostgreSQL as a Time-Series Database
 

Similar a Lightning Talk: What You Need to Know Before You Shard in 20 Minutes

Austin NoSQL 2011-07-06
Austin NoSQL 2011-07-06Austin NoSQL 2011-07-06
Austin NoSQL 2011-07-06
jimbojsb
 

Similar a Lightning Talk: What You Need to Know Before You Shard in 20 Minutes (20)

Sharding why,what,when, how
Sharding   why,what,when, howSharding   why,what,when, how
Sharding why,what,when, how
 
MongoDB Revised Sharding Guidelines MongoDB 3.x_Kimberly_Wilkins
MongoDB Revised Sharding Guidelines MongoDB 3.x_Kimberly_WilkinsMongoDB Revised Sharding Guidelines MongoDB 3.x_Kimberly_Wilkins
MongoDB Revised Sharding Guidelines MongoDB 3.x_Kimberly_Wilkins
 
Austin NoSQL 2011-07-06
Austin NoSQL 2011-07-06Austin NoSQL 2011-07-06
Austin NoSQL 2011-07-06
 
Data Modeling for NoSQL
Data Modeling for NoSQLData Modeling for NoSQL
Data Modeling for NoSQL
 
Blazing Data With Redis (and LEGOS!)
Blazing Data With Redis (and LEGOS!)Blazing Data With Redis (and LEGOS!)
Blazing Data With Redis (and LEGOS!)
 
Silicon Valley Code Camp 2016 - MongoDB in production
Silicon Valley Code Camp 2016 - MongoDB in productionSilicon Valley Code Camp 2016 - MongoDB in production
Silicon Valley Code Camp 2016 - MongoDB in production
 
NoSQL
NoSQLNoSQL
NoSQL
 
Dapper: the microORM that will change your life
Dapper: the microORM that will change your lifeDapper: the microORM that will change your life
Dapper: the microORM that will change your life
 
Google Developer Group Lublin 8 - Modern Lambda architecture in Big Data
Google Developer Group Lublin 8 - Modern Lambda architecture in Big Data Google Developer Group Lublin 8 - Modern Lambda architecture in Big Data
Google Developer Group Lublin 8 - Modern Lambda architecture in Big Data
 
Paytm labs soyouwanttodatascience
Paytm labs soyouwanttodatasciencePaytm labs soyouwanttodatascience
Paytm labs soyouwanttodatascience
 
Cassandra Summit 2014: Fuzzy Entity Matching at Scale
Cassandra Summit 2014: Fuzzy Entity Matching at ScaleCassandra Summit 2014: Fuzzy Entity Matching at Scale
Cassandra Summit 2014: Fuzzy Entity Matching at Scale
 
MongoDB World 2019: Look, Ma, No Servers! Serverless Application Development ...
MongoDB World 2019: Look, Ma, No Servers! Serverless Application Development ...MongoDB World 2019: Look, Ma, No Servers! Serverless Application Development ...
MongoDB World 2019: Look, Ma, No Servers! Serverless Application Development ...
 
Combining Data Mining and Machine Learning for Effective User Profiling
Combining Data Mining and Machine Learning for Effective User ProfilingCombining Data Mining and Machine Learning for Effective User Profiling
Combining Data Mining and Machine Learning for Effective User Profiling
 
Why do they call it Linked Data when they want to say...?
Why do they call it Linked Data when they want to say...?Why do they call it Linked Data when they want to say...?
Why do they call it Linked Data when they want to say...?
 
Hands On: Introduction to the Hadoop Ecosystem
Hands On: Introduction to the Hadoop EcosystemHands On: Introduction to the Hadoop Ecosystem
Hands On: Introduction to the Hadoop Ecosystem
 
Big Data vs Data Warehousing
Big Data vs Data WarehousingBig Data vs Data Warehousing
Big Data vs Data Warehousing
 
Neo4j Training Modeling
Neo4j Training ModelingNeo4j Training Modeling
Neo4j Training Modeling
 
Linked Data: The Real Web 2.0 (from 2008)
Linked Data: The Real Web 2.0 (from 2008)Linked Data: The Real Web 2.0 (from 2008)
Linked Data: The Real Web 2.0 (from 2008)
 
Look, Ma! No servers! Serverless application development with MongoDB Stitch
Look, Ma! No servers! Serverless application development with MongoDB StitchLook, Ma! No servers! Serverless application development with MongoDB Stitch
Look, Ma! No servers! Serverless application development with MongoDB Stitch
 
Cloud Storage Spring Cleaning: A Treasure Hunt
Cloud Storage Spring Cleaning: A Treasure HuntCloud Storage Spring Cleaning: A Treasure Hunt
Cloud Storage Spring Cleaning: A Treasure Hunt
 

Más de MongoDB

Más de MongoDB (20)

MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
 
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
 
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
 
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 MongoDB SoCal 2020: MongoDB Atlas Jump Start MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
 

Último

EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 

Último (20)

TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 

Lightning Talk: What You Need to Know Before You Shard in 20 Minutes

  • 1. Sharding in 20 minutes Why;Who;When;Where; David Murphy , Mongo Master Lead DBA, ObjectRocket @dmurphy_data @objectrocket
  • 2. Background • 16 yrs in databases, development, & system engineering • Lead DBA @ ObjectRocket • Mongo Master with a focus on sharding, chunks, and scaling mongo beyond normal means.
  • 3. What does a sharded cluster look like?
  • 4. Why;Who;When;Where; i) Why do we shard? ii) Who should I shard? iii) When do we shard? iv) Where is my shard key
  • 5. Why do we shard? • Scaling out write locks
  • 6. Why do we shard? • Scaling out write locks • Small dataset to search per node
  • 7. Why do we shard? • Scaling out write locks • Small dataset to search per node • Getting more connections to the data
  • 8. Why do we shard? • Scaling out write locks • Small dataset to search per node • Getting more connections to the data • More smaller node vs Scaling up to expensive nodes
  • 9. Who should I shard? • Biggest Collections by Size
  • 10. Who should I shard? • Biggest Collections by Size • Busiest Collection by changes
  • 11. Who should I shard? • Biggest Collections by Size • Busiest Collection by changes • Groupings of data (example): • State/Country • UserID • Company • Category
  • 12. When do I shard? ALWAYS as early as possible!
  • 13. When do I shard? ALWAYS as early as possible! Reasons: • Not all commands work
  • 14. When do I shard? ALWAYS as early as possible! Reasons: • Not all commands work • Future Proof - No recoding
  • 15. When do I shard? ALWAYS as early as possible! Reasons: • Not all commands work • Future Proof - No recoding • Adding index once your live can take time you don’t have!
  • 16. Where (and what) is my shard key? You have to pick your own :/ But there are some quick hints…
  • 17. Where (and what) is my shard key? Sharding Quick Hints: • Hashed Shard keys Great for even disk usage Uses Scatter-Gathers == More Conns Dates,Increasing IDs , and text are great here
  • 18. Where (and what) is my shard key? Sharding Quick Hints: • Hashed Shard keys Great for even disk usage Uses Scatter-Gathers == More Conns Dates,Increasing IDs , and text are great here • Non-Hashed Keys Use profiler_level:2 & review ALL operations Things you wont change only No Dates No Increasing ID numbers No Text
  • 19. Why mongo sharding/balancing Modulus Sharding with MySQL: Hard to rebalance online Requires application coding to support Ring Topologies like Cassandra - Cant change schema online Hard to rebalance online
  • 20. Further Reading Presentations: Kenny Gorman Sharding - bit.ly/1oXYDfm David Murphy - Adv Sharding for Operations - bit.ly/1oXYDfm Other Sharding MongoDB Links - bit.ly/ZTtDI1 Picking a shard key (manual) - http://bit.ly/1ozuzMH Choosing a shard key - http://slidesha.re/1nBnGtq
  • 21. Contact @dmurphy_data @objectrocket david@objectrocket.com https://www.objectrocket.com WE ARE HIRING! (DBA,DEVOPS, and more) https://www.objectrocket.com/careers