Introduction to sharding
                                Christos Soulios
                     Software Architect, Persado
                       (christos.soulios@persado.com)



15 Jan 2013
Let's start with an example:

• We have launched our latest and greatest web application
• We use the MongoDB database, which is fast and cool
• We have even set up replication for high availability
• Our application turns out to be popular and we are already
  planning our next project
• Cool!




Unfortunately, our website becomes too popular too fast.
And this causes problems




MongoDB problems when the dataset grows

• The dataset does not fit on local disks.
  Solution: Let's buy more disks
• Database indexes do not fit in memory. They have to be paged
  in and out. The database becomes sluggish
  Solution: Let's buy more memory
• High-throughput write operations cause high contention on
  the infamous MongoDB locks
  Now what?

       We need to scale horizontally. We need sharding



What is sharding?

• Sharding is automatic data partitioning
• Distributes data evenly across cluster nodes (called shards)
• Allows for seamless querying. Almost no functionality is lost
  compared to a single master
• Keeps the database consistent




How sharding works

• Collection data is broken into chunks based on ranges of a
  selected collection field. This field is called the shard key
• Chunks are evenly distributed across shards. Each data chunk is
  controlled by a single shard
• Special config servers are responsible for storing which shard
  controls which chunks
• Database clients communicate with the shards through the mongos
  router process
• The mongos router behaves to the client just like a normal mongod
  server. Sharding is transparent to the client
• For each database operation, the mongos router uses the shard key
  and the chunk metadata from the config servers to redirect the
  operation to the correct shards
• As more data is inserted, ranges are split into more chunks
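
The range-based routing described above can be illustrated with a minimal sketch in plain JavaScript. It is not MongoDB's implementation: the chunk table, the shard names and the helper functions `routeByShardKey` and `splitChunk` are hypothetical, and the shard key is assumed to be a numeric user_id.

```javascript
// Hypothetical chunk table, similar in spirit to what config servers record:
// each chunk covers the key range [min, max) and is controlled by one shard.
const chunks = [
  { min: -Infinity, max: 1000, shard: "shard0" },
  { min: 1000, max: 4000, shard: "shard1" },
  { min: 4000, max: Infinity, shard: "shard2" },
];

// Find the shard whose chunk range contains the given shard key value.
function routeByShardKey(table, key) {
  const chunk = table.find((c) => key >= c.min && key < c.max);
  return chunk.shard;
}

// Split one chunk into two at splitPoint. Both halves stay on the same
// shard; moving them elsewhere would be a separate balancing step.
function splitChunk(table, index, splitPoint) {
  const chunk = table[index];
  return [
    ...table.slice(0, index),
    { min: chunk.min, max: splitPoint, shard: chunk.shard },
    { min: splitPoint, max: chunk.max, shard: chunk.shard },
    ...table.slice(index + 1),
  ];
}

// Route the user_id values from the Users collection example.
for (const userId of [45, 4503, 1153, 5434]) {
  console.log(userId, "->", routeByShardKey(chunks, userId));
}

// As data grows, the last range is split into two chunks.
const afterSplit = splitChunk(chunks, 2, 5000);
console.log(afterSplit.length, "chunks after the split");
```

Splitting only changes the metadata in this sketch, so routing results stay the same until a chunk is assigned to a different shard.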

Example (Users collection)
{ 'user_id': 45,
  'username': 'asterix',
  'email': 'asterix@google.com',
  'last_login': '11/11/2012'
},
{ 'user_id': 4503,
  'username': 'gandalf',
  'email': 'gandalf_rules@yahoo.com',
  'last_login': '01/14/2013'
},
{ 'user_id': 1153,
  'username': 'superman',
  'email': 'superman@superdomain.com',
  'last_login': '10/30/2012'
},
{ 'user_id': 5434,
  'username': 'darth_vader',
  'email': 'darth@stardestroyer.org',
  'last_login': '07/01/2012'
}



          > db.adminCommand( { shardCollection: "test.users", key: { username: 1 } } )


Shard architecture (sharding by user_id)




Database operations

• All queries are routed through the mongos process
• Insert operations are routed by shard key. The shard key is
  required
• Querying by shard key routes the query only to the shards that
  hold the matching chunks
• Querying by a non-shard-key field scatters the query to all shards
  and gathers the results
• Updates and deletes are routed the same way as queries
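
The difference between a targeted query and a scatter-gather query can be sketched as follows. This is a simplified illustration in plain JavaScript, assuming equality filters only and a numeric user_id shard key; `SHARD_KEY`, `shardRanges`, `targetShards` and `targetForInsert` are hypothetical names, not part of any MongoDB API.

```javascript
// Assumed shard key for this sketch.
const SHARD_KEY = "user_id";

// Hypothetical key ranges [min, max), one chunk per shard for simplicity.
const shardRanges = [
  { shard: "shard0", min: -Infinity, max: 1000 },
  { shard: "shard1", min: 1000, max: 4000 },
  { shard: "shard2", min: 4000, max: Infinity },
];

// Return the shards that must receive a query with the given equality filter.
function targetShards(filter) {
  if (Object.prototype.hasOwnProperty.call(filter, SHARD_KEY)) {
    // Targeted: only the shard owning the key's range is contacted.
    const key = filter[SHARD_KEY];
    return shardRanges
      .filter((r) => key >= r.min && key < r.max)
      .map((r) => r.shard);
  }
  // Scatter-gather: without the shard key, every shard must be asked.
  return shardRanges.map((r) => r.shard);
}

// Inserts need the shard key, otherwise there is nowhere to route them.
function targetForInsert(doc) {
  if (!Object.prototype.hasOwnProperty.call(doc, SHARD_KEY)) {
    throw new Error("insert rejected: document has no shard key");
  }
  return targetShards({ [SHARD_KEY]: doc[SHARD_KEY] })[0];
}

console.log(targetShards({ user_id: 1153 }));       // one shard
console.log(targetShards({ username: "gandalf" })); // all shards
console.log(targetForInsert({ user_id: 45, username: "asterix" }));
```

Updates and deletes would go through the same `targetShards` decision in this model, since they carry a filter just like a query.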




Data balancing

• The system becomes unbalanced when one shard stores more
  data chunks than the others
• Data is automatically balanced without intervention from the
  client application or the administrator




Data balancing

• The range of the overloaded shard is split and chunks are migrated
  to other shards




Data balancing

• Config servers are updated using a two-phase commit process to
  ensure database consistency
• The system ends up balanced
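
The balancing idea from the last three slides can be sketched as a loop that moves one chunk at a time from the fullest shard to the emptiest one. This is a deliberately simplified model, not MongoDB's actual balancer: the `balance` function, the per-shard chunk counts and the `threshold` value are assumptions made for illustration.

```javascript
// Move chunks one at a time from the shard with the most chunks to the
// shard with the fewest, until the difference is within the threshold.
function balance(chunkCounts, threshold = 1) {
  const counts = { ...chunkCounts }; // work on a copy
  const migrations = [];
  for (;;) {
    const shards = Object.keys(counts);
    const most = shards.reduce((a, b) => (counts[b] > counts[a] ? b : a));
    const least = shards.reduce((a, b) => (counts[b] < counts[a] ? b : a));
    if (counts[most] - counts[least] <= threshold) break;
    counts[most] -= 1;
    counts[least] += 1;
    // In a real cluster the config servers would record the new owner here.
    migrations.push({ from: most, to: least });
  }
  return { counts, migrations };
}

// Hypothetical starting point: shard0 holds far more chunks than the others.
const result = balance({ shard0: 10, shard1: 2, shard2: 3 });
console.log(result.counts);
console.log(result.migrations.length, "chunk migrations");
```

Counting chunks rather than bytes mirrors the slide's definition of an unbalanced system, where one shard stores more chunks than the others.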




Choosing a shard key

• Choosing a good shard key is critical
• Once chosen, we are stuck with it
• The shard key must be immutable
• It should distribute the data load evenly across shards
• It should have high cardinality. Enumerated values are not good
  shard keys
• It should not be monotonically increasing. ObjectIds, dates or database
  sequences are not good shard keys, because they create hotspots
• It should be used by the most critical queries to provide query
  isolation. Avoid scatter-gather queries
• It should provide good data affinity to avoid disk-to-memory transfers
  (random values are not good shard keys)
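
The hotspot problem with monotonically increasing keys can be made concrete with a small simulation. This is a sketch under stated assumptions: the `ranges` table, the shard names and the two key generators are hypothetical, and each shard is assumed to own exactly one key range.

```javascript
// Hypothetical key ranges [min, max), one per shard.
const ranges = [
  { shard: "shard0", min: -Infinity, max: 1000 },
  { shard: "shard1", min: 1000, max: 2000 },
  { shard: "shard2", min: 2000, max: Infinity },
];

// Count how many inserts each shard receives for a list of shard key values.
function countWritesPerShard(keys) {
  const counts = { shard0: 0, shard1: 0, shard2: 0 };
  for (const key of keys) {
    const owner = ranges.find((r) => key >= r.min && key < r.max);
    counts[owner.shard] += 1;
  }
  return counts;
}

// A sequence-like key: every new document gets a larger value than all
// existing ones, so every insert falls into the last range.
const sequenceKeys = Array.from({ length: 300 }, (_, i) => 5000 + i);

// A key whose values are spread over the whole key space.
const spreadKeys = Array.from({ length: 300 }, (_, i) => i * 10);

console.log("increasing key:", countWritesPerShard(sequenceKeys));
console.log("spread key:    ", countWritesPerShard(spreadKeys));
```

With the increasing key, a single shard absorbs the entire write load until chunks are split and migrated, which is the hotspot the slide warns about.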


Choosing a shard key

Know your data. It is important
• What is the expected dataset size?
• What is the write throughput?
• What does the data look like? Which fields are random or increasing?
  Are there low-cardinality fields?
• Can we identify any access patterns for reads?
• What data is indexed?
• What is the active working set? Is there historical data that
  is no longer used after some time?




Choosing a shard key

• It is not trivial
• Most of the time there is no single field that can be used as
  the shard key
• We have to invent one




Choosing a shard key

• Applications usually access recently inserted data more often
• What about a compound shard key?
• What about a combination of a coarsely ascending field and a
  commonly queried search key?
• The coarsely ascending key should have a few hundred chunks
  per value. This provides good data locality and even
  distribution
• The search key provides query isolation

         Rule of thumb: {coarseLocality: 1, search: 1}
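
To see why the order of fields in a compound key matters, the sketch below sorts a few compound keys field by field, coarse field first. The comparator `compareCompoundKeys` and the sample values are assumptions chosen for illustration; the sketch only models the ordering, not how a database stores it.

```javascript
// Compare two compound keys [coarseLocality, search] field by field:
// the first field decides, the second only breaks ties.
function compareCompoundKeys(a, b) {
  for (let i = 0; i < a.length; i++) {
    if (a[i] < b[i]) return -1;
    if (a[i] > b[i]) return 1;
  }
  return 0;
}

// Hypothetical keys: a month string as the coarse field, a user name as
// the search field.
const keys = [
  ["2013-01", "gandalf"],
  ["2012-12", "superman"],
  ["2013-01", "asterix"],
  ["2012-12", "asterix"],
];

const sorted = [...keys].sort(compareCompoundKeys);
console.log(sorted);
```

After sorting, all keys for the most recent coarse value sit next to each other at the end of the key space, and within that value they are ordered by the search field, so one coarse value can span many chunks.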



Example (Tweets collection)
{ user: 'asterix',
  ts: ISODate("2013-01-14T22:53:33.123Z"),
  month: '2013-01',
  retweets: 45,
  client: 'TweetDeck',
  text: 'MongoDB sharding is super cool!'
}


We are typically looking for the latest tweets of a user.

• Therefore, a combination of the 'month' and 'user' fields would
  make a good shard key
• The month field is coarsely ascending, so only the latest
  tweets need to be loaded into memory
• The user field is a commonly searched key
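
Since the month field has to be invented, the application must fill it in before inserting a tweet. A minimal sketch of that step in plain JavaScript follows; the helpers `withMonth` and `shardKeyOf` are hypothetical, and the month is assumed to be derived from the timestamp in UTC.

```javascript
// Add the derived 'month' field ("YYYY-MM", in UTC) to a tweet document.
function withMonth(tweet) {
  const month = tweet.ts.toISOString().slice(0, 7);
  return { ...tweet, month };
}

// Extract the compound shard key {month, user} from a tweet.
function shardKeyOf(tweet) {
  return { month: tweet.month, user: tweet.user };
}

const tweet = withMonth({
  user: "asterix",
  ts: new Date("2013-01-14T22:53:33.123Z"),
  retweets: 45,
  client: "TweetDeck",
  text: "MongoDB sharding is super cool!",
});

console.log(shardKeyOf(tweet));
```

Deriving the month in UTC keeps the key independent of the server's local time zone, which is a design choice of this sketch rather than something the slides prescribe.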


Conclusion

• Sharding allows MongoDB databases to scale horizontally
• Shard balancing is performed automatically by the system
• Sharding is transparent to the client application
• Choosing a good shard key is critical
• Choosing a good shard key is not trivial
• Be creative and experiment with your data before choosing
  the shard key




Questions?

Hellenic MongoDB user group - Introduction to sharding
