Cignex mongodb-sharding-mongodbdays
1. CIGNEX Datamatics Confidential www.cignex.com
Scaling MongoDB with Sharding – A Case Study
Presented by: Nikhil Naib
Title: Lead Consultant – Big Data
For MongoDB and CIGNEX Datamatics Use Only
2. CIGNEX Datamatics Confidential www.cignex.com
Who We Are?
• Since 2000, delivering solutions using Open Source technologies to
– Address business goals
– Increase business velocity
– Lower the cost of doing business
– Gain competitive advantage
• Dramatically reduce Total Cost of Ownership (TCO) & deployment time of IT solutions
400+ Implementations | 450+ Experts | 200+ Integrations | 13 Books | 5000+ Community Contributions
Offices: America | India | UK | Europe | Singapore | Australia
Portal Solutions | Content Solutions | Big Data Analytics Solutions
3. CIGNEX Datamatics Confidential www.cignex.com
Our Big Data Analytics Practice
Team Size: 110+ | Projects: 10+
• 20+ Big Data, 100+ Analytics & DW/BI
• Partnership – MongoDB, Cloudera, IBM
• Technical expertise – MongoDB, Hadoop, Neo4j, Solr, Pentaho, Talend, Cognos, Business Objects, Tableau, Jasper Reports
• Research & Analytics division with data scientists
• Connectors/Accelerators, Frameworks
• BIGArchive – Enterprise Scale Archival
• Liferay MongoDB Store
• Drupal MongoDB Connector
Big Data Partners
Business Intelligence Expertise
4. CIGNEX Datamatics Confidential www.cignex.com
Agenda
• Use Case & Database Requirements
• Why MongoDB?
• Solution
• To Shard Or Not To Shard
• Scaling with Sharding
– Sharding Basics
– Architecture and Hardware Sizing
– Sharding – Choosing the RIGHT Shard Key
– Benchmarking with Results
• Key Takeaways
5. CIGNEX Datamatics Confidential www.cignex.com
Use Case
Ability to access the digital assets of the service provider across an array of devices registered by the user, with the facility of resuming (session shifting).
Flow: Users → Devices → Load Balancer → Database
• Users: 7 million users across geographies
• Devices: 8 devices per user (home, office, anywhere)
• Load Balancer: high volume of concurrent CRUD requests routed to the DB cluster
• Database: MongoDB data storage cluster enabled with sharding, automatic replication for failover, and indexes
6. CIGNEX Datamatics Confidential www.cignex.com
Database Requirements
• Agility in Development & Deployment
• High Availability
• Flexibility in Schema
• Enterprise Level Support
• High Performance
7. CIGNEX Datamatics Confidential www.cignex.com
Why MongoDB?
• Agility in Development & Deployment (Driver Support)
– Programming language drivers
– Shorter dev cycle
– Faster deployment
• High Availability (Replication)
– Automatic failover
– Redundancy
– ~100% uptime
• Flexibility in Schema (Loose Schema)
– Easy integration
– Ease of schema design
– Document-oriented storage
• Enterprise Level Support (Strong Community)
– Global coverage
– 24x7 support
– Ease of maintenance
• High Performance (Indexes & Sharding)
– Concurrent CRUD
– Fast updates
– Write distribution with sharding
8. CIGNEX Datamatics Confidential www.cignex.com
Sharding – What is it?
• Distributes a single logical database across multiple mongod nodes
• Advantages:
– Raises the limit on data size beyond what a single node can hold
– Increases write capacity
– Supports larger working sets
– Read scaling: routed requests target specific shards, and the data is distributed. A good number of scatter-gather requests can also be supported if they are used judiciously.
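The difference between a routed request and a scatter-gather request can be sketched in a few lines of Python. This is a toy model, not MongoDB's router: the chunk boundaries, shard names, and the `shards_for_query` helper are all hypothetical and only illustrate how a range-partitioned key lets a request go to a single shard.

```python
import bisect

# Hypothetical chunk map: each chunk covers shard-key values below its upper
# bound (and at or above the previous bound). The last chunk is open-ended.
CHUNK_UPPER_BOUNDS = [1000, 2000, 3000]
CHUNK_TO_SHARD = ["shard0", "shard1", "shard2", "shard0"]


def shards_for_query(shard_key_value=None):
    """Return the shards a router would need to contact.

    With a shard-key value the request is routed to exactly one shard.
    Without one, every shard must be asked (scatter-gather).
    """
    if shard_key_value is None:
        return sorted(set(CHUNK_TO_SHARD))
    chunk = bisect.bisect_right(CHUNK_UPPER_BOUNDS, shard_key_value)
    return [CHUNK_TO_SHARD[chunk]]


print(shards_for_query(1500))  # routed: one shard
print(shards_for_query())      # scatter-gather: all shards
```

In this model the cost of a scatter-gather request grows with the number of shards, while a routed request always touches one.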
9. CIGNEX Datamatics Confidential www.cignex.com
Sharding – When to use?
• Storage (drive): Your data set approaches or exceeds the storage capacity of a single node in your system.
• Working set (RAM): The size of your system's active working set will soon exceed the maximum amount of RAM available to your system.
• Storage (drive): Your system has a large amount of write activity, a single MongoDB instance cannot write data fast enough to meet demand, and all other approaches have not reduced contention.
10. CIGNEX Datamatics Confidential www.cignex.com
Sharding - Features
• Range-based Data Partitioning
• Automatic Data volume distribution
• Transparent query routing
• Horizontal capacity
– Additional write capacity through distribution
– Right shard key allows expansion of working set
11. CIGNEX Datamatics Confidential www.cignex.com
Solution: Approach
• Schema
– Schema design
– Collections and field definitions
• Database Size
– Document size
– Total expected data size
• Concurrent Load
– Frequency of CRUD operations
– Read/write ratio
• Availability
– Replication, backup and automatic failover
– Right Replication Factor (RF)
– Read scaling for the use cases with eventual consistency
• Indexing
– Working set
– Access patterns
• Sharding
– Horizontal scaling
– Read/write scaling
• Hardware Sizing
– Cluster sizing
– RAM and disk storage
12. CIGNEX Datamatics Confidential www.cignex.com
To Shard Or Not To Shard ?
• Sharding is a very powerful technique provided by MongoDB to scale, but it should be used only after due diligence; otherwise it proves to be overkill.
• It brings a substantial amount of overhead from an infrastructure and maintenance standpoint.
• It should be used only when you have done all possible optimizations for the single node and its write capacity still proves to be a bottleneck.
• In production, a minimum of 6 server instances is required for a sharded cluster with no failover capability.
• In production we cannot afford to have no redundancy/failover. Hence a minimum RF of 2 is required, which also brings an arbiter node into the picture.
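The slides do not break down the 6 instances. One plausible reading, assumed here purely for illustration, is 3 config servers, 1 mongos router, and 2 single-node shards. Under that assumption, the sketch below shows how the count grows once each shard becomes a replica set with RF 2 plus an arbiter. The function name and default values are hypothetical.

```python
def sharded_cluster_processes(shards, data_nodes_per_shard, arbiters_per_shard,
                              config_servers=3, routers=1):
    """Count server processes for a cluster layout (illustrative only).

    The defaults (3 config servers, 1 router) are assumptions, not figures
    taken from the slides.
    """
    per_shard = data_nodes_per_shard + arbiters_per_shard
    return config_servers + routers + shards * per_shard


# Two shards, one data node each, no redundancy.
no_failover = sharded_cluster_processes(shards=2, data_nodes_per_shard=1,
                                        arbiters_per_shard=0)

# Two shards, each a replica set with RF 2 plus one arbiter.
with_failover = sharded_cluster_processes(shards=2, data_nodes_per_shard=2,
                                          arbiters_per_shard=1)

print(no_failover, with_failover)
```

Several of these processes can share a machine in practice, so the process count is an upper bound on hardware, not a sizing recommendation.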
15. CIGNEX Datamatics Confidential www.cignex.com
Shard Keys
• The ideal shard key:
– Has high cardinality, which makes it easy for MongoDB to split the chunks
– Has higher "randomness"
– Supports targeted queries
– May need to be computed
Shard keys exist in every document in a collection. MongoDB uses the shard key to distribute documents among the shards. Just like indexes, they can be either a single field or a compound key.
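As a rough sketch of what single-field and compound shard keys look like in practice, the helper below builds the document for MongoDB's `shardCollection` admin command. The namespace `media.sessions` and the field names are hypothetical, and the helper itself is not part of any driver. It only assembles a dictionary.

```python
def shard_collection_command(namespace, key_fields):
    """Build a `shardCollection` command document for a ranged shard key.

    key_fields is an ordered list of field names. Order matters for a
    compound key, and Python dicts preserve insertion order.
    """
    if not key_fields:
        raise ValueError("a shard key needs at least one field")
    return {"shardCollection": namespace,
            "key": {field: 1 for field in key_fields}}


# Single-field key (hypothetical field name).
single = shard_collection_command("media.sessions", ["userId"])

# Compound key (hypothetical field names).
compound = shard_collection_command("media.sessions", ["userId", "assetId"])

print(single)
print(compound)
```

With a driver such as PyMongo, a document like this would typically be passed to the `admin` database's `command` method after sharding has been enabled for the database.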
16. CIGNEX Datamatics Confidential www.cignex.com
Choosing Right Shard Key
Different approaches for shard keys:
• Approach 1: Random key – UserId + AssetId
• Approach 2: Coarsely ascending key + random key – YearMonth + UserId + AssetId
• Hashed shard keys (not tested/applicable here)
– New in version 2.4.
– Hashed shard keys use a hashed index of a single field as the shard key to partition data across your sharded cluster.
– The field should have good cardinality.
– Hashed keys work well with fields that increase monotonically.
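One way to see why a coarsely ascending prefix might help is to count how much of the key index a batch of new inserts touches. The simulation below is a simplification with made-up sizes: it treats the index as a sorted list cut into fixed-size "pages", which is an assumption and not how MongoDB lays out its B-tree. Keys for Approach 1 are (UserId, AssetId) and keys for Approach 2 are (YearMonth, UserId, AssetId).

```python
import bisect
import random


def index_pages_touched(existing_keys, new_keys, page_size=100):
    """Count distinct 'pages' (runs of page_size sorted keys) hit by new_keys."""
    index = sorted(existing_keys)
    return len({bisect.bisect_left(index, key) // page_size for key in new_keys})


def make_docs(rng, months, docs_per_month):
    """Generate (year_month, user_id, asset_id) tuples with random ids."""
    return [(month, rng.randrange(1_000_000), rng.randrange(1_000))
            for month in months for _ in range(docs_per_month)]


rng = random.Random(42)
months = [201300 + m for m in range(1, 13)]    # 201301 .. 201312
existing = make_docs(rng, months, 1000)        # 12,000 documents already stored
incoming = make_docs(rng, [201312], 500)       # new inserts for the current month

# Approach 1: random key only, so new keys scatter over the whole index.
approach_1 = index_pages_touched([(u, a) for _, u, a in existing],
                                 [(u, a) for _, u, a in incoming])

# Approach 2: YearMonth prefix, so new keys cluster in the current month's range.
approach_2 = index_pages_touched(existing, incoming)

print("Approach 1 pages touched:", approach_1)
print("Approach 2 pages touched:", approach_2)
```

In this toy model the inserts for the current month stay within the small slice of the index that holds that month, so fewer pages need to be in RAM at once.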
18. CIGNEX Datamatics Confidential www.cignex.com
Results - INSERTS
• Approach 1: Over 80 million documents inserted, with the insert rate decreasing beyond 10 million documents
• Approach 2: Over 225 million documents inserted at a stable rate of 6000 documents/sec
Benchmarks were run on bare-metal machines: 2.2 GHz 8-core CPU, 32 GB RAM, 7200 RPM spinning drives with no RAID support.
19. CIGNEX Datamatics Confidential www.cignex.com
Results - UPDATES
• Approach 1: Over 50 million documents updated at an average of 400 documents/sec
• Approach 2: Over 100 million documents updated at as high as 4000 documents/sec
20. CIGNEX Datamatics Confidential www.cignex.com
Results – INSERT, UPDATE
Approach 2, with inserts and updates running simultaneously:
• Simultaneous INSERT: >6000 documents/second, >70 million records
• Simultaneous UPDATE: >6000 documents/second, >50 million records
21. CIGNEX Datamatics Confidential www.cignex.com
Benchmarking – Sharding Vs Non Sharding
Operation | Sharding (YearMonth + UserId) | Non-Sharding
INSERTS | ~6000 docs/sec | ~2900 docs/sec
UPDATES | ~4000 docs/sec | ~620 docs/sec
INSERT & UPDATES | ~6000 docs/sec & ~6100 docs/sec | ~2000 docs/sec & ~600 docs/sec
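The relative gain is easier to read as a ratio. The short calculation below uses only the approximate standalone INSERTS and UPDATES figures from the comparison above. The variable names are illustrative.

```python
# Approximate throughput figures (docs/sec) from the comparison above.
sharded = {"inserts": 6000, "updates": 4000}
non_sharded = {"inserts": 2900, "updates": 620}

# Speedup of the sharded setup over the non-sharded one, per operation.
speedup = {op: sharded[op] / non_sharded[op] for op in sharded}

for op, ratio in speedup.items():
    print(f"{op}: {ratio:.1f}x")
```

On these numbers inserts roughly double, while updates improve by about six and a half times.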
22. CIGNEX Datamatics Confidential www.cignex.com
Key Takeaways
• MongoDB scales & shines.
– Expected: 690 million CRUD operations per day.
– Achieved: 840 million CRUD operations per day.
• Plan early for sharding.
• Sharding scales INSERTS/UPDATES better than a non-sharded setup.
• There is no magic recipe for finding an ideal shard key.
• DO NOT go to production without benchmarking the shard key. The shard key cannot be changed for the given configuration.
• Use MMS. It's a great tool for assessing the health of the cluster and identifying bottlenecks well in advance.
• Sharding with Approach 2 (coarsely ascending key + random key) provides sustained results and better utilization of RAM (better index locality).
Disclaimer: Suitable shard key depends on your data, so while this Shard Key approach delivers good results for this
use case, it is not a generic approach.
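To relate the daily CRUD totals to the per-second rates used in the benchmarks, the daily figures can be divided by the number of seconds in a day. This assumes load is spread evenly over 24 hours, which real traffic rarely is, so peak rates would be higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(ops_per_day):
    """Average operations per second for a given daily total."""
    return ops_per_day / SECONDS_PER_DAY


expected = per_second(690_000_000)  # about 7,986 ops/sec
achieved = per_second(840_000_000)  # about 9,722 ops/sec

print(f"expected: {expected:,.0f} ops/sec")
print(f"achieved: {achieved:,.0f} ops/sec")
print(f"headroom: {achieved / expected - 1:.0%}")
```

On these averages the achieved figure is a little over 20% above the expected one.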
23. CIGNEX Datamatics Confidential www.cignex.com
Key Takeaways
• Routed requests are always faster than scatter/gather requests.
• Identify the consistency requirements for read queries. Where eventual consistency is acceptable, the secondaryPreferred read preference can help squeeze out more performance.
• Use a different set of servers for NON-sharded collections.
• Define indexes carefully. A larger number of indexes substantially reduces write throughput.
• Sharded collections should have a minimal number of indexes.
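For reads that can tolerate eventual consistency, the read preference is commonly set through the connection string. The sketch below only builds such a URI. The host names and the replica set name `rs0` are hypothetical, and `mongodb_uri` is an illustrative helper, not a driver function.

```python
from urllib.parse import urlencode


def mongodb_uri(hosts, replica_set, read_preference="secondaryPreferred"):
    """Build a MongoDB connection string with a read preference option."""
    options = urlencode({"replicaSet": replica_set,
                         "readPreference": read_preference})
    return f"mongodb://{','.join(hosts)}/?{options}"


# Hypothetical hosts and replica set name.
uri = mongodb_uri(["db1.example.net:27017", "db2.example.net:27017"], "rs0")
print(uri)
```

Queries that need to see their own writes should keep the default primary read preference, since secondaries can lag behind.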
24. CIGNEX Datamatics Confidential www.cignex.com
Our Success Stories : At a Glance
• Big Data Analytics for Telecom: Optimum network bandwidth management & policy configuration for telecom companies
• Social Media Research Platform for Legal Firms: Leverage social media & unstructured data analytics to collect supporting evidence for trials
• US-based Advanced GPS Solutions Provider: Real-time analysis of data accumulated from 200,000 GPS-based devices
• Global Provider of Risk Management Solutions: Collection and analysis of data from external and internal applications, delivered to a dashboard
• US-based Networking Equipment Leader: Cluster configuration for high-volume video uploads, including 30 million inserts/hour
• European Chemical Giant: Patent search with a 10x increase in performance and a 20x reduction in TCO
• US-based Social Security e-Benefits System: Managing a billion-object repository with enterprise search and retrieval
25. CIGNEX Datamatics Confidential www.cignex.com
For queries reach out to us at info@cignex.com
Thank You. Any Questions?
Making Open Source Work