SlideShare una empresa de Scribd logo
1 de 44
Everything
You Need to
Know About
Sharding
Dylan Tong
dylan.tong@mongodb.com
Senior Solutions Architect
2
Agenda
Overview
• What is sharding?
• Why and what should I use sharding for?
Building your First Sharded Cluster
• What do I need to know to succeed with sharding?
Q&A
3
What is Sharding?
Sharding is a means of partitioning data across servers to enable:
Scale
needed by
modern
applications
to support
massive work
loads and
data volume.
Geo-Locality
to support
geographically
distributed
deployments to
support optimal UX
for customers across
vast geographies.
Hardware
Optimizations
on Performance vs.
Cost
Lower Recovery Times
to make “Recovery Time Objectives”
(RTO) feasible.
4
What is Sharding?
Sharding involves a shard key defined by a data modeler
that describes the partition space of a data set.
MongoDB provides 3 Types of Sharding Strategies:
• Ranged
• Hashed
• Tag-aware
Data is partitioned into data chunks by the shard key, and
these chunks are distributed evenly across shards that
reside across many physical servers.
5
Range Sharding
Shard Key: {deviceId}
1001……1000 2001……2000 3001……3000 4001……4000
Composite Keys Supported: {deviceId, timestamp}
…1000,1418244824 …1000,1418244825
6
Hash Sharding
Hash Sharding is a subset of Range Sharding.
…3334…3333 …8001…8000 …AAAB…AAAA …DDDF…DDDD
MongoDB apples a MD5 hash on the key when a hash shard key is
used:
Hash Shard Key(deviceId) = MD5(deviceId)
Ensures data is distributed randomly within the range of MD5 values
7
Tag-aware Sharding
Tag-aware sharding allows subset of shards to be tagged, and assigned
to a sub-range of the shard-key.
Tag Start End
West MinKey, MinKey MaxKey,50
East MinKey, 50 MaxKey, MaxKey
Collection: Users, Shard Key: {uId, regionCode}
Secondary
Secondary
Secondary
Secondary
Shard2,
Tag=West
Shard3,
Tag=East
Secondary
Secondary
Shard1,
Tag=West
Secondary
Secondary
Shard4,
Tag=East
Example: Sharding User Data belong to users from 100 “regions”
Assign Regions
1-50 to the West
Assign Regions
51-100 to the
East
8
Applying Sharding
Usage Required Strategy
Scale Range or Hash
Geo-Locality Tag-aware
Hardware Optimization Tag-aware
Lower Recovery Times Range or Hash
9
Sharding for Scale
Performance Scale: Throughput and Latency
Data Scale: Cardinality, Data Volume
10
Typical Small Deployment
Highly Available
but not Scalable
Replica Set
Writes Reads
Limited by capacity
of the Primary’s
host
When Immediate
Consistency
Matters: Limited by
capacity of the
Primary’s host
When Eventual
Consistency is
Acceptable: Limited
by capacity of
available replicaSet
members
11
Sharded Architecture
Auto-balancing:
data is partitioned
based on a shard
key, and
automatically
balanced across
shards by MongoDB
Query Routing: database
operations are
transparently routed
across the cluster
through a routing proxy
process (software).
Horizontal Scalability:
load is distributed and
resources are pooled
across commodity
servers.
Increasing read/write capacity
Decoupling for
Development and
Operational Simplicity:
Ops can add capacity
without app dependencies
and Dev involvement.
13
Value of Scale-out Architecture
System Capacity
$
Scale-up Limits
High
Capacity/$
Scale-out to 100-1000s
of servers
Optimize on Capacity/$
Scale-down and re-allocate
resources
Apps of
the Past
Modern Apps Trend
14
Sharding for Geo-Locality
Adobe Cloud Services among other popular consumer and
Enterprise services use sharding to run servers across multiple data
centers across geographies.
● Amazon - Every 1/10 second delay resulted in 1% loss of
sales.
● Google - Half a second delay caused a 20% drop in traffic.
● Aberdeen Group - 1-second delay in page-load time
o 11% fewer page views
o 16% decrease in customer satisfaction
o 7% loss in conversions
Network latency from West to East is ~80ms
15
Multi-Active DCs via Tag-aware
Sharding
Secondary Secondary
WEST EAST
Local Reads (Eg. Read Preference = Nearest)
Query
Query
Shard 1
Secondary Secondary
Shard 2
Tag = East
Tag = West
Secondary Secondary
Shard 3
Tag = East
Update
Update
Collection: Users, Shard Key: {regionCode,uId}
Priority=10Priority=5
Priority=10 Priority=5
Tag Start End
West MinKey, MinKey MaxKey,50
East MinKey, 50 MaxKey, MaxKey
17
Optimizing Latency and Cost
Event Latency Normalized to 1 s
RAM access 120 ns 6 min
SSD access 150 µs 6 days
HDD access 10 ms 12 months
Storage Type Avg. Cost ($/GB) Cost at 100TB ($)
RAM 5.50 550K
SSD 0.50-1.00 50K to 100K
HDD 0.03 3K
Magnitudes of Difference in Speed
Magnitudes of Difference in Cost
18
Optimizing Latency and Cost
Data Type Description Latency SLA Data Volume
Meta Data Fast look-ups to
drive real-time
decisions
95th
Percentile <
1ms
< 1 TB
Last 90 days of
Metrics
95+% of data
reported and
monitored
95th
Percentile <
30ms
< 10 TB
Historic Used for historic
reporting. Access
infrequently
95th
Percentile < 2s > 100TB
Use Case: Sensor data collected from millions of devices. Data used for
real-time decision automation, real-time monitoring and historical
reporting.
19
Hardware Optimizations
Tag: Cache Tag: Flash Tag: Archive
Collections Tag Start End
Meta Cache MinKey MaxKey
Metrics Flash MinKey,MinKey MaxKey, 90 days ago
Metrics Archive MinKey,>90 days ago MaxKey, MaxKey
Secondary
Secondary
Secondary
Secondary
Secondary
Secondary
Secondary
Secondary
Secondary
Secondary
Secondary
Secondary
Secondary
Secondary
Collection Shard Key
Meta DeviceId
Metrics DeviceId, TimestampHigh Memory Ratio,
Fast Cores
HDD
Medium Memory Ratio,
High Compute
SSDs
Low Memory Ratio,
Medium Compute
HHD
20
Restoration Times
Scenario: Application bug causes logical corruption of the data, and the database
needs to be rolled back to a previous PIT. What’s RTO does your business require in
this event?
N = 10
10X10TB Snapshots generated
and/or transferred in parallel
Total DB Snapshot Size = 100TB
N = 100
100X1TB Snapshots generated
and/or transferred in parallel.
Potentially 10X faster
restoration time.
Tar.g
z
Tar.g
z
Tar.g
z
Tar.g
z
Building Your First
Sharded Cluster
22
Life-cycle of Sharding
Design &
Development
Test/QA Pre-Production Production
Product Definition: Starts with an idea to build something big!
Predictive Maintenance Platform: a cloud platform for building
predictive maintenance applications and services—such as a service
that monitors various vehicle components by collecting data from
sensors, and automatically prescribes actions to take.
•Allow tenant to register, ingest and modify data collected by sensors
•Define and apply workflows and business rules
•Publish and subscribe to notifications
•Data access API for things like reporting
23
Life-cycle of Sharding
Design &
Development
Test/QA Pre-Production Production
Data Modeling: Do I need to Shard?
Throughput: data from millions of sensors updated in real-
time
Latency: The value of certain attributes need to be access
with 95th
percentile < 10ms to support real-time decisions
and automation.
Volume: 1TB of data collected per day. Retained for 5
years.
26
Life-cycle of Sharding
Design &
Development
Test/QA Pre-Production Production
Data Modeling: Select a Good Shard Key
Critical Step
•Sharding is only effective as the shard key.
•Shard Key attributes are immutable.
•Re-sharding is non-trivial. Requires re-partitioning
data.
27
Good Shard Key
Cardinality
Write Distribution
Query Isolation
Reliability
Index Locality
28
Cardinality
Key = Data Center
29
Cardinality
Key = Timestamp
30
Write Distribution
Key = Timestamp
31
Write Distribution
Key = Hash(Timestamp)
32
Query Isolation
Key = Hash(Timestamp)
“Scatter-gather Query”
33
Query Isolation
Key = Hash(DeviceId)
*Assumes bulk of queries on collection are in
context of a single deviceId
34
Reliability
Key = Hash(DeviceId)Key = Hash(Timestamp)
35
Index Locality
 MD5  -0 day-1 day-2 day...
Right balanced index may only
need to be partially in RAM to be
effective.
Working Set
Random Access Index Right Balance Index
Key = Hash(DeviceId) Key = DeviceId, Timestamp
… Device N
36
Good Shard Key
Cardinality
Write Distribution
Query Isolation
Reliability
Index Locality
Key = DeviceId,
Timestamp
Random Sequential
37
Life-cycle of Sharding
Design &
Development
Test/QA Pre-Production Production
Performance Testing: Avoid Pitfalls
Best Practices:
•Pre-split:
1. Hash Shard Key: specify numInitialChunks:
http://docs.mongodb.org/manual/reference/command/shardC
ollection/
2. Custom Shard Key: create a pre-split script:
http://docs.mongodb.org/manual/tutorial/create-chunks-in-
sharded-cluster/
• Run mongos (query router) on app server if possible.
Splits happen on demand
as chunks grow
Migrations happen when an
imbalance is detected
Sharding results in massive
performance degradation!
38
Life-cycle of Sharding
Design &
Development
Test/QA Pre-Production Production
Capacity Planning: How many shards do I need?
Sizing:
•What are the total resources required for your initial
deployment?
•What are the ideal hardware specs, and the # shards
necessary?
Capacity Planning: create a model to scale MongoDB for a
specific app.
•How do I determine when more shards need to be added?
•How much capacity do I gain from adding a shard?
39
How Many Servers?
Strategy Accuracy Level of Effort Feasibility of
Early Project
Analysis
Domain Expert High to Low:
inversely related
to complexity of
the Application
Low Yes
Empirical (Load
Testing)
High High Unlikely
41
Domain Expert
Model and Load Definition
Business Solution
Analysis
Resource Analysis Hardware Specification
• What is the document model? Collections, documents, indexes
• What are the major operations?
- Throughput
- Latency
• What is the working set? Eg. Last 90 days of orders
Normally, performed by MongoDB Solution Architect:
http://bit.ly/1rkXcfN
42
Domain Expert
Model and Load Definition
Business Solution
Analysis
Resource Analysis Hardware Specification
Resource Methodology
RAM Standard: Working Set + Indexes
Adjust more or less depending on latency vs. cost requirements. Very large clusters
should account for connection pooling/thread overhead (1MB per active thread)
IOPs Primarily based on throughput requirements. Writes + estimation on query page
faults. Assume random IO. Account for replication, journal and log (note:
sequential IO). Ideally, estimated empirically through prototype testing. Experts
can use experience from similar applications as an estimate. Spot testing maybe
needed.
Storage Estimate using throughput, document and index size approximations, and
retention requirements. Account for overhead like fragmentation if applicable.
CPU Rarely the bottleneck; a lot less CPU intensive than RDBMs. Using current
commodity CPU specs will suffice.
Network Estimate using throughput and document size approximations.
44
Sizing by Empirical Testing
• Sizing can be more accurately obtained by prototyping your application, and
performing load tests on selected hardware.
• Capacity Planning can be simultaneously accomplished through load testing.
• Past Webinars: http://www.mongodb.com/presentations/webinar-capacity-planning
Strategy:
1. Implement a prototype that can at least simulate major workloads
2. Select an economical server that you plan to scale-out on.
3. Saturate a single replicaSet or shard (maintaining latency SLA as needed). Address
bottlenecks, optimize and repeat.
4. Add an additional shard (as well as mongos and clients as needed). Saturate and
confirm roughly linear scaling.
5. Repeat step 4 until you are able to model capacity gains (throughput + latency)
versus #physical servers.
45
Operational Scale
Design &
Development
Test/QA Pre-Production Production
Business Critical Operations: How do I manage 100s to 1000s of nodes?
MongoDB Management Services (MMS): https://mms.mongodb.com
• Real-time monitoring
and visualization of
cluster health
• Alerting
• Automated cluster
provisioning
• Automation of daily
operational tasks like no-
downtime upgrades
• Centralized configuration
management
• Automated PIT
snapshotting of
clusters
• PITR support for
sharded clusters
46
MMS Automation
Server Resources
(anywhere)
Agent
On-Prem or
SaaS
MMS
48
Scalable, Anywhere
Quick Demo
Get Expert Advice on Scaling. For Free.
For a limited time, if you’re
considering a commercial
relationship with
MongoDB, you can sign up
for a free one hour consult
about scaling with one of
our MongoDB Engineers.
Sign Up: http://bit.ly/1rkXcfN
Webinar Q&A
dylan.tong@mongodb.com
Stay tuned after
the webinar
and take our
survey for your
chance to win
MongoDB swag.
MongoDB Sharding Webinar 2014

Más contenido relacionado

La actualidad más candente

Big-Data Server Farm Architecture
Big-Data Server Farm Architecture Big-Data Server Farm Architecture
Big-Data Server Farm Architecture Jordan Chung
 
Webinar: An Enterprise Architect’s View of MongoDB
Webinar: An Enterprise Architect’s View of MongoDBWebinar: An Enterprise Architect’s View of MongoDB
Webinar: An Enterprise Architect’s View of MongoDBMongoDB
 
Ibm big data ibm marriage of hadoop and data warehousing
Ibm big dataibm marriage of hadoop and data warehousingIbm big dataibm marriage of hadoop and data warehousing
Ibm big data ibm marriage of hadoop and data warehousing DataWorks Summit
 
MongoDB Operations for Developers
MongoDB Operations for DevelopersMongoDB Operations for Developers
MongoDB Operations for DevelopersMongoDB
 
Continuous Data Ingestion pipeline for the Enterprise
Continuous Data Ingestion pipeline for the EnterpriseContinuous Data Ingestion pipeline for the Enterprise
Continuous Data Ingestion pipeline for the EnterpriseDataWorks Summit
 
Why advanced monitoring is key for healthy
Why advanced monitoring is key for healthyWhy advanced monitoring is key for healthy
Why advanced monitoring is key for healthyDenodo
 
Data Virtualization Reference Architectures: Correctly Architecting your Solu...
Data Virtualization Reference Architectures: Correctly Architecting your Solu...Data Virtualization Reference Architectures: Correctly Architecting your Solu...
Data Virtualization Reference Architectures: Correctly Architecting your Solu...Denodo
 
MongoDB Evenings Houston: What's the Scoop on MongoDB and Hadoop? by Jake Ang...
MongoDB Evenings Houston: What's the Scoop on MongoDB and Hadoop? by Jake Ang...MongoDB Evenings Houston: What's the Scoop on MongoDB and Hadoop? by Jake Ang...
MongoDB Evenings Houston: What's the Scoop on MongoDB and Hadoop? by Jake Ang...MongoDB
 
The Future of Data Warehousing: ETL Will Never be the Same
The Future of Data Warehousing: ETL Will Never be the SameThe Future of Data Warehousing: ETL Will Never be the Same
The Future of Data Warehousing: ETL Will Never be the SameCloudera, Inc.
 
A Comparison of EDB Postgres to Self-Supported PostgreSQL
A Comparison of EDB Postgres to Self-Supported PostgreSQLA Comparison of EDB Postgres to Self-Supported PostgreSQL
A Comparison of EDB Postgres to Self-Supported PostgreSQLEDB
 
Building an analytical platform
Building an analytical platformBuilding an analytical platform
Building an analytical platformDavid Walker
 
Cloud Based Data Warehousing and Analytics
Cloud Based Data Warehousing and AnalyticsCloud Based Data Warehousing and Analytics
Cloud Based Data Warehousing and AnalyticsSeeling Cheung
 
MongoDB and RDBMS: Using Polyglot Persistence at Equifax
MongoDB and RDBMS: Using Polyglot Persistence at Equifax MongoDB and RDBMS: Using Polyglot Persistence at Equifax
MongoDB and RDBMS: Using Polyglot Persistence at Equifax MongoDB
 
Data architecture for modern enterprise
Data architecture for modern enterpriseData architecture for modern enterprise
Data architecture for modern enterprisekayalvizhi kandasamy
 
Cloud Economics
Cloud EconomicsCloud Economics
Cloud EconomicsRackspace
 
Creating a Modern Data Architecture for Digital Transformation
Creating a Modern Data Architecture for Digital TransformationCreating a Modern Data Architecture for Digital Transformation
Creating a Modern Data Architecture for Digital TransformationMongoDB
 
Entity Resolution Service - Bringing Petabytes of Data Online for Instant Access
Entity Resolution Service - Bringing Petabytes of Data Online for Instant AccessEntity Resolution Service - Bringing Petabytes of Data Online for Instant Access
Entity Resolution Service - Bringing Petabytes of Data Online for Instant AccessDataWorks Summit
 
How to select a modern data warehouse and get the most out of it?
How to select a modern data warehouse and get the most out of it?How to select a modern data warehouse and get the most out of it?
How to select a modern data warehouse and get the most out of it?Slim Baltagi
 

La actualidad más candente (20)

GridGain & Hadoop: Differences & Synergies
GridGain & Hadoop: Differences & SynergiesGridGain & Hadoop: Differences & Synergies
GridGain & Hadoop: Differences & Synergies
 
Big-Data Server Farm Architecture
Big-Data Server Farm Architecture Big-Data Server Farm Architecture
Big-Data Server Farm Architecture
 
Webinar: An Enterprise Architect’s View of MongoDB
Webinar: An Enterprise Architect’s View of MongoDBWebinar: An Enterprise Architect’s View of MongoDB
Webinar: An Enterprise Architect’s View of MongoDB
 
Ibm big data ibm marriage of hadoop and data warehousing
Ibm big dataibm marriage of hadoop and data warehousingIbm big dataibm marriage of hadoop and data warehousing
Ibm big data ibm marriage of hadoop and data warehousing
 
In-Memory Data Grids: Explained...
In-Memory Data Grids: Explained...In-Memory Data Grids: Explained...
In-Memory Data Grids: Explained...
 
MongoDB Operations for Developers
MongoDB Operations for DevelopersMongoDB Operations for Developers
MongoDB Operations for Developers
 
Continuous Data Ingestion pipeline for the Enterprise
Continuous Data Ingestion pipeline for the EnterpriseContinuous Data Ingestion pipeline for the Enterprise
Continuous Data Ingestion pipeline for the Enterprise
 
Why advanced monitoring is key for healthy
Why advanced monitoring is key for healthyWhy advanced monitoring is key for healthy
Why advanced monitoring is key for healthy
 
Data Virtualization Reference Architectures: Correctly Architecting your Solu...
Data Virtualization Reference Architectures: Correctly Architecting your Solu...Data Virtualization Reference Architectures: Correctly Architecting your Solu...
Data Virtualization Reference Architectures: Correctly Architecting your Solu...
 
MongoDB Evenings Houston: What's the Scoop on MongoDB and Hadoop? by Jake Ang...
MongoDB Evenings Houston: What's the Scoop on MongoDB and Hadoop? by Jake Ang...MongoDB Evenings Houston: What's the Scoop on MongoDB and Hadoop? by Jake Ang...
MongoDB Evenings Houston: What's the Scoop on MongoDB and Hadoop? by Jake Ang...
 
The Future of Data Warehousing: ETL Will Never be the Same
The Future of Data Warehousing: ETL Will Never be the SameThe Future of Data Warehousing: ETL Will Never be the Same
The Future of Data Warehousing: ETL Will Never be the Same
 
A Comparison of EDB Postgres to Self-Supported PostgreSQL
A Comparison of EDB Postgres to Self-Supported PostgreSQLA Comparison of EDB Postgres to Self-Supported PostgreSQL
A Comparison of EDB Postgres to Self-Supported PostgreSQL
 
Building an analytical platform
Building an analytical platformBuilding an analytical platform
Building an analytical platform
 
Cloud Based Data Warehousing and Analytics
Cloud Based Data Warehousing and AnalyticsCloud Based Data Warehousing and Analytics
Cloud Based Data Warehousing and Analytics
 
MongoDB and RDBMS: Using Polyglot Persistence at Equifax
MongoDB and RDBMS: Using Polyglot Persistence at Equifax MongoDB and RDBMS: Using Polyglot Persistence at Equifax
MongoDB and RDBMS: Using Polyglot Persistence at Equifax
 
Data architecture for modern enterprise
Data architecture for modern enterpriseData architecture for modern enterprise
Data architecture for modern enterprise
 
Cloud Economics
Cloud EconomicsCloud Economics
Cloud Economics
 
Creating a Modern Data Architecture for Digital Transformation
Creating a Modern Data Architecture for Digital TransformationCreating a Modern Data Architecture for Digital Transformation
Creating a Modern Data Architecture for Digital Transformation
 
Entity Resolution Service - Bringing Petabytes of Data Online for Instant Access
Entity Resolution Service - Bringing Petabytes of Data Online for Instant AccessEntity Resolution Service - Bringing Petabytes of Data Online for Instant Access
Entity Resolution Service - Bringing Petabytes of Data Online for Instant Access
 
How to select a modern data warehouse and get the most out of it?
How to select a modern data warehouse and get the most out of it?How to select a modern data warehouse and get the most out of it?
How to select a modern data warehouse and get the most out of it?
 

Destacado

MongoDB San Francisco 2013: Basic Sharding in MongoDB presented by Brandon Bl...
MongoDB San Francisco 2013: Basic Sharding in MongoDB presented by Brandon Bl...MongoDB San Francisco 2013: Basic Sharding in MongoDB presented by Brandon Bl...
MongoDB San Francisco 2013: Basic Sharding in MongoDB presented by Brandon Bl...MongoDB
 
EclipseConEurope2012 SOA - Talend with EasySOA
EclipseConEurope2012 SOA - Talend with EasySOAEclipseConEurope2012 SOA - Talend with EasySOA
EclipseConEurope2012 SOA - Talend with EasySOAMarc Dutoo
 
MongoDB San Francisco 2013: Hash-based Sharding in MongoDB 2.4 presented by B...
MongoDB San Francisco 2013: Hash-based Sharding in MongoDB 2.4 presented by B...MongoDB San Francisco 2013: Hash-based Sharding in MongoDB 2.4 presented by B...
MongoDB San Francisco 2013: Hash-based Sharding in MongoDB 2.4 presented by B...MongoDB
 
Capacity Planning
Capacity PlanningCapacity Planning
Capacity PlanningMongoDB
 
HPTS talk on micro-sharding with Katta
HPTS talk on micro-sharding with KattaHPTS talk on micro-sharding with Katta
HPTS talk on micro-sharding with KattaTed Dunning
 
Scaling MongoDB; Sharding Into and Beyond the Multi-Terabyte Range
Scaling MongoDB; Sharding Into and Beyond the Multi-Terabyte RangeScaling MongoDB; Sharding Into and Beyond the Multi-Terabyte Range
Scaling MongoDB; Sharding Into and Beyond the Multi-Terabyte RangeMongoDB
 
Building a High-Performance Distributed Task Queue on MongoDB
Building a High-Performance Distributed Task Queue on MongoDBBuilding a High-Performance Distributed Task Queue on MongoDB
Building a High-Performance Distributed Task Queue on MongoDBMongoDB
 
Sharding with MongoDB (Eliot Horowitz)
Sharding with MongoDB (Eliot Horowitz)Sharding with MongoDB (Eliot Horowitz)
Sharding with MongoDB (Eliot Horowitz)MongoSF
 
Mongodb sharding
Mongodb shardingMongodb sharding
Mongodb shardingxiangrong
 
Event-Based Subscription with MongoDB
Event-Based Subscription with MongoDBEvent-Based Subscription with MongoDB
Event-Based Subscription with MongoDBMongoDB
 
Talend Open Studio Fundamentals #1: Workspaces, Jobs, Metadata and Trips & Tr...
Talend Open Studio Fundamentals #1: Workspaces, Jobs, Metadata and Trips & Tr...Talend Open Studio Fundamentals #1: Workspaces, Jobs, Metadata and Trips & Tr...
Talend Open Studio Fundamentals #1: Workspaces, Jobs, Metadata and Trips & Tr...Gabriele Baldassarre
 
Enterprise Integration Patterns Revisited (EIP, Apache Camel, Talend ESB)
Enterprise Integration Patterns Revisited (EIP, Apache Camel, Talend ESB)Enterprise Integration Patterns Revisited (EIP, Apache Camel, Talend ESB)
Enterprise Integration Patterns Revisited (EIP, Apache Camel, Talend ESB)Kai Wähner
 
Sharding Methods for MongoDB
Sharding Methods for MongoDBSharding Methods for MongoDB
Sharding Methods for MongoDBMongoDB
 
PHP, Lithium and MongoDB
PHP, Lithium and MongoDBPHP, Lithium and MongoDB
PHP, Lithium and MongoDBMitch Pirtle
 
The Aggregation Framework
The Aggregation FrameworkThe Aggregation Framework
The Aggregation FrameworkMongoDB
 
Back to Basics Webinar 3: Introduction to Replica Sets
Back to Basics Webinar 3: Introduction to Replica SetsBack to Basics Webinar 3: Introduction to Replica Sets
Back to Basics Webinar 3: Introduction to Replica SetsMongoDB
 
Back to Basics 2017: Introduction to Sharding
Back to Basics 2017: Introduction to ShardingBack to Basics 2017: Introduction to Sharding
Back to Basics 2017: Introduction to ShardingMongoDB
 
Webinar: Working with Graph Data in MongoDB
Webinar: Working with Graph Data in MongoDBWebinar: Working with Graph Data in MongoDB
Webinar: Working with Graph Data in MongoDBMongoDB
 
Webinar: 10-Step Guide to Creating a Single View of your Business
Webinar: 10-Step Guide to Creating a Single View of your BusinessWebinar: 10-Step Guide to Creating a Single View of your Business
Webinar: 10-Step Guide to Creating a Single View of your BusinessMongoDB
 
MongoDB as Message Queue
MongoDB as Message QueueMongoDB as Message Queue
MongoDB as Message QueueMongoDB
 

Destacado (20)

MongoDB San Francisco 2013: Basic Sharding in MongoDB presented by Brandon Bl...
MongoDB San Francisco 2013: Basic Sharding in MongoDB presented by Brandon Bl...MongoDB San Francisco 2013: Basic Sharding in MongoDB presented by Brandon Bl...
MongoDB San Francisco 2013: Basic Sharding in MongoDB presented by Brandon Bl...
 
EclipseConEurope2012 SOA - Talend with EasySOA
EclipseConEurope2012 SOA - Talend with EasySOAEclipseConEurope2012 SOA - Talend with EasySOA
EclipseConEurope2012 SOA - Talend with EasySOA
 
MongoDB San Francisco 2013: Hash-based Sharding in MongoDB 2.4 presented by B...
MongoDB San Francisco 2013: Hash-based Sharding in MongoDB 2.4 presented by B...MongoDB San Francisco 2013: Hash-based Sharding in MongoDB 2.4 presented by B...
MongoDB San Francisco 2013: Hash-based Sharding in MongoDB 2.4 presented by B...
 
Capacity Planning
Capacity PlanningCapacity Planning
Capacity Planning
 
HPTS talk on micro-sharding with Katta
HPTS talk on micro-sharding with KattaHPTS talk on micro-sharding with Katta
HPTS talk on micro-sharding with Katta
 
Scaling MongoDB; Sharding Into and Beyond the Multi-Terabyte Range
Scaling MongoDB; Sharding Into and Beyond the Multi-Terabyte RangeScaling MongoDB; Sharding Into and Beyond the Multi-Terabyte Range
Scaling MongoDB; Sharding Into and Beyond the Multi-Terabyte Range
 
Building a High-Performance Distributed Task Queue on MongoDB
Building a High-Performance Distributed Task Queue on MongoDBBuilding a High-Performance Distributed Task Queue on MongoDB
Building a High-Performance Distributed Task Queue on MongoDB
 
Sharding with MongoDB (Eliot Horowitz)
Sharding with MongoDB (Eliot Horowitz)Sharding with MongoDB (Eliot Horowitz)
Sharding with MongoDB (Eliot Horowitz)
 
Mongodb sharding
Mongodb shardingMongodb sharding
Mongodb sharding
 
Event-Based Subscription with MongoDB
Event-Based Subscription with MongoDBEvent-Based Subscription with MongoDB
Event-Based Subscription with MongoDB
 
Talend Open Studio Fundamentals #1: Workspaces, Jobs, Metadata and Trips & Tr...
Talend Open Studio Fundamentals #1: Workspaces, Jobs, Metadata and Trips & Tr...Talend Open Studio Fundamentals #1: Workspaces, Jobs, Metadata and Trips & Tr...
Talend Open Studio Fundamentals #1: Workspaces, Jobs, Metadata and Trips & Tr...
 
Enterprise Integration Patterns Revisited (EIP, Apache Camel, Talend ESB)
Enterprise Integration Patterns Revisited (EIP, Apache Camel, Talend ESB)Enterprise Integration Patterns Revisited (EIP, Apache Camel, Talend ESB)
Enterprise Integration Patterns Revisited (EIP, Apache Camel, Talend ESB)
 
Sharding Methods for MongoDB
Sharding Methods for MongoDBSharding Methods for MongoDB
Sharding Methods for MongoDB
 
PHP, Lithium and MongoDB
PHP, Lithium and MongoDBPHP, Lithium and MongoDB
PHP, Lithium and MongoDB
 
The Aggregation Framework
The Aggregation FrameworkThe Aggregation Framework
The Aggregation Framework
 
Back to Basics Webinar 3: Introduction to Replica Sets
Back to Basics Webinar 3: Introduction to Replica SetsBack to Basics Webinar 3: Introduction to Replica Sets
Back to Basics Webinar 3: Introduction to Replica Sets
 
Back to Basics 2017: Introduction to Sharding
Back to Basics 2017: Introduction to ShardingBack to Basics 2017: Introduction to Sharding
Back to Basics 2017: Introduction to Sharding
 
Webinar: Working with Graph Data in MongoDB
Webinar: Working with Graph Data in MongoDBWebinar: Working with Graph Data in MongoDB
Webinar: Working with Graph Data in MongoDB
 
Webinar: 10-Step Guide to Creating a Single View of your Business
Webinar: 10-Step Guide to Creating a Single View of your BusinessWebinar: 10-Step Guide to Creating a Single View of your Business
Webinar: 10-Step Guide to Creating a Single View of your Business
 
MongoDB as Message Queue
MongoDB as Message QueueMongoDB as Message Queue
MongoDB as Message Queue
 

Similar a MongoDB Sharding Webinar 2014

Everything You Need to Know About Sharding
Everything You Need to Know About ShardingEverything You Need to Know About Sharding
Everything You Need to Know About ShardingMongoDB
 
Hybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGsHybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGsAli Hodroj
 
Cignex mongodb-sharding-mongodbdays
Cignex mongodb-sharding-mongodbdaysCignex mongodb-sharding-mongodbdays
Cignex mongodb-sharding-mongodbdaysMongoDB APAC
 
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...DataStax
 
How sitecore depends on mongo db for scalability and performance, and what it...
How sitecore depends on mongo db for scalability and performance, and what it...How sitecore depends on mongo db for scalability and performance, and what it...
How sitecore depends on mongo db for scalability and performance, and what it...Antonios Giannopoulos
 
Agility and Scalability with MongoDB
Agility and Scalability with MongoDBAgility and Scalability with MongoDB
Agility and Scalability with MongoDBMongoDB
 
Data has a better idea the in-memory data grid
Data has a better idea   the in-memory data gridData has a better idea   the in-memory data grid
Data has a better idea the in-memory data gridBogdan Dina
 
Scaling MongoDB
Scaling MongoDBScaling MongoDB
Scaling MongoDBMongoDB
 
MongoDB at Scale
MongoDB at ScaleMongoDB at Scale
MongoDB at ScaleMongoDB
 
Making Hadoop Realtime by Dr. William Bain of Scaleout Software
Making Hadoop Realtime by Dr. William Bain of Scaleout SoftwareMaking Hadoop Realtime by Dr. William Bain of Scaleout Software
Making Hadoop Realtime by Dr. William Bain of Scaleout SoftwareData Con LA
 
Webinar: High Performance MongoDB Applications with IBM POWER8
Webinar: High Performance MongoDB Applications with IBM POWER8Webinar: High Performance MongoDB Applications with IBM POWER8
Webinar: High Performance MongoDB Applications with IBM POWER8MongoDB
 
Accra MongoDB User Group
Accra MongoDB User GroupAccra MongoDB User Group
Accra MongoDB User GroupMongoDB
 
AWS re:Invent 2016: Amazon CloudFront Flash Talks: Best Practices on Configur...
AWS re:Invent 2016: Amazon CloudFront Flash Talks: Best Practices on Configur...AWS re:Invent 2016: Amazon CloudFront Flash Talks: Best Practices on Configur...
AWS re:Invent 2016: Amazon CloudFront Flash Talks: Best Practices on Configur...Amazon Web Services
 
Red hat Storage Day LA - Designing Ceph Clusters Using Intel-Based Hardware
Red hat Storage Day LA - Designing Ceph Clusters Using Intel-Based HardwareRed hat Storage Day LA - Designing Ceph Clusters Using Intel-Based Hardware
Red hat Storage Day LA - Designing Ceph Clusters Using Intel-Based HardwareRed_Hat_Storage
 
Introduction to Azure DocumentDB
Introduction to Azure DocumentDBIntroduction to Azure DocumentDB
Introduction to Azure DocumentDBDenny Lee
 
Oscon 2019 - Optimizing analytical queries on Cassandra by 100x
Oscon 2019 - Optimizing analytical queries on Cassandra by 100xOscon 2019 - Optimizing analytical queries on Cassandra by 100x
Oscon 2019 - Optimizing analytical queries on Cassandra by 100xshradha ambekar
 
AquaQ Analytics Kx Event - Data Direct Networks Presentation
AquaQ Analytics Kx Event - Data Direct Networks PresentationAquaQ Analytics Kx Event - Data Direct Networks Presentation
AquaQ Analytics Kx Event - Data Direct Networks PresentationAquaQ Analytics
 
SnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14th
SnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14thSnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14th
SnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14thSnappyData
 
Building a High Performance Analytics Platform
Building a High Performance Analytics PlatformBuilding a High Performance Analytics Platform
Building a High Performance Analytics PlatformSantanu Dey
 

Similar a MongoDB Sharding Webinar 2014 (20)

Everything You Need to Know About Sharding
Everything You Need to Know About ShardingEverything You Need to Know About Sharding
Everything You Need to Know About Sharding
 
Hybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGsHybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGs
 
Cignex mongodb-sharding-mongodbdays
Cignex mongodb-sharding-mongodbdaysCignex mongodb-sharding-mongodbdays
Cignex mongodb-sharding-mongodbdays
 
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
 
How sitecore depends on mongo db for scalability and performance, and what it...
How sitecore depends on mongo db for scalability and performance, and what it...How sitecore depends on mongo db for scalability and performance, and what it...
How sitecore depends on mongo db for scalability and performance, and what it...
 
Agility and Scalability with MongoDB
Agility and Scalability with MongoDBAgility and Scalability with MongoDB
Agility and Scalability with MongoDB
 
Data has a better idea the in-memory data grid
Data has a better idea   the in-memory data gridData has a better idea   the in-memory data grid
Data has a better idea the in-memory data grid
 
Scaling MongoDB
Scaling MongoDBScaling MongoDB
Scaling MongoDB
 
MongoDB at Scale
MongoDB at ScaleMongoDB at Scale
MongoDB at Scale
 
Making Hadoop Realtime by Dr. William Bain of Scaleout Software
Making Hadoop Realtime by Dr. William Bain of Scaleout SoftwareMaking Hadoop Realtime by Dr. William Bain of Scaleout Software
Making Hadoop Realtime by Dr. William Bain of Scaleout Software
 
Webinar: High Performance MongoDB Applications with IBM POWER8
Webinar: High Performance MongoDB Applications with IBM POWER8Webinar: High Performance MongoDB Applications with IBM POWER8
Webinar: High Performance MongoDB Applications with IBM POWER8
 
Accra MongoDB User Group
Accra MongoDB User GroupAccra MongoDB User Group
Accra MongoDB User Group
 
optimizing_ceph_flash
optimizing_ceph_flashoptimizing_ceph_flash
optimizing_ceph_flash
 
AWS re:Invent 2016: Amazon CloudFront Flash Talks: Best Practices on Configur...
AWS re:Invent 2016: Amazon CloudFront Flash Talks: Best Practices on Configur...AWS re:Invent 2016: Amazon CloudFront Flash Talks: Best Practices on Configur...
AWS re:Invent 2016: Amazon CloudFront Flash Talks: Best Practices on Configur...
 
Red hat Storage Day LA - Designing Ceph Clusters Using Intel-Based Hardware
Red hat Storage Day LA - Designing Ceph Clusters Using Intel-Based HardwareRed hat Storage Day LA - Designing Ceph Clusters Using Intel-Based Hardware
Red hat Storage Day LA - Designing Ceph Clusters Using Intel-Based Hardware
 
Introduction to Azure DocumentDB
Introduction to Azure DocumentDBIntroduction to Azure DocumentDB
Introduction to Azure DocumentDB
 
Oscon 2019 - Optimizing analytical queries on Cassandra by 100x
Oscon 2019 - Optimizing analytical queries on Cassandra by 100xOscon 2019 - Optimizing analytical queries on Cassandra by 100x
Oscon 2019 - Optimizing analytical queries on Cassandra by 100x
 
AquaQ Analytics Kx Event - Data Direct Networks Presentation
AquaQ Analytics Kx Event - Data Direct Networks PresentationAquaQ Analytics Kx Event - Data Direct Networks Presentation
AquaQ Analytics Kx Event - Data Direct Networks Presentation
 
SnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14th
SnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14thSnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14th
SnappyData Ad Analytics Use Case -- BDAM Meetup Sept 14th
 
Building a High Performance Analytics Platform
Building a High Performance Analytics PlatformBuilding a High Performance Analytics Platform
Building a High Performance Analytics Platform
 

Último

Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 

Último (20)

Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 

MongoDB Sharding Webinar 2014

  • 1. Everything You Need to Know About Sharding Dylan Tong dylan.tong@mongodb.com Senior Solutions Architect
  • 2. 2 Agenda Overview • What is sharding? • Why and what should I use sharding for? Building your First Sharded Cluster • What do I need to know to succeed with sharding? Q&A
  • 3. 3 What is Sharding? Sharding is a means of partitioning data across servers to enable: Scale needed by modern applications to support massive work loads and data volume. Geo-Locality to support geographically distributed deployments to support optimal UX for customers across vast geographies. Hardware Optimizations on Performance vs. Cost Lower Recovery Times to make “Recovery Time Objectives” (RTO) feasible.
  • 4. 4 What is Sharding? Sharding involves a shard key defined by a data modeler that describes the partition space of a data set. MongoDB provides 3 Types of Sharding Strategies: • Ranged • Hashed • Tag-aware Data is partitioned into data chunks by the shard key, and these chunks are distributed evenly across shards that reside across many physical servers.
  • 5. 5 Range Sharding Shard Key: {deviceId} 1001……1000 2001……2000 3001……3000 4001……4000 Composite Keys Supported: {deviceId, timestamp} …1000,1418244824 …1000,1418244825
  • 6. 6 Hash Sharding Hash Sharding is a subset of Range Sharding. …3334…3333 …8001…8000 …AAAB…AAAA …DDDF…DDDD MongoDB apples a MD5 hash on the key when a hash shard key is used: Hash Shard Key(deviceId) = MD5(deviceId) Ensures data is distributed randomly within the range of MD5 values
  • 7. 7 Tag-aware Sharding Tag-aware sharding allows subset of shards to be tagged, and assigned to a sub-range of the shard-key. Tag Start End West MinKey, MinKey MaxKey,50 East MinKey, 50 MaxKey, MaxKey Collection: Users, Shard Key: {uId, regionCode} Secondary Secondary Secondary Secondary Shard2, Tag=West Shard3, Tag=East Secondary Secondary Shard1, Tag=West Secondary Secondary Shard4, Tag=East Example: Sharding User Data belong to users from 100 “regions” Assign Regions 1-50 to the West Assign Regions 51-100 to the East
  • 8. 8 Applying Sharding Usage Required Strategy Scale Range or Hash Geo-Locality Tag-aware Hardware Optimization Tag-aware Lower Recovery Times Range or Hash
  • 9. 9 Sharding for Scale Performance Scale: Throughput and Latency Data Scale: Cardinality, Data Volume
  • 10. 10 Typical Small Deployment Highly Available but not Scalable Replica Set Writes Reads Limited by capacity of the Primary’s host When Immediate Consistency Matters: Limited by capacity of the Primary’s host When Eventual Consistency is Acceptable: Limited by capacity of available replicaSet members
  • 11. 11 Sharded Architecture Auto-balancing: data is partitioned based on a shard key, and automatically balanced across shards by MongoDB Query Routing: database operations are transparently routed across the cluster through a routing proxy process (software). Horizontal Scalability: load is distributed and resources are pooled across commodity servers. Increasing read/write capacity Decoupling for Development and Operational Simplicity: Ops can add capacity without app dependencies and Dev involvement.
  • 12. 13 Value of Scale-out Architecture System Capacity $ Scale-up Limits High Capacity/$ Scale-out to 100-1000s of servers Optimize on Capacity/$ Scale-down and re-allocate resources Apps of the Past Modern Apps Trend
  • 13. 14 Sharding for Geo-Locality Adobe Cloud Services among other popular consumer and Enterprise services use sharding to run servers across multiple data centers across geographies. ● Amazon - Every 1/10 second delay resulted in 1% loss of sales. ● Google - Half a second delay caused a 20% drop in traffic. ● Aberdeen Group - 1-second delay in page-load time o 11% fewer page views o 16% decrease in customer satisfaction o 7% loss in conversions Network latency from West to East is ~80ms
  • 14. 15 Multi-Active DCs via Tag-aware Sharding Secondary Secondary WEST EAST Local Reads (Eg. Read Preference = Nearest) Query Query Shard 1 Secondary Secondary Shard 2 Tag = East Tag = West Secondary Secondary Shard 3 Tag = East Update Update Collection: Users, Shard Key: {regionCode,uId} Priority=10Priority=5 Priority=10 Priority=5 Tag Start End West MinKey, MinKey MaxKey,50 East MinKey, 50 MaxKey, MaxKey
  • 15. 17 Optimizing Latency and Cost Event Latency Normalized to 1 s RAM access 120 ns 6 min SSD access 150 µs 6 days HDD access 10 ms 12 months Storage Type Avg. Cost ($/GB) Cost at 100TB ($) RAM 5.50 550K SSD 0.50-1.00 50K to 100K HDD 0.03 3K Magnitudes of Difference in Speed Magnitudes of Difference in Cost
  • 16. 18 Optimizing Latency and Cost Data Type Description Latency SLA Data Volume Meta Data Fast look-ups to drive real-time decisions 95th Percentile < 1ms < 1 TB Last 90 days of Metrics 95+% of data reported and monitored 95th Percentile < 30ms < 10 TB Historic Used for historic reporting. Access infrequently 95th Percentile < 2s > 100TB Use Case: Sensor data collected from millions of devices. Data used for real-time decision automation, real-time monitoring and historical reporting.
  • 17. 19 Hardware Optimizations Tag: Cache Tag: Flash Tag: Archive Collections Tag Start End Meta Cache MinKey MaxKey Metrics Flash MinKey,MinKey MaxKey, 90 days ago Metrics Archive MinKey,>90 days ago MaxKey, MaxKey Secondary Secondary Secondary Secondary Secondary Secondary Secondary Secondary Secondary Secondary Secondary Secondary Secondary Secondary Collection Shard Key Meta DeviceId Metrics DeviceId, TimestampHigh Memory Ratio, Fast Cores HDD Medium Memory Ratio, High Compute SSDs Low Memory Ratio, Medium Compute HHD
  • 18. 20 Restoration Times Scenario: Application bug causes logical corruption of the data, and the database needs to be rolled back to a previous PIT. What’s RTO does your business require in this event? N = 10 10X10TB Snapshots generated and/or transferred in parallel Total DB Snapshot Size = 100TB N = 100 100X1TB Snapshots generated and/or transferred in parallel. Potentially 10X faster restoration time. Tar.g z Tar.g z Tar.g z Tar.g z
  • 20. 22 Life-cycle of Sharding Design & Development Test/QA Pre-Production Production Product Definition: Starts with an idea to build something big! Predictive Maintenance Platform: a cloud platform for building predictive maintenance applications and services—such as a service that monitors various vehicle components by collecting data from sensors, and automatically prescribes actions to take. •Allow tenant to register, ingest and modify data collected by sensors •Define and apply workflows and business rules •Publish and subscribe to notifications •Data access API for things like reporting
  • 21. 23 Life-cycle of Sharding Design & Development Test/QA Pre-Production Production Data Modeling: Do I need to Shard? Throughput: data from millions of sensors updated in real- time Latency: The value of certain attributes need to be access with 95th percentile < 10ms to support real-time decisions and automation. Volume: 1TB of data collected per day. Retained for 5 years.
  • 22. 26 Life-cycle of Sharding Design & Development Test/QA Pre-Production Production Data Modeling: Select a Good Shard Key Critical Step •Sharding is only effective as the shard key. •Shard Key attributes are immutable. •Re-sharding is non-trivial. Requires re-partitioning data.
  • 23. 27 Good Shard Key Cardinality Write Distribution Query Isolation Reliability Index Locality
  • 27. 31 Write Distribution Key = Hash(Timestamp)
  • 28. 32 Query Isolation Key = Hash(Timestamp) “Scatter-gather Query”
  • 29. 33 Query Isolation Key = Hash(DeviceId) *Assumes bulk of queries on collection are in context of a single deviceId
  • 31. 35 Index Locality  MD5  -0 day-1 day-2 day... Right balanced index may only need to be partially in RAM to be effective. Working Set Random Access Index Right Balance Index Key = Hash(DeviceId) Key = DeviceId, Timestamp … Device N
  • 32. 36 Good Shard Key Cardinality Write Distribution Query Isolation Reliability Index Locality Key = DeviceId, Timestamp Random Sequential
  • 33. 37 Life-cycle of Sharding Design & Development Test/QA Pre-Production Production Performance Testing: Avoid Pitfalls Best Practices: •Pre-split: 1. Hash Shard Key: specify numInitialChunks: http://docs.mongodb.org/manual/reference/command/shardC ollection/ 2. Custom Shard Key: create a pre-split script: http://docs.mongodb.org/manual/tutorial/create-chunks-in- sharded-cluster/ • Run mongos (query router) on app server if possible. Splits happen on demand as chunks grow Migrations happen when an imbalance is detected Sharding results in massive performance degradation!
  • 34. 38 Life-cycle of Sharding Design & Development Test/QA Pre-Production Production Capacity Planning: How many shards do I need? Sizing: •What are the total resources required for your initial deployment? •What are the ideal hardware specs, and the # shards necessary? Capacity Planning: create a model to scale MongoDB for a specific app. •How do I determine when more shards need to be added? •How much capacity do I gain from adding a shard?
  • 35. 39 How Many Servers? Strategy Accuracy Level of Effort Feasibility of Early Project Analysis Domain Expert High to Low: inversely related to complexity of the Application Low Yes Empirical (Load Testing) High High Unlikely
  • 36. 41 Domain Expert Model and Load Definition Business Solution Analysis Resource Analysis Hardware Specification • What is the document model? Collections, documents, indexes • What are the major operations? - Throughput - Latency • What is the working set? Eg. Last 90 days of orders Normally, performed by MongoDB Solution Architect: http://bit.ly/1rkXcfN
  • 37. 42 Domain Expert Model and Load Definition Business Solution Analysis Resource Analysis Hardware Specification Resource Methodology RAM Standard: Working Set + Indexes Adjust more or less depending on latency vs. cost requirements. Very large clusters should account for connection pooling/thread overhead (1MB per active thread) IOPs Primarily based on throughput requirements. Writes + estimation on query page faults. Assume random IO. Account for replication, journal and log (note: sequential IO). Ideally, estimated empirically through prototype testing. Experts can use experience from similar applications as an estimate. Spot testing maybe needed. Storage Estimate using throughput, document and index size approximations, and retention requirements. Account for overhead like fragmentation if applicable. CPU Rarely the bottleneck; a lot less CPU intensive than RDBMs. Using current commodity CPU specs will suffice. Network Estimate using throughput and document size approximations.
  • 38. 44 Sizing by Empirical Testing • Sizing can be more accurately obtained by prototyping your application, and performing load tests on selected hardware. • Capacity Planning can be simultaneously accomplished through load testing. • Past Webinars: http://www.mongodb.com/presentations/webinar-capacity-planning Strategy: 1. Implement a prototype that can at least simulate major workloads 2. Select an economical server that you plan to scale-out on. 3. Saturate a single replicaSet or shard (maintaining latency SLA as needed). Address bottlenecks, optimize and repeat. 4. Add an additional shard (as well as mongos and clients as needed). Saturate and confirm roughly linear scaling. 5. Repeat step 4 until you are able to model capacity gains (throughput + latency) versus #physical servers.
  • 39. 45 Operational Scale Design & Development Test/QA Pre-Production Production Business Critical Operations: How do I manage 100s to 1000s of nodes? MongoDB Management Services (MMS): https://mms.mongodb.com • Real-time monitoring and visualization of cluster health • Alerting • Automated cluster provisioning • Automation of daily operational tasks like no- downtime upgrades • Centralized configuration management • Automated PIT snapshotting of clusters • PITR support for sharded clusters
  • 42. Get Expert Advice on Scaling. For Free. For a limited time, if you’re considering a commercial relationship with MongoDB, you can sign up for a free one hour consult about scaling with one of our MongoDB Engineers. Sign Up: http://bit.ly/1rkXcfN
  • 43. Webinar Q&A dylan.tong@mongodb.com Stay tuned after the webinar and take our survey for your chance to win MongoDB swag.

Notas del editor

  1. EA Sports FIFA: world&amp;apos;s best-selling sports video game franchise. User data and game state for millions of players, Yandex: The largest search engine in Russia uses MongoDB to manage all user and metadata for its file sharing service. MongoDB has scaled to support tens of billions of objects and TBs of data, growing at 10 million new file uploads per day. eBay FourSquare: Foursquare is used by over 50 million people worldwide, who have checked in over 6 billion times, with millions more added every day. MongoDB is Foursquare’s main database, supporting hundreds of thousands of operations per second and storing all check-ins and history, user and venue data along with reviews. AHL, a part of Man Group plc, is a quantitative investment manager based in London and Hong Kong, with over $11.3 billion in assets under management. After evaluating multiple technology options, AHL used MongoDB to replace its relational and specialised &amp;quot;tick&amp;quot; databases. MongoDB supports 250 million ticks per second, at 40x lower cost than the legacy technologies it replaced. Adobe: Many of the world’s most recognizable brands use Adobe Experience Manager to accelerate development of digital experiences that increase customer loyalty, engagement and demand. Adobe uses MongoDB to store petabytes of data the large-scale content repositories underpinning the Experience Manager. MongoDB MMS Back-up: 2PB Mcafee: MongoDB powers McAfee Global Threat Intelligence (GTI), a cloud-based intelligence service that correlates data from millions of sensors around the globe. Billions of documents are stored and analyzed in MongoDB to deliver real-time threat intelligence to other McAfee end-client products. Carfax: CARFAX relies on its Vehicle History database to connect potential buyers with used vehicles in their area, and for analytics to guide the business. To improve customer experience, CARFAX migrated to MongoDB which now manages over 13 billion documents, before replication across multiple data centers.
  2. Limited capacity: monolithic architecture, siloed data access, or complex application-level facilitated sharding that is difficult and expensive to successfully implement Scale down: system with seasonal load (ex. Online games- popular from the start and fade over time). Sharding enables scaling-down to reallocate resources on bare-metal servers. Can’t do the same for a database appliance
  3. Hash Key isn’t always a good key. Can be good for the most most simplistic k-v access patterns, but any query retrieving multiple documents like range based queries can potentially lead to scatter-gather queries, which have high overhead and impair the ability to scale linearly.
  4. www.mongodb.com/lp/contact/scaling-101 http://www.mongodb.com/lp/contact/planning-for-scale