MongoDB
https://www.mongodb.com/
Prutha Date (dprutha1@umbc.edu)
Siraj Memon (siraj1@umbc.edu)
Outline
• Introduction to MongoDB
• Storage Layout
• Data Management Features
• Performance Analysis
• Limitations
• Conclusion
• Demo
• References
What is MongoDB?
• MongoDB is a NoSQL, document-oriented database.
• It provides a flexible, semi-structured schema.
• It provides high performance, high availability, and easy scalability.
• MongoDB is free and open source software.
• License: GNU Affero General Public License (AGPL) for the server and Apache License for the drivers
• MongoDB is a server process that runs on Linux, Windows and OS X. It can
be run as either a 32-bit or a 64-bit application.
When to use MongoDB?
“Knowing when to use a hammer, and when to use a screwdriver.”
• Account and user profiles: can store arrays of addresses with ease (MetLife)
• Content Management Systems (CMS): the flexible schema of MongoDB is great for heterogeneous
collections of content types (MongoPress)
• Form data: MongoDB makes it easy to evolve the structure of form data over time (ADP)
• Blogs / user-generated content: can keep data with complex relationships together in one object (Forbes,
AOL)
• Messaging: vary message meta-data easily per message or message type without needing to maintain
separate collections or schemas (Viber)
• System configuration: just a nice object graph of configuration values, which is very natural in MongoDB
(Cisco)
• Log data of any kind: structured log data is the future (eBay)
• Location based systems: makes use of Geospatial indices (Foursquare, City government of Chicago)
Terminologies – RDBMS vs MongoDB
*JSON – JavaScript Object Notation
Storage Internals - Directory Layout
The data directory is /data/db by default.
Internal File Format
Extent Structure
Extents and Records
To Sum Up: Internal File Format
• Files on disk are broken into extents which contain the documents.
• A collection has one or more extents.
• Extents grow exponentially in size, up to 2 GB.
• Namespace entries in the ns (namespace) file point to the first extent
for that collection.
Virtual Address Space
Storage Engine - MMAP (Memory Mapped)
• All data files are memory mapped to Virtual Memory by the
OS.
• MongoDB just reads / writes to RAM in the filesystem cache
• OS takes care of the rest!
• Virtual process size = total files size + overhead (connections,
heap)
• Uses memory-mapped files via the mmap() system call.
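The memory-mapping idea can be tried outside MongoDB. The sketch below is not MongoDB code: it uses Python's standard mmap module on a scratch file to show that a write into the mapped region is an ordinary memory assignment, and that the OS carries it back to the file. The file name demo.0 and the function name are made up for illustration.

```python
import mmap
import os
import tempfile


def roundtrip_through_mmap(payload: bytes, file_size: int = 4096) -> bytes:
    """Write payload through a memory mapping, then read it back with plain file I/O."""
    if len(payload) > file_size:
        raise ValueError("payload does not fit in the mapped file")
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.0")  # hypothetical data file name
        # A file needs a non-zero length before it can be mapped.
        with open(path, "wb") as fh:
            fh.write(b"\x00" * file_size)
        with open(path, "r+b") as fh:
            with mmap.mmap(fh.fileno(), file_size) as mapped:
                # Assigning into the mapping is an in-memory write; no write() call.
                mapped[: len(payload)] = payload
                # Ask the OS to write the dirty pages back to the file.
                mapped.flush()
        with open(path, "rb") as fh:
            return fh.read(len(payload))


if __name__ == "__main__":
    print(roundtrip_through_mmap(b"hello extents"))
```

The program never issues an explicit write for the payload; the operating system's page cache does that work, which is the division of labour the MMAP engine relies on.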
Storage Engine - WiredTiger
• Designed especially for Write-Intensive applications
• Document level locking
• Compression and Record-level locking
• Multi-version concurrency control (MVCC)
• Multi-document transactions
• Support for Log Structured Merge (LSM) trees for very high
insert workloads
What makes MongoDB cool?
• Sharding
• Aggregation Framework and Map-Reduce
• Capped Collection
• GridFS
• Geo-Spatial Indexing
Sharding
• Horizontal scaling - divides the data set and distributes the data over
multiple servers, or shards.
• Used to support deployments with very large data sets and high
throughput operations.
• Sharded Cluster Components –
• Shards – mongod instance or replica sets
• Config Server – Multiple mongod instances
• Routing Instances – Multiple mongos instances
• Data is divided into chunks using ranges of shard key values; each chunk
has a configurable maximum size.
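Routing by ranges of shard key values can be sketched with a sorted list of split points. This is a simplified illustration, not how mongos or the config servers are implemented; the class name ChunkMap and the shard names are assumptions. Each chunk covers a half-open range (lower bound inclusive, upper bound exclusive).

```python
import bisect


class ChunkMap:
    """Maps shard key values to chunks, and chunks to shards (illustrative only)."""

    def __init__(self, split_points, chunk_owners):
        # Split points [10, 20] define three chunks:
        # (-inf, 10), [10, 20) and [20, +inf).
        if list(split_points) != sorted(split_points):
            raise ValueError("split points must be sorted")
        if len(chunk_owners) != len(split_points) + 1:
            raise ValueError("need exactly one owner per chunk")
        self.split_points = list(split_points)
        self.chunk_owners = list(chunk_owners)

    def chunk_index(self, key):
        # bisect_right puts a key equal to a split point into the chunk
        # that starts at that split point.
        return bisect.bisect_right(self.split_points, key)

    def shard_for(self, key):
        return self.chunk_owners[self.chunk_index(key)]


if __name__ == "__main__":
    chunks = ChunkMap([10, 20], ["shardA", "shardB", "shardA"])
    for key in (5, 10, 19, 20):
        print(key, "->", chunks.shard_for(key))
```

A router that holds such a map can send a query on a single shard key value to exactly one shard, while a query that does not include the shard key has to go to all of them.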
Sharding Internals
Choosing a Shard key
The choice of shard key affects:
• Distribution of reads and writes: a poor key spreads reads and writes
unevenly across shards. Solution: hashed ids.
• Size of chunks: jumbo chunks cause uneven distribution of data and are
difficult to move between shards. Solution: a multi-tenant compound index.
• The number of shards each query hits
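The effect of the key on write distribution can be shown with a small simulation. It is a sketch under stated assumptions: four shards, fixed split points at 1000, 2000 and 3000, and MD5 with a modulo as a stand-in hash. MongoDB's hashed sharding range-partitions the hash values rather than taking a modulo, so only the qualitative result carries over.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # assumed cluster size for the illustration


def range_shard(key, split_points=(1000, 2000, 3000)):
    """Range partitioning: the shard index is the number of split points <= key."""
    return sum(key >= point for point in split_points)


def hashed_shard(key):
    """Stand-in for hashed sharding: hash the key, then spread by modulo."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def write_distribution(keys, route):
    """Count how many writes each shard receives under a routing function."""
    return Counter(route(key) for key in keys)


if __name__ == "__main__":
    new_ids = range(3000, 4000)  # monotonically increasing ids, e.g. a counter
    print("range :", dict(write_distribution(new_ids, range_shard)))
    print("hashed:", dict(write_distribution(new_ids, hashed_shard)))
```

With an ever-increasing key, range partitioning sends every new write to the shard that owns the top range, while hashing spreads the same writes over all shards at the cost of making range queries on the key hit every shard.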
Aggregation Framework
• Aggregation Pipeline
• Map-Reduce
• Single Purpose Aggregation Operations (deprecated in latest version)
Aggregation Pipeline
• The aggregation pipeline is a framework for performing aggregation
tasks, modeled on the concept of data processing pipelines.
• Using this framework, MongoDB passes the documents of a single
collection through a pipeline.
• The pipeline transforms the documents into aggregated results, and is
accessed through the aggregate database command.
• Operators: $match, $project, $unwind, $sort, $limit
• The user chooses which operators to apply, and in what order.
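The pipeline concept can be mimicked in a few lines of plain Python. This is a toy model, not MongoDB's implementation: run_pipeline is a hypothetical helper, $match supports equality only, $project supports inclusion only, and the sample documents are invented.

```python
def run_pipeline(docs, stages):
    """Pass a list of dict documents through a sequence of single-operator stages."""
    for stage in stages:
        ((op, arg),) = stage.items()  # each stage holds exactly one operator
        if op == "$match":
            docs = [d for d in docs if all(d.get(k) == v for k, v in arg.items())]
        elif op == "$sort":
            # Apply sort keys from last to first; Python's sort is stable.
            for field, direction in reversed(list(arg.items())):
                docs = sorted(docs, key=lambda d: d[field], reverse=direction < 0)
        elif op == "$limit":
            docs = docs[:arg]
        elif op == "$project":
            docs = [{k: d[k] for k, keep in arg.items() if keep and k in d} for d in docs]
        elif op == "$unwind":
            field = arg.lstrip("$")
            # Emit one output document per element of the array field.
            docs = [{**d, field: item} for d in docs for item in d.get(field, [])]
        else:
            raise ValueError(f"unsupported stage: {op}")
    return docs


if __name__ == "__main__":
    orders = [
        {"cust": "A", "status": "done", "total": 30, "items": ["x", "y"]},
        {"cust": "B", "status": "open", "total": 50, "items": ["z"]},
        {"cust": "C", "status": "done", "total": 70, "items": ["x"]},
    ]
    pipeline = [
        {"$match": {"status": "done"}},
        {"$sort": {"total": -1}},
        {"$limit": 1},
        {"$project": {"cust": 1, "total": 1}},
    ]
    print(run_pipeline(orders, pipeline))
```

Each stage consumes the output of the previous one, so placing $match early shrinks the set of documents that later stages have to process.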
Aggregation Pipeline - Example
Continued…
Map-Reduce
Capped Collection
• Fixed size collection called capped collection
• Create one with the db.createCollection command and mark it as capped
• e.g. db.createCollection('logs', {capped: true, size: 2097152})
• When it reaches the size limit, old documents are automatically
removed
• Guarantees preservation of the insertion order
• Maintains insertion order identical to the order on disk by prohibiting
updates that increase document size
• Allows the use of tailable cursor to retrieve documents
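The eviction behaviour of a capped collection can be modelled as a byte-bounded queue. The class below is a simplified stand-in, not MongoDB's storage logic: CappedLog is a hypothetical name, document size is approximated by the length of the JSON encoding rather than BSON, and tailable cursors are not modelled.

```python
import json
from collections import deque


class CappedLog:
    """Fixed-size, insertion-ordered store that evicts the oldest documents first."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self._docs = deque()  # (size, document) pairs, oldest on the left
        self._used = 0

    def insert(self, doc):
        size = len(json.dumps(doc).encode("utf-8"))  # JSON size as a BSON stand-in
        if size > self.max_bytes:
            raise ValueError("document is larger than the whole collection")
        # Make room by dropping the oldest documents.
        while self._used + size > self.max_bytes:
            old_size, _ = self._docs.popleft()
            self._used -= old_size
        self._docs.append((size, doc))
        self._used += size

    def find(self):
        """Return the documents in insertion order."""
        return [doc for _, doc in self._docs]


if __name__ == "__main__":
    log = CappedLog(max_bytes=24)  # room for three 8-byte documents
    for n in range(5):
        log.insert({"n": n})
    print(log.find())
```

Because documents are only ever appended at one end and removed from the other, insertion order and storage order stay the same, which is why size-increasing updates are prohibited.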
GridFS
• GridFS is a specification for storing and retrieving files that exceed
the BSON (binary JSON) document size limit of 16MB.
• Instead of storing a file in a single document, GridFS divides a file into
parts, or chunks, and stores each of those chunks as a separate
document.
• By default GridFS limits chunk size to 255 KB.
• GridFS uses two collections to store files. One collection stores the file
chunks, and the other stores file metadata.
• GridFS is useful not only for storing files that exceed 16MB but also
for storing any files for which you want access without having to load
the entire file into memory.
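The chunking scheme can be sketched without a database. The code is an illustration rather than a driver implementation: the function names are made up, and the dictionaries carry only a subset of the fields a real GridFS files or chunks document has (files_id, n, data, length and chunkSize follow the GridFS naming).

```python
DEFAULT_CHUNK_SIZE = 255 * 1024  # 255 KB, the default mentioned above


def split_into_chunks(file_id, data, chunk_size=DEFAULT_CHUNK_SIZE):
    """Return one metadata document and a list of chunk documents for a byte string."""
    files_doc = {"_id": file_id, "length": len(data), "chunkSize": chunk_size}
    chunks = [
        {"files_id": file_id, "n": n, "data": data[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]
    return files_doc, chunks


def reassemble(chunks):
    """Concatenate chunks in sequence order (n), whatever order they arrive in."""
    return b"".join(c["data"] for c in sorted(chunks, key=lambda c: c["n"]))


def read_range(files_doc, chunks, start, length):
    """Read a byte range while touching only the chunks that overlap it."""
    if length <= 0:
        return b""
    size = files_doc["chunkSize"]
    first, last = start // size, (start + length - 1) // size
    needed = [c for c in chunks if first <= c["n"] <= last]
    offset = start - first * size
    return reassemble(needed)[offset:offset + length]


if __name__ == "__main__":
    payload = bytes(i % 251 for i in range(600_000))
    meta, parts = split_into_chunks("file-1", payload)
    print(meta, len(parts))
```

read_range shows the second benefit listed above: a reader can fetch part of a large file by loading only the chunks that cover the requested bytes.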
GeoSpatial Indexing
• To support efficient queries of geospatial coordinate data, MongoDB
provides two special indexes:
• 2d indexes that use planar geometry when returning results.
• 2dsphere indexes that use spherical geometry to return results.
• Store location data as GeoJSON objects with this coordinate-axis
order: longitude, latitude.
• GeoJSON Object Supported: Point, LineString, Polygon, etc.
• Query Operations: Inclusion, Intersection, Proximity.
• You cannot use a geospatial index as the shard key index.
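The coordinate order and a proximity query can be illustrated in plain Python. This sketch does not use a MongoDB index: geojson_point, haversine_m and near are hypothetical helpers, the Earth is treated as a sphere of radius 6,371 km, and the search is a linear scan rather than an index lookup.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, an approximation


def geojson_point(longitude, latitude):
    """Build a GeoJSON Point; note the order: longitude first, then latitude."""
    if not -180 <= longitude <= 180:
        raise ValueError("longitude must be within [-180, 180]")
    if not -90 <= latitude <= 90:
        raise ValueError("latitude must be within [-90, 90]")
    return {"type": "Point", "coordinates": [longitude, latitude]}


def haversine_m(a, b):
    """Great-circle distance in metres between two GeoJSON Points."""
    lon1, lat1 = map(math.radians, a["coordinates"])
    lon2, lat2 = map(math.radians, b["coordinates"])
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def near(points, center, max_distance_m):
    """Proximity query: points within max_distance_m of center, nearest first."""
    hits = [(haversine_m(center, p), p) for p in points]
    return [p for d, p in sorted(hits, key=lambda t: t[0]) if d <= max_distance_m]


if __name__ == "__main__":
    origin = geojson_point(0, 0)
    candidates = [geojson_point(10, 0), geojson_point(0, 1)]
    print(near(candidates, origin, 200_000))
```

Swapping the two coordinates is a common mistake; the range check catches it only when the swapped latitude falls outside [-90, 90], so the order still deserves attention when loading data.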
Performance Analysis
• Yahoo! Cloud Serving Benchmark (YCSB)
• Throughput (ops/second)
Workload                                     Cassandra   Couchbase   MongoDB
50% read, 50% update                           134,839     106,638   160,719
95% read, 5% update                            144,455     187,798   196,498
50% read, 50% update (Durability Optimized)      6,289       1,236    31,864
Limitations
• You need enough RAM to hold your working set; otherwise performance
might suffer.
• MapReduce and Aggregation are single-threaded: more specifically, each
runs on one thread per mongod.
• No joins across collections.
• On 32-bit builds, data is limited to about 2.5 GB.
• Sharding has some unique exceptions. If you plan to shard your data,
you need to shard early as some things that are feasible on a single
server are not feasible on a sharded collection.
Conclusion
• MongoDB is a semi-structured document-oriented NoSQL Database.
• It has two storage engines: MMAP and WiredTiger
• Multiple Aggregation Frameworks: Aggregation Pipeline and Map-Reduce
• Support for GridFS, GeoSpatial Indexing, Capped Collection
• Higher throughput than Cassandra and Couchbase in the cited YCSB benchmark.
• On-going work – In-memory and HDFS support
DEMO
References
• https://www.mongodb.com/presentations/storage-engine-internals
• http://docs.mongodb.org/manual/core/data-modeling-introduction/
• http://docs.mongodb.org/manual/core/aggregation-introduction/
• https://2013.nosql-matters.org/bcn/wp-content/uploads/2013/12/storage-talk-mongodb.pdf
• http://info-mongodb-com.s3.amazonaws.com/High Performance Benchmark White Paper final.pdf
• https://www.mongodb.com/collateral/mongodb-architecture-guide
• Book - MongoDB: The Definitive Guide by Kristina Chodorow and Michael Dirolf
Questions?
Thank you!

MongoDB Internals

  • 2. Outline • Introduction to MongoDB • Storage Layout • Data Management Features • Performance Analysis • Limitations • Conclusion • Demo • References
  • 3. What is MongoDB? • MongoDB is a NoSQL document-oriented database. • It provides a flexible, semi-structured schema. • It provides high performance, high availability, and easy scalability. • MongoDB is free and open source software. • License: GNU Affero General Public License (AGPL) and Apache License • MongoDB is a server process that runs on Linux, Windows and OS X. It can run as either a 32-bit or a 64-bit application.
  • 4. When to use MongoDB? “Knowing when to use a hammer, and when to use a screwdriver.” • Account and user profiles: can store arrays of addresses with ease (MetLife) • Content Management Systems (CMS): the flexible schema of MongoDB is great for heterogeneous collections of content types (MongoPress) • Form data: MongoDB makes it easy to evolve the structure of form data over time (ADP) • Blogs / user-generated content: can keep data with complex relationships together in one object (Forbes, AOL) • Messaging: vary message metadata easily per message or message type without needing to maintain separate collections or schemas (Viber) • System configuration: just a nice object graph of configuration values, which is very natural in MongoDB (Cisco) • Log data of any kind: structured log data is the future (eBay) • Location-based systems: make use of geospatial indexes (Foursquare, City government of Chicago)
  • 5. Terminologies – RDBMS vs MongoDB *JSON – JavaScript Object Notation
  • 6. Storage Internals - Directory Layout Data Directory is found at /data/db
  • 10. To Sum Up: Internal File Format • Files on disk are broken into extents which contain the documents. • A collection has one or more extents. • Extents grow exponentially up to 2GB. • Namespace entries in the ns (namespace) file point to the first extent for that collection.
  • 12. Storage Engine - MMAP (Memory Mapped) • All data files are memory mapped to virtual memory by the OS. • MongoDB just reads / writes to RAM in the filesystem cache • OS takes care of the rest! • Virtual process size = total files size + overhead (connections, heap) • Uses memory-mapped files via the mmap() system call.
  • 13. Storage Engine - WiredTiger • Designed especially for Write-Intensive applications • Document level locking • Compression and Record-level locking • Multi-version concurrency control (MVCC) • Multi-document transactions • Support for Log Structured Merge (LSM) trees for very high insert workloads
  • 14. What makes MongoDB cool? • Sharding • Aggregation Framework and Map-Reduce • Capped Collection • GridFS • Geo-Spatial Indexing
  • 15. Sharding • Horizontal scaling - divides the data set and distributes the data over multiple servers, or shards. • Used to support deployments with very large data sets and high throughput operations. • Sharded Cluster Components – • Shards – mongod instances or replica sets • Config Server – Multiple mongod instances • Routing Instances – Multiple mongos instances • Data on the shards is divided into chunks, up to a configured size, using ranges of shard key values.
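To make the range-based chunking on this slide concrete, here is a minimal sketch of how a router could map a shard key value to the shard that owns its chunk. The chunk boundaries, shard names, and the `findShard` helper are all hypothetical. This is not MongoDB's internal routing code, only an illustration of the idea.

```javascript
// Hypothetical chunk table: each chunk covers [min, max) of the shard key.
// Boundaries and shard names are made up for illustration.
const chunks = [
  { min: -Infinity, max: 100, shard: "shardA" },
  { min: 100, max: 200, shard: "shardB" },
  { min: 200, max: Infinity, shard: "shardC" },
];

// Return the shard whose chunk range contains the given shard key value.
function findShard(chunkTable, key) {
  const chunk = chunkTable.find((c) => key >= c.min && key < c.max);
  if (!chunk) throw new Error(`no chunk covers key ${key}`);
  return chunk.shard;
}

console.log(findShard(chunks, 42));  // shardA
console.log(findShard(chunks, 100)); // shardB (ranges are min-inclusive)
console.log(findShard(chunks, 999)); // shardC
```

A query that includes the shard key can be sent to one shard this way; a query without it would have to be sent to every shard.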
  • 17. Choosing a Shard key The choice of shard key affects: • Distribution of reads and writes • Uneven distribution of reads/writes across shards. • Solution – Hashed ids • Size of chunks • Jumbo chunks cause uneven distribution of data. • Moving data between shards becomes difficult. • Solution – Multi-tenant compound index • The number of shards each query hits
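The "hashed ids" remedy on this slide can be illustrated with a small simulation. It is a sketch under stated assumptions: four shards, hypothetical range split points at 1000, 2000 and 3000, and FNV-1a as a stand-in hash (MongoDB uses its own hash function for hashed shard keys). It compares where a run of monotonically increasing ids lands under range placement and under hashed placement.

```javascript
// FNV-1a 32-bit hash of a string; a stand-in for illustration only,
// not the hash function MongoDB uses for hashed shard keys.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

const SHARDS = 4; // hypothetical shard count

// Range placement: with hypothetical split points at 1000, 2000 and 3000,
// every key >= 3000 lands on the last shard.
function rangeShard(id) {
  return Math.min(Math.floor(id / 1000), SHARDS - 1);
}

// Hashed placement: the hash of the key, not the key itself, picks the shard.
function hashedShard(id) {
  return fnv1a(String(id)) % SHARDS;
}

// Count how many of a run of increasing ids each shard receives.
function distribution(place, firstId, count) {
  const counts = new Array(SHARDS).fill(0);
  for (let id = firstId; id < firstId + count; id++) counts[place(id)]++;
  return counts;
}

console.log("range :", distribution(rangeShard, 5000, 1000)); // [ 0, 0, 0, 1000 ]
console.log("hashed:", distribution(hashedShard, 5000, 1000)); // spread over all shards
```

The trade-off is that hashing spreads writes but also scatters neighbouring key values, so range queries on the shard key tend to touch more shards.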
  • 18. Aggregation Framework • Aggregation Pipeline • Map-Reduce • Single Purpose Aggregation Operations (deprecated in latest version)
  • 19. Aggregation Pipeline • The aggregation pipeline is a framework for performing aggregation tasks, modeled on the concept of data processing pipelines. • Using this framework, MongoDB passes the documents of a single collection through a pipeline. • The pipeline transforms the documents into aggregated results, and is accessed through the aggregate database command. • Operators: $match, $project, $unwind, $sort, $limit • User gets to choose the operator.
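The pipeline idea can be sketched in memory: each stage takes the documents produced by the previous stage and returns a new list. The `orders` data, the `stages` table, and the `aggregate` helper below are hypothetical and heavily simplified (equality-only `$match`, single-field `$sort`, and a `$project` that does not keep `_id` by default as the real stage does). In the mongo shell the same stage list would be passed to `db.orders.aggregate([...])`, with `orders` again being an assumed collection name.

```javascript
// Hypothetical documents standing in for a collection.
const orders = [
  { _id: 1, status: "A", customer: "ann", amount: 50 },
  { _id: 2, status: "B", customer: "bob", amount: 75 },
  { _id: 3, status: "A", customer: "cy", amount: 20 },
  { _id: 4, status: "A", customer: "dee", amount: 90 },
];

// Simplified in-memory versions of four pipeline stages.
// Real MongoDB stages accept far richer expressions than these.
const stages = {
  // Keep documents whose fields equal every value in the filter.
  $match: (docs, filter) =>
    docs.filter((d) => Object.entries(filter).every(([k, v]) => d[k] === v)),
  // Sort by a single field; 1 = ascending, -1 = descending.
  $sort: (docs, spec) => {
    const [[field, dir]] = Object.entries(spec);
    return [...docs].sort((a, b) =>
      a[field] < b[field] ? -dir : a[field] > b[field] ? dir : 0
    );
  },
  // Keep only the first n documents.
  $limit: (docs, n) => docs.slice(0, n),
  // Keep only the fields marked with 1.
  $project: (docs, spec) =>
    docs.map((d) =>
      Object.fromEntries(
        Object.keys(spec)
          .filter((k) => spec[k] === 1)
          .map((k) => [k, d[k]])
      )
    ),
};

// Feed the output of each stage into the next, in order.
function aggregate(docs, pipeline) {
  return pipeline.reduce((current, stage) => {
    const [[op, arg]] = Object.entries(stage);
    return stages[op](current, arg);
  }, docs);
}

const result = aggregate(orders, [
  { $match: { status: "A" } },
  { $sort: { amount: -1 } },
  { $limit: 2 },
  { $project: { customer: 1, amount: 1 } },
]);
console.log(result);
// [ { customer: 'dee', amount: 90 }, { customer: 'ann', amount: 50 } ]
```

Putting `$match` first keeps later stages working on fewer documents, which is the usual reason to filter early in a pipeline.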
  • 23. Capped Collection • A capped collection is a fixed-size collection • Use the db.createCollection command and mark it as capped • e.g. db.createCollection('logs', {capped: true, size: 2097152}) • When it reaches the size limit, old documents are automatically removed • Guarantees preservation of the insertion order • Maintains insertion order identical to the order on disk by prohibiting updates that increase document size • Allows the use of a tailable cursor to retrieve documents
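The eviction behaviour described on this slide can be modelled with a toy class. This is a sketch only: the `CappedCollection` class is hypothetical, and it approximates document size by the length of the JSON text, which is not how MongoDB measures BSON size.

```javascript
// A toy capped collection: fixed byte budget, oldest documents evicted first.
// Document size is approximated by the length of its JSON text, which is an
// assumption of this sketch and not how MongoDB measures BSON size.
class CappedCollection {
  constructor(maxBytes) {
    this.maxBytes = maxBytes;
    this.usedBytes = 0;
    this.docs = []; // kept in insertion order
  }

  insert(doc) {
    const size = JSON.stringify(doc).length;
    if (size > this.maxBytes) throw new Error("document larger than the collection");
    // Evict from the front (oldest) until the new document fits.
    while (this.usedBytes + size > this.maxBytes) {
      const oldest = this.docs.shift();
      this.usedBytes -= oldest.size;
    }
    this.docs.push({ doc, size });
    this.usedBytes += size;
  }

  // Return documents in insertion order.
  find() {
    return this.docs.map((entry) => entry.doc);
  }
}

// Each {"n":i} document is 7 characters, so a 20-byte cap holds two of them.
const logs = new CappedCollection(20);
for (let i = 1; i <= 5; i++) logs.insert({ n: i });
console.log(logs.find()); // [ { n: 4 }, { n: 5 } ]
```

Because nothing is ever reordered or grown in place, reading the collection front to back always yields insertion order, which is the property a tailable cursor relies on.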
  • 24. GridFS • GridFS is a specification for storing and retrieving files that exceed the BSON (binary JSON) document size limit of 16MB. • Instead of storing a file in a single document, GridFS divides a file into parts, or chunks, and stores each of those chunks as a separate document. • By default GridFS limits chunk size to 255k. • GridFS uses two collections to store files. One collection stores the file chunks, and the other stores file metadata. • GridFS is useful not only for storing files that exceed 16MB but also for storing any files for which you want access without having to load the entire file into memory.
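The chunking scheme on this slide can be sketched as follows. The helper names, the `file-1` id, and the sample bytes are hypothetical. The field names (`files_id`, `n`, `data`, `length`, `chunkSize`, `filename`) follow the GridFS layout of one metadata document plus one document per chunk. Nothing here talks to a database.

```javascript
const CHUNK_SIZE = 255 * 1024; // the 255k default mentioned above, in bytes

// Split a file's bytes into a metadata document plus one document per chunk.
// The fileId value and this helper are hypothetical.
function toGridFsDocs(fileId, filename, bytes, chunkSize = CHUNK_SIZE) {
  const chunks = [];
  for (let offset = 0, n = 0; offset < bytes.length; offset += chunkSize, n++) {
    chunks.push({
      files_id: fileId, // links the chunk back to its metadata document
      n,                // position of the chunk within the file
      data: bytes.subarray(offset, offset + chunkSize),
    });
  }
  const file = { _id: fileId, filename, length: bytes.length, chunkSize };
  return { file, chunks };
}

// Reassemble the original bytes by concatenating chunks in order of n.
function fromGridFsDocs(chunks) {
  const ordered = [...chunks].sort((a, b) => a.n - b.n);
  const total = ordered.reduce((sum, c) => sum + c.data.length, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const c of ordered) {
    out.set(c.data, offset);
    offset += c.data.length;
  }
  return out;
}

// A 600 KB sample "file" splits into two full chunks and one partial chunk.
const bytes = new Uint8Array(600 * 1024).map((_, i) => i % 251);
const { file, chunks } = toGridFsDocs("file-1", "video.bin", bytes);
console.log(file.length, chunks.length, chunks[2].data.length); // 614400 3 92160
```

Since each chunk carries its index `n`, a reader can fetch only the chunks covering the byte range it needs instead of loading the whole file.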
  • 25. GeoSpatial Indexing • To support efficient queries of geospatial coordinate data, MongoDB provides two special indexes: • 2d indexes that use planar geometry when returning results. • 2dsphere indexes that use spherical geometry to return results. • Store location data as GeoJSON objects with this coordinate-axis order: longitude, latitude. • GeoJSON objects supported: Point, LineString, Polygon, etc. • Query operations: Inclusion, Intersection, Proximity. • You cannot use a geospatial index as the shard key index.
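As an illustration of the longitude-first GeoJSON convention and of a proximity query on a sphere, here is a minimal sketch. The sample places, the `near` helper, and the Earth radius constant are assumptions. The linear scan stands in for what a spherical index makes efficient and does not show how MongoDB evaluates such queries.

```javascript
// GeoJSON points list longitude first, then latitude.
// Coordinates below are hypothetical sample data.
const places = [
  { name: "origin", location: { type: "Point", coordinates: [0, 0] } },
  { name: "east", location: { type: "Point", coordinates: [1, 0] } },
  { name: "far", location: { type: "Point", coordinates: [90, 0] } },
];

const EARTH_RADIUS_M = 6371000; // mean Earth radius; an approximation

// Great-circle distance in meters between two GeoJSON points (haversine formula).
function haversine(a, b) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const [lon1, lat1] = a.coordinates;
  const [lon2, lat2] = b.coordinates;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(h));
}

// Proximity query: places within maxMeters of a point, nearest first.
function near(docs, point, maxMeters) {
  return docs
    .map((d) => ({ name: d.name, meters: haversine(point, d.location) }))
    .filter((d) => d.meters <= maxMeters)
    .sort((x, y) => x.meters - y.meters);
}

const here = { type: "Point", coordinates: [0, 0] };
console.log(near(places, here, 200000).map((d) => d.name)); // [ 'origin', 'east' ]
```

Swapping the coordinate order to latitude, longitude is a common mistake; on this sample data it would go unnoticed only because every latitude is zero.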
  • 26. Performance Analysis • Yahoo! Cloud Serving Benchmark (YCSB) • Throughput (ops/second) by workload:
      Workload | Cassandra | Couchbase | MongoDB
      50% read, 50% update | 134,839 | 106,638 | 160,719
      95% read, 5% update | 144,455 | 187,798 | 196,498
      50% read, 50% update (Durability Optimized) | 6,289 | 1,236 | 31,864
  • 27. Limitations • The working set needs to fit in RAM; otherwise performance may suffer. • MapReduce and Aggregation are single-threaded. To be more specific, one thread per mongod. • No joins across collections. • On 32-bit builds, data is limited to about 2.5 GB. • Sharding has some unique exceptions. If you plan to shard your data, you need to shard early, as some things that are feasible on a single server are not feasible on a sharded collection.
  • 28. Conclusion • MongoDB is a semi-structured document-oriented NoSQL database. • It has two storage engines: MMAP and WiredTiger • Multiple aggregation frameworks: Aggregation Pipeline and Map-Reduce • Support for GridFS, GeoSpatial Indexing, Capped Collection • Higher YCSB throughput than Cassandra and Couchbase in the three workloads reported. • On-going work – In-memory and HDFS support
  • 29. DEMO
  • 30. References • https://www.mongodb.com/presentations/storage-engine-internals • http://docs.mongodb.org/manual/core/data-modeling-introduction/ • http://docs.mongodb.org/manual/core/aggregation-introduction/ • https://2013.nosql-matters.org/bcn/wp-content/uploads/2013/12/storage-talk-mongodb.pdf • http://info-mongodb-com.s3.amazonaws.com/High Performance Benchmark White Paper final.pdf • https://www.mongodb.com/collateral/mongodb-architecture-guide • Book - MongoDB: The Definitive Guide by Kristina Chodorow and Michael Dirolf