3. NoSQL Features
Flexible Data Models
• Lists, embedded objects
• Sparse data
• Semi-structured data
• Agile development
High Data Throughput
• Reads
• Writes
Big Data
• Aggregate Data Size
• Number of Objects
Low Latency
• For reads and writes
• Millisecond Latency
Cloud Computing
• Runs everywhere
• No special hardware
Commodity Hardware
• Ethernet
• Local data storage
• JSON based
• Dynamic schemas
• Replica sets to scale reads
• Sharding to scale writes
• 1000s of shards in a single DB
• Data partitioning
• Designed for “typical” OS and local file system
• Scale-out to overcome hardware limitations
• In-memory cache
• Scale-out working set
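To make "replica sets to scale reads" concrete, here is a minimal sketch in plain JavaScript. It is a toy model, not MongoDB's driver logic: the class `ReplicaSetRouter`, its methods, and the node names are hypothetical, and round-robin selection is an assumption standing in for whatever read routing a real deployment uses.

```javascript
// Toy model of a replica set: one primary accepts writes,
// reads rotate across all members (hypothetical names throughout).
class ReplicaSetRouter {
  constructor(primary, secondaries) {
    this.primary = primary;
    this.members = [primary, ...secondaries];
    this.next = 0; // index of the member that serves the next read
  }

  // All writes go to the primary.
  nodeForWrite() {
    return this.primary;
  }

  // Reads are spread round-robin over every member.
  nodeForRead() {
    const node = this.members[this.next];
    this.next = (this.next + 1) % this.members.length;
    return node;
  }
}

const router = new ReplicaSetRouter("node-a", ["node-b", "node-c"]);
const reads = [
  router.nodeForRead(),
  router.nodeForRead(),
  router.nodeForRead(),
  router.nodeForRead(),
];
console.log(reads, router.nodeForWrite());
```

Adding a member to the set adds read capacity in this model, while write capacity stays bound to the single primary, which is why the slide pairs replica sets with sharding.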
5. High Volume Data Feeds
Machine Generated Data
• More machine forms, sensors & data
• Variably structured
Securities Data
• High frequency trading
• Daily closing price
Social Media / General Public
• Multiple data sources
• Each source changes its format regularly
• Student scores, ISP logs
6. High Volume Data Feeds
[Diagram: multiple data sources]
• Asynchronous writes
• Flexible document model can adapt to changes in sensor format
• Write to memory with periodic disk flush
• Scale writes over multiple shards
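The idea of scaling writes over multiple shards can be sketched in plain JavaScript as follows. This is an illustration under stated assumptions, not how MongoDB partitions data: `hashKey`, `shardFor`, `routeWrites`, the `sensor_id` field, and the sample readings are all hypothetical, and the hash is a simple djb2-style function chosen only because it is deterministic.

```javascript
// Simple deterministic string hash (djb2 variant); illustrative only.
function hashKey(key) {
  let h = 5381;
  for (let i = 0; i < key.length; i++) {
    h = ((h * 33) ^ key.charCodeAt(i)) >>> 0; // keep it an unsigned 32-bit value
  }
  return h;
}

// Pick a shard index for a given shard-key value.
function shardFor(sensorId, shardCount) {
  return hashKey(sensorId) % shardCount;
}

// Distribute incoming readings over in-memory "shards" (plain arrays here).
function routeWrites(readings, shardCount) {
  const shards = Array.from({ length: shardCount }, () => []);
  for (const r of readings) {
    shards[shardFor(r.sensor_id, shardCount)].push(r);
  }
  return shards;
}

// Readings do not share a fixed set of fields, as with a changing sensor format.
const readings = [
  { sensor_id: "s-1", temp_c: 21.5 },
  { sensor_id: "s-2", humidity: 40 },
  { sensor_id: "s-1", temp_c: 21.7, battery: 0.93 },
];
const shards = routeWrites(readings, 4);
console.log(shards.map((s) => s.length));
```

Because the shard is derived from the key alone, every reading from the same sensor lands on the same shard, and readings with different fields are stored side by side without any schema change.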
7. Operational Intelligence
Ad Targeting
• Large volume of users
• Very strict latency requirements
• Sentiment analysis
Real time dashboards
• Expose data to millions of customers
• Reports on large volumes of data
• Reports that update in real time
Social Media Monitoring
• Join the conversation
• Catered games
• Customized surveys
8. Operational Intelligence
[Diagram: Dashboards, API]
• Low latency reads
• Parallelize queries across replicas and shards
• In-database aggregation
• Flexible schema adapts to changing input data
• Can use the same cluster to collect, store and report on data
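One way to picture parallelized queries combined with aggregation is a scatter-gather: each shard computes a partial result and the partials are merged. The sketch below simulates this in plain JavaScript with arrays standing in for shards. The event data and the function names `countByType` and `mergeCounts` are hypothetical, and a real cluster would run the per-shard step inside the database rather than in application code.

```javascript
// Each inner array stands in for the events held by one shard (made-up data).
const shardData = [
  [{ type: "click" }, { type: "impression" }],
  [{ type: "impression" }, { type: "impression" }],
  [{ type: "purchase" }, { type: "click" }],
];

// Per-shard step: count events by type.
function countByType(events) {
  const counts = {};
  for (const e of events) {
    counts[e.type] = (counts[e.type] || 0) + 1;
  }
  return counts;
}

// Merge step: add up the partial counts from every shard.
function mergeCounts(partials) {
  const total = {};
  for (const p of partials) {
    for (const [type, n] of Object.entries(p)) {
      total[type] = (total[type] || 0) + n;
    }
  }
  return total;
}

const partials = shardData.map(countByType); // conceptually runs on each shard
const totals = mergeCounts(partials);        // conceptually runs on the router
console.log(totals);
```

Only the small partial results travel between steps, which is the reason aggregating close to the data helps a dashboard stay responsive.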
9. Behavioural Profiles
{ cookie_id: "1234512413243",
  advertiser: {
    apple: {
      actions: [
        { impression: 'ad1', time: 123 },
        { impression: 'ad2', time: 232 },
        { click: 'ad2', time: 235 },
        { add_to_cart: 'laptop',
          sku: 'asdf23f',
          time: 254 },
        { purchase: 'laptop', time: 354 }
      ] …
[Diagram: user journey in four steps: See Ad, See Ad, Click, Convert]
• Rich profiles collecting multiple complex actions
• Scale out to support high throughput of activities tracked
• Dynamic schemas make it easy to …
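A behavioural profile of this shape can be summarized with a few lines of plain JavaScript. The sketch below is only an illustration: the slide's document is truncated, so this version closes it after the actions shown, and the helpers `summarize` and `recordAction` plus the advertiser name `acme` are hypothetical. It assumes the first key of each action names the action type, as in the slide's examples.

```javascript
// Profile modelled on the slide's document, closed after the actions shown.
const profile = {
  cookie_id: "1234512413243",
  advertiser: {
    apple: {
      actions: [
        { impression: "ad1", time: 123 },
        { impression: "ad2", time: 232 },
        { click: "ad2", time: 235 },
        { add_to_cart: "laptop", sku: "asdf23f", time: 254 },
        { purchase: "laptop", time: 354 },
      ],
    },
  },
};

// Count actions by type; the first key of each action is treated as its type.
function summarize(actions) {
  const counts = {};
  for (const action of actions) {
    const type = Object.keys(action)[0];
    counts[type] = (counts[type] || 0) + 1;
  }
  return counts;
}

// Append an action, creating the advertiser entry on first use.
function recordAction(target, advertiser, action) {
  if (!target.advertiser[advertiser]) {
    target.advertiser[advertiser] = { actions: [] };
  }
  target.advertiser[advertiser].actions.push(action);
}

const funnel = summarize(profile.advertiser.apple.actions);
recordAction(profile, "acme", { impression: "ad9", time: 400 }); // hypothetical advertiser
console.log(funnel);
```

New action types and new advertisers need no change to a schema here; they simply appear as new keys in the document.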
10. Metadata
Product Catalogue
• Diverse product portfolio
• Complex querying and filtering
• Multi-faceted product attributes
Data analysis
• Data mining
• Call records
• Insurance claims
Biometric
• Retina scans
• Fingerprints
11. Metadata
{ ISBN: "00e8da9b",
  type: "Book",
  country: "Egypt",
  title: "Ancient Egypt"
}
{ type: "Artifact",
  medium: "Ceramic",
  country: "Egypt",
  year: "3000 BC"
}
• Flexible data model for similar but different objects
• Indexing and rich query API for easy searching and sorting
db.archives.find({ "country": "Egypt" });
db.archives.find({ "type": "Artifact" });
• Indexing techniques that fit your data modeling
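The behaviour of simple equality queries over documents with different fields can be sketched in plain JavaScript. This is a toy matcher, not the database's query engine: the function `find` below is a hypothetical stand-in that supports only top-level equality, and it scans every document rather than using an index.

```javascript
// The two documents from the slide: similar objects with different fields.
const archives = [
  { ISBN: "00e8da9b", type: "Book", country: "Egypt", title: "Ancient Egypt" },
  { type: "Artifact", medium: "Ceramic", country: "Egypt", year: "3000 BC" },
];

// Return the documents whose fields equal every field/value pair in the query.
// A document that lacks a queried field simply does not match.
function find(docs, query) {
  return docs.filter((doc) =>
    Object.entries(query).every(([field, value]) => doc[field] === value)
  );
}

const fromEgypt = find(archives, { country: "Egypt" });
const artifacts = find(archives, { type: "Artifact" });
const ceramicBooks = find(archives, { type: "Book", medium: "Ceramic" });
console.log(fromEgypt.length, artifacts.length, ceramicBooks.length);
```

The same query works across both kinds of object because matching depends only on the fields named in the query, not on a shared schema.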
12. Content Management
News Site
• Comments and user generated content
• Personalization of content and layout
Multi-device rendering
• Generate layout on the fly
• No need to cache static pages
Sharing
• Store large objects
• Simpler modeling of metadata
13. Content Management
{ camera: "Nikon d4",
  location: [ -122.418333, 37.775 ]
}
{ camera: "Canon 5d mkII",
  people: [ "Jim", "Carol" ],
  taken_on: ISODate("2012-03-07T18:32:35.002Z")
}
{ origin: "facebook.com/photos/xwdf23fsdf",
  license: "Creative Commons CC0",
  size: {
    dimensions: [ 124, 52 ],
    units: "pixels"
  }
}
• Flexible data model for similar but different objects
• Horizontal scalability for large data sets
• Geospatial indexing for location-based searches
• GridFS for large object storage
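A location-based search over documents like these can be sketched in plain JavaScript. The sketch is a linear scan using the haversine formula, not a geospatial index, so it shows only what such a search returns, not how an index makes it fast. It assumes `location` is stored as [longitude, latitude]; the helper names and the third photo are hypothetical additions for illustration.

```javascript
const EARTH_RADIUS_KM = 6371; // mean Earth radius

function toRad(degrees) {
  return (degrees * Math.PI) / 180;
}

// Great-circle distance between two [longitude, latitude] points, in kilometres.
function haversineKm([lng1, lat1], [lng2, lat2]) {
  const dLat = toRad(lat2 - lat1);
  const dLng = toRad(lng2 - lng1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLng / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
}

// Keep photos that have a location and lie within maxKm of the centre point.
function photosNear(photos, center, maxKm) {
  return photos.filter(
    (p) => Array.isArray(p.location) && haversineKm(p.location, center) <= maxKm
  );
}

const photos = [
  { camera: "Nikon d4", location: [-122.418333, 37.775] },
  { camera: "Canon 5d mkII", people: ["Jim", "Carol"] }, // no location field
  { camera: "Example camera", location: [-74.006, 40.7128] }, // hypothetical photo
];
const nearby = photosNear(photos, [-122.4183, 37.78], 1);
console.log(nearby.map((p) => p.camera));
```

Documents without a `location` field are skipped rather than rejected, which matches the flexible data model shown on the slide.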
14. Is MongoDB a good fit for my use case?
Is there an ideal use case?
15. Application: Why MongoDB Might Be a Good Fit
Large number of objects to store
• Sharding lets you split objects across multiple servers
High write / read throughput and data distribution
• Sharding + replication let you scale read and write traffic across multiple servers, multiple tenants, or data centers
Low latency access
• Memory-mapped storage engine caches documents in RAM, enabling in-memory operations
• Data locality of documents significantly improves latency over join-based approaches
Variable data in objects
• Dynamic schema and JSON data model enable flexible data storage without sparse tables or complex joins, and provide for an intuitive query language
Cloud based deployment
• Sharding and replication let you work around hardware limitations in the cloud