SlideShare una empresa de Scribd logo
1 de 32
Real Time Analytics
Chad Tindel
chad.tindel@10gen.com
The goal
Real Time
Analytics Engine
Real Time
Analytics Engine
Data
Sourc
e
Data
Sourc
e
Data
Sourc
e
Solution goals
Simple log storage
Design Pattern
Aggregation - PipelinesAggregation - Pipelines
• Aggregation requests specify a pipeline
• A pipeline is a series of operations
• Conceptually, the members of a collection
are passed through a pipeline to produce
a result
– Similar to a Unix command-line pipe
Aggregation PipelineAggregation Pipeline
Aggregation - PipelinesAggregation - Pipelines
db.collection.aggregate(
[ {$match: … },
{$group: … },
{$limit: …}, etc
]
Pipeline OperationsPipeline Operations
• $match
– Uses a query predicate (like .find({…})) as a
filter
{ $match : { author : "dave" } }
{ $match : { score : { $gt : 50, $lte : 90 } } }
Pipeline OperationsPipeline Operations
• $project
– Uses a sample document to determine the
shape of the result (similar to .find()’s 2nd
optional argument)
• Include or exclude fields
• Compute new fields
– Arithmetic expressions, including built-in functions
– Pull fields from nested documents to the top
– Push fields from the top down into new virtual documents
Pipeline OperationsPipeline Operations
• $unwind
– Hands out array elements one at a time
{ $unwind : {"$myarray" } }
• $unwind “streams” arrays
– Array values are doled out one at time in the
context of their surrounding document
– Makes it possible to filter out elements before
returning
Pipeline OperationsPipeline Operations
• $group
– Aggregates items into buckets defined by a
key
GroupingGrouping
• $group aggregation expressions
– Define a grouping key as the _id of the result
– Total grouped column values: $sum
– Average grouped column values: $avg
– Collect grouped column values in an array or
set: $push, $addToSet
– Other functions
• $min, $max, $first, $last
Pipeline OperationsPipeline Operations
• $sort
– Sort documents
– Sort specifications are the same as today,
e.g., $sort:{ key1: 1, key2: -1, …}
{ $sort : {“total”:-1} }
Pipeline OperationsPipeline Operations
• $limit
– Only allow the specified number of documents
to pass
{ $limit : 20 }
Pipeline OperationsPipeline Operations
• $skip
– Skip over the specified number of documents
{ $skip : 10 }
Computed ExpressionsComputed Expressions
• Available in $project operations
• Prefix expression language
– Add two fields: $add:[“$field1”, “$field2”]
– Provide a value for a missing field: $ifNull:
[“$field1”, “$field2”]
– Nesting: $add:[“$field1”, $ifNull:[“$field2”,
“$field3”]]
(continued)
Computed ExpressionsComputed Expressions
(continued)(continued)
• String functions
– toUpper, toLower, substr
• Date field extraction
– Get year, month, day, hour, etc, from ISODate
• Date arithmetic
• Null value substitution (like MySQL ifnull(),
Oracle nvl())
• Ternary conditional
– Return one of two values based on a predicate
• Other functions….
– And we can easily add more as required
Sample data
Original
Event
Data
127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif
HTTP/1.0" 200 2326 “http://www.example.com/start.html" "Mozilla/4.08
[en] (Win98; I ;Nav)”
As JSON doc = {
_id: ObjectId('4f442120eb03305789000000'),
host: "127.0.0.1",
time: ISODate("2000-10-10T20:55:36Z"),
path: "/apache_pb.gif",
referer: “http://www.example.com/start.html",
user_agent: "Mozilla/4.08 [en] (Win98; I ;Nav)”
}
Insert to
MongoDB
db.logs.insert( doc )
Dynamic Queries
Find all
logs for
a URL
db.logs.find( { ‘path’ : ‘/index.html’ } )
Find all
logs for
a time
range
db.logs.find( { ‘time’ :
{ ‘$gte’ : new Date(2012,0),
‘$lt’ : new Date(2012,1) } } );
Find all
logs for
a host
over a
range of
dates
db.logs.find( {
‘host’ : ‘127.0.0.1’,
‘time’ : { ‘$gte’ : new Date(2012,0),
‘$lt’ : new Date(2012, 1) } } );
Aggregation Framework
Request
s per
day by
URL
db.logs.aggregate( [
{ '$match': {
'time': {
'$gte': new Date(2012,0),
'$lt': new Date(2012,1) } } },
{ '$project': {
'path': 1,
'date': {
'y': { '$year': '$time' },
'm': { '$month': '$time' },
'd': { '$dayOfMonth': '$time' } } } },
{ '$group': {
'_id': {
'p':'$path’,
'y': '$date.y',
'm': '$date.m',
'd': '$date.d' },
'hits': { '$sum': 1 } } },
])
Aggregation Framework
{
‘ok’: 1,
‘result’: [
{ '_id': {'p':’/index.html’,'y': 2012,'m': 1,'d': 1 },'hits’: 124 } },
{ '_id': {'p':’/index.html’,'y': 2012,'m': 1,'d': 2 },'hits’: 245} },
{ '_id': {'p':’/index.html’,'y': 2012,'m': 1,'d': 3 },'hits’: 322} },
{ '_id': {'p':’/index.html’,'y': 2012,'m': 1,'d': 4 },'hits’: 175} },
{ '_id': {'p':’/index.html’,'y': 2012,'m': 1,'d': 5 },'hits’: 94} }
]
}
Roll-ups with map-
reduce
Design Pattern
Map Reduce – Map Phase
Generat
e hourly
rollups
from log
data
var map = function() {
var key = {
p: this.path,
d: new Date(
this.ts.getFullYear(),
this.ts.getMonth(),
this.ts.getDate(),
this.ts.getHours(),
0, 0, 0) };
emit( key, { hits: 1 } );
}
Map Reduce – Reduce Phase
Generat
e hourly
rollups
from log
data
var reduce = function(key, values) {
var r = { hits: 0 };
values.forEach(function(v) {
r.hits += v.hits;
});
return r;
}
)
Map Reduce
Generat
e hourly
rollups
from log
data
cutoff = new Date(2012,0,1)
query = { 'ts': { '$gt': last_run, '$lt': cutoff } }
db.logs.mapReduce( map, reduce, {
‘query’: query,
‘out’: { ‘reduce’ : ‘stats.hourly’ } } )
last_run = cutoff
Map Reduce Output
> db.stats.hourly.find()
{ '_id': {'p':’/index.html’,’d’:ISODate(“2012-0-1 00:00:00”) },
’value': { ’hits’: 124 } },
{ '_id': {'p':’/index.html’,’d’:ISODate(“2012-0-1 01:00:00”) },
’value': { ’hits’: 245} },
{ '_id': {'p':’/index.html’,’d’:ISODate(“2012-0-1 02:00:00”) },
’value': { ’hits’: 322} },
{ '_id': {'p':’/index.html’,’d’:ISODate(“2012-0-1 03:00:00”) },
’value': { ’hits’: 175} },
... More ...
Chained Map Reduce
Collection 1 :
Raw Logs
Collection 1 :
Raw Logs
Map
Reduce
Map
Reduce
Collection 2:
Hourly Stats
Collection 2:
Hourly Stats
Collection 3:
Daily Stats
Collection 3:
Daily Stats
Map
Reduce
Map
Reduce
Runs
every hour
Runs
every day
Pre-aggregated
documents
Design Pattern
Pre-Aggregation
Data for
URL /
Date
{
_id: "20101010/site-1/apache_pb.gif",
metadata: {
date: ISODate("2000-10-10T00:00:00Z"),
site: "site-1",
page: "/apache_pb.gif" },
daily: 5468426,
hourly: {
"0": 227850,
"1": 210231,
...
"23": 20457 },
minute: {
"0": 3612,
"1": 3241,
...
"1439": 2819 }
}
Pre-Aggregation
Data for
URL /
Date
id_daily = dt_utc.strftime('%Y%m%d/') + site + page
hour = dt_utc.hour
minute = dt_utc.minute
# Get a datetime that only includes date info
d = datetime.combine(dt_utc.date(), time.min)
query = {
'_id': id_daily,
'metadata': { 'date': d, 'site': site, 'page': page } }
update = { '$inc': {
‘daily’ : 1,
'hourly.%d' % (hour,): 1,
'minute.%d.%d' % (hour,minute): 1 } }
db.stats.daily.update(query, update, upsert=True)
Pre-Aggregation
Data for
URL /
Date
db.stats.daily.findOne(
{'metadata': {'date':dt,
'site':'site-1',
'page':'/index.html'}},
{ 'minute': 1 }
);
Solution Architect, 10gen

Más contenido relacionado

La actualidad más candente

Building Applications with MongoDB - an Introduction
Building Applications with MongoDB - an IntroductionBuilding Applications with MongoDB - an Introduction
Building Applications with MongoDB - an Introduction
MongoDB
 
Building a web application with mongo db
Building a web application with mongo dbBuilding a web application with mongo db
Building a web application with mongo db
MongoDB
 
Nosh slides mongodb web application - mongo philly 2011
Nosh slides   mongodb web application - mongo philly 2011Nosh slides   mongodb web application - mongo philly 2011
Nosh slides mongodb web application - mongo philly 2011
MongoDB
 

La actualidad más candente (20)

Building Your First MongoDB Application (Mongo Austin)
Building Your First MongoDB Application (Mongo Austin)Building Your First MongoDB Application (Mongo Austin)
Building Your First MongoDB Application (Mongo Austin)
 
Rubyconfindia2018 - GPU accelerated libraries for Ruby
Rubyconfindia2018 - GPU accelerated libraries for RubyRubyconfindia2018 - GPU accelerated libraries for Ruby
Rubyconfindia2018 - GPU accelerated libraries for Ruby
 
Building Applications with MongoDB - an Introduction
Building Applications with MongoDB - an IntroductionBuilding Applications with MongoDB - an Introduction
Building Applications with MongoDB - an Introduction
 
Building a web application with mongo db
Building a web application with mongo dbBuilding a web application with mongo db
Building a web application with mongo db
 
日経平均上下予想Botを作った話
日経平均上下予想Botを作った話日経平均上下予想Botを作った話
日経平均上下予想Botを作った話
 
Academy PRO: Elasticsearch Misc
Academy PRO: Elasticsearch MiscAcademy PRO: Elasticsearch Misc
Academy PRO: Elasticsearch Misc
 
The elements of a functional mindset
The elements of a functional mindsetThe elements of a functional mindset
The elements of a functional mindset
 
Time Series Meetup: Virtual Edition | July 2020
Time Series Meetup: Virtual Edition | July 2020Time Series Meetup: Virtual Edition | July 2020
Time Series Meetup: Virtual Edition | July 2020
 
Nosh slides mongodb web application - mongo philly 2011
Nosh slides   mongodb web application - mongo philly 2011Nosh slides   mongodb web application - mongo philly 2011
Nosh slides mongodb web application - mongo philly 2011
 
Aerospike Nested CDTs - Meetup Dec 2019
Aerospike Nested CDTs - Meetup Dec 2019Aerospike Nested CDTs - Meetup Dec 2019
Aerospike Nested CDTs - Meetup Dec 2019
 
Mysql 4.0 casual
Mysql 4.0 casualMysql 4.0 casual
Mysql 4.0 casual
 
JavaScript Event Loop
JavaScript Event LoopJavaScript Event Loop
JavaScript Event Loop
 
Shrug2017 arcpy data_and_you
Shrug2017 arcpy data_and_youShrug2017 arcpy data_and_you
Shrug2017 arcpy data_and_you
 
Spark 4th Meetup Londond - Building a Product with Spark
Spark 4th Meetup Londond - Building a Product with SparkSpark 4th Meetup Londond - Building a Product with Spark
Spark 4th Meetup Londond - Building a Product with Spark
 
Query for json databases
Query for json databasesQuery for json databases
Query for json databases
 
Apache Spark - Aram Mkrtchyan
Apache Spark - Aram MkrtchyanApache Spark - Aram Mkrtchyan
Apache Spark - Aram Mkrtchyan
 
User Defined Aggregation in Apache Spark: A Love Story
User Defined Aggregation in Apache Spark: A Love StoryUser Defined Aggregation in Apache Spark: A Love Story
User Defined Aggregation in Apache Spark: A Love Story
 
Bubble in link list
Bubble in link listBubble in link list
Bubble in link list
 
Engineering Fast Indexes for Big-Data Applications: Spark Summit East talk by...
Engineering Fast Indexes for Big-Data Applications: Spark Summit East talk by...Engineering Fast Indexes for Big-Data Applications: Spark Summit East talk by...
Engineering Fast Indexes for Big-Data Applications: Spark Summit East talk by...
 
Programs
ProgramsPrograms
Programs
 

Similar a Schema Design by Chad Tindel, Solution Architect, 10gen

1403 app dev series - session 5 - analytics
1403   app dev series - session 5 - analytics1403   app dev series - session 5 - analytics
1403 app dev series - session 5 - analytics
MongoDB
 
Social Data and Log Analysis Using MongoDB
Social Data and Log Analysis Using MongoDBSocial Data and Log Analysis Using MongoDB
Social Data and Log Analysis Using MongoDB
Takahiro Inoue
 
MongoDB Chicago - MapReduce, Geospatial, & Other Cool Features
MongoDB Chicago - MapReduce, Geospatial, & Other Cool FeaturesMongoDB Chicago - MapReduce, Geospatial, & Other Cool Features
MongoDB Chicago - MapReduce, Geospatial, & Other Cool Features
ajhannan
 
AI與大數據數據處理 Spark實戰(20171216)
AI與大數據數據處理 Spark實戰(20171216)AI與大數據數據處理 Spark實戰(20171216)
AI與大數據數據處理 Spark實戰(20171216)
Paul Chao
 
2012 mongo db_bangalore_roadmap_new
2012 mongo db_bangalore_roadmap_new2012 mongo db_bangalore_roadmap_new
2012 mongo db_bangalore_roadmap_new
MongoDB
 
Refactoring to Macros with Clojure
Refactoring to Macros with ClojureRefactoring to Macros with Clojure
Refactoring to Macros with Clojure
Dmitry Buzdin
 

Similar a Schema Design by Chad Tindel, Solution Architect, 10gen (20)

MongoDB Aggregation Framework
MongoDB Aggregation FrameworkMongoDB Aggregation Framework
MongoDB Aggregation Framework
 
1403 app dev series - session 5 - analytics
1403   app dev series - session 5 - analytics1403   app dev series - session 5 - analytics
1403 app dev series - session 5 - analytics
 
Webinar: Applikationsentwicklung mit MongoDB : Teil 5: Reporting & Aggregation
Webinar: Applikationsentwicklung mit MongoDB: Teil 5: Reporting & AggregationWebinar: Applikationsentwicklung mit MongoDB: Teil 5: Reporting & Aggregation
Webinar: Applikationsentwicklung mit MongoDB : Teil 5: Reporting & Aggregation
 
MongoDB 3.2 - Analytics
MongoDB 3.2  - AnalyticsMongoDB 3.2  - Analytics
MongoDB 3.2 - Analytics
 
Social Data and Log Analysis Using MongoDB
Social Data and Log Analysis Using MongoDBSocial Data and Log Analysis Using MongoDB
Social Data and Log Analysis Using MongoDB
 
MongoDB Chicago - MapReduce, Geospatial, & Other Cool Features
MongoDB Chicago - MapReduce, Geospatial, & Other Cool FeaturesMongoDB Chicago - MapReduce, Geospatial, & Other Cool Features
MongoDB Chicago - MapReduce, Geospatial, & Other Cool Features
 
Scalding big ADta
Scalding big ADtaScalding big ADta
Scalding big ADta
 
AI與大數據數據處理 Spark實戰(20171216)
AI與大數據數據處理 Spark實戰(20171216)AI與大數據數據處理 Spark實戰(20171216)
AI與大數據數據處理 Spark實戰(20171216)
 
9b. Document-Oriented Databases lab
9b. Document-Oriented Databases lab9b. Document-Oriented Databases lab
9b. Document-Oriented Databases lab
 
2012 mongo db_bangalore_roadmap_new
2012 mongo db_bangalore_roadmap_new2012 mongo db_bangalore_roadmap_new
2012 mongo db_bangalore_roadmap_new
 
Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...
Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...
Optimizing InfluxDB Performance in the Real World by Dean Sheehan, Senior Dir...
 
MongoDB's New Aggregation framework
MongoDB's New Aggregation frameworkMongoDB's New Aggregation framework
MongoDB's New Aggregation framework
 
MongoDB - Aggregation Pipeline
MongoDB - Aggregation PipelineMongoDB - Aggregation Pipeline
MongoDB - Aggregation Pipeline
 
Webinar: Index Tuning and Evaluation
Webinar: Index Tuning and EvaluationWebinar: Index Tuning and Evaluation
Webinar: Index Tuning and Evaluation
 
Refactoring to Macros with Clojure
Refactoring to Macros with ClojureRefactoring to Macros with Clojure
Refactoring to Macros with Clojure
 
Sorry - How Bieber broke Google Cloud at Spotify
Sorry - How Bieber broke Google Cloud at SpotifySorry - How Bieber broke Google Cloud at Spotify
Sorry - How Bieber broke Google Cloud at Spotify
 
Couchbas for dummies
Couchbas for dummiesCouchbas for dummies
Couchbas for dummies
 
Unlocking Your Hadoop Data with Apache Spark and CDH5
Unlocking Your Hadoop Data with Apache Spark and CDH5Unlocking Your Hadoop Data with Apache Spark and CDH5
Unlocking Your Hadoop Data with Apache Spark and CDH5
 
Hadoop london
Hadoop londonHadoop london
Hadoop london
 
Big Data Analytics with Hadoop with @techmilind
Big Data Analytics with Hadoop with @techmilindBig Data Analytics with Hadoop with @techmilind
Big Data Analytics with Hadoop with @techmilind
 

Más de MongoDB

Más de MongoDB (20)

MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
 
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
MongoDB SoCal 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
MongoDB SoCal 2020: Using MongoDB Services in Kubernetes: Any Platform, Devel...
 
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDBMongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
MongoDB SoCal 2020: A Complete Methodology of Data Modeling for MongoDB
 
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
MongoDB SoCal 2020: From Pharmacist to Analyst: Leveraging MongoDB for Real-T...
 
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series DataMongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
MongoDB SoCal 2020: Best Practices for Working with IoT and Time-series Data
 
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 MongoDB SoCal 2020: MongoDB Atlas Jump Start MongoDB SoCal 2020: MongoDB Atlas Jump Start
MongoDB SoCal 2020: MongoDB Atlas Jump Start
 
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
MongoDB .local San Francisco 2020: Powering the new age data demands [Infosys]
 
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
MongoDB .local San Francisco 2020: Using Client Side Encryption in MongoDB 4.2
 
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
MongoDB .local San Francisco 2020: Using MongoDB Services in Kubernetes: any ...
 
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
MongoDB .local San Francisco 2020: Go on a Data Safari with MongoDB Charts!
 
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your MindsetMongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
MongoDB .local San Francisco 2020: From SQL to NoSQL -- Changing Your Mindset
 
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas JumpstartMongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
MongoDB .local San Francisco 2020: MongoDB Atlas Jumpstart
 
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
MongoDB .local San Francisco 2020: Tips and Tricks++ for Querying and Indexin...
 
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
 

Último

Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Último (20)

Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 

Schema Design by Chad Tindel, Solution Architect, 10gen

  • 1. Real Time Analytics Chad Tindel chad.tindel@10gen.com
  • 2. The goal Real Time Analytics Engine Real Time Analytics Engine Data Sourc e Data Sourc e Data Sourc e
  • 5. Aggregation - PipelinesAggregation - Pipelines • Aggregation requests specify a pipeline • A pipeline is a series of operations • Conceptually, the members of a collection are passed through a pipeline to produce a result – Similar to a Unix command-line pipe
  • 7. Aggregation - PipelinesAggregation - Pipelines db.collection.aggregate( [ {$match: … }, {$group: … }, {$limit: …}, etc ]
  • 8. Pipeline OperationsPipeline Operations • $match – Uses a query predicate (like .find({…})) as a filter { $match : { author : "dave" } } { $match : { score : { $gt : 50, $lte : 90 } } }
  • 9. Pipeline OperationsPipeline Operations • $project – Uses a sample document to determine the shape of the result (similar to .find()’s 2nd optional argument) • Include or exclude fields • Compute new fields – Arithmetic expressions, including built-in functions – Pull fields from nested documents to the top – Push fields from the top down into new virtual documents
  • 10. Pipeline OperationsPipeline Operations • $unwind – Hands out array elements one at a time { $unwind : {"$myarray" } } • $unwind “streams” arrays – Array values are doled out one at time in the context of their surrounding document – Makes it possible to filter out elements before returning
  • 11. Pipeline OperationsPipeline Operations • $group – Aggregates items into buckets defined by a key
  • 12. GroupingGrouping • $group aggregation expressions – Define a grouping key as the _id of the result – Total grouped column values: $sum – Average grouped column values: $avg – Collect grouped column values in an array or set: $push, $addToSet – Other functions • $min, $max, $first, $last
  • 13. Pipeline OperationsPipeline Operations • $sort – Sort documents – Sort specifications are the same as today, e.g., $sort:{ key1: 1, key2: -1, …} { $sort : {“total”:-1} }
  • 14. Pipeline OperationsPipeline Operations • $limit – Only allow the specified number of documents to pass { $limit : 20 }
  • 15. Pipeline OperationsPipeline Operations • $skip – Skip over the specified number of documents { $skip : 10 }
  • 16. Computed ExpressionsComputed Expressions • Available in $project operations • Prefix expression language – Add two fields: $add:[“$field1”, “$field2”] – Provide a value for a missing field: $ifNull: [“$field1”, “$field2”] – Nesting: $add:[“$field1”, $ifNull:[“$field2”, “$field3”]] (continued)
  • 17. Computed ExpressionsComputed Expressions (continued)(continued) • String functions – toUpper, toLower, substr • Date field extraction – Get year, month, day, hour, etc, from ISODate • Date arithmetic • Null value substitution (like MySQL ifnull(), Oracle nvl()) • Ternary conditional – Return one of two values based on a predicate • Other functions…. – And we can easily add more as required
  • 18. Sample data Original Event Data 127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 “http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)” As JSON doc = { _id: ObjectId('4f442120eb03305789000000'), host: "127.0.0.1", time: ISODate("2000-10-10T20:55:36Z"), path: "/apache_pb.gif", referer: “http://www.example.com/start.html", user_agent: "Mozilla/4.08 [en] (Win98; I ;Nav)” } Insert to MongoDB db.logs.insert( doc )
  • 19. Dynamic Queries Find all logs for a URL db.logs.find( { ‘path’ : ‘/index.html’ } ) Find all logs for a time range db.logs.find( { ‘time’ : { ‘$gte’ : new Date(2012,0), ‘$lt’ : new Date(2012,1) } } ); Find all logs for a host over a range of dates db.logs.find( { ‘host’ : ‘127.0.0.1’, ‘time’ : { ‘$gte’ : new Date(2012,0), ‘$lt’ : new Date(2012, 1) } } );
  • 20. Aggregation Framework Request s per day by URL db.logs.aggregate( [ { '$match': { 'time': { '$gte': new Date(2012,0), '$lt': new Date(2012,1) } } }, { '$project': { 'path': 1, 'date': { 'y': { '$year': '$time' }, 'm': { '$month': '$time' }, 'd': { '$dayOfMonth': '$time' } } } }, { '$group': { '_id': { 'p':'$path’, 'y': '$date.y', 'm': '$date.m', 'd': '$date.d' }, 'hits': { '$sum': 1 } } }, ])
  • 21. Aggregation Framework { ‘ok’: 1, ‘result’: [ { '_id': {'p':’/index.html’,'y': 2012,'m': 1,'d': 1 },'hits’: 124 } }, { '_id': {'p':’/index.html’,'y': 2012,'m': 1,'d': 2 },'hits’: 245} }, { '_id': {'p':’/index.html’,'y': 2012,'m': 1,'d': 3 },'hits’: 322} }, { '_id': {'p':’/index.html’,'y': 2012,'m': 1,'d': 4 },'hits’: 175} }, { '_id': {'p':’/index.html’,'y': 2012,'m': 1,'d': 5 },'hits’: 94} } ] }
  • 23. Map Reduce – Map Phase Generat e hourly rollups from log data var map = function() { var key = { p: this.path, d: new Date( this.ts.getFullYear(), this.ts.getMonth(), this.ts.getDate(), this.ts.getHours(), 0, 0, 0) }; emit( key, { hits: 1 } ); }
  • 24. Map Reduce – Reduce Phase Generat e hourly rollups from log data var reduce = function(key, values) { var r = { hits: 0 }; values.forEach(function(v) { r.hits += v.hits; }); return r; } )
  • 25. Map Reduce Generat e hourly rollups from log data cutoff = new Date(2012,0,1) query = { 'ts': { '$gt': last_run, '$lt': cutoff } } db.logs.mapReduce( map, reduce, { ‘query’: query, ‘out’: { ‘reduce’ : ‘stats.hourly’ } } ) last_run = cutoff
  • 26. Map Reduce Output > db.stats.hourly.find() { '_id': {'p':’/index.html’,’d’:ISODate(“2012-0-1 00:00:00”) }, ’value': { ’hits’: 124 } }, { '_id': {'p':’/index.html’,’d’:ISODate(“2012-0-1 01:00:00”) }, ’value': { ’hits’: 245} }, { '_id': {'p':’/index.html’,’d’:ISODate(“2012-0-1 02:00:00”) }, ’value': { ’hits’: 322} }, { '_id': {'p':’/index.html’,’d’:ISODate(“2012-0-1 03:00:00”) }, ’value': { ’hits’: 175} }, ... More ...
  • 27. Chained Map Reduce Collection 1 : Raw Logs Collection 1 : Raw Logs Map Reduce Map Reduce Collection 2: Hourly Stats Collection 2: Hourly Stats Collection 3: Daily Stats Collection 3: Daily Stats Map Reduce Map Reduce Runs every hour Runs every day
  • 29. Pre-Aggregation Data for URL / Date { _id: "20101010/site-1/apache_pb.gif", metadata: { date: ISODate("2000-10-10T00:00:00Z"), site: "site-1", page: "/apache_pb.gif" }, daily: 5468426, hourly: { "0": 227850, "1": 210231, ... "23": 20457 }, minute: { "0": 3612, "1": 3241, ... "1439": 2819 } }
  • 30. Pre-Aggregation Data for URL / Date id_daily = dt_utc.strftime('%Y%m%d/') + site + page hour = dt_utc.hour minute = dt_utc.minute # Get a datetime that only includes date info d = datetime.combine(dt_utc.date(), time.min) query = { '_id': id_daily, 'metadata': { 'date': d, 'site': site, 'page': page } } update = { '$inc': { ‘daily’ : 1, 'hourly.%d' % (hour,): 1, 'minute.%d.%d' % (hour,minute): 1 } } db.stats.daily.update(query, update, upsert=True)
  • 31. Pre-Aggregation Data for URL / Date db.stats.daily.findOne( {'metadata': {'date':dt, 'site':'site-1', 'page':'/index.html'}}, { 'minute': 1 } );