2. MongoDB features and Architecture
Business Intelligence: Overview
Model of Data using SQL vs NoSQL
Concept of MapReduce
Real-world use-case of Business Analytics
Scalability
Conclusions
7. Business + smart information = Business Intelligence
Consists of querying, reporting, and analytics for businesses
Enables a business to make smart decisions to execute
8. Data Modeling in SQL
A one-to-many relation of a node (id, value) with other nodes, related by two different relations
Node: id, value
Relation: id_node1, id_node2
9. NoSQL Data Model Mapped onto the Relational Database Model
Node: id, value
Relation: _id, value 1, value 2
10. Using Complex-Type Attributes to Model Data
Nodes
_id
value
relations []: value 1, value 2, …, value n
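As a rough illustration of the nested model on this slide, a node document could embed its related values directly in an array. The field names follow the slide; the concrete values and the shape of each embedded entry are assumptions made up for this sketch.

```javascript
// Hypothetical node document: field names follow the slide,
// the values and the embedded entry shape are illustrative assumptions.
const node = {
  _id: 1,
  value: "A",
  relations: [
    { _id: 2, value: "B" },
    { _id: 3, value: "C" },
  ],
};

// Related nodes are read straight from the embedded array; no join is needed.
const relatedValues = node.relations.map((r) => r.value);
console.log(relatedValues);
```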
11. No join operation required
Nesting of documents, resulting in
instantaneous access to retrieve nodes/
Supports agile programming methods
Flexible schema adapts to changing business needs
12. MapReduce
◦ Programming model for processing large amounts of data in parallel
◦ Map: processes a list of data to create key/value pairs
◦ Reduce: processes those pairs to create new aggregated key/value pairs
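The two phases can be sketched in memory as follows. This is only a single-process illustration of the programming model, not MongoDB's implementation; the helper name `mapReduce` and the sample input are assumptions.

```javascript
// In-memory sketch of the map/reduce flow (hypothetical helper, not MongoDB's engine).
function mapReduce(items, map, reduce) {
  const groups = new Map();
  // Map phase: each item emits zero or more key/value pairs.
  for (const item of items) {
    map(item, (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  }
  // Reduce phase: values sharing a key are folded into one aggregate.
  const result = {};
  for (const [key, values] of groups) {
    result[key] = reduce(key, values);
  }
  return result;
}

// Made-up input: count cities per country.
const sampleCities = [
  { country: "FR", city: "Paris" },
  { country: "FR", city: "Lyon" },
  { country: "IT", city: "Rome" },
];
const counts = mapReduce(
  sampleCities,
  (doc, emit) => emit(doc.country, 1),
  (key, values) => values.reduce((a, b) => a + b, 0)
);
console.log(counts); // { FR: 2, IT: 1 }
```

In a real deployment the map calls run in parallel across the data, which is why the reduce step must only depend on the key and the list of values it receives.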
15. Problem
The task is to find the 2 closest cities in each country
Treat the earth as a 2D plane
The distance between P1 (x1, y1) and P2 (x2, y2) is computed as
$\sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$
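The distance formula translates directly into a small helper. The function name `dist` and the `x`/`y` field names are assumptions for illustration; the slides do not show how the helper is actually defined.

```javascript
// Euclidean distance between two points on a flat plane.
// The x/y field names are assumptions for this sketch.
function dist(p1, p2) {
  return Math.hypot(p1.x - p2.x, p1.y - p2.y);
}

console.log(dist({ x: 0, y: 0 }, { x: 3, y: 4 })); // 5
```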
18. create view city_dist as
select c1.CountryID, c1.CityId, c1.City,
       c2.CityId as CityId2, c2.City as City2,
       dist(c1,c2) as Dist
from cities c1
inner join cities c2
  on c1.CountryID = c2.CountryID
 and c1.CityId < c2.CityId; /* Calculate distance between 2 cities only once */
19. select city_dist.*
from (
  select CountryID, min(Dist) as MinDist
  from city_dist
  where Dist > 0 /* Avoid cities which share Latitude & Longitude */
  group by CountryID
) a
inner join city_dist
  on a.CountryID = city_dist.CountryID
 and a.MinDist = city_dist.Dist;
24. function ReduceCode(key, values) {
  // Merge the data arrays of all partial values emitted for this key.
  var reduced = {"data": []};
  for (var i in values) {
    var inter = values[i];
    for (var j in inter.data) {
      reduced.data.push(inter.data[j]);
    }
  }
  return reduced;
}
25. function Finalize(key, reduced) {
  if (reduced.data.length == 1) {
    return { "message" : "This Country contains only 1 City" };
  }
  var min_dist = 999999999999;
  var city1 = { "name": "" };
  var city2 = { "name": "" };
  var c1; var c2; var d;
  // Visit each unordered pair of cities once.
  for (var i in reduced.data) {
    for (var j in reduced.data) {
      if (i >= j) continue;
      c1 = reduced.data[i];
      c2 = reduced.data[j];
      d = Dist(c1, c2);
      // Ignore pairs at distance 0 (cities sharing the same coordinates).
      if (d < min_dist && d > 0) {
        min_dist = d; city1 = c1; city2 = c2;
      }
    }
  }
  return {"city1": city1.name, "city2": city2.name, "dist": min_dist};
}
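The reduce and finalize functions above expect every mapped value to have the shape { "data": [...] }, with a name on each entry. A compatible map function might look like the sketch below. The document field names (CountryID, City, Latitude, Longitude) are assumptions borrowed from the SQL slides, and the local `emit` is only a stand-in so the sketch can run outside MongoDB, where `emit` is provided by the server.

```javascript
// Hypothetical map function. In MongoDB, `this` is the current document and
// `emit` is supplied by the server; the field names below are assumptions.
function MapCode() {
  emit(this.CountryID, {
    data: [{ name: this.City, lat: this.Latitude, lon: this.Longitude }],
  });
}

// Local stand-in for emit so the sketch can run outside MongoDB.
const emitted = [];
function emit(key, value) {
  emitted.push({ key, value });
}

// Made-up sample document.
MapCode.call({ CountryID: 1, City: "Paris", Latitude: 48.86, Longitude: 2.35 });
console.log(JSON.stringify(emitted));
```

In the legacy mongo shell, the three functions would typically be passed together, for example db.cities.mapReduce(MapCode, ReduceCode, { out: { inline: 1 }, finalize: Finalize }), where the collection name cities is again an assumption.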
26. Shard!
For write-intensive workloads, increase the number of shards
For read-intensive workloads, increase the number of replica-set members within each shard
Best read performance: each shard's share of the data fits in memory
28. MongoDB is well suited for BI applications
The schemaless model allows high performance
Replication combined with sharding allows scaling out
29. http://www.mongodb.org/
http://www.jaspersoft.com/
R. Cattell. Scalable SQL and NoSQL Data Stores. http://www.cattell.net/datastores/Datastores.pdf
C.-T. Chu, S. K. Kim, Y.-A. Lin, Y. Yu, G. R. Bradski, A. Y. Ng, and K. Olukotun. Map-reduce for machine learning on multicore. In NIPS, pages 281–288, 2006.
http://www.mongovue.com/2010/11/03/yet-another-mongodb-map-reduce-tutorial/