SlideShare una empresa de Scribd logo
1 de 63
Codemotion Milano 2013

Data Processing and
Aggregation
Massimo Brignoli
Solutions Architect, MongoDB Inc.

Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
Who Am I?
• Solutions Architect/Evangelist in MongoDB Inc.
• 20 years of experience in databases
• Former MySQL employee

• Previous life: web, web, web

Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
Big Data

Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
What is Big Data?
• Big Data is like teenage sex:
• everyone talks about it
• nobody really knows how to do it

• everyone thinks everyone else is doing it
• so everyone claims they are doing it…

Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
Understanding Big Data – It’s Not Very “Big”

64% - Ingest diverse,
new data in real-time

15% - More than 100TB
of data
20% - Less than 100TB
(average of all? <20TB)
from Big Data Executive Summary – 50+ top executives from Government and F500 firms
For over a decade

Big Data == Custom
Software

Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
Lots of Great Innovations Since 1970
Including the Relational Database
RDBMS Makes Development Hard

Code

XML Config

DB Schema

Application

Object Relational
Mapping

Relational
Database
And Even Harder To Iterate
New
Table

New
Column

New
Table
Name

Pet

Phone

New
Column

3 months later…

Email
From Complexity to Simplicity
MongoDB

RDBMS

{

_id : ObjectId("4c4ba5e5e8aabf3"),
employee_name: "Dunham, Justin",
department : "Marketing",
title : "Product Manager, Web",
report_up: "Neray, Graham",
pay_band: “C",
benefits : [
{

type :

"Health",

plan : "PPO Plus" },
{

type :

"Dental",

plan : "Standard" }
]
}
In the past few years
Open source software has
emerged enabling the rest of
us to handle Big Data

Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
Use Popular, Well-Known Technologies

Source: Silicon Angle, 2012
Enterprise Big Data Stack

CRM, ERP, Collaboration, Mobile, BI

Data Management
Online Data
RDBMS
RDBMS

Offline Data
Hadoop

Infrastructure
OS & Virtualization, Compute, Storage, Network

EDW

Security & Auditing

Management & Monitoring

Applications
Consideration – Online vs. Offline
Online

• Real-time
• Low-latency
• High availability

vs.

Offline

• Long-running
• High-Latency
• Availability is lower priority
How MongoDB Meets Our
Requirements
• MongoDB is an operational database
• MongoDB provides high performance for storage

and retrieval at large scale
• MongoDB has a robust query interface permitting

intelligent operations
• MongoDB is not a data processing engine, but

provides processing functionality

Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
MongoDB data processing options
http://www.flickr.com/photos/torek/4444673930/ http://createivecommons.org/licenses/by-nc-sa/3.0/
Except where otherwise noted, this work is licensed under
Getting Example Data

Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
The “hello world” of
MapReduce is counting words
in a paragraph of text.
Let’s try something a little
more interesting…

Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
What is the most popular pub
name?

Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
Open Street Map Data
#!/usr/bin/env python
# Data Source
# http://www.overpass-api.de/api/xapi?*[amenity=pub][bbox=-10.5,49.78,1.78,59]
import re
import sys
from imposm.parser import OSMParser
import pymongo
class Handler(object):
def nodes(self, nodes):
if not nodes:
return
docs = []
for node in nodes:
osm_id, doc, (lon, lat) = node
if "name" not in doc:
node_points[osm_id] = (lon, lat)
continue
doc["name"] = doc["name"].title().lstrip("The ").replace("And", "&")
doc["_id"] = osm_id
doc["location"] = {"type": "Point", "coordinates": [lon, lat]}
docs.append(doc)
collection.insert(docs)

Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
Example Pub Data
{
"_id" : 451152,
"amenity" : "pub",
"name" : "The Dignity",
"addr:housenumber" : "363",
"addr:street" : "Regents Park Road",
"addr:city" : "London",
"addr:postcode" : "N3 1DH",
"toilets" : "yes",
"toilets:access" : "customers",
"location" : {
"type" : "Point",
"coordinates" : [-0.1945732, 51.6008172]
}
}

Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
MapReduce

Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
MongoDB MapReduce
•

map
MongoDB

reduce
finalize

Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
map

Map Function
MongoDB

reduce

> var map = function() {
finalize

emit(this.name, 1);

Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
map

Reduce Function
MongoDB

reduce

> var reduce = function (key, values) {
finalize

var sum = 0;
values.forEach( function (val) {sum += val;} );
return sum;
}

Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
Results
> db.pub_names.find().sort({value: -1}).limit(10)
{ "_id" : "The Red Lion", "value" : 407 }
{ "_id" : "The Royal Oak", "value" : 328 }
{ "_id" : "The Crown", "value" : 242 }
{ "_id" : "The White Hart", "value" : 214 }
{ "_id" : "The White Horse", "value" : 200 }
{ "_id" : "The New Inn", "value" : 187 }
{ "_id" : "The Plough", "value" : 185 }
{ "_id" : "The Rose & Crown", "value" : 164 }
{ "_id" : "The Wheatsheaf", "value" : 147 }
{ "_id" : "The Swan", "value" : 140 }

Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
Pub Names in the Center of London
> db.pubs.mapReduce(map, reduce, { out: "pub_names",
query: {
location: {
$within: { $centerSphere: [[-0.12, 51.516], 2 / 3959] }
}}
})

{
"result" : "pub_names",
"timeMillis" : 116,
"counts" : {
"input" : 643,
"emit" : 643,
"reduce" : 54,
"output" : 537
},
"ok" : 1,
}
Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
Results
> db.pub_names.find().sort({value: -1}).limit(10)
{
{
{
{
{
{
{
{
{
{

"_id"
"_id"
"_id"
"_id"
"_id"
"_id"
"_id"
"_id"
"_id"
"_id"

:
:
:
:
:
:
:
:
:
:

"All Bar One", "value" : 11 }
"The Slug & Lettuce", "value" : 7 }
"The Coach & Horses", "value" : 6 }
"The Green Man", "value" : 5 }
"The Kings Arms", "value" : 5 }
"The Red Lion", "value" : 5 }
"Corney & Barrow", "value" : 4 }
"O'Neills", "value" : 4 }
"Pitcher & Piano", "value" : 4 }
"The Crown", "value" : 4 }

Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
MongoDB MapReduce
• Real-time
• Output directly to document or collection
• Runs inside MongoDB on local data

− Adds load to your DB
− In Javascript – debugging can be a challenge
− Translating in and out of C++

Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
Aggregation Framework

Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
Aggregation Framework
•

op1
MongoDB

op2

opN
Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
Aggregation Framework in 60
Seconds

Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
Aggregation Framework Operators
• $project
• $match
• $limit

• $skip
• $sort
• $unwind
• $group

Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
$match
• Filter documents
• Uses existing query syntax
• If using $geoNear it has to be first in pipeline

• $where is not supported

Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
Matching Field Values
{

"_id" : 271421,
"amenity" : "pub",
"name" : "Sir Walter Tyrrell",
"location" : {
"type" : "Point",
"coordinates" : [
-1.6192422,
50.9131996
]
}
}

{ "$match": {

"name": "The Red Lion"
}}

{

"_id" : 271466,
"amenity" : "pub",
"name" : "The Red Lion",
"location" : {
"type" : "Point",
"coordinates" : [
-1.5494749,
50.7837119
]}

{
"_id" : 271466,
"amenity" : "pub",
"name" : "The Red Lion",
"location" : {
"type" : "Point",
"coordinates" : [
-1.5494749,
50.7837119
]
}

}

Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
$project
• Reshape documents
• Include, exclude or rename fields
• Inject computed fields

• Create sub-document fields

Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
Including and Excluding Fields
{ “$project”: {

{

"_id" : 271466,

"name" : "The Red Lion",

“_id”: 0,
“amenity”: 1,
“name”: 1,

"location" : {

}}

"amenity" : "pub",

"type" : "Point",
"coordinates" : [
-1.5494749,
50.7837119
]
}

{
“amenity” : “pub”,
“name” : “The Red Lion”
}

}
Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
Reformatting Documents
{ “$project”: {

{

"_id" : 271466,
"amenity" : "pub",
"name" : "The Red Lion",
"location" : {

“_id”: 0,
“name”: 1,
“meta”: {
“type”: “$amenity”}
}}

"type" : "Point",
"coordinates" : [
-1.5494749,
50.7837119
]
}
}

{
“name” : “The Red Lion”
“meta” : {
“type” : “pub”
}}

Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
Dealing with Arrays
{ “$project”: {

{

"_id" : 271466,
"amenity" : "pub",
"name" : "The Red Lion",
"facilities" : [

"toilets",

“_id”: 0,
“name”: 1,
“meta”: {
“type”: “$amenity”}
}}
{"$unwind": "$facility"}

"food"
],
}

{ "name" : "The Red Lion",
"facility" : "toilets" },
{ "name" : "The Red Lion",
"facility" : "food" }

Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
$group
• Group documents by an ID
• Field reference, object, constant
• Other output fields are computed

$max, $min, $avg, $sum
$addToSet, $push $first, $last
• Processes all data in memory

Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
Back to the pub!

•

http://www.offwestend.com/index.php/theatres/pastshows/71

Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
Popular Pub Names
>var popular_pub_names = [
{ $match : location:
{ $within: { $centerSphere:
[[-0.12, 51.516], 2 / 3959]}}}
},
{ $group :
{ _id: “$name”
value: {$sum: 1} }
},
{ $sort : {value: -1} },
{ $limit : 10 }

Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
Results
> db.pubs.aggregate(popular_pub_names)
{
"result" : [
{ "_id" : "All Bar One", "value" : 11 }
{ "_id" : "The Slug & Lettuce", "value" : 7 }
{ "_id" : "The Coach & Horses", "value" : 6 }
{ "_id" : "The Green Man", "value" : 5 }
{ "_id" : "The Kings Arms", "value" : 5 }
{ "_id" : "The Red Lion", "value" : 5 }
{ "_id" : "Corney & Barrow", "value" : 4 }
{ "_id" : "O'Neills", "value" : 4 }
{ "_id" : "Pitcher & Piano", "value" : 4 }
{ "_id" : "The Crown", "value" : 4 }
],
"ok" : 1
}
Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
Aggregation Framework Benefits
• Real-time
• Simple yet powerful interface
• Declared in JSON, executes in C++

• Runs inside MongoDB on local data

− Adds load to your DB
− Limited Operators
− Data output is limited

Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
Analyzing MongoDB Data in
External Systems

Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
MongoDB with Hadoop
•

MongoDB

Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
MongoDB with Hadoop
•

MongoDB

Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/

warehouse
MongoDB with Hadoop
•

ETL

Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/

MongoDB
Map Pub Names in Python
#!/usr/bin/env python
from pymongo_hadoop import BSONMapper
def mapper(documents):
bounds = get_bounds() # ~2 mile polygon
for doc in documents:
geo = get_geo(doc["location"]) # Convert the geo type
if not geo:
continue
if bounds.intersects(geo):
yield {'_id': doc['name'], 'count': 1}

BSONMapper(mapper)
print >> sys.stderr, "Done Mapping."

Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
Reduce Pub Names in Python
#!/usr/bin/env python

from pymongo_hadoop import BSONReducer

def reducer(key, values):
_count = 0
for v in values:
_count += v['count']
return {'_id': key, 'value': _count}

BSONReducer(reducer)
Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
Execute MapReduce
hadoop jar target/mongo-hadoop-streaming-assembly-1.1.0-rc0.jar 
-mapper examples/pub/map.py 
-reducer examples/pub/reduce.py 
-mongo mongodb://127.0.0.1/demo.pubs 
-outputURI mongodb://127.0.0.1/demo.pub_names

Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
Popular Pub Names Nearby
> db.pub_names.find().sort({value: -1}).limit(10)
{
{
{
{
{
{
{
{
{
{

"_id"
"_id"
"_id"
"_id"
"_id"
"_id"
"_id"
"_id"
"_id"
"_id"

:
:
:
:
:
:
:
:
:
:

"All Bar One", "value" : 11 }
"The Slug & Lettuce", "value" : 7 }
"The Coach & Horses", "value" : 6 }
"The Kings Arms", "value" : 5 }
"Corney & Barrow", "value" : 4 }
"O'Neills", "value" : 4 }
"Pitcher & Piano", "value" : 4 }
"The Crown", "value" : 4 }
"The George", "value" : 4 }
"The Green Man", "value" : 4 }

Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
MongoDB and Hadoop
• Away from data store
• Can leverage existing data processing infrastructure
• Can horizontally scale your data processing
- Offline batch processing
- Requires synchronisation between store &

processor
- Infrastructure is much more complex

Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
The Future of Big Data and
MongoDB

Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
What is Big Data?
Big Data today will be
normal tomorrow

Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
Exponential Data Growth
Billions of URLs indexed by Google
10000
9000
8000
7000
6000
5000
4000
3000
2000
1000
0
2000

2002

2004

2006

2008

Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/

2010

2012
MongoDB enables you to
scale big

Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
MongoDB is evolving

so you can process the
big

Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
Data Processing with MongoDB
• Process in MongoDB using Map/Reduce
• Process in MongoDB using Aggregation

Framework
• Process outside MongoDB using Hadoop and

other external tools

Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
Questions?

Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
Codemotion Milano

Thanks!
Massimo Brignoli
Solutions Architect, MongoDB Inc.

Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/

Más contenido relacionado

La actualidad más candente

Web Scraper Shibuya.pm tech talk #8
Web Scraper Shibuya.pm tech talk #8Web Scraper Shibuya.pm tech talk #8
Web Scraper Shibuya.pm tech talk #8Tatsuhiko Miyagawa
 
Webinar: Building Your First App with MongoDB and Java
Webinar: Building Your First App with MongoDB and JavaWebinar: Building Your First App with MongoDB and Java
Webinar: Building Your First App with MongoDB and JavaMongoDB
 
Learn Learn how to build your mobile back-end with MongoDB
Learn Learn how to build your mobile back-end with MongoDBLearn Learn how to build your mobile back-end with MongoDB
Learn Learn how to build your mobile back-end with MongoDBMarakana Inc.
 
Dev Jumpstart: Build Your First App with MongoDB
Dev Jumpstart: Build Your First App with MongoDBDev Jumpstart: Build Your First App with MongoDB
Dev Jumpstart: Build Your First App with MongoDBMongoDB
 
Apache CouchDB talk at Ontario GNU Linux Fest
Apache CouchDB talk at Ontario GNU Linux FestApache CouchDB talk at Ontario GNU Linux Fest
Apache CouchDB talk at Ontario GNU Linux FestMyles Braithwaite
 
OSCON 2011 Learning CouchDB
OSCON 2011 Learning CouchDBOSCON 2011 Learning CouchDB
OSCON 2011 Learning CouchDBBradley Holt
 
Java development with MongoDB
Java development with MongoDBJava development with MongoDB
Java development with MongoDBJames Williams
 
Back to Basics Webinar 3: Schema Design Thinking in Documents
 Back to Basics Webinar 3: Schema Design Thinking in Documents Back to Basics Webinar 3: Schema Design Thinking in Documents
Back to Basics Webinar 3: Schema Design Thinking in DocumentsMongoDB
 
Honing headers for highly hardened highspeed hypertext
Honing headers for highly hardened highspeed hypertextHoning headers for highly hardened highspeed hypertext
Honing headers for highly hardened highspeed hypertextFastly
 
Going on an HTTP Diet: Front-End Web Performance
Going on an HTTP Diet: Front-End Web PerformanceGoing on an HTTP Diet: Front-End Web Performance
Going on an HTTP Diet: Front-End Web PerformanceAdam Norwood
 
Emerging threats jonkman_sans_cti_summit_2015
Emerging threats jonkman_sans_cti_summit_2015Emerging threats jonkman_sans_cti_summit_2015
Emerging threats jonkman_sans_cti_summit_2015Emerging Threats
 
MongoDB Europe 2016 - Graph Operations with MongoDB
MongoDB Europe 2016 - Graph Operations with MongoDBMongoDB Europe 2016 - Graph Operations with MongoDB
MongoDB Europe 2016 - Graph Operations with MongoDBMongoDB
 
How to make Ajax work for you
How to make Ajax work for youHow to make Ajax work for you
How to make Ajax work for youSimon Willison
 
Active Https Cookie Stealing
Active Https Cookie StealingActive Https Cookie Stealing
Active Https Cookie StealingSecurityTube.Net
 
MongoDB + Java - Everything you need to know
MongoDB + Java - Everything you need to know MongoDB + Java - Everything you need to know
MongoDB + Java - Everything you need to know Norberto Leite
 
My First Cluster with MongoDB Atlas
My First Cluster with MongoDB AtlasMy First Cluster with MongoDB Atlas
My First Cluster with MongoDB AtlasJay Gordon
 

La actualidad más candente (19)

Analyse Yourself
Analyse YourselfAnalyse Yourself
Analyse Yourself
 
Web Scraper Shibuya.pm tech talk #8
Web Scraper Shibuya.pm tech talk #8Web Scraper Shibuya.pm tech talk #8
Web Scraper Shibuya.pm tech talk #8
 
Webinar: Building Your First App with MongoDB and Java
Webinar: Building Your First App with MongoDB and JavaWebinar: Building Your First App with MongoDB and Java
Webinar: Building Your First App with MongoDB and Java
 
Learn Learn how to build your mobile back-end with MongoDB
Learn Learn how to build your mobile back-end with MongoDBLearn Learn how to build your mobile back-end with MongoDB
Learn Learn how to build your mobile back-end with MongoDB
 
Dev Jumpstart: Build Your First App with MongoDB
Dev Jumpstart: Build Your First App with MongoDBDev Jumpstart: Build Your First App with MongoDB
Dev Jumpstart: Build Your First App with MongoDB
 
Apache CouchDB talk at Ontario GNU Linux Fest
Apache CouchDB talk at Ontario GNU Linux FestApache CouchDB talk at Ontario GNU Linux Fest
Apache CouchDB talk at Ontario GNU Linux Fest
 
OSCON 2011 Learning CouchDB
OSCON 2011 Learning CouchDBOSCON 2011 Learning CouchDB
OSCON 2011 Learning CouchDB
 
Nodejs meetup-12-2-2015
Nodejs meetup-12-2-2015Nodejs meetup-12-2-2015
Nodejs meetup-12-2-2015
 
Java development with MongoDB
Java development with MongoDBJava development with MongoDB
Java development with MongoDB
 
Back to Basics Webinar 3: Schema Design Thinking in Documents
 Back to Basics Webinar 3: Schema Design Thinking in Documents Back to Basics Webinar 3: Schema Design Thinking in Documents
Back to Basics Webinar 3: Schema Design Thinking in Documents
 
Honing headers for highly hardened highspeed hypertext
Honing headers for highly hardened highspeed hypertextHoning headers for highly hardened highspeed hypertext
Honing headers for highly hardened highspeed hypertext
 
Going on an HTTP Diet: Front-End Web Performance
Going on an HTTP Diet: Front-End Web PerformanceGoing on an HTTP Diet: Front-End Web Performance
Going on an HTTP Diet: Front-End Web Performance
 
Emerging threats jonkman_sans_cti_summit_2015
Emerging threats jonkman_sans_cti_summit_2015Emerging threats jonkman_sans_cti_summit_2015
Emerging threats jonkman_sans_cti_summit_2015
 
MongoDB Europe 2016 - Graph Operations with MongoDB
MongoDB Europe 2016 - Graph Operations with MongoDBMongoDB Europe 2016 - Graph Operations with MongoDB
MongoDB Europe 2016 - Graph Operations with MongoDB
 
How to make Ajax work for you
How to make Ajax work for youHow to make Ajax work for you
How to make Ajax work for you
 
Active Https Cookie Stealing
Active Https Cookie StealingActive Https Cookie Stealing
Active Https Cookie Stealing
 
MongoDB + Java - Everything you need to know
MongoDB + Java - Everything you need to know MongoDB + Java - Everything you need to know
MongoDB + Java - Everything you need to know
 
My First Cluster with MongoDB Atlas
My First Cluster with MongoDB AtlasMy First Cluster with MongoDB Atlas
My First Cluster with MongoDB Atlas
 
Python and MongoDB
Python and MongoDBPython and MongoDB
Python and MongoDB
 

Destacado

Lambda Architecture in Practice
Lambda Architecture in PracticeLambda Architecture in Practice
Lambda Architecture in PracticeNavneet kumar
 
MongoDB Europe 2016 - The Rise of the Data Lake
MongoDB Europe 2016 - The Rise of the Data LakeMongoDB Europe 2016 - The Rise of the Data Lake
MongoDB Europe 2016 - The Rise of the Data LakeMongoDB
 
My other computer is a datacentre - 2012 edition
My other computer is a datacentre - 2012 editionMy other computer is a datacentre - 2012 edition
My other computer is a datacentre - 2012 editionSteve Loughran
 
Unlocking Operational Intelligence from the Data Lake
Unlocking Operational Intelligence from the Data LakeUnlocking Operational Intelligence from the Data Lake
Unlocking Operational Intelligence from the Data LakeMongoDB
 
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!Tugdual Grall
 
L’architettura di Classe Enterprise di Nuova Generazione
L’architettura di Classe Enterprise di Nuova GenerazioneL’architettura di Classe Enterprise di Nuova Generazione
L’architettura di Classe Enterprise di Nuova GenerazioneMongoDB
 
MongoDB Europe 2016 - Who’s Helping Themselves To Your Data? Demystifying Mon...
MongoDB Europe 2016 - Who’s Helping Themselves To Your Data? Demystifying Mon...MongoDB Europe 2016 - Who’s Helping Themselves To Your Data? Demystifying Mon...
MongoDB Europe 2016 - Who’s Helping Themselves To Your Data? Demystifying Mon...MongoDB
 
MongoDB Europe 2016 - Big Data meets Big Compute
MongoDB Europe 2016 - Big Data meets Big ComputeMongoDB Europe 2016 - Big Data meets Big Compute
MongoDB Europe 2016 - Big Data meets Big ComputeMongoDB
 
MongoDB Europe 2016 - ETL for Pros – Getting Data Into MongoDB The Right Way
MongoDB Europe 2016 - ETL for Pros – Getting Data Into MongoDB The Right WayMongoDB Europe 2016 - ETL for Pros – Getting Data Into MongoDB The Right Way
MongoDB Europe 2016 - ETL for Pros – Getting Data Into MongoDB The Right WayMongoDB
 
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesWebinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesMongoDB
 
Lambda architecture for real time big data
Lambda architecture for real time big dataLambda architecture for real time big data
Lambda architecture for real time big dataTrieu Nguyen
 
Big Data and Fast Data - Lambda Architecture in Action
Big Data and Fast Data - Lambda Architecture in ActionBig Data and Fast Data - Lambda Architecture in Action
Big Data and Fast Data - Lambda Architecture in ActionGuido Schmutz
 

Destacado (12)

Lambda Architecture in Practice
Lambda Architecture in PracticeLambda Architecture in Practice
Lambda Architecture in Practice
 
MongoDB Europe 2016 - The Rise of the Data Lake
MongoDB Europe 2016 - The Rise of the Data LakeMongoDB Europe 2016 - The Rise of the Data Lake
MongoDB Europe 2016 - The Rise of the Data Lake
 
My other computer is a datacentre - 2012 edition
My other computer is a datacentre - 2012 editionMy other computer is a datacentre - 2012 edition
My other computer is a datacentre - 2012 edition
 
Unlocking Operational Intelligence from the Data Lake
Unlocking Operational Intelligence from the Data LakeUnlocking Operational Intelligence from the Data Lake
Unlocking Operational Intelligence from the Data Lake
 
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications!
 
L’architettura di Classe Enterprise di Nuova Generazione
L’architettura di Classe Enterprise di Nuova GenerazioneL’architettura di Classe Enterprise di Nuova Generazione
L’architettura di Classe Enterprise di Nuova Generazione
 
MongoDB Europe 2016 - Who’s Helping Themselves To Your Data? Demystifying Mon...
MongoDB Europe 2016 - Who’s Helping Themselves To Your Data? Demystifying Mon...MongoDB Europe 2016 - Who’s Helping Themselves To Your Data? Demystifying Mon...
MongoDB Europe 2016 - Who’s Helping Themselves To Your Data? Demystifying Mon...
 
MongoDB Europe 2016 - Big Data meets Big Compute
MongoDB Europe 2016 - Big Data meets Big ComputeMongoDB Europe 2016 - Big Data meets Big Compute
MongoDB Europe 2016 - Big Data meets Big Compute
 
MongoDB Europe 2016 - ETL for Pros – Getting Data Into MongoDB The Right Way
MongoDB Europe 2016 - ETL for Pros – Getting Data Into MongoDB The Right WayMongoDB Europe 2016 - ETL for Pros – Getting Data Into MongoDB The Right Way
MongoDB Europe 2016 - ETL for Pros – Getting Data Into MongoDB The Right Way
 
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesWebinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
 
Lambda architecture for real time big data
Lambda architecture for real time big dataLambda architecture for real time big data
Lambda architecture for real time big data
 
Big Data and Fast Data - Lambda Architecture in Action
Big Data and Fast Data - Lambda Architecture in ActionBig Data and Fast Data - Lambda Architecture in Action
Big Data and Fast Data - Lambda Architecture in Action
 

Similar a Past, Present and Future of Data Processing in Apache Hadoop

Mongo db first steps with csharp
Mongo db first steps with csharpMongo db first steps with csharp
Mongo db first steps with csharpSerdar Buyuktemiz
 
Perchè potresti aver bisogno di un database NoSQL anche se non sei Google o F...
Perchè potresti aver bisogno di un database NoSQL anche se non sei Google o F...Perchè potresti aver bisogno di un database NoSQL anche se non sei Google o F...
Perchè potresti aver bisogno di un database NoSQL anche se non sei Google o F...Codemotion
 
MongoDB Tick Data Presentation
MongoDB Tick Data PresentationMongoDB Tick Data Presentation
MongoDB Tick Data PresentationMongoDB
 
Practical Use of MongoDB for Node.js
Practical Use of MongoDB for Node.jsPractical Use of MongoDB for Node.js
Practical Use of MongoDB for Node.jsasync_io
 
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB
 
MongoDB at Gilt Groupe
MongoDB at Gilt GroupeMongoDB at Gilt Groupe
MongoDB at Gilt GroupeMongoDB
 
MongoDB and Ruby on Rails
MongoDB and Ruby on RailsMongoDB and Ruby on Rails
MongoDB and Ruby on Railsrfischer20
 
How sitecore depends on mongo db for scalability and performance, and what it...
How sitecore depends on mongo db for scalability and performance, and what it...How sitecore depends on mongo db for scalability and performance, and what it...
How sitecore depends on mongo db for scalability and performance, and what it...Antonios Giannopoulos
 
Accra MongoDB User Group
Accra MongoDB User GroupAccra MongoDB User Group
Accra MongoDB User GroupMongoDB
 
MongoDB Days Silicon Valley: Jumpstart: Ops/Admin 101
MongoDB Days Silicon Valley: Jumpstart: Ops/Admin 101MongoDB Days Silicon Valley: Jumpstart: Ops/Admin 101
MongoDB Days Silicon Valley: Jumpstart: Ops/Admin 101MongoDB
 
OrientDB for real & Web App development
OrientDB for real & Web App developmentOrientDB for real & Web App development
OrientDB for real & Web App developmentLuca Garulli
 
MongoDB Schema Design: Practical Applications and Implications
MongoDB Schema Design: Practical Applications and ImplicationsMongoDB Schema Design: Practical Applications and Implications
MongoDB Schema Design: Practical Applications and ImplicationsMongoDB
 
Webinaire 2 de la série « Retour aux fondamentaux » : Votre première applicat...
Webinaire 2 de la série « Retour aux fondamentaux » : Votre première applicat...Webinaire 2 de la série « Retour aux fondamentaux » : Votre première applicat...
Webinaire 2 de la série « Retour aux fondamentaux » : Votre première applicat...MongoDB
 
Back to Basics Webinar 2 - Your First MongoDB Application
Back to  Basics Webinar 2 - Your First MongoDB ApplicationBack to  Basics Webinar 2 - Your First MongoDB Application
Back to Basics Webinar 2 - Your First MongoDB ApplicationJoe Drumgoole
 
Back to Basics Webinar 2: Your First MongoDB Application
Back to Basics Webinar 2: Your First MongoDB ApplicationBack to Basics Webinar 2: Your First MongoDB Application
Back to Basics Webinar 2: Your First MongoDB ApplicationMongoDB
 
Mongodb at-gilt-groupe-seattle-2012-09-14-final
Mongodb at-gilt-groupe-seattle-2012-09-14-finalMongodb at-gilt-groupe-seattle-2012-09-14-final
Mongodb at-gilt-groupe-seattle-2012-09-14-finalMongoDB
 
Simpler, faster, cheaper Enterprise Apps using only Spring Boot on GCP
Simpler, faster, cheaper Enterprise Apps using only Spring Boot on GCPSimpler, faster, cheaper Enterprise Apps using only Spring Boot on GCP
Simpler, faster, cheaper Enterprise Apps using only Spring Boot on GCPDaniel Zivkovic
 
MongoDB: Comparing WiredTiger In-Memory Engine to Redis
MongoDB: Comparing WiredTiger In-Memory Engine to RedisMongoDB: Comparing WiredTiger In-Memory Engine to Redis
MongoDB: Comparing WiredTiger In-Memory Engine to RedisJason Terpko
 
Getting started with MongoDB and Scala - Open Source Bridge 2012
Getting started with MongoDB and Scala - Open Source Bridge 2012Getting started with MongoDB and Scala - Open Source Bridge 2012
Getting started with MongoDB and Scala - Open Source Bridge 2012sullis
 

Similar a Past, Present and Future of Data Processing in Apache Hadoop (20)

Mongo db first steps with csharp
Mongo db first steps with csharpMongo db first steps with csharp
Mongo db first steps with csharp
 
Perchè potresti aver bisogno di un database NoSQL anche se non sei Google o F...
Perchè potresti aver bisogno di un database NoSQL anche se non sei Google o F...Perchè potresti aver bisogno di un database NoSQL anche se non sei Google o F...
Perchè potresti aver bisogno di un database NoSQL anche se non sei Google o F...
 
MongoDB Tick Data Presentation
MongoDB Tick Data PresentationMongoDB Tick Data Presentation
MongoDB Tick Data Presentation
 
MongoDB
MongoDBMongoDB
MongoDB
 
Practical Use of MongoDB for Node.js
Practical Use of MongoDB for Node.jsPractical Use of MongoDB for Node.js
Practical Use of MongoDB for Node.js
 
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB AtlasMongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
MongoDB SoCal 2020: Migrate Anything* to MongoDB Atlas
 
MongoDB at Gilt Groupe
MongoDB at Gilt GroupeMongoDB at Gilt Groupe
MongoDB at Gilt Groupe
 
MongoDB and Ruby on Rails
MongoDB and Ruby on RailsMongoDB and Ruby on Rails
MongoDB and Ruby on Rails
 
How sitecore depends on mongo db for scalability and performance, and what it...
How sitecore depends on mongo db for scalability and performance, and what it...How sitecore depends on mongo db for scalability and performance, and what it...
How sitecore depends on mongo db for scalability and performance, and what it...
 
Accra MongoDB User Group
Accra MongoDB User GroupAccra MongoDB User Group
Accra MongoDB User Group
 
MongoDB Days Silicon Valley: Jumpstart: Ops/Admin 101
MongoDB Days Silicon Valley: Jumpstart: Ops/Admin 101MongoDB Days Silicon Valley: Jumpstart: Ops/Admin 101
MongoDB Days Silicon Valley: Jumpstart: Ops/Admin 101
 
OrientDB for real & Web App development
OrientDB for real & Web App developmentOrientDB for real & Web App development
OrientDB for real & Web App development
 
MongoDB Schema Design: Practical Applications and Implications
MongoDB Schema Design: Practical Applications and ImplicationsMongoDB Schema Design: Practical Applications and Implications
MongoDB Schema Design: Practical Applications and Implications
 
Webinaire 2 de la série « Retour aux fondamentaux » : Votre première applicat...
Webinaire 2 de la série « Retour aux fondamentaux » : Votre première applicat...Webinaire 2 de la série « Retour aux fondamentaux » : Votre première applicat...
Webinaire 2 de la série « Retour aux fondamentaux » : Votre première applicat...
 
Back to Basics Webinar 2 - Your First MongoDB Application
Back to  Basics Webinar 2 - Your First MongoDB ApplicationBack to  Basics Webinar 2 - Your First MongoDB Application
Back to Basics Webinar 2 - Your First MongoDB Application
 
Back to Basics Webinar 2: Your First MongoDB Application
Back to Basics Webinar 2: Your First MongoDB ApplicationBack to Basics Webinar 2: Your First MongoDB Application
Back to Basics Webinar 2: Your First MongoDB Application
 
Mongodb at-gilt-groupe-seattle-2012-09-14-final
Mongodb at-gilt-groupe-seattle-2012-09-14-finalMongodb at-gilt-groupe-seattle-2012-09-14-final
Mongodb at-gilt-groupe-seattle-2012-09-14-final
 
Simpler, faster, cheaper Enterprise Apps using only Spring Boot on GCP
Simpler, faster, cheaper Enterprise Apps using only Spring Boot on GCPSimpler, faster, cheaper Enterprise Apps using only Spring Boot on GCP
Simpler, faster, cheaper Enterprise Apps using only Spring Boot on GCP
 
MongoDB: Comparing WiredTiger In-Memory Engine to Redis
MongoDB: Comparing WiredTiger In-Memory Engine to RedisMongoDB: Comparing WiredTiger In-Memory Engine to Redis
MongoDB: Comparing WiredTiger In-Memory Engine to Redis
 
Getting started with MongoDB and Scala - Open Source Bridge 2012
Getting started with MongoDB and Scala - Open Source Bridge 2012Getting started with MongoDB and Scala - Open Source Bridge 2012
Getting started with MongoDB and Scala - Open Source Bridge 2012
 

Más de Codemotion

Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...
Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...
Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...Codemotion
 
Pompili - From hero to_zero: The FatalNoise neverending story
Pompili - From hero to_zero: The FatalNoise neverending storyPompili - From hero to_zero: The FatalNoise neverending story
Pompili - From hero to_zero: The FatalNoise neverending storyCodemotion
 
Pastore - Commodore 65 - La storia
Pastore - Commodore 65 - La storiaPastore - Commodore 65 - La storia
Pastore - Commodore 65 - La storiaCodemotion
 
Pennisi - Essere Richard Altwasser
Pennisi - Essere Richard AltwasserPennisi - Essere Richard Altwasser
Pennisi - Essere Richard AltwasserCodemotion
 
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...Codemotion
 
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019Codemotion
 
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019Codemotion
 
Francesco Baldassarri - Deliver Data at Scale - Codemotion Amsterdam 2019 -
Francesco Baldassarri  - Deliver Data at Scale - Codemotion Amsterdam 2019 - Francesco Baldassarri  - Deliver Data at Scale - Codemotion Amsterdam 2019 -
Francesco Baldassarri - Deliver Data at Scale - Codemotion Amsterdam 2019 - Codemotion
 
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...Codemotion
 
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...Codemotion
 
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...Codemotion
 
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...Codemotion
 
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019Codemotion
 
Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019
Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019
Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019Codemotion
 
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019Codemotion
 
James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...
James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...
James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...Codemotion
 
Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...
Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...
Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...Codemotion
 
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019Codemotion
 
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019Codemotion
 
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019Codemotion
 

Más de Codemotion (20)

Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...
Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...
Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...
 
Pompili - From hero to_zero: The FatalNoise neverending story
Pompili - From hero to_zero: The FatalNoise neverending storyPompili - From hero to_zero: The FatalNoise neverending story
Pompili - From hero to_zero: The FatalNoise neverending story
 
Pastore - Commodore 65 - La storia
Pastore - Commodore 65 - La storiaPastore - Commodore 65 - La storia
Pastore - Commodore 65 - La storia
 
Pennisi - Essere Richard Altwasser
Pennisi - Essere Richard AltwasserPennisi - Essere Richard Altwasser
Pennisi - Essere Richard Altwasser
 
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...
 
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019
 
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019
 
Francesco Baldassarri - Deliver Data at Scale - Codemotion Amsterdam 2019 -
Francesco Baldassarri  - Deliver Data at Scale - Codemotion Amsterdam 2019 - Francesco Baldassarri  - Deliver Data at Scale - Codemotion Amsterdam 2019 -
Francesco Baldassarri - Deliver Data at Scale - Codemotion Amsterdam 2019 -
 
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...
 
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...
 
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...
 
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...
 
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019
 
Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019
Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019
Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019
 
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
 
James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...
James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...
James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...
 
Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...
Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...
Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...
 
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019
 
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019
 
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019
 

Último

How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Kaya Weers
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observabilityitnewsafrica
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024TopCSSGallery
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Nikki Chapple
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesBernd Ruecker
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 

Último (20)

How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)
 
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security ObservabilityGlenn Lazarus- Why Your Observability Strategy Needs Security Observability
Glenn Lazarus- Why Your Observability Strategy Needs Security Observability
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 

Past, Present and Future of Data Processing in Apache Hadoop

  • 1. Codemotion Milano 2013 Data Processing and Aggregation Massimo Brignoli Solutions Architect, MongoDB Inc. Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
  • 2. Who Am I? • Solutions Architect/Evangelist in MongoDB Inc. • 20 years of experience in databases • Former MySQL employee • Previous life: web, web, web Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
  • 3. Big Data Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
  • 4. What is Big Data? • Big Data is like teenage sex: • everyone talks about it • nobody really knows how to do it • everyone thinks everyone else is doing it • so everyone claims they are doing it… Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
  • 5. Understanding Big Data – It’s Not Very “Big” 64% - Ingest diverse, new data in real-time 15% - More than 100TB of data 20% - Less than 100TB (average of all? <20TB) from Big Data Executive Summary – 50+ top executives from Government and F500 firms
  • 6. For over a decade Big Data == Custom Software Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
  • 7. Lots of Great Innovations Since 1970
  • 9. RDBMS Makes Development Hard Code XML Config DB Schema Application Object Relational Mapping Relational Database
  • 10. And Even Harder To Iterate New Table New Column New Table Name Pet Phone New Column 3 months later… Email
  • 11. From Complexity to Simplicity MongoDB RDBMS { _id : ObjectId("4c4ba5e5e8aabf3"), employee_name: "Dunham, Justin", department : "Marketing", title : "Product Manager, Web", report_up: "Neray, Graham", pay_band: “C", benefits : [ { type : "Health", plan : "PPO Plus" }, { type : "Dental", plan : "Standard" } ] }
  • 12. In the past few years Open source software has emerged enabling the rest of us to handle Big Data Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
  • 13. Use Popular, Well-Known Technologies Source: Silicon Angle, 2012
  • 14. Enterprise Big Data Stack CRM, ERP, Collaboration, Mobile, BI Data Management Online Data RDBMS RDBMS Offline Data Hadoop Infrastructure OS & Virtualization, Compute, Storage, Network EDW Security & Auditing Management & Monitoring Applications
  • 15. Consideration – Online vs. Offline Online • Real-time • Low-latency • High availability vs. Offline • Long-running • High-Latency • Availability is lower priority
  • 16. How MongoDB Meets Our Requirements • MongoDB is an operational database • MongoDB provides high performance for storage and retrieval at large scale • MongoDB has a robust query interface permitting intelligent operations • MongoDB is not a data processing engine, but provides processing functionality Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
  • 17. MongoDB data processing options http://www.flickr.com/photos/torek/4444673930/ http://createivecommons.org/licenses/by-nc-sa/3.0/ Except where otherwise noted, this work is licensed under
  • 18. Getting Example Data Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
  • 19. The “hello world” of MapReduce is counting words in a paragraph of text. Let’s try something a little more interesting… Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
  • 20. What is the most popular pub name? Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
  • 21. Open Street Map Data #!/usr/bin/env python # Data Source # http://www.overpass-api.de/api/xapi?*[amenity=pub][bbox=-10.5,49.78,1.78,59] import re import sys from imposm.parser import OSMParser import pymongo class Handler(object): def nodes(self, nodes): if not nodes: return docs = [] for node in nodes: osm_id, doc, (lon, lat) = node if "name" not in doc: node_points[osm_id] = (lon, lat) continue doc["name"] = doc["name"].title().lstrip("The ").replace("And", "&") doc["_id"] = osm_id doc["location"] = {"type": "Point", "coordinates": [lon, lat]} docs.append(doc) collection.insert(docs) Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
  • 22. Example Pub Data { "_id" : 451152, "amenity" : "pub", "name" : "The Dignity", "addr:housenumber" : "363", "addr:street" : "Regents Park Road", "addr:city" : "London", "addr:postcode" : "N3 1DH", "toilets" : "yes", "toilets:access" : "customers", "location" : { "type" : "Point", "coordinates" : [-0.1945732, 51.6008172] } } Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
  • 23. MapReduce Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
  • 24. MongoDB MapReduce • map MongoDB reduce finalize Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
  • 25. map Map Function MongoDB reduce > var map = function() { finalize emit(this.name, 1); Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
  • 26. map Reduce Function MongoDB reduce > var reduce = function (key, values) { finalize var sum = 0; values.forEach( function (val) {sum += val;} ); return sum; } Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
  • 27. Results > db.pub_names.find().sort({value: -1}).limit(10) { "_id" : "The Red Lion", "value" : 407 } { "_id" : "The Royal Oak", "value" : 328 } { "_id" : "The Crown", "value" : 242 } { "_id" : "The White Hart", "value" : 214 } { "_id" : "The White Horse", "value" : 200 } { "_id" : "The New Inn", "value" : 187 } { "_id" : "The Plough", "value" : 185 } { "_id" : "The Rose & Crown", "value" : 164 } { "_id" : "The Wheatsheaf", "value" : 147 } { "_id" : "The Swan", "value" : 140 } Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
  • 28. Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
  • 29. Pub Names in the Center of London > db.pubs.mapReduce(map, reduce, { out: "pub_names", query: { location: { $within: { $centerSphere: [[-0.12, 51.516], 2 / 3959] } }} }) { "result" : "pub_names", "timeMillis" : 116, "counts" : { "input" : 643, "emit" : 643, "reduce" : 54, "output" : 537 }, "ok" : 1, } Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
  • 30. Results > db.pub_names.find().sort({value: -1}).limit(10) { { { { { { { { { { "_id" "_id" "_id" "_id" "_id" "_id" "_id" "_id" "_id" "_id" : : : : : : : : : : "All Bar One", "value" : 11 } "The Slug & Lettuce", "value" : 7 } "The Coach & Horses", "value" : 6 } "The Green Man", "value" : 5 } "The Kings Arms", "value" : 5 } "The Red Lion", "value" : 5 } "Corney & Barrow", "value" : 4 } "O'Neills", "value" : 4 } "Pitcher & Piano", "value" : 4 } "The Crown", "value" : 4 } Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
  • 31. MongoDB MapReduce • Real-time • Output directly to document or collection • Runs inside MongoDB on local data − Adds load to your DB − In Javascript – debugging can be a challenge − Translating in and out of C++ Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
  • 32. Aggregation Framework Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
  • 33. Aggregation Framework • op1 MongoDB op2 opN Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
  • 34. Aggregation Framework in 60 Seconds Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
  • 35. Aggregation Framework Operators • $project • $match • $limit • $skip • $sort • $unwind • $group Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
  • 36. $match • Filter documents • Uses existing query syntax • If using $geoNear it has to be first in pipeline • $where is not supported Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
  • 37. Matching Field Values { "_id" : 271421, "amenity" : "pub", "name" : "Sir Walter Tyrrell", "location" : { "type" : "Point", "coordinates" : [ -1.6192422, 50.9131996 ] } } { "$match": { "name": "The Red Lion" }} { "_id" : 271466, "amenity" : "pub", "name" : "The Red Lion", "location" : { "type" : "Point", "coordinates" : [ -1.5494749, 50.7837119 ]} { "_id" : 271466, "amenity" : "pub", "name" : "The Red Lion", "location" : { "type" : "Point", "coordinates" : [ -1.5494749, 50.7837119 ] } } Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
  • 38. $project • Reshape documents • Include, exclude or rename fields • Inject computed fields • Create sub-document fields Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
  • 39. Including and Excluding Fields { “$project”: { { "_id" : 271466, "name" : "The Red Lion", “_id”: 0, “amenity”: 1, “name”: 1, "location" : { }} "amenity" : "pub", "type" : "Point", "coordinates" : [ -1.5494749, 50.7837119 ] } { “amenity” : “pub”, “name” : “The Red Lion” } } Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
  • 40. Reformatting Documents { “$project”: { { "_id" : 271466, "amenity" : "pub", "name" : "The Red Lion", "location" : { “_id”: 0, “name”: 1, “meta”: { “type”: “$amenity”} }} "type" : "Point", "coordinates" : [ -1.5494749, 50.7837119 ] } } { “name” : “The Red Lion” “meta” : { “type” : “pub” }} Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
  • 41. Dealing with Arrays { “$project”: { { "_id" : 271466, "amenity" : "pub", "name" : "The Red Lion", "facilities" : [ "toilets", “_id”: 0, “name”: 1, “meta”: { “type”: “$amenity”} }} {"$unwind": "$facility"} "food" ], } { "name" : "The Red Lion", "facility" : "toilets" }, { "name" : "The Red Lion", "facility" : "food" } Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
  • 42. $group • Group documents by an ID • Field reference, object, constant • Other output fields are computed $max, $min, $avg, $sum $addToSet, $push $first, $last • Processes all data in memory Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
  • 43. Back to the pub! • http://www.offwestend.com/index.php/theatres/pastshows/71 Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
  • 44. Popular Pub Names >var popular_pub_names = [ { $match : location: { $within: { $centerSphere: [[-0.12, 51.516], 2 / 3959]}}} }, { $group : { _id: “$name” value: {$sum: 1} } }, { $sort : {value: -1} }, { $limit : 10 } Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
  • 45. Results > db.pubs.aggregate(popular_pub_names) { "result" : [ { "_id" : "All Bar One", "value" : 11 } { "_id" : "The Slug & Lettuce", "value" : 7 } { "_id" : "The Coach & Horses", "value" : 6 } { "_id" : "The Green Man", "value" : 5 } { "_id" : "The Kings Arms", "value" : 5 } { "_id" : "The Red Lion", "value" : 5 } { "_id" : "Corney & Barrow", "value" : 4 } { "_id" : "O'Neills", "value" : 4 } { "_id" : "Pitcher & Piano", "value" : 4 } { "_id" : "The Crown", "value" : 4 } ], "ok" : 1 } Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
  • 46. Aggregation Framework Benefits • Real-time • Simple yet powerful interface • Declared in JSON, executes in C++ • Runs inside MongoDB on local data − Adds load to your DB − Limited Operators − Data output is limited Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
  • 47. Analyzing MongoDB Data in External Systems Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
  • 48. MongoDB with Hadoop • MongoDB Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
  • 49. MongoDB with Hadoop • MongoDB Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/ warehouse
  • 50. MongoDB with Hadoop • ETL Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/ MongoDB
  • 51. Map Pub Names in Python #!/usr/bin/env python from pymongo_hadoop import BSONMapper def mapper(documents): bounds = get_bounds() # ~2 mile polygon for doc in documents: geo = get_geo(doc["location"]) # Convert the geo type if not geo: continue if bounds.intersects(geo): yield {'_id': doc['name'], 'count': 1} BSONMapper(mapper) print >> sys.stderr, "Done Mapping." Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
  • 52. Reduce Pub Names in Python #!/usr/bin/env python from pymongo_hadoop import BSONReducer def reducer(key, values): _count = 0 for v in values: _count += v['count'] return {'_id': key, 'value': _count} BSONReducer(reducer) Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
  • 53. Execute MapReduce hadoop jar target/mongo-hadoop-streaming-assembly-1.1.0-rc0.jar -mapper examples/pub/map.py -reducer examples/pub/reduce.py -mongo mongodb://127.0.0.1/demo.pubs -outputURI mongodb://127.0.0.1/demo.pub_names Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
  • 54. Popular Pub Names Nearby > db.pub_names.find().sort({value: -1}).limit(10) { { { { { { { { { { "_id" "_id" "_id" "_id" "_id" "_id" "_id" "_id" "_id" "_id" : : : : : : : : : : "All Bar One", "value" : 11 } "The Slug & Lettuce", "value" : 7 } "The Coach & Horses", "value" : 6 } "The Kings Arms", "value" : 5 } "Corney & Barrow", "value" : 4 } "O'Neills", "value" : 4 } "Pitcher & Piano", "value" : 4 } "The Crown", "value" : 4 } "The George", "value" : 4 } "The Green Man", "value" : 4 } Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
  • 55. MongoDB and Hadoop • Away from data store • Can leverage existing data processing infrastructure • Can horizontally scale your data processing - Offline batch processing - Requires synchronisation between store & processor - Infrastructure is much more complex Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
  • 56. The Future of Big Data and MongoDB Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
  • 57. What is Big Data? Big Data today will be normal tomorrow Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
  • 58. Exponential Data Growth Billions of URLs indexed by Google 10000 9000 8000 7000 6000 5000 4000 3000 2000 1000 0 2000 2002 2004 2006 2008 Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/ 2010 2012
  • 59. MongoDB enables you to scale big Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
  • 60. MongoDB is evolving so you can process the big Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
  • 61. Data Processing with MongoDB • Process in MongoDB using Map/Reduce • Process in MongoDB using Aggregation Framework • Process outside MongoDB using Hadoop and other external tools Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
  • 62. Questions? Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/
  • 63. Codemotion Milano Thanks! Massimo Brignoli Solutions Architect, MongoDB Inc. Except where otherwise noted, this work is licensed under http://createivecommons.org/licenses/by-nc-sa/3.0/

Notas del editor

  1. IBM designed IMS with Rockwell and Caterpillar starting in 1966 for the Apollo program. IMS&apos;s challenge was to inventory the very large bill of materials (BOM) for the Saturn V moon rocket and Apollo space vehicle.
  2. This is helpful because as much as 95% of enterprise information is unstructured, and doesn’t fit neatly into tidy rows and columns. NoSQL and Hadoop allow for dynamic schema.
  3. The industry is talking about Hadoop and MongoDB for Big Data. So should you
  4. This is where MongoDB fits into the existing enterprise IT stackMongoDB is an operational data store used for online data, in the same way that Oracle is an operational data store. It supports applications that ingest, store, manage and even analyze data in real-time. (Compared to Hadoop and data warehouses, which are used for offline, batch analytical workloads.)
  5. Another common use case we see is warehousing of data -* again the connector allows you to utilize existing libraries via hadoopUS
  6. The third most common usecase is an ETL - extract transform load - function.Then putting the aggregated data into mongodb for further analysis.