Join us for a webinar on how MongoDB and Hadoop can work together to solve Big Data problems in today's enterprises. We will take an in-depth look at how the two technologies make real business intelligence accessible to end users. After a brief introduction to both technologies, this webinar will dive deep into the MongoDB+Hadoop Connector and how it is applied to enable new business insights.
In this webinar you will learn:
• What information problems are a good fit for MongoDB and Hadoop
• How to integrate the two technologies using the MongoDB+Hadoop Connector
• Programming paradigms for tackling common problems
2. What is MongoDB?
The leading NoSQL database
• General Purpose
• Document Database
• Open Source
3. MongoDB Document Model
RDBMS
MongoDB
{
  _id : ObjectId("4c4ba5e5e8aabf3"),
  employee_name : "Dunham, Justin",
  department : "Marketing",
  title : "Product Manager, Web",
  report_up : "Neray, Graham",
  pay_band : "C",
  benefits : [
    { type : "Health", plan : "PPO Plus" },
    { type : "Dental", plan : "Standard" }
  ]
}
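To make the document model concrete, the employee record above can be sketched as a plain Python dictionary. This is only an illustration using the standard library, not MongoDB driver code: the `_id` is kept as a plain string because `ObjectId` belongs to the driver, and `plans_by_type` is a hypothetical helper name.

```python
# A plain-Python stand-in for the employee document shown above.
# Assumption: "_id" is kept as a string; a real driver would use ObjectId.
employee = {
    "_id": "4c4ba5e5e8aabf3",
    "employee_name": "Dunham, Justin",
    "department": "Marketing",
    "title": "Product Manager, Web",
    "report_up": "Neray, Graham",
    "pay_band": "C",
    "benefits": [
        {"type": "Health", "plan": "PPO Plus"},
        {"type": "Dental", "plan": "Standard"},
    ],
}


def plans_by_type(doc):
    """Map each benefit type to its plan (hypothetical helper).

    The benefits are embedded in the employee document, so reading them
    needs no join against a separate benefits table.
    """
    return {benefit["type"]: benefit["plan"] for benefit in doc.get("benefits", [])}


if __name__ == "__main__":
    print(plans_by_type(employee))
```

In a relational schema the same lookup would typically involve an employee table and a benefits table joined on a key; here the related data travels with the record.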
4. What is Hadoop?
“The Apache Hadoop software library is a framework that
allows for the distributed processing of large data sets
across clusters of computers using simple programming
models.”*
• Large datasets
• Analytics
• Batch
• Map-Reduce
*source: hadoop.apache.org
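The "simple programming model" named on this slide is Map-Reduce. The sketch below imitates its three stages (map, shuffle, reduce) in a single Python process so the data flow is easy to follow. It is not Hadoop code and does nothing distributed; the function names and the department-count job are hypothetical illustrations.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every record, yielding (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Apply the reducer to each key and its list of values."""
    return {key: reducer(key, values) for key, values in groups.items()}


# Hypothetical job: count employees per department.
def department_mapper(record):
    yield record["department"], 1


def count_reducer(_key, values):
    return sum(values)


def run_job(records):
    """Run map, shuffle, and reduce in sequence on an in-memory list."""
    pairs = map_phase(records, department_mapper)
    return reduce_phase(shuffle(pairs), count_reducer)


if __name__ == "__main__":
    # Made-up sample records, used only to exercise the sketch.
    sample = [
        {"department": "Marketing"},
        {"department": "Marketing"},
        {"department": "Sales"},
    ]
    print(run_job(sample))
```

In Hadoop the mapper and reducer run on many machines and the shuffle moves data across the cluster; the shape of the two user-supplied functions is the part this sketch is meant to convey.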
5. Enterprise IT Stack
Applications: CRM, ERP, Collaboration, Mobile, BI
Data Management
Online Data
Offline Data
RDBMS
RDBMS
Hadoop
EDW
Infrastructure: OS & Virtualization, Compute, Storage, Network
Security & Auditing
Management & Monitoring
6. Consideration: Online vs. Offline
Online
• Real-time
• Low latency
• High availability
vs.
Offline
• Long-running
• High latency
• Availability is lower priority
8. Hadoop is good for…
• Risk Modeling
• Recommendation Engine
• Ad Targeting
• Transaction Analysis
• Trade Surveillance
• Network Failure Prediction
• Churn Analysis
• Search Quality
• Data Lake
9. MongoDB is good for…
• 360-Degree View of the Customer
• Fraud Detection
• User Data Management
• Content Management & Delivery
• Reference Data
• Product Catalogs
• Mobile & Social Apps
• Machine to Machine Apps
• Data Hub
10. MongoDB and Hadoop: Complementary
MongoDB
• Real-time systems
• Light-weight analytical workloads
Hadoop
• “Data Lake”
• In-depth analytics
11. Use MongoDB+Hadoop Together
ECommerce
• Products & Inventory
• Real-time recommendations
• Customer profile
• Session management
• Customer clickstream
• Fraud detection
MongoDB Connector for Hadoop
Analysis
• Transaction history
• Clickstream history
• Recommendation model
• Fraud modeling
12. Example – Fraud Detection
Payments
• Online payments processing
MongoDB Connector for Hadoop
Nightly Analysis
• Fraud modeling
Other diagram elements: Fraud Detection, Results Cache, 3rd Party Data Sources (two connections are labeled "query only")
13. Customer example – Global Travel Firm
Travel
• Flights, hotels and cars
• Real-time offers
• User profiles, reviews
• User metadata (previous purchases, clicks, views)
MongoDB Connector for Hadoop
Algorithms
• User segmentation
• Offer recommendation engine
• Ad serving engine
• Bundling engine
14. Customer example – MetLife
Insurance
• Insurance policies
• Demographic data
• Customer web data
• Call center data
• Real-time churn detection
MongoDB Connector for Hadoop
Churn Analysis
• Customer action analysis
• Churn prediction algorithms
15. Customer example – Criteo
Ad-Serving
• Catalogs and products
• User profiles
• Clicks
• Views
• Transactions
MongoDB Connector for Hadoop
Algorithms
• User segmentation
• Recommendation engine
• Prediction engine
16. What is the MongoDB-Hadoop Connector?
• Java Map-Reduce, Streaming Map-Reduce, Pig, and Hive access to MongoDB
– MongoDB as input
• mongo.job.input.format=com.mongodb.hadoop.MongoInputFormat
• mongo.input.uri=mongodb://my-db:27017/db1.collection1
– MongoDB as output
• mongo.job.output.format=com.mongodb.hadoop.MongoOutputFormat
• mongo.output.uri=mongodb://my-db:27017/db1.collection2
– Using MongoDB backup files
• mongo.job.output.format=com.mongodb.hadoop.BSONFileOutputFormat
• mapred.output.dir=file:///results.bson
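As a rough illustration of how the input and output settings on this slide fit together, the sketch below collects them into one dictionary and renders them as Hadoop generic `-D key=value` options. The property names come from the slide; the class names are written with the `com.mongodb.hadoop` package and the output URI uses `mongo.output.uri`, which is my understanding of the connector's naming and should be checked against its documentation. The function names `connector_properties` and `to_generic_options` are hypothetical, and whether a given job honors `-D` options depends on how that job parses its arguments.

```python
def connector_properties(input_uri, output_uri):
    """Collect connector settings for a MongoDB-in, MongoDB-out job.

    Assumption: class names use the com.mongodb.hadoop package and the
    output collection is set through mongo.output.uri.
    """
    return {
        "mongo.job.input.format": "com.mongodb.hadoop.MongoInputFormat",
        "mongo.input.uri": input_uri,
        "mongo.job.output.format": "com.mongodb.hadoop.MongoOutputFormat",
        "mongo.output.uri": output_uri,
    }


def to_generic_options(props):
    """Render properties as a flat list of -D key=value arguments."""
    args = []
    for key, value in props.items():
        args.extend(["-D", f"{key}={value}"])
    return args


if __name__ == "__main__":
    props = connector_properties(
        "mongodb://my-db:27017/db1.collection1",
        "mongodb://my-db:27017/db1.collection2",
    )
    print(" ".join(to_generic_options(props)))
```

Keeping the settings in one place makes it harder to point the output format at an input URI by mistake, since each format sits next to the URI it uses.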
17. Enhancing MongoDB-Hadoop Connector
• Version 1.1.0, July 2013
– Pig support
– Hive support
– Streaming support
– Read/Write MongoDB backups
– Update writes
– Custom splitting support
– Much more…
• Version 1.2.0, December 2013
– Apache Hadoop 2.2 support
– Multiple collections as M-R source
– Multiple mongos support
– Performance improvements
19. Resources
Resource and location:
• White paper: Big Data: Examples and Guidelines for the Enterprise Decision Maker
  http://www.mongodb.com/lp/whitepaper/big-data-nosql
• Recorded Webinar Series: Thrive with Big Data
  http://www.mongodb.com/lp/bigdata-series
• Recorded Webinar: What’s New with MongoDB Hadoop Integration
  http://www.mongodb.com/presentations/webinar-whats-newmongodb-hadoop-integration
• Documentation: MongoDB Connector for Hadoop
  http://docs.mongodb.org/ecosystem/tools/hadoop/
• Trouble Tickets
  http://jira.mongodb.org (project = Hadoop Integration)
• Subscriptions, support, consulting, training
  https://www.mongodb.com/products/how-to-buy
Editor's notes
This is where MongoDB fits into the existing enterprise IT stack. MongoDB is an operational data store used for online data, in the same way that Oracle is an operational data store. It supports applications that ingest, store, manage, and even analyze data in real time. By contrast, Hadoop and data warehouses are used for offline, batch analytical workloads.