Unlocking Operational Intelligence
from the Data Lake
Mat Keep
Director, Product & Market Analysis
mat.keep@mongodb.com
@matkeep
2
The World is Changing
Digital Natives & Digital Transformation
• Volume, Velocity, Variety
• Iterative, Agile, Short Cycles
• Always On, Secure, Global
• Open-Source, Cloud, Commodity
• Data vs. Time; Risk vs. Cost
3
Creating the “Insight Economy”
4
Data Warehouse Challenges
5
The Rise of the Data Lake
6
“Big Data” is More than Just Hadoop
• 24% CAGR: Hadoop, Spark & Streaming
• 18% CAGR: Databases
• Databases are key components within the big data landscape
7
Apache Hadoop Data Lake
• Risk modeling
• Retrospective & predictive analytics
• Machine learning & pattern matching
• Customer segmentation & churn analysis
• ETL pipelines
• Active archives
NoSQL Database
8
http://www.infoworld.com/article/2980316/big-data/why-your-big-data-strategy-is-a-bust.html
“Thru 2018, 70 percent of Hadoop
deployments will not meet cost savings
and revenue generation objectives due to
skills and integration challenges.”
Nick Heudecker, Research Director, Data Management & Integration
9
How to Avoid Being in the 70%?
1. Unify data lake analytics with the operational applications
2. Create smart, contextually aware, data-driven apps & insights
3. Integrate a database layer with the data lake
10
Why a Database + Hadoop?

Hadoop: Distributed Processing & Analytics
• Data stored as large files (64MB-128MB blocks). No indexes
• Write-once-read-many, append-only storage
• Designed for high-throughput scans across TB/PB of data
• Multi-minute latency

Operational Database (MongoDB)
• Random access to subsets of data
• Millisecond latency
• Expressive querying, rich aggregations & flexible indexing
• Updates to fast-changing data, avoiding re-writes / re-computes of the entire data set

MongoDB & Hadoop: What’s Common
• Schema-on-read
• Multiple replicas
• Horizontal scale
• High throughput
• Low TCO
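To make the operational side concrete, here is a minimal PyMongo sketch of the access pattern the database layer provides: indexed random access to a subset of documents and in-place updates, without rewriting the data set. The collection and field names (`customer_profiles`, `segment`, `last_seen`, `visits`) are hypothetical.

```python
# Minimal PyMongo sketch (hypothetical collection and field names).
from pymongo import MongoClient, ASCENDING

client = MongoClient("mongodb://localhost:27017")
profiles = client["operational"]["customer_profiles"]

# A secondary index supports millisecond random access to a subset of the data.
profiles.create_index([("segment", ASCENDING), ("last_seen", ASCENDING)])

# Expressive query against just the documents of interest.
recent_high_value = profiles.find(
    {"segment": "high_value", "last_seen": {"$gte": "2016-09-01"}}
).limit(10)

# Update fast-changing data in place; no need to re-write the whole data set.
profiles.update_one(
    {"_id": "customer-42"},
    {"$inc": {"visits": 1}, "$set": {"last_seen": "2016-09-30"}},
    upsert=True,
)
```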
13
Bringing it Together
Online services powered by MongoDB:
• User account & personalization
• Product catalog
• Session management & shopping cart
• Recommendations
Back-end machine learning powered by Hadoop:
• Customer classification & clustering
• Basket analysis
• Brand sentiment
• Price optimization
The two tiers exchange data via the MongoDB Connector for Hadoop.
Design Pattern: Operationalized Data Lake

Data sources (sensors, user data, clickstreams, logs) and operational applications (customer data management, mobile app, IoT app, live dashboards) publish raw data onto a message queue, which lands it in MongoDB and/or HDFS. Distributed processing frameworks turn that raw data into processed events and analytics models (churn analysis, enriched customer profiles, risk modeling, predictive analytics).

• Real-Time Access (MongoDB): millisecond latency; expressive querying & flexible indexing against subsets of data; updates-in-place; in-database aggregations & transformations.
• Batch Processing, Batch Views (HDFS): multi-minute latency, with scans across TB/PB of data; no indexes; data stored in 128MB blocks; write-once-read-many, append-only storage model.

The pattern builds up in four steps:
1. Configure where to land incoming data.
2. Process raw data to generate analytics models.
3. MongoDB exposes the analytics models to operational apps and handles real-time updates.
4. Compute new models against MongoDB & HDFS.
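As an illustration of step 1 (landing incoming data), the sketch below shows one possible routing consumer, assuming Kafka as the message queue: operational events are upserted into MongoDB for real-time access, while raw events are assumed to be landed in HDFS by a separate sink (for example a Kafka-to-HDFS sink connector). The topic, database, collection and field names are hypothetical.

```python
# Hypothetical routing sketch: Kafka events -> MongoDB for real-time access.
from kafka import KafkaConsumer          # kafka-python
from pymongo import MongoClient
import json

consumer = KafkaConsumer(
    "app-events",                        # hypothetical topic name
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

mongo = MongoClient("mongodb://localhost:27017")
profiles = mongo["operational"]["customer_profiles"]   # hypothetical db/collection

for msg in consumer:
    event = msg.value
    # Upsert the event into the customer's profile so operational apps see it
    # with millisecond latency; the raw event is separately appended to HDFS
    # (e.g. by a Kafka HDFS sink) for batch processing.
    profiles.update_one(
        {"_id": event["customer_id"]},
        {"$push": {"recent_events": event}, "$set": {"last_seen": event["ts"]}},
        upsert=True,
    )
```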
19
Operational Database Requirements
1 “Smart” integration with the data lake
2 Powerful real-time analytics
3 Flexible, governed data model (see the sketch after this list)
4 Scale with the data lake
5 Sophisticated management & security
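For requirement 3, a flexible yet governed data model, one option is MongoDB's document validation. The sketch below is a minimal, hypothetical PyMongo example: the collection accepts varied document shapes but rejects writes that lack the fields downstream apps depend on.

```python
# Hypothetical document-validation sketch with PyMongo.
from pymongo import MongoClient
from pymongo.errors import WriteError

db = MongoClient("mongodb://localhost:27017")["operational"]

# Validation rules are ordinary query expressions: documents must carry a
# customer_id and a created_at timestamp, but are otherwise free-form.
db.create_collection(
    "customer_profiles",
    validator={"customer_id": {"$exists": True}, "created_at": {"$type": "date"}},
)

try:
    db.customer_profiles.insert_one({"name": "no id or timestamp"})
except WriteError as exc:
    print("rejected by validator:", exc)
```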
Operational Data Lake in Action
21
UK’s Leading Price Comparison Site
Out-pacing internet search giants with a continuous delivery pipeline powered by microservices & Docker, running MongoDB, Kafka and Hadoop in the cloud.

Problem
• Existing EDW with nightly batch loads
• No real-time analytics to personalize the user experience
• Application changes broke the ETL pipeline
• Unable to scale as services expanded

Solution
• Microservices architecture running on AWS
• All application events written to a Kafka queue, routed to MongoDB and Hadoop
• Events that personalize the real-time experience (e.g. triggering an email send, additional questions, offers) written to MongoDB
• All event data aggregated with other data sources and analyzed in Hadoop; updated customer profiles written back to MongoDB

Results
• 2x faster delivery of new services after migrating to the new architecture
• Enabled continuous delivery: pushing new features every day
• Personalized user experience, plus higher uptime and scalability
22
Leading Global Airline
Customer Data Management: single view and real-time analytics with MongoDB, Spark & Hadoop.

Problem
• Customer data scattered across 100+ different systems
• Poor customer experience: no personalization, no consistent experience across brands or devices
• No way to analyze customer behavior to deliver targeted offers

Solution
• Selected MongoDB over HBase for schema flexibility and rich query support
• MongoDB stores all customer profiles, served to web, mobile & call-center apps
• Distributed across multiple regions for DR and data locality
• All customer interactions stored in MongoDB, loaded into Hadoop for customer segmentation
• Unified processing pipeline with Spark running across MongoDB and Hadoop

Results
• Single profile created for each customer, personalizing the experience in real time
• Revenue optimization by calculating the best ticket prices
• Reduced competitive pressure by identifying gaps in product offerings
23
World’s Most Sophisticated Traveler Safety Platform
Analyzing PBs of data with MongoDB, Hadoop, Apache NiFi & SAP HANA.

Problem
• Commercialize a national security platform
• Massive volumes of multi-structured data: news, RSS & social feeds, geospatial, geological, health & crime stats
• Requires complex analysis, delivered in real time, always on

Solution
• Apache NiFi for data ingestion, routing & metadata management
• Hadoop for text analytics
• SAP HANA for geospatial analytics
• MongoDB correlates analytics with user profiles & location data to deliver real-time alerts to corporate security teams & individual travelers

Results
• Enables Prescient to uniquely blend big data technology with the security IP it developed in government
• Dynamic data model supports indexing 38k data sources, growing at 200 per day
• 24x7 continuous availability
• Scalability to PBs of data
24
Powering Global Threat Intelligence
Cloud-based real-time analytics with MongoDB & Hadoop.

Problem
• Requirement to analyze data over many different dimensions to detect real-time threat profiles
• HBase unable to query data beyond primary-key lookups
• Lucene search unable to scale with growth in data

Solution
• MongoDB + Hadoop to collect and analyze data from internet sensors in real time
• MongoDB’s dynamic schema enables sensor data to be enriched with geospatial tags
• Auto-sharding to scale as data volumes grow

Results
• Runs complex, real-time analytics on live data
• Improved query performance by over 3x
• Scales to support a doubling of data volume every 24 months
• Deployed across global data centers for a low-latency user experience
• Engineering teams have more time to develop new features
Wrapping Up
Conclusion
1. Data lakes enable enterprises to affordably capture & analyze more data
2. Operational and analytical workloads are converging
3. MongoDB is the key technology to operationalize the data lake
27
MongoDB Enterprise Advanced
• MongoDB Enterprise Server
• MongoDB Ops Manager: Monitoring & Alerting, Query Optimization, Backup & Recovery, Automation & Configuration, REST API
• MongoDB Compass: Schema Visualization, Data Exploration, Ad-Hoc Queries
• MongoDB Connector for BI: Visualization, Analysis, Reporting
• Security: Authentication, Authorization, Auditing, Encryption (In Flight & At Rest)
• Support & commercial terms: 24x7 Support (1-hour SLA), Emergency Patches, Customer Success Program, On-Demand Online Training, Commercial License (no AGPL copyleft restrictions), Platform Certifications, Warranty, Limitation of Liability, Indemnification
28
Resources to Learn More
• Guide: Operational Data Lake
• Whitepaper: Real-Time Analytics with Apache Spark & MongoDB
30
For More Information
Resource | Location
Case Studies | mongodb.com/customers
Presentations | mongodb.com/presentations
Free Online Training | education.mongodb.com
Webinars and Events | mongodb.com/events
Documentation | docs.mongodb.org
MongoDB Downloads | mongodb.com/download
Additional Info | info@mongodb.com
31
One of the World’s Largest Banks
Creating new customer insights with MongoDB & Spark.

Problem
• System failures in online banking systems were creating customer satisfaction issues
• No personalized experience across channels
• No enrichment of user data with social media chatter

Solution
• Apache Flume to ingest log data & social media streams; Apache Spark to process log events
• MongoDB to persist log data and KPIs, and to immediately rebuild user sessions when a service fails
• Integration with the MongoDB query language and secondary indexes to selectively filter and query data in real time

Results
• Improved user experience, with more customers using online, self-service channels
• Improved services following a deeper understanding of how users interact with systems
• Greater user insight by adding social media insights
32
Fare Calculation Engine
One of the world’s largest airlines migrates from Oracle to MongoDB and Apache Spark to support a 100x performance improvement.

Problem
• China Eastern targeting 130,000 seats sold every day across its web and mobile channels
• New fare calculation engine needed to support 20,000 search queries per second, but the current Oracle platform supported only 200 per second

Solution
• Apache Spark used for fare calculations, using business rules stored in MongoDB
• Fare calculations written to MongoDB for access by the search application
• MongoDB Connector for Apache Spark allows seamless integration, with data locality awareness across the cluster

Results
• Cluster of fewer than 20 API, Spark & MongoDB nodes supports 180m fare calculations & 1.6 billion searches per day
• Each node delivers 15x higher performance and 10x lower latency than the existing Oracle servers
• MongoDB Enterprise Advanced provided Ops Manager for operational automation and access to expert technical support
33
MongoDB Connector for Apache Spark
• Native Scala connector, certified by Databricks
• Exposes all Spark APIs & libraries
• Efficient data filtering with predicate pushdown, secondary indexes, & in-database aggregations
• Locality awareness to reduce data movement

“We reduced 100+ lines of integration code to just a single line after moving to the MongoDB Spark connector.”
- Early Access Tester, Multi-National Banking Group
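A minimal PySpark sketch of reading from and writing back to MongoDB through the connector is shown below. It assumes the connector package is available to Spark (e.g. via --packages for a 2.x connector build) and uses hypothetical database, collection and field names.

```python
# Minimal PySpark + MongoDB connector sketch (hypothetical URIs and fields).
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("mongo-spark-sketch")
         .config("spark.mongodb.input.uri",
                 "mongodb://localhost:27017/analytics.fares")        # assumed URI
         .config("spark.mongodb.output.uri",
                 "mongodb://localhost:27017/analytics.fare_results")  # assumed URI
         .getOrCreate())

# Read documents from MongoDB; filters can be pushed down to the database.
df = spark.read.format("com.mongodb.spark.sql.DefaultSource").load()
busy_routes = df.filter(df["searches_per_day"] > 10000)               # hypothetical field

# Write computed results back to MongoDB for the operational search application.
(busy_routes.write
    .format("com.mongodb.spark.sql.DefaultSource")
    .mode("append")
    .save())
```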
34
Evaluating your Options
35
Query and Data Model
Capability | MongoDB | Relational | Column Family (e.g. HBase)
Rich query language & secondary indexes | Yes | Yes | Requires integration with separate Spark/Hadoop cluster
In-database aggregations & search | Yes | Yes | Requires integration with separate Spark/Hadoop cluster
Dynamic schema | Yes | No | Partial
Data validation | Yes | Yes | App-side code
• Why it matters
– Query & Aggregations: Rich, real time analytics against operational data
– Dynamic Schema: Manage multi-structured data
– Data Validation: Enforce data governance between data lake & operational apps
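To illustrate the rich-query and in-database aggregation rows above, here is a small, hypothetical PyMongo aggregation that computes real-time counts directly in the operational database, with no export to a separate cluster; the collection and field names are assumptions carried over from the earlier sketches.

```python
# Hypothetical in-database aggregation with PyMongo: count recent events by type
# for high-value customers, computed where the data lives.
from pymongo import MongoClient

profiles = MongoClient("mongodb://localhost:27017")["operational"]["customer_profiles"]

pipeline = [
    {"$match": {"segment": "high_value"}},          # indexed filter on a subset
    {"$unwind": "$recent_events"},                  # one document per event
    {"$group": {"_id": "$recent_events.type", "count": {"$sum": 1}}},
    {"$sort": {"count": -1}},
]
for row in profiles.aggregate(pipeline):
    print(row)
```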
36
Data Lake Integration
Capability | MongoDB | Relational | Column Family (e.g. HBase)
Hadoop + secondary indexes | Yes | Yes: Expensive | No secondary indexes
Spark + secondary indexes | Yes | Yes: Expensive | No secondary indexes
Native BI connectivity | Yes | Yes | 3rd-party connectors
Workload isolation | Yes | Yes: Expensive | Load data to separate Spark/Hadoop cluster
• Why it matters
– Hadoop + Spark: Efficient data movement between data lake, processing layer & database
– Native BI Connectivity: Visualizing operational data
– Workload isolation: separation between operational and analytical workloads
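One way MongoDB can provide the workload isolation mentioned above is by directing analytical reads to dedicated replica set members. The sketch below is a hypothetical PyMongo example using read preferences with replica set tags; the tag {workload: "analytics"}, the replica set name and the URI are all assumptions.

```python
# Hypothetical workload-isolation sketch: analytics reads are routed to
# secondaries tagged for analytical use, keeping them off the primary that
# serves the operational application.
from pymongo import MongoClient
from pymongo.read_preferences import Secondary

client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")   # assumed URI

# Operational traffic uses the default read preference (primary).
operational = client["operational"]["customer_profiles"]

# Analytical queries read only from members tagged {workload: "analytics"}.
analytics = operational.with_options(
    read_preference=Secondary(tag_sets=[{"workload": "analytics"}])
)

for row in analytics.aggregate([{"$group": {"_id": "$segment", "customers": {"$sum": 1}}}]):
    print(row)
```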
37
Operationalizing for Scale & Security
Capability | MongoDB | Relational | Column Family (e.g. HBase)
Robust security controls | Yes | Yes | Yes
Scale-out on commodity hardware | Yes | No | Yes
Sophisticated management platform | Yes | Yes | Monitoring only
• Why it matters
– Security: Data protection for regulatory compliance
– Scale-Out: Grow with the data lake
– Management: Reduce TCO with platform automation, monitoring, disaster recovery
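As a sketch of scale-out, the snippet below shows how a collection might be sharded across commodity nodes so the database grows with the data lake. It assumes a running sharded cluster reachable through a mongos router; the router address, database, collection and shard-key names are hypothetical.

```python
# Hypothetical sharding sketch, run against a mongos router of an existing
# sharded cluster.
from pymongo import MongoClient

client = MongoClient("mongodb://mongos.example.net:27017")   # assumed router address

# Enable sharding for the database, then shard the collection on a hashed key
# so writes distribute evenly across shards as data volumes grow.
client.admin.command("enableSharding", "operational")
client.admin.command(
    "shardCollection",
    "operational.customer_profiles",
    key={"customer_id": "hashed"},
)
```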
38
MongoDB Nexus Architecture
Adoption & Skills Availability
About MongoDB, Inc.
• 500+ employees
• 2,000+ customers
• 13 offices worldwide
• $311M in funding
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
 
办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一
办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一
办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 

Unlocking Operational Intelligence from the Data Lake with MongoDB

  • 1. Unlocking Operational Intelligence from the Data Lake Mat Keep Director, Product & Market Analysis mat.keep@mongodb.com @matkeep
  • 2. 2 The World is Changing Digital Natives & Digital Transformation Volume Velocity Variety Iterative Agile Short Cycles Always On Secure Global Open-Source Cloud Commodity Data Time Risk Cost
  • 5. 5 The Rise of the Data Lake
  • 6. 6 • 24% CAGR: Hadoop, Spark & Streaming • 18% CAGR: Databases • Databases are key components within the big data landscape “Big Data” is More than Just Hadoop
  • 7. 7 Apache Hadoop Data Lake • Risk modeling • Retrospective & predictive analytics • Machine learning & pattern matching • Customer segmentation & churn analysis • ETL pipelines • Active archives NoSQL Database
  • 8. 8 http://www.infoworld.com/article/2980316/big-data/why-your-big-data-strategy-is-a-bust.html “Thru 2018, 70 percent of Hadoop deployments will not meet cost savings and revenue generation objectives due to skills and integration challenges.” Nick Heudecker, Research Director, Data Management & Integration
  • 9. 9 How to Avoid Being in the 70%? 1. Unify data lake analytics with the operational applications 2. Create smart, contextually aware, data-driven apps & insights 3. Integrate a database layer with the data lake
  • 10. 10 Why a Database + Hadoop? Distributed Processing & Analytics • Data stored as large files (64MB-128MB blocks). No indexes • Write-once-read-many, append-only • Designed for high throughput scans across TB/PB of data. • Multi-minute latency Common Attributes • Schema-on-read • Multiple replicas • Horizontal scale • High throughput • Low TCO
  • 11. 11 Why a Database + Hadoop? Distributed Processing & Analytics • Random access to subsets of data • Millisecond latency • Expressive querying, rich aggregations & flexible indexing • Update fast-changing data, avoid re-write / re-compute of the entire data set • Data stored as large files (64MB-128MB blocks). No indexes • Write-once-read-many, append-only • Designed for high throughput scans across TB/PB of data. • Multi-minute latency Common Attributes • Schema-on-read • Multiple replicas • Horizontal scale • High throughput • Low TCO
  • 12. 12 MongoDB & Hadoop: What’s Common Distributed Processing & Analytics Common Attributes • Schema-on-read • Multiple replicas • Horizontal scale • High throughput • Low TCO
  • 13. 13 Bringing it Together Online Services powered by Back-end machine learning powered by • User account & personalization • Product catalog • Session management & shopping cart • Recommendations • Customer classification & clustering • Basket analysis • Brand sentiment • Price optimization MongoDB Connector for Hadoop
  • 14. Design Pattern: Operationalized Data Lake. Message Queue, Customer Data Mgmt, Mobile App, IoT App, Live Dashboards. Raw Data, Processed Events, Distributed Processing Frameworks. Millisecond latency; expressive querying & flexible indexing against subsets of data; in-place updates; in-database aggregations & transformations. Multi-minute latency with scans across TB/PB of data; no indexes; data stored in 128MB blocks; write-once-read-many & append-only storage model. Sensors, User Data, Clickstreams, Logs. Churn Analysis, Enriched Customer Profiles, Risk Modeling, Predictive Analytics. Real-Time Access. Batch Processing, Batch Views
  • 15. Design Pattern: Operationalized Data Lake (same diagram) – Configure where to land incoming data
  • 16. Design Pattern: Operationalized Data Lake (same diagram) – Raw data processed to generate analytics models
  • 17. Design Pattern: Operationalized Data Lake (same diagram) – MongoDB exposes analytics models to operational apps; handles real-time updates
  • 18. Design Pattern: Operationalized Data Lake (same diagram) – Compute new models against MongoDB & HDFS
  • 19. 19 Operational Database Requirements 1 “Smart” integration with the data lake 2 Powerful real-time analytics 3 Flexible, governed data model 4 Scale with the data lake 5 Sophisticated management & security
  • 21. 21 UK's Leading Price Comparison Site – Out-pacing Internet search giants with a continuous delivery pipeline powered by microservices & Docker running MongoDB, Kafka and Hadoop in the cloud. Problem: existing EDW with nightly batch loads; no real-time analytics to personalize the user experience; application changes broke the ETL pipeline; unable to scale as services expanded. Solution: microservices architecture running on AWS; all application events written to a Kafka queue, routed to MongoDB and Hadoop; events that personalize the real-time experience (i.e. triggering an email send, additional questions, offers) written to MongoDB; all event data aggregated with other data sources and analyzed in Hadoop, with updated customer profiles written back to MongoDB. Results: 2x faster delivery of new services after migrating to the new architecture; enabled continuous delivery, pushing new features every day; personalized user experience, plus higher uptime and scalability.
  • 22. 22 Leading Global Airline – Customer Data Management: single view and real-time analytics with MongoDB, Spark, & Hadoop. Problem: customer data scattered across 100+ different systems; poor customer experience, with no personalization and no consistent experience across brands or devices; no way to analyze customer behavior to deliver targeted offers. Solution: selected MongoDB over HBase for schema flexibility and rich query support; MongoDB stores all customer profiles, served to web, mobile & call-center apps; distributed across multiple regions for DR and data locality; all customer interactions stored in MongoDB and loaded into Hadoop for customer segmentation; unified processing pipeline with Spark running across MongoDB and Hadoop. Results: single profile created for each customer, personalizing the experience in real time; revenue optimization by calculating the best ticket prices; reduced competitive pressure by identifying gaps in product offerings.
  • 23. 23 World's Most Sophisticated Traveler Safety Platform – Analyzing PBs of data with MongoDB, Hadoop, Apache NiFi & SAP HANA. Problem: commercialize a national security platform; massive volumes of multi-structured data (news, RSS & social feeds, geospatial, geological, health & crime stats); requires complex analysis, delivered in real time, always on. Solution: Apache NiFi for data ingestion, routing & metadata management; Hadoop for text analytics; HANA for geospatial analytics; MongoDB correlates analytics with user profiles & location data to deliver real-time alerts to corporate security teams & individual travelers. Results: enables Prescient to uniquely blend big data technology with its security IP developed in government; dynamic data model supports indexing 38k data sources, growing at 200 per day; 24x7 continuous availability; scalability to PBs of data.
  • 24. 24 Powering Global Threat Intelligence – Cloud-based real-time analytics with MongoDB & Hadoop. Problem: requirement to analyze data over many different dimensions to detect real-time threat profiles; HBase unable to query data beyond primary key lookups; Lucene search unable to scale with growth in data. Solution: MongoDB + Hadoop to collect and analyze data from internet sensors in real time; MongoDB's dynamic schema enables sensor data to be enriched with geospatial tags; auto-sharding to scale as data volumes grow. Results: run complex, real-time analytics on live data; improved query performance by over 3x; scale to support a doubling of data volume every 24 months; deploy across global data centers for a low-latency user experience; engineering teams have more time to develop new features.
  • 26. Conclusion 1 Data lakes enable enterprises to affordably capture & analyze more data 2 Operational and analytical workloads are converging 3 MongoDB is the key technology to operationalize the data lake
  • 27. 27 MongoDB Enterprise Advanced: MongoDB Enterprise Server (authentication, authorization, auditing, encryption in flight & at rest, REST API), MongoDB Ops Manager (monitoring & alerting, query optimization, backup & recovery, automation & configuration), MongoDB Compass (schema visualization, data exploration, ad-hoc queries), MongoDB Connector for BI (visualization, analysis, reporting), 24x7 support (1 hour SLA), commercial license (no AGPL copyleft restrictions), platform certifications, emergency patches, customer success program, on-demand online training, warranty, limitation of liability, indemnification
  • 28. 28 Resources to Learn More • Guide: Operational Data Lake • Whitepaper: Real-Time Analytics with Apache Spark & MongoDB
  • 29.
  • 30. 30 For More Information Resource Location Case Studies mongodb.com/customers Presentations mongodb.com/presentations Free Online Training education.mongodb.com Webinars and Events mongodb.com/events Documentation docs.mongodb.org MongoDB Downloads mongodb.com/download Additional Info info@mongodb.com
  • 31. 31 One of World's Largest Banks – Creating new customer insights with MongoDB & Spark. Problem: system failures in online banking systems creating customer satisfaction issues; no personalization across channels; no enrichment of user data with social media chatter. Solution: Apache Flume to ingest log data & social media streams, Apache Spark to process log events; MongoDB to persist log data and KPIs and immediately rebuild user sessions when a service fails; integration with the MongoDB query language and secondary indexes to selectively filter and query data in real time. Results: improved user experience, with more customers using online, self-service channels; improved services following a deeper understanding of how users interact with systems; greater user insight by adding social media insights.
  • 32. 32 Fare Calculation Engine – One of World's Largest Airlines migrates from Oracle to MongoDB and Apache Spark to support a 100x performance improvement. Problem: China Eastern is targeting 130,000 seats sold every day across its web and mobile channels; the new fare calculation engine needed to support 20,000 search queries per second, but the current Oracle platform supported only 200 per second. Solution: Apache Spark used for fare calculations, using business rules stored in MongoDB; fare calculations written to MongoDB for access by the search application; the MongoDB Connector for Apache Spark allows seamless integration with data locality awareness across the cluster. Results: a cluster of fewer than 20 API, Spark & MongoDB nodes supports 180m fare calculations & 1.6 billion searches per day; each node delivers 15x higher performance and 10x lower latency than the existing Oracle servers; MongoDB Enterprise Advanced provided Ops Manager for operational automation and access to expert technical support.
  • 33. 33 MongoDB Connector for Apache Spark • Native Scala connector, certified by Databricks • Exposes all Spark APIs & libraries • Efficient data filtering with predicate pushdown, secondary indexes, & in-database aggregations • Locality awareness to reduce data movement “We reduced 100+ lines of integration code to just a single line after moving to the MongoDB Spark connector.” - Early Access Tester, Multi-National Banking Group
  • 35. 35 Query and Data Model (MongoDB / Relational / Column Family, i.e. HBase):
Rich query language & secondary indexes – Yes / Yes / Requires integration with separate Spark or Hadoop cluster
In-database aggregations & search – Yes / Yes / Requires integration with separate Spark or Hadoop cluster
Dynamic schema – Yes / No / Partial
Data validation – Yes / Yes / App-side code
Why it matters – Query & aggregations: rich, real-time analytics against operational data. Dynamic schema: manage multi-structured data. Data validation: enforce data governance between the data lake & operational apps
  • 36. 36 Data Lake Integration (MongoDB / Relational / Column Family, i.e. HBase):
Hadoop + secondary indexes – Yes / Yes: expensive / No secondary indexes
Spark + secondary indexes – Yes / Yes: expensive / No secondary indexes
Native BI connectivity – Yes / Yes / 3rd-party connectors
Workload isolation – Yes / Yes: expensive / Load data to separate Spark/Hadoop cluster
Why it matters – Hadoop + Spark: efficient data movement between the data lake, processing layer & database. Native BI connectivity: visualizing operational data. Workload isolation: separation between operational and analytical workloads
  • 37. 37 Operationalizing for Scale & Security (MongoDB / Relational / Column Family, i.e. HBase):
Robust security controls – Yes / Yes / Yes
Scale-out on commodity hardware – Yes / No / Yes
Sophisticated management platform – Yes / Yes / Monitoring only
Why it matters – Security: data protection for regulatory compliance. Scale-out: grow with the data lake. Management: reduce TCO with platform automation, monitoring, disaster recovery
  • 39. Adoption & Skills Availability
  • 40. 500+ employees About MongoDB, Inc. 2,000+ customers 13 offices worldwide $311M in funding

Editor's notes

  1. We have seen rapid growth in adoption of the data lake – a centralized repository for the many new data sources organizations now collect. But it is not without challenges; the primary one is how to make the analytics generated by the data lake available to real-time, operational apps. So we are going to cover: the rise of the data lake; the challenges in getting the most business value out of it; the role that databases play, and their requirements; and case studies of organizations unlocking insight from the data lake.
  2. As enterprises bring more products and services online as part of digital transformation initiatives, one thing they don't lack today is data – from streams of sensor readings to social sentiment, machine logs, mobile apps, and more. Analysts estimate volumes growing at 40% per annum, with 80% of all data unstructured. At the same time, there is more pressure on time to market, on exposing apps to global audiences, and on reducing the cost of delivering new services. These trends fundamentally change how enterprises build and run modern apps.
  3. With all of this new data available, we are creating an insight economy. Uncovering new insights by collecting and analyzing this data carries the promise of competitive advantage and efficiency savings: better understanding customers by predicting what they might buy based on behavior and demographics, optimizing the supply chain with better or faster routes, or reducing the risk of fraud by identifying suspicious behavior – it is all about the data. Organizations that don't harness data are at a major disadvantage in understanding the past, monitoring the present, and predicting the future. MIT: data-driven decision environments have 5% higher productivity, 6% higher profit and up to 50% higher market value than other businesses.
  4. The traditional destination for data from operational apps has been the data warehouse: take all this data in, then create analytics from it. However, the traditional Enterprise Data Warehouse (EDW) is straining under the load, overwhelmed by the sheer volume and variety of data pouring into the business. Costs run from hundreds to thousands of dollars per TB, versus tens to hundreds in commodity systems.
  5. Because of these challenges, many organizations have turned to Hadoop as a centralized repository for this new data, creating what many call a data lake. It is not a replacement for the EDW but an adjunct: it stores all the new data and applies new analytics, which are combined with the traditional reporting coming from the DW. Gartner estimates around 50% of enterprises have rolled out, or are in the process of rolling out, data lakes.
  6. When we think about data lakes, we think about big data, and big data is often associated with Hadoop – but the reality is that it is more than just Hadoop. Market growth forecast by Wikibon: "big data revenues" growing from $19bn in 2016 to $92bn in 2026, with software outpacing hardware and professional services. IDC forecasts just under $50bn by 2019, a 23% CAGR, with software growing fastest. Hadoop and Spark lead the charge, closely followed by databases – a key part of the big data landscape, because they operationalize the data lake: they are the link between the back-end data lake and the front-end apps that consume analytics to make those apps smarter.
  7. Hadoop is well established and celebrates its 10th anniversary this year. It has grown from HDFS and MapReduce into dozens of projects – Gartner identifies 19 common projects supported by the 4 leading distros, and the average distro has many more: processing frameworks, search, provisioning and management, security, file formats, integration. Each project is developed independently, with its own roadmap and its own dependencies – incredible complexity. HDFS is the common storage layer, against which processing frameworks run to produce the outputs you see on the slide.
  8. While something like 50% of enterprises either have or are evaluating Hadoop to create new classes of app, it is not without its challenges – as noted in a number of Gartner analyses, and by the press.
  9. One of the fundamental integration challenges is how to connect the data lake with your operational systems. Operational apps run the business – how do you expose analytics created in the data lake to better serve customers with more relevant products and offers, or to drive efficiency savings from an IoT-enabled smart factory? Unify data lake analytics with the operational applications; this enables you to create smart, contextually aware, data-driven apps. An integrated database layer operationalizes the data lake.
  10. The differences come in how data is stored, accessed and updated. Hadoop is a file system: it stores data in files in blocks, has no knowledge of the underlying data, and has no indexes. If you want to access a specific record, you scan all the data stored in the file where the record is located – which could be tens of MB. HDFS is write-once-read-many: to update customer data you rewrite all of that customer data, not just the individual customers that changed. Hadoop excels at generating analytics models by scanning and processing large datasets, but it is not designed to provide real-time, random access for operational applications; the time to read the whole dataset matters more than the latency in reading the first record. http://stackoverflow.com/questions/15675312/why-hdfs-is-write-once-and-read-multiple-times/37300268#37300268
  11. But MongoDB is more than just a filesystem. It is a full database, so it gives you a whole set of things HDFS doesn't: millisecond query latency; random access to indexed subsets of data; expressive querying and flexible indexing, supporting complex queries and aggregations against the data in real time, making online applications smarter and contextual; updating fast-changing data in real time as users interact with online applications, without having to rewrite the entire data set; fine-grained access with complex filtering logic. You can also use distributed processing libraries against it – a MongoDB collection or document looks like an input or output in HDFS: rather than load a file, load a DataFrame, and Hive sees MongoDB as a table. HDFS, by contrast, suits longer jobs and batch analytics, with append-only files – great for scanning all data, or large subsets, in files. (A minimal access-pattern sketch follows below.)
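To make the contrast concrete, here is a minimal PyMongo sketch of the access pattern described above – an indexed point read and an in-place update of a single customer record. The connection string, collection and field names are hypothetical; this is an illustrative sketch, not something from the original deck.

    # Hypothetical names throughout; assumes a local mongod and the pymongo driver.
    from datetime import datetime
    from pymongo import MongoClient, ASCENDING

    profiles = MongoClient("mongodb://localhost:27017").retail.customer_profiles

    # Secondary index so the lookup touches only the documents it needs
    profiles.create_index([("customer_id", ASCENDING)])

    # Millisecond-latency point read for the operational app
    profile = profiles.find_one({"customer_id": "C-1001"})

    # In-place update of fast-changing data – no rewrite of the whole data set
    profiles.update_one(
        {"customer_id": "C-1001"},
        {"$push": {"recent_views": "SKU-42"},
         "$set": {"last_seen": datetime.utcnow()}},
    )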
  12. The obvious question is why we need a database when we have Hadoop. It comes down to how each platform persists and accesses data: HDFS is a file system that accesses data in batches of 128MB blocks, while MongoDB is a database that provides fine-grained access at the level of individual records – which gives each system very different properties. Despite those differences, there are lots of similarities in how we process data – MapReduce, Spark. These frameworks are unopinionated about the underlying persistence layer: it could be HDFS, it could be MongoDB – which means you can unify analytics across the data lake and your database. Both MongoDB and HDFS share common attributes: schema-on-read, multiple replicas for fault tolerance, horizontal scale, low TCO. But they have different characteristics in how they store and access data, which means they are suited to different parts of the data lake deployment.
  13. When you bring the database and the data lake together, you can build powerful, data-driven apps. Take a real-life example – the data lake of a large retailer. The online storefront and e-commerce engine is powered by MongoDB, handling customer profiles, sessions, baskets and product catalogs, and presenting recommendations and offers. As customers browse the site, all of their activity is written back to Hadoop, where it is blended with other data sources – social feeds, demographics, market data, credit scores, currency feeds – to segment and cluster customers (regression and classification for customer clustering). Those results can then be exposed through MongoDB, so when customers come back they are presented with a personalized experience based on what they have browsed before and what they are likely to want to purchase next. You could not serve that operational app, which deals with individual customers, from HDFS: it is not real time, there are no indexes to access just the customer details you need, and there is no way of updating a customer record – everything is rewritten and recomputed.
  14. Let's go deeper and wider. This is a design pattern for the data lake: multiple components that collectively handle ingest, storage, processing and analysis of data, then serve it to the consuming operational apps. Let's step through it.
  15. Data ingestion: data streams are ingested into a pub/sub message queue, which routes all raw data into HDFS. Often there is also event processing running against the queue to find interesting events that need to be consumed by the operational apps immediately – displaying an offer to a user browsing a product page, or alarms generated against vehicle telemetry from an IoT app – and these are routed to MongoDB for immediate consumption by operational applications. (A routing sketch follows below.)
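As a rough illustration of that routing split, the sketch below consumes a stream and sends only the events the operational apps need immediately to MongoDB, on the assumption that a separate sink (e.g. an HDFS connector) lands the full raw stream in the data lake. The topic, field and collection names are hypothetical, and it assumes the kafka-python and pymongo drivers.

    import json
    from kafka import KafkaConsumer
    from pymongo import MongoClient

    events = MongoClient("mongodb://localhost:27017").ops.processed_events

    consumer = KafkaConsumer(
        "clickstream",                                   # hypothetical topic name
        bootstrap_servers="broker:9092",
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    )

    for msg in consumer:
        event = msg.value
        # Only events the apps must see immediately go to MongoDB;
        # the raw stream flows to HDFS via a separate sink.
        if event.get("type") in ("offer_view", "telemetry_alarm"):
            events.insert_one(event)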
  16. Raw data is loaded into the data lake, where Hadoop jobs – MapReduce or Spark – generate analytics models from it; see the examples in the layer above HDFS.
  17. MongoDB exposes these models to the operational processes, serving indexed queries and updates against them with real-time latency
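For example, a minimal sketch of how an operational service might read a model that the batch layer has already written into MongoDB – an indexed lookup with a projection rather than a scan. Collection and field names are hypothetical.

    from pymongo import MongoClient

    recs = MongoClient("mongodb://localhost:27017").retail.recommendations

    # Indexed point lookup of the precomputed model; the projection returns
    # only the fields the app needs.
    offers = recs.find_one({"customer_id": "C-1001"}, {"offers": 1, "_id": 0})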
  18. The distributed processing frameworks can re-compute analytics models against data stored in either HDFS or MongoDB, continuously flowing updates from the operational database into the analytics models. We will look at some examples of users who have deployed this type of design pattern a little later.
  19. Beyond low-latency performance, there are specific requirements. You need much more than just a datastore: a fully-featured database serving as a system of record for online applications. Tight integration between MongoDB and the data lake – minimize data movement between them and fully exploit the native capabilities of each part of the system. You need to serve operational workloads while running analytics against live operational data – e.g. the top trending articles right now so I know where to place my ads, or how many widgets coming off my production line are failing QA and whether that is up or down versus previous trends. Gartner calls this HTAP (Hybrid Transactional and Analytical Processing), Forrester calls it translytics. To do that, you need: a powerful query language, secondary indexes, aggregations & transformations all within the database – not ETL into a warehouse; workload isolation between operational and analytics workloads, so they don't contend for the same resources; a flexible schema to handle multi-structured data, while still enforcing governance over that data; and secure access to the data – the operational DB is typically accessed by a much broader audience than Hadoop, so security controls are critical: robust access controls (LDAP, Kerberos, RBAC), auditing of all events for regulatory compliance, and encryption of data in motion and at rest, all built into the database. The database must scale as the data lake scales, which means scaling out on commodity hardware, often across geographic regions. And to simplify the environment you need sophisticated management tools to automate database deployment, scaling, monitoring and alerting, and disaster recovery.
Tight integration: it is not enough just to move data between the analytics and operational layers – it has to move efficiently. Connectors should allow selective filtering, using secondary indexes to extract and process only the range of data a job needs – for example, retrieving all customers located in a specific geography. This is very different from databases that do not support secondary indexes, where Spark and Hadoop jobs are limited to extracting all data based on a simple primary key even if only a subset is required – more processing overhead, more hardware, and longer time-to-insight. Workload isolation: provision database clusters with dedicated analytics nodes, allowing users to run real-time analytics and reporting queries against live data without impacting the nodes servicing the operational application. Flexible data model: store data of any structure and easily evolve the model to capture new attributes – e.g. enriching user profiles with geospatial data – while ensuring data quality by enforcing validation rules, so data is appropriately typed and contains all the attributes the app needs (a validation sketch follows below). Expressive queries let developers build applications that query and analyze the data in multiple ways – single keys, ranges, text search and geospatial queries through to complex aggregations and MapReduce jobs – returning responses in milliseconds; complex queries execute natively in the database without additional analytics frameworks or tools, avoiding the latency of moving data between operational and analytical engines. Secondary indexes let you filter data any way you need – key for low-latency operational queries. Robust security controls govern access, provide audit trails and encrypt data in flight and at rest. Scale-out matches the scale-out of the data lake: as it grows, add new nodes to service higher data volumes or user load. An advanced management platform reduces data lake TCO and the risk of application downtime, with powerful tooling to automate database deployment, scaling, monitoring and alerting, and disaster recovery.
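As one hedged example of the governance point above: MongoDB 3.2 introduced document validation, so a rule such as "a customer document must carry an identifier and a plausible email address" can be enforced in the database rather than in application code. The collection name and rules below are illustrative only.

    from pymongo import MongoClient

    db = MongoClient("mongodb://localhost:27017").crm

    # Validation rules are expressed as a query document (MongoDB 3.2+)
    db.create_collection(
        "customers",
        validator={
            "customer_id": {"$exists": True},   # mandatory unique identifier
            "email": {"$regex": "@"},           # must look like an email address
        },
        validationAction="error",               # reject documents that fail the rules
    )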
  20. Look at examples in action
  21. CTM – one of the UK's leading price comparison sites – moved from an on-prem, RDBMS-based monolithic app to a microservices architecture powered by MongoDB, with Hadoop at the back end providing analytics. This enabled them to better personalize the customer experience and deepen relationships. Read through the bullets.
  22. The second example is a leading global airline. Through M&A it has multiple brands serving different countries and market sectors, but customer data was spread across 100 different systems. Using Hadoop and Spark, it brought that data together to create a single view, which is loaded into MongoDB to power the online apps – web and mobile, as well as the call center – so users get a consistent experience however they interact. All user and ticket data is stored in MongoDB, then written back into Hadoop to run the advanced analytics that allow ticket price optimization, identify offers, and find gaps in the product portfolio. Read the bullets.
  23. Prescient provides a traveler safety platform for corporate customers: if a natural disaster or security incident occurs while a traveler is away on business, it can send real-time alerts and advise on how to get to safety. The platform was built for national governments and has now been launched for commercial use – analyzing PBs of data with MongoDB, Hadoop, Apache NiFi & SAP HANA. Read the bullets.
  24. McAfee built its cloud-based threat intelligence platform on MongoDB. The platform monitors threat activity for clients in real time – identifying attacks as they take place and flagging when users may be interacting with insecure or suspicious sites. All real-time activity is captured in MongoDB to provide alerting to security teams, and is sent to Hadoop for further back-end analytics, with updated threat profiles written back to MongoDB.
  25. MongoDB is open source; we also provide Enterprise Advanced, a collection of software and support for running in production at scale.
  26. The Stratio Apache Spark-certified Big Data (BD) platform is used by an impressive client list including BBVA, Just Eat, Santander, SAP, Sony, and Telefonica. The company has implemented a unified real-time monitoring platform for a multinational banking group operating in 31 countries with 51 million clients all over the world. The bank wanted to ensure a high quality of service and personalized experience across its online channels, and needed to continuously monitor client activity to check service response times and identify potential issues. The application was built on a modern technology foundation including: Apache Flume to aggregate log data Apache Spark to process log events in real time MongoDB to persist log data, processed events and Key Performance Indicators (KPIs). The aggregated KPIs, stored by MongoDB enable the bank to analyze client and systems behavior in real time in order to improve the customer experience. Collecting raw log data allows the bank to immediately rebuild user sessions if a service fails, with analysis generated by MongoDB and Spark providing complete traceability to quickly identify the root cause of any issue. The project required a database that provided always-on availability, high performance, and linear scalability. In addition, a fully dynamic schema was needed to support high volumes of rapidly changing semi-structured and unstructured JSON data being ingested from a variety of logs, clickstreams, and social networks. After evaluating the project’s requirements, Stratio concluded MongoDB was the best fit. With MongoDB’s query projections and secondary indexes, analytic processes run by the Stratio BD platform avoid the need to scan the entire data set, which is not the case with other databases.
  27. China Eastern Industry: Travel and Hospitality, Airline Use Case: Search
  28. While it's important to provide low-latency access to data, it is not enough to just support simple key-value lookups – the demand is to get insights from data faster, and that is the role of real-time analytics: tracking in real time where the vehicles in your fleet are, the social sentiment to an announcement you've just made, or correlating patterns of real-time fraud attempts against specific domains. This is where an expressive query language, secondary indexes and in-database aggregations are valuable (see the aggregation sketch below). MongoDB and RDBMSs both have strong features here, with RDBMSs a little further ahead; column-family stores are little more than key-value, needing data moved out to other query frameworks or analytics nodes to get any intelligence – which adds latency, complexity and more moving parts. RDBMSs are good in many areas, but the lack of data model flexibility needed to handle rapidly changing, multi-structured data is where they fall down. Column-family stores have more schema flexibility than relational, but still need columns pre-defined, restricting how quickly apps can evolve. Data validation applies rules to the data structures the operational database stores. An app that creates a single view of your customer – data spread across many repositories is loaded into the data lake, the single view is created there and loaded into MongoDB to serve operational apps – needs to ensure documents contain mandatory fields (unique customer identifiers), typed and formed in a specific way, e.g. the ID is always an integer and the email address always contains an @. Document validation in MongoDB enables you to do this. RDBMSs offer full schema validation, so are a little ahead; in a column-family database you have to enforce governance in application code. Looking at the aggregated scores, relational and MongoDB are evenly matched, while column family – a much simpler datastore – is a long way behind.
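To ground the in-database aggregation point, here is a small sketch of the kind of real-time rollup mentioned above – fraud attempts per domain over the last hour – run entirely inside MongoDB with the aggregation pipeline. Collection and field names are hypothetical.

    from datetime import datetime, timedelta
    from pymongo import MongoClient

    events = MongoClient("mongodb://localhost:27017").security.threat_events

    pipeline = [
        {"$match": {"kind": "fraud_attempt",
                    "ts": {"$gte": datetime.utcnow() - timedelta(hours=1)}}},
        {"$group": {"_id": "$domain", "attempts": {"$sum": 1}}},
        {"$sort": {"attempts": -1}},
        {"$limit": 10},
    ]

    # Top offending domains in the last hour, computed in the database
    for row in events.aggregate(pipeline):
        print(row["_id"], row["attempts"])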
  29. Hadoop and Spark integration: you need to do more than just move vast amounts of data between each layer of the stack – you need intelligent connectors that can push down predicates, filter data with secondary indexes (e.g. access all customers in a specific geography) and pre-aggregate data. Without access to the database's secondary indexes you end up moving a ton of data back and forth – more processing cycles, longer latency. The MongoDB Connector for Hadoop and the Connector for Spark both support these capabilities (a Spark sketch follows below). Column-family stores don't offer secondary indexes or in-database aggregations, so there is nothing to filter the data; RDBMSs offer these capabilities in their connectors, but generally only as expensive add-ons, hence downgraded. Workload isolation is the ability to perform real-time analytics on live operational data without interfering with operational apps – you don't want an aggregation working out how many deliveries your fleet of trucks has made to compete with how quickly you can detect from sensor data that a vehicle has developed a fault. The key is to distribute queries to dedicated nodes in the database cluster: some provisioned for operational data, replicating to nodes dedicated to analytics. MongoDB supports up to 50 members in a single replica set, and analytics members can be configured as hidden so they are never hit by operational queries. Column-family stores are restricted to just 3 data replicas, which are there for HA, not for separating workloads; for RDBMSs it is an expensive add-on. Native BI connectivity may not be relevant in all cases, but many organizations want live dashboards reporting the current state of operational systems. MongoDB has a native BI connector that exposes the database as an ODBC data source, so you can visualize in anything from Tableau to BusinessObjects to Excel. There is rich tooling in the relational world. For column-family stores, connectors exist but are third-party and don't push queries down to the database – they extract all the data instead, so powering dashboards is more computationally and network intensive.
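The Spark side of that integration might look something like the sketch below, assuming the MongoDB Connector for Apache Spark is on the classpath; the URI, database and field names are hypothetical. The point is that the filter is pushed down to MongoDB, so only the matching subset of documents crosses the network into Spark.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("customer-segmentation")
             .config("spark.mongodb.input.uri",
                     "mongodb://mongo-host:27017/retail.customer_profiles")
             .getOrCreate())

    # Predicate pushdown: the country filter is evaluated by MongoDB,
    # using its indexes, before the data reaches Spark
    uk_profiles = (spark.read
                   .format("com.mongodb.spark.sql.DefaultSource")
                   .load()
                   .filter("country = 'UK'"))

    uk_profiles.groupBy("segment").count().show()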
  30. Security: data from operational databases is exposed to apps and potentially millions of users, so you need robust access controls – possibly including integration with LDAP, Kerberos, PKI environments and RBAC – to tightly segregate who can do what in the DB, encryption of data in flight and at rest, and a log of activity in the DB for forensic analysis. All solutions do well here; there has been big investment in the Hadoop ecosystem, which is rapidly gaining ground on RDBMSs but at much lower cost. Scale-out: you need to scale as the data lake scales and as more digital services are opened up to users – a core strength of non-relational databases. The fundamental challenge with an RDBMS is that it requires scale-up, with limited headroom and very expensive proprietary hardware. Management: Hadoop is complex and its management tools are still primitive. For the operational database you need a platform with powerful tooling to automate database deployment, scaling, fine-grained monitoring and alerting, and disaster recovery with point-in-time backups and automated restores. There is rich tooling in the relational world, and a big investment from MongoDB to close that gap.
  31. On the left-hand side, MongoDB has maintained the attributes of relational databases, blended with innovation from NoSQL – which uniquely differentiates MongoDB from its peers in the non-relational database market.
  32. Invest in technology that has production-proven deployments and broad skills availability. With the availability of Hadoop skills cited by Gartner analysts as a top challenge, it is essential to choose an operational database with a large available talent pool, so you can find staff who can rapidly build differentiated big data applications. Across multiple measures – including the DB-Engines Rankings, the 451 Group NoSQL Skills Index and the Gartner Magic Quadrant for Operational Databases – MongoDB is the leading non-relational database.