SlideShare una empresa de Scribd logo
1 de 44
1©MapR Technologies - Confidential
Expect More from Hadoop
2©MapR Technologies - Confidential
Introducing MapR
MapR offers the
technology leading
distribution for Hadoop
3©MapR Technologies - Confidential
The Industry-Leaders Choose MapR in
the Cloud
Google chose MapR to
provide Hadoop on Google
Compute Engine
Amazon EMR is the largest
Hadoop provider in revenue
and # of clusters
4©MapR Technologies - Confidential
MapR Supports Broad Set of Use Cases
 Log analysis
 HBase
 Customer targeting
 Social media analysis
 Customer Revenue
Analytics
 ETL Offload
 Advertising exchange
analysis and optimization
 Clickstream Analysis
 Quality profiling/field
failure analysis
 Customer
Sentiment
 Network Analytics
 Monitors and measures
behavior of online shoppers
 Fraud Detection
 Channel analytics
 Customer Behavior Analysis
 Brand Monitoring
 Customer targeting
 Viewer Behavioral analytics
 Recommendation Engine
 Family tree connections
 Intrusion detection & prevention
 Forensic analysis
 Global threat
analytics
 Virus analysis
 Patient care
monitoring
Leading Retailer
 Recommendation Engine
 Fraud detection and Prevention
Leading Bank
5©MapR Technologies - Confidential
Introducing Hadoop
Hadoop is deployed because
a) big data
b) fast data
c) rapidly changing data
6©MapR Technologies - Confidential
Introducing Hadoop
Hadoop is deployed because
a) big data
b) fast data
c) rapidly changing data
7©MapR Technologies - Confidential
Introducing Change
Changing data implies
a need for integration
8©MapR Technologies - Confidential
Introducing Change
Changing data implies
a need for integration
If you copy, the data will
change before you finish.
9©MapR Technologies - Confidential
Controlling Change
Changing data implies
a need for stabilization
10©MapR Technologies - Confidential
Controlling Change
Changing data implies
a need for stabilization
Long running analyses must
have stable data
11©MapR Technologies - Confidential
The Story Can Now be Told
Here are three true
stories about how
Hadoop integration
pays off
12©MapR Technologies - Confidential
Story #1
ETL Off-load
13©MapR Technologies - Confidential
The Problem
 Major telecom vendor
 Key step in billing pipeline handled by data warehouse (EDW)
 EDW at maximum capacity
 Multiple rounds of software optimization already done
 Revenue limiting (= career limiting) bottleneck
14©MapR Technologies - Confidential
ETL
CDR billing
records
Billing
reports
Data Warehouse
Customer
bills
Original Flow
15©MapR Technologies - Confidential
ETL
CDR billing
records
Billing
reports
Data Warehouse
Customer
bills
Original Flow
70% of total load
<10% of total code
Import by bulk
load from NFS
16©MapR Technologies - Confidential
ETL
CDR billing
records
Billing
reports
Data Warehouse
Customer
billing
With ETL Offload
Import written
to MapR via NFS
Bulk load via NFS
from MapR
17©MapR Technologies - Confidential
Simplified Analysis – EDW Strategy
 70% of EDW consumed by ETL processing
 EDW direct hardware cost is approximately $30 million CAPEX, 12
million OPEX
 Additional EDW only increases capacity by 50% due to poor
division of labor
18©MapR Technologies - Confidential
Simplified Analysis – MapR Strategy
 Hardware + MapR cost ~ $1.5 million
 ETL replacement development costs ~ $1.5 million
 Result is 3x performance increase
19©MapR Technologies - Confidential
Price Performance
 EDW strategy
– 1.5 x performance
– $30 million
 MapR Strategy
– 3 x performance
– $3 million
 20x cost/performance advantage for MapR strategy
20©MapR Technologies - Confidential
Story #2
Search Abuse
21©MapR Technologies - Confidential
The Problem
 Build a high performance recommendation
– Use all kinds of available data
 Deploy it to production
– Must have efficient deployment
22©MapR Technologies - Confidential
Input Data
 User transactions
– user id, merchant id
– SIC code, amount
 Offer transactions
– user id, offer id
– vendor id, merchant id’s,
– offers, views, accepts
23©MapR Technologies - Confidential
Input Data
 User transactions
– user id, merchant id
– SIC code, amount
 Offer transactions
– user id, offer id
– vendor id, merchant id’s,
– offers, views, accepts
Import data via standard interfaces
from log files, databases, direct
feeds
Find anomalous indicators of
behavior
24©MapR Technologies - Confidential
Search-based Recommendations
 Sample document
– Merchant Id
– Field for text description
– Phone
– Address
– Location
25©MapR Technologies - Confidential
Search-based Recommendations
 Sample “document”
– Merchant Id
– Field for text description
– Phone
– Address
– Location
– Indicator merchant id’s
– Indicator industry (SIC) id’s
– Indicator offers
– Indicator text
– Local top40
26©MapR Technologies - Confidential
Search-based Recommendations
 Sample “document”
– Merchant Id
– Field for text description
– Phone
– Address
– Location
– Indicator merchant id’s
– Indicator industry (SIC) id’s
– Indicator offers
– Indicator text
– Local top40
 User History (query)
– Current location
– Recent merchant descriptions
– Recent merchant id’s
– Recent SIC codes
– Recent accepted offers
– Local top40
27©MapR Technologies - Confidential
SolR
Indexer
SolR
Indexer
Solr
indexing
Cooccurrence
(Mahout)
Item meta-
data
Index
shards
Transactions
Web Views
Email
offers
28©MapR Technologies - Confidential
SolR
Indexer
SolR
Indexer
Solr
indexing
Cooccurrence
(Mahout)
Item meta-
data
Index
shards
Transactions
Web Views
Email
offers
Legacy code runs
directly in map-
reduce framework
29©MapR Technologies - Confidential
SolR
Indexer
SolR
Indexer
Solr
search
Web tier
Item meta-
data
Index
shards
User
history
30©MapR Technologies - Confidential
SolR
Indexer
SolR
Indexer
Solr
search
Web tier
Item meta-
data
Index
shards
User
history
SolrCloud runs
without change
via NFS
31©MapR Technologies - Confidential
Objective Results
 At a very large credit card company
 History is all transactions, all web interaction
 Processing time cut from 20 hours per day to 3
 Recommendation engine load time decreased from 8 hours to 3
minutes
32©MapR Technologies - Confidential
Story #3
Stable Learning
33©MapR Technologies - Confidential
The Theme and Setting
 A humble machine learning expert once lived in a small cubicle
 One day the CEO walked in and said
– Your machine recommended PINK WAFFLES to my wife!!!
– Tell me why it is suddenly doing this
34©MapR Technologies - Confidential
The Theme and Setting
 A humble machine learning expert once lived in a small cubicle
 One day the CEO walked in and said
– Your machine recommended PINK WAFFLES to my wife!!!
– Tell me why it is suddenly doing this
 The machine learning expert could say nothing because he could
not reproduce the conditions that model was trained with
 The CEO was not pleased
35©MapR Technologies - Confidential
Why?
36©MapR Technologies - Confidential
StormKafka
Twitter
Data Logger
Kafka
Cluster
Kafka
Cluster
Kafka
Cluster
Kafka
API
Web Service NAS
Web
Data
Hadoop
Flume
HDFS
Data
Web-
site
37©MapR Technologies - Confidential
StormKafka
Twitter
Data Logger
Kafka
Cluster
Kafka
Cluster
Kafka
Cluster
Kafka
API
Web Service NAS
Web
Data
Hadoop
Flume
HDFS
Data
Data arrives
continuously
Web-
site
Learning steps
can’t be tied to
delayed data
It can be delayed
arbitrarily
38©MapR Technologies - Confidential
The Essence of the Problem
 Coupling data arrival with modeling makes the data chain brittle
– Minor delays in data delivery will break modeling SLA’s
 But if data can arrive late and restate the past then we can’t easily
replicate a model build
 Existing data chains don’t support full bitemporal queries
39©MapR Technologies - Confidential
Twitter
MapR
Data Logger
Web-
site
Snap
Data
Modeling
Model
Model
Model
Model Mirror
Live System
40©MapR Technologies - Confidential
The New Story
 A humble machine learning expert once lived in a small cubicle
 One day the CEO walked in and said
– Your machine recommended PINK WAFFLES to my wife!!!
– Tell me why it is suddenly doing this
41©MapR Technologies - Confidential
The New Story
 A humble machine learning expert once lived in a small cubicle
 One day the CEO walked in and said
– Your machine recommended PINK WAFFLES to my wife!!!
– Tell me why it is suddenly doing this
 The machine learning expert could
– Pull out all previously deployed models
– Could exactly replicate any training run with any version of software
– Could point out that PINK WAFFLES were actually quite stylish
 The CEO was very pleased … he ran off to buy pink waffles
42©MapR Technologies - Confidential
Expect more from
Hadoop
43©MapR Technologies - Confidential
Expect MapR
44©MapR Technologies - Confidential
Contact me!
 tdunning@maprtech.com or tdunning@apache.org
 @ted_dunning
 Come to the MapR booth

Más contenido relacionado

La actualidad más candente

Build a Time Series Application with Apache Spark and Apache HBase
Build a Time Series Application with Apache Spark and Apache  HBaseBuild a Time Series Application with Apache Spark and Apache  HBase
Build a Time Series Application with Apache Spark and Apache HBaseCarol McDonald
 
Free Code Friday - Machine Learning with Apache Spark
Free Code Friday - Machine Learning with Apache SparkFree Code Friday - Machine Learning with Apache Spark
Free Code Friday - Machine Learning with Apache SparkMapR Technologies
 
Streaming patterns revolutionary architectures
Streaming patterns revolutionary architectures Streaming patterns revolutionary architectures
Streaming patterns revolutionary architectures Carol McDonald
 
Building multi-modal recommendation engines using search engines
Building multi-modal recommendation engines using search enginesBuilding multi-modal recommendation engines using search engines
Building multi-modal recommendation engines using search enginesTed Dunning
 
Breaking Down Analytical and Computational Barriers Across the Energy Industr...
Breaking Down Analytical and Computational Barriers Across the Energy Industr...Breaking Down Analytical and Computational Barriers Across the Energy Industr...
Breaking Down Analytical and Computational Barriers Across the Energy Industr...Spark Summit
 
What's new in Apache Mahout
What's new in Apache MahoutWhat's new in Apache Mahout
What's new in Apache MahoutTed Dunning
 
Streaming Architecture to Connect Everything (Including Hybrid Cloud) - Strat...
Streaming Architecture to Connect Everything (Including Hybrid Cloud) - Strat...Streaming Architecture to Connect Everything (Including Hybrid Cloud) - Strat...
Streaming Architecture to Connect Everything (Including Hybrid Cloud) - Strat...Mathieu Dumoulin
 
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...Carol McDonald
 
Storm users group real time hadoop
Storm users group real time hadoopStorm users group real time hadoop
Storm users group real time hadoopTed Dunning
 
Designing Modern Streaming Data Applications
Designing Modern Streaming Data ApplicationsDesigning Modern Streaming Data Applications
Designing Modern Streaming Data ApplicationsArun Kejariwal
 
What is the past future tense of data?
What is the past future tense of data?What is the past future tense of data?
What is the past future tense of data?Ted Dunning
 
Cognitive computing with big data, high tech and low tech approaches
Cognitive computing with big data, high tech and low tech approachesCognitive computing with big data, high tech and low tech approaches
Cognitive computing with big data, high tech and low tech approachesTed Dunning
 
Realtime Computation with Storm
Realtime Computation with StormRealtime Computation with Storm
Realtime Computation with Stormboorad
 
CEP - simplified streaming architecture - Strata Singapore 2016
CEP - simplified streaming architecture - Strata Singapore 2016CEP - simplified streaming architecture - Strata Singapore 2016
CEP - simplified streaming architecture - Strata Singapore 2016Mathieu Dumoulin
 
Sharing Sensitive Data Securely
Sharing Sensitive Data SecurelySharing Sensitive Data Securely
Sharing Sensitive Data SecurelyTed Dunning
 
Python time series analysis and visualization for self-driving cars
Python time series analysis and visualization for self-driving carsPython time series analysis and visualization for self-driving cars
Python time series analysis and visualization for self-driving carsAndreas Pawlik
 
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...Codemotion
 

La actualidad más candente (20)

Build a Time Series Application with Apache Spark and Apache HBase
Build a Time Series Application with Apache Spark and Apache  HBaseBuild a Time Series Application with Apache Spark and Apache  HBase
Build a Time Series Application with Apache Spark and Apache HBase
 
Apache Spark Overview
Apache Spark OverviewApache Spark Overview
Apache Spark Overview
 
Free Code Friday - Machine Learning with Apache Spark
Free Code Friday - Machine Learning with Apache SparkFree Code Friday - Machine Learning with Apache Spark
Free Code Friday - Machine Learning with Apache Spark
 
Streaming patterns revolutionary architectures
Streaming patterns revolutionary architectures Streaming patterns revolutionary architectures
Streaming patterns revolutionary architectures
 
Building multi-modal recommendation engines using search engines
Building multi-modal recommendation engines using search enginesBuilding multi-modal recommendation engines using search engines
Building multi-modal recommendation engines using search engines
 
Breaking Down Analytical and Computational Barriers Across the Energy Industr...
Breaking Down Analytical and Computational Barriers Across the Energy Industr...Breaking Down Analytical and Computational Barriers Across the Energy Industr...
Breaking Down Analytical and Computational Barriers Across the Energy Industr...
 
What's new in Apache Mahout
What's new in Apache MahoutWhat's new in Apache Mahout
What's new in Apache Mahout
 
Streaming Architecture to Connect Everything (Including Hybrid Cloud) - Strat...
Streaming Architecture to Connect Everything (Including Hybrid Cloud) - Strat...Streaming Architecture to Connect Everything (Including Hybrid Cloud) - Strat...
Streaming Architecture to Connect Everything (Including Hybrid Cloud) - Strat...
 
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...
Applying Machine learning to IOT: End to End Distributed Distributed Pipeline...
 
Storm users group real time hadoop
Storm users group real time hadoopStorm users group real time hadoop
Storm users group real time hadoop
 
Designing Modern Streaming Data Applications
Designing Modern Streaming Data ApplicationsDesigning Modern Streaming Data Applications
Designing Modern Streaming Data Applications
 
What is the past future tense of data?
What is the past future tense of data?What is the past future tense of data?
What is the past future tense of data?
 
MapR & Skytree:
MapR & Skytree: MapR & Skytree:
MapR & Skytree:
 
Spark graphx
Spark graphxSpark graphx
Spark graphx
 
Cognitive computing with big data, high tech and low tech approaches
Cognitive computing with big data, high tech and low tech approachesCognitive computing with big data, high tech and low tech approaches
Cognitive computing with big data, high tech and low tech approaches
 
Realtime Computation with Storm
Realtime Computation with StormRealtime Computation with Storm
Realtime Computation with Storm
 
CEP - simplified streaming architecture - Strata Singapore 2016
CEP - simplified streaming architecture - Strata Singapore 2016CEP - simplified streaming architecture - Strata Singapore 2016
CEP - simplified streaming architecture - Strata Singapore 2016
 
Sharing Sensitive Data Securely
Sharing Sensitive Data SecurelySharing Sensitive Data Securely
Sharing Sensitive Data Securely
 
Python time series analysis and visualization for self-driving cars
Python time series analysis and visualization for self-driving carsPython time series analysis and visualization for self-driving cars
Python time series analysis and visualization for self-driving cars
 
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
 

Destacado

Distributed real time stream processing- why and how
Distributed real time stream processing- why and howDistributed real time stream processing- why and how
Distributed real time stream processing- why and howPetr Zapletal
 
Manage tracability with Apache Atlas, a flexible metadata repository
Manage tracability with Apache Atlas, a flexible metadata repositoryManage tracability with Apache Atlas, a flexible metadata repository
Manage tracability with Apache Atlas, a flexible metadata repositorySynaltic Group
 
Strata NYC 2015: Sketching Big Data with Spark: randomized algorithms for lar...
Strata NYC 2015: Sketching Big Data with Spark: randomized algorithms for lar...Strata NYC 2015: Sketching Big Data with Spark: randomized algorithms for lar...
Strata NYC 2015: Sketching Big Data with Spark: randomized algorithms for lar...Databricks
 
Kafka Reliability - When it absolutely, positively has to be there
Kafka Reliability - When it absolutely, positively has to be thereKafka Reliability - When it absolutely, positively has to be there
Kafka Reliability - When it absolutely, positively has to be thereGwen (Chen) Shapira
 
Securing Hadoop with Apache Ranger
Securing Hadoop with Apache RangerSecuring Hadoop with Apache Ranger
Securing Hadoop with Apache RangerDataWorks Summit
 
Effective testing for spark programs Strata NY 2015
Effective testing for spark programs   Strata NY 2015Effective testing for spark programs   Strata NY 2015
Effective testing for spark programs Strata NY 2015Holden Karau
 
(BDT309) Data Science & Best Practices for Apache Spark on Amazon EMR
(BDT309) Data Science & Best Practices for Apache Spark on Amazon EMR(BDT309) Data Science & Best Practices for Apache Spark on Amazon EMR
(BDT309) Data Science & Best Practices for Apache Spark on Amazon EMRAmazon Web Services
 
(BDT303) Running Spark and Presto on the Netflix Big Data Platform
(BDT303) Running Spark and Presto on the Netflix Big Data Platform(BDT303) Running Spark and Presto on the Netflix Big Data Platform
(BDT303) Running Spark and Presto on the Netflix Big Data PlatformAmazon Web Services
 
Apache Atlas. Data Governance for Hadoop. Strata London 2015
Apache Atlas. Data Governance for Hadoop. Strata London 2015Apache Atlas. Data Governance for Hadoop. Strata London 2015
Apache Atlas. Data Governance for Hadoop. Strata London 2015Sean Roberts
 
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...DataWorks Summit/Hadoop Summit
 

Destacado (10)

Distributed real time stream processing- why and how
Distributed real time stream processing- why and howDistributed real time stream processing- why and how
Distributed real time stream processing- why and how
 
Manage tracability with Apache Atlas, a flexible metadata repository
Manage tracability with Apache Atlas, a flexible metadata repositoryManage tracability with Apache Atlas, a flexible metadata repository
Manage tracability with Apache Atlas, a flexible metadata repository
 
Strata NYC 2015: Sketching Big Data with Spark: randomized algorithms for lar...
Strata NYC 2015: Sketching Big Data with Spark: randomized algorithms for lar...Strata NYC 2015: Sketching Big Data with Spark: randomized algorithms for lar...
Strata NYC 2015: Sketching Big Data with Spark: randomized algorithms for lar...
 
Kafka Reliability - When it absolutely, positively has to be there
Kafka Reliability - When it absolutely, positively has to be thereKafka Reliability - When it absolutely, positively has to be there
Kafka Reliability - When it absolutely, positively has to be there
 
Securing Hadoop with Apache Ranger
Securing Hadoop with Apache RangerSecuring Hadoop with Apache Ranger
Securing Hadoop with Apache Ranger
 
Effective testing for spark programs Strata NY 2015
Effective testing for spark programs   Strata NY 2015Effective testing for spark programs   Strata NY 2015
Effective testing for spark programs Strata NY 2015
 
(BDT309) Data Science & Best Practices for Apache Spark on Amazon EMR
(BDT309) Data Science & Best Practices for Apache Spark on Amazon EMR(BDT309) Data Science & Best Practices for Apache Spark on Amazon EMR
(BDT309) Data Science & Best Practices for Apache Spark on Amazon EMR
 
(BDT303) Running Spark and Presto on the Netflix Big Data Platform
(BDT303) Running Spark and Presto on the Netflix Big Data Platform(BDT303) Running Spark and Presto on the Netflix Big Data Platform
(BDT303) Running Spark and Presto on the Netflix Big Data Platform
 
Apache Atlas. Data Governance for Hadoop. Strata London 2015
Apache Atlas. Data Governance for Hadoop. Strata London 2015Apache Atlas. Data Governance for Hadoop. Strata London 2015
Apache Atlas. Data Governance for Hadoop. Strata London 2015
 
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
Top Three Big Data Governance Issues and How Apache ATLAS resolves it for the...
 

Similar a Big Data Paris

Powering the "As it Happens" Business
Powering the "As it Happens" BusinessPowering the "As it Happens" Business
Powering the "As it Happens" BusinessMapR Technologies
 
Achieving Business Value by Fusing Hadoop and Corporate Data
Achieving Business Value by Fusing Hadoop and Corporate DataAchieving Business Value by Fusing Hadoop and Corporate Data
Achieving Business Value by Fusing Hadoop and Corporate DataInside Analysis
 
Key Considerations for Putting Hadoop in Production SlideShare
Key Considerations for Putting Hadoop in Production SlideShareKey Considerations for Putting Hadoop in Production SlideShare
Key Considerations for Putting Hadoop in Production SlideShareMapR Technologies
 
Steve Jenkins - Business Opportunities for Big Data in the Enterprise
Steve Jenkins - Business Opportunities for Big Data in the Enterprise Steve Jenkins - Business Opportunities for Big Data in the Enterprise
Steve Jenkins - Business Opportunities for Big Data in the Enterprise WeAreEsynergy
 
The power of hadoop in business
The power of hadoop in businessThe power of hadoop in business
The power of hadoop in businessMapR Technologies
 
Predictive Analytics San Diego
Predictive Analytics San DiegoPredictive Analytics San Diego
Predictive Analytics San DiegoMapR Technologies
 
Big Data Solutions on Cloud – The Way Forward by Kiththi Perera SLT
Big Data Solutions on Cloud – The Way Forward by Kiththi Perera SLTBig Data Solutions on Cloud – The Way Forward by Kiththi Perera SLT
Big Data Solutions on Cloud – The Way Forward by Kiththi Perera SLTKiththi Perera
 
Big data solutions on cloud – the way forward
Big data solutions on cloud – the way forwardBig data solutions on cloud – the way forward
Big data solutions on cloud – the way forwardKiththi Perera
 
How Experian increased insights with Hadoop
How Experian increased insights with HadoopHow Experian increased insights with Hadoop
How Experian increased insights with HadoopPrecisely
 
Network and IT Ops Series: Build Production Solutions
Network and IT Ops Series: Build Production Solutions Network and IT Ops Series: Build Production Solutions
Network and IT Ops Series: Build Production Solutions Neo4j
 
Monitizing Big Data at Telecom Service Providers
Monitizing Big Data at Telecom Service ProvidersMonitizing Big Data at Telecom Service Providers
Monitizing Big Data at Telecom Service ProvidersDataWorks Summit
 
th1330-1410effectenbeurszaal4-3v2-140424180955-phpapp01 (1).pdf
th1330-1410effectenbeurszaal4-3v2-140424180955-phpapp01 (1).pdfth1330-1410effectenbeurszaal4-3v2-140424180955-phpapp01 (1).pdf
th1330-1410effectenbeurszaal4-3v2-140424180955-phpapp01 (1).pdfTarekHassan840678
 
Monetizing Big Data at Telecom Service Providers
Monetizing Big Data at Telecom Service ProvidersMonetizing Big Data at Telecom Service Providers
Monetizing Big Data at Telecom Service ProvidersDataWorks Summit
 
Sap Leonardo - what is it, and why would I want one?
Sap Leonardo - what is it, and why would I want one?Sap Leonardo - what is it, and why would I want one?
Sap Leonardo - what is it, and why would I want one?Tom Raftery
 
Getting It Right Exactly Once: Principles for Streaming Architectures
Getting It Right Exactly Once: Principles for Streaming ArchitecturesGetting It Right Exactly Once: Principles for Streaming Architectures
Getting It Right Exactly Once: Principles for Streaming ArchitecturesSingleStore
 
Multi-Cloud Breaks IT Ops: Best Practices to De-Risk Your Cloud Strategy
Multi-Cloud Breaks IT Ops: Best Practices to De-Risk Your Cloud StrategyMulti-Cloud Breaks IT Ops: Best Practices to De-Risk Your Cloud Strategy
Multi-Cloud Breaks IT Ops: Best Practices to De-Risk Your Cloud StrategyThousandEyes
 
Serhii Kholodniuk: What you need to know, before migrating data platform to G...
Serhii Kholodniuk: What you need to know, before migrating data platform to G...Serhii Kholodniuk: What you need to know, before migrating data platform to G...
Serhii Kholodniuk: What you need to know, before migrating data platform to G...Lviv Startup Club
 
Polyvalent recommendations
Polyvalent recommendationsPolyvalent recommendations
Polyvalent recommendationsTed Dunning
 

Similar a Big Data Paris (20)

Powering the "As it Happens" Business
Powering the "As it Happens" BusinessPowering the "As it Happens" Business
Powering the "As it Happens" Business
 
Achieving Business Value by Fusing Hadoop and Corporate Data
Achieving Business Value by Fusing Hadoop and Corporate DataAchieving Business Value by Fusing Hadoop and Corporate Data
Achieving Business Value by Fusing Hadoop and Corporate Data
 
Key Considerations for Putting Hadoop in Production SlideShare
Key Considerations for Putting Hadoop in Production SlideShareKey Considerations for Putting Hadoop in Production SlideShare
Key Considerations for Putting Hadoop in Production SlideShare
 
Steve Jenkins - Business Opportunities for Big Data in the Enterprise
Steve Jenkins - Business Opportunities for Big Data in the Enterprise Steve Jenkins - Business Opportunities for Big Data in the Enterprise
Steve Jenkins - Business Opportunities for Big Data in the Enterprise
 
The power of hadoop in business
The power of hadoop in businessThe power of hadoop in business
The power of hadoop in business
 
Predictive Analytics San Diego
Predictive Analytics San DiegoPredictive Analytics San Diego
Predictive Analytics San Diego
 
Big Data Solutions on Cloud – The Way Forward by Kiththi Perera SLT
Big Data Solutions on Cloud – The Way Forward by Kiththi Perera SLTBig Data Solutions on Cloud – The Way Forward by Kiththi Perera SLT
Big Data Solutions on Cloud – The Way Forward by Kiththi Perera SLT
 
Big data solutions on cloud – the way forward
Big data solutions on cloud – the way forwardBig data solutions on cloud – the way forward
Big data solutions on cloud – the way forward
 
How Experian increased insights with Hadoop
How Experian increased insights with HadoopHow Experian increased insights with Hadoop
How Experian increased insights with Hadoop
 
Network and IT Ops Series: Build Production Solutions
Network and IT Ops Series: Build Production Solutions Network and IT Ops Series: Build Production Solutions
Network and IT Ops Series: Build Production Solutions
 
Monitizing Big Data at Telecom Service Providers
Monitizing Big Data at Telecom Service ProvidersMonitizing Big Data at Telecom Service Providers
Monitizing Big Data at Telecom Service Providers
 
th1330-1410effectenbeurszaal4-3v2-140424180955-phpapp01 (1).pdf
th1330-1410effectenbeurszaal4-3v2-140424180955-phpapp01 (1).pdfth1330-1410effectenbeurszaal4-3v2-140424180955-phpapp01 (1).pdf
th1330-1410effectenbeurszaal4-3v2-140424180955-phpapp01 (1).pdf
 
Monetizing Big Data at Telecom Service Providers
Monetizing Big Data at Telecom Service ProvidersMonetizing Big Data at Telecom Service Providers
Monetizing Big Data at Telecom Service Providers
 
Sap Leonardo - what is it, and why would I want one?
Sap Leonardo - what is it, and why would I want one?Sap Leonardo - what is it, and why would I want one?
Sap Leonardo - what is it, and why would I want one?
 
Polyvalent Recommendations
Polyvalent RecommendationsPolyvalent Recommendations
Polyvalent Recommendations
 
Getting It Right Exactly Once: Principles for Streaming Architectures
Getting It Right Exactly Once: Principles for Streaming ArchitecturesGetting It Right Exactly Once: Principles for Streaming Architectures
Getting It Right Exactly Once: Principles for Streaming Architectures
 
Multi-Cloud Breaks IT Ops: Best Practices to De-Risk Your Cloud Strategy
Multi-Cloud Breaks IT Ops: Best Practices to De-Risk Your Cloud StrategyMulti-Cloud Breaks IT Ops: Best Practices to De-Risk Your Cloud Strategy
Multi-Cloud Breaks IT Ops: Best Practices to De-Risk Your Cloud Strategy
 
Serhii Kholodniuk: What you need to know, before migrating data platform to G...
Serhii Kholodniuk: What you need to know, before migrating data platform to G...Serhii Kholodniuk: What you need to know, before migrating data platform to G...
Serhii Kholodniuk: What you need to know, before migrating data platform to G...
 
Expect More from Hadoop
Expect More from Hadoop Expect More from Hadoop
Expect More from Hadoop
 
Polyvalent recommendations
Polyvalent recommendationsPolyvalent recommendations
Polyvalent recommendations
 

Más de Ted Dunning

Dunning - SIGMOD - Data Economy.pptx
Dunning - SIGMOD - Data Economy.pptxDunning - SIGMOD - Data Economy.pptx
Dunning - SIGMOD - Data Economy.pptxTed Dunning
 
How to Get Going with Kubernetes
How to Get Going with KubernetesHow to Get Going with Kubernetes
How to Get Going with KubernetesTed Dunning
 
Progress for big data in Kubernetes
Progress for big data in KubernetesProgress for big data in Kubernetes
Progress for big data in KubernetesTed Dunning
 
Anomaly Detection: How to find what you didn’t know to look for
Anomaly Detection: How to find what you didn’t know to look forAnomaly Detection: How to find what you didn’t know to look for
Anomaly Detection: How to find what you didn’t know to look forTed Dunning
 
Streaming Architecture including Rendezvous for Machine Learning
Streaming Architecture including Rendezvous for Machine LearningStreaming Architecture including Rendezvous for Machine Learning
Streaming Architecture including Rendezvous for Machine LearningTed Dunning
 
Machine Learning Logistics
Machine Learning LogisticsMachine Learning Logistics
Machine Learning LogisticsTed Dunning
 
Tensor Abuse - how to reuse machine learning frameworks
Tensor Abuse - how to reuse machine learning frameworksTensor Abuse - how to reuse machine learning frameworks
Tensor Abuse - how to reuse machine learning frameworksTed Dunning
 
Machine Learning logistics
Machine Learning logisticsMachine Learning logistics
Machine Learning logisticsTed Dunning
 
Finding Changes in Real Data
Finding Changes in Real DataFinding Changes in Real Data
Finding Changes in Real DataTed Dunning
 
Real time-hadoop
Real time-hadoopReal time-hadoop
Real time-hadoopTed Dunning
 
Cheap learning-dunning-9-18-2015
Cheap learning-dunning-9-18-2015Cheap learning-dunning-9-18-2015
Cheap learning-dunning-9-18-2015Ted Dunning
 
Real-time Puppies and Ponies - Evolving Indicator Recommendations in Real-time
Real-time Puppies and Ponies - Evolving Indicator Recommendations in Real-timeReal-time Puppies and Ponies - Evolving Indicator Recommendations in Real-time
Real-time Puppies and Ponies - Evolving Indicator Recommendations in Real-timeTed Dunning
 
How the Internet of Things is Turning the Internet Upside Down
How the Internet of Things is Turning the Internet Upside DownHow the Internet of Things is Turning the Internet Upside Down
How the Internet of Things is Turning the Internet Upside DownTed Dunning
 
Apache Kylin - OLAP Cubes for SQL on Hadoop
Apache Kylin - OLAP Cubes for SQL on HadoopApache Kylin - OLAP Cubes for SQL on Hadoop
Apache Kylin - OLAP Cubes for SQL on HadoopTed Dunning
 
Dunning time-series-2015
Dunning time-series-2015Dunning time-series-2015
Dunning time-series-2015Ted Dunning
 
Doing-the-impossible
Doing-the-impossibleDoing-the-impossible
Doing-the-impossibleTed Dunning
 
Anomaly Detection - New York Machine Learning
Anomaly Detection - New York Machine LearningAnomaly Detection - New York Machine Learning
Anomaly Detection - New York Machine LearningTed Dunning
 
Recommendation Techn
Recommendation TechnRecommendation Techn
Recommendation TechnTed Dunning
 
Possible Visions for Mahout 1.0
Possible Visions for Mahout 1.0Possible Visions for Mahout 1.0
Possible Visions for Mahout 1.0Ted Dunning
 

Más de Ted Dunning (20)

Dunning - SIGMOD - Data Economy.pptx
Dunning - SIGMOD - Data Economy.pptxDunning - SIGMOD - Data Economy.pptx
Dunning - SIGMOD - Data Economy.pptx
 
How to Get Going with Kubernetes
How to Get Going with KubernetesHow to Get Going with Kubernetes
How to Get Going with Kubernetes
 
Progress for big data in Kubernetes
Progress for big data in KubernetesProgress for big data in Kubernetes
Progress for big data in Kubernetes
 
Anomaly Detection: How to find what you didn’t know to look for
Anomaly Detection: How to find what you didn’t know to look forAnomaly Detection: How to find what you didn’t know to look for
Anomaly Detection: How to find what you didn’t know to look for
 
Streaming Architecture including Rendezvous for Machine Learning
Streaming Architecture including Rendezvous for Machine LearningStreaming Architecture including Rendezvous for Machine Learning
Streaming Architecture including Rendezvous for Machine Learning
 
Machine Learning Logistics
Machine Learning LogisticsMachine Learning Logistics
Machine Learning Logistics
 
Tensor Abuse - how to reuse machine learning frameworks
Tensor Abuse - how to reuse machine learning frameworksTensor Abuse - how to reuse machine learning frameworks
Tensor Abuse - how to reuse machine learning frameworks
 
Machine Learning logistics
Machine Learning logisticsMachine Learning logistics
Machine Learning logistics
 
T digest-update
T digest-updateT digest-update
T digest-update
 
Finding Changes in Real Data
Finding Changes in Real DataFinding Changes in Real Data
Finding Changes in Real Data
 
Real time-hadoop
Real time-hadoopReal time-hadoop
Real time-hadoop
 
Cheap learning-dunning-9-18-2015
Cheap learning-dunning-9-18-2015Cheap learning-dunning-9-18-2015
Cheap learning-dunning-9-18-2015
 
Real-time Puppies and Ponies - Evolving Indicator Recommendations in Real-time
Real-time Puppies and Ponies - Evolving Indicator Recommendations in Real-timeReal-time Puppies and Ponies - Evolving Indicator Recommendations in Real-time
Real-time Puppies and Ponies - Evolving Indicator Recommendations in Real-time
 
How the Internet of Things is Turning the Internet Upside Down
How the Internet of Things is Turning the Internet Upside DownHow the Internet of Things is Turning the Internet Upside Down
How the Internet of Things is Turning the Internet Upside Down
 
Apache Kylin - OLAP Cubes for SQL on Hadoop
Apache Kylin - OLAP Cubes for SQL on HadoopApache Kylin - OLAP Cubes for SQL on Hadoop
Apache Kylin - OLAP Cubes for SQL on Hadoop
 
Dunning time-series-2015
Dunning time-series-2015Dunning time-series-2015
Dunning time-series-2015
 
Doing-the-impossible
Doing-the-impossibleDoing-the-impossible
Doing-the-impossible
 
Anomaly Detection - New York Machine Learning
Anomaly Detection - New York Machine LearningAnomaly Detection - New York Machine Learning
Anomaly Detection - New York Machine Learning
 
Recommendation Techn
Recommendation TechnRecommendation Techn
Recommendation Techn
 
Possible Visions for Mahout 1.0
Possible Visions for Mahout 1.0Possible Visions for Mahout 1.0
Possible Visions for Mahout 1.0
 

Último

QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesBernd Ruecker
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integrationmarketing932765
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024TopCSSGallery
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 

Último (20)

QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024Top 10 Hubspot Development Companies in 2024
Top 10 Hubspot Development Companies in 2024
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 

Big Data Paris

  • 1. 1©MapR Technologies - Confidential Expect More from Hadoop
  • 2. 2©MapR Technologies - Confidential Introducing MapR MapR offers the technology leading distribution for Hadoop
  • 3. 3©MapR Technologies - Confidential The Industry-Leaders Choose MapR in the Cloud Google chose MapR to provide Hadoop on Google Compute Engine Amazon EMR is the largest Hadoop provider in revenue and # of clusters
  • 4. 4©MapR Technologies - Confidential MapR Supports Broad Set of Use Cases  Log analysis  HBase  Customer targeting  Social media analysis  Customer Revenue Analytics  ETL Offload  Advertising exchange analysis and optimization  Clickstream Analysis  Quality profiling/field failure analysis  Customer Sentiment  Network Analytics  Monitors and measures behavior of online shoppers  Fraud Detection  Channel analytics  Customer Behavior Analysis  Brand Monitoring  Customer targeting  Viewer Behavioral analytics  Recommendation Engine  Family tree connections  Intrusion detection & prevention  Forensic analysis  Global threat analytics  Virus analysis  Patient care monitoring Leading Retailer  Recommendation Engine  Fraud detection and Prevention Leading Bank
  • 5. 5©MapR Technologies - Confidential Introducing Hadoop Hadoop is deployed because a) big data b) fast data c) rapidly changing data
  • 6. 6©MapR Technologies - Confidential Introducing Hadoop Hadoop is deployed because a) big data b) fast data c) rapidly changing data
  • 7. 7©MapR Technologies - Confidential Introducing Change Changing data implies a need for integration
  • 8. 8©MapR Technologies - Confidential Introducing Change Changing data implies a need for integration If you copy, the data will change before you finish.
  • 9. 9©MapR Technologies - Confidential Controlling Change Changing data implies a need for stabilization
  • 10. 10©MapR Technologies - Confidential Controlling Change Changing data implies a need for stabilization Long running analyses must have stable data
  • 11. 11©MapR Technologies - Confidential The Story Can Now be Told Here are three true stories about how Hadoop integration pays off
  • 12. 12©MapR Technologies - Confidential Story #1 ETL Off-load
  • 13. 13©MapR Technologies - Confidential The Problem  Major telecom vendor  Key step in billing pipeline handled by data warehouse (EDW)  EDW at maximum capacity  Multiple rounds of software optimization already done  Revenue limiting (= career limiting) bottleneck
  • 14. 14©MapR Technologies - Confidential ETL CDR billing records Billing reports Data Warehouse Customer bills Original Flow
  • 15. 15©MapR Technologies - Confidential ETL CDR billing records Billing reports Data Warehouse Customer bills Original Flow 70% of total load <10% of total code Import by bulk load from NFS
  • 16. 16©MapR Technologies - Confidential ETL CDR billing records Billing reports Data Warehouse Customer billing With ETL Offload Import written to MapR via NFS Bulk load via NFS from MapR
  • 17. 17©MapR Technologies - Confidential Simplified Analysis – EDW Strategy  70% of EDW consumed by ETL processing  EDW direct hardware cost is approximately $30 million CAPEX, 12 million OPEX  Additional EDW only increases capacity by 50% due to poor division of labor
  • 18. 18©MapR Technologies - Confidential Simplified Analysis – MapR Strategy  Hardware + MapR cost ~ $1.5 million  ETL replacement development costs ~ $1.5 million  Result is 3x performance increase
  • 19. 19©MapR Technologies - Confidential Price Performance  EDW strategy – 1.5 x performance – $30 million  MapR Strategy – 3 x performance – $3 million  20x cost/performance advantage for MapR strategy
  • 20. 20©MapR Technologies - Confidential Story #2 Search Abuse
  • 21. 21©MapR Technologies - Confidential The Problem  Build a high performance recommendation – Use all kinds of available data  Deploy it to production – Must have efficient deployment
  • 22. 22©MapR Technologies - Confidential Input Data  User transactions – user id, merchant id – SIC code, amount  Offer transactions – user id, offer id – vendor id, merchant id’s, – offers, views, accepts
  • 23. 23©MapR Technologies - Confidential Input Data  User transactions – user id, merchant id – SIC code, amount  Offer transactions – user id, offer id – vendor id, merchant id’s, – offers, views, accepts Import data via standard interfaces from log files, databases, direct feeds Find anomalous indicators of behavior
  • 24. 24©MapR Technologies - Confidential Search-based Recommendations  Sample document – Merchant Id – Field for text description – Phone – Address – Location
  • 25. 25©MapR Technologies - Confidential Search-based Recommendations  Sample “document” – Merchant Id – Field for text description – Phone – Address – Location – Indicator merchant id’s – Indicator industry (SIC) id’s – Indicator offers – Indicator text – Local top40
  • 26. 26©MapR Technologies - Confidential Search-based Recommendations  Sample “document” – Merchant Id – Field for text description – Phone – Address – Location – Indicator merchant id’s – Indicator industry (SIC) id’s – Indicator offers – Indicator text – Local top40  User History (query) – Current location – Recent merchant descriptions – Recent merchant id’s – Recent SIC codes – Recent accepted offers – Local top40
  • 27. 27©MapR Technologies - Confidential SolR Indexer SolR Indexer Solr indexing Cooccurrence (Mahout) Item meta- data Index shards Transactions Web Views Email offers
  • 28. 28©MapR Technologies - Confidential SolR Indexer SolR Indexer Solr indexing Cooccurrence (Mahout) Item meta- data Index shards Transactions Web Views Email offers Legacy code runs directly in map- reduce framework
  • 29. 29©MapR Technologies - Confidential SolR Indexer SolR Indexer Solr search Web tier Item meta- data Index shards User history
  • 30. 30©MapR Technologies - Confidential SolR Indexer SolR Indexer Solr search Web tier Item meta- data Index shards User history SolrCloud runs without change via NFS
  • 31. 31©MapR Technologies - Confidential Objective Results  At a very large credit card company  History is all transactions, all web interaction  Processing time cut from 20 hours per day to 3  Recommendation engine load time decreased from 8 hours to 3 minutes
  • 32. 32©MapR Technologies - Confidential Story #3 Stable Learning
  • 33. 33©MapR Technologies - Confidential The Theme and Setting  A humble machine learning expert once lived in a small cubicle  One day the CEO walked in and said – Your machine recommended PINK WAFFLES to my wife!!! – Tell me why it is suddenly doing this
  • 34. 34©MapR Technologies - Confidential The Theme and Setting  A humble machine learning expert once lived in a small cubicle  One day the CEO walked in and said – Your machine recommended PINK WAFFLES to my wife!!! – Tell me why it is suddenly doing this  The machine learning expert could say nothing because he could not reproduce the conditions that model was trained with  The CEO was not pleased
  • 35. 35©MapR Technologies - Confidential Why?
  • 36. 36©MapR Technologies - Confidential StormKafka Twitter Data Logger Kafka Cluster Kafka Cluster Kafka Cluster Kafka API Web Service NAS Web Data Hadoop Flume HDFS Data Web- site
  • 37. 37©MapR Technologies - Confidential StormKafka Twitter Data Logger Kafka Cluster Kafka Cluster Kafka Cluster Kafka API Web Service NAS Web Data Hadoop Flume HDFS Data Data arrives continuously Web- site Learning steps can’t be tied to delayed data It can be delayed arbitrarily
  • 38. 38©MapR Technologies - Confidential The Essence of the Problem  Coupling data arrival with modeling makes the data chain brittle – Minor delays in data delivery will break modeling SLA’s  But if data can arrive late and restate the past then we can’t easily replicate a model build  Existing data chains don’t support full bitemporal queries
  • 39. 39©MapR Technologies - Confidential Twitter MapR Data Logger Web- site Snap Data Modeling Model Model Model Model Mirror Live System
  • 40. 40©MapR Technologies - Confidential The New Story  A humble machine learning expert once lived in a small cubicle  One day the CEO walked in and said – Your machine recommended PINK WAFFLES to my wife!!! – Tell me why it is suddenly doing this
  • 41. 41©MapR Technologies - Confidential The New Story  A humble machine learning expert once lived in a small cubicle  One day the CEO walked in and said – Your machine recommended PINK WAFFLES to my wife!!! – Tell me why it is suddenly doing this  The machine learning expert could – Pull out all previously deployed models – Could exactly replicate any training run with any version of software – Could point out that PINK WAFFLES were actually quite stylish  The CEO was very pleased … he ran off to buy pink waffles
  • 42. 42©MapR Technologies - Confidential Expect more from Hadoop
  • 43. 43©MapR Technologies - Confidential Expect MapR
  • 44. 44©MapR Technologies - Confidential Contact me!  tdunning@maprtech.com or tdunning@apache.org  @ted_dunning  Come to the MapR booth

Notas del editor

  1. MapR has been selected by two of the companies most experienced with MapReduce technology which is a testament to the technology advanges of MapR’s distribution. Amazon through its Elastic MapReduce service (EMR) hosted over 2 million clusters in the past year. Amazon selected MapR to complement EMR as the only commercial Hadoop distribution being offered, sold and supported as a service by Amazon to its customers. MapR was also selected by Google – the pioneer of MapReduce and the company whose white paper on MapReduce inspired the creation of Hadoop – has also selected MapR to make our distribution available on Google Compute Engine. Hadoop in the cloud makes a great deal of sense: the elastic resource allocation that cloud computing is premised on works well for cluster-based data processing infrastructure used on varying analyses and data sets of indeterminate size. MapR has unique features such as mirroring between sites and multi-tenancy support that further enhance cloud deployments
  2. MapR is used today across industries. We have 10 of the Fortune 100 that are using MapR in production. We have leading web 2.0 properties such as leading digital advertising platforms, using MapR.These customers are using MapR in production for a variety of use cases. Examples include one of the largest credit card issuers in the world that has standardized on MapR for fraud and consumer targeting applications.Other examples include a major health care group,national cyber security, and one of the largest retailers in the world. These are all provided by MapR’s complete distribution for Apache Hadoop