SlideShare una empresa de Scribd logo
1 de 53
1©MapR Technologies - Confidential
Old and New Building Blocks
Come Together For Big Data
2©MapR Technologies - Confidential
 Contact:
tdunning@maprtech.com
@ted_dunning
 Slides and such
http://slideshare.net/tdunning
 Hash tags: #mapr #goto #d3 #node
3©MapR Technologies - Confidential
Embarrassment of Riches
 d3.js allows really pretty pictures
 node.js allows simple (not just web) servers
 Storm does real-time
 Hadoop does big data
 d3 allows very cool visualizations
4©MapR Technologies - Confidential
D3 demo
5©MapR Technologies - Confidential
node demo
6©MapR Technologies - Confidential
Hadoop
demo
7©MapR Technologies - Confidential
But …
 Web camp
– everything is a service with a URL or a DOM
 Big data camp
– non-traditional file systems
 Everybody else
– files and databases
 They don’t like to talk to each other
8©MapR Technologies - Confidential
Why Not Tiered Architectures?
 Tiered architectures
– translations between services and cultures
– standard corporate answer
 Feels like molasses
9©MapR Technologies - Confidential
The Vision
 Integrate
– multiple computing paradigms
– many computing communities
 How?
– common storage, queuing and data platforms
10©MapR Technologies - Confidential
For Example, …
 Incoming documents with text
– store in file-based queues
– index in real-time using Storm and Solr
– add initial engagement class, “don’t-know”
 Search for documents using original text
– add random noise, small for well understood docs, large for “don’t-know”
docs
 Record engagement
11©MapR Technologies - Confidential
Add Analysis
 Process engagement logs
– item-item cooccurrence
– user-item histories
 Update search index
– indicator items
– decrease uncertainty on well understood docs
 Update user profile
– item history
12©MapR Technologies - Confidential
Search Again
 Now searches use recent views + text
– recent views query indicator fields
– text queries normal text data
– add noise as appropriate
13©MapR Technologies - Confidential
And Draw a Picture
 Searches and clicks can be logged
– real-time metrics
– real-time trending topics
 What’s hot, what’s not
 Popular searches
 Document clusters
 Word clouds
14©MapR Technologies - Confidential
In Pictures
15©MapR Technologies - Confidential
In Pictures
Doc
queue
Search
index
Real-time
indexing
Doc
sources
16©MapR Technologies - Confidential
In Pictures
Doc
queue
Search
index
Real-time
indexing
Doc
sources
User
queries
Search
engine
17©MapR Technologies - Confidential
In Pictures
Doc
queue
Search
index
Real-time
indexing
Doc
sources
User
queries
Search
engine Logs
Recommendation
analysis
18©MapR Technologies - Confidential
In Pictures
Doc
queue
Search
index
Real-time
indexing
Doc
sources
User
queries
Search
engine Logs
Recommendation
analysis
Usage
analysis
RenderingAdmin
queries
19©MapR Technologies - Confidential
Which Technology?
Doc
queue
Search
index
Real-time
indexing
Doc
sources
User
queries
Search
engine Logs
Recommendation
analysis
Usage
analysis
Admin
queries
Rendering
Storm/node
Solr
MapR
D3/node
Other
20©MapR Technologies - Confidential
Yeah, But …
 This isn’t as easy as it looks
 Take the real-time / long-time part
21©MapR Technologies - Confidential
t
now
Hadoop is Not Very Real-time
Unprocessed
Data
Fully
processed
Latest full
period
Hadoop job
takes this
long for this
data
22©MapR Technologies - Confidential
t
now
Hadoop works
great back here
Storm
works
here
Real-time and Long-time together
Blended
view
Blended
view
Blended
View
23©MapR Technologies - Confidential
SolR
Indexer
SolR
Indexer
Solr
indexing
Cooccurrence
(Mahout)
Item meta-
data
Index
shards
Complete
history
24©MapR Technologies - Confidential
SolR
Indexer
SolR
Indexer
Solr
search
Web tier
Item meta-
data
Index
shards
User
history
25©MapR Technologies - Confidential
Users
Catcher Storm
Topic
Queue
Web-server
http
Web
Data
MapR
26©MapR Technologies - Confidential
Closer Look – Catcher Protocol
Data
Sources
Catcher
Cluster
Catcher
Cluster
Data
Sources
The data sources and catchers
communicate with a very simple
protocol.
Hello() => list of catchers
Log(topic,message) =>
(OK|FAIL, redirect-to-catcher)
27©MapR Technologies - Confidential
Closer Look – Catcher Queues
Catcher
Cluster
Catcher
Cluster
The catchers forward log requests
to the correct catcher and return
that host in the reply to allow the
client to avoid the extra hop.
Each topic file is appended by
exactly one catcher.
Topic files are kept in shared file
storage.
Topic
File
Topic
File
28©MapR Technologies - Confidential
Closer Look – ProtoSpout
The ProtoSpout tails the topic files,
parses log records into tuples and
injects them into the Storm
topology.
Last fully acked position stored in
shared file system.
Topic
File
Topic
File
ProtoSpout
29©MapR Technologies - Confidential
Yeah, But …
 What was that about adding noise in scoring?
 Why would I do that??
 Is there a simple answer?
30©MapR Technologies - Confidential
Thompson Sampling
 Select each shell according to the probability that it is the best
 Probability that it is the best can be computed using posterior
 But I promised a simple answer
P(i is best) = I E[ri |q]= max
j
E[rj |q]
é
ëê
ù
ûúò P(q | D) dq
31©MapR Technologies - Confidential
Thompson Sampling – Take 2
 Sample θ
 Pick i to maximize reward
 Record result from using i
q ~P(q | D)
i = argmax
j
E[r |q]
32©MapR Technologies - Confidential
Nearly Forgotten until Recently
 Citations for Thompson sampling
33©MapR Technologies - Confidential
Bayesian Bandit for the Search
 Compute distributions based on data so far
 Sample scores s1, s2 …
– based on actual score
– plus per doc noise from these distributions
 Rank docs by si
 Lemma 1: The probability of showing doc i at first position will
match the probability it is the best
 Lemma 2: This is as good as it gets
34©MapR Technologies - Confidential
And it works!
11000 100 200 300 400 500 600 700 800 900 1000
0.12
0
0.01
0.02
0.03
0.04
0.05
0.06
0.07
0.08
0.09
0.1
0.11
n
regret
ε- greedy, ε = 0.05
Bayesian Bandit with Gamma- Normal
35©MapR Technologies - Confidential
Yeah, But …
 Isn’t recommendations complicated?
 How can I implement this?
36©MapR Technologies - Confidential
Recommendation Basics
 History:
User Thing
1 3
2 4
3 4
2 3
3 2
1 1
2 1
37©MapR Technologies - Confidential
Recommendation Basics
 History as matrix:
 (t1, t3) cooccur 2 times,
 (t1, t4) once,
 (t2, t4) once,
 (t3, t4) once
t1 t2 t3 t4
u1 1 0 1 0
u2 1 0 1 1
u3 0 1 0 1
38©MapR Technologies - Confidential
A Quick Simplification
 Users who do h
 Also do r
Ah
AT
Ah( )
AT
A( )h
User-centric recommendations
Item-centric recommendations
39©MapR Technologies - Confidential
Recommendation Basics
 Coocurrence
t1 t2 t3 t4
t1 2 0 2 1
t2 0 1 0 1
t3 2 0 1 1
t4 1 1 1 2
40©MapR Technologies - Confidential
Problems with Raw Cooccurrence
 Very popular items co-occur with everything
– Welcome document
– Elevator music
 That isn’t interesting
– We want anomalous cooccurrence
41©MapR Technologies - Confidential
Recommendation Basics
 Coocurrence
t1 t2 t3 t4
t1 2 0 2 1
t2 0 1 0 1
t3 2 0 1 1
t4 1 1 1 2
t3 not t3
t1 2 1
not t1 1 1
42©MapR Technologies - Confidential
Spot the Anomaly
 Root LLR is roughly like standard deviations
A not A
B 13 1000
not B 1000 100,000
A not A
B 1 0
not B 0 2
A not A
B 1 0
not B 0 10,000
A not A
B 10 0
not B 0 100,000
0.44 0.98
2.26 7.15
43©MapR Technologies - Confidential
Root LLR Details
 In R
entropy = function(k) {
-sum(k*log((k==0)+(k/sum(k))))
}
rootLLr = function(k) {
sign = …
sign * sqrt(
(entropy(rowSums(k))+entropy(colSums(k))
- entropy(k))/2)
}
 Like sqrt(mutual information * N/2)
See http://bit.ly/16DvLVK
44©MapR Technologies - Confidential
Threshold by Score
 Coocurrence
t1 t2 t3 t4
t1 2 0 2 1
t2 0 1 0 1
t3 2 0 1 1
t4 1 1 1 2
45©MapR Technologies - Confidential
Threshold by Score
 Significant cooccurrence => Indicators
t1 t2 t3 t4
t1 1 0 0 1
t2 0 1 0 1
t3 0 0 1 1
t4 1 0 0 1
46©MapR Technologies - Confidential
Yeah, But …
 Why go to all this trouble?
 Does it really help?
47©MapR Technologies - Confidential
Real-life example
48©MapR Technologies - Confidential
The Real Life Issues
 Exploration
 Diversity
 Speed
 Not the last percent
49©MapR Technologies - Confidential
The Second Page
50©MapR Technologies - Confidential
Make it Worse to Make It Better
 Add noise to rank
1 2 8 7 6 3 5 4 10 13 21 18 12 9 14 24 34 28 32 17
11 27 40 30 41 49 16 15 35 23 19 22 26 31 20 43 25 29 33 62
38 60 74 53 36 37 39 70 45 44 46 71 42 69 47 63 52 57 51 48
 Results are worse today
 But better tomorrow
51©MapR Technologies - Confidential
Anti-Flood
 200 of the same result is no better than 2
 The recommender list is a portfolio of results
– If probability of success is highly correlated, then probability of at least one
success is much lower
 Suppressing items similar to higher ranking items helps
52©MapR Technologies - Confidential
The Punchline
 Hybrid systems really can work today
 Middle tiers aren’t as interesting as they used to be
– No need for Flume … queue directly in big data system
– No need for external queues, tail the data directly with Storm
– No need for query systems for presentation data … read it directly with
node
 Absolutely require common frameworks and standard interfaces
 You can do this today!
53©MapR Technologies - Confidential
 Contact:
tdunning@maprtech.com
@ted_dunning
 Slides and such
http://slideshare.net/tdunning
 Hash tags: #mapr #goto #d3 #node

Más contenido relacionado

La actualidad más candente

Real time-hadoop
Real time-hadoopReal time-hadoop
Real time-hadoopTed Dunning
 
Mathematical bridges From Old to New
Mathematical bridges From Old to NewMathematical bridges From Old to New
Mathematical bridges From Old to NewMapR Technologies
 
Where is Data Going? - RMDC Keynote
Where is Data Going? - RMDC KeynoteWhere is Data Going? - RMDC Keynote
Where is Data Going? - RMDC KeynoteTed Dunning
 
Cognitive computing with big data, high tech and low tech approaches
Cognitive computing with big data, high tech and low tech approachesCognitive computing with big data, high tech and low tech approaches
Cognitive computing with big data, high tech and low tech approachesTed Dunning
 
What is the past future tense of data?
What is the past future tense of data?What is the past future tense of data?
What is the past future tense of data?Ted Dunning
 
Hadoop summit EU - Crowd Sourcing Reflected Intelligence
Hadoop summit EU - Crowd Sourcing Reflected IntelligenceHadoop summit EU - Crowd Sourcing Reflected Intelligence
Hadoop summit EU - Crowd Sourcing Reflected IntelligenceTed Dunning
 
New Directions for Mahout
New Directions for MahoutNew Directions for Mahout
New Directions for MahoutTed Dunning
 
Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...
Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...
Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...Carol McDonald
 
Dunning time-series-2015
Dunning time-series-2015Dunning time-series-2015
Dunning time-series-2015Ted Dunning
 
On-Prem Solution for the Selection of Wind Energy Models
On-Prem Solution for the Selection of Wind Energy ModelsOn-Prem Solution for the Selection of Wind Energy Models
On-Prem Solution for the Selection of Wind Energy ModelsDatabricks
 
Storm 2012-03-29
Storm 2012-03-29Storm 2012-03-29
Storm 2012-03-29Ted Dunning
 
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...Codemotion
 
Hadoop as a Platform for Genomics
Hadoop as a Platform for GenomicsHadoop as a Platform for Genomics
Hadoop as a Platform for GenomicsMapR Technologies
 
Realtime Analytics with Storm and Hadoop
Realtime Analytics with Storm and HadoopRealtime Analytics with Storm and Hadoop
Realtime Analytics with Storm and HadoopDataWorks Summit
 
Cheap learning-dunning-9-18-2015
Cheap learning-dunning-9-18-2015Cheap learning-dunning-9-18-2015
Cheap learning-dunning-9-18-2015Ted Dunning
 
What's new in Apache Mahout
What's new in Apache MahoutWhat's new in Apache Mahout
What's new in Apache MahoutTed Dunning
 
Distributed Deep Learning on Spark
Distributed Deep Learning on SparkDistributed Deep Learning on Spark
Distributed Deep Learning on SparkMathieu Dumoulin
 

La actualidad más candente (19)

Real time-hadoop
Real time-hadoopReal time-hadoop
Real time-hadoop
 
Mathematical bridges From Old to New
Mathematical bridges From Old to NewMathematical bridges From Old to New
Mathematical bridges From Old to New
 
Where is Data Going? - RMDC Keynote
Where is Data Going? - RMDC KeynoteWhere is Data Going? - RMDC Keynote
Where is Data Going? - RMDC Keynote
 
Cognitive computing with big data, high tech and low tech approaches
Cognitive computing with big data, high tech and low tech approachesCognitive computing with big data, high tech and low tech approaches
Cognitive computing with big data, high tech and low tech approaches
 
What is the past future tense of data?
What is the past future tense of data?What is the past future tense of data?
What is the past future tense of data?
 
Hadoop summit EU - Crowd Sourcing Reflected Intelligence
Hadoop summit EU - Crowd Sourcing Reflected IntelligenceHadoop summit EU - Crowd Sourcing Reflected Intelligence
Hadoop summit EU - Crowd Sourcing Reflected Intelligence
 
New Directions for Mahout
New Directions for MahoutNew Directions for Mahout
New Directions for Mahout
 
ACM 2013-02-25
ACM 2013-02-25ACM 2013-02-25
ACM 2013-02-25
 
Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...
Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...
Fast, Scalable, Streaming Applications with Spark Streaming, the Kafka API an...
 
Dunning time-series-2015
Dunning time-series-2015Dunning time-series-2015
Dunning time-series-2015
 
On-Prem Solution for the Selection of Wind Energy Models
On-Prem Solution for the Selection of Wind Energy ModelsOn-Prem Solution for the Selection of Wind Energy Models
On-Prem Solution for the Selection of Wind Energy Models
 
Storm 2012-03-29
Storm 2012-03-29Storm 2012-03-29
Storm 2012-03-29
 
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda...
 
Hadoop as a Platform for Genomics
Hadoop as a Platform for GenomicsHadoop as a Platform for Genomics
Hadoop as a Platform for Genomics
 
Realtime Analytics with Storm and Hadoop
Realtime Analytics with Storm and HadoopRealtime Analytics with Storm and Hadoop
Realtime Analytics with Storm and Hadoop
 
Cheap learning-dunning-9-18-2015
Cheap learning-dunning-9-18-2015Cheap learning-dunning-9-18-2015
Cheap learning-dunning-9-18-2015
 
What's new in Apache Mahout
What's new in Apache MahoutWhat's new in Apache Mahout
What's new in Apache Mahout
 
Big Data Paris
Big Data ParisBig Data Paris
Big Data Paris
 
Distributed Deep Learning on Spark
Distributed Deep Learning on SparkDistributed Deep Learning on Spark
Distributed Deep Learning on Spark
 

Similar a Goto amsterdam-2013-skinned

CMU Lecture on Hadoop Performance
CMU Lecture on Hadoop PerformanceCMU Lecture on Hadoop Performance
CMU Lecture on Hadoop PerformanceMapR Technologies
 
Real-time and Long-time Together
Real-time and Long-time TogetherReal-time and Long-time Together
Real-time and Long-time TogetherMapR Technologies
 
Buzz Words Dunning Multi Modal Recommendations
Buzz Words Dunning Multi Modal RecommendationsBuzz Words Dunning Multi Modal Recommendations
Buzz Words Dunning Multi Modal RecommendationsMapR Technologies
 
Buzz words-dunning-multi-modal-recommendation
Buzz words-dunning-multi-modal-recommendationBuzz words-dunning-multi-modal-recommendation
Buzz words-dunning-multi-modal-recommendationTed Dunning
 
Real-time and long-time together
Real-time and long-time togetherReal-time and long-time together
Real-time and long-time togetherTed Dunning
 
Which Algorithms Really Matter
Which Algorithms Really MatterWhich Algorithms Really Matter
Which Algorithms Really MatterTed Dunning
 
Ted Dunning, Chief Application Architect, MapR at MLconf SF
Ted Dunning, Chief Application Architect, MapR at MLconf SFTed Dunning, Chief Application Architect, MapR at MLconf SF
Ted Dunning, Chief Application Architect, MapR at MLconf SFMLconf
 
The power of hadoop in business
The power of hadoop in businessThe power of hadoop in business
The power of hadoop in businessMapR Technologies
 
Predictive Analytics San Diego
Predictive Analytics San DiegoPredictive Analytics San Diego
Predictive Analytics San DiegoMapR Technologies
 
Boston Hug by Ted Dunning 2012
Boston Hug by Ted Dunning 2012Boston Hug by Ted Dunning 2012
Boston Hug by Ted Dunning 2012MapR Technologies
 
Mahout and Recommendations
Mahout and RecommendationsMahout and Recommendations
Mahout and RecommendationsTed Dunning
 
DFW Big Data talk on Mahout Recommenders
DFW Big Data talk on Mahout RecommendersDFW Big Data talk on Mahout Recommenders
DFW Big Data talk on Mahout RecommendersTed Dunning
 
DIADEM: domain-centric intelligent automated data extraction methodology Pres...
DIADEM: domain-centric intelligent automated data extraction methodology Pres...DIADEM: domain-centric intelligent automated data extraction methodology Pres...
DIADEM: domain-centric intelligent automated data extraction methodology Pres...DBOnto
 
Diadem DBOnto Kick Off meeting
Diadem DBOnto Kick Off meetingDiadem DBOnto Kick Off meeting
Diadem DBOnto Kick Off meetingDBOnto
 

Similar a Goto amsterdam-2013-skinned (20)

CMU Lecture on Hadoop Performance
CMU Lecture on Hadoop PerformanceCMU Lecture on Hadoop Performance
CMU Lecture on Hadoop Performance
 
Strata New York 2012
Strata New York 2012Strata New York 2012
Strata New York 2012
 
Real-time and Long-time Together
Real-time and Long-time TogetherReal-time and Long-time Together
Real-time and Long-time Together
 
Devoxx Real-Time Learning
Devoxx Real-Time LearningDevoxx Real-Time Learning
Devoxx Real-Time Learning
 
Polyvalent Recommendations
Polyvalent RecommendationsPolyvalent Recommendations
Polyvalent Recommendations
 
Buzz Words Dunning Multi Modal Recommendations
Buzz Words Dunning Multi Modal RecommendationsBuzz Words Dunning Multi Modal Recommendations
Buzz Words Dunning Multi Modal Recommendations
 
Buzz words-dunning-multi-modal-recommendation
Buzz words-dunning-multi-modal-recommendationBuzz words-dunning-multi-modal-recommendation
Buzz words-dunning-multi-modal-recommendation
 
Real-time and long-time together
Real-time and long-time togetherReal-time and long-time together
Real-time and long-time together
 
News From Mahout
News From MahoutNews From Mahout
News From Mahout
 
London hug
London hugLondon hug
London hug
 
Which Algorithms Really Matter
Which Algorithms Really MatterWhich Algorithms Really Matter
Which Algorithms Really Matter
 
Ted Dunning, Chief Application Architect, MapR at MLconf SF
Ted Dunning, Chief Application Architect, MapR at MLconf SFTed Dunning, Chief Application Architect, MapR at MLconf SF
Ted Dunning, Chief Application Architect, MapR at MLconf SF
 
The power of hadoop in business
The power of hadoop in businessThe power of hadoop in business
The power of hadoop in business
 
Predictive Analytics San Diego
Predictive Analytics San DiegoPredictive Analytics San Diego
Predictive Analytics San Diego
 
New directions for mahout
New directions for mahoutNew directions for mahout
New directions for mahout
 
Boston Hug by Ted Dunning 2012
Boston Hug by Ted Dunning 2012Boston Hug by Ted Dunning 2012
Boston Hug by Ted Dunning 2012
 
Mahout and Recommendations
Mahout and RecommendationsMahout and Recommendations
Mahout and Recommendations
 
DFW Big Data talk on Mahout Recommenders
DFW Big Data talk on Mahout RecommendersDFW Big Data talk on Mahout Recommenders
DFW Big Data talk on Mahout Recommenders
 
DIADEM: domain-centric intelligent automated data extraction methodology Pres...
DIADEM: domain-centric intelligent automated data extraction methodology Pres...DIADEM: domain-centric intelligent automated data extraction methodology Pres...
DIADEM: domain-centric intelligent automated data extraction methodology Pres...
 
Diadem DBOnto Kick Off meeting
Diadem DBOnto Kick Off meetingDiadem DBOnto Kick Off meeting
Diadem DBOnto Kick Off meeting
 

Más de Ted Dunning

Dunning - SIGMOD - Data Economy.pptx
Dunning - SIGMOD - Data Economy.pptxDunning - SIGMOD - Data Economy.pptx
Dunning - SIGMOD - Data Economy.pptxTed Dunning
 
How to Get Going with Kubernetes
How to Get Going with KubernetesHow to Get Going with Kubernetes
How to Get Going with KubernetesTed Dunning
 
Progress for big data in Kubernetes
Progress for big data in KubernetesProgress for big data in Kubernetes
Progress for big data in KubernetesTed Dunning
 
Anomaly Detection: How to find what you didn’t know to look for
Anomaly Detection: How to find what you didn’t know to look forAnomaly Detection: How to find what you didn’t know to look for
Anomaly Detection: How to find what you didn’t know to look forTed Dunning
 
Streaming Architecture including Rendezvous for Machine Learning
Streaming Architecture including Rendezvous for Machine LearningStreaming Architecture including Rendezvous for Machine Learning
Streaming Architecture including Rendezvous for Machine LearningTed Dunning
 
Machine Learning Logistics
Machine Learning LogisticsMachine Learning Logistics
Machine Learning LogisticsTed Dunning
 
Tensor Abuse - how to reuse machine learning frameworks
Tensor Abuse - how to reuse machine learning frameworksTensor Abuse - how to reuse machine learning frameworks
Tensor Abuse - how to reuse machine learning frameworksTed Dunning
 
Machine Learning logistics
Machine Learning logisticsMachine Learning logistics
Machine Learning logisticsTed Dunning
 
Finding Changes in Real Data
Finding Changes in Real DataFinding Changes in Real Data
Finding Changes in Real DataTed Dunning
 
Sharing Sensitive Data Securely
Sharing Sensitive Data SecurelySharing Sensitive Data Securely
Sharing Sensitive Data SecurelyTed Dunning
 
Real-time Puppies and Ponies - Evolving Indicator Recommendations in Real-time
Real-time Puppies and Ponies - Evolving Indicator Recommendations in Real-timeReal-time Puppies and Ponies - Evolving Indicator Recommendations in Real-time
Real-time Puppies and Ponies - Evolving Indicator Recommendations in Real-timeTed Dunning
 
How the Internet of Things is Turning the Internet Upside Down
How the Internet of Things is Turning the Internet Upside DownHow the Internet of Things is Turning the Internet Upside Down
How the Internet of Things is Turning the Internet Upside DownTed Dunning
 
Apache Kylin - OLAP Cubes for SQL on Hadoop
Apache Kylin - OLAP Cubes for SQL on HadoopApache Kylin - OLAP Cubes for SQL on Hadoop
Apache Kylin - OLAP Cubes for SQL on HadoopTed Dunning
 
Doing-the-impossible
Doing-the-impossibleDoing-the-impossible
Doing-the-impossibleTed Dunning
 
Anomaly Detection - New York Machine Learning
Anomaly Detection - New York Machine LearningAnomaly Detection - New York Machine Learning
Anomaly Detection - New York Machine LearningTed Dunning
 
Recommendation Techn
Recommendation TechnRecommendation Techn
Recommendation TechnTed Dunning
 
Possible Visions for Mahout 1.0
Possible Visions for Mahout 1.0Possible Visions for Mahout 1.0
Possible Visions for Mahout 1.0Ted Dunning
 
My talk about recommendation and search to the Hive
My talk about recommendation and search to the HiveMy talk about recommendation and search to the Hive
My talk about recommendation and search to the HiveTed Dunning
 
Strata 2014 Anomaly Detection
Strata 2014 Anomaly DetectionStrata 2014 Anomaly Detection
Strata 2014 Anomaly DetectionTed Dunning
 

Más de Ted Dunning (20)

Dunning - SIGMOD - Data Economy.pptx
Dunning - SIGMOD - Data Economy.pptxDunning - SIGMOD - Data Economy.pptx
Dunning - SIGMOD - Data Economy.pptx
 
How to Get Going with Kubernetes
How to Get Going with KubernetesHow to Get Going with Kubernetes
How to Get Going with Kubernetes
 
Progress for big data in Kubernetes
Progress for big data in KubernetesProgress for big data in Kubernetes
Progress for big data in Kubernetes
 
Anomaly Detection: How to find what you didn’t know to look for
Anomaly Detection: How to find what you didn’t know to look forAnomaly Detection: How to find what you didn’t know to look for
Anomaly Detection: How to find what you didn’t know to look for
 
Streaming Architecture including Rendezvous for Machine Learning
Streaming Architecture including Rendezvous for Machine LearningStreaming Architecture including Rendezvous for Machine Learning
Streaming Architecture including Rendezvous for Machine Learning
 
Machine Learning Logistics
Machine Learning LogisticsMachine Learning Logistics
Machine Learning Logistics
 
Tensor Abuse - how to reuse machine learning frameworks
Tensor Abuse - how to reuse machine learning frameworksTensor Abuse - how to reuse machine learning frameworks
Tensor Abuse - how to reuse machine learning frameworks
 
Machine Learning logistics
Machine Learning logisticsMachine Learning logistics
Machine Learning logistics
 
T digest-update
T digest-updateT digest-update
T digest-update
 
Finding Changes in Real Data
Finding Changes in Real DataFinding Changes in Real Data
Finding Changes in Real Data
 
Sharing Sensitive Data Securely
Sharing Sensitive Data SecurelySharing Sensitive Data Securely
Sharing Sensitive Data Securely
 
Real-time Puppies and Ponies - Evolving Indicator Recommendations in Real-time
Real-time Puppies and Ponies - Evolving Indicator Recommendations in Real-timeReal-time Puppies and Ponies - Evolving Indicator Recommendations in Real-time
Real-time Puppies and Ponies - Evolving Indicator Recommendations in Real-time
 
How the Internet of Things is Turning the Internet Upside Down
How the Internet of Things is Turning the Internet Upside DownHow the Internet of Things is Turning the Internet Upside Down
How the Internet of Things is Turning the Internet Upside Down
 
Apache Kylin - OLAP Cubes for SQL on Hadoop
Apache Kylin - OLAP Cubes for SQL on HadoopApache Kylin - OLAP Cubes for SQL on Hadoop
Apache Kylin - OLAP Cubes for SQL on Hadoop
 
Doing-the-impossible
Doing-the-impossibleDoing-the-impossible
Doing-the-impossible
 
Anomaly Detection - New York Machine Learning
Anomaly Detection - New York Machine LearningAnomaly Detection - New York Machine Learning
Anomaly Detection - New York Machine Learning
 
Recommendation Techn
Recommendation TechnRecommendation Techn
Recommendation Techn
 
Possible Visions for Mahout 1.0
Possible Visions for Mahout 1.0Possible Visions for Mahout 1.0
Possible Visions for Mahout 1.0
 
My talk about recommendation and search to the Hive
My talk about recommendation and search to the HiveMy talk about recommendation and search to the Hive
My talk about recommendation and search to the Hive
 
Strata 2014 Anomaly Detection
Strata 2014 Anomaly DetectionStrata 2014 Anomaly Detection
Strata 2014 Anomaly Detection
 

Último

Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 

Último (20)

Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 

Goto amsterdam-2013-skinned

  • 1. 1©MapR Technologies - Confidential Old and New Building Blocks Come Together For Big Data
  • 2. 2©MapR Technologies - Confidential  Contact: tdunning@maprtech.com @ted_dunning  Slides and such http://slideshare.net/tdunning  Hash tags: #mapr #goto #d3 #node
  • 3. 3©MapR Technologies - Confidential Embarrassment of Riches  d3.js allows really pretty pictures  node.js allows simple (not just web) servers  Storm does real-time  Hadoop does big data  d3 allows very cool visualizations
  • 4. 4©MapR Technologies - Confidential D3 demo
  • 5. 5©MapR Technologies - Confidential node demo
  • 6. 6©MapR Technologies - Confidential Hadoop demo
  • 7. 7©MapR Technologies - Confidential But …  Web camp – everything is a service with a URL or a DOM  Big data camp – non-traditional file systems  Everybody else – files and databases  They don’t like to talk to each other
  • 8. 8©MapR Technologies - Confidential Why Not Tiered Architectures?  Tiered architectures – translations between services and cultures – standard corporate answer  Feels like molasses
  • 9. 9©MapR Technologies - Confidential The Vision  Integrate – multiple computing paradigms – many computing communities  How? – common storage, queuing and data platforms
  • 10. 10©MapR Technologies - Confidential For Example, …  Incoming documents with text – store in file-based queues – index in real-time using Storm and Solr – add initial engagement class, “don’t-know”  Search for documents using original text – add random noise, small for well understood docs, large for “don’t-know” docs  Record engagement
  • 11. 11©MapR Technologies - Confidential Add Analysis  Process engagement logs – item-item cooccurrence – user-item histories  Update search index – indicator items – decrease uncertainty on well understood docs  Update user profile – item history
  • 12. 12©MapR Technologies - Confidential Search Again  Now searches use recent views + text – recent views query indicator fields – text queries normal text data – add noise as appropriate
  • 13. 13©MapR Technologies - Confidential And Draw a Picture  Searches and clicks can be logged – real-time metrics – real-time trending topics  What’s hot, what’s not  Popular searches  Document clusters  Word clouds
  • 14. 14©MapR Technologies - Confidential In Pictures
  • 15. 15©MapR Technologies - Confidential In Pictures Doc queue Search index Real-time indexing Doc sources
  • 16. 16©MapR Technologies - Confidential In Pictures Doc queue Search index Real-time indexing Doc sources User queries Search engine
  • 17. 17©MapR Technologies - Confidential In Pictures Doc queue Search index Real-time indexing Doc sources User queries Search engine Logs Recommendation analysis
  • 18. 18©MapR Technologies - Confidential In Pictures Doc queue Search index Real-time indexing Doc sources User queries Search engine Logs Recommendation analysis Usage analysis RenderingAdmin queries
  • 19. 19©MapR Technologies - Confidential Which Technology? Doc queue Search index Real-time indexing Doc sources User queries Search engine Logs Recommendation analysis Usage analysis Admin queries Rendering Storm/node Solr MapR D3/node Other
  • 20. 20©MapR Technologies - Confidential Yeah, But …  This isn’t as easy as it looks  Take the real-time / long-time part
  • 21. 21©MapR Technologies - Confidential t now Hadoop is Not Very Real-time Unprocessed Data Fully processed Latest full period Hadoop job takes this long for this data
  • 22. 22©MapR Technologies - Confidential t now Hadoop works great back here Storm works here Real-time and Long-time together Blended view Blended view Blended View
  • 23. 23©MapR Technologies - Confidential SolR Indexer SolR Indexer Solr indexing Cooccurrence (Mahout) Item meta- data Index shards Complete history
  • 24. 24©MapR Technologies - Confidential SolR Indexer SolR Indexer Solr search Web tier Item meta- data Index shards User history
  • 25. 25©MapR Technologies - Confidential Users Catcher Storm Topic Queue Web-server http Web Data MapR
  • 26. 26©MapR Technologies - Confidential Closer Look – Catcher Protocol Data Sources Catcher Cluster Catcher Cluster Data Sources The data sources and catchers communicate with a very simple protocol. Hello() => list of catchers Log(topic,message) => (OK|FAIL, redirect-to-catcher)
  • 27. 27©MapR Technologies - Confidential Closer Look – Catcher Queues Catcher Cluster Catcher Cluster The catchers forward log requests to the correct catcher and return that host in the reply to allow the client to avoid the extra hop. Each topic file is appended by exactly one catcher. Topic files are kept in shared file storage. Topic File Topic File
  • 28. 28©MapR Technologies - Confidential Closer Look – ProtoSpout The ProtoSpout tails the topic files, parses log records into tuples and injects them into the Storm topology. Last fully acked position stored in shared file system. Topic File Topic File ProtoSpout
  • 29. 29©MapR Technologies - Confidential Yeah, But …  What was that about adding noise in scoring?  Why would I do that??  Is there a simple answer?
  • 30. 30©MapR Technologies - Confidential Thompson Sampling  Select each shell according to the probability that it is the best  Probability that it is the best can be computed using posterior  But I promised a simple answer P(i is best) = I E[ri |q]= max j E[rj |q] é ëê ù ûúò P(q | D) dq
  • 31. 31©MapR Technologies - Confidential Thompson Sampling – Take 2  Sample θ  Pick i to maximize reward  Record result from using i q ~P(q | D) i = argmax j E[r |q]
  • 32. 32©MapR Technologies - Confidential Nearly Forgotten until Recently  Citations for Thompson sampling
  • 33. 33©MapR Technologies - Confidential Bayesian Bandit for the Search  Compute distributions based on data so far  Sample scores s1, s2 … – based on actual score – plus per doc noise from these distributions  Rank docs by si  Lemma 1: The probability of showing doc i at first position will match the probability it is the best  Lemma 2: This is as good as it gets
  • 34. 34©MapR Technologies - Confidential And it works! 11000 100 200 300 400 500 600 700 800 900 1000 0.12 0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1 0.11 n regret ε- greedy, ε = 0.05 Bayesian Bandit with Gamma- Normal
  • 35. 35©MapR Technologies - Confidential Yeah, But …  Isn’t recommendations complicated?  How can I implement this?
  • 36. 36©MapR Technologies - Confidential Recommendation Basics  History: User Thing 1 3 2 4 3 4 2 3 3 2 1 1 2 1
  • 37. 37©MapR Technologies - Confidential Recommendation Basics  History as matrix:  (t1, t3) cooccur 2 times,  (t1, t4) once,  (t2, t4) once,  (t3, t4) once t1 t2 t3 t4 u1 1 0 1 0 u2 1 0 1 1 u3 0 1 0 1
  • 38. 38©MapR Technologies - Confidential A Quick Simplification  Users who do h  Also do r Ah AT Ah( ) AT A( )h User-centric recommendations Item-centric recommendations
  • 39. 39©MapR Technologies - Confidential Recommendation Basics  Coocurrence t1 t2 t3 t4 t1 2 0 2 1 t2 0 1 0 1 t3 2 0 1 1 t4 1 1 1 2
  • 40. 40©MapR Technologies - Confidential Problems with Raw Cooccurrence  Very popular items co-occur with everything – Welcome document – Elevator music  That isn’t interesting – We want anomalous cooccurrence
  • 41. 41©MapR Technologies - Confidential Recommendation Basics  Coocurrence t1 t2 t3 t4 t1 2 0 2 1 t2 0 1 0 1 t3 2 0 1 1 t4 1 1 1 2 t3 not t3 t1 2 1 not t1 1 1
  • 42. 42©MapR Technologies - Confidential Spot the Anomaly  Root LLR is roughly like standard deviations A not A B 13 1000 not B 1000 100,000 A not A B 1 0 not B 0 2 A not A B 1 0 not B 0 10,000 A not A B 10 0 not B 0 100,000 0.44 0.98 2.26 7.15
  • 43. 43©MapR Technologies - Confidential Root LLR Details  In R entropy = function(k) { -sum(k*log((k==0)+(k/sum(k)))) } rootLLr = function(k) { sign = … sign * sqrt( (entropy(rowSums(k))+entropy(colSums(k)) - entropy(k))/2) }  Like sqrt(mutual information * N/2) See http://bit.ly/16DvLVK
  • 44. 44©MapR Technologies - Confidential Threshold by Score  Coocurrence t1 t2 t3 t4 t1 2 0 2 1 t2 0 1 0 1 t3 2 0 1 1 t4 1 1 1 2
  • 45. 45©MapR Technologies - Confidential Threshold by Score  Significant cooccurrence => Indicators t1 t2 t3 t4 t1 1 0 0 1 t2 0 1 0 1 t3 0 0 1 1 t4 1 0 0 1
  • 46. 46©MapR Technologies - Confidential Yeah, But …  Why go to all this trouble?  Does it really help?
  • 47. 47©MapR Technologies - Confidential Real-life example
  • 48. 48©MapR Technologies - Confidential The Real Life Issues  Exploration  Diversity  Speed  Not the last percent
  • 49. 49©MapR Technologies - Confidential The Second Page
  • 50. 50©MapR Technologies - Confidential Make it Worse to Make It Better  Add noise to rank 1 2 8 7 6 3 5 4 10 13 21 18 12 9 14 24 34 28 32 17 11 27 40 30 41 49 16 15 35 23 19 22 26 31 20 43 25 29 33 62 38 60 74 53 36 37 39 70 45 44 46 71 42 69 47 63 52 57 51 48  Results are worse today  But better tomorrow
  • 51. 51©MapR Technologies - Confidential Anti-Flood  200 of the same result is no better than 2  The recommender list is a portfolio of results – If probability of success is highly correlated, then probability of at least one success is much lower  Suppressing items similar to higher ranking items helps
  • 52. 52©MapR Technologies - Confidential The Punchline  Hybrid systems really can work today  Middle tiers aren’t as interesting as they used to be – No need for Flume … queue directly in big data system – No need for external queues, tail the data directly with Storm – No need for query systems for presentation data … read it directly with node  Absolutely require common frameworks and standard interfaces  You can do this today!
  • 53. 53©MapR Technologies - Confidential  Contact: tdunning@maprtech.com @ted_dunning  Slides and such http://slideshare.net/tdunning  Hash tags: #mapr #goto #d3 #node