Goto amsterdam-2013-skinned
Ted Dunning
1.
©MapR Technologies - Confidential
Old and New Building Blocks Come Together For Big Data
2.
Contact:
  tdunning@maprtech.com
  @ted_dunning
Slides and such: http://slideshare.net/tdunning
Hash tags: #mapr #goto #d3 #node
3.
Embarrassment of Riches
d3.js allows really pretty pictures
node.js allows simple (not just web) servers
Storm does real-time
Hadoop does big data
d3 allows very cool visualizations
4.
D3 demo
5.
node demo
6.
Hadoop demo
7.
But …
Web camp
  - everything is a service with a URL or a DOM
Big data camp
  - non-traditional file systems
Everybody else
  - files and databases
They don’t like to talk to each other
8.
Why Not Tiered Architectures?
Tiered architectures
  - translations between services and cultures
  - standard corporate answer
Feels like molasses
9.
The Vision
Integrate
  - multiple computing paradigms
  - many computing communities
How?
  - common storage, queuing and data platforms
10.
For Example, …
Incoming documents with text
  - store in file-based queues
  - index in real-time using Storm and Solr
  - add initial engagement class, “don’t-know”
Search for documents using original text
  - add random noise, small for well understood docs, large for “don’t-know” docs
Record engagement
11.
Add Analysis
Process engagement logs
  - item-item cooccurrence
  - user-item histories
Update search index
  - indicator items
  - decrease uncertainty on well understood docs
Update user profile
  - item history
12.
Search Again
Now searches use recent views + text
  - recent views query indicator fields
  - text queries normal text data
  - add noise as appropriate
13.
And Draw a Picture
Searches and clicks can be logged
  - real-time metrics
  - real-time trending topics
What’s hot, what’s not
Popular searches
Document clusters
Word clouds
14.
In Pictures
15.
In Pictures
[Diagram labels: Doc sources, Doc queue, Real-time indexing, Search index]
16.
In Pictures
[Diagram labels: Doc sources, Doc queue, Real-time indexing, Search index, User queries, Search engine]
17.
In Pictures
[Diagram labels: Doc sources, Doc queue, Real-time indexing, Search index, User queries, Search engine, Logs, Recommendation analysis]
18.
In Pictures
[Diagram labels: Doc sources, Doc queue, Real-time indexing, Search index, User queries, Search engine, Logs, Recommendation analysis, Usage analysis, Rendering, Admin queries]
19.
Which Technology?
[Diagram labels: Doc sources, Doc queue, Real-time indexing, Search index, User queries, Search engine, Logs, Recommendation analysis, Usage analysis, Admin queries, Rendering]
[Technology legend: Storm/node, Solr, MapR, D3/node, Other]
20.
Yeah, But …
This isn’t as easy as it looks
Take the real-time / long-time part
21.
Hadoop is Not Very Real-time
[Timeline diagram: axis t with a "now" marker; regions labeled "Fully processed", "Latest full period" and "Unprocessed Data"; annotation "Hadoop job takes this long for this data"]
22.
Real-time and Long-time together
[Timeline diagram: axis t with a "now" marker; annotations "Hadoop works great back here" and "Storm works here"; the combined result is labeled "Blended view"]
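One way to picture the blended view is as a merge of two sets of counts: totals from the last complete batch job and totals accumulated in real time since that job's cutoff. The sketch below is only an illustration under that assumption (additive counts, with no overlap between the two periods); the function name and the sample keys are hypothetical and do not come from the slides.

```python
from collections import Counter

def blended_counts(batch_counts, realtime_counts):
    """Merge long-time (batch) counts with real-time counts, key by key."""
    blended = Counter(batch_counts)
    blended.update(realtime_counts)  # Counter.update adds counts for shared keys
    return blended

# Hypothetical inputs: the batch side covers everything up to the cutoff,
# the real-time side covers only events after the cutoff.
batch = {"doc-a": 120, "doc-b": 45}
recent = {"doc-b": 3, "doc-c": 7}
view = blended_counts(batch, recent)
```

The merge is only correct if the real-time counters are reset to the same cutoff that the batch job used, so that no event is counted on both sides.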
23.
[Diagram labels: Complete history, Cooccurrence (Mahout), Item metadata, Solr indexing, Solr Indexer (two instances), Index shards]
24.
[Diagram labels: User history, Web tier, Item metadata, Solr search, Solr Indexer (two instances), Index shards]
25.
[Diagram labels: Users, Catcher, Storm, Topic Queue, Web-server, http, Web Data, MapR]
26.
Closer Look – Catcher Protocol
[Diagram labels: Data Sources, Catcher Cluster]
The data sources and catchers communicate with a very simple protocol.
  Hello() => list of catchers
  Log(topic, message) => (OK|FAIL, redirect-to-catcher)
27.
Closer Look – Catcher Queues
[Diagram labels: Catcher Cluster, Topic File]
The catchers forward log requests to the correct catcher and return that host in the reply to allow the client to avoid the extra hop.
Each topic file is appended by exactly one catcher.
Topic files are kept in shared file storage.
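The Hello/Log exchange and the forwarding rule can be sketched as an in-memory simulation. Everything here is hypothetical: the class names, the rule that maps a topic to its owning catcher, and the use of a shared dict in place of network calls are assumptions made for illustration, not the actual catcher implementation.

```python
class Catcher:
    """In-memory stand-in for one catcher; a real one would append to a topic file."""

    def __init__(self, name, cluster):
        self.name = name
        self.cluster = cluster   # shared dict: catcher name -> Catcher
        self.files = {}          # topic -> list of appended messages
        cluster[name] = self

    def owner_of(self, topic):
        # Assumption: ownership is a deterministic function of the topic name.
        names = sorted(self.cluster)
        return names[sum(topic.encode()) % len(names)]

    def hello(self):
        """Hello() => list of catchers."""
        return sorted(self.cluster)

    def log(self, topic, message):
        """Log(topic, message) => (status, redirect-to-catcher)."""
        owner = self.owner_of(topic)
        # Forward when needed so that exactly one catcher appends to a topic.
        self.cluster[owner].files.setdefault(topic, []).append(message)
        return "OK", owner


class Client:
    """A data source that learns the right catcher from the redirect field."""

    def __init__(self, first_contact):
        self.cluster = first_contact.cluster
        self.known = first_contact.hello()
        self.route = {}          # topic -> catcher name learned from replies

    def log(self, topic, message):
        target = self.route.get(topic, self.known[0])
        status, redirect = self.cluster[target].log(topic, message)
        self.route[topic] = redirect   # the next call skips the extra hop
        return status, target


cluster = {}
a = Catcher("catcher-a", cluster)
b = Catcher("catcher-b", cluster)
client = Client(a)
first = client.log("clicks", "m1")    # may be forwarded by the first catcher
second = client.log("clicks", "m2")   # goes straight to the owning catcher
```

The design point the sketch tries to capture is that the redirect is advisory: the first request still succeeds through forwarding, and the client only uses the reply to save a hop on later requests.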
28.
Closer Look – ProtoSpout
[Diagram labels: Topic File, ProtoSpout]
The ProtoSpout tails the topic files, parses log records into tuples and injects them into the Storm topology.
Last fully acked position stored in shared file system.
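The tail-and-checkpoint idea can be sketched without Storm at all. The following is a minimal illustration, assuming newline-delimited records and a small side file that holds the last acked byte offset; the function names and file layout are hypothetical, and a temporary local directory stands in for the shared file system.

```python
import os
import tempfile

def read_new_records(topic_path, offset_path):
    """Return (records, new_offset) for complete lines appended since the saved offset."""
    start = 0
    if os.path.exists(offset_path):
        with open(offset_path) as f:
            start = int(f.read().strip() or 0)
    with open(topic_path, "rb") as f:
        f.seek(start)
        data = f.read()
    # Consume only up to the last newline so a half-written record waits for later.
    end = data.rfind(b"\n") + 1
    records = [line.decode("utf-8") for line in data[:end].splitlines()]
    return records, start + end

def commit(offset_path, offset):
    """Persist the position once every tuple up to it has been acked."""
    tmp = offset_path + ".tmp"
    with open(tmp, "w") as f:
        f.write(str(offset))
    os.replace(tmp, offset_path)  # replace the old checkpoint in one step

# Demo on a throwaway directory standing in for the shared file system.
workdir = tempfile.mkdtemp()
topic = os.path.join(workdir, "clicks.log")
offsets = os.path.join(workdir, "clicks.offset")

with open(topic, "wb") as f:
    f.write(b"a\nb\npar")          # the last record is only half written
first, pos = read_new_records(topic, offsets)
commit(offsets, pos)               # pretend the topology acked both tuples

with open(topic, "ab") as f:
    f.write(b"tial\n")
second, pos2 = read_new_records(topic, offsets)
```

Because the offset is committed only after acknowledgement, a restart replays any records that were read but not yet acked, which gives at-least-once delivery in this sketch.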
29.
Yeah, But …
What was that about adding noise in scoring?
Why would I do that??
Is there a simple answer?
30.
Thompson Sampling
Select each shell according to the probability that it is the best
Probability that it is the best can be computed using posterior

$$P(i \text{ is best}) = \int I\left[ E[r_i \mid \theta] = \max_j E[r_j \mid \theta] \right] P(\theta \mid D) \, d\theta$$

But I promised a simple answer
31.
Thompson Sampling – Take 2
Sample θ

$$\theta \sim P(\theta \mid D)$$

Pick i to maximize reward

$$i = \arg\max_j E[r_j \mid \theta]$$

Record result from using i
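The three steps (sample θ, pick the maximizing i, record the result) can be sketched for the simplest case. This is an illustration under stated assumptions rather than the model used in the talk: rewards are taken to be Bernoulli, each option gets an independent Beta(1, 1) prior, and so the expected reward of option j given θ is just its sampled rate. The function names are hypothetical.

```python
import random

def thompson_step(successes, failures, rng):
    """Sample one rate per option from its Beta posterior and return the argmax."""
    samples = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)

def run(true_rates, rounds, seed=0):
    """Simulate the sample / pick / record loop against hidden true rates."""
    rng = random.Random(seed)
    wins = [0] * len(true_rates)
    losses = [0] * len(true_rates)
    for _ in range(rounds):
        i = thompson_step(wins, losses, rng)   # sample theta, pick i
        if rng.random() < true_rates[i]:       # record the result from using i
            wins[i] += 1
        else:
            losses[i] += 1
    return wins, losses

wins, losses = run([0.1, 0.5], rounds=2000)
pulls = [w + l for w, l in zip(wins, losses)]
```

In a run like this the better option ends up with most of the pulls, while the weaker one is still tried occasionally for as long as its posterior leaves room for doubt.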
32.
Nearly Forgotten until Recently
[Chart: Citations for Thompson sampling]
33.
Bayesian Bandit for the Search
Compute distributions based on data so far
Sample scores s_1, s_2 …
  - based on actual score
  - plus per doc noise from these distributions
Rank docs by s_i
Lemma 1: The probability of showing doc i at first position will match the probability it is the best
Lemma 2: This is as good as it gets
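A minimal sketch of ranking by sampled scores follows. It assumes each document carries a point score and a per-document uncertainty, and it draws the noise from a Gaussian purely for simplicity; the slides do not state the distribution used here, and the function name and tuple layout are hypothetical.

```python
import random

def noisy_ranking(docs, rng):
    """docs: list of (doc_id, score, sigma). Returns doc ids ranked by sampled score."""
    sampled = [(rng.gauss(score, sigma), doc_id) for doc_id, score, sigma in docs]
    return [doc_id for _, doc_id in sorted(sampled, reverse=True)]

rng = random.Random(1)
docs = [
    ("well-understood", 1.0, 0.1),  # small noise: we trust this score
    ("dont-know", 0.0, 5.0),        # large noise: new doc with little data
]
# Count how often the poorly understood doc is shown first over many queries.
first_place = sum(noisy_ranking(docs, rng)[0] == "dont-know" for _ in range(1000))
```

With these numbers the unknown document reaches the top position in a substantial fraction of queries, which is what lets the system collect engagement data on it; as its uncertainty shrinks, its rank settles toward what its score deserves.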
34.
And it works!
[Plot: regret versus n (n from 0 to 1100, regret from 0 to 0.12) for two strategies: ε-greedy with ε = 0.05, and Bayesian Bandit with Gamma-Normal]
35.
Yeah, But …
Aren’t recommendations complicated?
How can I implement this?
36.
Recommendation Basics
History:

User  Thing
1     3
2     4
3     4
2     3
3     2
1     1
2     1
37.
Recommendation Basics
History as matrix:

      t1  t2  t3  t4
u1    1   0   1   0
u2    1   0   1   1
u3    0   1   0   1

(t1, t3) cooccur 2 times, (t1, t4) once, (t2, t4) once, (t3, t4) once
38.
A Quick Simplification
Users who do h: $A h$
Also do r:
  User-centric recommendations: $r = A^T (A h)$
  Item-centric recommendations: $r = (A^T A) h$
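The two groupings give the same vector; only the order of multiplication differs. The sketch below checks this on the 3-user, 4-item history matrix from the earlier slide, using plain Python lists so that it has no dependencies. The helper names and the example history vector h (a user who has touched only t1) are illustrative choices.

```python
A = [
    [1, 0, 1, 0],  # u1
    [1, 0, 1, 1],  # u2
    [0, 1, 0, 1],  # u3
]

def transpose(M):
    return [list(col) for col in zip(*M)]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def matmul(X, Y):
    Yt = transpose(Y)
    return [[sum(a * b for a, b in zip(row, col)) for col in Yt] for row in X]

h = [1, 0, 0, 0]  # history of a user who has touched only t1

# User-centric: find users who share history with h, then sum what they did.
user_centric = matvec(transpose(A), matvec(A, h))

# Item-centric: precompute item-item cooccurrence once, then apply it to h.
cooccurrence = matmul(transpose(A), A)
item_centric = matvec(cooccurrence, h)
```

The item-centric form is attractive in practice because the cooccurrence matrix can be computed offline, leaving only a single sparse lookup at query time.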
39.
Recommendation Basics
Cooccurrence

      t1  t2  t3  t4
t1    2   0   2   1
t2    0   1   0   1
t3    2   0   1   1
t4    1   1   1   2
40.
Problems with Raw Cooccurrence
Very popular items co-occur with everything
  - Welcome document
  - Elevator music
That isn’t interesting
  - We want anomalous cooccurrence
41.
Recommendation Basics
Cooccurrence

      t1  t2  t3  t4
t1    2   0   2   1
t2    0   1   0   1
t3    2   0   1   1
t4    1   1   1   2

          t3  not t3
t1        2   1
not t1    1   1
42.
Spot the Anomaly
Root LLR is roughly like standard deviations

         A      not A
B        13     1000
not B    1000   100,000
Root LLR: 0.44

         A      not A
B        1      0
not B    0      2
Root LLR: 0.98

         A      not A
B        1      0
not B    0      10,000
Root LLR: 2.26

         A      not A
B        10     0
not B    0      100,000
Root LLR: 7.15
43.
Root LLR Details
In R:

    entropy = function(k) {
      -sum(k*log((k==0)+(k/sum(k))))
    }
    rootLLr = function(k) {
      sign = …
      sign * sqrt(
        (entropy(rowSums(k))+entropy(colSums(k))
          - entropy(k))/2)
    }

Like sqrt(mutual information * N/2)
See http://bit.ly/16DvLVK
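For readers who prefer Python, the same computation can be sketched as follows. The magnitude uses the expression from the slide unchanged, including the division by 2. The sign is elided on the slide, so the rule used here (positive when the top-left count is at least what independence would predict) is an assumption, as are the function and argument names.

```python
import math

def entropy(counts):
    """Unnormalized entropy: -sum(k * log(k / total)), skipping zero cells."""
    total = sum(counts)
    return -sum(k * math.log(k / total) for k in counts if k > 0)

def root_llr(k11, k12, k21, k22):
    """Signed root log-likelihood ratio score for a 2x2 contingency table."""
    rows = [k11 + k12, k21 + k22]
    cols = [k11 + k21, k12 + k22]
    cells = [k11, k12, k21, k22]
    # Same expression as the slide; max() guards against tiny negative rounding.
    magnitude = math.sqrt(
        max(0.0, (entropy(rows) + entropy(cols) - entropy(cells)) / 2)
    )
    # Assumed sign rule (not shown on the slide): compare k11 with its
    # expected value under independence.
    expected = rows[0] * cols[0] / sum(cells)
    return magnitude if k11 >= expected else -magnitude

small = root_llr(1, 0, 0, 2)
medium = root_llr(1, 0, 0, 10000)
large = root_llr(10, 0, 0, 100000)
```

Evaluated on three of the tables from the "Spot the Anomaly" slide, these calls give values that round to the scores shown there (0.98, 2.26 and 7.15).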
44.
Threshold by Score
Cooccurrence

      t1  t2  t3  t4
t1    2   0   2   1
t2    0   1   0   1
t3    2   0   1   1
t4    1   1   1   2
45.
Threshold by Score
Significant cooccurrence => Indicators

      t1  t2  t3  t4
t1    1   0   0   1
t2    0   1   0   1
t3    0   0   1   1
t4    1   0   0   1
46.
Yeah, But …
Why go to all this trouble?
Does it really help?
47.
Real-life example
48.
The Real Life Issues
Exploration
Diversity
Speed
Not the last percent
49.
The Second Page
50.
Make it Worse to Make It Better
Add noise to rank
Example of a reordered result list (original rank positions):
1 2 8 7 6 3 5 4 10 13 21 18 12 9 14 24 34 28 32 17 11 27 40 30 41 49 16 15 35 23 19 22 26 31 20 43 25 29 33 62 38 60 74 53 36 37 39 70 45 44 46 71 42 69 47 63 52 57 51 48
Results are worse today
But better tomorrow
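The slide says only "add noise to rank", so the exact noise model is not given. One plausible sketch, offered as an assumption, is to sort by the logarithm of the original rank plus Gaussian noise: top results move little, while deeper results get shuffled more freely. The function name and the `epsilon` parameter are hypothetical.

```python
import math
import random

def dither(results, epsilon, rng):
    """Reorder results by log(rank) + Gaussian noise.

    epsilon >= 1 controls the mixing: epsilon = 1 leaves the order unchanged,
    larger values shuffle more aggressively.
    """
    sigma = math.sqrt(math.log(epsilon))
    keyed = [
        (math.log(rank) + rng.gauss(0.0, sigma), item)
        for rank, item in enumerate(results, start=1)
    ]
    return [item for _, item in sorted(keyed, key=lambda pair: pair[0])]

rng = random.Random(42)
original = list(range(1, 61))
unchanged = dither(original, 1.0, rng)   # no noise at all
shuffled = dither(original, 2.0, rng)    # mild reordering, strongest deep in the list
```

Working on the log of the rank means that swapping positions 1 and 2 takes as much noise as swapping positions 10 and 20, which keeps the first page mostly intact while still surfacing results from the second page.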
51.
Anti-Flood
200 of the same result is no better than 2
The recommender list is a portfolio of results
  - If probability of success is highly correlated, then probability of at least one success is much lower
Suppressing items similar to higher ranking items helps
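Suppressing items that resemble higher-ranked ones can be sketched as a greedy filter over the ranked list. The similarity measure (Jaccard overlap of tag sets), the threshold and the sample data below are all assumptions chosen for illustration; the slides do not specify how similarity is measured.

```python
def jaccard(a, b):
    """Similarity of two tag sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def anti_flood(ranked, threshold, limit):
    """Keep each item only if it is not too similar to any item already kept."""
    kept = []
    for item_id, tags in ranked:
        if all(jaccard(tags, kept_tags) < threshold for _, kept_tags in kept):
            kept.append((item_id, tags))
        if len(kept) == limit:
            break
    return [item_id for item_id, _ in kept]

ranked = [
    ("a", {"x", "y"}),
    ("b", {"x", "y"}),        # same tags as "a": suppressed
    ("c", {"z"}),             # unrelated: kept
    ("d", {"x", "y", "w"}),   # overlaps "a" by 2/3: suppressed at threshold 0.6
]
diverse = anti_flood(ranked, threshold=0.6, limit=10)
```

Each kept item is compared against every earlier kept item, so the cost grows with the square of the list length; for a single page of results that is usually negligible.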
52.
The Punchline
Hybrid systems really can work today
Middle tiers aren’t as interesting as they used to be
  - No need for Flume … queue directly in big data system
  - No need for external queues, tail the data directly with Storm
  - No need for query systems for presentation data … read it directly with node
Absolutely require common frameworks and standard interfaces
You can do this today!
53.
Contact:
  tdunning@maprtech.com
  @ted_dunning
Slides and such: http://slideshare.net/tdunning
Hash tags: #mapr #goto #d3 #node