Ensemble Learning Better Predictions Through Diversity Todd Holloway ETech 2008
Outline
Tutorial: Neighbor Method. Suppose the attributes are movie titles and a user’s ratings of those movies. The task is to predict what that user will rate a new movie. (Diagram: Rocky, rates it 5 stars, and Rocky IV, rates it 5 stars, are marked as related to Rocky Balboa, which the user hasn’t rated: guess 5 stars? Pretty Woman: rates it 2 stars.)
Relatedness: The catch is to define ‘related’. - Sarwar, et al. Item-based collaborative filtering recommendation algorithms. 2001. How to begin to understand a relatedness measure? 1. ‘Off the shelf’ measures, e.g. Pearson correlation. 2. Tailor the measure to the dataset.
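As a rough illustration of an ‘off the shelf’ relatedness measure, here is a minimal sketch of the Pearson correlation between two movies, computed over the users who rated both. The dictionary layout, the user names, and the ratings are made up for illustration and are not taken from the talk.

```python
from math import sqrt

def pearson(ratings_a, ratings_b):
    """Pearson correlation between two items over users who rated both.

    ratings_a, ratings_b: dicts mapping user id -> rating (hypothetical layout).
    Returns 0.0 when fewer than two users overlap or a variance is zero.
    """
    common = sorted(set(ratings_a) & set(ratings_b))
    n = len(common)
    if n < 2:
        return 0.0
    xs = [ratings_a[u] for u in common]
    ys = [ratings_b[u] for u in common]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

# Made-up ratings from four hypothetical users.
rocky = {"ann": 5, "bob": 4, "cat": 1, "dan": 2}
rocky_iv = {"ann": 5, "bob": 5, "cat": 2, "dan": 1}
pretty_woman = {"ann": 1, "bob": 2, "cat": 5, "dan": 4}

print(pearson(rocky, rocky_iv))      # strongly positive: rated alike
print(pearson(rocky, pretty_woman))  # negative: rated in opposite ways
```

A measure tailored to the dataset would replace `pearson` while keeping the same inputs and output range.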
Visualization of Relatedness Measure: Proximity is interpreted as relatedness… (Figure values: 0.8, 0.5, 0.6.) Fruchterman & Reingold. Graph drawing by Force Directed Placement. 1991.
Visualization of Relatedness Measure: What’s the big cluster in the center?
Assembling the Model: This is similar to the approaches reported by Amazon in 2003, and Tivo in 2004. - Sarwar, et al. Item-based collaborative filtering recommendation algorithms. 2001. (Diagram labels: Training Examples; Relatedness / Similarity.)
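One common way to turn a relatedness measure into a prediction, sketched here under assumptions, is a similarity-weighted average of the user's ratings on the most related movies. The function name `predict_rating`, the parameter `k`, and the similarity values are hypothetical; the slides do not spell out the exact combination rule.

```python
def predict_rating(user_ratings, similarities, k=2):
    """Similarity-weighted mean of the user's ratings on the k most related movies.

    user_ratings: dict movie -> rating the user gave.
    similarities: dict movie -> relatedness to the target movie (assumed given).
    Returns None when no rated movie has positive relatedness.
    """
    rated = [(similarities[m], r) for m, r in user_ratings.items()
             if similarities.get(m, 0.0) > 0]
    neighbors = sorted(rated, reverse=True)[:k]
    total = sum(s for s, _ in neighbors)
    if total == 0:
        return None
    return sum(s * r for s, r in neighbors) / total

# Ratings from the tutorial example; the similarity numbers are invented.
user_ratings = {"Rocky": 5, "Rocky IV": 5, "Pretty Woman": 2}
sim_to_rocky_balboa = {"Rocky": 0.8, "Rocky IV": 0.6, "Pretty Woman": 0.1}

print(predict_rating(user_ratings, sim_to_rocky_balboa, k=2))
print(predict_rating(user_ratings, sim_to_rocky_balboa, k=3))
```

With `k=2` only the two Rocky films contribute, so the guess stays near 5 stars; a larger `k` lets weakly related movies pull the estimate down slightly.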
Ensemble Learning in Practice: A Look at the Netflix Prize (October 2006-present)
From the Internet Archive.
However, improvement slowed and techniques became more sophisticated… Bennett and Lanning. KDD Cup 2007.
Techniques used… Bennett and Lanning. KDD Cup 2007.
“Thanks to Paul Harrison's collaboration, a simple mix of our solutions improved our result from 6.31 to 6.75” Rookies (35)
“My approach is to combine the results of many methods (also two-way interactions between them) using linear regression on the test set. The best method in my ensemble is regularized SVD with biases, post processed with kernel ridge regression” Arek Paterek (15) http://rainbow.mimuw.edu.pl/~ap/ap_kdd.pdf
“When the predictions of multiple RBM models and multiple SVD models are linearly combined, we achieve an error rate that is well over 6% better than the score of Netflix’s own system.” U of Toronto (13) http://www.cs.toronto.edu/~rsalakhu/papers/rbmcf.pdf
Gravity (3) home.mit.bme.hu/~gtakacs/download/gravity.pdf
“Predictive accuracy is substantially improved when blending multiple predictors. Our experience is that most efforts should be concentrated in deriving substantially different approaches, rather than refining a single technique. Consequently, our solution is an ensemble of many methods.” “Our final solution (RMSE=0.8712) consists of blending 107 individual results.” BellKor (2) http://www.research.att.com/~volinsky/netflix/ProgressPrize2007BellKorSolution.pdf
“Our common team blends the result of team Gravity and team Dinosaur Planet.” Might have guessed from the name… When Gravity and Dinosaurs Unite (1)
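Several of the teams quoted above describe linearly combining the predictions of different models. As a simplified sketch of that idea, and not a reconstruction of any team's method, the code below blends two models as w * a + (1 - w) * b and picks w by least squares on held-out ratings. The function names and all numbers are invented for illustration.

```python
from math import sqrt

def rmse(preds, targets):
    """Root mean squared error between predictions and true ratings."""
    return sqrt(sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets))

def fit_blend_weight(preds_a, preds_b, targets):
    """Least-squares weight w for the blend w * a + (1 - w) * b.

    Setting the derivative of sum((t - b - w * (a - b)) ** 2) to zero gives
    w = sum((t - b) * (a - b)) / sum((a - b) ** 2).
    """
    num = sum((t - b) * (a - b) for a, b, t in zip(preds_a, preds_b, targets))
    den = sum((a - b) ** 2 for a, b in zip(preds_a, preds_b))
    return 0.5 if den == 0 else num / den

def blend(preds_a, preds_b, w):
    return [w * a + (1 - w) * b for a, b in zip(preds_a, preds_b)]

# Invented held-out ratings and two models that err in different directions.
targets = [5, 3, 4, 1, 2, 4]
model_a = [4.5, 3.6, 3.5, 1.8, 2.5, 3.4]  # shrinks toward the middle
model_b = [5.0, 2.2, 4.6, 1.0, 1.2, 4.9]  # exaggerates

w = fit_blend_weight(model_a, model_b, targets)
mixed = blend(model_a, model_b, w)
print(w, rmse(model_a, targets), rmse(model_b, targets), rmse(mixed, targets))
```

On the ratings used to fit w, the blend can never be worse than either model alone, because w = 1 and w = 0 recover the individual models. The gain on unseen ratings depends on how differently the two models err, which is the diversity theme of the talk.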
Why combine models?
A Reflection: See Domingos, P. Occam’s two razors: the sharp and the blunt. KDD. 1998.
Achieving Diversity
Achieving Diversity (Figure: two diagrams in which Classifier A, Classifier B, and Classifier C are built from Training Examples and their outputs are combined (+) into Predictions; one diagram also carries the labels Ratings, Actors, Genres.)
Two Particular Strategies
Bagging Diversity
Bagging Algorithm
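The steps of bagging (bootstrap aggregating) can be sketched as follows: draw bootstrap samples of the training examples with replacement, train one model per sample, and combine the models by majority vote. The one-feature threshold classifier, the function names, and the toy data are assumptions made for this sketch, not details from the slides.

```python
import random
from collections import Counter

def train_stump(examples):
    """Fit a threshold rule on (x, label) pairs: `left` if x <= threshold, else `right`."""
    best = None
    for threshold in sorted({x for x, _ in examples}):
        for left, right in ((0, 1), (1, 0)):
            errors = sum((left if x <= threshold else right) != y
                         for x, y in examples)
            if best is None or errors < best[0]:
                best = (errors, threshold, left, right)
    _, threshold, left, right = best
    return lambda x: left if x <= threshold else right

def bagging(examples, n_models=25, seed=0):
    """Train n_models stumps on bootstrap samples; predict by majority vote."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap sample: same size as the data, drawn with replacement.
        sample = [rng.choice(examples) for _ in examples]
        models.append(train_stump(sample))

    def predict(x):
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]

    return predict

# Toy data: label 0 for small x, label 1 for large x.
examples = [(x, 0) for x in range(5)] + [(x, 1) for x in range(5, 10)]
predict = bagging(examples)
print(predict(1), predict(8))
```

Each stump sees a slightly different sample, so the individual thresholds vary; the vote smooths out that variation.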
Boosting: Incrementally create models, selectively using training examples based on some distribution.
AdaBoost (Adaptive Boosting) Algorithm
AdaBoost Cont.
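The bullet text of the AdaBoost slides did not survive extraction, so the following is a minimal sketch of the standard discrete AdaBoost procedure for labels +1 and -1, not necessarily the exact presentation in the talk. The weak learner (a one-feature threshold rule), the function names, and the toy data are assumptions.

```python
from math import exp, log

def best_stump(examples, weights):
    """Return (weighted_error, classifier) for the best threshold rule."""
    best_err, best_rule = None, None
    for threshold in sorted({x for x, _ in examples}):
        for left, right in ((1, -1), (-1, 1)):
            err = sum(w for (x, y), w in zip(examples, weights)
                      if (left if x <= threshold else right) != y)
            if best_err is None or err < best_err:
                best_err, best_rule = err, (threshold, left, right)
    threshold, left, right = best_rule
    return best_err, (lambda x: left if x <= threshold else right)

def adaboost(examples, rounds=3):
    n = len(examples)
    weights = [1.0 / n] * n          # start from a uniform distribution
    ensemble = []                    # list of (alpha, weak classifier)
    for _ in range(rounds):
        err, h = best_stump(examples, weights)
        if err >= 0.5:               # no better than chance: stop
            break
        err = max(err, 1e-12)        # avoid division by zero on a perfect stump
        alpha = 0.5 * log((1 - err) / err)
        ensemble.append((alpha, h))
        # Raise the weight of misclassified examples, lower the rest.
        weights = [w * exp(-alpha * y * h(x))
                   for (x, y), w in zip(examples, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]

    def predict(x):
        score = sum(alpha * h(x) for alpha, h in ensemble)
        return 1 if score >= 0 else -1

    return predict

# Toy data no single threshold can separate: +1, then -1, then +1.
examples = ([(x, 1) for x in range(0, 3)]
            + [(x, -1) for x in range(3, 6)]
            + [(x, 1) for x in range(6, 9)])
predict = adaboost(examples, rounds=3)
print([predict(x) for x, _ in examples])
```

Each round refocuses the distribution on the examples the previous classifiers got wrong, which is where the diversity between ensemble members comes from.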
Recap
Further Information… Posted on www.ABeautifulWWW.com

Ensemble Learning Featuring the Netflix Prize Competition and ...


Editor's notes

  1. Try to keep talk accessible…
  2. Nice because it doesn’t require much background to understand. Old, often heuristic approach. Many names and forms. Expand into two slides.
  3. Many others. Add pictures. Characters grouped by different categories.
  4. Adjusted cosine of movies, where the movie attributes are ratings of all given users. Explain how these similarities would be used.
  5. !! Nice because it doesn’t require much background to understand. Old, often heuristic approach. Many names and forms. Expand into two slides.
  6. Supervised learning. Evidence of the utility of combining approaches. Introduce competition. Obviously there has been much written about this competition.
  7. Through the power of the Internet Archive.
  8. From a talk by Netflix. The distribution of results and their belief about the techniques.
  9. Tidbits. Not too much detail available.
  10. Linear regression.
  11. Restricted Boltzmann Machines. Linear combinations (averaging?).
  12. Matrix Factorization-SVD (MF), K-Nearest neighbor (NB), Clustering (CL). Nice data to discuss.
  13. Note the allusion to diversity. Mention this. Linear regression combination. Paper linked lists all 107 classifier types.
  14. The current leaders…
  15. So there is support for using ensembles. What is the intuition behind the success? One major aspect of concern is diversity. That is to say, there are meaningful differences between the classifications of each classifier in the ensemble. Add picture.
  16. Add images, examples. Have users seen the terminology in past slides? Voting?
  17. Add images, examples. Have users seen the terminology in past slides? Voting?
  18. The two most common strategies. Bagging (Bootstrap Aggregating): diversity comes from focusing on different subsets of data. Boosting is incremental, and gets diversity by having each additional classifier a bit more focused than the last. Add pics.
  19. Originally rejected (journal of statistics). Can also create multiple NN using different random initial weights for diversity. Another bagging technique?
  20. Originally rejected (journal of statistics). Can also create multiple NN using different random initial weights for diversity. Another bagging technique?
  21. Note: if weights can’t be used directly, they can be used to make currently misclassified examples more important (or less, if there is lots of noise): generate a weighted sample of the examples. Point is that the construction of a classifier