13. “Thanks to Paul Harrison's collaboration, a simple mix of our solutions improved our result from 6.31 to 6.75” Rookies (35)
14. “My approach is to combine the results of many methods (also two-way interactions between them) using linear regression on the test set. The best method in my ensemble is regularized SVD with biases, post processed with kernel ridge regression” Arek Paterek (15) http://rainbow.mimuw.edu.pl/~ap/ap_kdd.pdf
15. “When the predictions of multiple RBM models and multiple SVD models are linearly combined, we achieve an error rate that is well over 6% better than the score of Netflix’s own system.” U of Toronto (13) http://www.cs.toronto.edu/~rsalakhu/papers/rbmcf.pdf
17. “Predictive accuracy is substantially improved when blending multiple predictors. Our experience is that most efforts should be concentrated in deriving substantially different approaches, rather than refining a single technique. Consequently, our solution is an ensemble of many methods.” “Our final solution (RMSE=0.8712) consists of blending 107 individual results.” BellKor (2) http://www.research.att.com/~volinsky/netflix/ProgressPrize2007BellKorSolution.pdf
18. “Our common team blends the result of team Gravity and team Dinosaur Planet.” Might have guessed from the name… When Gravity and Dinosaurs Unite (1)
Nice because it doesn’t require much background to understand. Old, often heuristic approach. Many names and forms. Expand into two slides.
Many others. Add pictures. Characters grouped by different categories.
Adjusted cosine similarity of movies, where each movie's attributes are the ratings given to it by the users. Explain how these similarities would be used.
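A minimal sketch of adjusted cosine between two movies, in plain Python (the `ratings` structure and all names are illustrative, not taken from any of the cited systems): each user's mean rating is subtracted before comparing, so harsh and generous raters are put on a common scale.

```python
from math import sqrt

def adjusted_cosine(ratings, movie_a, movie_b):
    """Adjusted cosine similarity between two movies.

    `ratings` maps user -> {movie: rating}.  Each rating is centered by
    subtracting that user's mean rating, so differences in personal
    rating scales cancel out.  Only users who rated both movies count.
    """
    num = den_a = den_b = 0.0
    for user_ratings in ratings.values():
        if movie_a in user_ratings and movie_b in user_ratings:
            mean = sum(user_ratings.values()) / len(user_ratings)
            ra = user_ratings[movie_a] - mean
            rb = user_ratings[movie_b] - mean
            num += ra * rb
            den_a += ra * ra
            den_b += rb * rb
    if den_a == 0 or den_b == 0:
        return 0.0  # no overlap, or no variance: similarity undefined
    return num / (sqrt(den_a) * sqrt(den_b))
```

The resulting similarities would then be used as weights: to predict an unseen (user, movie) rating, average the user's ratings of that movie's most similar neighbors, weighted by similarity.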
Supervised learning. Evidence of the utility of combining approaches. Introduce competition. Obviously there has been much written about this competition.
Through the power of the Internet Archive.
From a talk by Netflix. The distribution of results and their belief about the techniques.
Tidbits. Not too much detail available.
Linear regression.
Restricted Boltzmann Machines. Linear combinations (averaging?).
Matrix Factorization-SVD (MF), K-Nearest neighbor (NB), Clustering (CL). Nice data to discuss.
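As a toy illustration of the MF idea (plain stochastic gradient descent on the observed entries only; a generic sketch, not Netflix's or any team's actual algorithm, and all names and parameters are made up):

```python
import random

def factorize(ratings, n_users, n_items, k=2, lr=0.02, reg=0.0,
              epochs=2000, seed=0):
    """Factor the rating matrix as R ~ P Q^T by SGD.

    ratings: list of (user, item, rating) triples -- the observed
    entries only, which is what makes this practical for sparse data.
    Returns the user factor matrix P and item factor matrix Q.
    """
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            pred = sum(P[u][f] * Q[i][f] for f in range(k))
            err = r - pred
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on squared error, with L2 shrinkage.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q
```

A prediction for any (user, item) pair, observed or not, is then the dot product of the corresponding rows of P and Q.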
Note the allusion to diversity. Mention this. Linear regression combination. Paper linked lists all 107 classifier types.
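The linear-regression blend that several of these teams describe can be sketched as ordinary least squares over the models' held-out predictions (a generic illustration; the helper names and data are hypothetical, and real blends such as BellKor's combined far more inputs):

```python
def blend_weights(preds, targets):
    """Least-squares weights for linearly blending model predictions.

    preds: list of k lists; preds[m][i] is model m's prediction for
    case i.  targets: the true values.  Solves the k x k normal
    equations (X^T X) w = X^T y by Gaussian elimination with partial
    pivoting (no intercept term, for brevity).
    """
    k, n = len(preds), len(targets)
    A = [[sum(preds[i][t] * preds[j][t] for t in range(n))
          for j in range(k)] for i in range(k)]
    b = [sum(preds[i][t] * targets[t] for t in range(n)) for i in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for row in range(col + 1, k):
            f = A[row][col] / A[col][col]
            for c in range(col, k):
                A[row][c] -= f * A[col][c]
            b[row] -= f * b[col]
    w = [0.0] * k
    for row in range(k - 1, -1, -1):
        w[row] = (b[row] - sum(A[row][c] * w[c]
                               for c in range(row + 1, k))) / A[row][row]
    return w

def blend(preds, w):
    """Combine the models' predictions with the fitted weights."""
    return [sum(wm * p[i] for wm, p in zip(w, preds))
            for i in range(len(preds[0]))]
```

The weights are fit on held-out data the individual models did not train on, which is why the quotes above mention regression "on the test set".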
The current leaders…
So there is support for using ensembles. What is the intuition behind their success? One major aspect of concern is diversity: that is to say, there are meaningful differences between the classifications made by each classifier in the ensemble. Add picture.
Add images, examples. Have users seen the terminology in past slides? Voting?
The two most common strategies. Bagging (Bootstrap Aggregating): diversity comes from training each classifier on a different bootstrap resample of the data. Boosting is incremental, and gets diversity by having each additional classifier focus a bit more on the hard examples than the last. Add pics.
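A compact bagging sketch using decision stumps on 1-D data as the base classifier (the stump and the data are illustrative choices, not tied to any system above): each stump trains on a bootstrap resample, and the ensemble predicts by majority vote.

```python
import random

def train_stump(sample):
    """Pick the threshold and direction that best separate the labels."""
    best = None
    for t in sorted(set(x for x, _ in sample)):
        for sign in (1, -1):
            err = sum(1 for x, y in sample
                      if (1 if sign * (x - t) >= 0 else 0) != y)
            if best is None or err < best[0]:
                best = (err, t, sign)
    _, t, sign = best
    return lambda x: 1 if sign * (x - t) >= 0 else 0

def bagged_ensemble(data, n_models, rng):
    """Bagging: train each stump on a bootstrap resample of the data."""
    models = []
    for _ in range(n_models):
        sample = [rng.choice(data) for _ in range(len(data))]
        models.append(train_stump(sample))
    def predict(x):
        # Majority vote over the ensemble.
        votes = sum(m(x) for m in models)
        return 1 if votes * 2 >= len(models) else 0
    return predict
```

Because each resample omits roughly a third of the examples (and repeats others), the stumps disagree on borderline cases, and the vote smooths out their individual mistakes.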
Originally rejected (journal of statistics). Can also create multiple NNs using different random initial weights for diversity. Another bagging technique?
Note: if weights can’t be used directly by the learner, they can be used to make currently misclassified examples more important (or less important, if there is lots of noise): generate a weighted sample of the examples. The point is that the construction of each classifier depends on the performance of the previous ones.
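The weighted-sampling trick can be sketched as follows (a simplified illustration: real AdaBoost derives the re-weighting factor from each round's error rate, whereas this sketch uses a fixed factor):

```python
import random

def weighted_resample(data, weights, rng):
    """Draw a sample where each example's chance of being picked is
    proportional to its current boosting weight -- this is how weights
    reach a learner that cannot consume them directly."""
    return rng.choices(data, weights=weights, k=len(data))

def boost_weights(weights, predictions, labels, factor=2.0):
    """After a round: up-weight the examples the current classifier got
    wrong, then renormalize so the weights stay a distribution.
    (AdaBoost computes `factor` from the round's error rate; a fixed
    factor keeps the sketch short.)"""
    new = [w * (factor if p != y else 1.0)
           for w, p, y in zip(weights, predictions, labels)]
    total = sum(new)
    return [w / total for w in new]
```

Each round therefore trains on a sample skewed toward the examples the ensemble so far handles worst, which is exactly the dependence between successive classifiers that boosting exploits.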