Why Ensembles Win Data Mining Competitions

A Predictive Analytics Center of Excellence (PACE) Tech Talk
November 14, 2012

Dean Abbott
Abbott Analytics, Inc.
Blog: http://abbottanalytics.blogspot.com
URL: http://www.abbottanalytics.com
Twitter: @deanabb
Email: dean@abbottanalytics.com
Copyright © 2000-2012, Abbott Analytics, Inc. All rights reserved.
Outline

  Motivation for Ensembles
  How Ensembles are Built
  Do Ensembles Violate Occam's Razor?
  Why Do Ensembles Win?
PAKDD Cup 2007 Results: Score Metric Changes Winner

[Slide annotation: an "Ensembles" callout points to the combined-technique entries near the top of the table.]

| Modeling Technique | Implementation | Participant Location | Affiliation Type | AUCROC (Trapezoidal Rule) | AUCROC Rank | Top Decile Response Rate | Top Decile Response Rate Rank |
|---|---|---|---|---|---|---|---|
| TreeNet + Logistic Regression | Salford Systems | Mainland China | Practitioner | 70.01% | 1 | 13.00% | 7 |
| Probit Regression | SAS | USA | Practitioner | 69.99% | 2 | 13.13% | 6 |
| MLP + n-Tuple Classifier | | Brazil | Practitioner | 69.62% | 3 | 13.88% | 1 |
| TreeNet | Salford Systems | USA | Practitioner | 69.61% | 4 | 13.25% | 4 |
| TreeNet | Salford Systems | Mainland China | Practitioner | 69.42% | 5 | 13.50% | 2 |
| Ridge Regression | Rank | Belgium | Practitioner | 69.28% | 6 | 12.88% | 9 |
| 2-Layer Linear Regression | | USA | Practitioner | 69.14% | 7 | 12.88% | 9 |
| Logistic Regression + Decision Stump + AdaBoost + VFI | | Mainland China | Academia | 69.10% | 8 | 13.25% | 4 |
| Logistic Average of Single Decision Functions | | Australia | Practitioner | 68.85% | 9 | 12.13% | 17 |
| Logistic Regression | Weka | Singapore | Academia | 68.69% | 10 | 12.38% | 16 |
| Logistic Regression | | Mainland China | Practitioner | 68.58% | 11 | 12.88% | 9 |
| Decision Tree + Neural Network + Logistic Regression | | Singapore | | 68.54% | 12 | 13.00% | 7 |
| Scorecard Linear Additive Model | Xeno | USA | Practitioner | 68.28% | 13 | 11.75% | 20 |
| Random Forest | Weka | USA | | 68.04% | 14 | 12.50% | 14 |
| Expanding Regression Tree + RankBoost + Bagging | Weka | Mainland China | Academia | 68.02% | 15 | 12.50% | 14 |
| Logistic Regression | SAS + Salford Systems | India | Practitioner | 67.58% | 16 | 12.00% | 19 |
| J48 + BayesNet | Weka | Mainland China | Academia | 67.56% | 17 | 11.63% | 21 |
| Neural Network + General Additive Model | Tiberius | USA | Practitioner | 67.54% | 18 | 11.63% | 21 |
| Decision Tree + Neural Network | | Mainland China | Academia | 67.50% | 19 | 12.88% | 9 |
| Decision Tree + Neural Network + Logistic Regression | SAS | USA | Academia | 66.71% | 20 | 13.50% | 2 |
| Neural Network | SAS | USA | Academia | 66.36% | 21 | 12.13% | 17 |
| Decision Tree + Neural Network + Logistic Regression | SAS | USA | Academia | 65.95% | 22 | 11.63% | 21 |
| Neural Network | SAS | USA | Academia | 65.69% | 23 | 9.25% | 32 |
| Multi-dimension Balanced Random Forest | | Mainland China | Academia | 65.42% | 24 | 12.63% | 13 |
| Neural Network | SAS | USA | Academia | 65.28% | 25 | 11.00% | 26 |
| CHAID Decision Tree | SPSS | Argentina | Academia | 64.53% | 26 | 11.25% | 24 |
| Under-Sampling Based on Clustering + CART Decision Tree | | Taiwan | Academia | 64.45% | 27 | 11.13% | 25 |
| Decision Tree + Neural Network + Polynomial Regression | SAS | USA | Academia | 64.26% | 28 | 9.38% | 30 |
Netflix Prize

  2006 Netflix state of the art (Cinematch): RMSE = 0.9525
  Prize: reduce this RMSE by 10% => 0.8572
  2007: Korbell team, Progress Prize winner
    –  107-algorithm ensemble
    –  Top algorithm: SVD, with RMSE = 0.8914
    –  2nd algorithm: Restricted Boltzmann Machine, with RMSE = 0.8990
    –  Mini-ensemble (SVD + RBM) has RMSE = 0.88

  http://techblog.netflix.com/2012/04/netflix-recommendations-beyond-5-stars.html
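To make the blending idea concrete, here is a minimal Python sketch of combining two models' predictions with a single blend weight chosen on held-out data. The arrays are simulated stand-ins (the Korbell team's actual blending was far more elaborate); only the mechanics of weighted averaging and RMSE scoring are illustrated.

```python
import numpy as np

def rmse(pred, actual):
    """Root mean squared error, the Netflix Prize metric."""
    return np.sqrt(np.mean((pred - actual) ** 2))

# Simulated held-out data: true ratings plus each model's (noisy) predictions.
rng = np.random.default_rng(0)
ratings = rng.integers(1, 6, size=10_000).astype(float)
svd_pred = ratings + rng.normal(0, 0.9, size=ratings.size)   # stand-in for SVD output
rbm_pred = ratings + rng.normal(0, 0.9, size=ratings.size)   # stand-in for RBM output

# Grid-search a single blend weight w on the held-out data.
weights = np.linspace(0, 1, 101)
scores = [rmse(w * svd_pred + (1 - w) * rbm_pred, ratings) for w in weights]
best_w = weights[int(np.argmin(scores))]
print(f"best weight = {best_w:.2f}, blended RMSE = {min(scores):.4f}")
```

Because the two simulated models make independent errors, even a naive 50/50 blend scores noticeably better than either model alone, which is the effect the SVD+RBM mini-ensemble exploits.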
Common Kinds of Ensembles vs. Single Models

  [Diagram: taxonomy contrasting ensemble methods with single classifiers]

  From Zhuowen Tu, "Ensemble Classification Methods: Bagging, Boosting, and Random Forests"
What are Model Ensembles?

  Combining outputs from multiple models into a single decision
  Models can be created using the same algorithm, or several different algorithms

  [Diagram: several models feed their predictions into decision logic, which produces the ensemble prediction]
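As a minimal illustration of "several different algorithms" feeding one decision, here is a sketch using scikit-learn's VotingClassifier on synthetic data; the three component algorithms and all parameter choices are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Three different algorithms; "soft" voting averages their predicted probabilities,
# and the decision logic is simply argmax over the averaged probabilities.
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("tree", DecisionTreeClassifier(max_depth=5, random_state=0)),
        ("knn", KNeighborsClassifier(n_neighbors=15)),
    ],
    voting="soft",
)
ensemble.fit(X, y)
print(ensemble.predict(X[:5]))
```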
Creating Model Ensembles Step 1: Generate Component Models

  Start from a single data set; vary the data or the model parameters (sketch below):
    Case (record) weights: bootstrapping, sampling
    Data values: add noise, recode data
    Learning parameters: vary learning rates, pruning severity, random seeds
    Variable subsets: vary candidate inputs, features
  The result is multiple models and predictions.
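A minimal sketch of Step 1, assuming scikit-learn and synthetic data: each component tree sees a bootstrap resample (varied case weights), a random subset of inputs (varied variable subsets), and different depth and seed settings (varied learning parameters).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
n, rng = len(y), np.random.default_rng(0)

models = []
for b in range(25):
    idx = rng.integers(0, n, n)                            # vary case weights: bootstrap resample
    keep = rng.choice(X.shape[1], size=12, replace=False)  # vary variable subsets
    tree = DecisionTreeClassifier(
        max_depth=int(rng.integers(3, 8)),                 # vary learning parameters
        random_state=b,                                    # vary random seeds
    ).fit(X[idx][:, keep], y[idx])
    models.append((keep, tree))

# Each component model now produces its own prediction for the same records.
preds = np.array([t.predict(X[:, keep]) for keep, t in models])
print(preds.shape)   # (25 models, 2000 predictions)
```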
Creating Model Ensembles Step 2: Combining Models

  Combining methods (sketch below)
    –  Estimation: average outputs
    –  Classification: average probabilities or vote (best M of N)
  Variance reduction
    –  Build complex, overfit models
    –  All models built in the same manner
  Bias reduction
    –  Build simple models
    –  Subsequent models weight records with errors more (or model the actual errors)

  [Diagram: the multiple models and predictions are combined into a single decision or prediction value]
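A minimal numpy sketch of the combining methods, assuming a hypothetical matrix P of each model's predicted probabilities on the same cases:

```python
import numpy as np

# P: hypothetical (n_models, n_cases) array of each model's predicted P(target = 1)
rng = np.random.default_rng(0)
P = rng.uniform(0, 1, size=(7, 10))

# Estimation / soft classification: average the model outputs
avg_prob = P.mean(axis=0)
avg_decision = (avg_prob >= 0.5).astype(int)

# Voting: each model casts a 0/1 vote; require at least M of the N models to agree
votes = (P >= 0.5).sum(axis=0)
M = 4                                  # the "best M of N" threshold, here a simple majority of 7
vote_decision = (votes >= M).astype(int)
print(avg_decision, vote_decision)
```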
How Model Complexity Affects Errors

  [Figure: error vs. model complexity, from the reference below]

  Giovanni Seni and John Elder, Ensemble Methods in Data Mining: Improving Accuracy Through Combining Predictions, Morgan and Claypool Publishers, 2010 (ISBN: 978-1608452842)
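For reference, the standard bias-variance decomposition that underlies charts like this one: with $y = f(x) + \varepsilon$ and $\operatorname{Var}(\varepsilon) = \sigma^2$, the expected squared error of a fitted model $\hat{f}$ splits into three parts,

```latex
\mathbb{E}\!\left[(y - \hat{f}(x))^2\right]
  = \underbrace{\left(f(x) - \mathbb{E}[\hat{f}(x)]\right)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\!\left[\left(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\right)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^{2}}_{\text{irreducible noise}}
```

Growing model complexity typically trades bias down for variance up; ensembles attack one term or the other (variance for bagging-style methods, bias for boosting-style methods), as the later slides describe.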
Commonly Used Information-Theoretic Complexity Penalties

  BIC: Bayesian Information Criterion
  AIC: Akaike Information Criterion
  MDL: Minimum Description Length

  For a nice summary:
  http://en.wikipedia.org/wiki/Regularization_(mathematics)


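For reference, the first two penalties have simple closed forms (with $k$ fitted parameters, $n$ training cases, and maximized likelihood $\hat{L}$); both are minimized over candidate models:

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L},
\qquad
\mathrm{BIC} = k\ln n - 2\ln\hat{L}
```

MDL instead selects the model minimizing the total description length: the bits needed to encode the model plus the bits needed to encode the data given the model.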
Four Keys to Effective Ensembling

  Diversity of opinion
  Independence
  Decentralization
  Aggregation

  From The Wisdom of Crowds, James Surowiecki
Bagging

  Bagging method (sketch below)
    –  Create many data sets by bootstrapping (can also do this with cross-validation)
    –  Create one decision tree for each data set
    –  Combine the decision trees by averaging (or voting) their final decisions
    –  Primarily reduces model variance rather than bias
  Results
    –  On average, better than any individual tree
    –  Final answer is the average of the trees
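A minimal sketch of the bagging recipe above, assuming scikit-learn and synthetic data (scikit-learn's BaggingClassifier packages the same loop):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
n, rng = len(y_tr), np.random.default_rng(0)

# Bagging: one deep (high-variance) tree per bootstrap sample
trees = []
for b in range(100):
    idx = rng.integers(0, n, n)          # bootstrap: sample n rows with replacement
    trees.append(DecisionTreeClassifier(random_state=b).fit(X_tr[idx], y_tr[idx]))

# Final answer: average the trees' class-1 probabilities (voting works similarly)
avg_prob = np.mean([t.predict_proba(X_te)[:, 1] for t in trees], axis=0)
bag_acc = np.mean((avg_prob >= 0.5) == y_te)
one_tree_acc = np.mean(trees[0].predict(X_te) == y_te)
print(f"single tree: {one_tree_acc:.3f}  bagged: {bag_acc:.3f}")
```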
Boosting (AdaBoost)

  Boosting method (sketch below)
    –  Create a tree using the training data set
    –  Score each data point, indicating where an incorrect decision is made (errors)
    –  Retrain, giving rows with incorrect decisions more weight; repeat
    –  The final prediction is a weighted average of all the models, a form of model regularization
    –  Best to create weak models: simple models (just a few splits for a decision tree), and let the boosting iterations find the complexity
    –  Often used with trees or Naïve Bayes
  Results
    –  Usually better than an individual tree or Bagging

  [Diagram: reweight examples where classification is incorrect; combine models via a weighted sum]
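A minimal sketch of discrete AdaBoost with decision stumps, assuming scikit-learn and synthetic data; labels are recoded to ±1 so the weight updates take their usual form:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y01 = make_classification(n_samples=2000, n_features=20, random_state=0)
y = 2 * y01 - 1                      # AdaBoost math is cleanest with labels in {-1, +1}
n = len(y)

w = np.full(n, 1.0 / n)              # start with uniform case weights
stumps, alphas = [], []
for m in range(50):
    stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)  # weak model
    pred = stump.predict(X)
    err = max(w[pred != y].sum(), 1e-10)       # weighted training error
    alpha = 0.5 * np.log((1 - err) / err)      # model weight: accurate models count more
    w *= np.exp(-alpha * y * pred)             # up-weight the misclassified rows
    w /= w.sum()
    stumps.append(stump)
    alphas.append(alpha)

# Final prediction: sign of the weighted sum of the weak models
F = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
print("training accuracy:", np.mean(np.sign(F) == y))
```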
Random Forest Ensembles

  Random Forest (RF) method (sketch below)
    –  Exactly the same methodology as Bagging, but with a twist
    –  At each split, rather than using the entire set of candidate inputs, use a random subset of the candidate inputs
    –  Generates diversity in both samples and inputs (splits)
  Results
    –  On average, better than any individual tree, Bagging, or even Boosting
    –  Final answer is the average of the trees

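A minimal scikit-learn sketch; max_features is the parameter that implements the random-subset-per-split twist (all other settings are arbitrary):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

rf = RandomForestClassifier(
    n_estimators=500,       # many bagged trees
    max_features="sqrt",    # random subset of candidate inputs at each split
    oob_score=True,         # out-of-bag accuracy estimate from the bootstrap samples
    random_state=0,
).fit(X, y)
print("out-of-bag accuracy:", rf.oob_score_)
```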
Stochastic Gradient Boosting

  Implemented in MART (Jerry Friedman) and TreeNet (Salford Systems)
  Algorithm (sketch below)
    –  Begin with a simple model: a constant value
    –  Build a simple tree (perhaps 6 terminal nodes); now there are 6 possible levels, whereas before there was one
    –  Score the model and compute errors; the score is the sum of all previous trees, weighted by a learning rate
    –  Build a new tree with the errors as the target variable, and repeat
  Results
    –  TreeNet has won 2 KDD-Cup competitions and numerous others
    –  It is less prone to outliers and overfit than AdaBoost

  [Diagram: each new tree predicts the errors of the ensemble so far; the models combine via a weighted sum into the final additive model]
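A minimal sketch of the algorithm for squared-error loss, assuming scikit-learn and synthetic regression data; the random half-sample per iteration is the "stochastic" part (TreeNet/MART add further refinements):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=2000, n_features=10, noise=10.0, random_state=0)
n, rng, lr = len(y), np.random.default_rng(0), 0.1

F = np.full(n, y.mean())             # begin with a constant model
trees = []
for m in range(200):
    idx = rng.choice(n, n // 2, replace=False)   # "stochastic": fit on a random half
    tree = DecisionTreeRegressor(max_leaf_nodes=6, random_state=m)
    tree.fit(X[idx], (y - F)[idx])               # errors so far are the target variable
    F += lr * tree.predict(X)                    # score = learning-rate-weighted sum of trees
    trees.append(tree)

def predict(X_new):
    """Additive model: the constant plus the weighted sum of all the small trees."""
    return y.mean() + lr * sum(t.predict(X_new) for t in trees)

print("training RMSE:", np.sqrt(np.mean((predict(X) - y) ** 2)))
```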
Ensembles of Trees: Smoothers

  Ensembles smooth jagged decision boundaries

  [Figures: single-tree vs. ensemble decision boundaries]
  Pictures from T.G. Dietterich, "Ensemble methods in machine learning," in Multiple Classifier Systems, Cagliari, Italy, 2000.

Heterogeneous Model Ensembles on Glass Data

  [Chart: percent classification error (max, min, and average) versus the number of models combined, from 1 to 6]

  Model prediction diversity obtained by using different algorithms: tree, NN, RBF, Gaussian, Regression, k-NN
  Combining 3-5 models is on average better than the best single model
  Combining all 6 models is not best (the best is a 3- or 4-model combination), but it is close
  This is an example of reducing model variance through ensembles, but not model bias



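A small simulation in the spirit of this experiment, assuming six hypothetical classifiers with independent errors; it reproduces the max/min/average error curves by exhaustively voting over every k-model subset:

```python
import numpy as np
from itertools import combinations

def vote_error(pred_list, y_true):
    """Classification error of a majority vote over 0/1 prediction arrays."""
    return np.mean((np.mean(pred_list, axis=0) >= 0.5) != y_true)

# Hypothetical: preds maps each algorithm's name to its 0/1 predictions on a held-out set.
# Each simulated model is ~80% accurate, with errors independent of the others.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 500)
preds = {name: np.where(rng.random(500) < 0.8, y_true, 1 - y_true)
         for name in ["tree", "NN", "RBF", "Gaussian", "Regression", "k-NN"]}

for k in range(1, len(preds) + 1):
    errs = [vote_error([preds[m] for m in s], y_true)
            for s in combinations(preds, k)]
    print(f"{k} models: min={min(errs):.3f}  avg={np.mean(errs):.3f}  max={max(errs):.3f}")
```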
Direct Marketing Example: Considerations for I-Miner

  From Abbott, D.W., "How to Improve Customer Acquisition Models with Ensembles", presented at Predictive Analytics World Conference, Washington, D.C., October 20, 2009.

  [Screenshot: I-Miner stream that combines the individual model scores]

  Steps (sketch below):
  1.  Join by record: all models applied to the same data, in the same row order
  2.  Change the probability names
  3.  Average the probabilities; the decision is avg_prob > threshold
  4.  Decile the probability ranks
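A minimal pandas sketch of the four steps, assuming hypothetical per-model probability columns already joined by record (the column names, threshold, and data are illustrative, not from the original stream):

```python
import numpy as np
import pandas as pd

# Hypothetical per-model scores, already joined by record (same data, same row order),
# with the probability columns renamed to a common pattern (steps 1 and 2).
rng = np.random.default_rng(0)
scores = pd.DataFrame({f"prob_model_{i}": rng.uniform(0, 1, 1000) for i in range(1, 11)})

threshold = 0.5
scores["avg_prob"] = scores.mean(axis=1)               # step 3: average the probabilities
scores["decision"] = scores["avg_prob"] > threshold    # decision is avg_prob > threshold
scores["decile_rank"] = pd.qcut(scores["avg_prob"],    # step 4: decile the probabilities
                                10, labels=False) + 1
print(scores.head())
```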
Direct Marketing Example: Variable Inclusion in Model Ensembles

  Twenty-five different variables were represented in the ten models
  Only five were represented in seven or more models
  Twelve were represented in only one or two models

  [Table: number of models sharing common variables (# models vs. # variables)]

  From Abbott, D.W., "How to Improve Customer Acquisition Models with Ensembles", presented at Predictive Analytics World Conference, Washington, D.C., October 20, 2009.
Fraud Detection Example: Deployment Stream

  [Screenshot: deployment stream]

  Model scoring picks up scores from each model, combines them in an ensemble, and pushes the scores back to the database.




Fraud Detection Example: Overall Model Score on Validation Data

  [Bar chart: normalized total score (from the validation population) for ten individual models and for several combinations, including the best and worst 5 on testing, the average of all models, and the ensemble]

  The "Score" weights false alarms and sensitivity.
  Overall, the ensemble is clearly best, and much better than the best single model on testing data.

  From Abbott, D., and Tom Konchan, "Advanced Fraud Detection Techniques for Vendor Payments", Predictive Analytics Summit, San Diego, CA, February 24, 2011.
Are Ensembles Better?

  Accuracy? Yes
  Interpretability? No
  Do ensembles contradict Occam's Razor?
    –  The principle: simpler models generalize better; avoid overfit!
    –  Ensembles are more complex than single models (an RF may have hundreds of trees in the ensemble)
    –  Yet these more complex models perform better on held-out data
    –  But... are they really more complex?
Generalized Degrees of Freedom

  Linear regression: a degree of freedom in the model is simply a parameter
    –  This does not extrapolate to non-linear methods
    –  The number of "parameters" in non-linear methods can produce more complexity, or less
  Enter... Generalized Degrees of Freedom (GDF)
    –  GDF (Ye 1998) "randomly perturbs (adds noise to) the output variable, re-runs the modeling procedure, and measures the changes to the estimates" (for the same number of parameters)
The Math of GDF

  [Equations reproduced as an image on the original slide; a sketch follows]

  From Giovanni Seni and John Elder, Ensemble Methods in Data Mining: Improving Accuracy Through Combining Predictions, Morgan and Claypool Publishers, 2010 (ISBN: 978-1608452842)


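As a hedged sketch of Ye's (1998) definition (the original slide's equations were an image): GDF sums the sensitivity of each fitted value $\hat{y}_i$ to its own target $y_i$,

```latex
\mathrm{GDF} \;=\; \sum_{i=1}^{n} \frac{\partial\, \mathbb{E}[\hat{y}_i]}{\partial y_i}
```

and it is estimated by Monte Carlo: perturb the targets with small random noise, re-run the entire modeling procedure, and take the slope of each refitted value against its perturbation. For linear regression this recovers the usual parameter count; for flexible procedures it can be far larger or, surprisingly, smaller.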
The Effect of GDF

  [Figure: GDF results from the reference below]

  From Elder, J.F. IV, "The Generalization Paradox of Ensembles", Journal of Computational and Graphical Statistics, Volume 12, Number 4, Pages 853-864
Why Ensembles Win

  Performance, performance, performance
  Single models sometimes provide insufficient accuracy
    –  Neural networks become stuck in local minima
    –  Decision trees run out of data, and are greedy: they can get fooled early
    –  Single algorithms keep pushing performance using the same ideas (basis function / algorithm), and are incapable of thinking outside of their box
  Different algorithms, or algorithms built using resampled data, achieve the same level of accuracy but on different cases: they identify different ways to get the same level of accuracy

Conclusion

  Ensembles can achieve significant model performance improvements
  The key to good ensembles is diversity in sampling and variable selection
  Ensembling can be applied to a single algorithm, or across multiple algorithms
  Just do it!

References

  Giovanni Seni and John Elder, Ensemble Methods in Data Mining: Improving Accuracy Through Combining Predictions, Morgan and Claypool Publishers, 2010 (ISBN: 978-1608452842)

  Elder, J.F. IV, "The Generalization Paradox of Ensembles", Journal of Computational and Graphical Statistics, Volume 12, Number 4, Pages 853-864. DOI: 10.1198/1061860032733

  Abbott, D.W., "The Benefits of Creating Ensembles of Classifiers", Abbott Analytics, Inc., http://www.abbottanalytics.com/white-paper-classifiers.php

  Abbott, D.W., "A Comparison of Algorithms at PAKDD2007", blog post at http://abbottanalytics.blogspot.com/2007/05/comparison-of-algorithms-at-pakdd2007.html


References (continued)

  Tu, Zhuowen, "Ensemble Classification Methods: Bagging, Boosting, and Random Forests", http://www.loni.ucla.edu/~ztu/courses/2010_CS_spring/cs269_2010_ensemble.pdf

  Ye, J. (1998), "On Measuring and Correcting the Effects of Data Mining and Model Selection," Journal of the American Statistical Association, 93, 120-131.





Más contenido relacionado

Similar a PACE Tech Talk 14-Nov-12 - Why Model Ensembles Win Data Mining Competitions

Barnan Das PhD Preliminary Exam
Barnan Das PhD Preliminary ExamBarnan Das PhD Preliminary Exam
Barnan Das PhD Preliminary ExamBarnan Das
 
CDC Factory Overview
CDC Factory OverviewCDC Factory Overview
CDC Factory OverviewSachin Jain
 
monsanto 08-23-05a
monsanto 08-23-05amonsanto 08-23-05a
monsanto 08-23-05afinance28
 
Bio-IT World 2009: Adjusting Information Flow from In-house HTS to Global Out...
Bio-IT World 2009: Adjusting Information Flow from In-house HTS to Global Out...Bio-IT World 2009: Adjusting Information Flow from In-house HTS to Global Out...
Bio-IT World 2009: Adjusting Information Flow from In-house HTS to Global Out...Brian Bissett
 
Pp 5.1 standohyd competition comparison incl drf
Pp 5.1 standohyd competition comparison incl drfPp 5.1 standohyd competition comparison incl drf
Pp 5.1 standohyd competition comparison incl drfSabet Milhaeil
 
Btf exhibitors presentation, atlantis, 5 25-12
Btf exhibitors presentation, atlantis, 5 25-12Btf exhibitors presentation, atlantis, 5 25-12
Btf exhibitors presentation, atlantis, 5 25-12pmcurran1
 
Anna Vergeles, Nataliia Manakova "Unsupervised Real-Time Stream-Based Novelty...
Anna Vergeles, Nataliia Manakova "Unsupervised Real-Time Stream-Based Novelty...Anna Vergeles, Nataliia Manakova "Unsupervised Real-Time Stream-Based Novelty...
Anna Vergeles, Nataliia Manakova "Unsupervised Real-Time Stream-Based Novelty...Fwdays
 
Fusesource camel-persistence-part1-webinar-charles-moulliard
Fusesource camel-persistence-part1-webinar-charles-moulliardFusesource camel-persistence-part1-webinar-charles-moulliard
Fusesource camel-persistence-part1-webinar-charles-moulliardCharles Moulliard
 
Deutsche Bank Investor Tour Presentation
	 Deutsche Bank Investor Tour Presentation	 Deutsche Bank Investor Tour Presentation
Deutsche Bank Investor Tour Presentationfinance2
 
A fast implementation of matrix-matrix product in double-double precision on ...
A fast implementation of matrix-matrix product in double-double precision on ...A fast implementation of matrix-matrix product in double-double precision on ...
A fast implementation of matrix-matrix product in double-double precision on ...Maho Nakata
 
Scorm標準介紹
Scorm標準介紹Scorm標準介紹
Scorm標準介紹guestd2f047
 
26 a6 emc europe - arnaud christoffel
26 a6   emc europe - arnaud christoffel26 a6   emc europe - arnaud christoffel
26 a6 emc europe - arnaud christoffelScott Adams
 
Linux Power Management Slideshare
Linux Power Management SlideshareLinux Power Management Slideshare
Linux Power Management SlidesharePatrick Bellasi
 

Similar a PACE Tech Talk 14-Nov-12 - Why Model Ensembles Win Data Mining Competitions (14)

Barnan Das PhD Preliminary Exam
Barnan Das PhD Preliminary ExamBarnan Das PhD Preliminary Exam
Barnan Das PhD Preliminary Exam
 
CDC Factory Overview
CDC Factory OverviewCDC Factory Overview
CDC Factory Overview
 
monsanto 08-23-05a
monsanto 08-23-05amonsanto 08-23-05a
monsanto 08-23-05a
 
Bio-IT World 2009: Adjusting Information Flow from In-house HTS to Global Out...
Bio-IT World 2009: Adjusting Information Flow from In-house HTS to Global Out...Bio-IT World 2009: Adjusting Information Flow from In-house HTS to Global Out...
Bio-IT World 2009: Adjusting Information Flow from In-house HTS to Global Out...
 
Pp 5.1 standohyd competition comparison incl drf
Pp 5.1 standohyd competition comparison incl drfPp 5.1 standohyd competition comparison incl drf
Pp 5.1 standohyd competition comparison incl drf
 
Btf exhibitors presentation, atlantis, 5 25-12
Btf exhibitors presentation, atlantis, 5 25-12Btf exhibitors presentation, atlantis, 5 25-12
Btf exhibitors presentation, atlantis, 5 25-12
 
Anna Vergeles, Nataliia Manakova "Unsupervised Real-Time Stream-Based Novelty...
Anna Vergeles, Nataliia Manakova "Unsupervised Real-Time Stream-Based Novelty...Anna Vergeles, Nataliia Manakova "Unsupervised Real-Time Stream-Based Novelty...
Anna Vergeles, Nataliia Manakova "Unsupervised Real-Time Stream-Based Novelty...
 
Fusesource camel-persistence-part1-webinar-charles-moulliard
Fusesource camel-persistence-part1-webinar-charles-moulliardFusesource camel-persistence-part1-webinar-charles-moulliard
Fusesource camel-persistence-part1-webinar-charles-moulliard
 
Deutsche Bank Investor Tour Presentation
	 Deutsche Bank Investor Tour Presentation	 Deutsche Bank Investor Tour Presentation
Deutsche Bank Investor Tour Presentation
 
A fast implementation of matrix-matrix product in double-double precision on ...
A fast implementation of matrix-matrix product in double-double precision on ...A fast implementation of matrix-matrix product in double-double precision on ...
A fast implementation of matrix-matrix product in double-double precision on ...
 
Scorm標準介紹
Scorm標準介紹Scorm標準介紹
Scorm標準介紹
 
Improvement e13 link
Improvement e13 linkImprovement e13 link
Improvement e13 link
 
26 a6 emc europe - arnaud christoffel
26 a6   emc europe - arnaud christoffel26 a6   emc europe - arnaud christoffel
26 a6 emc europe - arnaud christoffel
 
Linux Power Management Slideshare
Linux Power Management SlideshareLinux Power Management Slideshare
Linux Power Management Slideshare
 

Último

Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 

Último (20)

Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 

PACE Tech Talk 14-Nov-12 - Why Model Ensembles Win Data Mining Competitions

  • 1. Why Ensembles Win Data Mining Competitions A Predictive Analytics Center of Excellence (PACE) Tech Talk November 14, 2012 Dean Abbott Abbott Analytics, Inc. Blog: http://abbottanalytics.blogspot.com URL: http://www.abbottanalytics.com Twitter: @deanabb Email: dean@abbottanalytics.com Copyright © 2000-2012, Abbott Analytics, Inc. All rights reserved. 1
  • 2. Outline   Motivation for Ensembles   How Ensembles are Built   Do Ensembles Violate Occams Razor?   Why Do Ensembles Win? Copyright © 2000-2012, Abbott Analytics, Inc. All rights reserved. 2
  • 3. PAKDD Cup 2007 Results: Score Metric Changes Winner Par4cipant   AUCROC   AUCROC   Top  Decile   Top  Decile   Modeling   Par4cipant  Affilia4on   Modeling  Technique   Affilia4on  Type  -­‐ (Trapezoid (Trapezoidal  Rule)   Response  Rate   Response   Implementa4on  -­‐>   Loca4on  -­‐>   >   al  Rule)-­‐>   Rank  -­‐>   -­‐>   Rate  Rank  -­‐>   Ensembles TreeNet  +  Logis-c  Regression   Salford  Systems   Mainland  China   Prac--oner   70.01%   1   13.00%   7   Probit  Regression   SAS   USA   Prac--oner   69.99%   2   13.13%   6   MLP  +  n-­‐Tuple  Classifier   Brazil   Prac--oner   69.62%   3   13.88%   1   TreeNet   Salford  Systems   USA   Prac--oner   69.61%   4   13.25%   4   TreeNet   Salford  Systems   Mainland  China   Prac--oner   69.42%   5   13.50%   2   Ridge  Regression   Rank   Belgium   Prac--oner   69.28%   6   12.88%   9   2-­‐Layer  Linear  Regression   USA   Prac--oner   69.14%   7   12.88%   9   Logis-c  Regression  +  Decision  Stump  +  AdaBoost  +  VFI   Mainland  China   Academia   69.10%   8   13.25%   4   Logis-c  Average  of  Single  Decision  Func-ons   Australia   Prac--oner   68.85%   9   12.13%   17   Logis-c  Regression   Weka   Singapore   Academia   68.69%   10   12.38%   16   Logis-c  Regression   Mainland  China   Prac--oner   68.58%   11   12.88%   9   Decision  Tree  +  Neural  Network  +  Logis-c  Regression   Singapore   68.54%   12   13.00%   7   Scorecard  Linear  Addi-ve  Model   Xeno   USA   Prac--oner   68.28%   13   11.75%   20   Random  Forest   Weka   USA   68.04%   14   12.50%   14   Expanding  Regression  Tree  +  RankBoost  +  Bagging   Weka   Mainland  China   Academia   68.02%   15   12.50%   14   SAS  +  Salford   Logis-c  Regression   Systems   India   Prac--oner   67.58%   16   12.00%   19   J48  +  BayesNet   Weka   Mainland  China   Academia   67.56%   17   11.63%   21   Neural  Network  +  General  Addi-ve  Model   Tiberius   USA   Prac--oner   67.54%   18   11.63%   21   Decision  Tree  +  Neural  Network   Mainland  China   Academia   67.50%   19   12.88%   9   Decision  Tree  +  Neural  Network  +  Logis-c  Regression   SAS   USA   Academia   66.71%   20   13.50%   2   Neural  Network   SAS   USA   Academia   66.36%   21   12.13%   17   Decision  Tree  +  Neural  Network  +  Logis-c  Regression   SAS   USA   Academia   65.95%   22   11.63%   21   Neural  Network   SAS   USA   Academia   65.69%   23   9.25%   32   Mul--­‐dimension  Balanced  Random  Forest   Mainland  China   Academia   65.42%   24   12.63%   13   Neural  Network   SAS   USA   Academia   65.28%   25   11.00%   26   CHAID  Decision  Tree   SPSS   Argen-na   Academia   64.53%   26   11.25%   24   Under-­‐Sampling  Based  on  Clustering  +  CART  Decision  Tree   Taiwan   Academia   64.45%   27   11.13%   25   Decision  Tree  +  Neural  Network  +  Polynomial  Regression  SAS   USA   Academia   64.26%   28   9.38%   30   Copyright © 2000-2012, Abbott Analytics, Inc. All rights reserved. 3
  • 4. Netflix Prize
    – 2006: Netflix's state-of-the-art recommender (Cinematch) had RMSE = 0.9525
    – Prize: reduce this RMSE by 10%, to 0.8572
    – 2007: the Korbell team won the Progress Prize with a 107-algorithm ensemble
       – Top algorithm: SVD, with RMSE = 0.8914
       – 2nd algorithm: Restricted Boltzmann Machine, with RMSE = 0.8990
       – Mini-ensemble (SVD + RBM): RMSE = 0.88
    http://techblog.netflix.com/2012/04/netflix-recommendations-beyond-5-stars.html
  • 5. Common Kinds of Ensembles vs. Single Models
    [Diagram: classification methods grouped into ensembles vs. single classifiers]
    From Zhuowen Tu, "Ensemble Classification Methods: Bagging, Boosting, and Random Forests"
  • 6. What are Model Ensembles?
    – Combining the outputs from multiple models into a single decision
    – The models can be created using the same algorithm, or several different algorithms
    [Diagram: component model predictions feed decision logic, which produces the ensemble prediction]
  • 7. Creating Model Ensembles, Step 1: Generate Component Models
    Starting from a single data set, vary the data or the model parameters to produce multiple models and predictions:
    – Case (Record) Weights: bootstrapping, sampling
    – Data Values: add noise, recode data
    – Learning Parameters: vary learning rates, pruning severity, random seeds
    – Variable Subsets: vary the candidate inputs and features
    (A minimal bootstrap sketch follows below.)
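To make the most common of these variations concrete, here is a minimal sketch of bootstrap resampling; the function name and the tiny dataset are illustrative, not from the talk.

```python
import numpy as np

def bootstrap_sample(X, y, rng):
    """Draw n rows with replacement, so each component model sees a
    different weighting of the original cases."""
    idx = rng.integers(0, len(y), size=len(y))
    return X[idx], y[idx]

rng = np.random.default_rng(0)
X = np.arange(20, dtype=float).reshape(10, 2)
y = np.array([0, 1] * 5)
X_boot, y_boot = bootstrap_sample(X, y, rng)  # one resampled training set
```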
  • 8. Creating Model Ensembles, Step 2: Combining Models
    – Combining Methods
       – Estimation: average the model outputs
       – Classification: average probabilities, or vote (best M of N)
    – Variance Reduction
       – Build complex, overfit models
       – All models built in the same manner
    – Bias Reduction
       – Build simple models
       – Subsequent models weight records with errors more heavily (or model the actual errors)
    [Diagram: multiple models and predictions are combined into a single decision or prediction value]
    (A minimal combining sketch follows below.)
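A minimal, illustrative sketch of the two combining methods named above (not code from the deck):

```python
import numpy as np

def average_outputs(predictions):
    """Estimation: average the component models' numeric outputs."""
    return np.mean(predictions, axis=0)

def majority_vote(class_predictions):
    """Classification: each model casts one vote per record; the most
    frequent class label wins."""
    votes = np.asarray(class_predictions)  # shape (n_models, n_records)
    return np.array([np.bincount(col).argmax() for col in votes.T])

print(average_outputs([[1.0, 2.0], [3.0, 4.0]]))  # [2. 3.]
print(majority_vote([[0, 1], [0, 1], [1, 1]]))    # [0 1]
```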
  • 9. How Model Complexity Affects Errors
    [Figure from] Giovanni Seni and John Elder, Ensemble Methods in Data Mining: Improving Accuracy Through Combining Predictions, Morgan & Claypool Publishers, 2010 (ISBN: 978-1608452842)
  • 10. Commonly Used Information-Theoretic Complexity Penalties
    – BIC: Bayesian Information Criterion
    – AIC: Akaike Information Criterion
    – MDL: Minimum Description Length
    For a nice summary: http://en.wikipedia.org/wiki/Regularization_(mathematics)
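For reference (the formulas are standard, though not spelled out on the slide), both AIC and BIC penalize the maximized likelihood \(\hat{L}\) by the number of model parameters \(k\); BIC's penalty also grows with the number of records \(n\), and lower values indicate a better complexity/fit tradeoff:

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L}
\qquad
\mathrm{BIC} = k\ln n - 2\ln\hat{L}
```

In its common two-part coding form, MDL selection yields a criterion closely related to BIC.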
  • 11. Four Keys to Effective Ensembling
    – Diversity of opinion
    – Independence
    – Decentralization
    – Aggregation
    From The Wisdom of Crowds, James Surowiecki
  • 12. Bagging
    – Bagging Method
       – Create many data sets by bootstrapping (can also be done with cross-validation)
       – Create one decision tree for each data set
       – Combine the decision trees by averaging (or voting on) their final decisions
       – Primarily reduces model variance rather than bias
    – Results
       – On average, better than any individual tree
    [Diagram: trees built on bootstrap samples are averaged into a final answer]
    (A hedged scikit-learn sketch follows below.)
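A minimal sketch of the method; scikit-learn is an assumption here (the deck does not name a library), and the dataset is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# 100 unpruned (deliberately overfit, high-variance) trees, each grown on
# a bootstrap sample; class decisions are combined by voting
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                            bootstrap=True, random_state=0)
bagging.fit(X_tr, y_tr)
print("bagged trees, held-out accuracy:", bagging.score(X_te, y_te))
```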
  • 13. Boosting (AdaBoost)
    – Boosting Method
       – Create a tree using the training data set
       – Score each data point, indicating where an incorrect decision is made (errors)
       – Retrain, giving rows with incorrect decisions more weight; repeat
       – The final prediction is a weighted average of all models (a form of model regularization)
       – Best to create weak models: simple models (just a few splits for a decision tree), letting the boosting iterations find the complexity
       – Often used with trees or Naïve Bayes
    – Results
       – Usually better than an individual tree or Bagging
    [Diagram: examples are reweighted wherever the classification is incorrect; models are combined via a weighted sum]
    (A hedged scikit-learn sketch follows below.)
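The same idea sketched with scikit-learn (again an assumption): decision stumps as the weak models, reweighted each round and combined by weighted vote.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# weak model: a decision stump (a single split); boosting reweights the
# misclassified rows each round and combines the stumps by weighted vote
stump = DecisionTreeClassifier(max_depth=1)
adaboost = AdaBoostClassifier(stump, n_estimators=200, random_state=0)
adaboost.fit(X_tr, y_tr)
print("AdaBoost, held-out accuracy:", adaboost.score(X_te, y_te))
```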
  • 14. Random Forest Ensembles
    – Random Forest (RF) Method
       – Exactly the same methodology as Bagging, but with a twist
       – At each split, rather than using the entire set of candidate inputs, use a random subset of the candidate inputs
       – Generates diversity in both the samples and the inputs (splits)
    – Results
       – On average, better than any individual tree, Bagging, or even Boosting
    [Diagram: randomized trees are averaged into a final answer]
    (A hedged scikit-learn sketch follows below.)
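A minimal scikit-learn sketch (an assumption, as before), where max_features is the "twist" described above:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# bagging's bootstrap sampling plus the "twist": only a random subset of the
# candidate inputs (sqrt of the total, here) is considered at each split
forest = RandomForestClassifier(n_estimators=500, max_features="sqrt",
                                bootstrap=True, random_state=0)
forest.fit(X_tr, y_tr)
print("random forest, held-out accuracy:", forest.score(X_te, y_te))
```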
  • 15. Stochastic Gradient Boosting
    – Implemented in MART (Jerry Friedman) and TreeNet (Salford Systems)
    – Algorithm
       – Begin with a simple model: a constant value
       – Build a simple tree (perhaps 6 terminal nodes); now there are 6 possible levels, whereas before there was one
       – Score the model and compute the errors; the score is the sum of all previous trees, weighted by a learning rate
       – Build a new tree with the errors as the target variable, and repeat
    – Results
       – TreeNet has won 2 KDD-Cup competitions and numerous others
       – It is less prone to outliers and overfitting than AdaBoost
    [Diagram: each new tree predicts the current ensemble's errors; models are combined via a weighted sum into an additive final model]
    (A scikit-learn analogue is sketched below.)
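MART and TreeNet are standalone tools; the sketch below is a scikit-learn analogue of the same algorithm (an assumption), with illustrative parameter values.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# start from a constant model, then repeatedly fit a small tree to the current
# errors (the gradient of the loss); each tree's contribution is shrunk by the
# learning rate, and subsample < 1.0 is what makes the procedure "stochastic"
gbm = GradientBoostingClassifier(n_estimators=500, learning_rate=0.05,
                                 max_leaf_nodes=6,  # simple trees, per the slide
                                 subsample=0.5,     # random half of rows per tree
                                 random_state=0)
gbm.fit(X_tr, y_tr)
print("gradient boosting, held-out accuracy:", gbm.score(X_te, y_te))
```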
  • 16. Ensembles of Trees: Smoothers
    – Ensembles smooth jagged decision boundaries
    Pictures from T.G. Dietterich, "Ensemble Methods in Machine Learning," in Multiple Classifier Systems, Cagliari, Italy, 2000.
  • 17. Heterogeneous Model Ensembles on Glass Data
    – Model prediction diversity obtained by using different algorithms: tree, NN, RBF, Gaussian, regression, k-NN
    – Combining 3-5 models is on average better than the best single model
    – Combining all 6 models is not best (best is the 3- and 4-model combination), but it is close
    – This is an example of reducing model variance through ensembles, but not model bias
    [Chart: max, min, and average percent classification error (0-40%) versus number of models combined (1-6)]
    (A hedged combining sketch follows below.)
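One way to build this kind of heterogeneous ensemble is soft voting over dissimilar algorithms. The sketch below assumes scikit-learn and only approximates the six algorithms used on the glass data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# "soft" voting averages the class probabilities predicted by each algorithm
hetero = VotingClassifier(estimators=[
    ("tree", DecisionTreeClassifier(max_depth=5)),
    ("nn", MLPClassifier(max_iter=2000, random_state=0)),
    ("logit", LogisticRegression(max_iter=1000)),
    ("knn", KNeighborsClassifier()),
], voting="soft")
hetero.fit(X_tr, y_tr)
print("heterogeneous ensemble, held-out accuracy:", hetero.score(X_te, y_te))
```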
  • 18. Direct Marketing Example: Considerations for I-Miner
    From Abbott, D.W., "How to Improve Customer Acquisition Models with Ensembles," presented at Predictive Analytics World Conference, Washington, D.C., October 20, 2009.
    Steps:
    1. Join by record: all models are applied to the same data, in the same row order
    2. Change the probability names
    3. Average the probabilities; the decision is avg_prob > threshold
    4. Decile the probability ranks
    (A pandas sketch of these steps follows below.)
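A minimal pandas sketch of the four steps; the frame and the column names (prob_model1 and so on) are hypothetical, not from the talk:

```python
import numpy as np
import pandas as pd

# hypothetical per-model probabilities, already joined by record
# (same data, same row order) and renamed so the columns are distinct
scores = pd.DataFrame({
    "prob_model1": np.random.default_rng(0).random(1000),
    "prob_model2": np.random.default_rng(1).random(1000),
    "prob_model3": np.random.default_rng(2).random(1000),
})
scores["avg_prob"] = scores.mean(axis=1)            # step 3: average probabilities
scores["decision"] = scores["avg_prob"] > 0.5       # decision: avg_prob > threshold
scores["decile"] = pd.qcut(scores["avg_prob"], 10,  # step 4: decile ranks,
                           labels=False) + 1        # 10 = highest-scoring tenth
```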
  • 19. Direct Marketing Example: Variable Inclusion in Model Ensembles
    – Twenty-five different variables were represented in the ten models
    – Only five were represented in seven or more models
    – Twelve were represented in only one or two models
    [Chart: number of models with common variables (# models vs. # variables)]
    From Abbott, D.W., "How to Improve Customer Acquisition Models with Ensembles," presented at Predictive Analytics World Conference, Washington, D.C., October 20, 2009.
  • 20. Fraud Detection Example: Deployment Stream
    – Model scoring picks up the scores from each model, combines them in an ensemble, and pushes the scores back to the database
  • 21. Fraud Detection Example: Overall Model Score on Validation Data
    – The total "Score" (from the validation population) weights false alarms and sensitivity
    – Overall, the ensemble is clearly best, and much better than the best single model on the testing data
    [Chart: normalized total score for individual models 1-10, best/worst/average combinations, and the ensemble]
    From Abbott, D., and Tom Konchan, "Advanced Fraud Detection Techniques for Vendor Payments," Predictive Analytics Summit, San Diego, CA, February 24, 2011.
  • 22. Are Ensembles Better?
    – Accuracy? Yes
    – Interpretability? No
    – Do ensembles contradict Occam's Razor?
       – Principle: simpler models generalize better; avoid overfitting!
       – Ensembles are more complex than single models (an RF may have hundreds of trees in the ensemble)
       – Yet these more complex models perform better on held-out data
       – But... are they really more complex?
  • 23. Generalized Degrees of Freedom
    – Linear Regression: a degree of freedom in the model is simply a parameter
       – This does not extrapolate to non-linear methods
       – The number of "parameters" in non-linear methods can produce more complexity, or less
    – Enter... Generalized Degrees of Freedom (GDF)
       – GDF (Ye 1998) "randomly perturbs (adds noise to) the output variable, re-runs the modeling procedure, and measures the changes to the estimates" (for the same number of parameters)
    (A rough Monte Carlo sketch of the procedure follows below.)
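A rough sketch of that perturb-and-refit procedure as I read it (an illustration under my own assumptions, not code from the talk or from Ye's paper):

```python
import numpy as np

def estimate_gdf(fit_predict, X, y, sigma=0.1, n_reps=50, seed=0):
    """Perturb the target with small Gaussian noise, refit, and measure how
    much the fitted values move with the perturbations; the summed
    sensitivity approximates Ye's Generalized Degrees of Freedom."""
    rng = np.random.default_rng(seed)
    baseline = fit_predict(X, y)
    estimates = []
    for _ in range(n_reps):
        noise = rng.normal(0.0, sigma, size=len(y))
        shifted = fit_predict(X, y + noise)
        # cov(noise_i, yhat_i) / sigma^2 estimates d(yhat_i)/d(y_i); sum over i
        estimates.append(np.dot(shifted - baseline, noise) / sigma**2)
    return float(np.mean(estimates))

# sanity check: for ordinary least squares the estimate should land near
# the number of fitted parameters (the columns of X)
def ols_fit_predict(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ beta

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + rng.normal(size=200)
print(estimate_gdf(ols_fit_predict, X, y))  # roughly 5
```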
  • 24. The Math of GDF
    [Equations from] Giovanni Seni and John Elder, Ensemble Methods in Data Mining: Improving Accuracy Through Combining Predictions, Morgan & Claypool Publishers, 2010 (ISBN: 978-1608452842)
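A hedged reconstruction of the key definition, following Ye (1998): GDF sums the sensitivity of each fitted value to its own observation, and for a linear regression it reduces to the number of parameters k:

```latex
\mathrm{GDF} = \sum_{i=1}^{n} \frac{\partial \hat{y}_i}{\partial y_i}
```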
  • 25. The Effect of GDF
    [Figure from] Elder, J.F. IV, "The Generalization Paradox of Ensembles," Journal of Computational and Graphical Statistics, Volume 12, Number 4, pp. 853-864
  • 26. Why Ensembles Win
    – Performance, performance, performance
    – Single models sometimes provide insufficient accuracy
       – Neural networks become stuck in local minima
       – Decision trees run out of data, and are greedy: they can get fooled early
       – Single algorithms keep pushing performance using the same ideas (basis function / algorithm), and are incapable of thinking outside their box
    – Different algorithms, or algorithms built on resampled data, achieve the same level of accuracy but on different cases: they identify different ways to reach the same level of accuracy
  • 27. Conclusion
    – Ensembles can achieve significant model performance improvements
    – The key to good ensembles is diversity in sampling and variable selection
    – Can be applied to a single algorithm, or across multiple algorithms
    – Just do it!
  • 28. References
    – Giovanni Seni and John Elder, Ensemble Methods in Data Mining: Improving Accuracy Through Combining Predictions, Morgan & Claypool Publishers, 2010 (ISBN: 978-1608452842)
    – Elder, J.F. IV, "The Generalization Paradox of Ensembles," Journal of Computational and Graphical Statistics, Volume 12, Number 4, pp. 853-864. DOI: 10.1198/1061860032733
    – Abbott, D.W., "The Benefits of Creating Ensembles of Classifiers," Abbott Analytics, Inc., http://www.abbottanalytics.com/white-paper-classifiers.php
    – Abbott, D.W., "A Comparison of Algorithms at PAKDD2007," blog post, http://abbottanalytics.blogspot.com/2007/05/comparison-of-algorithms-at-pakdd2007.html
  • 29. References
    – Tu, Zhuowen, "Ensemble Classification Methods: Bagging, Boosting, and Random Forests," http://www.loni.ucla.edu/~ztu/courses/2010_CS_spring/cs269_2010_ensemble.pdf
    – Ye, J. (1998), "On Measuring and Correcting the Effects of Data Mining and Model Selection," Journal of the American Statistical Association, 93, 120-131.