SlideShare una empresa de Scribd logo
1 de 38
Descargar para leer sin conexión
Ensemble Methods
Nandita Bhaskhar
content adapted from
(a) Hastie, Tibshirani & Friedman and
(b) Protopapas, Rader & Pan
(c) Jason Brownlee
Nov 12th, 2021
Outline
▪ Decision Trees Recap
▪ Ensemble Methods: Intro
▪ Model Averaging
▪ Bagging
▪ Random Forests
▪ Boosting
▪ Gradient boosting
12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 2
Decision Trees Recap
▪ Represented by a series of binary splits
▪ Internal node is a value query on a feature or attribute
▪ Is 𝒙𝒊 > 𝟎. 𝟓? If yes, go right. Else, go left
▪ Terminal nodes are the decision nodes
▪ Classification → class label
▪ Regression → output score
12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 3
Decision Trees Recap
▪ The tree is grown using training data using recursive splitting
▪ It is also pruned to an optimal size (often evaluated using
cross-validation)
▪ Inference: pass their features down the tree till it reaches a
terminal node to get the decision
12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 4
12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 5
Decision
Trees
Recap
Decision Trees Recap
Pros
▪ Can handle large datasets
▪ Can handle mixed predictors
(continuous, discrete, qualitative)
▪ Can ignore redundant variables
▪ Can easily handle missing data
▪ Easy to interpret if small
Cons
▪ Prediction performance is poor
▪ Does not generalize well
▪ Large trees are hard to interpret
12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 6
Outline
▪ Decision Trees Recap
▪ Ensemble Methods: Intro
▪ Model Averaging
▪ Bagging
▪ Random Forests
▪ Boosting
▪ Gradient boosting
12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 7
Ensemble Methods: Intro
▪ Methods to improve the performance of weak learners
▪ Weak learners (e.g., classification trees) don’t perform that
well
▪ What do we do??
▪ Wisdom of the crowds!
12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 8
Ensemble Methods: Intro
▪ Wisdom of the crowds!
▪ Shift responsibility from 1 weak learner to an “ensemble” of
such weak learners
▪ Set of weak learners are combined to form a strong learner
with better performance than any of them individually
12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 9
Ensemble Methods: Intro
▪ A single decision tree often produces noisy / weak classfiers
▪ They DON’T generalize well
▪ But they are super fast, adaptive and robust!
▪ Solution: Let’s learn multiple trees!
▪ How to ensure they don’t all just learn the same thing??
▪ TRIVIAL Solution
12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 10
Outline
▪ Decision Trees Recap
▪ Ensemble Methods: Intro
▪ Bagging
▪ Random Forests
▪ Boosting
▪ Gradient boosting
12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 11
Bagging
▪ Bagging (Breiman, 1996)
▪ Bootstrap Aggregating: to ensure lower variance
▪ Bootstrap sampling: get different splits / subsets of the data
▪ Aggregating: majority voting or averaging
12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 12
Bagging
▪ Averages a given procedure over many samples to reduce its
variance
▪ Multiple realizations of the data (via multiple samples) →
▪ calculate predictions multiple times →
▪ average the predictions →
▪ more certain estimations (lesser variance)
12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 13
Bagging
▪ Let 𝑓(𝑥) be the classifier and let 𝑏 be a sample set from data
መ
𝑓𝑎𝑔𝑔 𝑥 =
1
𝐵
෍
𝑏=1
𝐵
𝑓𝑏(𝑥)
Or
መ
𝑓𝑎𝑔𝑔 𝑥 = Majority Vote 𝑓𝑏 𝑥 𝑏=1
𝐵
▪ Independent of type of classifier
12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 14
Bagging
▪ Bootstrap sampling:
▪ Collect 𝐵 ≅ 100 subsets by sampling with replacement
from training data
▪ Construct 𝐵 trees (one classifier for one subset)
▪ Aggregate them using aggregator of your choice
▪ Parallelizable
12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 15
12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 16
Bagging
Bagging
12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 17
Bagging
▪ What about cross validation?
▪ Each bootstrap sample set uses only a subset of the data
▪ Unused samples: out-of-bag samples (OOB)
▪ Calculate overall error rate on out-of-bag samples for all
bootstraps
12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 18
Bagging
▪ Reduces overfitting (i.e., variance)
▪ Can work with any type of classifier (here focus on trees)
▪ Easy to parallelize
▪ But loses on interpretability to single decision tree
12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 19
Outline
▪ Decision Trees Recap
▪ Ensemble Methods: Intro
▪ Bagging
▪ Random Forests
▪ Boosting
▪ Gradient boosting
12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 20
Random Forests
Issues with Bagging:
▪ Expectation of bagged trees is equal to expectation of
individual trees
𝐄 መ
𝑓𝑎𝑔𝑔 𝑥 = 𝐄 [𝑓𝑏(𝑥)]
▪ Bias of bagged trees is the same as that of individual trees
▪ Each tree is identically distributed (i.d. not i.i.d). Bagged trees
are correlated!
12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 21
Random Forests
Issues with Bagging:
▪ Averaging 𝐵 i.i.d. variables scales their variance 𝜎2 to 𝜎2/𝐵
▪ But averaging 𝐵 i.d. variables with pairwise correlations 𝜌
and variance 𝜎2
gives their final variance to be
𝜌𝜎2 +
1 − 𝜌
B
𝜎2
▪ Only the 2nd term reduces on bagging, but the first term
remains
12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 22
Random Forests
▪ How to decorrelate the trees generated for bagging?
▪ We want to generate 𝐵 i.i.d. trees such that their bias is the
same, but variance reduces
▪ Ideas:
▪ We can restrict how many times a feature can be used
▪ We only allow a certain number of features
▪ Etc..
12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 23
Random Forests
▪ Ideas:
▪ We can restrict how many times a feature can be used
▪ We only allow a certain number of features
▪ Etc..
▪ Bias changes for the above ideas 
▪ Instead, choose only subset of features for each bag
▪ Decorrelated trees when you randomly select the subset
12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 24
Random Forests
▪ As in bagging, choose 𝐵 bootstrapped splits (or bags)
▪ For each split in the 𝐵 trees, consider only 𝑘 features from
the full feature set 𝑚
▪ 𝑘 = 𝑚 → same as Bagging
▪ 𝑘 < 𝑚 → Random Forests
▪ OOB error rate can be used to fit RF in one sequence with
cross validation done along the way
12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 25
Random Forests
▪ Works great in practice. 𝑘 to be treated as a hyperparameter
Issues:
▪ When you have large number of features, yet very small
number of relevant features
▪ Prob(selecting the relevant feature in 𝑘) is very small
12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 26
Outline
▪ Decision Trees Recap
▪ Ensemble Methods: Intro
▪ Bagging
▪ Random Forests
▪ Boosting
▪ Gradient boosting
12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 27
Boosting
▪ Boosting does not involve bootstrap sampling
▪ Trees are grown sequentially: each tree is grown using
information from previously grown trees
▪ Like bagging, boosting involves combining many decision
trees, 𝑓1, … , 𝑓𝐵
▪ Lecture slides: AdaBoost
12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 28
Boosting
▪ AdaBoost:
▪ Weighted observations
▪ Put more weight on difficult to classify instances and less on
those already handled well
▪ New weak learners are added sequentially that focus their
training on the more difficult patterns
12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 29
Outline
▪ Decision Trees Recap
▪ Ensemble Methods: Intro
▪ Bagging
▪ Random Forests
▪ Boosting
▪ Gradient boosting
12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 30
Gradient Boosting
Generalization: AdaBoost, Adaptive Reweighting & Combining
Three elements –
▪ A loss function to be optimized
▪ A weak learner to make predictions (decision trees)
▪ An additive model to add weak learners to minimize the loss
function
12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 31
Gradient Boosting
Loss Function:
▪ Any differentiable function
▪ Most standard loss functions
▪ L2 loss for regression
▪ Log loss for classification
12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 32
Gradient Boosting
Weak Learners:
▪ Decision trees (regression trees) learnt greedily
▪ Constrain the trees to ensure they remain weak
▪ Number of layers, leaves, nodes, splits, etc
12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 33
Gradient Boosting
Additive Model:
▪ Add trees one at a time (existing trees are not changed)
▪ Functional gradient descent to minimize loss when adding trees
▪ calculate the loss
▪ add the tree to the model that reduces the loss (i.e., follow the gradient)
▪ parameterize the tree, then modify the parameters of the tree and move in
the right direction by reducing the residual loss.
12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 34
Gradient Boosting
Given the current model,
▪ We fit a decision tree to the residuals from the model
▪ Response variable now is the residuals
▪ We then add this new decision tree into the fitted function in
order to update the residuals
▪ The learning rate must be controlled
12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 35
Gradient Boosting
Tunable Parameters:
▪ Number of trees (𝐵): Boosting can overfit unlike Bagging /
RFs. Use cross-validation!
▪ Shrinkage parameter (𝜆): small positive number that sets the
learning rate
▪ Number of splits in each tree (𝑑): Usually just choose 𝑑 = 1,
i.e., tree stumps work well
12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 36
Gradient Boosting
Variants:
▪ Varying the tree constraints
▪ Weighting each tree to the additive sum using a learning rate
(shrinkage)
▪ Sampling strategies: stochastic gradient boosting
▪ Regularization: L1 / L2
▪ Successful: XGBoost
12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 37
Thank you!
General tips for projects:
▪ Use tree-based methods + ensembling as baselines when
dealing with:
▪ Categorical data
▪ Mixed data types
▪ Missing data
12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 38

Más contenido relacionado

Último

Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhikauryashika82
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room servicediscovermytutordmt
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfAyushMahapatra5
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpinRaunakKeshri1
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphThiyagu K
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...fonyou31
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 

Último (20)

Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room service
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Advance Mobile Application Development class 07
Advance Mobile Application Development class 07Advance Mobile Application Development class 07
Advance Mobile Application Development class 07
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdf
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
Ecosystem Interactions Class Discussion Presentation in Blue Green Lined Styl...
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 

Destacado

Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)contently
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024Albert Qian
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsKurio // The Social Media Age(ncy)
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Search Engine Journal
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summarySpeakerHub
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next Tessa Mero
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentLily Ray
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best PracticesVit Horky
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project managementMindGenius
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...RachelPearson36
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Applitools
 
12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at WorkGetSmarter
 
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...DevGAMM Conference
 
Barbie - Brand Strategy Presentation
Barbie - Brand Strategy PresentationBarbie - Brand Strategy Presentation
Barbie - Brand Strategy PresentationErica Santiago
 
Good Stuff Happens in 1:1 Meetings: Why you need them and how to do them well
Good Stuff Happens in 1:1 Meetings: Why you need them and how to do them wellGood Stuff Happens in 1:1 Meetings: Why you need them and how to do them well
Good Stuff Happens in 1:1 Meetings: Why you need them and how to do them wellSaba Software
 

Destacado (20)

Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
 
12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work
 
ChatGPT webinar slides
ChatGPT webinar slidesChatGPT webinar slides
ChatGPT webinar slides
 
More than Just Lines on a Map: Best Practices for U.S Bike Routes
More than Just Lines on a Map: Best Practices for U.S Bike RoutesMore than Just Lines on a Map: Best Practices for U.S Bike Routes
More than Just Lines on a Map: Best Practices for U.S Bike Routes
 
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
 
Barbie - Brand Strategy Presentation
Barbie - Brand Strategy PresentationBarbie - Brand Strategy Presentation
Barbie - Brand Strategy Presentation
 
Good Stuff Happens in 1:1 Meetings: Why you need them and how to do them well
Good Stuff Happens in 1:1 Meetings: Why you need them and how to do them wellGood Stuff Happens in 1:1 Meetings: Why you need them and how to do them well
Good Stuff Happens in 1:1 Meetings: Why you need them and how to do them well
 

lecture-ensembling-techniques.pdf

  • 1. Ensemble Methods Nandita Bhaskhar content adapted from (a) Hastie, Tibshirani & Friedman and (b) Protopapas, Rader & Pan (c) Jason Brownlee Nov 12th, 2021
  • 2. Outline ▪ Decision Trees Recap ▪ Ensemble Methods: Intro ▪ Model Averaging ▪ Bagging ▪ Random Forests ▪ Boosting ▪ Gradient boosting 12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 2
  • 3. Decision Trees Recap ▪ Represented by a series of binary splits ▪ Internal node is a value query on a feature or attribute ▪ Is 𝒙𝒊 > 𝟎. 𝟓? If yes, go right. Else, go left ▪ Terminal nodes are the decision nodes ▪ Classification → class label ▪ Regression → output score 12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 3
  • 4. Decision Trees Recap ▪ The tree is grown using training data using recursive splitting ▪ It is also pruned to an optimal size (often evaluated using cross-validation) ▪ Inference: pass their features down the tree till it reaches a terminal node to get the decision 12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 4
  • 5. 12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 5 Decision Trees Recap
  • 6. Decision Trees Recap Pros ▪ Can handle large datasets ▪ Can handle mixed predictors (continuous, discrete, qualitative) ▪ Can ignore redundant variables ▪ Can easily handle missing data ▪ Easy to interpret if small Cons ▪ Prediction performance is poor ▪ Does not generalize well ▪ Large trees are hard to interpret 12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 6
  • 7. Outline ▪ Decision Trees Recap ▪ Ensemble Methods: Intro ▪ Model Averaging ▪ Bagging ▪ Random Forests ▪ Boosting ▪ Gradient boosting 12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 7
  • 8. Ensemble Methods: Intro ▪ Methods to improve the performance of weak learners ▪ Weak learners (e.g., classification trees) don’t perform that well ▪ What do we do?? ▪ Wisdom of the crowds! 12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 8
  • 9. Ensemble Methods: Intro ▪ Wisdom of the crowds! ▪ Shift responsibility from 1 weak learner to an “ensemble” of such weak learners ▪ Set of weak learners are combined to form a strong learner with better performance than any of them individually 12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 9
  • 10. Ensemble Methods: Intro ▪ A single decision tree often produces noisy / weak classfiers ▪ They DON’T generalize well ▪ But they are super fast, adaptive and robust! ▪ Solution: Let’s learn multiple trees! ▪ How to ensure they don’t all just learn the same thing?? ▪ TRIVIAL Solution 12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 10
  • 11. Outline ▪ Decision Trees Recap ▪ Ensemble Methods: Intro ▪ Bagging ▪ Random Forests ▪ Boosting ▪ Gradient boosting 12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 11
  • 12. Bagging ▪ Bagging (Breiman, 1996) ▪ Bootstrap Aggregating: to ensure lower variance ▪ Bootstrap sampling: get different splits / subsets of the data ▪ Aggregating: majority voting or averaging 12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 12
  • 13. Bagging ▪ Averages a given procedure over many samples to reduce its variance ▪ Multiple realizations of the data (via multiple samples) → ▪ calculate predictions multiple times → ▪ average the predictions → ▪ more certain estimations (lesser variance) 12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 13
  • 14. Bagging ▪ Let 𝑓(𝑥) be the classifier and let 𝑏 be a sample set from data መ 𝑓𝑎𝑔𝑔 𝑥 = 1 𝐵 ෍ 𝑏=1 𝐵 𝑓𝑏(𝑥) Or መ 𝑓𝑎𝑔𝑔 𝑥 = Majority Vote 𝑓𝑏 𝑥 𝑏=1 𝐵 ▪ Independent of type of classifier 12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 14
  • 15. Bagging ▪ Bootstrap sampling: ▪ Collect 𝐵 ≅ 100 subsets by sampling with replacement from training data ▪ Construct 𝐵 trees (one classifier for one subset) ▪ Aggregate them using aggregator of your choice ▪ Parallelizable 12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 15
  • 16. 12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 16 Bagging
  • 17. Bagging 12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 17
  • 18. Bagging ▪ What about cross validation? ▪ Each bootstrap sample set uses only a subset of the data ▪ Unused samples: out-of-bag samples (OOB) ▪ Calculate overall error rate on out-of-bag samples for all bootstraps 12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 18
  • 19. Bagging ▪ Reduces overfitting (i.e., variance) ▪ Can work with any type of classifier (here focus on trees) ▪ Easy to parallelize ▪ But loses on interpretability to single decision tree 12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 19
  • 20. Outline ▪ Decision Trees Recap ▪ Ensemble Methods: Intro ▪ Bagging ▪ Random Forests ▪ Boosting ▪ Gradient boosting 12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 20
  • 21. Random Forests Issues with Bagging: ▪ Expectation of bagged trees is equal to expectation of individual trees 𝐄 መ 𝑓𝑎𝑔𝑔 𝑥 = 𝐄 [𝑓𝑏(𝑥)] ▪ Bias of bagged trees is the same as that of individual trees ▪ Each tree is identically distributed (i.d. not i.i.d). Bagged trees are correlated! 12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 21
  • 22. Random Forests Issues with Bagging: ▪ Averaging 𝐵 i.i.d. variables scales their variance 𝜎2 to 𝜎2/𝐵 ▪ But averaging 𝐵 i.d. variables with pairwise correlations 𝜌 and variance 𝜎2 gives their final variance to be 𝜌𝜎2 + 1 − 𝜌 B 𝜎2 ▪ Only the 2nd term reduces on bagging, but the first term remains 12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 22
  • 23. Random Forests ▪ How to decorrelate the trees generated for bagging? ▪ We want to generate 𝐵 i.i.d. trees such that their bias is the same, but variance reduces ▪ Ideas: ▪ We can restrict how many times a feature can be used ▪ We only allow a certain number of features ▪ Etc.. 12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 23
  • 24. Random Forests ▪ Ideas: ▪ We can restrict how many times a feature can be used ▪ We only allow a certain number of features ▪ Etc.. ▪ Bias changes for the above ideas  ▪ Instead, choose only subset of features for each bag ▪ Decorrelated trees when you randomly select the subset 12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 24
  • 25. Random Forests ▪ As in bagging, choose 𝐵 bootstrapped splits (or bags) ▪ For each split in the 𝐵 trees, consider only 𝑘 features from the full feature set 𝑚 ▪ 𝑘 = 𝑚 → same as Bagging ▪ 𝑘 < 𝑚 → Random Forests ▪ OOB error rate can be used to fit RF in one sequence with cross validation done along the way 12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 25
  • 26. Random Forests ▪ Works great in practice. 𝑘 to be treated as a hyperparameter Issues: ▪ When you have large number of features, yet very small number of relevant features ▪ Prob(selecting the relevant feature in 𝑘) is very small 12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 26
  • 27. Outline ▪ Decision Trees Recap ▪ Ensemble Methods: Intro ▪ Bagging ▪ Random Forests ▪ Boosting ▪ Gradient boosting 12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 27
  • 28. Boosting ▪ Boosting does not involve bootstrap sampling ▪ Trees are grown sequentially: each tree is grown using information from previously grown trees ▪ Like bagging, boosting involves combining many decision trees, 𝑓1, … , 𝑓𝐵 ▪ Lecture slides: AdaBoost 12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 28
  • 29. Boosting ▪ AdaBoost: ▪ Weighted observations ▪ Put more weight on difficult to classify instances and less on those already handled well ▪ New weak learners are added sequentially that focus their training on the more difficult patterns 12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 29
  • 30. Outline ▪ Decision Trees Recap ▪ Ensemble Methods: Intro ▪ Bagging ▪ Random Forests ▪ Boosting ▪ Gradient boosting 12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 30
  • 31. Gradient Boosting Generalization: AdaBoost, Adaptive Reweighting & Combining Three elements – ▪ A loss function to be optimized ▪ A weak learner to make predictions (decision trees) ▪ An additive model to add weak learners to minimize the loss function 12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 31
  • 32. Gradient Boosting Loss Function: ▪ Any differentiable function ▪ Most standard loss functions ▪ L2 loss for regression ▪ Log loss for classification 12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 32
  • 33. Gradient Boosting Weak Learners: ▪ Decision trees (regression trees) learnt greedily ▪ Constrain the trees to ensure they remain weak ▪ Number of layers, leaves, nodes, splits, etc 12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 33
  • 34. Gradient Boosting Additive Model: ▪ Add trees one at a time (existing trees are not changed) ▪ Functional gradient descent to minimize loss when adding trees ▪ calculate the loss ▪ add the tree to the model that reduces the loss (i.e., follow the gradient) ▪ parameterize the tree, then modify the parameters of the tree and move in the right direction by reducing the residual loss. 12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 34
  • 35. Gradient Boosting Given the current model, ▪ We fit a decision tree to the residuals from the model ▪ Response variable now is the residuals ▪ We then add this new decision tree into the fitted function in order to update the residuals ▪ The learning rate must be controlled 12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 35
  • 36. Gradient Boosting Tunable Parameters: ▪ Number of trees (𝐵): Boosting can overfit unlike Bagging / RFs. Use cross-validation! ▪ Shrinkage parameter (𝜆): small positive number that sets the learning rate ▪ Number of splits in each tree (𝑑): Usually just choose 𝑑 = 1, i.e., tree stumps work well 12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 36
  • 37. Gradient Boosting Variants: ▪ Varying the tree constraints ▪ Weighting each tree to the additive sum using a learning rate (shrinkage) ▪ Sampling strategies: stochastic gradient boosting ▪ Regularization: L1 / L2 ▪ Successful: XGBoost 12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 37
  • 38. Thank you! General tips for projects: ▪ Use tree-based methods + ensembling as baselines when dealing with: ▪ Categorical data ▪ Mixed data types ▪ Missing data 12-Nov-21 CS229 Section: Ensemble Methods. Nandita Bhaskhar 38