Gradient Boosted Regression Trees in scikit-learn

Peter Prettenhofer (@pprett)
Gilles Louppe (@glouppe)

DataRobot
Université de Liège, Belgium
Motivation

[Image-only motivation slides]
Outline

1 Basics

2 Gradient Boosting

3 Gradient Boosting in Scikit-learn

4 Case Study: California housing
About us
Peter
• @pprett
• Python & ML ∼ 6 years
• sklearn dev since 2010

Gilles
• @glouppe
• PhD student (Liège, Belgium)
• sklearn dev since 2011

Chief tree hugger
Outline

1 Basics

2 Gradient Boosting

3 Gradient Boosting in Scikit-learn

4 Case Study: California housing
Machine Learning 101
• Data comes as...
• A set of examples {(xᵢ, yᵢ) | 0 ≤ i < n_samples}, with
• Feature vector x ∈ ℝ^n_features, and
• Response y ∈ ℝ (regression) or y ∈ {−1, 1} (classification)

• Goal is to...
• Find a function ŷ = f(x)
• Such that the error L(y, ŷ) on new (unseen) x is minimal
Classification and Regression Trees [Breiman et al., 1984]

[Figure: regression tree for California housing — internal nodes split on features such as MedInc <= 5.04, AveRooms <= 4.31, AveOccup <= 2.37; leaves hold predicted values ranging from 1.16 to 4.57]

sklearn.tree.DecisionTreeClassifier|Regressor
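For concreteness, a minimal sketch of fitting such a tree in scikit-learn — the fetch_california_housing loader and the max_depth value are illustrative choices, not taken from the slide:

from sklearn.datasets import fetch_california_housing
from sklearn.tree import DecisionTreeRegressor

data = fetch_california_housing()
X, y = data.data, data.target

# a shallow tree, comparable to the one in the figure
tree = DecisionTreeRegressor(max_depth=3).fit(X, y)
print(tree.predict(X[:2]))   # predicted median house values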
Function approximation with Regression Trees

[Figure: regression trees of depth 1, 3, and 20 (RT d=1, d=3, d=20) fit to a noisy 1-D ground-truth function; deeper trees follow the data more closely]
Function approximation with Regression Trees

[Figure: same data as before, stamped "Deprecated"]

• Nowadays seldom used alone
• Ensembles: Random Forest, Bagging, or Boosting (see sklearn.ensemble) — a minimal sketch follows below
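A hedged sketch of the ensemble alternatives named above (hyperparameter values are illustrative; X and y as in the earlier sketch):

from sklearn.ensemble import (RandomForestRegressor, BaggingRegressor,
                              GradientBoostingRegressor)
from sklearn.tree import DecisionTreeRegressor

rf = RandomForestRegressor(n_estimators=100).fit(X, y)
bag = BaggingRegressor(DecisionTreeRegressor(), n_estimators=100).fit(X, y)
gbrt = GradientBoostingRegressor(n_estimators=100, max_depth=3).fit(X, y)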
Outline

1 Basics

2 Gradient Boosting

3 Gradient Boosting in Scikit-learn

4 Case Study: California housing
Gradient Boosted Regression Trees

Advantages
• Handles heterogeneous data (features measured on different scales)
• Supports different loss functions (e.g. Huber)
• Automatically detects (non-linear) feature interactions

Disadvantages
• Requires careful tuning
• Slow to train (but fast to predict)
• Cannot extrapolate (see the sketch below)
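The last point is worth a tiny illustration — a sketch (all values hypothetical) showing that tree ensembles predict a constant outside the training range:

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

X = np.arange(100, dtype=np.float64)[:, np.newaxis]
y = 2.0 * X.ravel()                       # a simple linear trend
est = GradientBoostingRegressor(n_estimators=100).fit(X, y)
print(est.predict([[200.0]]))             # ~ the prediction at x ≈ 99, nowhere near 400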
Boosting
AdaBoost [Y. Freund & R. Schapire, 1995]
• Ensemble: each member is an expert on the errors of its predecessor
• Iteratively re-weights training examples based on errors

[Figure: decision boundaries in the (x0, x1) plane over successive AdaBoost iterations]

sklearn.ensemble.AdaBoostClassifier|Regressor
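A minimal usage sketch of the estimator named above (dataset and settings are assumptions):

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=1000, random_state=0)
clf = AdaBoostClassifier(n_estimators=100).fit(X, y)
print(clf.score(X, y))   # training accuracy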
Boosting
AdaBoost [Y. Freund & R. Schapire, 1995]
• Ensemble: each member is an expert on the errors of its predecessor
• Iteratively re-weights training examples based on errors

Huge success
• Viola-Jones Face Detector (2001)
• Freund & Schapire won the Gödel Prize in 2003

[Figure: decision boundaries in the (x0, x1) plane over successive AdaBoost iterations]

sklearn.ensemble.AdaBoostClassifier|Regressor
Gradient Boosting [J. Friedman, 1999]
Statistical view on boosting
• Generalization of boosting to arbitrary loss functions

Residual fitting

[Figure: ground truth on the left; three shallow trees fit in sequence — tree 1 + tree 2 + tree 3 ≈ ground truth, each tree fitting the residuals of the sum so far]

sklearn.ensemble.GradientBoostingClassifier|Regressor
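The residual-fitting picture translates almost line for line into code. A sketch of the idea for squared loss — illustrative only, not how the library implements it:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = np.linspace(0, 10, 200)[:, np.newaxis]
y = np.sin(X.ravel()) + rng.normal(scale=0.3, size=200)

pred = np.zeros_like(y)
for _ in range(3):
    residual = y - pred                      # what the next tree must explain
    tree = DecisionTreeRegressor(max_depth=1).fit(X, residual)
    pred += tree.predict(X)                  # tree 1 + tree 2 + tree 3 ~ ground truth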
Functional Gradient Descent
Least Squares Regression
• Squared loss: L(yᵢ, f(xᵢ)) = (yᵢ − f(xᵢ))²
• The residual yᵢ − f(xᵢ) ∼ the (negative) gradient −∂L(yᵢ, f(xᵢ))/∂f(xᵢ)

Steepest Descent
• Regression trees approximate the (negative) gradient
• Each tree is a successive gradient descent step
[Figure: left — regression losses L(y, f(x)) as a function of y − f(x): squared error, absolute error, Huber error; right — classification losses as a function of y · f(x): zero-one loss, log loss, exponential loss]
Outline

1 Basics

2 Gradient Boosting

3 Gradient Boosting in Scikit-learn

4 Case Study: California housing
GBRT in scikit-learn
How to use it
>>> from sklearn.ensemble import GradientBoostingClassifier
>>> from sklearn.datasets import make_hastie_10_2
>>> X, y = make_hastie_10_2(n_samples=10000)
>>> est = GradientBoostingClassifier(n_estimators=200, max_depth=3)
>>> est.fit(X, y)
...
>>> # get predictions
>>> pred = est.predict(X)
>>> est.predict_proba(X)[0] # class probabilities
array([ 0.67, 0.33])

Implementation
• Written in pure Python/Numpy (easy to extend).
• Builds on top of sklearn.tree.DecisionTreeRegressor (Cython).
• Custom node splitter that uses pre-sorting (better for shallow trees).
Example
from sklearn.ensemble import GradientBoostingRegressor
import matplotlib.pyplot as plt

est = GradientBoostingRegressor(n_estimators=2000, max_depth=1).fit(X, y)
for pred in est.staged_predict(X):
    plt.plot(X[:, 0], pred, color='r', alpha=0.1)

[Figure: staged GBRT (d=1) predictions sweeping from high bias / low variance to low bias / high variance, overlaid on the ground truth and single regression trees RT d=1 and RT d=3]
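The next slide's code evaluates on a held-out set; a sketch of the setup it assumes (the names X_train, X_test, y_test, and n_estimators are assumptions carried into the plots that follow):

from sklearn.ensemble import GradientBoostingRegressor
from sklearn.cross_validation import train_test_split  # sklearn.model_selection in newer versions

n_estimators = 1000
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
est = GradientBoostingRegressor(n_estimators=n_estimators,
                                max_depth=1).fit(X_train, y_train)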
Model complexity & Overfitting
test_score = np.empty(len(est.estimators_))
for i, pred in enumerate(est.staged_predict(X_test)):
    test_score[i] = est.loss_(y_test, pred)
plt.plot(np.arange(n_estimators) + 1, test_score, label='Test')
plt.plot(np.arange(n_estimators) + 1, est.train_score_, label='Train')

[Figure: train and test error vs. n_estimators — train error keeps falling while test error flattens; annotations mark the lowest test error and the train-test gap]
Regularization
GBRT provides a number of knobs to control overfitting:
• Tree structure
• Shrinkage
• Stochastic Gradient Boosting
Regularization: Tree structure
• The max_depth of the trees controls the degree of feature interactions
• Use min_samples_leaf to ensure a sufficient number of samples per leaf (a hedged sketch follows below)
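A sketch of both knobs (the values are illustrative, not recommendations from the slide):

from sklearn.ensemble import GradientBoostingRegressor

est = GradientBoostingRegressor(n_estimators=1000,
                                max_depth=4,          # allow up to ~4-way interactions
                                min_samples_leaf=9)   # smoother leaf estimates
est.fit(X_train, y_train)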
Regularization: Shrinkage
• Slow learning by shrinking tree predictions with 0 < learning_rate <= 1
• A lower learning_rate requires a higher n_estimators (sketch below)
[Figure: train/test error vs. n_estimators with and without learning_rate=0.1 — shrinkage requires more trees but reaches a lower test error]
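In code, the trade-off above looks like this (a sketch; values illustrative):

from sklearn.ensemble import GradientBoostingRegressor

# lower learning_rate, hence more trees
est = GradientBoostingRegressor(n_estimators=3000, learning_rate=0.1,
                                max_depth=1).fit(X_train, y_train)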
Regularization: Stochastic Gradient Boosting
• Samples: random subset of the training set (subsample)
• Features: random subset of features (max_features)
• Improved accuracy, reduced runtime (sketch below)
[Figure: train/test error vs. n_estimators — subsample alone does poorly, but subsample=0.5 combined with learning_rate=0.1 reaches an even lower test error]
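A sketch combining both forms of randomization with shrinkage (values illustrative):

from sklearn.ensemble import GradientBoostingRegressor

est = GradientBoostingRegressor(n_estimators=3000, learning_rate=0.1,
                                subsample=0.5,       # row subsampling per tree
                                max_features=0.3)    # feature subsampling per split
est.fit(X_train, y_train)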
Hyperparameter tuning
1. Set n_estimators as high as possible (e.g. 3000)
2. Tune hyperparameters via grid search:

from sklearn.grid_search import GridSearchCV  # sklearn.model_selection in newer versions

param_grid = {'learning_rate': [0.1, 0.05, 0.02, 0.01],
              'max_depth': [4, 6],
              'min_samples_leaf': [3, 5, 9, 17],
              'max_features': [1.0, 0.3, 0.1]}
est = GradientBoostingRegressor(n_estimators=3000)
gs_cv = GridSearchCV(est, param_grid).fit(X, y)
# best hyperparameter setting
gs_cv.best_params_

3. Finally, set n_estimators even higher and tune learning_rate.
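A hedged sketch of step 3, using staged_predict on held-out data to pick the best iteration (names and values assumed from earlier slides):

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error

est = GradientBoostingRegressor(n_estimators=5000, learning_rate=0.02,
                                max_depth=4).fit(X_train, y_train)
test_errors = [mean_absolute_error(y_test, pred)
               for pred in est.staged_predict(X_test)]
best_n = int(np.argmin(test_errors)) + 1   # best number of trees on held-out data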
Outline

1 Basics

2 Gradient Boosting

3 Gradient Boosting in Scikit-learn

4 Case Study: California housing
Case Study
California Housing dataset
• Predict log(medianHouseValue)
• Block groups in the 1990 census
• 20,640 groups with 8 features (median income, median age, lat, lon, ...)
• Evaluation: mean absolute error on an 80/20 split

Challenges
• Heterogeneous features
• Non-linear interactions
Predictive accuracy & runtime

Model   Train time [s]   Test time [ms]   MAE
Mean    -                -                0.4635
Ridge   0.006            0.11             0.2756
SVR     28.0             2000.00          0.1888
RF      26.3             605.00           0.1620
GBRT    192.0            439.00           0.1438

[Figure: GBRT train/test error vs. n_estimators on California housing, up to 3000 trees]
Model interpretation
Which features are important?
>>> est.feature_importances_
array([ 0.01, 0.38, ...])

[Figure: relative importance bar chart over MedInc, AveRooms, Longitude, AveOccup, Latitude, AveBedrms, Population, HouseAge; x-axis "Relative importance", 0.00 to 0.18]
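A small sketch for pairing the raw array with feature names (the names list is an assumption, as on the next slide):

# sort features by importance, highest first
for name, imp in sorted(zip(names, est.feature_importances_),
                        key=lambda t: -t[1]):
    print('%-12s %.3f' % (name, imp))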
Model interpretation
What is the effect of a feature on the response?

from sklearn.ensemble import partial_dependence as pd

[Figure: partial dependence of house value on non-location features for the California housing dataset — one-way plots for MedInc, AveOccup, HouseAge, and AveRooms, plus a two-way contour plot for (AveOccup, HouseAge)]

features = ['MedInc', 'AveOccup', 'HouseAge', 'AveRooms',
            ('AveOccup', 'HouseAge')]
fig, axs = pd.plot_partial_dependence(est, X_train, features,
                                      feature_names=names)
Model interpretation

Automatically detects spatial effects

[Figure: two maps of partial dependence of median house value on latitude and longitude — the spatial pattern is captured without hand-crafted location features]
Summary

• Flexible non-parametric classification and regression technique
• Applicable to a variety of problems
• Solid, battle-worn implementation in scikit-learn
Thanks! Questions?
Benchmarks

[Figure: error, train time, and test time of sklearn-0.15 vs. R's gbm across datasets: bioresp, YahooLTRC, Spam, Solar, Madelon, Expedia, Example 10.2, Covtype, California, Boston, Arcene]
Tips & Tricks 1

Input layout
Use dtype=np.float32 to avoid memory copies, and Fortran layout for a slight runtime benefit:
X = np.asfortranarray(X, dtype=np.float32)
Tips & Tricks 2

Feature interactions
GBRT automatically detects feature interactions, but explicit interaction features often help.
Trees required to approximate X1 − X2: 10 (left), 1000 (right).

[Figure: two 3-D surfaces of x − y over the unit square — left: coarse, stair-stepped approximation with 10 trees; right: smooth approximation with 1000 trees]
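A hedged sketch of adding such an explicit interaction feature before fitting (columns 0 and 1 standing in for the two interacting features — an assumption):

import numpy as np

# append an explicit x − y column so a single split can capture it
X_aug = np.hstack([X, X[:, [0]] - X[:, [1]]])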
Tips & Tricks 3

Categorical variables
Sklearn requires that categorical variables are encoded as numeric values. Tree-based methods work well with ordinal encoding:

df = pd.DataFrame(data={'icao': ['CRJ2', 'A380', 'B737', 'B737']})
# ordinal encoding
df_enc = pd.DataFrame(data={'icao': np.unique(df.icao,
                                              return_inverse=True)[1]})
X = np.asfortranarray(df_enc.values, dtype=np.float32)
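An equivalent, arguably tidier encoding via pandas categoricals (a sketch; categories default to sorted order, so this matches the np.unique codes above):

import pandas as pd

df = pd.DataFrame(data={'icao': ['CRJ2', 'A380', 'B737', 'B737']})
df_enc = pd.DataFrame(data={'icao': df['icao'].astype('category').cat.codes})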
