Supervised Learning
Understanding Bagging and Boosting
Both are ensemble techniques, in which a set of weak learners is combined to create a strong learner that performs better than any single one.
Expected error = Bias² + Variance + Noise (for squared-error loss; the noise term is irreducible)
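Bagging (short for Bootstrap Aggregating)
It is a way to increase accuracy by decreasing variance.
Done by
Generating additional training sets by sampling the original dataset with replacement, so that each new set is a multiset with the same cardinality/size as the original dataset.
A model is trained on each set, and their predictions are aggregated (averaged or put to a vote).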
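The resampling step can be sketched as follows. This is a minimal illustration using only the Python standard library; the toy dataset, the seed, and the name `bootstrap_sample` are assumptions made for the example, not part of the slides.

```python
import random


def bootstrap_sample(dataset, rng):
    """Draw len(dataset) items from dataset with replacement."""
    return rng.choices(dataset, k=len(dataset))


rng = random.Random(0)          # fixed seed, used only for reproducibility
original = list(range(10))      # toy dataset (assumption)

# Three bootstrap replicates: same size as the original, duplicates allowed.
bags = [bootstrap_sample(original, rng) for _ in range(3)]
for bag in bags:
    print(sorted(bag))
```

Each replicate usually repeats some items and leaves others out, which is what makes the models trained on them differ.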
Example: Random Forest
Develops fully grown decision trees (low bias, high variance) that are kept as uncorrelated as possible to maximize the decrease in variance.
Since bagging cannot reduce bias, it requires large, unpruned trees.
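Why uncorrelated trees matter can be seen from a standard result for the variance of an average. The symbols are introduced here and are not in the slides: assume B trees T_b whose predictions at a point x each have variance σ² and pairwise correlation ρ.

```latex
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B} T_b(x)\right)
  = \rho\,\sigma^{2} + \frac{1-\rho}{B}\,\sigma^{2}
```

Adding trees shrinks only the second term, so under these assumptions the remaining variance is governed by the correlation ρ between trees.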
Boosting
It is a way to increase accuracy by reducing bias.
Two-step process, done by
Developing averagely performing models over subsets of the original data.
Boosting the performance of these models by combining them with a combination rule (e.g. majority vote).
Note: every subset contains the elements that were misclassified, or nearly misclassified, by the previous model.
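One common way to make later models focus on earlier mistakes is the AdaBoost weight update, sketched below. The slides do not name AdaBoost, so treat this as one possible instance of the idea; the function name and the toy inputs are hypothetical.

```python
import math


def reweight(weights, correct):
    """One AdaBoost-style update: raise the weight of misclassified samples.

    Assumes the weighted error is strictly between 0 and 1.
    """
    # Weighted error of the current weak learner.
    err = sum(w for w, ok in zip(weights, correct) if not ok) / sum(weights)
    # The learner's say in the final vote; larger when the error is smaller.
    alpha = 0.5 * math.log((1 - err) / err)
    # Shrink weights of correct samples, grow weights of misclassified ones.
    updated = [w * math.exp(-alpha if ok else alpha)
               for w, ok in zip(weights, correct)]
    total = sum(updated)
    return [w / total for w in updated], alpha


weights = [0.2] * 5                          # uniform start (toy data)
correct = [True, True, True, False, True]    # one sample was misclassified
new_weights, alpha = reweight(weights, correct)
print(new_weights, alpha)
```

After the update the single misclassified sample carries half of the total weight, so the next model is pushed to get it right.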
Example: Gradient Boosted Tree
Develops shallow decision trees (high bias, low variance), a.k.a. weak learners.
Reduces error mainly by reducing bias: each new learner is developed taking the previous learners into account (sequential).
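The sequential idea can be sketched with one-split regression stumps fitted to the residuals of the current ensemble, which is gradient boosting for squared-error loss. The data, the learning rate `eta=0.3`, and all function names are assumptions for illustration; real libraries add regularization and much more.

```python
def fit_stump(xs, residuals):
    """Best single-split regression stump on 1-D inputs (squared error)."""
    best = None
    # Skip the largest value so the right-hand side is never empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, lm, rm)
    _, threshold, lm, rm = best
    return lambda x: lm if x <= threshold else rm


def gradient_boost(xs, ys, n_rounds, eta=0.3):
    """Fit stumps one after another, each on the current residuals."""
    base = sum(ys) / len(ys)            # start from the mean prediction
    stumps = []
    preds = [base] * len(xs)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + eta * stump(x) for x, p in zip(xs, preds)]
    return lambda x: base + eta * sum(s(x) for s in stumps)


def train_mse(model, xs, ys):
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = [0, 1, 2, 3, 4, 5, 6, 7]                         # toy inputs
ys = [0.0, 0.8, 0.9, 0.1, -0.8, -1.0, -0.3, 0.7]      # toy targets
for rounds in (1, 10, 50):
    model = gradient_boost(xs, ys, n_rounds=rounds)
    print(rounds, round(train_mse(model, xs, ys), 4))
```

The training error falls as rounds are added, which is the bias reduction described above; on held-out data the same loop can eventually overfit.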
Understanding Graphically
Comparison
Similarities / Differences
Both are ensemble methods to get N learners from 1 learner…
… but while the learners are built independently in Bagging, Boosting tries to add new models that do well where previous models fail.
Both generate several training data sets by random sampling…
… but only Boosting determines weights for the data to tip the scales in favor of the most difficult cases.
Both make the final decision by averaging the N learners (or taking the majority of them)…
… but it is an equally weighted average for Bagging and a weighted average for Boosting, with more weight given to the learners that perform better on the training data.
Both are good at reducing variance and provide higher stability…
… but only Boosting tries to reduce bias. On the other hand, Bagging may solve the overfitting problem, while Boosting can increase it.
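The difference between the two final decisions can be sketched as follows. The labels and the learner weights are made up for the example; in boosting the weights would come from each learner's training performance.

```python
from collections import Counter


def majority_vote(predictions):
    """Bagging-style decision: every learner counts equally."""
    return Counter(predictions).most_common(1)[0][0]


def weighted_vote(predictions, weights):
    """Boosting-style decision: each learner's vote is scaled by its weight."""
    totals = {}
    for label, weight in zip(predictions, weights):
        totals[label] = totals.get(label, 0.0) + weight
    return max(totals, key=totals.get)


predictions = ["A", "B", "B"]        # outputs of three learners (toy data)
learner_weights = [0.9, 0.3, 0.3]    # hypothetical performance-based weights

print(majority_vote(predictions))                    # B
print(weighted_vote(predictions, learner_weights))   # A
```

With equal votes the two weaker learners win; with weights, the single strong learner outvotes them.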
Exploring the Scope of Supervised Learning in Current Setup
Areas where Supervised Learning can be useful
Feature Selection for Clustering
Evaluating Features
Increasing the Aggressiveness of the Current setup
Bringing New Rules Idea
Feature Selection/Feature Importance & Model Accuracy and Threshold Evaluation
Algorithm Used    Feature Importance Metric
XGBoost           F Score
Random Forest     Gini Index, Entropy
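For reference, the two Random Forest criteria are impurity measures computed from the class labels in a node; a split is scored by how much it reduces them. A minimal sketch with toy labels (the function names are assumptions) follows. The XGBoost F score is different in kind: in XGBoost's importance plot it is by default the number of times a feature is used to split.

```python
import math
from collections import Counter


def gini(labels):
    """Gini index: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: -sum(p * log2(p)) over the classes present."""
    n = len(labels)
    return -sum((count / n) * math.log2(count / n)
                for count in Counter(labels).values())


balanced = [0, 0, 1, 1]    # maximally mixed two-class node (toy data)
pure = [1, 1, 1, 1]        # node containing a single class

print(gini(balanced), entropy(balanced))
print(gini(pure), entropy(pure))
```

Both measures are zero for a pure node and largest for an evenly mixed one, which is why the two criteria often rank features similarly.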
[Figure: Feature Selection/Importance, XGBoost - F Score]
[Figure: Feature Selection/Importance, RF - Gini Index]
[Figure: Feature Selection/Importance, RF - Entropy]
Feature Selection/ Importance
Comparison b/w Important Features by Random Forest & XGBoost
Analysis of Top 15 important variables
RF - Entropy    RF - Gini       XGBoost - F Score
feature_21w     feature_sut     feature_1a2
feature_sut     feature_sc3     feature_2c3
feature_du1     feature_21w     feature_hhs
feature_sc3     feature_sc18    feature_nrp
feature_drh     feature_du1     feature_urh
feature_1a2     feature_sc1     feature_nub
feature_sc18    feature_drh     feature_nup
feature_drl     feature_drl     feature_psc
feature_snc     feature_1a2     feature_sncp
feature_sc1     feature_snc     feature_3e1
feature_2c3     feature_npb     feature_tpa
feature_npb     feature_3e1     feature_snc
feature_3e1     feature_tbu     feature_bst
feature_bst     feature_nub     feature_tbu
feature_nub     feature_bst     feature_nub
Reason for difference in Feature Importance b/w XGB & RF
When there are several correlated features, boosting tends to choose one and use it in several trees (if necessary). The other correlated features are used little, or not at all.
This makes sense: the other correlated features can no longer help in the split process, because they bring no new information beyond the feature already used. And the learning is done serially.
The trees of a Random Forest are not all built from the same features: a random subset of features is drawn while each tree is grown (in the standard algorithm, at every split). Each correlated feature therefore has a chance of being selected in one of the trees, so the model as a whole uses all the features. The learning is done in parallel, so no tree is aware of what has been used by the other trees.
Tree Growth XGB
When you grow too many trees, the trees start to look very similar (once there is no loss remaining to learn), so the dominant feature becomes even more important. Shallow trees reinforce this trend, because there are few possible important features at the root of a tree (the features shared between trees are most often the ones at the root). The observed results are therefore not surprising.
In this case, random selection of columns (rate around 0.8) may give interesting results.
Decreasing ETA (the learning rate) may also help, since it keeps more loss to explain after each iteration.
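In XGBoost these two suggestions correspond to the `colsample_bytree` and `eta` parameters. A hedged sketch of a parameter dictionary follows; only the 0.8 column rate comes from the text, and the `eta` value is an assumption chosen for illustration.

```python
# Hypothetical XGBoost training parameters (not taken from the slides,
# except for the 0.8 column-sampling rate mentioned in the text).
xgb_params = {
    "colsample_bytree": 0.8,  # sample 80% of the columns for each tree
    "eta": 0.05,              # assumed value: a smaller learning rate
}
print(xgb_params)
```

A dictionary like this would typically be passed to the training call; a lower `eta` usually needs more boosting rounds to reach the same training loss.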
Model Accuracy and Threshold Evaluation
XGBoost
[Figures: XGBoost evaluation plots, with panels labeled A and B]
Threshold  Accuracy  TN     FP     FN    TP
0          5.881%    0      46990  0     2936
0.1        87.353%   42229  4761   1553  1383
0.2        93.881%   46075  915    2140  796
0.3        94.722%   46691  299    2336  600
0.4        94.894%   46866  124    2425  511
0.5        94.902%   46923  67     2478  458
0.6        94.866%   46956  34     2529  407
0.7        94.856%   46973  17     2551  385
0.8        94.824%   46977  13     2571  365
0.9        94.776%   46982  8      2600  336
1          94.119%   46990  0      2936  0
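Each row of a threshold table like this one can be produced by counting outcomes at a given cut-off. The sketch below assumes a score at or above the threshold is predicted positive and that scores lie strictly between 0 and 1; the scores and labels are toy data, not the data behind the table.

```python
def confusion_at_threshold(scores, labels, threshold):
    """Return (TN, FP, FN, TP), predicting positive when score >= threshold."""
    tn = fp = fn = tp = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif label == 1:
            fn += 1
        else:
            tn += 1
    return tn, fp, fn, tp


scores = [0.05, 0.2, 0.35, 0.6, 0.8, 0.95]   # model probabilities (toy data)
labels = [0, 0, 1, 0, 1, 1]                  # true classes (toy data)

for threshold in (0.0, 0.5, 1.0):
    print(threshold, confusion_at_threshold(scores, labels, threshold))
```

As in the table, a threshold of 0 predicts everything positive (no TN or FN) and a threshold of 1 predicts everything negative (no FP or TP).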
Random Forest
[Figures: Random Forest Criteria - Gini Index; Random Forest Criteria - Entropy]
Criteria  Accuracy  TN     FP  FN    TP
Gini      94.800%   46968  22  2574  362
Entropy   94.788%   46967  23  2579  357
Comparison b/w Random Forest & XGBoost
Random Forest
Criteria  Accuracy  TN     FP  FN    TP
Gini      94.800%   46968  22  2574  362
Entropy   94.788%   46967  23  2579  357
XGBoost
Threshold  Accuracy  TN     FP     FN    TP
0          5.881%    0      46990  0     2936
0.1        87.353%   42229  4761   1553  1383
0.2        93.881%   46075  915    2140  796
0.3        94.722%   46691  299    2336  600
0.4        94.894%   46866  124    2425  511
0.5        94.902%   46923  67     2478  458
0.6        94.866%   46956  34     2529  407
0.7        94.856%   46973  17     2551  385
0.8        94.824%   46977  13     2571  365
0.9        94.776%   46982  8      2600  336
1          94.119%   46990  0      2936  0
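The accuracy column can be recomputed from the four counts, and the same counts also give precision and recall. Those two measures are not reported in the slides, so the sketch below is an optional cross-check; the function name is an assumption, and the counts are copied from the tables above.

```python
def metrics(tn, fp, fn, tp):
    """Accuracy, precision and recall from confusion-matrix counts."""
    total = tn + fp + fn + tp
    accuracy = (tn + tp) / total
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return accuracy, precision, recall


# (TN, FP, FN, TP) rows copied from the tables above.
rows = {
    "XGBoost, threshold 0.5": (46923, 67, 2478, 458),
    "Random Forest, Gini": (46968, 22, 2574, 362),
    "XGBoost, threshold 1 (all negative)": (46990, 0, 2936, 0),
}

for name, counts in rows.items():
    accuracy, precision, recall = metrics(*counts)
    print(f"{name}: accuracy={accuracy:.3%} "
          f"precision={precision:.3f} recall={recall:.3f}")
```

Because only 2936 of the 49926 cases are positive, predicting everything negative already scores 94.119% accuracy, so the accuracy differences between the models are small compared with their differences in FN and TP.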
Bringing New Rules Idea
Comparison b/w Random Forest & XGBoost
Understanding Bagging and Boosting

Más contenido relacionado

La actualidad más candente

Decision Tree - C4.5&CART
Decision Tree - C4.5&CARTDecision Tree - C4.5&CART
Decision Tree - C4.5&CARTXueping Peng
 
Decision Tree Learning
Decision Tree LearningDecision Tree Learning
Decision Tree LearningMilind Gokhale
 
Introduction to random forest and gradient boosting methods a lecture
Introduction to random forest and gradient boosting methods   a lectureIntroduction to random forest and gradient boosting methods   a lecture
Introduction to random forest and gradient boosting methods a lectureShreyas S K
 
Machine Learning - Ensemble Methods
Machine Learning - Ensemble MethodsMachine Learning - Ensemble Methods
Machine Learning - Ensemble MethodsAndrew Ferlitsch
 
Lecture 6: Ensemble Methods
Lecture 6: Ensemble Methods Lecture 6: Ensemble Methods
Lecture 6: Ensemble Methods Marina Santini
 
Feed forward ,back propagation,gradient descent
Feed forward ,back propagation,gradient descentFeed forward ,back propagation,gradient descent
Feed forward ,back propagation,gradient descentMuhammad Rasel
 
Support Vector Machines for Classification
Support Vector Machines for ClassificationSupport Vector Machines for Classification
Support Vector Machines for ClassificationPrakash Pimpale
 
Support vector machine
Support vector machineSupport vector machine
Support vector machineRishabh Gupta
 
Boosting Algorithms Omar Odibat
Boosting Algorithms Omar Odibat Boosting Algorithms Omar Odibat
Boosting Algorithms Omar Odibat omarodibat
 
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...Simplilearn
 
Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests Derek Kane
 
2.5 backpropagation
2.5 backpropagation2.5 backpropagation
2.5 backpropagationKrish_ver2
 
Decision Trees
Decision TreesDecision Trees
Decision TreesStudent
 

La actualidad más candente (20)

Gradient Boosting
Gradient BoostingGradient Boosting
Gradient Boosting
 
Decision Tree - C4.5&CART
Decision Tree - C4.5&CARTDecision Tree - C4.5&CART
Decision Tree - C4.5&CART
 
Decision Tree Learning
Decision Tree LearningDecision Tree Learning
Decision Tree Learning
 
Introduction to random forest and gradient boosting methods a lecture
Introduction to random forest and gradient boosting methods   a lectureIntroduction to random forest and gradient boosting methods   a lecture
Introduction to random forest and gradient boosting methods a lecture
 
Decision tree
Decision treeDecision tree
Decision tree
 
Support vector machine
Support vector machineSupport vector machine
Support vector machine
 
Machine Learning - Ensemble Methods
Machine Learning - Ensemble MethodsMachine Learning - Ensemble Methods
Machine Learning - Ensemble Methods
 
Ensemble learning
Ensemble learningEnsemble learning
Ensemble learning
 
Support Vector machine
Support Vector machineSupport Vector machine
Support Vector machine
 
Lecture 6: Ensemble Methods
Lecture 6: Ensemble Methods Lecture 6: Ensemble Methods
Lecture 6: Ensemble Methods
 
Feed forward ,back propagation,gradient descent
Feed forward ,back propagation,gradient descentFeed forward ,back propagation,gradient descent
Feed forward ,back propagation,gradient descent
 
Support Vector Machines for Classification
Support Vector Machines for ClassificationSupport Vector Machines for Classification
Support Vector Machines for Classification
 
Gradient descent method
Gradient descent methodGradient descent method
Gradient descent method
 
Ensemble methods
Ensemble methods Ensemble methods
Ensemble methods
 
Support vector machine
Support vector machineSupport vector machine
Support vector machine
 
Boosting Algorithms Omar Odibat
Boosting Algorithms Omar Odibat Boosting Algorithms Omar Odibat
Boosting Algorithms Omar Odibat
 
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
Support Vector Machine - How Support Vector Machine works | SVM in Machine Le...
 
Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests
 
2.5 backpropagation
2.5 backpropagation2.5 backpropagation
2.5 backpropagation
 
Decision Trees
Decision TreesDecision Trees
Decision Trees
 

Similar a Understanding Bagging and Boosting

Random Forest Classifier in Machine Learning | Palin Analytics
Random Forest Classifier in Machine Learning | Palin AnalyticsRandom Forest Classifier in Machine Learning | Palin Analytics
Random Forest Classifier in Machine Learning | Palin AnalyticsPalin analytics
 
Working with the data for Machine Learning
Working with the data for Machine LearningWorking with the data for Machine Learning
Working with the data for Machine LearningMehwish690898
 
Introduction to XGBoost Machine Learning Model.pptx
Introduction to XGBoost Machine Learning Model.pptxIntroduction to XGBoost Machine Learning Model.pptx
Introduction to XGBoost Machine Learning Model.pptxagathaljjwm20
 
CS109a_Lecture16_Bagging_RF_Boosting.pptx
CS109a_Lecture16_Bagging_RF_Boosting.pptxCS109a_Lecture16_Bagging_RF_Boosting.pptx
CS109a_Lecture16_Bagging_RF_Boosting.pptxAbhishekSingh43430
 
Machine learning session6(decision trees random forrest)
Machine learning   session6(decision trees random forrest)Machine learning   session6(decision trees random forrest)
Machine learning session6(decision trees random forrest)Abhimanyu Dwivedi
 
Tree net and_randomforests_2009
Tree net and_randomforests_2009Tree net and_randomforests_2009
Tree net and_randomforests_2009Matthew Magistrado
 
Machine Learning Unit-5 Decesion Trees & Random Forest.pdf
Machine Learning Unit-5 Decesion Trees & Random Forest.pdfMachine Learning Unit-5 Decesion Trees & Random Forest.pdf
Machine Learning Unit-5 Decesion Trees & Random Forest.pdfAdityaSoraut
 
13 random forest
13 random forest13 random forest
13 random forestVishal Dutt
 
Random Forest / Bootstrap Aggregation
Random Forest / Bootstrap AggregationRandom Forest / Bootstrap Aggregation
Random Forest / Bootstrap AggregationRupak Roy
 
20211229120253D6323_PERT 06_ Ensemble Learning.pptx
20211229120253D6323_PERT 06_ Ensemble Learning.pptx20211229120253D6323_PERT 06_ Ensemble Learning.pptx
20211229120253D6323_PERT 06_ Ensemble Learning.pptxRaflyRizky2
 
Random forest sgv_ai_talk_oct_2_2018
Random forest sgv_ai_talk_oct_2_2018Random forest sgv_ai_talk_oct_2_2018
Random forest sgv_ai_talk_oct_2_2018digitalzombie
 
BaggingBoosting.pdf
BaggingBoosting.pdfBaggingBoosting.pdf
BaggingBoosting.pdfDynamicPitch
 
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)Parth Khare
 
An Introduction to Random Forest and linear regression algorithms
An Introduction to Random Forest and linear regression algorithmsAn Introduction to Random Forest and linear regression algorithms
An Introduction to Random Forest and linear regression algorithmsShouvic Banik0139
 
Download It
Download ItDownload It
Download Itbutest
 
Handling Imbalanced Data: SMOTE vs. Random Undersampling
Handling Imbalanced Data: SMOTE vs. Random UndersamplingHandling Imbalanced Data: SMOTE vs. Random Undersampling
Handling Imbalanced Data: SMOTE vs. Random UndersamplingIRJET Journal
 
Decision tree and ensemble
Decision tree and ensembleDecision tree and ensemble
Decision tree and ensembleDanbi Cho
 

Similar a Understanding Bagging and Boosting (20)

Random Forest Classifier in Machine Learning | Palin Analytics
Random Forest Classifier in Machine Learning | Palin AnalyticsRandom Forest Classifier in Machine Learning | Palin Analytics
Random Forest Classifier in Machine Learning | Palin Analytics
 
Working with the data for Machine Learning
Working with the data for Machine LearningWorking with the data for Machine Learning
Working with the data for Machine Learning
 
Introduction to XGBoost Machine Learning Model.pptx
Introduction to XGBoost Machine Learning Model.pptxIntroduction to XGBoost Machine Learning Model.pptx
Introduction to XGBoost Machine Learning Model.pptx
 
CS109a_Lecture16_Bagging_RF_Boosting.pptx
CS109a_Lecture16_Bagging_RF_Boosting.pptxCS109a_Lecture16_Bagging_RF_Boosting.pptx
CS109a_Lecture16_Bagging_RF_Boosting.pptx
 
Machine learning session6(decision trees random forrest)
Machine learning   session6(decision trees random forrest)Machine learning   session6(decision trees random forrest)
Machine learning session6(decision trees random forrest)
 
Tree net and_randomforests_2009
Tree net and_randomforests_2009Tree net and_randomforests_2009
Tree net and_randomforests_2009
 
Machine Learning Unit-5 Decesion Trees & Random Forest.pdf
Machine Learning Unit-5 Decesion Trees & Random Forest.pdfMachine Learning Unit-5 Decesion Trees & Random Forest.pdf
Machine Learning Unit-5 Decesion Trees & Random Forest.pdf
 
Decision tree
Decision treeDecision tree
Decision tree
 
13 random forest
13 random forest13 random forest
13 random forest
 
Random Forest / Bootstrap Aggregation
Random Forest / Bootstrap AggregationRandom Forest / Bootstrap Aggregation
Random Forest / Bootstrap Aggregation
 
20211229120253D6323_PERT 06_ Ensemble Learning.pptx
20211229120253D6323_PERT 06_ Ensemble Learning.pptx20211229120253D6323_PERT 06_ Ensemble Learning.pptx
20211229120253D6323_PERT 06_ Ensemble Learning.pptx
 
Bank loan purchase modeling
Bank loan purchase modelingBank loan purchase modeling
Bank loan purchase modeling
 
Random forest sgv_ai_talk_oct_2_2018
Random forest sgv_ai_talk_oct_2_2018Random forest sgv_ai_talk_oct_2_2018
Random forest sgv_ai_talk_oct_2_2018
 
BaggingBoosting.pdf
BaggingBoosting.pdfBaggingBoosting.pdf
BaggingBoosting.pdf
 
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
 
An Introduction to Random Forest and linear regression algorithms
An Introduction to Random Forest and linear regression algorithmsAn Introduction to Random Forest and linear regression algorithms
An Introduction to Random Forest and linear regression algorithms
 
Decision tree
Decision tree Decision tree
Decision tree
 
Download It
Download ItDownload It
Download It
 
Handling Imbalanced Data: SMOTE vs. Random Undersampling
Handling Imbalanced Data: SMOTE vs. Random UndersamplingHandling Imbalanced Data: SMOTE vs. Random Undersampling
Handling Imbalanced Data: SMOTE vs. Random Undersampling
 
Decision tree and ensemble
Decision tree and ensembleDecision tree and ensemble
Decision tree and ensemble
 

Más de Mohit Rajput

Understanding Association Rule Mining
Understanding Association Rule MiningUnderstanding Association Rule Mining
Understanding Association Rule MiningMohit Rajput
 
Understanding known _ unknown - known _ unknown
Understanding known _ unknown - known _ unknownUnderstanding known _ unknown - known _ unknown
Understanding known _ unknown - known _ unknownMohit Rajput
 
Algorithms in Reinforcement Learning
Algorithms in Reinforcement LearningAlgorithms in Reinforcement Learning
Algorithms in Reinforcement LearningMohit Rajput
 
Dissertation mid evaluation
Dissertation mid evaluationDissertation mid evaluation
Dissertation mid evaluationMohit Rajput
 
For Seminar - Prospect: Development of continuous CNT path in BCP using sel...
For Seminar - Prospect:  Development of continuous CNT path in BCP using  sel...For Seminar - Prospect:  Development of continuous CNT path in BCP using  sel...
For Seminar - Prospect: Development of continuous CNT path in BCP using sel...Mohit Rajput
 
Mid-Dissertation Work Done Report
Mid-Dissertation Work Done ReportMid-Dissertation Work Done Report
Mid-Dissertation Work Done ReportMohit Rajput
 
Mid-Dissertation Work Report Presentation
Mid-Dissertation Work Report Presentation  Mid-Dissertation Work Report Presentation
Mid-Dissertation Work Report Presentation Mohit Rajput
 
SURA Final report PVDF-CNT
SURA Final report PVDF-CNTSURA Final report PVDF-CNT
SURA Final report PVDF-CNTMohit Rajput
 
R markup code to create Regression Model
R markup code to create Regression ModelR markup code to create Regression Model
R markup code to create Regression ModelMohit Rajput
 
Regression Model for movies
Regression Model for moviesRegression Model for movies
Regression Model for moviesMohit Rajput
 
Presentation- BCP self assembly meshes
Presentation- BCP self assembly meshesPresentation- BCP self assembly meshes
Presentation- BCP self assembly meshesMohit Rajput
 
Presentation- Multilayer block copolymer meshes by orthogonal self-assembly
Presentation- Multilayer block copolymer  meshes by orthogonal self-assemblyPresentation- Multilayer block copolymer  meshes by orthogonal self-assembly
Presentation- Multilayer block copolymer meshes by orthogonal self-assemblyMohit Rajput
 
Cover for report on Biofuels Generation
Cover for report on Biofuels GenerationCover for report on Biofuels Generation
Cover for report on Biofuels GenerationMohit Rajput
 
A Report on Metal Drawing Operations
A Report on Metal Drawing OperationsA Report on Metal Drawing Operations
A Report on Metal Drawing OperationsMohit Rajput
 
A technical report on BioFuels Generation
A technical report on BioFuels GenerationA technical report on BioFuels Generation
A technical report on BioFuels GenerationMohit Rajput
 
Presentation - Bio-fuels Generation
Presentation - Bio-fuels GenerationPresentation - Bio-fuels Generation
Presentation - Bio-fuels GenerationMohit Rajput
 
Status of Education in India by Mohit Rajput
Status of Education in India by Mohit RajputStatus of Education in India by Mohit Rajput
Status of Education in India by Mohit RajputMohit Rajput
 
Internship Presentation on Characterization of Stainless Steel-Titanium Diffu...
Internship Presentation on Characterization of Stainless Steel-Titanium Diffu...Internship Presentation on Characterization of Stainless Steel-Titanium Diffu...
Internship Presentation on Characterization of Stainless Steel-Titanium Diffu...Mohit Rajput
 
Posters for Exhibition
Posters for ExhibitionPosters for Exhibition
Posters for ExhibitionMohit Rajput
 

Más de Mohit Rajput (20)

Understanding Association Rule Mining
Understanding Association Rule MiningUnderstanding Association Rule Mining
Understanding Association Rule Mining
 
Understanding known _ unknown - known _ unknown
Understanding known _ unknown - known _ unknownUnderstanding known _ unknown - known _ unknown
Understanding known _ unknown - known _ unknown
 
Algorithms in Reinforcement Learning
Algorithms in Reinforcement LearningAlgorithms in Reinforcement Learning
Algorithms in Reinforcement Learning
 
Dissertation mid evaluation
Dissertation mid evaluationDissertation mid evaluation
Dissertation mid evaluation
 
For Seminar - Prospect: Development of continuous CNT path in BCP using sel...
For Seminar - Prospect:  Development of continuous CNT path in BCP using  sel...For Seminar - Prospect:  Development of continuous CNT path in BCP using  sel...
For Seminar - Prospect: Development of continuous CNT path in BCP using sel...
 
Mid-Dissertation Work Done Report
Mid-Dissertation Work Done ReportMid-Dissertation Work Done Report
Mid-Dissertation Work Done Report
 
Mid-Dissertation Work Report Presentation
Mid-Dissertation Work Report Presentation  Mid-Dissertation Work Report Presentation
Mid-Dissertation Work Report Presentation
 
Sura ppt final
Sura ppt finalSura ppt final
Sura ppt final
 
SURA Final report PVDF-CNT
SURA Final report PVDF-CNTSURA Final report PVDF-CNT
SURA Final report PVDF-CNT
 
R markup code to create Regression Model
R markup code to create Regression ModelR markup code to create Regression Model
R markup code to create Regression Model
 
Regression Model for movies
Regression Model for moviesRegression Model for movies
Regression Model for movies
 
Presentation- BCP self assembly meshes
Presentation- BCP self assembly meshesPresentation- BCP self assembly meshes
Presentation- BCP self assembly meshes
 
Presentation- Multilayer block copolymer meshes by orthogonal self-assembly
Presentation- Multilayer block copolymer  meshes by orthogonal self-assemblyPresentation- Multilayer block copolymer  meshes by orthogonal self-assembly
Presentation- Multilayer block copolymer meshes by orthogonal self-assembly
 
Cover for report on Biofuels Generation
Cover for report on Biofuels GenerationCover for report on Biofuels Generation
Cover for report on Biofuels Generation
 
A Report on Metal Drawing Operations
A Report on Metal Drawing OperationsA Report on Metal Drawing Operations
A Report on Metal Drawing Operations
 
A technical report on BioFuels Generation
A technical report on BioFuels GenerationA technical report on BioFuels Generation
A technical report on BioFuels Generation
 
Presentation - Bio-fuels Generation
Presentation - Bio-fuels GenerationPresentation - Bio-fuels Generation
Presentation - Bio-fuels Generation
 
Status of Education in India by Mohit Rajput
Status of Education in India by Mohit RajputStatus of Education in India by Mohit Rajput
Status of Education in India by Mohit Rajput
 
Internship Presentation on Characterization of Stainless Steel-Titanium Diffu...
Internship Presentation on Characterization of Stainless Steel-Titanium Diffu...Internship Presentation on Characterization of Stainless Steel-Titanium Diffu...
Internship Presentation on Characterization of Stainless Steel-Titanium Diffu...
 
Posters for Exhibition
Posters for ExhibitionPosters for Exhibition
Posters for Exhibition
 

Último

Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsJoseMangaJr1
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceDelhi Call girls
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxolyaivanovalion
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...SUHANI PANDEY
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...amitlee9823
 

Último (20)

Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...

Understanding Bagging and Boosting

  • 2. Understanding Bagging and Boosting. Both are ensemble techniques, in which a set of weak learners is combined to create a strong learner that performs better than any single one. Error = Bias² + Variance + Noise
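The error decomposition on this slide can be written out more precisely. The following is a sketch under stated assumptions that are not on the slide: squared-error loss, data generated as $y = f(x) + \varepsilon$, and noise $\varepsilon$ with zero mean and variance $\sigma^2$. Here $\hat{f}$ denotes the fitted model, and the expectation is taken over training sets and noise.

```latex
\mathbb{E}\!\left[(y - \hat{f}(x))^2\right]
  = \underbrace{\left(\mathbb{E}[\hat{f}(x)] - f(x)\right)^2}_{\text{Bias}^2}
  + \underbrace{\mathbb{E}\!\left[\left(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\right)^2\right]}_{\text{Variance}}
  + \underbrace{\sigma^2}_{\text{Noise}}
```

Under these assumptions, bagging mainly targets the variance term and boosting mainly targets the bias term; neither can reduce the noise term.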
  • 3. Bagging (short for Bootstrap Aggregating). It is a way to increase accuracy by decreasing variance. It works by generating additional datasets through sampling with replacement (combinations with repetitions), producing multisets of the same cardinality/size as the original dataset. Example: Random Forest. It develops fully grown decision trees (low bias, high variance) that are uncorrelated, to maximize the decrease in variance. Because bagging cannot reduce bias, it requires large, unpruned trees.
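The bagging procedure described on this slide can be sketched in a few lines of standard-library Python. This is a minimal illustration, not a Random Forest: the base learner is a hypothetical one-dimensional threshold rule (`fit_stump`), and the toy data, the number of models, and the seed are all assumptions chosen for the example.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Draw len(data) items with replacement: a multiset of the original size."""
    return [rng.choice(data) for _ in data]

def fit_stump(sample):
    """Hypothetical base learner: choose the threshold t (from the sampled x
    values) with the fewest training errors for the rule 'predict 1 if x > t'."""
    best = None
    for t in sorted({x for x, _ in sample}):
        errors = sum(int(x > t) != y for x, y in sample)
        if best is None or errors < best[0]:
            best = (errors, t)
    return best[1]

def bagging_fit(data, n_models, seed=0):
    """Fit one learner per bootstrap sample; the learners are independent."""
    rng = random.Random(seed)
    return [fit_stump(bootstrap_sample(data, rng)) for _ in range(n_models)]

def bagging_predict(thresholds, x):
    """Equal-weight majority vote over all fitted learners."""
    votes = Counter(int(x > t) for t in thresholds)
    return votes.most_common(1)[0][0]

# Toy data (an assumption for illustration): label is 1 when x >= 5.
data = [(x, int(x >= 5)) for x in range(10)]
models = bagging_fit(data, n_models=25)
print([bagging_predict(models, x) for x in range(10)])
```

Each learner sees a different resampled dataset, and the final vote averages away part of the variation between them.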
  • 4. Boosting. It is a way to increase accuracy by reducing bias. It is a two-step process: (1) develop averagely performing models over subsets of the original data; (2) boost the performance of these models by combining them using a cost function (e.g. majority vote). Note: every subset contains elements that were misclassified, or nearly misclassified, by the previous model. Example: Gradient Boosted Tree. It develops shallow decision trees (high bias, low variance), also known as weak learners. It reduces error mainly by reducing bias, developing each new learner while taking the previous learner into account (sequential).
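The sequential idea on this slide can be sketched as follows. Note the substitution: this is an AdaBoost-style reweighting scheme with one-dimensional threshold stumps, not the Gradient Boosted Tree named on the slide, because reweighting makes the "focus on previously misclassified points" step easy to see. The toy data and all function names are assumptions for illustration.

```python
import math

def stump_predict(x, threshold, polarity):
    """Weak learner: predict `polarity` if x > threshold, else -polarity."""
    return polarity if x > threshold else -polarity

def fit_weighted_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) with the lowest weighted error."""
    best = None
    for t in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(x, t, polarity) != y)
            if best is None or err < best[0]:
                best = (err, t, polarity)
    return best

def adaboost_fit(xs, ys, rounds):
    """Fit stumps one after another, up-weighting points the last stump got wrong."""
    n = len(xs)
    weights = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, t, pol = fit_weighted_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)      # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - err) / err)    # better stumps get a larger say
        model.append((alpha, t, pol))
        weights = [w * math.exp(-alpha * y * stump_predict(x, t, pol))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model

def adaboost_predict(model, x):
    """Weighted vote of all stumps; labels are +1 / -1."""
    score = sum(alpha * stump_predict(x, t, pol) for alpha, t, pol in model)
    return 1 if score >= 0 else -1

# Toy data (assumed): +1 at both ends, -1 in the middle, so no single
# threshold rule can classify every point correctly.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, -1, 1, 1, 1]

single = adaboost_fit(xs, ys, rounds=1)
boosted = adaboost_fit(xs, ys, rounds=3)
print(sum(adaboost_predict(single, x) != y for x, y in zip(xs, ys)))
print(sum(adaboost_predict(boosted, x) != y for x, y in zip(xs, ys)))
```

A single stump is too simple for this pattern (high bias), while the weighted combination of three stumps fits it, which is the bias reduction the slide refers to.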
  • 10. Comparison (similarities and differences). Both are ensemble methods that get N learners from one learner, but the learners are built independently in Bagging, while Boosting adds new models that do well where previous models fail. Both generate several training data sets by random sampling, but only Boosting weights the data to tip the scales in favor of the most difficult cases. Both make the final decision by averaging the N learners (or taking a majority vote), but the average is equally weighted for Bagging and weighted for Boosting, with more weight given to learners that perform better on the training data. Both are good at reducing variance and provide higher stability, but only Boosting tries to reduce bias. On the other hand, Bagging may solve the overfitting problem, while Boosting can increase it.
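The difference between the two final-decision rules can be shown with a small sketch. The three predictions and the learner weights below are made-up values for illustration, with labels encoded as +1 / -1.

```python
def equal_vote(predictions):
    """Bagging-style decision: every learner counts the same."""
    return 1 if sum(predictions) >= 0 else -1

def weighted_vote(predictions, learner_weights):
    """Boosting-style decision: better-performing learners count more."""
    score = sum(w * p for p, w in zip(predictions, learner_weights))
    return 1 if score >= 0 else -1

predictions = [1, -1, -1]          # assumed outputs of three learners
learner_weights = [0.9, 0.2, 0.3]  # assumed weights from training performance

print(equal_vote(predictions))                      # two of three vote -1
print(weighted_vote(predictions, learner_weights))  # the strong learner outweighs them
```

With the same three predictions, the two rules disagree here because the single learner voting +1 carries more weight than the other two combined.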
  • 11. Exploring the Scope of Supervised Learning in the Current Setup. Areas where supervised learning can be useful: feature selection for clustering; evaluating features; increasing the aggressiveness of the current setup; bringing new rules ideas.
  • 12. Feature Selection/Feature Importance & Model Accuracy and Threshold Evaluation
      Algorithm Used   Feature Importance Metric
      XGBoost          F Score
      Random Forest    Gini Index, Entropy
  • 16. Feature Selection/Importance: Comparison between Important Features by Random Forest & XGBoost (analysis of the top 15 important variables)
      feature_21w, feature_sut, feature_du1, feature_sc3, feature_drh, feature_1a2, feature_sc18, feature_drl, feature_snc, feature_sc1, feature_2c3, feature_npb, feature_3e1, feature_bst, feature_nub
      RF - Entropy
      feature_sut, feature_sc3, feature_21w, feature_sc18, feature_du1, feature_sc1, feature_drh, feature_drl, feature_1a2, feature_snc, feature_npb, feature_3e1, feature_tbu, feature_nub, feature_bst
      RF - Gini
      XGBoost - F Score
      feature_1a2, feature_2c3, feature_hhs, feature_nrp, feature_urh, feature_nub, feature_nup, feature_psc, feature_sncp, feature_3e1, feature_tpa, feature_snc, feature_bst, feature_tbu, feature_nub
  • 17. Feature Selection/Importance: Comparison between Important Features by Random Forest & XGBoost
      Reason for the difference in feature importance between XGB and RF: when there are several correlated features, boosting tends to choose one and use it in several trees (if necessary). The other correlated features won't be used much (or at all). This makes sense, because the other correlated features can no longer help in the split process: they bring no new information beyond the feature already used. The learning is also done serially. In a Random Forest, each tree is not built from the same features (there is a random selection of features to use for each tree), so each correlated feature has a chance of being selected in one of the trees. Therefore, when you look at the whole model, it has used all the features. The learning is done in parallel, so each tree is not aware of what has been used in the other trees.
      Tree growth (XGB): when you grow too many trees, the trees start to look very similar (once there is no loss remaining to learn), so the dominant feature becomes even more important. Shallow trees reinforce this trend, because there are few possible important features at the root of a tree (the features shared between trees are most often the ones at the root). So your results are not surprising. In this case, you may get interesting results with random selection of columns (rate around 0.8). Decreasing ETA may also help (it keeps more loss to explain after each iteration).
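The two suggestions at the end of this slide correspond to XGBoost's `colsample_bytree` and `eta` (learning rate) parameters. A parameter dictionary along these lines is one way to express them; only the 0.8 column rate comes from the slide, while the objective, depth, and `eta` value are assumptions for illustration.

```python
# Sketch of an XGBoost parameter dictionary. Such a dict is typically passed
# as the first argument to xgboost.train(); no training is run here.
xgb_params = {
    "objective": "binary:logistic",  # assumed: a binary classification task
    "max_depth": 3,                  # assumed: shallow trees, as on the earlier slides
    "colsample_bytree": 0.8,         # random selection of columns, rate around 0.8
    "eta": 0.05,                     # assumed value; smaller eta leaves more loss per round
}

for name, value in xgb_params.items():
    print(f"{name} = {value}")
```

Sampling columns per tree gives correlated features a chance to be picked, which may spread importance more evenly across them.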
  • 18. Model Accuracy and Threshold Evaluation XGBoost
  • 24. Model Accuracy and Threshold Evaluation: XGBoost
      Threshold  Accuracy  TN     FP     FN    TP
      0          5.881%    0      46990  0     2936
      0.1        87.353%   42229  4761   1553  1383
      0.2        93.881%   46075  915    2140  796
      0.3        94.722%   46691  299    2336  600
      0.4        94.894%   46866  124    2425  511
      0.5        94.902%   46923  67     2478  458
      0.6        94.866%   46956  34     2529  407
      0.7        94.856%   46973  17     2551  385
      0.8        94.824%   46977  13     2571  365
      0.9        94.776%   46982  8      2600  336
      1          94.119%   46990  0      2936  0
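The accuracy column can be recomputed from the confusion-matrix counts as (TN + TP) / (TN + FP + FN + TP). The sketch below uses the counts from this slide; the function and variable names are assumptions for illustration.

```python
def accuracy(tn, fp, fn, tp):
    """Share of all cases that were classified correctly."""
    return (tn + tp) / (tn + fp + fn + tp)

# (threshold, TN, FP, FN, TP) rows taken from the XGBoost table on this slide.
xgb_rows = [
    (0.0, 0, 46990, 0, 2936),
    (0.1, 42229, 4761, 1553, 1383),
    (0.2, 46075, 915, 2140, 796),
    (0.3, 46691, 299, 2336, 600),
    (0.4, 46866, 124, 2425, 511),
    (0.5, 46923, 67, 2478, 458),
    (0.6, 46956, 34, 2529, 407),
    (0.7, 46973, 17, 2551, 385),
    (0.8, 46977, 13, 2571, 365),
    (0.9, 46982, 8, 2600, 336),
    (1.0, 46990, 0, 2936, 0),
]

accuracy_by_threshold = {t: accuracy(tn, fp, fn, tp) for t, tn, fp, fn, tp in xgb_rows}
best_threshold = max(accuracy_by_threshold, key=accuracy_by_threshold.get)

for t, acc in accuracy_by_threshold.items():
    print(f"threshold {t:.1f}: accuracy {acc:.3%}")
print("highest accuracy at threshold", best_threshold)
```

Because only about 6% of the cases are positive, accuracy changes little above a threshold of 0.3, even though the number of true positives keeps falling.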
  • 25. Model Accuracy and Threshold Evaluation: Random Forest (criteria: Gini index and entropy)
      Criteria  Accuracy  TN     FP  FN    TP
      Gini      94.800%   46968  22  2574  362
      Entropy   94.788%   46967  23  2579  357
  • 26. Model Accuracy and Threshold Evaluation: Comparison between Random Forest & XGBoost
      Random Forest
      Criteria  Accuracy  TN     FP  FN    TP
      Gini      94.800%   46968  22  2574  362
      Entropy   94.788%   46967  23  2579  357
      XGBoost
      Threshold  Accuracy  TN     FP     FN    TP
      0          5.881%    0      46990  0     2936
      0.1        87.353%   42229  4761   1553  1383
      0.2        93.881%   46075  915    2140  796
      0.3        94.722%   46691  299    2336  600
      0.4        94.894%   46866  124    2425  511
      0.5        94.902%   46923  67     2478  458
      0.6        94.866%   46956  34     2529  407
      0.7        94.856%   46973  17     2551  385
      0.8        94.824%   46977  13     2571  365
      0.9        94.776%   46982  8      2600  336
      1          94.119%   46990  0      2936  0
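Since the accuracies of the two models are nearly identical, the FP, FN, and TP counts on this slide can also be compared through precision and recall. These two metrics do not appear on the slide; the sketch below only derives them from the slide's counts, and the names are assumptions for illustration.

```python
def precision_recall(fp, fn, tp):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    return tp / (tp + fp), tp / (tp + fn)

# (FP, FN, TP) counts taken from the comparison slide.
counts = {
    "XGBoost (threshold 0.5)": (67, 2478, 458),
    "Random Forest (Gini)": (22, 2574, 362),
    "Random Forest (Entropy)": (23, 2579, 357),
}

metrics = {name: precision_recall(*c) for name, c in counts.items()}
for name, (precision, recall) in metrics.items():
    print(f"{name}: precision {precision:.3f}, recall {recall:.3f}")
```

On these counts, XGBoost at threshold 0.5 finds more of the positive cases, while the Random Forest variants raise fewer false alarms.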
  • 27. Bringing New Rules Idea Comparison b/w Random Forest & XGBoost
  • 28. Bringing New Rules Idea Comparison b/w Random Forest & XGBoost