XGBOOST: A SCALABLE TREE BOOSTING SYSTEM
(T. CHEN, C. GUESTRIN, 2016)
NATALLIE BAIKEVICH
HARDWARE ACCELERATION FOR DATA PROCESSING SEMINAR
ETH ZÜRICH
MOTIVATION
• Effective statistical models
• Scalable system
• Successful real-world applications
XGBoost: eXtreme Gradient Boosting
BIAS-VARIANCE TRADEOFF
• Random Forest (voting over trees): Variance ↓
• Boosting (trees added together, + +): Bias ↓
A BIT OF HISTORY
• AdaBoost, 1996
• Random Forests, 1999
• Gradient Boosting Machine, 2001
• Various improvements in tree boosting
• XGBoost package
• 1st Kaggle success: Higgs Boson Challenge
• 17/29 winning solutions in 2015
WHY DOES XGBOOST WIN "EVERY" MACHINE LEARNING COMPETITION?
(MASTER THESIS, D. NIELSEN, 2016)
Source: https://github.com/dmlc/xgboost/tree/master/demo#machine-learning-challenge-winning-solutions
TREE ENSEMBLE
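REGULARIZED LEARNING OBJECTIVE
Prediction: $\hat{y}_i = \sum_{k=1}^{K} f_k(x_i)$
Objective: $L = \sum_i l(\hat{y}_i, y_i) + \sum_k \Omega(f_k)$ (loss + regularization)
Regularization: $\Omega(f) = \gamma T + \frac{1}{2} \lambda \lVert w \rVert^2$, where $T$ is the # of leaves
Source: http://xgboost.readthedocs.io/en/latest/model.html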
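As a rough illustration of the objective on this slide, the sketch below evaluates the loss term plus the per-tree penalty gamma * T + 0.5 * lambda * ||w||^2. Squared error as the loss, the two toy trees (represented only by their leaf weights), and the prediction values are all assumptions made for the example; they are not taken from the slides.

```python
def squared_error(y_hat, y):
    """Per-example loss l(y_hat, y); squared error is an assumed choice."""
    return (y_hat - y) ** 2


def omega(leaf_weights, gamma, lam):
    """Omega(f) = gamma * T + 0.5 * lambda * ||w||^2 for a single tree."""
    num_leaves = len(leaf_weights)  # T
    return gamma * num_leaves + 0.5 * lam * sum(w * w for w in leaf_weights)


def objective(y_true, y_pred, trees, gamma, lam):
    """L = sum_i l(y_hat_i, y_i) + sum_k Omega(f_k)."""
    loss = sum(squared_error(p, y) for p, y in zip(y_pred, y_true))
    regularization = sum(omega(w, gamma, lam) for w in trees)
    return loss + regularization


# Hypothetical ensemble: each tree is described only by its leaf weights.
trees = [[0.5, -0.5], [0.1, 0.2, -0.3]]
y_true = [1.0, 0.0, 1.0]
y_pred = [0.6, -0.2, 0.7]  # illustrative ensemble outputs, not derived from `trees`

total = objective(y_true, y_pred, trees, gamma=1.0, lam=1.0)
print(round(total, 4))
```

Larger gamma penalizes trees with many leaves, while larger lambda shrinks leaf weights toward zero.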
SCORE CALCULATION
1st order gradient 2nd order gradient
Statistics for each leaf
Score
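The formulas on this slide did not survive extraction. Following the XGBoost paper, the quantities named here are the per-example gradients g_i and h_i, the per-leaf sums G_j and H_j, the optimal leaf weight w_j = -G_j / (H_j + lambda), and the structure score -1/2 * sum_j G_j^2 / (H_j + lambda) + gamma * T. The sketch below computes them for logistic loss; the toy labels and leaf assignment are assumptions for illustration.

```python
import math
from collections import defaultdict


def logistic_grad_hess(y_hat, y):
    """1st and 2nd order gradients of logistic loss w.r.t. the raw score."""
    p = 1.0 / (1.0 + math.exp(-y_hat))
    return p - y, p * (1.0 - p)


def leaf_statistics(grads, hess, leaf_of):
    """Sum gradient statistics over the examples that fall into each leaf."""
    G, H = defaultdict(float), defaultdict(float)
    for g, h, j in zip(grads, hess, leaf_of):
        G[j] += g
        H[j] += h
    return G, H


def optimal_weights(G, H, lam):
    """w_j = -G_j / (H_j + lambda)."""
    return {j: -G[j] / (H[j] + lam) for j in G}


def structure_score(G, H, lam, gamma):
    """-1/2 * sum_j G_j^2 / (H_j + lambda) + gamma * T (lower is better)."""
    return -0.5 * sum(G[j] ** 2 / (H[j] + lam) for j in G) + gamma * len(G)


# Toy data: raw scores start at 0, and a hypothetical tree sends
# examples 0 and 2 to leaf 0 and examples 1 and 3 to leaf 1.
y = [1, 0, 1, 0]
y_hat = [0.0, 0.0, 0.0, 0.0]
leaf_of = [0, 1, 0, 1]

pairs = [logistic_grad_hess(s, t) for s, t in zip(y_hat, y)]
grads = [g for g, _ in pairs]
hess = [h for _, h in pairs]

G, H = leaf_statistics(grads, hess, leaf_of)
weights = optimal_weights(G, H, lam=1.0)
score = structure_score(G, H, lam=1.0, gamma=0.1)
print(weights, score)
```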
ALGORITHM FEATURES
• Regularized objective
• Shrinkage and column subsampling
• Split finding: exact & approximate, global & local
• Weighted quantile sketch
• Sparsity-awareness
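To make the exact split finding and sparsity-awareness items concrete, here is a minimal sketch for a single feature. It uses the gain formula from the XGBoost paper, 1/2 * [G_L^2/(H_L+lambda) + G_R^2/(H_R+lambda) - G^2/(H+lambda)] - gamma, and tries sending missing values to either side to pick a default direction. The function names and toy data are hypothetical, and the single-pass loop is a simplification rather than the paper's exact procedure.

```python
def leaf_score(G, H, lam):
    """G^2 / (H + lambda), the building block of the split gain."""
    return G * G / (H + lam)


def best_split(values, grads, hess, lam=1.0, gamma=0.0):
    """Exact greedy search on one feature; None marks a missing value.

    Returns (gain, threshold, default_direction_for_missing).
    """
    present = sorted(
        (v, g, h) for v, g, h in zip(values, grads, hess) if v is not None
    )
    G, H = sum(grads), sum(hess)
    # Statistics of the rows whose feature value is missing.
    G_miss = G - sum(g for _, g, _ in present)
    H_miss = H - sum(h for _, _, h in present)
    parent = leaf_score(G, H, lam)

    best = (0.0, None, None)
    GL = HL = 0.0
    for k in range(len(present) - 1):
        v, g, h = present[k]
        GL += g
        HL += h
        nxt = present[k + 1][0]
        if nxt == v:
            continue  # cannot split between equal values
        threshold = 0.5 * (v + nxt)
        # Try both default directions for the missing rows.
        for direction, gl, hl in (
            ("right", GL, HL),
            ("left", GL + G_miss, HL + H_miss),
        ):
            gain = 0.5 * (
                leaf_score(gl, hl, lam)
                + leaf_score(G - gl, H - hl, lam)
                - parent
            ) - gamma
            if gain > best[0]:
                best = (gain, threshold, direction)
    return best


values = [1.0, 2.0, 3.0, 4.0, None]
grads = [-1.0, -1.0, 1.0, 1.0, 1.0]
hess = [1.0, 1.0, 1.0, 1.0, 1.0]
gain, threshold, default = best_split(values, grads, hess, lam=1.0, gamma=0.0)
print(gain, threshold, default)
```

In this toy case the missing row has a positive gradient like the rows with large feature values, so grouping it with them gives the higher gain.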
SYSTEM DESIGN: BLOCK STRUCTURE
Time complexity: $O(K d \lVert x \rVert_0 \log n)$ vs $O(K d \lVert x \rVert_0 + \lVert x \rVert_0 \log B)$
K: # trees, d: max depth, $\lVert x \rVert_0$: # non-missing entries
Sorted structure -> linear scan
Blocks can be
• Distributed across machines
• Stored on disk in out-of-core setting
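The idea behind "sorted structure -> linear scan" can be sketched as follows: each column is sorted once and stored with row indices (only non-missing entries, in a CSC-like layout), and every later boosting round reuses that order with a single linear pass. The helper names and the toy columns are assumptions for illustration, not the library's internal data structures.

```python
def build_block(columns):
    """Pre-sort every column once, keeping (value, row_index) for non-missing entries."""
    block = []
    for col in columns:
        entries = sorted((v, i) for i, v in enumerate(col) if v is not None)
        block.append(entries)
    return block


def scan_column(entries, grads, hess):
    """Linear scan over a pre-sorted column, returning (value, G_left, H_left) prefixes."""
    GL = HL = 0.0
    prefixes = []
    for v, i in entries:
        GL += grads[i]
        HL += hess[i]
        prefixes.append((v, GL, HL))
    return prefixes


# Two feature columns over four rows; None marks a missing entry.
columns = [[3.0, 1.0, None, 2.0], [None, 5.0, 4.0, None]]
block = build_block(columns)  # sorting cost is paid once

# Gradient statistics change every boosting round, the sorted order does not.
grads = [1.0, 2.0, 3.0, 4.0]
hess = [1.0, 1.0, 1.0, 1.0]
prefixes = scan_column(block[0], grads, hess)
print(prefixes)
```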
SYSTEM DESIGN: CACHE-AWARE ACCESS
Problem: non-continuous memory access
Improved split finding
• Allocate internal buffer
• Prefetch gradient statistics
Datasets: Larger vs Smaller
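The access pattern behind the buffer-and-prefetch idea can be sketched as below. Scanning a sorted column means reading gradient statistics by row index, which jumps around in memory; the buffered variant first gathers a mini-batch of statistics into a small contiguous buffer and then accumulates from it. Python will not show the cache effect itself, so this only illustrates the two loops; the batch size and function names are assumptions.

```python
def accumulate_direct(row_index, grads, hess):
    """Fetch and accumulate in the same step (non-contiguous reads)."""
    G = H = 0.0
    for i in row_index:
        G += grads[i]
        H += hess[i]
    return G, H


def accumulate_buffered(row_index, grads, hess, batch=4):
    """Gather a mini-batch into an internal buffer, then accumulate from it."""
    G = H = 0.0
    for start in range(0, len(row_index), batch):
        idx = row_index[start:start + batch]
        buffer = [(grads[i], hess[i]) for i in idx]  # prefetch step
        for g, h in buffer:                          # accumulate step
            G += g
            H += h
    return G, H


# Row order as it would appear in a column sorted by feature value.
row_index = [7, 2, 9, 0, 5, 3, 8, 1, 6, 4]
grads = [0.5 * i for i in range(10)]
hess = [0.25] * 10

direct = accumulate_direct(row_index, grads, hess)
buffered = accumulate_buffered(row_index, grads, hess, batch=4)
print(direct, buffered)
```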
SYSTEM DESIGN: BLOCK STRUCTURE
• Compression by columns (CSC): decompression vs disk reading
• Block sharding: use multiple disks
• Block size: too large, cache misses; too small, inefficient parallelization
• Prefetch in independent thread
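A minimal sketch of the out-of-core pattern on this slide: blocks are kept compressed, and an independent thread decompresses the next block while the main thread consumes the current one. General-purpose zlib compression and in-memory byte strings stand in for the column compression and on-disk files of a real system; both are assumptions for illustration.

```python
import queue
import struct
import threading
import zlib


def compress_block(values):
    """Pack a column block of floats and compress it."""
    raw = struct.pack(f"{len(values)}d", *values)
    return zlib.compress(raw)


def decompress_block(blob):
    """Inverse of compress_block."""
    raw = zlib.decompress(blob)
    return list(struct.unpack(f"{len(raw) // 8}d", raw))


def prefetcher(blobs, out_queue):
    """Runs in its own thread: decompress blocks ahead of the consumer."""
    for blob in blobs:
        out_queue.put(decompress_block(blob))
    out_queue.put(None)  # sentinel: no more blocks


# Three toy blocks; the compressed blobs stand in for files on disk.
blocks = [[float(i) for i in range(k, k + 1000)] for k in (0, 1000, 2000)]
blobs = [compress_block(b) for b in blocks]

ready = queue.Queue(maxsize=2)  # bounded, so the prefetcher stays only slightly ahead
worker = threading.Thread(target=prefetcher, args=(blobs, ready), daemon=True)
worker.start()

total = 0.0
while True:
    block = ready.get()
    if block is None:
        break
    total += sum(block)  # stands in for the gradient-statistics scan
worker.join()
print(total)
```

The bounded queue limits how many decompressed blocks sit in memory at once, which mirrors the block-size tradeoff noted on the slide.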
EVALUATION
• AWS c3.8xlarge machine: 32 virtual cores, 2x320GB SSD, 60GB RAM
• 32 m3.2xlarge machines, each: 8 virtual cores, 2x80GB SSD, 30GB RAM
DATASETS
Dataset      n (examples)  m (features)  Task
Allstate     10M           4227          Insurance claim classification
Higgs Boson  10M           28            Event classification
Yahoo LTRC   473K          700           Learning to rank
Criteo       1.7B          67            Click through rate prediction
WHAT'S NEXT?
• Model Extensions: DART (+ Dropouts), LinXGBoost
• Parallel Processing: GPU, FPGA
• Tuning: Hyperparameter optimization
• More Applications
XGBoost: Scalability, Weighted quantiles, Sparsity-awareness, Cache-awareness, Data compression

XGBoost (System Overview)
