
Exploratory data analysis using xgboost package in R

This deck explains, step by step, how to do exploratory data analysis using the xgboost package (EDAXGB): feature importance, sensitivity analysis, feature contribution and feature interaction. Everything is based on the built-in predict() function of the R package.
All of the sample codes are available at: https://github.com/katokohaku/EDAxgboost


Exploratory data analysis using xgboost package in R

  1. Exploratory Data Analysis Using XGBoost. Presented at the 1st R study meetup in Sendai (#Sendai.R).
  2. Who am I? I work at a clinical laboratory testing company. Background: nomadic pastoralism research in Mongolia (ecology / environmental science); now at the company's research institute (a job of pivoting data vertically and horizontally). @kato_kohaku
  3. Exploratory Data Analysis (EDA) https://www.itl.nist.gov/div898/handbook/eda/section1/eda11.htm is an approach/philosophy for data analysis that employs a variety of techniques (mostly graphical) to 1. maximize insight into a data set; 2. uncover underlying structure; 3. extract important variables; 4. detect outliers and anomalies; 5. test underlying assumptions; 6. develop parsimonious models; and 7. determine optimal factor settings.
  4. EDA (or explanation) after modelling: a taxonomy of interpretation / explanation. https://christophm.github.io/interpretable-ml-book/
  5. EDA using Random Forest (EDARF) (off-topic). Around a Random Forest model: imputation for missing values (rfImpute(), {missForest}); rule extraction ({inTrees}, defragTrees@python, EDARF::plot_prox(), getTree()); feature importance (Gini / accuracy, permutation based); sensitivity analysis (Partial Dependence Plot (PDP), feature-contribution based {forestFloor}); suggestion (feature tweaking).
  6. Today's topic: EDA × XGBoost. Model-specific methods, intrinsic: linear regression, logistic regression, GLM, GAM and more, decision tree, decision rules, RuleFit, naive Bayes classifier, k-nearest neighbors. Model-specific methods, post hoc: feature importance (OOB error @RF; gain/cover/weight @XGB), feature contribution (forestFloor @RF; xgboostExplainer, lightgbmExplainer), alternate / enumerate lasso (@LASSO), inTrees / defragTrees (@RF/XGB), actionable feature tweaking (@RF/XGB). Model-agnostic methods (also applicable to intrinsically interpretable models): Partial Dependence Plot, Individual Conditional Expectation, Accumulated Local Effects plot, feature interaction, permutation feature importance, global surrogate, local explanation (LIME, Shapley values, breakDown). Example-based explanations: counterfactual explanations, adversarial examples, prototypes and criticisms, influential instances.
  7. Why EDA × XGBoost (or LightGBM)? Motivation: https://twitter.com/fchollet/status/1113476428249464833?s=19
  8. Overview: decision tree, Random Forest & gradient boosting. https://www.kdnuggets.com/2017/10/understanding-machine-learning-algorithms.html http://www.cse.chalmers.se/~richajo/dit866/lectures/l8/gb_explainer.pdf
  9. Overview: gradient boosting & XGBoost. http://www.yisongyue.com/courses/cs155/2019_winter/lectures/Lecture_06.pdf https://www.kdd.org/kdd2016/papers/files/rfp0697-chenAemb.pdf XGBoost's improvements: overfitting suppression, split-finding efficiency, computation time.
  10. EDA using XGBoost. Around an XGBoost model: rule extraction (xgb.model.dt.tree(), {inTrees}, defragTrees@python); feature importance (gain & cover, permutation based); variable response (1) (PDP / ICE / ALE); individual explanation (Shapley value (predcontrib), structure based (approxcontrib)); summarized explanation (clustering of observations, variable response (2), feature interaction); suggestion (feature tweaking).
  11. EDA (or explanation) using XGBoost. Today's topics: 1. Build the XGBoost model. 2. Feature importance: gain & cover, permutation based. 3. Variable response (1): Partial Dependence Plot (PDP / ICE / ALE). 4. Rule extraction: xgb.model.dt.tree(), inTrees, defragTrees@python. 5. Individual explanation: Shapley value (predcontrib), structure based (approxcontrib). 6. Variable response (2): Shapley value (predcontrib), structure based (approxcontrib). 7. Feature interaction: 2-way SHAP (predinteraction). Suggestion (off-topic): feature tweaking.
  12. To get all the sample code, please see GitHub: https://github.com/katokohaku/EDAxgboost
  13. 1. BUILDING THE XGBOOST MODEL. 1. Dataset: 1. check the basic profile of each variable (type, definition, information, structure, etc.); 2. preprocessing (variable transformation, train/test splitting and sampling, data conversion). 2. Task and evaluation metric: 1. classification? regression (which kind)? clustering? something else? 2. accuracy, error, AUC, another metric? 3. Hyperparameter settings: 1. search parameters or not; 2. which parameters, and which search strategy? 4. Evaluation of the trained model: 1. predictive accuracy, prediction characteristics (bias tendencies), etc. https://github.com/katokohaku/EDAxgboost/blob/master/100_building_xgboost_model.Rmd
  14. EDA (or explanation) after modelling: EDA tools for XGBoost. 1. Build the XGBoost model. 2. Feature importance: structure based (gain & cover), permutation based. 3. Variable response (1): Partial Dependence Plot (PDP / ICE / ALE). 4. Rule extraction: xgb.model.dt.tree(), inTrees. 5. Individual explanation: Shapley value (predcontrib), structure based (approxcontrib). 6. Variable response (2): Shapley value (predcontrib), structure based (approxcontrib). 7. Feature interaction: 2-way SHAP (predinteraction). Suggestion (off-topic): feature tweaking.
  15. Preparation: Human Resources Analytics data set. left (target to predict): whether the employee left the workplace or not (1 or 0). Features: satisfaction_level: level of satisfaction (0-1). last_evaluation: time since last performance evaluation (in years). number_project: number of projects completed while at work. average_montly_hours: average monthly hours at the workplace. time_spend_company: number of years spent in the company. Work_accident: whether the employee had a workplace accident. promotion_last_5years: whether the employee was promoted in the last five years. sales: department the employee works for. salary: relative level of salary (low / medium / high). Source: https://github.com/ryankarlos/Human-Resource-Analytics-Kaggle-Dataset/tree/master/Original_Kaggle_Dataset
  16. Preparation: take a glance with GGally::ggpairs().
  17. Preparation: + random noise. Make the continuous features noisy in the same way as: https://medium.com/applied-data-science/new-r-package-the-xgboost-explainer-51dd7d1aa211
  18. Baseline profile: table1::table1()
  19. Preparation: convert train / test set to xgb.DMatrix. 1. Factor variable → integer (or dummy) encoding. 2. Separate train set / test set (+ under-sampling). 3. (data.frame →) matrix → xgb.DMatrix.
  20. Convert train / test set to xgb.DMatrix. Under-sample to balance the classes (which keeps the intercept of the xgb model small): factor → integer; separate train set (+ under-sampling) → convert to xgb.DMatrix; separate test set → convert to xgb.DMatrix. A sketch follows below.
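A minimal sketch of this conversion, assuming the raw data.frame is called HR_data with target column left (these names are placeholders, not necessarily the ones in the repository; the under-sampling step is omitted for brevity):

  library(xgboost)

  # Factor columns -> integer codes (xgboost needs a numeric matrix)
  df <- HR_data
  for (j in names(df)) {
    if (is.factor(df[[j]])) df[[j]] <- as.integer(df[[j]])
  }

  # Train / test split
  set.seed(1)
  idx   <- sample(nrow(df), size = floor(0.7 * nrow(df)))
  train <- df[idx, ]
  test  <- df[-idx, ]

  # (data.frame ->) matrix -> xgb.DMatrix
  X_train <- as.matrix(train[, setdiff(names(train), "left")])
  X_test  <- as.matrix(test[,  setdiff(names(test),  "left")])
  dtrain  <- xgb.DMatrix(X_train, label = train$left)
  dtest   <- xgb.DMatrix(X_test,  label = test$left)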
  21. Preparation: hyper-parameter settings, according to https://xgboost.readthedocs.io/en/latest/parameter.html. Tune with grid / random / Bayesian-optimization search etc., if you like (recommendation: use the mlr package).
  22. Build the XGBoost model: search the optimal number of boosters using cross-validation, xgb.cv().
  23. Build the XGBoost model: xgb.cv(). A sketch of this step follows.
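A hedged sketch of the parameter setup, CV round search and final fit (the parameter values are illustrative, not the ones tuned in the deck; dtrain is the placeholder from the earlier sketch):

  # Illustrative hyper-parameters for a binary classification task
  params <- list(
    objective        = "binary:logistic",
    eval_metric      = "auc",
    eta              = 0.1,
    max_depth        = 5,
    subsample        = 0.8,
    colsample_bytree = 0.8
  )

  # Cross-validation to choose the number of boosting rounds
  set.seed(1)
  cv <- xgb.cv(params = params, data = dtrain, nrounds = 500,
               nfold = 5, early_stopping_rounds = 20, verbose = 0)
  best_iter <- cv$best_iteration

  # Fit the final model with the selected number of rounds
  bst <- xgb.train(params = params, data = dtrain, nrounds = best_iter)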
  24. Predictive performance on the test set.
  25. Predictive performance: distribution of predictions.
  26. 2. PROFILING THE TRAINED XGBOOST MODEL. 1. Feature importance for prediction: 1. structure-based importance (gain & cover): xgb.importance(); 2. permutation-based importance: DALEX::variable_importance(). https://github.com/katokohaku/EDAxgboost/blob/master/100_building_xgboost_model.Rmd
  27. EDA (or explanation) after modelling: EDA tools for XGBoost. 1. Build the XGBoost model. 2. Feature importance: structure based (gain & cover), permutation based. 3. Variable response (1): Partial Dependence Plot (PDP / ICE / ALE). 4. Rule extraction: xgb.model.dt.tree(), inTrees. 5. Individual explanation: Shapley value (predcontrib), structure based (approxcontrib). 6. Variable response (2): Shapley value (predcontrib), structure based (approxcontrib). 7. Feature interaction: 2-way SHAP (predinteraction). Suggestion (off-topic): feature tweaking.
  28. Feature importance: xgb.importance(). For a tree model: Gain represents the fractional contribution of each feature to the model, based on the total gain of that feature's splits; a higher percentage means a more important predictive feature. Cover is a metric of the number of observations related to this feature. Frequency is a percentage representing the relative number of times a feature has been used in trees. For a linear model: Weight is the linear coefficient of the feature. https://www.rdocumentation.org/packages/xgboost/versions/0.6.4.1/topics/xgb.importance A usage sketch follows.
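A minimal sketch of structure-based importance, assuming the bst model from the earlier sketch:

  imp <- xgb.importance(model = bst)    # columns: Feature, Gain, Cover, Frequency
  head(imp)
  xgb.plot.importance(imp, top_n = 10)  # barplot of the top features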
  29. Feature importance (structure based). For each node, calculate the weight the node would take if it were not split further; 1. distribute the weight differences to each child node; 2. accumulate, for each feature, the weights along the path each observation passes, over every booster.
  30. Feature importance (structure based). Gain represents the fractional contribution of each feature to the model based on the total gain of that feature's splits; a higher percentage means a more important predictive feature. https://homes.cs.washington.edu/~tqchen/pdf/BoostedTree.pdf The gain of the i-th feature at the k-th node in the j-th booster is calculated as shown below.
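The slide showed this equation as an image; reconstructed here from the cited Boosted Tree notes. With G_L, G_R the sums of first derivatives (gradients) of the loss over instances falling in the left/right child, H_L, H_R the corresponding sums of second derivatives, and lambda, gamma the regularization parameters, the gain of a split is:

  $$\mathrm{Gain} = \frac{1}{2}\left[\frac{G_L^2}{H_L+\lambda} + \frac{G_R^2}{H_R+\lambda} - \frac{(G_L+G_R)^2}{H_L+H_R+\lambda}\right] - \gamma$$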
  31. Feature importance (permutation based). Calculate the increase in the model's prediction error after permuting a feature. A feature is "important" if shuffling its values increases the model error, because in that case the model relied on the feature for the prediction. https://christophm.github.io/interpretable-ml-book/feature-importance.html From: https://www.kaggle.com/dansbecker/permutation-importance A DALEX sketch follows.
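A hedged sketch using DALEX::variable_importance(), as named in the deck (newer DALEX releases renamed it model_parts(); bst, X_test and test$left are the placeholders from the earlier sketches):

  library(DALEX)
  explainer <- explain(
    bst,
    data  = X_test,
    y     = test$left,
    predict_function = function(m, d) predict(m, as.matrix(d)),
    label = "xgboost"
  )
  # Permutation-based importance: how much does the loss grow when a feature is shuffled?
  vi <- variable_importance(explainer, loss_function = loss_root_mean_square)
  plot(vi)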
  32. Feature importance: structure based vs permutation based. For a consistency check, rather than for asking "which is better?".
  33. Feature Importance
  34. 3. SENSITIVITY ANALYSIS (1). 1. Response of the model output to changes in a variable's value: 1. Individual Conditional Expectation & Partial Dependence Plot (ICE & PD plot); 2. problems with PDP; 3. Accumulated Local Effects (ALE) plot. https://github.com/katokohaku/EDAxgboost/blob/master/200_Sensitivity_analysis.Rmd
  35. EDA (or explanation) after modelling: EDA tools for XGBoost. 1. Build the XGBoost model. 2. Feature importance: structure based (gain & cover), permutation based. 3. Variable response (1): Partial Dependence Plot (PDP / ICE / ALE). 4. Rule extraction: xgb.model.dt.tree(), inTrees. 5. Individual explanation: Shapley value (predcontrib), structure based (approxcontrib). 6. Variable response (2): Shapley value (predcontrib), structure based (approxcontrib). 7. Feature interaction: 2-way SHAP (predinteraction). Suggestion (off-topic): feature tweaking.
  36. Sensitivity analysis: marginal response for a single variable. Variable response comparison: ICE+PD plot vs ALE plot.
  37. What-if & other observations (ICE) + average line (PD). Ceteris paribus plots (blue line) show possible scenarios for model predictions, allowing changes in a single dimension while keeping all other features constant (the ceteris paribus principle). The Individual Conditional Expectation (ICE) plot (gray lines) visualizes one line per instance. The Partial Dependence plot (red line) is shown as the average line over all observations (x-axis: feature value; y-axis: model output). https://christophm.github.io/interpretable-ml-book/ice.html
  38. Disadvantage of ceteris paribus plots and PDP: the assumption of independence is the biggest issue with Partial Dependence plots. When the features are correlated, PDP creates new data points in areas of the feature distribution where the actual probability is very low. For example, it is unlikely that someone is 2 meters tall but weighs less than 50 kg. https://christophm.github.io/interpretable-ml-book/pdp.html#disadvantages-5
  39. A solution. Local effect: average the derivative over observations drawn from the conditional distribution, instead of averaging over the whole distribution of the target feature. Accumulated Local Effects (ALE): compute the local effect within each window, then average and accumulate the local effects across windows (ALE = mean of local effects). https://arxiv.org/abs/1612.08468 An ICE/PDP/ALE sketch follows.
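A hedged sketch of ICE/PD and ALE curves using the iml package (the package choice is an assumption; the deck does not name its plotting package; bst, X_test and test$left are the placeholders from above):

  library(iml)
  predictor <- Predictor$new(
    bst,
    data = as.data.frame(X_test),
    y    = test$left,
    predict.function = function(model, newdata) predict(model, as.matrix(newdata))
  )
  # ICE curves with the PD average line overlaid
  eff_pdp <- FeatureEffect$new(predictor, feature = "satisfaction_level", method = "pdp+ice")
  plot(eff_pdp)
  # ALE curve for the same feature
  eff_ale <- FeatureEffect$new(predictor, feature = "satisfaction_level", method = "ale")
  plot(eff_ale)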
  40. Sensitivity analysis: ICE+PDP & ALE plot.
  41. Sensitivity analysis: ICE+PDP vs ALE plot.
  42. 4-1. TREE VISUALIZATION AND RULE SUMMARIZATION. 1. Tree visualization: 1. dump the boosters: xgb.model.dt.tree(); 2. visualize a single booster: xgb.plot.tree(); 3. visualize a summarized tree: xgb.plot.multi.trees(). 2. Prediction rule extraction (inTrees): 1. enumerate rules; 2. summarize rules. https://github.com/katokohaku/EDAxgboost/blob/master/300_rule_extraction_xgbPlots.Rmd
  43. EDA (or explanation) after modelling: EDA tools for XGBoost. 1. Build the XGBoost model. 2. Feature importance: structure based (gain & cover), permutation based. 3. Variable response (1): Partial Dependence Plot (PDP / ICE / ALE). 4. Rule extraction: xgb.model.dt.tree(), inTrees. 5. Individual explanation: Shapley value (predcontrib), structure based (approxcontrib). 6. Variable response (2): Shapley value (predcontrib), structure based (approxcontrib). 7. Feature interaction: 2-way SHAP (predinteraction). Suggestion (off-topic): feature tweaking.
  44. Rule extraction: text-dump the tree model structure with xgb.model.dt.tree(), which parses a boosted tree model into a data.table structure. A sketch follows.
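A minimal sketch (bst as above):

  dt <- xgb.model.dt.tree(model = bst)
  # One row per node: Tree, Node, Feature, Split, Yes/No/Missing, Quality (gain), Cover
  head(dt)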
  45. Rule extraction: plot a boosted tree model (1st tree).
  46. Rule extraction: plot a boosted tree model (2nd tree).
  47. Rule extraction: plot multiple tree models.
  48. Rule extraction: multiple-in-one plot. A plotting sketch follows.
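A hedged sketch of the plotting calls (trees are 0-indexed in xgboost; features_keep is illustrative):

  # Single boosters: the 1st and 2nd trees
  xgb.plot.tree(model = bst, trees = 0)
  xgb.plot.tree(model = bst, trees = 1)
  # Project all trees onto one summarized multiple-in-one tree
  xgb.plot.multi.trees(model = bst, features_keep = 5)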
  49. 4-2. TREE VISUALIZATION AND RULE SUMMARIZATION. 1. Tree visualization: 1. dump the boosters: xgb.model.dt.tree(); 2. visualize a single booster: xgb.plot.tree(); 3. visualize a summarized tree: xgb.plot.multi.trees(). 2. Prediction rule extraction (inTrees): 1. enumerate rules; 2. summarize rules. https://github.com/katokohaku/EDAxgboost/blob/master/300_rule_extraction_xgbPlots.Rmd
  50. Rule extraction with {inTrees}: extract rules from ensembles of trees. https://arxiv.org/abs/1408.5456
  51. Rule extraction with {inTrees}: enumerate rules from ensembles of trees.
  52. Rule extraction with {inTrees}: build a simplified tree ensemble learner (STEL). A sketch follows; the full sample code is at: https://github.com/katokohaku/EDAxgboost/blob/master/310_rule_extraction_inTrees.md
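A hedged sketch of the inTrees workflow for an xgboost model (function names are from the inTrees package; X_train and train$left are the placeholders from above):

  library(inTrees)
  treeList <- XGB2List(bst, X_train)                    # convert boosters to an inTrees tree list
  rules    <- extractRules(treeList, X_train)           # enumerate candidate rules
  metrics  <- getRuleMetric(rules, X_train, train$left) # measure frequency / error / length
  pruned   <- pruneRule(metrics, X_train, train$left)   # drop uninformative conditions
  learner  <- buildLearner(pruned, X_train, train$left) # simplified tree ensemble learner (STEL)
  presentRules(learner, colnames(X_train))              # human-readable rules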
  53. 5-1. PROFILING BASED ON FEATURE CONTRIBUTION. 1. Explaining individual observations (prediction breakdown): 1. Shapley value: predict(..., predcontrib = TRUE, approxcontrib = FALSE); 2. structure based: predict(..., predcontrib = TRUE, approxcontrib = TRUE); 3. dimension reduction of observations based on predictions; 4. grouping by clustering; 5. visualizing observations within a group. https://github.com/katokohaku/EDAxgboost/blob/master/400_breakdown_individual-explanation_and_clustering.Rmd
  54. EDA (or explanation) after modelling: EDA tools for XGBoost. 1. Build the XGBoost model. 2. Feature importance: structure based (gain & cover), permutation based. 3. Variable response (1): Partial Dependence Plot (PDP / ICE / ALE). 4. Rule extraction: xgb.model.dt.tree(), inTrees. 5. Individual explanation: Shapley value (predcontrib), structure based (approxcontrib). 6. Variable response (2): Shapley value (predcontrib), structure based (approxcontrib). 7. Feature interaction: 2-way SHAP (predinteraction). Suggestion (off-topic): feature tweaking.
  55. Shapley value: feature contribution based on cooperative game theory. A method for assigning payouts to players depending on their contribution to the total payout; players cooperate in a coalition and receive a certain profit from this cooperation. The "game" is the prediction task for a single instance of the dataset. The "gain" is the actual prediction for this instance minus the average prediction over all instances. The "players" are the feature values of the instance that collaborate to receive the gain (= predict a certain value). https://papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predictions.pdf https://christophm.github.io/interpretable-ml-book/shapley.html
  56. The Shapley value is the average of all the marginal contributions to all possible coalitions. One solution to keep the computation time manageable is to compute contributions for only a few samples of the possible coalitions. https://papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predictions.pdf https://christophm.github.io/interpretable-ml-book/shapley.html
  57. Shapley value
  58. Breakdown of an individual explanation path: feature contribution based on tree structure. Based on the xgboost model structure: 1. calculate the weight each node would take if it were not split further; 2. distribute the weight differences to each node; 3. accumulate, for each feature, the weights along the path each observation passes, over every booster.
  59. Feature contribution based on tree structure: getting the prediction path.
  60. Feature contribution based on tree structure. A sketch of both contribution modes follows.
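A minimal sketch of per-observation contributions via the built-in predict() (argument names as in the xgboost R package):

  # Shapley-value contributions: one column per feature plus BIAS, one row per observation
  contrib_shap <- predict(bst, X_test, predcontrib = TRUE, approxcontrib = FALSE)
  # Structure-based approximation of the same breakdown
  contrib_tree <- predict(bst, X_test, predcontrib = TRUE, approxcontrib = TRUE)
  # Each row sums, on the margin (log-odds) scale, to that observation's prediction
  head(rowSums(contrib_shap))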
  61. Individual explanation path: enumerate feature contributions based on Shapley value / tree structure. Each row explains one observation (prediction breakdown).
  62. Individual explanation: explain a single observation. Each row explains one observation (prediction breakdown).
  63. 5-2. PROFILING BASED ON FEATURE CONTRIBUTION. 1. Explaining individual observations (prediction breakdown): 1. Shapley value: predict(..., predcontrib = TRUE, approxcontrib = FALSE); 2. structure based: predict(..., predcontrib = TRUE, approxcontrib = TRUE); 3. dimension reduction of observations based on predictions; 4. grouping by clustering; 5. visualizing observations within a group. https://github.com/katokohaku/EDAxgboost/blob/master/400_breakdown_individual-explanation_and_clustering.Rmd
  64. Identify clusters based on xgboost: clustering the feature contributions of each observation using t-SNE (dimension reduction with t-SNE).
  65. Dimension reduction: Rtsne::Rtsne()
  66. Identify clusters based on xgboost: Rtsne::Rtsne() → hclust() → cutree() → ggrepel::geom_label_repel(). Class labeling using hierarchical clustering (hclust).
  67. Rtsne::Rtsne() → hclust() → cutree() → ggrepel::geom_label_repel()
  68. Rtsne::Rtsne() → hclust() → cutree() → ggrepel::geom_label_repel(): scatter plot with group labels. A sketch of the pipeline follows.
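A hedged sketch of that pipeline on the Shapley contribution matrix (contrib_shap from the earlier sketch; perplexity and k are illustrative choices):

  library(Rtsne)
  library(ggplot2)
  library(ggrepel)

  contrib <- contrib_shap[, colnames(contrib_shap) != "BIAS"]
  tsne <- Rtsne(contrib, perplexity = 30, check_duplicates = FALSE)
  cl   <- cutree(hclust(dist(tsne$Y)), k = 6)   # label groups in the embedded space
  dd   <- data.frame(x = tsne$Y[, 1], y = tsne$Y[, 2], cluster = factor(cl))
  centers <- aggregate(cbind(x, y) ~ cluster, data = dd, FUN = median)

  ggplot(dd, aes(x, y, colour = cluster)) +
    geom_point(alpha = 0.5) +
    geom_label_repel(data = centers, aes(label = cluster), show.legend = FALSE)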
  69. Individual explanation: similar observations in a cluster (1).
  70. Individual explanation: similar observations in a cluster (2).
  71. Individual explanation: waterfall breakdown. https://github.com/katokohaku/EDAxgboost/blob/master/R/waterfallBreakdown.R
  72. 6. SENSITIVITY ANALYSIS BASED ON FEATURE CONTRIBUTION. 1. Response of the model output to changes in a variable's value (sensitivity analysis) (2): 1. Shapley value: predict(..., predcontrib = TRUE, approxcontrib = FALSE); 2. structure based: predict(..., predcontrib = TRUE, approxcontrib = TRUE). https://github.com/katokohaku/EDAxgboost/blob/master/410_breakdown_feature_response-interaction.Rmd
  73. EDA (or explanation) after modelling: EDA tools for XGBoost. 1. Build the XGBoost model. 2. Feature importance: structure based (gain & cover), permutation based. 3. Variable response (1): Partial Dependence Plot (PDP / ICE / ALE). 4. Rule extraction: xgb.model.dt.tree(), inTrees. 5. Individual explanation: Shapley value (predcontrib), structure based (approxcontrib). 6. Variable response (2): Shapley value (predcontrib), structure based (approxcontrib). 7. Feature interaction: 2-way SHAP (predinteraction). Suggestion (off-topic): feature tweaking.
  74. Individual explanation path. Each column explains one feature's impact (variable response).
  75. Sensitivity analysis: individual feature impact (1). Each column explains one feature's impact (variable response).
  76. Sensitivity analysis: individual feature impact (2-1). Each column explains one feature's impact (variable response).
  77. Sensitivity analysis: individual feature impact (2-2). Each column explains one feature's impact (variable response).
  78. Sensitivity analysis: contribution dependency plots. xgb.plot.shap() displays the estimated contribution (Shapley value) of a feature to the model prediction for each individual case. A sketch follows.
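A minimal sketch (top_n picks the features with the largest mean absolute SHAP contribution):

  # Scatter plots of feature value vs. SHAP contribution, one panel per feature
  xgb.plot.shap(data = X_test, model = bst, top_n = 4, n_col = 2)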
  79. Sensitivity analysis: feature impact summary, similar to a SHAP summary plot, built here from the contribution breakdown of the prediction path (model structure). http://www.f1-predictor.com/model-interpretability-with-shap/
  80. 7. INTERACTION ANALYSIS BASED ON CONTRIBUTION. 1. Interactions between variables: 1. strength of 2-way feature interactions: predict(..., predinteraction = TRUE). https://github.com/katokohaku/EDAxgboost/blob/master/410_breakdown_feature_response-interaction.Rmd
  81. EDA (or explanation) after modelling: EDA tools for XGBoost. 1. Build the XGBoost model. 2. Feature importance: structure based (gain & cover), permutation based. 3. Variable response (1): Partial Dependence Plot (PDP / ICE / ALE). 4. Rule extraction: xgb.model.dt.tree(), inTrees. 5. Individual explanation: Shapley value (predcontrib), structure based (approxcontrib). 6. Variable response (2): Shapley value (predcontrib), structure based (approxcontrib). 7. Feature interaction: 2-way SHAP (predinteraction). Suggestion (off-topic): feature tweaking.
  82. Feature interaction of a single observation: feature contribution can be decomposed into 2-way feature interactions.
  83. 2-way feature interaction: feature contribution for feature contribution. Each row shows the breakdown of one contribution.
  84. Feature interaction of a single observation: xgboost:::predict.xgb.Booster(..., predinteraction = TRUE). A sketch follows.
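A minimal sketch of the interaction decomposition (predinteraction returns a 3-D array: observation × feature × feature, including a BIAS slot):

  inter <- predict(bst, X_test, predinteraction = TRUE)
  dim(inter)    # n_obs x (n_features + 1) x (n_features + 1)
  inter[1, , ]  # 2-way breakdown for the first observation
  # Absolute mean over observations = global 2-way interaction strength
  avg_inter <- apply(abs(inter), c(2, 3), mean)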
  85. Individual explanation: feature contribution for the feature contribution of a single instance.
  86. Absolute mean of all interactions: SHAP can be decomposed into 2-way feature interactions. xgboost:::predict.xgb.Booster(..., predinteraction = TRUE)
  87. References: xgboost. Original paper: https://www.kdd.org/kdd2016/subtopic/view/xgboost-a-scalable-tree-boosting-system Tasks, metrics & other parameters: https://xgboost.readthedocs.io/en/latest/ For R: http://dmlc.ml/rstats/2016/03/10/xgboost.html https://xgboost.readthedocs.io/en/latest/R-package/xgboostPresentation.html https://xgboost.readthedocs.io/en/latest/R-package/discoverYourData.html Explanatory blog posts & slides (in Japanese): http://kefism.hatenablog.com/entry/2017/06/11/182959 https://speakerdeck.com/hoxomaxwell/dive-into-xgboost
  88. References: data & model explanation. Generic interpretability/explainability: the Interpretable Machine Learning book, https://christophm.github.io/interpretable-ml-book/ Exploratory Data Analysis (EDA): what is EDA? https://www.itl.nist.gov/div898/handbook/eda/section1/eda11.htm DALEX (Descriptive mAchine Learning EXplanations): https://pbiecek.github.io/DALEX/ DrWhy, the collection of tools for Explainable AI (XAI): https://pbiecek.github.io/DALEX/
