This talk was given at H2O World 2018 NYC and can be viewed here: https://youtu.be/vUqC8UPw9SU
Description:
The good news is that building fair, accountable, and transparent machine learning systems is possible. The bad news is that it’s harder than many blogs and software package docs would have you believe. The truth is that nearly all interpretable machine learning techniques generate approximate explanations, that the fields of eXplainable AI (XAI) and Fairness, Accountability, and Transparency in Machine Learning (FAT/ML) are very new, and that few best practices have been widely agreed upon. This combination can lead to some ugly outcomes!
This talk aims to make your interpretable machine learning project a success by describing fundamental technical challenges you will face in building an interpretable machine learning system, defining the real-world value proposition of approximate explanations for exact models, and then outlining the following viable techniques for debugging, explaining, and testing machine learning models:
• Model visualizations, including decision tree surrogate models, individual conditional expectation (ICE) plots, partial dependence plots, and residual analysis.
• Reason code generation techniques like LIME, Shapley explanations, and Treeinterpreter.
• Sensitivity analysis.
Plenty of guidance on when, and when not, to use these techniques will also be shared, and the talk will conclude by providing guidelines for testing generated explanations themselves for accuracy and stability.
Open source examples (with lots of comments and helpful hints) for building interpretable machine learning systems are available to accompany the talk at: https://github.com/jphall663/interpretable_machine_learning_with_python
Speaker's Bio:
Patrick Hall is a senior director for data science products at H2O.ai, where he focuses mainly on model interpretability. Patrick is also currently an adjunct professor in the Department of Decision Sciences at George Washington University, where he teaches graduate classes in data mining and machine learning. Prior to joining H2O.ai, Patrick held global customer-facing and research and development roles at SAS Institute. He holds multiple patents in automated market segmentation using clustering and deep neural networks. Patrick was the 11th person worldwide to become a Cloudera certified data scientist. He studied computational chemistry at the University of Illinois before graduating from the Institute for Advanced Analytics at North Carolina State University.
Practical Tips for Interpreting Machine Learning Models - Patrick Hall, H2O.ai
1. Patrick Hall
Data Science Products
H2O.ai
@jpatrickhall
https://www.linkedin.com/in/jpatrickhall/
Practical Tips for Interpreting Machine Learning Models
2. What is Machine Learning Interpretability?
“The ability to explain or to present in understandable terms to a human.”
-- Finale Doshi-Velez and Been Kim. “Towards a Rigorous Science of Interpretable Machine Learning.” arXiv preprint, 2017. https://arxiv.org/pdf/1702.08608.pdf
• FAT/ML: https://www.fatml.org/resources/principles-for-accountable-algorithms
• XAI: https://www.darpa.mil/program/explainable-artificial-intelligence
3. Why Should You Care About Machine Learning Interpretability?
“The now-contemplated field of data science amounts to a superset of the fields of statistics and machine learning, which adds some technology for ‘scaling up’ to ‘big data.’ This chosen superset is motivated by commercial rather than intellectual developments. Choosing in this way is likely to miss out on the really important intellectual event of the next 50 years.”
-- David Donoho. “50 Years of Data Science.” Tukey Centennial Workshop, 2015. http://bit.ly/2GQOh1J
Social Motivation: Interpretability plays a critical role in the increased convenience, automation, and organization in our day-to-day lives promised by AI.
Commercial Motivation: Interpretability is required for regulated industries to adopt machine learning.
• Check and balance against accidental or intentional discrimination.
• “Right to explanation.”
• Hacking and adversarial attacks.
• Improved revenue, i.e. Equifax NeuroDecision: https://www.youtube.com/watch?v=9Z_GW9WDS2c
4. How is Machine Learning Interpretability Practiced?
Reason Codes: https://github.com/slundberg/shap
Interpretable Models: https://web.stanford.edu/~hastie/ElemStatLearn/printings/ESLII_print12.pdf
Surrogate Models: https://github.com/jphall663/interpretable_machine_learning_with_python
Model Visualizations: https://github.com/jphall663/interpretable_machine_learning_with_python
… and many more!
5. Tip 1: Test Your ML Models with Sensitivity Analysis and Random Data Attacks
If you are using a machine learning model, you should probably be conducting sensitivity analysis.
http://www.vias.org/tmdatanaleng/
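As a rough illustration, here is a minimal Python sketch of a random data attack and a simple one-variable perturbation test. The fitted model and the pandas DataFrame X of valid inputs are assumed, hypothetical stand-ins for your own system:

import numpy as np
import pandas as pd

def random_data_attack(model, X, n_samples=1000, seed=42):
    # Draw random values uniformly from each column's observed range.
    rng = np.random.default_rng(seed)
    random_X = pd.DataFrame(
        {c: rng.uniform(X[c].min(), X[c].max(), n_samples) for c in X.columns}
    )
    preds = model.predict(random_X)
    # Predictions far outside the known response range are a red flag.
    return preds.min(), preds.mean(), preds.max()

def perturbation_sensitivity(model, X, col, pct=0.1):
    # Shift one column by +/- pct of its standard deviation and
    # measure how much predictions drift on average.
    delta = pct * X[col].std()
    up, down = X.copy(), X.copy()
    up[col] += delta
    down[col] -= delta
    return np.mean(np.abs(model.predict(up) - model.predict(down)))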
6. Tip 2: Test Your Explanatory Software
By human quality assessment or …
Simulated data
You can use simulated data with known characteristics to test explanations. For instance, models trained on totally random data, with no relationship between the input variables and the prediction target, should not give strong weight to any input variable nor generate compelling local explanations or reason codes. Conversely, you can use simulated data with a known signal-generating function to test that explanations accurately represent that known function, as sketched after the link below.
https://github.com/h2oai/mli-resources/tree/master/lime_shap_treeint_compare
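A minimal sketch of the random-data test described above, assuming the shap package (linked on the earlier slide) and scikit-learn:

import numpy as np
import shap
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))   # inputs: pure noise
y = rng.normal(size=1000)        # target: unrelated noise

model = GradientBoostingRegressor().fit(X, y)
shap_values = shap.TreeExplainer(model).shap_values(X)

# Mean absolute Shapley value per input; on pure noise, all should be
# small and roughly equal, with no single feature dominating.
print(np.abs(shap_values).mean(axis=0))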
Explanation stability under data perturbation
Trustworthy explanations likely should not change drastically for minor changes in input data. You can set and test thresholds for allowable explanation value changes automatically by perturbing input data. (Explanations or reason code values can also be averaged across a number of models to create more stable explanations.)
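A minimal sketch of such a perturbation test, reusing the noise-trained model, X, and rng from the previous example; the 0.1 drift threshold is an arbitrary, illustrative choice:

explainer = shap.TreeExplainer(model)
base = explainer.shap_values(X)

# Perturb each input by a small fraction of its standard deviation.
X_perturbed = X + rng.normal(scale=0.01 * X.std(axis=0), size=X.shape)
perturbed = explainer.shap_values(X_perturbed)

# Per-row maximum change in any reason code value.
drift = np.abs(perturbed - base).max(axis=1)
print("fraction of rows exceeding drift threshold:", (drift > 0.1).mean())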
Explanation stability with increased prediction accuracy
If previously known, accurate explanations or reason codes from a simpler linear model are available, you can use them as a reference for the accuracy of explanations from a related, but more complex and hopefully more accurate, model. You can perform tests to see how accurate a model can become before its prediction’s reason codes veer away from known standards.
7. Tip 3: Consider Deployment
[Slide diagram: a training environment (R/Py/etc.) feeding a production scoring environment]
8. Tip 4: Beware of Uninterpretable Features
An engineered feature like ClusterTE:ClusterID20:LIMIT_BAL:PAY_3:PAY_AMT2 may be predictive, but it is nearly impossible to explain in human terms.
Try:
• Sparse PCA
• Simple Interactions
• Weight of Evidence (see the sketch below)
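For instance, here is a minimal weight-of-evidence encoding sketch; df, feature, and target are hypothetical names for a pandas DataFrame, a categorical column, and a binary 0/1 target:

import numpy as np

def woe_encode(df, feature, target, eps=0.5):
    # Weight of evidence per category:
    # ln(share of events in category / share of non-events in category).
    grouped = df.groupby(feature)[target].agg(["sum", "count"])
    events = grouped["sum"] + eps                         # smoothed event counts
    non_events = grouped["count"] - grouped["sum"] + eps  # smoothed non-events
    woe = np.log((events / events.sum()) / (non_events / non_events.sum()))
    return df[feature].map(woe)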
9. Tip 5: Use Global and Local Explanatory Techniques Together
10. Tip 6: Use Shapley Explanations for Tree-Based Models
11. Tip 7: The Most Direct Path to an Interpretable ML Model Today Is …
• Monotonic XGBoost + Shapley Explanations
OR
• H2O Driverless AI ;)
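As a rough sketch of the first option (which also illustrates Tip 6), the toy data and constraint signs below are illustrative assumptions: +1 forces a monotonically increasing relationship, -1 decreasing, 0 unconstrained.

import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = 2 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=1000)

# Constrain feature 0 to be increasing, feature 1 decreasing.
params = {"monotone_constraints": "(1,-1,0)", "max_depth": 3, "eta": 0.1}
dtrain = xgb.DMatrix(X, label=y)
booster = xgb.train(params, dtrain, num_boost_round=100)

# XGBoost can emit exact tree-SHAP contributions directly; the last
# column of each row is the expected value (global intercept).
contribs = booster.predict(dtrain, pred_contribs=True)
print(contribs[0])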
12. Tip 8: Combine Decision Tree Surrogate, Partial Dependence, and ICE to Understand High-Degree Interactions
These are complementary 2-D visualizations of trained models that enable understanding of learned high-degree interactions.
Decision Tree Surrogate Models
Partial Dependence and Individual Conditional Expectation
https://github.com/jphall663/interpretable_machine_learning_with_python
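For example, here is a minimal decision tree surrogate sketch, assuming a hypothetical fitted model and its training inputs X: the shallow tree is fit to the model's predictions, so its splits approximate what the model learned, including interactions.

from sklearn.tree import DecisionTreeRegressor, export_text

preds = model.predict(X)                     # complex model's predictions
surrogate = DecisionTreeRegressor(max_depth=3).fit(X, preds)

print(export_text(surrogate))                # human-readable split rules
print("surrogate R^2 vs. model:", surrogate.score(X, preds))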
13. Tip 9: LIME …
• LIME can give an indication of its own trustworthiness using fit statistics (see the sketch below).
• LIME can fail, particularly in the presence of extreme nonlinearity or high-degree interactions.
• LIME is difficult to deploy, but there are highly deployable variants, e.g., H2O’s K-LIME and Wells Fargo’s LIME-SUP.
• Reason codes are offsets from a local intercept.
• Note that the intercept in LIME can account for the most important local phenomena.
• Generated LIME samples can contain large proportions of out-of-range data that can lead to unrealistically high or low intercept values.
• Try LIME on discretized input features and on manually constructed interactions.
• Use cross-validation to construct standard deviations or even confidence intervals for reason code values.
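A minimal LIME sketch tying several of these points together; the lime package is assumed, and model and X (a numpy array of training inputs) are hypothetical stand-ins:

from lime.lime_tabular import LimeTabularExplainer

explainer = LimeTabularExplainer(
    X,
    feature_names=[f"x{i}" for i in range(X.shape[1])],
    mode="regression",
    discretize_continuous=True,  # per the tip: try discretized inputs
)
exp = explainer.explain_instance(X[0], model.predict, num_features=5)

print(exp.as_list())   # reason codes: offsets from the local intercept
print(exp.intercept)   # the local intercept(s)
print(exp.score)       # R^2 of the local surrogate: a trust indicator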