SlideShare una empresa de Scribd logo
1 de 39
Descargar para leer sin conexión
Learn more at datascience.com  |  Empower Your Data Scientists
March 8, 2018
Head to Booth 1215 for a live demo of the DataScience.com Platform
Human in the Loop: Bayesian Rules Enabling Explainable AI
Learn more at datascience.com  |  Empower Your Data Scientists 2
About Me
I am a lead data scientist at DataScience.com. I enjoy applying and optimizing classical machine
learning algorithms, NLP, and Bayesian design strategy to solve real-world problems. Currently, I
am exploring on better ways to extract, evaluate, and explain the learned decision policies of
models. Before joining DataScience.com, I used machine learning algorithms to find love for
eHarmony customers. I am one of the principal authors of Skater, a model interpretation package
for Python. I also organize the PyData Socal meet-up.
Pramit Choudhary
@MaverickPramit
https://www.linkedin.com/in/pramitc/
https://github.com/pramitchoudhary
Learn more at datascience.com  |  Empower Your Data Scientists
Agenda
● Understand the problem of model opacity
● Define the “what” and “why” of model interpretation
● Define the scope of model interpretation
● How do we enable interpretability?
● What is the Bayesian rule list?
● Understand the tension between interpretability and performance
● Benchmark numbers
● What is Skater and how does it help you build models the right way?
● References
Learn more at datascience.com  |  Empower Your Data Scientists
The Problem of Model Opacity
(Y|X)
Training Set
{xi
∈ X, yi
∈ Y}
Why I am getting
weird predictions?
Was my model
biased?
I am not 100%
sure what’s in
the box; I didn’t
build the model.
“By 2018, half of business ethics violations will occur through improper use of
big data analytics.” — Gartner
**reference: https://www.gartner.com/newsroom/id/3144217
Predictor
Black Box Model
Holdout/Test Set
{xi
∈ X, yi
∈ Y}
Empower Your Data Scientists
What is Model Interpretation?
● An extension of model evaluation that helps to foster a better understanding
of a model’s learned decision policies.
● Ability to explain and present a model in a way that is human understandable.
● Human understandable: The model’s result is self descriptive & needs no
further explanation.
Learn more at datascience.com  |  Empower Your Data Scientists
We are starting our journey of explainability with supervised learning problems.
Image source: constructed using tensorboard
Learn more at datascience.com  |  Empower Your Data Scientists
With model interpretation, we want to answer the following questions:
○ Why did the model behave in a certain way?
○ What was the reason for false positives? What are the relevant variables driving a model’s outcome,
e.g., customer lifetime value, fraud detection, image classification, spam detection?
○ How can we trust the predictions of a “black box” model? Is the predictive model biased?
What Do We Want to Achieve?
Learn more at datascience.com  |  Empower Your Data Scientists
Machine Learning Workflow
Define
Hypothesis
Use relevant key
performance
indicators
Handle Data
Handle Missing
Data
Data Partitioning
Engineer and
Select
Features
Transform data
Select relevant
features
Build Model
Build a predictive
model
Deploy Model
Operationalize
analytics as
scalable REST APIs
Test and Monitor
Model
1. Log and track behavior
2. Evaluate
3. Conduct A/B or
multi-armed bandit testing
1 2 3 4 5 6
Model Interpretation: In-Memory Models
● Model assessment
● Explain model at a global and local level
● Publish insights, make collaborative and
informed decisions
Model Interpretation: Deployed Models
● Explore and explain model behavior
● Debug and discover errors to improve
performance
RETRAIN
EVALUATE
Improve existing hypothesis or generate a new one
Learn more at datascience.com  |  Empower Your Data Scientists
An Interpretable Machine Learning System
Interpretability with Rule
Extraction
Empower Your Data Scientists
Why is Model Interpretation Important?
Producer:
● Data scientist/analyst building a model
● Consultants helping clients
Consumer/Decision Maker:
● Business owners or data engineers
● Risk/security assessment managers
● Humans being affected by the model
“Explain the model.”
Learn more at datascience.com  |  Empower Your Data Scientists
Ideas collapse.
Image source: Edu Lauton on Unsplash
Learn more at datascience.com  |  Empower Your Data Scientists 12
Motives for Model Interpretation
1. Debugging and improving an ML system
2. Exploring and discovering latent or hidden feature
interactions (useful for feature engineering/selection
and resolving preconceptions )
3. Understanding model variability
4. Helps in model comparison
5. Building domain knowledge about a particular use
case
6. Brings transparency to decision making to enable
trust
1. Explain the model/algorithm
2. Explain the key features driving the KPI
3. Verify and validate the accountability of ML
learning systems, e.g. causes for False positives in
credit scoring, insurance claim frauds
4. Identify blind spots to prevent adversarial attacks
or fixing dataset errors
5. Ability to share the explanations to consumers of
the predictive model?
6. Comply with Data Protection Regulations, e.g. EU’s
GDPR
● Data Scientist /
● Machine Learning Engineer
● Data Analyst
● Statistician
● Data Science Manager
● Business owner
● Data Engineer
● Auditors / Risk Managers
Producer Consumer
Learn more at datascience.com  |  Empower Your Data Scientists
Scope Of Interpretation
Global Interpretation
Being able to explain the conditional interaction
between dependent(response) variables and
independent(predictor, or explanatory) variables
based on the complete dataset
Global
Interpretation
Local Interpretation
Local Interpretation
Being able to explain the conditional interaction
between dependent(response) variables and
independent(predictor, or explanatory) variables
with respect to a single prediction
Learn more at datascience.com  |  Empower Your Data Scientists
How Do We Enable Model Interpretation?
Reference: Been Kim(ICML’17) Google Brain
( http://people.csail.mit.edu/beenkim/papers/BeenK_FinaleDV_ICML2017_tutorial.pdf )
Learn more at datascience.com  |  Empower Your Data Scientists
Introducing Skater
https://github.com/datascienceinc/Skater
If you like the idea, give us
a star!Gitter Channel (join us here):
https://gitter.im/datascienceinc-skater
/Lobby
Learn more at datascience.com  |  Empower Your Data Scientists
1. Post-Hoc Evaluation of Models
Learn more at datascience.com  |  Empower Your Data Scientists
How Do We Enable Interpretation?
➢ Post-hoc evaluation: A black-box model is built, and we need a way to interpret it.
○ Model agnostic partial dependence plot
○ Model agnostic feature importance
○ Local interpretable model agnostic explanation (LIME)
○ Saliency mask for DNN (image/text): Not supported yet; coming soon...
G. Hooker( KDD’04 ). Discovering additive structure in black box functions
Marco Tulio Ribeiro et. al(2016). Nothing Else Matters
Ning Xie et. al(NIPS’ 2017). Relating Input Concepts to Convolutional Neural Network Decisions
Learn more at datascience.com  |  Empower Your Data Scientists
2. Bayesian Rule List:
Building Naturally Interpretable Models
Via Rule Extraction
Learn more at datascience.com  |  Empower Your Data Scientists
Demo
Building a Model Using a Bayesian Rule List and Skater
1. https://github.com/datascienceinc/Skater/blob/master/example
s/rule_list_notebooks/rule_lists_continuous_features.ipynb
2. https://github.com/datascienceinc/Skater/blob/master/example
s/rule_list_notebooks/rule_lists_titanic_dataset.ipynb
3. https://github.com/datascienceinc/Skater/blob/master/example
s/credit_analysis/credit_analysis_rule_lists.ipynb
Learn more at datascience.com  |  Empower Your Data Scientists
How Do We Enable Interpretation?
➢ Using a probabilistic interpretable estimator (bayesian rule list):
a. Generative probabilistic classifier P(y = 1| x) for each x
b. Initially designed by Letham, Rudin, McCormick, Madigan (2015)
c. Improved by Hongyu Yang. et. al. as Scalable Bayesian Rule List (2017)
d. Works great for Tabular datasets with discrete and independent meaningful features
e. Competitor to decision trees; greedy splitting and pruning
f. Built using pre-mined association rules (frequent pattern-matching algorithms)
• ECLAT (Equivalence Class Clustering and Bottom up Lattice Traversal)
• Non-frequent patterns are not considered
g. Build a bayesian hierarchical model over frequently occuring pre-mined rule lists
h. Applies MCMC (Metropolis–Hastings algorithm) to sample from posterior distribution
over permutation of “IF-THEN-ELSE” conditional statement
i. Output: Generates a logical structure of human-interpretable IF then ELSE decision
stumps
j. Scope of interpretation: global and local
Learn more at datascience.com  |  Empower Your Data Scientists
Bayesian Rule List
● Consider independent and identically distributed(i.i.d) training examples of the form {X, Y} ->
{(xi
, yi
}n
i=1
where xi
∈ X as encoded features and yi
∈ Y as binary labels [0s or 1s].
● A typical bayesian rule list estimator would look like this:
Each rule is independent and selected
from a set of pre-mined rules using
frequent matching algorithms, e.g.,
ECLAT.
Goal: Optimize over the possible set
of pre-mined rules and their order to
create the final set of interpretable
decision stumps.
Learn more at datascience.com  |  Empower Your Data Scientists
Example: Rule List Representation
Figure: BRL output on common diabetes dataset ( http://scikit-learn.org/stable/datasets/index.html#diabetes-dataset )
Goal: Optimize on finite
number of rules maintaining
accuracy.
Sampling: Rules are
sampled from posterior
distribution over a
permutation of pre-mined
rules.
Optimize cardinality of rules horizontally and vertically
Scope of Interpretation:
Global and local.
Learn more at datascience.com  |  Empower Your Data Scientists
Generative vs. Discriminative Models
Input Examples
{xi
∈ X, yi
∈ Y}
Discriminative Model:
Models the posterior probability
directly; maps input X to output and
labels Y directly; e.g., SVM, NN.
Generative Model:
Models a joint probability of input X
and output Y p(X, Y); computes
prediction P(Y | X) using Bayes’ rule;
e.g., Naive Bayes, GAN (Generative
Adversarial Network), BRL.
Learns p( Y | X ) directly
p( Y | X ) ∝ p(X | Y) * p(Y)
Learns p( Y | X ) in-directly
** Reference: Ng and Jordan(2001) On Discriminative vs. Generative classifiers: A comparison of logistic regression and naive Bayes
Learn more at datascience.com  |  Empower Your Data Scientists
Optimization Goals for Bayesian Rule List
Sample from a posterior distribution over a permutation of pre-mined “IF-THEN-ELSE” conditional
statement:
(d|X, Y, A, , λ, ) ∝ (Y|X, d, ) * (d|A, λ, )
where,
● d = Ordered subset of rules
● A: Pre-mined collection of all rules using the frequent pattern matching algorithm
● Prior hyper-parameters: , λ,
○ = [ 0, 1]: Prior parameter for each label in a binary classification problem
○ λ: Hyper-parameter for the expected length of the rule list
○ : Hyper-parameter for the expected cardinality of each rule in the optimal rule list
Likelihood: Probability of an event
that has already occurred (binomial
distribution).
Prior Probability: Probability
of one’s belief before evidence
(beta distribution).
Posterior: Conditional probability of an
event based on relevant evidence
∝
See Chapter Three of Machine
Learning: A Probabilistic Perspective
Learn more at datascience.com  |  Empower Your Data Scientists
Tension Between Interpretability and Model Performance
Learn more at datascience.com  |  Empower Your Data Scientists
Performance vs. Interpretability
Annual Income
Late fee
amount
Simple decision boundary
(Linear Monotonic)
Complex decision
boundary
(Non-Linear
Non-Monotonic)
Credit card approved
Credit card denied
Non-linear decision
boundary (nonLinear
Monotonic)
Learn more at datascience.com  |  Empower Your Data Scientists
Tension Between Interpretability and Model Performance
Model Performance (Accuracy)
Interpretability/
Degree of
Opacity
Deep learning models
Support vector machines
Random forest
K-nearest neighbors
Linear/Logistic Regression
Decision trees
Bayesian Rule List
XGBoost
** Remember: The purpose of the chart is not to mirror any benchmark on model performance, but to articulate the opacity of predictive models
Learn more at datascience.com  |  Empower Your Data Scientists
No Free Lunch Theorem
“Any elevated performance over one class of problems is offset by performance over another
class.” — David H. Wolpert and William G. Macready, (1997), https://ti.arc.nasa.gov/m/profile/dhw/papers/78.pdf
Simplicity: 10, Robustness: 10, Computation Speed:
scope for improvement, Interpretability: 10
Simplicity:10, Robustness:10, Scalability: with
smart optimization, Interpretability: 10
Image source:wiki(Mimooh, https://commons.wikimedia.org/wiki/File:No_free_lunch_theorem.svg
Learn more at datascience.com  |  Empower Your Data Scientists
Simplicity Is Key
Model Selection Policies:
● Model Performance (e.g., AUC-ROC): How
accurate is the model?
● Scalability: Can the model handle huge volume of
data?
● Computational Speed: Does the model take a long
time to build?
● Robustness: Are the predicted result stable over a
period of time?
● Interpretability: Can one interpret the output in a
human understandable way?
● Simplicity: Can one explain the model easily?
● Occam’s Razor Principle: “When presented with competing hypothetical answers to a problem, one should select
the one that makes the fewest assumptions.”
● In computational learning, build models with the objective of producing a succinct representation of the training set.
Learn more at datascience.com  |  Empower Your Data Scientists
What If We Achieve Accuracy?
Figure: Comparison of BRL and RF using AUC of ROC on Titanic dataset
Learn more at datascience.com  |  Empower Your Data Scientists
Performance Benchmark Using BRL
Dataset Data Type Problem Type Model
Type
Train
Accuracy
Test
Accuracy
Train
AUC-RO
C
Test
AUC-ROC
Computation
Time (in sec)
Diabetes dataset
(Train: 576 rows;
Test: 192)
Tabular data:
continuous
features
Supervised
Classification
BRLC 0.78 0.71 0.82 0.76 0.74
Diabetes dataset
(Train: 576 rows;
Test: 192)
Tabular data:
continuous
features
Supervised
Classification
RF 1.0 0.75 0.81 0.80 0.14
Titanic dataset
(Train: 571 rows;
Test: 143 rows)
Tabular data:
categorical &
continuous
Supervised
Classification
BRLC 0.80 0.86 0.84 0.86 0.67
Titanic dataset
(Train: 571 rows;
Test: 143 rows)
Tabular data:
categorical &
continuous
Supervised
Classification
RF 1.0 0.81 1.0 0.86 0.07
Credit analysis
(Train: 29,839 rows;
Test: 9,947 rows )
Tabular data:
categorical &
continuous
Supervised
Classification BRLC
0.86 0.86 0.65 0.65 2.81
Credit analysis
(Train: 29,839 rows;
Test: 9,947 rows )
Tabular data:
categorical &
continuous
Supervised
Classification
Linear SVM 0.85 0.86 0.68 0.70 0.15
Could be improved with more thoughtful
feature engineering and selection
0.05 difference in performance on hold out
using 10% of the data compared to SVM
Learn more at datascience.com  |  Empower Your Data Scientists
Skater: BRL API Overview (BRLC)
Import the BRLC class
Instantiate BRLC instance
Train a model using fit
Display learned
“if-else” conditions
Use discretizer for
continuous features
Generate class probabilities
Predict class labels
Persist model
Access other rules
Learn more at datascience.com  |  Empower Your Data Scientists
Mission Statement: Enable Interpretability for All Models
Model Performance (Accuracy)
Interpretability/
Degree of
Opacity
Deep learning models
Support vector machines
Random forest
K-nearest neighbors
Linear/Logistic Regression
Decision trees
Bayesian Rule List
XGBoost
** Remember: The purpose of the chart is not to mirror any benchmark on model performance, but to articulate the opacity of predictive models
Learn more at datascience.com  |  Empower Your Data Scientists
Evaluate
(Y|X)
Data
Data
Unboxed model
Evaluate Partial dependence plot
Relative variable importance
Local Interpretable Model
Explanation (LIME)
R or Python model (linear, nonlinear, ensemble, neural networks)
Scikit-learn, caret and rpart packages for CRAN
H20.ai, Algorithmia, etc.
WITHOUT INTERPRETATION ...
WITH SKATER ...
Black box model
How do I understand my
models?
Bayesian rule list (BRL)
More coming ...
Learn more at datascience.com  |  Empower Your Data Scientists
Future Work and Improvement
● Other rule-based algorithm approaches being considered for implementation:
○ H. Lakkaraju, S. H. Bach, and J. Leskovec. Interpretable decision sets: A joint framework
for description and prediction
○ Issue: https://github.com/datascienceinc/Skater/issues/207
Learn more at datascience.com  |  Empower Your Data Scientists
Future Work and Improvement (continued)
● Improve handling of continuous feature
○ Discretize using entropy criterion with the Minimum Description Length Principle (MDLP)
(Reference: Irani, Keki B’93. "Multi-interval discretization of continuous-valued attributes
for classification learning.")
○ Issue: https://github.com/datascienceinc/Skater/issues/206
● Improve scalability and computational efficiency for BRL
○ Parallelizing MCMC sampling using Weierstrass Sampler
○ Reference: Parallelizing MCMC via Weierstrass Sampler, https://arxiv.org/abs/1312.4605
● Add more example notebooks, applied to different use-cases
○ Handling text based models - Kaggle sms-spam-collection dataset
○ More benchmarks
Learn more at datascience.com  |  Empower Your Data Scientists
A Quick Glimpse Into The Future
37
Top 5 Predictions:
1. seat belt = 0.75
2. limousine = 0.051
3. golf cart = 0.017
4. minivan = 0.015
5. car mirror = 0.015
Visual Q&A: Is the person driving the car safely?
Learn more at datascience.com  |  Empower Your Data Scientists
Q&A
info@datascience.com
pramit@datascience.com
@MaverickPramit
@DataScienceInc
Help wanted: https://github.com/datascienceinc/Skater/labels/help%20wanted
Professor. Sameer Singh,
Assistant Professor of Computer
Science @ the University of
California, Irvine
Paco Nathan,
Director of Learning Group @
O’Reilly Media
https://www.datascience.com/resources/webinars/int
erpreting-machine-learning-models
Learn more at datascience.com  |  Empower Your Data Scientists
References
● Interpretation references:
○ A. Weller, (ICML 2017).Challenges for Transparency
○ Zachary C. Lipton, (2016). The Mythos of Model Interpretability
● Rule list-related literature:
○ Letham, B., Rudin, C., McCormick, T. H., & Madigan, D. (2015). Interpretable classifiers using
rules and bayesian analysis: Building a better stroke prediction model. Annals of Applied
Statistics, 9(3), 1350–1371
○ Yang, H., Rudin, C., Seltzer M. (2016). Scalable Bayesian Rule Lists
● Detailed examples of model interpretation using Skater
● Marco Tulio Ribeiro, et al. (KDD 2016) "Why Should I Trust You?": Explaining the Predictions
of Any Classifier

Más contenido relacionado

La actualidad más candente

Applied Artificial Intelligence Unit 3 Semester 3 MSc IT Part 2 Mumbai Univer...
Applied Artificial Intelligence Unit 3 Semester 3 MSc IT Part 2 Mumbai Univer...Applied Artificial Intelligence Unit 3 Semester 3 MSc IT Part 2 Mumbai Univer...
Applied Artificial Intelligence Unit 3 Semester 3 MSc IT Part 2 Mumbai Univer...Madhav Mishra
 
DagdelenSiriwardaneY..
DagdelenSiriwardaneY..DagdelenSiriwardaneY..
DagdelenSiriwardaneY..butest
 
Continual Learning with Deep Architectures - Tutorial ICML 2021
Continual Learning with Deep Architectures - Tutorial ICML 2021Continual Learning with Deep Architectures - Tutorial ICML 2021
Continual Learning with Deep Architectures - Tutorial ICML 2021Vincenzo Lomonaco
 
Lecture #1: Introduction to machine learning (ML)
Lecture #1: Introduction to machine learning (ML)Lecture #1: Introduction to machine learning (ML)
Lecture #1: Introduction to machine learning (ML)butest
 
Computational Rationality I - a Lecture at Aalto University by Antti Oulasvirta
Computational Rationality I - a Lecture at Aalto University by Antti OulasvirtaComputational Rationality I - a Lecture at Aalto University by Antti Oulasvirta
Computational Rationality I - a Lecture at Aalto University by Antti OulasvirtaAalto University
 
Machine learning
Machine learningMachine learning
Machine learningRohit Kumar
 
Machine learning with Big Data power point presentation
Machine learning with Big Data power point presentationMachine learning with Big Data power point presentation
Machine learning with Big Data power point presentationDavid Raj Kanthi
 
Lecture 1: What is Machine Learning?
Lecture 1: What is Machine Learning?Lecture 1: What is Machine Learning?
Lecture 1: What is Machine Learning?Marina Santini
 
Explainable AI - making ML and DL models more interpretable
Explainable AI - making ML and DL models more interpretableExplainable AI - making ML and DL models more interpretable
Explainable AI - making ML and DL models more interpretableAditya Bhattacharya
 
Lecture 2 Basic Concepts in Machine Learning for Language Technology
Lecture 2 Basic Concepts in Machine Learning for Language TechnologyLecture 2 Basic Concepts in Machine Learning for Language Technology
Lecture 2 Basic Concepts in Machine Learning for Language TechnologyMarina Santini
 
Machine Learning for Dummies (without mathematics)
Machine Learning for Dummies (without mathematics)Machine Learning for Dummies (without mathematics)
Machine Learning for Dummies (without mathematics)ActiveEon
 
ML Interpretability Inside Out
ML Interpretability Inside OutML Interpretability Inside Out
ML Interpretability Inside OutMara Graziani
 
Machine Learning presentation.
Machine Learning presentation.Machine Learning presentation.
Machine Learning presentation.butest
 
Basics of Machine Learning
Basics of Machine LearningBasics of Machine Learning
Basics of Machine Learningbutest
 
Been Kim - Interpretable machine learning, Nov 2015
Been Kim - Interpretable machine learning, Nov 2015Been Kim - Interpretable machine learning, Nov 2015
Been Kim - Interpretable machine learning, Nov 2015Seattle DAML meetup
 
Introduction to machine learning and model building using linear regression
Introduction to machine learning and model building using linear regressionIntroduction to machine learning and model building using linear regression
Introduction to machine learning and model building using linear regressionGirish Gore
 
introduction to machine learning and nlp
introduction to machine learning and nlpintroduction to machine learning and nlp
introduction to machine learning and nlpMahmoud Farag
 

La actualidad más candente (20)

Applied Artificial Intelligence Unit 3 Semester 3 MSc IT Part 2 Mumbai Univer...
Applied Artificial Intelligence Unit 3 Semester 3 MSc IT Part 2 Mumbai Univer...Applied Artificial Intelligence Unit 3 Semester 3 MSc IT Part 2 Mumbai Univer...
Applied Artificial Intelligence Unit 3 Semester 3 MSc IT Part 2 Mumbai Univer...
 
DagdelenSiriwardaneY..
DagdelenSiriwardaneY..DagdelenSiriwardaneY..
DagdelenSiriwardaneY..
 
Continual Learning with Deep Architectures - Tutorial ICML 2021
Continual Learning with Deep Architectures - Tutorial ICML 2021Continual Learning with Deep Architectures - Tutorial ICML 2021
Continual Learning with Deep Architectures - Tutorial ICML 2021
 
Lecture #1: Introduction to machine learning (ML)
Lecture #1: Introduction to machine learning (ML)Lecture #1: Introduction to machine learning (ML)
Lecture #1: Introduction to machine learning (ML)
 
Computational Rationality I - a Lecture at Aalto University by Antti Oulasvirta
Computational Rationality I - a Lecture at Aalto University by Antti OulasvirtaComputational Rationality I - a Lecture at Aalto University by Antti Oulasvirta
Computational Rationality I - a Lecture at Aalto University by Antti Oulasvirta
 
Machine learning
Machine learningMachine learning
Machine learning
 
Machine learning
Machine learningMachine learning
Machine learning
 
Machine learning with Big Data power point presentation
Machine learning with Big Data power point presentationMachine learning with Big Data power point presentation
Machine learning with Big Data power point presentation
 
Lecture 1: What is Machine Learning?
Lecture 1: What is Machine Learning?Lecture 1: What is Machine Learning?
Lecture 1: What is Machine Learning?
 
Explainable AI - making ML and DL models more interpretable
Explainable AI - making ML and DL models more interpretableExplainable AI - making ML and DL models more interpretable
Explainable AI - making ML and DL models more interpretable
 
Lecture 2 Basic Concepts in Machine Learning for Language Technology
Lecture 2 Basic Concepts in Machine Learning for Language TechnologyLecture 2 Basic Concepts in Machine Learning for Language Technology
Lecture 2 Basic Concepts in Machine Learning for Language Technology
 
Explainable AI
Explainable AIExplainable AI
Explainable AI
 
Machine Learning for Dummies (without mathematics)
Machine Learning for Dummies (without mathematics)Machine Learning for Dummies (without mathematics)
Machine Learning for Dummies (without mathematics)
 
ML Interpretability Inside Out
ML Interpretability Inside OutML Interpretability Inside Out
ML Interpretability Inside Out
 
Machine Learning presentation.
Machine Learning presentation.Machine Learning presentation.
Machine Learning presentation.
 
Basics of Machine Learning
Basics of Machine LearningBasics of Machine Learning
Basics of Machine Learning
 
Been Kim - Interpretable machine learning, Nov 2015
Been Kim - Interpretable machine learning, Nov 2015Been Kim - Interpretable machine learning, Nov 2015
Been Kim - Interpretable machine learning, Nov 2015
 
Introduction to machine learning and model building using linear regression
Introduction to machine learning and model building using linear regressionIntroduction to machine learning and model building using linear regression
Introduction to machine learning and model building using linear regression
 
introduction to machine learning and nlp
introduction to machine learning and nlpintroduction to machine learning and nlp
introduction to machine learning and nlp
 
Statistical learning intro
Statistical learning introStatistical learning intro
Statistical learning intro
 

Similar a DataScience Platform Demo Explains Bayesian Rule Lists

Learning to Learn Model Behavior ( Capital One: data intelligence conference )
Learning to Learn Model Behavior ( Capital One: data intelligence conference )Learning to Learn Model Behavior ( Capital One: data intelligence conference )
Learning to Learn Model Behavior ( Capital One: data intelligence conference )Pramit Choudhary
 
GDG Cloud Southlake #17: Meg Dickey-Kurdziolek: Explainable AI is for Everyone
GDG Cloud Southlake #17: Meg Dickey-Kurdziolek: Explainable AI is for EveryoneGDG Cloud Southlake #17: Meg Dickey-Kurdziolek: Explainable AI is for Everyone
GDG Cloud Southlake #17: Meg Dickey-Kurdziolek: Explainable AI is for EveryoneJames Anderson
 
Barga Data Science lecture 2
Barga Data Science lecture 2Barga Data Science lecture 2
Barga Data Science lecture 2Roger Barga
 
Choosing a Machine Learning technique to solve your need
Choosing a Machine Learning technique to solve your needChoosing a Machine Learning technique to solve your need
Choosing a Machine Learning technique to solve your needGibDevs
 
Get hands-on with Explainable AI at Machine Learning Interpretability(MLI) Gym!
Get hands-on with Explainable AI at Machine Learning Interpretability(MLI) Gym!Get hands-on with Explainable AI at Machine Learning Interpretability(MLI) Gym!
Get hands-on with Explainable AI at Machine Learning Interpretability(MLI) Gym!Sri Ambati
 
Machine learning ppt unit one syllabuspptx
Machine learning ppt unit one syllabuspptxMachine learning ppt unit one syllabuspptx
Machine learning ppt unit one syllabuspptxVenkateswaraBabuRavi
 
Self Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docxSelf Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docxShanmugasundaram M
 
Data science presentation
Data science presentationData science presentation
Data science presentationMSDEVMTL
 
Hacking Predictive Modeling - RoadSec 2018
Hacking Predictive Modeling - RoadSec 2018Hacking Predictive Modeling - RoadSec 2018
Hacking Predictive Modeling - RoadSec 2018HJ van Veen
 
Computational Thinking in the Workforce and Next Generation Science Standards...
Computational Thinking in the Workforce and Next Generation Science Standards...Computational Thinking in the Workforce and Next Generation Science Standards...
Computational Thinking in the Workforce and Next Generation Science Standards...Josh Sheldon
 
ML crash course
ML crash courseML crash course
ML crash coursemikaelhuss
 
​​Explainability in AI and Recommender systems: let’s make it interactive!
​​Explainability in AI and Recommender systems: let’s make it interactive!​​Explainability in AI and Recommender systems: let’s make it interactive!
​​Explainability in AI and Recommender systems: let’s make it interactive!Eindhoven University of Technology / JADS
 
WELCOME TO AI PROJECT shidhant mittaal.pptx
WELCOME TO AI PROJECT shidhant mittaal.pptxWELCOME TO AI PROJECT shidhant mittaal.pptx
WELCOME TO AI PROJECT shidhant mittaal.pptx9D38SHIDHANTMITTAL
 
Spark + AI Summit - The Importance of Model Fairness and Interpretability in ...
Spark + AI Summit - The Importance of Model Fairness and Interpretability in ...Spark + AI Summit - The Importance of Model Fairness and Interpretability in ...
Spark + AI Summit - The Importance of Model Fairness and Interpretability in ...Francesca Lazzeri, PhD
 
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018Sri Ambati
 

Similar a DataScience Platform Demo Explains Bayesian Rule Lists (20)

Learning to Learn Model Behavior ( Capital One: data intelligence conference )
Learning to Learn Model Behavior ( Capital One: data intelligence conference )Learning to Learn Model Behavior ( Capital One: data intelligence conference )
Learning to Learn Model Behavior ( Capital One: data intelligence conference )
 
ODSC APAC 2022 - Explainable AI
ODSC APAC 2022 - Explainable AIODSC APAC 2022 - Explainable AI
ODSC APAC 2022 - Explainable AI
 
GDG Cloud Southlake #17: Meg Dickey-Kurdziolek: Explainable AI is for Everyone
GDG Cloud Southlake #17: Meg Dickey-Kurdziolek: Explainable AI is for EveryoneGDG Cloud Southlake #17: Meg Dickey-Kurdziolek: Explainable AI is for Everyone
GDG Cloud Southlake #17: Meg Dickey-Kurdziolek: Explainable AI is for Everyone
 
Barga Data Science lecture 2
Barga Data Science lecture 2Barga Data Science lecture 2
Barga Data Science lecture 2
 
Learning to learn Model Behavior: How to use "human-in-the-loop" to explain d...
Learning to learn Model Behavior: How to use "human-in-the-loop" to explain d...Learning to learn Model Behavior: How to use "human-in-the-loop" to explain d...
Learning to learn Model Behavior: How to use "human-in-the-loop" to explain d...
 
Choosing a Machine Learning technique to solve your need
Choosing a Machine Learning technique to solve your needChoosing a Machine Learning technique to solve your need
Choosing a Machine Learning technique to solve your need
 
Get hands-on with Explainable AI at Machine Learning Interpretability(MLI) Gym!
Get hands-on with Explainable AI at Machine Learning Interpretability(MLI) Gym!Get hands-on with Explainable AI at Machine Learning Interpretability(MLI) Gym!
Get hands-on with Explainable AI at Machine Learning Interpretability(MLI) Gym!
 
Machine learning ppt unit one syllabuspptx
Machine learning ppt unit one syllabuspptxMachine learning ppt unit one syllabuspptx
Machine learning ppt unit one syllabuspptx
 
Self Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docxSelf Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docx
 
Data science presentation
Data science presentationData science presentation
Data science presentation
 
Ai in finance
Ai in financeAi in finance
Ai in finance
 
Hacking Predictive Modeling - RoadSec 2018
Hacking Predictive Modeling - RoadSec 2018Hacking Predictive Modeling - RoadSec 2018
Hacking Predictive Modeling - RoadSec 2018
 
Computational Thinking in the Workforce and Next Generation Science Standards...
Computational Thinking in the Workforce and Next Generation Science Standards...Computational Thinking in the Workforce and Next Generation Science Standards...
Computational Thinking in the Workforce and Next Generation Science Standards...
 
ML crash course
ML crash courseML crash course
ML crash course
 
​​Explainability in AI and Recommender systems: let’s make it interactive!
​​Explainability in AI and Recommender systems: let’s make it interactive!​​Explainability in AI and Recommender systems: let’s make it interactive!
​​Explainability in AI and Recommender systems: let’s make it interactive!
 
WELCOME TO AI PROJECT shidhant mittaal.pptx
WELCOME TO AI PROJECT shidhant mittaal.pptxWELCOME TO AI PROJECT shidhant mittaal.pptx
WELCOME TO AI PROJECT shidhant mittaal.pptx
 
Data-X-Sparse-v2
Data-X-Sparse-v2Data-X-Sparse-v2
Data-X-Sparse-v2
 
Spark + AI Summit - The Importance of Model Fairness and Interpretability in ...
Spark + AI Summit - The Importance of Model Fairness and Interpretability in ...Spark + AI Summit - The Importance of Model Fairness and Interpretability in ...
Spark + AI Summit - The Importance of Model Fairness and Interpretability in ...
 
Data-X-v3.1
Data-X-v3.1Data-X-v3.1
Data-X-v3.1
 
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018
Machine Learning Interpretability - Mateusz Dymczyk - H2O AI World London 2018
 

Último

Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanMYRABACSAFRA2
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSINGmarianagonzalez07
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceSapana Sha
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 

Último (20)

Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts Service
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 

DataScience Platform Demo Explains Bayesian Rule Lists

  • 1. Learn more at datascience.com  |  Empower Your Data Scientists March 8, 2018 Head to Booth 1215 for a live demo of the DataScience.com Platform Human in the Loop: Bayesian Rules Enabling Explainable AI
  • 2. Learn more at datascience.com  |  Empower Your Data Scientists 2 About Me I am a lead data scientist at DataScience.com. I enjoy applying and optimizing classical machine learning algorithms, NLP, and Bayesian design strategy to solve real-world problems. Currently, I am exploring on better ways to extract, evaluate, and explain the learned decision policies of models. Before joining DataScience.com, I used machine learning algorithms to find love for eHarmony customers. I am one of the principal authors of Skater, a model interpretation package for Python. I also organize the PyData Socal meet-up. Pramit Choudhary @MaverickPramit https://www.linkedin.com/in/pramitc/ https://github.com/pramitchoudhary
  • 3. Learn more at datascience.com  |  Empower Your Data Scientists Agenda ● Understand the problem of model opacity ● Define the “what” and “why” of model interpretation ● Define the scope of model interpretation ● How do we enable interpretability? ● What is the Bayesian rule list? ● Understand the tension between interpretability and performance ● Benchmark numbers ● What is Skater and how does it help you build models the right way? ● References
  • 4. Learn more at datascience.com  |  Empower Your Data Scientists The Problem of Model Opacity (Y|X) Training Set {xi ∈ X, yi ∈ Y} Why I am getting weird predictions? Was my model biased? I am not 100% sure what’s in the box; I didn’t build the model. “By 2018, half of business ethics violations will occur through improper use of big data analytics.” — Gartner **reference: https://www.gartner.com/newsroom/id/3144217 Predictor Black Box Model Holdout/Test Set {xi ∈ X, yi ∈ Y}
  • 5. Empower Your Data Scientists What is Model Interpretation? ● An extension of model evaluation that helps to foster a better understanding of a model’s learned decision policies. ● Ability to explain and present a model in a way that is human understandable. ● Human understandable: The model’s result is self descriptive & needs no further explanation.
  • 6. Learn more at datascience.com  |  Empower Your Data Scientists We are starting our journey of explainability with supervised learning problems. Image source: constructed using tensorboard
  • 7. Learn more at datascience.com  |  Empower Your Data Scientists With model interpretation, we want to answer the following questions: ○ Why did the model behave in a certain way? ○ What was the reason for false positives? What are the relevant variables driving a model’s outcome, e.g., customer lifetime value, fraud detection, image classification, spam detection? ○ How can we trust the predictions of a “black box” model? Is the predictive model biased? What Do We Want to Achieve?
  • 8. Learn more at datascience.com  |  Empower Your Data Scientists Machine Learning Workflow Define Hypothesis Use relevant key performance indicators Handle Data Handle Missing Data Data Partitioning Engineer and Select Features Transform data Select relevant features Build Model Build a predictive model Deploy Model Operationalize analytics as scalable REST APIs Test and Monitor Model 1. Log and track behavior 2. Evaluate 3. Conduct A/B or multi-armed bandit testing 1 2 3 4 5 6 Model Interpretation: In-Memory Models ● Model assessment ● Explain model at a global and local level ● Publish insights, make collaborative and informed decisions Model Interpretation: Deployed Models ● Explore and explain model behavior ● Debug and discover errors to improve performance RETRAIN EVALUATE Improve existing hypothesis or generate a new one
  • 9. Learn more at datascience.com  |  Empower Your Data Scientists An Interpretable Machine Learning System Interpretability with Rule Extraction
  • 10. Empower Your Data Scientists Why is Model Interpretation Important? Producer: ● Data scientist/analyst building a model ● Consultants helping clients Consumer/Decision Maker: ● Business owners or data engineers ● Risk/security assessment managers ● Humans being affected by the model “Explain the model.”
  • 11. Learn more at datascience.com  |  Empower Your Data Scientists Ideas collapse. Image source: Edu Lauton on Unsplash
  • 12. Learn more at datascience.com  |  Empower Your Data Scientists 12 Motives for Model Interpretation 1. Debugging and improving an ML system 2. Exploring and discovering latent or hidden feature interactions (useful for feature engineering/selection and resolving preconceptions ) 3. Understanding model variability 4. Helps in model comparison 5. Building domain knowledge about a particular use case 6. Brings transparency to decision making to enable trust 1. Explain the model/algorithm 2. Explain the key features driving the KPI 3. Verify and validate the accountability of ML learning systems, e.g. causes for False positives in credit scoring, insurance claim frauds 4. Identify blind spots to prevent adversarial attacks or fixing dataset errors 5. Ability to share the explanations to consumers of the predictive model? 6. Comply with Data Protection Regulations, e.g. EU’s GDPR ● Data Scientist / ● Machine Learning Engineer ● Data Analyst ● Statistician ● Data Science Manager ● Business owner ● Data Engineer ● Auditors / Risk Managers Producer Consumer
  • 13. Learn more at datascience.com  |  Empower Your Data Scientists Scope Of Interpretation Global Interpretation Being able to explain the conditional interaction between dependent(response) variables and independent(predictor, or explanatory) variables based on the complete dataset Global Interpretation Local Interpretation Local Interpretation Being able to explain the conditional interaction between dependent(response) variables and independent(predictor, or explanatory) variables with respect to a single prediction
  • 14. Learn more at datascience.com  |  Empower Your Data Scientists How Do We Enable Model Interpretation? Reference: Been Kim(ICML’17) Google Brain ( http://people.csail.mit.edu/beenkim/papers/BeenK_FinaleDV_ICML2017_tutorial.pdf )
  • 15. Learn more at datascience.com  |  Empower Your Data Scientists Introducing Skater https://github.com/datascienceinc/Skater If you like the idea, give us a star!Gitter Channel (join us here): https://gitter.im/datascienceinc-skater /Lobby
  • 16. Learn more at datascience.com  |  Empower Your Data Scientists 1. Post-Hoc Evaluation of Models
  • 17. Learn more at datascience.com  |  Empower Your Data Scientists How Do We Enable Interpretation? ➢ Post-hoc evaluation: A black-box model is built, and we need a way to interpret it. ○ Model agnostic partial dependence plot ○ Model agnostic feature importance ○ Local interpretable model agnostic explanation (LIME) ○ Saliency mask for DNN (image/text): Not supported yet; coming soon... G. Hooker( KDD’04 ). Discovering additive structure in black box functions Marco Tulio Ribeiro et. al(2016). Nothing Else Matters Ning Xie et. al(NIPS’ 2017). Relating Input Concepts to Convolutional Neural Network Decisions
  • 18. Learn more at datascience.com  |  Empower Your Data Scientists 2. Bayesian Rule List: Building Naturally Interpretable Models Via Rule Extraction
  • 19. Learn more at datascience.com  |  Empower Your Data Scientists Demo Building a Model Using a Bayesian Rule List and Skater 1. https://github.com/datascienceinc/Skater/blob/master/example s/rule_list_notebooks/rule_lists_continuous_features.ipynb 2. https://github.com/datascienceinc/Skater/blob/master/example s/rule_list_notebooks/rule_lists_titanic_dataset.ipynb 3. https://github.com/datascienceinc/Skater/blob/master/example s/credit_analysis/credit_analysis_rule_lists.ipynb
  • 20. Learn more at datascience.com  |  Empower Your Data Scientists How Do We Enable Interpretation? ➢ Using a probabilistic interpretable estimator (bayesian rule list): a. Generative probabilistic classifier P(y = 1| x) for each x b. Initially designed by Letham, Rudin, McCormick, Madigan (2015) c. Improved by Hongyu Yang. et. al. as Scalable Bayesian Rule List (2017) d. Works great for Tabular datasets with discrete and independent meaningful features e. Competitor to decision trees; greedy splitting and pruning f. Built using pre-mined association rules (frequent pattern-matching algorithms) • ECLAT (Equivalence Class Clustering and Bottom up Lattice Traversal) • Non-frequent patterns are not considered g. Build a bayesian hierarchical model over frequently occuring pre-mined rule lists h. Applies MCMC (Metropolis–Hastings algorithm) to sample from posterior distribution over permutation of “IF-THEN-ELSE” conditional statement i. Output: Generates a logical structure of human-interpretable IF then ELSE decision stumps j. Scope of interpretation: global and local
  • 21. Learn more at datascience.com  |  Empower Your Data Scientists Bayesian Rule List ● Consider independent and identically distributed(i.i.d) training examples of the form {X, Y} -> {(xi , yi }n i=1 where xi ∈ X as encoded features and yi ∈ Y as binary labels [0s or 1s]. ● A typical bayesian rule list estimator would look like this: Each rule is independent and selected from a set of pre-mined rules using frequent matching algorithms, e.g., ECLAT. Goal: Optimize over the possible set of pre-mined rules and their order to create the final set of interpretable decision stumps.
  • 22. Learn more at datascience.com  |  Empower Your Data Scientists Example: Rule List Representation Figure: BRL output on common diabetes dataset ( http://scikit-learn.org/stable/datasets/index.html#diabetes-dataset ) Goal: Optimize on finite number of rules maintaining accuracy. Sampling: Rules are sampled from posterior distribution over a permutation of pre-mined rules. Optimize cardinality of rules horizontally and vertically Scope of Interpretation: Global and local.
  • 23. Learn more at datascience.com  |  Empower Your Data Scientists Generative vs. Discriminative Models Input Examples {xi ∈ X, yi ∈ Y} Discriminative Model: Models the posterior probability directly; maps input X to output and labels Y directly; e.g., SVM, NN. Generative Model: Models a joint probability of input X and output Y p(X, Y); computes prediction P(Y | X) using Bayes’ rule; e.g., Naive Bayes, GAN (Generative Adversarial Network), BRL. Learns p( Y | X ) directly p( Y | X ) ∝ p(X | Y) * p(Y) Learns p( Y | X ) in-directly ** Reference: Ng and Jordan(2001) On Discriminative vs. Generative classifiers: A comparison of logistic regression and naive Bayes
  • 24. Learn more at datascience.com  |  Empower Your Data Scientists Optimization Goals for Bayesian Rule List Sample from a posterior distribution over a permutation of pre-mined “IF-THEN-ELSE” conditional statement: (d|X, Y, A, , λ, ) ∝ (Y|X, d, ) * (d|A, λ, ) where, ● d = Ordered subset of rules ● A: Pre-mined collection of all rules using the frequent pattern matching algorithm ● Prior hyper-parameters: , λ, ○ = [ 0, 1]: Prior parameter for each label in a binary classification problem ○ λ: Hyper-parameter for the expected length of the rule list ○ : Hyper-parameter for the expected cardinality of each rule in the optimal rule list Likelihood: Probability of an event that has already occurred (binomial distribution). Prior Probability: Probability of one’s belief before evidence (beta distribution). Posterior: Conditional probability of an event based on relevant evidence ∝ See Chapter Three of Machine Learning: A Probabilistic Perspective
  • 25. Learn more at datascience.com  |  Empower Your Data Scientists Tension Between Interpretability and Model Performance
  • 26. Learn more at datascience.com  |  Empower Your Data Scientists Performance vs. Interpretability Annual Income Late fee amount Simple decision boundary (Linear Monotonic) Complex decision boundary (Non-Linear Non-Monotonic) Credit card approved Credit card denied Non-linear decision boundary (nonLinear Monotonic)
  • 27. Learn more at datascience.com  |  Empower Your Data Scientists Tension Between Interpretability and Model Performance Model Performance (Accuracy) Interpretability/ Degree of Opacity Deep learning models Support vector machines Random forest K-nearest neighbors Linear/Logistic Regression Decision trees Bayesian Rule List XGBoost ** Remember: The purpose of the chart is not to mirror any benchmark on model performance, but to articulate the opacity of predictive models
  • 28. Learn more at datascience.com  |  Empower Your Data Scientists No Free Lunch Theorem “Any elevated performance over one class of problems is offset by performance over another class.” — David H. Wolpert and William G. Macready, (1997), https://ti.arc.nasa.gov/m/profile/dhw/papers/78.pdf Simplicity: 10, Robustness: 10, Computation Speed: scope for improvement, Interpretability: 10 Simplicity:10, Robustness:10, Scalability: with smart optimization, Interpretability: 10 Image source:wiki(Mimooh, https://commons.wikimedia.org/wiki/File:No_free_lunch_theorem.svg
  • 29. Learn more at datascience.com  |  Empower Your Data Scientists Simplicity Is Key Model Selection Policies: ● Model Performance (e.g., AUC-ROC): How accurate is the model? ● Scalability: Can the model handle huge volume of data? ● Computational Speed: Does the model take a long time to build? ● Robustness: Are the predicted result stable over a period of time? ● Interpretability: Can one interpret the output in a human understandable way? ● Simplicity: Can one explain the model easily? ● Occam’s Razor Principle: “When presented with competing hypothetical answers to a problem, one should select the one that makes the fewest assumptions.” ● In computational learning, build models with the objective of producing a succinct representation of the training set.
  • 30. Learn more at datascience.com  |  Empower Your Data Scientists What If We Achieve Accuracy? Figure: Comparison of BRL and RF using AUC of ROC on Titanic dataset
  • 31. Learn more at datascience.com  |  Empower Your Data Scientists Performance Benchmark Using BRL Dataset Data Type Problem Type Model Type Train Accuracy Test Accuracy Train AUC-RO C Test AUC-ROC Computation Time (in sec) Diabetes dataset (Train: 576 rows; Test: 192) Tabular data: continuous features Supervised Classification BRLC 0.78 0.71 0.82 0.76 0.74 Diabetes dataset (Train: 576 rows; Test: 192) Tabular data: continuous features Supervised Classification RF 1.0 0.75 0.81 0.80 0.14 Titanic dataset (Train: 571 rows; Test: 143 rows) Tabular data: categorical & continuous Supervised Classification BRLC 0.80 0.86 0.84 0.86 0.67 Titanic dataset (Train: 571 rows; Test: 143 rows) Tabular data: categorical & continuous Supervised Classification RF 1.0 0.81 1.0 0.86 0.07 Credit analysis (Train: 29,839 rows; Test: 9,947 rows ) Tabular data: categorical & continuous Supervised Classification BRLC 0.86 0.86 0.65 0.65 2.81 Credit analysis (Train: 29,839 rows; Test: 9,947 rows ) Tabular data: categorical & continuous Supervised Classification Linear SVM 0.85 0.86 0.68 0.70 0.15 Could be improved with more thoughtful feature engineering and selection 0.05 difference in performance on hold out using 10% of the data compared to SVM
  • 32. Learn more at datascience.com  |  Empower Your Data Scientists Skater: BRL API Overview (BRLC) Import the BRLC class Instantiate BRLC instance Train a model using fit Display learned “if-else” conditions Use discretizer for continuous features Generate class probabilities Predict class labels Persist model Access other rules
  • 33. Learn more at datascience.com  |  Empower Your Data Scientists Mission Statement: Enable Interpretability for All Models Model Performance (Accuracy) Interpretability/ Degree of Opacity Deep learning models Support vector machines Random forest K-nearest neighbors Linear/Logistic Regression Decision trees Bayesian Rule List XGBoost ** Remember: The purpose of the chart is not to mirror any benchmark on model performance, but to articulate the opacity of predictive models
  • 34. Learn more at datascience.com  |  Empower Your Data Scientists Evaluate (Y|X) Data Data Unboxed model Evaluate Partial dependence plot Relative variable importance Local Interpretable Model Explanation (LIME) R or Python model (linear, nonlinear, ensemble, neural networks) Scikit-learn, caret and rpart packages for CRAN H20.ai, Algorithmia, etc. WITHOUT INTERPRETATION ... WITH SKATER ... Black box model How do I understand my models? Bayesian rule list (BRL) More coming ...
  • 35. Learn more at datascience.com  |  Empower Your Data Scientists Future Work and Improvement ● Other rule-based algorithm approaches being considered for implementation: ○ H. Lakkaraju, S. H. Bach, and J. Leskovec. Interpretable decision sets: A joint framework for description and prediction ○ Issue: https://github.com/datascienceinc/Skater/issues/207
  • 36. Learn more at datascience.com  |  Empower Your Data Scientists Future Work and Improvement (continued) ● Improve handling of continuous feature ○ Discretize using entropy criterion with the Minimum Description Length Principle (MDLP) (Reference: Irani, Keki B’93. "Multi-interval discretization of continuous-valued attributes for classification learning.") ○ Issue: https://github.com/datascienceinc/Skater/issues/206 ● Improve scalability and computational efficiency for BRL ○ Parallelizing MCMC sampling using Weierstrass Sampler ○ Reference: Parallelizing MCMC via Weierstrass Sampler, https://arxiv.org/abs/1312.4605 ● Add more example notebooks, applied to different use-cases ○ Handling text based models - Kaggle sms-spam-collection dataset ○ More benchmarks
  • 37. Learn more at datascience.com  |  Empower Your Data Scientists A Quick Glimpse Into The Future 37 Top 5 Predictions: 1. seat belt = 0.75 2. limousine = 0.051 3. golf cart = 0.017 4. minivan = 0.015 5. car mirror = 0.015 Visual Q&A: Is the person driving the car safely?
  • 38. Learn more at datascience.com  |  Empower Your Data Scientists Q&A info@datascience.com pramit@datascience.com @MaverickPramit @DataScienceInc Help wanted: https://github.com/datascienceinc/Skater/labels/help%20wanted Professor. Sameer Singh, Assistant Professor of Computer Science @ the University of California, Irvine Paco Nathan, Director of Learning Group @ O’Reilly Media https://www.datascience.com/resources/webinars/int erpreting-machine-learning-models
  • 39. Learn more at datascience.com  |  Empower Your Data Scientists References ● Interpretation references: ○ A. Weller, (ICML 2017).Challenges for Transparency ○ Zachary C. Lipton, (2016). The Mythos of Model Interpretability ● Rule list-related literature: ○ Letham, B., Rudin, C., McCormick, T. H., & Madigan, D. (2015). Interpretable classifiers using rules and bayesian analysis: Building a better stroke prediction model. Annals of Applied Statistics, 9(3), 1350–1371 ○ Yang, H., Rudin, C., Seltzer M. (2016). Scalable Bayesian Rule Lists ● Detailed examples of model interpretation using Skater ● Marco Tulio Ribeiro, et al. (KDD 2016) "Why Should I Trust You?": Explaining the Predictions of Any Classifier