Interpreting machine learning models
or how to turn a random forest into a white box
Ando Saabas
About me
• Senior applied scientist at Microsoft
• Using ML and statistics to improve call quality in Skype
• Various projects on user engagement modelling, Skype user graph analysis,
call reliability modelling, traffic shaping detection
• Blogging on data science and ML @ http://blog.datadive.net/
Machine learning and model interpretation
• Machine learning studies algorithms that learn from data and make
predictions
• Learning algorithms are about correlations in the data
• In contrast, in data science and data mining, understanding causality
is essential
• Applying domain knowledge requires understanding and interpreting
models
Usefulness of model interpretation
• Often, we need to understand the individual predictions a model is
making.
For example, a model may:
• Recommend a treatment for a patient or estimate a disease to be
likely. The doctor needs to understand the reasoning.
• Classify a user as a scammer, but the user disputes it. The fraud
analyst needs to understand why the model made the classification.
• Predict that a video call will be graded poorly by the user. The
engineer needs to understand why this type of call was considered
problematic.
Usefulness of model interpretation cont.
• Understanding differences on a dataset level.
• Why is a new software release receiving poorer feedback from customers
when compared to the previous one?
• Why are grain yields in one region higher than in another?
• Debugging models. A model that worked earlier is giving unexpected
results on newer data.
Algorithmic transparency
• Algorithmic transparency is becoming a requirement in many fields
• French Conseil d'État (State Council) recommendation in "Digital
technology and fundamental rights" (2014): impose a transparency
requirement on algorithm-based decisions, regarding the personal data
used by the algorithm and the general reasoning it followed
• Federal Trade Commission (FTC) Chair Edith Ramirez: the agency is
concerned about "algorithmic transparency /../" (Oct 2015).
The FTC Office of Technology Research and Investigation was started in
March 2015 to tackle algorithmic transparency, among other issues
Interpretable models
• Traditionally, two types of (mainstream) models are considered when
interpretability is required
• Linear models (linear and logistic regression)
• Y = a + b₁x₁ + … + bₙxₙ
• heart_disease = 0.08*tobacco + 0.043*age + 0.939*famhist + ...
(from The Elements of Statistical Learning)
• Decision trees

[Diagram: small decision tree splitting on Family hist, Tobacco, and Age > 60 (Yes/No branches), with leaf risk estimates 60%, 40%, 20%, and 30%]
Example: heart disease risk prediction
• Essentially a linear model with integer coefficients
• Easy for a doctor to follow and explain

[Image: risk scoring example from the National Heart, Lung and Blood Institute]
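Reading a linear model's reasoning is as simple as printing its coefficients. A minimal sketch with synthetic data (feature names and coefficients are illustrative, not the NHLBI model):

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))  # hypothetical columns: tobacco, age, famhist
y = (0.08 * X[:, 0] + 0.043 * X[:, 1] + 0.939 * X[:, 2]
     + rng.normal(size=500) > 0).astype(int)

model = LogisticRegression().fit(X, y)
for name, coef in zip(["tobacco", "age", "famhist"], model.coef_[0]):
    print(f"{name}: {coef:+.3f}")  # each feature's effect is directly readable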
Linear models have drawbacks
• Underlying data is often non-linear
• Feature binning: potentially massive increase in the number of features
• Basis expansion (non-linear transformations of underlying features)
In both cases, performance is traded for interpretability

Y = 2x₁ + x₂ − 3x₃   vs   Y = 2x₁² + 3x₁ − log(x₂) + x₂x₃ + …
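For instance, a minimal sketch of basis expansion with scikit-learn's PolynomialFeatures (an arbitrary illustration) shows how quickly the feature count grows:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.random.rand(100, 3)  # three underlying features x1, x2, x3
expanded = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)
print(X.shape, "->", expanded.shape)  # (100, 3) -> (100, 9): squares and pairwise products added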
Decision trees
• Decision trees can fit to non-linear data
• Work well with both categorical and continuous data, classification
and regression
• Easy to understand
[Diagram: decision tree on apartment data, splitting on Rooms < 3, Built_year < 2008, Crime rate < 3, Floor < 2, Crime_rate < 5 (Yes/No branches); leaf values 35,000; 45,000; 55,000; 30,000; 70,000; 52,000]

[Figure: small part of a default decision tree in scikit-learn, trained on the Boston housing data: 500 data points, 14 features]
Decision trees
• Decision trees are understandable only when they are (very) small
• A tree of depth n has up to 2ⁿ leaves and 2ⁿ − 1 internal nodes. At depth 20, a
tree can have up to 1,048,576 leaves
• The previous slide showed fewer than 200 nodes
• Additionally, decision trees are a high-variance method: they generalize
poorly and tend to overfit
Random forests
• Can learn non-linear relationships in the data well
• Robust to outliers
• Can deal with both continuous and categorical data
• Require very little input preparation (see previous three points)
• Fast to train and test, trivially parallelizable
• High accuracy even with minimal meta-optimization
• Considered to be a black box that is difficult or impossible to interpret
Random forests as a black box
• Consist of a large number of decision trees (often 100s to 1000s)
• Trained on bootstrapped data (sampling with replacement)
• Using random feature selection
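As a rough sketch of those two ingredients (a simplified illustration, not scikit-learn's actual implementation):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_forest(X, y, n_trees=100, seed=0):
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(X), size=len(X))  # bootstrap: sample with replacement
        tree = DecisionTreeRegressor(max_features="sqrt")  # random feature subset at each split
        trees.append(tree.fit(X[idx], y[idx]))
    return trees

def predict_forest(trees, X):
    # the forest prediction is the average of the individual tree predictions
    return np.mean([t.predict(X) for t in trees], axis=0)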
Random forests as a black box
• "Black box models such as random forests can't quantify the impact of each
predictor to the predictions of the complex model“, in PRICAI 2014: Trends
in Artificial Intelligence
• Unfortunately, the random forest algorithm is a black box when it comes to
evaluating the impact of a single feature on the overall performance. In
Advances in Natural Language Processing 2014
• “(Random forest model) is close to a black box in the sense that it uses 810
features /../ reduction in the number of features would allow an operator
to study individual decisions to have a rough idea how the global decision
could have been made”. In Advances in Data Mining: Applications and
Theoretical Aspects: 2014
Understanding the model vs the predictions
• Keep in mind, we want to understand why a particular decision was
made. Not necessarily every detail of the full model
• As an analogy, we don't need to understand how a brain works to
understand why a person made a particular decision: a simple
explanation can be sufficient
• Ultimately, as ML models get more complex and powerful, hoping to
understand the models themselves is doomed to failure
• We should strive to make models explain their decisions
Turning the black box into a white box
• In fact, random forest predictions can be explained and interpreted,
by decomposing predictions into mathematically exact feature
contributions
• Independently of the
• number of features
• number of trees
• depth of the trees
Revisiting decision trees
• Classical definition (from The Elements of Statistical Learning)
• The tree divides the feature space into M regions Rₘ (one for each leaf)
• The prediction for a feature vector x is the constant cₘ associated with the
region Rₘ that x belongs to

dt(x) = Σₘ₌₁..ᴹ cₘ · I(x ∈ Rₘ)
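A hedged sketch of this definition using scikit-learn internals (dataset and depth are arbitrary choices): tree.apply() returns the region (leaf) a sample falls into, and the leaf's stored constant cₘ is the prediction.

import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)
tree = DecisionTreeRegressor(max_depth=3).fit(X, y)

leaf = tree.apply(X[:1])[0]  # index of the region R_m that x falls into
c_m = tree.tree_.value[leaf][0][0]  # the constant c_m stored at that leaf
assert np.isclose(c_m, tree.predict(X[:1])[0])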
Example decision tree – predicting apartment prices

[Diagram: apartment-price decision tree. Root split: Rooms < 3 (Yes/No). Further splits: Built_year > 2008, Crime rate < 3, Crime_rate < 5, Floor < 2. Leaf values: 35,000; 45,000; 52,000; 55,000; 30,000; 70,000]
Estimating apartment prices
Assume an apartment [2 rooms; Built in 2010; Neighborhood crime rate: 5]
We walk the tree to obtain the price

[Diagram: the same apartment-price decision tree]
Estimating apartment prices
[2 rooms; Built in 2010; Neighborhood crime rate: 5]
Prediction: 35,000. Path taken: Rooms < 3, Built_year > 2008, Crime_rate < 3

[Diagram: the same tree with the decision path highlighted]
Operational view
• The classical definition ignores the operational aspect of the tree:
• There is a decision path through the tree
• All nodes (not just the leaves) have a value associated with them
• Each decision along the path contributes something to the final
outcome
• A feature is associated with every decision
• We can compute the final outcome in terms of feature contributions
Estimating apartment prices

[Diagram: the same tree, now annotated with a value at every node, not just the leaves: 48,000 at the root, 33,000 and 40,000 at the internal nodes along the path, leaf values as before]
Estimating apartment prices
Price = 48,000
E(price) = 48,000

[Diagram: the annotated tree, root value 48,000 highlighted]
Estimating apartment prices
[2 rooms]
Price = 48,000 − 15,000 (Rooms)
E(price | rooms < 3) = 33,000

[Diagram: the annotated tree, path through Rooms < 3 highlighted]
Estimating apartment prices
[Built 2010]
Price = 48,000 − 15,000 (Rooms) + 7,000 (Built_year)

[Diagram: the annotated tree, path continued through Built_year > 2008]
Estimating apartment prices
[Crime rate 5]
Price = 48,000 − 15,000 (Rooms) + 7,000 (Built_year) − 5,000 (Crime_rate)

[Diagram: the annotated tree, path continued through the crime-rate split]
Estimating apartment prices
Price = 48,000 − 15,000 (Rooms) + 7,000 (Built_year) − 5,000 (Crime_rate) = 35,000

[Diagram: the annotated tree with the full decision path highlighted]
Decomposition for decision trees
• We can define the decision tree in terms of the bias and the contribution
from each feature:

dt(x) = Σₘ₌₁..ᴹ cₘ · I(x ∈ Rₘ) = bias + Σᵢ₌₁..ᴺ contr(i, x)

• Similar to linear regression (prediction = bias + feature₁ contribution + … +
featureₙ contribution), but at the prediction level, not the model level
• Does not depend on the size of the tree or the number of features
• Works equally well for
• Regression and classification trees
• Multivalued and multilabel data
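To make this concrete, here is a minimal sketch (a simplified re-implementation, not the treeinterpreter code itself) that walks a scikit-learn tree's decision path and attributes each change in node value to the feature split on; dataset and depth are arbitrary choices:

import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.tree import DecisionTreeRegressor

def decompose(tree, x):
    # Walk the decision path of x, attributing each change in node value
    # to the feature that was split on.
    t = tree.tree_
    bias = t.value[0][0][0]  # mean of the training targets at the root
    contributions = np.zeros(tree.n_features_in_)
    node = 0
    while t.children_left[node] != -1:  # -1 marks a leaf
        f = t.feature[node]
        child = (t.children_left[node] if x[f] <= t.threshold[node]
                 else t.children_right[node])
        contributions[f] += t.value[child][0][0] - t.value[node][0][0]
        node = child
    return bias, contributions

X, y = load_diabetes(return_X_y=True)
tree = DecisionTreeRegressor(max_depth=4).fit(X, y)
bias, contr = decompose(tree, X[0])
assert np.isclose(tree.predict(X[:1])[0], bias + contr.sum())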
Deeper inspections
• We can use a more fine-grained decomposition in addition to pure feature
contributions:
• Separate negative and positive contributions
• Contributions from individual decisions (floor = 1 → −15,000)
• Contributions from interactions (floor == 1 & has_terrace → 3,000)
• etc.
• The number of features is typically not a concern because of the long tail:
in practice, the top 10 features contribute the vast majority of the overall
deviation from the mean
From decision trees to random forests
• The prediction of a random forest is the average of the predictions of its
individual trees
• By distributivity of multiplication over addition and associativity of
addition, the random forest prediction can therefore be written as the
average of the trees' bias terms plus, for each feature, the average of that
feature's contributions across the trees
• prediction = bias + feature₁ contribution + … + featureₙ contribution
RF(x) = (1/J) Σⱼ₌₁..ᴶ dtⱼ(x)

RF(x) = (1/J) Σⱼ₌₁..ᴶ biasⱼ(x) + ((1/J) Σⱼ₌₁..ᴶ contrⱼ(1, x) + … + (1/J) Σⱼ₌₁..ᴶ contrⱼ(n, x))
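Continuing the sketch above (same assumptions, reusing the hypothetical decompose helper), the forest-level decomposition is just an average over the trees:

from sklearn.ensemble import RandomForestRegressor

def decompose_forest(rf, x):
    results = [decompose(est, x) for est in rf.estimators_]
    bias = np.mean([b for b, _ in results])  # average of the trees' bias terms
    contributions = np.mean([c for _, c in results], axis=0)  # per-feature averages
    return bias, contributions

rf = RandomForestRegressor(n_estimators=100).fit(X, y)
bias, contr = decompose_forest(rf, X[0])
assert np.isclose(rf.predict(X[:1])[0], bias + contr.sum())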
Random forest interpretation in Scikit-Learn
• Path walking is in general not supported by ML libraries
• Scikit-Learn: one of the most popular Python (and overall) ML
libraries
• Patched since 0.17 (released Nov 8, 2015) to include values/predictions
at each node: this allows walking the tree paths and collecting values
along the way
• treeinterpreter library for decomposing the predictions
• https://github.com/andosa/treeinterpreter
• pip install treeinterpreter
• Supports both decision tree and random forest classes, regression
and classification
Using treeinterpreter
• Decomposing predictions is a one-liner
from treeinterpreter import treeinterpreter as ti
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rf = RandomForestRegressor()
rf.fit(trainX, trainY)
prediction, bias, contributions = ti.predict(rf, testX)
# prediction == bias + sum of contributions
assert np.allclose(prediction, bias + np.sum(contributions, axis=1))
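Classification works the same way, except the decomposition is per class; a minimal sketch (Iris data as an arbitrary choice), where prediction and bias have shape (n_samples, n_classes) and contributions (n_samples, n_features, n_classes):

import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from treeinterpreter import treeinterpreter as ti

X, y = load_iris(return_X_y=True)
rf = RandomForestClassifier(n_estimators=100).fit(X, y)

prediction, bias, contributions = ti.predict(rf, X[:1])
# summing contributions over the feature axis recovers the class probabilities
assert np.allclose(prediction, bias + contributions.sum(axis=1))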
Decomposing a prediction – Boston housing data
prediction, bias, contributions = ti.predict(rf, boston.data)
>> prediction[0]
30.69
>> bias[0]
25.588
>> sorted(zip(contributions[0], boston.feature_names),
>> key=lambda x: -abs(x[0]))
[(4.3387165697195558, 'RM'),
(-1.0771391053864874, 'TAX'),
(1.0207116129073213, 'LSTAT'),
(0.38890774812797702, 'AGE'),
(0.38381481481481539, 'ZN'),
(-0.10397222222222205, 'CRIM'),
(-0.091520697167756987, 'NOX')
...
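The same decomposition also answers the dataset-level questions raised earlier (e.g. why one group is predicted differently than another): since the bias term is the same for every prediction, the gap between mean predictions splits exactly into per-feature differences of mean contributions. A hedged sketch, where ds1 and ds2 are hypothetical feature matrices to compare and rf is the fitted regressor from above:

prediction1, bias1, contributions1 = ti.predict(rf, ds1)  # ds1, ds2: two datasets of interest
prediction2, bias2, contributions2 = ti.predict(rf, ds2)

# the bias terms are identical (training-set mean), so the prediction gap
# decomposes exactly into per-feature differences of mean contributions
mean_diff = contributions1.mean(axis=0) - contributions2.mean(axis=0)
assert np.isclose(prediction1.mean() - prediction2.mean(), mean_diff.sum())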
Conclusions
• Model interpretation can be very beneficial for ML and data science
practitioners in many tasks
• In many/most cases there is no need to understand the full model: an
explanation of its decisions is sufficient
• Random forests can be turned from black box into a white box
• Each prediction can be decomposed into bias and feature contribution terms
• Python library available for scikit-learn at
https://github.com/andosa/treeinterpreter
or pip install treeinterpreter
• Subscribe to http://blog.datadive.net/
• Follow @crossentropy