Validation Is (Not) Easy
Dmytro Panchenko
Machine learning engineer, Altexsoft
What is validation?
Validation is a way to select and
evaluate our models.
The two most common strategies:
• train-validation-test split
(holdout validation)
• k-fold cross-validation + test
holdout
Source: https://towardsdatascience.com/time-series-nested-cross-validation-76adba623eb9
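The two strategies above can be sketched with scikit-learn as follows. This is a minimal illustration, not the speaker's code: the synthetic dataset, the logistic regression model, and the split sizes are all assumptions chosen for the example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score, train_test_split

# Synthetic data stands in for a real dataset (an assumption for illustration).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Set the test holdout aside first; it is used once, for the final evaluation.
X_dev, X_test, y_dev, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Strategy 1: holdout validation (train / validation / test).
X_train, X_val, y_train, y_val = train_test_split(
    X_dev, y_dev, test_size=0.25, stratify=y_dev, random_state=0
)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
holdout_score = model.score(X_val, y_val)  # used to compare and select models

# Strategy 2: k-fold cross-validation on the development part.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
cv_scores = cross_val_score(LogisticRegression(max_iter=1000), X_dev, y_dev, cv=cv)

# Final, one-off evaluation of the selected model on the untouched test holdout.
final_model = LogisticRegression(max_iter=1000).fit(X_dev, y_dev)
test_score = final_model.score(X_test, y_test)
```

In both variants the test holdout plays no part in model selection; it is touched only once at the end.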
What do we expect?
Validation:
• To compare and select models
Test:
• To evaluate the model’s performance
Is it easy?
What’s wrong?
• Non-representative splits
• Unstable validation
• Data leakage
Source: https://www.kaggle.com/alexisbcook/data-leakage
Representative sampling
The Good
Representative sampling
The Bad
Representative sampling
The Ugly
Adversarial validation
1. Merge train and test into a single dataset
2. Label train samples as 0 and test samples as 1
3. Train a classifier to predict this label
4. The train samples with the highest error (those the classifier
mistakes for test) are the most similar to the test distribution
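The four steps above can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `adversarial_validation`, the logistic regression classifier, and the synthetic Gaussian data are illustrative choices, not part of the original talk.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict


def adversarial_validation(X_train, X_test, n_splits=5, seed=0):
    """Return (ROC-AUC of the train-vs-test classifier, P(test) for each train row)."""
    # Steps 1-2: merge both sets and label train rows 0, test rows 1.
    X_all = np.vstack([X_train, X_test])
    is_test = np.concatenate(
        [np.zeros(len(X_train), dtype=int), np.ones(len(X_test), dtype=int)]
    )
    # Step 3: train a classifier; out-of-fold predictions avoid scoring rows it has seen.
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    proba = cross_val_predict(
        LogisticRegression(max_iter=1000), X_all, is_test, cv=cv, method="predict_proba"
    )[:, 1]
    auc = roc_auc_score(is_test, proba)
    # Step 4: for a train row (true label 0), the error is its predicted P(test).
    return auc, proba[: len(X_train)]


rng = np.random.default_rng(0)
X_train = rng.normal(0.0, 1.0, size=(300, 3))
X_test_same = rng.normal(0.0, 1.0, size=(300, 3))     # same distribution as train
X_test_shifted = rng.normal(2.0, 1.0, size=(300, 3))  # shifted distribution

auc_same, _ = adversarial_validation(X_train, X_test_same)
auc_shifted, p_test = adversarial_validation(X_train, X_test_shifted)

# Indices of the 50 train rows that look most like the test set.
most_test_like = np.argsort(-p_test)[:50]
```

An AUC near 0.5 suggests the classifier cannot tell train from test, while an AUC close to 1 signals a distribution shift. The `most_test_like` rows are candidates for a validation set that resembles test.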
Adversarial validation: usage
• To detect a discrepancy between distributions (ROC-AUC well above 0.5
means the classifier can tell the sets apart)
• To make train or validation closer to test (by removing features or
sampling the most similar items)
• To make test closer to production (if we have an unlabeled set of
real-world data)
Examples:
https://www.linkedin.com/pulse/winning-13th-place-kaggles-magic-competition-corey-levinson/
https://www.kaggle.com/c/home-credit-default-risk/discussion/64722
https://www.kaggle.com/c/human-protein-atlas-image-classification/discussion/77251
Kaggle example
Unstable validation
Reasons for instability
• Not enough data
• Bad stratification
• Noisy labels
• Outliers in data
Case #1. Mercedes-Benz competition
Competitors worked with a dataset
describing various features of
Mercedes-Benz cars and had to
predict the time a car takes to
pass testing.
Metric: R².
Only 4k rows.
Extreme outliers in the target.
Source: https://habr.com/ru/company/ods/blog/336168/
Case #1. Mercedes-Benz competition
Gold medal solution:
1. Multiple k-folds (10x5 folds) to collect more fold statistics
2. Dependent Student’s t-test for paired samples to compare two
models:
$$T(X_1^n, X_2^n) = \frac{E[X_1] - E[X_2]}{S / \sqrt{n}}$$
where $n$ is the number of folds, $X_1^n$ and $X_2^n$ are the per-fold metrics for
models #1 and #2, and $S$ is the sample standard deviation of the elementwise differences.
Source: https://habr.com/ru/company/ods/blog/336168/ Author: https://www.kaggle.com/daniel89
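The statistic above can be computed directly from two lists of per-fold scores. The sketch below uses only the standard library. The function name `paired_t_statistic` and the fold scores are made up for illustration and are not results from the competition.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Paired t-statistic for per-fold metrics of two models on the same folds."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both models must be scored on the same folds")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    s = statistics.stdev(diffs)  # sample standard deviation of the differences
    return statistics.mean(diffs) / (s / math.sqrt(n))


# Hypothetical R² scores of two models on the same five folds.
model_1 = [0.56, 0.58, 0.55, 0.60, 0.57]
model_2 = [0.54, 0.57, 0.55, 0.58, 0.55]
t_value = paired_t_statistic(model_1, model_2)
```

A large absolute value of `t_value` suggests the difference between the models is not just fold-to-fold noise. `scipy.stats.ttest_rel` computes the same statistic together with a p-value. Folds from repeated k-fold overlap, so the scores are not fully independent and the test should be read as approximate.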
Case #2. ML BootCamp VI
1. 19M rows of logs
2. Adversarial validation gives
0.9+ ROC-AUC
3. Extremely unstable CV:
unclear how to stratify
Author: https://www.kaggle.com/sergeifironov/
Case #2. ML BootCamp VI
First place solution:
1. Train model on stratified k-folds
2. Compute out-of-fold error for each
sample
3. Stratify dataset by error
4. Optionally: go to step #1 again
Author: https://www.kaggle.com/sergeifironov/
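One possible reading of the steps above is sketched below. The slide does not say how the error was binned or which model was used, so the quantile bins, the logistic regression model, the synthetic data, and the function name `error_stratified_folds` are all assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict


def error_stratified_folds(X, y, n_splits=5, n_bins=4, seed=0):
    """Build folds stratified by the out-of-fold error of a first-pass model."""
    # Steps 1-2: train on stratified k-folds and collect out-of-fold predictions.
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    proba = cross_val_predict(
        LogisticRegression(max_iter=1000), X, y, cv=cv, method="predict_proba"
    )[:, 1]
    error = np.abs(y - proba)  # per-sample out-of-fold error
    # Step 3: bin samples by error quantile (assumed binning) and stratify on the bin.
    edges = np.quantile(error, np.linspace(0, 1, n_bins + 1)[1:-1])
    error_bin = np.digitize(error, edges)
    new_cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return list(new_cv.split(X, error_bin)), error_bin


X, y = make_classification(n_samples=400, n_features=10, random_state=0)
folds, error_bin = error_stratified_folds(X, y)
```

Each fold then holds a similar mix of easy and hard samples, which is the property this stratification is meant to give. Step 4 would feed the new folds back into the first step.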
Data leakage
Data leakage is the contamination
of the training data by additional
information that will not be
available at the actual prediction
time.
Source: https://www.kaggle.com/alexisbcook/data-leakage
Case #1. HPA Classification Challenge
Multiple shots from a single experiment
are available.
If one shot is placed into train and
another into validation, you
have leakage.
Case #1. HPA Classification Challenge
Solution: if you have data from
several groups that share the target,
always place each whole group in a single
set!
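A group-aware split can be sketched with the standard library alone. The function name `group_train_val_split` and the experiment IDs are hypothetical. scikit-learn offers `GroupKFold` and `GroupShuffleSplit` for the same purpose.

```python
import random
from collections import defaultdict


def group_train_val_split(groups, val_fraction=0.2, seed=0):
    """Split sample indices so that every group lands entirely in one set."""
    by_group = defaultdict(list)
    for idx, group in enumerate(groups):
        by_group[group].append(idx)
    # Shuffle group ids (not samples) so whole groups are assigned together.
    group_ids = sorted(by_group)
    random.Random(seed).shuffle(group_ids)
    n_val = max(1, round(len(group_ids) * val_fraction))
    val_groups, train_groups = group_ids[:n_val], group_ids[n_val:]
    train_idx = sorted(i for g in train_groups for i in by_group[g])
    val_idx = sorted(i for g in val_groups for i in by_group[g])
    return train_idx, val_idx


# Hypothetical experiment id for each image (several shots per experiment).
experiment_ids = ["exp1", "exp1", "exp2", "exp3", "exp3",
                  "exp3", "exp4", "exp5", "exp5", "exp5"]
train_idx, val_idx = group_train_val_split(experiment_ids, val_fraction=0.4)
```

Because the split is done on group ids, no experiment can contribute shots to both train and validation.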
Case #2. Telecom Data Cup
Where is the leakage?
Client uses mobile provider services
Client answers an engagement survey
Survey result is written into the DB
All previous history is aggregated into a
row in the dataset
Case #2. Telecom Data Cup
The engagement survey call itself
is counted in the call
history.
A short call means everything
was fine. A long conversation
means the client was complaining.
Case #2. Telecom Data Cup
Solution: use only data
that was available at the
moment when the prediction should
have been made.
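A minimal sketch of this rule for the call-history case: aggregate only the events that happened strictly before the prediction moment. The record layout (`start`, `duration_sec`), the function name `aggregate_call_history`, and the sample calls are assumptions for illustration.

```python
from datetime import datetime


def aggregate_call_history(calls, prediction_time):
    """Build features only from calls that started strictly before prediction_time."""
    visible = [c for c in calls if c["start"] < prediction_time]
    return {
        "n_calls": len(visible),
        "total_duration_sec": sum(c["duration_sec"] for c in visible),
        "max_duration_sec": max((c["duration_sec"] for c in visible), default=0),
    }


calls = [
    {"start": datetime(2019, 3, 1, 10, 0), "duration_sec": 120},
    {"start": datetime(2019, 3, 5, 18, 30), "duration_sec": 45},
    # The survey call itself: its length reflects the answer we want to predict.
    {"start": datetime(2019, 3, 10, 12, 0), "duration_sec": 900},
]
survey_time = datetime(2019, 3, 10, 12, 0)

# Correct: the cutoff is the moment the survey call starts.
features = aggregate_call_history(calls, prediction_time=survey_time)
# Leaky: a later cutoff lets the survey call into the features.
leaky_features = aggregate_call_history(calls, prediction_time=datetime(2019, 3, 11))
```

With the later cutoff, `max_duration_sec` is driven by the survey call, which is exactly the signal that would not exist at prediction time.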
Case #3. APTOS Blindness Detection
Different classes were probably collected separately and artificially mixed
into a single dataset, so aspect ratio, image size and crop type vary for
different classes.
Case #3. APTOS Blindness Detection
This leads the network to learn arbitrary metafeatures of the image instead of
actual symptoms.
Source: https://www.kaggle.com/dimitreoliveira/diabetic-retinopathy-shap-model-explainability
Case #3. APTOS Blindness Detection
Solution: remove metafeatures that are not related to the task and
thoroughly investigate all suspicious “data properties”.
Case #4. Airbus Ship Detection Challenge
Original dataset consists of
high-resolution images.
They were cropped,
augmented and only after that
divided into train and test, so
crops of the same original image
can land in both sets.
Case #4. Airbus Ship Detection Challenge
Solution: first split the data into
train and test, and only then apply
all preprocessing.
If the preprocessing is data-driven
(e.g. target encoding), fit it
on the train data only.
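For the target-encoding case, the rule can be sketched as follows: the mapping is fitted on train rows only and then applied unchanged to test rows. The function names and the toy data are hypothetical, and smoothing is left out to keep the sketch short.

```python
from collections import defaultdict


def fit_target_encoding(categories, targets):
    """Learn the per-category mean target from TRAIN rows only."""
    sums, counts = defaultdict(float), defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1
    mapping = {c: sums[c] / counts[c] for c in sums}
    global_mean = sum(targets) / len(targets)
    return mapping, global_mean


def apply_target_encoding(categories, mapping, default):
    """Encode rows with a fitted mapping; unseen categories get the default."""
    return [mapping.get(c, default) for c in categories]


# The split happens first; the encoder never sees test targets.
train_cat, train_y = ["a", "a", "b", "b"], [1, 0, 1, 1]
test_cat = ["a", "b", "c"]

mapping, global_mean = fit_target_encoding(train_cat, train_y)
train_encoded = apply_target_encoding(train_cat, mapping, global_mean)
test_encoded = apply_target_encoding(test_cat, mapping, global_mean)
```

The category `"c"` never appears in train, so it falls back to the train-wide mean instead of borrowing information from test.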
Summary
• Always ensure that your
validation is representative
• Check that your validation
scenario corresponds to the
real-world prediction scenario
• Good luck!
Thank you for your attention
Questions are welcome
