IC Fraud Prediction
Goal: predict whether an insurance claim is acceptable or fraudulent.
Workflow
● Data Gathering and Preparation
● Data Analysis and Visualization
● Explanatory Model Building
● Predictive Model Building
Data Preparation
Data Gathering ✔️ Data quality checks ✔️ Handling extreme values ✔️ Handling missing data ✔️ Feature selection ✔️ Encoding ✔️
Columns with outliers
● Policy annual premium
● Umbrella limit
● Capital loss
● Property claim
Solved with: Median imputation
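The slide does not say how outliers were detected, so the following is only a minimal sketch that assumes the common 1.5 × IQR rule and then replaces each flagged value with the column median. The function name and the sample premiums are hypothetical.

```python
from statistics import median, quantiles


def impute_outliers_with_median(values, k=1.5):
    """Replace values outside [Q1 - k*IQR, Q3 + k*IQR] with the median.

    The IQR rule is an assumption; the deck only names median imputation.
    """
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    med = median(values)
    return [v if low <= v <= high else med for v in values]


# Made-up policy annual premiums with one extreme value.
premiums = [1100.0, 1250.0, 1300.0, 1180.0, 1220.0, 1275.0, 9000.0]
cleaned = impute_outliers_with_median(premiums)
```

Only the extreme value is replaced; values inside the fences are left as they are.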
● Initial data provided
● Intuitive cross-check
● Ideation for derived columns
2 derived columns: ‘Months between incident date and policy bind date’ and ‘Incident within customership’
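As an illustration of the first derived column, here is a small sketch that counts whole months from the policy bind date to the incident date. The exact rounding rule used in the deck is not stated, so counting completed months is an assumption, and the dates are invented. The second derived column is not defined on the slide, so it is left out.

```python
from datetime import date


def months_between(start, end):
    """Number of completed months from start to end (assumed definition)."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the last month is not yet complete
    return months


# Hypothetical example dates.
policy_bind_date = date(2014, 10, 17)
incident_date = date(2015, 1, 25)
months = months_between(policy_bind_date, incident_date)
```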
Columns with missing data
● Collision type
● Property damage
● Police report available
Solved with: Mode imputation
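Mode imputation fills each missing entry with the most frequent observed value of that column. The sketch below assumes missing entries are marked with "?"; that marker and the sample values are assumptions, not taken from the slides.

```python
from statistics import mode


def impute_with_mode(values, missing="?"):
    """Fill missing entries with the most frequent non-missing value."""
    observed = [v for v in values if v != missing]
    most_common = mode(observed)
    return [most_common if v == missing else v for v in values]


# Made-up collision types; "?" marks a missing entry (assumed marker).
collision_type = ["Rear Collision", "?", "Side Collision", "Rear Collision", "?"]
filled = impute_with_mode(collision_type)
```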
[Figure: 10 most important features]
[Figure: 10 least important features]
[Figure: Feature - Feature Correlation Heatmap]
Initial: 1000 rows, 40 columns
● Total claim is the sum of Property claim, Vehicle claim and Injury claim
● Values in numeric columns > 0
1 row containing umbrella limit < 0 removed
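The two quality checks above can be sketched in a few lines of plain Python. The column names and the three rows are invented for illustration; only the rules themselves (total equals the sum of the three claim parts, rows with a negative umbrella limit are dropped) come from the slide.

```python
def total_claim_is_consistent(row):
    """Check that total claim equals injury + property + vehicle claims."""
    parts = row["injury_claim"] + row["property_claim"] + row["vehicle_claim"]
    return row["total_claim_amount"] == parts


def drop_negative_umbrella_limit(rows):
    """Keep only rows whose umbrella limit is not negative."""
    return [row for row in rows if row["umbrella_limit"] >= 0]


# Made-up rows; column names are assumed.
rows = [
    {"total_claim_amount": 71610, "injury_claim": 6510,
     "property_claim": 13020, "vehicle_claim": 52080, "umbrella_limit": 0},
    {"total_claim_amount": 5070, "injury_claim": 780,
     "property_claim": 780, "vehicle_claim": 3510, "umbrella_limit": 5000000},
    {"total_claim_amount": 34650, "injury_claim": 7700,
     "property_claim": 3850, "vehicle_claim": 23100, "umbrella_limit": -1000000},
]

clean_rows = drop_negative_umbrella_limit(rows)
inconsistent = [r for r in clean_rows if not total_claim_is_consistent(r)]
```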
Initial: 1000 rows, 40 columns
Columns removed due to non-relevance: Policy number, _c39
Columns removed due to correlation > 95% with another column: Vehicle claim
Columns removed because their contribution was transferred to a derived column: Incident date, Policy bind date
Columns removed due to feature importance score < 0.02: Collision type, Property damage, Incident within customership, Insured sex, Umbrella limit, Number of vehicles involved, Police report available, Incident type
Columns in final Analytical Dataset:
Months as customer, Age, Policy state, Policy csl, Policy deductible, Policy annual premium, Insured zip, Insured education level, Insured occupation, Insured hobbies, Insured relationship, Capital gains, Capital loss, Incident severity, Authorities contacted, Incident state, Incident city, Incident hour of the day, Bodily injuries, Witnesses, Total claim amount, Injury claim, Property claim, Auto make, Auto model, Auto year, Months between incident date and bind date
Final: 999 rows, 27 columns
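The correlation-based removal listed above (one column dropped because it correlates above 95% with another) can be sketched as follows. This assumes Pearson correlation, which the slides do not confirm, and uses invented numbers; the helper names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def highly_correlated_pairs(columns, threshold=0.95):
    """List column pairs whose absolute correlation exceeds the threshold."""
    names = list(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) > threshold:
                pairs.append((a, b, r))
    return pairs


# Made-up values for three numeric columns.
columns = {
    "age": [48, 42, 29, 41, 44],
    "total_claim_amount": [5000, 64000, 34000, 71000, 6500],
    "vehicle_claim": [3500, 46000, 24000, 52000, 4600],
}
pairs = highly_correlated_pairs(columns)
```

From each reported pair one column is then dropped; in the deck, Vehicle claim was the one removed.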
Handling imbalanced data✔️
Distribution of target labels: Fraud 25%, Non-Fraud 75%
Train - Test Split
● For Train Dataset: Initial imbalanced dataset → Imbalanced Training dataset → SMOTE (Synthetic Minority Oversampling TEchnique) → Balanced Training dataset
● For Test Dataset: Initial imbalanced dataset → Imbalanced Test dataset (left unchanged)
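To make the oversampling step concrete, here is a deliberately small, standard-library sketch of the SMOTE idea: each synthetic fraud sample is placed on the line segment between a real minority sample and one of its nearest minority neighbours. It is applied to the training portion only, matching the flow above. The function, the two-feature points and the class counts are all hypothetical; in practice a library implementation such as `SMOTE` from imbalanced-learn would normally be used instead.

```python
import random
from math import dist


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Made-up training data: 4 fraud points versus 12 non-fraud rows.
train_fraud = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.4), (0.9, 2.1)]
train_non_fraud_count = 12

new_points = smote(train_fraud, n_new=train_non_fraud_count - len(train_fraud), k=3)
balanced_fraud = train_fraud + new_points
```

Keeping the test set untouched means the reported metrics still reflect the original 25% / 75% class mix.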
Data Analysis and Visualization
Distribution of Target column values along Categorical columns✔️ Distribution of Target column values along Non-Categorical columns✔️
[Figure: Bar Charts - Feature Column (X) vs Target Column (Y)]
[Figure: Density Plots - Feature Column (X) vs Target Column (Y)]
Explanatory Model Building
ML Model performances✔️ Main and Interaction effects on Model Outputs✔️
Model   Accuracy   Precision   Recall   F1 Score
LR      0.76       0           0        0
KNN     0.74       0.38        0.12     0.19
NB      0.735      0.35        0.12     0.18
DT      0.74       0.47        0.60     0.53
RF      0.77       0.53        0.44     0.48
XGB     0.775      0.53        0.58     0.55
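The four metrics in the table all derive from confusion-matrix counts. The sketch below shows the formulas and, as an illustration, one set of counts that would reproduce the LR row: a model that never flags fraud on a test set of 200 claims with 48 frauds. Those counts are an assumption chosen to match the reported numbers, not figures from the deck, and fraud is treated as the positive class.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts.

    Precision, recall and F1 are reported as 0.0 when undefined.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts: every claim predicted as non-fraud.
metrics = classification_metrics(tp=0, fp=0, fn=48, tn=152)
```

This is why accuracy alone is misleading on imbalanced data: a 0.76 accuracy can coexist with zero recall on the fraud class.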
[Figure: Heatmap for Main and Interaction effects]
[Figure: Therm plot for main effects]
Best performing models are Tree-based models
Selected model: XGBoost
Predictive Model Building
Current Model performance✔️ Improvements✔️
Accuracy   Precision   Recall   F1 Score
0.775      0.53        0.58     0.55
🚀 Hyperparameter Tuning by GridSearchCV
Best parameter values:
'colsample_bytree': 1,
'learning_rate': 0.01,
'max_depth': 10,
'n_estimators': 100,
'subsample': 0.7
Accuracy   Precision   Recall   F1 Score
0.82       0.60        0.77     0.67
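A grid search fits the model once per parameter combination and per cross-validation fold, so the size of the grid drives the cost. The sketch below enumerates a hypothetical grid with the standard library to show this. Only the best values are taken from the slide; the other candidate values and the fold count are assumptions, since the deck does not list the grid that was searched.

```python
from itertools import product

# Hypothetical search grid; only the best values below come from the slide.
param_grid = {
    "colsample_bytree": [0.7, 1],
    "learning_rate": [0.01, 0.1],
    "max_depth": [5, 10],
    "n_estimators": [100, 200],
    "subsample": [0.7, 1],
}

# Best parameter values as reported on the slide.
best_params = {
    "colsample_bytree": 1,
    "learning_rate": 0.01,
    "max_depth": 10,
    "n_estimators": 100,
    "subsample": 0.7,
}

# Every combination that an exhaustive grid search would evaluate.
names = list(param_grid)
candidates = [dict(zip(names, combo)) for combo in product(*param_grid.values())]

cv_folds = 5  # assumed number of cross-validation folds
total_fits = len(candidates) * cv_folds
```

With two values per parameter this already means 32 candidates and 160 model fits, which is why choosing the candidate values is listed among the challenges.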
🚀 Tuning threshold from ROC by maximising (TPR - FPR)
Threshold value = 0.68
Accuracy   Precision   Recall   F1 Score
0.83       0.60        0.90     0.72
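Choosing the threshold that maximises TPR - FPR (the criterion named on the Challenges slide, also known as Youden's J statistic) can be sketched as below. The labels and scores are invented, and fraud is assumed to be the positive class (label 1) with higher scores meaning more likely fraud.

```python
def best_threshold_by_youden(y_true, scores):
    """Return the score threshold that maximises TPR - FPR."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    best_t, best_j = None, float("-inf")
    for t in sorted(set(scores)):
        # Predict positive when the score is at or above the threshold.
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= t)
        j = tp / positives - fp / negatives
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


# Made-up labels (1 = fraud, assumed) and predicted probabilities.
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.70, 0.90]
threshold, j_stat = best_threshold_by_youden(y_true, scores)
```

The area under the ROC curve does not depend on any single threshold, so a per-threshold quantity such as TPR - FPR is needed to pick an operating point.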
Challenges
● Intuitive cross-check and deriving features.
● Improving the performance - Determining the set of values for parameters in hyperparameter tuning.
● Improving the performance further - Determining the right criterion for choosing the threshold from the ROC curve. Finalized at: maximising (TPR - FPR)
Insights
Highest contributing columns [i.e. columns whose values should be checked for correctness]
[Figure: Examples of their contributions]
Thanks!
|Srijit|
srijitpanja@gmail.com


Data Science use case: Fraud Insurance Claims Detection by ML algo


Editor's notes

  1. fraud_reported : {'Y': 0, 'N': 1}
     incident_severity : {'Major Damage': 0, 'Minor Damage': 1, 'Total Loss': 2, 'Trivial Damage': 3}
     insured_hobbies : {'sleeping': 0, 'reading': 1, 'board-games': 2, 'bungie-jumping': 3, 'base-jumping': 4, 'golf': 5, 'camping': 6, 'dancing': 7, 'skydiving': 8, 'movies': 9, 'hiking': 10, 'yachting': 11, 'paintball': 12, 'chess': 13, 'kayaking': 14, 'polo': 15, 'basketball': 16, 'video-games': 17, 'cross-fit': 18, 'exercise': 19}
     auto_make : {'Saab': 0, 'Mercedes': 1, 'Dodge': 2, 'Chevrolet': 3, 'Accura': 4, 'Nissan': 5, 'Audi': 6, 'Toyota': 7, 'Ford': 8, 'Suburu': 9, 'BMW': 10, 'Jeep': 11, 'Honda': 12, 'Volkswagen': 13}
     incident_state : {'SC': 0, 'VA': 1, 'NY': 2, 'OH': 3, 'WV': 4, 'NC': 5, 'PA': 6}
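The mappings in the note are not in alphabetical order, which is consistent with codes being assigned in order of first appearance in the data. That is an inference, not something the deck states. Under that assumption, the encoding can be sketched as:

```python
def encode_by_first_appearance(values):
    """Map each distinct value to an integer in order of first appearance."""
    mapping = {}
    for v in values:
        mapping.setdefault(v, len(mapping))
    return mapping, [mapping[v] for v in values]


# Made-up target column; 'Y' happens to appear first, as in the note.
fraud_reported = ["Y", "N", "N", "Y"]
mapping, encoded = encode_by_first_appearance(fraud_reported)
```

Note that with this mapping fraud ('Y') is encoded as 0, so any metric computation has to be told explicitly which label counts as the positive class.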