Data Science use case: Fraud Insurance Claims Detection by ML algo

- Covers all the standard, mandatory steps, in detail, of a supervised classification data science application

- Dataset used here: https://www.kaggle.com/buntyshah/auto-insurance-claims-data

- Detailed Medium article: https://medium.com/@srijitpanja/step-by-step-data-science-execution-car-insurance-fraud-detection-task-example-9855d306a4c9

  1. IC Fraud Prediction: to predict whether an insurance claim is acceptable or not.
  2. Workflow
     ● Data Gathering and Preparation
     ● Data Analysis and Visualization
     ● Predictive Model Building
     ● Explanatory Model Building
  3. Data Preparation
     Steps: Data gathering ✔️, Data quality checks ✔️, Handling extreme values ✔️, Handling missing data ✔️, Feature selection ✔️, Encoding ✔️
     Data gathering and quality checks:
     ● Initial data provided: 1000 rows, 40 columns
     ● Intuitive cross-check: Total claim is the sum of Property claim, Vehicle claim and Injury claim; values in numeric columns > 0; 1 row containing umbrella limit < 0 removed
     ● Ideation for derived columns: 2 derived columns, ‘Months within incident date and policy bind date’ and ‘incident within customership’
     Columns with outliers (solved with: median imputation):
     ● Policy annual premium
     ● Umbrella limit
     ● Capital loss
     ● Property claim
     Columns with missing data (solved with: mode imputation):
     ● Collision type
     ● Property damage
     ● Police report available
     Feature selection charts: 10 most important features, 10 least important features, Feature - Feature Correlation Heatmap
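The imputation and derived-column steps on slide 3 can be sketched roughly as follows. This is only an illustration on a toy frame: the slides do not say how outliers were detected, so the 1.5 × IQR rule is an assumption, as are the "?" missing-value marker, the snake_case column names, the helper names and the new column name `months_bind_to_incident`.

```python
import pandas as pd


def impute_outliers_with_median(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Replace values outside the IQR fences (assumed rule) with the column median."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    is_outlier = (s < q1 - k * iqr) | (s > q3 + k * iqr)
    return s.mask(is_outlier, s.median())


def impute_missing_with_mode(s: pd.Series, missing_marker: str = "?") -> pd.Series:
    """Treat the marker as missing, then fill with the most frequent value."""
    s = s.mask(s == missing_marker)
    return s.fillna(s.mode().iloc[0])


def months_between(start: pd.Series, end: pd.Series) -> pd.Series:
    """Whole calendar months from start to end (day of month is ignored)."""
    start, end = pd.to_datetime(start), pd.to_datetime(end)
    return (end.dt.year - start.dt.year) * 12 + (end.dt.month - start.dt.month)


# Toy rows made up for illustration, not taken from the Kaggle dataset.
df = pd.DataFrame({
    "policy_annual_premium": [1200.0, 1250.0, 1300.0, 1280.0, 9000.0],
    "collision_type": ["Rear Collision", "?", "Rear Collision", "Side Collision", "?"],
    "policy_bind_date": ["2014-10-17", "2006-06-27", "2000-09-06", "1990-05-25", "2014-06-06"],
    "incident_date": ["2015-01-25", "2015-01-21", "2015-02-22", "2015-01-10", "2015-02-17"],
})

df["policy_annual_premium"] = impute_outliers_with_median(df["policy_annual_premium"])
df["collision_type"] = impute_missing_with_mode(df["collision_type"])
df["months_bind_to_incident"] = months_between(df["policy_bind_date"], df["incident_date"])
print(df)
```

Median and mode are used here because they are the imputation choices named on the slide; both are computed from the column itself, so in a real pipeline they would normally be fitted on the training split only.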
  4. Initial: 1000 rows, 40 columns
     ● Columns removed due to non-relevance: Policy number, _c39
     ● Columns removed due to correlation > 95% with another column: Vehicle claim
     ● Columns removed due to contribution transferred to a derived column: Incident date, Policy bind date
     ● Columns removed due to feature importance score < 0.02: Collision type, Property damage, Incident within customership, Insured sex, Umbrella limit, Number of vehicles involved, Police report available, Incident type
     ● Columns in final Analytical Dataset: Months as customer, Age, Policy state, Policy csl, Policy deductible, Policy annual premium, Insured zip, Insured education level, Insured occupation, Insured hobbies, Insured relationship, Capital gains, Capital loss, Incident severity, Authorities contacted, Incident state, Incident city, Incident hour of the day, Bodily injuries, Witnesses, Total claim amount, Injury claim, Property claim, Auto make, Auto model, Auto year, Months between incident date and bind date
     Final: 999 rows, 27 columns
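The two numeric filters on slide 4 (correlation above 95% and feature importance below 0.02) could look something like the sketch below. The claim amounts and importance scores are invented toy values, and the helper name `drop_highly_correlated` is hypothetical; the slides do not show which model produced the importance scores.

```python
import pandas as pd


def drop_highly_correlated(df: pd.DataFrame, threshold: float = 0.95):
    """Drop the later column of every pair whose absolute correlation exceeds threshold."""
    corr = df.corr(numeric_only=True).abs()
    cols = list(corr.columns)
    to_drop = []
    for i, first in enumerate(cols):
        if first in to_drop:
            continue
        for second in cols[i + 1:]:
            if second not in to_drop and corr.loc[first, second] > threshold:
                to_drop.append(second)
    return df.drop(columns=to_drop), to_drop


# Toy claim amounts (made up); vehicle_claim = total - injury - property.
claims = pd.DataFrame({
    "total_claim_amount": [5000, 6000, 7000, 8000, 9000, 10000],
    "injury_claim": [500, 1200, 300, 1500, 700, 1100],
    "property_claim": [900, 400, 1300, 600, 1200, 500],
})
claims["vehicle_claim"] = (
    claims["total_claim_amount"] - claims["injury_claim"] - claims["property_claim"]
)

reduced, dropped = drop_highly_correlated(claims, threshold=0.95)
print(dropped)

# Hypothetical importance scores, for illustration only.
importances = {
    "incident_severity": 0.31,
    "insured_hobbies": 0.12,
    "insured_sex": 0.01,
    "umbrella_limit": 0.015,
}
kept = [name for name, score in importances.items() if score >= 0.02]
print(kept)
```

Which member of a correlated pair to drop is a judgment call; this sketch simply keeps whichever column appears first.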
  5. Handling imbalanced data ✔️
     Distribution of target labels: Fraud 25%, Non-Fraud 75%
     ● For Train Dataset: Initial imbalanced dataset, Train - Test Split, Imbalanced Training dataset, SMOTE (Synthetic Minority Oversampling TEchnique), Balanced Training dataset
     ● For Test Dataset: Initial imbalanced dataset, Train - Test Split, Imbalanced Test dataset
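The key point of slide 5 is that oversampling happens after the split and only on the training portion, so the test set keeps its original imbalance. Below is a simplified, NumPy-only SMOTE-style sketch on random toy data. It is not the imbalanced-learn implementation; the 80/20 split, the seeds and the convention that label 1 is the minority class are assumptions made for the example.

```python
import numpy as np


def smote_oversample(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority rows by interpolating towards a random
    one of each sampled row's k nearest minority-class neighbours."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    k = min(k, len(X_min) - 1)
    # Pairwise Euclidean distances within the minority class.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[picked] - X_min[base])


# Toy data: roughly 25% minority (label 1), mirroring the 25/75 split on the slide.
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 4))
y = (rng.random(1000) < 0.25).astype(int)

# Split first (assumed 80/20), then balance the training part only.
idx = rng.permutation(len(y))
cut = int(0.8 * len(y))
X_train, y_train = X[idx[:cut]], y[idx[:cut]]
X_test, y_test = X[idx[cut:]], y[idx[cut:]]

n_new = int((y_train == 0).sum() - (y_train == 1).sum())
X_syn = smote_oversample(X_train[y_train == 1], n_new)
X_train_bal = np.vstack([X_train, X_syn])
y_train_bal = np.concatenate([y_train, np.ones(n_new, dtype=int)])

print(np.bincount(y_train), np.bincount(y_train_bal), np.bincount(y_test))
```

Because every synthetic row lies on a segment between two real minority rows, no information from the test split leaks into training.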
  6. Data Analysis and Visualization
     ● Distribution of Target column values along Categorical columns ✔️: Bar Charts - Feature column (X) vs Target Column (Y)
     ● Distribution of Target column values along Non-Categorical columns ✔️: Density Plots - Feature Column (X) vs Target Column (Y)
  7. Explanatory Model Building
     ML Model performances ✔️

     | Model | Accuracy | Precision | Recall | F1 Score |
     |-------|----------|-----------|--------|----------|
     | LR    | 0.76     | 0         | 0      | 0        |
     | KNN   | 0.74     | 0.38      | 0.12   | 0.19     |
     | NB    | 0.735    | 0.35      | 0.12   | 0.18     |
     | DT    | 0.74     | 0.47      | 0.60   | 0.53     |
     | RF    | 0.77     | 0.53      | 0.44   | 0.48     |
     | XGB   | 0.775    | 0.53      | 0.58   | 0.55     |

     Best performing models are Tree-based models. Selected model: XGBoost
     Main and Interaction effects on Model Outputs ✔️
     ● Heatmap for Main and Interaction effects
     ● Therm plot for main effects
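The four metrics in the slide 7 table follow from confusion-matrix counts, and working through them shows why accuracy alone is misleading on imbalanced data. In the sketch below, the 200-row test set with 48 fraud cases is an assumption chosen to be consistent with the LR row, not a figure stated in the slides. Treating an undefined precision as 0 is also a convention of this example.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, recall and F1 from confusion-matrix counts.
    Undefined ratios (zero denominator) are reported as 0.0."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Assumed counts: a model that labels every claim as non-fraud on a
# hypothetical test set of 200 claims, 48 of them fraudulent.
always_non_fraud = classification_metrics(tp=0, fp=0, fn=48, tn=152)
print(always_non_fraud)

# Cross-check of the reported XGB row: precision 0.53, recall 0.58.
print(round(f1_from(0.53, 0.58), 2))
```

Under these assumed counts, a model that never flags fraud still reaches 0.76 accuracy with zero precision, recall and F1, which is the pattern in the LR row.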
  8. Predictive Model Building
     Current Model performance ✔️ and Improvements ✔️

     | Stage                                        | Accuracy | Precision | Recall | F1 Score |
     |----------------------------------------------|----------|-----------|--------|----------|
     | Current model                                | 0.775    | 0.53      | 0.58   | 0.55     |
     | 🚀 After hyperparameter tuning (GridSearchCV) | 0.82     | 0.60      | 0.77   | 0.67     |
     | 🚀 After threshold tuning from ROC            | 0.83     | 0.60      | 0.90   | 0.72     |

     ● Hyperparameter tuning by GridSearchCV. Best parameter values: 'colsample_bytree': 1, 'learning_rate': 0.01, 'max_depth': 10, 'n_estimators': 100, 'subsample': 0.7
     ● Tuning the threshold from the ROC curve (criterion finalized on slide 9: TPR - FPR). Threshold value = 0.68
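The threshold-tuning step on slide 8 can be illustrated with a small pure-Python sketch. It picks the cut-off that maximises TPR - FPR, the criterion the deck says it finalized on (also known as Youden's J statistic). The labels and scores are invented, label 1 is assumed to be the positive class, and the function name is hypothetical.

```python
def best_threshold_by_tpr_minus_fpr(y_true, scores):
    """Return (threshold, J) maximising J = TPR - FPR over the observed scores.
    A row is predicted positive when its score is >= threshold."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    best_t, best_j = None, float("-inf")
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= t)
        j = tp / positives - fp / negatives
        if j > best_j:  # strict: ties keep the lower threshold
            best_t, best_j = t, j
    return best_t, best_j


# Toy labels and predicted probabilities, made up for illustration.
y_true = [0, 0, 0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.10, 0.20, 0.35, 0.40, 0.45, 0.55, 0.60, 0.70, 0.72, 0.90]

threshold, j = best_threshold_by_tpr_minus_fpr(y_true, scores)
predictions = [int(s >= threshold) for s in scores]
print(threshold, round(j, 3), predictions)
```

Moving the threshold changes precision and recall but leaves the ROC curve, and hence its AUC, unchanged, so a per-threshold criterion such as TPR - FPR is what actually selects the operating point.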
  9. Challenges
     ● Intuitive cross-check and deriving features.
     ● Improving the performance: determining the set of values for parameters in hyperparameter tuning.
     ● Improving the performance further: determining the correct optimizer for procuring the threshold from ROC. Finalized at: (TPR - FPR)
     Insights
     ● Highest contributing columns [i.e. columns that should be made sure to contain correct values]
     ● Examples of their contributions
  10. Thanks! Srijit, srijitpanja@gmail.com

Editor's notes

  • fraud_reported : {'Y': 0, 'N': 1}

    incident_severity : {'Major Damage': 0, 'Minor Damage': 1, 'Total Loss': 2, 'Trivial Damage': 3}

    insured_hobbies : {'sleeping': 0, 'reading': 1, 'board-games': 2, 'bungie-jumping': 3, 'base-jumping': 4, 'golf': 5, 'camping': 6, 'dancing': 7, 'skydiving': 8, 'movies': 9, 'hiking': 10, 'yachting': 11, 'paintball': 12, 'chess': 13, 'kayaking': 14, 'polo': 15, 'basketball': 16, 'video-games': 17, 'cross-fit': 18, 'exercise': 19}

    auto_make : {'Saab': 0, 'Mercedes': 1, 'Dodge': 2, 'Chevrolet': 3, 'Accura': 4, 'Nissan': 5, 'Audi': 6, 'Toyota': 7, 'Ford': 8, 'Suburu': 9, 'BMW': 10, 'Jeep': 11, 'Honda': 12, 'Volkswagen': 13}

    incident_state : {'SC': 0, 'VA': 1, 'NY': 2, 'OH': 3, 'WV': 4, 'NC': 5, 'PA': 6}
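The mappings in these notes are plain label encodings. Here is a minimal sketch of applying and inverting two of them, with the dictionaries copied from the notes and `encode` and `decode` as hypothetical helper names:

```python
# Mappings copied from the notes above.
fraud_reported_map = {'Y': 0, 'N': 1}
incident_severity_map = {'Major Damage': 0, 'Minor Damage': 1, 'Total Loss': 2, 'Trivial Damage': 3}


def encode(values, mapping):
    """Replace each category label with its integer code (KeyError on unseen labels)."""
    return [mapping[v] for v in values]


def decode(codes, mapping):
    """Invert the mapping to recover the original labels."""
    inverse = {code: label for label, code in mapping.items()}
    return [inverse[c] for c in codes]


labels = ['Y', 'N', 'N', 'Y']
codes = encode(labels, fraud_reported_map)
print(codes)
print(decode(codes, fraud_reported_map))
print(encode(['Total Loss', 'Minor Damage'], incident_severity_map))
```

Under this mapping fraud ('Y') is class 0, so any metric that assumes 1 is the positive class has to be told otherwise. The integer codes also imply an ordering that does not match the real one (for `incident_severity`, 'Trivial Damage' gets the highest code), which tree-based models tolerate better than linear ones.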