Credit Card Fraud Detection 2023
By Shivam Tiwari
What is Credit Card Fraud?
Credit card fraud is the unauthorized use of someone's card for purchases, causing financial loss and inconvenience.
Examples
• Insider Fraud
• Phishing
• Skimming
• Identity Theft
“To predict whether the transactions are fraudulent or not”
Workflow
• Data Acquisition & Description
• Data Preprocessing
• Exploratory Data Analysis
• Data Preparation
• Model Selection and Model Training
• Conclusion
Data Preprocessing
Dataset: Credit Card Fraud Detection Dataset 2023
Rows: 568630
Columns: 31
Features (Columns)
• id: Unique identifier for each transaction
• V1-V28: Anonymized features representing various transaction attributes (e.g., time, location, etc.)
• Amount: The transaction amount
• Class: Binary label indicating whether the transaction is fraudulent (1) or not (0)
🤷♂️ No missing values.
🤷♂️ No duplicates.
👍 Data types also look fine.
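The three checks above (missing values, duplicates, data types) can be sketched in pandas as follows. This is a minimal illustration on a small made-up DataFrame; the values are hypothetical stand-ins, not rows from the real dataset.

```python
import pandas as pd

# Hypothetical stand-in for the real dataset (same kind of columns, made-up values).
df = pd.DataFrame({
    "id": [0, 1, 2, 3],
    "V1": [0.12, -1.30, 0.45, 2.10],
    "Amount": [120.50, 89.99, 15.00, 230.10],
    "Class": [0, 1, 0, 1],
})

missing_per_column = df.isnull().sum()        # number of NaN values in each column
duplicate_rows = int(df.duplicated().sum())   # number of fully duplicated rows

print(missing_per_column)
print("duplicate rows:", duplicate_rows)
print(df.dtypes)                              # data type of each column
```

A total of zero in both counts corresponds to the "no missing values" and "no duplicates" findings reported on the slide.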
With 568,630 rows, 31 columns, no null values, and a balanced class distribution, the dataset provides a reliable foundation for in-depth analysis.
Expected benefits: enhanced security, reduced financial losses, and improved customer trust through the identification of fraudulent credit card transactions.
EDA: Exploratory Data Analysis
Exploratory Data Analysis (EDA) is a statistical approach to analyzing and visualizing data sets that helps discover patterns, relationships, and insights for better understanding and decision-making.
df.info(): Provides concise information about a DataFrame, including data types, non-null counts, and memory usage.
df.describe(): Summarizes DataFrame statistics such as mean, standard deviation, and quartiles, offering insight into the distribution and central tendency of numerical data.
df.shape: Displays the number of rows and columns in the DataFrame.
df.dtypes: Shows the data type of each column in the DataFrame.
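As a rough illustration of the inspection calls described above, the sketch below runs them on a tiny made-up DataFrame (the column values are hypothetical). Note that `shape` and `dtypes` are attributes, so they are used without parentheses.

```python
import pandas as pd

# Hypothetical three-column frame standing in for the real dataset.
df = pd.DataFrame({
    "V1": [0.5, -1.2, 0.3, 1.1],
    "Amount": [10.0, 20.0, 30.0, 40.0],
    "Class": [0, 1, 0, 1],
})

df.info()                    # prints dtypes, non-null counts, and memory usage
summary = df.describe()      # count, mean, std, min, quartiles, max per numeric column
n_rows, n_cols = df.shape    # attribute: (rows, columns)
column_types = df.dtypes     # attribute: one dtype per column

print(summary)
print(n_rows, n_cols)
print(column_types)
```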
1. V17 and V18 are highly correlated.
2. V16 and V17 are highly correlated.
3. V9 and V10 are also positively correlated.
4. V14 has a negative correlation with V4.
A heatmap visually represents data intensity using color variations, with warmer colors indicating higher values and cooler colors indicating lower values.
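# Let's look at the data as a heatmap
import matplotlib.pyplot as plt
import seaborn as sns

paper = plt.figure(figsize=[20, 12])
sns.heatmap(df.corr(), cmap='BuPu', annot=True)  # annotate each cell with its correlation value
plt.title('Correlation Heatmap', color='red')
plt.show()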
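Correlated pairs like the ones listed in the observations can also be read off the correlation matrix directly instead of by eye. The sketch below is one possible way to do that, shown on synthetic columns (`A` to `D` are hypothetical, not dataset features): it keeps each pair once and ranks pairs by absolute correlation.

```python
import numpy as np
import pandas as pd

# Synthetic data: B tracks A, C mirrors A, D is independent noise.
rng = np.random.default_rng(0)
base = rng.normal(size=500)
df = pd.DataFrame({
    "A": base,
    "B": base + rng.normal(scale=0.1, size=500),
    "C": -base + rng.normal(scale=0.1, size=500),
    "D": rng.normal(size=500),
})

corr = df.corr()
cols = corr.columns

# Indices of the upper triangle (k=1 skips the diagonal), so each pair appears once.
i, j = np.triu_indices(len(cols), k=1)
pairs = pd.DataFrame({
    "feature_a": cols[i],
    "feature_b": cols[j],
    "corr": corr.to_numpy()[i, j],
})

# Rank pairs by absolute correlation, strongest first.
ranked = pairs.reindex(pairs["corr"].abs().sort_values(ascending=False).index)
print(ranked.head(3))
```

The sign of `corr` distinguishes positive relationships (such as the one reported for V9 and V10) from negative ones (such as V14 and V4).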
df.skew()
id       -6.579536e-16
V1       -8.341717e-02
V2       -1.397952e+00
V3        1.462221e-02
V4       -4.416893e-02
V5        1.506414e+00
V6       -2.016110e-01
V7        1.902687e+01
V8        2.999722e-01
V9        1.710575e-01
V10       7.404136e-01
V11      -2.089056e-02
V12       6.675895e-02
V13       1.490639e-02
V14       2.078348e-01
V15       1.123298e-02
V16       2.664070e-01
V17       3.730610e-01
V18       1.291911e-01
V19      -1.017123e-02
V20      -1.556460e+00
V21      -1.089833e-01
V22       3.185295e-01
V23      -9.968746e-02
V24       6.608974e-02
V25       2.300804e-02
V26      -1.895874e-02
V27       2.755452e+00
V28       1.724978e+00
Amount    1.655585e-03
Class     0.000000e+00
dtype: float64
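Observations (●'◡'●):
By the values above, V2 (-1.40) and V20 (-1.56) are the most negatively skewed features, while V1 (-0.08) and V23 (-0.10) are only slightly negatively skewed. V7 (19.03) is strongly positively skewed.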
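To make the skewness reading less dependent on eyeballing the list, the reported values can be filtered by magnitude. The sketch below copies the figures from the df.skew() output above; the cut-off of 1.0 for "highly skewed" is an assumed rule of thumb, not something stated in the slides.

```python
import pandas as pd

# Skewness values as reported in the df.skew() output above.
skewness = pd.Series({
    "id": -6.579536e-16, "V1": -8.341717e-02, "V2": -1.397952e+00,
    "V3": 1.462221e-02, "V4": -4.416893e-02, "V5": 1.506414e+00,
    "V6": -2.016110e-01, "V7": 1.902687e+01, "V8": 2.999722e-01,
    "V9": 1.710575e-01, "V10": 7.404136e-01, "V11": -2.089056e-02,
    "V12": 6.675895e-02, "V13": 1.490639e-02, "V14": 2.078348e-01,
    "V15": 1.123298e-02, "V16": 2.664070e-01, "V17": 3.730610e-01,
    "V18": 1.291911e-01, "V19": -1.017123e-02, "V20": -1.556460e+00,
    "V21": -1.089833e-01, "V22": 3.185295e-01, "V23": -9.968746e-02,
    "V24": 6.608974e-02, "V25": 2.300804e-02, "V26": -1.895874e-02,
    "V27": 2.755452e+00, "V28": 1.724978e+00, "Amount": 1.655585e-03,
    "Class": 0.000000e+00,
})

threshold = 1.0  # assumed cut-off for "highly skewed"
highly_skewed = skewness[skewness.abs() > threshold]

print(highly_skewed)
```

On a live DataFrame the same filter would be applied to the result of `df.skew()` rather than to a hand-copied Series.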
plt.figure(figsize=(6, 4))  # Adjust the figure size as needed
sns.countplot(x='Class', data=df)
plt.xlabel('Class')
plt.ylabel('Count')
plt.title('Distribution of Class')
plt.show()
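The count plot gives a visual impression of class balance; the same information can be obtained numerically. A minimal sketch on a made-up label column (the eight labels are hypothetical):

```python
import pandas as pd

# Hypothetical labels: 0 = legitimate, 1 = fraudulent.
labels = pd.Series([0, 1, 0, 1, 1, 0, 0, 1], name="Class")

counts = labels.value_counts().sort_index()                # rows per class
shares = labels.value_counts(normalize=True).sort_index()  # fraction per class
imbalance_ratio = counts.max() / counts.min()              # 1.0 means perfectly balanced

print(counts)
print(shares)
print("imbalance ratio:", imbalance_ratio)
```

A ratio close to 1.0 matches the balanced distribution the slides describe for this dataset.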
df['Amount'].plot.box()
A box plot, or box-and-whisker plot, displays the distribution of a dataset,
showing the median, quartiles, and outliers. It provides a visual summary of
central tendency and spread.
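The quantities a box plot draws can be computed directly. The sketch below uses made-up amounts and flags outliers with the 1.5 × IQR rule, which is matplotlib's default whisker setting; the sample values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical transaction amounts with one obvious outlier.
amount = pd.Series([10.0, 12.0, 13.0, 15.0, 16.0, 18.0, 20.0, 95.0], name="Amount")

q1, median, q3 = amount.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1                      # interquartile range (height of the box)
lower_fence = q1 - 1.5 * iqr       # points below this are drawn as outliers
upper_fence = q3 + 1.5 * iqr       # points above this are drawn as outliers

outliers = amount[(amount < lower_fence) | (amount > upper_fence)]

print("Q1:", q1, "median:", median, "Q3:", q3)
print("outliers:", list(outliers))
```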
# Assuming 'df' is a DataFrame and 'Amount' is a column in it
sns.kdeplot(data=df['Amount'], fill=True)
plt.show()
A KDE (Kernel Density Estimate) plot depicts the probability density function of a
continuous variable, smoothing data distribution visually.
Observations: Amount is fairly normally distributed.
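To show what a KDE plot computes, here is a hand-rolled Gaussian kernel density estimate in NumPy. It is a sketch of the general technique, not seaborn's implementation: the data is synthetic, and the bandwidth uses Silverman's rule of thumb, which is an assumption chosen for this example.

```python
import numpy as np

# Synthetic stand-in for the Amount column.
rng = np.random.default_rng(0)
amount = rng.normal(loc=100.0, scale=20.0, size=1000)

# Bandwidth via Silverman's rule of thumb (assumed choice).
n = amount.size
std = amount.std(ddof=1)
iqr = np.subtract(*np.percentile(amount, [75, 25]))
bandwidth = 0.9 * min(std, iqr / 1.34) * n ** (-0.2)

# Evaluate the estimate on a grid: the average of Gaussian bumps centred on each observation.
grid = np.linspace(amount.min() - 3 * bandwidth, amount.max() + 3 * bandwidth, 400)
z = (grid[:, None] - amount[None, :]) / bandwidth
density = np.exp(-0.5 * z ** 2).sum(axis=1) / (n * bandwidth * np.sqrt(2 * np.pi))

# A density should integrate to roughly 1 (trapezoidal rule over the grid).
area = np.sum((density[1:] + density[:-1]) / 2 * np.diff(grid))
print("bandwidth:", bandwidth)
print("area under curve:", area)
```

A larger bandwidth smooths the curve more; a smaller one follows the individual observations more closely.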
# Let's plot histograms of V1, V10, V12 and V23
paper, axes = plt.subplots(2, 2, figsize=(10, 6))
df['V1'].plot(kind='hist', ax=axes[0,0], title='Distribution of V1')
df['V10'].plot(kind='hist', ax=axes[0,1], title='Distribution of V10')
df['V12'].plot(kind='hist', ax=axes[1,0], title='Distribution of V12')
df['V23'].plot(kind='hist', ax=axes[1,1], title='Distribution of V23')
plt.suptitle('Distribution of V1,V10,V12 and V23',size=14)
plt.tight_layout()
plt.show()
Data Preparation
Dividing the dataset into “X” and “Y”
Shape of X: (568630, 29)
Shape of Y: (568630,)
Feature Scaling
Let's standardize all our features to bring them onto the same scale.
# I have used StandardScaler
from sklearn.preprocessing import StandardScaler
import pandas as pd

sc = StandardScaler()
x_scaled = sc.fit_transform(x)
x_scaled_df = pd.DataFrame(x_scaled, columns=x.columns)
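The X/Y split and the scaling step can be sketched together on a toy frame. Two assumptions are made here: that X is obtained by dropping `id` and `Class` (consistent with the reported 29 of 31 columns, but not stated on the slide), and that the standardization is written out by hand in pandas to show the arithmetic StandardScaler performs (subtract the column mean, divide by the population standard deviation).

```python
import pandas as pd

# Hypothetical miniature of the dataset.
df = pd.DataFrame({
    "id": [0, 1, 2, 3],
    "V1": [0.5, -1.0, 1.5, -1.0],
    "Amount": [100.0, 250.0, 50.0, 400.0],
    "Class": [0, 1, 0, 1],
})

x = df.drop(columns=["id", "Class"])  # assumed feature selection
y = df["Class"]                       # target labels

# z = (value - column mean) / column std; ddof=0 is the population std used by StandardScaler.
x_scaled_df = (x - x.mean()) / x.std(ddof=0)

print(x.shape, y.shape)
print(x_scaled_df)
```

After scaling, every column has mean 0 and standard deviation 1, so features measured on very different scales (such as Amount versus the V columns) contribute comparably.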
Model Selection and Model Training
Dividing the dataset into training data and testing data
# Let's split our dataset into train and test
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x_scaled_df, y, test_size=0.25, random_state=15, stratify=y)
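The effect of `stratify=y` is that the train and test sets keep the same class proportions as the full data. A small sketch on synthetic data (the feature values and the 100/100 label split are made up) illustrates this with the same split parameters:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic features and perfectly balanced labels.
rng = np.random.default_rng(0)
x = pd.DataFrame(rng.normal(size=(200, 3)), columns=["V1", "V2", "Amount"])
y = pd.Series([0] * 100 + [1] * 100, name="Class")

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=15, stratify=y
)

print("train rows:", len(x_train), "fraud share:", y_train.mean())
print("test rows:", len(x_test), "fraud share:", y_test.mean())
```

With `test_size=0.25`, a quarter of the rows go to the test set, and both subsets retain the 50/50 class ratio.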
Decision Tree Model
Classification report (figure)
Accuracy Score: 99.80022228787687
Random Forest Classifier Model
Classification report (figure)
Accuracy Score: 99.98454426173694
Logistic Regression Model
Classification report (figure)
Accuracy Score: 96.44480085538626
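A comparison of this kind could be produced along the following lines. This is a sketch under stated assumptions, not the author's training code: the data comes from scikit-learn's `make_classification`, and all hyperparameters (`n_estimators`, `max_iter`, the random seeds) are illustrative choices, so the printed accuracies will not match the figures reported for the real dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification problem standing in for the fraud data.
x, y = make_classification(n_samples=1000, n_features=10, n_informative=5, random_state=15)
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=15, stratify=y
)

# The three model families named in the slides, with assumed settings.
models = {
    "Decision Tree": DecisionTreeClassifier(random_state=15),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=15),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

accuracy = {}
for name, model in models.items():
    model.fit(x_train, y_train)
    # Accuracy on the held-out test set, expressed as a percentage.
    accuracy[name] = accuracy_score(y_test, model.predict(x_test)) * 100
    print(f"{name}: {accuracy[name]:.2f}%")
```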
Algorithm                    Accuracy
Decision Tree Model          99.800222%
Random Forest Classifier     99.9845442%
Logistic Regression          96.4448008%
(The Confusion Matrix and Classification Report columns contained figures that are not reproduced here.)
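Since the confusion matrices and classification reports themselves did not survive, the sketch below shows how accuracy, precision, and recall relate to a confusion matrix. The ten true and predicted labels are made up for illustration and are not results from any of the models above.

```python
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical labels: 0 = legitimate, 1 = fraudulent.
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0]

# For binary labels the matrix is [[TN, FP], [FN, TP]].
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)  # share of all predictions that are correct
precision = tp / (tp + fp)                  # share of flagged transactions that are truly fraud
recall = tp / (tp + fn)                     # share of fraud cases that were caught

print("TN, FP, FN, TP:", tn, fp, fn, tp)
print("accuracy:", accuracy, "precision:", precision, "recall:", recall)
print(classification_report(y_true, y_pred, digits=3))
```

In fraud detection, recall on the fraudulent class is often watched alongside accuracy, because a missed fraud (false negative) and a false alarm (false positive) carry different costs.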
Conclusion
• We performed exploratory data analysis on the different features.
• We prepared our data and built different ML models.
• We compared three models on accuracy and precision.
• Among the accuracy scores reported in the comparison table, the Random Forest Classifier scores highest on the test dataset (99.98%), ahead of the Decision Tree (99.80%) and Logistic Regression (96.44%).
• We created confusion matrices to see the details of each model's prediction accuracy.
Thank YOU

Combating Fraudulent Transactions: A Deep Dive into Credit Card Fraud Detection
