Yashwantrao Chavan Institute of Science, Satara.
Department of Statistics
M.Sc. II 2018-2019
Seminar on
“Should This Loan be Approved or Denied?”: A Large
Dataset with Class Assignment Guidelines
Min Li, Amy Mickel, and Stanley Taylor
Presented by
Patil Pooja Rajaram
Roll No. 115
Content:
Introduction
Methodology
Background and Description of Datasets
Procedure
Statistical tools used
Analysis using logistic regression, Artificial
neural network and Support vector machine
Conclusion
References
Introduction:
In this article, a large and rich dataset from the U.S.
Small Business Administration (SBA) and an accompanying
assignment designed to teach statistics as an investigative
process of decision making are presented. Guidelines for the
assignment titled “Should This Loan Be Approved or Denied?,” along
with a subset of the larger dataset, are provided.
For this case-study assignment, students assume the role of
loan officer at a bank and are asked to approve or deny a loan by
assessing its risk of default using logistic regression. The dataset
accompanying this article is a real dataset from the U.S. Small
Business Administration (SBA).
Methodology:
By analysing real data, students experience statistics as an
investigative process of decision making, for the student is required
to answer the following question: As a representative of the bank,
should I grant a loan to a particular small business (Company X)?
Why or why not? The student makes this decision by assessing a
loan’s risk.
The assessment is accomplished by estimating the loan’s
default probability through analyzing this historical dataset and then
classifying the loan into one of two categories:
(a) higher risk—likely to default on the loan (i.e., be charged
off/failure to pay in full) or
(b) lower risk—likely to pay off the loan in full.
Background and Description of Datasets:
The U.S. SBA was founded in 1953 on the principle of
promoting and assisting small enterprises in the U.S. credit market.
SBA acts much like an insurance provider to reduce the risk for a
bank by taking on some of the risk through guaranteeing a portion
of the loan.
Two datasets are provided:
(a) “National SBA” dataset (named SBAnational.csv) from the
U.S. SBA which includes historical data from 1987 through 2014
(899,164 observations) and
(b) “SBA Case” dataset (named SBAcase.csv) which is used in
the assignment described in this paper (2102 observations).
The “SBA Case” dataset is a subset of the “National SBA.”
The variable name, the data type, and a brief description of each
variable are provided for the 27 variables in the two datasets. For the
“SBA Case” dataset, an additional eight variables were generated by
the authors as part of the assignment.
PROCEDURE:
The steps involved in the investigative process of analysing
these data to make an informed decision as to whether a loan
should be approved or denied are :
Step 1: Identifying indicators of potential risk
Step 2: Understanding the case study
Step 3: Building the model, creating decision rules, and validating
the logistic regression model and
Step 4: Using the model to make decisions.
STATISTICAL TOOLS USED FOR ANALYSIS:
The statistical analysis is carried out in R using the following
tools:
1] Logistic regression
2] Artificial neural network (ANN)
3] Support vector machine (SVM)
Step 1: Identifying Explanatory Variables (Indicators or
Predictors) of Potential Risk
1) Location (State)
2) Industry
3) Gross Disbursement
4) New versus Established Businesses
5) Loans Backed by Real Estate
6) Economic Recession
7) SBA’s Guaranteed Portion of Approved Loan
Step 2: Understanding the Case Study and Dataset:
Students, acting as loan officers for Bank of America, have
received two loan applications from two small businesses:
Carmichael Realty (a commercial real estate agency) and SV
Consulting (a real estate consulting firm). As a loan officer,
students need to determine if they should grant or deny these
two loan applications and provide an explanation as to “why or
why not.” To make this decision, they need to assess the loan’s
risk by calculating the estimated probability of default using
Step 4: Using the Model to Make Decisions :
Table 1:
              Estimate   Std. Error   z value   Pr(>|z|)
(Intercept)    0.61170      0.09462     6.465   1.02e-10
Real Estate    2.12822      0.34500     6.169   6.89e-10
Portion        0.55722      0.10058     5.540   3.03e-08
Recession     -0.50412      0.24121    -2.090   0.0366
                     State of nature: Reality
Classification   Loans charged off   Loans paid in full   Total
Higher risk                     31                   14      45
Lower risk                     324                  682    1006
Total                          355                  696    1051
Model Accuracy = 0.6784015 = 67.84 %
Misclassification rate = 0.3215985 = 32.16 %
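The accuracy and misclassification rate follow directly from the four cells of the classification table. A short Python sketch of that arithmetic, using the counts above (the function and argument names are illustrative; the slides report only the resulting figures):

```python
def classification_metrics(high_charged_off, high_paid, low_charged_off, low_paid):
    """Accuracy and misclassification rate from a 2x2 classification table.

    A loan is correctly classified when it is flagged "higher risk" and was
    charged off, or flagged "lower risk" and was paid in full.
    """
    total = high_charged_off + high_paid + low_charged_off + low_paid
    correct = high_charged_off + low_paid
    accuracy = correct / total
    return accuracy, 1.0 - accuracy


# Cell counts taken from the classification table above.
accuracy, misclassification = classification_metrics(
    high_charged_off=31, high_paid=14, low_charged_off=324, low_paid=682
)
print(f"accuracy = {accuracy:.7f}, misclassification = {misclassification:.7f}")
```

The same function applies unchanged to the classification tables of the other two models.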
The final model with the risk indicators in Table 1 is used to
estimate the probability of default for the two loan applications.
The estimated probability of default is 0.05 for Carmichael Realty
(Loan 1) and 0.55 for SV Consulting (Loan 2). Applying the decision
rules with a cut-off probability of 0.5, Loan 1 is classified as
“lower risk” and should be approved, and Loan 2 is classified as
“higher risk” and should be denied.
Loan  Name               Date     Loan Amount  SBA Portion  Real Estate  Est. Prob. of Default  Approve
1     Carmichael Realty  Current  $1,000,000   $750,000     Yes          0.05                   Yes
2     SV Consulting      Current  $100,000     $40,000      No           0.55                   No
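The decision rule described above can be sketched in a few lines of Python. The loan amounts, SBA-guaranteed amounts and estimated probabilities are taken from the table; the function names are illustrative, and treating a probability exactly equal to the cut-off as "lower risk" is an assumption the slides do not address:

```python
CUTOFF = 0.5  # cut-off probability stated in the slides


def sba_portion(sba_guaranteed, loan_amount):
    """Share of the loan that is guaranteed by the SBA."""
    return sba_guaranteed / loan_amount


def decide(prob_default, cutoff=CUTOFF):
    """Return (risk class, approve?) for an estimated probability of default.

    Assumption: a probability exactly at the cut-off counts as lower risk.
    """
    if prob_default > cutoff:
        return "higher risk", False
    return "lower risk", True


# (name, loan amount, SBA-guaranteed amount, estimated probability of default)
loans = [
    ("Carmichael Realty", 1_000_000, 750_000, 0.05),
    ("SV Consulting", 100_000, 40_000, 0.55),
]

for name, amount, guaranteed, prob in loans:
    risk, approve = decide(prob)
    portion = sba_portion(guaranteed, amount)
    print(f"{name}: portion={portion:.2f}, {risk}, approve={approve}")
```

The probabilities are inputs here; they come from the fitted model and are not recomputed by this sketch.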
Artificial neural network:
                     State of nature: Reality
Classification   Loans charged off   Loans paid in full   Total
Higher risk                     31                   12      43
Lower risk                     324                  684    1008
Total                          355                  696    1051
Model Accuracy = 0.6803045 = 68.03 %
Misclassification rate = 0.3196955 = 31.97 %
Support vector machine:
                     State of nature: Reality
Classification   Loans charged off   Loans paid in full   Total
Higher risk                     20                   39      59
Lower risk                     335                  657     992
Total                          355                  696    1051
Model Accuracy = 0.6441484 = 64.41 %
Misclassification rate = 0.3558516 = 35.59 %
[Slide table: Result for each loan application (Carmichael Realty, SV Consulting); the entries under "Should be" are not recoverable.]
Conclusion:
Model                 Accuracy   Misclassification rate
Logistic regression   67.84 %    32.16 %
ANN                   68.03 %    31.97 %
SVM                   64.41 %    35.59 %
• The misclassification rate for the support vector machine was
found to be higher than that of logistic regression or the
neural network.
• Logistic regression is equivalent to a neural network with no
hidden nodes.
• If the objective is to separate loans that are likely to be paid
in full from loans that are likely to default, without needing the
predicted probability of default, then neural networks and SVM are
good choices.
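The equivalence between logistic regression and a neural network with no hidden nodes can be illustrated with a short Python sketch: a single output unit with a sigmoid activation computes the same function as the logistic regression formula. The parameter values and the input below are hypothetical and are not taken from the slides:

```python
import math


def sigmoid(z):
    """Logistic function mapping a real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def logistic_regression_prob(intercept, coefs, x):
    """P(y = 1 | x) under a logistic regression model."""
    linear = intercept + sum(b * xi for b, xi in zip(coefs, x))
    return sigmoid(linear)


def single_unit_network(bias, weights, x):
    """Forward pass of a network with no hidden layer and one sigmoid output unit."""
    activation = bias
    for w, xi in zip(weights, x):
        activation += w * xi  # weighted sum of the inputs
    return sigmoid(activation)


# Hypothetical parameters and input, chosen only for illustration.
intercept, coefs = -1.0, [0.8, -0.5, 0.3]
x = [1.0, 0.75, 0.0]

p_lr = logistic_regression_prob(intercept, coefs, x)
p_nn = single_unit_network(intercept, coefs, x)
print(p_lr, p_nn)
```

The two functions agree whenever they share the same parameters. Fitted models coincide only if both are trained to the same optimum, for example by maximum likelihood, which corresponds to minimising cross-entropy loss in the network.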
References:
Journal of Statistics Education (Taylor and Francis Group)
Introduction to Linear Regression Analysis: Douglas C. Montgomery,
Elizabeth A. Peck, G. Geoffrey Vining
Data Mining: Concepts and Techniques: Jiawei Han, Micheline Kamber,
Jian Pei
Thank you.
