This presentation adapts the methodology used in the author's paper, "Survival prediction using gene expression data: a review and comparison", to credit default modelling.
Estimating Time To Default
1. Estimating Time to Default
using Survival Analysis tools
Problems arising due to insufficient
samples and censoring
Estimating Time to Default using Survival Analysis 1
2. Contents
• Purpose of the presentation
• State the problem
• Formalize the problem and the model
• Data used and method applied
• Model assessment
• Results
• Conclusions
3. Purpose of the presentation
• Adapt the author’s paper “Survival prediction
using gene expression data: a review and
comparison” to a company default setting
• Establish the framework in which company
defaults can be viewed as a Survival Analysis
problem
• Define the data generated and the model
applied
• Present the findings and the conclusions
4. Formalizing the task 1
The problem
• Assume that there is a way to measure many
features of few companies (questionnaire,
qualitative research)
• This results in large quantities of data, but only
a few independent samples
• It would be useful to know which features are
relevant, so that statistical methods can be applied
• Standard statistical methods cannot be used,
because the features far outnumber the samples
• Observations might be censored, i.e. the
company does not default while observed
5. Formalizing the task 2
Requirements of a solution
• Apply a method which can incorporate the
censoring of the data
• It also has to be able to reduce the number of
predictors efficiently
• It should be qualitatively well posed, i.e.
characterizing the relevant features
• It should be time and computation power
efficient
6. Formalizing the task 3
Definitions I
• A company is in default if it fails to meet its
obligations
• A merger or acquisition does not qualify as a default
• An event occurs if the company defaults or if it
gets out of scope for any other reason
(including end of observation)
• Time observed is understood as the time
between the beginning of the observation and
the occurrence of an event
7. Formalizing the task 4
Definitions II
• An event is censored if it is not a default
• An indicator shows whether an event is
censored or not (True/False)
• For every sample, there is an observation of
(censored) time to default
• On every sample (i.e. company) the same
features are measured (predictors)
• A model defines a connection between the
predictors and the observations
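The definitions above can be sketched as a small data structure. This is only an illustration: the names `Observation` and `observe`, and the idea of passing a separate default time and exit-from-scope time, are assumptions that do not appear in the slides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observation:
    time: float     # time from the start of observation to the event
    censored: bool  # True if the event was not a default


def observe(default_time: Optional[float], exit_time: float) -> Observation:
    """Build the (censored) time-to-default observation for one company.

    default_time: time of default, or None if the company never defaults.
    exit_time: time at which the company leaves the scope for any other
               reason, including the end of observation.
    """
    if default_time is not None and default_time <= exit_time:
        # The default happened while the company was observed.
        return Observation(time=default_time, censored=False)
    # The company left the scope first, so only a lower bound is known.
    return Observation(time=exit_time, censored=True)
```

For example, `observe(2.0, 5.0)` records an uncensored default at time 2, while `observe(None, 5.0)` records a censored observation at time 5.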
8. Data used
• Since no real-life data is available, this
presentation is based on a simulation
• The simulation assumes 500 features
measured on 50 companies
• Only 50 features are relevant predictors, i.e. the
simulated time to default is dependent only on
50 features
• One third of the observed times are censored
• The simulation has been run 1000 times
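The slides do not state the data-generating process, so the following is only a sketch of a single simulation run under assumed distributions: standard normal features, a log-linear time to default driven by the first 50 features, and censoring of one third of the companies at a random point before their default. The function name `simulate` and all distributional choices are hypothetical; only the dimensions and the censoring share come from the slide.

```python
import math
import random

N_COMPANIES, N_FEATURES, N_RELEVANT = 50, 500, 50
CENSORED_SHARE = 1 / 3


def simulate(rng: random.Random):
    """Return (X, observed_times, censored) for one simulated data set."""
    # Feature matrix: one row per company, one column per feature.
    X = [[rng.gauss(0.0, 1.0) for _ in range(N_FEATURES)]
         for _ in range(N_COMPANIES)]

    # Assumption: only the first N_RELEVANT features influence the time to
    # default, through a log-linear model with equal weights plus noise.
    times = []
    for row in X:
        signal = sum(row[:N_RELEVANT]) / math.sqrt(N_RELEVANT)
        times.append(math.exp(signal + rng.gauss(0.0, 0.5)))

    # Assumption: the censored third is chosen at random and each censored
    # company is observed only up to a random fraction of its default time.
    n_censored = round(CENSORED_SHARE * N_COMPANIES)
    censored_idx = set(rng.sample(range(N_COMPANIES), n_censored))
    censored = [i in censored_idx for i in range(N_COMPANIES)]
    observed = [t * rng.uniform(0.1, 1.0) if c else t
                for t, c in zip(times, censored)]
    return X, observed, censored
```

Repeating the run 1000 times then amounts to calling `simulate` in a loop with one shared `random.Random` instance.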
9. Method applied
• The method is called Supervised Principal
Component Analysis
• It was developed by Bair et al. for similar
setups
• It has the advantage of first finding the relevant
predictors (thus the name supervised) and then
building quick-to-use predictors from them
(principal component analysis)
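The two stages described above can be sketched as follows. This is a minimal illustration, not the author's implementation: it assumes that features are screened by the univariate Cox score statistic, that the cut-off `threshold` is supplied by the user, and that the first principal component is found by power iteration so the example needs only the standard library. All function names are hypothetical.

```python
import math
import random


def cox_score(x, times, censored):
    """Univariate Cox score test statistic (at beta = 0) for one feature."""
    u = v = 0.0
    for i, t in enumerate(times):
        if censored[i]:
            continue  # only defaults contribute terms
        # Risk set: companies still under observation at time t.
        risk = [x[k] for k in range(len(times)) if times[k] >= t]
        mean = sum(risk) / len(risk)
        u += x[i] - mean
        v += sum(r * r for r in risk) / len(risk) - mean * mean
    return u / math.sqrt(v) if v > 0 else 0.0


def first_component(rows, n_iter=200, seed=0):
    """Leading principal direction of the centred rows, by power iteration."""
    k = len(rows[0])
    means = [sum(r[j] for r in rows) / len(rows) for j in range(k)]
    Z = [[r[j] - means[j] for j in range(k)] for r in rows]
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(k)]
    for _ in range(n_iter):
        proj = [sum(z[j] * v[j] for j in range(k)) for z in Z]  # Z v
        w = [sum(Z[i][j] * proj[i] for i in range(len(Z)))      # Z^T Z v
             for j in range(k)]
        norm = math.sqrt(sum(a * a for a in w))
        if norm == 0.0:
            break
        v = [a / norm for a in w]
    scores = [sum(z[j] * v[j] for j in range(k)) for z in Z]
    return v, scores


def supervised_pc(X, times, censored, threshold):
    """Select features by |Cox score| > threshold, then project on PC 1."""
    p = len(X[0])
    stats = [cox_score([row[j] for row in X], times, censored)
             for j in range(p)]
    selected = [j for j, s in enumerate(stats) if abs(s) > threshold]
    if not selected:
        return selected, []
    reduced = [[row[j] for j in selected] for row in X]
    _, scores = first_component(reduced)
    return selected, scores
```

The returned `scores` hold one value per company and can serve as a single predictor in a subsequent survival model. The sign of a principal component is arbitrary, so only the ordering up to a sign flip is meaningful.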
10. Model assessment
• The model is assessed by measuring how
successful the estimation is
• The most straightforward measures are
applied:
• Share of the relevant and of the irrelevant predictors
that were classified as relevant
• p-value: the probability of an association at least
this strong arising by chance
• The results are shown in the following slides
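The first measure can be computed from the feature sets selected in each run. The sketch below assumes each run yields a set of selected feature indices and that the truly relevant indices are known from the simulation. The name `selection_rates` is hypothetical.

```python
def selection_rates(selected_per_run, n_features, relevant):
    """Per-feature selection frequency and group averages over all runs.

    selected_per_run: iterable of sets of selected feature indices.
    relevant: set of indices of the truly relevant features.
    """
    runs = list(selected_per_run)
    counts = [0] * n_features
    for selected in runs:
        for j in selected:
            counts[j] += 1
    freq = [c / len(runs) for c in counts]

    rel = [freq[j] for j in range(n_features) if j in relevant]
    irr = [freq[j] for j in range(n_features) if j not in relevant]
    return freq, sum(rel) / len(rel), sum(irr) / len(irr)
```

With three toy runs `[{0, 1}, {0, 2}, {0}]`, four features and relevant set `{0, 1}`, feature 0 is selected in every run, and the average selection rate is 2/3 for the relevant group and 1/6 for the irrelevant group. A stability cut-off can then be applied to `freq`, for example `[j for j, f in enumerate(freq) if f > 0.9]`.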
11. Selection of each feature
(% over the 1000 runs)
• Relevant features: 90%
• Irrelevant features with extra noise: 30%
• Irrelevant features with no extra noise: 15%
12. Histogram of relevant features selected
13. Histogram of irrelevant features selected
14. P-values of the principal component constructed
15. Conclusions
• The method found the relevant features in most
cases
• With a suitable threshold (appearance in over 90%
of the runs), the relevant features can be identified
• The p-values show high predictive power
• The estimates can be used for evaluation
• Very noisy features can mislead the method
• Human judgement in the evaluation cannot be
omitted