Dynamic Bayesian modeling for risk
prediction in credit operations
Hanen Borchani¹, Ana M. Martínez¹, Andrés R. Masegosa²,
Helge Langseth², Thomas D. Nielsen¹, Antonio Salmerón³,
Antonio Fernández⁴, Anders L. Madsen¹,⁵, Ramón Sáez⁴
¹ Department of Computer Science, Aalborg University, Denmark
² Department of Computer and Information Science,
The Norwegian University of Science and Technology, Norway
³ Department of Mathematics, University of Almería, Spain
⁴ Banco de Crédito Cooperativo, Spain
⁵ Hugin Expert A/S, Aalborg, Denmark
Scandinavian Conference on Artificial Intelligence, Halmstad, November 5–6, 2015 1
Outline
1 Introduction
2 The financial data set
3 Risk prediction using dynamic Bayesian networks
4 Experimental results
5 Conclusion
Introduction
Efficient solutions for risk prediction in banks can be crucial for reducing
losses due to inefficient business procedures.
Such solutions can serve as tools for monitoring how the credit operations
risk of customers evolves, thereby increasing the solvency of banking
institutions.
From a machine learning perspective, credit scoring has traditionally
been approached as a supervised classification problem.
However, this problem has recently acquired additional challenging
characteristics that set it apart from standard supervised classification
problems.
Challenges
Classification in a streaming context: a stream of multiple sequences received
over time, each sequence representing a particular client. That is, at every time
step $t$, we receive the data $D_t$ containing information about all the clients.
Delayed class feedback: the class label for each sample/client corresponds to
the client's defaulting behavior in the following twelve months, so this
information only becomes available after a twelve-month delay. Thus, the
available data is a mixture of labeled and unlabeled samples.
Concept drift: the domain exhibits a form of concept drift where the data
distribution as well as the set of feature variables relevant for classification
may vary over time.
Objective: Explore the credit scoring problem based on a real-world data set.
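The delayed class feedback comes down to simple index arithmetic. The minimal Python sketch below assumes months are indexed 0, 1, 2, … and uses hypothetical function names. It splits the months observed up to the current month T into those whose labels are already known (t ≤ T − λ) and those still waiting for their outcome window to close.

```python
from typing import List

LAMBDA = 12  # label delay in months (lambda = 12 on the slides)


def labeled_months(T: int, lam: int = LAMBDA) -> List[int]:
    """Months t <= T - lam: their lam-month outcome window has closed, so C_t is known."""
    return list(range(0, max(0, T - lam + 1)))


def unlabeled_months(T: int, lam: int = LAMBDA) -> List[int]:
    """The most recent months up to T, whose labels are still pending."""
    return list(range(max(0, T - lam + 1), T + 1))


# At month 15 (0-based), only months 0..3 carry labels.
print(labeled_months(15))    # [0, 1, 2, 3]
print(unlabeled_months(15))  # [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
```

At every month, the most recent λ months are therefore unlabeled, which is why the training data mixes labeled and unlabeled samples.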
The financial data set
Provided by a Spanish bank in the Almería region: Banco de Crédito
Cooperativo (BCC).
It contains monthly aggregated information for a set of BCC clients for the
period from April 2007 to March 2014.
Only "active" clients are considered, meaning that we restrict our attention to
individuals between 18 and 65 years of age who have at least one automatic
bill payment or direct debit with the bank.
BCC employees are excluded since they are subject to special conditions.
The resulting data set includes 50 000 clients each month.
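The client selection rules above amount to a simple filter. In this sketch the `Client` record and its field names are hypothetical, since the slides do not describe the bank's schema. The age bounds are also assumed to be inclusive.

```python
from dataclasses import dataclass


@dataclass
class Client:
    # Hypothetical record layout; not the actual BCC data schema.
    age: int
    n_direct_debits: int  # automatic bill payments or direct debits held at the bank
    is_employee: bool


def is_active(c: Client) -> bool:
    """Aged 18 to 65 (assumed inclusive), at least one direct debit, not a BCC employee."""
    return 18 <= c.age <= 65 and c.n_direct_debits >= 1 and not c.is_employee


clients = [
    Client(30, 2, False),  # kept
    Client(70, 1, False),  # too old
    Client(40, 0, False),  # no automatic payment or direct debit
    Client(25, 3, True),   # bank employee
]
active = [c for c in clients if is_active(c)]
print(len(active))  # 1
```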
The financial data set
44 feature variables, denoted $X_t$: 11 variables describing the financial
status of a client (VARXX) and 33 socio-demographic variables (SOCXX).
Variable ID     Description
VAR01           Total credit amount
VAR02           Income
VAR03           Expenses
VAR04           Account balance
VAR05           Risk balance in mortgages
VAR06           Risk balance in consumer loans
VAR07           Unpaid amount in mortgages
VAR08           Unpaid amount in personal loans
VAR09           Unpaid amount in credit cards
VAR10           Unpaid amount in bank account deficit
VAR11           Unpaid amount in other products
SOC01-SOC33     Set of 33 socio-demographic variables
Each client $u$ has an associated class variable $C_t^{(u)}$ for each time step $t$ that
indicates if that particular client will default during the following 12 months.
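One way to derive the class variable from raw monthly records is sketched below. It assumes a hypothetical 0/1 indicator `defaults[t]` that marks a default event in month t. It also reads "the following 12 months" as months t+1 to t+12, and the bank's exact definition may differ. Labels whose outcome window is not yet fully observed come back as `None`, which is where the delayed class feedback originates.

```python
from typing import List, Optional


def class_labels(defaults: List[int], horizon: int = 12) -> List[Optional[int]]:
    """C_t = 1 if the client defaults in any of months t+1, ..., t+horizon, else 0.

    `defaults[t]` is a hypothetical 0/1 indicator of a default event in month t.
    Labels whose horizon runs past the last observed month are None (not yet known).
    """
    n = len(defaults)
    labels: List[Optional[int]] = []
    for t in range(n):
        if t + horizon >= n:
            labels.append(None)  # outcome window not fully observed yet
        else:
            labels.append(int(any(defaults[t + 1 : t + horizon + 1])))
    return labels


# Short horizon for readability: a default in month 4 marks months 1-3 as positive.
print(class_labels([0, 0, 0, 0, 1, 0, 0, 0], horizon=3))
# [0, 1, 1, 1, 0, None, None, None]
```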
Dynamic Bayesian classifiers
A dynamic probabilistic model for prediction in the BCC domain
At time T (the current time), we predict the defaulting status ($C_T$) of a
particular client based on previous socio-economic observations and the
client's known defaulting status $\lambda = 12$ months earlier.
[Figure: the model unrolled over months T−12, T−11, T−10, …, T−1, T, with class nodes $C_{T-12}, \dots, C_T$ and feature nodes $X_{T-11}, \dots, X_T$.]
Figure 1: Square/round boxes indicate data that is available/unavailable
when predicting the defaulting status of the clients at month T.
Dynamic Bayesian classifiers
A two-time-slice dynamic Naïve Bayes classifier
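[Figure: two time slices; $C_{t-1} \to C_t$, with $C_{t-1}$ the parent of $X_{1,t-1}, \dots, X_{n,t-1}$ and $C_t$ the parent of $X_{1,t}, \dots, X_{n,t}$.]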
It assumes that only the class variables are connected across time and that all
the predictive variables at time step t are conditionally independent given the
class variable at time t.
The joint probability factorizes as
$$p(c_{1:T}, x_{1:T}) = \prod_{t=1}^{T} p(c_t \mid c_{t-1}) \prod_{i=1}^{n} p(x_{i,t} \mid c_t).$$
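The factorization can be evaluated directly. The following Python sketch computes $\log p(c_{1:T}, x_{1:T})$ for a single client. It rests on simplifying assumptions that go beyond the slides: all features are discrete, parameters are shared across time steps, and `prior` stands in for $p(c_1 \mid c_0)$ at $t = 1$. It is an illustration only and does not reflect the API of the Java-based AMIDST toolbox.

```python
import math
from typing import Sequence


def log_joint(
    classes: Sequence[int],
    features: Sequence[Sequence[int]],
    prior: Sequence[float],
    transition: Sequence[Sequence[float]],
    emission: Sequence[Sequence[Sequence[float]]],
) -> float:
    """log p(c_{1:T}, x_{1:T}) for a dynamic Naive Bayes with discrete features.

    prior[c]           = p(c_1 = c)             (plays the role of p(c_1 | c_0))
    transition[a][b]   = p(c_t = b | c_{t-1} = a)
    emission[i][c][v]  = p(x_{i,t} = v | c_t = c), shared across time steps
    """
    logp = 0.0
    for t, (c, x) in enumerate(zip(classes, features)):
        # Class term: prior at the first step, transition afterwards.
        logp += math.log(prior[c] if t == 0 else transition[classes[t - 1]][c])
        # Naive Bayes term: features are independent given the current class.
        for i, v in enumerate(x):
            logp += math.log(emission[i][c][v])
    return logp
```

Working in log space avoids numerical underflow when the product runs over many months and features.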
Dynamic Bayesian classifiers
Learning the model
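Parameters are learned with a Bayesian approach, for both multinomial and normally distributed variables.
The conditional distributions $p(x_{i,t} \mid c_t)$ are learned from the labeled data $D_{T-\lambda}$.
The transition probabilities $p(c_t \mid c_{t-1})$ are learned from the class transitions between $D_{T-\lambda-1}$ and $D_{T-\lambda}$.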
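The two learning steps can be sketched as follows. The transition table uses the posterior mean under a symmetric Dirichlet prior, a standard Bayesian estimate for multinomial parameters. For a continuous feature the sketch simply uses per-class maximum-likelihood mean and variance as a stand-in for the full Bayesian treatment mentioned on the slide. The function names and the pseudo-count `alpha` are assumptions, not the authors' implementation.

```python
from collections import Counter
from statistics import fmean, pvariance
from typing import Dict, List, Sequence, Tuple


def learn_transition(prev_labels: Sequence[int], curr_labels: Sequence[int],
                     n_classes: int = 2, alpha: float = 1.0) -> List[List[float]]:
    """p(c_t | c_{t-1}) from per-client label pairs (D_{T-lambda-1} -> D_{T-lambda}).

    Posterior mean under a symmetric Dirichlet(alpha) prior:
    (count + alpha) / (row total + n_classes * alpha).
    """
    counts = Counter(zip(prev_labels, curr_labels))
    table = []
    for a in range(n_classes):
        row_total = sum(counts[(a, b)] for b in range(n_classes))
        table.append([(counts[(a, b)] + alpha) / (row_total + n_classes * alpha)
                      for b in range(n_classes)])
    return table


def learn_gaussian(values: Sequence[float],
                   labels: Sequence[int]) -> Dict[int, Tuple[float, float]]:
    """Per-class (mean, variance) of one continuous feature from labeled data D_{T-lambda}.

    Simplification: plain maximum-likelihood estimates, not a full Bayesian posterior.
    """
    by_class: Dict[int, List[float]] = {}
    for v, c in zip(values, labels):
        by_class.setdefault(c, []).append(v)
    return {c: (fmean(vs), pvariance(vs)) for c, vs in by_class.items()}
```

The Dirichlet pseudo-counts keep every transition probability strictly positive, which matters here because defaulting clients are rare and some transitions may never be observed in a given month.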
Prediction
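It amounts to calculating the conditional probability of the class label of each
client $u$ at time $T$, given all the information collected so far, $D_{1:T}$:
$$p\left(c_t^{(u)} \mid x_{t-\lambda+1:t}^{(u)}, c_{t-\lambda}^{(u)}\right) \propto p\left(x_t^{(u)} \mid c_t^{(u)}\right) \sum_{c_{t-1}^{(u)}} p\left(c_t^{(u)} \mid c_{t-1}^{(u)}\right) p\left(c_{t-1}^{(u)} \mid x_{t-\lambda+1:t-1}^{(u)}, c_{t-\lambda}^{(u)}\right).$$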
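The prediction equation is a forward (filtering) recursion. It starts from the known class $c_{t-\lambda}^{(u)}$ and alternates a prediction step through $p(c_t \mid c_{t-1})$ with an update by $p(x_t \mid c_t)$. Below is a minimal Python sketch, with hypothetical names and the Naïve Bayes likelihood passed in as a function.

```python
from typing import Callable, List, Sequence


def filter_class(
    c_known: int,
    xs: Sequence[object],
    transition: Sequence[Sequence[float]],
    likelihood: Callable[[object, int], float],
) -> List[float]:
    """p(c_t | x_{t-lambda+1:t}, c_{t-lambda}) via the forward recursion.

    c_known    : class at month t - lambda (known, since its label has arrived)
    xs         : feature vectors x_{t-lambda+1}, ..., x_t, oldest first
    likelihood : returns p(x | c) = prod_i p(x_i | c) under Naive Bayes
    """
    n = len(transition)
    belief = [1.0 if c == c_known else 0.0 for c in range(n)]  # one-hot at t - lambda
    for x in xs:
        # Predict: sum over c_{t-1} of p(c_t | c_{t-1}) * belief(c_{t-1}).
        predicted = [sum(transition[a][b] * belief[a] for a in range(n)) for b in range(n)]
        # Update: weight by p(x_t | c_t), then renormalize (the "proportional to").
        unnorm = [predicted[b] * likelihood(x, b) for b in range(n)]
        z = sum(unnorm)
        belief = [u / z for u in unnorm]
    return belief


# Toy example: one binary feature, table[c][x] = p(x | c).
table = [[0.7, 0.3], [0.4, 0.6]]
print(filter_class(0, [1], [[0.9, 0.1], [0.2, 0.8]], lambda x, c: table[c][x]))
# approximately [0.818, 0.182]
```

With $\lambda = 12$, `xs` holds the twelve monthly feature vectors observed since the last known label.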
Dynamic Bayesian classifiers
Feature subset selection
The relevance of the variables may vary over time.
We consider a wrapper feature selection method with the Naïve Bayes model
as the base classifier combined with greedy search.
The area under the ROC curve (AUC) was used as the objective function, because
AUC remains informative even when the classes are imbalanced.
In our case, the feature selection method is performed at each time step to
infer which variables are helpful in separating defaulters from non-defaulters.
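A minimal sketch of the wrapper approach follows. The slides do not fix the search direction or stopping rule, so greedy forward selection that stops when no single addition improves the score is an assumption. `evaluate` is a hypothetical callback that trains the Naïve Bayes base classifier on a candidate subset and returns its AUC on held-out data. AUC is computed as the Mann-Whitney statistic: the probability that a randomly chosen defaulter scores higher than a randomly chosen non-defaulter. This definition does not depend on the ratio of the two classes.

```python
from typing import Callable, FrozenSet, Sequence, Set


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as P(score_pos > score_neg), with ties counted as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def greedy_forward_selection(
    features: Sequence[str],
    evaluate: Callable[[FrozenSet[str]], float],
) -> Set[str]:
    """Wrapper selection: repeatedly add the feature that most increases `evaluate`.

    `evaluate(subset)` is assumed to fit the base classifier on `subset` and
    return its validation AUC. Stops when no single addition improves the score.
    """
    selected: Set[str] = set()
    best = float("-inf")
    while True:
        candidates = [f for f in features if f not in selected]
        if not candidates:
            return selected
        # Score every one-feature extension of the current subset.
        score, feat = max((evaluate(frozenset(selected | {f})), f) for f in candidates)
        if score <= best:
            return selected
        selected.add(feat)
        best = score
```

Running this once per month on the latest labeled data gives a feature subset that can change over time, matching the time-varying relevance described above.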
AMIDST toolbox
Open source Java toolbox http://amidst.github.io/toolbox/
Predictive performance analysis
The feature subset selection helps to improve the value of the AUC.
The AUC value increases over time: the problem becomes easier to solve.
[Figure: AUC (y-axis from 0.65 to 1.00) by month, May 2008 to March 2014, for "Dynamic NB with FS" and "Dynamic NB".]
Figure 2: AUC results for the Dynamic Naive Bayes (NB) classifier with and
without feature selection (FS).
Analysis of relevant features
In general, the socio-demographic features play a minor role in terms of
predictive performance.
[Figure: selection status of the features VAR01, VAR02, VAR04 to VAR11, SOC01 to SOC03, SOC05 to SOC07, SOC10 to SOC12, SOC14, SOC16 to SOC18, SOC20, SOC22, SOC26, SOC28 and SOC31, plotted against months from May 2007 to March 2013.]
Figure 3: The set of selected features throughout the months.
Analysis of relevant features
The most frequently selected variables, such as VAR04 and VAR08, consistently
separate the two types of clients.
[Figure: two panels, VAR04 (y-axis from −0.3 to 0.1) and VAR08 (y-axis from 0.0 to 2.5), each plotted quarterly from June 2007 to March 2014 for non-defaulting and defaulting clients.]
Figure 4: Time-dependent averages of variables VAR04 (“Account balance”)
and VAR08 (“Unpaid amount in personal loans”) for non-defaulting and
defaulting clients.
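The curves in Figure 4 are per-month, per-class averages. The aggregation can be sketched as follows, assuming hypothetical `(month, class, value)` rows with class 1 for defaulting and 0 for non-defaulting clients.

```python
from collections import defaultdict
from statistics import fmean
from typing import Dict, Iterable, List, Tuple


def monthly_class_means(
    rows: Iterable[Tuple[str, int, float]],
) -> Dict[Tuple[str, int], float]:
    """Average of one variable per (month, class label).

    Each row is (month, class label, value); the encoding 1 = defaulting,
    0 = non-defaulting is an assumption for illustration.
    """
    groups: Dict[Tuple[str, int], List[float]] = defaultdict(list)
    for month, label, value in rows:
        groups[(month, label)].append(value)
    return {key: fmean(vals) for key, vals in groups.items()}
```

Plotting the two resulting series per variable against the month reproduces the kind of comparison shown in the figure.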
Conclusion
A first step towards analyzing risk prediction in credit operations for the bank
Banco de Crédito Cooperativo.
A dynamic Naïve Bayes classifier with a wrapper feature subset selection.
The feature subset selection helps to improve the results and gives insight into
which attributes are most relevant as a function of time.
The AMIDST toolbox performs inference and learning under a Bayesian
framework and provides functionality to improve the presented model:
Use of more expressive network structures
Extension of the feature subset selection method to take into account the
features selected at previous time steps
Thank you for your attention
Questions?
Acknowledgments: This project has received funding from the European Union’s
Seventh Framework Programme for research, technological development and
demonstration under grant agreement no. 619209.
 
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In DubaiDubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubaikojalkojal131
 
User Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationUser Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationColumbia Weather Systems
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024AyushiRastogi48
 
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...lizamodels9
 
Microteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringMicroteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringPrajakta Shinde
 
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxGenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxBerniceCayabyab1
 
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRCall Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRlizamodels9
 
Topic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxTopic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxJorenAcuavera1
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxEran Akiva Sinbar
 
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxSTOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxMurugaveni B
 
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)riyaescorts54
 
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)Columbia Weather Systems
 
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...D. B. S. College Kanpur
 
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxLIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxmalonesandreagweneth
 
Four Spheres of the Earth Presentation.ppt
Four Spheres of the Earth Presentation.pptFour Spheres of the Earth Presentation.ppt
Four Spheres of the Earth Presentation.pptJoemSTuliba
 
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPirithiRaju
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.PraveenaKalaiselvan1
 

Último (20)

Davis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technologyDavis plaque method.pptx recombinant DNA technology
Davis plaque method.pptx recombinant DNA technology
 
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdfBehavioral Disorder: Schizophrenia & it's Case Study.pdf
Behavioral Disorder: Schizophrenia & it's Case Study.pdf
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
 
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In DubaiDubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
Dubai Calls Girl Lisa O525547819 Lexi Call Girls In Dubai
 
User Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationUser Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather Station
 
Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024Vision and reflection on Mining Software Repositories research in 2024
Vision and reflection on Mining Software Repositories research in 2024
 
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
 
Microteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical EngineeringMicroteaching on terms used in filtration .Pharmaceutical Engineering
Microteaching on terms used in filtration .Pharmaceutical Engineering
 
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxGenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
 
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRCall Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
 
Topic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptxTopic 9- General Principles of International Law.pptx
Topic 9- General Principles of International Law.pptx
 
The dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptxThe dark energy paradox leads to a new structure of spacetime.pptx
The dark energy paradox leads to a new structure of spacetime.pptx
 
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptxSTOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
STOPPED FLOW METHOD & APPLICATION MURUGAVENI B.pptx
 
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
 
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
 
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
Fertilization: Sperm and the egg—collectively called the gametes—fuse togethe...
 
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptxLIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
LIGHT-PHENOMENA-BY-CABUALDIONALDOPANOGANCADIENTE-CONDEZA (1).pptx
 
Four Spheres of the Earth Presentation.ppt
Four Spheres of the Earth Presentation.pptFour Spheres of the Earth Presentation.ppt
Four Spheres of the Earth Presentation.ppt
 
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
 

Dynamic Bayesian modeling for risk prediction in credit operations (SCAI2015)

  • 1. Dynamic Bayesian modeling for risk prediction in credit operations
    Hanen Borchani¹, Ana M. Martínez¹, Andrés R. Masegosa², Helge Langseth², Thomas D. Nielsen¹, Antonio Salmerón³, Antonio Fernández⁴, Anders L. Madsen¹,⁵, Ramón Sáez⁴
    ¹ Department of Computer Science, Aalborg University, Denmark
    ² Department of Computer and Information Science, The Norwegian University of Science and Technology, Norway
    ³ Department of Mathematics, University of Almería, Spain
    ⁴ Banco de Crédito Cooperativo, Spain
    ⁵ Hugin Expert A/S, Aalborg, Denmark
    Scandinavian Conference on Artificial Intelligence, Halmstad, November 5–6, 2015
  • 2. Outline
    1 Introduction
    2 The financial data set
    3 Risk prediction using dynamic Bayesian networks
    4 Experimental results
    5 Conclusion
  • 6. Introduction
    Efficient solutions for risk prediction can be crucial for banks to reduce losses caused by inefficient business procedures.
    Such solutions can serve as tools for monitoring how the credit-operation risk of customers evolves, and thereby help increase the solvency of banking institutions.
    From a machine learning perspective, credit scoring has traditionally been approached as a supervised classification problem.
    Recently, however, the problem has shown additional challenging characteristics that set it apart from standard supervised classification problems.
  • 10. Challenges
    Classification in a streaming context: the data arrive as a stream of multiple sequences over time, each sequence representing a particular client. That is, at every time step t we receive the data $D_t$ containing information about all the clients.
    Delayed class feedback: the class label of each sample (client) reflects the client's defaulting behavior in the following twelve months, so it only becomes available after a twelve-month delay. The available data is therefore a mixture of labeled and unlabeled samples.
    Concept drift: both the data distribution and the set of feature variables relevant for classification may vary over time.
    Objective: explore the credit scoring problem on a real-world data set.
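As a minimal illustration of the delayed class feedback, the sketch below assumes that the monthly batches $D_1, \dots, D_T$ are held in a Python list, oldest first, and that λ = 12. The function name `split_by_label_availability` is hypothetical. At the current month T, only batches up to $T-\lambda$ carry known labels; the most recent λ batches are unlabeled.

```python
def split_by_label_availability(months, lam=12):
    """Split the batches D_1, ..., D_T received so far (oldest first) into
    (labeled, unlabeled). A label describes defaulting behaviour in the
    following `lam` months, so it is only known for months up to T - lam."""
    cutoff = max(len(months) - lam, 0)
    return months[:cutoff], months[cutoff:]


if __name__ == "__main__":
    # Hypothetical stream of 15 monthly batches, T = 15.
    labeled, unlabeled = split_by_label_availability([f"D{t}" for t in range(1, 16)])
    print(labeled)         # ['D1', 'D2', 'D3']
    print(len(unlabeled))  # 12
```

As new months arrive, the labeled part grows by one batch per month while the unlabeled window stays λ months long.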
  • 14. The financial data set
    Provided by a Spanish bank in the Almería region, Banco de Crédito Cooperativo (BCC).
    It contains monthly aggregated information about a set of BCC clients for the period from April 2007 to March 2014.
    Only "active" clients are considered: individuals between 18 and 65 years of age who have at least one automatic bill payment or direct debit with the bank. BCC employees are excluded because they have special conditions.
    The resulting data set includes 50 000 clients each month.
  • 16. The financial data set
    44 feature variables, denoted $X_t$: 11 variables describing the financial status of a client (VARXX) and 33 socio-demographic variables (SOCXX).
      Variable ID   Description
      VAR01         Total credit amount
      VAR02         Income
      VAR03         Expenses
      VAR04         Account balance
      VAR05         Risk balance in mortgages
      VAR06         Risk balance in consumer loans
      VAR07         Unpaid amount in mortgages
      VAR08         Unpaid amount in personal loans
      VAR09         Unpaid amount in credit cards
      VAR10         Unpaid amount in bank account deficit
      VAR11         Unpaid amount in other products
      SOC01-33      Set of 33 socio-demographic variables
    Each client u has an associated class variable $C_t^{(u)}$ for each time step t that indicates whether that client will default during the following 12 months.
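One way to construct $C_t^{(u)}$ from a client's history is sketched below. It is an assumption-laden illustration, not the bank's labeling procedure: it uses a hypothetical per-month boolean `defaulted[t]` saying whether the client is in default in month t, and it reads "the following 12 months" as months t+1 to t+12. Neither detail is specified in the slides. A label stays unknown (`None`) until its whole 12-month window has been observed, which is exactly what produces the delayed class feedback.

```python
def default_labels(defaulted, horizon=12):
    """Hypothetical construction of C_t for a single client.

    defaulted[t] -- assumed indicator: True if the client is in default in month t.
    Returns one label per month: 1 if the client defaults in any of the months
    t+1, ..., t+horizon, 0 if not, and None while that window is incomplete.
    """
    labels = []
    for t in range(len(defaulted)):
        window = defaulted[t + 1 : t + 1 + horizon]
        if len(window) < horizon:
            labels.append(None)  # future months not observed yet: label unknown
        else:
            labels.append(int(any(window)))
    return labels
```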
  • 18. Dynamic Bayesian classifiers
    A dynamic probabilistic model for doing prediction in the BCC domain.
    At time T (the current time), we predict the defaulting status $C_T$ of a particular client based on previous socio-economic observations and the client's known defaulting status λ = 12 months earlier.
    [Figure 1 plot: the model unrolled over months T−12, ..., T, with class nodes C_{T−12}, C_{T−11}, C_{T−10}, ..., C_{T−1}, C_T and feature nodes X_{T−11}, X_{T−10}, ..., X_{T−1}, X_T.]
    Figure 1: Square/round boxes indicate data that is available/not available when predicting the defaulting status of the clients at month T.
  • 21. Dynamic Bayesian classifiers
    A 2-time-slice dynamic Naïve Bayes classifier.
    [Figure: two time slices, with class nodes C_{t−1}, C_t and feature nodes X_{1,t−1}, X_{2,t−1}, ..., X_{n,t−1} and X_{1,t}, X_{2,t}, ..., X_{n,t}.]
    It assumes that only the class variables are connected across time and that all the predictive variables at time step t are conditionally independent given the class variable at time t.
    The joint probability factorizes as
    $$p(c_{1:T}, \mathbf{x}_{1:T}) = \prod_{t=1}^{T} \Big( p(c_t \mid c_{t-1}) \prod_{i=1}^{n} p(x_{i,t} \mid c_t) \Big),$$
    where $p(c_1 \mid c_0)$ is read as the prior $p(c_1)$.
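The factorization can be evaluated directly as a sum of log factors. The sketch below is a minimal illustration, not AMIDST code: it assumes the class prior and the transition probabilities are given as plain Python dicts, and that the per-feature conditionals come from a hypothetical callable `log_lik(i, x, c)` returning $\log p(x_{i,t} = x \mid c_t = c)$.

```python
import math


def log_joint(classes, features, prior, transition, log_lik):
    """log p(c_{1:T}, x_{1:T}) for a 2-time-slice dynamic Naive Bayes model.

    classes[t]  -- class value c_t
    features[t] -- list (x_{1,t}, ..., x_{n,t})
    prior[c] = p(c_1 = c); transition[a][b] = p(c_t = b | c_{t-1} = a)
    log_lik(i, x, c) -- hypothetical callable, log p(x_{i,t} = x | c_t = c)
    """
    total = 0.0
    prev = None
    for c, xs in zip(classes, features):
        # Class chain: p(c_1) for the first slice, p(c_t | c_{t-1}) afterwards.
        total += math.log(prior[c] if prev is None else transition[prev][c])
        # Features are conditionally independent given the current class.
        total += sum(log_lik(i, x, c) for i, x in enumerate(xs))
        prev = c
    return total
```

Working in log space avoids numerical underflow when T and n are large, since the joint is a product of many small probabilities.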
  • 22. Dynamic Bayesian classifiers
    Learning the model: the parameters are learned with a Bayesian approach for multinomial and normally distributed data.
    The conditionals $p(x_{i,t} \mid c_t)$ are learned from the most recent labeled data, $D_{T-\lambda}$.
    The transition probabilities $p(c_t \mid c_{t-1})$ are learned from the class transitions between $D_{T-\lambda-1}$ and $D_{T-\lambda}$.
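The two learning steps can be sketched as follows. This is a simplified stand-in, not the variational Bayesian procedure in the AMIDST toolbox. For the transitions it uses the posterior mean under a symmetric Dirichlet prior (add-alpha smoothing), estimated from each client's label pair in $D_{T-\lambda-1}$ and $D_{T-\lambda}$. For continuous features it uses plain maximum-likelihood Gaussian estimates from $D_{T-\lambda}$. Multinomial features would be handled like the transition counts and are omitted. All function names are hypothetical, and both classes are assumed to be present in the labeled batch.

```python
import math
from collections import Counter
from statistics import mean, pvariance


def learn_transitions(prev_labels, curr_labels, classes=(0, 1), alpha=1.0):
    """p(c_t | c_{t-1}) from per-client label pairs (D_{T-lam-1} -> D_{T-lam}),
    as the posterior mean under a symmetric Dirichlet(alpha) prior."""
    counts = Counter(zip(prev_labels, curr_labels))
    table = {}
    for a in classes:
        row_total = sum(counts[(a, b)] for b in classes) + alpha * len(classes)
        table[a] = {b: (counts[(a, b)] + alpha) / row_total for b in classes}
    return table


def learn_gaussian_conditionals(X, labels, classes=(0, 1), min_var=1e-9):
    """Per-class (mean, variance) of each continuous feature from D_{T-lam},
    so that p(x_i | c) = N(mean, variance). Maximum-likelihood point estimates."""
    params = {}
    for c in classes:
        rows = [x for x, y in zip(X, labels) if y == c]
        params[c] = [(mean(col), pvariance(col) + min_var) for col in zip(*rows)]
    return params


def gaussian_log_lik(x, c, params):
    """log p(x | c) = sum_i log N(x_i; mean_ic, var_ic) under the Naive Bayes assumption."""
    return sum(-0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
               for xi, (m, v) in zip(x, params[c]))
```

The small `min_var` floor keeps the log-likelihood finite when a feature is constant within a class.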
  • 23. Dynamic Bayesian classifiers
    Prediction: for each client u at time T, we compute the conditional probability of the class label given all the information collected so far, $D_{1:T}$. Because the class variables form a Markov chain and each feature depends only on the class at its own time step, the known label $c_{T-\lambda}^{(u)}$ makes all earlier data irrelevant, so only the observations from $T-\lambda+1$ to $T$ are needed. The posterior is computed recursively:
    $$p\big(c_t^{(u)} \mid \mathbf{x}_{t-\lambda+1:t}^{(u)}, c_{t-\lambda}^{(u)}\big) \propto p\big(\mathbf{x}_t^{(u)} \mid c_t^{(u)}\big) \sum_{c_{t-1}^{(u)}} p\big(c_t^{(u)} \mid c_{t-1}^{(u)}\big)\, p\big(c_{t-1}^{(u)} \mid \mathbf{x}_{t-\lambda+1:t-1}^{(u)}, c_{t-\lambda}^{(u)}\big).$$
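The recursion is a standard forward-filtering pass, started from the known label $c_{T-\lambda}$ and run over the client's λ most recent observations. The minimal sketch below assumes a binary class, a transition table stored as a dict of dicts, and a hypothetical callable `log_lik(x, c)` returning $\log p(\mathbf{x}_t \mid c_t = c)$, for example a Naive Bayes sum of per-feature log densities. It is an illustration of the formula, not the AMIDST inference engine.

```python
import math


def filter_default_probability(c_known, observations, transition, log_lik, classes=(0, 1)):
    """Forward filtering for one client u.

    c_known      -- last known label c_{T-lam}
    observations -- feature vectors x_{T-lam+1}, ..., x_T, oldest first
    transition   -- transition[a][b] = p(c_t = b | c_{t-1} = a)
    log_lik      -- hypothetical callable, log p(x_t | c_t = c)
    Returns p(c_T | x_{T-lam+1:T}, c_{T-lam}) as a dict over classes.
    """
    belief = {c: 1.0 if c == c_known else 0.0 for c in classes}
    for x in observations:
        # Prediction step: sum over c_{t-1} of p(c_t | c_{t-1}) p(c_{t-1} | ...).
        predicted = {c: sum(belief[a] * transition[a][c] for a in classes) for c in classes}
        # Update step: multiply by p(x_t | c_t), in log space for numerical stability.
        logs = {c: math.log(predicted[c]) + log_lik(x, c) if predicted[c] > 0 else -math.inf
                for c in classes}
        shift = max(logs.values())
        unnorm = {c: math.exp(logs[c] - shift) for c in classes}
        z = sum(unnorm.values())
        belief = {c: unnorm[c] / z for c in classes}
    return belief
```

Normalizing at every step keeps the belief a proper distribution, so `belief[1]` can be read directly as the client's predicted probability of default.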
  • 27. Dynamic Bayesian classifiers
    Feature subset selection: the relevance of the variables may vary over time.
    We use a wrapper feature selection method with the Naïve Bayes model as the base classifier, combined with greedy search.
    The area under the ROC curve (AUC) is used as the objective function, because AUC usually remains informative even when the data are class-imbalanced.
    The feature selection method is run at each time step to infer which variables help separate defaulters from non-defaulters.
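A wrapper of this kind can be sketched as greedy forward selection. The search starts from the empty set, repeatedly adds the feature whose inclusion gives the highest AUC of a Gaussian Naive Bayes model, and stops when no addition improves it. The slides do not say how the AUC was estimated, so the sketch assumes a separate validation batch. It also assumes continuous features, both classes present in the training batch, and hypothetical function names. In the streaming setting it would be rerun each month on the newest labeled data.

```python
import math
from statistics import mean, pvariance


def auc(scores, labels):
    """P(score of a random defaulter > score of a random non-defaulter); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def naive_bayes_scores(train_X, train_y, test_X, features, min_var=1e-9):
    """Log-odds of class 1 from a Gaussian Naive Bayes model restricted to `features`."""
    model = {}
    for c in (0, 1):
        rows = [x for x, y in zip(train_X, train_y) if y == c]
        stats = [(mean([r[i] for r in rows]), pvariance([r[i] for r in rows]) + min_var)
                 for i in features]
        model[c] = (math.log(len(rows) / len(train_X)), stats)

    def log_joint(x, c):
        log_prior, stats = model[c]
        return log_prior + sum(-0.5 * math.log(2 * math.pi * v) - (x[i] - m) ** 2 / (2 * v)
                               for i, (m, v) in zip(features, stats))

    return [log_joint(x, 1) - log_joint(x, 0) for x in test_X]


def greedy_forward_selection(train_X, train_y, val_X, val_y, n_features):
    """Wrapper selection: repeatedly add the feature that most increases validation AUC."""
    selected, best_auc = [], 0.5  # an empty model scores every client equally (AUC 0.5)
    while len(selected) < n_features:
        candidates = [i for i in range(n_features) if i not in selected]
        cand_auc, cand = max(
            (auc(naive_bayes_scores(train_X, train_y, val_X, selected + [i]), val_y), i)
            for i in candidates)
        if cand_auc <= best_auc:
            break  # no remaining feature improves the objective
        selected.append(cand)
        best_auc = cand_auc
    return selected, best_auc
```

Because the base classifier is retrained for every candidate, the cost grows with the number of features squared. That is acceptable for 44 variables but is the usual reason filter methods are preferred on wider data.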
  • 29. AMIDST toolbox
    Open-source Java toolbox: http://amidst.github.io/toolbox/
  • 30. Predictive performance analysis
    Feature subset selection improves the AUC.
    The AUC increases over time, i.e., the problem becomes easier to solve.
    [Figure 2 plot: monthly AUC (axis from 0.65 to 1.00), May 2008 to March 2014, for "Dynamic NB with FS" and "Dynamic NB".]
    Figure 2: AUC results for the dynamic Naive Bayes (NB) classifier with and without feature selection (FS).
  • 31. Analysis of relevant features
    In general, the socio-demographic features play a minor role in terms of predictive performance.
    [Figure 3 plot: for each month from May 2007 to March 2013, which of the following features were selected: VAR01, VAR02, VAR04 to VAR11, SOC01 to SOC03, SOC05 to SOC07, SOC10 to SOC12, SOC14, SOC16 to SOC18, SOC20, SOC22, SOC26, SOC28, SOC31.]
    Figure 3: The set of selected features throughout the months.
  • 32. Analysis of relevant features
    The most frequently selected variables, such as VAR04 and VAR08, consistently separate the two types of clients.
    [Figure 4 plot: two panels, VAR04 (axis from −0.3 to 0.1) and VAR08 (axis from 0.0 to 2.5), June 2007 to March 2014, each with one curve for non-defaulting and one for defaulting clients.]
    Figure 4: Time-dependent averages of variables VAR04 ("Account balance") and VAR08 ("Unpaid amount in personal loans") for non-defaulting and defaulting clients.
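Curves like those in Figure 4 are per-month, per-class averages of a single variable. A minimal sketch of that aggregation follows. It assumes the data are available as hypothetical `(month, label, value)` triples, with label 1 for defaulting clients. Any normalization applied to the plotted values is not stated in the slides and is not reproduced here.

```python
from collections import defaultdict
from statistics import mean


def class_averages_by_month(records):
    """records: iterable of (month, label, value) triples, label 1 = defaulting client.
    Returns {month: {label: average value}} for one variable."""
    groups = defaultdict(lambda: defaultdict(list))
    for month, label, value in records:
        groups[month][label].append(value)
    return {m: {c: mean(vals) for c, vals in by_class.items()}
            for m, by_class in groups.items()}
```

A variable separates the classes well when the two per-month averages stay apart across the whole period, which is the pattern the slide attributes to VAR04 and VAR08.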
  • 37. Conclusion
    A first step towards analyzing risk prediction in credit operations for the bank Banco de Crédito Cooperativo.
    A dynamic Naïve Bayes classifier with wrapper-based feature subset selection.
    Feature subset selection helps improve the results and gives insight into which attributes are most relevant as a function of time.
    The AMIDST toolbox performs inference and learning under a Bayesian framework and provides functionality for improving the presented model:
      using more expressive network structures;
      extending the feature subset selection method to take into account the features selected at previous time steps.
  • 38. Thank you for your attention. Questions?
    Acknowledgments: This project has received funding from the European Union's Seventh Framework Programme for research, technological development and demonstration under grant agreement no. 619209.