HAL Id: cel-01620878
https://hal.archives-ouvertes.fr/cel-01620878
Submitted on 24 Oct 2017
Symbolic Data Analysis: Regression on Gaussian symbols
Axel Aronio de Romblay
To cite this version:
Axel Aronio de Romblay. Symbolic Data Analysis: Regression on Gaussian symbols. Master. France. 2015. cel-01620878
SYMBOLIC DATA ANALYSIS:
Regression on Gaussian symbols
Axel Aronio de Romblay
Telecom Paristech
axel.aronio-de-romblay@telecom-paristech.fr
December 2015
1 Abstract
In classical data analysis, data are single values. This is the case if you consider a dataset of n patients whose age and height you know. But what if you record the blood pressure or the weight of each patient throughout a day? Then, for each patient, you no longer have a single-valued datum but a set of values, since blood pressure and weight are not constant during the day.
Suppose now that you do not want to record blood pressure a thousand times for each patient and store it all in a database, because your storage space is limited. You therefore need to aggregate each set of values into symbols: intervals (lower and upper bounds only), box plots, histograms or even distributions (a distribution law with mean and variance)...
Thus, the issue is to adapt classical statistical tools to symbolic data analysis. More precisely, this article proposes a method to fit a regression on Gaussian distributions. The paper is organized as follows: first, it presents the computation of the maximum likelihood estimator, and then it compares the new approach with the usual least squares regression.
KEY WORDS: Symbolic data, Big Data, Linear regression, Maximum likelihood, Gaussian symbols
2 Introduction
Symbolic data were first introduced by Diday (1987). A realization of a symbolic-valued variable may take a finite or an infinite set of values in $\mathbb{R}^p$. A random variable that takes a finite set of either quantitative or qualitative values is called a multi-valued variable. A variable that takes an infinite set of numerical values ranging from a low to a high value is called an interval-valued variable. A more complex symbolic-valued variable may have weights, probabilities, capacities, credibilities or even a distribution associated with its values. This class of variable is called modal-valued. Classical data are a special case of symbolic data where the internal distribution puts probability 1 on a single point value in $\mathbb{R}^p$.
Symbolic data analysis offers a solution to the massive data problems, especially when the entities of
interest are groups or classes of observations. Recent advances in technology enable storage of extremely large
databases. Larger datasets provide more information about the subjects of interest. However, performing even
simple exploratory procedures on these datasets requires a lot of computing power. As a result, much research
effort in recent years has been steered toward finding more efficient methods to accommodate large datasets.
In many situations, characteristics of classes of observations are of higher interest to an analyst than those of
individual observations. In these cases, individual observations can be aggregated into classes of observations.
A series of papers, Diday (1995), Diday et al. (1996), Emilion (1997) and Diday and Emilion (1996, 1998),
established a mathematical probabilistic framework for classes of symbolic data. These results were extended to
Galois lattices in the symbolic data context in Diday and Emilion (1997, 1998, 2003) and Brito and Polaillon
(2005). Beyond those fundamental results, a few recent papers (e.g. Le-Rademacher and Billard, 2010) have presented likelihood functions for
symbolic data, but none of them came up with concrete results or closed-form solutions.
Thus, in this paper, we mainly concentrate on obtaining an expression of the symbolic likelihood that
is valid and usable. Furthermore, we focus on Gaussian symbolic data, which are very common. Moreover, some
current work deals with uniform distributions but struggles to obtain a workable solution.
3 General settings
Let X be a dataset of n observations and y the corresponding target:

$$X = \begin{pmatrix} x_1^\top \\ \vdots \\ x_n^\top \end{pmatrix}, \; x_i \in \mathbb{R}^p, \; i = 1, \dots, n
\qquad
y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}, \; y_i \in \mathbb{R}, \; i = 1, \dots, n$$

We suppose $y = X\beta + \varepsilon$ with $\varepsilon \sim \mathcal{N}(0, \sigma^2 I_n)$, and we want to estimate $\theta = (\beta, \sigma^2)$, with $\beta \in \mathbb{R}^p$ and $\sigma \in \mathbb{R}$.
4 Gaussian symbols
4.1 Definitions
Here, we consider the case where each observation $x_i$ is no longer a point in $\mathbb{R}^p$ but a $p$-dimensional normal distribution with known parameters $\mu_i$ and $\Sigma_i$:

$$y_i \mid x_i \sim \mathcal{N}(x_i^\top \beta, \sigma^2), \qquad x_i \sim \mathcal{N}(\mu_i, \Sigma_i)$$

Furthermore, we suppose that both samples $(y_i \mid x_i)_i$ and $(x_i)_i$ are i.i.d.
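As a concrete illustration, here is a minimal NumPy sketch of how one Gaussian symbol and its response could be simulated under this model. It is only an illustration: the variable names and numeric values are chosen here, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

p = 5
beta = rng.uniform(-2, 2, size=p)   # true regression coefficients (illustrative values)
sigma2 = 0.5                        # residual variance

# One Gaussian symbol: known mean mu_i and covariance Sigma_i
mu_i = rng.uniform(0, 100, size=p)
Sigma_i = np.diag(rng.uniform(0.1, 1.0, size=p))

# Latent micro-observation x_i drawn inside the symbol, then its response y_i
x_i = rng.multivariate_normal(mu_i, Sigma_i)
y_i = x_i @ beta + rng.normal(0.0, np.sqrt(sigma2))
```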
4.2 Symbolic likelihood
First, let us compute the density of $(y, X)$, i.e. of $(y, x_1, \dots, x_n)$:

$$p(y, X) = p(y \mid X)\, p(X)$$

where each term is given by:

$$p(X) = \prod_{i=1}^{n} (2\pi)^{-\frac{p}{2}} |\Sigma_i|^{-\frac{1}{2}} \exp\left(-\frac{1}{2}(x_i - \mu_i)^\top \Sigma_i^{-1} (x_i - \mu_i)\right) \tag{1}$$

$$p(y \mid X) = \prod_{i=1}^{n} (2\pi)^{-\frac{1}{2}} (\sigma^2)^{-\frac{1}{2}} \exp\left(-\frac{1}{2}\,\frac{(y_i - x_i^\top \beta)^2}{\sigma^2}\right) \tag{2}$$
Thus, we can write the joint distribution as:

$$p(y, X) = C \exp\left(-\frac{1}{2}\left[\frac{\|y - X\beta\|^2}{\sigma^2} + \sum_{i=1}^{n} \tilde{x}_i^\top \Sigma_i^{-1} \tilde{x}_i\right]\right) \tag{3}$$

using the following notations: $C = (2\pi)^{-\frac{n(p+1)}{2}}\, \sigma^{-n} \prod_{i=1}^{n} |\Sigma_i|^{-\frac{1}{2}}$ and $\tilde{x}_i = x_i - \mu_i$.
Finally, we can write the log-likelihood:

$$\ell(\theta; \mu_i, \Sigma_i, y_i) = \log p(y) = \log \int_{x_1} \cdots \int_{x_n} p(y, X)\, dx_1 \cdots dx_n \tag{4}$$
Indeed, the integral in (4) need not be computed explicitly: since $y = X\beta + \varepsilon$, where the rows of $X$ are independent Gaussian vectors and $\varepsilon$ is an independent Gaussian noise, $y$ is itself a Gaussian vector, with parameters $\mu_y$ and $\Sigma_y$:

$$\mathbb{E}[y] = \mathbb{E}[X\beta] = M\beta \tag{5}$$

with: $M = \begin{pmatrix} \mu_1^\top \\ \vdots \\ \mu_n^\top \end{pmatrix}$
If we suppose that $X$ and $\varepsilon$ are independent:

$$\Sigma_y = \mathrm{Var}(y) = \mathrm{Var}(X\beta) + \mathrm{Var}(\varepsilon) = \mathbb{E}\big[X\beta (X\beta)^\top\big] - \mathbb{E}[X\beta]\,\mathbb{E}[X\beta]^\top + \sigma^2 I_n$$

where:

$$\mathbb{E}\big[X\beta (X\beta)^\top\big] = \mathbb{E}\big[\big((x_i^\top\beta)(x_j^\top\beta)\big)_{i,j}\big] = \mathbb{E}\big[\big(\beta^\top x_i x_j^\top \beta\big)_{i,j}\big] = \big(\beta^\top(\mathrm{Cov}(x_i, x_j) + \mu_i\mu_j^\top)\beta\big)_{i,j} = \big(\beta^\top \delta_{i,j}\Sigma_i \beta\big)_{i,j} + M\beta\beta^\top M^\top$$

As a conclusion:

$$\Sigma_y = \big(\beta^\top \delta_{i,j}\Sigma_i\beta\big)_{i,j} + \sigma^2 I_n = \begin{pmatrix} \beta^\top\Sigma_1\beta + \sigma^2 & & (0) \\ & \ddots & \\ (0) & & \beta^\top\Sigma_n\beta + \sigma^2 \end{pmatrix} \tag{6}$$
And the log-likelihood is given by:

$$\ell(\theta; \mu_i, \Sigma_i, y_i) = -\frac{n}{2}\log(2\pi) - \frac{1}{2}\sum_{i=1}^{n}\log\big(\beta^\top\Sigma_i\beta + \sigma^2\big) - \frac{1}{2}(y - M\beta)^\top \Sigma_y^{-1}(y - M\beta) \tag{7}$$
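The log-likelihood (7) only involves the known $(\mu_i, \Sigma_i, y_i)$ and the parameters $(\beta, \sigma^2)$, so it can be evaluated directly without any integration; since $\Sigma_y$ is diagonal, the quadratic form reduces to a sum of squared residuals weighted by the per-symbol variances. Below is a minimal NumPy sketch of such an evaluation. It is only an illustration of formula (7), with function and variable names chosen here, not code from the paper.

```python
import numpy as np

def symbolic_log_likelihood(beta, sigma2, mus, Sigmas, y):
    """Log-likelihood (7): y ~ N(M beta, diag(beta' Sigma_i beta + sigma^2))."""
    n = len(y)
    # d_i = beta' Sigma_i beta + sigma^2, the variance of y_i
    d = np.array([beta @ S @ beta for S in Sigmas]) + sigma2
    resid = y - np.array([mu @ beta for mu in mus])   # y - M beta
    return (-0.5 * n * np.log(2 * np.pi)
            - 0.5 * np.sum(np.log(d))
            - 0.5 * np.sum(resid**2 / d))
```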
4.3 Maximum Likelihood Estimator
Let us see if we can get a closed form for $\beta_{MLE}$ by computing the partial derivative of the log-likelihood with respect to $\beta$.

We write: $l_1(\beta) = \sum_{i=1}^{n}\log\big(\beta^\top\Sigma_i\beta + \sigma^2\big)$ and $l_2(\beta) = (y - M\beta)^\top\Sigma_y^{-1}(y - M\beta)$.

$$\frac{\partial l_1}{\partial \beta}(\beta) = 2\sum_{i=1}^{n}\frac{\Sigma_i\beta}{\beta^\top\Sigma_i\beta + \sigma^2}$$

$$\frac{\partial l_2}{\partial \beta}(\beta) = \sum_{i=1}^{n}\frac{\partial}{\partial\beta}\,\frac{\big(y_i - \langle\mu_i\mid\beta\rangle\big)^2}{\beta^\top\Sigma_i\beta + \sigma^2} = \sum_{i=1}^{n}\left[-\frac{2\big(y_i - \langle\mu_i\mid\beta\rangle\big)^2\,\Sigma_i\beta}{\big(\beta^\top\Sigma_i\beta + \sigma^2\big)^2} - \frac{2\big(y_i - \langle\mu_i\mid\beta\rangle\big)\,\mu_i}{\beta^\top\Sigma_i\beta + \sigma^2}\right]$$

Now, if we gather the two terms and factorize, we can get:

$$\frac{\partial \ell}{\partial\beta}(\theta; \mu_i,\Sigma_i,y_i) = 0 \iff \sum_{i=1}^{n}\frac{1}{\beta^\top\Sigma_i\beta + \sigma^2}\left[\tilde{y}_i\,\mu_i - \left(1 - \frac{\tilde{y}_i^{\,2}}{\beta^\top\Sigma_i\beta+\sigma^2}\right)\Sigma_i\beta\right] = 0 \tag{8}$$

where: $\tilde{y}_i = y_i - \langle\mu_i\mid\beta\rangle = y_i - \hat{y}_i$.
In the general setting ($\Sigma_i \neq 0$), we cannot get a closed form for $\beta_{MLE}$. Besides, we do not know $\sigma^2$. Let us recall the expression of the partial derivative of the log-likelihood with respect to $\beta$:

$$\frac{\partial\ell}{\partial\beta}(\theta;\mu_i,\Sigma_i,y_i) = \sum_{i=1}^{n}\frac{1}{\beta^\top\Sigma_i\beta + \sigma^2}\left[\tilde{y}_i\,\mu_i - \left(1 - \frac{\tilde{y}_i^{\,2}}{\beta^\top\Sigma_i\beta+\sigma^2}\right)\Sigma_i\beta\right] \tag{9}$$
Now we compute the partial derivative of the log-likelihood with respect to $\sigma^2$:

$$\frac{\partial l_1}{\partial\sigma^2}(\sigma^2) = \sum_{i=1}^{n}\frac{1}{\beta^\top\Sigma_i\beta+\sigma^2}
\qquad
\frac{\partial l_2}{\partial\sigma^2}(\sigma^2) = -\sum_{i=1}^{n}\left(\frac{\tilde{y}_i}{\beta^\top\Sigma_i\beta+\sigma^2}\right)^{2}$$

Thus:

$$\frac{\partial\ell}{\partial\sigma^2}(\theta;\mu_i,\Sigma_i,y_i) = \frac{1}{2}\sum_{i=1}^{n}\frac{1}{\beta^\top\Sigma_i\beta+\sigma^2}\left(\frac{\tilde{y}_i^{\,2}}{\beta^\top\Sigma_i\beta+\sigma^2} - 1\right) \tag{10}$$
To conclude, the gradient of the log-likelihood decomposes as a sum of per-observation terms:

$$\nabla\ell(\theta) = \sum_{i=1}^{n}\nabla\ell^{(i)}(\theta) \tag{11}$$

with:

$$\nabla\ell^{(i)}(\theta) = \begin{pmatrix}
\dfrac{1}{\beta^\top\Sigma_i\beta+\sigma^2}\left[\tilde{y}_i\,\mu_i - \left(1 - \dfrac{\tilde{y}_i^{\,2}}{\beta^\top\Sigma_i\beta+\sigma^2}\right)\Sigma_i\beta\right] \\[2.5ex]
\dfrac{1}{2}\,\dfrac{1}{\beta^\top\Sigma_i\beta+\sigma^2}\left(\dfrac{\tilde{y}_i^{\,2}}{\beta^\top\Sigma_i\beta+\sigma^2} - 1\right)
\end{pmatrix} \tag{12}$$
4.4 Comments
• Finally, we do have an explicit formula for the gradient of the log-likelihood, which is a sum of n vectors in a (p+1)-dimensional space. Thus, with (11) or (12), we can run a full-batch or a stochastic gradient descent algorithm to get an approximation of $\theta_{MLE}$ (a minimal sketch is given at the end of these comments).
• We also notice that we can easily recover the 'usual' least squares regression case, where $\Sigma_i \to 0$ and $x_i \leftrightarrow \mu_i$:

$$\sum_{i=1}^{n}\frac{1}{\sigma^2}\,\tilde{y}_i\,\mu_i = 0 \iff (y - X\beta)^\top X = 0 \tag{13}$$

$$\frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(\frac{\tilde{y}_i^{\,2}}{\sigma^2} - 1\right) = 0 \iff -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}(y - X\beta)^\top(y - X\beta) = 0 \tag{14}$$
• We supposed that there is no intercept. Indeed, we can get rid of it easily if we first apply the following transformation:

$$x_i \sim \mathcal{N}\!\left(\begin{pmatrix}\mu_i \\ 1\end{pmatrix},\; \begin{pmatrix}\Sigma_i & 0 \\ 0 & 0\end{pmatrix}\right) \quad \text{and} \quad \beta \in \mathbb{R}^{p+1}$$

i.e. we append a constant coordinate equal to 1 with zero variance.
• After estimating $\beta$, we can deduce an unbiased prediction: $\hat{y}_j = \mu_j^\top\hat{\beta}$
• We expect $\sigma^2 \to +\infty$ when $\Sigma_i \to +\infty$. Indeed, if we plot the log-likelihood as a function of $\sigma^2$ for different values of $\Sigma_i$ (we suppose the $\Sigma_i$ are equal for all i; see the Appendix for more details), we get the following result:
Figure 1: log-likelihoods for different Σi
• From (6), we can see that the variance of $y_i$ depends on $\Sigma_i$ but especially on $\beta$, which makes sense. Indeed, if p is large and each coordinate of $\beta$ is large, the locations of the $y_i$ are not accurate. This can be an issue when fitting the regression, since symbols can overlap...
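To make the first comment concrete, here is a minimal sketch of the per-observation gradient (12) and of a plain full-batch gradient ascent on the log-likelihood (equivalently, gradient descent on its negative). It is only an illustration under the assumptions of this paper, not the author's original code; the function names, learning rate, iteration count and initial value of $\sigma^2$ are arbitrary choices.

```python
import numpy as np

def gradient(beta, sigma2, mus, Sigmas, y):
    """Sum over i of the per-observation gradient (12); returns (grad_beta, grad_sigma2)."""
    grad_beta = np.zeros_like(beta)
    grad_sigma2 = 0.0
    for mu_i, Sigma_i, y_i in zip(mus, Sigmas, y):
        d_i = beta @ Sigma_i @ beta + sigma2      # beta' Sigma_i beta + sigma^2
        r_i = y_i - mu_i @ beta                   # residual y_i - <mu_i | beta>
        grad_beta += (r_i * mu_i - (1.0 - r_i**2 / d_i) * (Sigma_i @ beta)) / d_i
        grad_sigma2 += 0.5 * (r_i**2 / d_i - 1.0) / d_i
    return grad_beta, grad_sigma2

def fit_mle(mus, Sigmas, y, lr=1e-4, n_iter=5000):
    """Full-batch gradient ascent on the log-likelihood, started from the naive OLS fit."""
    M = np.vstack(mus)
    beta = np.linalg.solve(M.T @ M, M.T @ y)      # beta_0 = (M'M)^{-1} M'y
    sigma2 = np.mean((y - M @ beta) ** 2)         # rough initial residual variance
    for _ in range(n_iter):
        g_beta, g_sigma2 = gradient(beta, sigma2, mus, Sigmas, y)
        beta += lr * g_beta
        sigma2 = max(sigma2 + lr * g_sigma2, 1e-8)  # keep sigma^2 positive
    return beta, sigma2
```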
4.5 Results
In this section, we simulate different datasets and compare the maximum likelihood estimation with the naive solution of fitting a regression on the $(\mu_i, y_i)$. We recall the settings: we observe $(\mu_i, \Sigma_i, y_i)$ and we want to estimate $\beta$ (and $\sigma^2$). Let us denote by m, Σ, s1 and s2 the following parameters:
• M = uniform(0, m, (n, p)): each coordinate of M (see its definition in Section 4.2) is drawn uniformly between 0 and m.
• Σ = [Σi for i in range(n)] = [np.diag(uniform(s1, s2, p)) for i in range(n)]: we suppose here that, within a symbol, the p features are independent, and we generate a diagonal matrix for each Σi whose diagonal coefficients are drawn uniformly in [s1, s2]. (A simulation sketch is given right after this list.)
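For reference, the simulation described above could look like the following NumPy sketch. It is a hypothetical reconstruction from the description, not the original simulation script; with such a generator, the gradient-based MLE sketched in Section 4.4 can be compared to the naive fit on the $(\mu_i, y_i)$.

```python
import numpy as np

def simulate_dataset(n=100, p=5, m=100, s1=0.1, s2=1.0, sigma2=0.5, seed=0):
    """Generate (mus, Sigmas, y) following the settings of Section 4.5."""
    rng = np.random.default_rng(seed)
    beta = rng.uniform(-2, 2, size=p)              # true coefficients, here in [-2, 2]
    mus = rng.uniform(0, m, size=(n, p))           # M: symbol means
    Sigmas = [np.diag(rng.uniform(s1, s2, size=p)) for _ in range(n)]
    # latent micro-data inside each symbol, then the responses
    X = np.array([rng.multivariate_normal(mu, S) for mu, S in zip(mus, Sigmas)])
    y = X @ beta + rng.normal(0.0, np.sqrt(sigma2), size=n)
    return mus, Sigmas, y, beta
```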
4.5.1 Dataset 1 (reference dataset): n=100, p=5, m=100, [s1, s2]=[0.1, 1], σ²=0.5, β ∈ [-2, 2]
Figure 2: Convergence to the ML estimator. Figure 3: MSE for each method.
4.5.2 Dataset 2: n=100, p=5, m=100, [s1, s2]=[0.1, 1], σ²=0.5, β ∈ [-0.1, 0.1]
Figure 4: Convergence to the ML estimator. Figure 5: MSE for each method.
4.5.3 Dataset 3: n=100, p=5, m=100, [s1, s2]=[0.1, 1], σ²=5, β ∈ [-2, 2]
Figure 6: Convergence to the ML estimator. Figure 7: MSE for each method.
4.5.4 Dataset 4: n=1000, p=5, m=100, [s1, s2]=[0.1, 1], σ²=0.5, β ∈ [-2, 2]
Figure 8: Convergence to the ML estimator. Figure 9: MSE for each method.
4.5.5 Dataset 5: n=100, p=5, m=100, [s1, s2]=[1, 5], σ²=0.5, β ∈ [-2, 2]
Figure 10: Convergence to the ML estimator. Figure 11: MSE for each method.
4.5.6 Dataset 6: n=100, p=20, m=100, [s1, s2]=[0.1, 1], σ²=0.5, β ∈ [-2, 2]
Figure 12: Convergence to the ML estimator. Figure 13: MSE for each method.
5 Conclusion
From these results, we can see that the maximum likelihood estimator is always more accurate than fitting a naive least squares regression on the $(\mu_i, y_i)$. Nevertheless, the computation time is much longer with a simple gradient descent, even with an 'appropriate' learning rate and a 'good' initialization. To address this issue, it is highly recommended to start with $\beta_0 = (M^\top M)^{-1} M^\top y$ (a short sketch of this initialization is given after the remarks below). It then takes just a few seconds to get an 'accurate' approximation of $\beta_{MLE}$. Besides, from the graphics, we can notice that:
• When the quantity $\beta^\top\Sigma_i\beta + \sigma^2$ is 'large' (cf. Figure 5, Figure 7 and Figure 11), the MLE is much more accurate than the naive estimator. Indeed, the naive estimator does not take into account the internal variation inside a symbol and is therefore very biased. But in this case, we need to run more iterations.
• When n is 'large' (cf. Figure 9): we need fewer iterations, but each iteration takes more time. The more symbols you have, the more accurate both estimators are. Here, it is preferable to run a stochastic gradient descent algorithm: it will definitely be faster but a bit less accurate (unless you run some more iterations).
• When p is 'large' (cf. Figure 13): the difference between the two estimators is negligible. The MLE approximation needs a lot of iterations, and subsampling is not a good solution to speed up the convergence.
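As for the initialization recommended above, a minimal sketch (illustrative names, assuming NumPy; mus is the n x p array of symbol means and y the length-n response vector):

```python
import numpy as np

def initial_beta(mus, y):
    """beta_0 = (M'M)^{-1} M'y: naive least squares fit on the symbol means,
    used as a warm start for the gradient descent."""
    M = np.asarray(mus)
    return np.linalg.solve(M.T @ M, M.T @ y)
```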
6 Acknowledgment
I would like to thank my supervisor Scott Sisson, who guided and helped me throughout my research exchange. I would also like to thank the administration team at UNSW for their welcome and their help in setting up my office.
7 References
[1] Billard, L. and Diday, E. (2000). Regression Analysis for Interval-Valued Data. In Data Analysis, Classification, and Related Methods (eds. H.A.L. Kiers, J.-P. Rasson, P.J.F. Groenen, and M. Schader). Springer-Verlag, Berlin, 369-374.
[2] Billard, L. and Diday, E. (2007). Symbolic Data Analysis: Conceptual Statistics and Data Mining. Wiley, Chichester.
[3] de Carvalho, F.A.T., Lima Neto, E.A. and Tenorio, C.P. (2004). A New Method to Fit a Linear Regression Model for Interval-Valued Data. Lecture Notes in Computer Science, KI 2004: Advances in Artificial Intelligence. Springer-Verlag, 295-306.
[4] Le-Rademacher, J. and Billard, L. (2010). Likelihood Functions and Some Maximum Likelihood Estimators for Symbolic Data. Journal of Statistical Planning and Inference, submitted.
[5] Lima Neto, E.A. and de Carvalho, F.A.T. (2010). Constrained Linear Regression Models for Symbolic Interval-Valued Variables. Computational Statistics & Data Analysis, 54(2), 333-347.
8 Appendix
(*) Figure 1 has been plotted with these parameters: n=500, p=5, m=10, s1=s2, σ²=0.5, β ∈ [-2, 2]
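For completeness, a plot in the spirit of Figure 1 could be produced along the following lines. This is an illustrative reconstruction only (assuming matplotlib; the particular values chosen for s1 = s2 are assumptions, since the original figure's exact settings beyond the parameters listed above are not given):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n, p, m, sigma2_true = 500, 5, 10, 0.5
beta = rng.uniform(-2, 2, size=p)
mus = rng.uniform(0, m, size=(n, p))

sigma2_grid = np.linspace(0.05, 20.0, 200)
for s in [0.1, 1.0, 5.0]:                              # assumed values for s1 = s2
    Sigmas = [s * np.eye(p)] * n                       # Sigma_i = s * I for all i
    X = np.array([rng.multivariate_normal(mu, S) for mu, S in zip(mus, Sigmas)])
    y = X @ beta + rng.normal(0.0, np.sqrt(sigma2_true), size=n)
    d_base = np.array([beta @ S @ beta for S in Sigmas])   # beta' Sigma_i beta
    resid = y - mus @ beta                                  # y - M beta
    ll = [(-0.5 * n * np.log(2 * np.pi)
           - 0.5 * np.sum(np.log(d_base + s2))
           - 0.5 * np.sum(resid**2 / (d_base + s2))) for s2 in sigma2_grid]
    plt.plot(sigma2_grid, ll, label=f"Sigma_i = {s} I")

plt.xlabel("sigma^2")
plt.ylabel("log-likelihood")
plt.legend()
plt.show()
```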