Vaičiulytė, Ingrida ; Sakalauskas, Leonidas „Daugiamatis retų įvykių tikimybių vertinimo algoritmas“ (VU MII)

MULTIDIMENSIONAL RARE
EVENT PROBABILITY
ESTIMATION ALGORITHM
Ingrida Vaičiulytė
Vilnius University
Mathematics and Informatics Institute

COMPUTER DAYS – 2013
Šiauliai

Introduction
This work describes the empirical Bayesian approach applied to the
estimation of multidimensional frequencies of rare events. It also
introduces a Monte-Carlo Markov Chain (MCMC) procedure designed for
the Bayesian computation. The discrete variable, the number of
occurrences of a rare event, is modeled with two statistical models:
a normal distribution with unknown mean and variance parameters, and
a Poisson distribution.

Introduction
Let us consider a set of $K$ populations, where population $j$
consists of $N_j$ individuals, $j = 1, \dots, K$.
Assume that some event (e.g., death due to some disease, an insured
event) can occur in the populations under observation.

The aim
Our aim is to estimate the unknown probabilities of events $P_j^m$
when the numbers $Y_j^m$ of events in the populations are observed,
$j = 1, \dots, K$; $m = 1, \dots, M$.
Since the simple estimate of relative risk $Y_j^m / N_j$ cannot be
used in many cases because of great differences in the population
sizes $N_j$, the empirical Bayesian approach is applied.

Poisson-Gaussian model
It is often justified to assume that the numbers of cases $Y_j^m$
follow the Poisson distribution with the parameters
$\lambda_j^m = N_j P_j^m$, whose density is as follows:

$$f(Y_j^m, \lambda_j^m) = e^{-\lambda_j^m}\,\frac{(\lambda_j^m)^{Y_j^m}}{Y_j^m!}, \qquad j = 1, \dots, K.$$

The empirical Bayesian method is a two-stage procedure that depends
on the prior distribution introduced in the second stage. It is of
interest to consider a model in which the logits

$$\eta^m = \ln\frac{P^m}{1 - P^m}, \qquad m = 1, \dots, M,$$

are normally distributed with the parameters $\mu$, $\Sigma$.
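The logit transform and its inverse, which maps a logit back to a probability via $P = 1/(1 + e^{-\eta})$, can be sketched as follows. The function names are illustrative only; the branch in `inv_logit` is an implementation detail added here to avoid floating-point overflow.

```python
import math


def logit(p: float) -> float:
    """Map a probability in (0, 1) to eta = ln(p / (1 - p))."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def inv_logit(eta: float) -> float:
    """Map a logit back to p = 1 / (1 + exp(-eta)) without overflow."""
    if eta >= 0.0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)  # safe: eta < 0, so z lies in [0, 1)
    return z / (1.0 + z)


# A logit of -3 corresponds to a rare-event probability of about 0.047.
rare = inv_logit(-3.0)
```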

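Thus the density of the logit vector $\eta = (\eta^1, \dots, \eta^M)^T$ is

$$g(\eta, \mu, \Sigma) = \frac{1}{(2\pi)^{M/2}\,|\Sigma|^{1/2}}\,\exp\left(-\frac{(\eta - \mu)^T \Sigma^{-1} (\eta - \mu)}{2}\right).$$

Then the rates $P_j^m$ are evaluated as a posteriori means for given $\mu$, $\Sigma$:

$$P_j^m = \frac{1}{D_j(\mu, \Sigma)} \int \frac{1}{1 + e^{-\eta^m}} \prod_{i=1}^{M} f\left(Y_j^i, \frac{N_j}{1 + e^{-\eta^i}}\right) g(\eta, \mu, \Sigma)\, d\eta,$$

where

$$D_j(\mu, \Sigma) = \int \prod_{m=1}^{M} f\left(Y_j^m, \frac{N_j}{1 + e^{-\eta^m}}\right) g(\eta, \mu, \Sigma)\, d\eta,$$

$j = 1, \dots, K$; $m = 1, \dots, M$.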
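As an illustration of the multivariate Gaussian density of the logits, the sketch below evaluates its logarithm through a Cholesky factorization. This is only one possible way to compute it; the helper names are assumptions of this example and the slides do not prescribe an implementation.

```python
import math
from typing import List, Sequence


def cholesky(a: Sequence[Sequence[float]]) -> List[List[float]]:
    """Lower-triangular factor C with C * C^T = a (a must be positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                pivot = a[i][i] - s
                if pivot <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][j] = math.sqrt(pivot)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def gaussian_log_density(eta: Sequence[float], mu: Sequence[float],
                         sigma: Sequence[Sequence[float]]) -> float:
    """Return ln g(eta, mu, Sigma) for the M-dimensional normal density."""
    m = len(mu)
    low = cholesky(sigma)
    diff = [e - u for e, u in zip(eta, mu)]
    # Forward substitution solves C z = (eta - mu); then z.z is the quadratic form.
    z: List[float] = []
    for i in range(m):
        partial = sum(low[i][k] * z[k] for k in range(i))
        z.append((diff[i] - partial) / low[i][i])
    quad = sum(v * v for v in z)
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(m))  # ln |Sigma|
    return -0.5 * (m * math.log(2.0 * math.pi) + log_det + quad)
```

For example, `gaussian_log_density([-3.1, -4.2, -5.0], [-3.0, -4.0, -5.0], [[0.25, 0, 0], [0, 0.25, 0], [0, 0, 0.25]])` evaluates the log-density for a three-event model with a diagonal covariance.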

Maximum likelihood method
In statistics, Bayesian analysis is often related to the minimization
of a certain function expressed as an integral of the a posteriori
density. Thus, in the empirical Bayesian approach, the unknown
parameters $\mu$, $\Sigma$ are estimated by the maximum likelihood
method. After some manipulation, we get the logarithmic likelihood
function

$$L(\mu, \Sigma) = -\sum_{j=1}^{K} \ln\left(\int \prod_{m=1}^{M} f\left(Y_j^m, \frac{N_j}{1 + e^{-\eta^m}}\right) g(\eta, \mu, \Sigma)\, d\eta\right) = -\sum_{j=1}^{K} \ln D_j(\mu, \Sigma),$$

which has to be minimized to get estimates of the parameters.

Derivatives of the maximum likelihood
function
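The likelihood function is differentiable many times with respect to
the parameters $\mu$, $\Sigma$, and its first derivatives are as
follows:

$$\frac{\partial L(\mu, \Sigma)}{\partial \mu} = -\Sigma^{-1} \sum_{j=1}^{K} \frac{1}{D_j(\mu, \Sigma)} \int (\eta - \mu) \prod_{m=1}^{M} f\left(Y_j^m, \frac{N_j}{1 + e^{-\eta^m}}\right) g(\eta, \mu, \Sigma)\, d\eta,$$

$$\frac{\partial L(\mu, \Sigma)}{\partial \Sigma} = \frac{1}{2}\,\Sigma^{-1}\left(K\Sigma - \sum_{j=1}^{K} \frac{1}{D_j(\mu, \Sigma)} \int (\eta - \mu)(\eta - \mu)^T \prod_{m=1}^{M} f\left(Y_j^m, \frac{N_j}{1 + e^{-\eta^m}}\right) g(\eta, \mu, \Sigma)\, d\eta\right)\Sigma^{-1}.$$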
Poisson-Gaussian model estimates
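The maximum likelihood estimates of the parameters $\mu$, $\Sigma$ of
the Poisson-Gaussian model are found by solving the equations
obtained by setting the first derivatives equal to zero:

$$\mu = \frac{1}{K} \sum_{j=1}^{K} \frac{1}{D_j(\mu, \Sigma)} \int \eta \prod_{m=1}^{M} f\left(Y_j^m, \frac{N_j}{1 + e^{-\eta^m}}\right) g(\eta, \mu, \Sigma)\, d\eta,$$

$$\Sigma = \frac{1}{K} \sum_{j=1}^{K} \frac{1}{D_j(\mu, \Sigma)} \int (\eta - \mu)(\eta - \mu)^T \prod_{m=1}^{M} f\left(Y_j^m, \frac{N_j}{1 + e^{-\eta^m}}\right) g(\eta, \mu, \Sigma)\, d\eta.$$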

Poisson-Gaussian model estimates
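For instance, the fixed-point iteration method is useful for solving
these equations to obtain the maximum likelihood estimates of $\mu$, $\Sigma$:

$$\mu^{t+1} = \frac{1}{K} \sum_{j=1}^{K} \frac{1}{D_j(\mu^t, \Sigma^t)} \int \eta \prod_{m=1}^{M} f\left(Y_j^m, \frac{N_j}{1 + e^{-\eta^m}}\right) g(\eta, \mu^t, \Sigma^t)\, d\eta,$$

$$\Sigma^{t+1} = \frac{1}{K} \sum_{j=1}^{K} \frac{1}{D_j(\mu^t, \Sigma^t)} \int (\eta - \mu^t)(\eta - \mu^t)^T \prod_{m=1}^{M} f\left(Y_j^m, \frac{N_j}{1 + e^{-\eta^m}}\right) g(\eta, \mu^t, \Sigma^t)\, d\eta.$$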

MCMC algorithm
The fixed-point iteration method can be realized by the Monte-Carlo
Markov chain approach. At the $t$-th step of the chain we generate
multivariate Gaussian vectors

$$\eta_{j,k} \sim N(\mu^t, \Sigma^t), \qquad k = 1, \dots, N^t,$$

where $N^t$ is the Monte-Carlo sample size at the $t$-th step.

MCMC algorithm
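To avoid computational problems when the intermediate results are
very small, we have introduced the auxiliary function

$$r_j(\eta) = \ln\left(\prod_{m=1}^{M} f\left(Y_j^m, \frac{N_j}{1 + e^{-\eta^m}}\right) \Big/ \prod_{m=1}^{M} f\left(Y_j^m, \frac{N_j}{1 + e^{-\mu^m}}\right)\right),$$

or

$$r_j(\eta) = \sum_{m=1}^{M} \left[ N_j \left(\frac{e^{\mu^m}}{1 + e^{\mu^m}} - \frac{e^{\eta^m}}{1 + e^{\eta^m}}\right) + Y_j^m \ln\frac{1 + e^{-\mu^m}}{1 + e^{-\eta^m}} \right].$$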
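The benefit of the auxiliary function can be illustrated with a short sketch: a Poisson density evaluated far from its mode underflows to zero in floating point, while the log-ratio stays finite. The sketch assumes that the reference point in the denominator is the current mean vector of the logits; the function names and the numbers in the comments are hypothetical.

```python
import math
from typing import Sequence


def log_inv_logit(eta: float) -> float:
    """ln(1 / (1 + exp(-eta))), evaluated without overflow."""
    if eta >= 0.0:
        return -math.log1p(math.exp(-eta))
    return eta - math.log1p(math.exp(eta))


def poisson_log_density(y: int, lam: float) -> float:
    """ln of exp(-lam) * lam**y / y!."""
    return -lam + y * math.log(lam) - math.lgamma(y + 1)


def log_ratio(y: Sequence[int], n: float,
              eta: Sequence[float], mu: Sequence[float]) -> float:
    """Log of the ratio of Poisson likelihoods at logits eta and at logits mu.

    Assumption of this sketch: mu is the reference point of the ratio.
    The ln(y!) terms cancel, so no factorial is ever computed.
    """
    total = 0.0
    for y_m, eta_m, mu_m in zip(y, eta, mu):
        log_p_eta = log_inv_logit(eta_m)
        log_p_mu = log_inv_logit(mu_m)
        total += n * (math.exp(log_p_mu) - math.exp(log_p_eta))
        total += y_m * (log_p_eta - log_p_mu)
    return total


# Hypothetical extreme case: 2000 observed cases where 50000 are expected.
# The density itself underflows to 0.0, although its logarithm is finite.
underflowed = math.exp(poisson_log_density(2000, 50_000.0))
```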

MCMC algorithm
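Then we get the estimates of the parameters

$$\mu^{t+1} = \frac{1}{K} \sum_{j=1}^{K} \frac{\tilde{m}_j^t}{\tilde{D}_j^t}, \qquad \Sigma^{t+1} = \frac{1}{K} \sum_{j=1}^{K} \frac{\tilde{S}_j^t}{\tilde{D}_j^t},$$

where the Monte-Carlo estimators are as follows (since $r_j$ is a
log-ratio, the sampled vectors are weighted by $e^{r_j(\eta_{j,k})}$):

$$\tilde{D}_j^t = \sum_{k=1}^{N^t} e^{r_j(\eta_{j,k})}, \qquad \widetilde{D2}_j^t = \sum_{k=1}^{N^t} e^{2 r_j(\eta_{j,k})},$$

$$\tilde{m}_j^t = \sum_{k=1}^{N^t} \eta_{j,k}\, e^{r_j(\eta_{j,k})}, \qquad \tilde{S}_j^t = \sum_{k=1}^{N^t} (\eta_{j,k} - \mu^t)(\eta_{j,k} - \mu^t)^T e^{r_j(\eta_{j,k})},$$

$$p_{j,m}^t = \sum_{k=1}^{N^t} \frac{e^{r_j(\eta_{j,k})}}{1 + e^{-\eta_{j,k,m}}}.$$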
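One step of the Monte-Carlo update can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the sampled logits are weighted by the exponent of the log-ratio, the covariance update is centered at the current mean, the reference point of the log-ratio is the current mean, and the largest log-weight is subtracted before exponentiation (a stabilization added here that cancels in the ratios). All function names and the toy data are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def log_inv_logit(eta: float) -> float:
    """ln(1 / (1 + exp(-eta))), evaluated without overflow."""
    if eta >= 0.0:
        return -math.log1p(math.exp(-eta))
    return eta - math.log1p(math.exp(eta))


def log_ratio(y: Sequence[int], n: float,
              eta: Sequence[float], mu: Sequence[float]) -> float:
    """Log-ratio of Poisson likelihoods at eta and at the reference point mu."""
    total = 0.0
    for y_m, e_m, u_m in zip(y, eta, mu):
        lp_e, lp_u = log_inv_logit(e_m), log_inv_logit(u_m)
        total += n * (math.exp(lp_u) - math.exp(lp_e)) + y_m * (lp_e - lp_u)
    return total


def cholesky(a: Sequence[Sequence[float]]) -> List[List[float]]:
    """Lower-triangular factor C with C * C^T = a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def mc_step(ys: Sequence[Sequence[int]], ns: Sequence[float],
            mu: Sequence[float], sigma: Sequence[Sequence[float]],
            n_samples: int, rng: random.Random
            ) -> Tuple[List[float], List[List[float]]]:
    """Return (mu_next, sigma_next) after one Monte-Carlo fixed-point step."""
    k_pop, m = len(ys), len(mu)
    low = cholesky(sigma)
    mu_next = [0.0] * m
    sigma_next = [[0.0] * m for _ in range(m)]
    for y, n in zip(ys, ns):
        etas, rs = [], []
        for _ in range(n_samples):
            z = [rng.gauss(0.0, 1.0) for _ in range(m)]
            # eta = mu + C z is distributed as N(mu, Sigma).
            eta = [mu[i] + sum(low[i][q] * z[q] for q in range(i + 1))
                   for i in range(m)]
            etas.append(eta)
            rs.append(log_ratio(y, n, eta, mu))
        shift = max(rs)  # stabilization added in this sketch
        w = [math.exp(r - shift) for r in rs]
        d = sum(w)  # proportional to the estimator of D_j
        for i in range(m):
            mu_next[i] += sum(wk * e[i] for wk, e in zip(w, etas)) / (d * k_pop)
            for q in range(m):
                s = sum(wk * ((e[i] - mu[i]) * (e[q] - mu[q]))
                        for wk, e in zip(w, etas))
                sigma_next[i][q] += s / (d * k_pop)
    return mu_next, sigma_next


# Toy data (hypothetical): three populations of 1000 with one event type.
mu1, sigma1 = mc_step([[100], [90], [110]], [1000.0] * 3,
                      [-3.0], [[0.25]], 2000, random.Random(0))
```

In the toy data the observed rates (about 0.1) exceed the rate implied by the starting logit of -3 (about 0.047), so the updated mean moves upward.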

MCMC algorithm
Next, the estimate of the log-likelihood function is obtained using
the Monte-Carlo estimate:

$$L^t = -\sum_{j=1}^{K} \ln \tilde{D}_j^t,$$

its sample variance estimate:

$$d^t = \sum_{j=1}^{K} \left( \frac{\widetilde{D2}_j^t\, N^t}{(\tilde{D}_j^t)^2} - 1 \right),$$

and the estimate of the event probabilities in the populations:

$$\tilde{P}_{j,m}^t = \frac{p_{j,m}^t}{\tilde{D}_j^t}.$$
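The per-population quantities entering these estimates can be computed from sampled logits and their log-weights as in the sketch below. The log-weights are assumed to be log-ratios of likelihoods, so they are exponentiated before summation; subtracting the largest log-weight first is a stabilization added in this example. The function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def inv_logit(eta: float) -> float:
    """1 / (1 + exp(-eta)) without overflow."""
    if eta >= 0.0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def summarize_population(etas: Sequence[Sequence[float]],
                         log_weights: Sequence[float]
                         ) -> Tuple[float, float, List[float]]:
    """Return (ln D, variance term, probability estimates) for one population."""
    n = len(etas)
    shift = max(log_weights)
    w = [math.exp(r - shift) for r in log_weights]
    d = sum(w)                    # sum of weights, scaled by exp(-shift)
    d2 = sum(v * v for v in w)    # sum of squared weights, same scaling squared
    log_d = shift + math.log(d)   # ln of the unscaled sum of weights
    rel_var = n * d2 / (d * d) - 1.0  # the scaling cancels in this ratio
    m = len(etas[0])
    probs = [sum(v * inv_logit(e[i]) for v, e in zip(w, etas)) / d
             for i in range(m)]
    return log_d, rel_var, probs


def log_likelihood_estimate(log_ds: Sequence[float]) -> float:
    """Minus the sum over populations of ln D_j."""
    return -sum(log_ds)
```

With equal weights the variance term is zero, and it grows as the weights concentrate on a few samples, which signals that a larger Monte-Carlo sample is needed.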

MCMC algorithm
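The Monte-Carlo chain can be terminated at the $t$-th step if the
estimates of two consecutive steps differ insignificantly. Thus, the
hypothesis on the termination condition is rejected if

$$H^t = \frac{K \left[ \ln\dfrac{|\Sigma^t|}{|\Sigma^{t+1}|} + \mathrm{SP}\left((\Sigma^t)^{-1} \Sigma^{t+1}\right) + (\mu^{t+1} - \mu^t)^T (\Sigma^t)^{-1} (\mu^{t+1} - \mu^t) - M \right]}{\dfrac{1}{K} \sum_{j=1}^{K} \dfrac{\widetilde{D2}_j^t}{(\tilde{D}_j^t)^2}} > F_{\alpha, v},$$

where $\mathrm{SP}$ denotes the trace of a matrix.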

MCMC algorithm
The following rule of sample size regulation is implemented, so that
large samples are taken only at the moment of making the decision on
the termination of the Monte-Carlo Markov chain:

$$N^{t+1} = N^t\, \frac{F_{\alpha, v}}{H^t},$$

where $F_{\alpha, v}$ is Fisher's quantile and $\alpha$ is the
significance level.
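The sample size regulation can be sketched as a small helper. It assumes that the next sample size is the current one scaled by the ratio of the Fisher quantile to the test statistic, clamped to the limits 500 and 17000 used in the simulation; the function name, the rounding, and the handling of a non-positive statistic are assumptions of this example.

```python
def next_sample_size(n_current: int, h_stat: float, f_quantile: float,
                     n_min: int = 500, n_max: int = 17_000) -> int:
    """Propose the Monte-Carlo sample size for the next step.

    A large statistic (estimates still changing) yields a small sample;
    a small statistic (close to termination) yields a large sample.
    """
    if h_stat <= 0.0:
        # Degenerate case, handled here by taking the largest allowed sample.
        return n_max
    proposed = n_current * f_quantile / h_stat
    # Clamp to the allowed range to avoid very small or very large samples.
    return int(round(min(max(proposed, n_min), n_max)))
```

For example, with a hypothetical quantile of 2.0, a statistic of 8.0 keeps the sample at the lower limit, while a statistic of 0.5 quadruples it.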

MCMC algorithm
Applying this rule allows a rational selection of the sample sizes in
the Monte-Carlo Markov chain and ensures convergence to the maximum
likelihood estimate.

Computer simulation
Next, we constructed and estimated this statistical model on data
simulated with known parameters.
A random sample of $K = 10$ populations, in which $M = 3$ events can
occur, has been simulated to explore the approach developed. The
logits of the probabilities are normally distributed with the
parameters

$$\mu = \begin{pmatrix} -3 \\ -4 \\ -5 \end{pmatrix}; \qquad \Sigma = \begin{pmatrix} 0.25 & 0 & 0 \\ 0 & 0.25 & 0 \\ 0 & 0 & 0.25 \end{pmatrix}.$$
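A data set of this kind can be simulated as in the sketch below. The means and variances of the logits follow the parameters above; the population sizes and the random seed are hypothetical, since the slides do not state them, and the Poisson sampler based on exponential waiting times is a choice of this example.

```python
import math
import random
from typing import List, Sequence


def inv_logit(eta: float) -> float:
    """1 / (1 + exp(-eta)) without overflow."""
    if eta >= 0.0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Count unit-rate exponential arrivals in [0, lam]; Poisson(lam) distributed."""
    count = 0
    t = rng.expovariate(1.0)
    while t < lam:
        count += 1
        t += rng.expovariate(1.0)
    return count


def simulate_counts(pop_sizes: Sequence[int], mu: Sequence[float],
                    sd: float, rng: random.Random) -> List[List[int]]:
    """Draw independent logits per population and event, then Poisson counts."""
    data = []
    for n in pop_sizes:
        etas = [rng.gauss(m, sd) for m in mu]
        data.append([sample_poisson(n * inv_logit(e), rng) for e in etas])
    return data


MU = [-3.0, -4.0, -5.0]   # means of the logits for the M = 3 events
SD = math.sqrt(0.25)      # diagonal covariance: standard deviation 0.5
POP_SIZES = [200 * (j + 1) for j in range(10)]  # hypothetical N_j, K = 10
counts = simulate_counts(POP_SIZES, MU, SD, random.Random(2013))
```

Because the covariance matrix is diagonal, the three logits of a population can be drawn independently, which is why no matrix factorization is needed here.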

Computer simulation
Next, we computed a Monte-Carlo Markov chain of $t = 100$ estimators.
To avoid very small or very large sample sizes, the following limits
were applied:

$$500 \le N^t \le 17000.$$

The termination conditions started to be valid after $t = 6$
iterations. The following estimates of the means were obtained:

Estimates of parameters
Iteration    µ1      µ2      µ3     Log-likelihood   Confidence   Sample   Statistical
                                    function         interval     size     hypothesis
    1      -2.96   -4.29   -5.52      -62.90            5.57         500      9.55
    2      -2.89   -4.04   -5.27     -396.58            4.81         500      6.18
    3      -2.91   -4.03   -5.19     -420.42            2.97         500      3.86
    4      -2.90   -4.04   -5.16     -424.87            3.2          500      0.35
    5      -2.91   -4.04   -5.13     -428.05            1.57        2963      1.41
    6      -2.90   -4.04   -5.14     -427.57            1.32        4383      0.32
    7      -2.91   -4.04   -5.13     -425.54            0.75       13986      0.40
    8      -2.91   -4.04   -5.14     -425.33            0.75       14345      0.40
    9      -2.91   -4.04   -5.13     -425.71            0.75       13525      0.84
   10      -2.91   -4.04   -5.13     -426.47            0.75       15135      0.22

Conclusions
The empirical Bayesian approach applied to the estimation of
multidimensional frequencies has been described in this work.
In this paper we:
• presented the iterative fixed-point method to compute the
estimates;
• introduced the Monte-Carlo Markov Chain procedure with adaptive
regulation of the sample size and statistical treatment of the
simulation error;
• computed the empirical Bayesian estimates of the unknown
parameters and of the probabilities of the events.
The approach developed can be applied in the analysis of social and
medical data.