2. Introduction
This work describes the empirical Bayesian
approach applied to the estimation of
multi-dimensional frequency. It also introduces the
Monte-Carlo Markov Chain (MCMC) procedure,
which is designed for Bayesian computation.
The discrete variable, the number of occurrences
of rare events, is modeled with two statistical models:
a normal distribution with unknown mean and variance, and a Poisson distribution.
COMPUTER DAYS – 2013
Šiauliai
3. Introduction
Let us consider a set of $K$ populations,
where population $j$ consists of $N_j$ individuals,
$j = 1, \dots, K$.
Assume that some event (e.g., death due to
some disease, insured event) can occur in the
populations under observation.
4. The aim
Our aim is to estimate the unknown probabilities of
events $P_j^m$ when the numbers $Y_j^m$ of events in
the populations are observed,
$j = 1, \dots, K$; $m = 1, \dots, M$.
Since the simple estimate of relative risk $Y_j^m / N_j$
cannot be used in many cases due to great
differences in the population sizes $N_j$,
the empirical Bayesian approach is applied.
5. Poisson-Gaussian model
An assumption is often justified that the
numbers of cases $Y_j^m$ follow the Poisson
distribution with the parameters $\lambda_j^m = N_j P_j^m$,
and its density is as follows:
$$f(Y_j^m, \lambda_j^m) = e^{-\lambda_j^m} \frac{(\lambda_j^m)^{Y_j^m}}{Y_j^m!}, \quad j = 1, \dots, K.$$
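The Poisson density above can be evaluated numerically as in the following minimal sketch. The function names and the example population (size 10 000, event probability 0.0003) are assumptions made for illustration, not values from the slides.

```python
import math


def poisson_log_pmf(y: int, lam: float) -> float:
    """Log of the Poisson density f(y, lam) = exp(-lam) * lam**y / y!."""
    if lam <= 0.0:
        raise ValueError("lam must be positive")
    # lgamma(y + 1) equals ln(y!) and avoids overflow of the factorial.
    return -lam + y * math.log(lam) - math.lgamma(y + 1)


def poisson_pmf(y: int, lam: float) -> float:
    """Poisson density, computed through its logarithm."""
    return math.exp(poisson_log_pmf(y, lam))


# Hypothetical population: N_j = 10_000 individuals, event probability 0.0003.
lam = 10_000 * 0.0003  # Poisson parameter N_j * P_j^m
prob_two_events = poisson_pmf(2, lam)
```

Working with the logarithm first keeps the computation stable when the counts are large.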
6. Poisson-Gaussian model
The empirical Bayesian method is a two-stage
procedure, depending on the prior distribution
introduced in the second stage. It is of interest
to consider a model in which the logits
$$\eta = \ln \frac{P}{1 - P}$$
are normally distributed with the parameters $\mu, \Sigma$.
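As a small illustration, the logit transform and its inverse can be sketched as follows. The function names are assumptions of this sketch; the inverse maps a logit back to a probability, which is how the Poisson parameter is obtained from a logit in the later formulas.

```python
import math


def logit(p: float) -> float:
    """Logit ln(p / (1 - p)) of a probability strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def inv_logit(eta: float) -> float:
    """Inverse of the logit: 1 / (1 + exp(-eta))."""
    # Branch on the sign so that exp() is never called on a large positive number.
    if eta >= 0.0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)
```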
7. Poisson-Gaussian model
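Thus the density of the logits is
$$g(\eta, \mu, \Sigma) = \frac{1}{\sqrt{(2\pi)^M |\Sigma|}} \exp\left(-\frac{1}{2} (\eta - \mu)^T \Sigma^{-1} (\eta - \mu)\right).$$
Then the rates $P_j^m$ are evaluated as a posteriori
means for given $\mu, \Sigma$:
$$P_j^m = \frac{1}{D_j(\mu, \Sigma)} \int \frac{1}{1 + e^{-\eta_m}} \prod_{i=1}^{M} f\left(Y_j^i, \frac{N_j}{1 + e^{-\eta_i}}\right) g(\eta, \mu, \Sigma)\, d\eta,$$
where
$$D_j(\mu, \Sigma) = \int \prod_{m=1}^{M} f\left(Y_j^m, \frac{N_j}{1 + e^{-\eta_m}}\right) g(\eta, \mu, \Sigma)\, d\eta,$$
$j = 1, \dots, K$, $m = 1, \dots, M$.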
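The Gaussian density of the logit vector can be evaluated as in the sketch below, written with NumPy. The function name is an assumption, and the logarithm of the density is returned because it is better behaved numerically than the density itself.

```python
import numpy as np


def gaussian_log_density(eta, mu, sigma):
    """Log of the M-variate normal density with mean mu and covariance sigma."""
    eta = np.asarray(eta, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    m = mu.shape[0]
    diff = eta - mu
    # Solve sigma @ x = diff instead of forming the inverse explicitly.
    quad = diff @ np.linalg.solve(sigma, diff)
    sign, logdet = np.linalg.slogdet(sigma)
    if sign <= 0:
        raise ValueError("sigma must have a positive determinant")
    return -0.5 * (m * np.log(2.0 * np.pi) + logdet + quad)
```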
8. Maximum likelihood method
The Bayesian analysis is often related in statistics to the
minimization of a certain function, expressed as the
integral of a posteriori density. Thus, in the empirical
Bayesian approach, the unknown parameters $\mu, \Sigma$ are
estimated by the maximum likelihood method.
After some manipulation, we get the logarithmic likelihood function
$$L(\mu, \Sigma) = -\sum_{j=1}^{K} \ln \int \prod_{m=1}^{M} f\left(Y_j^m, \frac{N_j}{1 + e^{-\eta_m}}\right) g(\eta, \mu, \Sigma)\, d\eta = -\sum_{j=1}^{K} \ln D_j(\mu, \Sigma),$$
which has to be minimized to get estimates for the
parameters.
9. Derivatives of the maximum likelihood
function
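The likelihood function is differentiable many times
with respect to the parameters $\mu, \Sigma$,
and the respective first derivatives of this
function are as follows:
$$\frac{\partial L(\mu, \Sigma)}{\partial \mu} = -\Sigma^{-1} \sum_{j=1}^{K} \frac{1}{D_j(\mu, \Sigma)} \int (\eta - \mu) \prod_{m=1}^{M} f\left(Y_j^m, \frac{N_j}{1 + e^{-\eta_m}}\right) g(\eta, \mu, \Sigma)\, d\eta,$$
$$\frac{\partial L(\mu, \Sigma)}{\partial \Sigma} = \frac{1}{2} \Sigma^{-1} \left( K \Sigma - \sum_{j=1}^{K} \frac{1}{D_j(\mu, \Sigma)} \int (\eta - \mu)(\eta - \mu)^T \prod_{m=1}^{M} f\left(Y_j^m, \frac{N_j}{1 + e^{-\eta_m}}\right) g(\eta, \mu, \Sigma)\, d\eta \right) \Sigma^{-1}.$$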
10. Poisson-Gaussian model estimates
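The maximum likelihood estimates of the
parameters $\mu, \Sigma$ of the Poisson-Gaussian model are
found by solving the equations in which the first
derivatives are set equal to zero:
$$\mu = \frac{1}{K} \sum_{j=1}^{K} \frac{1}{D_j(\mu, \Sigma)} \int \eta \prod_{m=1}^{M} f\left(Y_j^m, \frac{N_j}{1 + e^{-\eta_m}}\right) g(\eta, \mu, \Sigma)\, d\eta,$$
$$\Sigma = \frac{1}{K} \sum_{j=1}^{K} \frac{1}{D_j(\mu, \Sigma)} \int (\eta - \mu)(\eta - \mu)^T \prod_{m=1}^{M} f\left(Y_j^m, \frac{N_j}{1 + e^{-\eta_m}}\right) g(\eta, \mu, \Sigma)\, d\eta.$$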
11. Poisson-Gaussian model estimates
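For instance, the “fixed point iteration” method
is useful to solve these equations in order to get
the maximum likelihood estimates of $\mu, \Sigma$:
$$\mu^{t+1} = \frac{1}{K} \sum_{j=1}^{K} \frac{1}{D_j(\mu^t, \Sigma^t)} \int \eta \prod_{m=1}^{M} f\left(Y_j^m, \frac{N_j}{1 + e^{-\eta_m}}\right) g(\eta, \mu^t, \Sigma^t)\, d\eta,$$
$$\Sigma^{t+1} = \frac{1}{K} \sum_{j=1}^{K} \frac{1}{D_j(\mu^t, \Sigma^t)} \int (\eta - \mu^t)(\eta - \mu^t)^T \prod_{m=1}^{M} f\left(Y_j^m, \frac{N_j}{1 + e^{-\eta_m}}\right) g(\eta, \mu^t, \Sigma^t)\, d\eta.$$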
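One step of such an iteration might be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: each integral is approximated by a weighted average over Gaussian draws (self-normalized importance sampling), the covariance update is centered at the current mean, and all names (`fixed_point_step`, `y`, `n`, `n_samples`) and the example data are hypothetical.

```python
import numpy as np


def fixed_point_step(y, n, mu, sigma, n_samples, rng):
    """One update (mu, sigma) -> (mu_new, sigma_new).

    y: (K, M) event counts; n: (K,) population sizes.
    Integrals over the logit vector are replaced by weighted averages
    over draws from N(mu, sigma); the weights are Poisson likelihoods.
    """
    y = np.asarray(y, dtype=float)
    n = np.asarray(n, dtype=float)
    mu = np.asarray(mu, dtype=float)
    k_pop, m_dim = y.shape
    mu_new = np.zeros(m_dim)
    sigma_new = np.zeros((m_dim, m_dim))
    for j in range(k_pop):
        eta = rng.multivariate_normal(mu, sigma, size=n_samples)  # (n_samples, M)
        lam = n[j] / (1.0 + np.exp(-eta))  # Poisson parameters for each draw
        # Log-likelihood up to the constant -ln(Y!), which cancels on normalizing.
        log_w = np.sum(y[j] * np.log(lam) - lam, axis=1)
        w = np.exp(log_w - log_w.max())  # shift by the maximum to avoid underflow
        w /= w.sum()
        diff = eta - mu
        mu_new += w @ eta                           # weighted mean of the draws
        sigma_new += (diff * w[:, None]).T @ diff   # weighted second moment about mu
    return mu_new / k_pop, sigma_new / k_pop


# Hypothetical data: K = 3 populations, M = 2 event types.
rng = np.random.default_rng(1)
y = [[3, 1], [2, 0], [5, 2]]
n = [1000, 800, 1500]
mu1, sigma1 = fixed_point_step(y, n, [-6.0, -7.0], 0.25 * np.eye(2), 2000, rng)
```

Repeating the step with the returned values as the new starting point gives the iteration; the stopping rule is a separate question.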
12. MCMC algorithm
The “fixed point iteration” method can be
realized by the Monte-Carlo Markov chain approach.
Let $t$ chains be generated, and in each chain we
generate multivariate Gaussian vectors
$$\eta_{j,k} \sim N(\mu^t, \Sigma^t), \quad k = 1, \dots, N^t,$$
where $N^t$ is the Monte-Carlo sample size at the $t$-th
step.
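Drawing such Gaussian vectors is a one-line operation in NumPy, as this minimal sketch shows. The function name, the seed, and the example parameters are assumptions made for illustration.

```python
import numpy as np


def sample_logits(mu, sigma, n_samples, rng):
    """Draw n_samples Gaussian vectors; the result has shape (n_samples, M)."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    return rng.multivariate_normal(mu, sigma, size=n_samples)


rng = np.random.default_rng(seed=0)
# Hypothetical two-dimensional example with a diagonal covariance matrix.
eta = sample_logits([0.0, 1.0], [[0.25, 0.0], [0.0, 0.25]], 2000, rng)
```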
13. MCMC algorithm
In order to avoid computational problems, when
the intermediate results are very small, we have
introduced the auxiliary function
M
rj
m
j
ln
f j (Y ,
m 1
Nj
Nj
M
1 e
m
m
j
)/
f j (Y ,
m 1
1 e
m
or
M
rj
m 1
Mj e
1 e
m
m
e
1 e
m
m
Y
m
j
1 e
ln
1 e
m
m
.
) ,
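The problem of very small intermediate results mentioned above can also be handled with the standard log-sum-exp device, sketched below. This is a generic technique named plainly here and is not claimed to be the auxiliary function of the slides; the function names and the example values are assumptions.

```python
import numpy as np


def normalized_weights(log_w):
    """Turn log-weights into weights that sum to 1 without underflow."""
    log_w = np.asarray(log_w, dtype=float)
    shifted = log_w - np.max(log_w)  # the largest term becomes exp(0) = 1
    w = np.exp(shifted)
    return w / w.sum()


def log_mean_exp(log_w):
    """Logarithm of the average of exp(log_w), computed stably."""
    log_w = np.asarray(log_w, dtype=float)
    top = np.max(log_w)
    return top + np.log(np.mean(np.exp(log_w - top)))


# exp() of each of these values underflows to 0.0 in double precision,
# yet their relative sizes are recovered after the shift.
log_w = np.array([-1000.0, -1001.0, -1002.0])
w = normalized_weights(log_w)
```

Only ratios of likelihood values enter the weighted averages, so subtracting a common constant in the log domain changes nothing except numerical safety.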
14. MCMC algorithm
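And then we get estimates of the parameters
$$\mu^{t+1} = \frac{1}{K} \sum_{j=1}^{K} \frac{\tilde{m}_j^t}{\tilde{D}_j^t}, \qquad \Sigma^{t+1} = \frac{1}{K} \sum_{j=1}^{K} \frac{\tilde{S}_j^t}{\tilde{D}_j^t},$$
where the Monte-Carlo estimators are as follows:
$$\tilde{D}_j^t = \sum_{k=1}^{N^t} r_j(\eta_{j,k}), \qquad \widetilde{D2}_j^t = \sum_{k=1}^{N^t} \left( r_j(\eta_{j,k}) \right)^2,$$
$$\tilde{m}_j^t = \sum_{k=1}^{N^t} \eta_{j,k}\, r_j(\eta_{j,k}), \qquad p_{j,m}^t = \sum_{k=1}^{N^t} \frac{r_j(\eta_{j,k})}{1 + e^{-\eta_{j,k,m}}},$$
$$\tilde{S}_j^t = \sum_{k=1}^{N^t} \left( \eta_{j,k} - \frac{\tilde{m}_j^t}{\tilde{D}_j^t} \right) \left( \eta_{j,k} - \frac{\tilde{m}_j^t}{\tilde{D}_j^t} \right)^T r_j(\eta_{j,k}).$$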
15. MCMC algorithm
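Next, the estimate of the log-likelihood function is
obtained using the Monte-Carlo estimate:
$$L^t = -\sum_{j=1}^{K} \ln \tilde{D}_j^t,$$
its sample variance estimate:
$$d^t = \sum_{j=1}^{K} \left( \frac{\widetilde{D2}_j^t\, N^t}{\left( \tilde{D}_j^t \right)^2} - 1 \right),$$
and the estimate of the event probabilities in the populations:
$$\tilde{P}_{j,m}^t = \frac{p_{j,m}^t}{\tilde{D}_j^t}.$$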
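The estimators of this slide can be sketched from a matrix of Monte-Carlo weights as follows. This is an illustration under assumptions: the array `r` of nonnegative weights (one row per population, one column per draw) is hypothetical, the weight sums are divided by the sample size so that they estimate an average, and the variance is the usual first-order (delta-method) approximation for the logarithm of a sample mean.

```python
import numpy as np


def loglik_and_variance(r):
    """Return (negative log-likelihood estimate, its approximate variance).

    r[j, k] is the nonnegative weight of draw k for population j.
    """
    r = np.asarray(r, dtype=float)
    n = r.shape[1]                 # Monte-Carlo sample size
    d = r.sum(axis=1)              # sum of weights per population
    d2 = (r ** 2).sum(axis=1)      # sum of squared weights per population
    neg_loglik = -np.sum(np.log(d / n))
    # First-order variance of ln(mean weight), summed over populations.
    variance = np.sum(n * d2 / d ** 2 - 1.0) / n
    return neg_loglik, variance
```

With equal weights the variance term is zero, which matches the intuition that a constant integrand carries no simulation error.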
16. MCMC algorithm
The Monte-Carlo chain can be terminated at the
$t$-th step if the estimates of two consecutive
steps differ insignificantly. Thus,
the hypothesis on the termination condition is
rejected, if
K
Ht
1
K
K
j 1
k 1
~
D 2tj
~ 2
D tj
ln
k
SP
k 1
k
1
k 1
k T
k
1
k 1
k
M
F
,v
17. MCMC algorithm
The following rule of sample size regulation is
implemented, so that large samples are
taken only at the moment of making the
decision on termination of the Monte-Carlo
Markov chain:
t
N
F
,v
t 1
N v
F
t
H
,v
where $F_{\alpha, v}$ is Fisher's quantile and
$\alpha$ is the significance level.
18. MCMC algorithm
Application of this rule allows a rational selection
of the sample size in the Monte-Carlo Markov chain to
ensure the convergence of the maximum
likelihood function.
19. Computer simulation
Next, we used familiar data to construct and
estimate this statistical model.
A random sample of $K = 10$ populations,
in which $M = 3$ events can occur, has been
simulated to explore the approach developed.
The logits of the probabilities are normally
distributed with these parameters:
$$\mu = \begin{pmatrix} 3 \\ 4 \\ 5 \end{pmatrix}; \qquad \Sigma = \begin{pmatrix} 0.25 & 0 & 0 \\ 0 & 0.25 & 0 \\ 0 & 0 & 0.25 \end{pmatrix}.$$
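A simulation of this kind might be set up as in the sketch below. Several inputs are assumptions, because the slides do not give them: the population sizes are drawn arbitrarily, and the mean logits in the example are hypothetical negative values chosen so that the simulated events are rare. The function name and the seed are also illustrative.

```python
import numpy as np


def simulate_counts(n, mu, sigma, rng):
    """Simulate event counts for K populations and M event types.

    Logit vectors are drawn from N(mu, sigma), converted to probabilities,
    and the counts are Poisson with parameter N_j times the probability.
    """
    n = np.asarray(n)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    eta = rng.multivariate_normal(mu, sigma, size=n.shape[0])  # (K, M) logits
    p = 1.0 / (1.0 + np.exp(-eta))                             # (K, M) probabilities
    y = rng.poisson(n[:, None] * p)                            # (K, M) counts
    return y, p


rng = np.random.default_rng(seed=2013)
n = rng.integers(1_000, 100_000, size=10)   # hypothetical population sizes, K = 10
mu = np.array([-3.0, -4.0, -5.0])           # hypothetical mean logits, M = 3
sigma = 0.25 * np.eye(3)                    # diagonal covariance with variance 0.25
y, p = simulate_counts(n, mu, sigma, rng)
```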
20. Computer simulation
Next, we have computed the Monte-Carlo
Markov chain of $t = 100$ estimators. To avoid
very small or very large sample sizes, the
following limits were applied:
$$500 \le N^k \le 17000.$$
The termination conditions started to be valid
after $t = 6$ iterations.
We obtained the following means of the parameters:
22. Conclusions
The empirical Bayesian approach applied in the estimation of
multi-dimensional frequency has been described in this work.
In this paper we:
• presented an iterative method of “fixed point iteration” to
compute the estimates;
• introduced the Monte-Carlo Markov Chain procedure with
adaptive regulation of the sample size and treatment of the
simulation error in a statistical manner;
• computed the empirical Bayesian estimation of unknown
parameters and probabilities of the events.
The approach developed can be applied in the analysis of
social and medical data.