J. Suzuki. "Bayesian network structure estimation based on the Bayesian/MDL criteria when both discrete and continuous variables are present". IEEE Data Compression Conference, pp. 307-316, Snowbird, Utah, April 2012.
Bayesian Network Structure Estimation
Based on the Bayesian/MDL Criteria
When Both Discrete and Continuous Variables are Present
Joe Suzuki
Osaka University
April 11, 2012
Joe Suzuki (Osaka University) Bayesian Network Structure Estimation Based on the Bayesian/MDL Criteria When Both DApril 11, 2012 1 / 17
2. Road Map
1 Problem
2 Density Estimation
3 Density Estimation in a General Sense
4 Structure Estimation in a General Sense
5 Summary
[Figure: candidate network structures over the nodes X, Y, Z, drawn as two rows of four diagrams.]
6. Problem
Structure Estimation
$X, Y, Z$: random variables over sets $A, B, C$
$\{(x_i, y_i, z_i)\}_{i=1}^n \in (A \times B \times C)^n$: $n$ examples independently emitted by $P(X, Y, Z)$
Structure Estimation: choose one among the eight structures based on $\{(x_i, y_i, z_i)\}_{i=1}^n$.
(The three-variable case $X, Y, Z$ extends to the $d$-variable case $\{X_j\}_{j=1}^d$ in a straightforward manner.)
7. Problem
Previous Works
Previous approaches assume that either
all $X_j$ are finite, or
all $X_j$ are Gaussian.
In reality, in any database some fields are discrete and other fields are continuous.
8. Problem
If A, B, C are finite
Given $x^n \in A^n$, $y^n \in B^n$, $z^n \in C^n$, we compute
$Q^n(x^n)$, $Q^n(y^n)$, $Q^n(z^n)$, $Q^n(x^n, y^n)$, $Q^n(x^n, z^n)$, $Q^n(y^n, z^n)$, $Q^n(x^n, y^n, z^n)$
For some prior probabilities $p_0, p_1, p_{00}, p_{01}, p_{10}, p_{11}$,
what $Y$ depends on is decided by which is larger between
$p_0 Q^n(y^n)$, $p_1 \dfrac{Q^n(x^n, y^n)}{Q^n(x^n)}$
and what $Z$ depends on is decided by which is the largest among
$p_{00} Q^n(z^n)$, $p_{01} \dfrac{Q^n(y^n, z^n)}{Q^n(y^n)}$, $p_{10} \dfrac{Q^n(x^n, z^n)}{Q^n(x^n)}$, $p_{11} \dfrac{Q^n(x^n, y^n, z^n)}{Q^n(x^n, y^n)}$
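The comparison for $Y$ can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not fix a particular $Q^n$, so the sketch uses the Krichevsky-Trofimov (Dirichlet-1/2 mixture) measure as one standard universal choice on a finite alphabet, treats a pair $(x_i, y_i)$ as a single symbol of an alphabet of size $|A| \cdot |B|$, and defaults to uniform priors. The names `log_q` and `parent_of_y` are hypothetical.

```python
import math
from collections import Counter

def log_q(seq, m):
    """log2 of the Krichevsky-Trofimov measure Q^n(seq) over an alphabet of size m
    (an assumed choice of universal measure, not prescribed by the slides)."""
    n = len(seq)
    # Q^n = Gamma(m/2) / Gamma(n + m/2) * prod_a Gamma(c_a + 1/2) / Gamma(1/2)
    val = math.lgamma(m / 2) - math.lgamma(n + m / 2)
    for c in Counter(seq).values():
        val += math.lgamma(c + 0.5) - math.lgamma(0.5)
    return val / math.log(2)

def parent_of_y(xs, ys, mx, my, p0=0.5, p1=0.5):
    """Compare p0 * Q(y^n) with p1 * Q(x^n, y^n) / Q(x^n) on the log scale.
    Returns 'none' if Y is judged independent of X, 'X' otherwise."""
    score_indep = math.log2(p0) + log_q(ys, my)
    pairs = list(zip(xs, ys))  # a pair is one symbol of an alphabet of size mx * my
    score_dep = math.log2(p1) + log_q(pairs, mx * my) - log_q(xs, mx)
    return "X" if score_dep > score_indep else "none"
```

The decision for $Z$ follows the same pattern with four scores instead of two. Working with logarithms avoids underflow, since $Q^n$ decays exponentially in $n$.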
9. Problem
Universal Coding
$A := \{0, 1, \dots, m - 1\}$ with $m \geq 2$
$x^n = (x_1, \dots, x_n) \in A^n$: independently emitted by unknown $P^n(x^n) := \prod_{i=1}^n P(x_i)$
$\varphi$: uniquely decodable coding $A^n \to \{0, 1\}^*$
$\varphi(x^n) \in \{0, 1\}^{\ell} \implies L_\varphi(x^n) := \ell$ (the codeword length)
$\varphi$ is universal if
$\dfrac{L_\varphi(x^n)}{n} \to H := \sum_{x \in A} -P(x) \log P(x)$
for any $P$; examples are LZ and CTW.
10. Problem
Why can $P^n$ be replaced by $Q^n$?
$Q^n$: a universal coding measure w.r.t. $A$:
$-\dfrac{1}{n} \log Q^n(x^n) \to H$ for any $P$, and $\sum_{x^n \in A^n} Q^n(x^n) \leq 1$,
such as $Q^n(x^n) := 2^{-L_\varphi(x^n)}$ if $\varphi$ is universal
Shannon-McMillan-Breiman: for any $P$,
$-\dfrac{1}{n} \log P^n(x^n) = \dfrac{1}{n} \sum_{i=1}^n \{-\log P(x_i)\} \to E[-\log P(X)] = H$
Universality:
$\dfrac{1}{n} \log \dfrac{P^n(x^n)}{Q^n(x^n)} \to 0$
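The universality property can be checked numerically. The sketch below assumes a binary alphabet and again takes the Krichevsky-Trofimov measure as the universal $Q^n$ (an assumption; the slides only require some universal coding measure). The source probability `p`, the seed, and the function names are hypothetical.

```python
import math
import random

def kt_log2(bits):
    """log2 Q^n(x^n) for the binary Krichevsky-Trofimov measure, computed sequentially:
    the next symbol b gets probability (count_b + 1/2) / (t + 1) after t symbols."""
    counts = [0, 0]
    total = 0.0
    for t, b in enumerate(bits):
        total += math.log2((counts[b] + 0.5) / (t + 1.0))
        counts[b] += 1
    return total

def redundancy_rate(n, p=0.3, seed=0):
    """(1/n) * log2( P^n(x^n) / Q^n(x^n) ) for an i.i.d. Bernoulli(p) sample of length n."""
    rng = random.Random(seed)
    bits = [1 if rng.random() < p else 0 for _ in range(n)]
    log_p = sum(math.log2(p if b else 1.0 - p) for b in bits)
    return (log_p - kt_log2(bits)) / n

if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, redundancy_rate(n))
```

For this choice of $Q^n$ the ratio $\log_2 (P^n / Q^n)$ grows only like $\frac{1}{2} \log_2 n$, so dividing by $n$ drives it to zero.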
11. Problem
Today’s Problem: What if A, B, C are not finite?
$X \in A = [0, 1)$: continuous
$Y \in B = \{1, 2, \dots\}$: discrete and infinite
$Z \in C = [0, 1) \cup \{1, 2, \dots\}$: neither continuous nor discrete
Without assuming that $A, B, C$ are either discrete or continuous:
What does universality, $\dfrac{1}{n} \log \dfrac{P^n(x^n)}{Q^n(x^n)} \to 0$, look like?
What does a universal measure such as $Q^n$ look like?
12. Density Estimation
If Density Function f exists for X
$A_0 := \{A\}$
$A_{k+1}$ is a refinement of $A_k$
Example 1: $A = [0, 1)$
$A_0 = \{[0, 1)\}$
$A_1 = \{[0, 1/2), [1/2, 1)\}$
$A_2 = \{[0, 1/4), [1/4, 1/2), [1/2, 3/4), [3/4, 1)\}$
. . .
$A_k = \{[0, 2^{-k}), [2^{-k}, 2 \cdot 2^{-k}), \dots, [(2^k - 1) 2^{-k}, 1)\}$
. . .
$s_k : A \to A_k$ (quantizer over $A$)
$s_k^n : A^n \to A_k^n$ (quantizer over $A^n$)
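A minimal sketch of the quantizer in Example 1, assuming that $A_k$ consists of the $2^k$ dyadic intervals of width $2^{-k}$ (which matches $A_1$ and $A_2$ as listed). A cell is represented by its integer index $j$, standing for $[j \cdot 2^{-k}, (j + 1) \cdot 2^{-k})$; this encoding and the function names are illustrative choices.

```python
import math

def s_k(x, k):
    """Index j of the dyadic cell [j * 2**-k, (j + 1) * 2**-k) of A_k that contains x."""
    if not 0.0 <= x < 1.0:
        raise ValueError("x must lie in [0, 1)")
    return math.floor(x * 2**k)

def s_k_n(xs, k):
    """Componentwise quantizer s_k^n applied to a sequence x^n."""
    return [s_k(x, k) for x in xs]

if __name__ == "__main__":
    print(s_k_n([0.1, 0.6, 0.99], 1))
    print(s_k_n([0.1, 0.6, 0.99], 3))
```

Because each $A_{k+1}$ refines $A_k$, the index at level $k$ is the index at level $k + 1$ with its last binary digit dropped.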
13. Density Estimation
$Q_k^n$: a universal coding measure w.r.t. $A_k$
$\lambda$: Lebesgue measure (width of an interval), $\lambda^n(s_k^n(x^n)) = \prod_{i=1}^n \lambda(s_k(x_i))$
$g_k^n(x^n) := \dfrac{Q_k^n(s_k^n(x^n))}{\lambda^n(s_k^n(x^n))}$
$\{\omega_k\}_{k=1}^\infty$: $\sum_k \omega_k = 1$, $\omega_k > 0$; $g^n(x^n) := \sum_k \omega_k g_k^n(x^n)$
$f_k^n(x^n) := \dfrac{P_k^n(s_k^n(x^n))}{\lambda^n(s_k^n(x^n))} = \prod_{i=1}^n \dfrac{P_k(s_k(x_i))}{\lambda(s_k(x_i))}$
If $\{A_k\}$ is s.t. $h(f_k) \to h(f)$ $(k \to \infty)$, for any $f^n$,
$\dfrac{1}{n} \log \dfrac{f^n(x^n)}{g^n(x^n)} \to 0$
(B. Ryabko, 2009)
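The construction of $g^n$ can be sketched as follows. Several choices here are assumptions made for illustration and are not taken from the slides: $Q_k^n$ is the Krichevsky-Trofimov measure on the $2^k$ dyadic cells, the infinite weight sequence is replaced by uniform weights $\omega_k = 1/K$ on $k = 1, \dots, K$ (so the sketch only resolves densities down to width $2^{-K}$), and the function names are hypothetical.

```python
import math
from collections import Counter

def kt_log2(seq, m):
    """log2 of the Krichevsky-Trofimov measure of seq over an alphabet of size m."""
    n = len(seq)
    val = math.lgamma(m / 2) - math.lgamma(n + m / 2)
    for c in Counter(seq).values():
        val += math.lgamma(c + 0.5) - math.lgamma(0.5)
    return val / math.log(2)

def log2_g(xs, K=10):
    """log2 g^n(x^n) for x_i in [0, 1), with g^n = sum_k w_k * g_k^n and
    g_k^n = Q_k^n(s_k^n(x^n)) / lambda^n(s_k^n(x^n))."""
    n = len(xs)
    terms = []
    for k in range(1, K + 1):
        cells = [math.floor(x * 2**k) for x in xs]  # quantizer s_k^n
        # every cell has width 2**-k, so -log2 lambda^n(...) = k * n
        terms.append(-math.log2(K) + kt_log2(cells, 2**k) + k * n)
    top = max(terms)  # log-sum-exp in base 2 to avoid overflow and underflow
    return top + math.log2(sum(2.0 ** (t - top) for t in terms))

if __name__ == "__main__":
    n = 1024
    uniform = [(i + 0.5) / n for i in range(n)]   # evenly spread over [0, 1)
    peaked = [x / 4 for x in uniform]             # concentrated on [0, 1/4)
    print(log2_g(uniform) / n, log2_g(peaked) / n)
```

For the evenly spread sample the true density is 1, so $\frac{1}{n} \log_2 g^n$ should stay close to 0; for the sample squeezed into $[0, 1/4)$ the true density is 4, and the estimate should move toward $\log_2 4 = 2$ per sample.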
14. Density Estimation in a General Sense
Exactly when does a density function f exist given X?
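$\mathcal{B}$: the Borel sets of $\mathbb{R}$
$\mu(D)$: the probability of $D \in \mathcal{B}$
$\lambda(D)$: the Lebesgue measure of $D \in \mathcal{B}$
$\mu$ is absolutely continuous w.r.t. $\lambda$
Equivalent conditions (Radon-Nikodym):
$\mu \ll \lambda$: for each $D \in \mathcal{B}$, $\lambda(D) = 0 \implies \mu(D) = 0$.
There exists $\dfrac{d\mu}{d\lambda} := f$ s.t. $\mu(D) = \int_{t \in D} f(t) \, d\lambda(t)$.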
15. Density Estimation in a General Sense
Density Estimation in a General Sense (Suzuki 2011)
$\mu$ is absolutely continuous w.r.t. $\eta$
Equivalent conditions (Radon-Nikodym):
$\mu \ll \eta$: for each $D \in \mathcal{B}$, $\eta(D) = 0 \implies \mu(D) = 0$.
There exists $\dfrac{d\mu}{d\eta} := f$ s.t. $\mu(D) = \int_{t \in D} f(t) \, d\eta(t)$.
Example 2: $\mu(\{j\}) > 0$, $\eta(\{j\}) := \dfrac{1}{j(j + 1)}$, $j \in B = \{1, 2, \dots\}$
$\implies \mu \ll \eta \iff$ there exists $f$ s.t. $\mu(D) = \sum_{j \in D} f(j) \eta(\{j\})$, $D \subseteq B$
In fact, $f(j) = \dfrac{\mu(\{j\})}{\eta(\{j\})}$ satisfies the condition.
(The Lebesgue integral $\int$ does not distinguish the discrete $\sum$ from the continuous $\int$.)
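Example 2 can be verified with exact rational arithmetic. The sketch assumes, purely for illustration, the geometric distribution $\mu(\{j\}) = 2^{-j}$; the slides leave $\mu$ unspecified. It checks that $\eta$ has total mass approaching 1 (the partial sums telescope to $N / (N + 1)$) and that $f = \mu / \eta$ reproduces $\mu(D)$.

```python
from fractions import Fraction

def eta(j):
    """Reference measure eta({j}) = 1 / (j (j + 1)) on B = {1, 2, ...}."""
    return Fraction(1, j * (j + 1))

def mu(j):
    """Hypothetical probability mu({j}) = 2**-j (an assumption for this example)."""
    return Fraction(1, 2**j)

def f(j):
    """Generalized density d mu / d eta evaluated at j."""
    return mu(j) / eta(j)

def mu_of(D):
    """mu(D) recovered as the sum over j in D of f(j) * eta({j})."""
    return sum(f(j) * eta(j) for j in D)

if __name__ == "__main__":
    print(sum(eta(j) for j in range(1, 100)))  # partial mass of eta on {1, ..., 99}
    print(f(1), f(2), mu_of({1, 2, 3}))
```

Any reference measure that puts positive mass on every $j$ would work equally well; the density $f$ changes with the choice of $\eta$, while the product $f(j) \eta(\{j\})$ does not.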
16. Density Estimation in a General Sense
$B_0 := \{B\}$ with $B = \{1, 2, \dots\}$
$B_1 := \{\{1\}, \{2, 3, \dots\}\}$
$B_2 := \{\{1\}, \{2\}, \{3, 4, \dots\}\}$
. . .
$B_k := \{\{1\}, \{2\}, \dots, \{k\}, \{k + 1, k + 2, \dots\}\}$
. . .
$s_k : B \to B_k$, $s_k^n : B^n \to B_k^n$
$g_k^n(y^n) := \dfrac{Q_k^n(s_k^n(y^n))}{\eta^n(s_k^n(y^n))}$, $g^n(y^n) := \sum_{k=1}^\infty \omega_k g_k^n(y^n)$
If $\{B_k\}$ is s.t. $h(f_k) \to h(f)$ $(k \to \infty)$, for any $f^n$,
$\dfrac{1}{n} \log \dfrac{f^n(y^n)}{g^n(y^n)} \to 0$
($g^n(y^n) \prod_{i=1}^n \eta(\{y_i\})$ estimates $P(y^n) = f^n(y^n) \prod_{i=1}^n \eta(\{y_i\})$.)
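The same recipe on the countable set $B$ can be sketched as below. As before, the specific choices are assumptions for illustration: $\eta(\{j\}) = 1 / (j(j + 1))$ as in Example 2 (so the tail cell $\{k + 1, k + 2, \dots\}$ has $\eta$-measure $1 / (k + 1)$), the Krichevsky-Trofimov measure for $Q_k^n$, and uniform weights over $k = 1, \dots, K$ in place of an infinite sequence $\{\omega_k\}$. Function names are hypothetical.

```python
import math
from collections import Counter

def kt_log2(seq, m):
    """log2 of the Krichevsky-Trofimov measure of seq over an alphabet of size m."""
    n = len(seq)
    val = math.lgamma(m / 2) - math.lgamma(n + m / 2)
    for c in Counter(seq).values():
        val += math.lgamma(c + 0.5) - math.lgamma(0.5)
    return val / math.log(2)

def eta_cell(j, k):
    """eta-measure of the cell of B_k with index j; j = k + 1 denotes the tail cell."""
    return 1.0 / (j * (j + 1)) if j <= k else 1.0 / (k + 1)

def log2_g(ys, K=8):
    """log2 g^n(y^n), with g_k^n = Q_k^n(s_k^n(y^n)) / eta^n(s_k^n(y^n))."""
    terms = []
    for k in range(1, K + 1):
        cells = [min(y, k + 1) for y in ys]  # quantizer s_k: values above k share one cell
        log_eta = sum(math.log2(eta_cell(c, k)) for c in cells)
        terms.append(-math.log2(K) + kt_log2(cells, k + 1) - log_eta)
    top = max(terms)
    return top + math.log2(sum(2.0 ** (t - top) for t in terms))

def log2_prob_estimate(ys, K=8):
    """log2 of g^n(y^n) * prod_i eta({y_i}), the estimate of P(y^n)."""
    return log2_g(ys, K) + sum(-math.log2(y * (y + 1)) for y in ys)

if __name__ == "__main__":
    ys = [1] * 512 + [2] * 256 + [3] * 128 + [4] * 64 + [5] * 64
    print(log2_prob_estimate(ys) / len(ys))
```

The sample in the usage lines has empirical entropy 1.875 bits, so the per-sample value of `log2_prob_estimate` should sit slightly below $-1.875$, the gap being the coding overhead of the universal measure.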
17. Density Estimation in a General Sense
Estimation of Simultaneous Density Functions
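Example 3: $A \times B$ (based on Examples 1, 2 for $A$, $B$)
$\mu \ll \lambda$ and $\mu \ll \eta$
$A_0 \times B_0 = \{A\} \times \{B\} = [0, 1) \times \{1, 2, \dots\}$
$A_1 \times B_1$
$A_2 \times B_2$
. . .
$A_k \times B_k$
. . .
$s_k : A \times B \to A_k \times B_k$
If $\{A_k \times B_k\}$ is s.t. $h(f_k) \to h(f)$ $(k \to \infty)$, for any $f^n$, $g^n$ can be constructed so that
$\dfrac{1}{n} \log \dfrac{f^n(x^n, y^n)}{g^n(x^n, y^n)} \to 0 \quad (1)$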
18. Structure Estimation in a General Sense
Structure Estimation in a General Sense
Estimate the generalized density functions
$f_X^n(x^n)$, $f_Y^n(y^n)$, $f_Z^n(z^n)$,
$f_{XY}^n(x^n, y^n)$, $f_{XZ}^n(x^n, z^n)$, $f_{YZ}^n(y^n, z^n)$, $f_{XYZ}^n(x^n, y^n, z^n)$
by
$g_X^n(x^n)$, $g_Y^n(y^n)$, $g_Z^n(z^n)$,
$g_{XY}^n(x^n, y^n)$, $g_{XZ}^n(x^n, z^n)$, $g_{YZ}^n(y^n, z^n)$, $g_{XYZ}^n(x^n, y^n, z^n)$
so that we can compare
$p_0 g_Y^n(y^n)$, $p_1 \dfrac{g_{XY}^n(x^n, y^n)}{g_X^n(x^n)}$
for $Y$, and
$p_{00} g_Z^n(z^n)$, $p_{01} \dfrac{g_{YZ}^n(y^n, z^n)}{g_Y^n(y^n)}$, $p_{10} \dfrac{g_{XZ}^n(x^n, z^n)}{g_X^n(x^n)}$, $p_{11} \dfrac{g_{XYZ}^n(x^n, y^n, z^n)}{g_{XY}^n(x^n, y^n)}$
for $Z$.
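Once the seven values $\log_2 g^n$ are available, the comparison itself is mechanical. The sketch below assumes they are supplied in a dictionary keyed by the variable subset, and that the priors are uniform by default; the function name, the key convention, and the numbers in the usage lines are all hypothetical.

```python
import math

def select_structure(log2_g, p_y=(0.5, 0.5), p_z=(0.25, 0.25, 0.25, 0.25)):
    """Pick the parent sets of Y and Z from log2 g^n values.

    log2_g maps 'X', 'Y', 'Z', 'XY', 'XZ', 'YZ', 'XYZ' to log2 g^n of that subset.
    p_y = (p0, p1) and p_z = (p00, p01, p10, p11) are the prior probabilities."""
    y_scores = {
        "none": math.log2(p_y[0]) + log2_g["Y"],
        "X": math.log2(p_y[1]) + log2_g["XY"] - log2_g["X"],
    }
    z_scores = {
        "none": math.log2(p_z[0]) + log2_g["Z"],
        "Y": math.log2(p_z[1]) + log2_g["YZ"] - log2_g["Y"],
        "X": math.log2(p_z[2]) + log2_g["XZ"] - log2_g["X"],
        "XY": math.log2(p_z[3]) + log2_g["XYZ"] - log2_g["XY"],
    }
    return max(y_scores, key=y_scores.get), max(z_scores, key=z_scores.get)

if __name__ == "__main__":
    # Made-up values: X and Y are jointly cheaper to describe, Z gains nothing.
    scores = {"X": -100.0, "Y": -100.0, "Z": -100.0,
              "XY": -150.0, "XZ": -203.0, "YZ": -203.0, "XYZ": -256.0}
    print(select_structure(scores))
```

The two choices for $Y$ and four for $Z$ together enumerate the eight candidate structures.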
19. Summary
Summary
Universal measure without assuming either discrete or continuous:
$\dfrac{1}{n} \log \dfrac{f^n(x^n)}{g^n(x^n)} \to 0$
$f^n(x^n) = \dfrac{d\mu^n}{d\eta^n}(x^n)$, $g^n(x^n) = \dfrac{d\nu^n}{d\eta^n}(x^n)$: extended density functions
Many applications based on the same approach:
Estimation of Markov orders (discrete time and continuous values)
Estimation of mutual information and its application to Chow-Liu
Future Work:
Realistic settings of $\{A_k\}$, $\{\omega_k\}$ based on a priori information
Development of structure estimation modules