(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
Expectation propagation for latent Dirichlet allocation
1. Expectation-Propagation for Latent Dirichlet Allocation
Tomonari MASADA @ Nagasaki University
November 22, 2018
1
This manuscript contains a derivation of the update formulae of expectation-propagation (EP) presented
in the following paper:
Thomas Minka and John Lafferty.
Expectation-Propagation for the Generative Aspect Model.
in Proc. of the 18th Conference on Uncertainty in Artificial Intelligence, pp. 352-359, 2002.1
2
A joint distribution for document d in LDA can be written as below.
p(x, θd|α) = p(θd|α)p(x|θd)
= p(θd|α)
nd
i=1
p(xdi|θd) = p(θd|α)
nd
i=1 k
θdkφkxdi
= p(θd|α)
w k
θdkφkw
ndw
, (1)
where p(θd|α) is a Dirichlet prior distribution.
We approximate p(w|θd) = k θdkφkw by tw(θd) = sw k θβwk
dk .
Then we obtain an approximated joint distribution as follows:
p(x, θd|α) ≈ p(θd|α)
w
tw(θd)ndw
= p(θd|α)
w
sw
k
θβwk
dk
ndw
=
Γ( k αk)
k Γ(αk) w
sndw
w
k
θ
αk−1+ w βwkndw
dk (2)
Let γdk = αk + w βwkndw. That is,
p(xd, θd|α) ≈
Γ( k αk)
k Γ(αk) w
sndw
w
k
θγdk−1
dk . (3)
An approximated posterior q(θd) can be obtained as follows:
q(θd) =
Γ( k αk)
k Γ(αk) w sndw
w k θγdk−1
dk
Γ( k αk)
k Γ(αk) w sndw
w k θγdk−1
dk dθd
= k θγdk−1
dk
k θγdk−1
dk dθd
=
Γ( k γdk)
k Γ(γdk)
k
θγdk−1
dk . (4)
1http://research.microsoft.com/en-us/um/people/minka/papers/aspect/
1
2. 3
For each word token in document d, we remove its contribution to w tw(θd)ndw
= w sw k θβwk
dk
ndw
.
When the token is a token of word w, this corresponds to a division by sw k θβwk
dk .
Note that we here consider a removal of word token, not a removal of word type.
Then we obtain an ‘old’ posterior after this division as follows:
qw
(θd) =
Γ k(γdk − βwk)
k Γ(γdk − βwk)
k
θγdk−βwk−1
dk =
Γ( k γ
w
dk )
k Γ(γ
w
dk ) k
θ
γ
w
dk −1
dk , (5)
where γ
w
dk ≡ γdk − βwk.
By combining p(w|θd) = k θdkΦkw with qw
(θd), we obtain a something similar to joint distribution
as follows:
(
k
θdkφkw) ·
Γ( k γ
w
dk )
k Γ(γ
w
dk ) k
θ
γ
w
dk −1
dk .
Therefore, by integrating θd out, we obtain a something similar to evidence as follows:
k
θdkφkw
Γ( k γ
w
dk )
k Γ(γ
w
dk ) k
θ
γ
w
dk
−1
dk dθd = k φkwγ
w
dk
k γ
w
dk
.
Consequently, we obtain a new posterior as follows:
˜q(θd) = k γ
w
dk
k φkwγ
w
dk k
θdkφkw
Γ( k γ
w
dk )
k Γ(γ
w
dk ) k
θ
γ
w
dk −1
dk
=
ˆγd
w
k φkwγ
w
dk k
θdkφkw
Γ( ˆγd
w
)
k Γ(γ
w
dk ) k
θ
γ
w
dk −1
dk , (6)
where ˆγd
w
≡ k γ
w
dk .
4
Based on Eq. (6), we calculate E˜q[θdk] and E˜q[θ2
dk].
E˜q[θdk] = θdk ˜q(θd)dθd
= θdk
ˆγd
w
k φkwγ
w
dk k
θdkφkw
Γ( ˆγd
w
)
k Γ(γ
w
dk ) k
θ
γ
w
dk −1
dk dθd
=
ˆγd
w
k φkwγ
w
dk
·
Γ( ˆγd
w
)
k Γ(γ
w
dk )
θdk
k
θdkφkw
k
θ
γ
w
dk −1
dk dθd
=
ˆγd
w
k φkwγ
w
dk
·
Γ( ˆγd
w
)
k Γ(γ
w
dk )
θ2
dkφkw
k
θ
γ
w
dk −1
dk dθd +
k =k
θdkθdk φk w
k
θ
γ
w
dk −1
dk dθd
=
ˆγd
w
k φkwγ
w
dk
·
Γ( ˆγd
w
)
k Γ(γ
w
dk )
φkw θ2
dk
k
θ
γ
w
dk −1
dk dθd +
k =k
φk w θdkθdk
k
θ
γ
w
dk −1
dk dθd
=
ˆγd
w
k φkwγ
w
dk
·
Γ( ˆγd
w
)
k Γ(γ
w
dk )
φkw θ
γ
w
dk +1
dk
l=k
θ
γ
w
dl −1
dl dθd +
k =k
φk w θ
γ
w
dk
dk θ
γ
w
dk
dk
l=k∧l=k
θ
γ
w
dl −1
dl dθd
It should be noted that k Γ(γdk)
Γ( k γdk) = k θγdk−1
dk dθd holds for any γd = (γd1, . . . , γdK). This is how
we obtain the normalizing constant of Dirichlet distribution. Therefore, θ
γ
w
dk +1
dk l=k θ
γ
w
dl −1
dl dθd =
2
3. Γ(γ
w
dk +2) l=k Γ(γ
w
dl )
Γ( ˆγd
w+2)
. Further, θ
γ
w
dk
dk θ
γ
w
dk
dk l=k∧l=k θ
γ
w
dl −1
dl dθd =
Γ(γ
w
dk +1)Γ(γ
w
dk
+1) l=k∧l=k Γ(γ
w
dl )
Γ( ˆγd
w+2)
. With
these results, we can continue the formula rearrangement as follows:
∴ E˜q[θdk] =
ˆγd
w
k φkwγ
w
dk
·
Γ( ˆγd
w
)
k Γ(γ
w
dk )
φkw
Γ(γ
w
dk + 2) l=k Γ(γ
w
dl )
Γ( ˆγd
w
+ 2)
+
k =k
φk w
Γ(γ
w
dk + 1)Γ(γ
w
dk + 1) l=k∧l=k Γ(γ
w
dl )
Γ( ˆγd
w
+ 2)
=
ˆγd
w
k φkwγ
w
dk
φkw
Γ( ˆγd
w
)
k Γ(γ
w
dk )
·
Γ(γ
w
dk + 2) l=k Γ(γ
w
dl )
Γ( ˆγd
w
+ 2)
+
k =k
φk w
Γ( ˆγd
w
)
k Γ(γ
w
dk )
·
Γ(γ
w
dk + 1)Γ(γ
w
dk + 1) l=k∧l=k Γ(γ
w
dl )
Γ( ˆγd
w
+ 2)
=
ˆγd
w
k φkwγ
w
dk
φkw
γ
w
dk (γ
w
dk + 1)
ˆγd
w
( ˆγd
w
+ 1)
+
k =k
φk w
γ
w
dk γ
w
dk
ˆγd
w
( ˆγd
w
+ 1)
=
ˆγd
w
k φkwγ
w
dk
·
γ
w
dk
ˆγd
w
·
φkw + k φkwγ
w
dk
ˆγd
w
+ 1
(7)
E˜q[θ2
dk] = θ2
dk ˜q(θd)dθd
= θ2
dk
ˆγd
w
k φkwγ
w
dk k
θdkφkw
Γ( ˆγd
w
)
k Γ(γ
w
dk ) k
θ
γ
w
dk −1
dk dθd
=
ˆγd
w
k φkwγ
w
dk
φkw
γ
w
dk (γ
w
dk + 1)(γ
w
dk + 2)
ˆγd
w
( ˆγd
w
+ 1)( ˆγd
w
+ 2)
+
k =k
φk w
γ
w
dk (γ
w
dk + 1)γ
w
dk
ˆγd
w
( ˆγd
w
+ 1)( ˆγd
w
+ 2)
=
ˆγd
w
k φkwγ
w
dk
·
γ
w
dk
ˆγd
w
·
γ
w
dk + 1
ˆγd
w
+ 1
·
2φkw + k φkwγ
w
dk
ˆγd
w
+ 2
. (8)
We determine a Dirichlet distribution Dir(γd) so that the mean
γdk
ˆγd
and the average second moment2
1
K k
γdk(γdk+1)
ˆγd(ˆγd+1) are equal to those of ˜q(θd), where ˆγd = k γdk. This procedure is called moment
matching.
To be precise, we obtain γd by solving the following equations:
γdk
ˆγd
= E˜q[θdk] for each k, and (9)
1
K
k
γdk(γdk + 1)
ˆγd(ˆγd + 1)
=
1
K
k
E˜q[θ2
dk] . (10)
2http://www.ucl.ac.uk/statistics/research/pdfs/135.zip
3
4. We can obtain γdk as below.
γdk = E˜q[θdk]ˆγd
k
E˜q[θdk]ˆγd(E˜q[θdk]ˆγd + 1)
ˆγd(ˆγd + 1)
=
k
E˜q[θ2
dk]
k
E˜q[θdk](E˜q[θdk]ˆγd + 1) =
k
E˜q[θ2
dk](ˆγd + 1)
k
E˜q[θdk]2
ˆγd +
k
E˜q[θdk] =
k
E˜q[θ2
dk]ˆγd +
k
E˜q[θ2
dk]
k
E˜q[θdk]2
− E˜q[θ2
dk] ˆγd =
k
E˜q[θ2
dk] − E˜q[θdk]
∴ ˆγd = k(E˜q[θ2
dk] − E˜q[θdk])
k(E˜q[θdk]2 − E˜q[θ2
dk])
(11)
∴ γdk = k(E˜q[θ2
dk] − E˜q[θdk])
k(E˜q[θdk]2 − E˜q[θ2
dk])
· E˜q[θdk] (12)
5
We now have a Dirichlet distribution that approxiates the posterior ˜q(θd) in Eq. (6), i.e.,
˜q(θd) = k γ
w
dk
k φkwγ
w
dk k
θdkφkw
Γ( k γ
w
dk )
k Γ(γ
w
dk ) k
θ
γ
w
dk −1
dk .
To obtain a density function of Dirichlet distribution from Eq. (6), we replace the term ( k θdkφkw)
by tw(θd) = sw k θ
βwk
dk . We set the resulting function equal to the density function of Dir(γd) and obtain
the following equation:
k γ
w
dk
k φkwγ
w
dk
sw
k
θ
βwk
dk
Γ( k γ
w
dk )
k Γ(γ
w
dk ) k
θ
γ
w
dk −1
dk =
Γ( k γdk)
k Γ(γdk)
k
θ
γdk−1
dk . (13)
Consequently, sw and βwk are determined as follows:
βwk = γdk − γ
w
dk , (14)
sw = k φkwγ
w
dk
k γ
w
dk
Γ( k γdk)
k Γ(γdk)
k Γ(γ
w
dk )
Γ( k γ
w
dk )
. (15)
4