2. Topic Model: Terminology
• Document Model
• Word: element in vocabulary set
• Document: collection of words
• Corpus: collection of documents
• Topic Model
• Topic: distribution over words in the vocabulary
• Document is represented by (latent) mixture of topics
• $p(w \mid d) = \sum_z p(w \mid z)\, p(z \mid d)$  ($z$: topic; see the numeric sketch below)
• Note: a document is a collection of words (not a sequence)
• We call this the bag-of-words assumption
• In probability, this is the exchangeability assumption
• $p(w_1, \dots, w_N) = p(w_{\sigma(1)}, \dots, w_{\sigma(N)})$  ($\sigma$: permutation)
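To make the decomposition concrete, here is a minimal numeric sketch of computing $p(w \mid d)$ by marginalizing over topics; all values (vocabulary, topic distributions, mixture weights) are made-up toy numbers, not from any real corpus:

```python
# Toy numeric check of p(w|d) = sum_z p(w|z) p(z|d); all values are made up.
import numpy as np

vocab = ["game", "team", "stock", "market"]        # hypothetical 4-word vocabulary
p_w_given_z = np.array([[0.45, 0.45, 0.05, 0.05],  # topic 0, e.g. "sports"
                        [0.05, 0.05, 0.45, 0.45]]) # topic 1, e.g. "finance"
p_z_given_d = np.array([0.7, 0.3])                 # latent topic mixture of one document

# Marginalize the latent topic z out to get the document's word distribution.
p_w_given_d = p_z_given_d @ p_w_given_z
print(dict(zip(vocab, p_w_given_d.round(3))))
# Bag-of-words: only word counts matter, so any reordering of the document's
# words leaves its probability unchanged (exchangeability).
```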
14. LDA: Variational EM
• Variational EM (EM: Expectation Maximization)
• E-step: optimize local parameters $\lambda, \gamma, \varphi$ (holding $\alpha, \eta$ fixed)
• M-step: optimize global parameters $\alpha, \eta$ (holding $\lambda, \gamma, \varphi$ fixed)
• Each subproblem is a simple one-variable constrained optimization
• We can solve it by setting the derivative of the Lagrangian to zero¹
• e.g. optimize $L$ over $\varphi$ (since $\varphi_n \sim$ Multinomial, $\sum_{i=1}^{k} \varphi_{ni} = 1$); see the code sketch below
1. In fact, $L_{[\alpha]}$ cannot be solved analytically. The authors suggest using the Newton-Raphson method for an efficient implementation. See A.3 and A.4 of Blei et al. (2003) for details.
Source: Blei et al., JMLR 2003
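As a concrete illustration, below is a minimal sketch of the per-document E-step coordinate updates derived in Blei et al. (2003): $\varphi_{ni} \propto \beta_{i,w_n} \exp(\Psi(\gamma_i))$ and $\gamma_i = \alpha_i + \sum_n \varphi_{ni}$. The function name, toy inputs, and iteration count are illustrative assumptions, not from the slides:

```python
# Sketch of the per-document variational E-step (coordinate ascent on gamma, phi).
import numpy as np
from scipy.special import digamma

def e_step(word_ids, alpha, beta, n_iter=50):
    """Optimize the local variational parameters (gamma, phi) for one document."""
    k = len(alpha)
    gamma = alpha + len(word_ids) / k              # initialization used in the paper
    for _ in range(n_iter):
        # phi update: closed-form solution of the one-variable Lagrangian
        # subproblem; the simplex constraint sum_i phi_ni = 1 becomes a
        # row normalization.
        phi = beta[:, word_ids].T * np.exp(digamma(gamma))
        phi /= phi.sum(axis=1, keepdims=True)
        gamma = alpha + phi.sum(axis=0)            # gamma update
    return gamma, phi

# Toy run: 2 topics, 4-word vocabulary, one 3-word document (ids into the vocab).
alpha = np.array([0.5, 0.5])
beta = np.array([[0.45, 0.45, 0.05, 0.05],         # rows: p(w|z), each sums to 1
                 [0.05, 0.05, 0.45, 0.45]])
gamma, phi = e_step(np.array([0, 1, 2]), alpha, beta)
print(gamma.round(3))
print(phi.round(3))
```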
22. de Finetti's theorem
• Q. We only assumed exchangeability (not i.i.d.)
$p(w_1, \dots, w_N) = p(w_{\sigma(1)}, \dots, w_{\sigma(N)})$  ($\sigma$: permutation)
• Why is it reasonable to factorize $p(w \mid \beta, z)$? ⇒ de Finetti’s theorem!
• Statement: an (infinitely) exchangeable sequence of r.v.s is a mixture of conditionally i.i.d. sequences
• Since each word is generated by its topic (a fixed conditional distribution)
and topics are exchangeable within a document, by de Finetti’s thm,
there is a mixing distribution $p(\theta)$ s.t.
$p(\mathbf{w}, \mathbf{z}) = \int p(\theta) \Big( \prod_{n=1}^{N} p(z_n \mid \theta)\, p(w_n \mid z_n) \Big)\, d\theta$
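To see the exchangeability claim numerically, here is a toy sketch in which the mixing distribution $p(\theta)$ is discrete (so the integral becomes a finite sum) and $z$ is also summed out; all numbers are made up for illustration:

```python
# Toy check: under the de Finetti mixture, the word probability is invariant
# to permutations of the document. Discrete p(theta) turns the integral into a sum.
import itertools
import numpy as np

thetas = np.array([[0.9, 0.1], [0.2, 0.8]])        # candidate topic proportions theta
p_theta = np.array([0.6, 0.4])                     # mixing distribution p(theta)
beta = np.array([[0.45, 0.45, 0.05, 0.05],         # p(w|z), rows are topics
                 [0.05, 0.05, 0.45, 0.45]])

def p_words(word_ids):
    """p(w) = sum_theta p(theta) prod_n [ sum_z p(z|theta) p(w_n|z) ]."""
    per_word = thetas @ beta[:, word_ids]          # p(w_n|theta), shape (2, N)
    return float(p_theta @ per_word.prod(axis=1))

doc = (0, 1, 3)
for perm in itertools.permutations(doc):
    print(perm, round(p_words(list(perm)), 8))     # same value for every ordering
```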