The document describes the sum-product algorithm, an efficient exact inference algorithm for finding marginals in tree-structured graphical models. The algorithm works by:
1) Representing the factor graph as a set of factor nodes connected by links to variable nodes.
2) Passing "messages" along the links in both directions between variable and factor nodes. Each message is a function of the variable at the variable-node end of its link.
3) Computing the marginal probability of a variable as the product of all incoming messages to that variable node.
2. • Factor graph
• undirected tree, directed tree, polytree (F8.43)
• Goal:
• Obtain an efficient, exact inference algorithm for finding marginals
• Share computation efficiently when several marginals are required
3. • Calculate the marginal for a particular variable $x$
All variables are assumed hidden for now. Later we shall see how to modify the algorithm to incorporate evidence corresponding to observed variables. By definition, the marginal is obtained by summing the joint distribution over all variables except $x$, so that
$$p(x) = \sum_{\mathbf{x} \setminus x} p(\mathbf{x}) \tag{8.61}$$
where $\mathbf{x} \setminus x$ denotes the set of variables in $\mathbf{x}$ with variable $x$ omitted. The idea is to substitute for $p(\mathbf{x})$ using the factor graph expression (8.59) and then interchange summations and products in order to obtain an efficient algorithm.
• Joint distribution in the form of a product of factors
Consider the fragment of graph shown in Figure 8.46, in which the tree structure of the graph allows us to partition the factors in the joint distribution into groups, with one group associated with each of the factor nodes that is a neighbour of the variable node $x$. The joint distribution can then be written as a product of the form
$$p(\mathbf{x}) = \prod_{s \in \mathrm{ne}(x)} F_s(x, X_s) \tag{8.62}$$
where $\mathrm{ne}(x)$ denotes the set of factor nodes that are neighbours of $x$, $X_s$ denotes the set of all variables in the subtree connected to the variable node $x$ via the factor node $f_s$, and $F_s(x, X_s)$ represents the product of all the factors in the group associated with factor $f_s$.
Chapter 8. GRAPHICAL MODELS
Figure 8.46: A fragment of a factor graph illustrating the evaluation of the marginal $p(x)$.
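As a point of comparison for what follows, the definition (8.61) can be evaluated directly by enumerating every joint configuration. The sketch below does this for a small chain of binary variables; the factor tables `fa` and `fb`, their values, and the helper name `brute_force_marginal` are illustrative assumptions, not taken from the slides. The cost grows exponentially with the number of variables, which is the inefficiency the sum-product algorithm avoids.

```python
from itertools import product

# Hypothetical factors over binary variables (table values chosen arbitrarily).
def fa(x1, x2):
    return [[1.0, 2.0], [3.0, 4.0]][x1][x2]

def fb(x2, x3):
    return [[2.0, 1.0], [1.0, 3.0]][x2][x3]

def joint(x1, x2, x3):
    """Unnormalized joint written as a product of factors, as in (8.59)."""
    return fa(x1, x2) * fb(x2, x3)

def brute_force_marginal(joint_fn, n_vars, index, n_states=2):
    """Sum the joint over every variable except the one at `index`, as in (8.61)."""
    marginal = [0.0] * n_states
    for assignment in product(range(n_states), repeat=n_vars):
        marginal[assignment[index]] += joint_fn(*assignment)
    return marginal

# Unnormalized marginal of x2 (position 1 in the argument list).
marginal_x2 = brute_force_marginal(joint, n_vars=3, index=1)
print(marginal_x2)
```

The loop visits all `n_states ** n_vars` configurations, so this is only usable as a correctness check on very small graphs.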
4. • Interchange sum and product (F 8.61, F 8.62 -> F 8.63)
Substituting (8.62) into (8.61) and interchanging the sums and products, we obtain
$$p(x) = \prod_{s \in \mathrm{ne}(x)} \left[ \sum_{X_s} F_s(x, X_s) \right] = \prod_{s \in \mathrm{ne}(x)} \mu_{f_s \to x}(x) \tag{8.63}$$
• F 8.64: message from factor node $f_s$ to variable node $x$
Here we have introduced a set of functions $\mu_{f_s \to x}(x)$, defined by
$$\mu_{f_s \to x}(x) \equiv \sum_{X_s} F_s(x, X_s) \tag{8.64}$$
which can be viewed as messages from the factor nodes $f_s$ to the variable node $x$.
• F 8.63: the required marginal $p(x)$ is given by the product of all the incoming messages arriving at node $x$.
5. • Evaluate these messages
In order to evaluate these messages, we again turn to Figure 8.46 and note that each factor $F_s(x, X_s)$ is described by a factor (sub-)graph and so can itself be factorized. In particular, we can write
$$F_s(x, X_s) = f_s(x, x_1, \ldots, x_M)\, G_1(x_1, X_{s1}) \cdots G_M(x_M, X_{sM}) \tag{8.65}$$
where, for convenience, we have denoted the variables associated with factor $f_s$, in addition to $x$, by $x_1, \ldots, x_M$. This factorization is illustrated in Figure 8.47. Note that the set of variables $\{x, x_1, \ldots, x_M\}$ is the set of variables on which the factor $f_s$ depends, and so it can also be denoted $\mathbf{x}_s$, using the notation of (8.59).
Section 8.4. Inference in Graphical Models
Figure 8.47: Illustration of the factorization of the subgraph associated with factor node $f_s$.
• F 8.66, F 8.67
Substituting (8.65) into (8.64) we obtain
$$\mu_{f_s \to x}(x) = \sum_{x_1} \cdots \sum_{x_M} f_s(x, x_1, \ldots, x_M) \prod_{m \in \mathrm{ne}(f_s) \setminus x} \left[ \sum_{X_{sm}} G_m(x_m, X_{sm}) \right]$$
$$= \sum_{x_1} \cdots \sum_{x_M} f_s(x, x_1, \ldots, x_M) \prod_{m \in \mathrm{ne}(f_s) \setminus x} \mu_{x_m \to f_s}(x_m) \tag{8.66}$$
where $\mathrm{ne}(f_s)$ denotes the set of variable nodes that are neighbours of the factor node $f_s$, and $\mathrm{ne}(f_s) \setminus x$ denotes the same set but with node $x$ removed. Here we have defined the following messages from variable nodes to factor nodes
$$\mu_{x_m \to f_s}(x_m) \equiv \sum_{X_{sm}} G_m(x_m, X_{sm}) \tag{8.67}$$
We have therefore introduced two distinct kinds of message, those that go from factor nodes to variable nodes, denoted $\mu_{f \to x}(x)$, and those that go from variable nodes to factor nodes, denoted $\mu_{x \to f}(x)$. In each case, we see that messages passed along a link are always a function of the variable associated with the variable node that link connects to.
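The factor-to-variable update (8.66) can be sketched in a few lines for discrete variables. The function name `factor_to_variable`, its argument layout, and the example table `fb` are assumptions made for illustration; messages are represented as plain lists indexed by state.

```python
from itertools import product

def factor_to_variable(factor, scope, target, incoming, n_states=2):
    """Message mu_{f_s -> x}(x) following (8.66).

    factor:   callable taking one value per variable named in `scope`
    scope:    tuple of names of the variables the factor depends on
    target:   name of the variable x that receives the message
    incoming: dict mapping every other variable name in `scope` to its
              message mu_{x_m -> f_s}, a list indexed by state
    """
    message = [0.0] * n_states
    for values in product(range(n_states), repeat=len(scope)):
        assignment = dict(zip(scope, values))
        term = factor(*values)
        # Multiply by the incoming messages on all links except the target's.
        for name, value in assignment.items():
            if name != target:
                term *= incoming[name][value]
        # Accumulating per target state marginalizes over the other variables.
        message[assignment[target]] += term
    return message

# Hypothetical pairwise factor over binary variables x2 and x3.
def fb(x2, x3):
    return [[2.0, 1.0], [1.0, 3.0]][x2][x3]

msg = factor_to_variable(fb, ("x2", "x3"), "x3", {"x2": [4.0, 6.0]})
print(msg)
```

The work is exponential only in the number of variables attached to a single factor, not in the size of the whole graph.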
6. • Evaluate messages from variable to factor using sub-graph factorization
The result (8.66) says that to evaluate the message sent by a factor node to a variable node along the link connecting them, take the product of the incoming messages along all other links coming into the factor node, multiply by the factor associated with that node, and then marginalize over all of the variables associated with the incoming messages. This is illustrated in Figure 8.47. It is important to note that a factor node can send a message to a variable node once it has received incoming messages from all other neighbouring variable nodes.
Finally, we derive an expression for evaluating the messages from variable nodes to factor nodes, again by making use of the (sub-)graph factorization. From Figure 8.48, we see that the term $G_m(x_m, X_{sm})$ associated with node $x_m$ is given by a product of terms $F_l(x_m, X_{ml})$, each associated with one of the factor nodes $f_l$ that is linked to node $x_m$ (excluding node $f_s$), so that
$$G_m(x_m, X_{sm}) = \prod_{l \in \mathrm{ne}(x_m) \setminus f_s} F_l(x_m, X_{ml}) \tag{8.68}$$
where the product is taken over all neighbours of node $x_m$ except for node $f_s$. Note that each of the factors $F_l(x_m, X_{ml})$ represents a subtree of the original graph of precisely the same kind as introduced in (8.62).
Figure 8.48: Illustration of the evaluation of the message sent by a variable node to an adjacent factor node.
• F 8.67 + F 8.68 -> F 8.69
Substituting (8.68) into (8.67), we then obtain
$$\mu_{x_m \to f_s}(x_m) = \prod_{l \in \mathrm{ne}(x_m) \setminus f_s} \left[ \sum_{X_{ml}} F_l(x_m, X_{ml}) \right] = \prod_{l \in \mathrm{ne}(x_m) \setminus f_s} \mu_{f_l \to x_m}(x_m) \tag{8.69}$$
where we have used the definition (8.64) of the messages passed from factor nodes to variable nodes.
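The variable-to-factor update (8.69) involves no summation, only an elementwise product of the messages arriving from the other neighbouring factors. A minimal sketch, with the hypothetical name `variable_to_factor` and messages stored as lists indexed by state:

```python
def variable_to_factor(incoming_from_other_factors, n_states=2):
    """Message mu_{x_m -> f_s}(x_m) following (8.69).

    incoming_from_other_factors: list of messages mu_{f_l -> x_m}, one per
    neighbouring factor f_l other than the recipient f_s.
    """
    message = [1.0] * n_states
    for mu in incoming_from_other_factors:
        for state in range(n_states):
            message[state] *= mu[state]
    return message

# Two incoming messages (values are arbitrary illustrations).
print(variable_to_factor([[4.0, 6.0], [3.0, 4.0]]))

# A variable with no other neighbouring factors sends the unit message,
# because the product in (8.69) is then empty.
print(variable_to_factor([]))
```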
7. • Messages sent by leaf nodes (variable node and factor node)
If a leaf node is a variable node, the message that it sends along its one and only link is
$$\mu_{x \to f}(x) = 1 \tag{8.70}$$
If the leaf node is a factor node, we see from (8.66) that the message sent should take the form
$$\mu_{f \to x}(x) = f(x) \tag{8.71}$$
Figure 8.49: The sum-product algorithm begins with messages sent by the leaf nodes, which depend on whether the leaf node is (a) a variable node, or (b) a factor node.
• Find marginals for every variable node introduced by John-san
• Sum-product algorithm
Figure 8.50: The sum-product algorithm can be viewed purely in terms of messages sent out by factor nodes to other factor nodes. In this example, the outgoing message shown by the blue arrow is obtained by taking the product of all the incoming messages shown by green arrows, multiplying by the factor $f_s$, and marginalizing over the variables $x_1$ and $x_2$.
... and indeed the notion of one node having a special status was introduced only as a ...
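Putting the two message types and the leaf rules together, the marginal of every variable can be obtained with each message computed once. The sketch below is one possible arrangement, assuming a tree-structured factor graph over discrete variables; it uses recursion with a cache in place of an explicit leaves-to-root and root-to-leaves schedule, and the function name `sum_product`, the `factors` dictionary layout, and the example tables are all illustrative assumptions.

```python
from itertools import product

def sum_product(factors, n_states=2):
    """Unnormalized marginals of all variables in a tree-structured factor graph.

    factors: dict mapping factor name -> (scope tuple of variable names, callable)
    Returns a dict mapping variable name -> list indexed by state.
    """
    variables = sorted({v for scope, _ in factors.values() for v in scope})
    cache = {}  # each directed message is stored after its first evaluation

    def var_to_fac(var, fac):
        key = ("v", var, fac)
        if key not in cache:
            msg = [1.0] * n_states  # stays at 1 for a leaf variable, as in (8.70)
            for other, (scope, _) in factors.items():
                if other != fac and var in scope:
                    incoming = fac_to_var(other, var)
                    msg = [a * b for a, b in zip(msg, incoming)]  # (8.69)
            cache[key] = msg
        return cache[key]

    def fac_to_var(fac, var):
        key = ("f", fac, var)
        if key not in cache:
            scope, fn = factors[fac]
            incoming = {v: var_to_fac(v, fac) for v in scope if v != var}
            msg = [0.0] * n_states
            for values in product(range(n_states), repeat=len(scope)):
                term = fn(*values)
                for v, val in zip(scope, values):
                    if v != var:
                        term *= incoming[v][val]
                msg[values[scope.index(var)]] += term  # (8.66)
            cache[key] = msg
        return cache[key]

    marginals = {}
    for var in variables:
        m = [1.0] * n_states
        for fac, (scope, _) in factors.items():
            if var in scope:
                m = [a * b for a, b in zip(m, fac_to_var(fac, var))]  # (8.63)
        marginals[var] = m
    return marginals

# Hypothetical chain x1 - fa - x2 - fb - x3 with arbitrary table values.
factors = {
    "fa": (("x1", "x2"), lambda x1, x2: [[1.0, 2.0], [3.0, 4.0]][x1][x2]),
    "fb": (("x2", "x3"), lambda x2, x3: [[2.0, 1.0], [1.0, 3.0]][x2][x3]),
}
marginals = sum_product(factors)
print(marginals)
```

The recursion terminates only when the graph has no loops. With the cache, the number of messages evaluated is two per link, one in each direction.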
8. • Normalization (undirected graph)
• to get the normalization coefficient $1/Z$: $p(\mathbf{x}) = \tilde{p}(\mathbf{x})/Z$
• use sum-product to find the unnormalized marginals $\tilde{p}(x_i)$
• the coefficient $1/Z$ can be obtained by normalizing any one of these marginals
• efficient, as the normalization is calculated over only one single variable
Figure 8.51: A simple factor graph used to illustrate the sum-product algorithm (variable nodes $x_1, x_2, x_3, x_4$; factor nodes $f_a, f_b, f_c$).
• Unnormalized joint distribution (F 8.73)
Consider the graph whose unnormalized joint distribution is given by
$$\tilde{p}(\mathbf{x}) = f_a(x_1, x_2)\, f_b(x_2, x_3)\, f_c(x_2, x_4) \tag{8.73}$$
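The normalization bullets amount to a single sum over the states of one variable. A minimal sketch, assuming an unnormalized marginal has already been computed and stored as a list (the helper name `normalize` and the numbers are illustrative):

```python
def normalize(unnormalized_marginal):
    """Return (Z, normalized marginal) from one unnormalized marginal."""
    z = sum(unnormalized_marginal)  # Z is the sum over the states of one variable
    return z, [value / z for value in unnormalized_marginal]

# Example with arbitrary values for an unnormalized marginal over two states.
z, p = normalize([12.0, 24.0])
print(z, p)
```

Because every unnormalized marginal sums to the same Z, any single variable can be used for this step.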
9. In order to apply the sum-product algorithm to this graph, let us designate node $x_3$ as the root, in which case there are two leaf nodes $x_1$ and $x_4$. Starting with the leaf nodes, we then have the following sequence of six messages
$$\mu_{x_1 \to f_a}(x_1) = 1 \tag{8.74}$$
$$\mu_{f_a \to x_2}(x_2) = \sum_{x_1} f_a(x_1, x_2) \tag{8.75}$$
$$\mu_{x_4 \to f_c}(x_4) = 1 \tag{8.76}$$
$$\mu_{f_c \to x_2}(x_2) = \sum_{x_4} f_c(x_2, x_4) \tag{8.77}$$
$$\mu_{x_2 \to f_b}(x_2) = \mu_{f_a \to x_2}(x_2)\, \mu_{f_c \to x_2}(x_2) \tag{8.78}$$
$$\mu_{f_b \to x_3}(x_3) = \sum_{x_2} f_b(x_2, x_3)\, \mu_{x_2 \to f_b}(x_2) \tag{8.79}$$
The direction of flow of these messages is illustrated in Figure 8.52. Once this message propagation is complete, we can then propagate messages from the root node out to the leaf nodes, and these are given by
$$\mu_{x_3 \to f_b}(x_3) = 1 \tag{8.80}$$
$$\mu_{f_b \to x_2}(x_2) = \sum_{x_3} f_b(x_2, x_3) \tag{8.81}$$
$$\mu_{x_2 \to f_a}(x_2) = \mu_{f_b \to x_2}(x_2)\, \mu_{f_c \to x_2}(x_2) \tag{8.82}$$
$$\mu_{f_a \to x_1}(x_1) = \sum_{x_2} f_a(x_1, x_2)\, \mu_{x_2 \to f_a}(x_2) \tag{8.83}$$
$$\mu_{x_2 \to f_c}(x_2) = \mu_{f_a \to x_2}(x_2)\, \mu_{f_b \to x_2}(x_2) \tag{8.84}$$
$$\mu_{f_c \to x_4}(x_4) = \sum_{x_2} f_c(x_2, x_4)\, \mu_{x_2 \to f_c}(x_2) \tag{8.85}$$
Figure 8.52: Flow of messages for the sum-product algorithm applied to the example graph in Figure 8.51. (a) From the leaf nodes $x_1$ and $x_4$ towards the root node $x_3$. (b) From the root node towards the leaf nodes.
10. • Marginal $p(x_2)$ can be calculated
One message has now passed in each direction across each link, and we can now evaluate the marginals. As a simple check, let us verify that the marginal $p(x_2)$ is given by the correct expression. Using (8.63) and substituting for the messages using the above results, we have
$$\tilde{p}(x_2) = \mu_{f_a \to x_2}(x_2)\, \mu_{f_b \to x_2}(x_2)\, \mu_{f_c \to x_2}(x_2)$$
$$= \left[ \sum_{x_1} f_a(x_1, x_2) \right] \left[ \sum_{x_3} f_b(x_2, x_3) \right] \left[ \sum_{x_4} f_c(x_2, x_4) \right]$$
$$= \sum_{x_1} \sum_{x_3} \sum_{x_4} f_a(x_1, x_2)\, f_b(x_2, x_3)\, f_c(x_2, x_4)$$
$$= \sum_{x_1} \sum_{x_3} \sum_{x_4} \tilde{p}(\mathbf{x}) \tag{8.86}$$
as required.
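The twelve messages (8.74) to (8.85) and the check (8.86) can be reproduced numerically. The sketch below assumes binary variables and uses made-up factor tables; only the graph structure of Figure 8.51 comes from the text. Variable names such as `mu_fa_x2` stand for the message from $f_a$ to $x_2$.

```python
from itertools import product

# Hypothetical factor tables for binary variables (values are arbitrary).
fa = [[1.0, 2.0], [3.0, 4.0]]  # fa[x1][x2]
fb = [[2.0, 1.0], [1.0, 3.0]]  # fb[x2][x3]
fc = [[1.0, 1.0], [2.0, 1.0]]  # fc[x2][x4]
S = range(2)                   # states of every variable

# Leaves x1, x4 towards the root x3: (8.74) to (8.79)
mu_x1_fa = [1.0 for _ in S]
mu_fa_x2 = [sum(fa[x1][x2] * mu_x1_fa[x1] for x1 in S) for x2 in S]
mu_x4_fc = [1.0 for _ in S]
mu_fc_x2 = [sum(fc[x2][x4] * mu_x4_fc[x4] for x4 in S) for x2 in S]
mu_x2_fb = [mu_fa_x2[x2] * mu_fc_x2[x2] for x2 in S]
mu_fb_x3 = [sum(fb[x2][x3] * mu_x2_fb[x2] for x2 in S) for x3 in S]

# Root x3 back out to the leaves: (8.80) to (8.85)
mu_x3_fb = [1.0 for _ in S]
mu_fb_x2 = [sum(fb[x2][x3] * mu_x3_fb[x3] for x3 in S) for x2 in S]
mu_x2_fa = [mu_fb_x2[x2] * mu_fc_x2[x2] for x2 in S]
mu_fa_x1 = [sum(fa[x1][x2] * mu_x2_fa[x2] for x2 in S) for x1 in S]
mu_x2_fc = [mu_fa_x2[x2] * mu_fb_x2[x2] for x2 in S]
mu_fc_x4 = [sum(fc[x2][x4] * mu_x2_fc[x2] for x2 in S) for x4 in S]

# Unnormalized marginal of x2 as the product of its incoming messages, (8.63)
p_x2 = [mu_fa_x2[x2] * mu_fb_x2[x2] * mu_fc_x2[x2] for x2 in S]

# Brute-force check of (8.86): sum the joint (8.73) over x1, x3, x4
brute = [0.0, 0.0]
for x1, x2, x3, x4 in product(S, repeat=4):
    brute[x2] += fa[x1][x2] * fb[x2][x3] * fc[x2][x4]

print(p_x2, brute)
```

With these tables the marginals at $x_1$, $x_3$ and $x_4$ are just the single incoming messages `mu_fa_x1`, `mu_fb_x3` and `mu_fc_x4`, and all of them sum to the same normalization constant.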
So far, we have assumed that all of the variables in the graph are hidden. In most practical applications, a subset of the variables will be observed, and we wish to calculate posterior distributions conditioned on these observations. Observed nodes are easily handled within the sum-product algorithm as follows. Suppose we partition $\mathbf{x}$ into hidden variables $\mathbf{h}$ and observed variables $\mathbf{v}$, and that the observed value of $\mathbf{v}$ is denoted $\hat{\mathbf{v}}$. Then we simply multiply the joint distribution $p(\mathbf{x})$ by $\prod_i I(v_i, \hat{v}_i)$, where $I(v, \hat{v}) = 1$ if $v = \hat{v}$ and $I(v, \hat{v}) = 0$ otherwise. This product corresponds to $p(\mathbf{h}, \mathbf{v} = \hat{\mathbf{v}})$ and hence is an unnormalized version of $p(\mathbf{h} \mid \mathbf{v} = \hat{\mathbf{v}})$. By running the sum-product algorithm, we can efficiently calculate the posterior marginals $p(h_i \mid \mathbf{v} = \hat{\mathbf{v}})$ up to a normalization coefficient whose value can be found efficiently
• references: @n_shuyo, @sleepy_yoshi, @nokuno
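The effect of the indicator $I(v, \hat{v})$ can be seen on a small chain $x_1 - f_a - x_2 - f_b - x_3$ in which $x_3$ is observed. In this sketch the factor tables, the observed value, and the variable names are illustrative assumptions; the indicator is absorbed into the message leaving $f_b$, which is equivalent to multiplying the joint distribution by it.

```python
# Hypothetical factor tables for binary variables (values are arbitrary).
fa = [[1.0, 2.0], [3.0, 4.0]]  # fa[x1][x2]
fb = [[2.0, 1.0], [1.0, 3.0]]  # fb[x2][x3]
S = range(2)

def indicator(v, observed):
    """I(v, v_hat): 1 if v equals the observed value, 0 otherwise."""
    return 1.0 if v == observed else 0.0

observed_x3 = 1  # assumed observation, x3 = 1

# Message from fa to x2: x1 is a hidden leaf, so it is summed out freely.
mu_fa_x2 = [sum(fa[x1][x2] for x1 in S) for x2 in S]

# Message from fb to x2: the indicator removes every term with x3 != 1.
mu_fb_x2 = [
    sum(fb[x2][x3] * indicator(x3, observed_x3) for x3 in S) for x2 in S
]

# Product of incoming messages: proportional to p(x2, x3 = 1).
unnormalized = [mu_fa_x2[x2] * mu_fb_x2[x2] for x2 in S]

# Normalizing over the single variable x2 gives the posterior p(x2 | x3 = 1).
z = sum(unnormalized)
posterior_x2 = [u / z for u in unnormalized]
print(posterior_x2)
```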