This document describes stochastic definite clause grammars (SDCG), which extend definite clause grammars (DCG) with probabilities. SDCG transforms a DCG into a stochastic logic program implemented in PRISM, enabling probabilistic inference and parameter learning. The probabilistic model assigns a random variable to each group of rules sharing a functor and arity; its outcome selects which rule is expanded. SDCG introduces syntax extensions such as regular expression operators and macros to make grammars more concise, and conditioned rules, which select rules by variable unification and allow modeling higher-order hidden Markov models. SDCG provides tools for parsing sentences and learning rule probabilities from data.
2. What and why?
● DCG syntax
– Convenient
– Expressive
– Flexible
● Probabilistic model
– Polynomial parsing
– Parameter learning
– Robust
3. DCG Grammar rules
● Definite Clause Grammars
– Grammar formalism on top of Prolog.
– Production rules with unification variables
– Context-sensitive (in fact stronger, since rule bodies may contain arbitrary Prolog goals)
– Exploits unification semantics of Prolog
Simple DCG grammar:
  sentence --> subject(N), verb(N), object.
  subject(sing) --> [he].
  subject(plur) --> [they].
  verb(sing) --> [eats].
  verb(plur) --> [eat].
  object --> [cake].
  object --> [food].

Difference list representation:
  sentence(L1,L4) :-
      subject(N,L1,L2),
      verb(N,L2,L3),
      object(L3,L4).
  subject(sing,[he|R],R).
  ...
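As a quick usage sketch (standard Prolog, not specific to SDCG), the grammar can be run through phrase/2, which supplies the difference-list arguments:

  % Parsing with the DCG above; phrase/2 wraps the difference-list encoding.
  | ?- phrase(sentence, [he, eats, cake]).
  yes
  | ?- phrase(sentence, [they, eats, cake]).   % number agreement on N fails
  no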
4. Stochastic Definite Clause Grammars
● Implemented as a DCG compiler
– With some extensions to DCG syntax
● Transforms a DCG (grammar) into a stochastic
logic program implemented in PRISM.
● Probabilistic inferences and parameter learning are then performed using PRISM.
[Diagram: (S)DCG → compilation → PRISM program]
6. PRISM
● PRISM – http://sato-www.cs.titech.ac.jp/prism/
● Extends Prolog with random variables (msws in PRISM lingo)
● Performs probabilistic inferences over such programs:
– Probability calculation – probability of a derivation
– Viterbi – find the most probable derivation
– EM learning – learn parameters from a set of example goals
PRISM program example: Bernoulli trials
  target(ber,2).
  values(coin,[heads,tails]).
  :- set_sw(coin, 0.6+0.4).

  ber(N,[R|Y]) :-
      N > 0,
      msw(coin,R),     % Probabilistic choice
      N1 is N - 1,
      ber(N1,Y).       % Recursion
  ber(0,[]).
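A hedged usage sketch of the program above (prob/2 and learn/1 are PRISM built-ins; the example goals here are made up):

  | ?- prob(ber(2,[heads,tails]), P).
  P = 0.24                % 0.6 * 0.4
  | ?- learn([ber(2,[heads,heads]), ber(2,[heads,tails])]).
  % EM re-estimates the distribution of the coin switch from the goals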
7. The probabilistic model
One random variable encodes the probability of expansion for rules with the
same functor/arity:
  s(N) ==> np(N).
  s(N) ==> np(N),vp(N).
The choice is made by a selection rule; the selected rule is then invoked
through unification:
  target(s,2).
  values(s,[s1,s2]).

  % Selection rule
  s(A,B) :- msw(s,Outcome), s(Outcome, A, B).

  % Implementation rules
  s(s1, A, B) :- np(_, A, B).
  s(s2, A, B) :- np(N, A, D), vp(N, D, B).
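As an illustration (a sketch assuming the compiled program above is loaded in PRISM; sample/1 is PRISM's forward sampler):

  | ?- sample(s(Sentence, [])).
  % Draws one derivation: msw(s,·) picks s1 or s2 with the current
  % switch probabilities, then the matching implementation rule runs.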
8. Unification failure
Since SDCG embodies unification constraints, some derivations may fail, and we
only observe the successful derivations in sample data. If the training
algorithm only considers successful derivations, it will converge to a wrong
probability distribution (missing probability mass).
[Diagram: failed derivations as a subset of all derivations]
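The needed correction is a standard conditional normalization (a general identity, not specific to SDCG): writing P(fail) for the total probability mass of failed derivations, the distribution over observable derivations is

  P(x | success) = P(x) / (1 - P(fail))

and this is the quantity that failure-adjusted estimation maximizes.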
In PRISM this is handled using the fgEM algorithm, which is based on Cussens'
Failure-Adjusted Maximization (FAM) algorithm. A "failure program" which traces
all derivations is derived using First Order Compilation, and the probabilities
of failed derivations are estimated as part of the fgEM algorithm.
9. Unification failure issues
Infinite/long derivation paths
● Impossible/difficult to derive failure program.
● Workaround: SDCG has an option which limits the depth of
derivation.
● Still: the size of the failure program is very much an issue.
FOC requirement - “universally quantified clauses”:
● Not the case with Difference Lists: 'C'([X|Y], X,Y).
● Workaround 1:
– Trick the first order compiler by manually adding
implications after program is partly compiled.
– Works empirically, but may be dubious
● Workaround 2:
– Append based grammar
– Works, but has inherent inefficiencies (see the sketch below)
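A minimal sketch of the append-based alternative (hypothetical encoding; SDCG's actual transformation may differ): every constituent binds a complete word list and the rule body concatenates them, so all clauses are universally quantified as FOC requires:

  % Append-based encoding of: sentence --> subject, verb.
  sentence(S) :- subject(S1), verb(S2), append(S1, S2, S).
  subject([he]).
  verb([eats]).

The inefficiency comes from append/3 copying lists at every rule application, where difference lists concatenate in constant time.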
10. Syntax extensions
● SDCG extends the usual DCG syntax
– Compatible with DCG (superset)
● Extensions:
– Regular expression operators
● Convenient rule recursion
– “Macros”
● Allows writing rules as templates which are filled out
according to certain rules
– Conditioning
● Convenient expression of higher-order HMMs
● Lexicalization
11. Regular expression operators
Regular expression operators can be associated with rule constituents:
  name ==> ?(title), +(firstname), *(lastname).
Meaning:
  ?   repeated zero or one times
  *   repeated zero or more times
  +   repeated one or more times
The constituent in the original rule is replaced with a substitute which
refers to intermediary rules, which implement the regular expression.
The intermediary rules follow this scheme:
  regex_sub ==> [].
  regex_sub ==> original_constituent.
  regex_sub ==> regex_sub, regex_sub.
(? uses the first two rules, + the last two, and * all three.)
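For illustration (hypothetical generated names; the compiler's own naming will differ), ?(title) and +(firstname) from the rule above could expand along these lines:

  name ==> title_q, firstname_plus, lastname_star.
  title_q ==> [].
  title_q ==> title.
  firstname_plus ==> firstname.
  firstname_plus ==> firstname_plus, firstname_plus.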
Limitation: Cannot be used in rules with unification variables.
12. Template macros
Special goals prefixed with @ are treated as macros. Grammar rules with macros
are dynamically expanded.

Example:
  word(he,sg,masc).
  word(she,sg,fem).
  number(Word,Number) :- word(Word,Number,_).
  gender(Word,Gender) :- word(Word,_,Gender).
  wordlist(X,[X]).

expand_mode determines which variables to keep: arguments marked - are
removed, arguments marked + are kept and their bindings inserted:
  expand_mode(number(-, +)).
  expand_mode(gender(-, +)).
  expand_mode(wordlist(-, +)).

A rule using the macros:
  word(@number(Word, N), @gender(Word,G)) ==> @wordlist(Word, WordList).

A meta rule is created and called, finding all answers:
  exp(Word, N, G, WordList) :- number(Word,N), gender(Word, G), wordlist(Word,WordList).

Resulting grammar:
  word(sg,masc) ==> [ he ].
  word(sg,fem) ==> [ she ].
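A hedged sketch of the expansion step (findall/3 is standard Prolog; m/3 is just an illustrative wrapper term): each answer to the meta rule yields one concrete grammar rule, with arguments kept or removed according to expand_mode:

  | ?- findall(m(N,G,WordList), exp(_Word,N,G,WordList), Rules).
  Rules = [m(sg,masc,[he]), m(sg,fem,[she])]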
13. Conditioning
A conditioned rule takes the form,
name(F1,F2,...,Fn) | V1,V2,...,Vn ==> C1,C2,...,Cn.
The | operator can be seen as a guard that ensures the rule is only expanded
if the conditions V1..Vn unify with F1..Fn.
It is possible to specify which variables must unify using a condition_mode:
  condition_mode(n(+,+,-)).
  n(A,B,C) | x,y ==> c1, c2.
Conditioned rules are grouped by non-terminal name and arity, and all rules in
a group must have the same number of conditions.
Probabilistic semantics: A distinct probability distribution for each
distinct set of conditions.
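A hedged sketch of what this could compile to in PRISM (hypothetical switch naming; the actual SDCG encoding may differ): one msw per distinct condition set, each estimated independently:

  values(n(a), [n1, n2]).   % distribution used when the conditions unify with a
  values(n(b), [n1, n2]).   % separate, independently trained distribution for b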
14. Conditioning semantics
Model without conditioning:
  n ==> n1.
  n ==> n2.
  n1 ==> ...
  ...
Model with conditioning:
  n|a ==> n1(X).
  n|a ==> n2(X).
  n|b ==> n1(X).
  n|b ==> n2(X).
  ...
[Diagram: without conditioning, the choice between n1 and n2 is a single
stochastic selection; with conditioning, unification first selects the group
(n|a or n|b), and a stochastic selection is then made within that group.]
15. Example, simple toy grammar
  start ==> s(N).
  s(N) ==> np(N).
  s(N) ==> np(N),vp(N).
  np(N) ==> n(sg),n(N).
  np(N) ==> n(N).
  vp(N) ==> v(N),np(N).
  vp(N) ==> v(N).

  n(sg) ==> [time].
  n(pl) ==> [flies].
  v(sg) ==> [flies].
  v(sg) ==> [crawls].
  v(pl) ==> [fly].

Probability of a sentence:
  | ?- prob(start([time,flies],[],Tree), P).
  P = 0.083333333333333 ?
  yes

The most probable parse:
  | ?- viterbig(start([time,flies],[],Tree), P).
  Tree = [start,[[s(pl),[[np(pl),[[n(sg),[[]]],[n(pl),[[]]]]]]]]]
  P = 0.0625 ?
  yes

The most probable parses (there are only two):
  | ?- n_viterbig(10,start([time,flies],[],Tree), P).
  Tree = [start,[[s(pl),[[np(pl),[[n(sg),[[]]],[n(pl),[[]]]]]]]]]
  P = 0.0625 ?;
  Tree = [start,[[s(sg),[[np(sg),[[n(sg),[[]]]]],[vp(sg),[[v(sg),[[]]]]]]]]]
  P = 0.020833333333333 ?;
  no
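Parameter learning on this grammar could look as follows (a sketch using PRISM's learn/1; the training corpus is made up):

  | ?- learn([start([time,flies],[],_), start([time,flies],[],_),
              start([time,crawls],[],_)]).
  % EM estimates the expansion probabilities of s/np/vp/n/v
  % from the derivations of the training sentences.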
16. More interesting example
Simple part-of-speech tagger – a fully connected first-order HMM.

  conditioning_mode(tag_word(+,-,-)).

  consume_word([Word]) :-
      word(Word).

  start(TagList) ==>
      tag_word(none,_,TagList).

  tag_word(Previous, @tag(Current), [Current|TagsRest]) | @tag(SomeTag) ==>
      @consume_word(W),
      ?(tag_word(Current,_,TagsRest)).

Some tags:
  tag(none). tag(det). tag(noun). tag(verb). tag(modalverb).

Some words:
  word(the). word(can). word(will). word(rust).
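A usage sketch for the tagger (hedged: the query assumes the compiler appends the difference-list arguments after the grammar's own TagList argument, as in the queries of the previous slide): the most probable tag sequence via viterbig/2:

  | ?- viterbig(start(Tags, [the,can,will,rust], []), P).
  % Tags is bound to the most probable tag sequence for the sentence.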