Stochastic Definite Clause Grammars

         InterLogOnt, Nov 24
             Saarbrücken


          Christian Theil Have
              cth@ruc.dk
What and why?

●   DCG Syntax
    –   Convenient
    –   Expressive
    –   Flexible

●   Probabilistic model
    –   Polynomial parsing
    –   Parameter learning
    –   Robust



                Stochastic Definite Clause Grammars
DCG Grammar rules
●    Definite Clause Grammars
      –   Grammar formalism on top of Prolog.
      –   Production rules with unification variables.
      –   Context-sensitive (in fact, stronger).
      –   Exploits the unification semantics of Prolog.

Simple DCG grammar:

    sentence --> subject(N), verb(N), object.

    subject(sing) --> [he].
    subject(plur) --> [they].
    object --> [cake].
    object --> [food].
    verb(sing) --> [eats].
    verb(plur) --> [eat].

Difference list representation:

    sentence(L1,L4) :-
        subject(N,L1,L2),
        verb(N,L2,L3),
        object(L3,L4).

    subject(sing,[he|R],R).
    ...
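As a quick check, the simple grammar above can be queried directly in
standard Prolog through phrase/2 (a minimal sketch; note how the shared
agreement variable N blocks number mismatches):

    % Standard phrase/2 interface to DCG rules:
    ?- phrase(sentence, [he, eats, cake]).    % succeeds: sing/sing agreement
    ?- phrase(sentence, [they, eat, food]).   % succeeds: plur/plur agreement
    ?- phrase(sentence, [he, eat, cake]).     % fails: sing subject, plur verb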
Stochastic Definite Clause Grammars
●   Implemented as a DCG compiler
    –   With some extensions to DCG syntax
●   Transforms a DCG (grammar) into a stochastic
    logic program implemented in PRISM.
●   Probabilistic inferences and parameter learning are then performed
    using PRISM.

        (S)DCG  →  Compilation  →  PRISM program
Compilation process
●   PRISM - http://sato-www.cs.titech.ac.jp/prism/
    ● Extends Prolog with random variables (msws in PRISM lingo)

    ● Performs probabilistic inferences over such programs:

      ● Probability calculation - probability of a derivation

      ● Viterbi - find most probable derivation

      ● EM learning – learn parameters from a set of example goals




PRISM program example: Bernoulli trials

target(ber,2).
values(coin,[heads,tails]).
:- set_sw(coin, 0.6+0.4).

ber(N,[R|Y]) :-
    N>0,
    msw(coin,R),                % Probabilistic choice
    N1 is N - 1,
    ber(N1,Y).                  % Recursion
ber(0,[]).
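A hypothetical session with the program above loaded into PRISM; prob/2,
learn/1 and sample/1 are standard PRISM queries, and the concrete numbers
assume the 0.6/0.4 switch parameters set above:

    ?- prob(ber(2,[heads,heads]), P).
    P = 0.36                                 % 0.6 * 0.6

    ?- learn([ber(3,[heads,heads,tails]),
              ber(3,[tails,heads,heads])]).  % re-estimate P(coin) by EM

    ?- sample(ber(5,Outcome)).               % draw one random 5-trial sequence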
The probabilistic model
One random variable encodes the probability of expansion for rules with the
same functor/arity. The choice is made by a selection rule, and the selected
rule is invoked through unification.

Grammar:

    s(N) ==> np(N).
    s(N) ==> np(N),vp(N).

Transformation:

    target(s,2).
    values(s,[s1,s2]).

    % Selection rule
    s(A,B) :- msw(s,Outcome), s(Outcome, A, B).

    % Implementation rules
    s(s1, A, B) :- np(_, A, B).
    s(s2, A, B) :- np(N, A, D), vp(N, D, B).
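The selection distribution is an ordinary PRISM switch, so it can be fixed
by hand in the same set_sw notation as the Bernoulli example (a sketch;
in practice the parameters would come from learning):

    :- set_sw(s, 0.3+0.7).   % P(s ==> np) = 0.3, P(s ==> np,vp) = 0.7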
Unification failure
Since SDCG embodies unification constraints, some derivations may fail.
We only observe the successful derivations in sample data.

If the training algorithm only considers successful derivations, it will
converge to a wrong probability distribution (missing probability mass).

[Diagram: the failed derivations as a subset of all derivations]

In PRISM this is handled using the fgEM algorithm, which is based on
Cussens' Failure-Adjusted Maximization (FAM) algorithm.

A "failure program" which traces all derivations is derived using First Order
Compilation (FOC), and the probabilities of failed derivations are estimated
as part of the fgEM algorithm.
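To make the missing mass concrete, consider the agreement grammar from the
DCG slide; a sketch of a derivation that the generative process can attempt
but that unification rejects:

    % The generative process may pick subject(sing) and verb(plur); the
    % derivation then fails on the shared agreement variable N, and its
    % probability mass never appears among the observed sentences.
    ?- sentence([he, eat, cake], []).
    no

fgEM estimates exactly this kind of lost mass via the failure program.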
Unification failure issues
Infinite/long derivation paths:
●  Impossible or difficult to derive the failure program.
●  Workaround: SDCG has an option which limits the depth of derivation.
●  Still: the size of the failure program is very much an issue.


FOC requires "universally quantified clauses":
●  Not the case with difference lists: 'C'([X|Y], X, Y).
●  Workaround 1:
   –  Trick the first order compiler by manually adding implications
      after the program is partly compiled.
   –  Works empirically, but may be dubious.
●  Workaround 2:
   –  An append-based grammar.
   –  Works, but has inherent inefficiencies.
Syntax extensions
●   SDCG extends the usual DCG syntax
    –   Compatible with DCG (superset)
●   Extensions:
    –   Regular expression operators
         ●   Convenient rule recursion
    –   “Macros”
         ●   Allows writing rules as templates which are filled out
             according to certain rules
    –   Conditioning
         ●   Convenient expression of higher-order HMMs
         ●   Lexicalization
Regular expression operators
Regular expression operators can be associated with rule constituents:

       name ==> ?(title), +(firstname), *(lastname).

  Meaning:
  ?     may be repeated zero or one times
  *     may be repeated zero or more times
  +     may be repeated one or more times


The constituent in the original rule is replaced with a substitute which
refers to intermediary rules, which implement the regular expression:


         regex_sub ==> []                         (used by ? and *)

         regex_sub ==> original_constituent       (used by ?, * and +)

         regex_sub ==> regex_sub, regex_sub       (used by * and +)
    Limitation: Cannot be used in rules with unification variables.
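For illustration, a hand-written sketch (not actual compiler output) of how
+(firstname) from the example above might be rewritten; the intermediary
non-terminal name firstname_plus is invented here:

    name ==> ?(title), firstname_plus, *(lastname).

    firstname_plus ==> firstname.                       % one occurrence
    firstname_plus ==> firstname_plus, firstname_plus.  % or a repetition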
Template macros
Special goals prefixed with @ are treated as macros.
Grammar rules with macros are dynamically expanded.

Example:

    word(he,sg,masc).   word(she,sg,fem).

    number(Word,Number) :- word(Word,Number,_).
    gender(Word,Gender) :- word(Word,_,Gender).
    wordlist(X,[X]).

An expand_mode declaration determines which variables to keep (+) and
which to remove (-) when the macro call is replaced:

    expand_mode(number(-, +)).
    expand_mode(gender(-, +)).
    expand_mode(wordlist(-, +)).

    word(@number(Word, N), @gender(Word,G)) ==>
        @wordlist(Word, WordList).

A meta rule is created and called, finding all answers:

    exp(Word, N, G, WordList) :-
        number(Word,N), gender(Word, G), wordlist(Word,WordList).

Resulting grammar:

    word(sg,masc) ==> [ he ].
    word(sg,fem) ==> [ she ].
Conditioning
A conditioned rule takes the form:

    name(F1,F2,...,Fn) | V1,V2,...,Vn ==> C1,C2,...,Cn.

The | operator can be seen as a guard that ensures the rule is only
expanded if the conditions V1..Vn unify with the arguments F1..Fn.

It is possible to specify which variables must unify using a condition_mode:

  condition_mode(n(+,+,-)).

                        n(A,B,C) | x,y ==> c1, c2.



Conditioned rules are grouped by non-terminal name and arity, and all rules
in a group must have the same number of conditions.

Probabilistic semantics: a distinct probability distribution for each
distinct set of conditions.
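In terms of the compiled PRISM program, the natural reading is one switch
per condition set (a sketch; the switch names are invented for illustration):

    values(n_cond_a, [rule1, rule2]).  % distribution when conditions unify with a
    values(n_cond_b, [rule1, rule2]).  % separate distribution for condition set b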
Conditioning semantics
Model without conditioning:

    n ==> n1.
    n ==> n2.
    n1 ==> ...
    ...

Model with conditioning:

    n|a ==> n1(X).
    n|a ==> n2(X).
    n|b ==> n1(X).
    n|b ==> n2(X).
    ...

[Diagram: without conditioning, n chooses between n1 and n2 by stochastic
selection alone; with conditioning, unification first selects the condition
group (n|a or n|b), and the stochastic selection is then made within that
group (n1_1/n2_1 vs. n1_2/n2_2).]
Example, simple toy grammar
     start ==> s(N).
     s(N) ==> np(N).
     s(N) ==> np(N),vp(N).
     np(N) ==> n(sg),n(N).
     np(N) ==> n(N).
     vp(N) ==> v(N),np(N).
     vp(N) ==> v(N).

     n(sg) ==> [time].
     n(pl) ==> [flies].
     v(sg) ==> [flies].
     v(sg) ==> [crawls].
     v(pl) ==> [fly].

Probability of a sentence:

| ?- prob(start([time,flies],[],Tree), P).
P = 0.083333333333333 ?
yes

The most probable parse:

| ?- viterbig(start([time,flies],[],Tree), P).
Tree = [start,[[s(pl),[[np(pl),[[n(sg),[[]]],[n(pl),[[]]]]]]]]]
P = 0.0625 ?
yes

The most probable parses (in fact, both of them):

| ?- n_viterbig(10,start([time,flies],[],Tree), P).
Tree = [start,[[s(pl),[[np(pl),[[n(sg),[[]]],[n(pl),[[]]]]]]]]]
P = 0.0625 ?;
Tree = [start,[[s(sg),[[np(sg),[[n(sg),[[]]]]],[vp(sg),[[v(sg),[[]]]]]]]]]
P = 0.020833333333333 ?;
no
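Parameters for this grammar would be estimated with PRISM's learn/1 over a
list of observed goals, for example (a hypothetical call; any parsable
sentences work):

    ?- learn([start([time,flies],[],_),
              start([time,crawls],[],_),
              start([time,flies],[],_)]).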
More interesting example
 Simple part of speech tagger – fully connected first order HMM.

 consume_word([Word]) :-
     word(Word).

 conditioning_mode(tag_word(+,-,-)).

 start(TagList) ==>
       tag_word(none,_,TagList).

 tag_word(Previous, @tag(Current), [Current|TagsRest]) | @tag(SomeTag) ==>
     @consume_word(W),
     ?(tag_word(Current,_,TagsRest)).


Some tags:

    tag(none).
    tag(det).
    tag(noun).
    tag(verb).
    tag(modalverb).

Some words:

    word(the).
    word(can).
    word(will).
    word(rust).
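Assuming the tagger compiles like the toy grammar above, the most probable
tag sequence for a sentence would be obtained with viterbig/2 (a hypothetical
query mirroring the earlier slide; the compiled argument order is assumed):

    ?- viterbig(start(Tags, [the,can,will,rust], []), P).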
Questions?
