2. OVERVIEW: Context free grammars; CFG recognition using append; CFG recognition using difference lists; Definite clause grammars; Adding recursive rules; A DCG for a simple formal language.
3. Context Free Grammars Definite Clause Grammars (DCGs) are a special notation for defining grammars. A context free grammar (CFG) is a finite collection of rules that tell us which sentences are grammatical (that is, syntactically correct) and what their grammatical structure is.
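4. Example grammar (the same grammar that is written as a DCG on slide 9):
    s -> np vp
    np -> det n
    vp -> v np
    vp -> v
    det -> a
    det -> the
    n -> woman
    n -> man
    v -> shoots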
5. The symbols a, the, woman, man, and shoots are called terminal symbols. A context free rule consists of a single non-terminal symbol, followed by ->, followed by a finite sequence made up of terminal and/or non-terminal symbols.
6. CFG recognition using append We can simply turn the grammar into Prolog. For example, the string a woman shoots a man is represented by the list [a,woman,shoots,a,man]. The rule s -> np vp can be read as: a list of words is an s list if it is the result of concatenating an np list with a vp list. We can use append/3 to turn rules of this kind into Prolog:
    s(Z) :- np(X), vp(Y), append(X,Y,Z).
    np(Z) :- det(X), n(Y), append(X,Y,Z).
    vp(Z) :- v(X), np(Y), append(X,Y,Z).
    vp(Z) :- v(Z).
    det([the]).
    det([a]).
    n([woman]).
    n([man]).
    v([shoots]).
8. CFG recognition using difference lists A more efficient implementation can be obtained by using difference lists. The key idea is to represent the information about grammatical categories not as a single list, but as the difference between two lists. For example, instead of representing a woman shoots a man as [a,woman,shoots,a,man], we can represent it as the pair of lists [a,woman,shoots,a,man] and []. This pair represents the sentence a woman shoots a man because it says: if I consume all the symbols on the left and leave behind the symbols on the right, I have the sentence I am interested in. That is, the sentence we are interested in is the difference between the contents of these two lists.
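The difference-list idea can be sketched as a recognizer for the example grammar. This is a minimal sketch, assuming the nine-rule grammar used throughout these slides: each predicate takes the input list and the list that remains after one phrase of that category has been consumed from its front.

```prolog
% Difference-list recognizer (a sketch for the example grammar).
% First argument: the input list.
% Second argument: what is left after consuming one phrase.
s(X, Z)  :- np(X, Y), vp(Y, Z).
np(X, Z) :- det(X, Y), n(Y, Z).
vp(X, Z) :- v(X, Y), np(Y, Z).
vp(X, Z) :- v(X, Z).

% Each lexical fact consumes exactly one word from the front.
det([the|W], W).
det([a|W], W).
n([woman|W], W).
n([man|W], W).
v([shoots|W], W).

% Example queries:
% ?- s([a,woman,shoots,a,man], []).   % succeeds
% ?- np([a,woman,shoots], Rest).      % Rest = [shoots]
```

No call to append/3 is needed: the shared middle variable Y hands the unconsumed remainder from one category directly to the next.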
9. Definite clause grammars A DCG is, quite simply, a nice notation for writing grammars that hides the underlying difference list variables. For example, the previous grammar written as a DCG:
    s --> np,vp.
    np --> det,n.
    vp --> v,np.
    vp --> v.
    det --> [the].
    det --> [a].
    n --> [woman].
    n --> [man].
    v --> [shoots].
10. Definite clause grammars To find out whether a woman shoots a man is a sentence, we pose the query: s([a,woman,shoots,a,man],[]). That is, just as in the difference list recognizer, we ask whether we can get an s by consuming the symbols in [a,woman,shoots,a,man], leaving nothing behind. Similarly, to generate all the sentences in the grammar, we pose the query: s(X,[]).
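The two queries above can also be posed through phrase/2, which supplies the trailing empty list itself. The following is a sketch assuming SWI-Prolog; count_sentences/1 is a hypothetical helper introduced here only to check how many sentences the grammar generates.

```prolog
% The example grammar as a DCG.
s   --> np, vp.
np  --> det, n.
vp  --> v, np.
vp  --> v.
det --> [the].
det --> [a].
n   --> [woman].
n   --> [man].
v   --> [shoots].

% count_sentences(-Count): hypothetical helper, not part of the slides.
% Collects every list that phrase/2 can generate from s and counts them.
count_sentences(Count) :-
    findall(S, phrase(s, S), Sentences),
    length(Sentences, Count).

% Example queries:
% ?- phrase(s, [a,woman,shoots,a,man]).   % same as s([a,woman,shoots,a,man], [])
% ?- phrase(s, X).                        % enumerates sentences on backtracking
% ?- count_sentences(N).
```

Because the grammar has no recursive rules, the enumeration is finite and findall/3 terminates.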
11. Adding Recursive Rules Our original context free grammar generated only 20 sentences. However, it is easy to write context free grammars that generate infinitely many sentences: we simply need to use recursive rules. For example, let's add the following rules to our grammar:
    s -> s conj s
    conj -> and
    conj -> or
    conj -> but
These rules allow us to join as many sentences together as we like using the words and, or and but.
12. Adding Recursive Rules Turning these rules into DCG rules:
    s --> s,conj,s.
    conj --> [and].
    conj --> [or].
    conj --> [but].
Suppose we add these rules at the beginning of the knowledge base, before the rule s --> np,vp. If we now pose the query s([a,woman,shoots],[]), Prolog gets into an infinite loop: the rule s --> s,conj,s translates into a clause whose first goal is again s on the same input, so Prolog keeps calling s without ever consuming a word.
13. So simply reordering clauses or goals does not solve the problem. The usual solution is to introduce a new non-terminal symbol. We could, for example, use the category simple_s for sentences without embedded sentences. Our grammar would then look like this:
    s --> simple_s.
    s --> simple_s,conj,s.
    simple_s --> np,vp.
    np --> det,n.
    vp --> v,np.
    vp --> v.
    det --> [the].
    det --> [a].
    n --> [woman].
    n --> [man].
    v --> [shoots].
    conj --> [and].
    conj --> [or].
    conj --> [but].
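As a quick check, the grammar with simple_s can be loaded and queried as below. This is a sketch assuming SWI-Prolog and its phrase/2 predicate; the test sentences are illustrative choices, not taken from the slides.

```prolog
% Conjoined sentences without left recursion: the recursive call to s
% only happens after simple_s and conj have each consumed some input.
s --> simple_s.
s --> simple_s, conj, s.
simple_s --> np, vp.
np  --> det, n.
vp  --> v, np.
vp  --> v.
det --> [the].
det --> [a].
n   --> [woman].
n   --> [man].
v   --> [shoots].
conj --> [and].
conj --> [or].
conj --> [but].

% Example queries:
% ?- phrase(s, [a,woman,shoots]).                               % succeeds
% ?- phrase(s, [a,woman,shoots,and,the,man,shoots,a,woman]).    % succeeds
% ?- phrase(s, [a,woman,shoots,and]).                           % fails, and terminates
```

The third query matters as much as the first two: a recognizer should answer no on ungrammatical input instead of looping.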
14. A DCG for a simple formal language We shall define a DCG for the formal language a^n b^n. There are only two words in this language: the symbol a and the symbol b. The language consists of all strings made up from these two symbols that have the following form: an unbroken block of as of length n, followed by an unbroken block of bs of length n, and nothing else. So the strings ab, aabb, aaabbb and aaaabbbb all belong to a^n b^n, as does the empty string (the case n = 0).
15. A CFG that generates this language:
    s -> epsilon
    s -> l s r
    l -> a
    r -> b
Turning this grammar into a DCG:
    s --> [].
    s --> l,s,r.
    l --> [a].
    r --> [b].
This DCG works exactly as we would hope. For example, the query s([a,a,a,b,b,b],[]) gets the answer yes.
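The a^n b^n grammar can be exercised on both matching and non-matching strings. This is a sketch assuming SWI-Prolog's phrase/2; the rejected inputs are illustrative choices.

```prolog
% DCG for a^n b^n: every a produced by l is paired with a b produced by r,
% with a smaller a^n b^n string (possibly empty) nested in between.
s --> [].
s --> l, s, r.
l --> [a].
r --> [b].

% Example queries:
% ?- phrase(s, []).                  % succeeds (n = 0)
% ?- phrase(s, [a,a,a,b,b,b]).       % succeeds (n = 3)
% ?- phrase(s, [a,a,a,b,b,b,b]).     % fails: one b too many
% ?- phrase(s, [a,b,a,b]).           % fails: the blocks are not unbroken
```

The recursive rule is safe here because l must consume an a before s is called again, so the recursion depth is bounded by the length of the input.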