2. Contents
2.1 Language
2.2 Grammar
2.3 Classification of Grammars
Introduction to FL theory
Page 2
Three concepts you must remember
1. The definition of a language
2. The definition and concept of a grammar
3. The meaning of a recognizer
3. Language
Basic definitions
(1) alphabet
a finite set of symbols.
ex) T1 = {ㄱ,ㄴ,ㄷ,...,ㅎ,ㅏ,ㅑ, … ,ㅡ,ㅣ}
T2 = {A,B,C, … ,Z, a, b, c, …, z}
T3 = {main, int, char, …, while}
(2) string(or sentence, word)
a sequence of symbols from some alphabet T.
(3) length
the number of symbols in the string.
denoted by |ω|
4. (4) empty string
a string consisting of no symbols.
denoted by ε or λ.
(5) T*
denotes the set of all strings of symbols over the
alphabet T, including the empty string.
T+ = T* - {ε}
T* : T star
T+ : T dagger
(6) A language is any set of strings over an alphabet. (Text p.40)
(Equivalently, a language L over the alphabet T is a subset of T*.)
L ⊆ T*
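The definitions of T*, T+ and "language" can be tried out on a small alphabet. The following is a minimal sketch, assuming the two-symbol alphabet T = {a, b}; the helper name `strings_up_to` is hypothetical. Since T* is infinite, only the strings up to a chosen length are listed.

```python
from itertools import product

def strings_up_to(alphabet, max_len):
    """Yield every string over `alphabet` of length 0..max_len, shortest first."""
    for n in range(max_len + 1):
        for symbols in product(sorted(alphabet), repeat=n):
            yield "".join(symbols)

T = {"a", "b"}                                  # assumed example alphabet
t_star_prefix = list(strings_up_to(T, 2))       # finite slice of T*, includes ""
t_plus_prefix = [w for w in t_star_prefix if w != ""]   # same slice of T+ = T* - {ε}

L = {"a", "ab", "ba", "aba"}                    # a language over T
is_language_over_T = all(set(w) <= T for w in L)        # checks L ⊆ T*
```

The empty string ε is represented by the Python string `""`, which is why it shows up as the first element of the slice of T*.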
5. Two problems
(1) How do we represent a language ?
If the language is finite, the answer is easy.
If the language is infinite, we are faced with the problem
of finding a finite representation for the language.
Set description
Grammar : Generating Scheme
Recognizer : Recognition Scheme
(2) Does there exist a finite representation for every language ?
No !
This is not always possible.
L1 = {a, ab, ba, aba}
L2 = T* = {ε, a, b, aa, ab, …}
L3 = {a^n b^n | n ≥ 1}
L4 = {ww^R | w ∈ T*}
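A set description of an infinite language is finite because it states a membership condition instead of listing elements. As an illustration only, the conditions for L1, L3 (n a's followed by n b's, n ≥ 1) and L4 (a string followed by its own reversal) can be written as predicates; the function names are hypothetical.

```python
def in_L1(w):
    """Finite language: membership is a simple lookup."""
    return w in {"a", "ab", "ba", "aba"}

def in_L3(w):
    """w = a^n b^n for some n >= 1."""
    n = len(w) // 2
    return n >= 1 and w == "a" * n + "b" * n

def in_L4(w):
    """w = x x^R for some string x (x may be empty)."""
    half = len(w) // 2
    return len(w) % 2 == 0 and w[:half] == w[half:][::-1]
```

Each predicate is a finite text that decides an infinite set, which is the sense in which a grammar or a recognizer is a finite representation.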
6. More definitions
(1) concatenation
u = a_1 a_2 ... a_n, v = b_1 b_2 ... b_m, u • v = a_1 a_2 ... a_n b_1 b_2 ... b_m
u • v is usually written uv.
uε = u = εu
∀u, v ∈ T*, uv ∈ T*.
|uv| = |u| + |v|
(2) a^n represents n a's.
a^0 = ε
(3) the reversal of a string ω, denoted ω^R, is the string ω written in reverse order:
i.e., if ω = a_1 a_2 ... a_n then ω^R = a_n a_{n-1} ... a_1.
(ω^R)^R = ω
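The string identities on this slide can be checked directly with ordinary Python strings. This is only a sketch: the helper names `power` and `reverse` and the sample strings are assumptions made for illustration.

```python
EPSILON = ""                      # the empty string ε

def power(a, n):
    """a^n: the string a written n times; a^0 is the empty string."""
    return a * n

def reverse(w):
    """ω^R: the string ω written in reverse order."""
    return w[::-1]

u, v = "abc", "de"                # assumed sample strings
uv = u + v                        # concatenation u • v
```

With these definitions, `len(uv) == len(u) + len(v)`, `u + EPSILON == u`, and `reverse(reverse(u)) == u` all hold, mirroring |uv| = |u| + |v|, uε = u = εu, and the double-reversal identity.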
7. (4) language product
LL' = {xy| x ∈ L and y ∈ L'}
(5) The powers of a language L are defined recursively by:
L^0 = {ε}
L^n = L L^{n-1} for n ≥ 1.
(6) L* : reflexive transitive closure
= L^0 ∪ L^1 ∪ L^2 ∪ ... ∪ L^n ∪ … = ⋃_{i ≥ 0} L^i
(7) L+ : transitive closure
= L^1 ∪ L^2 ∪ ... ∪ L^n ∪ ...
= L* - L^0 (when ε ∉ L)
L = {a, ab, ba, aba}
L^0 = {ε}
L^2 = {aa, aab, aba, aaba, …}
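The product and powers of a language translate almost literally into set comprehensions. The sketch below uses the example language L = {a, ab, ba, aba}; the function names are hypothetical, and the closure is truncated at a chosen power because L* is infinite.

```python
def product_lang(L1, L2):
    """Language product: {xy | x in L1 and y in L2}."""
    return {x + y for x in L1 for y in L2}

def lang_power(L, n):
    """L^0 = {ε}; L^n = L L^(n-1) for n >= 1."""
    result = {""}
    for _ in range(n):
        result = product_lang(L, result)
    return result

def star_up_to(L, max_power):
    """Finite part of L*: the union of L^0 .. L^max_power."""
    out = set()
    for i in range(max_power + 1):
        out |= lang_power(L, i)
    return out

L = {"a", "ab", "ba", "aba"}
L_squared = lang_power(L, 2)
```

Note that `L_squared` has fewer than 4 × 4 elements, because different pairs can concatenate to the same string (for example a • ba and ab • a both give aba).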
8. Grammar
Language
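a set whose elements are sentences
How should a language be represented?
Grammar
terminal : a symbol of the alphabet of the language being defined
nonterminal :
an intermediate symbol used in the process of generating strings
used to define the structure of the language
grammar symbol (V) : a terminal or a nonterminal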
9. G = (VN, VT, P, S)
VN : a finite set of nonterminal symbols
VT : a finite set of terminal symbols
VN ∩ VT = ∅, VN ∪ VT = V
P : a finite set of production rules
α → β, α ∈ V+, β ∈ V*
(α : left-hand side (lhs), β : right-hand side (rhs))
S : start symbol (sentence symbol)
10. [Example] G = ( {S, A}, {a, b}, P, S ) (Text p.47, [Example 8])
P : S → aAS S → a
A → SbA A → ba A → SS
abbreviated as: S → aAS | a
A → SbA | ba | SS
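The four-tuple definition of a grammar can be mirrored by a small data structure. This is a sketch under stated assumptions: every grammar symbol is a single character, productions are stored as (lhs, rhs) string pairs, and the class name `Grammar` and method `is_well_formed` are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Grammar:
    nonterminals: frozenset   # VN
    terminals: frozenset      # VT
    productions: tuple        # P, as (lhs, rhs) pairs of strings
    start: str                # S

    def is_well_formed(self):
        vocabulary = self.nonterminals | self.terminals      # V = VN ∪ VT
        if self.nonterminals & self.terminals:                # need VN ∩ VT = ∅
            return False
        if self.start not in self.nonterminals:
            return False
        for lhs, rhs in self.productions:
            if lhs == "" or not set(lhs) <= vocabulary:       # α ∈ V+
                return False
            if not set(rhs) <= vocabulary:                    # β ∈ V*
                return False
        return True

# The example grammar with S → aAS | a and A → SbA | ba | SS
G = Grammar(
    nonterminals=frozenset("SA"),
    terminals=frozenset("ab"),
    productions=(("S", "aAS"), ("S", "a"), ("A", "SbA"), ("A", "ba"), ("A", "SS")),
    start="S",
)
```

The abbreviation with `|` is only notation: each alternative is stored as its own production.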
11. Derivation
1. ⇒ : “directly produce” or “directly derive”
if α → β ∈ P and γ, δ ∈ V*
then
γαδ ⇒ γβδ
2. ⇒* : Suppose α1, α2, ..., αn ∈ V*
and α1 ⇒ α2 ⇒ … ⇒ αn,
then α1 ⇒* αn
(zero or more derivations)
3. ⇒+ : one or more derivations.
cf) → : used in production rules.
“may be replaced by”
⇒ : used when carrying out a derivation.
12. Ex. P : S → aA | bB | ε
A → bS
B → aS
Derivation of S ⇒* abba
S ⇒ aA (production S → aA)
⇒ abS (production A → bS)
⇒ abbB (production S → bB)
⇒ abbaS (production B → aS)
⇒ abba (production S → ε)
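The derivation of abba can be replayed mechanically, one production per step. The sketch below assumes single-character symbols and always rewrites the leftmost occurrence of the left-hand side; `apply_production` is a hypothetical helper name.

```python
def apply_production(form, lhs, rhs):
    """Perform one ⇒ step: rewrite the leftmost occurrence of lhs in form with rhs."""
    index = form.find(lhs)
    if index < 0:
        raise ValueError(f"{lhs!r} does not occur in {form!r}")
    return form[:index] + rhs + form[index + len(lhs):]

# Productions used, in order; "" stands for ε.
steps = [("S", "aA"), ("A", "bS"), ("S", "bB"), ("B", "aS"), ("S", "")]

forms = ["S"]                       # start from the start symbol
for lhs, rhs in steps:
    forms.append(apply_production(forms[-1], lhs, rhs))
```

After the loop, `forms` holds every intermediate string of the derivation, ending in the terminal string abba.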
13. L(G) : Language generated by grammar G
L(G) = {ω | S ⇒* ω, ω ∈ VT*}
☞ ω is a sentential form of G if S ⇒* ω and ω ∈ V*.
ω is a sentence of G if S ⇒* ω and ω ∈ VT*.
S ⇒ aA ( apply production S → aA )
⇒ abS ( apply production A → bS )
⇒ abbB ( apply production S → bB )
⇒ abbaS ( apply production B → aS )
⇒ abba ( apply production S → ε )
S ⇒* abba
sentential form : S, aA, abS, abbB, abbaS, abba
sentence : abba
14. Finding L(G1) for G1 = ( {S}, {a}, P, S )
P : S → a | aS
L(G1) = { a^n | n ≥ 1 }
Language design
generation : from a Grammar to its Language
design : from a Language to a Grammar
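15. Finding the language generated by a given grammar
1. Starting from the start symbol, apply productions to derive sentences in order of increasing length.
2. Examine the form of the generated sentences and find the regular pattern.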
G = ( {A, B, C}, {a, b, c}, P, A)
P : A → abc A → aBbc
Bb → bB Bc → Cbcc
bC → Cb aC → aaB
aC → aa
L(G) = { a^n b^n c^n | n ≥ 1 }
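The two-step discovery process can be approximated by a breadth-first search over sentential forms. The sketch below is one possible way to do it, not the textbook's procedure: it assumes single-character symbols, and it discards forms longer than a chosen bound, which is safe for this grammar because no production makes a string shorter. The names `successors` and `sentences` are hypothetical.

```python
from collections import deque

PRODUCTIONS = [
    ("A", "abc"), ("A", "aBbc"),
    ("Bb", "bB"), ("Bc", "Cbcc"),
    ("bC", "Cb"), ("aC", "aaB"), ("aC", "aa"),
]
TERMINALS = set("abc")

def successors(form):
    """Yield every form reachable from `form` in one derivation step."""
    for lhs, rhs in PRODUCTIONS:
        start = form.find(lhs)
        while start >= 0:
            yield form[:start] + rhs + form[start + len(lhs):]
            start = form.find(lhs, start + 1)

def sentences(start_symbol, max_len):
    """Breadth-first search; returns the sentences of length <= max_len."""
    seen = {start_symbol}
    queue = deque([start_symbol])
    found = set()
    while queue:
        form = queue.popleft()
        if set(form) <= TERMINALS:        # only terminals left: a sentence
            found.add(form)
            continue
        for nxt in successors(form):
            if len(nxt) <= max_len and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sorted(found, key=len)
```

Calling `sentences("A", 9)` lists the three shortest sentences, from which the pattern of equal numbers of a's, b's and c's can be read off.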
16. (===>) ex1) S → 0S1 | 01
ex2) S → aSb | c
ex3) A → aB
B → bB | b
ex4) O → a
O → aE
E → aO
ex5) A → abc A → aBbc
Bb → bB Bc → Cbcc
bC → Cb aC → aaB
aC → aa
17. How to write a grammar
embedded rule : S → aSb
right-recursive rule : A → aA
left-recursive rule : A → Aa
Grammar for L = { a^n | n ≥ 0 } : A → aA | ε
Grammar for L = { a^n | n ≥ 1 } : A → aA | a
Embedded production A → aAb
ex1) L1 = { a^n b^n | n ≥ 0 }
ex2) L2 = { 0^i 1^j | i ≠ j, i, j ≥ 1 }
ex3) Constructs of Conventional PL
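18. ex) The integer declaration part of C:
The integer declaration part consists of several integer declarations, and a single
declaration has a form such as int a,a,a; where a stands for an arbitrary
identifier.
Each declaration is terminated by ; . For example: int i,j; int sum;
※ When devising a grammar, it is desirable to give each nonterminal a name
that reflects the syntactic structure it represents.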
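One possible grammar for this exercise, given here as an assumption and not as the textbook's answer, is decl_list → decl | decl decl_list, decl → int id_list ;, id_list → id | id , id_list. The nonterminal names follow the advice above. The sketch below turns each nonterminal into one function; the tokenizer and all function names are hypothetical.

```python
import re

# An identifier, or one of the punctuation tokens , and ;
TOKEN = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*|[,;])")

def tokenize(text):
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def is_identifier(token):
    return token not in {"int", ",", ";"}

def parse_id_list(tokens, i):
    """id_list -> id | id , id_list. Returns the index after the list, or -1."""
    if i >= len(tokens) or not is_identifier(tokens[i]):
        return -1
    i += 1
    if i < len(tokens) and tokens[i] == ",":
        return parse_id_list(tokens, i + 1)
    return i

def parse_decl(tokens, i):
    """decl -> int id_list ;"""
    if i >= len(tokens) or tokens[i] != "int":
        return -1
    i = parse_id_list(tokens, i + 1)
    if i < 0 or i >= len(tokens) or tokens[i] != ";":
        return -1
    return i + 1

def is_int_decl_part(text):
    """decl_list -> decl | decl decl_list"""
    tokens = tokenize(text)
    i = parse_decl(tokens, 0)
    while 0 <= i < len(tokens):
        i = parse_decl(tokens, i)
    return i == len(tokens)
```

Because id_list is right-recursive, the recognizer can decide what to do next by looking only at the following token.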
19. To prove that a grammar generates a language L, show both:
i) Every sentence generated by the grammar is in L.
ii) Every string in L can be generated by the grammar.
Textbook p.55, [Example 16]
G = ( { S }, { ( , ) }, { S → (S)S | ε }, S )
L(G) = the set of all strings of balanced parentheses.
proof) (=>) Every sentence derivable from S is balanced.
(<=) Every balanced string is derivable from S.
20. (=>) Every sentence derivable from S is balanced.
(i.e., S ⇒* ω implies ω is balanced)
By induction on the number of steps in a derivation.
i) When n = 1: S ⇒ ε, and ε is surely balanced.
ii) Suppose that all derivations of fewer than n steps
produce balanced sentences.
iii) Consider a leftmost derivation of exactly n steps.
S ⇒ (S)S ⇒* (x)S ⇒* (x)y
The derivations of x and y from S each take fewer than n steps,
so by the hypothesis x and y are balanced.
Thus (x)y is balanced.
21. (<=) Every balanced string is derivable from S.
By induction on the length of a string.
i) |ω| = 0: S ⇒ ε
(the empty string is derivable from S.)
ii) Assume that every balanced string of length less than 2n is derivable from S.
iii) Consider a balanced string ω of length 2n, n ≥ 1.
Let (x) be the shortest nonempty balanced prefix of ω.
Thus ω = (x)y, where x and y are balanced.
Since |x|, |y| < 2n, they are derivable from S by the inductive hypothesis.
Thus S ⇒ (S)S ⇒* (x)S ⇒* (x)y = ω
Therefore, (x)y is also derivable from S.
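The two directions of the proof can be sanity-checked on short strings, which is not a substitute for the induction but makes the claim concrete. The sketch below builds the sentences of S → (S)S | ε bottom-up (every nonempty sentence has the form (x)y with x and y shorter sentences) and compares them with a direct counter-based test for balance. All function names are hypothetical.

```python
from itertools import product

def is_balanced(w):
    """Counter test: depth never goes negative and ends at zero."""
    depth = 0
    for ch in w:
        depth += 1 if ch == "(" else -1
        if depth < 0:
            return False
    return depth == 0

def derivable(max_len):
    """Sentences of S -> (S)S | ε with length <= max_len."""
    by_len = {0: {""}}
    for n in range(2, max_len + 1, 2):
        by_len[n] = set()
        # a sentence of length n is (x)y with |x| + |y| = n - 2
        for x_len in range(0, n - 1, 2):
            for x in by_len[x_len]:
                for y in by_len[n - 2 - x_len]:
                    by_len[n].add("(" + x + ")" + y)
    return set().union(*by_len.values())

def balanced_strings(max_len):
    """Every balanced string over { (, ) } with length <= max_len."""
    return {"".join(p)
            for n in range(max_len + 1)
            for p in product("()", repeat=n)
            if is_balanced("".join(p))}
```

For any bound tried, `derivable(k) == balanced_strings(k)` corresponds to the two inclusions (=>) and (<=) restricted to strings of length at most k.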
22. Chomsky Hierarchy
Noam Chomsky
Grammars are classified according to the form of their productions.
α → β ∈ P
Type 0 : No restrictions (unrestricted grammar)
Type 1 : Context-sensitive grammar (CSG).
α → β, |α| ≤ |β|
Type 2 : Context-free grammar (CFG).
A → α, where A : nonterminal, α ∈ V*.
Type 3 : Regular grammar (RG).
A → tB or A → t, (right-linear)
A → Bt or A → t, (left-linear)
where A, B : nonterminal, t ∈ VT*.
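Since the classification depends only on the form of the productions, it can be checked mechanically. The sketch below returns the most restrictive type whose condition every production satisfies, testing type 3 first; it assumes single-character symbols and (lhs, rhs) string pairs, and `grammar_type` is a hypothetical name.

```python
def grammar_type(productions, nonterminals):
    """Return 3, 2, 1 or 0 according to the form of the productions."""
    def is_cfg(lhs, rhs):
        # A -> α with A a single nonterminal
        return len(lhs) == 1 and lhs in nonterminals

    def right_linear(lhs, rhs):
        # A -> tB or A -> t, with t a (possibly empty) terminal string
        body = rhs[:-1] if rhs and rhs[-1] in nonterminals else rhs
        return is_cfg(lhs, rhs) and not any(s in nonterminals for s in body)

    def left_linear(lhs, rhs):
        # A -> Bt or A -> t
        body = rhs[1:] if rhs and rhs[0] in nonterminals else rhs
        return is_cfg(lhs, rhs) and not any(s in nonterminals for s in body)

    if all(right_linear(l, r) for l, r in productions) or \
       all(left_linear(l, r) for l, r in productions):
        return 3
    if all(is_cfg(l, r) for l, r in productions):
        return 2
    if all(len(l) <= len(r) for l, r in productions):
        return 1
    return 0
```

A grammar that mixes right-linear and left-linear productions is deliberately not reported as type 3, since a regular grammar uses one form throughout.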
23. REL (Recursively Enumerable Language)
CSL (Context Sensitive Language)
CFL (Context Free Language)
RL (Regular Language)
Examples of Formal Language
simple matching language : Lm = {a^n b^n | n ≥ 0} CFL
double matching language : Ldm = {a^n b^n c^n | n ≥ 0} CSL
mirror image language : Lmi = {ωω^R | ω ∈ VT*} CFL
palindrome language : Lr = {ω | ω = ω^R} CFL
parenthesis language : Lp = {ω | ω: balanced parentheses} CFL
24. The Chomsky Hierarchy of Languages
unrestricted language ⊃ context-sensitive language ⊃ context-free language ⊃ regular language
25. Languages & Recognizers
Grammar                      Language                     Recognizer
type 0 (unrestricted)        recursively enumerable set   Turing machine
type 1 (context-sensitive)   context-sensitive language   Linear Bounded Automata
type 2 (context-free)        context-free language        Pushdown Automata
type 3 (regular)             regular language             Finite Automata