Ch03

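Chapter 3
Regular Languages
Introduction to Compilers

Contents
3.1 Regular Grammars and Regular Languages
3.2 Regular Expressions
3.3 Finite Automata
3.4 Properties of Regular Languages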

Regular Grammars and Regular Languages
• A study of the theory of regular languages is often justified by the fact
that they model the lexical analysis stage of a compiler.
• Type 3 Grammar (N. Chomsky)
RLG (Right-Linear Grammar) : A → tB, A → t
LLG (Left-Linear Grammar) : A → Bt, A → t
where A, B ∈ VN and t ∈ VT*.
• ex. G1 : S → 000S | 000    G2 : S → S000 | 000
• It is important to note that grammars in which
left-linear productions are intermixed with right-linear productions
are not regular.
For example,
G : S → aR, S → c, R → Sb
L(G) = {a^n c b^n | n ≥ 0} is a context-free language (cfl), not a regular one.

 Definition
(1) A grammar is regular if each rule has one of the forms
i) A → aB, A → a, where a ∈ VT and A, B ∈ VN.
ii) If S → ε ∈ P, then S does not appear on the right-hand side of any rule.
• This is the right-linear form A → tB, A → t restricted to the case where t
is a single terminal. It is the preferred form for developing the properties
of regular grammars systematically.
(2) A language is said to be a regular language (rl) if it can be
generated by a regular grammar.
ex) L = {a^n b^m | n, m ≥ 1} is rl.
S → aS | aA
A → bA | b

[Theorem] The production forms of a regular grammar
can be derived from those of an RLG. (RLG => RG) (Text p.69)
(proof)
A → tB, where t ∈ VT*.
Let t = a1a2...an, ai ∈ VT.
A → a1A1
A1 → a2A2
...
An-1 → anB.
ex) S → abcA ⇒ S → aS1, S1 → bS2, S2 → cA
A → bcA ⇒ A → bA1, A1 → cA
A → cd ⇒ A → cA1', A1' → d
If t = ε, then A → B (single production) or A → ε (epsilon production).
⇒ These forms of productions can be easily removed.
(Text pp.175-181)
Right-linear grammar :
A → tB or A → t,
where A, B ∈ VN and t ∈ VT*.
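The splitting step in the proof above can be sketched in Python. This is only an illustration under assumptions: the function name split_production, the triple representation (left, terminal, right), and the tag parameter for keeping fresh names apart are hypothetical and not from the text.

```python
def split_production(lhs, terminals, tail=None, tag=""):
    """Split lhs -> t tail, with t = a1 a2 ... an, into single-terminal rules.

    Returns (left, terminal, right) triples; right is None for a rule A -> a.
    Fresh nonterminals are named lhs1, lhs2, ... followed by the optional tag.
    """
    if not terminals:
        # t = ε yields a single production or an ε-production,
        # which the slide removes by a separate procedure.
        raise ValueError("empty terminal string is not handled here")
    rules = []
    current = lhs
    for index, symbol in enumerate(terminals[:-1], start=1):
        fresh = f"{lhs}{index}{tag}"
        rules.append((current, symbol, fresh))
        current = fresh
    rules.append((current, terminals[-1], tail))
    return rules


def show(rules):
    """Render the triples in the slide's arrow notation."""
    return ", ".join(f"{left} → {a}{right or ''}" for left, a, right in rules)


print(show(split_production("S", "abc", "A")))
print(show(split_production("A", "bc", "A")))
print(show(split_production("A", "cd", tag="'")))
```

The three calls correspond to the three example productions on the slide. The caller is responsible for choosing a tag that keeps fresh nonterminals distinct when the same left-hand side is split more than once.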

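Equivalence
The following statements are equivalent, and each of them characterizes a regular language.
1. The language L is generated by a right-linear grammar.
2. The language L is generated by a left-linear grammar.
3. The language L is generated by a regular grammar.
[Example] L = {a^n b^m | n, m ≥ 1} : rl
S → aS | aA
A → bA | b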

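• Reasons for using regular languages to define the structure of tokens
(1) The structure of tokens is simple enough to be expressed by a regular grammar.
(2) A more efficient recognizer can be built from a regular grammar than
from a context-free grammar.
(3) The front end of the compiler can be divided into modules.
(Scanner + Parser)
• If a grammar is regular, the form of the language it describes can be
derived systematically and expressed as a regular expression.
[Figure: G --derivation--> L(G); if G is an rg, then L(G) can be written as an re.]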

Regular Expression
• A notation that allows us to describe the structure of
sentences in a regular language.
• The methods for specifying regular languages
(1) regular grammar (rg)
(2) regular expression (re)
(3) finite automata (fa)
[Figure: diagram relating rg, re, and fa]

 Definition :
 A regular expression over the alphabet T and the language
denoted by that expression are defined recursively as follows :
I. Basis : φ, ε, a ∈ T.
(1) φ is a regular expression denoting the empty set.
(2) ε is a regular expression denoting {ε}.
(3) a, where a ∈ T, is a regular expression denoting {a}.
II. Recursion : +, •, *
If P and Q are regular expressions denoting Lp and Lq
respectively, then
(1) (P + Q) is a regular expression denoting Lp ∪ Lq. (union)
(2) (P • Q) is a regular expression denoting Lp • Lq. (concatenation)
(3) (P*) is a regular expression denoting
{ε} ∪ Lp ∪ Lp^2 ∪ ... ∪ Lp^n ∪ ... (closure)
Note : precedence : + < • < *
III. Nothing else is a regular expression.

• other notations
(e1)•(e2) = (e1)(e2)
(e1)^+ = (e1)(e1)*
(e1)+(e2) = (e1)|(e2)
• regular expression examples
ab* = {ab^n | n ≥ 0}
(0+1)* denotes {0,1}*.
(0+1)*011 denotes the set of all strings of 0's and 1's ending in 011.
Regular expression for identifiers : letter(letter+digit)*

• Definition : if α is a regular expression, L(α) denotes
the language associated with α. (Text p.77)
• Let α and β be regular expressions. Then,
(1) L(α + β) = L(α) ∪ L(β)
(2) L(αβ) = L(α) L(β)
(3) L(α*) = L(α)*
• examples :
(1) L(a*) = {ε, a, aa, aaa, … } = {a^n | n ≥ 0}
(2) L((aa)*(bb)*b) = {a^(2n) b^(2m+1) | n, m ≥ 0}
(3) L((a+b)*b(a+ab)*) --- Exercise 3.2 (3) - text p.115
= { b, ba, bab, ab, bb, aab, bbb, … }
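Short members of such a language can be listed mechanically. The sketch below uses Python's re module, in which union is written | rather than +; the helper name language_up_to is an assumption made for this illustration, not something from the text.

```python
import re
from itertools import product


def language_up_to(pattern, alphabet, max_len):
    """List every string over `alphabet` of length <= max_len matched by `pattern`."""
    regex = re.compile(pattern)
    words = []
    for length in range(max_len + 1):
        for letters in product(alphabet, repeat=length):
            word = "".join(letters)
            if regex.fullmatch(word):  # the whole string must match
                words.append(word)
    return words


# Example (3): (a+b)*b(a+ab)* in textbook notation.
words = language_up_to(r"(a|b)*b(a|ab)*", "ab", 3)
print(words)

# Example (2): (aa)*(bb)*b, strings of the form a^(2n) b^(2m+1).
odd_b = language_up_to(r"(aa)*(bb)*b", "ab", 5)
print(odd_b)
```

Every string listed for example (3) on the slide appears in words, and the enumeration suggests that the expression describes exactly the strings containing at least one b.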

 Definition : Two regular expressions are equal if and
only if they denote the same language.
 α= β if L(α) = L(β).
 Axioms : Some algebraic properties of regular
expressions. Let α, β and γ be regular expressions.
Then, (Text p.73)
A1. α + β = β + α              A2. (α + β) + γ = α + (β + γ)
A3. (αβ)γ = α(βγ)              A4. α(β + γ) = αβ + αγ
A5. (β + γ)α = βα + γα         A6. α + α = α
A7. α + φ = α                  A8. αφ = φ = φα
A9. εα = α = αε                A10. α* = ε + α•α*
A11. α* = (ε + α)*             A12. (α*)* = α*
A13. α* + α = α*               A14. α* + α^+ = α*
A15. (α + β)* = (α*β*)*

 All of these identities(=Axioms) are easily proved by the
definition of regular expression.
A8. αφ = φ = φα
(proof) αφ = { xy | x ∈ Lα and y ∈ Lφ }
Since y ∈ Lφ is false, (x ∈ Lα and y ∈ Lφ) is false.
Thus αφ = φ.
• Definition : regular expression equations
::= a set of equations whose coefficients are
regular expressions.
ex) If α and β are regular expressions, then X = αX + β is a regular
expression equation. Here X stands for a nonterminal symbol, and the
right-hand side describes the form of the language generated by that nonterminal.

▶ The solution of the regular expression equation
X = αX + β.
• When we substitute X = α*β into both sides of the equation,
each side of the equation represents the same language.
X = αX + β
= α(α*β) + β
= αα*β + β = (αα* + ε)β = α*β.
• Fixed point iteration
X = αX + β
= α(αX + β) + β
= α^2 X + αβ + β
= α^2 X + (ε + α)β
...
= α^(k+1) X + (ε + α + α^2 + ... + α^k)β
...
= (ε + α + α^2 + ... + α^k + ...)β = α*β.

 Not all regular expression equations have unique solution.
X = αX + β
(a) If ε is not in L(α), then X = α*β is the unique solution.
(b) If ε is in L(α), then X = α*(β + L) is a solution for any language L,
so the equation has infinitely many solutions.
⇒ Smallest solution : X = α*β.
ex) X = X + a : no unique solution
⇒ X = a + b or X = b*a or X = (a + b)* etc.
Check for X = a + b :
X + a = (a + b) + a
= a + a + b
= a + b.
Check for X = b*a :
X + a = b*a + a
= (b* + ε)a
= b*a.

• Finding a regular expression denoting L(G) for a given rg G.
• L(A), where A ∈ VN, denotes the language generated by A.
By definition, if S is the start symbol, then L(G) = L(S).
• Two steps :
1. Construct a set of simultaneous equations from G.
A → aB, A → a
L(A) = {a}·L(B) ∪ {a} ⇒ A = aB + a
In general, X → α | β | γ ⇒ X = α + β + γ.
2. Solve these equations.
X = αX + β ⇒ X = α*β.
[Figure: G --derivation--> L(G); if G is an rg, then L(G) can be written as an re.]

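• P.80
• step 1) Construct regular expression equations from the regular grammar:
X → α | β | γ ⇒ X = α + β + γ
• step 2) Among these equations, replace each equation of the form
X = αX + β with X = α*β.
• step 3) Substitute the expression obtained for X in step 2 into the other
equations; whenever the form Y = αY + β appears again, replace it with Y = α*β.
• step 4) Rewrite the equation for the start symbol in the form S = αS + β
and solve it as S = α*β. Then α*β denotes the regular language L(G)
generated by the regular grammar G.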

ex1) S → aS, S → bR, S → ε, R → aS
L(S) = {a}L(S) ∪ {b}L(R) ∪ {ε}
L(R) = {a}L(S)
⇒ ree: S = aS + bR + ε
R = aS
⇒ S = aS + baS + ε
= (a + ba)S + ε
= (a + ba)*ε = (a + ba)*
ex2) S → aA | bB | b, A → bA | ε, B → bS
⇒ ree: S = aA + bB + b
A = bA + ε ⇒ A = b*ε = b*
B = bS
⇒ S = ab* + bbS + b
= bbS + ab* + b
= (bb)*(ab* + b)
Rules used :
X → α | β | γ ⇒ X = α + β + γ
X = αX + β ⇒ X = α*β.
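The two results can be cross-checked by brute force: enumerate every string of bounded length that each grammar derives and compare it with the strings matched by the solved expression. The sketch below is only a sanity check under assumptions, not a proof: the helper names are hypothetical, nonterminals are single uppercase letters, ε is written as an empty right-hand side, and union is written | in Python's re syntax.

```python
import re
from itertools import product

# Right-linear grammars from ex1 and ex2; "" stands for ε.
EX1 = {"S": ["aS", "bR", ""], "R": ["aS"]}
EX2 = {"S": ["aA", "bB", "b"], "A": ["bA", ""], "B": ["bS"]}


def generated_words(grammar, start, max_len):
    """All terminal strings of length <= max_len derivable from `start`.

    Assumes every rule that keeps a nonterminal also adds a terminal,
    so sentential forms grow and the search terminates.
    """
    words = set()
    frontier = [start]  # sentential forms: terminals followed by one nonterminal
    while frontier:
        form = frontier.pop()
        prefix, nonterminal = form[:-1], form[-1]
        for rhs in grammar[nonterminal]:
            new_form = prefix + rhs
            if new_form and new_form[-1] in grammar:
                if len(new_form) - 1 <= max_len:
                    frontier.append(new_form)
            elif len(new_form) <= max_len:
                words.add(new_form)
    return words


def regex_words(pattern, alphabet, max_len):
    """All strings over `alphabet` of length <= max_len matched by `pattern`."""
    regex = re.compile(pattern)
    candidates = ("".join(p) for n in range(max_len + 1)
                  for p in product(alphabet, repeat=n))
    return {w for w in candidates if regex.fullmatch(w)}


print(generated_words(EX1, "S", 6) == regex_words(r"(a|ba)*", "ab", 6))
print(generated_words(EX2, "S", 6) == regex_words(r"(bb)*(ab*|b)", "ab", 6))
```

Agreement up to a length bound does not prove that the languages are equal, but a mismatch would reveal an algebra slip while solving the equations.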

Recognizer
☞ A recognizer for a language L is a program
that takes a string x as input and answers “yes”
if x is a sentence of L and “no” otherwise.
[Figure: model of a recognizer: an input tape a0 a1 a2 … ai ai+1 ai+2 … an scanned by an input head, a finite state control, and auxiliary storage]
• Turing Machine
• Linear Bounded Automata
• PushDown Automata
• Finite Automata

Finite Automata
rg : G = (VN, VT, P, S)
re : φ, ε, a, +, •, *
fa : M = (Q, Σ, δ, q0, F)
• Definition : fa
A finite automaton M over an alphabet Σ is a system (Q, Σ, δ, q0, F)
where Q : finite, non-empty set of states.
Σ : finite input alphabet.
δ : mapping function.
q0 ∈ Q : start (or initial) state.
F ⊆ Q : set of final states.
• mapping δ : Q × Σ → 2^Q,
i.e. δ(q,a) = {p1, p2, ... , pn}
• DFA, NFA.

Contents - FA
1. DFA
2. NFA
3. Converting NFA into DFA
4. Minimization of FA
5. Closure Properties of FA

1. Deterministic Finite Automata (DFA)
• An fa is deterministic if δ(q,a) consists of one state.
We shall write "δ(q,a) = p" instead of δ(q,a) = {p} if deterministic.
If δ(q,a) always has exactly one member,
we say that M is completely specified.
• ex) DFA M = ({q0, q1, q2}, {a, b}, δ, q0, {q2})
δ(q0, a) = q1   δ(q0, b) = q2   δ(q1, a) = q2
δ(q1, b) = q0   δ(q2, a) = q0   δ(q2, b) = q1
• The transition function written as a matrix is called
a state-transition table.
δ    a    b
q0   q1   q2
q1   q2   q0
q2   q0   q1
• Starting in state q0 with the input string aba:
δ(q0, aba) = δ(δ(q0, ab), a) = δ(δ(δ(q0, a), b), a) = δ(δ(q1, b), a) = δ(q0, a) = q1

1. Deterministic Finite Automata (DFA)
• Extension of δ : Q × Σ ⇒ Q × Σ*
δ(q, ε) = q
δ(q, xa) = δ(δ(q, x), a), where x ∈ Σ* and a ∈ Σ.
• A sentence x is said to be accepted by M
if δ(q0, x) = p for some p ∈ F.
• The language accepted by M :
L(M) = { x | δ(q0, x) ∈ F }

ex) M = ( {p, q, r}, {0, 1}, δ, p, {r} )
δ : δ(p,0) = q   δ(p,1) = p
δ(q,0) = r   δ(q,1) = p
δ(r,0) = r   δ(r,1) = r
• 1001 ∈ L(M) ?
δ(p,1001) = δ(p,001) = δ(q,01) = δ(r,1) = r ∈ F.
∴ 1001 ∈ L(M).
• 1010 ∈ L(M) ?
δ(p,1010) = δ(p,010) = δ(q,10) = δ(p,0) = q ∉ F.
∴ 1010 ∉ L(M).
• δ in matrix form ⇒ transition table. ex)
     Input symbols
δ    0    1
p    q    p
q    r    p
r    r    r


 Definition : State (or Transition) diagram for automaton.
 The state diagram consists of a node for every state
and a directed arc from state q to state p with label
a ∈ Σ if δ(q,a) = p.
• Final states are indicated by a double circle and
the initial state is marked by an arrow labeled start.
[Figure: state diagram of the DFA M above (start state p, final state r); it accepts (1+01)*00(0+1)*]
[Figure: state diagram for identifiers: start state S goes to final state A on letter; A loops on letter, digit]

Algorithm : w ∈ L(M) ?
assume M = (Q, Σ, δ, q0, F);
begin
  currentstate := q0; (* start state *)
  get(nextsymbol);
  while not eof do
  begin
    currentstate := δ(currentstate, nextsymbol); (* one transition per input symbol *)
    get(nextsymbol)
  end;
  if currentstate in F then write('Valid String')
  else write('Invalid String');
end.
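The same loop can be written as a short runnable Python sketch. It uses the three-state DFA M = ({p, q, r}, {0, 1}, δ, p, {r}) from the earlier example; the dictionary encoding of δ and the function name accepts are assumptions made for this illustration.

```python
# Transition function δ of the example DFA, keyed by (state, symbol).
DELTA = {
    ("p", "0"): "q", ("p", "1"): "p",
    ("q", "0"): "r", ("q", "1"): "p",
    ("r", "0"): "r", ("r", "1"): "r",
}
START = "p"
FINALS = {"r"}


def accepts(delta, start, finals, word):
    """Run the DFA over `word` and report whether it ends in a final state."""
    state = start
    for symbol in word:
        state = delta[(state, symbol)]  # exactly one move per input symbol
    return state in finals


for w in ("1001", "1010"):
    verdict = "Valid String" if accepts(DELTA, START, FINALS, w) else "Invalid String"
    print(w, verdict)
```

Because δ is completely specified here, the lookup never fails; for a partially specified DFA a missing key would have to be treated as rejection.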

2. Nondeterministic Finite Automata (NFA)
• An fa is nondeterministic if δ(q,a) = {p1, p2, ..., pn}.
• In state q, scanning input symbol a, M moves the input head one symbol
to the right and chooses any one of p1, p2, ..., pn as the next state.
• ex) NFA (Nondeterministic Finite Automaton)
M = ( {q0,q1,q2,q3,qf}, {0,1}, δ, q0, {qf} )
δ     0           1
q0    {q1, q2}    {q1, q3}
q1    {q1, q2}    {q1, q3}
q2    {qf}        φ
q3    φ           {qf}
qf    {qf}        {qf}
• If δ(q,a) = φ, then δ(q,a) is undefined.
• δ(q0, 1001) = {q1, q3, qf}

• To define the language recognized by an NFA, we must extend δ.
(i) δ : Q × Σ* → 2^Q
δ(q, ε) = {q}
δ(q, xa) = ∪ δ(p, a) over all p ∈ δ(q, x), where a ∈ VT and x ∈ VT*.
(ii) δ : 2^Q × Σ* → 2^Q
δ({p1, p2, ..., pk}, x) = δ(p1, x) ∪ δ(p2, x) ∪ ... ∪ δ(pk, x)
• Definition : A sentence x is accepted by M
if there is a state p in both F and δ(q0, x).
ex) 1011 ∈ L(M) ?
δ({q0}, 1011) = δ({q1,q3}, 011) = δ({q1,q2}, 11)
= δ({q1,q3}, 1) = {q1,q3,qf}
∴ 1011 ∈ L(M) ( ∵ {q1,q3,qf} ∩ {qf} ≠ φ)
ex) 0100 ∈ L(M) ?
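The extended transition function can be simulated by tracking the set of states reachable so far. The sketch below encodes the example NFA as nested dictionaries, where a missing entry plays the role of φ; the names NFA, step, run and accepts are assumptions made for this illustration.

```python
# Example NFA: state -> symbol -> set of next states (missing entry = φ).
NFA = {
    "q0": {"0": {"q1", "q2"}, "1": {"q1", "q3"}},
    "q1": {"0": {"q1", "q2"}, "1": {"q1", "q3"}},
    "q2": {"0": {"qf"}},
    "q3": {"1": {"qf"}},
    "qf": {"0": {"qf"}, "1": {"qf"}},
}
START = "q0"
FINALS = {"qf"}


def step(nfa, states, symbol):
    """δ(states, symbol): union of the moves of every current state."""
    return set().union(*(nfa[s].get(symbol, set()) for s in states))


def run(nfa, start, word):
    """δ({start}, word): the set of states reachable after reading `word`."""
    states = {start}
    for symbol in word:
        states = step(nfa, states, symbol)
    return states


def accepts(nfa, start, finals, word):
    """A word is accepted if some reachable state is final."""
    return bool(run(nfa, start, word) & finals)


print(sorted(run(NFA, START, "1011")))
print(accepts(NFA, START, FINALS, "0100"))
```

Tracking one set of states per input symbol avoids enumerating every nondeterministic choice separately.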

 Nondeterministic behavior
 If the number of states |Q| = m and input length |x| = n,
then there are up to m^n nodes in the computation tree.
• exponential time -> computationally intractable
• In general, an NFA cannot be simulated as easily by a simple
program as a DFA can.
So we shall see that a DFA can be constructed from the NFA.
[Computation tree for δ(q0, 1011)]
(q0, 1011)
(q1, 011)  (q3, 011)
(q1, 11)  (q2, 11)  φ
(q1, 1)  (q3, 1)  φ
q1  q3  qf

3. Converting NFA into DFA
 NFA : easily describe the real world.
DFA : easily simulated by a simple program.
===> Fortunately, for each NFA we can find a DFA accepting
the same language.
 Accepting Sequence(NFA)
δ(q0, a1a2 ... an) = δ({q1, q2, … , qi}, a2a3 ... an)
... ...
= δ({p1, p2, … , pj}, ai ... an)
... ...
= {r1, r2, ... , rk}
 Since the states of the DFA represent subsets of the set of all
states of the NFA, this algorithm is often called the subset
construction.

[Theorem] Let L be a language accepted by an NFA. Then
there exists a DFA which accepts L. (Text p.86)
(proof) Let M = (Q, Σ, δ, q0, F) be an NFA accepting L.
Define DFA M' = (Q', Σ, δ', q0', F') such that
(1) Q' = 2^Q; each subset {q1, q2, ..., qi} of Q is an element of Q',
written [q1, q2, ..., qi].
(2) q0' = {q0} = [q0]
(3) F' = {[r1, r2, ..., rk] | ri ∈ F for some i}
(4) δ' : δ'([q1, q2, ..., qi], a) = [p1, p2, ..., pj]
if δ({q1, q2, ..., qi}, a) = {p1, p2, ..., pj}.
Now we must prove that L(M) = L(M'), i.e.,
δ'(q0', x) ∈ F' ⇔ δ(q0, x) ∩ F ≠ φ.
• This is easily shown by induction on the length
of the input string x.

ex1) NFA M = ({q0,q1}, {0,1}, δ, q0, {q1}), where δ is
δ     0           1
q0    {q0, q1}    {q0}
q1    φ           {q0, q1}
⇒ dfa M' = (Q', Σ, δ', q0', F'),
where Q' = 2^Q = {[q0], [q1], [q0,q1]} (the empty set is omitted)
q0' = [q0]
F' = {[q1], [q0,q1]}
δ' : δ'([q0],0) = δ({q0},0) = {q0,q1} = [q0,q1]
δ'([q0],1) = δ({q0},1) = {q0} = [q0]
δ'([q1],0) = δ(q1,0) = φ
δ'([q1],1) = δ(q1,1) = {q0,q1} = [q0,q1]
δ'([q0,q1],0) = δ({q0,q1},0) = {q0,q1} = [q0,q1]
δ'([q0,q1],1) = δ({q0,q1},1) = {q0,q1} = [q0,q1]

 State renaming : [q0] = A, [q1] = B, [q0,q1] = C.
 Since B is an inaccessible state, it can be removed.
’ 0 1
A C A
B f C
C C C
A Cstart
0, 11
0
B
1
A Cstart
0, 11
0

 Definition : we call a state p accessible if there is w
such that (q0, w) ⊢* (p, ε), where q0 is the initial state.
ex2) NFA ⇒ DFA
NFA :
δ     0           1
q0    {q1,q2}     {q1,q3}
q1    {q1,q2}     {q1,q3}
q2    {qf}        φ
q3    φ           {qf}
qf    {qf}        {qf}
DFA :
δ'        0         1
q0        q1q2      q1q3
q1q2      q1q2qf    q1q3
q1q3      q1q2      q1q3qf
q1q2qf    q1q2qf    q1q3qf
q1q3qf    q1q2qf    q1q3qf
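The DFA table of ex2 can be reproduced with a worklist version of the subset construction that generates only accessible subsets. This is a minimal sketch: the dictionary encoding of the NFA and the function name subset_construction are assumptions made for this illustration, and empty target sets are recorded but not expanded, so no dead state is created.

```python
from collections import deque

# NFA of ex2: state -> symbol -> set of next states (missing entry = φ).
NFA = {
    "q0": {"0": {"q1", "q2"}, "1": {"q1", "q3"}},
    "q1": {"0": {"q1", "q2"}, "1": {"q1", "q3"}},
    "q2": {"0": {"qf"}},
    "q3": {"1": {"qf"}},
    "qf": {"0": {"qf"}, "1": {"qf"}},
}


def subset_construction(nfa, start, alphabet):
    """Return the DFA table {subset: {symbol: subset}} for accessible subsets."""
    table = {}
    queue = deque([frozenset([start])])
    while queue:
        current = queue.popleft()
        if current in table:
            continue
        table[current] = {}
        for symbol in alphabet:
            target = frozenset(t for s in current for t in nfa[s].get(symbol, ()))
            table[current][symbol] = target
            if target and target not in table:
                queue.append(target)
    return table


dfa = subset_construction(NFA, "q0", "01")
for subset, row in dfa.items():
    name = "".join(sorted(subset))
    print(name, {a: "".join(sorted(t)) for a, t in row.items()})
```

A DFA state is final when its subset contains a final state of the NFA, here qf.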

Definition : ε-NFA M = (Q, Σ, δ, q0, F)
δ : Q × (Σ ∪ {ε}) → 2^Q
ε-CLOSURE : the set of states reachable by following ε-arcs only
• When s is a single state:
ε-CLOSURE(s) = {s} ∪ {q | q ∈ δ(p, ε), p ∈ ε-CLOSURE(s)}
• When T is a set of one or more states:
ε-CLOSURE(T) = ∪ ε-CLOSURE(q) over all q ∈ T
ex) Computing ε-CLOSURE in an ε-NFA
[Figure: ε-NFA with states A (start), B, C, D and arcs labeled a, b, and ε]
ε-CLOSURE(A) = {A, B, D}
ε-CLOSURE({A,C}) = ε-CLOSURE(A) ∪ ε-CLOSURE(C) = {A, B, C, D}

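Ex) ε-NFA ⇒ DFA
[Figure: ε-NFA with start state 1 and states 2, 3, 4; arcs labeled a, b, c and two ε-arcs]
δ'                               a                        b                        c
CLOSURE(1) = {1,3,4} ⇒ [1,3,4]   CLOSURE(2) = {2} ⇒ [2]   φ                        CLOSURE(3) = {3,4} ⇒ [3,4]
[2]                              φ                        CLOSURE(4) = {4} ⇒ [4]   φ
[3,4]                            φ                        φ                        CLOSURE(3) = {3,4} ⇒ [3,4]
[4]                              φ                        φ                        φ
State renaming : A = [1,3,4], B = [2], C = [3,4], D = [4]
[Figure: resulting DFA: start state A goes to B on a and to C on c; B goes to D on b; C loops on c]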
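The closure computation in this example can be sketched as follows. The transition table ENFA is an assumption reconstructed from the closure table (1 goes to 2 on a, 2 goes to 4 on b, 3 goes to 3 on c, with ε-arcs from 1 to 3 and from 3 to 4), because the original diagram is not recoverable; the helper names are hypothetical as well.

```python
EPS = ""  # key used for ε-arcs in this sketch

# Assumed ε-NFA, reconstructed from the closure table of the example.
ENFA = {
    1: {"a": {2}, EPS: {3}},
    2: {"b": {4}},
    3: {"c": {3}, EPS: {4}},
    4: {},
}


def eps_closure(enfa, states):
    """All states reachable from `states` using ε-arcs only."""
    closure = set(states)
    stack = list(states)
    while stack:
        state = stack.pop()
        for target in enfa[state].get(EPS, ()):
            if target not in closure:
                closure.add(target)
                stack.append(target)
    return frozenset(closure)


def move(enfa, states, symbol):
    """States reachable from `states` by one arc labeled `symbol`."""
    return {t for s in states for t in enfa[s].get(symbol, ())}


def dfa_step(enfa, states, symbol):
    """One DFA transition: move on the symbol, then take the ε-closure."""
    return eps_closure(enfa, move(enfa, states, symbol))


A = eps_closure(ENFA, {1})
print(sorted(A))
for symbol in "abc":
    print(symbol, sorted(dfa_step(ENFA, A, symbol)))
```

Applying dfa_step repeatedly from A yields the four DFA states A, B, C, D of the table, with the empty set standing for φ.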

4. Minimization of FA
• State minimization => state merge
• Definition :
ω ∈ Σ* distinguishes q1 from q2 if δ(q1,ω) = q3,
δ(q2,ω) = q4 and exactly one of q3, q4 is in F.
• Algorithm : equivalence relation (≡) ⇒ partition.
(1) ≡0 : partition the states into final and nonfinal states.
(2) ≡1 : do two states of the same class go to different equivalence classes
on some input symbol? If so, they are said to be distinguished by that symbol.
:
(3) ≡n : repeat until no further partitioning occurs.
• The states that cannot be distinguished are merged into a
single state.

Ex)
[Figure: state diagram of a DFA with states A, B, C, D, E, F (start state marked) and arcs labeled a and b]
≡0 : {A,F}, {B, C, D, E} : first split into final and nonfinal states.
≡1 : {A,F}, {B,E}, {C,D} : {B, C, D, E} is partitioned by the input symbols.
≡2 : {A,F}, {B,E}, {C,D}.
δ'      a       b
[AF]    [AF]    [BE]
[BE]    [BE]    [CD]
[CD]    [CD]    [AF]

 How to minimize the number of states in a fa.
<step 1> Delete all inaccessible states;
<step 2> Construct the equivalence relations;
<step 3> Construct fa M' = (Q', Σ, δ', q0', F'),
(a) Q' : set of equivalence classes under ≡.
Let [p] be the equivalence class of state p under ≡.
(b) δ'([p],a) = [q] if δ(p,a) = q.
(c) q0' is [q0].
(d) F' = {[q] | q ∈ F}.
 Definition : M is said to be reduced
if (1) no state in Q is inaccessible and
(2) no two distinct states of Q are indistinguishable

ex) Find the minimum state finite automaton for the language specified by
the finite automaton M = ({A,B,C,D,E,F}, {0,1}, δ, A, {E,F}),
where δ is given by
δ    0    1
A    B    C
B    E    F
C    A    A
D    F    E
E    D    F
F    D    E
Initial partition : {A, B, C, D}, {E, F}
Final partition : {A}, {C}, {B, D}, {E, F}
δ'           0     1
[A] = S1     S3    S2
[C] = S2     S1    S1
[BD] = S3    S4    S4
[EF] = S4    S3    S4
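The partitioning in this example can be reproduced by simple iterative refinement. The sketch below is one straightforward way to do it and is not claimed to be the textbook's exact procedure; the table DELTA follows the transition table of the example, all states are assumed accessible, and the function name refine is hypothetical.

```python
# Transition table of the example DFA (start state A, final states E and F).
DELTA = {
    "A": {"0": "B", "1": "C"},
    "B": {"0": "E", "1": "F"},
    "C": {"0": "A", "1": "A"},
    "D": {"0": "F", "1": "E"},
    "E": {"0": "D", "1": "F"},
    "F": {"0": "D", "1": "E"},
}
FINALS = {"E", "F"}


def refine(delta, finals, alphabet):
    """Split blocks until states in a block agree on the block of every successor."""
    states = set(delta)
    partition = [block for block in (states - finals, states & finals) if block]
    while True:
        block_of = {s: i for i, block in enumerate(partition) for s in block}
        groups = {}
        for s in states:
            # Own block plus the blocks reached on each input symbol.
            signature = (block_of[s],) + tuple(block_of[delta[s][a]] for a in alphabet)
            groups.setdefault(signature, set()).add(s)
        new_partition = list(groups.values())
        if len(new_partition) == len(partition):  # nothing was split
            return new_partition
        partition = new_partition


classes = refine(DELTA, FINALS, "01")
print(sorted(sorted(block) for block in classes))
```

Each resulting block becomes one state of the reduced automaton, with δ'([p], a) = [δ(p, a)].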

5. Closure properties of FA
[Theorem] If L1 and L2 are finite automaton languages (FAL),
then so are (i) L1 ∪ L2 (ii) L1 • L2 (iii) L1*.
(proof) M1 = (Q1, Σ, δ1, q1, F1)
M2 = (Q2, Σ, δ2, q2, F2), Q1 ∩ Q2 = φ (∵ renaming)
(i) M = (Q1 ∪ Q2 ∪ {q0}, Σ, δ, q0, F)
where (1) q0 is a new state.
(2) F = F1 ∪ F2 if ε ∉ L1 ∪ L2,
F = F1 ∪ F2 ∪ {q0} if ε ∈ L1 ∪ L2.
(3) (a) δ(q0,a) = δ1(q1,a) ∪ δ2(q2,a) for all a ∈ Σ.
(b) δ(q,a) = δ1(q,a) for all q ∈ Q1, a ∈ Σ.
(c) δ(q,a) = δ2(q,a) for all q ∈ Q2, a ∈ Σ.
• Create a new start state and connect it to each fa as if its arcs came from
that fa's own start state. If ε is accepted, the new start state is also made
a final state.
ex) p.105 [Example 28]

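(ii) M = (Q1 ∪ Q2, Σ, δ, q1, F)
(1) F = F2 if q2 ∉ F2,
F = F1 ∪ F2 if q2 ∈ F2.
(2) (a) δ(q,a) = δ1(q,a) for all q ∈ Q1 - F1.
(b) δ(q,a) = δ1(q,a) ∪ δ2(q2,a) for all q ∈ F1.
(c) δ(q,a) = δ2(q,a) for all q ∈ Q2.
• Each final state of M1 is connected as if its arcs came from the start state
of M2. The start state of M1 becomes the start state of the combined automaton.
[Figure: M1 : start state A goes to final state B on 0; B loops on 1 => 01*]
[Figure: M2 : start state X goes to final state Y on 1; Y loops on 0 => 10*]
[Figure: M1 • M2 : start state A goes to B on 0; B loops on 1 and goes to final state Y on 1; Y loops on 0 => 01*10*]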

Properties of Regular Languages
Regular grammar (rg)
Finite automata (fa) Regular expression (re)
※ re ===> fa : scanner generator

Contents
1. RG & FA
2. FA & RE
3. Closure Properties of Regular Languages
4. The Pumping Lemma for Regular Languages

1. RG & FA
• Given an rg, there exists an fa that accepts the language
generated by the rg, and vice versa.
• rg ⇒ fa
Given rg G = (VN, VT, P, S), construct M = (Q, Σ, δ, q0, F).
(1) Q = VN ∪ {f}, where f is a new final state.
(2) Σ = VT.
(3) q0 = S.
(4) F = {f} if ε ∉ L(G)
= {S, f} otherwise.
(5) δ : if A → aB ∈ P then δ(A,a) ∋ B.
if A → a ∈ P then δ(A,a) ∋ f.

(proof)
If ω is accepted by the fa, then it is accepted by some sequence of
moves through states, ending in f.
But if δ(A,a) ∋ B and B ≠ f, then A → aB is a production.
Also if δ(A,a) ∋ f then A → a is a production.
So we can use the same series of productions to generate ω in G.
Thus S ⇒* ω.
ex) p.107 [Example 29]

• rg G = ({S, B}, {0, 1}, P, S)
• P : S → 0S
S → 1B
S → 0
S → 1
B → 0S
B → 0
• fa M = (Q, Σ, δ, q0, F)
• Q : VN ∪ {f} = {S, B, f}
• Σ : VT = {0, 1}
• q0 : S
• F : {f}
• δ :
δ    0         1
S    {S, f}    {B, f}
B    {S, f}    φ
f    φ         φ
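The rg ⇒ fa construction can be sketched in a few lines of Python and checked against this table. The sketch assumes single-character terminals and nonterminals, productions given as (left, right) string pairs, and the name "f" for the new final state; the function names are hypothetical.

```python
FINAL = "f"  # the new final state added by the construction

PRODUCTIONS = [("S", "0S"), ("S", "1B"), ("S", "0"), ("S", "1"),
               ("B", "0S"), ("B", "0")]


def grammar_to_nfa(productions, start, generates_epsilon=False):
    """Build δ as {(state, symbol): set of states} from rules A -> aB and A -> a."""
    delta = {}
    for lhs, rhs in productions:
        terminal = rhs[0]
        target = rhs[1] if len(rhs) == 2 else FINAL  # A -> a goes to f
        delta.setdefault((lhs, terminal), set()).add(target)
    finals = {FINAL, start} if generates_epsilon else {FINAL}
    return delta, finals


def accepts(delta, finals, start, word):
    """Simulate the resulting NFA by tracking the set of reachable states."""
    states = {start}
    for symbol in word:
        states = {t for s in states for t in delta.get((s, symbol), ())}
    return bool(states & finals)


delta, finals = grammar_to_nfa(PRODUCTIONS, "S")
print(delta)
print(accepts(delta, finals, "S", "010"), accepts(delta, finals, "S", "11"))
```

The string 010 is derivable as S ⇒ 0S ⇒ 01B ⇒ 010, while 11 is not, because B has no production starting with 1.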

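• fa ⇒ rg
Given M = (Q, Σ, δ, q0, F), construct G = (VN, VT, P, S).
(1) VN = Q
(2) VT = Σ
(3) S = q0
(4) P : if δ(q,a) = r then q → ar.
if p ∈ F then p → ε.
ex)
[Figure: state diagram of the DFA with states p (start), q, r (final): p loops on 1 and goes to q on 0; q goes to r on 0 and back to p on 1; r loops on 0, 1]
L(p) = (1+01)*00(0+1)*
p → 1p | 0q
q → 1p | 0r
r → 0r | 1r | ε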

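2. FA & RE
• fa ⇒ rg ⇒ re ex) p.126 3.10 (1)
[Figure: state diagram of a DFA with states A (start), B, C, D (final) and arcs labeled a and b, corresponding to the equations below]
A = bA + aB
B = aB + bC
C = aB + bD
D = aB + bA + ε
= A + ε
∴ A = (a+b)*abb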

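• re ⇒ fa (※ scanner generator)
For each component, we construct an fa inductively :
1. basis
[Figure: for ε : start state i goes to final state f on ε]
[Figure: for a ∈ Σ : start state i goes to final state f on a]
2. induction - combine the components.
(1) N1 + N2
[Figure: a new start state i has ε-arcs into N1 and N2, and N1 and N2 have ε-arcs into a new final state f]
(2) N1 • N2
[Figure: N1 followed by N2, joined by an ε-arc, from start state i to final state f]
(3) N*
[Figure: new states i and f around N, connected by four ε-arcs]
ex) p.112 [Example 31]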
 Definition : The size of a regular expression is the number
of operations and operands in the expression.
ex) size(ab + c*) = 6
• decomposition:
R1 = a, R2 = b, R3 = R1 • R2, R4 = c, R5 = R4*, R6 = R3 + R5
• The number of states is at most twice the size of the expression.
(∵ each operand introduces two states and each operator introduces at
most two states.)
• The number of arcs is at most four times the size of the expression.

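• Simplifications : p.113
※ Two states connected by an ε-arc can be treated as a single state
if no other arc leaves the source state.
[Figure: A goes to B on ε and an arc labeled a leaves B; this simplifies to a single state A with the arc labeled a]
[Figure: A goes to B on ε and B goes to C on a; this simplifies to A going to B on a]

• Simplifications : p.113
※ When two paths lead to the same place, simplify as shown.
[Figure: S goes on ε to A and to C; A goes to B on a and C goes to D on b; B and D go to F on ε; this simplifies to S going to F on a, b]
※ A fragment that recognizes a* is simplified as shown.
[Figure: states A, B, C connected by ε-arcs with an arc labeled a; this simplifies to a single state A with a loop on a]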

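• re ⇒ ε-NFA (simplified) ⇒ DFA ex) p.115 [Example 33]
• (ab)*(ba)*
[Figure: simplified ε-NFA: state 1 goes to 2 on a and 2 goes back to 1 on b; 1 goes to 3 on ε; 3 goes to 4 on b and 4 goes back to 3 on a]
δ'       a      b
[1,3]    [2]    [4]
[2]      φ      [1,3]
[3]      φ      [4]
[4]      [3]    φ
[Figure: resulting DFA with states [1,3], [2], [3], [4] and the arcs given in the table]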

 The following statements are equivalent :
1. L is generated by some regular grammar.
2. L is recognized by some finite automata.
3. L is described by some regular expression.

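p.127 3.14
(1) (b + a(aa*b)*b)*
[Figure: state diagram with states X, Y, Z and arcs labeled a and b]
(2) (b + aa + ac + aaa + aac)*
[Figure: state diagram with states X, Y, Z and arcs labeled a, b, and a, c]
(3) a(a+b)*b(a+b)*a(a+b)*b(a+b)*
[Figure: state diagram with states S, W, X, Y, Z; arcs labeled a, b, a, b between states and loops labeled a, b]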

3. Closure Properties of Regular Language
[Theorem] If L1 and L2 are regular languages,
then so are
(i) L1 ∪ L2,
(ii) L1L2, and
(iii) L1*.

(proof) (ii) Since L1 and L2 are rl, there exist rg G1 = (VN1, VT1, P1, S1) and
rg G2 = (VN2, VT2, P2, S2) such that L(G1) = L1 and L(G2) = L2.
Construct G = (VN1 ∪ VN2, VT1 ∪ VT2, P, S1), in which P is defined as follows :
(1) If A → aB ∈ P1, then A → aB ∈ P.
(2) If A → a ∈ P1, then A → aS2 ∈ P.
(3) All productions in P2 are in P.
We must prove that L(G) = L(G1) . L(G2).
Since G is an rg, L(G) is rl. Therefore L(G1) . L(G2) is rl.
ex) P1 : S → aS | bA, A → aA | a
P2 : X → 0X | 1Y, Y → 0Y | 1
⇒ P : S → aS | bA, A → aA | aX, X → 0X | 1Y, Y → 0Y | 1

(iii) L : rl, so there is an rg G = (VN, VT, P, S) such that L(G) = L.
Let G' = (VN ∪ {S'}, VT, P', S')
P' : (1) If A → aB ∈ P, then A → aB ∈ P'.
(2) If A → a ∈ P, then A → a, A → aS' ∈ P'.
(3) S' → S | ε ∈ P'.
We must prove that L(G') = (L(G))*.
∀ω ∈ L(G), S ⇒* ω. S' ⇒ S ⇒* ωS' ⇒* ω*S' ⇒ ω*.
∴ (L(G))* = L(G').
ex) P : S → aS, S → b
P' : S → aS, S → b, S → bS', S' → S, S' → ε.
note P : S = aS + b = a*b
P' : S = aS + b + bS' = a*(b + bS') = a*b + a*bS'
∴ S' = S + ε
= a*bS' + a*b + ε
= (a*b)*(a*b + ε)
= (a*b)*(a*b) + (a*b)* = (a*b)*

4. The Pumping Lemma for Regular Languages
• It is useful in proving certain languages not to be regular.
[Theorem] Let L be a regular language. There exists a constant p such that
if a string ω is in L and |ω| ≥ p, then ω can be written as xyz,
where 0 < |y| ≤ p and x y^i z ∈ L for all i ≥ 0.
(proof) Let M = (Q, Σ, δ, q0, F) be an fa with n states such that L(M) = L.
Let p = n. If ω ∈ L and |ω| ≥ n, then consider the sequence of
configurations entered by M in accepting ω. Since there are at
least n+1 configurations in the sequence, there must be two with
the same state among the first n+1 configurations.
Thus we have a sequence of moves such that
δ(q0, xyz) = δ(q1, yz) = δ(q1, z) = qf ∈ F for some q1.
But then, δ(q0, x y^i z) = δ(q1, y^i z) = δ(q1, y^(i-1) z) = ... = δ(q1, z) = qf ∈ F.
Since ω = xyz ∈ L, x y^i z ∈ L for all i ≥ 0.
[Figure: q0 goes to q1 on x; q1 loops back to itself on y; q1 goes to qf on z]

Consequently, we say that “finite automata cannot count”,
meaning they cannot accept a language which requires that
they count a number exactly.
ex) L = {0^n 1^n | n ≥ 1} is not type 3.
(Proof)
Suppose that L is regular.
Then for a sufficiently large n, 0^n 1^n can be written as xyz
such that y ≠ ε and x y^i z ∈ L for all i ≥ 0.
If y ∈ 0+ or y ∈ 1+, then xz = x y^0 z ∉ L.
If y ∈ 0+1+, then x y^i z ∉ L for i ≥ 2, since a 0 then follows a 1.
We have a contradiction, so L cannot be regular.
• a^n c b^n ⇒ not rl
a^n c b^m ⇒ rl
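The case analysis in the proof can be checked mechanically for small n. The sketch below tries every split of a word into xyz with 0 < |y| ≤ p and tests a few powers of y. The function names, the bound max_power, and the contrasting regular language 0+1+ are assumptions made for this illustration; finding a split for finitely many powers is only evidence, whereas finding none refutes pumpability of that word.

```python
import re


def find_pumpable_split(word, p, member, max_power=3):
    """Return (x, y, z) with word = xyz, 0 < |y| <= p, and x y^i z accepted
    for every i in 0..max_power, or None if no such split exists."""
    for start in range(len(word)):
        for end in range(start + 1, min(start + p, len(word)) + 1):
            x, y, z = word[:start], word[start:end], word[end:]
            if all(member(x + y * i + z) for i in range(max_power + 1)):
                return x, y, z
    return None


def equal_zeros_ones(w):
    """Membership in {0^n 1^n | n >= 1}."""
    n = len(w) // 2
    return n >= 1 and w == "0" * n + "1" * n


def zeros_then_ones(w):
    """Membership in the regular language 0+1+ (counts need not match)."""
    return re.fullmatch(r"0+1+", w) is not None


for n in range(1, 7):
    print(n, find_pumpable_split("0" * n + "1" * n, n, equal_zeros_ones))
print(find_pumpable_split("000111", 3, zeros_then_ones))
```

No split survives for 0^n 1^n, in line with the proof, while the regular language in which the two counts are independent can be pumped.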

Exercise 3.5 (4) solution (textbook p.123)
A = aB + bA ……………………… (1)
B = aB + bC ……………………… (2)
C = bD + aB ……………………… (3)
D = bA + aB + ε ……………………… (4)
In equation (4), bA + aB = aB + bA = A, so
D = A + ε ……………………… (5)
Substituting (5) into (3):
C = b(A + ε) + aB = bA + aB + b
= A + b ……………………… (6)
Substituting (6) into (2):
B = aB + b(A + b) = aB + bA + bb
= A + bb ……………………… (7)
Substituting (7) into (1):
A = aB + bA = a(A + bb) + bA = aA + abb + bA = (a + b)A + abb
= (a+b)*abb
∴ L(G) = (a+b)*abb
