SlideShare una empresa de Scribd logo
1 de 13
Descargar para leer sin conexión
Phases of Syntax Analysis
1. Identify the words: Lexical Analysis.
Converts a stream of characters (input program) into a stream of tokens.
Also called Scanning or Tokenizing.
2. Identify the sentences: Parsing.
Derive the structure of sentences: construct parse trees from a stream of tokens.
Lexical Analysis
Convert a stream of characters into a stream of tokens.
• Simplicity: Conventions about “words” are often different from conventions about “sentences”.
• Efficiency: Word identification problem has a much more efficient solution than sentence identification problem.
• Portability: Character set, special characters, device features.
Terminology
• Token: Name given to a family of words.
e.g., integer constant
• Lexeme: Actual sequence of characters representing a word.
e.g., 32894
• Pattern: Notation used to identify the set of lexemes represented by a token.
e.g., [0 − 9]+
Terminology
A few more examples:
Token Sample Lexemes Pattern
while while while
integer constant 32894, -1093, 0 [0-9]+
identifier buffer size [a-zA-Z]+
Patterns
How do we compactly represent the set of all lexemes corresponding to a token?
For instance:
The token integer constant represents the set of all integers: that is, all sequences of digits (0–9), preceded by an optional
sign (+ or −).
Obviously, we cannot simply enumerate all lexemes.
Use Regular Expressions.
Regular Expressions
Notation to represent (potentially) infinite sets of strings over alphabet Σ.
• a: stands for the set {a} that contains a single string a.
⊲ Analogous to Union.
• ab: stands for the set {ab} that contains a single string ab.
⊲ Analogous to Product.
⊲ (a|b)(a|b): stands for the set {aa, ab, ba, bb}.
• a∗
: stands for the set {ǫ, a, aa, aaa, . . .} that contains all strings of zero or more a’s.
⊲ Analogous to closure of the product operation.
Regular Expressions
Examples of Regular Expressions over {a, b}:
• (a|b)∗
: Set of strings with zero or more a’s and zero or more b’s:
{ǫ, a, b, aa, ab, ba, bb, aaa, aab, . . .}
• (a∗
b∗
): Set of strings with zero or more a’s and zero or more b’s such that all a’s occur before any b:
{ǫ, a, b, aa, ab, bb, aaa, aab, abb, . . .}
• (a∗
b∗
)∗
: Set of strings with zero or more a’s and zero or more b’s:
{ǫ, a, b, aa, ab, ba, bb, aaa, aab, . . .}
Language of Regular Expressions
Let R be the set of all regular expressions over Σ. Then,
• Empty String: ǫ ∈ R
• Unit Strings: α ∈ Σ ⇒ α ∈ R
• Concatenation: r1, r2 ∈ R ⇒ r1r2 ∈ R
• Alternative: r1, r2 ∈ R ⇒ (r1 | r2) ∈ R
• Kleene Closure: r ∈ R ⇒ r∗
∈ R
Regular Expressions
Example: (a | b)∗
L0 = {ǫ}
L1 = L0 · {a, b}
= {ǫ} · {a, b}
= {a, b}
L2 = L1 · {a, b}
= {a, b} · {a, b}
= {aa, ab, ba, bb}
L3 = L2 · {a, b}
...
L =
∞
i=0
Li = {ǫ, a, b, aa, ab, ba, bb, . . .}
Semantics of Regular Expressions
Semantic Function L : Maps regular expressions to sets of strings.
L(ǫ) = {ǫ}
L(α) = {α} (α ∈ Σ)
L(r1 | r2) = L(r1) ∪ L(r2)
L(r1 r2) = L(r1) · L(r2)
L(r∗
) = {ǫ} ∪ (L(r) · L(r∗
))
Computing the Semantics
L(a) = {a}
L(a | b) = L(a) ∪ L(b)
= {a} ∪ {b}
= {a, b}
L(ab) = L(a) · L(b)
= {a} · {b}
= {ab}
L((a | b)(a | b)) = L(a | b) · L(a | b)
= {a, b} · {a, b}
= {aa, ab, ba, bb}
Computing the Semantics of Closure
Example: L((a | b)∗
)
= {ǫ} ∪ (L(a | b) · L((a | b)∗
))
L0 = {ǫ} Base case
L1 = {ǫ} ∪ ({a, b} · L0)
= {ǫ} ∪ ({a, b} · {ǫ})
= {ǫ, a, b}
L2 = {ǫ} ∪ ({a, b} · L1)
= {ǫ} ∪ ({a, b} · {ǫ, a, b})
= {ǫ, a, b, aa, ab, ba, bb}
...
L((a | b)∗
) = L∞ = {ǫ, a, b, aa, ab, ba, bb, . . .}
Another Example
L((a∗
b∗
)∗
) :
L(a∗
) = {ǫ, a, aa, . . .}
L(b∗
) = {ǫ, b, bb, . . .}
L(a∗
b∗
) = {ǫ, a, b, aa, ab, bb,
aaa, aab, abb, bbb, . . .}
L((a∗
b∗
)∗
) = {ǫ}
∪{ǫ, a, b, aa, ab, bb,
aaa, aab, abb, bbb, . . .}
∪{ǫ, a, b, aa, ab, ba, bb,
aaa, aab, aba, abb, baa, bab, bba, bbb, . . .}
.
.
.
Regular Definitions
Assign “names” to regular expressions.
For example,
digit −→ 0 | 1 | · · · | 9
natural −→ digit digit∗
Shorthands:
• a+
: Set of strings with one or more occurrences of a.
• a?
: Set of strings with zero or one occurrences of a.
Example:
integer −→ (+|−)?
digit+
Regular Definitions: Examples
float −→ integer . fraction
integer −→ (+|−)?
no leading zero
no leading zero −→ (nonzero digit digit∗
) | 0
fraction −→ no trailing zero exponent?
no trailing zero −→ (digit∗
nonzero digit) | 0
exponent −→ (E | e) integer
digit −→ 0 | 1 | · · · | 9
nonzero digit −→ 1 | 2 | · · · | 9
Regular Definitions and Lexical Analysis
Regular Expressions and Definitions specify sets of strings over an input alphabet.
• They can hence be used to specify the set of lexemes associated with a token.
⊲ Used as the pattern language
How do we decide whether an input string belongs to the set of strings specified by a regular expression?
Using Regular Definitions for Lexical Analysis
Q: Is ababbaabbb in L(((a∗
b∗
)∗
)?
A: Hm. Well. Let’s see.
L((a∗
b∗
)∗
) = {ǫ}
∪{ǫ, a, b, aa, ab, bb,
aaa, aab, abb, bbb, . . .}
∪{ǫ, a, b, aa, ab, ba, bb,
aaa, aab, aba, abb, baa, bab, bba, bbb, . . .}
...
= ???
Recognizers
Construct automata that recognize strings belonging to a language.
• Finite State Automata ⇒ Regular Languages
• Push Down Automata ⇒ Context-free Languages
⊲ Stack is used to maintain counter, but only one counter can go arbitrarily high.
Recognizing Finite Sets of Strings
Identifying words from a small, finite, fixed vocabulary is straightforward.
For instance, consider a stack machine with push, pop, and add operations with two constants: 0 and 1.
We can use the automaton:
s
h
p
p 0 1
u o
a
d
d
push
pop add
integer_constant
Finite State Automata
Represented by a labeled directed graph.
• A finite set of states (vertices).
• Transitions between states (edges).
• Labels on transitions are drawn from Σ ∪ {ǫ}.
• One distinguished start state.
• One or more distinguished final states.
Finite State Automata: An Example
Consider the Regular Expression (a | b)∗
a(a | b).
L((a | b)∗
a(a | b)) = {aa, ab, aaa, aab, baa, bab,
aaaa, aaab, abaa, abab, baaa, . . .}.
The following automaton determines whether an input string belongs to L((a | b)∗
a(a | b):
a
a
b b
a
1 2 3
Determinism
(a | b)∗
a(a | b):
Nondeterministic:
(NFA)
a
a
b b
a
1 2 3
Deterministic:
(DFA)
a
a
b
b
a
a
b
1 2
3
4
Acceptance Criterion
A finite state automaton (NFA or DFA) accepts an input string x
. . . if beginning from the start state
. . . we can trace some path through the automaton
. . . such that the sequence of edge labels spells x
. . . and end in a final state.
Recognition with an NFA
Is abab ∈ L((a | b)∗
a(a | b))?
a
a
b b
a
1 2 3
Input: a b a b
Path 1: 1 1 1 1 1
Path 2: 1 1 1 2 3 Accept
Path 3: 1 2 3 ⊥ ⊥
Accept
Recognition with an NFA
Is abab ∈ L((a | b)∗
a(a | b))?
a
a
b b
a
1 2 3
Input: a b a b
Path 1: 1 1 1 1 1
Path 2: 1 1 1 2 3 Accept
Path 3: 1 2 3 ⊥ ⊥
Accept
Recognition with a DFA
Is abab ∈ L((a | b)∗
a(a | b))?
a
a
b
b
a
a
b
b
1 2
3
4
NFA vs. DFA
For every NFA, there is a DFA that accepts the same set of strings.
• NFA may have transitions labeled by ǫ.
(Spontaneous transitions)
• All transition labels in a DFA belong to Σ.
• For some string x, there may be many accepting paths in an NFA.
• For all strings x, there is one unique accepting path in a DFA.
• Usually, an input string can be recognized faster with a DFA.
• NFAs are typically smaller than the corresponding DFAs.
Regular Expressions to NFA
Thompson’s Construction: For every regular expression r, derive an NFA N(r) with unique start and final states.
ǫ
ε
α ∈ Σ
α
(r1 | r2)
N(r )
1
ε
ε
ε
ε
N(r )
2
Regular Expressions to NFA (contd.)
r1r2 N(r )2
N(r )1
ε ε
r∗
ε ε
N(r)
ε
ε
Example
(a | b)∗
a(a | b):
ε
ε ε
ε
a
b
ε ε a
ε
ε ε
ε
a
b
ε
Recognition with an NFA
Is abab ∈ L((a | b)∗
a(a | b))?
a
a
b b
a
1 2 3
Input: a b a b
Path 1: 1 1 1 1 1
Path 2: 1 1 1 2 3 Accept
Path 3: 1 2 3 ⊥ ⊥
All Paths {1} {1, 2} {1, 3} {1, 2} {1, 3} Accept
Recognition with an NFA (contd.)
Is aaab ∈ L((a | b)∗
a(a | b))?
a
a
b b
a
1 2 3
Input: a a a b
Path 1: 1 1 1 1 1
Path 2: 1 1 1 1 2
Path 3: 1 1 1 2 3 Accept
Path 4: 1 1 2 3 ⊥
Path 5: 1 2 3 ⊥ ⊥
All Paths {1} {1, 2} {1, 2, 3} {1, 2, 3} {1, 2, 3} Accept
Recognition with an NFA (contd.)
Is aabb ∈ L((a | b)∗
a(a | b))?
a
a
b
b
a
1 2 3
Input: a a a b
Path 1: 1 1 1 1 1
Path 2: 1 1 2 3 ⊥
Path 3: 1 2 3 ⊥ ⊥
All Paths {1} {1, 2} {1, 2, 3} {1, 3} {1} REJECT
Converting NFA to DFA
Subset construction
Given a set S of NFA states,
• compute Sǫ = ǫ-closure(S): Sǫ is the set of all NFA states reachable by zero or more ǫ-transitions from S.
• compute Sα = goto(S, α):
– S′
is the set of all NFA states reachable from S by taking a transition labeled α.
– Sα = ǫ-closure(S′
).
Converting NFA to DFA (contd).
Each state in DFA corresponds to a set of states in NFA.
Start state of DFA = ǫ-closure(start state of NFA).
From a state s in DFA that corresponds to a set of states S in NFA:
add a transition labeled α to state s′
that corresponds to a non-empty S′
in NFA,
such that S′
= goto(S, α).
⇐ s is a final state of DFA
NFA → DFA: An Example
a
a
b b
a
1 2 3
ǫ-closure({1}) = {1}
goto({1}, a) = {1, 2}
goto({1}, b) = {1}
goto({1, 2}, a) = {1, 2, 3}
goto({1, 2}, b) = {1, 3}
goto({1, 2, 3}, a) = {1, 2, 3}
...
NFA → DFA: An Example (contd.)
ǫ-closure({1}) = {1}
goto({1}, a) = {1, 2}
goto({1}, b) = {1}
goto({1, 2}, a) = {1, 2, 3}
goto({1, 2}, b) = {1, 3}
goto({1, 2, 3}, a) = {1, 2, 3}
goto({1, 2, 3}, b) = {1}
goto({1, 3}, a) = {1, 2}
goto({1, 3}, b) = {1}
NFA → DFA: An Example (contd.)
goto({1}, a) = {1, 2}
goto({1}, b) = {1}
goto({1, 2}, a) = {1, 2, 3}
goto({1, 2}, b) = {1, 3}
goto({1, 2, 3}, a) = {1, 2, 3}
...
a
a
b
b
a
a
b
b
{1} {1,2}
{1,3}
{1,2,3}
NFA vs. DFA
R = Size of Regular Expression
N = Length of Input String
NFA DFA
Size of
Automaton
O(R) O(2R
)
Lexical Analysis
• Regular Expressions and Definitions are used to specify the set of strings (lexemes) corresponding to a token.
• An automaton (DFA/NFA) is built from the above specifications.
• Each final state is associated with an action: emit the corresponding token.
Specifying Lexical Analysis
Consider a recognizer for integers (sequence of digits) and floats (sequence of digits separated by a decimal point).
[0-9]+ { emit(INTEGER_CONSTANT); }
[0-9]+"."[0-9]+ { emit(FLOAT_CONSTANT); }
0-9
0-9
0-9
0-9
ε
0-9
0-9
ε "."
INTEGER_CONSTANT
FLOAT_CONSTANT
Lex
Tool for building lexical analyzers.
Input: lexical specifications (.l file)
Output: C function (yylex) that returns a token on each invocation.
%%
[0-9]+ { return(INTEGER_CONSTANT); }
[0-9]+"."[0-9]+ { return(FLOAT_CONSTANT); }
Tokens are simply integers (#define’s).
Lex Specifications
%{
C header statements for inclusion
%}
Regular Definitions e.g.:
digit [0-9]
%%
Token Specifications e.g.:
{digit}+ { return(INTEGER_CONSTANT); }
%%
Support functions in C
Regular Expressions in Lex
• Range: [0-7]: Integers from 0 through 7 (inclusive)
[a-nx-zA-Q]: Letters a thru n, x thru z and A thru Q.
• Exception: [^/]: Any character other than /.
• Definition: {digit}: Use the previously specified regular definition digit.
• Special characters: Connectives of regular expression, convenience features.
e.g.: | * ^
Special Characters in Lex
| * + ? ( ) Same as in regular expressions
[ ] Enclose ranges and exceptions
{ } Enclose “names” of regular definitions
^ Used to negate a specified range (in Exception)
. Match any single character except newline
 Escape the next character
n, t Newline and Tab
For literal matching, enclose special characters in double quotes (") e.g.: "*"
Or use  to escape. e.g.: "
Examples
for Sequence of f, o, r
"||" C-style OR operator (two vert. bars)
.* Sequence of non-newline characters
[^*/]+ Sequence of characters except * and /
"[^"]*" Sequence of non-quote characters
beginning and ending with a quote
({letter}|" ")({letter}|{digit}|" ")*
C-style identifiers
A Complete Example
%{
#include <stdio.h>
#include "tokens.h"
%}
digit [0-9]
hexdigit [0-9a-f]
%%
"+" { return(PLUS); }
"-" { return(MINUS); }
{digit}+ { return(INTEGER_CONSTANT); }
{digit}+"."{digit}+ { return(FLOAT_CONSTANT); }
. { return(SYNTAX_ERROR); }
%%
Actions
Actions are attached to final states.
• Distinguish the different final states.
• Can be used to set attribute values.
• Fragment of C code (blocks enclosed by ‘{’ and ‘}’).
Attributes
Additional information about a token’s lexeme.
• Stored in variable yylval
• Type of attributes (usually a union) specified by YYSTYPE
• Additional variables:
– yytext: Lexeme (Actual text string)
– yyleng: length of string in yytext
⊲ yylineno: Current line number (number of ‘n’ seen thus far)
∗ enabled by %option yylineno
Priority of matching
What if an input string matches more than one pattern?
"if" { return(TOKEN_IF); }
{letter}+ { return(TOKEN_ID); }
"while" { return(TOKEN_WHILE); }
• A pattern that matches the longest string is chosen.
Example: if1 is matched with an identifier, not the keyword if.
• Of patterns that match strings of same length, the first (from the top of file) is chosen.
Example: while is matched as an identifier, not the keyword while.
Constructing Scanners using (f)lex
• Scanner specifications: specifications.l
(f)lex
specifications.l −−−−→ lex.yy.c
• Generated scanner in lex.yy.c
(g)cc
lex.yy.c −−−−→ executable
– yywrap(): hook for signalling end of file.
– Use -lfl (flex) or -ll (lex) flags at link time to include default function yywrap() that always returns 1.
Implementing a Scanner
transition : state × Σ → state
algorithm scanner() {
current state = start state;
while (1) {
c = getc(); /* on end of file, ... */
if defined(transition(current state, c))
current state = transition(current state, c);
else
return s;
}
Implementing a Scanner (contd.)
Implementing the transition function:
• Simplest: 2-D array.
Space inefficient.
• Traditionally compressed using row/colum equivalence. (default on (f)lex)
Good space-time tradeoff.
• Further table compression using various techniques:
– Example: RDM (Row Displacement Method):
Store rows in overlapping manner using 2 1-D arrays.
Smaller tables, but longer access times.
Lexical Analysis: A Summary
Convert a stream of characters into a stream of tokens.
• Make rest of compiler independent of character set
• Strip off comments
• Recognize line numbers
• Ignore white space characters
• Process macros (definitions and uses)
• Interface with symbol (name) table.

Más contenido relacionado

La actualidad más candente

theory of computation lecture 02
theory of computation lecture 02theory of computation lecture 02
theory of computation lecture 028threspecter
 
Theory of Computation Regular Expressions, Minimisation & Pumping Lemma
Theory of Computation Regular Expressions, Minimisation & Pumping LemmaTheory of Computation Regular Expressions, Minimisation & Pumping Lemma
Theory of Computation Regular Expressions, Minimisation & Pumping LemmaRushabh2428
 
Chapter Two(1)
Chapter Two(1)Chapter Two(1)
Chapter Two(1)bolovv
 
Automata theory introduction
Automata theory introductionAutomata theory introduction
Automata theory introductionNAMRATA BORKAR
 
Chapter Three(2)
Chapter Three(2)Chapter Three(2)
Chapter Three(2)bolovv
 
Ch 2 lattice & boolean algebra
Ch 2 lattice & boolean algebraCh 2 lattice & boolean algebra
Ch 2 lattice & boolean algebraRupali Rana
 
Regular expression
Regular expressionRegular expression
Regular expressionRajon
 
AUTOMATA THEORY - SHORT NOTES
AUTOMATA THEORY - SHORT NOTESAUTOMATA THEORY - SHORT NOTES
AUTOMATA THEORY - SHORT NOTESsuthi
 
Finite automata examples
Finite automata examplesFinite automata examples
Finite automata examplesankitamakin
 
Assembly Language Programming By Ytha Yu, Charles Marut Chap 10 ( Arrays and ...
Assembly Language Programming By Ytha Yu, Charles Marut Chap 10 ( Arrays and ...Assembly Language Programming By Ytha Yu, Charles Marut Chap 10 ( Arrays and ...
Assembly Language Programming By Ytha Yu, Charles Marut Chap 10 ( Arrays and ...Bilal Amjad
 
regular expressions (Regex)
regular expressions (Regex)regular expressions (Regex)
regular expressions (Regex)Rebaz Najeeb
 
Chapter Eight(2)
Chapter Eight(2)Chapter Eight(2)
Chapter Eight(2)bolovv
 

La actualidad más candente (19)

theory of computation lecture 02
theory of computation lecture 02theory of computation lecture 02
theory of computation lecture 02
 
Theory of computation Lec2
Theory of computation Lec2Theory of computation Lec2
Theory of computation Lec2
 
Ch3
Ch3Ch3
Ch3
 
Theory of Computation Regular Expressions, Minimisation & Pumping Lemma
Theory of Computation Regular Expressions, Minimisation & Pumping LemmaTheory of Computation Regular Expressions, Minimisation & Pumping Lemma
Theory of Computation Regular Expressions, Minimisation & Pumping Lemma
 
Automata theory
Automata theoryAutomata theory
Automata theory
 
Chapter Two(1)
Chapter Two(1)Chapter Two(1)
Chapter Two(1)
 
Automata theory introduction
Automata theory introductionAutomata theory introduction
Automata theory introduction
 
Chapter Three(2)
Chapter Three(2)Chapter Three(2)
Chapter Three(2)
 
Ch 2 lattice & boolean algebra
Ch 2 lattice & boolean algebraCh 2 lattice & boolean algebra
Ch 2 lattice & boolean algebra
 
Aho corasick-lecture
Aho corasick-lectureAho corasick-lecture
Aho corasick-lecture
 
Regular expression
Regular expressionRegular expression
Regular expression
 
AUTOMATA THEORY - SHORT NOTES
AUTOMATA THEORY - SHORT NOTESAUTOMATA THEORY - SHORT NOTES
AUTOMATA THEORY - SHORT NOTES
 
Finite automata examples
Finite automata examplesFinite automata examples
Finite automata examples
 
Ch4a
Ch4aCh4a
Ch4a
 
Assembly Language Programming By Ytha Yu, Charles Marut Chap 10 ( Arrays and ...
Assembly Language Programming By Ytha Yu, Charles Marut Chap 10 ( Arrays and ...Assembly Language Programming By Ytha Yu, Charles Marut Chap 10 ( Arrays and ...
Assembly Language Programming By Ytha Yu, Charles Marut Chap 10 ( Arrays and ...
 
regular expressions (Regex)
regular expressions (Regex)regular expressions (Regex)
regular expressions (Regex)
 
FLAT Notes
FLAT NotesFLAT Notes
FLAT Notes
 
string in C
string in Cstring in C
string in C
 
Chapter Eight(2)
Chapter Eight(2)Chapter Eight(2)
Chapter Eight(2)
 

Destacado

Sesjon S4B 08/05 "Dokker + FYR = Sant" ved NKUL 2014
Sesjon S4B 08/05 "Dokker + FYR = Sant" ved NKUL 2014Sesjon S4B 08/05 "Dokker + FYR = Sant" ved NKUL 2014
Sesjon S4B 08/05 "Dokker + FYR = Sant" ved NKUL 2014Snorre Tørriseng
 
Evaluation Question 2
Evaluation Question 2Evaluation Question 2
Evaluation Question 2rturner93
 
Atheism - By Suhit Kulkarni
Atheism - By Suhit KulkarniAtheism - By Suhit Kulkarni
Atheism - By Suhit KulkarniSuhit Kulkarni
 
Will i guess_your_birthdate
Will i guess_your_birthdateWill i guess_your_birthdate
Will i guess_your_birthdatelady_shine
 
Cilc2013 .hr.wk.ewv 20130314
Cilc2013 .hr.wk.ewv 20130314Cilc2013 .hr.wk.ewv 20130314
Cilc2013 .hr.wk.ewv 20130314Heimo Rainer
 
Digital Workplace by Lizard Soft
Digital Workplace by Lizard SoftDigital Workplace by Lizard Soft
Digital Workplace by Lizard SoftIgor Petrushyn
 
Criteria for the design of pressure transducer adapter systems
Criteria for the design of pressure transducer adapter systemsCriteria for the design of pressure transducer adapter systems
Criteria for the design of pressure transducer adapter systemsYavuz özkaptan
 
A team may 23 2013
A team may 23 2013A team may 23 2013
A team may 23 2013bscisteam
 
BI Apps OLAP & Reports- SSAS 2012 Tabular & Multidimensional
BI Apps  OLAP & Reports- SSAS 2012 Tabular & Multidimensional BI Apps  OLAP & Reports- SSAS 2012 Tabular & Multidimensional
BI Apps OLAP & Reports- SSAS 2012 Tabular & Multidimensional Sunny U Okoro
 
Lezione2schetchup 111126133835-phpapp01
Lezione2schetchup 111126133835-phpapp01Lezione2schetchup 111126133835-phpapp01
Lezione2schetchup 111126133835-phpapp01Giuliana Finco
 
Two way fine art pen & brush
Two way fine art pen & brushTwo way fine art pen & brush
Two way fine art pen & brushibec546
 
Mobile Marketing
Mobile MarketingMobile Marketing
Mobile MarketingMark Wilson
 
Hotel web ranking
Hotel web rankingHotel web ranking
Hotel web rankingcmhagc
 
Pat Ward - Church Growth Clinic
Pat Ward - Church Growth ClinicPat Ward - Church Growth Clinic
Pat Ward - Church Growth Clinictheorchardoxford
 
гост пеноблок
гост пеноблокгост пеноблок
гост пеноблокAl Maks
 
Pengenalan Kepada Teknologi Multimedia Part 3
Pengenalan Kepada Teknologi Multimedia Part 3Pengenalan Kepada Teknologi Multimedia Part 3
Pengenalan Kepada Teknologi Multimedia Part 3Noor Hafizah Abd. Rahim
 
Mrs craig final exam 3
Mrs craig   final exam 3Mrs craig   final exam 3
Mrs craig final exam 3cmhagc
 
Value chains which unlock market opportunities
Value chains which unlock market opportunitiesValue chains which unlock market opportunities
Value chains which unlock market opportunitiesagbiz
 

Destacado (20)

Evaluation 3
Evaluation 3Evaluation 3
Evaluation 3
 
Sesjon S4B 08/05 "Dokker + FYR = Sant" ved NKUL 2014
Sesjon S4B 08/05 "Dokker + FYR = Sant" ved NKUL 2014Sesjon S4B 08/05 "Dokker + FYR = Sant" ved NKUL 2014
Sesjon S4B 08/05 "Dokker + FYR = Sant" ved NKUL 2014
 
Evaluation Question 2
Evaluation Question 2Evaluation Question 2
Evaluation Question 2
 
Atheism - By Suhit Kulkarni
Atheism - By Suhit KulkarniAtheism - By Suhit Kulkarni
Atheism - By Suhit Kulkarni
 
Will i guess_your_birthdate
Will i guess_your_birthdateWill i guess_your_birthdate
Will i guess_your_birthdate
 
Cilc2013 .hr.wk.ewv 20130314
Cilc2013 .hr.wk.ewv 20130314Cilc2013 .hr.wk.ewv 20130314
Cilc2013 .hr.wk.ewv 20130314
 
Digital Workplace by Lizard Soft
Digital Workplace by Lizard SoftDigital Workplace by Lizard Soft
Digital Workplace by Lizard Soft
 
Essai
EssaiEssai
Essai
 
Criteria for the design of pressure transducer adapter systems
Criteria for the design of pressure transducer adapter systemsCriteria for the design of pressure transducer adapter systems
Criteria for the design of pressure transducer adapter systems
 
A team may 23 2013
A team may 23 2013A team may 23 2013
A team may 23 2013
 
BI Apps OLAP & Reports- SSAS 2012 Tabular & Multidimensional
BI Apps  OLAP & Reports- SSAS 2012 Tabular & Multidimensional BI Apps  OLAP & Reports- SSAS 2012 Tabular & Multidimensional
BI Apps OLAP & Reports- SSAS 2012 Tabular & Multidimensional
 
Lezione2schetchup 111126133835-phpapp01
Lezione2schetchup 111126133835-phpapp01Lezione2schetchup 111126133835-phpapp01
Lezione2schetchup 111126133835-phpapp01
 
Two way fine art pen & brush
Two way fine art pen & brushTwo way fine art pen & brush
Two way fine art pen & brush
 
Mobile Marketing
Mobile MarketingMobile Marketing
Mobile Marketing
 
Hotel web ranking
Hotel web rankingHotel web ranking
Hotel web ranking
 
Pat Ward - Church Growth Clinic
Pat Ward - Church Growth ClinicPat Ward - Church Growth Clinic
Pat Ward - Church Growth Clinic
 
гост пеноблок
гост пеноблокгост пеноблок
гост пеноблок
 
Pengenalan Kepada Teknologi Multimedia Part 3
Pengenalan Kepada Teknologi Multimedia Part 3Pengenalan Kepada Teknologi Multimedia Part 3
Pengenalan Kepada Teknologi Multimedia Part 3
 
Mrs craig final exam 3
Mrs craig   final exam 3Mrs craig   final exam 3
Mrs craig final exam 3
 
Value chains which unlock market opportunities
Value chains which unlock market opportunitiesValue chains which unlock market opportunities
Value chains which unlock market opportunities
 

Similar a Lex analysis

Theory of Computation Basic Concepts and Grammar
Theory of Computation Basic Concepts and GrammarTheory of Computation Basic Concepts and Grammar
Theory of Computation Basic Concepts and GrammarRushabh2428
 
Mod 2_RegularExpressions.pptx
Mod 2_RegularExpressions.pptxMod 2_RegularExpressions.pptx
Mod 2_RegularExpressions.pptxRaviAr5
 
Chapter 3 REGULAR EXPRESSION.pdf
Chapter 3 REGULAR EXPRESSION.pdfChapter 3 REGULAR EXPRESSION.pdf
Chapter 3 REGULAR EXPRESSION.pdfdawod yimer
 
Lec 3 ---- dfa
Lec 3  ---- dfaLec 3  ---- dfa
Lec 3 ---- dfaAbdul Aziz
 
Automata theory - CFG and normal forms
Automata theory - CFG and normal formsAutomata theory - CFG and normal forms
Automata theory - CFG and normal formsAkila Krishnamoorthy
 
Automata
AutomataAutomata
AutomataGaditek
 
Class1
 Class1 Class1
Class1issbp
 
Syntax Analyzer.pdf
Syntax Analyzer.pdfSyntax Analyzer.pdf
Syntax Analyzer.pdfkenilpatel65
 
Toc chapter 1 srg
Toc chapter 1 srgToc chapter 1 srg
Toc chapter 1 srgShayak Giri
 
4-Regular expression to Deterministic Finite Automata (Direct method)-05-05-2...
4-Regular expression to Deterministic Finite Automata (Direct method)-05-05-2...4-Regular expression to Deterministic Finite Automata (Direct method)-05-05-2...
4-Regular expression to Deterministic Finite Automata (Direct method)-05-05-2...venkatapranaykumarGa
 
RegularLanguageProperties [Autosaved].pptx
RegularLanguageProperties [Autosaved].pptxRegularLanguageProperties [Autosaved].pptx
RegularLanguageProperties [Autosaved].pptxRaviAr5
 

Similar a Lex analysis (20)

Theory of Computation Basic Concepts and Grammar
Theory of Computation Basic Concepts and GrammarTheory of Computation Basic Concepts and Grammar
Theory of Computation Basic Concepts and Grammar
 
Mod 2_RegularExpressions.pptx
Mod 2_RegularExpressions.pptxMod 2_RegularExpressions.pptx
Mod 2_RegularExpressions.pptx
 
Ch03
Ch03Ch03
Ch03
 
Chapter 3 REGULAR EXPRESSION.pdf
Chapter 3 REGULAR EXPRESSION.pdfChapter 3 REGULAR EXPRESSION.pdf
Chapter 3 REGULAR EXPRESSION.pdf
 
UNIT_-_II.docx
UNIT_-_II.docxUNIT_-_II.docx
UNIT_-_II.docx
 
Lec 3 ---- dfa
Lec 3  ---- dfaLec 3  ---- dfa
Lec 3 ---- dfa
 
Automata theory - CFG and normal forms
Automata theory - CFG and normal formsAutomata theory - CFG and normal forms
Automata theory - CFG and normal forms
 
Ch3.ppt
Ch3.pptCh3.ppt
Ch3.ppt
 
Ch02
Ch02Ch02
Ch02
 
Automata
AutomataAutomata
Automata
 
Class1
 Class1 Class1
Class1
 
Lec1.pptx
Lec1.pptxLec1.pptx
Lec1.pptx
 
Lesson 04
Lesson 04Lesson 04
Lesson 04
 
Syntax Analyzer.pdf
Syntax Analyzer.pdfSyntax Analyzer.pdf
Syntax Analyzer.pdf
 
13000120020_A.pptx
13000120020_A.pptx13000120020_A.pptx
13000120020_A.pptx
 
Toc chapter 1 srg
Toc chapter 1 srgToc chapter 1 srg
Toc chapter 1 srg
 
4-Regular expression to Deterministic Finite Automata (Direct method)-05-05-2...
4-Regular expression to Deterministic Finite Automata (Direct method)-05-05-2...4-Regular expression to Deterministic Finite Automata (Direct method)-05-05-2...
4-Regular expression to Deterministic Finite Automata (Direct method)-05-05-2...
 
Ch3.ppt
Ch3.pptCh3.ppt
Ch3.ppt
 
Ch3.ppt
Ch3.pptCh3.ppt
Ch3.ppt
 
RegularLanguageProperties [Autosaved].pptx
RegularLanguageProperties [Autosaved].pptxRegularLanguageProperties [Autosaved].pptx
RegularLanguageProperties [Autosaved].pptx
 

Último

Graduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - EnglishGraduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - Englishneillewis46
 
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...Amil baba
 
Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structuredhanjurrannsibayan2
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxRamakrishna Reddy Bijjam
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024Elizabeth Walsh
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...Nguyen Thanh Tu Collection
 
How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17Celine George
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxDenish Jangid
 
Plant propagation: Sexual and Asexual propapagation.pptx
Plant propagation: Sexual and Asexual propapagation.pptxPlant propagation: Sexual and Asexual propapagation.pptx
Plant propagation: Sexual and Asexual propapagation.pptxUmeshTimilsina1
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxEsquimalt MFRC
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfSherif Taha
 
REMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxREMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxDr. Ravikiran H M Gowda
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxheathfieldcps1
 
Wellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxWellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxJisc
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfAdmir Softic
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17Celine George
 
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Pooja Bhuva
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.MaryamAhmad92
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSCeline George
 

Último (20)

Graduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - EnglishGraduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - English
 
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
 
Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structure
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17How to Add New Custom Addons Path in Odoo 17
How to Add New Custom Addons Path in Odoo 17
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Plant propagation: Sexual and Asexual propapagation.pptx
Plant propagation: Sexual and Asexual propapagation.pptxPlant propagation: Sexual and Asexual propapagation.pptx
Plant propagation: Sexual and Asexual propapagation.pptx
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
REMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptxREMIFENTANIL: An Ultra short acting opioid.pptx
REMIFENTANIL: An Ultra short acting opioid.pptx
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
Wellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxWellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptx
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POS
 

Lex analysis

  • 1. Phases of Syntax Analysis 1. Identify the words: Lexical Analysis. Converts a stream of characters (input program) into a stream of tokens. Also called Scanning or Tokenizing. 2. Identify the sentences: Parsing. Derive the structure of sentences: construct parse trees from a stream of tokens. Lexical Analysis Convert a stream of characters into a stream of tokens. • Simplicity: Conventions about “words” are often different from conventions about “sentences”. • Efficiency: Word identification problem has a much more efficient solution than sentence identification problem. • Portability: Character set, special characters, device features. Terminology • Token: Name given to a family of words. e.g., integer constant • Lexeme: Actual sequence of characters representing a word. e.g., 32894 • Pattern: Notation used to identify the set of lexemes represented by a token. e.g., [0 − 9]+ Terminology A few more examples: Token Sample Lexemes Pattern while while while integer constant 32894, -1093, 0 [0-9]+ identifier buffer size [a-zA-Z]+ Patterns How do we compactly represent the set of all lexemes corresponding to a token? For instance: The token integer constant represents the set of all integers: that is, all sequences of digits (0–9), preceded by an optional sign (+ or −). Obviously, we cannot simply enumerate all lexemes. Use Regular Expressions. Regular Expressions Notation to represent (potentially) infinite sets of strings over alphabet Σ. • a: stands for the set {a} that contains a single string a.
  • 2. ⊲ Analogous to Union. • ab: stands for the set {ab} that contains a single string ab. ⊲ Analogous to Product. ⊲ (a|b)(a|b): stands for the set {aa, ab, ba, bb}. • a∗ : stands for the set {ǫ, a, aa, aaa, . . .} that contains all strings of zero or more a’s. ⊲ Analogous to closure of the product operation. Regular Expressions Examples of Regular Expressions over {a, b}: • (a|b)∗ : Set of strings with zero or more a’s and zero or more b’s: {ǫ, a, b, aa, ab, ba, bb, aaa, aab, . . .} • (a∗ b∗ ): Set of strings with zero or more a’s and zero or more b’s such that all a’s occur before any b: {ǫ, a, b, aa, ab, bb, aaa, aab, abb, . . .} • (a∗ b∗ )∗ : Set of strings with zero or more a’s and zero or more b’s: {ǫ, a, b, aa, ab, ba, bb, aaa, aab, . . .} Language of Regular Expressions Let R be the set of all regular expressions over Σ. Then, • Empty String: ǫ ∈ R • Unit Strings: α ∈ Σ ⇒ α ∈ R • Concatenation: r1, r2 ∈ R ⇒ r1r2 ∈ R • Alternative: r1, r2 ∈ R ⇒ (r1 | r2) ∈ R • Kleene Closure: r ∈ R ⇒ r∗ ∈ R Regular Expressions Example: (a | b)∗ L0 = {ǫ} L1 = L0 · {a, b} = {ǫ} · {a, b} = {a, b} L2 = L1 · {a, b} = {a, b} · {a, b} = {aa, ab, ba, bb} L3 = L2 · {a, b} ... L = ∞ i=0 Li = {ǫ, a, b, aa, ab, ba, bb, . . .} Semantics of Regular Expressions
  • 3. Semantic Function L : Maps regular expressions to sets of strings. L(ǫ) = {ǫ} L(α) = {α} (α ∈ Σ) L(r1 | r2) = L(r1) ∪ L(r2) L(r1 r2) = L(r1) · L(r2) L(r∗ ) = {ǫ} ∪ (L(r) · L(r∗ )) Computing the Semantics L(a) = {a} L(a | b) = L(a) ∪ L(b) = {a} ∪ {b} = {a, b} L(ab) = L(a) · L(b) = {a} · {b} = {ab} L((a | b)(a | b)) = L(a | b) · L(a | b) = {a, b} · {a, b} = {aa, ab, ba, bb} Computing the Semantics of Closure Example: L((a | b)∗ ) = {ǫ} ∪ (L(a | b) · L((a | b)∗ )) L0 = {ǫ} Base case L1 = {ǫ} ∪ ({a, b} · L0) = {ǫ} ∪ ({a, b} · {ǫ}) = {ǫ, a, b} L2 = {ǫ} ∪ ({a, b} · L1) = {ǫ} ∪ ({a, b} · {ǫ, a, b}) = {ǫ, a, b, aa, ab, ba, bb} ... L((a | b)∗ ) = L∞ = {ǫ, a, b, aa, ab, ba, bb, . . .} Another Example L((a∗ b∗ )∗ ) : L(a∗ ) = {ǫ, a, aa, . . .} L(b∗ ) = {ǫ, b, bb, . . .} L(a∗ b∗ ) = {ǫ, a, b, aa, ab, bb, aaa, aab, abb, bbb, . . .} L((a∗ b∗ )∗ ) = {ǫ} ∪{ǫ, a, b, aa, ab, bb, aaa, aab, abb, bbb, . . .} ∪{ǫ, a, b, aa, ab, ba, bb, aaa, aab, aba, abb, baa, bab, bba, bbb, . . .} . . .
  • 4. Regular Definitions Assign “names” to regular expressions. For example, digit −→ 0 | 1 | · · · | 9 natural −→ digit digit∗ Shorthands: • a+ : Set of strings with one or more occurrences of a. • a? : Set of strings with zero or one occurrences of a. Example: integer −→ (+|−)? digit+ Regular Definitions: Examples float −→ integer . fraction integer −→ (+|−)? no leading zero no leading zero −→ (nonzero digit digit∗ ) | 0 fraction −→ no trailing zero exponent? no trailing zero −→ (digit∗ nonzero digit) | 0 exponent −→ (E | e) integer digit −→ 0 | 1 | · · · | 9 nonzero digit −→ 1 | 2 | · · · | 9 Regular Definitions and Lexical Analysis Regular Expressions and Definitions specify sets of strings over an input alphabet. • They can hence be used to specify the set of lexemes associated with a token. ⊲ Used as the pattern language How do we decide whether an input string belongs to the set of strings specified by a regular expression? Using Regular Definitions for Lexical Analysis Q: Is ababbaabbb in L(((a∗ b∗ )∗ )? A: Hm. Well. Let’s see. L((a∗ b∗ )∗ ) = {ǫ} ∪{ǫ, a, b, aa, ab, bb, aaa, aab, abb, bbb, . . .} ∪{ǫ, a, b, aa, ab, ba, bb, aaa, aab, aba, abb, baa, bab, bba, bbb, . . .} ... = ??? Recognizers Construct automata that recognize strings belonging to a language. • Finite State Automata ⇒ Regular Languages
  • 5. • Push Down Automata ⇒ Context-free Languages ⊲ Stack is used to maintain counter, but only one counter can go arbitrarily high. Recognizing Finite Sets of Strings Identifying words from a small, finite, fixed vocabulary is straightforward. For instance, consider a stack machine with push, pop, and add operations with two constants: 0 and 1. We can use the automaton: s h p p 0 1 u o a d d push pop add integer_constant Finite State Automata Represented by a labeled directed graph. • A finite set of states (vertices). • Transitions between states (edges). • Labels on transitions are drawn from Σ ∪ {ǫ}. • One distinguished start state. • One or more distinguished final states. Finite State Automata: An Example Consider the Regular Expression (a | b)∗ a(a | b). L((a | b)∗ a(a | b)) = {aa, ab, aaa, aab, baa, bab, aaaa, aaab, abaa, abab, baaa, . . .}. The following automaton determines whether an input string belongs to L((a | b)∗ a(a | b): a a b b a 1 2 3 Determinism (a | b)∗ a(a | b): Nondeterministic: (NFA) a a b b a 1 2 3 Deterministic: (DFA) a a b b a a b 1 2 3 4
  • 6. Acceptance Criterion A finite state automaton (NFA or DFA) accepts an input string x . . . if beginning from the start state . . . we can trace some path through the automaton . . . such that the sequence of edge labels spells x . . . and end in a final state. Recognition with an NFA Is abab ∈ L((a | b)∗ a(a | b))? a a b b a 1 2 3 Input: a b a b Path 1: 1 1 1 1 1 Path 2: 1 1 1 2 3 Accept Path 3: 1 2 3 ⊥ ⊥ Accept Recognition with an NFA Is abab ∈ L((a | b)∗ a(a | b))? a a b b a 1 2 3 Input: a b a b Path 1: 1 1 1 1 1 Path 2: 1 1 1 2 3 Accept Path 3: 1 2 3 ⊥ ⊥ Accept Recognition with a DFA Is abab ∈ L((a | b)∗ a(a | b))? a a b b a a b b 1 2 3 4
  • 7. NFA vs. DFA For every NFA, there is a DFA that accepts the same set of strings. • NFA may have transitions labeled by ǫ. (Spontaneous transitions) • All transition labels in a DFA belong to Σ. • For some string x, there may be many accepting paths in an NFA. • For all strings x, there is one unique accepting path in a DFA. • Usually, an input string can be recognized faster with a DFA. • NFAs are typically smaller than the corresponding DFAs. Regular Expressions to NFA Thompson’s Construction: For every regular expression r, derive an NFA N(r) with unique start and final states. ǫ ε α ∈ Σ α (r1 | r2) N(r ) 1 ε ε ε ε N(r ) 2 Regular Expressions to NFA (contd.) r1r2 N(r )2 N(r )1 ε ε r∗ ε ε N(r) ε ε Example (a | b)∗ a(a | b): ε ε ε ε a b ε ε a ε ε ε ε a b ε
  • 8. Recognition with an NFA Is abab ∈ L((a | b)∗ a(a | b))? a a b b a 1 2 3 Input: a b a b Path 1: 1 1 1 1 1 Path 2: 1 1 1 2 3 Accept Path 3: 1 2 3 ⊥ ⊥ All Paths {1} {1, 2} {1, 3} {1, 2} {1, 3} Accept Recognition with an NFA (contd.) Is aaab ∈ L((a | b)∗ a(a | b))? a a b b a 1 2 3 Input: a a a b Path 1: 1 1 1 1 1 Path 2: 1 1 1 1 2 Path 3: 1 1 1 2 3 Accept Path 4: 1 1 2 3 ⊥ Path 5: 1 2 3 ⊥ ⊥ All Paths {1} {1, 2} {1, 2, 3} {1, 2, 3} {1, 2, 3} Accept Recognition with an NFA (contd.) Is aabb ∈ L((a | b)∗ a(a | b))? a a b b a 1 2 3 Input: a a a b Path 1: 1 1 1 1 1 Path 2: 1 1 2 3 ⊥ Path 3: 1 2 3 ⊥ ⊥ All Paths {1} {1, 2} {1, 2, 3} {1, 3} {1} REJECT Converting NFA to DFA Subset construction Given a set S of NFA states, • compute Sǫ = ǫ-closure(S): Sǫ is the set of all NFA states reachable by zero or more ǫ-transitions from S. • compute Sα = goto(S, α): – S′ is the set of all NFA states reachable from S by taking a transition labeled α. – Sα = ǫ-closure(S′ ). Converting NFA to DFA (contd). Each state in DFA corresponds to a set of states in NFA. Start state of DFA = ǫ-closure(start state of NFA). From a state s in DFA that corresponds to a set of states S in NFA: add a transition labeled α to state s′ that corresponds to a non-empty S′ in NFA, such that S′ = goto(S, α).
  • 9. ⇐ s is a final state of DFA NFA → DFA: An Example a a b b a 1 2 3 ǫ-closure({1}) = {1} goto({1}, a) = {1, 2} goto({1}, b) = {1} goto({1, 2}, a) = {1, 2, 3} goto({1, 2}, b) = {1, 3} goto({1, 2, 3}, a) = {1, 2, 3} ... NFA → DFA: An Example (contd.) ǫ-closure({1}) = {1} goto({1}, a) = {1, 2} goto({1}, b) = {1} goto({1, 2}, a) = {1, 2, 3} goto({1, 2}, b) = {1, 3} goto({1, 2, 3}, a) = {1, 2, 3} goto({1, 2, 3}, b) = {1} goto({1, 3}, a) = {1, 2} goto({1, 3}, b) = {1} NFA → DFA: An Example (contd.) goto({1}, a) = {1, 2} goto({1}, b) = {1} goto({1, 2}, a) = {1, 2, 3} goto({1, 2}, b) = {1, 3} goto({1, 2, 3}, a) = {1, 2, 3} ... a a b b a a b b {1} {1,2} {1,3} {1,2,3} NFA vs. DFA R = Size of Regular Expression N = Length of Input String NFA DFA Size of Automaton O(R) O(2R )
  • 10. Lexical Analysis • Regular Expressions and Definitions are used to specify the set of strings (lexemes) corresponding to a token. • An automaton (DFA/NFA) is built from the above specifications. • Each final state is associated with an action: emit the corresponding token. Specifying Lexical Analysis Consider a recognizer for integers (sequence of digits) and floats (sequence of digits separated by a decimal point). [0-9]+ { emit(INTEGER_CONSTANT); } [0-9]+"."[0-9]+ { emit(FLOAT_CONSTANT); } 0-9 0-9 0-9 0-9 ε 0-9 0-9 ε "." INTEGER_CONSTANT FLOAT_CONSTANT Lex Tool for building lexical analyzers. Input: lexical specifications (.l file) Output: C function (yylex) that returns a token on each invocation. %% [0-9]+ { return(INTEGER_CONSTANT); } [0-9]+"."[0-9]+ { return(FLOAT_CONSTANT); } Tokens are simply integers (#define’s). Lex Specifications %{ C header statements for inclusion %} Regular Definitions e.g.: digit [0-9] %% Token Specifications e.g.: {digit}+ { return(INTEGER_CONSTANT); } %% Support functions in C Regular Expressions in Lex
  • 11. • Range: [0-7]: Integers from 0 through 7 (inclusive) [a-nx-zA-Q]: Letters a thru n, x thru z and A thru Q. • Exception: [^/]: Any character other than /. • Definition: {digit}: Use the previously specified regular definition digit. • Special characters: Connectives of regular expression, convenience features. e.g.: | * ^ Special Characters in Lex | * + ? ( ) Same as in regular expressions [ ] Enclose ranges and exceptions { } Enclose “names” of regular definitions ^ Used to negate a specified range (in Exception) . Match any single character except newline Escape the next character n, t Newline and Tab For literal matching, enclose special characters in double quotes (") e.g.: "*" Or use to escape. e.g.: " Examples for Sequence of f, o, r "||" C-style OR operator (two vert. bars) .* Sequence of non-newline characters [^*/]+ Sequence of characters except * and / "[^"]*" Sequence of non-quote characters beginning and ending with a quote ({letter}|" ")({letter}|{digit}|" ")* C-style identifiers A Complete Example %{ #include <stdio.h> #include "tokens.h" %} digit [0-9] hexdigit [0-9a-f] %% "+" { return(PLUS); } "-" { return(MINUS); } {digit}+ { return(INTEGER_CONSTANT); } {digit}+"."{digit}+ { return(FLOAT_CONSTANT); } . { return(SYNTAX_ERROR); } %% Actions Actions are attached to final states. • Distinguish the different final states.
  • 12. • Can be used to set attribute values. • Fragment of C code (blocks enclosed by ‘{’ and ‘}’). Attributes Additional information about a token’s lexeme. • Stored in variable yylval • Type of attributes (usually a union) specified by YYSTYPE • Additional variables: – yytext: Lexeme (Actual text string) – yyleng: length of string in yytext ⊲ yylineno: Current line number (number of ‘n’ seen thus far) ∗ enabled by %option yylineno Priority of matching What if an input string matches more than one pattern? "if" { return(TOKEN_IF); } {letter}+ { return(TOKEN_ID); } "while" { return(TOKEN_WHILE); } • A pattern that matches the longest string is chosen. Example: if1 is matched with an identifier, not the keyword if. • Of patterns that match strings of same length, the first (from the top of file) is chosen. Example: while is matched as an identifier, not the keyword while. Constructing Scanners using (f)lex • Scanner specifications: specifications.l (f)lex specifications.l −−−−→ lex.yy.c • Generated scanner in lex.yy.c (g)cc lex.yy.c −−−−→ executable – yywrap(): hook for signalling end of file. – Use -lfl (flex) or -ll (lex) flags at link time to include default function yywrap() that always returns 1. Implementing a Scanner transition : state × Σ → state algorithm scanner() { current state = start state; while (1) { c = getc(); /* on end of file, ... */ if defined(transition(current state, c)) current state = transition(current state, c); else return s; }
  • 13. Implementing a Scanner (contd.) Implementing the transition function: • Simplest: 2-D array. Space inefficient. • Traditionally compressed using row/colum equivalence. (default on (f)lex) Good space-time tradeoff. • Further table compression using various techniques: – Example: RDM (Row Displacement Method): Store rows in overlapping manner using 2 1-D arrays. Smaller tables, but longer access times. Lexical Analysis: A Summary Convert a stream of characters into a stream of tokens. • Make rest of compiler independent of character set • Strip off comments • Recognize line numbers • Ignore white space characters • Process macros (definitions and uses) • Interface with symbol (name) table.