2. Topics
• Phases of Compiler
• Lexical Analysis
• What is token?
• What is stream of token?
• What is lexeme?
• What is Pattern?
• What is valid token?
• Regular Expression
• Finite Automata
• Context free grammar
• Syntax Analysis
• Derivation
• Left-most Derivation
• Right-most Derivation
4. Lexical Analysis
• It is also called scanner.
• Lexical analysis is the first phase of a compiler. It takes the modified source code from language
preprocessors that are written in the form of sentences.
• The lexical analyzer breaks these syntaxes into a series of tokens.
• By removing any whitespace or comments in the source code.
5. What is Token?
• What is a token?
A lexical token is a sequence of characters that can be treated as a unit in the grammar of the
programming languages.
• Example of tokens:
• Type token (id, number, real, . . . )
• Punctuation tokens (IF, void, return, . . . )
• Alphabetic tokens (keywords)
6. What is Lexeme?
• A lexeme is a sequence of characters in the source program that is matched by the pattern for a
token.
• Lexeme is a specific instance of a token.
• Example:
• x = a + b * 2
7. What is Pattern?
• Pattern is a rule describing all those lexemes that can represent a particular token in a source
language.
• Example
• Identifier: start with an alphabet or _ followed by any alphanumeric character.
8. What is Valid Token?
• There are some predefined rules for every lexeme to be identified as a valid token. These rules are
defined by grammar rules, by means of a pattern.
9. Regular Expression
Regular expression is a formula that describes a possible set of string. Component of regular
expression..
X the characters
• . any character, usually accept a newline [x y z] any of the characters x, y,
z,…..
R? a R or nothing (=optionally as R)
R* zero or more occurrences…..
R+ one or more occurrences……
R1R2 an R1 followed by anR2
R1|R1 either an R1 or anR2.
10. Finite Automata
• Finite Automata(FA) is the simplest machine to recognize patterns.
• The finite automata or finite state machine is an abstract machine which have five elements or
tuple.
• There are two types
• NFA and DFA
11. Deterministic Finite Automata (DFA)
• In a DFA, for a particular input character, the machine goes to one state only. A transition function
is defined on every state for every input symbol. Also in DFA null (or ε) move is not allowed, i.e.,
DFA cannot change state without any input character.
• For example, below DFA with Σ = {0, 1} accepts all strings ending with 0.
• One important thing to note is, there can be many possible DFAs for a pattern. A DFA with
minimum number of states is generally preferred
12. Non-Deterministic Finite Automata (NFA)
• NFA is similar to DFA except following additional features:
• Null (or ε) move is allowed i.e., it can move forward without reading symbols.
• Ability to transmit to any number of states for a particular input.
• As you can see in transition function is for any input including null (or ε), NFA can go to any state
number of states.
• One important thing to note is, in NFA, if any path for an input string leads to a final state, then
the input string accepted.
13. Syntax Analyzer
• A syntax analyzer or parser takes the input from a lexical analyzer in the form of token streams.
• The parser analyzes the source code (token stream) against the production rules to detect any
errors in the code. The output of this phase is a parse tree.
• Why do you need Syntax Analyzer?
• Check if the code is valid grammatically
• The syntactical analyzer helps you to apply rules to the code
• Helps you to make sure that each opening brace has a corresponding closing balance
• Each declaration has a type and that the type must be exists
15. Derivation
• A derivation is basically a sequence of production rules, in order to get the input string. During
parsing, we take two decisions for some sentential form of input:
• Deciding the non-terminal which is to be replaced.
• Deciding the production rule, by which, the non-terminal will be replaced.
• To decide which non-terminal to be replaced with production rule, we can have two options.
• Left-most Derivation
• Right-most Derivation
16. Left-most Derivation
• In the left most derivation, the input is scanned and replaced with the production rule from left to right. So in
left most derivatives we read the input string from left to right.
Example
Production Rules
S= S+S
S=S-S
S=a|b|c
Input
a-b+c
Left-most Derivation
S= S+S
S=S-S+S
S=a-S+S
S=a-b+S
S=a-b+c
17. Right-most Derivation
• In the right most derivation, the input is scanned and replaced with the production rule from right to left. So in
right most derivatives we read the input string from right to left.
Example
Production Rules
S= S+S
S=S-S
S=a|b|c
Input
a-b+c
Right-most Derivation
S= S+S
S=S-S+S
S=S-S+c
S=S-b+c
S=a-b+c