Symbol Table, Error Handler & Code Generation

By: Akhil Kaushik

What is a Symbol Table?
● A compiler needs to collect and use information about the names that appear in the source program.
● The symbol table is the data structure the compiler uses to keep track of scope, lifetime and binding information about those names.

Names?
These names identify various program elements, such as:
1. Variables
2. Constants
3. Procedures
4. Labels
* Keywords and operators are usually not stored.

The Symbol Table
● When identifiers are found in the lexical phase, they are entered into the symbol table, which holds all relevant information about them.
● This information is then used by the later phases.
[Figure: the Symbol Table alongside the Lexical Analyzer, Syntax Analyzer, Semantic Analyzer and Code Generator]

Expectations from ST
At a minimum, we must be able to:
● QUERY the table to find out whether a symbol is already in it,
● RETRIEVE a symbol so that its attributes may be read and/or modified,
● INSERT a new symbol into the table.

Symbol Table Entries
We will store the following information about each identifier:
● The name (as a string).
● The data type.
● The block level.
● Its scope (global, local or parameter).
● Its offset from the base pointer (for local variables and parameters only).

Inserting a Symbol
● The install() function inserts a new symbol into the symbol table.
● Each symbol has a block level:
  ● Block level 1 = Keywords.
  ● Block level 2 = Global variables.
  ● Block level 3 = Parameters and local variables.
● install() creates an IdEntry object and stores it in the table.
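
The entry fields and install() described above could be sketched in Python roughly as follows. This is only an illustration: the field names, the module-level `table` dictionary and the duplicate check are assumptions, not details given in the slides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IdEntry:
    name: str                      # the identifier, as a string
    data_type: str                 # e.g. "int"
    block_level: int               # 1 = keyword, 2 = global, 3 = parameter/local
    scope: str                     # "global", "local" or "parameter"
    offset: Optional[int] = None   # only meaningful for locals and parameters


# Hypothetical single-level table; a Python dict is itself a hash table.
table = {}


def install(name, data_type, block_level, scope, offset=None):
    """Create an IdEntry and store it, refusing duplicate names."""
    if name in table:              # QUERY
        raise KeyError(f"duplicate declaration of {name!r}")
    entry = IdEntry(name, data_type, block_level, scope, offset)
    table[name] = entry            # INSERT
    return entry


count = install("count", "int", 2, "global")
table["count"].data_type = "long"  # RETRIEVE and modify an attribute
```

Keeping the entry as a small record makes it easy to add further attributes later without changing install() callers.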

Structure of the Symbol Table
We will implement the symbol table as a linked
list of hash tables, one hash table for each
block level.
[Figure: linked list of hash tables: Level 3 (Locals) → Level 2 (Globals) → Level 1 (Keywords) → Level 0 (null)]

● Initially, we create a null hash table at level 0.
[Figure: Level 0 (null)]

● Then we increase the block level and install the keywords at level 1 in the symbol table.
[Figure: Level 1 (Keywords) → Level 0 (null)]

● Then we increase the block level and install the global variables at level 2.
[Figure: Level 2 (Globals) → Level 1 (Keywords) → Level 0 (null)]

● When we enter a function, we create a level 3 hash table and store the parameters and local variables there.
[Figure: Level 3 (Locals) → Level 2 (Globals) → Level 1 (Keywords) → Level 0 (null)]

● When we leave the function, the hash table of local variables is deleted from the list.
[Figure: Level 2 (Globals) → Level 1 (Keywords) → Level 0 (null)]

Locating a Symbol
● If we enter another function, a new level 3 hash table is created.
● When we look up an identifier, we begin the search at the head of the list.
● If it is not found there, the search continues at the lower levels.
● Keywords are found in the level 1 hash table.
[Figure: Level 3 (Locals) → Level 2 (Globals) → Level 1 (Keywords) → Level 0 (null)]
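
The linked list of hash tables and the lookup order described above might look like this in Python. It is a sketch under stated assumptions: the class and method names (`Level`, `SymbolTable`, `enter_block`, `leave_block`, `lookup`) are hypothetical, level 0 is modelled as `None`, and each entry is reduced to a plain string for brevity.

```python
class Level:
    """One node of the list: a hash table plus a link to the next lower level."""

    def __init__(self, number, outer):
        self.number = number
        self.table = {}        # Python dicts are hash tables
        self.outer = outer     # next lower level; None stands for level 0


class SymbolTable:
    def __init__(self):
        self.head = None       # level 0: the null table
        self.level = 0

    def enter_block(self):
        """Increase the block level and push a fresh hash table."""
        self.level += 1
        self.head = Level(self.level, self.head)

    def leave_block(self):
        """Delete the innermost hash table from the list."""
        self.head = self.head.outer
        self.level -= 1

    def install(self, name, info):
        self.head.table[name] = info

    def lookup(self, name):
        """Search from the head of the list down to the lower levels."""
        node = self.head
        while node is not None:
            if name in node.table:
                return node.number, node.table[name]
            node = node.outer
        return None


st = SymbolTable()
st.enter_block(); st.install("return", "keyword")   # level 1
st.enter_block(); st.install("x", "int")            # level 2: global x
st.enter_block(); st.install("x", "float")          # level 3: local x hides global x
inner = st.lookup("x")
st.leave_block()                                     # leaving the function
outer = st.lookup("x")
```

Because the search starts at the head, a local declaration shadows a global one with the same name, and the global becomes visible again once the level 3 table is removed.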

Hash Tables
● A hash table is a collection in which each member is accessed through a key.
● The key is used to determine where to store the value in the table.
● The function that produces a location from the key is called the hash function.
● For example, in a hash table of strings, the hash function might compute the sum of the ASCII values of the first 5 characters of the string, modulo the size of the table.

Hash Tables
● The numerical value of the hashed key gives the location of the member.
● Thus, barring collisions, there is no need to search for the member; the hashed key tells where it is located.
● For example, if the string were "return" and the table had 100 slots, the hashed key would be (114 + 101 + 116 + 117 + 114) % 100 = 562 % 100 = 62.
● Thus, "return" would be located in position 62 of the hash table.
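
The example hash function can be checked with a few lines of Python. The function name `hash_key` and the default table size of 100 are assumptions made for this sketch.

```python
def hash_key(s, table_size=100):
    """Sum the character codes of the first 5 characters, modulo the table size."""
    return sum(ord(c) for c in s[:5]) % table_size


position = hash_key("return")   # 'r','e','t','u','r' -> 562 % 100
```

A real compiler would normally use a hash function that looks at every character, since many identifiers share their first few letters and would otherwise collide.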

Language Constructs Stored in ST
● Constants
● Variables
● User Defined Data Types
● Subprograms
● Classes
● Inheritance
● Modules

Symbol Table Organization
● Arrays
● Linked List
● Hash Tables
● Binary Trees

Error Detection
● Error detection is an important feature of any compiler.
● A good compiler not only detects errors but also recovers from them.
● A compiler should be able to modify its input and continue when it finds an error in the lexical analysis phase.

Properties of Error Handler
● It should localize the problem.
● It should report errors in terms of the source program rather than the intermediate code.
● Error messages should be easy to understand.
● Error messages should not be duplicated.

Categories of Errors
■ Lexical Errors: Misspellings of keywords, identifiers and operators.
■ Syntactic Errors: Misplaced semicolons, extra or missing braces.
■ Semantic Errors: Type mismatches between operators and operands. Ex: a return statement that returns a value in a function whose return type is void.
■ Logical Errors: Incorrect reasoning or plain carelessness, such as using = where == was intended.

Error Recovery Strategies
1. Phrase-level Recovery
● The parser performs a local correction on the remaining input, replacing part of it with some string that allows the parser to continue.
● Examples: replacing a comma with a semicolon, inserting a missing semicolon, etc.
● Advantage: It can correct any input string.
● Drawback: An improper replacement might lead to infinite loops.

2. Error Productions
● Add rules to the grammar that generate erroneous constructs.
● Such a parser detects an anticipated error whenever an error production is used.
● Advantage: Error diagnostics are readily available for such anticipated errors.
● Drawback: It is impossible to anticipate all errors.

3. Panic Mode
● When the parser encounters an error anywhere in a statement, it ignores the rest of that statement: it discards input from the erroneous token up to a known delimiter, such as a semicolon.
● This is the easiest method of error recovery, and it also prevents the parser from entering infinite loops.
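
Panic-mode recovery can be illustrated with a toy parser in Python. Everything here is an assumption made for the example: the language consists only of `name = number ;` statements, the input is already split into tokens, and `parse_statements` is a hypothetical name.

```python
def parse_statements(tokens):
    """Parse `name = number ;` statements, recovering in panic mode on errors."""
    parsed, errors = [], []
    i = 0
    while i < len(tokens):
        stmt = tokens[i:i + 4]
        ok = (len(stmt) == 4
              and stmt[0].isidentifier()
              and stmt[1] == "="
              and stmt[2].isdigit()
              and stmt[3] == ";")
        if ok:
            parsed.append((stmt[0], int(stmt[2])))
            i += 4
        else:
            errors.append(f"syntax error in statement starting at token {i}: {tokens[i]!r}")
            # Panic mode: discard tokens up to the next ';' delimiter ...
            while i < len(tokens) and tokens[i] != ";":
                i += 1
            i += 1   # ... and skip the delimiter itself, so progress is guaranteed
    return parsed, errors


parsed, errors = parse_statements("a = 1 ; b = = 2 ; c = 3 ;".split())
```

The faulty middle statement is reported once and skipped, and parsing resumes cleanly at `c = 3 ;`. The index always advances past the delimiter, which is why this strategy cannot loop forever.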

4. Global Correction
● The parser considers the program as a whole, tries to work out what it is intended to do, and looks for the closest error-free match.
● When an erroneous input (X) is fed in, it creates a parse tree for some closest error-free statement (Y).
● This may allow the parser to make minimal changes to the source code, but because of the time and space complexity of this strategy, it has not yet been implemented in practice.

CODE GENERATION
● Code generation is the final phase of compilation.
● The code produced by the code generator is object code in some lower-level language, for example, machine language or assembly language.
● The software that performs this phase is called the code generator.

Code Generation
The object code should have at least the following properties:
● It should carry the exact meaning of the source code.
● It should be efficient in terms of CPU usage and memory management.

Input and Output of Code Generator
Input:
1. A sequence of quadruples
2. A sequence of triples
3. DAGs
4. Postfix string
Output:
1. Absolute machine language program
2. Relocatable machine language program
3. Assembly language program
4. Program in another programming language

Main Tasks Of Code Generator
● Memory Management: The code generator decides which memory locations and CPU registers are to be used.
● IR Type: The Intermediate Representation (IR) can take various forms.
  ● It can be an Abstract Syntax Tree (AST), Reverse Polish Notation, or 3-address code.
  ● The code generator must also decide which data structure to use (stack or array).

● Selection of Instruction: The code generator takes the Intermediate Representation as input and converts it into the target machine’s instruction set.
  ● One representation can be converted in many ways (into different instructions), so it is the responsibility of the code generator to choose the appropriate instructions wisely.

● Register allocation: The target machine’s architecture may not allow all of the values to be kept in CPU registers.
  ● The code generator decides which values to keep in registers.
  ● It also decides which registers are used to hold these values.
● Ordering of instructions: Finally, the code generator decides the order in which the instructions will be executed.
  ● It creates a schedule for executing the instructions.

Code Generation From DAGs
● A major question here is the order in which the computation should be done.
● Rearranging the computation order may make the generated code optimal.
● Here, computation order means the sequencing of instructions.
● DAGs make it easier to rearrange the computation so as to generate optimal object code.

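Method of Generating Code from DAG
● Algorithm (node listing):
while unlisted interior nodes remain do begin
    select an unlisted node n, all of whose parents have been listed;
    list n;
    while the leftmost child m of n has no unlisted parents and is not a leaf do begin
        list m;
        n = m
    end
end
● The nodes are then evaluated in the reverse of the order in which they were listed.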

EXAMPLE
The 3-Address code statements:
t1 = a + b
t2 = c + d
t3 = e - t2
t4 = t1 - t3
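
A possible Python rendering of the node-listing procedure, applied to the four statements above. The sketch assumes the DAG is given as a dictionary from each interior node to its (left, right) operands, that the selection step repeats until every interior node has been listed, and that code is emitted in the reverse of the listing order; `node_listing` is a hypothetical name.

```python
def node_listing(children):
    """Return the interior nodes of a DAG in listing order.

    `children` maps each interior node to its (left, right) operands;
    any operand that is not itself a key is treated as a leaf.
    """
    parents = {}
    for node, kids in children.items():
        for kid in kids:
            parents.setdefault(kid, set()).add(node)

    listed = []

    def ready(node):
        # Unlisted, and all of its parents have already been listed.
        return node not in listed and all(p in listed for p in parents.get(node, ()))

    while len(listed) < len(children):
        n = next(node for node in children if ready(node))
        listed.append(n)
        while True:
            m = children[n][0]                      # leftmost child of n
            if m not in children or not ready(m):   # stop at a leaf or a blocked node
                break
            listed.append(m)
            n = m
    return listed


dag = {
    "t1": ("a", "b"),
    "t2": ("c", "d"),
    "t3": ("e", "t2"),
    "t4": ("t1", "t3"),
}
listing = node_listing(dag)
evaluation_order = list(reversed(listing))
```

For this DAG the listing is t4, t1, t3, t2, so the statements are evaluated as t2, t3, t1, t4. Computing t1 immediately before t4 lets its value stay in a register instead of being stored and reloaded.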

Peephole Optimization
● This optimization technique works locally on the code to transform it into optimized code.
● By locally, we mean a small portion of the code block at hand.
● These methods can be applied to intermediate code as well as to target code. A group of statements is analyzed and checked for the following possible optimizations.

1. Redundant Instruction Elimination
● The compiler searches for instructions that are redundant.
For example:
MOV x, R0
MOV R0, R1
● Provided R0 is not used afterwards, we can delete the first instruction and rewrite the sequence as:
MOV x, R1
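
A small peephole pass for this pattern might be written as below. It is a sketch only: instructions are modelled as (opcode, source, destination) tuples, and the caller is assumed to supply `dead_regs`, a hypothetical set of registers known not to be read again after the pair.

```python
def merge_moves(code, dead_regs):
    """Collapse `MOV a, Rt ; MOV Rt, Rd` into `MOV a, Rd` when Rt is dead."""
    out = []
    for op, src, dst in code:
        if (out and op == "MOV" and out[-1][0] == "MOV"
                and out[-1][2] == src and src in dead_regs):
            prev_src = out.pop()[1]          # drop the first MOV ...
            out.append(("MOV", prev_src, dst))  # ... and move straight to dst
        else:
            out.append((op, src, dst))
    return out


before = [("MOV", "x", "R0"), ("MOV", "R0", "R1")]
after = merge_moves(before, dead_regs={"R0"})
unchanged = merge_moves(before, dead_regs=set())
```

Without the liveness information the pass leaves the code alone, because removing the first MOV would change the program if R0 were read later.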

2. Elimination of Unreachable Code
● Unreachable code is a part of the program that can never be executed because of the surrounding programming constructs.
Example:
int addition(int x)
{
    return x + 10;
    printf("value of x is %d", x);   /* never executed */
}
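
For a straight-line block, removing such code amounts to dropping everything after the first return. The Python sketch below assumes the block is a list of statement strings with no labels or jump targets inside it; `drop_unreachable` is a hypothetical helper name.

```python
def drop_unreachable(block):
    """Keep statements up to and including the first `return`."""
    kept = []
    for stmt in block:
        kept.append(stmt)
        if stmt.startswith("return"):
            break            # nothing after this point can execute
    return kept


body = ["return x + 10;", 'printf("value of x is %d", x);']
cleaned = drop_unreachable(body)
```

If the block could contain labels, a statement after the return might still be reached by a jump, so a real pass would work on the control-flow graph instead.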

3. Flow of Control Optimization
● There are instances in code where control jumps back and forth without performing any significant task. These jumps can be removed.
Example
Before:
GOTO L1
...
L1 : GOTO L2
L2 : INC R1
After:
GOTO L2
...
L2 : INC R1
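
Retargeting a jump that lands on another jump can be sketched as follows. The instruction format, a (label, opcode, argument) tuple with `None` for unlabelled instructions, and the name `thread_jumps` are assumptions made for the example.

```python
def thread_jumps(code):
    """Make every GOTO point at its final destination."""
    # label -> target, for every instruction of the form `label: GOTO target`
    forwards = {label: arg for label, op, arg in code if label and op == "GOTO"}

    def final_target(label):
        seen = set()
        while label in forwards and label not in seen:   # `seen` guards against cycles
            seen.add(label)
            label = forwards[label]
        return label

    return [(label, op, final_target(arg) if op == "GOTO" else arg)
            for label, op, arg in code]


code = [
    (None, "GOTO", "L1"),
    ("L1", "GOTO", "L2"),
    ("L2", "INC", "R1"),
]
threaded = thread_jumps(code)
```

This pass only rewrites the targets. Once nothing jumps to L1 and control cannot fall into it, the `L1 : GOTO L2` instruction is unreachable and a separate clean-up step can delete it.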

4. Algebraic Expression Simplification
● There are occasions where algebraic expressions can be simplified.
For example:
a = a + 0 → a = a
a = a * 1 → a = a
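
These two identities can be applied mechanically to three-address statements. In the Python sketch below a statement is assumed to be a (dest, left, op, right) tuple, and a simplified statement is returned as a (dest, source) copy; `simplify` is a hypothetical name.

```python
def simplify(dest, left, op, right):
    """Apply x + 0 = x and x * 1 = x; return the statement unchanged otherwise."""
    if (op, right) in (("+", 0), ("*", 1)):
        return (dest, left)       # dest = left
    if (op, left) in (("+", 0), ("*", 1)):
        return (dest, right)      # dest = right
    return (dest, left, op, right)


plus_zero = simplify("a", "a", "+", 0)
times_one = simplify("a", 1, "*", "a")
other = simplify("a", "b", "+", "c")
```

When the result is a self-copy such as a = a, a later pass can delete the statement altogether.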

5. Strength Reduction
● Some operations consume more time and space than others. Their ‘strength’ can be reduced by replacing them with other operations that consume less time and space but produce the same result.
For example:
x = 2 * y → x = y + y
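
The rewrite above can be expressed as a one-rule pass in Python. As before, the (dest, left, op, right) tuple format and the name `reduce_strength` are assumptions for this sketch, and only multiplication by the constant 2 is handled.

```python
def reduce_strength(dest, left, op, right):
    """Replace a multiplication by 2 with an addition of the operand to itself."""
    if op == "*" and left == 2:
        return (dest, right, "+", right)
    if op == "*" and right == 2:
        return (dest, left, "+", left)
    return (dest, left, op, right)


reduced = reduce_strength("x", 2, "*", "y")
untouched = reduce_strength("x", 3, "*", "y")
```

Whether an addition is really cheaper than a multiplication depends on the target machine, so a production compiler would consult its cost model before applying the rule.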

6. Accessing Machine Instructions
● The target machine may have hardware instructions that implement certain specific operations efficiently.
● For example, the expression a = a + 1 can simply be replaced by:
  ● INC a, or
  ● AOS a (AOS – Add on storage).
