2. What Is a Symbol Table?
A compiler needs to collect and use
information about the names in the source
program.
A symbol table is a data structure used by the
compiler to keep track of scope, lifetime and
binding information about names in a program.
3. Names
Names are used to identify various
program elements, such as:
1. Variables
2. Constants
3. Procedures
4. Labels
*Keywords and operators are usually not stored.
4. The Symbol Table
When identifiers are found in the lexical phase,
they are entered into the symbol table, which
holds all relevant information about them.
This information is used by the later phases.
[Figure: the Symbol Table alongside the Lexical Analyzer, Syntax Analyzer, Semantic Analyzer and Code Generator.]
5. Expectations from ST
At least, we must be able to:
QUERY the table to find out whether a symbol
is already in it.
RETRIEVE a symbol so that its attributes
may be read and/or modified.
INSERT a new symbol into the table.
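As a rough illustration, the three operations above can be sketched in Python over a single dictionary. The class and method names (`SymbolTable`, `query`, `retrieve`, `insert`) are hypothetical, rejecting a duplicate insertion is an assumption, and nested scopes are ignored here.

```python
class SymbolTable:
    """A minimal, single-scope symbol table backed by a dict (sketch)."""

    def __init__(self):
        self._entries = {}  # name -> dict of attributes

    def query(self, name):
        """Return True if the symbol is already in the table."""
        return name in self._entries

    def retrieve(self, name):
        """Return the symbol's attribute dict (mutable), or None if absent."""
        return self._entries.get(name)

    def insert(self, name, **attributes):
        """Insert a new symbol; a second declaration of the same name is an error."""
        if name in self._entries:
            raise KeyError(f"symbol {name!r} is already in the table")
        self._entries[name] = dict(attributes)
        return self._entries[name]


table = SymbolTable()
table.insert("count", type="int")
print(table.query("count"))               # True
table.retrieve("count")["type"] = "long"  # modify an attribute in place
print(table.retrieve("count"))            # {'type': 'long'}
```

Returning the stored dict from `retrieve` is what lets a later phase modify attributes in place without a separate update operation.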
6. Symbol Table Entries
We will store the following information
about each identifier:
The name (as a string).
The data type.
The block level.
Its scope (global, local or parameter).
Its offset from the base pointer (for
local variables and parameters only).
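One way to hold these fields is a small record type. The name `IdEntry` comes from the next slide; the field names and the validation rules (globals have no offset, locals and parameters must have one) are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IdEntry:
    """One symbol-table entry with the fields listed above (sketch)."""
    name: str                     # the identifier as a string
    data_type: str                # e.g. "int" or "float"
    block_level: int              # level of the block that declares it
    scope: str                    # "global", "local" or "parameter"
    offset: Optional[int] = None  # offset from the base pointer, if any

    def __post_init__(self):
        if self.scope not in ("global", "local", "parameter"):
            raise ValueError(f"unknown scope {self.scope!r}")
        # Only locals and parameters live in the stack frame.
        if self.scope == "global" and self.offset is not None:
            raise ValueError("globals have no base-pointer offset")
        if self.scope != "global" and self.offset is None:
            raise ValueError("locals and parameters need an offset")


total = IdEntry("total", "float", 2, "global")
x = IdEntry("x", "int", 3, "parameter", offset=8)  # offset value is illustrative
```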
7. Inserting a Symbol
The install() function will insert a new
symbol into the symbol table.
Each symbol has a block level:
Block level 1 = Keywords.
Block level 2 = Global variables.
Block level 3 = Parameters and local variables.
install() will create an IdEntry object and
store it in the table.
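A possible shape for `install()` is sketched below. It keeps one dictionary per block level and records the current level in each entry; the `current_level` attribute, the stripped-down two-field `IdEntry`, and raising an error on a redeclaration at the same level are assumptions for illustration.

```python
from dataclasses import dataclass

KEYWORDS, GLOBALS, LOCALS = 1, 2, 3  # block levels used in these slides


@dataclass
class IdEntry:
    name: str
    block_level: int


class SymbolTable:
    def __init__(self):
        self.current_level = 0
        self.tables = {}  # block level -> {name: IdEntry}

    def install(self, name):
        """Create an IdEntry at the current block level and store it."""
        table = self.tables.setdefault(self.current_level, {})
        if name in table:
            raise KeyError(f"{name!r} already declared at level {self.current_level}")
        entry = IdEntry(name, self.current_level)
        table[name] = entry
        return entry


st = SymbolTable()
st.current_level = KEYWORDS
for kw in ("int", "return", "while"):
    st.install(kw)
st.current_level = GLOBALS
st.install("total")
st.current_level = LOCALS
st.install("total")  # a local may reuse a global's name at a different level
```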
8. Structure of the Symbol Table
We will implement the symbol table as a linked
list of hash tables, one hash table for each
block level.
[Figure: the list of hash tables from head to tail: Level 3 (hash table of Locals) → Level 2 (hash table of Globals) → Level 1 (hash table of Keywords) → Level 0 (null).]
9. Structure of the Symbol Table
Initially, we create a null hash table at
level 0.
[Figure: Level 0 (null).]
10. Structure of the Symbol Table
Then we increase the block level and
install the Keywords at level 1 in the
symbol table.
[Figure: Level 1 (hash table of Keywords) → Level 0 (null).]
11. Structure of the Symbol Table
Then we increase the block level and
install the Global Variables at level 2.
[Figure: Level 2 (hash table of Globals) → Level 1 (hash table of Keywords) → Level 0 (null).]
12. Structure of the Symbol Table
When we enter a function, we create a
level 3 hash table and store Parameters
and Local Variables there.
[Figure: Level 3 (hash table of Locals) → Level 2 (hash table of Globals) → Level 1 (hash table of Keywords) → Level 0 (null).]
13. Structure of the Symbol Table
When we leave the function, the hash
table of local variables is deleted from the
list.
[Figure: Level 2 (hash table of Globals) → Level 1 (hash table of Keywords) → Level 0 (null).]
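The sequence in slides 9 to 13 can be sketched as a singly linked list whose head is the innermost level. Each node holds a Python dict as its hash table; the class and method names (`Level`, `ScopedSymbolTable`, `enter_level`, `leave_level`) are hypothetical.

```python
class Level:
    """One node of the list: the hash table for one block level."""

    def __init__(self, number, table, below):
        self.number = number  # block level
        self.table = table    # dict, or None for the level-0 sentinel
        self.below = below    # next node, one level lower


class ScopedSymbolTable:
    def __init__(self):
        self.head = Level(0, None, None)  # the null table at level 0

    def enter_level(self):
        """Push a new, empty hash table one level above the current head."""
        self.head = Level(self.head.number + 1, {}, self.head)

    def leave_level(self):
        """Unlink the head table; its entries are discarded."""
        if self.head.number == 0:
            raise RuntimeError("cannot leave level 0")
        self.head = self.head.below

    def install(self, name, info):
        if self.head.table is None:
            raise RuntimeError("no hash table at level 0")
        self.head.table[name] = info

    def levels(self):
        """Level numbers from the head of the list down to level 0."""
        node, out = self.head, []
        while node is not None:
            out.append(node.number)
            node = node.below
        return out


st = ScopedSymbolTable()
st.enter_level()                   # level 1: keywords
st.install("while", "keyword")
st.enter_level()                   # level 2: globals
st.install("total", "global int")
st.enter_level()                   # level 3: entering a function
st.install("i", "local int")
print(st.levels())  # [3, 2, 1, 0]
st.leave_level()    # leaving the function drops the locals table
print(st.levels())  # [2, 1, 0]
```

Because leaving a function only moves the head pointer, discarding all of its locals costs constant time.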
14. Locating a Symbol
If we enter another function, a new level 3
hash table is created.
[Figure: Level 3 (new hash table of Locals) → Level 2 (hash table of Globals) → Level 1 (hash table of Keywords) → Level 0 (null).]
15. Locating a Symbol
When we look up an identifier, we begin
the search at the head of the list.
16. Locating a Symbol
If it is not found there, then the search
continues at the lower levels.
17. Locating a Symbol
Keywords are found in the level 1 hash
table.
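The search in slides 15 to 17 amounts to walking the list from the head toward level 0 and returning the first match, so an inner declaration hides an outer one of the same name. The sketch below builds the four-level list by hand; `Level` and `lookup` are hypothetical names.

```python
from typing import NamedTuple, Optional


class Level(NamedTuple):
    number: int
    table: Optional[dict]     # None for the level-0 sentinel
    below: Optional["Level"]  # next node, one level lower


def lookup(head, name):
    """Return (level number, info) for the innermost declaration, or None."""
    node = head
    while node is not None and node.table is not None:
        if name in node.table:
            return node.number, node.table[name]
        node = node.below  # not found here: continue at the lower level
    return None


level0 = Level(0, None, None)
level1 = Level(1, {"return": "keyword", "while": "keyword"}, level0)
level2 = Level(2, {"x": "global int", "total": "global float"}, level1)
level3 = Level(3, {"x": "local int"}, level2)

print(lookup(level3, "x"))       # (3, 'local int')
print(lookup(level3, "total"))   # (2, 'global float')
print(lookup(level3, "return"))  # (1, 'keyword')
print(lookup(level3, "y"))       # None
```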
18. Hash Tables
A hash table is a list in which each member
is accessed through a key.
The key is used to determine where to store
the value in the table.
The function that produces a location from
the key is called the hash function.
For example, if it were a hash table of strings,
the hash function might compute the sum of
the ASCII values of the first 5 characters of
the string, modulo the size of the table.
19. Hash Tables
The numerical value of the hashed key gives
the location of the member.
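Thus, apart from resolving collisions (two keys
that hash to the same location), there is no need
to search for the member; the hashed key tells
where it is located.
For example, with a table of size 100, the string
"return" is hashed on its first five characters,
"retur": (114 + 101 + 116 + 117 + 114) % 100
= 562 % 100 = 62.
Thus, "return" would be located in position
62 of the hash table.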
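The hash function from slide 18 and the worked example can be checked with a few lines of Python. Chaining (a list per slot) is an assumption added here to handle collisions, which this function produces whenever two strings share their first five characters; `hash_key` and `HashTable` are hypothetical names.

```python
def hash_key(s, table_size=100):
    """Sum of the ASCII codes of the first 5 characters, modulo the table size."""
    return sum(ord(c) for c in s[:5]) % table_size


class HashTable:
    """Hash table with separate chaining: each slot holds a list of (key, value)."""

    def __init__(self, size=100):
        self.size = size
        self.slots = [[] for _ in range(size)]

    def put(self, key, value):
        slot = self.slots[hash_key(key, self.size)]
        for i, (k, _) in enumerate(slot):
            if k == key:
                slot[i] = (key, value)  # update an existing key
                return
        slot.append((key, value))

    def get(self, key):
        for k, v in self.slots[hash_key(key, self.size)]:
            if k == key:
                return v
        return None


print(hash_key("return"))   # 62
print(hash_key("returns"))  # 62 as well: same first five characters
t = HashTable()
t.put("return", "keyword")
t.put("returns", "identifier")
print(t.get("return"), t.get("returns"))  # keyword identifier
```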
20. Language Constructs Stored in ST
Constants
Variables
User Defined Data Types
Subprograms
Classes
Inheritance
Modules
23. Error Detection
Error detection is an important feature of
any compiler.
A good compiler not only detects errors but
also recovers from them.
A compiler should be able to modify its
input when it finds an error in the lexical
analysis phase.
24. Properties of Error Handler
It should localize the problem.
It should report errors in terms of the source
program rather than the intermediate code.
Error messages should be easy to understand.
Error messages should not be duplicated.
25. Categories of Errors
■ Lexical Errors: Misspellings of keywords,
identifiers and operators.
■ Syntactic Errors: Misplaced semicolons, extra or
missing braces.
■ Semantic Errors: Type mismatches between
operators and operands. Ex: a return statement that
returns a value in a function whose return type is void.
■ Logical Errors: Incorrect reasoning or plain
carelessness might result in errors such as
using = where == was intended, or vice versa.
26. Error Recovery Strategies
1. Phrase-level Recovery
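● Local correction by the parser on the remaining
input: it replaces a prefix of the remaining input by
some string that allows the parser to continue.
● Examples: replacing a comma by a semicolon, or
inserting a missing semicolon.
● Advantage: It can correct any input string.
● Drawback: An improper replacement might lead to
infinite loops, e.g. if the parser always inserts
something ahead of the current input symbol.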
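A toy illustration of phrase-level recovery, assuming a language whose only statement is `ID = NUM ;` and a token list produced by splitting on spaces. Where a semicolon is expected, the sketch replaces a comma by a semicolon or inserts a missing one; anything it cannot repair is deleted one token at a time. Since an insertion consumes no input, the next iteration always consumes at least one token, which keeps the loop from running forever. The function name and grammar are hypothetical.

```python
def parse_assignments(tokens):
    """Parse repeated  ID '=' NUM ';'  with phrase-level repairs at the ';'."""
    toks = list(tokens)
    stmts, errors, i = [], [], 0
    while i < len(toks):
        head = toks[i:i + 3]
        if not (len(head) == 3 and head[0].isidentifier()
                and head[1] == "=" and head[2].isdigit()):
            # Cannot repair locally: delete one token (always consumes input).
            errors.append(f"deleted {toks[i]!r}")
            i += 1
            continue
        stmts.append((head[0], int(head[2])))
        i += 3
        nxt = toks[i] if i < len(toks) else None
        if nxt == ";":
            i += 1
        elif nxt == ",":
            toks[i] = ";"  # replace the comma by a semicolon, then accept it
            errors.append("replaced ',' with ';'")
            i += 1
        else:
            # Insert the missing ';' without consuming input; the next
            # iteration either parses a statement or deletes a token.
            errors.append("inserted missing ';'")
    return stmts, errors


stmts, errors = parse_assignments("a = 1 , b = 2 c = 3 ;".split())
print(stmts)
print(errors)
```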
27. 2. Error Productions
● Augment the grammar with productions that
generate the erroneous constructs.
● Such a parser will detect anticipated errors
when an error production is used.
● Advantage: Error diagnostics will be readily
available for such anticipated errors.
● Drawback: It is impossible to anticipate all errors.
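The sketch below illustrates the idea on a hypothetical one-statement grammar: alongside the correct production `stmt -> ID = NUM ;` it adds the error production `stmt -> ID NUM ;` (missing `=`). For brevity, a statement's token kinds are matched against each right-hand side directly rather than by a real parser; `PRODUCTIONS`, `kind` and `parse_stmt` are assumed names.

```python
# (right-hand side, diagnostic); a diagnostic of None marks a correct production.
PRODUCTIONS = [
    (("ID", "=", "NUM", ";"), None),
    (("ID", "NUM", ";"), "missing '=' in assignment"),  # error production
]


def kind(tok):
    """Map a token to a grammar terminal."""
    if tok.isidentifier():
        return "ID"
    if tok.isdigit():
        return "NUM"
    return tok


def parse_stmt(tokens):
    """Return None if the statement is correct, or the anticipated error's message."""
    kinds = tuple(kind(t) for t in tokens)
    for rhs, diagnostic in PRODUCTIONS:
        if kinds == rhs:
            return diagnostic
    # No production, correct or erroneous, matches: an unanticipated error.
    raise SyntaxError("unanticipated syntax error")


print(parse_stmt("x = 5 ;".split()))  # None
print(parse_stmt("x 5 ;".split()))    # missing '=' in assignment
```

The last case shows the drawback from the slide: an error nobody anticipated (say `x = ;`) matches no production and gets only a generic message.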
28. 3. Panic Mode
When the parser encounters an error anywhere
in a statement, it ignores the rest of that
statement: it discards the input from the
erroneous symbol up to a known delimiter, such as
a semicolon.
This is the simplest way of error recovery, and
it also prevents the parser from entering
infinite loops.
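A minimal sketch of panic-mode recovery for hypothetical `ID = NUM ;` statements: when a statement does not match, the parser records an error and discards tokens up to and including the next semicolon (the synchronizing token). The function name and token format are assumptions.

```python
def parse_with_panic(tokens, sync=frozenset({";"})):
    """Parse repeated  ID '=' NUM ';'  and recover from errors in panic mode."""
    stmts, errors, i, n = [], [], 0, len(tokens)
    while i < n:
        t = tokens[i:i + 4]
        if (len(t) == 4 and t[0].isidentifier() and t[1] == "="
                and t[2].isdigit() and t[3] == ";"):
            stmts.append((t[0], int(t[2])))
            i += 4
            continue
        errors.append(f"discarded statement starting at token {i}")
        # Skip input up to the next synchronizing token...
        while i < n and tokens[i] not in sync:
            i += 1
        i += 1  # ...and the delimiter itself. Every pass advances i, so no loop.
    return stmts, errors


stmts, errors = parse_with_panic("a = 1 ; b = = 2 ; c = 3 ;".split())
print(stmts)   # [('a', 1), ('c', 3)]
print(errors)  # ['discarded statement starting at token 4']
```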
29. 4. Global Correction
The parser considers the program as a whole,
tries to figure out what the program is intended
to do, and tries to find the closest match for it
that is error-free.
When an erroneous input X is fed, it creates a
parse tree for some closest error-free
statement Y.
This allows the parser to make minimal changes
to the source code, but because of the time and
space complexity of this strategy, it is too
costly for practical use and is mainly of
theoretical interest.
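Real global correction searches all error-free programs for one needing the fewest insertions, deletions and replacements, which is what makes it expensive. The toy sketch below only conveys the distance measure: from a small hand-written list of valid candidate statements (an assumption), it picks the one with the smallest token-level edit distance to the erroneous input. `edit_distance` and `closest_correct` are hypothetical names.

```python
def edit_distance(a, b):
    """Fewest insertions, deletions and replacements turning sequence a into b."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # replace x by y (free if equal)
        prev = cur
    return prev[-1]


def closest_correct(tokens, candidates):
    """Return the candidate statement nearest to the erroneous token list."""
    return min(candidates, key=lambda c: edit_distance(tokens, c))


bad = "x = = 5 ;".split()
candidates = ["x = 5 ;".split(), "x == 5 ;".split()]
print(closest_correct(bad, candidates))  # ['x', '=', '5', ';']
```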
31. CODE GENERATION
Code generation is the final phase of
compilation.
The code generated by the code generator is
an object code of some lower-level
programming language, for example,
machine language or assembly language.
The software for this phase is called the code
generator.
32. Code Generation
The Object Code should have the following
minimum properties:
It should carry the exact meaning of the
source code.
It should be efficient in terms of CPU usage
and memory management.
33. Input and Output of Code Generator
Input:
1. A sequence of quadruples
2. A sequence of triples
3. DAGs
4. Postfix string
Output:
1. Absolute machine-language program
2. Relocatable machine-language program
3. Assembly-language program
4. Program in another programming language
34. Main Tasks Of Code Generator
Memory Management: The code generator
decides which memory locations and CPU
registers are to be used.
IR Type: The intermediate representation (IR)
has various forms.
It can be an abstract syntax tree (AST),
reverse Polish notation, or three-address code.
It also matters which data structure is used
(a stack or an array).
35. Main Tasks Of Code Generator
Selection of Instruction: The code
generator takes Intermediate Representation
as input and converts it into target machine’s
instruction set.
One IR construct can often be translated in many
ways (into different instruction sequences), so it is
the responsibility of the code generator to choose
the appropriate instructions wisely.
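To make the choice concrete, here is a sketch of instruction selection for one three-address statement on a hypothetical machine whose instructions are written `OP source, destination` (as in the MOV examples later in these slides) and which has an `INC` instruction. For `a = a + 1` it picks the single `INC`; otherwise it falls back to a load, operate, store sequence through register R0. The opcode names and the tuple format are assumptions.

```python
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL"}


def select(quad):
    """Choose instructions for a quadruple (op, arg1, arg2, result)."""
    op, a1, a2, res = quad
    if op == "+" and {a1, a2} == {res, "1"}:
        return [f"INC {res}"]  # a = a + 1 (or a = 1 + a): one instruction
    return [f"MOV {a1}, R0",            # load the first operand
            f"{OPCODES[op]} {a2}, R0",  # R0 = R0 op a2
            f"MOV R0, {res}"]           # store the result


print(select(("+", "a", "1", "a")))  # ['INC a']
print(select(("*", "b", "c", "a")))  # ['MOV b, R0', 'MUL c, R0', 'MOV R0, a']
```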
36. Main Tasks Of Code Generator
Register allocation: The target machine’s
architecture may not allow all of the values to be
kept in CPU registers.
The code generator decides which values to keep in
registers.
It also decides which registers are used to hold
these values.
Ordering of instructions: Finally, the code
generator decides the order in which the
instructions will be executed.
It schedules the instructions for execution.
37. Code Generation From DAGs
A major issue here is the order in which the
computation should be done.
Rearranging the computation order can make the
generated code better, possibly optimal.
Here, computation order means the sequencing
of instructions.
DAGs make it possible to rearrange the
computation to generate better object code.
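38. Method of Generating Code from DAG
Algorithm (node listing):
while unlisted interior nodes remain do begin
    select an unlisted node n, all of whose parents have
        been listed;
    list n;
    while the leftmost child m of n has no unlisted
          parents and is not a leaf do begin
        list m;
        n := m
    end
end
Code is then generated by evaluating the nodes in
the reverse of the order in which they were listed.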
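The listing heuristic can be sketched in Python, assuming the DAG is given as a dict from each node to its ordered list of children (leftmost first), with leaves having no children. The example DAG is for t1 = a + b, t2 = c + d, t3 = e - t2, t4 = t1 - t3. The heuristic lists t4, t1, t3, t2, so the evaluation order is t2, t3, t1, t4, which computes t1 immediately before t4 uses it. The function name `node_listing` is hypothetical.

```python
def node_listing(children):
    """Heuristic node listing for a DAG; returns interior nodes in evaluation order.

    children maps each node to its ordered children (leftmost first); a node
    with no children (or absent from the dict) is a leaf.
    """
    parents = {}
    for n, kids in children.items():
        for k in kids:
            parents.setdefault(k, []).append(n)
    interior = [n for n, kids in children.items() if kids]
    listed = []

    def all_parents_listed(v):
        return all(p in listed for p in parents.get(v, []))

    while len(listed) < len(interior):
        # Select an unlisted interior node all of whose parents are listed.
        n = next(v for v in interior if v not in listed and all_parents_listed(v))
        listed.append(n)
        # Follow leftmost children while they are interior with no unlisted parents.
        while children[n]:
            m = children[n][0]
            if not children.get(m) or not all_parents_listed(m):
                break
            listed.append(m)
            n = m
    return listed[::-1]  # evaluate in reverse listing order


dag = {"t4": ["t1", "t3"], "t1": ["a", "b"],
       "t3": ["e", "t2"], "t2": ["c", "d"]}
print(node_listing(dag))  # ['t2', 't3', 't1', 't4']
```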
42. Peephole Optimization
This optimization technique works locally on
the code to transform it into optimized code.
By locally, we mean a small portion of the
code block at hand (the "peephole").
These methods can be applied on
intermediate code as well as on target
code. A short sequence of statements is analyzed
and checked for the following possible
optimizations.
43. 1. Redundant Instruction Elimination
The compiler searches for instructions that are
redundant in nature.
For example:
MOV x, R0
MOV R0, R1
If R0 is not used afterwards, we can replace the
two instructions with a single one:
MOV x, R1
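A sketch of this pattern as a peephole pass over instruction strings of the form `OP source, destination`, assuming every instruction has exactly two operands. It merges `MOV x, Rt` followed by `MOV Rt, Rd` into `MOV x, Rd` only when Rt is not mentioned by any later instruction in the list, and it further assumes, as a simplification, that Rt is dead when the list ends. The function names are hypothetical.

```python
def operands(instr):
    """Split 'OP a, b' into ('OP', 'a', 'b')."""
    op, rest = instr.split(maxsplit=1)
    a, b = (s.strip() for s in rest.split(","))
    return op, a, b


def remove_redundant_moves(code):
    out, i = [], 0
    while i < len(code):
        if i + 1 < len(code):
            op1, x, t = operands(code[i])
            op2, t2, d = operands(code[i + 1])
            # Conservative check: the temporary register is never mentioned later.
            t_unused_later = all(t not in operands(c)[1:] for c in code[i + 2:])
            if op1 == op2 == "MOV" and t == t2 and t_unused_later:
                out.append(f"MOV {x}, {d}")
                i += 2
                continue
        out.append(code[i])
        i += 1
    return out


print(remove_redundant_moves(["MOV x, R0", "MOV R0, R1"]))  # ['MOV x, R1']
```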
44. 2. Elimination of Unreachable Code
Unreachable code is a part of the program
code that is never executed because of
programming constructs such as return.
Example (the printf call can never run):
int addition(int x)
{
    return x + 10;
    printf("value of x is %d", x);
}
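A simple version of this check can be sketched over straight-line statements given as strings: after an unconditional `return` or `goto`, statements are dropped until a label (a line ending in `:`, which a jump could target) makes code reachable again. Treating statements as strings and recognizing labels this way are simplifying assumptions; `drop_unreachable` is a hypothetical name.

```python
import re

JUMP = re.compile(r"(return|goto)\b")  # unconditional transfer of control


def drop_unreachable(stmts):
    """Remove statements that follow an unconditional return/goto up to the next label."""
    out, reachable = [], True
    for s in stmts:
        if s.rstrip().endswith(":"):  # a label may be a jump target
            reachable = True
        if reachable:
            out.append(s)
        if JUMP.match(s.lstrip()):
            reachable = False  # control cannot fall through to the next statement
    return out


body = ["return x + 10;", 'printf("value of x is %d", x);']
print(drop_unreachable(body))  # ['return x + 10;']
```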
45. 3. Flow of Control Optimization
There are instances in a code where the
program control jumps back and forth without
performing any significant task. These jumps
can be removed.
Example. Before:
GOTO L1
...
L1 : GOTO L2
L2 : INC R1
After:
GOTO L2
...
L2 : INC R1
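The transformation can be sketched as jump threading over instruction strings in the slide's `LABEL : instruction` format: every `GOTO L` whose target label marks another unconditional `GOTO` is redirected to the final target. Deleting `L1 : GOTO L2` afterwards, when nothing else reaches it, is a separate unreachable-code step that this sketch leaves out. The function names are hypothetical.

```python
def split_label(instr):
    """Split 'L1 : GOTO L2' into ('L1', 'GOTO L2'); unlabeled lines get None."""
    if ":" in instr:
        label, body = instr.split(":", 1)
        return label.strip(), body.strip()
    return None, instr.strip()


def thread_jumps(code):
    # Map each label that marks an unconditional jump to that jump's target.
    forward = {}
    for instr in code:
        label, body = split_label(instr)
        if label and body.startswith("GOTO "):
            forward[label] = body.split()[1]

    def final_target(label):
        seen = set()
        while label in forward and label not in seen:  # guard against GOTO cycles
            seen.add(label)
            label = forward[label]
        return label

    out = []
    for instr in code:
        label, body = split_label(instr)
        if body.startswith("GOTO "):
            body = "GOTO " + final_target(body.split()[1])
        out.append(f"{label} : {body}" if label else body)
    return out


print(thread_jumps(["GOTO L1", "L1 : GOTO L2", "L2 : INC R1"]))
# ['GOTO L2', 'L1 : GOTO L2', 'L2 : INC R1']
```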
47. 5. Strength Reduction
There are operations that consume more
time and space. Their ‘strength’ can be
reduced by replacing them with other
operations that consume less time and
space, but produce the same result.
For Example:
x = 2 * y can be replaced by x = y + y
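A sketch of this rewrite on a single statement `dst = a op b`. Besides the slide's `2 * y` to `y + y` case, it also rewrites multiplication by a larger power of two as a left shift, a common strength reduction that is valid for integer operands only (an assumption here). `reduce_strength` is a hypothetical name.

```python
def reduce_strength(dst, op, a, b):
    """Return a cheaper statement equivalent to  dst = a op b  where possible."""
    if op == "*" and b == "2":
        return f"{dst} = {a} + {a}"  # the slide's example
    if op == "*" and b.isdigit():
        k = int(b)
        if k > 1 and (k & (k - 1)) == 0:  # k is a power of two
            return f"{dst} = {a} << {k.bit_length() - 1}"
    return f"{dst} = {a} {op} {b}"  # no cheaper form known


print(reduce_strength("x", "*", "y", "2"))  # x = y + y
print(reduce_strength("x", "*", "y", "8"))  # x = y << 3
```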
48. 6. Accessing Machine Instructions
The target machine may have hardware
instructions that implement certain specific
operations efficiently.
For Example:
The expression a = a + 1 can simply be
replaced by:
INC a or
AOS a. (AOS – Add on storage)