The document discusses intermediate code generation in compilers. It begins by explaining that intermediate code generation is the final phase of the compiler front-end and its goal is to translate the program into a format expected by the back-end. Common intermediate representations include three address code and static single assignment form. The document then discusses why intermediate representations are used, how to choose an appropriate representation, and common types of representations like graphical IRs and linear IRs.
2. Intermediate Code Generation
• The final phase of the compiler front-end
• Goal: translate the program into a format
expected by the compiler back-end
• In typical compilers: followed by intermediate
code optimization and machine code
generation
3. Why use an intermediate representation?
• It makes optimization easier: write optimization methods only
for the intermediate representation
• The intermediate representation can be directly interpreted
• SPARC (Scalable Processor Architecture)
• MIPS (Microprocessor without Interlocked Pipelined Stages)
5. How to choose the intermediate representation?
• It should be easy to translate the source language
to the intermediate representation
• It should be easy to translate the intermediate
representation to the machine code
• The intermediate representation should be
suitable for optimization
• It should be neither too high level nor too low
level
• Single compiler can have more than one
intermediate representation
6. Common Intermediate representations
• General forms of intermediate representations (IR):
– Graphical IR (i.e. parse tree, abstract syntax trees, DAG..)
– Linear IR (i.e. non-graphical)
– Three Address Code (TAC): instructions of the form “result
= op1 operator op2”
– Static single assignment (SSA) form: each variable is
assigned once.
Y = 1
Y = 2
X = Y
Y1 = 1
Y2 = 2
X1 = Y2
7. Example IR in programming languages
• Java bytecode (executed on the Java Virtual Machine)
• C is used in several compilers as an intermediate
representation (e.g. Lisp, Haskell, Cython. . . )
• Microsoft’s Common Intermediate Language (CIL)
• GNU Compiler Collection (GCC) uses abstract syntax trees
8. Position of Intermediate code generator
• Intermediate code is the interface between front end
and back end in a compiler
Parser
Static
Checker
Intermediate Code
Generator
Code
Generator
Front end Back end
10. Variants of syntax trees - DAG
• Syntax tree is used to crate a DAG instead of tree for Expressions.
• A directed acyclic graph (DAG) is an AST with a unique node for each value.
• It can easily show the common sub-expressions and then use that knowledge during code
generation.
• Common sub-expressions has more than one parent. Ex. a and b-c
• Example: a+a*(b-c)+(b-c)*d
• Node a and (b-c) are unique nodes that values are using in two different context.
+
+ *
*
-
b c
a
d
11. DAG’s – using Array
• Algorithm
– Search the array for a node m with label op, left child l and right child r
– If there is such a node, return the value number m
– If not, create in the array a new node n with label op, left child l, and right child
r and return its value n.
– The search for m can be made more efficient by using k lists and using a hash
function to determine which lists to check.
=
+
1
i
id entry for i
num 10
+ 0 1
= 0 1
i := i + 1
Array of Records
0
1
2
3
4
10
Value-number method for constructing a node in a DAG
Input: label operator, left child, right child
Output: op, l, r
14. SDD for creating DAG’s
1) E -> E1+T
2) E -> E1-T
3) E -> T
4) T -> (E)
5) T -> id
6) T -> num
Grammar Productions Semantic Rules
{E.node= new mknode(‘+’, E1.node,T.node)}
{E.node= new mknode(‘-’, E1.node,T.node)}
{E.node = T.node}
{T.node = E.node}
{T.node = new mkleaf(id, id.entry)}
{T.node = new mkleaf(num, num.val)}
Example:
1) p1=mkleaf(id, entry-a)
2) P2=mkleaf(id, entry-a)=p1
3) p3=mkleaf(id, entry-b)
4) p4=mkleaf(id, entry-c)
5) p5=mknode(‘-’,p3,p4)
6) p6=mknode(‘*’,p1,p5)
7) p7=mknode(‘+’,p1,p6)
8) p8=mkleaf(id,entry-b)=p3
9) p9=mkleaf(id,entry-c)=p4
10) p10=mknode(‘-’,p3,p4)=p5
11) p11=mkleaf(id,entry-d)
12) p12=mknode(‘*’,p5,p11)
13) p13=mknode(‘+’,p7,p12)
15. Example
• To work out a+(b-c)+(b-c)
– Construct Syntax tree, DAG
P4
400
16. Exercise
• To construct syntax tree, DAG, array of records for the following
expression a := b*(-c)+b*(-c)
• Consider the SDD to produce syntax trees for assignment statements
17. Three address code: Addresses
• Three-address code is built from two concepts:
– addresses and instructions.
• Instruction is the statement or operation
– At most one operator on the right side of an instruction.
– 3-address code form:
x = y op z
• An address can be
– Identifier: source variable program name or pointer to the Symbol Table name.
– constant: Constants in the program.
18. Forms or types of three address instructions
• Assignment Statements ----- x := y op z
• Assignment instructions ----- x := op y
• Copy statements ----- x := y
• Unconditional jump ----- goto L
• Conditional jump ----- if x relop y goto L [relop are <, =, >= , etc.]
• Procedure calls: 3 address code generated for call of the procedure y=p(x1,x2,…,xn)
param x1
param x2,
…
param xn
y = call p, n
• Indexed assignments ------ x := y[i] and x[i] := y
• Address and pointer assignments ------ x := &y and x := *y and *x := y
19. Ex: Write Three Address Code for the block of statements
int a;
int b;
a = 5 + 2 * b;
Solution:
t0 = 5;
t1 = 2 * b;
a = t0 + t1;
20. Ex: Write Three Address Code for the if -else
if (A < B)
{
t = 1
}
else
{
t = 0
}
Solution-
(1) if (A < B) goto (4)
(2) t=0
(3) goto (5)
(4) t = 1
(5)
21. Ex: Write Three Address Code for the if-else
if (A < B) && (C < D)
{
t = 1
}
else
{
t = 0
}
Solution-
(1) if (A < B) goto (3)
(2) goto (4)
(3) if (C < D) goto (6)
(4) t = 0
(5) goto (7)
(6) t = 1
(7)
22. Ex: Write Three Address Code for the while statements
a=3; b=4; i=0;
while(i<n){
a= b+1;
a=a*a;
i++;
}
c=a;
Solution:
a=3;
b=4;
i=0;
L1:
T1=i<n;
if T1 goto L2;
goto L3;
L2:
T2=b+1;
a=T2;
T3=a*a;
a=T3
i++;
goto L1;
L3:
c=a;
23. Ex: Write Three Address Code for the switch case
switch (ch)
{
case 1 : c = a + b;
break;
case 2 : c = a – b;
break;
}
Solution-
if ch = 1 goto L1
if ch = 2 goto L2
L1:
T1 = a + b
c = T1
goto Last
L2:
T1 = a – b
c = T2
goto Last
Last:
26. Choice of allowable operators
• It is an important issue in the design of an intermediate form
• A small operator set is easier to implement
• Restricted instruction set may force front end to generate long
sequences of statements for some source language operations
• The optimizer and code generator may then have to work harder if
good code is to be generated
• Close enough to machine instructions to simplify code
generation
27. Example
do
i = i+1;
while (a[i*8] < v);
L: t1 = i + 1
i = t1
t2 = i * 8
t3 = a[t2]
if t3 < v goto L
Symbolic labels
100: t1 = i + 1
101: i = t1
102: t2 = i * 8
103: t3 = a[t2]
104: if t3 < v goto 100
Position numbers
28. Syntax tree vs. DAG vs. Three address code
• AST is the procedure’s parse tree with the nodes for most non-terminal symbols
removed.
• Directed Acyclic Graph is an AST with a unique node for each value.
• Three address code is a sequence of statements of the general form x := y op z
• In a TAC there is at most one operator at the right side of an instruction.
• Example:
t1 = b – c
t2 = a * t1
t3 = a + t2
t4 = t1 * d
t5 = t3 + t4
a+a*(b-c)+(b-c)*d
+
+ *
*
-
b c
a
d
AST DAG TAC
29. Data structures for three address
codes
• Implementations of Three-Address statements
– Quadruples
• Has four fields: op, arg1, arg2 and result
– Triples
• Temporaries are not used and instead references to
instructions are made
– Indirect triples
• In addition to triples we use a list of pointers to triples
30. Example
• a = b * uminus c + b * uminus c
(or)
a = b * (-c) + b * (-c)
t1 = uminus c
t2 = b * t1
t3 = uminus c
t4 = b * t3
t5 = t2 + t4
a = t5
Three address code
uminus
*
uminus c t3
*
+
=
c t1
b t2
t1
b t4
t3
t2 t5
t4
t5 a
arg1 result
arg2
op
Quadruples
uminus
*
uminus c
*
+
=
c
b (0)
b (2)
(1) (3)
a
arg1 arg2
op
Triples
(4)
0
1
2
3
4
5
uminus
*
uminus c
*
+
=
c
b (0)
b (2)
(1) (3)
a
arg1 arg2
op
Indirect Triples
(4)
0
1
2
3
4
5
(0)
(1)
(2)
(3)
(4)
(5)
op
35
36
37
38
39
40
31. Compare Quadruples, Triples &
Indirect Triples
• When instructions are moving around during optimizations:
quadruples are better than triples.
– Quadruple uses temporary variables
• Indirect triples solve this problem
33. More triple representation
• x[i]:=y
• x:=y[i] //Exercise
op arg1 arg2
(0) []= x i
(1) assign (0) y
op arg1 arg2
(0) []= y i
(1) Assign x (0)
34. Exercise
• To give 3 Address code representations for
a+a*(b-c)+(b-c)*d
– Quadruples?
– Triples?
– Indirect Triples?
35. Types and Declarations
• Type checking: Ensures the types of operands matches that is expected
by its context (operator).
– E.g. mod operation needs integer operands
• Determine the storage needed
• Calculate the address of an array reference
• Insert explicit type conversion
• When declarations are together, a single offset on the stack pointer
suffices.
• int x, y, z; fun1(); fun2();
• Otherwise, the translator needs to keep track of the current offset.
• int x; fun1(); int y, z; fun2();
36. Storage layout
• From the type, we can determine amount of
storage at run time.
• At compile time, we will use this amount to
assign its name a relative address.
• Type and relative address are saved in the
symbol table entry of the name.
• Data with length determined only at run time
saves a pointer in the symbol table.
37. Type Systems Design
• Design is based on syntactic constructs in the
language, notion of types and the rules for
assigning the types to language constructs.
• E.g.
– In arithmetic operation such as addition,
subtraction, multiplication and division, if both
operands are integers then result is also integer.
38. Type Expressions
• The type of a language construct is denoted by type expression.
• It is either a basic type or formed by applying an operator (type
constructor) to other type expressions.
39. Equivalence of Type Expression
• Two types are structurally equivalent iff one of the following
conditions is true.
• They are the same basic type.
• They are formed by applying the same constructor to structurally
equivalent types.
• One is a type name that denotes the other.
• int a[2][3] is not equivalent to int b[3][2];
• int a is not equivalent to char b[4];
• struct {int, char} is not equivalent to struct {char, int};
• int * is not equivalent to void *.
40. Type checking rules for expressions
• Basic type expressions: Boolean, char, integer, float
• Eliteral {E.type = char }
• E num {E.type = integer}
• E id {E.type = lookup(id.entry) }
41. Type checking Expressions (cont..d)
• Special Basic type expressions: type_error (raise the error
during type checking) and void.
E E1 mod E2 { E.type = { if E1.type = = integer and
E2.type = = integer then
integer
else
type_error }
}
• Type name is a type expression
Ea {E.name = a}
42. Type checking Expressions (cont..d)
• A type expression can be formed by applying the array type
constructor to a number and a type expression
Constructors include:
• Arrays: If T is a type expression then array(I,T) is a type expressions
denotes array with type T and index set I.
– int a[2][3] is array of 2 arrays of 3 integers.
– In functional style: array(2, array(3, int))
EE1[E2] {E.type = {if E2.type = = integer and
E1.type = array(I, T) then
T
else
type_error
43. Type checking Expressions (cont..d)
• Product: If s and t are type expressions, then their Cartesian product s*t
is a type expression
• Records: A record is a data structure with named field. It applied to a
tuple formed from field names and field types.
• E.g.
type row = record
{
address: integer;
lexeme : array[1..15] of caht
}
var table: array[1…101] of row;
It declares the type name row denotes the type expression record ( (address
x integer) x (lexeme x array(1..15, char)) )
The variable table is array of records of this type.
44. Type checking Expressions (cont..d)
• Assignment Statements: E may be Arithmetic, Logical, Relational
expression
Sid = E { S.type = {if id.type = E.type then
void
else
type_error} }
• If Statements:
Sif E then S1 {S.type = { if E.type = Boolean then
S1.type
else
type_error } }
45. Type checking Expressions (cont..d)
• While Statements:
Swhile E do S1 {S.type = { if E.type = Boolean then
S1.type
else
type_error
}
}
• Pointers: If T is a type expression, then pointer(T) is a type expression
(i.e. pointer to an object of type T).
E*E1 {E.type = {if E.type = ptr(T) then
T
else
type_error } }
46. Type checking Expressions (cont..d)
• Functions: It maps a domain type D to a range type R. The type of such
function is denoted by type expression DR.
Mapping: T DR {T.type = D.type R.type }
Function call: E E1 (E2) {E.type = {if E2.type = T1 and
E1.type = T1 T2 then
T2
else
type_error }}
47. Type checking rules for coercions
• Implicit type conversions (by Compiler) and Explicit type
conversions (by programmer)
EE1 op E2 {E.type = {if E1.type = integer and
E2.type = integer then integer
else if E1.type = integer and
E2.type = float then float
else if E1.type = float and
E2.type = integer then float
else if E1.type = float and
E2.type = float then float
else type_error
51. Type Checking
• Type expressions are checked for
– Correct code
– Security aspects
– Efficient code generation
52. Reference
• A.V. Aho, M.S. Lam, R. Sethi, J. D. Ullman,
Compilers Principles, Techniques and Tools,
Pearson Edition, 2013.
P. Kuppusamy - Lexical Analyzer