SYSTEM SOFTWARE PCODE
R. V. COLLEGE OF ENGINEERING
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
(Autonomous Institution Affiliated to VTU)
Bangalore – 560 059
Assignment Report on P-Code Compiler
Bachelor of Engineering in Computer Science & Engineering
By Sandeep R.V
1RV10CS089
(Academic Year: 2012-13)
1. INTRODUCTION
Interpretive compilers are translators for high-level languages. Such translators produce (as
output) intermediate code (P-code, for example) which is intrinsically simple enough to satisfy the
constraints imposed by a practical interpreter, even though it may still be quite a long way from
the machine code of the system on which it is desired to execute the original program. Rather than
continue translation to the level of machine code, an alternative approach that may perform
acceptably well is to use the intermediate code as part of the input to a specially written
interpreter. This in turn "executes" the original algorithm, by simulating a virtual machine for
which the intermediate code effectively is the machine code.
Fig. 1.1 (diagram labels: Machine Independent, Machine Dependent)
2. P-CODE
P-Code (Portable Code) is an assembly language for a hypothetical stack machine.
One such kind is bytecode. Bytecodes are compact numeric codes, constants, and references
(normally numeric addresses), which encode the result of parsing and semantic analysis of things
like the type, scope, and nesting depth of program objects. They therefore allow much better
performance than direct interpretation of source code.
The name bytecode stems from instruction sets that have one-byte opcodes followed by
optional parameters.
Example: Java bytecode
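As a rough illustration of one-byte opcodes followed by optional parameters, the sketch below splits a byte string into instructions. The opcode numbers, mnemonics, and operand widths are invented for this example; they do not correspond to Java bytecode or to any real instruction set.

```python
# Hypothetical opcode table: opcode byte -> (mnemonic, number of operand bytes).
OPCODES = {
    0x01: ("PUSH", 1),  # push the following byte as a constant
    0x02: ("ADD", 0),   # no operand
    0x03: ("JMP", 2),   # two-byte big-endian target address
}


def disassemble(code: bytes):
    """Split a byte string into (mnemonic, operand) pairs."""
    result = []
    pc = 0
    while pc < len(code):
        mnemonic, width = OPCODES[code[pc]]
        operand_bytes = code[pc + 1 : pc + 1 + width]
        operand = int.from_bytes(operand_bytes, "big") if width else None
        result.append((mnemonic, operand))
        pc += 1 + width  # skip the opcode byte and its operand bytes
    return result


program = bytes([0x01, 5, 0x01, 7, 0x02, 0x03, 0x00, 0x00])
listing = disassemble(program)
```

Because every opcode fits in one byte, the decoder only needs a table lookup to know how many parameter bytes follow.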
3. P-CODE COMPILER
• P-code compilers (also called bytecode compilers) are very similar in concept to
interpreters
• The source program is analyzed and converted into an intermediate form, which is
then executed interpretively
• With a P-code compiler, this intermediate form is the machine language for a
hypothetical machine, often called a pseudo-machine
• P-code object programs can be executed on any machine that has a P-code
interpreter
• The P-code object program is often much smaller than a corresponding machine
code (native code) program would be
Fig 3.1 Translation and Execution Using a P-code Compiler
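The two phases named in Fig 3.1, translation and interpretive execution, can be sketched in a few lines of Python. The nested-tuple input format and the function names compile_expr and interpret are assumptions made for this sketch; the mnemonics LDCI, ADI, SBI and MPI are modelled on Pascal P-code, but the representation is simplified.

```python
import operator

# Map source operators to P-code-style mnemonics and to their meaning.
MNEMONIC = {"+": "ADI", "-": "SBI", "*": "MPI"}
ACTION = {"ADI": operator.add, "SBI": operator.sub, "MPI": operator.mul}


def compile_expr(node):
    """Translate a nested tuple such as ("-", 9, ("*", 2, 3)) into stack code."""
    if isinstance(node, int):
        return [("LDCI", node)]  # load an integer constant
    op, left, right = node
    return compile_expr(left) + compile_expr(right) + [(MNEMONIC[op], None)]


def interpret(code):
    """Execute the stack code on a simulated pseudo-machine."""
    stack = []
    for mnemonic, argument in code:
        if mnemonic == "LDCI":
            stack.append(argument)
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(ACTION[mnemonic](left, right))
    return stack.pop()


pcode = compile_expr(("-", 9, ("*", 2, 3)))
result = interpret(pcode)
```

The list held in pcode plays the role of the P-code object program: it could be stored and later run on any machine that provides an equivalent of interpret.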
4. P-CODE MACHINE
A portable code machine (p-code machine) is a virtual machine designed to execute p-code.
This term is applied both generically to all such machines (such as the Java Virtual Machine
and MATLAB precompiled code), and to specific implementations, the most famous being the
p-Machine of the Pascal-P system, particularly the UCSD Pascal implementation.
4.1. Machine architecture
The P-code machine is similar to a conventional computer in that it consists of a processor
and a memory. A major difference is that many operations performed by the processor involve the
stack, which is part of memory. For example, a procedure call entails manipulating various items
such as parameters and return addresses, and these are held on the stack. The machine instructions,
called P-code, are stored in the memory and accessed in the normal manner.
The processor has a defined instruction set as well as five registers, which have distinct
functions for controlling the instructions and the stack areas within the memory.
The registers are:
. PC the program counter;
. SP the stack pointer;
. MP the mark stack pointer;
. NP the new pointer;
. EP the extreme stack pointer.
The program counter, PC, is a pointer to the current instruction being executed. Other
registers are used to control the remaining area of memory - the data store.
The data store is divided into three areas as shown in Fig. 4.1.1
Fig 4.1.1
The constants area is generated by the assembler and accessed by the code.
The heap grows towards the low-numbered locations of store and the NP pointer points at
free heap space. Data is put onto the heap during a call to the new procedure and removed by
using mark and release. The stack area has a further internal structure and is used to hold stack
frames. A stack frame is generated and placed on the stack each time a procedure or function is
called (in the following section, except where explicitly stated, references to procedures also
include functions). The first stack frame is the exception in that it belongs to the program block.
The stack frames on the stack are shown in Fig. 4.1.2.
Fig 4.1.2.
4.2. The Structure of Stack Frames
The stack frame format with its associated registers is shown in Fig 4.2.1
Fig 4.2.1
The mark stack carries the administrative details for executing and returning from
a procedure and for maintaining the necessary links to access variables in outer levels:
. The function value is simply an area used for returning the function value to the
calling block and so is only used by functions; in practice this area must be
capable of holding the largest possible value that may be returned by a
function;
. The static link is used as a pointer to access variables in outer blocks; it points to
the base of the stack frame of the procedure that textually encloses it;
. The dynamic link is the previous MP value so that the current stack frame may
be removed when the execution of the called procedure is complete;
. The previous EP value is similarly used for resetting the EP register when this
procedure returns;
. The return address enables program control to be returned to the calling
procedure.
The following example program contains a nested recursive procedure and
the schematic diagram shows the static and dynamic links on the stack at the time
the program halts. The static link of a procedure will always point to the stack
frame of the textually enclosing procedure (q will always point to p, p will always
point to the program); the dynamic link will always point to the stack frame of the
calling procedure (in this example, q will point sometimes to p and sometimes to
q):
program main(input,output);
var j: integer;
procedure p;
  var i: integer;
  procedure q;
  begin
    if i > 0 then begin
      i := i-1; j := j+1;
      q (*procedure q is called recursively here*)
    end
    else write(input, 0) (*execution error to halt program*)
  end;
begin
  i := 2;
  q
end;
begin
  j := 0;
  p
end.
Fig 4.2.2
If p were called recursively before calling q, the static link of q would then
point to the most recent stack frame of p.
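The links described for the example program can be reproduced with a small simulation. This is only a sketch: the frame records, the enter helper, and the use of list indices as frame addresses are assumptions of this example, not part of the P-code machine.

```python
class Halt(Exception):
    """Raised where the Pascal program deliberately stops with an error."""


frames = []  # the simulated stack: one record per active block


def enter(name, static_link, dynamic_link):
    """Push a frame record and return its index (used here as its address)."""
    frames.append({"name": name, "static": static_link, "dynamic": dynamic_link})
    return len(frames) - 1


def main():
    main_frame = enter("main", None, None)
    j = 0

    def p(caller):
        p_frame = enter("p", static_link=main_frame, dynamic_link=caller)
        i = 2

        def q(caller):
            nonlocal i, j
            # Static link: the textually enclosing p. Dynamic link: the caller.
            q_frame = enter("q", static_link=p_frame, dynamic_link=caller)
            if i > 0:
                i -= 1
                j += 1
                q(q_frame)  # recursive call: the caller is this q frame
            else:
                raise Halt  # frames stay on the stack, as when the program halts

        q(p_frame)

    p(main_frame)


try:
    main()
except Halt:
    pass

links = [(f["name"], f["static"], f["dynamic"]) for f in frames]
```

At the halt there are three frames for q: each has a static link to the frame of p (index 1), while the dynamic links point to p, then to the first q, then to the second q.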
The extreme stack pointer, EP, points to the top of the current stack frame. It protects the
stack frame from NP, so that SP does not have to be checked each time it is incremented, and it
also marks the point where a new stack frame can be placed should a procedure be called from the
current one. The maximum possible size of each stack frame is known at compile time and this is
reflected by the value of EP.
The local stack is used for holding temporary values during the execution of a procedure.
For example, the SBI instruction takes the two values from the top of this stack, subtracts them, and
pushes the result back onto the stack. The stack pointer is adjusted to reflect the fact that there
is now one value fewer on the stack.
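The effect of SBI on the local stack and on SP can be sketched as follows. The store size, the convention that SP indexes the top cell, and the operand order (second value minus top value) are assumptions of this sketch.

```python
store = [0] * 8  # a tiny data store; only the local stack is modelled
sp = -1          # SP indexes the top stack cell; -1 means the stack is empty


def push(value):
    """Place a value on top of the stack and advance SP."""
    global sp
    sp += 1
    store[sp] = value


def sbi():
    """Subtract integer: replace the two top cells with (second - top)."""
    global sp
    sp -= 1
    store[sp] = store[sp] - store[sp + 1]


push(10)
push(4)
sp_before = sp
sbi()
sp_after = sp
top_value = store[sp]
```

After sbi() the stack pointer has dropped by one and the top cell holds the difference of the two former top values.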
4.3. Instruction set
A typical P-Code assembly instruction looks like OPC [T] [P] [Q]
. OPC is a three-letter opcode mnemonic.
. T is a letter from {A, B, C, I, R, S, etc.} for the types address, boolean, character, integer, real,
set, etc.
. P is usually an integer specifying a static level for addressing.
. Q is an integer or label.
P-Code instructions contain four fields, although not all are always used. The OP field must
be at least 7 bits long to contain the opcode. The T field must be at least 4 bits to contain the
operand type. The P field is at least 4 bits; usually it contains a static nesting level for describing
operand location. The Q field contains enough bits to address all of STORE or CODE. It usually
contains an absolute address, a data segment displacement, a jump destination address, or a short
constant.
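The four fields can be packed into a single machine word with shifts and masks. In the sketch below the OP, T and P widths are the minimum widths quoted above, while the 16-bit width chosen for Q and the field order within the word are assumptions made for illustration.

```python
# Field widths in bits. Q_BITS = 16 is an assumption for this sketch.
OP_BITS, T_BITS, P_BITS, Q_BITS = 7, 4, 4, 16


def pack(op, t, p, q):
    """Pack the four fields into one integer, OP in the highest bits."""
    word = 0
    for value, bits in ((op, OP_BITS), (t, T_BITS), (p, P_BITS), (q, Q_BITS)):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"value {value} does not fit in {bits} bits")
        word = (word << bits) | value
    return word


def unpack(word):
    """Recover (op, t, p, q) from a packed word."""
    q = word & ((1 << Q_BITS) - 1)
    word >>= Q_BITS
    p = word & ((1 << P_BITS) - 1)
    word >>= P_BITS
    t = word & ((1 << T_BITS) - 1)
    word >>= T_BITS
    return word, t, p, q


word = pack(op=45, t=3, p=1, q=300)
fields = unpack(word)
```

With these widths an instruction occupies 31 bits, so it fits in a 32-bit word.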
4.3.1. Instructions for calling procedures and functions
There are three P-code instructions for building stack frames: MST and CUP are used
before entering the procedure, and ENT is used for the first two instructions within the procedure.
The detailed operation of these instructions is as follows.
A typical calling sequence is
MST 0
[code to load parameters]
CUP 0 L 3
The operand of the MST (mark stack) instruction is an indication of the depth of nesting of
the given procedure and is defined as one plus the level of the calling procedure minus the level
of the called procedure. This is used for calculating the static link.
MST 0 means: the static link is the stack frame that called you; MST 1 means the static link
is the static link of the stack frame that called you; MST 2 would use the static link of that frame,
and so on.
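The rule for the MST operand and the way it selects the static link can be written out as a short sketch. The frame records, the level numbers (main taken as level 0), and the function names are assumptions of this example.

```python
def mst_operand(caller_level, callee_level):
    """One plus the level of the caller minus the level of the called procedure."""
    return 1 + caller_level - callee_level


def static_link_for_call(frames, caller_frame, operand):
    """Start at the caller's frame and follow its static chain `operand` times."""
    base = caller_frame
    for _ in range(operand):
        base = frames[base]["static"]
    return base


# Frames of the earlier example at the moment q is first running.
frames = [
    {"name": "main", "level": 0, "static": None},
    {"name": "p", "level": 1, "static": 0},
    {"name": "q", "level": 2, "static": 1},
]

# p (level 1) calls q (level 2): MST 0, so the static link is p's own frame.
operand_p_calls_q = mst_operand(1, 2)
link_first_q = static_link_for_call(frames, caller_frame=1, operand=operand_p_calls_q)

# q (level 2) calls q (level 2): MST 1, so the static link is the caller's static link.
operand_q_calls_q = mst_operand(2, 2)
link_recursive_q = static_link_for_call(frames, caller_frame=2, operand=operand_q_calls_q)
```

In both cases the new frame for q ends up linked statically to the frame of p, which is the textually enclosing procedure.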
The execution of MST creates values for the static and dynamic links and saves the EP. As
just explained, it assigns the value of the static link to point to the stack frame of the procedure
that textually encloses this one. The dynamic link is assigned the current value of the mark stack
pointer and the current value of the extreme stack pointer is stored. The stack pointer is
incremented so as to point at the parameter area. This is in readiness for subsequent instructions
which may load parameters.
The CUP (call user procedure) instruction sets the new value of MP and the link for the
return address and finally causes the jump to the procedure concerned.
At the start of the procedure there are two ENT instructions which define the overall size of
the stack frame and adjust SP and EP accordingly.
ENT 1 L 7
ENT 2 L 8
The labels L 7 and L 8 point to the values to be used by the ENT instructions.
The procedure execution may now proceed. At conclusion of the procedure, the RET
(return) instruction provides the mechanism for returning to the calling procedure and removing
stack frames.
The 'main' program block is handled in the same way as procedures as regards the stack,
except that there are four locations reserved above the mark stack for files. These locations have
fixed addresses as shown in Fig. 4.3.1.1, and are manipulated in the same way as global variables.
The files may be regarded as parameters to the program.
Fig 4.3.1.1
Fig 4.3.1.2
Key to effect on stack:
a address
b boolean
c character
i integer
r real
s set
x any of the above types
The c in instruction names is a single character denoting one of the primitive types A, B, C,
I, R, S, and matches the x in the effect column.
5. UCSD p-Machine
5.1. Architecture
Like many other p-code machines, the UCSD p-Machine is a stack machine, which means
that most instructions take their operands from the stack, and place results back on the stack.
Thus, the add instruction replaces the two topmost elements of the stack with their sum. A few
instructions take an immediate argument. Like Pascal, the p-code is strongly typed, supporting
boolean (b), character (c), integer (i), real (r), set (s), and pointer (a) types natively.
Some simple instructions:
Fig 5.1.1
5.2. Environment
Unlike other stack-based environments (such as the Java Virtual Machine) the p-System has
only one stack shared by procedure stack frames (providing return address, etc.) and the
arguments to local instructions.
Three of the machine's registers point into the stack (which grows upwards):
. SP points to the top of the stack (the stack pointer).
. MP marks the beginning of the active stack frame (the frame pointer).
. EP points to the highest stack location used in the current procedure.
Also present is a constant area, and, below that, the heap growing down towards the stack.
The NP register points to the top (lowest used address) of the heap. When EP gets greater than
NP, the machine's memory is exhausted.
The fifth register, PC, points at the current instruction in the code area.
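The interplay between NP and EP can be sketched with a small model of heap allocation. The class name DataStore, the method names, and the exact boundary test are assumptions of this sketch; the check used here refuses an allocation as soon as the heap would reach the cell EP points to, which is slightly stricter than the "EP greater than NP" wording above.

```python
class DataStore:
    """Models only the two registers that guard the gap between stack and heap."""

    def __init__(self, size, ep):
        self.ep = ep    # highest stack location the current procedure may use
        self.np = size  # empty heap: NP sits just past the last cell

    def new(self, cells):
        """Allocate `cells` locations by moving NP towards the stack."""
        address = self.np - cells
        if address <= self.ep:  # assumed boundary test, see the lead-in
            raise MemoryError("store overflow: heap would run into the stack")
        self.np = address
        return address

    def mark(self):
        """Remember the current heap top so it can be restored later."""
        return self.np

    def release(self, saved_np):
        """Discard everything allocated since the matching mark."""
        self.np = saved_np


data = DataStore(size=32, ep=20)
saved = data.mark()
first = data.new(4)
second = data.new(6)
try:
    data.new(2)
    overflowed = False
except MemoryError:
    overflowed = True
data.release(saved)
np_after_release = data.np
```

Because the heap grows downwards, each call to new returns a lower address than the one before, and release simply moves NP back up.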
5.3. Calling conventions
Stack frames look like this:
Fig 5.3.1
The procedure calling sequence works as follows:
The call is introduced with:
mst n
where n specifies the difference in nesting levels. This instruction will mark the stack, i.e. reserve
the first five cells of the above stack frame, and initialise previous EP, dynamic, and static link.
The caller then computes and pushes any parameters for the procedure, and then issues
cup n, p
to call a user procedure (n being the number of parameters, p the procedure's address). This will
save the PC in the return address cell, and set the procedure's address as the new PC.
User procedures begin with the two instructions:
ent 1, i
ent 2, j
The first sets SP to MP + i, the second sets EP to SP + j. So i essentially specifies the space
reserved for locals (plus the number of parameters plus 5), and j gives the number of entries
needed locally for the stack. Memory exhaustion is checked at this point.
Returning to the caller is accomplished via
retC
with C giving the return type (i, r, c, b, a as above, and p for no return value). The return value has
to be stored in the appropriate cell beforehand. For all types except p, returning will leave this
value on the stack.
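The calling sequence above (mst, parameter pushes, cup, the two ent instructions, and a return without value) can be traced with a small register model. This is a sketch under stated assumptions: the cell order inside the mark stack follows the order in which the cells are listed in section 4.2, SP is taken to index the top stack cell, the static link is passed in already resolved, and the class and method names are invented for the example.

```python
# Assumed cell offsets inside the five-cell mark stack, relative to MP.
FUNCTION_VALUE, STATIC_LINK, DYNAMIC_LINK, PREVIOUS_EP, RETURN_ADDRESS = range(5)
MARK_SIZE = 5


class Machine:
    def __init__(self, size=64):
        self.store = [0] * size
        self.pc = self.mp = self.ep = 0
        self.sp = -1    # index of the top stack cell
        self.np = size  # empty heap

    def mst(self, static_base):
        """Mark the stack: reserve five cells and fill in the links."""
        base = self.sp + 1
        self.store[base + STATIC_LINK] = static_base
        self.store[base + DYNAMIC_LINK] = self.mp
        self.store[base + PREVIOUS_EP] = self.ep
        self.sp += MARK_SIZE

    def push(self, value):
        self.sp += 1
        self.store[self.sp] = value

    def cup(self, n_params, address):
        """Call user procedure: set MP, save the return address, jump."""
        self.mp = self.sp - (n_params + MARK_SIZE - 1)
        self.store[self.mp + RETURN_ADDRESS] = self.pc
        self.pc = address

    def ent1(self, i):
        self.sp = self.mp + i

    def ent2(self, j):
        self.ep = self.sp + j
        if self.ep > self.np:
            raise MemoryError("store overflow")

    def retp(self):
        """Return without a value: drop the frame and restore the registers."""
        self.sp = self.mp - 1
        self.pc = self.store[self.mp + RETURN_ADDRESS]
        self.ep = self.store[self.mp + PREVIOUS_EP]
        self.mp = self.store[self.mp + DYNAMIC_LINK]


m = Machine()
m.sp, m.mp, m.ep, m.pc = 9, 0, 12, 100  # pretend the caller's frame already exists
before = (m.sp, m.mp, m.ep, m.pc)

m.mst(static_base=0)
m.push(42)
m.push(7)
m.cup(n_params=2, address=200)
m.ent1(9)  # 5 mark-stack cells + 2 parameters + 2 locals
m.ent2(3)
inside = (m.sp, m.mp, m.ep, m.pc)

m.retp()
after = (m.sp, m.mp, m.ep, m.pc)
```

The trace shows that retp restores all four registers to the values they had before the call, because every value it needs was saved in the mark stack by mst and cup.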
Instead of calling a user procedure (cup), a standard procedure q can be called with
csp q
These standard procedures are Pascal procedures such as readln() (csp rln), sin() (csp sin),
etc.
Advantages of a P-code compiler:
• Portability of software: it is not necessary for the compiler to generate different code for
different computers.
• The source version of the compiler can itself be compiled into p-code; this p-code can then be
interpreted on another computer.
• It is easy to provide a compiler for each different machine, since only a P-code interpreter
has to be written for it.
• The p-code object program is often much smaller than a corresponding machine code program.
Disadvantages:
• The interpretive execution of a P-code program may be much slower than the
execution of the equivalent machine code.
REFERENCES:
• http://www.decompiler-vb.net/blog/post/The-truth-about-P-Code.aspx
• Pascal Implementation by Steven Pemberton and Martin Daniels, Chapter 10,
http://homepages.cwi.nl/~steven/pascal/book/10pcode.html
• http://pascal-central.com/pcode.html#What%20Is%20P-Code
• http://www.itechtalk.com/thread135.html
• A Pascal P-Code Interpreter for the Stanford Emmy, by Donald Alpert,
Technical Note No. 164.