Computer organization-and-architecture-questions-and-answers

QUESTIONS AND ANSWERS
FOR
COMPUTER ORGANIZATION
AND ARCHITECTURE
G.Appasami, M.Sc., M.C.A., M.Phil., M.Tech.,
Assistant Professor
Department of Computer Science and Engineering
Dr. Pauls Engineering Collage
Pauls Nagar, Villupuram
Tamilnadu, India.
SARUMATHI PUBLICATIONS
No. 109, Pillayar Kovil Street, Periya Kalapet
Pondicherry – 605014, India
Phone: 0413 – 2656368
Mobile: 9786554175, 8940872274

First Edition: July 2010
Second Edition: July 2011
Published By
© All rights reserved. No part of this publication can be reproduced or stored in any form or
by means of photocopy, recording or otherwise without the prior written permission of the
author.
Price Rs. 160/-
Copies can be had from
No. 109, Pillayar Kovil Street, Periya Kalapet
Phone: 0413 – 2656368
Mobile: 9786554175, 8940872274
Printed at
Meenam Offset

ANNA UNIEVRSITY SYLLABUS
CS 2253 COMPUTER ORGANIZATION AND ARCHITECTURE
(Common to CSE & IT)
1. BASIC STRUCTURE OF COMPUTERS 9
Functional units – Basic operational concepts – Bus structures – Performance and metrics –
Instructions and instruction sequencing – Hardware – Software Interface – Instruction set
architecture – Addressing modes – RISC – CISC. ALU design – Fixed point and floating point
operations.
2. BASIC PROCESSING UNIT 9
Fundamental concepts – Execution of a complete instruction – Multiple bus organization –
Hardwired control – Micro programmed control – Nano programming.
3. PIPELINING 9
Basic concepts – Data hazards – Instruction hazards – Influence on instruction sets – Data path and
control considerations – Performance considerations – Exception handling.
4. MEMORY SYSTEM 9
Basic concepts – Semiconductor RAM – ROM – Speed – Size and cost – Cache memories –
Improving cache performance – Virtual memory – Memory management requirements –
Associative memories – Secondary storage devices.
5. I/O ORGANIZATION 9
Accessing I/O devices – Programmed Input/Output -Interrupts – Direct Memory Access – Buses –
Interface circuits – Standard I/O Interfaces (PCI, SCSI, USB), I/O devices and processors.
TOTAL = 45

TABLE OF CONTENTS
UNIT 1: BASIC STRUCTURE OF COMPUTERS
1.1 Explain functional unit of a computer 1
1.2 Discuss basic operational concepts 4
1.3 Explain bus structure of a computer 6
1.4 Explain performance and metrics 9
1.5 Explain instruction and instruction sequencing 13
1.6 Discuss Hardware – Software Interface 19
1.7 Explain various addressing modes 22
1.8 Discuss RISC and CISC 25
1.9 Design ALU 29
UNIT 2: BASIC PROCESSING UNIT
2.1 Explain the process Fundamental concepts 31
2.2 Explain the process complete instruction execution 36
2.3 Discuss multiple bus organization 38
2.4 Discuss Hardwired control 40
2.5 Micro programmed control 43
2.6 Explain Nano programming 47
UNIT 3: PIPELINING
3.1 Explain Basic concepts of pipeline 49
3.2 Discus Data hazards 54
3.3 Discus Instruction hazards 58
3.4 Discus Influence on Instruction Sets 66
3.5 Explain Datapath and control considerations 69
3.6 Explain Performance consideration 71
3.7 Discuss Exception Handling 73

UNIT 4: MEMORY SYSTEM
4.1 Explain some basic concepts of memory system 76
4.2 Discuss Semiconductor RAM Memories 77
4.3 Discuss Read Only Memories (ROM) 87
4.4 Discuss speed, size and cost of Memories 89
4.5 Discuss about Cache memories 91
4.6 explain improving cache performance 96
4.7 Explain virtual memory 100
4.8 Explain Memory management requirements 104
4.9 Write about Associative memory 105
4.10 Discuss Secondary memory 108
UNIT 5: I/O ORGANIZATION
5.1 Discuss Accessing I/O Devices 118
5.2 Explain Program-controlled I/O 120
5.3 Explain Interrupts 121
5.4 Explain direct memory access (DMA) 129
5.5 Explain about Buses 134
5.6 Discuss interface circuits 139
5.7 Describe about Standard I/O Interface 147
5.8. Discuss I/O device 160
5.9 Discuss Processors 161
Appendix: Short answers for questions

G. APPASAMI / COA / CSE / DR. PAULS ENGINEERING COLLEGE 1
1.1 Explain functional unit of a computer.
A computer consists of five functionally independent main parts: input, memory, arithmetic
and logic, output, and control units, as shown in Figure 1.1. The input unit accepts coded
information from human operators, from electromechanical devices such as key boards, or from
other computers over digital communication lines. The information received is either stored in the
computer's memory for later reference or immediately used by the arithmetic and logic circuitry to
perform the desired operations. The processing steps are determined by a program stored in the
memory. Finally, the results are sent back to the outside world through the output unit. All of these
actions are coordinated by the control unit. Figure 1.1 does not show the connections among the
functional units. The arithmetic and logic circuits, in conjunction with the main control circuits, as
the processor, input and output equipment is often collectively referred to as the input-output (I/O)
unit.
Figure 1.1 Basic functional unit of a computer.
Input Unit
Computers accept coded information through input units, which read data. The most well-
known input device is the keyboard. Whenever a key is pressed, the corresponding letter or digit is
automatically translated into its corresponding binary code and transmitted over a cable to either
the memory or the processor.
Many other kinds of input devices are available, including joysticks, trackballs, and
mouses, which can be used as pointing device. Touch screens are often used as graphic input
devices in conjunction with displays. Microphones can be used to capture audio input which is
then sampled and converted into digital codes for storage and. processing. Cameras and scanners
are used as to get digital images.
Memory Unit
The function of the memory unit is to store programs and data. There are two classes of
storage, called primary and secondary.
Primary storage is a fast memory that operates at electronic speeds. Programs must be
stored in the memory while they are being executed. The memory contains a large number of
semiconductor storage cells, each capable of storing one bit of information. These cells are rarely
read or written as individual cells but instead are processed in groups of fixed size called words.
The memory is organized so that the contents of one word, containing n bits, can be stored or
retrieved in one basic operation.
Processor
Arithmetic
and
logic unit
Control unit
Input
unit
Output
unit
Memory unit

To provide easy access to any word in the memory, a distinct address is associated with
each word location. Addresses are numbers that identify successive locations. A given word is
accessed by specifying its address and issuing a control command that starts the storage or retrieval
process. The number of bits in each word is often referred to as the word length of the computer.
Typical word lengths range from 16 to 64 bits. The capacity of the memory is one factor that
characterizes the size of a computer.
Programs must reside in the memory during execution. Instructions and data can be written
into the memory or read out under the controller of the processor. It is essential to be able to access
any word location in the memory as quickly as possible. Memory in which any location can be
reached in a short and fixed amount of time after specifying its address is called random-access
Memory (RAM). The time required to access one word is called the Memory access time. This
time is fixed, independent of the location of the word being accessed. It typically ranges from a
few nanoseconds (ns) to about 100 ns for modem RAM units. The memory of a computer is
normally implemented as a Memory hierarchy of three or four levels of semiconductor RAM units
with different speeds and sizes. The small, fast, RAM units are called caches. They are tightly
coupled with the processor and are often contained on the same integrated circuit chip to achieve
high performance. The largest and slowest unit is referred to as the main Memory.
Although primary storage is essential, it tends to be expensive. Thus additional, cheaper,
secondary storage is used when large amounts of data and many programs have to be stored,
particularly for information that is access infrequently. A wide selection of secondary storage
devices is available, including magnetic disks and tapes and optical disks
Arithmetic and Logic Unit.
Most computer operations are executed in the arithmetic and logic unit (ALU) of the
processor. Consider a typical example: Suppose two numbers located in the memory are to be
added. They are brought into the processor, and the actual addition is carried out by the ALU. The
sum may then be stored in the memory or retained in the processor for immediate use.
Any other arithmetic or logic operation, for example, multiplication, division, or
comparison of numbers, is initiated by bringing the required operands into the processor, where the
operation is performed by the ALU. When operands are brought into the processor, they are stored
in high-speed storage elements called registers. Each register can store one word of data. Access
times to registers are somewhat faster than access times to the fastest cache unit in the memory h-
ierarchy.
The control and the arithmetic and logic units are many times faster than other devices
connected to a computer system. This enables a single processor to control a number of external
devices such as keyboards, displays, magnetic and optical disks, sensors, and mechanical
controllers.

Output Unit
The output unit is the counterpart of the input unit. Its function is to send processed results
to the outside world. The most familiar example of such a device is a printer. Printers employ
mechanical impact heads, inkjet streams, or photocopying techniques, as in laser printers, to
perform the printing. It is possible to produce printers capable of printing as many as 10,000 lines
per minute. This is a tremendous speed for a mechanical device but is still very slow compared to
the electronic speed of a processor unit.
Monitors, Speakers, Headphones and projectors are also some of the output devices. Some
units, such as graphic displays, provide both an output function and an input function. The dual
role of input and output of such units are referred with single name as I/O unit in many cases.
Speakers, Headphones and projectors are some of the output devices. Storage devices such as hard
disk, floppy disk, flash drives are also used for input as well as output.
Control Unit
The memory, arithmetic and logic, and input and output 'units store and process
information and perform input and output operations. The operation of these units must be
coordinated in some way. This is the task of the control unit. The control unit is effectively the
nerve center that sends control signals to other units and senses their states.
I/O transfers, consisting of input and output operations, controlled by the instructions of
I/O programs that identify the devices involved and the information to be transferred. However,
the actual timing signals that govern the transfers are generated by the control circuits. Timing
signals are signals that determine when a given action is to take place. Data transfers between the
processor and the memory are also controlled by the control unit through timing signals. It is
reasonable to think of a control unit as a well-defined, physically separate unit that interacts with
other parts of the machine. In practice, however, this is seldom the case. Much of the control
circuitry is physically distributed throughout the machine. A large set of control lines (wires)
carries the signals used for timing and synchronization of events in all units.
The operation of a computer can be summarized as follows:
1. The computer accepts information in the form of programs and data through an input unit
and stores it in the memory.
2. Information stored in the memory is fetched, under program control, into an arithmetic and
logic unit, where it is processed.
3. Processed information leaves the computer through an output unit.
4. All activities inside the machine are directed by the control unit.

1.2 Explain basic operations of a computer.
1.2 Discuss basic operational concepts.
The processor contains arithmetic logic unit (ALU), control circuitry unit (CU) and a
number of registers used for several different purposes. The instruction register (IR) holds the
instruction that is currently being executed. Its output is available to the control circuits, which
generate the timing signals that control the various processing elements involved in executing the
instruction.
The program counter (PC) is another specialized register. It keeps track of the execution of
a program. "It contains the memory address of the next instruction to be fetched and executed.
During the execution of an instruction, the contents of the PC are updated to correspond to the
address of the next instruction to be executed. It is customary to say that the PC points to the next
instruction that is to be fetched from the memory. Besides the IR and PC, Figure 1.2 shows n
general-purpose registers, R0 through Rn-1.
Figure 1.2 Connections between processor and memory
Finally, two registers facilitate communication with the memory. These are the memory
address register (MAR) and the memory data register (MDR). The MAR holds the address of the
location to be accessed. The MDR contains the data to be written into or read out of the addressed
location.
Let us now consider some typical operating steps. Programs reside in the memory and
usuaJ1y get there through the input unit Execution of the program starts when the PC is set to point
to the first instruction of the program. The contents of the PC are transferred to the MAR and a
Read control signal is sent to the memory. After the time required to access the memory elapses,
the addressed word (in this case, the first instruction of the program) is read out of the memory and
loaded into the MDR. Next, the contents of the MDR are transferred to the JR. At this point, the
instruction is ready to be decoded and executed.
Processor
MAR
Control
unit
Memory
ALU
n general purpose
registers
.
.
.
R0
MDR
Rn-1
R1
PC
IR

If the instruction involves an operation to be performed by the ALU, it is necessary to
obtain the required operands. If an operand resides in the memory (it could also be in a general-
purpose register in the processor), it has to be fetched by sending its address to the MAR and
initiating a Read cycle. When the operand has been read from the memory into the MDR, it is
transferred from the MDR to the ALU. After one or more operands are fetched in this way, the
ALU can perform the desired operation. If the result of this operation is to be stored in the
memory, then the result is sent to the MDR. The address of the location where the result is to be
stored is sent to the MAR, and a Write cycle is initiated. At some point during the execution of the
current instruction, the contents of the PC are incremented so that the PC points to the next
instruction to be executed. Thus, as soon as the execution of the current instruction is completed, a
new instruction fetch may be started.
To perform a given task, an appropriate program consisting of a list of instructions is stored
in the memory. Individual instructions are brought from the memory into the processor, which
executes the specified operations. Data to be used as operands are also stored in the memory. A
typical instruction may be
Add LOCA, R0
This instruction adds the operand at memory location LOCA to the operand in a register in
the processor, R0, and places the sum into register R0. The original contents of location LOCA are
preserved, whereas those of R0 are overwritten. This instruction requires the performance of
several steps. First, the instruction is fetched from the memory into the processor. Next, the
operand at LOCA is fetched and added to the contents of R0. Finally, the resulting sum is stored in
register R0.
Primary storage (or main memory or internal memory), often referred to simply as
memory, is the only one directly accessible to the Processor. The Processor continuously reads
instructions stored there and executes them as required. Any data actively operated on is also
stored there in uniform manner. The processor transfers data using MAR and MDR register using
control unit.

1.3 Explain bus structure of a computer.
Single bus structure
In computer architecture, a bus is a subsystem that transfers data between components
inside a computer, or between computers. Early computer buses were literally parallel electrical
wires with multiple connections, but Modern computer buses can use both parallel and bit serial
connections.
Figure 1.3.1 Single bus structure
To achieve a reasonable speed of operation, a computer must be organized so that all its
units can handle one full word of data at a given time. When a word of data is transferred between
units, all its bits are transferred in parallel, that is, the bits are transferred simultaneously over
many wires, or lines, one bit per line. A group of lines that serves as a connecting path for several
devices is called a bus. In addition to the lines that carry the data, the bus must have lines for
address and control purposes. The simplest way to interconnect functional units is to use a single
bus, as shown in Figure 1.3.1. All units are connected to this bus. Because the bus can be used for
only one transfer at a time, only two units can actively use the bus at any given time. Bus control
lines are used to arbitrate multiple requests for use of the bus. The main virtue of the single-bus
structure is its low cost and is flexibility for attaching peripheral" devices. Systems that contain
multiple buses achieve more concurrency in operations by allowing two or more transfers to be
carried out at the same time. This leads to better performance but at an increased cost.
Parts of a System bus
Processor, memory, Input and output devices are connected by system bus, which consists
of separate busses as shown in figure 1.3.2. They are:
(i)Address bus: Address bus is used to carry the address. It is unidirectional bus. The address is
sent to from CPU to memory and I/O port and hence unidirectional. It consists of 16, 20, 24 or
more parallel signal lines.
(ii)Data bus: Data bus is used to carry or transfer data to and from memory and I/O ports. They
are bidirectional. The processor can read on data lines from memory and I/O port and as well as it
can write data to memory. It consists of 8, 16, 32 or more parallel signal lines.
(iii)Control bus: Control bus is used to carry control signals in order to regulate the control
activities. They are bidirectional. The CPU sends control signals on the control bus to enable the
outputs of addressed memory devices or port devices. Some of the control signals are: MEMR
(memory read), MEMW (memory write), IOR (I/O read), IOW (I/O write), BR (bus request), BG
(bus grant), INTR (interrupt request), INTA (interrupt acknowledge), RST (reset), RDY (ready),
HLD (hold), HLDA (hold acknowledge),
MemoryProcessor Input Output
…

Figure 1.3.2 Bus interconnection scheme
The devices connected to a bus vary widely in their speed of operation. Some
electromechanical devices, such as keyboards and printers are relatively slow. Other devises like
magnetic or optical disks, are considerably faster. Memory and processor units operate at
electronic speeds, making them the fastest parts of a computer. Because all these devices must
communicate with each other over a bus, an efficient transfer mechanism that is not constrained by
the slow devices and that can be used to smooth out the differences in timing among processors,
memories, and external devices is necessary.
A common approach is to include buffer registers with the devices to hold the information
during transfers. To illustrate this technique, consider the transfer of an encoded character from a
processor to a character printer. The processor sends the character over the bus to the printer
buffer. Since the buffer is an electronic register, this transfer requires relatively little time. Once
the buffer is loaded, the printer can start printing without further intervention by the processor. The
bus and the processor are no longer needed and can be released for other activity. The printer
continues printing the character in its buffer and is not available for further transfers until this
process is completed. Thus, buffer registers smooth out timing differences among processors,
memories, and I/O devices. They prevent a high-speed processor from being locked to a slow I/O
device during a sequence of data transfers. This allows the processor to switch rapidly from one
device to another, interweaving its processing activity with data transfers involving several I/O
devices.
The Figure 1.3.3 shows traditional bus configurations and the Figure 1.3.4 shows high
speed bus configurations. The traditional bus connection uses three buses: local bus, system
bus and expanded bus. The high speed bus configuration uses high-speed bus along with the
three buses used in the traditional bus connection. Here, cache controller is connected to high-
speed bus. This bus supports connection to high-speed LANs, such as Fiber Distributed Data
Interface (FDDI), video and graphics workstation controllers, as well as interface controllers to
local peripheral including SCSI.
MemoryProcessor Input Output
…
Control
bus
Data
bus
Address
buss
System
bus

Figure 1.3.3 Traditional bus configuration
Figure 1.3.4 High speed bus configuration
Main
Memory
Processor Cache
Local I/O
controller
Local bus
System bus
Expansion bus
SCSI Network
Expansion
bus interface
Modem Serial
Main
Memory
Processor Cache
Local I/O
controller
Local bus
System bus
High Speed bus
Expansion bus
SCSI Video Graphics LAN
Expansion
bus interfaceFAX Modem Serial
Traditional
bus
High
Speed
bus

1.4 Explain performance and metrics
The most important measure of the performance of a computer is how quickly it can execute
programs. The speed with which a computer executes programs is affected by the design of its
hardware and its machine language instructions. Because programs are usually written in a high-level
language, performance is also affected by the compiler that translates programs into machine language.
For best performance, it is necessary to design the compiler, the machine instruction set, and the
hardware in a coordinated way. The operating system overlaps processing, disk transfers, and printing
for several programs to make the best possible use of the resources available. The total time required to
execute the program is called elapsed time in operating system. This elapsed time is a measure of the
performance of the entire computer system. It is affected by the speed of the processor, the disk, and
the printer.
CACHE MEMORY
Just as the elapsed time for the execution of a program depends on all units in a computer
system, the processor time depends on the hardware involved in the execution of individual machine
instructions. This hardware comprises the processor and the memory, which are usually connected by a
bus, as shown in Figure 1.3.1. The pertinent parts of this figure are repeated in Figure 1.4, including the
cache memory as part of the processor unit. Let us examine the flow of program instructions and data
between the memory and the processor. At the start of execution, all program instructions and the
required data are stored in the main memory. As execution proceeds, instructions are fetched one by
one over the bus into the processor, and a copy is placed in the cache. When the execution of an
instruction calls for data located in the main memory, the data are fetched and a copy is placed in the
cache. Later, if the same instruction or data item is needed a second time, it is read directly from the
cache.
Figure 1.4 Processor Cache
The processor and a relatively small cache memory can be fabricated on a single integrated
circuit chip. The internal speed of performing the basic steps of instruction, processing on such chips is
very high and is considerably faster than the speed at which instructions and data can be fetched from
the main memory. A program will be executed faster if the movement of instructions and data between
the main memory and the processor is minimized, which is achieved by using the cache. For example,
suppose a number of instructions are executed repeatedly over a short period of time, as happens in a
program loop. If these instructions are available in the cache, they can be fetched quickly during the
period of repeated use.
PROCESSOR CLOCK
Processor circuits are controlled by a timing signal called a clock. The clock defines regular
time intervals, called clock cycles. To execute a machine instruction, the processor divides the action to
be performed into a sequence of basic steps, such that each step can be completed in one clock cycle.
The length P of one clock cycle is an important parameter that affects processor performance. Its
inverse is the clock rate, R = 1/ P, which is measured in cycles per second. Processors used in today's
personal computers and workstations have clock rates that range from a few hundred million to over a
billion cycles per second. In standard electrical engineering terminology, the term "cycles per second"
Main
Memory Processor
Cache
Memory
System bus

is called hertz (Hz). The term "million" is denoted by the prefix Mega (M), and "billion" is denoted by
the prefix Giga (G). Hence, 500 million cycles per second is usually abbreviated to 500 Megahertz
(MHz), and 1250 million cycles per second is abbreviated to 1.25 Gigahertz (GHz). The corresponding
clock periods are 2 and 0.8 nanoseconds (ns), respectively.
BASIC PERFORMANCE EQUATION
Let T be the processor time required to execute a program that has been prepared in some high-
level language. The compiler generates a machine language object program that corresponds to the
source program. Assume that complete execution of the program requires the execution of N machine
language instructions. The number N is the actual number of instruction executions, and is not
necessarily equal to the number of machine instructions in the object program. Some instructions may
be executed more than once, which is the case for instructions inside a program loop. Others may not
be executed at all, depending on the input data used. Suppose that the average number of basic steps
needed to execute one machine instruction is S, where each basic step is completed in one clock cycle.
If the clock rate is R cycles per second, the program execution time is
(1.1)
This is often referred to as the basic performance equation. The performance parameter T for
an application program is much more important to the user than the individual values of the parameters
N, S, or R. To achieve high performance, the computer designer must seek ways to reduce the value of
T, which means reducing N and S, and increasing R. The value of N is reduced if the source program is
compiled into fewer machine instructions. The value of S is reduced if instructions have a smaller
number of basic steps to perform or if the execution of instructions is overlapped. Using a higher-
frequency clock increases the value or R, which means that the time required to complete a basic
execution step is reduced.
The N, S, and R are not independent parameters; changing one may affect another. Introducing
a new feature in the design of a processor will lead to improved performance only if the overall result
is to reduce the value of T. A processor advertised as having a 900-MHz clock does not necessarily
provide better performance than a 700-MHz processor because it may have a different value of S.
PIPELINING AND SUPERSCALAR OPERATION
Usually in sequential execution, the instructions are executed one after another. Hence, the
value of S is the total number of basic steps, or clock cycles, required to execute an instruction. A
substantial improvement in performance can be achieved by overlapping the execution of successive
instructions, using a technique called pipelining. Consider the instruction
Add Rl,R2,R3
Which adds the contents of registers R 1 and R2, and places the sum into R3. The contents of
Rl and R2 are first transferred to the inputs of the ALU. After the add operation is performed, the sum
is transferred to R3. The processor can read the next instruction from the memory while the addition o-
ration is being performed. Then, if that instruction also uses the ALU, its operands can be transferred to
the ALU inputs at the same time that the result of the Add instruction is being transferred to R3. In the
ideal case, if all instructions are overlapped to the maximum degree possible, execution proceeds at the
rate of one instruction completed in each clock cycle. Individual instructions still require several clock
cycles to complete. But, for the purpose of computing T, the effective value of S is 1.
The ideal va1ue S = 1 cannot be attained in practice for a variety of reasons. However,
pipelining increases the rate of executing instructions significantly and causes the effective value of S
to approach 1.

A higher degree of concurrency can be achieved if multiple instruction pipelines are
implemented in the processor. This means that multiple functional units are used, creating parallel
paths through which different instructions can be executed in parallel. With such an arrangement, it
becomes possible to start the execution of several instructions in every clock cycle. This mode of
operation is called superscalar execution. If it can be sustained for a long time during program
execution, the effective value of S can be reduced to less than one. Of course, parallel execution must
preserve the logical correctness of programs, that is, the results produced must be the same as those
produced by serial execution of program instructions. Many of today's high-performance processors are
designed to operate in this manner.
CLOCK RATE
There are two possibilities for increasing the clock rate, R. First, improving the integrated-
circuit (IC) technology makes logic circuits faster, which reduces the time needed to complete a basic
step. This allows the clock period, P, to be reduced and the clock rate, R, to be increased. Second,
reducing the amount of processing done in one basic step also makes it possible to reduce the clock
period, P. However, if the actions that have to be performed by an instruction remain the same, the
number of basic steps needed may increase.
Increases in the value of R that are entirely caused by improvements in IC technology affect all
aspects of the processor's operation equally with the exception of the time it takes to access the main
memory. In the presence of a cache, the percentage of accesses to the main memory is small. Hence,
much of the performance gain expected from the use of faster technology can be realized. The value of
T will be reduced by the same factor as R is increased because S and N are not affected.
INSTRUCTION SET: CISC AND RISC
Simple instructions require a small number of basic steps to execute. Complex Instructions
involve a large number of steps. For a processor that has only simple instructions, a large number of
instructions may be needed to perform a given programming task. This could lead to a large value for
N and a small value for S. On the other hand, if individual instructions perform more complex
operations, fewer instructions will be needed, leading to a lower value of N and a larger value of S. It is
not obvious if one choice is better than the other.
A key consideration in comparing the two choices is the use of pipelining. We pointed out
earlier that the effective value of S in a pipelined processor is I close to 1 even though the number of
basic steps per instruction may be considerably larger. This seems to imply that complex instructions
combined with pipelining would achieve the best performance. However, it is much easier to
implement efficient pipelining in processors with simple instruction sets. The suitability of the
instruction set for pipelined execution is an important and often deciding consideration.
The terms RISC and CISC refer to design principles and techniques. Reduced instruction set
computing (RISC), is a CPU design strategy based on the insight that simplified (as opposed to
complex) instructions can provide higher performance if this simplicity enables much faster execution
of each instruction. A complex instruction set computer (CISC) is a computer where single instructions
can execute several low-level operations (such as a load from memory, an arithmetic operation, and a
memory store) and/or are capable of multi-step operations or addressing modes within single
instructions.
COMPILER
A compiler translates a high-level language program into a sequence of machine instructions.
To reduce N, we need to have a suitable machine instruction set and a compiler that makes good use of
it. An optimizing compiler takes advantage of various features of the target processor to reduce the
product N x S, which is the total number of clock cycles needed to execute a program. The number of
cycles is dependent not only on the choice of instructions, but also on the order in which they appear in
the program. The compiler may rearrange program instructions to achieve better performance. Of
course, such changes must not affect the result of the computation.

Superficially, a compiler appears as a separate entity from the processor with which it is used
and may even be available from a different vendor. However, a high quality compiler must be closely
linked to the processor architecture. The compiler and the processor are often designed at the same
time, with much interaction between the designers to achieve best results. The ultimate objective is to
reduce the total number of clock cycles needed to perform a required programming task.
PERFORMANCE EQUATION
It is important to be able to assess the performance of a computer. Computer designers use
performance estimates to evaluate the effectiveness of new features. Manufacturers use performance
indicators in the marketing process. Buyers use such data to choose among many available computer
models.
The previous discussion suggests that the only parameter that properly describes the -
performance of a computer is the execution time, T, for the programs of interest. Despite the
conceptual simplicity of Equation 1.1, computing the value of T is not simple. Moreover, parameters
such as the clock spee4 and various architectural features are not reliable indicators of the expected
performance.
For these reasons, the computer community adopted the idea of measuring computer
performance using benchmark programs. To make comparisons possible, standardized programs must
be used. The performance measure is the time it takes a computer to execute a given benchmark.
Initially, some attempts were made to create artificial programs that could be used as standard
benchmarks. But, synthetic programs do not properly predict performance obtained when real
application programs are nun.
A nonprofit organization called System Performance Evaluation Corporation (SPEC) selects
and publishes representative application programs for different application domains, together with test
results for many commercially available computers. For general-purpose computers, a suite of
benchmark programs was selected in 1989. It was modified somewhat and published in 1995 and again
in 2000. For SPEC2000, the reference computer is an Ultra SPARCI0 workstation with a 300-MHz
UltraSPARC-III processor.
The SPEC rating is computed as follows
Thus a SPEC rating of 50 means that the computer under test is 50 times as fast as the
UltraSPARC I0 for this particular benchmark. The test is repeated for all the programs in the SPEC
suite, and the geometric mean of the results is computed. Let SPECi be the rating for program i in the
suite. The overall SPEC rating for the computer is given by
where n is the number of programs in the suite. Because the actual execution time is measured,
the SPEC rating is a measure of the combined effect of all factors affecting performance, including the
compiler, the operating system, the processor, and the memory of the computer being tested.

1.5 Explain instruction and instruction sequencing
A computer program consist of a sequence of small steps, such as adding two numbers,
testing for a particular condition, reading a character from the keyboard, or sending a character to
be displayed on a display screen.
A computer must have instructions capable of performing four types of operations:
· Data transfers between the memory and the processor registers
· Arithmetic and logic operations on data
· Program sequencing and control
· I/O transfers
REGISTER TRANSFER NOTATION
In general, the information is transferred from one location to another in a computer.
Possible locations for such transfers are memory locations, processor registers, or registers in the
I/O subsystem.
For example, Names for the addresses of memory locations may be LOC, PLACE, A,
VAR2; processor register names may be RO, R5; and I/O register names may be DATAIN,
OUTSTATUS, and so on. The contents of a location are denoted by placing square brackets
around the name of the location.
Thus, the expression
means that the contents of memory location LOC are transferred into processor register R1.
As another example, consider the operation that adds the contents of registers R1 and R2,
and then places their sum into register R3. This action is indicated as
This type of notation is known as Register Transfer Notation (RTN). .Note that the right
hand side of an RTN expression always denotes a value, and the left-hand side is the name of a
location where the value is to be placed, overwriting the old contents of that location.
ASSEMBLY TRANSFER NOTATION
The another type of notation to represent machine instructions and programs is assembly transfer
notation. For example, an instruction that causes the transfer described above, from memory location LOC
to processor register RI, is specified by the statement.
Move LOC,Rl
The contents of LOC are unchanged by the execution of this instruction, but the old contents of register R1
are overwritten.
The second example of adding two numbers contained in processor registers Rl and R2 and placing
their sum in R3 can be specified by the assembly language statement
Add RI,R2,R3
R1 [R1] + [R2]
R1 [LOC]

BASIC INSTRUCTION TYPES
Instruction types are classified based on the number of operands used in instructions. They are:
(i). Three - Address Instruction,
(ii). Two - Address Instruction,
(iii). One - Address Instruction, and
(iv). Zero - Address Instruction.
Consider a high level language program command, which adds two variables A and B, and assign
the sum in third variable C.
C = A + B ;
To carry out this action, the contents of memory locations A and B are fetched from the memory
and transferred into the processor where their Sum is computed. This result is then sent back to the memory
and stored in location C.
C ← [A] + [B]
Example: Add two variables A and B, and assign the sum in third variable C
(i). Three - Address Instruction
The general instruction format is: Operation Source1,Source2,Destination
Symbolic add instruction: ADD A,B,C
Operands A and B are called the source operands, C is called Destination operand, and Add is the
operation to be performed on the operands.
If k bits are needed to specify the memory address of each operand, the encoded form of the above
instruction must contain 3k bits for addressing purposes in addition to the bits needed to denote the Add
operation. For a modem processor with a 32-bit address space, a 3-address instruction is too large to fit in
one word for a reasonable word length. Thus, a format that allows multiple words to be used for a single
instruction would be needed to represent an instruction of this type. An alternative approach is to use a
sequence of simpler instructions to perform the same task, with each instruction having only one or two
operands. Suppose that two-address instructions of the form
(ii). Two - Address Instruction
The general instruction format is: Operation Source,Destination
Symbolic add instruction: MOVE B,C
ADD A,C
An Add instruction of this type is ADD A,B which performs the operation B ←
[A] + [B] When the sum is calculated, the result is sent to the memory and stored in location B,
replacing the original contents of this location. This means that operand B is both a source and a
destination.
A single two-address instruction cannot be used to solve our original problem, which is to
add the contents of locations A and B, without destroying either of them, and to place the sum in
location C. The problem can be solved by using another two- address instruction that copies the
contents of one memory location into another. Such an instruction is Move B,C which performs
the operation C ← [B], leaving the contents of location B unchanged. The word "Move" is a
misnomer here; it should be "Copy." However, this instruction name is deeply entrenched in
computer nomenclature. The operation C ← [A] + [B] can now be performed by the two-
instruction sequence
Move B,C
Add A,C
Even two-address instructions will not normally fit into one word for usual word lengths
and address sizes. Another possibility is to have machine instructions that specify only one

memory operand. when a second operand is needed, as in the case of an Add instruction, it is
understood implicitly to be in a unique location. A processor register, usually called the
accumulator, may be used for this purpose. Thus, the one-address instruction
(iii). One - Address Instruction
The general instruction format is: Operation operand
Symbolic add instruction: LOAD A
ADD B
STORE C
ADD A means the following: Add the contents of memory location A to the contents of the
accumulator register and place the sum back into the accumulator. Let us also introduce the one-
address instructions Load A and Store A
The Load instruction copies the contents of memory location A into the Accumulator, and
the Store instruction copies the contents of the accumulator into memory location A. Using only
one-address instructions, the operation C ← [A] + [B] can be performed by executing the sequence
of instructions Load A, Add B, Store C
Note that the operand specified in the instruction may be a source or a destination,
depending on the instruction. In the Load instruction, address A specifies the source operand, and
the destination location, the accumulator, is implied. On the other hand, C denotes the destination
location in the Store instruction, whereas the source, the accumulator, is implied.
INSTRUCTION EXECUTION AND STRAIGHT LINE SEQUENCING
To perform a particular task on the computer, it is programmer’s job to select and write
appropriate instructions one after other, i.e. programmer has to write instructions in a proper
sequence. This job of the programmer is known as instruction sequencing. The instructions written
in a proper sequence to execute a particular task is called program.
Figure 1.5.2 Basic instruction cycle
The complete instruction cycle involves three operations: Instruction fetching, opcode
decoding and instruction execution as shown in figure 1.5.1.
Processor executes a program with the help of program counter (PC). PC holds the address
of the instruction to be executed next. To begin execution of a program, the address of its first
instruction is placed into the PC. Then, the processor control circuits use the information (address
Fetch the next instruction
START
Decode instruction
Execute instruction
START
Fetch cycle
Decode cycle
Execute cycle

of memory) in the PC to fetch and execute instructions, one at a time, in the order of increasing
addresses. This is called straight-line sequencing. During the execution of instruction, the PC is
incremented by the length of the current instruction in execution. For example, if currently
executing instruction length is 4 bytes, then PC is incremented by 4 so that it points to instruction
to be executed next.
Consider the task C ← [A] + [B] for illustration. Figure 1.5.2 shows a possible program
segment for this task as it appears in the memory of a computer. Assume the word length is 32 bits
and the memory is byte addressable. The three instructions of the program are in successive word
locations, starting at location i. Since each instruction is 4 bytes long, the second and third
instructions start at addresses i + 4 and i + 8.
Figure 1.5.2 A program for C ← [A] + [B]
The processor contains a register called the program counter (PC), which holds the address
of the instruction to be executed next. To begin executing a program, the address of its first
instruction (i in our example) must be placed into the PC. Then, the processor control circuits use
the information in the PC to fetch and execute instructions, one at a time, in the order of increasing
addresses. This is called straight-line sequencing. During the execution of each instruction, the PC
is incremented by 4 to point to the next instruction. Thus, after the Move instruction at location i +
8 is executed, the PC contains the value i + 12, which is the address of the first instruction of the
next program segment.
Executing a given instruction is a two-phase procedure. In the first phase, called instruction
fetch, the instruction is fetched from the memory location whose address is in the PC. This
instruction is placed in the instruction register (IR) in the processor. At the start of the second
phase, called instruction execute, the instruction in IR is examined to determine which operation is
to be performed. The specified operation is then performed by the processor. This often involves
fetching operands from the memory or from processor registers, performing an arithmetic or logic
operation, and storing the result in the destination location. At some point during this two-phase
procedure, the contents of the PC are advanced to point to the next instruction. When the execute
phase of an instruction is completed, the PC contains the address of the next instruction, and a new
instruction fetch phase can begin. In most processors, the execute phase itself is divided into a

small number of distinct phases corresponding to fetching operands, performing the operation, and
storing the result.
BRANCHING
Consider the task of adding a list of n numbers. The program outlined in Figure 1.5.3 is a
generalization of the program in Figure 1.5.2. The addresses of the memory locations containing
the n numbers are symbolically given as NUMl, NUM2, ..., NUMn, and a separate Add instruction
is used to add each number to the contents of register R0. After all the numbers have been added,
the result is placed in memory location SUM. Instead of using a long list of Add instructions, it is
possible to place a single Add instruction in a program loop, as shown in Figure 1.5.4. The loop is
a straight-line sequence of instructions executed as many times as needed. It starts at location
LOOP and ends at the instruction Branch>O. During each pass through this loop, the address of
the next list entry is determined, and that entry is fetched and added to R0.
Figure 1.5.3 A straight line program Figure 1.5.4 A program using a loop
for adding n numbers for adding n numbers
Assume that the number of entries in the list, n, is stored in memory location N, as shown.
Register Rl is used as a counter to determine the number of times the loop is executed. Hence, the
contents of location N are loaded into register Rl at the beginning of the program. Then, within the
body of the loop, the instruction Decrement Rl Reduces the contents of Rl by 1 each time through
the loop. (A similar type of operation is performed by an Increment instruction, which adds 1 to its
operand.) Execution of the loop is repeated as long as the result of the decrement operation is
greater than zero.
This type of instruction loads a new value into the program counter. As a result, the
processor fetches and executes the instruction at this new address, called the branch target, instead
of the instruction at the location that follows the branch instruction in sequential address order. A
conditional branch instruction causes a branch only if a specified condition is satisfied. If the
condition is not satisfied, the PC is incremented in the normal way, and the next instruction in
sequential address order is fetched and executed.

In the program in Figure 1.5.4, the instruction Branch>0 LOOP (branch if greater than 0)
is a conditional branch instruction that causes a branch to location LOOP if the result of the
immediately preceding instruction, which is the decremented value in register Rl, is greater than
zero. This means that the loop is repeated as long as there are entries in the list that are yet to be
added to R0. At the end of the nth pass through the loop, the Decrement instruction produces a
value of zero, and, hence, branching does not occur. Instead, the Move instruction is fetched and
executed. It moves the final result from R0 into memory location SUM.
CONDITION CODES
The processor keeps track of information about the results of various operations for use by
subsequent conditional branch instructions. This is accomplished by recording the required
information in individual bits, often called condition code flags. These flags are usually grouped
together in a special processor register called the condition code register or status register.
Individual condition code flags are set to 1 or cleared to 0, depending on the outcome of the
operation performed.
Four commonly used flags are
N (negative) Set to 1 if the result is negative; otherwise, cleared to 0
Z (zero) Set to 1 if the result is 0; otherwise, cleared to 0
V (overflow) Set to 1 if arithmetic overflow occurs; otherwise, cleared to 0
C (carry) Set to 1 if a carry-out results from the operation; otherwise, cleared to 0
The N and Z flags indicate whether the result of an arithmetic or logic operation is negative
or zero. The N and Z flags may also be affected by instructions that transfer data, such as Move,
Load, or Store. This makes it possible for a later conditional branch instruction to cause a branch
based on the sign and value of the operand that was moved. Some computers also provide a special
Test instruction that examines a value in a register or in the memory and sets or clears the N and Z
flags accordingly.
GENERATION OF MEMORY ADDRESSES
Different addressing modes give rise to the need for flexible ways to specify the address of
an operand. The instruction set of a computer typically provides a number of such methods, called
addressing modes. While the details differ from one computer to another, the underlying concepts
are the same.

1.6 Discuss Hardware – Software Interface
Hardware
Computer hardware is the collection of physical elements that comprise a computer system.
Example Processor, memory, hard disk, floppy disk, keyboard, mouse, monitors, printers and so
on.
Software
Computer software, or just software, is a collection of computer programs and related data
that provides the instructions for telling a computer what to do and how to do it. In other words,
software is a conceptual entity which is a set of computer programs, procedures, and associated
documentation concerned with the operation of a data processing system. We can also say software
refers to one or more computer programs and data held in the storage of the computer for some
purposes. In other words software is a set of programs, procedures, algorithms and its
documentation. Program software performs the function of the program it implements, either by
directly providing instructions to the computer hardware or by serving as input to another piece of
software. The term was coined to contrast to the old term hardware (meaning physical devices). In
contrast to hardware, software "cannot be touched". Software is also sometimes used in a more
narrow sense, meaning application software only.
Types of software
System software
System software provides the basic functions for computer usage and helps run the computer
hardware and system. It includes a combination of the following:
1. Device drivers:
A device driver or software driver is a computer program allowing higher-level computer
programs to interact with a hardware device.
2. Operating systems:
An operating system (OS) is a set of programs that manage computer hardware resources
and provide common services for application software. The operating system is the most important
type of system software in a computer system. A user cannot run an application program on the
computer without an operating system, unless the application program is self booting.
The main functions of an OS are Device management, Storage management, User
interface, Memory management and Processor management
3. Servers:
A server is a computer program running to serve the requests of other programs, the
"clients". Thus, the "server" performs some computational task on behalf of "clients". The clients
either run on the same computer or connect through the network.
4. Utilities:
Utility software is system software designed to help analyze, configure, optimize or
maintain a computer. A single piece of utility software is usually called a utility or tool. Utility
software usually focuses on how the computer infrastructure (including the computer hardware,
operating system, application software and data storage) operates.

5. Window systems:
A windowing system (or window system) is a component of a graphical user interface
(GUI), and more specifically of a desktop environment, which supports the implementation of
window managers, and provides basic support for graphics hardware, pointing devices such as
mice, and keyboards. The mouse cursor is also generally drawn by the windowing system.
System software is responsible for managing a variety of independent hardware
components, so that they can work together harmoniously. Its purpose is to unburden the
application software programmer from the often complex details of the particular computer being
used, including such accessories as communications devices, printers, device readers, displays and
keyboards, and also to partition the computer's resources such as memory and processor time in a
safe and stable manner.
Programming software
Programming software usually provides tools to assist a programmer in writing computer
programs, and software using different programming languages in a more convenient way. The
tools include:
1. Compilers:
A compiler is a computer program (or set of programs) that transforms source code written
in a programming language (the source language) into another computer language (the target
language, often having a binary form known as object code). The most common reason for
wanting to transform source code is to create an executable program.
The name "compiler" is primarily used for programs that translate source code from a high-
level programming language to a lower level language (e.g., assembly language or machine code).
If the compiled program can run on a computer whose CPU or operating system is different from
the one on which the compiler runs, the compiler is known as a cross-compiler. A program that
translates from a low level language to a higher level one is a decompiler.
A compiler is likely to perform many or all of the following operations: lexical analysis,
preprocessing, parsing, semantic analysis (Syntax-directed translation), code generation, and code
optimization.
2. Debuggers:
A debugger or debugging tool is a computer program that is used to test and debug other
programs (the "target" program). The code to be examined might alternatively be running on an
instruction set simulator (ISS), a technique that allows great power in its ability to halt when
specific conditions are encountered but which will typically be somewhat slower than executing
the code directly on the appropriate (or the same) processor. Some debuggers offer two modes of
operation - full or partial simulation, to limit this impact.
3. Interpreters:
An interpreter normally means a computer program that executes Program by converting
source code to object code line by line.
4. Linkers:
A linker or link editor is a program that takes one or more objects generated by a compiler
and combines them into a single executable program.

5. Text editors:
A text editor is a type of program used for editing plain text files.Text editors are often
provided with operating systems or software development packages, and can be used to change
configuration files and programming language source code.
An Integrated development environment (IDE) is a single application that attempts to manage all
these functions.
Application software
Application software is developed to perform in any task that benefits from computation. It is a
broad category, and encompasses software of many kinds, including the internet browser being
used to display this page. This category includes:
Business software
Computer-aided design
Databases
Decision making software
Educational software
Image editing
Industrial automation
Mathematical software
Medical software
Simulation software
Spreadsheets
Word processing
Hardware – Software Interface
It is an interface tool and it to a point of interaction between components, and is applicable
at the level of both hardware and software. This allows a component, whether a piece of hardware
such as a graphics card or a piece of software such as an Internet browser, to function
independently while using interfaces to communicate with other components via an input/output
system and an associated protocol.
All Device drivers, Operating systems, Servers, Utilities, Window systems, Compilers,
Debuggers, Interpreters, Linkers and Text editors are considered as Harware – Software Interfaces.

1.7 Explain various addressing modes
A program operates on data that reside in the computer's memory. These data can be
organized in a variety of ways. For example to record marks in various courses, we may organize
this information in the form of a table. Programmers use organizations called data structures to
represent the data used in computations. These include lists, linked lists, arrays, queues, and so on.
Programs are normally written in a high-level language, which enables the programmer to
use constants, local and global variables, pointers, and arrays. When translating a high-level
language program into assembly language, the compiler must be able to implement these
constructs using the facilities provided in the instruction set of the computer in which the program
will be run. The different ways in which the location of an operand is specified in an instruction
are referred to as addressing modes.
IMPLEMENTATION OF VARIABLES AND CONSTANTS
Variables and constants are the simplest data types and are found in almost every computer
program. A variable is represented by allocating a register or a memory location to hold its value.
Thus, the value can be changed as needed using appropriate instructions.
1. Register addressing mode - The operand is the contents of a processor register; the name
(address) of the register is given in the instruction.
Example: MOVE R1,R2
This instruction copies the contents of register R2 to R1.
2. Absolute addressing mode - The operand is in a memory location; the address of this location
is given explicitly in the instruction. (In some assembly languages, this mode is called Direct.)
Example: MOVE LOC,R2
This instruction copies the contents of memory location of LOC to register R2.
3. Immediate addressing mode - The operand is given explicitly in the instruction.
Example: MOVE #200 , R0
The above statement places the value 200 in the register R0. A common convention is to
use the sharp sign (#) in front of the value to indicate that this value is to be used as an immediate
operand.
INDIRECTION AND POINTERS
In the addressing modes that follow, the instruction does not give the operand or its address
explicitly. Instead, it provides information from which the memory address of the operand can be
determined. We refer to this address as the effective address (EA) of the operand.
4. Indirect addressing mode - The effective address of the operand is the contents of a register or
memory location whose address appears in the instruction.
Example Add (R2),R0
Register R2 is used as a pointer to the numbers in the list, and the operands are accessed indirectly
through R2. The initialization section of the program loads the counter value n from memory
location N into Rl and uses the Immediate addressing mode to place the address value NUM 1,
which is the address of the first number in the list, into R2.
INDEXING AND ARRAY
It is useful in dealing with lists and arrays.
5. Index mode - The effective address of the operand is generated by adding a constant value to
the contents of a register.

The register used may be either a special register provided for this purpose, or, more
commonly; it may be anyone of a set of general-purpose registers in the processor. In either case, it
is referred to as an index register. We indicate the Index mode symbolically as X(Ri).
Where X denotes the constant value contained in the instruction and Ri is the name of the
register involved. The effective address of the operand is given by EA = X + [Ri]. The contents of
the index register are not changed in the process of generating the effective address.
RELATIVE ADDRESSING
We have defined the Index mode using general-purpose processor registers. A useful
version of this mode is obtained if the program counter, PC, is used instead of a general purpose
register. Then, X(PC) can be used to address a memory location that is X bytes away from the
location presently pointed to by the program counter. Since the addressed location is identified
''relative'' to the program counter, which always identifies the current execution point in a program,
the name Relative mode is associated with this type of addressing.
6.Relative mode - The effective address is determined by the Index mode using the program
counter in place of the general-purpose register Ri.
This mode can be used to access data operands. But, its most common use is to specify the target
address in branch instructions. An instruction such as Branch>O LOOP causes program
execution to go to the branch target location identified by the name LOOP if the branch condition
is satisfied. This location can be computed by specifying it as an offset from the current value of
the program counter. Since the branch target may be either before or after the branch instruction,
the offset is given as a signed number.
ADDITIONAL MODES
The two additional modes described are useful for accessing data items in successive
locations in the memory.
7. Autoincrement mode - The effective address of the operand is the contents of a register
specified in the instruction. After accessing the operand, the contents of this register are
automatically incremented to point to the next item in a list.
We denote the Autoincrement mode by putting the specified register in parentheses, to
show that the contents of the register are used as the effective address, followed by a plus sign to
indicate that these contents are to be incremented after the operand is accessed. Thus, the
Autoincrement mode is written as (Ri) +
As a companion for the Autoincrement mode, another useful mode accesses the items of a list in
the reverse order:
8. Autodecrement mode - The contents of a register specified in the instruction is first
automatically decremented and is then used as the effective address of the operand.
We denote the Autodecrement mode by putting the specified register in parentheses, preceded by a
minus sign to indicate that the contents of the register are to be decremented before being used as
the effective address. Thus, we write - (Ri)
Table 1.1 Generic addressing modes

Name Assembler syntax Addressing function
Immediate #Value Operand = Value
Register Ri EA = Ri
Absolute (Direct) LOC EA = LOC
Indirect (Ri) EA = [Ri]
(LOC) EA = [LOC]
Index X(Ri) EA = [Ri] + X
Base with index (Ri,Rj) EA = [Ril + [Rj]
Base with index X(Ri,Rj) EA = [Ri] + [Rj] + X
and offset
Relative X(PC) EA = [PC] +X
Autoincrement (Ri)+ EA = [Ri]; Increment Ri
Autodecrement -(Ri) Decrement Ri; EA = [Ri]
EA = effective address Value = a signed number

1.8 Discuss RISC and CISC
In recent years, the boundary between RISC and CISC architectures has been blurred. Future
processors may be designed with features from both RISC and CISC types.
Architectural description of RISC
Figure 1.8.1 RISC Architecture
As shown in figure 1.8.1, RISC architecture uses separate instruction and data caches. Their access
paths are also different. The hardwired control unit is found in most of the RISC processors.
Architectural description of CISC
Figure 1.8.2 CISC Architecture
As shown in figure 1.8.2, In CISC processor, there is a unified cache buffer for holding both
instruction and data. Therefore they have to share the common path. The micro programmed
control unit uses the CISC processors. but modern CISC processors may also use hardwired
control unit.
Control unit Instruction and
Data path
Microprogrammed
Control memory Cache
Main memory
Hardwired control unit Data path
Instruction cache Data cache
(Instruction) (Data)
Main memory

Table 2 Difference between RISC and CISC
RISC CISC
Used by Apple Used by Intel and AMD processors
Requires less registers therefore it is
easier to design
Slower than RISC chips when performing
instructions
Faster than CISC More expensive to make compared to RISC
Reduced Instruction Set Computer Complex Instruction Set Architecture
Pipelining can be implemented easily Pipelining implementation is not easy
Direct addition is not possible Direct addition between data in two memory
locations. Ex.8085
Fewer, simpler and faster instructions Large amount of different and complex
instructions
RISC architecture is not widely used Atleast 75% of the processor use CISC
architecture
RISC chips require fewer transistors and
cheaper to produce. Finally, it's easier to
write powerful optimized compilers.
In common CISC chips are relatively slow
(compared to RISC chips) per instruction, but
use little (less than RISC) instructions.
RISC puts a greater burden on the
software. Software developers need to
write more lines for the same tasks.
In CISC, software developers no need to write
more lines for the same tasks
Mainly used for real time applications Mainly used in normal PC’s, Workstations and
servers
Large number of registers, most of
which can be used as general purpose
registers
CISC processors cannot have a large number of
registers.
RISC processor has a number of
hardwired instructions.
CISC processor executes microcode
instructions.
Instructions are executed by hardware Instructions are executed by micro program
Fixed format instructions variable format instructions
Few instructions Many instructions
Multiple instruction set Single instruction set
Highly pipelined Less pipelined
Complexity is in the compiler Complexity is in the microprogram
1.9 Design ALU

The most of the computer operations are performed in arithmetic and logical unit (ALU)
only. The data & operands are brought to memory to the processor register and actual addition is
carried out in ALU. Each register can store one word of data. The ALU and control unit are many
times faster than other devices connected to the system.
The following operations can be performed by the ALU. They are:
(i) Arithmetic operations (Addition, subtraction, multiplication and division)
(ii)Logical operations (AND, OR, NOT and XOR)
Adders
For two input adders we can have four possible operations. They are
0 + 0 = 0 ⇒ 1 digit sum and no carry
1 + 1 = 10 ⇒ 2 digit sum (Higher bit =carry and lower bit =sum)
Addition with out carry ⇒ half adder
Addition with carry ⇒ full adder
Half adder
The half adder is an example of a simple, functional digital circuit built from two logic gates. A
half adder adds two one-bit binary numbers A and B. It has two outputs, S and C (the value
theoretically carried on to the next addition); the final sum is 2C + S. The simplest half-adder
design, pictured on the right, incorporates an XOR gate for S and an AND gate for C. Half adders
cannot be used compositely, given their incapacity for a carry-in bit.
Inputs Outputs
A B Carry Sum
0 0 0 0
1 0 0 1
0 1 0 1
1 1 1 0
Figure 1.9.1 Half adder's one-bit truth table
Figure 1.9.2 Half adder block diagram
1 bit
Half Adder
(HA)
Input Output
Sum
CarryA
B

Figure 1.9.3 Half adder Karnaugh map for carry and sum
Figure 1.9.4 Half adder logic diagram
Disadvantages: Previous carry cannot be added in half adder.
Full adder
Schematic symbol for a 1-bit full adder with Cin and Cout drawn on sides of block to
emphasize their use in a multi-bit adder. A full adder adds binary numbers and accounts for
values carried in as well as out. A one-bit full adder adds three one-bit numbers, often written as A,
B, and Cin; A and B are the operands, and Cin is a bit carried in (in theory from a past addition). The
full-adder is usually a component in a cascade of adders, which add 8, 16, 32, etc. binary numbers.
The circuit produces a two-bit output sum typically represented by the signals Cout and S, where
.
The one-bit full adder's truth table is:
Inputs Outputs
A B Cin Cout S
0 0 0 0 0
1 0 0 0 1
0 1 0 0 1
1 1 0 1 0
0 0 1 0 1
1 0 1 1 0
0 1 1 1 0
1 1 1 1 1
Figure 1.9.5 Full adder's one-bit truth table
1
0 0
0
B 0 1
A
0
1
0
0
B 0 1
A
0
1
Carry = AB
1
1
Sum = AB + AB
= A ⊕ B

Figure 1.9.6 Full adder block diagram
Figure 1.9.7 Full adder Karnaugh map for carry and sum
Figure 1.9.8 Full adder logic diagram
Example full adder logic diagram; the AND gates and the OR gate can be replaced with
NAND gates for the same results. A full adder can be implemented in many different ways such as
with a custom transistor-level circuit or composed of other gates. One example implementation is
with
and .
In this implementation, the final OR gate before the carry-out output may be replaced by an XOR
gate without altering the resulting logic. Using only two types of gates is convenient if the circuit
is being implemented using simple IC chips which contain only one gate type per chip. In this
light, Cout can be implemented as .
A full adder can be constructed from two half adders by connecting A and B to the input of one
half adder, connecting the sum from that to an input to the second adder, connecting Ci to the other
input and OR the two carry outputs. Equivalently, S could be made the three-bit XOR of A, B, and
Ci, and Cout could be made the three-bit majority function of A, B, and Ci.
More complex adders
Carry = AB+BC+AC
Sum = ABC + ABC+ ABC+ ABC
= (A ⊕ B) ⊕ C
1
0 0
0
A
0
1
0
0
A
0
1
1
1
1
1 0
11
1 0
01
1
BC 00 01 11 10 BC 00 01 11 10

Ripple carry adder
Figure 1.9.9 4-bit adder with logic gates shown
It is possible to create a logical circuit using multiple full adders to add N-bit numbers.
Each full adder inputs a Cin, which is the Cout of the previous adder. This kind of adder is a ripple
carry adder, since each carry bit "ripples" to the next full adder. Note that the first (and only the
first) full adder may be replaced by a half adder.
The layout of a ripple carry adder is simple, which allows for fast design time; however,
the ripple carry adder is relatively slow, since each full adder must wait for the carry bit to be
calculated from the previous full adder. The gate delay can easily be calculated by inspection of
the full adder circuit. Each full adder requires three levels of logic. In a 32-bit [ripple carry] adder,
there are 32 full adders, so the critical path (worst case) delay is 3 (for carry propagation in first
adder) + 31 * 2 (for carry propagation in later adders) = 65 gate delays.
Carry-lookahead adders
Figure 1.9.10 4-bit adder with carry lookahead
To reduce the computation time, engineers devised faster ways to add two binary numbers
by using carry-lookahead adders. They work by creating two signals (P and G) for each bit
position, based on if a carry is propagated through from a less significant bit position (at least one
input is a '1'), a carry is generated in that bit position (both inputs are '1'), or if a carry is killed in
that bit position (both inputs are '0'). In most cases, P is simply the sum output of a half-adder and
G is the carry output of the same adder. After P and G are generated the carries for every bit
position are created.

2.1 Explain the process Fundamental concepts
To execute a program, the processor fetches one instruction at a time and performs the
operations specified. Instructions are fetched from successive memory locations until a branch or a
jump instruction is encountered. The processor keeps track of the address of the memory location
containing the next instruction to be fetched using the program counter, PC. After fetching an
instruction, the contents of the PC are updated to point to the next instruction in the sequence. A
branch instruction may load a different value into the PC.
Another key register in the processor is the instruction register, IR. Suppose that each
instruction comprises 4 bytes, and that it is stored in one memory word. To execute an instruction,
the processor has to perform the following three steps:
1. Fetch the contents of the memory location pointed to by the PC. The contents of this
location are interpreted as an instruction to be executed. Hence, they are loaded into the IR.
Symbolically, this can be written as IR ← [[PC]]
2. Assuming that the memory is byte addressable, increment the contents of the PC by 4,
that is, PC ← [PC] + 4
3. Carry out the actions specified by the instruction in the IR.
In cases where an instruction occupies more than one word, steps 1 and 2 must be repeated as
many times as necessary to fetch the complete instruction. These two steps are usually referred to
as the fetch phase; step 3 constitutes the execution phase. Figure 2.1 shows an organization in
which the arithmetic and logic unit (ALU) and all the registers are interconnected via a single
common bus. This bus is internal to the processor and should not be confused with the external bus
that connects the processor to the memory and I/O devices.
The data and address lines of the external memory bus are shown in Figure 2.1 connected
to the internal processor bus via the memory data register, MDR, and the memory address register,
MAR, respectively. Register MDR has two inputs and two outputs. Data may be loaded into MDR
either from the memory bus or from the internal processor bus. The data stored in MDR may be
placed on either bus. The input of MAR is connected to the internal bus, and its output is
connected to the external bus. The control lines of the memory bus are connected to the instruction
decoder and control logic block. This unit is responsible for issuing the signals that control the
operation of all the units inside the processor and for interacting with the memory bus.
The number and use of the processor registers R0 through R(n - 1) vary considerably from
one processor to another. Registers may be provided for general-purpose use by the programmer.
Some may be dedicated as special-purpose registers, such as index registers or stack pointers.
Three registers, Y, Z, and TEMP in Figure 2.1, have not been mentioned before. These registers
are transparent to the programmer, that is, the programmer need not be concerned with them
because they are never referenced explicitly by any instruction. They are used by the processor for
temporary storage during execution of some instructions. These registers are never used for storing
data generated by one instruction for later use by another instruction.
The multiplexer MUX selects either the output of register Y or a constant value 4 to be
provided as input A of the ALU. The constant 4 is used to increment the contents of the program
counter. We will refer to the two possible values of the MUX control input Select as Select4 and
SelectY for selecting the constant 4 or register Y, respectively.
As instruction execution progresses, data are transferred from one register to another, often
passing through the AL U to perform some arithmetic or logic operation. The instruction decoder
and control logic unit is responsible for implementing the actions specified by the instruction
loaded in the IR register. The decoder generates the control signals needed to select the registers

involved and direct the transfer of data. The registers, the ALU, and the interconnecting bus are
collectively referred to as the datapath.
Figure 2.1 Single bus organization of the data path inside a processor.
With few exceptions, an instruction can be executed by performing one or more of the
following operations in some specified sequence:
· Transfer a word of data from one processor register to another or to the ALU
· Perform arithmetic or a logic operation and store the result in a processor register
· Fetch the contents of a given memory location and load them into a processor register
· Store a word of data from a processor register into a given memory location
REGISTER TRANSFERS
Instruction execution involves a sequence of steps in which data are transferred from one
register to another. For each register, two control signals are used to place the contents of that
register on the bus or to load the data on the bus into the register. This is represented symbolically
in Figure 2.2. The input and output of register Ri are connected to the bus via switches controlled
by the signals Riin and Ri out respectively. When Riin is set to 1, the data on the bus are loaded into
Ri. Similarly, when Riout is set to 1, the contents of register Ri are placed on the bus. While Riout is
equal to 0, the bus can be used for transferring data from other registers.
Suppose that we wish to transfer the contents of register Rl to register R4. This can be
accomplished as follows:
. . .
Sub
…
Internal Processor Bus
Add
Select
Memory
Bus
Data line
Address line
PC
MAR
MDR
Y
MUX
Constant
ALU
A B
XOR
ALU
control
lines
Carry in
Z
IR
Instruction
Decoder and
control logic
R0
R (n-1)
TEMP
...
Control signals

· Enable the output of register Rl by setting R1out to 1. This places the contents of R 1 on the
processor bus.
· Enable the input of register R4 by setting R4in to 1. This loads data from the processor bus into
register R4.
Figure 2.2 Input and output gating for the registers in figure 2.1.
All operations and data transfers within the processor take place within time periods
defined by the processor clock. The control signals that govern a particular transfer are asserted at
the start of the clock cycle. In our example, R1out and R4in are set to 1. The registers consist of
edge-triggered flip-flops. Hence, at the next active edge of the clock, the flip-flops that constitute
R4 will load the data present at their inputs. At the same time, the control signals R1out and R4in
will return to 0. We will use this simple model of the timing of data transfers for the rest of this
chapter. However, we should point out that other schemes are possible. For example, data transfers
may use both the rising and falling edges of the clock. Also, when edge-triggered flip-flops are not
used, two or more clock signals may be needed to guarantee proper transfer of data. This is known
as multiphase clocking.
PERFORMING ARITHMETIC AND LOGICAL OPERATION
The ALU is a combinational circuit that has no internal storage. It performs arithmetic and
logic operations on the two operands applied to its A and B inputs. In Figures 2.1 and 2.2, one of
Zout
Ri out
Ri in
Select
Ri
Y
MUX
Constant C
Z
X
Yin
X
X
ALU
A B
X
Zin
X

the operands is the output of the multiplexer MUX and the other operand is obtained directly from
the bus. The result produced by the ALU is stored temporarily in register Z. Therefore, a sequence
of operations to add the contents of register Rl to those of register R2 and store the result in
register R3 is
1. R1out, Yin
2. R2out, Select Y, Add, Zin
3. Zout, R3in
FETCHING A WORD FROM MEMORY
The connections for register MDR are illustrated in Figure 2.4. It has four control signals:
MDR in and MDRout control the connection to the internal bus, and MDR inE and MDRout E control
the connection to the external bus. The circuit in Figure 2.3 is easily modified to provide the
additional connections. A three-input multiplexer can be used, with the memory bus data line
connected to the third input. This input is selected when MDRinE = 1. A second tri-state gate,
controlled by MDRoutE can be used to connect the output of the flip-flop to the memory bus.
Figure 2.3 Input and output gating for one register bit.
Figure 2.4 Connections and control signals for register MDR.
As an example of a read operation, consider the instruction Move (R1),R2. The actions needed to
execute this instruction are:
1. MAR ← [R1]
2. Start a Read operation on the memory bus
MDR in
MDR out
MDR
X
X
MDR out E
X
MDR in E
X
Memory bus data line
Bus
D Q
_
> Q
M
U
X
Ri in
Ri out
Clock

3. Wait for the MFC response from the memory
4. Load MDR from the memory bus
5. R2 ← [MDR]
These actions may be carried out as separate steps, but some can be combined into a single step.
Each action can be completed in one clock cycle, except action 3 which requires one or more clock
cycles, depending on the speed of the addressed device.
The memory read operation requires three steps, which can be described by the signals
being activated as follows:
1. R1out, MARin, Read
2. MDR inE , WMFC
3. MDRout, R2 in
where WMFC is the control signal that causes the processor's control circuitry to wait for the
arrival of the MFC signal.
STORING A WORD IN MEMORY
Writing a word into a memory location follows a similar procedure. The desired address is
loaded into MAR. Then, the data to be written are loaded into MDR, and a Write command is
issued. Hence, executing the instruction Move R2,(R 1) requires the following sequence:
1. R1out, MARin
2. R2out, MDRin, Write
3. MDRoutE, WMFC
As in the case of the read operation, the Write control signal causes the memory bus interface
hardware to issue a Write command on the memory bus. The processor remains in step 3 until the
memory operation is completed and an MFC response is received.

2.2 Explain the process complete instruction execution
Consider the instruction Add (R3),Rl , which adds the contents of a memory location
pointed to by R3 to register R1. Executing this instruction requires the following actions:
1. Fetch the instruction.
2. Fetch the first operand (the contents of the memory location pointed to by R3).
3. Perform the addition.
4. Load the result into R1.
Figure 2.5 gives the sequence of control steps required to perform these operations for the single-
bus architecture of Figure 2.1. Instruction execution proceeds as follows. In step 1, the instruction
fetch operation is initiated by loading the contents of the PC into the MAR and sending a Read
request to the memory. The Select signal is set to Select4, which causes the multiplexer MUX to
select the constant 4. This value is added to the operand at input B, which is the contents of the PC,
and the result is stored in register Z. The updated value is moved from register Z back into the PC
during step 2, while waiting for the memory to respond. In step 3, the word fetched from the
memory is loaded into the IR.
Steps 1 through 3 constitute the instruction fetch phase, which is the same for all instructions. The
instruction decoding circuit interprets the contents of the IR at the beginning of step 4. This
enables the control circuitry to activate the control signals for steps 4 through 7, which constitute
the execution phase. The contents of register R3 are transferred to the MAR in step 4, and a
memory read operation is initiated. Then
Figure 2.5 Control sequence for execution of instruction Add (R3), Rl.
the contents of R 1 are transferred to register Y in step 5, to prepare for the addition operation.
When the Read operation is completed, the memory operand is available in register MDR, and the
addition operation is performed in step 6. The contents of MDR are gated to the bus, and thus also
to the B input of the ALU, and register Y is selected as the second input to the ALU by choosing
Select Y The sum is stored in register Z, then transferred to R 1 in step 7. The End signal causes a
new instruction fetch cycle to begin by returning to step 1.
This discussion accounts for all control signals in Figure 2.5 except Y in in step 2. There is no need
to copy the updated contents of PC into register Y when executing the Add instruction. But, in
Branch instructions the updated value of the PC is needed to compute the Branch target address.
To speed up the execution of Branch instructions, this value is copied into register Y in step 2.

Since step 2 is part of the fetch phase, the same action will be performed for all instructions. This
does not cause any harm because register Y is not used for any other purpose at that time.
BRANCH INSTRUCTION
A branch instruction replaces the contents of the PC with the branch target address. This
address is usually obtained by adding an offset X, which is given in the branch instruction, to the
updated value of the PC. Figure 2.6 gives a control sequence that implements an unconditional
branch instruction. Processing starts, as usual, with the fetch phase. This phase ends when the
instruction is loaded into the IR in step 3. The offset value is extracted from the IR by the
instruction decoding circuit, which will also perform sign extension if required. Since the value of
the updated PC is already available in register Y, the offset X is gated onto the bus in step 4, and
an addition operation is performed. The result, which is the branch target address, is loaded into
the PC in step 5.
The offset X used in a branch instruction is usually the difference between the branch target
address and the address immediately following the branch instruction. For example, if the branch
instruction is at location 2000 and if the branch target address is 2050, the value of X must be 46.
The reason for this can be readily appreciated from the control sequence in Figure 2.6. The PC is
incremented during the fetch phase, before knowing the type of instruction being executed. Thus,
when the branch address is computed in step 4, the PC value used is the updated value, which
points to the instruction following the branch instruction in the memory.
Figure 2.6 Control sequence for an unconditional instruction.
Consider now a conditional branch. In this case, we need to check the status of the condition codes
before loading a new value into the PC. For example, for a Branch-on-negative (Branch <0)
instruction, step 4 in Figure 2.6 is replaced with
Thus, if N = 0 the processor returns to step 1 immediately after step 4. If N = 1, step 5 is performed
to load a new value into the PC, thus performing the branch operation.
2.3 Discuss multiple bus organization

In single bus organization, only one data item can be transferred over the bus in a clock
cycle. To reduce the number of steps needed, most commercial processors provide multiple
internal paths that enable several transfers to take place in parallel.
Figure 2.7 Three bus organization of data path.
Figure 2.7 illustrates a three-bus structure used to connect the registers and the ALU of a
processor. All general-purpose registers are combined into a single block called the register file.
The register file in Figure 2.7 is said to have three ports. There are two outputs, allowing the
contents of two different registers to be accessed simultaneously and have their contents placed on
R
A
L
U
Constant
B
A
Bus C
PC
Register
file
Instruction
Decoder
Bus BBus A
Incrementer
M
U
X
IR
MDR
MAR
Memory bus
data lines
Address
lines

buses A and B. The third port allows the data on bus C to be loaded into a third register during the
same clock cycle.
Buses A and B are used to transfer the source operands to the A and B inputs of the ALU,
where an arithmetic or logic operation may be performed. The result is transferred to the
destination over bus C. If needed, the ALU may simply pass one of its two input operands
unmodified to bus C. We will call the ALU control signals for such an operation R=A or R=B.
A second feature in Figure 2.7 is the introduction of the Incrementer unit, which is used to
increment the PC by 4. Using the Incrementer eliminates the need to add 4 to the PC using the
main ALD, as was done in single bus organization. The source for the constant 4 at the ALU input
multiplexer is still useful. It can be used to increment other addresses, such as the memory
addresses in LoadMultiple and StoreMultiple instructions.
Figure 2.8 Control sequence for execution of instruction Add R4,R5,R6 for 3 bus organization
Consider the three-operand instruction Add R4,R5,R6
The control sequence for executing this instruction is given in Figure 2.8. In step 1, the
contents of the PC are passed through the ALU, using the R=B control signal, and loaded into the
MAR to start a memory read operation. At the same time the PC is incremented by 4. Note that the
value loaded into MAR is the original contents of the PC. The incremented value is loaded into the
PC at the end of the clock cycle and will not affect the contents of MAR. In step 2, the processor
waits for MFC and loads the data received into MDR, then transfers them to IR in step 3. Finally,
the execution phase of the instruction requires only one control step to complete, step 4. By
providing more paths for data transfer a significant reduction in the number of clock cycles needed
to execute an instruction is achieved.

2.4 Discuss Hardwired control
To execute instructions, the processor must have some means of generating the control
signals needed in the proper sequence. Computer designers use a wide variety of techniques to
solve this problem. The approaches used fall into one of two categories: hardwired control and
microprogrammed control.
The required control signals are determined by the following information:
· Contents of the control step counter
· Contents of the instruction register
· Contents of the condition code flags
· External input signals, such as MFC and interrupt requests
Figure 2.10 Control unit organization.
To gain insight into the structure of the control unit, we start with a simplified view of the
hardware involved. The decoder/encoder block in Figure 2.10 is a combinational circuit that
generates the required control outputs, depending on the state of all its inputs. By separating the
decoding and encoding functions, we obtain the more detailed block diagram in Figure 2.11. The
step decoder provides a separate signal line for each step, or time slot, in the control sequence.
Similarly, the output of the instruction decoder consists of a separate line for each machine
instruction. For any instruction loaded in the IR, one of the output lines INS 1 through INS m is set
to 1, and all other lines are set to O. (For design details of decoders, refer to Appendix A.) The
input signals to the encoder block in Figure 2.11 are combined to generate the individual control
signals Y in , PC OUh Add, End, and so on. An example of how the encoder generates the Zin
control signal for the processor organization is given in Figure 2.12. This circuit implements the
logic function
This signal is asserted during time slot Tl for all instructions, during T6 for an Add
instruction, during T 4 for an unconditional branch instruction, and so on. Figure 2.13 gives a
circuit that generates the End control signal from the logic function
CLK
Decoder/
Encoder
IR
Control step
counter
External
inputs
Condition
codes
Control signals
.
.
.
…
.
.
.
.
.
.
Clock
…

The End signal starts a new instruction fetch cycle by resetting the control step counter to its
starting value. Figure 2.11 contains another control signal called RUN. When set to 1, RUN causes
the counter to be incremented by one at the end of every clock cycle. When RUN is equal to 0, the
counter stops counting. This is needed whenever the WMFC signal is issued, to cause the
processor to wait for the reply from the memory.
Figure 2.11 Separation of decoding and encoding functions.
The control hardware shown in Figure 2.10 or 2.11 can be viewed as a state machine that
changes from one state to another in every clock cycle, depending on the contents of the
instruction register, the condition codes, and the external inputs. The outputs of the state machine
are the control signals. The sequence of operations carried out by this machine is determined by
the wiring of the logic elements, hence the name "hardwired." A controller that uses this approach
can operate at high speed. However, it has little flexibility, and the complexity of the instruction
set it can implement is limited.
INSn
INS2
Run End
CLK
Encoder
Control step
counter
External
inputs
Condition
codes
Control signals
Instruction
Decoder
.
.
.
…
.
.
.
.
.
.
Clock
. . .
IR
.
.
.
Step decoder
T1 T2 . . . Tn
INS1

Figure 2.12 Generation of the Zin Control Figure 2.13 Generation of the
signal for the processor in figure 7.1. End Control signal.
A COMPLETE PROCESSOR
A complete processor can be designed using the structure shown in Figure 2.14.
This structure has an instruction unit that fetches instructions from an instruction cache or from the
main memory when the desired instructions are not already in the cache. It has separate processing
units to deal with integer data and floating-point data. A data cache is inserted between these units
and the main memory. Using separate caches for instructions and data is common practice in many
processors today. Other processors use a single cache that stores both instructions and data. The
processor is connected to the system bus and, hence, to the rest of the computer, by means of a bus
interface.
Figure 2.14 Block diagram of a complete processor.
…
T6T5
…
T7
Branch <0Add
End
T4
N N
Branch
Main
Memory
Processor
Instruction
Unit
Integer
Unit
Floating-point Unit
Instruction
Cache
Data
Cache
Bus Interface
Input/
Output
System bus
… …
T1
T4 T6
Branch Add
Zin

2.5 Micro programmed control
An alternative scheme for hardwired control is called micro programmed control in which
control signals are generated by a program similar to machine language programs.
Figure 2.15 An example of micro instructions for figure 2.6
A control word (CW) is a word whose individual bits represent the various control signals
Each of the control steps in the control sequence of an instruction defines a unique combination of
1s and 0s in the CW. The CW s corresponding to the 7 steps of Figure 2.6 are shown in Figure
2.15. SelectY is represented by Select = 0 and Select4 by Select = 1. A sequence of CW s
corresponding to the control sequence of a machine instruction constitutes the microroutine for that
instruction, and the individual control words in this microroutine are referred to as
microinstructions.
Figure 2.16 Basic organization of a microprogrammed control unit.
The microroutines for all instructions in the instruction set of a computer are stored in a
special memory called the control store. The control unit can generate the control signals for any
instruction by sequentially reading the CW s of the corresponding microroutine from the control
store. This suggests organizing the control unit as shown in Figure 2.16. To read the control words
sequentially from the control store, a microprogram counter (µPC) is used. Every time a new
Micro program
counter (µPC)
Control Store /
memory (µCM)
CW
Clock
IR
Starting
address
generator

instruction is loaded into the IR, the output of the block labeled "starting address generator" is
loaded into the µPC. The µPC is then automatically incremented by the clock, causing successive
microinstructions to be read from the control store. Hence, the control signals are delivered to
various parts of the processor in the correct sequence.
In microprogrammed control, an alternative approach is to use conditional branch
microinstructions. In addition to the branch address, these microinstructions specify which of the
external inputs, condition codes, or, possibly, bits of the instruction register should be checked as a
condition for branching to take place.
Figure 2.17 Micro routine for the instruction branch <0
Figure 2.18 Organization of a control unit to allow conditional branching in microprogram
The instruction Branch <0 may now be implemented by a microroutine such as that shown
in Figure 7.17. After loading this instruction into IR, a branch microinstruction transfers control to
the corresponding microroutine, which is assumed to start at location 25 in the control store. This
address is the output of the starting address generator block in Figure 7.16. The microinstruction at
location 25 tests N bit of the condition codes. If this bit is equal to 0, a branch takes place to
location 0 to fetch a new machine instruction. Otherwise, the microinstruction at location 26 is
Micro program
counter (µPC)
Control Store /
memory (µCM)
CW
Clock
IR
Starting and
branch address
generator
External inputs
Condition codes

Computer organization-and-architecture-questions-and-answers

Computer organization-and-architecture-questions-and-answers

Recomendados

Recomendados

Más contenido relacionado

La actualidad más candente

La actualidad más candente (20)

Similar a Computer organization-and-architecture-questions-and-answers

Similar a Computer organization-and-architecture-questions-and-answers (20)

Más de appasami

Más de appasami (20)

Último

Último (20)

Computer organization-and-architecture-questions-and-answers