1. Introduction toIntroduction to
Multiprocessor SystemsMultiprocessor Systems
V.V. SubrahmanyamV.V. Subrahmanyam
SOCIS, IGNOUSOCIS, IGNOU
Date: 23Date: 23rdrd
April, 2011April, 2011
Time: 13-00 to 13-45Time: 13-00 to 13-45
2. ObjectivesObjectives
IntroductionIntroduction
Processor CouplingProcessor Coupling
• Loosely Coupled SystemsLoosely Coupled Systems
• Tightly Coupled SystemsTightly Coupled Systems
Multiprocessor InterconnectionsMultiprocessor Interconnections
Types of Multiprocessor O/STypes of Multiprocessor O/S
Functions of Multiprocessor O/SFunctions of Multiprocessor O/S
Multiprocessor SynchronizationMultiprocessor Synchronization
3. Multiprocessor SystemsMultiprocessor Systems
A multiprocessor system is aA multiprocessor system is a
collection of a number of standardcollection of a number of standard
processors put together in anprocessors put together in an
innovative way to improve theinnovative way to improve the
performance / speed of computerperformance / speed of computer
hardware.hardware.
The main feature of this architectureThe main feature of this architecture
is to provide high speed at low costis to provide high speed at low cost
in comparison to uniprocessor.in comparison to uniprocessor.
4. Contd…Contd…
Multiprocessor operating systems aim toMultiprocessor operating systems aim to
support high performance through multiplesupport high performance through multiple
CPUs.CPUs.
An important goal is to make the numberAn important goal is to make the number
of CPUs transparent to the application.of CPUs transparent to the application.
Achieving such transparency is relativelyAchieving such transparency is relatively
easy because the communication betweeneasy because the communication between
different (parts of) applications uses thedifferent (parts of) applications uses the
same primitives asthose in multitaskingsame primitives asthose in multitasking
uni-processor operating systems.uni-processor operating systems.
5. Contd…Contd…
Multiprogramming is more appropriate toMultiprogramming is more appropriate to
describe this concept, which isdescribe this concept, which is
implemented mostly in software, whereasimplemented mostly in software, whereas
multiprocessing is more appropriate tomultiprocessing is more appropriate to
describe the use of multiple hardwaredescribe the use of multiple hardware
CPUs.CPUs.
The multiprocessor system is generallyThe multiprocessor system is generally
characterised by -characterised by - increased systemincreased system
throughputthroughput andand application speedupapplication speedup --
parallel processing.parallel processing.
6. Throughput and ApplicationThroughput and Application
SpeedupSpeedup
ThroughputThroughput can be improved, in acan be improved, in a time-time-
sharing environmentsharing environment, by executing a, by executing a
number of unrelated user processor onnumber of unrelated user processor on
different processors in parallel. As a resultdifferent processors in parallel. As a result
a large number of different tasks can bea large number of different tasks can be
completed in a unit of time without explicitcompleted in a unit of time without explicit
user direction.user direction.
AApplication speeduppplication speedup is possible byis possible by
creating a multiple processor scheduled tocreating a multiple processor scheduled to
work on different processors.work on different processors.
7. Processor CouplingProcessor Coupling
Multiprocessor systems have moreMultiprocessor systems have more
than one processing unit sharingthan one processing unit sharing
memory/peripheral devices. Theymemory/peripheral devices. They
have greater computing power, andhave greater computing power, and
higher reliability. Multiprocessorhigher reliability. Multiprocessor
systems are classified into two:systems are classified into two:
• Tightly-coupledTightly-coupled
• Loosely-coupledLoosely-coupled
8. Tightly- Coupled SystemsTightly- Coupled Systems
Each processor is assigned a specific dutyEach processor is assigned a specific duty
but processors work in close association,but processors work in close association,
possibly sharing one memory module.possibly sharing one memory module.
Tightly-coupled multiprocessor systemsTightly-coupled multiprocessor systems
contain multiple CPUs that are connectedcontain multiple CPUs that are connected
at the bus level.at the bus level.
These CPUs may have access to a centralThese CPUs may have access to a central
shared memory (SMP), or may participateshared memory (SMP), or may participate
in a memory hierarchy with both local andin a memory hierarchy with both local and
shared memory (NUMA).shared memory (NUMA). The IBM p690The IBM p690
RegattaRegatta is an example of a high end SMPis an example of a high end SMP
system.system.
9. Contd…Contd…
Mainframe systems with multipleMainframe systems with multiple
processors are often tightly-coupled.processors are often tightly-coupled.
Tightly-coupled systems perform betterTightly-coupled systems perform better
and are physically smaller than loosely-and are physically smaller than loosely-
coupled systems, but have historicallycoupled systems, but have historically
required greater initial investments andrequired greater initial investments and
may depreciate rapidly.may depreciate rapidly.
10. Contd…Contd…
Tightly-coupled systems tend toTightly-coupled systems tend to
be much more energy efficientbe much more energy efficient
than clusters. This is due to factthan clusters. This is due to fact
that considerable economies canthat considerable economies can
be realised by designingbe realised by designing
components to work togethercomponents to work together
from the beginning in tightly-from the beginning in tightly-
coupled systems.coupled systems.
11. Loosely-Coupled SystemsLoosely-Coupled Systems
Loosely-coupled multiprocessorLoosely-coupled multiprocessor
systems often referred to as clusterssystems often referred to as clusters
are based on multiple standaloneare based on multiple standalone
single or dual processor commoditysingle or dual processor commodity
computers interconnected via a highcomputers interconnected via a high
speed communication system.speed communication system.
A Linux BeowulfA Linux Beowulf is an example of ais an example of a
loosely-coupled system.loosely-coupled system.
12. Contd…Contd…
Nodes in a loosely-coupled systemNodes in a loosely-coupled system
are usually inexpensive commodityare usually inexpensive commodity
computers and can be recycled ascomputers and can be recycled as
independent machines uponindependent machines upon
retirement from the cluster.retirement from the cluster.
Loosely coupled systems are notLoosely coupled systems are not
much energy efficient whenmuch energy efficient when
compared with tightly coupled.compared with tightly coupled.
15. Contd…Contd…
Processors and memory areProcessors and memory are
connected by a common bus.connected by a common bus.
Communication between processorsCommunication between processors
(P1, P2, P3 and P4) and with globally(P1, P2, P3 and P4) and with globally
shared memory is possible over ashared memory is possible over a
shared bus.shared bus.
16. DisadvantageDisadvantage
The above architecture gives rise toThe above architecture gives rise to
a problem ofa problem of ContentionContention at twoat two
points, one is shared bus and thepoints, one is shared bus and the
other is shared memory.other is shared memory.
17. Two Ways to OvercomeTwo Ways to Overcome
Cache along with the Shared MemoryCache along with the Shared Memory
Cache associated with eachCache associated with each
individual processorindividual processor
18. Cache with Shared MemoryCache with Shared Memory
P 2 P 3 P 4
Cache
Shared Memory
P1 P2 P3 P4
19. Cache with each individualCache with each individual
processorprocessor
P 2 P 3 P 4
Shared Memory
P1 P2 P3 P4
Cache Cache Cache Cache
20. TechniquesTechniques
Techniques which can be employedTechniques which can be employed
to decrease the impact ofto decrease the impact of busbus andand
memory saturationmemory saturation in bus-orientedin bus-oriented
system.system.
• Wider Bus TechniqueWider Bus Technique
• Split Request / Reply ProtocolsSplit Request / Reply Protocols
21. Wider Bus TechniqueWider Bus Technique
A bus is made wider so that moreA bus is made wider so that more
bytes can be transferred in a singlebytes can be transferred in a single
bus cycle. In other words, a widerbus cycle. In other words, a wider
parallel bus increases the bandwidthparallel bus increases the bandwidth
by transferring more bytes in aby transferring more bytes in a
single bus cycle.single bus cycle.
22. Split Request / Reply ProtocolsSplit Request / Reply Protocols
The memory request and reply areThe memory request and reply are
split into two individual works andsplit into two individual works and
are treated as separate busare treated as separate bus
transactions. As soon as atransactions. As soon as a
processor requests a block, the busprocessor requests a block, the bus
released to other user, meanwhile itreleased to other user, meanwhile it
takes for memory to fetch andtakes for memory to fetch and
assemble the related group ofassemble the related group of
items.items.
23. Other SchemesOther Schemes
Individual processors may or may notIndividual processors may or may not
have private/cache memory.have private/cache memory.
Individual processors may or may notIndividual processors may or may not
attach with input/output devices.attach with input/output devices.
Input/output devices may be attached toInput/output devices may be attached to
shared bus.shared bus.
Shared memory implemented in the formShared memory implemented in the form
of multiple physical banks connected toof multiple physical banks connected to
the shared bus.the shared bus.
25. ContdContd
It is a grid structure of processor andIt is a grid structure of processor and
memory modules.memory modules.
The every cross point of grid structure isThe every cross point of grid structure is
attached with switch. By looking at theattached with switch. By looking at the
Figure,Figure, it shows simultaneous accessit shows simultaneous access
between processor and memory modulesbetween processor and memory modules
asas NN number of processors are providednumber of processors are provided
withwith NN number of memory modules. Thusnumber of memory modules. Thus
each processor accesses a differenteach processor accesses a different
memory module.memory module.
26. Contd…Contd…
Crossbar needsCrossbar needs NN22
switches for fullyswitches for fully
connected network betweenconnected network between
processors and memory.processors and memory.
Processors may or may not haveProcessors may or may not have
their private memories.their private memories.
28. Contd…Contd…
This architecture has some advantagesThis architecture has some advantages
over other architectures of multiprocessingover other architectures of multiprocessing
system.system.
In anIn an n-degreen-degree hypercube architecture, wehypercube architecture, we
have:have:
• 22nn
nodes (Total number of processors)nodes (Total number of processors)
• Nodes are arranged in n-dimensionalNodes are arranged in n-dimensional
cube, i.e. each node is connected tocube, i.e. each node is connected to nn
number of nodes.number of nodes.
• Each node is assigned with a uniqueEach node is assigned with a unique
address which lies between 0 to 2address which lies between 0 to 2nn
–1–1
• The adjacent nodes (n-1) are differing inThe adjacent nodes (n-1) are differing in
1 bit and the1 bit and the nthnth node is havingnode is having
maximum ‘maximum ‘nn’ inter-node distance.’ inter-node distance.
29. ExampleExample
3-degree hypercube will have 23-degree hypercube will have 2nn
nodesnodes
i.e., 2i.e., 233
= 8 nodes= 8 nodes
Nodes are arranged in 3-dimensionalNodes are arranged in 3-dimensional
cube, that is, each node is connected tocube, that is, each node is connected to
3 no. of nodes.3 no. of nodes.
Each node is assigned with a uniqueEach node is assigned with a unique
address, which lies between 0 to 7 (2address, which lies between 0 to 7 (2nn
–1), i.e., 000, 001, 010, 011, 100, 101,–1), i.e., 000, 001, 010, 011, 100, 101,
110, 111110, 111
Two adjacent nodes differing in 1 bitTwo adjacent nodes differing in 1 bit
(001, 010) and the 3(001, 010) and the 3rdrd
(n(nthth
) node is) node is
having maximum ‘3’ inter-node distancehaving maximum ‘3’ inter-node distance
(100).(100).
30. Contd…Contd…
Hypercube provide a good basis for scalableHypercube provide a good basis for scalable
system because its communication length growssystem because its communication length grows
logarithmically with the number of nodes.logarithmically with the number of nodes.
It provides a bi-directional communicationIt provides a bi-directional communication
between two processors.between two processors.
It is usually used in loosely coupledIt is usually used in loosely coupled
multiprocessor system because the transfer ofmultiprocessor system because the transfer of
data between two processors goes throughdata between two processors goes through
several intermediate processors.several intermediate processors.
The longest internodes delay isThe longest internodes delay is n-degreen-degree..
To increase the input/output bandwidth theTo increase the input/output bandwidth the
input/output devices can be attached with everyinput/output devices can be attached with every
node (processor).node (processor).
31. Multistage Switch-based SystemMultistage Switch-based System
Multistage Switch Based SystemMultistage Switch Based System
permits simultaneous connectionpermits simultaneous connection
between several input-output pairs.between several input-output pairs.
It consists of several stages ofIt consists of several stages of
switches which provide multistageswitches which provide multistage
interconnection network.interconnection network.
32. Contd…Contd…
AA NN input-output connections containsinput-output connections contains
K= log2N stages ofK= log2N stages of N/2N/2 switches at eachswitches at each
stage. In simple words,stage. In simple words, N*NN*N processor-processor-
memory interconnection network requiresmemory interconnection network requires
(N/2) x = log(N/2) x = log22N switches.N switches.
In a 8X8 process-memory interconnectionIn a 8X8 process-memory interconnection
network requires (8/2* lognetwork requires (8/2* log228) = 4*3= 128) = 4*3= 12
switches. Each switch acts as 2X2switches. Each switch acts as 2X2
crossbar.crossbar.
33. Types of Multiprocessor O/STypes of Multiprocessor O/S
The multiprocessor o/s are complexThe multiprocessor o/s are complex
in comparison to multiprograms onin comparison to multiprograms on
an uniprocessor operating systeman uniprocessor operating system
because multiprocessor executesbecause multiprocessor executes
tasks concurrently.tasks concurrently.
Therefore, it must be able to supportTherefore, it must be able to support
the concurrent execution of multiplethe concurrent execution of multiple
tasks to increase processorstasks to increase processors
performance.performance.
34. Contd…Contd…
Depending upon the control structureDepending upon the control structure
and its organisation the three basicand its organisation the three basic
types of multiprocessor operatingtypes of multiprocessor operating
system are:system are:
• Separate SupervisorSeparate Supervisor
• Master-SlaveMaster-Slave
• Symmetric SupervisionSymmetric Supervision
35. Separate SupervisorSeparate Supervisor
In separate supervisor system eachIn separate supervisor system each
process behaves independently.process behaves independently.
Each system has its own operating systemEach system has its own operating system
which manages local input/output devices,which manages local input/output devices,
file system and memory well as keeps itsfile system and memory well as keeps its
own copy of kernel, supervisor and dataown copy of kernel, supervisor and data
structures, whereas some common datastructures, whereas some common data
structures also exist for communicationstructures also exist for communication
between processors.between processors.
The access protection is maintained,The access protection is maintained,
between processor, by using somebetween processor, by using some
synchronization mechanism likesynchronization mechanism like
semaphores.semaphores.
36. LimitationsLimitations
Such architecture will face theSuch architecture will face the
following problems:following problems:
Little coupling among processors.Little coupling among processors.
Parallel execution of single task.Parallel execution of single task.
During process failure it degrades.During process failure it degrades.
Inefficient configuration as theInefficient configuration as the
problem of replication arises betweenproblem of replication arises between
supervisor/kernel/data structuresupervisor/kernel/data structure
code and each processor.code and each processor.
37. Master-SlaveMaster-Slave
In master-slave, out of many processors oneIn master-slave, out of many processors one
processor behaves as a master whereas othersprocessor behaves as a master whereas others
behave as slaves.behave as slaves.
The master processor is dedicated to executingThe master processor is dedicated to executing
the operating system.the operating system.
It works as scheduler and controller over slaveIt works as scheduler and controller over slave
processors. It schedules the work and alsoprocessors. It schedules the work and also
controls the activity of the slaves. Therefore,controls the activity of the slaves. Therefore,
usually data structures are stored in its privateusually data structures are stored in its private
memory.memory.
Slave processors are often identified and workSlave processors are often identified and work
only as a schedulable pool of resources, in otheronly as a schedulable pool of resources, in other
words, the slave processors execute applicationwords, the slave processors execute application
programmes.programmes.
38. Contd…Contd…
This arrangement allows the parallelThis arrangement allows the parallel
execution of a single task by allocatingexecution of a single task by allocating
several subtasks to multiple processorsseveral subtasks to multiple processors
concurrently. Since the operating systemconcurrently. Since the operating system
is executed by only master processors thisis executed by only master processors this
system is relatively simple to develop andsystem is relatively simple to develop and
efficient to use. Limited scalability is theefficient to use. Limited scalability is the
main limitation of this system, becausemain limitation of this system, because
the master processor become a bottleneckthe master processor become a bottleneck
and will consequently fail to fully utilizeand will consequently fail to fully utilize
slave processors.slave processors.
39. SymmetricSymmetric
In symmetric organization all processorsIn symmetric organization all processors
configuration are identical.configuration are identical.
All processors are autonomous and are treatedAll processors are autonomous and are treated
equally. To make all the processors functionallyequally. To make all the processors functionally
identical, all the resources are pooled and areidentical, all the resources are pooled and are
available to them.available to them.
This operating system is also symmetric as anyThis operating system is also symmetric as any
processor may execute it. In other words there isprocessor may execute it. In other words there is
one copy of kernel that can be executed by allone copy of kernel that can be executed by all
processors concurrently. To that end, the wholeprocessors concurrently. To that end, the whole
process is needed to be controlled for properprocess is needed to be controlled for proper
interlocks for accessing scarce data structure andinterlocks for accessing scarce data structure and
pooled resources.pooled resources.
40. Contd…Contd…
The simplest way to achieve this is to treat theThe simplest way to achieve this is to treat the
entire operating system as a critical section andentire operating system as a critical section and
allow only one processor to execute the operatingallow only one processor to execute the operating
system at one time.system at one time.
This method is called ‘floating master’ methodThis method is called ‘floating master’ method
because in spite of the presence of manybecause in spite of the presence of many
processors only one operating system exists.processors only one operating system exists.
The processor that executes the operating systemThe processor that executes the operating system
has a special role and acts as a master.has a special role and acts as a master.
As the operating system is not bound to anyAs the operating system is not bound to any
specific processor, therefore, it floats from onespecific processor, therefore, it floats from one
processor to another.processor to another.
41. Contd…Contd…
Parallel execution of different applicationsParallel execution of different applications
is achieved by maintaining a queue ofis achieved by maintaining a queue of
ready processors in shared memory.ready processors in shared memory.
Processor allocation is then reduced toProcessor allocation is then reduced to
assigning the first ready process to firstassigning the first ready process to first
available processor until either allavailable processor until either all
processors are busy or the queue isprocessors are busy or the queue is
emptied. Therefore, each idled processoremptied. Therefore, each idled processor
fetches the next work item from thefetches the next work item from the
queue.queue.
42. Multiprocessor O/S Functions andMultiprocessor O/S Functions and
RequirementsRequirements
A multiprocessor operating systemA multiprocessor operating system
manages all the available resources andmanages all the available resources and
schedule functionality to form anschedule functionality to form an
abstraction.abstraction.
It will facilitates programme execution andIt will facilitates programme execution and
interaction with users.interaction with users.
A processor is one of the important andA processor is one of the important and
basic types of resources that need to bebasic types of resources that need to be
managed. For effective use ofmanaged. For effective use of
multiprocessors the processor schedulingmultiprocessors the processor scheduling
is necessary.is necessary.
43. Processor SchedulingProcessor Scheduling
Processors scheduling undertakes theProcessors scheduling undertakes the
following tasks:following tasks:
• Allocation of processors among applications inAllocation of processors among applications in
such a manner that will be consistent withsuch a manner that will be consistent with
system design objectives. It affects the systemsystem design objectives. It affects the system
throughput. Throughput can be improved bythroughput. Throughput can be improved by
co-scheduling several applications together,co-scheduling several applications together,
thus availing fewer processors to each.thus availing fewer processors to each.
• Ensure efficient use of processors allocation toEnsure efficient use of processors allocation to
an application. This primarily affects thean application. This primarily affects the
speedup of the system.speedup of the system.
44. Contd…Contd…
The two primary facets of OS supportThe two primary facets of OS support
for multiprocessing are:for multiprocessing are:
• Flexible and efficient interprocess andFlexible and efficient interprocess and
interprocessor synchronizationinterprocessor synchronization
mechanism, andmechanism, and
• Efficient creation and management of aEfficient creation and management of a
large number of threads of activity, suchlarge number of threads of activity, such
as processes or threads.as processes or threads.
45. Memory ManagementMemory Management
In multiprocessors system memoryIn multiprocessors system memory
management is highly dependent on themanagement is highly dependent on the
architecture and inter-connection scheme.architecture and inter-connection scheme.
• In loosely coupled systems memory is usuallyIn loosely coupled systems memory is usually
handled independently on a pre-processorhandled independently on a pre-processor
basis whereas in multiprocessor system sharedbasis whereas in multiprocessor system shared
memory may be simulated by means of amemory may be simulated by means of a
message passing mechanism.message passing mechanism.
• In shared memory systems the operatingIn shared memory systems the operating
system should provide a flexible memorysystem should provide a flexible memory
model that facilitates safe and efficient accessmodel that facilitates safe and efficient access
to share data structures and synchronizationto share data structures and synchronization
variables.variables.
46. Device ManagementDevice Management
The third basic resource isThe third basic resource is DeviceDevice
ManagementManagement but it has received littlebut it has received little
attention in multiprocessor systems toattention in multiprocessor systems to
date, because earlier the main focus pointdate, because earlier the main focus point
is speedup of compute intensiveis speedup of compute intensive
application, which generally do notapplication, which generally do not
generate much input/output after thegenerate much input/output after the
initial loading.initial loading.
Now, multiprocessors are applied for moreNow, multiprocessors are applied for more
balanced general-purpose applications,balanced general-purpose applications,
therefore, the input/output requirementtherefore, the input/output requirement
increases in proportion with the realisedincreases in proportion with the realised
throughput and speed.throughput and speed.
47. Multiprocessor SynchronizationMultiprocessor Synchronization
Multiprocessor system facilitates parallelMultiprocessor system facilitates parallel
program execution and read/write sharingprogram execution and read/write sharing
of data and thus may cause the processorsof data and thus may cause the processors
to concurrently access location in theto concurrently access location in the
shared memory. Therefore, a correct andshared memory. Therefore, a correct and
reliable mechanism is needed to serializereliable mechanism is needed to serialize
this access. This is called synchronizationthis access. This is called synchronization
mechanism.mechanism.
The mechanism should make access to aThe mechanism should make access to a
shared data structure appear atomic withshared data structure appear atomic with
respect to each other.respect to each other.