Parallel programs for multi-processor computers!
Author: Igor Oreschenkov

Date: 04.02.2009


Abstract
For instance, when a man cannot manage everything himself, he creates a double: brainless,
irresponsible, able only to solder contacts, or carry heavy loads, or take dictation, but doing this very
well.

Arkady and Boris Strugatsky

This article is a beginner's introduction to parallel programming. It was published in "PC World"
magazine (http://www.viva64.com/go.php?url=429).


A parallel program: a particular case of the division of labor
We would hardly see the world around us as it is if mankind had not, in its time, invented the principle
of the division of labor. Thanks to it there are professionals who specialize in a particular field and
perform their work perfectly. Applying this approach in industry increased output tenfold. So it is no
wonder that when computers appeared, the question arose of applying the division of labor to the
operation of computer programs. Research in this field revealed a pleasing fact: complicated
mathematical calculations, related, for example, to matrix operations, can be split into independent
subtasks. Each subtask can be executed separately, and the solution of the source task can then be
obtained by combining the results of the subtasks. Since the subtasks are independent, they can be
solved simultaneously by several executors. If a computer takes the role of the executor, this approach
lets us tackle computationally demanding tasks: realistic modeling of real-world objects, cryptanalysis,
real-time control of technological processes. It is on this principle that all supercomputers, the pride of
scientific laboratories and computing centers, operate.

Having run into fundamental physical limits on single-processor performance, computer manufacturers
began to increase the number of computational nodes in their products. Now, inside the chassis of an
ordinary PC, you may find what would earlier have been called a supercomputer: a multi-core
computing device. Operating systems have been ready for this technology for a long time. Unix and
Linux, OS/2, Mac OS and modern versions of Microsoft Windows all gain performance when several
programs execute simultaneously. For example, while you watch a film, an antivirus scan of your hard
disk can run without your noticing it. But there are situations when a single task demands a lot of
computational resources. Leaving aside nuclear fusion or cracking the password to an archive, think of
such common things as music and video encoding or computer games. Programs implementing these
tasks fully load the CPU, yet adding processors to the system will give no marked performance gain in
these examples unless special methods were used while developing the programs. Indeed, how can an
operating system know the internals of an encoding method well enough to distribute its calculations
between processors? Can it know how to parallelize the calculation of a game scene?


Processes and threads - twins
So, we have a multi-processor computer. We have analyzed the task and singled out the subtasks that
can be solved simultaneously. How do we implement their solution in a program? Modern operating
systems offer two forms of code execution: processes and threads.

A process is a program loaded into main memory and ready for execution. So when we do two things at
once - surf the news on the Internet and burn files to a CD - two processes execute simultaneously: an
Internet browser and a CD-writing program. A process consists of program code, the data it operates
on, and various resources, such as files or system queues, belonging to the program. Each process
executes in its own address space, i.e. it has access only to its own area of main memory. On the one
hand this is good: one process cannot interfere with another (unless both address a non-shareable
resource, such as the disk drive, but that is a different matter), and errors in one program in no way
affect another. On the other hand, if the processes are meant to solve one common task, a question
arises: how will they exchange information and interact at all?

Launching processes - Unix and Windows
Interestingly, the developers of Microsoft Windows and of Unix (Linux) operating systems took different
approaches to launching a child process. From the viewpoint of a programmer using the WIN32 API
everything is quite natural: at the point where an auxiliary process must be launched, the
CreateProcess() function is called, one of whose arguments specifies the EXE file with the program to
be executed.
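
For illustration, here is a minimal sketch of this call (the child program name "worker.exe" is invented;
error handling is reduced to a single check):

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        STARTUPINFOA si = { sizeof(si) };       /* must carry its own size */
        PROCESS_INFORMATION pi;
        char cmd[] = "worker.exe";              /* hypothetical child program */

        /* launch the child process and wait until it terminates */
        if (!CreateProcessA(NULL, cmd, NULL, NULL, FALSE, 0, NULL, NULL, &si, &pi)) {
            printf("CreateProcess failed: %lu\n", GetLastError());
            return 1;
        }
        WaitForSingleObject(pi.hProcess, INFINITE);
        CloseHandle(pi.hThread);
        CloseHandle(pi.hProcess);
        return 0;
    }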

In Unix-like operating systems this is done differently. At the point in the code where a new process is
needed, the fork() system call is used. As a result the running program "forks": the state of the program
at the moment of the fork() call is copied (by state we mean the CPU register values, the stack, the data
area and the list of open files), and on the basis of this copy a new process is launched possessing the
same information as its parent. Both the parent and the child continue executing the same program,
beginning with the instruction that follows the fork() call. Since doing the same work twice is pointless,
a question arises: how do we make each of the resulting processes solve its own independent subtask?
The point is that, from the parent's viewpoint, fork() returns the identifier of the child process, while in
the child fork() simply returns zero. Each process can therefore identify itself by this code, and all the
rest is a technical matter: the child only has to prepare an environment for a new program, load its code
with the exec() family of system calls and transfer control to it.
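
A minimal sketch of this fork()/exec() pattern ("./worker" is a hypothetical program name):

    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        pid_t pid = fork();                 /* the program "forks" here */
        if (pid == 0) {
            /* child: fork() returned zero, so load and run a new program */
            execl("./worker", "worker", (char *)NULL);
            perror("execl");                /* reached only if exec failed */
            return 1;
        }
        /* parent: fork() returned the child's identifier */
        printf("started child %d\n", (int)pid);
        waitpid(pid, NULL, 0);              /* wait for the child to finish */
        return 0;
    }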

A thread is a means of forking execution inside a process. A process can contain several threads, each
executed independently of the others with its own register values and its own stack. Threads have
access to all the global resources of their process, such as open files or memory segments. Every
process has at least one thread, the main thread. When the solution of a task reaches a section that
allows parallelization, the process spawns the necessary number of additional threads, which execute
simultaneously, solve their subtasks and return the results to the main thread.
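
As a sketch, with POSIX threads this "spawn, solve, return to the main thread" pattern looks roughly as
follows (four workers, each computing an invented partial result; compile with -pthread):

    #include <pthread.h>
    #include <stdio.h>

    #define WORKERS 4

    static void *subtask(void *arg)
    {
        long id = (long)arg;
        return (void *)(id * id);               /* stand-in for a real partial result */
    }

    int main(void)
    {
        pthread_t tid[WORKERS];
        long total = 0;

        for (long i = 0; i < WORKERS; i++)      /* spawn additional threads */
            pthread_create(&tid[i], NULL, subtask, (void *)i);
        for (long i = 0; i < WORKERS; i++) {    /* collect the results */
            void *res;
            pthread_join(tid[i], &res);
            total += (long)res;
        }
        printf("combined result: %ld\n", total);
        return 0;
    }
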
To solve complex tasks it is not enough simply to execute different code sections in parallel. You also
need to exchange the results of their work, that is, to organize interaction between these sections. At
first glance, with threads this is easy: just give them access to a common area of main memory. But this
simplicity is deceptive, because writes to main memory by different threads can interleave quite
unpredictably, so that the integrity of the information kept in that memory area is broken.

An example of thread interaction
Imagine the billing system of a cellular operator running on a powerful multi-processor server.
Naturally, charging the cost of provided services off a client's account is performed by one program
thread, while crediting payments to the account is performed by another. The first operation can be
written as S := S - A, where S is the account balance at the moment of the operation and A is the cost of
the phone calls made. The processor executes this operation in three steps: first the current value of
the variable S is read, then A is subtracted from it, and finally the result is written back into the cell S.
The second operation, S := S + B, where B is the sum paid by the client, is performed similarly. Suppose
the client makes a call shortly after paying at a bank office. Since payments are credited to the account
with some delay, for objective reasons, both operations - the debit and the credit - may be executed in
the billing system at nearly the same time. Suppose the debit is executed first and the credit follows
one step behind (figure 1). After the three steps of the debit operation the account balance equals
S - A, but the fourth step turns this value into S + B.
Figure 1 - An example of a billing system's operation

Thus the final balance equals S + B instead of the correct value S - A + B. With different timing, such a
collision could just as easily turn out badly for the client. To avoid such problems, special measures
must be taken to protect certain program sections from simultaneous execution.
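
The lost update is easy to reproduce. In this sketch two POSIX threads repeatedly debit and credit a
shared balance without any protection; run a few times (compiled with -pthread), the final value is
almost never the expected zero:

    #include <pthread.h>
    #include <stdio.h>

    static long S = 0;                     /* shared account balance */

    static void *debit(void *arg)
    {
        for (int i = 0; i < 1000000; i++)
            S = S - 1;                     /* read, subtract, write: three steps */
        return NULL;
    }

    static void *credit(void *arg)
    {
        for (int i = 0; i < 1000000; i++)
            S = S + 1;
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, debit, NULL);
        pthread_create(&t2, NULL, credit, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("balance = %ld (expected 0)\n", S);
        return 0;
    }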

In the following sections on synchronization and interaction of simultaneously executed program
sections, the terms "process" and "thread" will be used together, as the mechanisms described apply in
most cases to both notions.


Synchronization...
Modern operating systems offer a wide range of means for synchronizing parallel processes. The
simplest way to synchronize simultaneously executed code sections is to have one thread wait for
another to finish. The same mechanism underlies operating system support for asynchronous
operations, such as file input/output or data exchange over a network. Wait functions can also be used
for mutual notification about events (figure 2).
Figure 2 - An event is an operating system object that can switch its state from "nonsignaled" to "signaled"

For example, when developing a bank's trading-day system, an event can be registered to announce the
arrival of funds in a client's account. The thread that executes the client's payment orders can wait for
this event.
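
A WIN32 sketch of this idea: one thread waits on an event object while another signals it (the payment
scenario is, of course, invented):

    #include <windows.h>
    #include <stdio.h>

    static HANDLE g_funds_arrived;          /* the event object */

    static DWORD WINAPI payment_thread(LPVOID arg)
    {
        /* block until another thread switches the event to "signaled" */
        WaitForSingleObject(g_funds_arrived, INFINITE);
        puts("funds arrived - executing the client's payment order");
        return 0;
    }

    int main(void)
    {
        g_funds_arrived = CreateEventA(NULL, FALSE, FALSE, NULL);  /* auto-reset */
        HANDLE t = CreateThread(NULL, 0, payment_thread, NULL, 0, NULL);

        Sleep(100);                         /* pretend money is being received */
        SetEvent(g_funds_arrived);          /* announce the event */

        WaitForSingleObject(t, INFINITE);
        CloseHandle(t);
        CloseHandle(g_funds_arrived);
        return 0;
    }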

If different program sections may modify some variable simultaneously, that variable must be
protected. For example, if one thread calculates a game scene and another repaints it, the repainting
should be performed only after all the necessary calculations are done. For this purpose operating
systems introduce a special object, the mutex (from "mutual exclusion").




Figure 3 - A mutex is a system object that can be in one of two internal states: "free" or "busy".

Before a thread begins to change the variable, it must capture the mutex corresponding to that
variable. If it succeeds, the mutex changes its internal state and the thread continues execution. If
another thread tries to capture the same mutex while the first thread is working with the variable, it is
refused and has to wait until the first thread releases the mutex (figure 3). Thus it is guaranteed that at
any given moment only one code section works with the variable.
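
A sketch of how a POSIX mutex would repair the billing example from above (the amounts are invented;
each thread captures the mutex before touching the balance):

    #include <pthread.h>
    #include <stdio.h>

    static long S = 100;                    /* shared account balance */
    static pthread_mutex_t S_lock = PTHREAD_MUTEX_INITIALIZER;

    static void *charge_off(void *arg)      /* debit thread */
    {
        pthread_mutex_lock(&S_lock);        /* capture the mutex, or wait */
        S = S - 30;                         /* the three-step update is now safe */
        pthread_mutex_unlock(&S_lock);      /* release it for the other thread */
        return NULL;
    }

    static void *charge_on(void *arg)       /* credit thread */
    {
        pthread_mutex_lock(&S_lock);
        S = S + 50;
        pthread_mutex_unlock(&S_lock);
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, charge_off, NULL);
        pthread_create(&t2, NULL, charge_on, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("balance = %ld (always 120)\n", S);
        return 0;
    }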

There are situations in programming when several global variables must be modified within one code
section. If that code may be executed by several threads simultaneously, it must be protected.
Specifically for such cases, operating systems let you arrange the corresponding code into a critical
section.




Figure 4 - A critical section is a program section that can be executed by only one thread at a time.

The critical section can be executed by only one thread at a time. Another thread can inquire whether
the critical section is free and, if it is not, either wait until it becomes free or do other work (figure 4).
Critical sections also differ from mutexes in that the same thread may re-enter a critical section it has
already captured as many times as it likes, whereas a thread that tries to capture a mutex it already
holds will block, and a so-called "deadlock" can occur.
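
In WIN32 terms this looks roughly as follows (a sketch with two invented shared globals;
TryEnterCriticalSection() is the "inquire instead of wait" variant mentioned above):

    #include <windows.h>

    static CRITICAL_SECTION cs;
    static long x, y;                       /* several shared global variables */

    static DWORD WINAPI worker(LPVOID arg)
    {
        EnterCriticalSection(&cs);          /* only one thread at a time gets in */
        x++;                                /* modify all the globals together */
        y--;
        LeaveCriticalSection(&cs);
        return 0;
    }

    int main(void)
    {
        InitializeCriticalSection(&cs);
        HANDLE t1 = CreateThread(NULL, 0, worker, NULL, 0, NULL);
        HANDLE t2 = CreateThread(NULL, 0, worker, NULL, 0, NULL);
        WaitForSingleObject(t1, INFINITE);
        WaitForSingleObject(t2, INFINITE);
        CloseHandle(t1);
        CloseHandle(t2);
        DeleteCriticalSection(&cs);
        return 0;
    }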

Suppose we have a set of resources of a single type, for example several windows for displaying
information or a pool of network printers. To keep track of how such resources are distributed between
threads, we use semaphores.
Figure 5 - A semaphore keeps track of how resources of a single type are distributed between threads

A semaphore is a system object that acts as a counter. Before a thread starts working with a resource
from the set, it must address the semaphore associated with that set. If the semaphore counter is
greater than zero, the thread is allowed to work with a resource and the counter is decremented by
one (figure 5). If all the resources of the set are in use, the thread is enqueued. When a resource is
freed, the thread notifies the semaphore, the counter is incremented, and the resource can be used
again.
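
A POSIX sketch: a semaphore initialized to the size of an invented printer pool, so at most three threads
can print at once:

    #include <semaphore.h>
    #include <stdio.h>

    #define PRINTERS 3
    static sem_t printers;                  /* counter = number of free printers */

    void print_job(void)
    {
        sem_wait(&printers);                /* counter > 0: take one, decrement */
        puts("printing on one printer from the pool");
        sem_post(&printers);                /* done: increment, wake a waiter */
    }

    int main(void)
    {
        sem_init(&printers, 0, PRINTERS);   /* 0 = shared between threads only */
        print_job();
        sem_destroy(&printers);
        return 0;
    }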

Sometimes it is necessary to interrupt a process so that it performs some urgent actions. Some
operating systems use signals for this purpose. A process that needs another process to execute a
special operation sends it a signal. When writing that other process, a signal-processing procedure
should be implemented: on receiving a signal the process can suspend or terminate its work, execute a
special subprogram created for such a case, or simply ignore the signal.
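
A Unix sketch of such a signal-processing procedure (the "special work" is a placeholder):

    #include <signal.h>
    #include <stdio.h>
    #include <unistd.h>

    static volatile sig_atomic_t got_signal = 0;

    static void on_usr1(int sig)            /* the signal-processing procedure */
    {
        got_signal = 1;
    }

    int main(void)
    {
        struct sigaction sa;
        sa.sa_handler = on_usr1;
        sa.sa_flags = 0;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGUSR1, &sa, NULL);      /* register the handler */

        printf("pid %d: waiting for SIGUSR1...\n", (int)getpid());
        while (!got_signal)
            pause();                        /* sleep until a signal arrives */
        puts("signal received - running the special subprogram");
        return 0;
    }

Another process would send the signal with kill(pid, SIGUSR1); from a shell, kill -USR1 <pid> does the
same.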


... and interaction
As said above, to solve a common task together, processes may need not only to coordinate their work
but also to exchange information, for example intermediate results. For this purpose you can use
shared main memory, information exchange through a file, data transfer through unidirectional pipes,
or network mechanisms (figure 6).
Figure 6 - Ways of exchanging data between processes

The quickest and most obvious way for simultaneously executed code sections to exchange information
is a shared area of main memory. One process writes data into the memory, the other reads them, and
vice versa. But in this case you must synchronize the processes by one of the ways mentioned above.
And while threads can exchange data through shared memory directly, processes must first request the
necessary memory segment from the operating system and agree on how to use it.
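
A POSIX sketch of that "request a segment first" step (the segment name "/demo_shm" is invented, and
wait() stands in for real synchronization; on older Linux systems, link with -lrt):

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        /* request a named segment from the operating system */
        int fd = shm_open("/demo_shm", O_CREAT | O_RDWR, 0600);
        ftruncate(fd, sizeof(long));
        long *shared = mmap(NULL, sizeof(long), PROT_READ | PROT_WRITE,
                            MAP_SHARED, fd, 0);

        if (fork() == 0) {                  /* child sees the same memory */
            *shared = 42;                   /* write an intermediate result */
            return 0;
        }
        wait(NULL);                         /* crude synchronization for the sketch */
        printf("read from shared memory: %ld\n", *shared);
        shm_unlink("/demo_shm");
        return 0;
    }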

The next method of information exchange is pipes. A pipe is a system object that, as a rule, transfers
information in one direction. The best-known examples of pipes are the standard input/output streams
(stdin, stdout). The stdout of one process can be directed into the stdin of another; after that, the
information written into the stdout of the first process can be read by the second. Operating systems
also allow us to create additional pipes. Pipes are manipulated with functions whose syntax is similar to
that of file operations.
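
A sketch of an additional pipe between a parent and a child process; note that the read() and write()
calls are exactly the file-operation functions mentioned above:

    #include <stdio.h>
    #include <string.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        int fds[2];                         /* fds[0] - read end, fds[1] - write end */
        pipe(fds);
        if (fork() == 0) {                  /* child: reads from the pipe */
            char buf[64];
            close(fds[1]);
            ssize_t n = read(fds[0], buf, sizeof buf - 1);
            buf[n] = '\0';
            printf("child got: %s\n", buf);
            return 0;
        }
        close(fds[0]);                      /* parent: writes into the pipe */
        const char *msg = "intermediate result";
        write(fds[1], msg, strlen(msg));
        close(fds[1]);
        wait(NULL);
        return 0;
    }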

You can exchange data through files as well. Modern operating systems buffer the information involved
in file operations, so this method is reasonably efficient. From a security viewpoint, however, it is
inferior to those described above, as an unauthorized application can get access to the file used for
inter-process exchange. Before using this method you should study how file operations are
implemented in your particular operating system and take measures to protect the data.

Duplex data exchange between processes can be implemented with the operating system's network
facilities. Using sockets, two processes can establish a channel between them and transfer data just as
a browser and an HTTP server do. The format of the transferred data can be absolutely anything; the
point is that the processes follow the same agreed exchange procedure, the protocol. Moreover,
nothing prevents these processes from being executed on different computers.
And with this last remark we pass on to the next section.


Supercomputers - everywhere!
Speaking about parallel execution, we have so far assumed a computer with several processors. But
such a computer is not the only platform for executing parallel programs. Consider the simplest
computational network: ten ordinary single-processor computers connected by Fast Ethernet over
twisted pair. Looks familiar, doesn't it? What is each computer, in fact? Generally speaking, it is a
processor with main and disk memory available to it - a perfect place to execute one process. What if
on each of the ten computers we simultaneously launch programs that solve subtasks of cracking the
password to an archive? Obviously the task will take much less time than on one computer. And what
if the network consists not of 10 but of 100 computers? And is linked to the Internet?

This example shows that even small local networks have great potential computational power. And
while you can hardly use this power for computer games, it will serve CAD systems well enough. A
group of computers united by a data-transfer bus into a single computational system is called a cluster
system, or cluster. You only need to write a special program that allows information exchange between
its simultaneously executed copies (processes) over the local network. Generally speaking, such a
program may have a traditional client-server architecture, i.e. be able both to send messages to its
neighbors and to receive messages from them. But there are also ready-made solutions that simplify
the development of programs for cluster systems, for example MPI and PVM. They give the
programmer means for both point-to-point and collective interaction between processes executed on
the nodes of the cluster system, as well as methods for synchronizing them.

The MPI (Message Passing Interface) specification offers a programming model in which a program
spawns several processes that interact by calling message-passing and message-receiving subprograms.
Its implementations are subprogram libraries usable from the C/C++ and Fortran languages. When an
MPI program is launched, a fixed number of processes is created. The MPI specification was created as
an industry standard, so all its facilities are focused on achieving high performance on symmetric
multi-processor systems and homogeneous cluster systems (supercomputers). There is a free
implementation of MPI, MPICH (MPI CHameleon), whose Windows and Linux versions are available for
download here: http://www.viva64.com/go.php?url=405. Installed on a PC, MPICH lets you develop
and debug programs that can later run without modification on clusters or supercomputers.
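
As a sketch of this model, here is a minimal MPI program in C in which every non-zero process solves an
invented stand-in subtask and sends its partial result to process 0 (compile with mpicc, run with
mpiexec):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);             /* a fixed number of processes starts */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) {                    /* process 0 collects the results */
            int total = 0, part;
            for (int i = 1; i < size; i++) {
                MPI_Recv(&part, 1, MPI_INT, i, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                total += part;
            }
            printf("combined result: %d\n", total);
        } else {                            /* the others solve their subtasks */
            int part = rank * rank;         /* stand-in for a real subtask */
            MPI_Send(&part, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        }
        MPI_Finalize();
        return 0;
    }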

Unlike MPI, the PVM (Parallel Virtual Machine) system for developing and executing parallel programs
was created within a research project aimed at heterogeneous computing complexes. It allows you to
quickly unite a heterogeneous set of network-connected computers into a single computational
resource, the "parallel virtual machine". The computers may have different architectures and run
different operating systems. The PVM system is a set of libraries and utilities for developing and
debugging parallel programs and for controlling the configuration of the virtual machine. The C/C++
and Fortran languages are supported. The configuration of the computing complex can change
dynamically as nodes are excluded and new ones added. This universality comes at the cost of some
performance loss compared with MPI.

Thus both MPI and PVM let you turn your organization's local network into a powerful computing
machine able to solve complex tasks, without much effort.

We must admit that parallel computational systems are very common nowadays. Operating systems
provide developers with the necessary low-level services, while for applied tasks there are proven,
ready-made tools in the form of specialized utilities and libraries for popular high-level programming
languages. A programmer should master the methods of parallel program development to keep up
with modern trends.
