2. Objectives
To define what a distributed system is
To understand the consequences of the
definition
To identify the challenges in designing and
building a distributed system
4. ♦ Data are Distributed
» Data must exist on multiple computers for administrative and ownership
reasons
♦ Computation is Distributed
» Applications take advantage of parallelism, multiple processors, or a
particular feature
» Scalability and heterogeneity of the distributed system
♦ Users are Distributed
» Users communicate and interact via the application (shared objects)
5. Definition of A Distributed System
“A distributed system is a collection of independent
computers that appear to the users of the system as a
single computer.” [Tanenbaum]
“A distributed system is a collection of autonomous
computers linked by a network with software designed
to produce an integrated computing facility.”
[Coulouris, Dollimore, Kindberg]
“A system of multiple autonomous processing
elements, cooperating in a common purpose or to
achieve a common goal.” [Burns & Wellings 1997]
6. “A system that consists of a collection of two or more
independent computers, that are connected by a
network, which coordinate their processing through the
exchange of synchronous or asynchronous message
passing. They may be on separate continents, in the
same building or the same room.”
Definition from our textbook
8. Introduction
Motivation for A Distributed System
Load balancing / distribution
Breaking a problem into smaller pieces enables you to solve larger
problems without resorting to a larger computer
A mainframe may be 10x faster but 1000x more expensive
Increased Processing Power
Independent processors working on the same task
Distributed systems consisting of collections of microcomputers
may have processing power that no single computer will ever
achieve
10,000 CPUs, each running at 50 MIPS, yield 500,000 MIPS
A single processor matching this would have to execute an
instruction in 0.002 ns, the time light takes to travel 0.6 mm;
any processor chip of that size would melt immediately
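The figures above can be checked with a few lines of arithmetic. This is a minimal sketch; the variable names are illustrative, and the speed of light is rounded to 3 x 10^8 m/s, which is what the slide's 0.6 mm appears to assume.

```python
# Reproduce the back-of-the-envelope numbers from the slide.
SPEED_OF_LIGHT_M_PER_S = 3.0e8  # rounded value (assumption)

cpus = 10_000
mips_per_cpu = 50

total_mips = cpus * mips_per_cpu                 # aggregate rate in MIPS
instructions_per_second = total_mips * 1e6       # MIPS = millions of instructions/s
seconds_per_instruction = 1 / instructions_per_second
ns_per_instruction = seconds_per_instruction * 1e9
light_distance_mm = SPEED_OF_LIGHT_M_PER_S * seconds_per_instruction * 1e3

print(total_mips)          # aggregate MIPS of the collection
print(ns_per_instruction)  # time budget per instruction for one equivalent CPU
print(light_distance_mm)   # how far light travels in that time
```

The results come out at 500,000 MIPS, about 0.002 ns per instruction, and about 0.6 mm, matching the slide.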
9. Introduction
Motivation for A Distributed System
Fault tolerance
If any machine goes down, the others can still run
Availability
Anytime, anywhere access
Resource sharing
Any client can act as a server, and vice versa, to provide resources
(data, files, services)
The main motivator for DS
15. Challenge: Heterogeneity
Heterogeneity = variety and difference
Heterogeneity of
underlying network infrastructure (Ethernet, ISDN, token ring, etc.),
computer hardware and software (e.g., operating systems; compare UNIX
sockets and Winsock calls),
programming languages (Java, C, Python; in particular, data
representations),
implementations by different developers
Heterogeneity needs to be masked
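One concrete case of differing data representations is byte order. The sketch below, using Python's standard struct module, shows how the same integer is laid out differently and how agreeing on one wire format (network byte order) masks the difference. The helper names encode and decode are illustrative, not from the slides.

```python
import struct

value = 1025  # 0x00000401

big_endian = struct.pack(">I", value)     # most significant byte first
little_endian = struct.pack("<I", value)  # least significant byte first

# A receiver that assumes the wrong byte order reads a different number.
misread = struct.unpack("<I", big_endian)[0]

def encode(n: int) -> bytes:
    """Encode an unsigned 32-bit integer in network (big-endian) byte order."""
    return struct.pack("!I", n)

def decode(data: bytes) -> int:
    """Decode an unsigned 32-bit integer from network byte order."""
    return struct.unpack("!I", data)[0]

print(big_endian, little_endian, misread)
print(decode(encode(value)))  # round-trips regardless of the host's native order
```

If both sides always go through encode and decode, neither needs to know the other's native representation.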
Distributed Computing
16. Challenge: Openness
Determines whether the system can be
extended and re-implemented in various
ways
Determined primarily by the degree to
which new resource-sharing services can be
added and made available for use by
a variety of client programs
Detailed interfaces of components need to
be standardized and published.
17. Challenge: Security
Security for information resources has three components:
Confidentiality
protection against disclosure to unauthorized individuals
Integrity
protection against alteration or corruption
Availability
protection against interference with the means of accessing the
resources
The challenge: sending sensitive information in a network
message securely and efficiently
Not just to conceal the information, but also to ensure that the sender and
recipients are the rightful owners of the messages
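As one illustration of the integrity and sender-verification side of this challenge, the sketch below attaches a message authentication code (HMAC-SHA256, from Python's standard library) to a message. It assumes both parties already share a secret key; the key and message are made up for the example. Note that this does not provide confidentiality, which would additionally require encryption.

```python
import hashlib
import hmac

# Hypothetical shared secret; in practice it must be distributed securely.
SHARED_KEY = b"demo-key-known-to-both-parties"

def sign(message: bytes, key: bytes = SHARED_KEY) -> bytes:
    """Compute an authentication tag over the message."""
    return hmac.new(key, message, hashlib.sha256).digest()

def verify(message: bytes, tag: bytes, key: bytes = SHARED_KEY) -> bool:
    """Check the tag in constant time; False if message or key differs."""
    return hmac.compare_digest(sign(message, key), tag)

message = b"transfer 100 to account 42"
tag = sign(message)

print(verify(message, tag))                        # untouched message
print(verify(b"transfer 900 to account 42", tag))  # altered in transit
```

A receiver that recomputes the tag detects both alteration of the message and senders who do not hold the key.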
18. Challenge: Scalability
A distributed system is scalable if it remains effective
as the number of users and/or resources increase
Challenges:
Controlling resource costs
Controlling performance loss
Preventing resources from running out
Avoiding performance bottlenecks
19. Challenge: Failure Handling
Failures more common than in centralized systems
and usually partial
Failure handling includes
Detection (may be impossible)
Masking/hiding
Tolerance
Recovery
Redundancy
20. Challenge: Failure Handling
Detection
Some failures can be detected (e.g., transmission errors via
checksums)
Some cannot (a crashed remote server vs. a slow
remote server)
Challenge: managing failures that cannot be detected, only
suspected
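Both cases can be sketched in a few lines. The first two functions detect corruption with a CRC-32 checksum; the third can only suspect a peer after a timeout, because silence looks the same whether the peer crashed or is merely slow. Function names and the 2-second timeout are assumptions for illustration.

```python
import zlib

def frame(payload: bytes) -> bytes:
    """Append a CRC-32 checksum so the receiver can detect corruption."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def check(framed: bytes) -> bool:
    """Recompute the checksum and compare it with the one received."""
    payload, received = framed[:-4], int.from_bytes(framed[-4:], "big")
    return zlib.crc32(payload) == received

def suspect(last_heartbeat: float, now: float, timeout: float = 2.0) -> bool:
    """True if the peer is suspected; it may have crashed or just be slow."""
    return now - last_heartbeat > timeout

good = frame(b"hello")
corrupted = bytes([good[0] ^ 0x01]) + good[1:]  # flip one bit in transit

print(check(good), check(corrupted))
print(suspect(last_heartbeat=0.0, now=5.0))
```

The checksum gives a definite answer; the timeout only gives a guess that the system must be prepared to get wrong.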
21. Challenge: Failure Handling
Masking/hiding
Some failures can be hidden or made less severe
Replication in space/time
Space: e.g., writing to multiple disks
Time: e.g., transmission of multiple messages
May not work in worst cases, e.g., all disks may have
been corrupted
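Replication in space can be sketched as follows: each write goes to every replica together with a checksum, and a read returns the first copy that still verifies. The "disks" here are plain dictionaries standing in for real storage, and the function names are illustrative.

```python
import zlib

def write_replicated(disks: list[dict], key: str, data: bytes) -> None:
    """Store the data and its checksum on every replica."""
    record = (data, zlib.crc32(data))
    for disk in disks:
        disk[key] = record

def read_replicated(disks: list[dict], key: str) -> bytes:
    """Return the first intact copy, hiding corrupted or missing ones."""
    for disk in disks:
        data, checksum = disk.get(key, (b"", None))
        if checksum is not None and zlib.crc32(data) == checksum:
            return data
    # Worst case from the slide: no replica survived, so masking fails.
    raise IOError("all replicas missing or corrupted")

disks = [{}, {}, {}]
write_replicated(disks, "k", b"payload")
disks[0]["k"] = (b"payloae", disks[0]["k"][1])  # one-bit corruption on disk 0

print(read_replicated(disks, "k"))  # served from an intact replica
```

The caller never sees the corrupted copy, unless every replica is damaged, in which case the failure can no longer be hidden.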
22. Challenge: Failure Handling
Tolerance
Sometimes not feasible to hide all failures
E.g., the user has to tolerate that a web service has failed rather than wait
until the service is up again
Only feasible for certain classes of applications/systems, e.g., DNS
vs. Internet addresses
Recovery
Restoring a correct system state
Roll back using log files
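Rolling back from a log can be illustrated with an undo log: before each update the old value is recorded, and recovery replays those records in reverse. This sketch keeps the log in memory for brevity, whereas a real system would write it to a log file on stable storage; the class and method names are hypothetical.

```python
class LoggedStore:
    """Key-value store that records old values so updates can be undone."""

    _MISSING = object()  # marks keys that did not exist before an update

    def __init__(self):
        self.data = {}
        self.undo_log = []  # entries: (key, previous value or _MISSING)

    def put(self, key, value):
        self.undo_log.append((key, self.data.get(key, self._MISSING)))
        self.data[key] = value

    def checkpoint(self):
        """Declare the current state correct; older log entries are dropped."""
        self.undo_log.clear()

    def rollback(self):
        """Undo updates in reverse order, back to the last checkpoint."""
        while self.undo_log:
            key, old = self.undo_log.pop()
            if old is self._MISSING:
                self.data.pop(key, None)
            else:
                self.data[key] = old

store = LoggedStore()
store.put("a", 1)
store.checkpoint()
store.put("a", 2)
store.put("b", 3)
store.rollback()
print(store.data)  # state as of the checkpoint
```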
23. Challenge: Failure Handling
Redundancy
Tolerate failures by using redundant components
Provided through replication
E.g., redundant routes in network, replication of name tables in
multiple domain name servers
Goal of failure handling: high availability
The availability of a system is a measure of the proportion of time that
the system is available for use
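The definition of availability, and the effect of redundancy on it, can be put into numbers. The second function assumes replicas fail independently of each other, which is an idealization (a shared power supply or network link breaks it); both function names are illustrative.

```python
def availability(uptime_hours: float, downtime_hours: float) -> float:
    """Proportion of time the system is available for use."""
    return uptime_hours / (uptime_hours + downtime_hours)

def replicated_availability(a: float, replicas: int) -> float:
    """Probability that at least one of n replicas is up.

    Assumes independent failures: all n are down with probability (1 - a)**n.
    """
    return 1 - (1 - a) ** replicas

print(availability(uptime_hours=99, downtime_hours=1))
print(replicated_availability(0.9, 2))
print(replicated_availability(0.9, 3))
```

Under the independence assumption, two replicas that are each available 90% of the time give about 99% combined, and three give about 99.9%.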
24. Challenge: Concurrency
Concurrency control
Handling several simultaneous requests for a resource
Consistent scheduling of concurrent threads (so that
dependencies are preserved, e.g., in concurrent transactions)
Synchronized operations (semaphores)
Safest, but limits throughput
Shared objects/resources must guarantee correctness in a
concurrent environment
Avoidance of deadlocks
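The points above can be sketched with Python's threading module. Each account carries a mutex (a lock, playing the role of a binary semaphore) so that concurrent updates stay correct, and locks are always acquired in a fixed order (by account id) so that two opposite transfers cannot deadlock. The Account class and transfer function are hypothetical examples.

```python
import threading

class Account:
    def __init__(self, account_id: int, balance: int):
        self.id = account_id
        self.balance = balance
        self.lock = threading.Lock()

def transfer(src: Account, dst: Account, amount: int) -> None:
    if src is dst:
        return  # nothing to do; also avoids taking the same lock twice
    # Always lock the account with the smaller id first. A consistent
    # global order means no cycle of threads waiting on each other.
    first, second = sorted((src, dst), key=lambda acc: acc.id)
    with first.lock, second.lock:
        src.balance -= amount
        dst.balance += amount

a, b = Account(1, 1000), Account(2, 1000)
threads = [threading.Thread(target=transfer, args=(a, b, 1)) for _ in range(100)]
threads += [threading.Thread(target=transfer, args=(b, a, 1)) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(a.balance, b.balance)  # money is conserved and all threads finished
```

Holding both locks for the whole update is the safe choice, and it also shows the throughput cost: transfers touching the same accounts run one at a time.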
25. Challenge: Transparency
Concealing the heterogeneous and distributed nature
of the system so that it appears to the user like one
system
Eight types (ANSA/ISO)
access, location, concurrency, replication, failure,
mobility, performance and scaling transparencies
26. Challenge: Transparency
•Access transparency:
•enables local and remote resources to be accessed using identical
operations.
For instance, from a user's point of view, access to a remote service such as
a printer should be identical with access to a local printer.
From a programmer's point of view, the access method for a remote object
may be identical to that for a local object of the same class.
•E.g., the same user interface and operations are offered to access either
local or remote resources
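From the programmer's side, the printer example might look like the sketch below: a local printer and a proxy for a remote one expose the same operation, so calling code cannot tell them apart. All names are hypothetical, and the network is simulated by a plain function so the example runs on its own.

```python
from typing import Protocol

class Printer(Protocol):
    def print_document(self, text: str) -> str: ...

class LocalPrinter:
    def print_document(self, text: str) -> str:
        return f"local: printed {len(text)} characters"

class RemotePrinterProxy:
    """Stands in for a remote printer; `send` is a hypothetical transport."""

    def __init__(self, send):
        self._send = send

    def print_document(self, text: str) -> str:
        # Package the call as a request and forward it over the transport.
        return self._send({"op": "print", "text": text})

def fake_network_send(request: dict) -> str:
    # Simulated remote end, standing in for a real network round trip.
    return f"remote: printed {len(request['text'])} characters"

def submit(printer: Printer, text: str) -> str:
    """Caller code: identical for local and remote printers."""
    return printer.print_document(text)

print(submit(LocalPrinter(), "hello"))
print(submit(RemotePrinterProxy(fake_network_send), "hello"))
```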
27. Challenge: Transparency
•Location transparency:
•enables resources to be accessed without knowledge of their location.
•The details of the topology of the system should be of no concern to the
user.
•The location of an object in the system may not be visible to the user or
programmer.
•This differs from access transparency in that both the naming and access
methods may be the same. Names may give no hint as to location.
•E.g., URL or e-mail addresses.
• www.google.com (IP address is the physical location)
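The idea that names give no hint as to location can be sketched with a small name service: clients hold only a name and resolve it at the time of use, so the resource can be placed anywhere. The NameService class, the name, and the addresses are all made up for illustration; DNS plays this role for URLs and e-mail addresses.

```python
class NameService:
    """Maps location-independent names to current network addresses."""

    def __init__(self):
        self._table = {}

    def register(self, name: str, address: tuple) -> None:
        self._table[name] = address

    def resolve(self, name: str) -> tuple:
        return self._table[name]

names = NameService()
names.register("printer.example", ("10.0.0.5", 9100))
address_before = names.resolve("printer.example")

# The administrator changes where the resource lives; clients keep the name.
names.register("printer.example", ("10.0.7.20", 9100))
address_after = names.resolve("printer.example")

print(address_before, address_after)
```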
28. Challenge: Transparency
•Concurrency transparency:
•enables several processes to operate concurrently using shared resources
without interference between them.
•E.g., no conflict occurs when two or more users access the same system
29. Challenge: Transparency
•Replication transparency:
•enables multiple instances of resources to be used to increase reliability
and performance without knowledge of the replicas by users.
•This kind of transparency matters most for distributed file
systems, which replicate data at two or more sites for greater reliability. Clients
generally should not be aware that replicated copies of the data exist, and
should expect operations to return only one set of values.
•Examples are distributed DBMSs and mirroring of web pages.
•Failure transparency:
•enables the concealment of faults, allowing users and application
programs to complete their tasks despite the failure of hardware or
software components.
•E.g., retransmission of e-mail messages
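The retransmission example for failure transparency can be sketched as a sender that retries until it gets an acknowledgement, so the caller sees one successful send even though some attempts were lost. The FlakyChannel class simulates message loss deterministically and is purely illustrative.

```python
class FlakyChannel:
    """Simulated channel that drops the first `drops` messages."""

    def __init__(self, drops: int):
        self.drops = drops
        self.delivered = []

    def send(self, message: str) -> bool:
        if self.drops > 0:
            self.drops -= 1
            return False  # lost: no acknowledgement comes back
        self.delivered.append(message)
        return True

def send_reliably(channel, message: str, max_attempts: int = 5) -> int:
    """Retransmit until acknowledged; return the number of attempts used."""
    for attempt in range(1, max_attempts + 1):
        if channel.send(message):
            return attempt
    # The failure can no longer be concealed from the caller.
    raise ConnectionError(f"no acknowledgement after {max_attempts} attempts")

channel = FlakyChannel(drops=2)
attempts = send_reliably(channel, "hi")
print(attempts, channel.delivered)
```

The attempt limit marks where concealment stops: beyond it the failure has to be reported rather than hidden.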
30. Challenge: Transparency
•Mobility transparency:
•allows the movement of resources and clients within a system without
affecting the operation of users or programs.
•E.g., a caller and callee moving between different places while on a phone call
31. Challenge: Transparency
•Performance transparency:
•allows the system to be reconfigured to improve performance as loads
vary.
•E.g., Video on Demand (VOD) systems
•Scaling transparency:
•allows the system and applications to expand in scale without change to
the system structure or the application algorithms.
•E.g., P2P apps
32. Transparency  Description
    Access        Hide differences in data representation and how a resource is accessed
    Location      Hide where a resource is located
    Migration     Hide that a resource may move to another location
    Relocation    Hide that a resource may be moved to another location while in use
    Replication   Hide that a resource may be replicated
    Concurrency   Hide that a resource may be shared by several competitive users
    Failure       Hide the failure and recovery of a resource
    Persistence   Hide whether a (software) resource is in memory or on disk
33. 1. Name a program that uses distributed computing and is freely
available to the masses
2. Name one research field that relies heavily on distributed computing
34. Summary
“A distributed system is a collection of independent
and autonomous computers that appear to the
users of the system as a single computer.”
Based on the above definition, there are three
significant consequences:
Concurrency
No global clock
Independent failures
Several challenges need to be addressed in
designing and building DS
Heterogeneity, openness, security, scalability, failure
handling, concurrency and transparency.