Distributed operating systems present users with an integrated computing platform that hides individual computers. They control all nodes in a network and allocate resources without user involvement. Distributed OS examples include cluster computer systems, V system, and Sprite. Middleware implements network-wide programming abstractions like RPC, event distribution, and resource discovery. The core OS functionality distributed OSs should provide for middleware includes encapsulation, protection, concurrent processing, and invocation mechanisms.
2. Introduction
Distributed OS
I. Presents users (and applications) with an integrated
computing platform that hides the individual computers.
II. Has control over all of the nodes (computers) in the network
and allocates their resources to tasks without user
involvement.
III. the user doesn't know (or care) where his programs are
running.
Ex:Cluster computer systems,V system, Sprite
Deepak John,CE-Poonjar
4. Middleware implements abstractions that support network-wide
programming. Examples:
RPC and RMI (Sun RPC, Corba, Java RMI)
event distribution and filtering (Corba Event Notification, Elvin)
resource discovery for mobile and ubiquitous computing
support for multimedia streaming
Deepak John,CE-Poonjar
5. Functions that OS should provide for middleware
1. Encapsulation
provide a set of operations that meet their clients’ needs
2. Protection
protect resource from illegitimate access
3. Concurrent processing
support clients access resource concurrently
4. Invocation mechanism: a means of accessing an encapsulated
resource
Communication : Pass operation parameters and results between
resource managers
Scheduling: Schedule the processing of the invoked operation
Deepak John,CE-Poonjar
7. Process manager
Handles the creation of and operations upon processes.
Thread manager
Thread creation, synchronization and scheduling
Communication manager
Communication between threads attached to different
processes on the same computer
Memory manager
Management of physical and virtual memory
Supervisor
Dispatching of interrupts, system call traps and other exceptions
control of memory management unit and hardware caches
processor and floating point unit register manipulations
Deepak John,CE-Poonjar
8. Kernel and Protection
Kernel
always runs
complete access privileges for the physical resources
Different execution mode
supervisor mode (kernel process) / user mode (user process)
system call trap: invocation mechanism for resources managed by
kernel
An address space: a collection of ranges of virtual memory locations,
in each of which a specified combination of memory access rights
applies, e.g.: read only or read-write
The cost for protection
switching between different processes take many processor cycles
a system call trap is a more expensive operation than a simple method
call
Deepak John,CE-Poonjar
9. Processes and Threads
Process
A program in execution
Problem: sharing between related activities are awkward and
expensive
a process consists of an execution environment together with
one or more threads.
An execution environment is a collection of local kernel managed
resources to which its threads have access. An execution
environment primarily consists of:
an address space;
thread synchronization and communication resources such as
semaphores and communication interfaces (for example, sockets);
higher-level resources such as open files and windows.
Deepak John,CE-Poonjar
10. Process address space
Stack
Text
Heap
Auxiliary
regions
0
2
N
Address space
a unit of management of a process’s
virtual memory
Up to 232 bytes and sometimes up to
264 bytes.
consists of one or more regions.
Region
an area of continuous virtual memory
that is accessible by the threads of the
owning process.
can be shared
1. kernel code
2. libraries
3. shared data & communication
4. Copy on write
Deepak John,CE-Poonjar
11. Creation of new process in distributed system
Creating process by the operation system
Fork, in UNIX
Process creation in distributed system
The choice of a target host
Choice of process host
running new processes at their originator’s computer
sharing processing load between a set of computers
Load sharing system
Centralized, Decentralized, Hierarchical
sender-initiated,receiver-initiated
Deepak John,CE-Poonjar
12. Load sharing policy
Transfer policy: situate a new process locally or remotely
Location policy: which node should host the new process
1. Static policy without regard to the current state of the system
2.Adaptive policy applies heuristics to make their allocation
decision
Creation of a new execution environment
1. Initializing the address space
Statically defined format
With respect to an existing execution environment,
e.g. fork()
2. Copy-on-write scheme
A technique that is used to reduce amount of data copy.
Deepak John,CE-Poonjar
13. Copy-on-write – a convenient optimization
a) Before write b) After write
Shared
frame
A's page
table
B's page
table
Process A’s address space Process B’s address space
Kernel
RA RB
RB copied
from RA
Deepak John,CE-Poonjar
14. Threads concept and implementation
Thread
threads were introduced as a lightweight – operating system
provided .
a process consists of its address-space, and a set of threads attached
to that process.
The operating system can perform less expensive context switches
between threads attached to the same process and threads attached to
the same process can access the same memory etc, such that
communication/ synchronisation can be much cheaper and less
awkward
Deepak John,CE-Poonjar
15. A server application generally
consists of:
A single thread, the receiver-
thread which receives all the
requests, places them in a queue
and dispatches those requests to
be dealt with by the worker-
threads
The worker-thread which deals
with the request may be a thread
in the same process or it may be
a thread in another
process
Deepak John,CE-Poonjar
16. Client and server with threads Worker pool
Server creates a fixed pool of
“worker” threads to process the
requests when it starts up.
The module marked ‘receipt
and queuing is typically
implemented by an ‘I/O’ thread,
Pro: simple
Cons
1. Inflexibility: worker threads
number unequal to current
request number
2. High level of switching
between the I/O and worker
thread
For single thread, server can handle 100 client
requests per second
Deepak John,CE-Poonjar
17. Alternative server threading architectures
Thread-per-request
Server spawn a new worker thread for each new request, destroy it
when the request processing finish.
Pro: throughput is potentially maximized.
Con: overhead of the thread creation and destruction
Deepak John,CE-Poonjar
18. Thread-per-connection
Server creates a new worker thread when client creates a connection,
destroys the thread when the client closes the connection.
Pro: lower thread management overheads compared with the thread-
per-request.
Con: client may be delayed while a worker thread has several
outstanding requests but another thread has no work to perform
Thread-per-object
Associate a thread with each remote object
Pro&Con are similar to thread-per-connection
Deepak John,CE-Poonjar
19. Threads versus multiple processes
Creating a thread is (much) cheaper than a process (~10-20times)
Switching to a different thread in same process is (much) cheaper
(5-50 times)
Threads within same process can share data and other resources
more conveniently and efficiently (without copying or messages)
Threads within a process are not protected from each other
Threads implementation
can be implemented:
in the OS kernel (Win NT, Solaris, Mach)
at user level e.g. by a thread library, or in the language (Ada, Java).
Deepak John,CE-Poonjar
20. Java thread constructor and management methods
• Thread(ThreadGroup group, Runnable target, String name)
Creates a new thread in the SUSPENDED state, which will
belong to group and be identified as name; the thread will
execute the run() method of target.
• setPriority(int newPriority), getPriority()
Set and return the thread’s priority.
• run()
A thread executes the run() method of its target object, if it has
one, and otherwise its own run() method (Thread implements
Runnable).
Methods of objects that inherit from class Thread
Deepak John,CE-Poonjar
21. • start()
Change the state of the thread from SUSPENDED to
RUNNABLE.
• sleep(int millisecs)
Cause the thread to enter the SUSPENDED state for the
specified time.
• yield()
Enter the READY state and invoke the scheduler.
• destroy()
Destroy the thread.
Deepak John,CE-Poonjar
22. Java thread synchronization calls
• thread.join(int millisecs)
Blocks the calling thread for up to the specified time until thread
has terminated.
• thread.interrupt()
Interrupts thread: causes it to return from a blocking method call
such as sleep().
• object.wait(long millisecs, int nanosecs)
Blocks the calling thread until a call made to notify() or
notifyAll() on object wakes the thread, or the thread is interrupted,
or the specified time has elapsed.
• object.notify(), object.notifyAll()
Wakes, respectively, one or all of any threads that have called
wait() on object. Deepak John,CE-Poonjar
23. Operating system architecture
Monolithic Kernel Microkernel
Server: Dynamically loaded server program:Kernel code and data:
.......
.......
Key:
S4
S1 .......
S1 S2 S3
S2 S3 S4
Monolithic kernel and microkernel
Deepak John,CE-Poonjar
24. Monolithic vs Microkernel
A monolithic kernel provides all of the services via a single
image, that is a single program initialized when the computer boots
A microkernel instead implements only the absolute minimum:
Basic virtual memory, Basic scheduling and Inter-process
communication
All other services such as device drivers, the file system,
networking etc are implemented as user-level server processes that
communicate with each other and the kernel via IPC
Deepak John,CE-Poonjar
25. The Microkernel Approach
The major advantages of the microkernel approach include:
Extensibility - major functionality can be added without modifying
the core kernel of the operating system
Modularity - the different functions of the operating system can be
forced into modularity behind memory protection barriers. A
monolithic kernel must use programming language features or code
conventions to attempt to ensure this
Robustness -relatively small kernel might be likely to contain fewer
bugs than a larger program, however, this point is rather contentious
Portability - since only a small portion of the operating system, its
smaller kernel, relies on the particulars of a given machine it is easier
to port to a new machine architecture
Deepak John,CE-Poonjar
26. The role of the microkernel
Middleware
Language
support
subsystem
Language
support
subsystem
OS emulation
subsystem
....
Microkernel
Hardware
The microkernel supports middleware via subsystems
Deepak John,CE-Poonjar
27. The Monolithic Approach
The major advantage of the monolithic approach is the relative
efficiency with which operations may be invoked. Since services
share an address space with the core of the kernel they need not make
system calls to access core-kernel functionality
Most operating systems in use today are a kind of hybrid solution
Linux is a monolithic kernel, but modules may be dynamically
loaded and unloaded at run time.
Deepak John,CE-Poonjar
29. Purposes of a Distributed File System
Sharing of storage and information across a network
Convenience (and efficiency) of a conventional file system
Persistent storage that most other services (e.g., Web servers) need
Files
Files are an abstraction of permanent storage.
A file is typically defined as a sequence of similar-sized data items
along with a set of attributes.
A directory is a file that provides a mapping from text names to
internal file identifiers.
Deepak John,CE-Poonjar
30. File attributes
updated
by system:
File length
Creation timestamp
Read timestamp
Write timestamp
Attribute timestamp
Reference count
Owner
File type
Access control list
E.g. for UNIX: rw-rw-r--
updated
by owner:
Deepak John,CE-Poonjar
31. File Systems
Responsible for the (a) organization, (b)storage, (c) retrieval, (d)
naming, (e)sharing, and (f) protection of files.
Provide a set of programming operations that characterize the file
abstraction,particularly operations to read and write subsequences of
data items beginning at any point of a file.
Directory module: relates file names to file IDs
File module: relates file IDs to particular files
Access control module: checks permission for operation requested
File access module: reads or writes file data or attributes
Block module: accesses and allocates disk blocks
Device module: disk I/O and buffering
File
system
modules
Deepak John,CE-Poonjar
34. File Service Architecture
An architecture that offers a clear separation of the main concerns
in providing access to files is obtained by structuring the file
service as three components:
A flat file service
A directory service
A client module.
The Client module implements exported interfaces by flat file and
directory services on server side.
Deepak John,CE-Poonjar
35. Client computer Server computer
Application
program
Application
program
Client module
Flat file service
Directory service
Lookup
AddName
UnName
GetNames
Read
Write
Create
Delete
GetAttributes
SetAttributes
Deepak John,CE-Poonjar
36. Flat file service:
Concerned with the implementation of operations on the contents of file.
Unique File Identifiers (UFIDs) are used to refer to files in all requests
for flat file service operations.
Directory Service:
Provides mapping between text names for the files and their UFIDs.
Clients may obtain the UFID of a file by quoting its text name to
directory service. Directory service supports functions needed generate
directories, to add new files to directories.
Client Module:
It runs on each computer and provides integrated service (flat file and
directory) as a single API to application programs.
It holds information about the network locations of flat-file and directory
server processes; and achieve better performance through
implementation of a cache of recently used file blocks at the client.
Deepak John,CE-Poonjar
37. Flat file service operations
Read(FileId, i, n) -> Data
— throws BadPosition
If 1 ≤ i ≤ Length(File): Reads a sequence of up to n items
from a file starting at item i and returns it in Data.
Write(FileId, i, Data)
— throws BadPosition
If 1 ≤ i ≤ Length(File)+1: Writes a sequence of Data to a
file, starting at item i, extending the file if necessary.
Create() -> FileId Creates a new file of length 0 and delivers a UFID for it.
Delete(FileId) Removes the file from the file store.
GetAttributes(FileId) -> Attr Returns the file attributes for the file.
SetAttributes(FileId, Attr) Sets the file attributes
Deepak John,CE-Poonjar
38. Access control
UNIX checks access rights when a file is opened ,subsequent
checks during read/write are not necessary
distributed environment
a. server has to check.
b. stateless approaches
1. access check once when UFID is issued
• client gets an encoded "capability" (who can access and how)
• capability is submitted with each subsequent request
2. access check for each request.
Deepak John,CE-Poonjar
39. File Group
A collection of files that can be located
on any server or moved between
servers while maintaining the same
names.
Similar to a UNIX filesystem
Helps with distributing the load of
file serving between several servers.
File groups have identifiers which
are unique throughout the system
(and hence for an open system, they
must be globally unique).
Used to refer to file groups and
files
To construct a globally
unique ID we use some
unique attribute of the
machine on which it is
created, e.g. IP number,
even though the file group
may move subsequently.
IP address date
32 bits 16 bits
File Group ID:
Deepak John,CE-Poonjar
40. Case Study: Sun NFS
An industry standard for file sharing on local networks since the
1980s
An open standard with clear and simple interfaces.
OS independent
unix implementation
rpc
udp or tcp
Supports many of the design requirements already mentioned:
Transparency,heterogeneity,efficiency,fault tolerance
Limited achievement of:
Concurrency,replication,consistency,security
Deepak John,CE-Poonjar
41. NFS architecture
Client computer Server computer
UNIX
file
system
NFS
client
NFS
server
UNIX
file
system
Application
program
Application
program
Virtual file systemVirtual file system
Other
filesystem
UNIX kernel
system calls
NFS
protocol
(remote operations)
UNIX
Operations
on local files
Operations
on
remote files
Application
program
NFS
Client
Kernel
Application
program
NFS
Client
Client computer
Deepak John,CE-Poonjar
42. Virtual file system
access transparency
part of unix kernel
NFS file handle, 3 components:
filesystem identifier
different groups of files
i-node (index node)
structure for finding the file
i-node generation number
i-nodes are reused
incremented when reused
VFS
struct for each file system
v-node for each open file
file handle for remote file
i-node number for local file
Deepak John,CE-Poonjar
43. NFS access control and authentication
Stateless server, so the user's identity and access rights must be
checked by the server on each request.
In the local file system they are checked only on open()
Every client request is accompanied by the userID and groupID
Server is exposed to imposter attacks unless the userID and groupID
are protected by encryption
Kerberos has been integrated with NFS to provide a stronger and
more comprehensive security solution
Deepak John,CE-Poonjar
45. Mount service
the process of including a new file system is called mounting.
communicates with the mount process on the server in a mount
protocol.
Server maintains a table of clients who have mounted filesystems at
that server
Each client maintains a table of mounted file systems holding.
hard-mounted
user process is suspended until request is successful
when server is not responding
request is retried until it's satisfied
soft-mounted
if server fails, client returns failure after a small number of retries
user process handles the failure Deepak John,CE-Poonjar
46. THE MOUNT PROTOCOL:
The following operations occur:
1. The client's request is sent via RPC to the mount server ( on
server machine.)
2. Mount server checks export list containing
a) file systems that can be exported,
b) legal requesting clients.
c) It's legitimate to mount any directory within the legal
filesystem.
3. Server returns "file handle" to client.
4. Server maintains list of clients and mounted directories -- this is
state information! But this data is only a "hint" and isn't treated as
essential.
5. Mounting often occurs automatically when client or server boots.
Deepak John,CE-Poonjar
47. Local and remote file systems
jim jane joeann
usersstudents
usrvmunix
Client Server 2
. . . nfs
Remote
mount
staff
big bobjon
people
Server 1
export
(root)
Remote
mount
. . .
x
(root) (root)
Note: The file system mounted at /usr/students in the client is actually the sub-tree
located at /export/people in Server 1; the file system mounted at /usr/staff in the client is
actually the sub-tree located at /nfs/users in Server 2.
Deepak John,CE-Poonjar
49. Automounter
NFS client catches attempts to access 'empty' mount points and routes
them to the Automounter
Automounter has a table of mount points and multiple candidate
serves for each
it sends a probe message to each candidate server and then uses the
mount service to mount the filesystem at the first server to respond
Keeps the mount table small
Deepak John,CE-Poonjar
50. Server caching
caching file pages, directory/file attributes
read-ahead: prefetch pages following the most-recently read file
pages
delayed-write: write to disk when the page in memory is needed
for other purposes
"sync" flushes "dirty" pages to disk every 30 seconds
two write option
1. write-through: write to disk before replying to the client
2. cache and commit:
stored in memory cache
write to disk before replying to a "commit" request from the
client Deepak John,CE-Poonjar
51. Client caching
caches results of read, write, getattr, lookup, readdir
clients responsibility to poll the server for consistency
timestamp-based methods for consistency validation
Tc: time when the cache entry was last validated
Tm: time when the block was last modified at the server
cache entry is valid if:
1. T - Tc < t, where t is the freshness interval
t is adaptively adjusted:
files: 3 to 30 seconds depending on freq of updates
directories: 30 to 60 seconds
2. Tmclient = Tmserver
need validation for all cache accesses
Reading
Deepak John,CE-Poonjar
52. writing
dirty: modified page in cache
flush to disk: file is closed or sync from client
bio-daemon (block input-output)
read-ahead: after each read request, request the next file block from
the server as well
delayed write: after a block is filled, it's sent to the server
reduce the time to wait for read/write
Deepak John,CE-Poonjar
53. Summary for NFS
access transparency: same system calls for local or remote files
location transparency: could have a single name space for all files
(depending on all the clients to agree the same name space)
mobility transparency: mount table need to be updated on each client
(not transparent)
scalability: can usually support large loads, add processors, disks,
servers...
file replication: read-only replication, no support for replication of
files with updates
hardware and OS: many ports
fault tolerance: stateless and idempotent
consistency: not quite one-copy for efficiency
security: added encryption--Kerberos
efficiency: pretty efficient, wide-spread use
Deepak John,CE-Poonjar
54. Naming Services
Definition
Key benefits
Resource localization
Uniform naming
Device independent address (e.g., you can move domain
name/web site from one server to another server seamlessly).
In a Distributed System, a Naming Service is a specific service
whose aim is to provide a consistent and uniform naming of
resources, this allowing other programs or services to localize
them and obtain the required metadata for interacting with
them.
Deepak John,CE-Poonjar
55. Requirements for name spaces
Allow simple but meaningful names to be used
Potentially infinite number of names
Structured
to allow similar subnames without clashes
to group related names
Management of trust
Deepak John,CE-Poonjar
56. Iterative navigation
Client
1
2
3
A client iteratively contacts name servers NS1–NS3 in order to resolve a name
NS2
NS1
NS3
Name
servers
Iterative Navigation is the act of chaining multiple Naming Services in
order to resolve a single name to the corresponding resource.
Iterative Navigation is the act of chaining multiple Naming Services in
order to resolve a single name to the corresponding resource.
Deepak John,CE-Poonjar
57. Recursive Navigation
The Iterative Navigation can be…
Recursive:
it is performed by the naming server
the server becomes like a client for the next server
Non recursive:
it is performed by the client or the first server
if it is performed by the server it is “server controlled”
the server bounces back the next hop to its client
Deepak John,CE-Poonjar
58. Non-recursive and recursive server-controlled navigation
A name server NS1 communicates with other name servers on behalf of a client
Recursive
server-controlled
1
2
3
5
4
client
NS2
NS1
NS3
1
2
34
client
NS2
NS1
NS3
Non-recursive
server-controlled
DNS offers recursive navigation as an option, but iterative is the
standard technique. Recursive navigation must be used in domains
that limit client access to their DNS information for security reasons.
Deepak John,CE-Poonjar
59. The SNS - a Simple Name Service model
Stores attributes of named objects such as users, computers and
services and group names.
Users
Computers
Services
Email server, login info, encoded passwords,
home directory
Network addresses, architecture, OS, owner
Named object Value
Service address, version no.
Group Mailing lists, group1, group2,...
Deepak John,CE-Poonjar
60. SNS basic design requirements
Specify the Types of named objects:
users, services, computers and group names and directories.
Other types of objects may be integrated;
The names are used only within the organization;
Efficient name lookup;
Access control:
everyone can read but Authorized write;
Deepak John,CE-Poonjar
61. DNS - The Internet Domain Name System
A distributed naming database (specified in RFC 1034/1305)
Name structure reflects administrative structure of the Internet
Rapidly resolves domain names to IP addresses
exploits caching heavily
typical query time ~100 milliseconds
Scales to millions of computers
partitioned database
caching
replication
Deepak John,CE-Poonjar
62. Basic DNS algorithm for name resolution (domain name -> IP number)
• Look for the name in the local cache
• Try a superior DNS server, which responds with:
– another recommended DNS server
– the IP address (which may not be entirely up to date)
Deepak John,CE-Poonjar
63. DNS server functions and configuration
Main function is to resolve domain names for computers, i.e. to get
their IP addresses.
Other functions:
reverse resolution - get domain name from IP address
Host information - type of hardware and OS
Well-known services - a list of well-known services offered by a
host
Other attributes can be included (optional)
Deepak John,CE-Poonjar
64. DNS issues
Name tables change infrequently, but when they do, caching can
result in the delivery of stale data.
Clients are responsible for detecting this and recovering
Its design makes changes to the structure of the name space
difficult. For example:
merging previously separate domain trees under a new root
moving subtrees to a different part of the structure (e.g. if
Scotland became a separate country, its domains should all be
moved to a new country-level domain.)
.
Deepak John,CE-Poonjar