Scheduling - Issues in Load Distributing, Components of Load Distributing Algorithms,
Different Types of Load Distributing Algorithms, Fault-tolerant services, Highly
available services, Introduction to Distributed Database and Multimedia Systems
2. Distributed System
Mr. Sagar Pandya
Information Technology Department
sagar.pandya@medicaps.ac.in
Course Code: IT3EL04
Course Name: Distributed System
Hours Per Week (L-T-P): 3-0-0
Total Hrs.: 3
Total Credits: 3
3. Reference Books
Text Book:
1. G. Coulouris, J. Dollimore and T. Kindberg, Distributed Systems: Concepts
and design, Pearson.
2. P K Sinha, Distributed Operating Systems: Concepts and design, PHI
Learning.
3. Sukumar Ghosh, Distributed Systems - An Algorithmic approach, Chapman
and Hall/CRC
Reference Books:
1. Tanenbaum and Steen, Distributed systems: Principles and Paradigms,
Pearson.
2. Sunita Mahajan & Shah, Distributed Computing, Oxford Press.
3. Distributed Algorithms by Nancy Lynch, Morgan Kaufmann.
4. Unit-5
Scheduling
Issues in Load Distributing,
Components for Load Distributing Algorithms,
Different Types of Load Distributing Algorithms,
Fault-tolerant services, Highly available services,
Introduction to Distributed Database and
Multimedia system
5. INTRODUCTION
The goal of distributed scheduling is to distribute a system’s load
across available resources in a way that optimizes overall system
performance while maximizing resource utilization.
The primary concept is to shift workloads from heavily loaded
machines to idle or lightly loaded machines.
To fully utilize the computing capacity of the Distributed Systems,
good resource allocation schemes are required.
A distributed scheduler is a resource management component of a
DOS that focuses on dispersing the system’s load among the
computers in a reasonable and transparent manner.
The goal is to maximize the system’s overall performance.
A locally distributed system is made up of a group of independent
computers connected by a local area network.
6. INTRODUCTION
A distributed scheduler is a resource management component of a
distributed operating system that focuses on judiciously and
transparently redistributing the load of the system among the
individual units to enhance overall performance.
Users submit tasks at their host computers for processing.
The need for load distribution arises in such environments because,
due to the random arrival of tasks and their random CPU service time
requirements, there is a good possibility that several computers are
idle or lightly loaded and some others are heavily loaded,
which would degrade the performance.
In real-life applications there is always a possibility that one server or
system is idle while a task waits in a queue at another server.
7. INTRODUCTION
Users submit tasks for processing at their host computers.
Because of the unpredictable arrival of tasks and their random CPU
service time, load distribution is essential in such an environment.
The length of resource queues, particularly the length of CPU
queues, is a useful indicator of demand since it correlates closely
with task response time.
It is also fairly simple to determine the length of a queue.
However, there is a risk in oversimplifying scheduling decisions.
A number of remote servers, for example, could notice at the same
time that a particular site had a short CPU queue length and start a lot
of process transfers.
As a result, that location may become overburdened with processes,
and its initial reaction may be to try to relocate them.
8. INTRODUCTION
We don’t want to waste resources (CPU time and bandwidth) by
making poor decisions that result in higher migration activity
because migration is an expensive procedure. Therefore, we need
proper load distributing algorithms.
Load on a system/node can correspond to the queue length of tasks/
processes that need to be processed.
Queue length of waiting tasks: proportional to task response time,
hence a good indicator of system load.
Distributing load: transfer tasks/processes among nodes.
9. What is Load?
CPU queue length can act as a good indicator of load and it is also
easy to determine.
If task transfers involve a large delay, a node using the raw CPU
queue length may keep accepting tasks while previously accepted
tasks are still in transit.
When all the accepted tasks arrive, the node becomes overloaded
and requires further task transfers.
This problem can be solved by artificially incrementing the CPU
queue length whenever a task is accepted for transfer.
If the task transfer does not occur within a specified amount of
time, a timeout occurs, and the CPU queue length is automatically
decremented.
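The artificial-increment-with-timeout idea above can be sketched in Python. This is a minimal illustration; the class and method names are assumptions, not taken from any real system:

```python
import time

class QueueLengthEstimator:
    """Effective CPU queue length at a node (illustrative sketch).

    A task accepted for transfer is counted immediately (the artificial
    increment), so the node stops looking underloaded while the task is
    still in transit. If a promised task never arrives before its
    deadline, the increment is rolled back (the timeout decrement).
    """

    def __init__(self, timeout=5.0):
        self.queue_length = 0    # tasks actually queued at this node
        self.in_transit = {}     # task_id -> arrival deadline
        self.timeout = timeout

    def accept_remote_task(self, task_id, now=None):
        # Artificially increment the load as soon as we agree to a transfer.
        now = time.monotonic() if now is None else now
        self.in_transit[task_id] = now + self.timeout

    def task_arrived(self, task_id):
        # Transfer completed: move the task from "promised" to "queued".
        self.in_transit.pop(task_id, None)
        self.queue_length += 1

    def effective_load(self, now=None):
        # Drop promises whose transfer timed out (automatic decrement).
        now = time.monotonic() if now is None else now
        self.in_transit = {t: d for t, d in self.in_transit.items() if d > now}
        return self.queue_length + len(self.in_transit)
```

A node would consult `effective_load()` rather than the raw queue length when deciding whether to accept further transfers.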
10. INTRODUCTION
Static, dynamic, and adaptive load distribution algorithms are
available.
Static indicates that decisions on process assignment to processors
are hardwired into the algorithm, based on a priori knowledge, such
as that gleaned via an analysis of the application’s graph model.
Dynamic algorithms use system state information to make
scheduling decisions, allowing them to make use of underutilized
system resources at runtime while incurring the cost of gathering
system data.
To adjust to system loading conditions, an adaptive algorithm alters
its own parameters.
When system demand or communication is high, it may lower the
amount of information required for scheduling decisions.
11. Components of Load Distributing Algorithm
A load distributing algorithm typically has four components:
transfer, selection, location, and information policies.
Transfer Policy –
Determines whether or not a node is in a suitable state for a task
transfer.
Process Selection Policy –
Determines the task to be transferred.
Site Location Policy –
Determines the node to which a task should be transferred once it
is selected for transfer.
Information Policy –
It is in charge of initiating the gathering of system state data.
12. Components of Load Distributing Algorithm
1. Transfer Policy
When a process is about to be created, it could run on the local
machine or be started elsewhere.
Bearing in mind that migration is expensive, a good initial choice of
location for a process could eliminate the need for future system
activity.
Many policies operate by using a threshold.
If the machine's load is below the threshold then it acts as a potential
receiver for remote tasks.
If the load is above the threshold, then it acts as a sender for new
tasks.
Local algorithms using thresholds are simple but may be far from
optimal.
13. Components of Load Distributing Algorithm
Transfer policy indicates when a node (system) is in a suitable state
to participate in a task transfer.
The most popular proposed concept for the transfer policy is based on
an optimum threshold.
Thresholds are nothing but units of load.
When a task originates at a particular node and the load there rises
beyond the threshold T, the node becomes a sender (i.e., the node is
overloaded and has additional task(s) that should be transferred to
another node).
Similarly, when the load at a particular node falls below T, it
becomes a receiver.
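The threshold rule can be stated as a tiny function. This is an illustrative sketch; the role labels and parameter names are assumptions:

```python
def transfer_role(queue_length, threshold):
    """Threshold-based transfer policy (sketch).

    A node loaded beyond the threshold T is a sender, one below T is a
    receiver, and one exactly at T participates in neither direction.
    """
    if queue_length > threshold:
        return "sender"      # overloaded: should transfer tasks away
    if queue_length < threshold:
        return "receiver"    # underloaded: can accept remote tasks
    return "neutral"
```

Real systems often use two thresholds (a high and a low watermark) to avoid nodes flipping roles on every small load change; a single T is the simplest form described above.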
14. Components of Load Distributing Algorithm
2. Process Selection Policy
A selection policy chooses a task for transfer.
This decision will be based on the requirement that the overhead
involved in the transfer will be compensated by an improvement in
response time for the task and/or the system.
Some means of knowing that the task is long-lived will be necessary to
avoid needless migration. This could be based perhaps on past history.
A number of other factors could influence the decision.
The size of the task's memory space is the main cost of migration.
Small tasks are more suited.
Also, for efficiency purposes, the number of location dependent calls
made by the chosen task should be minimal because these must be
mapped home transparently.
15. Components of Load Distributing Algorithm
A selection policy determines which task in the node (selected by the
transfer policy), should be transferred.
If the selection policy fails to find a suitable task in the node, then the
transfer procedure is stopped until the transfer policy indicates that the
site is again a sender.
Here there are two approaches: preemptive and non-preemptive.
The non-preemptive approach is simple: we select, for migration, the
newly originated task that caused the node to become a sender.
But often this is not the best approach, as the overhead incurred in
the transfer of a task should be compensated for by the reduction in
response time realised by the task.
16. Components of Load Distributing Algorithm
There are also some other factors: firstly, the overhead incurred by
the transfer should be minimal (a task of small size carries less
overhead), and secondly, the number of location-dependent system
calls made by the selected task should be minimal.
Tasks that depend on their location exhibit location affinity and
must be executed at the node where they originated, because they use
resources, such as windows or a mouse, that exist only at that node.
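A selection policy following these criteria might look like the minimal Python sketch below; the candidate fields and the tie-breaking order are illustrative assumptions, not a prescribed scheme:

```python
def select_task(candidates):
    """Selection-policy sketch: choose the cheapest task to migrate.

    Each candidate is a dict with illustrative (assumed) fields:
      size_kb           - memory image size, the main transfer cost
      location_calls    - number of location-dependent system calls
      location_affinity - True if the task must stay on this node
    Returns the chosen task, or None if no task is suitable (the
    transfer procedure then stops).
    """
    movable = [t for t in candidates if not t["location_affinity"]]
    if not movable:
        return None
    # Prefer small tasks that make few location-dependent calls.
    return min(movable, key=lambda t: (t["size_kb"], t["location_calls"]))
```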
17. Components of Load Distributing Algorithm
3. Site Location Policy
Once the transfer policy has decided to send a particular task, the
location policy must decide where the task is to be sent.
This will be based on information gathered by the information policy.
Polling is a widely used sender-initiated technique.
A site polls other sites serially or in parallel to determine if they are
suitable sites for a transfer and/or if they are willing to accept a
transfer.
Nodes could be selected at random for polling, or chosen more
selectively based on information gathered during previous polls. The
number of sites polled may vary.
A receiver-initiated scheme depends on idle machines to announce
their availability for work.
18. Components of Load Distributing Algorithm
The goal of the idle site is to find some work to do. An interesting
idea is for it to offer to do work at a price, leaving the sender to make
a cost/performance decision in relation to the task to be migrated.
Polling is a widely used approach for locating a suitable node: a
node polls another node to see if it is a suitable load-sharing
partner.
Nodes can be polled sequentially or concurrently.
19. Components of Load Distributing Algorithm
4. Information Policy – The information policy is in charge of
determining when information regarding the states of the other nodes
in the system should be collected. Most information policies fall into
one of three categories.
Demand driven – Using sender initiated or receiver initiated polling
techniques, a node obtains the state of other nodes only when it
desires to get involved in either sending or receiving tasks.
Because their actions are dependent on the status of the system,
demand-driven policies are inherently adaptive and dynamic.
The policy here can be sender-initiated (senders look for receivers
to transfer load to), receiver-initiated (receivers solicit load from
senders), or symmetrically initiated (a combination of both).
20. Components of Load Distributing Algorithm
Periodic – At regular intervals, nodes exchange data. To inform
localization algorithms, each site will have a significant history of
global resource utilization over time. At large system loads, the
benefits of load distribution are negligible, and the periodic exchange
of information may thus be an unnecessary overhead.
State change driven – When a node’s state changes by a specific
amount, it sends out state information. This data could be forwarded
to a centralized load-scheduling point or shared with peers.
Unlike a demand-driven policy, it disseminates a node’s own state
rather than collecting information about other nodes.
A policy that does not alter its operation in response to changes in
system state can be counterproductive: for example, if the system is
already overloaded, exchanging system state information on a regular
basis will exacerbate the problem.
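A state-change-driven policy can be sketched as follows. The `delta` threshold and the recording of broadcasts in a list are illustrative assumptions standing in for real dissemination to peers or a central scheduler:

```python
class StateChangeDrivenPolicy:
    """State-change-driven information policy (illustrative sketch).

    The node disseminates its own load only when it has drifted by at
    least `delta` units since the last broadcast. The `sent` list
    stands in for actually sending to peers or a central point.
    """

    def __init__(self, delta=2):
        self.delta = delta
        self.last_broadcast = None
        self.sent = []   # loads that were broadcast

    def update_load(self, load):
        changed_enough = (self.last_broadcast is None or
                          abs(load - self.last_broadcast) >= self.delta)
        if changed_enough:
            self.sent.append(load)       # broadcast the new state
            self.last_broadcast = load
```

Small load fluctuations below `delta` generate no traffic, which is exactly how this policy avoids the constant overhead of a periodic exchange.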
21. Design Issues for Processor Allocation Algorithms
The major design decisions can be summed up as follows:
Deterministic versus Heuristic
Deterministic algorithms are appropriate when everything about
process behavior is known in advance.
If the computing, memory and communication requirements of all
processes can be established, then a graph may be constructed
depicting the system state.
The problem is to partition the graph into a number of subgraphs
according to a stated policy so that each subgraph of processes is
mapped onto one machine.
This is achieved subject to the resource constraints imposed by each
machine.
22. Design Issues for Processor Allocation Algorithms
Heuristics refer to principles used in making decisions when all
possibilities cannot be fully explored.
For example, consider a site location algorithm where a machine
sends out a probe to a randomly chosen machine, asking if its
workload is below some threshold.
If not, it probes another and if no suitable host is found within N
probes, the algorithm terminates and the process runs on the
originating machine.
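The N-probe heuristic just described can be sketched directly; the function and parameter names, and the `load_of` callback for querying a peer's workload, are assumptions for illustration:

```python
import random

def choose_destination(own_load, threshold, peers, load_of,
                       max_probes=3, rng=random):
    """Sender-initiated N-probe heuristic (illustrative sketch).

    An overloaded node probes up to `max_probes` randomly chosen
    peers and returns the first whose load is below the threshold.
    Returning None means: run the process on the originating machine.
    """
    if own_load <= threshold:
        return None                    # not overloaded: keep the task
    for _ in range(max_probes):
        peer = rng.choice(peers)
        if load_of(peer) < threshold:
            return peer                # suitable host found
    return None                        # give up after N probes
```

Bounding the number of probes is what makes this a heuristic: the sender never explores all possibilities, trading optimality for a fixed, small decision cost.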
Optimal versus Suboptimal Algorithms: Optimal solutions require
extensive system information and devote significant time to analysis.
A study of the complexity of this analysis versus the success of
solutions might reveal that a simple suboptimal algorithm can yield
acceptable results and will be easier to implement.
23. Design Issues for Processor Allocation Algorithms
Deterministic algorithms impose excessive costs on all modules
within the policy hierarchy and are not scalable, but they can
achieve optimal results.
Heuristic techniques are invariably less expensive and often
demonstrate acceptable but suboptimal results.
Local versus Global Algorithms
When a process is being considered for migration and a new
destination is being selected, there is a choice between allowing this
decision to be made in isolation by the current host or, to require
some consideration of the status of the intended destination.
It may be better to collect information about load elsewhere before
deciding whether the current host is under greater pressure or not.
24. Design Issues for Processor Allocation Algorithms
One global technique is to associate an 'income' with each process,
so that the larger this value is relative to other processes at the
site, the greater the percentage of processor cycles received.
This income is adjusted based on the operational characteristics the
process reveals to the operating system.
At heavily loaded sites each unit of income buys fewer processor
cycles, so processes migrate to where they can obtain more cycles
for their current income.
Sender Initiated versus Receiver Initiated Algorithms
Maintaining a complete database of idle or lightly loaded machines
for migration can be a challenging problem in a distributed system.
Various alternatives to this have been proposed.
25. Design Issues for Processor Allocation Algorithms
With sender initiated schemes the overloaded machine takes the
initiative by random polling or broadcasting and waiting for replies.
With receiver initiated schemes an idle machine offers itself for work
to a group of machines and accepts tasks from them.
In some situations a machine can become momentarily idle even
though on average it is reasonably busy.
As discussed earlier, care must be taken to coordinate migration
activity so that senders do not capitalize on this transient period and
send a flood of processes to this node.
One approach is to assign each site a local load value and an export
load value. The local load value reflects the true state of the machine
while the export load value decreases more slowly to dampen short-
lived fluctuations.
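The local/export load idea can be sketched with simple exponential smoothing. This is a simplification made for illustration: the text describes the export value decreasing more slowly, whereas this sketch damps changes in both directions, and all names are assumed:

```python
class DampedLoad:
    """Local vs. export load values (illustrative sketch).

    The local value tracks the true queue length instantly. The export
    value, which is what remote senders see, is an exponentially
    smoothed copy that lags behind sudden changes, damping short-lived
    idle periods so senders do not flood a momentarily idle node.
    """

    def __init__(self, alpha=0.25):
        self.alpha = alpha   # smoothing factor, 0 < alpha <= 1
        self.local = 0.0
        self.export = 0.0

    def update(self, queue_length):
        self.local = float(queue_length)
        # Move the export value only a fraction of the way to the truth.
        self.export += self.alpha * (self.local - self.export)
        return self.export
```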
26. Types of Load Distribution Algorithms:
Load Distribution Algorithms are classified as static, dynamic or
adaptive. In static algorithms all load distribution decisions are
hardwired in an algorithm on the basis of prior knowledge of the
system.
Dynamic algorithms make their load distribution decision on the
basis of current system state, so they have more scope of
improvement as compared to static algorithm.
However, the overhead incurred in collecting the system state
information may outweigh the benefit of load distribution.
Adaptive algorithms are a special class of dynamic algorithms: they
adapt their activities by changing the parameters of the algorithm
to suit the current system state.
27. Types of Load Distribution Algorithms:
For example, at high system load, a dynamic algorithm keeps
collecting system state information, thereby further increasing the
system load, whereas an adaptive algorithm will discontinue this
collection.
Load Balancing v/s Load Sharing :
Load Distribution Algorithms are further classified as load balancing
and load sharing.
The main aim of both the algorithms is to reduce the unshared state
(a system in which some sites are idle while the other sites are
heavily loaded) by transferring the task.
Load balancing algorithms go one step further by distributing the
load equally across all computers.
28. Types of Load Distribution Algorithms:
Such algorithms involve more task transfers, so the overhead incurred
in task transfer may outweigh the potential performance improvement.
Preemptive v/s Non-Preemptive Transfer:
Preemptive tasks are tasks that have been partially executed.
Their transfer is an expensive operation, as the task state,
consisting of the virtual memory image, process control block,
unread I/O buffers, and timers, must also be transferred.
Non-preemptive tasks are tasks that have not yet begun their
execution; such task transfers are also called task placement. In
both types of task transfer, the environment in which the task is to
be executed (such as the privileges inherited by the task) is also
transferred.
29. Static and Dynamic Load Balancing Algorithms
Static Load Balancing:
In a static algorithm, processes are assigned to processors at
compile time according to the performance of the nodes. Once the
processes are assigned, no change or reassignment is possible at
run time.
The number of jobs at each node is fixed in a static load balancing
algorithm.
Static algorithms do not collect any information about the nodes.
The assignment of jobs to the processing nodes is done on the basis
of the following factors: incoming time, extent of resources needed,
mean execution time, and inter-process communication.
Since these factors must be estimated before the assignment, static
load balancing is also called a probabilistic algorithm.
30. Static and Dynamic Load Balancing Algorithms
As there is no migration of jobs at run time, little or no overhead
occurs.
Since the load is balanced prior to execution, static load balancing
has several fundamental flaws even when a mathematical solution
exists: it is very difficult to estimate accurately the execution
times of various parts of a program without actually executing them;
communication delays vary under different circumstances; and some
problems have an indeterminate number of steps to reach their
solution.
In static load balancing it is observed that the more tasks there
are relative to processors, the better the load balancing.
The figure shows a schematic diagram of static load balancing in
which local tasks arrive at the assignment queue.
31. Static and Dynamic Load Balancing Algorithms
[Figure: schematic diagram of static load balancing; local tasks arrive at the assignment queue.]
32. Static and Dynamic Load Balancing Algorithms
A job can either be transferred to a remote node or be assigned to
the threshold queue from the assignment queue.
A job from a remote node can similarly be assigned to the threshold
queue.
Once a job is assigned to a threshold queue, it cannot be migrated
to any other node.
A job arriving at a node is either processed by that node or
transferred to another node for remote processing through the
communication network.
Static load balancing algorithms can be divided into two subclasses:
optimal static load balancing and suboptimal static load balancing.
33. Static and Dynamic Load Balancing Algorithms
Dynamic Load Balancing:
Static load balancing requires that a great deal of information
about the system and the jobs be known before execution, and this
information may not be available in advance; a full study of the
system state and the jobs in advance is quite a tedious approach.
So, dynamic load balancing algorithms came into existence. The
assignment of jobs is done at run time.
In DLB, jobs are reassigned at run time depending on the situation:
load is transferred from heavily loaded nodes to lightly loaded
nodes.
In this case communication overheads occur, and they grow as the
number of processors increases. In dynamic load balancing, no
decision is taken until the process begins execution.
34. Static and Dynamic Load Balancing Algorithms
This strategy collects the information about the system state and
about the job information.
The more information an algorithm collects in a short time, the
better the decisions it can potentially make.
Dynamic load balancing is mostly considered in heterogeneous
systems, which consist of nodes with different speeds, different
communication link speeds, different memory sizes, and variable
external loads due to multiple users.
A number of load balancing strategies have been developed and
classified for achieving high system performance.
The figure shows a simple dynamic load balancing scheme transferring
jobs from heavily loaded to lightly loaded nodes.
35. Static and Dynamic Load Balancing Algorithms
[Figure: dynamic load balancing; jobs are transferred from heavily loaded to lightly loaded nodes.]
36. COMPARISON BETWEEN SLB and DLB
ALGORITHM
Some qualitative parameters for comparative study have been listed
below.
1. Nature: This factor indicates whether the applied algorithm is
static or dynamic.
2. Overhead Involved: In a static load balancing algorithm,
redistribution of tasks is not possible and there is no overhead
involved at run time, though a little overhead may occur due to
inter-process communication.
In a dynamic load balancing algorithm, redistribution of tasks is
done at run time, so considerable overhead may be involved. Hence it
is clear that SLB involves less overhead than DLB.
37. COMPARISON BETWEEN SLB and DLB
ALGORITHM
3. Utilization of Resources:
Though the response time is minimum in SLB, it has poor resource
utilization capability: it is impractical to expect all submitted
jobs to complete on their processors at the same time, so some
processors are likely to sit idle after finishing their assigned
jobs while others remain busy, due to the absence of a reassignment
policy.
In a dynamic algorithm, a reassignment policy exists at run time, so
it is possible to complete all jobs at approximately the same time.
So, better resource utilization occurs in DLB.
4. Thrashing or Process Dumping: A processor is said to be
thrashing if it spends more time migrating jobs than executing
useful work.
38. COMPARISON BETWEEN SLB and DLB
ALGORITHM
The lower the degree of migration, the less the processor thrashing.
So SLB is free of thrashing, but DLB incurs considerable thrashing
due to process migration at run time.
5. State Woggling:
This corresponds to processors frequently changing status between
low and high load. It is a performance-degrading factor.
6. Predictability:
Predictability corresponds to whether it is possible to predict the
behavior of an algorithm.
The behavior of the SLB algorithm is predictable as everything is
known before compilation.
DLB algorithm’s behavior is unpredictable, as everything is done at
run time.
39. COMPARISON BETWEEN SLB and DLB
ALGORITHM
7. Adaptability:
Adaptability determines whether an algorithm will adjust itself to
changes in the system state.
SLB has no ability to adapt to a changing environment, but DLB has
that ability.
8. Reliability: Reliability concerns whether the system continues to
work without error when a node fails.
SLB is not very reliable, as it has no ability to adapt to changes
in the system’s state. DLB has that adaptive power, so DLB is more
reliable.
9. Response Time: Response time measures how much time a system
applying a particular load balancing algorithm takes to respond to a
job.
40. COMPARISON BETWEEN SLB and DLB
ALGORITHM
An SLB algorithm has a shorter response time because processors are
fully involved in processing, there being no job transfer.
A DLB algorithm has a longer response time because processors cannot
be fully involved in processing owing to the job-transfer policy.
10. Stability: SLB is more stable, as everything is known before
compilation and the workload assignment is fixed. DLB is not as
stable as SLB, because it involves both compile-time assignment of
jobs and redistribution of the workload at run time as needed.
11. Complexity Involved: SLB algorithms are easy to construct, while
DLB algorithms are not so easy to develop because nothing is known
in advance. Although dynamic load balancing is a complex phenomenon,
the benefits from it are much greater than its complexity.
41. Benefits of Load balancing
a) Load balancing improves the performance of each node and hence
the overall system performance.
b) Load balancing reduces the job idle time
c) Small jobs do not suffer from long starvation
d) Maximum utilization of resources
e) Response time becomes shorter
f) Higher throughput
g) Higher reliability
h) Low cost but high gain
i) Extensibility and incremental growth
42. Introduction to Distributed Database System
A distributed database is basically a database that is not limited
to one system; it is spread over different sites, i.e., over
multiple computers or a network of computers.
A distributed database system is located at various sites that do
not share physical components.
This may be required when a particular database needs to be
accessed by various users globally.
It needs to be managed such that for the users it looks like one single
database.
Types:
1. Homogeneous Database:
In a homogeneous database, all sites store the database identically.
43. Introduction to Distributed Database System
The operating system, database management system, and data
structures used are the same at all sites. Hence, they are easy to
manage.
2. Heterogeneous Database:
In a heterogeneous distributed database, different sites can use
different schema and software that can lead to problems in query
processing and transactions.
Also, a particular site might be completely unaware of the other sites.
Different computers may use a different operating system, different
database application.
They may even use different data models for the database.
Hence, translations are required for different sites to communicate.
44. Introduction to Distributed Database System
A distributed database is a collection of multiple interconnected
databases, which are spread physically across various locations that
communicate via a computer network.
Features:
Databases in the collection are logically interrelated with each other.
Often they represent a single logical database.
Data is physically stored across multiple sites. Data in each site can
be managed by a DBMS independent of the other sites.
The processors in the sites are connected via a network. They do not
have any multiprocessor configuration.
A distributed database is not a loosely connected file system.
A distributed database incorporates transaction processing, but it is
not synonymous with a transaction processing system.
45. Introduction to Distributed Database System
Distributed Data Storage :
There are two ways in which data can be stored on different sites.
These are:
1. Replication –
In this approach, an entire relation is stored redundantly at two or
more sites. If the entire database is available at all sites, it is
a fully redundant database. Hence, in replication, systems maintain
copies of the data.
This is advantageous as it increases the availability of data at
different sites.
Also, now query requests can be processed in parallel.
However, it has certain disadvantages as well. Data needs to be
constantly updated.
46. Introduction to Distributed Database System
Any change made at one site must be recorded at every site where
that relation is stored, or else inconsistency may result.
This is a lot of overhead. Also, concurrency control becomes far
more complex, as concurrent access must now be checked across a
number of sites.
2. Fragmentation –
In this approach, the relations are fragmented (i.e., they’re divided
into smaller parts) and each of the fragments is stored in different sites
where they’re required.
It must be ensured that the fragments can be used to reconstruct the
original relation (i.e., there is no loss of data).
Fragmentation is advantageous because it does not create copies of
data, so consistency is not a problem.
47. Introduction to Distributed Database System
Fragmentation of relations can be done in two ways:
Horizontal fragmentation – Splitting by rows –
The relation is fragmented into groups of tuples so that each tuple is
assigned to at least one fragment.
Vertical fragmentation – Splitting by columns –
The schema of the relation is divided into smaller schemas. Each
fragment must contain a common candidate key so as to ensure a
lossless join.
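Both fragmentation styles can be sketched with plain Python lists standing in for relations. This is a minimal illustration with a hypothetical Student(roll, name, city) relation: horizontal fragments are reunited by union, and vertical fragments repeat the candidate key (roll) so the original relation can be rebuilt by a lossless join.

```python
# Relation: Student(roll, name, city) -- hypothetical sample data.
student = [
    (1, "Asha", "Indore"),
    (2, "Ravi", "Bhopal"),
    (3, "Meena", "Indore"),
]

# Horizontal fragmentation: each tuple goes to the site where it is used.
frag_indore = [t for t in student if t[2] == "Indore"]
frag_bhopal = [t for t in student if t[2] == "Bhopal"]
# Reconstruction is simply the union of the fragments (no data lost).
assert sorted(frag_indore + frag_bhopal) == sorted(student)

# Vertical fragmentation: split the schema, repeating the candidate key
# (roll) in every fragment so a lossless join is possible.
frag_a = [(roll, name) for roll, name, _ in student]    # (roll, name)
frag_b = [(roll, city) for roll, _, city in student]    # (roll, city)
# Lossless join on the common candidate key rebuilds the relation.
rebuilt = sorted((r, n, dict(frag_b)[r]) for r, n in frag_a)
assert rebuilt == sorted(student)
print("both fragmentations reconstruct the original relation")
```

If a vertical fragment omitted the key, the join would not be lossless, which is why the common candidate key is required.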
Advantages of Distributed Databases:
Modular Development − If the system needs to be expanded to new
locations or new units, in centralized database systems the action
requires substantial effort and disruption to existing functioning.
48. Introduction to Distributed Database System
However, in distributed databases, the work simply requires adding new
computers and local data to the new site and finally connecting them to the
distributed system, with no interruption in current functions.
More Reliable − In case of a database failure, the entire centralized
system comes to a halt. However, in distributed systems, when a
component fails, the system continues to function, possibly at reduced
performance. Hence a DDBMS is more reliable.
Better Response − If data is distributed in an efficient manner, then user
requests can be met from local data itself, thus providing faster response. On
the other hand, in centralized systems, all queries have to pass through the
central computer for processing, which increases the response time.
Lower Communication Cost − In distributed database systems, if data is
located locally where it is mostly used, then the communication costs for data
manipulation can be minimized. This is not feasible in centralized systems.
49. Introduction to Multimedia Database
A multimedia database is a collection of interrelated multimedia data
that includes text, graphics (sketches, drawings), images, animations,
video, audio, etc., and may hold vast amounts of multi-source multimedia
data.
The framework that manages different types of multimedia data, which
can be stored, delivered and utilized in different ways, is known as a
multimedia database management system.
There are three classes of multimedia database: static media, dynamic
media and dimensional media.
Content of Multimedia Database management system :
1. Media data – The actual data representing an object.
2. Media format data – Information such as sampling rate, resolution,
encoding scheme etc. about the format of the media data after it goes
through the acquisition, processing and encoding phase.
50. Introduction to Multimedia Database
3. Media keyword data – Keyword descriptions relating to the
generation of the data. It is also known as content-descriptive data.
Example: date, time and place of recording.
4. Media feature data – Content dependent data such as the
distribution of colors, kinds of texture and different shapes present in
data.
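The four kinds of content listed above can be pictured as one record per stored object. This is a minimal sketch with hypothetical field names and sample values (the actual schema of an MMDBMS would differ): one entry for an image, holding its media data, format data, keyword data and feature data side by side.

```python
# Hypothetical record for a single image in a multimedia database,
# grouping the four content classes described in the slides.
image_entry = {
    "media_data": b"\x89PNG...",           # 1. the actual object bytes
    "media_format": {                      # 2. format data after encoding
        "encoding": "PNG",
        "resolution": (1920, 1080),
    },
    "media_keywords": {                    # 3. content-descriptive data
        "date": "2021-03-14",
        "place": "Indore",
    },
    "media_features": {                    # 4. content-dependent data
        "dominant_colors": ["green", "blue"],
        "texture": "smooth",
    },
}
print(sorted(image_entry.keys()))
```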
Types of multimedia applications based on data management
characteristic are :
1. Repository applications – A large amount of multimedia data as
well as metadata (media format data, media keyword data, media
feature data) stored for retrieval purposes, e.g., repositories of
satellite images, engineering drawings, radiology scanned pictures.
51. Introduction to Multimedia Database
2. Presentation applications – They involve delivery of multimedia
data subject to temporal constraints. Optimal viewing or listening
requires the DBMS to deliver data at a certain rate, offering quality
of service above a certain threshold. Here data is processed as it is
delivered. Example: annotating of video and audio data, real-time
editing analysis.
3. Collaborative work using multimedia information – It involves
executing a complex task by merging drawings, changing
notifications. Example: Intelligent healthcare network.
There are still many challenges to multimedia databases:
1. Modelling – Work in this area can improve database techniques
versus information retrieval techniques; documents constitute a
specialized area and deserve special consideration.
52. Introduction to Multimedia Database
2. Design – The conceptual, logical and physical design of
multimedia databases has not yet been fully addressed, as performance
and tuning issues at each level are far more complex; multimedia data
comes in a variety of formats such as JPEG, GIF, PNG and MPEG, which
are not easy to convert from one form to another.
3. Storage – Storage of a multimedia database on any standard disk
presents problems of representation, compression, mapping to
device hierarchies, archiving and buffering during input-output
operations. In a DBMS, a "BLOB" (Binary Large Object) facility allows
untyped bitmaps to be stored and retrieved.
4. Performance – For an application involving video playback or
audio-video synchronization, physical limitations dominate.
53. Introduction to Multimedia Database
The use of parallel processing may alleviate some problems, but such
techniques are not yet fully developed. Apart from this, multimedia
databases consume a lot of processing time as well as bandwidth.
5. Queries and retrieval – For multimedia data like images, video and
audio, accessing data through queries opens up many issues, such as
efficient query formulation, query execution and optimization, which
still need to be worked on.
Areas where multimedia databases are applied:
Documents and record management: industries and businesses that
keep detailed records and a variety of documents. Example: insurance
claim records.
54. Introduction to Multimedia Database
Knowledge dissemination : Multimedia database is a very effective
tool for knowledge dissemination in terms of providing several
resources. Example: Electronic books.
Education and training : Computer-aided learning materials can be
designed using multimedia sources which are nowadays very popular
sources of learning. Example: Digital libraries.
Marketing, advertising, retailing, entertainment and travel. Example:
a virtual tour of cities.
Real-time control and monitoring: coupled with active database
technology, multimedia presentation of information can be a very
effective means for monitoring and controlling complex tasks.
Example: manufacturing operation control.
58. Thank You
Great God, Medi-Caps, All the attendees
www.sagarpandya.tk
LinkedIn: /in/seapandya
Twitter: @seapandya
Facebook: /seapandya