This presentation was delivered in a fault tolerance class. It discusses how to achieve fault tolerance in databases through replication. Several commercial databases were studied, along with the approaches they take to replication. Based on that study, an architecture was proposed for a military database design that uses an asynchronous approach and cluster patterns.
What is replication
We may all be wondering how replication helps to achieve
fault tolerance.
Replication in databases means storing the same
information, kept in synchronization, at multiple
locations so that if the primary database fails,
a replica can take over.
Availability and reliability
• A system that goes down for 1 ms every hour has an
availability of more than 99.99% but is unreliable.
• A system that never crashes but is shut down for a week
once every year is 100% reliable but only 98% available.
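Both figures follow from the usual definition of availability as the fraction of time the system is up. A minimal sketch of that arithmetic (the function name `availability` is chosen here only for illustration):

```python
def availability(uptime: float, downtime: float) -> float:
    """Fraction of total time the system is up (both arguments in the same unit)."""
    return uptime / (uptime + downtime)

# Down for 1 ms in every hour: 3600 s per hour, 0.001 s of it lost.
flaky = availability(3600 - 0.001, 0.001)

# Never crashes, but is shut down for 7 days of a 365-day year.
planned = availability(365 - 7, 7)

print(f"frequent short outages: {flaky:.6%}")   # well above 99.99%
print(f"one planned week off:   {planned:.1%}")  # roughly 98%
```

The first system is highly available yet fails 24 times a day; the second never fails unexpectedly but is unavailable for about 2% of the year.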
There are two basic parameters to select when designing a
replication strategy: where and when.
Depending on when the updates are propagated:
• Synchronous (eager)
• Asynchronous (lazy)
Depending on where the updates can take place:
• Primary Copy (master)
• Update Everywhere (group)
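The "when" choice can be illustrated with a small sketch of a primary copy that propagates either eagerly or lazily. All class and method names below are hypothetical, and the in-memory dictionaries only stand in for real databases:

```python
class Replica:
    def __init__(self):
        self.data = {}

class PrimaryCopy:
    """Primary-copy replication: every update goes through one master."""

    def __init__(self, replicas, eager):
        self.data = {}
        self.replicas = replicas
        self.eager = eager
        self.pending = []  # changes not yet propagated (lazy mode only)

    def write(self, key, value):
        self.data[key] = value
        if self.eager:
            # Synchronous: all replicas are updated before write() returns.
            for replica in self.replicas:
                replica.data[key] = value
        else:
            # Asynchronous: remember the change and ship it later.
            self.pending.append((key, value))

    def propagate(self):
        """Ship queued changes to the replicas in the order they were made."""
        for key, value in self.pending:
            for replica in self.replicas:
                replica.data[key] = value
        self.pending.clear()

lazy_replica = Replica()
lazy = PrimaryCopy([lazy_replica], eager=False)
lazy.write("x", 1)
stale_view = dict(lazy_replica.data)  # replica has not seen the write yet
lazy.propagate()

eager_replica = Replica()
PrimaryCopy([eager_replica], eager=True).write("x", 1)
```

In the lazy case the replica is briefly out of date, which is the price paid for a faster write; in the eager case the write only completes once every copy agrees.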
Location transparency is difficult to achieve in a distributed
environment. Local accesses are fast, remote accesses are slow. If
everything is local, then all accesses should be fast.
Failure resilience is also difficult to achieve. If a site fails, the data
it contains becomes unavailable. By keeping several copies of the
data at different sites, single site failures should not affect the
When replication is
implemented in industry
When evaluating a commercial replication strategy, keep in
mind:
• The customer base (who is going to use it?).
• The underlying database (what can the system do?).
• What competitors are doing (market pressure).
• There is no such thing as a “better approach”.
• The complexity of the problem
• Loose consistency (= asynchronous).
• Primary copy.
• PUSH model: replication takes place by “subscription”.
A site subscribes to copies of data. Changes are
propagated from the primary as soon as they occur.
When the changes are made they are pushed to the
• The goal is to minimize the time during which the copies
are inconsistent, although an asynchronous environment
always involves some delay.
• Persistent queues are used to store changes in case of
• The Log Transfer Manager monitors the log of Sybase
SQL Server and notifies the replication server of any
changes. It acts as a lightweight process that examines
the log to detect committed transactions (a wrapper).
It usually runs on the same system as the source database.
• When a transaction is detected, its log records are sent to
• The Replication Server usually runs on a different
system than the database to minimize the load.
• It takes updates, looks up who is subscribed to them, and
sends them to the corresponding replication servers at the
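The push-by-subscription model with queued changes can be sketched roughly as follows. This is an illustration only, not Sybase's actual interface: the class names are hypothetical, and an in-memory deque stands in for a persistent queue.

```python
from collections import defaultdict, deque

class Site:
    def __init__(self, name):
        self.name = name
        self.online = True
        self.applied = []  # changes applied at this site, in order

class ReplicationServer:
    def __init__(self):
        self.subscribers = defaultdict(list)  # table name -> subscribed sites
        self.queues = defaultdict(deque)      # site name -> undelivered changes

    def subscribe(self, site, table):
        self.subscribers[table].append(site)

    def publish(self, table, change):
        """Called for each committed change found in the primary's log."""
        for site in self.subscribers[table]:
            self.queues[site.name].append((table, change))
            self.flush(site)

    def flush(self, site):
        """Deliver queued changes in order; keep them queued while the site is down."""
        queue = self.queues[site.name]
        while site.online and queue:
            site.applied.append(queue.popleft())

server = ReplicationServer()
backup = Site("backup")
server.subscribe(backup, "orders")
server.publish("orders", "insert #1")   # pushed immediately
backup.online = False
server.publish("orders", "insert #2")   # held in the queue, not lost
backup.online = True
server.flush(backup)                    # delivered once the site is back
```

Because every change passes through the queue, a subscriber that was unreachable still receives the changes in their original order when it returns.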
• It was designed with a focus on scalability, high
performance and fault tolerance.
• The replicated database is in hot standby mode.
• It makes use of the two-phase commit protocol as well.
• It can tolerate both types of fault: media failures as
well as disk failures.
• Each node sends heartbeat messages so that faults can
be detected; a missing heartbeat signifies that there
is a fault.
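Heartbeat-based fault detection can be sketched as a monitor that records when each node was last heard from. The names and the timeout value below are assumptions made for illustration; they are not taken from ClustRa:

```python
class HeartbeatMonitor:
    def __init__(self, timeout):
        self.timeout = timeout  # seconds of silence tolerated
        self.last_seen = {}     # node id -> time of the last heartbeat

    def heartbeat(self, node, now):
        self.last_seen[node] = now

    def suspected(self, now):
        """Nodes whose last heartbeat is older than the timeout."""
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.timeout
        )

monitor = HeartbeatMonitor(timeout=3.0)
monitor.heartbeat("node-a", now=0.0)
monitor.heartbeat("node-b", now=0.0)
monitor.heartbeat("node-a", now=2.0)   # node-b stays silent
failed = monitor.suspected(now=4.0)    # only node-b has been silent too long
```

Timestamps are passed in explicitly so the example is deterministic; a real monitor would read a clock and would have to choose the timeout to balance fast detection against false alarms.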
Evaluation of ClustRa
• ClustRa has an availability of 99.999%, which places it
in availability class 5, with a downtime of no more than
about 5 minutes per year.
• The effectiveness of ClustRa against failures was
evaluated experimentally by injecting faults into the
data buffers, which are the primary component of the
database.
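The yearly downtime implied by an availability figure is a one-line calculation. A small sketch, with a function name chosen here for illustration:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years

def yearly_downtime_minutes(availability: float) -> float:
    """Minutes per year a system is down at the given availability (0..1)."""
    return (1 - availability) * MINUTES_PER_YEAR

five_nines = yearly_downtime_minutes(0.99999)  # about 5.26 minutes
print(f"{five_nines:.2f} minutes per year")
```

Strictly, 99.999% works out to a little over 5 minutes, so the 5-minute figure is best read as a rounded value.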
• Goals: Flexibility. It tries to provide a platform that
can be tailored to as many applications as possible.
It provides several approaches to replication and
the user must select the most appropriate to the
• There is no such thing as a “bad approach”, so all
of them must be supported (or as many as possible)
Design of Oracle
• One of the earliest implementations: Snapshot. This was
a copy of the database. Refreshing was done by getting a
• Symmetric replication: changes are forwarded at time
intervals (push) or on demand (pull).
• Asynchronous replication is the default but synchronous
is also possible.
• Implements a fast recovery strategy.
• Makes use of incremental checkpointing.
• Lazy rollback.
• Makes use of multi-master replication.
In Microsoft SQL Server
• Transactional replication: Only the committed changes
made at the primary database are sent to the subscribing
• Snapshot replication: The entire state of the primary
database is captured and applied to the replica. It can
be scheduled periodically or run manually.
• Merge replication: Different sites can modify the
2PC PROBLEMS
• The most common problem is blocking.
• The second is that it is costly.
• It decreases the availability of the databases involved.
• It imposes a high performance overhead.
• Most commercial systems nevertheless use this protocol
to support synchronization and consistency.
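The blocking problem can be seen in a toy version of the protocol. This is a simplified sketch with hypothetical names: there are no timeouts, logs or recovery, and the coordinator crash is simulated with a flag.

```python
class Participant:
    def __init__(self, name, vote_yes=True):
        self.name = name
        self.vote_yes = vote_yes
        self.state = "init"

    def prepare(self):
        # Phase 1: vote. A "yes" voter must wait for the coordinator's decision.
        self.state = "prepared" if self.vote_yes else "aborted"
        return self.vote_yes

    def finish(self, decision):
        # Phase 2: apply the coordinator's decision.
        self.state = decision

def two_phase_commit(participants, coordinator_crashes=False):
    votes = [p.prepare() for p in participants]
    if coordinator_crashes:
        return None  # no decision is sent; prepared participants stay blocked
    decision = "committed" if all(votes) else "aborted"
    for p in participants:
        p.finish(decision)
    return decision

ok = two_phase_commit([Participant("a"), Participant("b")])
veto = two_phase_commit([Participant("a"), Participant("b", vote_yes=False)])

stuck = [Participant("a"), Participant("b")]
two_phase_commit(stuck, coordinator_crashes=True)
blocked_states = [p.state for p in stuck]  # both remain "prepared"
```

The participants that voted yes cannot decide on their own, which is the blocking state; the two message rounds per transaction are where the cost comes from.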
Proposed architecture for
military database systems
• Each node in the cluster is assigned a number.
• If a node fails, the node with the highest value
becomes the coordinator of the group.
• We implement fault injection in the design phase itself
and also apply it to different areas of the database.
• Data is replicated on every node, so if any node or
particular unit fails, the other cluster nodes can be
used to recover from the failure.
• No central point of failure.
• No blocking state, because a one-phase commit is used.
• We use both the pull and the push approaches.
• We use merge replication.
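The slides do not say how the coordinator is chosen in detail. One simple reading, assumed here, is that the highest-numbered node still alive takes over. A minimal sketch with a hypothetical function name:

```python
def elect_coordinator(nodes: dict) -> int:
    """Return the highest-numbered live node.

    `nodes` maps node number -> True if the node is alive.
    """
    alive = [number for number, up in nodes.items() if up]
    if not alive:
        raise RuntimeError("no live node left in the cluster")
    return max(alive)

cluster = {1: True, 2: True, 3: True}
first = elect_coordinator(cluster)    # node 3 coordinates
cluster[3] = False                    # the coordinator fails
second = elect_coordinator(cluster)   # node 2 takes over
```

Because any surviving node can become coordinator, no single node is a central point of failure.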
What we have used in the
• To make a choice between 2 phase protocol and 1
• To evaluate against failures we are going to inject faults
in the system at various points.
• To choose between synchronous and asynchronous
• To choose between centralization and decentralization