TAMC 2007                                                                     25th May, 2007




            A Distributed Algorithm of Fault
             Recovery For Stateful Failover

                                  Indranil Saha
                HTS (Honeywell Technology Solutions) Research
                              Bangalore, India
                     Email: indranil.saha@honeywell.com
                                         and
                        Debapriyay Mukhopadhyay
                               Ixia Technologies
                                Kolkata, India
                      Email: dmukhopadhyay@ixiacom.com







                        Presentation Outline
    I will talk about
      • Introduction
      • System Models
      • Distributed Algorithm for Automated Fault Recovery
      • Formal verification of the Distributed Algorithm
      • Conclusion








                               Introduction
      • Critical business processes and mission critical systems should
        provide a high degree of availability and reliability to the end
        users.
      • Redundancy techniques are mostly used to achieve
        fault-tolerance.
      • Redundancy can be achieved by keeping extra copies of a
        system's components, including hardware, software and network
        components.








             Stateful and Stateless Failover
      • Stateless Failover:
        - Occasional loss of application state information or data is
        tolerable.
        - The system can restart without any state or data restoration
        after a failure.
        - Any live node in the network is a promising candidate to take
        over the processes of any failed node.
      • Stateful Failover:
        - Restoration of the state or data pertaining to the application
        is required for highly accurate recovery.
        - How to distribute the state information of a node across the
        network is an important issue.





                              Related Works
      • Graph theoretic models have been extensively used to represent
        processor-to-processor interconnection structure of fault
        tolerant designs for specific multi-processor architectures
        (Kuhl80, Yang88, Sridhar91, Mukhopadhyay92, Sung00,
        Hung01).
      • Minimum k-Hamilton graphs are widely used to meet
        reliability considerations for loop type communication networks
        (Mukhopadhyay92, Sung00, Hung01).
      • Fault-tolerant networks based on de Bruijn graphs have been
        proposed which can tolerate up to k − 2 node faults, where the
        graph is regular of degree k and has k^n vertices for some n
        (Sridhar91).
            None of these works addresses stateful failover.




                             System Model
      • The network consists of a set of nodes N with |N| = n.
      • Each node is labeled with a unique id from 0 to n − 1.
      • Each node handles one process initially, and is capable of
        executing at most m processes simultaneously.
      • P_i is the process that node i starts executing when the
        network becomes functional.
      • Failures are of the fail-stop kind, i.e., a node in the network
        can stop operating at any point in time due to a crash.
      • When a processor fails, all the links incident on that node also
        become non-functional.
      • Up to k node faults are allowed in the network.




                         Network Topology
    Each node i ∈ N (0 ≤ i ≤ n − 1) in the network is connected to
    the set of nodes P_i ⊆ N such that |P_i| = l = k + x, where
    k + x (≤ n − 1) is even, and

      P_i = { j ∈ N : j = (i + p) mod n, where −l/2 ≤ p ≤ l/2, p ≠ 0 }



    The underlying undirected graph modeling the network can be written
    as (N, E), where

      E = ∪_{i=0}^{n−1} { (i, j) : j ∈ P_i }.

    The state information of processor i, i ∈ N, is periodically
    forwarded to all the nodes in the set F_i ⊆ N such that |F_i| = k and

      F_i = { j ∈ N : j = (i + p) mod n, where −k/2 ≤ p ≤ k/2, p ≠ 0 }
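    As a rough illustration of the two set definitions on this slide, the
    following Python sketch enumerates the link set and the forwarding set
    of a node on the ring. The function names `ring_offsets`, `link_set`
    and `forward_set` are hypothetical, and the sketch assumes that l and k
    are both even so that the bounds l/2 and k/2 are integers.

```python
def ring_offsets(width):
    """Non-zero offsets p with -width/2 <= p <= width/2 (width assumed even)."""
    if width % 2 != 0:
        raise ValueError("this sketch only handles even widths")
    half = width // 2
    return [p for p in range(-half, half + 1) if p != 0]


def link_set(i, n, l):
    """Nodes directly connected to node i: (i + p) mod n for each offset p."""
    return [(i + p) % n for p in ring_offsets(l)]


def forward_set(i, n, k):
    """Nodes that periodically receive the state information of node i."""
    return [(i + p) % n for p in ring_offsets(k)]


if __name__ == "__main__":
    # n = 10 and k = 4 are the parameters of the example network in the slides.
    # l = 6 is an assumed choice (the smallest even l that exceeds k).
    print(link_set(9, 10, 6))     # [6, 7, 8, 0, 1, 2]
    print(forward_set(9, 10, 4))  # [7, 8, 0, 1]
```

    With these parameters the forwarding set of node 9 contains node 1,
    which is the node that takes over process 9 in the worked example.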






                               Connectivity
    - The graph (N, E) represents a regular network, since the degree of
    each node is l.

    - For any n and k, the graph (N, E) corresponds to the Harary
    graph H_{l,n}, where

      l = k + x ≥ k + 2 for k even,
      l = k + x ≥ k + 1 for k odd.

    The network is l-connected with χ(G) ≥ l (> k), where χ(G) denotes
    the connectivity of G.
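    The degree claim can be checked numerically with a small sketch. The
    helper names `min_degree`, `edges` and `degrees` are hypothetical, and
    the code only confirms that the graph is l-regular for a given n and
    l; it does not compute the connectivity χ(G).

```python
def min_degree(k):
    """Smallest l = k + x allowed by the slide: k + 2 for even k, k + 1 for odd k."""
    return k + 2 if k % 2 == 0 else k + 1


def edges(n, l):
    """Undirected edges {i, (i + p) mod n} for 1 <= p <= l/2 (l assumed even)."""
    return {frozenset((i, (i + p) % n))
            for i in range(n)
            for p in range(1, l // 2 + 1)}


def degrees(n, l):
    """Degree of every node, counted from the undirected edge set."""
    deg = [0] * n
    for edge in edges(n, l):
        for node in edge:
            deg[node] += 1
    return deg


if __name__ == "__main__":
    l = min_degree(4)          # k = 4 as in the example network
    print(l)                   # 6
    print(degrees(10, l))      # every entry is 6
```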






                        Theoretical Results
    Theorem 1. A. Forwarding state information of each process to k
    other nodes in the network ensures k-fault tolerance.
    B. A sufficient condition to ensure k-fault tolerance is to forward
    the state information by each node to at least k other nodes in the
    network.
    Theorem 2. As long as k ≤ ((m − 1)/m)·n, no live node has to execute
    more than m processes, including its own, and an algorithm that
    attains this under the proposed framework can be found.
    Theorem 3. In a network with n > 2k (or n = 2k), the minimum number
    of nodes to which any node must be directly connected is 2k (or
    2k − 1) to ensure that all the eligible nodes corresponding to a
    process can be updated about its state information in one hop at
    all times.
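    One way to read the bound in Theorem 2 is as a capacity count. This
    reading is an assumption of the sketch, not the authors' proof: after
    k failures the n − k surviving nodes, each able to run m processes,
    must host all n processes, so m(n − k) ≥ n, which rearranges to
    k ≤ ((m − 1)/m)·n. The helper names below are hypothetical.

```python
def max_tolerable_faults(n, m):
    """Largest integer k with k <= ((m - 1) / m) * n, using integer arithmetic."""
    return ((m - 1) * n) // m


def capacity_suffices(n, m, k):
    """True if n - k live nodes with capacity m can host all n processes."""
    return m * (n - k) >= n


if __name__ == "__main__":
    # Example network from the slides: n = 10, m = 2, k = 4.
    print(max_tolerable_faults(10, 2))   # 5
    print(capacity_suffices(10, 2, 4))   # True
    print(capacity_suffices(10, 2, 6))   # False
```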





                          Network Example
    Illustration of a network with n = 10, m = 2 and k = 4







                              Message Types
     1. INFO
            • In the first round, each node i sends an INFO message to
              all the nodes in the set F_i.
            • The message consists of the tuple (j, F_j).
     2. STATUS
            • From the second round onwards, in every round each live
              node i sends a STATUS message for every process p_j
              running on it to all the live nodes in the set F_j.
            • The message consists of the tuple (p_j, S_{p_j}).
            • Omission of the STATUS message for a process in a round
              indicates the failure of the process.
     3. RESOLVED
            • The message is sent to all the nodes in F_j by the node
              that has resolved the failure of process j.
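    The three message types could be represented along the following
    lines. This is only a sketch: the class and field names are
    hypothetical, the slide does not specify the payload of RESOLVED
    (a process id plus the resolving node is assumed here), and
    `missing_processes` illustrates failure detection by omission of
    STATUS rather than the authors' actual procedure.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class Info:
    """Round-one message (j, F_j): a node id and its forwarding set."""
    node: int
    forward_set: Tuple[int, ...]


@dataclass(frozen=True)
class Status:
    """Per-round message (p_j, S_{p_j}): a process id and its current state."""
    process: int
    state: Any


@dataclass(frozen=True)
class Resolved:
    """Sent to F_j by the node that resolved the failure of process j.

    The payload is an assumption of this sketch.
    """
    process: int
    resolved_by: int


def missing_processes(expected, received):
    """Processes whose STATUS message was omitted in a round."""
    seen = {msg.process for msg in received if isinstance(msg, Status)}
    return sorted(set(expected) - seen)


if __name__ == "__main__":
    inbox = [Status(8, "state-of-8"), Status(0, "state-of-0")]
    # A node expecting STATUS for processes 0, 8 and 9 notices that 9 is silent.
    print(missing_processes([0, 8, 9], inbox))  # [9]
```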




             Preference for the Neighbours
    pref_j^i denotes the preference of node i, among the nodes in F_j,
    to take over process j in case of its failure.







            Distributed Algorithm: Example
    Illustration of the distributed algorithm for a network with n = 10,
    m = 2 and k = 4




    Every node is running its own process.




            Distributed Algorithm: Example




    Node 9 is faulty




            Distributed Algorithm: Example




    Node 1 takes the process of node 9 after one round as it is the highest
    preference node for process 9.





            Distributed Algorithm: Example




    Node 2 is faulty





            Distributed Algorithm: Example




    Node 4 takes the process of node 2 after one round as it is the highest
    preference node for process 2.





            Distributed Algorithm: Example




    Node 8 is faulty





            Distributed Algorithm: Example




    Node 0 takes the process of node 8 after one round as it is the highest
    preference node for process 8.




            Distributed Algorithm: Example




    Node 0 is faulty. Real problem begins...





            Distributed Algorithm: Example




    Node 7 takes the process of node 8 after 3 rounds as it is the third
    preference node for process 8.





            Distributed Algorithm: Example




    Node 1 stops running process 9 and starts running process 0 six
    rounds after node 0’s failure.
    According to Theorem 1 there is at least one node available to take
    process 9.




            Distributed Algorithm: Example




    Node 7 stops running process 8 and starts running process 9 eight
    rounds after node 1 stops running process 9.




            Distributed Algorithm: Example




    Node 6 starts running process 8 four rounds after node 7 stops
    running process 8.
    No further failures are allowed: k = 4 nodes (9, 2, 8 and 0) have
    already failed.
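    The end state of this example can be sanity-checked with a short
    sketch. The placement in `FINAL` is reconstructed from the slide
    captions only (the figures are not available here), and the names
    `forward_set` and `check_placement` are hypothetical. The check
    confirms that every process runs exactly once, that no live node
    runs more than m processes, and that each adopted process runs on a
    node in its forwarding set.

```python
def forward_set(j, n, k):
    """F_j on the ring: (j + p) mod n for non-zero p in [-k/2, k/2] (k even)."""
    half = k // 2
    return {(j + p) % n for p in range(-half, half + 1) if p != 0}


N, M, K = 10, 2, 4
FAILED = {9, 2, 8, 0}
# Host node -> processes it runs after all four failures are resolved
# (assumed from the captions: 1 runs 0, 4 runs 2, 6 runs 8, 7 runs 9).
FINAL = {1: {1, 0}, 3: {3}, 4: {4, 2}, 5: {5}, 6: {6, 8}, 7: {7, 9}}


def check_placement(placement, failed, n, m, k):
    """Return a list of violations; an empty list means the placement is valid."""
    problems = []
    hosted = sorted(p for procs in placement.values() for p in procs)
    if hosted != list(range(n)):
        problems.append("not every process runs exactly once")
    for host, procs in placement.items():
        if host in failed:
            problems.append(f"failed node {host} still hosts processes")
        if len(procs) > m:
            problems.append(f"node {host} runs more than {m} processes")
        for p in procs:
            if p != host and host not in forward_set(p, n, k):
                problems.append(f"node {host} is not in F_{p}")
    return problems


if __name__ == "__main__":
    print(check_placement(FINAL, FAILED, N, M, K))  # []
```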





                  Analysis of the Algorithm
      • At most 2k rounds are required to resolve a single fault.
      • To resolve a single fault, at most (k − 2)m + 1 RESOLVED
        messages need to be sent across the network, where m is the
        maximum number of processes that a node is capable of executing.
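    For concreteness, the two bounds can be evaluated for the parameters
    used elsewhere in the slides. The function names are hypothetical;
    the formulas are the ones stated in the bullets above.

```python
def max_rounds(k):
    """Upper bound on the rounds needed to resolve a single fault: 2k."""
    return 2 * k


def max_resolved_messages(k, m):
    """Upper bound on RESOLVED messages for a single fault: (k - 2)m + 1."""
    return (k - 2) * m + 1


if __name__ == "__main__":
    # Example network: k = 4, m = 2.
    print(max_rounds(4), max_resolved_messages(4, 2))  # 8 5
    # Largest verified instance: K = 3, M = 2.
    print(max_rounds(3), max_resolved_messages(3, 2))  # 6 3
```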








              Correctness of the Algorithm
      • We show the correctness of the distributed algorithm through
        formal verification.
      • We use the SPIN model checker for modeling and verification of
        the algorithm.
      • We have been able to verify our model for N=8, K=3 and M=2
        and all lower instances.
      • Due to the state-space explosion problem inherent in model
        checking, we could not verify our algorithm with SPIN for more
        than 8 processors.








                       Spin Model Checker
      • Tool for automatically model checking distributed algorithms
      • Promela is a language for modeling systems of concurrent
        processes that can interact via shared variables and message
        channels
      • Given a concurrent system modeled by a Promela program,
        SPIN can check for deadlock, dead code, violations of user
        specified assertions, and temporal properties expressed by LTL
        formulas
      • When a violation of a property is detected, SPIN reports a
        scenario, i.e., a sequence of transitions, violating the property.







                                 Properties
    Safety 1 Whenever a node becomes faulty, at least one of its
       neighboring nodes is non-faulty.


    Safety 2 No node has to take more than M processes at any point
       of time.


    Liveness Whenever a node becomes faulty, its process is
       eventually taken up by some other live node.


    Timeliness Every fault is recovered in no more than 2K rounds.








                                 Conclusion
      • We have presented a distributed algorithm for automated fault
        recovery for stateful failover in a network.
      • The algorithm can handle a fault in whatever way it arises.
      • In at most 2k rounds the processes of the faulty processor are
        taken up by one or more eligible live nodes in the network.
      • The message complexity of our algorithm is linear in the
        number of nodes.
      • The correctness of the algorithm has been verified by modeling
        it in SPIN and checking its desired properties for networks of
        up to 8 nodes.







                      Thank You!!



Powerful Google developer tools for immediate impact! (2023-24 C)
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 

Slides Tamc07

  • 1. TAMC 2007, 25th May, 2007. A Distributed Algorithm of Fault Recovery For Stateful Failover. Indranil Saha, HTS (Honeywell Technology Solutions) Research, Bangalore, India (Email: indranil.saha@honeywell.com) and Debapriyay Mukhopadhyay, Ixia Technologies, Kolkata, India (Email: dmukhopadhyay@ixiacom.com).
  • 2. Presentation Outline. I will talk about:
      • Introduction
      • System Models
      • Distributed Algorithm for Automated Fault Recovery
      • Formal verification of the Distributed Algorithm
      • Conclusion
  • 3. Presentation Outline
      • Introduction
      • System Models
      • Distributed Algorithm for Automated Fault Recovery
      • Formal verification of the Distributed Algorithm
      • Conclusion
  • 4. Introduction
      • Critical business processes and mission-critical systems should provide a high degree of availability and reliability to end users.
      • Redundancy techniques are mostly used to achieve fault tolerance.
      • Redundancy can be achieved by keeping extra copies of a system's components, including hardware, software and network components.
  • 5. Stateful and Stateless Failover
      • Stateless Failover
          - Occasional loss of application state information or data is tolerable.
          - The system can restart without any state or data restoration after a failure.
          - Any live node in the network is a candidate to take over the processes of any failed node.
      • Stateful Failover
          - Restoration of the state or data pertaining to the application is required for highly accurate recovery.
          - How to distribute the state information of a node across the network is an important issue.
  • 6. Related Works
      • Graph-theoretic models have been used extensively to represent the processor-to-processor interconnection structure of fault-tolerant designs for specific multi-processor architectures (Kuhl80, Yang88, Sridhar91, Mukhopadhyay92, Sung00, Hung01).
      • Minimum k-Hamilton graphs are widely used to meet reliability considerations for loop-type communication networks (Mukhopadhyay92, Sung00, Hung01).
      • Fault-tolerant networks based on de Bruijn graphs have been proposed that tolerate up to k − 2 node faults, where the graph is regular of degree k and has k^n vertices for some n (Sridhar91).
      None of these works addresses stateful failover.
  • 7. Presentation Outline
      • Introduction
      • System Models
      • Distributed Algorithm for Automated Fault Recovery
      • Formal verification of the Distributed Algorithm
      • Conclusion
  • 8. System Model
      • The network consists of the set of nodes N with |N| = n.
      • Each node is labeled with a unique id from 0 to n − 1.
      • Each node handles one process initially, and is capable of executing at most m processes simultaneously.
      • P_i is the process that node i starts executing when the network becomes functional.
      • Failures are of the fail-stop kind, i.e., a node in the network can stop operating at any point in time due to a crash.
      • When a processor fails, all the links incident on that node also become non-functional.
      • Up to k node faults are allowed in the network.
  • 9. Network Topology
      Each node $i \in N$ ($0 \le i \le n-1$) in the network is connected to the set of nodes $P_i \subseteq N$ such that $|P_i| = l = k + x$, where $k + x \,(\le n-1)$ is even, and
      $P_i = \{\, j \in N : j = (i + p) \bmod n,\ -l/2 \le p \le l/2,\ p \ne 0 \,\}$.
      The underlying undirected graph modeling the network can be written as $(N, E)$, where $E = \bigcup_{i=0}^{n-1} \{(i, j) : j \in P_i\}$.
      The state information of processor $i \in N$ is periodically forwarded to all the nodes in the set $F_i \subseteq N$ such that $|F_i| = k$ and
      $F_i = \{\, j \in N : j = (i + p) \bmod n,\ -k/2 \le p \le k/2,\ p \ne 0 \,\}$.
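The two set definitions on this slide can be tried out with a short sketch. The function names are hypothetical, and the code assumes k is even so that the offsets −k/2..k/2 (excluding 0) give exactly k nodes; the slide does not spell out the odd case. The test values n = 10 and k = 4 come from the deck's example, while l = 6 (x = 2) is an assumption.

```python
def neighbours(i: int, n: int, l: int) -> set[int]:
    """P_i: nodes directly linked to node i (ring offsets -l/2..l/2, excluding 0)."""
    if l % 2 != 0 or l > n - 1:
        raise ValueError("l must be even and at most n - 1")
    half = l // 2
    return {(i + p) % n for p in range(-half, half + 1) if p != 0}


def forward_set(i: int, n: int, k: int) -> set[int]:
    """F_i: nodes that receive node i's state. Assumption: k is even."""
    if k % 2 != 0 or k > n - 1:
        raise ValueError("this sketch only covers even k with k <= n - 1")
    half = k // 2
    return {(i + p) % n for p in range(-half, half + 1) if p != 0}


def edges(n: int, l: int) -> set[frozenset[int]]:
    """E: union over all i of the undirected edges {i, j} with j in P_i."""
    return {frozenset((i, j)) for i in range(n) for j in neighbours(i, n, l)}


if __name__ == "__main__":
    print(sorted(forward_set(9, 10, 4)))   # nodes holding the state of process 9
    print(sorted(neighbours(9, 10, 6)))    # direct neighbours of node 9
    print(len(edges(10, 6)))               # number of undirected links
```

With these parameters every forwarding set is contained in the corresponding neighbour set, so state updates reach all of F_i in one hop.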
  • 10. Connectivity
      - The graph $(N, E)$ represents a regular network, since the degree of each node is $l$.
      - For any $n$ and $k$, the graph $(N, E)$ corresponds to the Harary graph $H_{l,n}$, where
        $l = k + x \ge \begin{cases} k + 2, & \text{for } k \text{ even}, \\ k + 1, & \text{for } k \text{ odd}. \end{cases}$
      The network is $l$-connected with $\chi(G) \ge l \,(> k)$, where $\chi(G)$ denotes the connectivity of $G$.
  • 11. Theoretical Results
      Theorem 1. A. Forwarding the state information of each process to k other nodes in the network ensures k-fault tolerance. B. A sufficient condition to ensure k-fault tolerance is for each node to forward its state information to at least k other nodes in the network.
      Theorem 2. As long as $k \le \frac{m-1}{m} \cdot n$, no live node has to execute more than m processes, including its own, and an algorithm that attains this under the proposed framework can also be found.
      Theorem 3. The minimum number of nodes to which any node in a network with n > 2k (or n = 2k) must be directly connected is 2k (or 2k − 1), to ensure that all the eligible nodes corresponding to a process can be updated about its state information at all times in one hop.
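A quick counting check gives some intuition for the bound in Theorem 2: after k failures, the n − k surviving nodes, each able to run at most m processes, still have to host all n processes, so n ≤ m(n − k), which rearranges to k ≤ ((m − 1)/m)·n. The sketch below only evaluates that inequality; it is not the proof of the theorem, and the function names are hypothetical.

```python
def capacity_suffices(n: int, m: int, k: int) -> bool:
    """True when k <= ((m - 1) / m) * n, evaluated in integer arithmetic."""
    return k * m <= (m - 1) * n


def spare_slots(n: int, m: int, k: int) -> int:
    """Process slots left over after n processes are placed on n - k live nodes."""
    return m * (n - k) - n


if __name__ == "__main__":
    # Parameters of the deck's example network: n = 10, m = 2, k = 4.
    print(capacity_suffices(10, 2, 4), spare_slots(10, 2, 4))
```

The two functions agree by construction: the bound holds exactly when the number of spare slots is non-negative.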
  • 12. Network Example. Figure: illustration of a network with n = 10, m = 2 and k = 4.
  • 13. Presentation Outline
      • Introduction
      • System Models
      • Distributed Algorithm for Automated Fault Recovery
      • Formal verification of the Distributed Algorithm
      • Conclusion
  • 14. Message Types
      1. INFO
          • In the first round, each node i sends an INFO message to all the nodes in the set F_i.
          • The message consists of the tuple (j, F_j).
      2. STATUS
          • From the second round onward, in every round, every live node i sends a STATUS message for every process p_j running on it to all the live nodes in the set F_j.
          • The message consists of the tuple (p_j, S_{p_j}).
          • Omission of the STATUS message for a process in a round indicates the failure of that process.
      3. RESOLVED
          • The message is sent to all the nodes in F_j by the node that has resolved the failure of process j.
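The three message types and the "failure by omission" rule can be sketched as plain data classes. This is an illustration under assumptions, not the authors' implementation: the class and field names are hypothetical, and in particular the slide does not say what a RESOLVED message carries, so the `resolver` field is a guess.

```python
from dataclasses import dataclass
from typing import Any, Iterable


@dataclass(frozen=True)
class Info:
    node: int                      # j in the tuple (j, F_j)
    forward_set: frozenset[int]    # F_j


@dataclass(frozen=True)
class Status:
    process: int                   # p_j
    state: Any                     # S_{p_j}, the state snapshot of the process


@dataclass(frozen=True)
class Resolved:
    process: int                   # process whose failure was resolved
    resolver: int                  # assumption: id of the node that took it over


def silent_processes(watched: Iterable[int], inbox: Iterable[object]) -> set[int]:
    """Processes with no STATUS message this round, i.e. presumed failed."""
    reported = {msg.process for msg in inbox if isinstance(msg, Status)}
    return set(watched) - reported


if __name__ == "__main__":
    # A node that backs up processes 7, 8, 0 and 1 hears only from 8 and 0.
    inbox = [Status(8, {"counter": 3}), Status(0, None), Resolved(2, 4)]
    print(sorted(silent_processes({7, 8, 0, 1}, inbox)))
```

Only STATUS messages count as a sign of life in this sketch; INFO and RESOLVED messages are ignored by `silent_processes`.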
  • 15. Preference for the Neighbours. $\mathit{pref}^{\,i}_{j}$ denotes the preference of node i, among the nodes in F_j, to take over process j in case of its failure.
  • 16. Distributed Algorithm: Example. Figure: illustration of the distributed algorithm for a network with n = 10, m = 2 and k = 4. Every node is running its own process.
  • 17. Distributed Algorithm: Example. Node 9 is faulty.
  • 18. Distributed Algorithm: Example. Node 1 takes the process of node 9 after one round, as it is the highest-preference node for process 9.
  • 19. Distributed Algorithm: Example. Node 2 is faulty.
  • 20. Distributed Algorithm: Example. Node 4 takes the process of node 2 after one round, as it is the highest-preference node for process 2.
  • 21. Distributed Algorithm: Example. Node 8 is faulty.
  • 22. Distributed Algorithm: Example. Node 0 takes the process of node 8 after one round, as it is the highest-preference node for process 8.
  • 23. Distributed Algorithm: Example. Node 0 is faulty. The real problem begins...
  • 24. Distributed Algorithm: Example. Node 7 takes the process of node 8 after 3 rounds, as it is the third-preference node for process 8.
  • 25. Distributed Algorithm: Example. Node 1 stops running process 9 and starts running process 0, 6 rounds after node 0's failure. According to Theorem 1, at least one node is available to take process 9.
  • 26. Distributed Algorithm: Example. Node 7 stops running process 8 and starts running process 9 after 8 rounds when node 1 stops running process 9.
  • 27. Distributed Algorithm: Example. Node 6 starts running process 8 after 4 rounds when node 7 stops running process 8. No further failures are possible.
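The walkthrough suggests a simple waiting rule: the highest-preference node steps in after one round, and the third-preference node after three. The sketch below models only that rule, under loud assumptions: the node at 1-based rank r waits r rounds, the full preference order is passed in by the caller (the transcript does not define it), and capacity limits and the process hand-offs of slides 25 to 27 are ignored. The order `[0, 9, 7, 6]` for process 8 is hypothetical, chosen only to agree with slides 22 and 24.

```python
from typing import Optional, Sequence


def first_taker(preference_order: Sequence[int], live: set[int]) -> Optional[tuple[int, int]]:
    """Return (node, rounds_waited) for the first live node in preference order.

    Assumption: the node at 1-based rank r takes the process r rounds after the
    failure, unless a higher-preference node resolved the failure earlier.
    """
    for rank, node in enumerate(preference_order, start=1):
        if node in live:
            return node, rank
    return None  # no live node left in the forwarding set


if __name__ == "__main__":
    order_for_process_8 = [0, 9, 7, 6]  # hypothetical ordering, see lead-in

    # Nodes 9, 2 and 8 have failed: node 0 is first in line.
    print(first_taker(order_for_process_8, {0, 1, 3, 4, 5, 6, 7}))

    # Node 0 fails as well: the third-preference node has to step in.
    print(first_taker(order_for_process_8, {1, 3, 4, 5, 6, 7}))
```

Taking the preference order as a parameter keeps the sketch independent of how preferences are actually assigned.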
  • 28. Analysis of the Algorithm
      • At most 2k rounds are required to resolve a single fault.
      • To resolve a single fault, the maximum number of RESOLVED messages that must be sent across the network is (k − 2)m + 1, where m is the maximum number of processes that a node is capable of executing.
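As a worked instance of the two bounds, the snippet below evaluates them for the example network (k = 4, m = 2) and for the largest verified instance (K = 3, M = 2). The function names are hypothetical; the formulas are the ones stated on the slide.

```python
def max_rounds(k: int) -> int:
    """Upper bound on the rounds needed to resolve a single fault: 2k."""
    return 2 * k


def max_resolved_messages(k: int, m: int) -> int:
    """Upper bound on RESOLVED messages for a single fault: (k - 2) * m + 1."""
    return (k - 2) * m + 1


if __name__ == "__main__":
    for k, m in [(4, 2), (3, 2)]:
        print(k, m, max_rounds(k), max_resolved_messages(k, m))
```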
  • 29. Presentation Outline
      • Introduction
      • System Models
      • Distributed Algorithm for Automated Fault Recovery
      • Formal verification of the Distributed Algorithm
      • Conclusion
  • 30. Correctness of the Algorithm
      • We show the correctness of the distributed algorithm through formal verification.
      • We use the SPIN model checker for modeling and verification of the algorithm.
      • We have been able to verify our model for N = 8, K = 3 and M = 2 and all lower instances.
      • Due to the state-space explosion problem inherent in model checking with SPIN, we could not verify our algorithm for more than 8 processors.
  • 31. Spin Model Checker
      • Tool for automatically model checking distributed algorithms.
      • Promela is a language for modeling systems of concurrent processes that can interact via shared variables and message channels.
      • Given a concurrent system modeled by a Promela program, SPIN can check for deadlock, dead code, violations of user-specified assertions, and temporal properties expressed by LTL formulas.
      • When a violation of a property is detected, SPIN reports a scenario, i.e., a sequence of transitions, violating the property.
  • 32. Properties
      • Safety 1: Whenever a node becomes faulty, at least one of its neighboring nodes is non-faulty.
      • Safety 2: No node has to take more than M processes at any point in time.
      • Liveness: Whenever a node becomes faulty, its process is eventually taken up by some other live node.
      • Timeliness: Every fault is recovered in no more than 2K rounds.
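One possible way to write these four properties as temporal-logic formulas is sketched below. The slides do not give the formulas that were actually checked, so every proposition name here is hypothetical: $\mathit{faulty}_i$ (node i has crashed), $\mathit{Nbr}(i)$ (the neighbours of i), $\mathit{load}_j$ (number of processes running on node j), $\mathit{runs}_{j,i}$ (node j runs process i), and $\mathit{age}_i$ (rounds elapsed while the fault of node i is still unrecovered, tracked by a counter in the model). The symbols $\Box$ and $\Diamond$ assume the `amssymb` package.

```latex
\begin{align*}
\text{Safety 1:}   \quad & \Box\Bigl(\mathit{faulty}_i \rightarrow \bigvee_{j \in \mathit{Nbr}(i)} \neg\mathit{faulty}_j\Bigr) \\
\text{Safety 2:}   \quad & \Box\bigl(\mathit{load}_j \le M\bigr) \\
\text{Liveness:}   \quad & \Box\Bigl(\mathit{faulty}_i \rightarrow \Diamond \bigvee_{j \ne i} \bigl(\neg\mathit{faulty}_j \wedge \mathit{runs}_{j,i}\bigr)\Bigr) \\
\text{Timeliness:} \quad & \Box\bigl(\mathit{age}_i \le 2K\bigr)
\end{align*}
```

Plain LTL has no bounded "within 2K rounds" operator, which is why the timeliness property is phrased here as an invariant over an explicit round counter.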
  • 33. Presentation Outline
      • Introduction
      • System Models
      • Distributed Algorithm for Automated Fault Recovery
      • Formal verification of the Distributed Algorithm
      • Conclusion
  • 34. Conclusion
      • We have presented a distributed algorithm of automated fault recovery for stateful failover in a network.
      • The algorithm handles a fault regardless of how it arises.
      • In at most 2k rounds, the processes of the faulty processor are taken up by one or more eligible live nodes in the network.
      • The message complexity of our algorithm is linear in the number of nodes.
      • The correctness of the algorithm has been verified by modeling the algorithm in SPIN and checking its desired properties.
  • 35. Thank You!!