Live Testing of Distributed System Fault Tolerance With Fault Injection Techniques
Alexey Vasyukov, Inventa
Vadim Zherder, Moscow Exchange
Plan
• Introduction
• Distributed Trading System concepts
• Distributed Consensus protocol
• MOEX Fault Injection testing framework
Trading System concepts
Trading System
Transactions Processing
• One incoming stream
• Strictly ordered FIFO incoming stream
• SeqNum
• TimeStamp
• Strictly ordered FIFO outgoing stream
• Each transaction should get a reply
• No loss, no duplicates
• Multistage processing
• Example: No. 1 is finished, Nos. 2 and 3 are in process, No. 4 is incoming
• Transaction processing result
• ErrorCode, Message, Statuses, …
• TransLog = Transactions Log
• Transaction buffer
• Large but finite. When it is exhausted, the TS stops processing transactions and becomes “unavailable”
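The ordering rules above can be sketched in a few lines of Python. This is an illustration only, not the MOEX implementation: the class name TransactionBuffer, its methods, and the choice to start SeqNum at 1 are assumptions, and recovery from the “unavailable” state is not modelled.

```python
from collections import deque


class TransactionBuffer:
    """Bounded FIFO buffer that accepts transactions strictly in SeqNum order."""

    def __init__(self, capacity):
        self.capacity = capacity   # "large but finite" in the slides
        self.pending = deque()     # accepted but not yet fully processed
        self.next_seq = 1          # next expected SeqNum (assumed to start at 1)
        self.available = True

    def accept(self, seq_num, payload):
        """Return True if the transaction was queued, False if it was rejected."""
        if not self.available:
            return False
        if seq_num != self.next_seq:
            # A gap would mean a lost transaction, a repeat a duplicate.
            return False
        if len(self.pending) >= self.capacity:
            # Buffer exhausted: stop taking transactions.
            self.available = False
            return False
        self.pending.append((seq_num, payload))
        self.next_seq += 1
        return True

    def complete_oldest(self):
        """Finish the oldest pending transaction and return (seq_num, payload)."""
        return self.pending.popleft()
```

With a capacity of 2, transactions 1 and 2 are accepted, a repeated 1 or an early 3 is rejected, and the third in-order transaction flips the buffer to unavailable.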
Trading System: what’s behind
Messaging
UP – transactions
DOWN – responses and messages to nodes of the cluster
Use CRC (cyclic redundancy check) to control transaction flow
Full history: “Late join” is possible
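One way to use a CRC for flow control is to carry a running checksum over the whole stream, so that a lost, reordered, or corrupted transaction almost certainly makes the next checksum disagree. The slides do not say which CRC is used or what it covers; the sketch below assumes CRC-32 from Python's zlib computed over SeqNum plus payload, and the function names are hypothetical.

```python
import zlib


def running_crc(prev_crc, seq_num, payload):
    """Fold one transaction into a running CRC-32 of the whole stream."""
    record = seq_num.to_bytes(8, "big") + payload
    return zlib.crc32(record, prev_crc)


def verify_stream(messages):
    """Check (seq_num, payload, crc) triples.

    Returns the SeqNum of the first message whose checksum does not match,
    or None if the whole stream is consistent.
    """
    crc = 0
    for seq_num, payload, claimed_crc in messages:
        crc = running_crc(crc, seq_num, payload)
        if crc != claimed_crc:
            return seq_num
    return None
```

A receiver that joins late would need the full history (or a checkpointed CRC value) to start verifying, which fits the “Late join” note above.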
Role: “Main”
“Main” = the main TS instance
• Get new transactions from incoming stream
• Broadcast transactions within cluster
• Process transactions
• Check transaction result (compare to results obtained from other nodes)
• Broadcast results within cluster
• Publish replies to clients
Role: “Backup”
“Backup” = a special state of TS instance
• Can be switched to Main quickly
• Get all transactions from Main
• Process transactions independently
• Check transaction result (compare to results from other nodes)
• Write its own TransLog
• Do not send replies to clients
Role: “Backup”
2 modes: SYNC (“Hot Backup”) and ASYNC (“Warm Backup”)
SYNC:
• If Main fails, a SYNC can switch to Main automatically
• SYNC publishes transaction results
ASYNC:
• If a SYNC fails, an ASYNC can switch to SYNC automatically
• ASYNC does not publish transaction results
• ASYNC can be switched to Main manually by the Operator
The number of SYNCs is a static parameter determined by the Operator
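The automatic switches can be sketched as a pure function over a role map. This is a simplified model under stated assumptions: roles are plain strings, the promoted node is picked in sorted name order (an arbitrary choice), and after a Main failure the vacated SYNC slot is refilled by an ASYNC. The slides state the SYNC-failure switch explicitly; the refill after a Main failure is inferred from the fixed number of SYNCs. The function name is hypothetical.

```python
def roles_after_failure(roles, failed):
    """Return a new {node: role} map after `failed` dies (automatic switches only)."""
    new_roles = {n: r for n, r in roles.items() if n != failed}
    lost_role = roles[failed]

    def promote(from_role, to_role):
        # Promote the first node, in sorted name order, that holds from_role.
        for node in sorted(new_roles):
            if new_roles[node] == from_role:
                new_roles[node] = to_role
                return True
        return False

    if lost_role == "MAIN":
        # A SYNC backup takes over as Main; an ASYNC then refills the SYNC slot
        # (the refill is an assumption, see the lead-in).
        if promote("SYNC", "MAIN"):
            promote("ASYNC", "SYNC")
    elif lost_role == "SYNC":
        promote("ASYNC", "SYNC")
    # A failed ASYNC backup triggers no automatic switch.
    return new_roles
```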
Role: Governor
Governor can
• Force node to change its role
• Force node to stop
• Start Elections to assign new Main
Only one Governor in the cluster
Governor can be assigned only by Operator
Governor role cannot be changed
If a node asks the Governor while the Governor is unavailable, the node stalls until it regains connectivity to the Governor
Governor can be recovered or restarted only by Operator
Roles Summary
Action                        Governor  Main  SYNC Backup  ASYNC Backup
Send Table of states          V         V     V            V
Get Client Transaction                  V
Broadcast Transaction                   V
Process Transaction                     V     V            V
Broadcast Transaction result            V     V
Compare transaction results             V     V            V
Send replies to clients                 V
Can switch to                                 Main         SYNC
If something goes wrong…
IF we detect
• Mismatch in transaction result
• A node does not respond
• No new transactions incoming
• Wrong CRC
• Governor does not respond
• Mismatch in tables of states
• …
THEN
ASK Governor
Elections
Elections start in order to assign a new Main
Transaction processing is stopped
2-fold Generation counter (G:S)
Initial values (0:0)
Every successful election increases G and resets S to 0 (G:0).
Every round of elections increases S
Example: (1:0) -> (1:1) -> (1:2) -> (2:0)
The Generation counter is carried in every message to/from the Governor
2-Phase commit approach
The Governor sends the new table of states and waits for confirmation from all nodes
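The (G:S) counter rules are small enough to write down directly. The sketch below is an illustration, not MOEX code: the class and method names are hypothetical, and the is_stale comparison is an assumed use of the counter (the slides only say that it travels in every message to and from the Governor).

```python
from typing import NamedTuple


class Generation(NamedTuple):
    g: int = 0  # increased by every successful election
    s: int = 0  # increased by every election round, reset to 0 on success

    def next_round(self):
        """Another round of the current election: (G:S) -> (G:S+1)."""
        return Generation(self.g, self.s + 1)

    def elected(self):
        """The election succeeded: (G:S) -> (G+1:0)."""
        return Generation(self.g + 1, 0)

    def is_stale(self, current):
        # Tuples compare lexicographically, so (1:2) is older than (2:0).
        return self < current
```

Starting from Generation(1, 0), two calls to next_round() followed by elected() reproduce the example sequence (1:0) -> (1:1) -> (1:2) -> (2:0).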
Distributed consensus protocol
MOEX Consensus Protocol
(by Sergey Kostanbaev, MOEX)
Tables of states must be consistent at all nodes during normal operation
Tables of states must become consistent again after some nodes fail
        Node 1   Node 2       Node 3          Node 4
uuid1   S_MAIN   S_GOVERNOR   S_BACKUP_SYNC   S_BACKUP_ASYNC
uuid2   S_MAIN   S_GOVERNOR   S_BACKUP_SYNC   S_BACKUP_ASYNC
uuid3   S_MAIN   S_GOVERNOR   S_BACKUP_SYNC   S_BACKUP_ASYNC
uuid4   S_MAIN   S_GOVERNOR   S_BACKUP_SYNC   S_BACKUP_ASYNC
Table of States at each node
MOEX Consensus Protocol
Thus, it is an example of a Distributed consensus protocol
Other examples:
• Paxos, 1998, 2001, …
LAMPORT, L. Paxos made simple. ACM SIGACT News 32, 4 (Dec. 2001), 18–25.
• Raft, 2014 https://raft.github.io/raft.pdf
ONGARO, D., AND OUSTERHOUT, J. In search of an understandable consensus algorithm. In Proc. ATC ’14, USENIX Annual Technical Conference (2014), USENIX.
• DNCP, 2016 https://tools.ietf.org/html/rfc7787
Open questions:
Is MOEX CP equivalent to any of the known protocols?
Hypotheses on MOEX CP features
H1. Byzantine fault tolerance
H2. Safety
H3. No liveness
Cluster Normal State Requirements
• There is exactly 1 Governor in the cluster
• There is exactly 1 Main in the cluster
• Tables of states at all nodes are consistent
• All active nodes in the cluster have the same value of the Generation Counter
• The cluster is available (for client connections) and processes transactions
• All nodes process the same sequence of transactions
• Either the number of SYNCs equals the predefined value, or it is less than the predefined value and there are no ASYNCs
…
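Most of these requirements are mechanical enough to express as a predicate. The sketch below covers only the listed conditions that can be checked from a snapshot of node states; availability and transaction-sequence equality are left out. The data layout (a list of dicts with role, generation, and view keys) and the function name are assumptions made for illustration.

```python
def is_normal_state(nodes, required_syncs):
    """Check the snapshot-checkable Normal state conditions.

    nodes: list of dicts with 'role', 'generation', and 'view'
           (that node's own table of states).
    required_syncs: the predefined number of SYNC backups.
    """
    roles = [n["role"] for n in nodes]
    syncs = roles.count("SYNC")
    asyncs = roles.count("ASYNC")
    return (
        roles.count("GOVERNOR") == 1
        and roles.count("MAIN") == 1
        # Every node holds the same table of states.
        and all(n["view"] == nodes[0]["view"] for n in nodes)
        # Every node carries the same Generation Counter.
        and all(n["generation"] == nodes[0]["generation"] for n in nodes)
        # Full set of SYNCs, or fewer SYNCs with no ASYNC left to promote.
        and (syncs == required_syncs or (syncs < required_syncs and asyncs == 0))
    )
```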
Main “Theorem”
• Assume that the cluster is in Normal state and the Main or one of the Backup nodes fails. Then the cluster returns to Normal state in finite time.
MOEX CP Testing
Investigate
• Fault detection
• Implementation correctness
• Timing
• Dependence on load profile
• Dependence on environment configuration
• Statistics
Integration with CI/CD processes
Typical Test Scenario
1. Start all
2. Wait for normal state
3. Start the transaction generator
4. Keep the transaction flow running for some time
5. Fault injection – emulate fault (single or multiple)
6. Wait for normal state (check timeout)
7. Check state at each node
8. Get artifacts
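The eight steps map naturally onto a small test driver. The sketch below is not the MOEX framework: the cluster object and all of its methods (start_all, is_normal, start_load, node_states, collect_logs, kill) are hypothetical, and FakeCluster is an in-memory stand-in that exists only so the driver can be run as written.

```python
import time


def wait_until(predicate, timeout, poll=0.01):
    """Poll predicate() until it is true or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(poll)
    return predicate()


def run_scenario(cluster, inject, warmup=0.0, recovery_timeout=1.0):
    """Run steps 1-8 and return a result dict."""
    cluster.start_all()                                          # step 1
    if not wait_until(cluster.is_normal, recovery_timeout):      # step 2
        return {"passed": False, "stage": "startup"}
    cluster.start_load()                                         # step 3
    time.sleep(warmup)                                           # step 4
    inject(cluster)                                              # step 5
    recovered = wait_until(cluster.is_normal, recovery_timeout)  # step 6
    return {
        "passed": recovered,
        "stage": "recovery",
        "states": cluster.node_states(),                         # step 7
        "artifacts": cluster.collect_logs(),                     # step 8
    }


class FakeCluster:
    """In-memory stand-in used only to exercise run_scenario."""

    def __init__(self, recovers=True):
        self.alive = set()
        self.normal = False
        self.recovers = recovers

    def start_all(self):
        self.alive = {"main", "sync", "async"}
        self.normal = True

    def is_normal(self):
        return self.normal

    def start_load(self):
        pass

    def kill(self, node):
        self.alive.discard(node)
        self.normal = self.recovers  # pretend recovery is instant, or never happens

    def node_states(self):
        return {n: "alive" for n in sorted(self.alive)}

    def collect_logs(self):
        return []
```

A call such as run_scenario(FakeCluster(), lambda c: c.kill("sync")) returns a passing result, while FakeCluster(recovers=False) fails at the recovery stage once the timeout expires.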
References
WIDDER, J. Introduction into Fault-tolerant Distributed Algorithms and their Modeling. TMPA (2014).
LAMPORT, L. Paxos made simple. ACM SIGACT News 32, 4 (Dec. 2001), 18–25.
ONGARO, D., AND OUSTERHOUT, J. In search of an understandable consensus algorithm. In Proc. ATC ’14, USENIX Annual Technical Conference (2014), USENIX. https://raft.github.io/raft.pdf
ONGARO, D. Consensus: Bridging theory and practice. Doctoral dissertation, Stanford University, 2014.
MOEX Fault Injection Testing Framework
Fault Injection: Testing Implementation
MOEX Fault Injection Framework: Concepts
• End-to-end testing of cluster implementation
• Starts complete real system on real infrastructure
• Provides modules to inject predictable faults on selected servers
• Provides domain specific libraries to write tests
• System, network, app issues are injected directly
• Misconfiguration problems are tested indirectly (real infrastructure, config push before test start)
Architecture
Inject Techniques
OS Processes
• Kill (SIGKILL)
• Hang (SIGSTOP for N seconds + SIGCONT)
Network
• Interface “blink” (DROP 100% of packets for N seconds)
• Interface “noise” (DROP X% of packets for N seconds)
• Content filtering – allows a “smart” inject into the protocol, dropping selected messages from the flow
Application
• Data corruption (with a gdb script) – emulates application-level issues caused by incorrect calculations
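The process-level injects correspond directly to POSIX signals, and a packet-drop inject can be expressed as a firewall rule. The sketch below is a minimal illustration for a POSIX host, not the framework's own modules: the function names are hypothetical, and the use of iptables with its statistic match is an assumption (the slides do not name the tool). The drop_rule function only builds the command line; it does not run it.

```python
import os
import signal
import time


def kill_process(pid):
    """Kill inject: terminate the process immediately with SIGKILL."""
    os.kill(pid, signal.SIGKILL)


def hang_process(pid, seconds):
    """Hang inject: freeze the process with SIGSTOP, then resume it with SIGCONT."""
    os.kill(pid, signal.SIGSTOP)
    try:
        time.sleep(seconds)
    finally:
        os.kill(pid, signal.SIGCONT)


def drop_rule(interface, percent):
    """Build (not run) an iptables command that drops `percent` of inbound packets."""
    rule = ["iptables", "-A", "INPUT", "-i", interface]
    if percent < 100:
        # "Noise": drop a random fraction of packets.
        rule += ["-m", "statistic", "--mode", "random",
                 "--probability", f"{percent / 100:.2f}"]
    # "Blink" is the 100% case: an unconditional DROP on the interface.
    return rule + ["-j", "DROP"]
```

To end a blink or noise inject after N seconds, the same rule would be removed again with -D in place of -A.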
Basic Cluster State Validations
#   Code                 Description
00  ALIVE_ON_START       Cluster nodes should start correctly
01  SINGLE_MAIN          Only one node should consider itself MAIN
02  GW_OK                All gateways should be connected to correct MAIN
03  GEN_OK               All active cluster nodes should have the same generation
04  TE_VIEW_OK           Current MAIN should be connected to all alive nodes
05  CLU_VIEW_CONSISTENT  All alive nodes should have the same cluster view
06  ELECTIONS_OK         Elections count during the test should match inject scenario
07  DEAD_NODES_OK        The number of lost nodes should match inject scenario
08  CLIENTS_ALIVE        Clients should not notice any issue, fault handling logic is completely hidden from them
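A few of these validations can be sketched as checks over a snapshot collected after the inject. The snapshot layout, the expected dict describing the inject scenario, and the function name are all assumptions for illustration; the validations that need gateway or client data (GW_OK, TE_VIEW_OK, CLIENTS_ALIVE) are not covered.

```python
def validate(snapshot, expected):
    """Return {code: passed} for a subset of the validations.

    snapshot: {'elections': int, 'nodes': [ {'alive', 'role', 'generation', 'view'}, ... ]}
    expected: {'elections': int, 'dead_nodes': int} taken from the inject scenario.
    """
    alive = [n for n in snapshot["nodes"] if n["alive"]]
    mains = [n for n in alive if n["role"] == "MAIN"]
    dead = len(snapshot["nodes"]) - len(alive)
    # A set collapses equal values, so one element (or none) means "all the same".
    generations = {n["generation"] for n in alive}
    views = {tuple(sorted(n["view"].items())) for n in alive}
    return {
        "SINGLE_MAIN": len(mains) == 1,
        "GEN_OK": len(generations) <= 1,
        "CLU_VIEW_CONSISTENT": len(views) <= 1,
        "ELECTIONS_OK": snapshot["elections"] == expected["elections"],
        "DEAD_NODES_OK": dead == expected["dead_nodes"],
    }
```

Returning a flat dict keyed by validation code keeps the result easy to aggregate across many runs.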
Test Targets
• Basic system faults
• Multiple system faults on different nodes
• Application level faults
• Random network instabilities
• Recovery after faults
• Governor stability (failures, restarts, failures during elections)
Test Summary
Logs from all nodes for root cause analysis
Cluster state validations summary
Cluster nodes states (Sync Backup is dead, Async Backup switched to Sync)
Basic Fault: Overall System Behavior
Event log timeline
BS died, elections started
Elections, no transactions
Resumed operation
Restore After Fault: Overall System Behavior
BS hung, elections started
Elections, no transactions
Resumed operation
BS is alive again
BS rejoins the cluster, receiving missed transactions
Performance Metrics
• Key performance data from all cluster nodes
• How do faults influence service quality for consumers?
• Compare configurations (indirectly, together with config push)
Domain Specific Language
• Useful for ad-hoc tests and quick analysis
• Complements set of 'default' tests (written in Python)
Statistics
• Multiple runs to identify problems without stable reproducers
• Heatmap to analyze quickly both which tests and which validations fail
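The data behind such a heatmap is a count of failures per (test, validation) pair over repeated runs. The sketch below shows one way to aggregate and print it as a plain-text grid; the input format and function names are assumptions, and real rendering would presumably use a plotting library instead.

```python
from collections import Counter


def failure_matrix(runs):
    """Count failures per (test_name, validation_code).

    runs: iterable of (test_name, {validation_code: passed}) from repeated runs.
    """
    failures = Counter()
    for test_name, results in runs:
        for code, passed in results.items():
            if not passed:
                failures[(test_name, code)] += 1
    return failures


def render(failures, tests, codes):
    """Render the counts as text: one row per test, one column per validation."""
    header = " " * 12 + " ".join(f"{c:>12}" for c in codes)
    rows = [
        f"{t:<12}" + " ".join(f"{failures[(t, c)]:>12}" for c in codes)
        for t in tests
    ]
    return "\n".join([header] + rows)
```

A cell that is nonzero in only a few of many runs points at a problem without a stable reproducer, which is the case the repeated runs are meant to expose.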
References
Similar tools:
1. Netflix Simian Army; http://techblog.netflix.com/2011/07/netflix-simian-army.html
2. Jepsen; https://jepsen.io/
Reading:
1. Caitie McCaffrey. 2015. The Verification of a Distributed System. Queue 13, 9, pages 60 (December 2015), 11 pages. DOI=http://dx.doi.org/10.1145/2857274.2889274
2. Alvaro, P., Rosen, J. and Hellerstein, J.M. 2015. Lineage-driven fault injection. http://www.cs.berkeley.edu/~palvaro/molly.pdf
3. Yuan, D., Luo, Y., Zhuang, X., Rodrigues, G. R., Zhao, X., Zhang, Y., Jain, P. U., Stumm, M. 2014. Simple testing can prevent most critical failures: an analysis of production failures in distributed data-intensive systems; https://www.usenix.org/conference/osdi14/technical-sessions/presentation/yuan
4. Ghosh S. et al. 1997. Software Fault Injection Testing on a Distributed System – A Case Study
5. Lai, M.-Y., Wang S.Y. 1995. Software Fault Insertion Testing for Fault Tolerance. Software Fault Tolerance, Edited by Lyu, Chapter 13.
Questions
Thank you!
38

Más contenido relacionado

La actualidad más candente

Operating Systems 1 (8/12) - Concurrency
Operating Systems 1 (8/12) - ConcurrencyOperating Systems 1 (8/12) - Concurrency
Operating Systems 1 (8/12) - ConcurrencyPeter Tröger
 
TMPA-2017: A Survey on Model-Based Testing Tools for Test Case Generation
TMPA-2017: A Survey on Model-Based Testing Tools for Test Case GenerationTMPA-2017: A Survey on Model-Based Testing Tools for Test Case Generation
TMPA-2017: A Survey on Model-Based Testing Tools for Test Case GenerationIosif Itkin
 
Process synchronization
Process synchronizationProcess synchronization
Process synchronizationAli Ahmad
 
Implementation of Election Algorithm of Distributed Systems in Client-Server ...
Implementation of Election Algorithm of Distributed Systems in Client-Server ...Implementation of Election Algorithm of Distributed Systems in Client-Server ...
Implementation of Election Algorithm of Distributed Systems in Client-Server ...Mushfekur Rahman
 
Process Synchronization And Deadlocks
Process Synchronization And DeadlocksProcess Synchronization And Deadlocks
Process Synchronization And Deadlockstech2click
 
Database applicationtesting
Database applicationtestingDatabase applicationtesting
Database applicationtestingRenuka Ballal
 
Mutual Exclusion using Peterson's Algorithm
Mutual Exclusion using Peterson's AlgorithmMutual Exclusion using Peterson's Algorithm
Mutual Exclusion using Peterson's AlgorithmSouvik Roy
 
Concurrency: Mutual Exclusion and Synchronization
Concurrency: Mutual Exclusion and SynchronizationConcurrency: Mutual Exclusion and Synchronization
Concurrency: Mutual Exclusion and SynchronizationAnas Ebrahim
 
Model-driven trace diagnostics for pattern-based temporal specifications
Model-driven trace diagnostics for pattern-based temporal specificationsModel-driven trace diagnostics for pattern-based temporal specifications
Model-driven trace diagnostics for pattern-based temporal specificationsLionel Briand
 
Enabling Model Testing of Cyber Physical Systems
Enabling Model Testing of Cyber Physical SystemsEnabling Model Testing of Cyber Physical Systems
Enabling Model Testing of Cyber Physical SystemsLionel Briand
 
How to improve your Tizen native program
How to improve your Tizen native programHow to improve your Tizen native program
How to improve your Tizen native programRyo Jin
 
Automated and Scalable Solutions for Software Testing: The Essential Role of ...
Automated and Scalable Solutions for Software Testing: The Essential Role of ...Automated and Scalable Solutions for Software Testing: The Essential Role of ...
Automated and Scalable Solutions for Software Testing: The Essential Role of ...Lionel Briand
 

La actualidad más candente (20)

H S
H SH S
H S
 
Design for Testability
Design for TestabilityDesign for Testability
Design for Testability
 
Værktøjer udviklet på AAU til analyse af SCJ programmer
Værktøjer udviklet på AAU til analyse af SCJ programmerVærktøjer udviklet på AAU til analyse af SCJ programmer
Værktøjer udviklet på AAU til analyse af SCJ programmer
 
Concurrency
ConcurrencyConcurrency
Concurrency
 
Operating Systems 1 (8/12) - Concurrency
Operating Systems 1 (8/12) - ConcurrencyOperating Systems 1 (8/12) - Concurrency
Operating Systems 1 (8/12) - Concurrency
 
Process synchronization
Process synchronizationProcess synchronization
Process synchronization
 
TMPA-2017: A Survey on Model-Based Testing Tools for Test Case Generation
TMPA-2017: A Survey on Model-Based Testing Tools for Test Case GenerationTMPA-2017: A Survey on Model-Based Testing Tools for Test Case Generation
TMPA-2017: A Survey on Model-Based Testing Tools for Test Case Generation
 
Process modelling at BaneDanmark
Process modelling at BaneDanmarkProcess modelling at BaneDanmark
Process modelling at BaneDanmark
 
Process synchronization
Process synchronizationProcess synchronization
Process synchronization
 
Scan insertion
Scan insertionScan insertion
Scan insertion
 
Implementation of Election Algorithm of Distributed Systems in Client-Server ...
Implementation of Election Algorithm of Distributed Systems in Client-Server ...Implementation of Election Algorithm of Distributed Systems in Client-Server ...
Implementation of Election Algorithm of Distributed Systems in Client-Server ...
 
Critical section operating system
Critical section  operating systemCritical section  operating system
Critical section operating system
 
Process Synchronization And Deadlocks
Process Synchronization And DeadlocksProcess Synchronization And Deadlocks
Process Synchronization And Deadlocks
 
Database applicationtesting
Database applicationtestingDatabase applicationtesting
Database applicationtesting
 
Mutual Exclusion using Peterson's Algorithm
Mutual Exclusion using Peterson's AlgorithmMutual Exclusion using Peterson's Algorithm
Mutual Exclusion using Peterson's Algorithm
 
Concurrency: Mutual Exclusion and Synchronization
Concurrency: Mutual Exclusion and SynchronizationConcurrency: Mutual Exclusion and Synchronization
Concurrency: Mutual Exclusion and Synchronization
 
Model-driven trace diagnostics for pattern-based temporal specifications
Model-driven trace diagnostics for pattern-based temporal specificationsModel-driven trace diagnostics for pattern-based temporal specifications
Model-driven trace diagnostics for pattern-based temporal specifications
 
Enabling Model Testing of Cyber Physical Systems
Enabling Model Testing of Cyber Physical SystemsEnabling Model Testing of Cyber Physical Systems
Enabling Model Testing of Cyber Physical Systems
 
How to improve your Tizen native program
How to improve your Tizen native programHow to improve your Tizen native program
How to improve your Tizen native program
 
Automated and Scalable Solutions for Software Testing: The Essential Role of ...
Automated and Scalable Solutions for Software Testing: The Essential Role of ...Automated and Scalable Solutions for Software Testing: The Essential Role of ...
Automated and Scalable Solutions for Software Testing: The Essential Role of ...
 

Destacado

TMPA-2017: Generating Cost Aware Covering Arrays For Free
TMPA-2017: Generating Cost Aware Covering Arrays For Free TMPA-2017: Generating Cost Aware Covering Arrays For Free
TMPA-2017: Generating Cost Aware Covering Arrays For Free Iosif Itkin
 
TMPA-2017: Defect Report Classification in Accordance with Areas of Testing
TMPA-2017: Defect Report Classification in Accordance with Areas of TestingTMPA-2017: Defect Report Classification in Accordance with Areas of Testing
TMPA-2017: Defect Report Classification in Accordance with Areas of TestingIosif Itkin
 
TMPA-2017: Static Checking of Array Objects in JavaScript
TMPA-2017: Static Checking of Array Objects in JavaScriptTMPA-2017: Static Checking of Array Objects in JavaScript
TMPA-2017: Static Checking of Array Objects in JavaScriptIosif Itkin
 
TMPA-2017: Conference Opening
TMPA-2017: Conference OpeningTMPA-2017: Conference Opening
TMPA-2017: Conference OpeningIosif Itkin
 
TMPA-2017: Compositional Process Model Synthesis based on Interface Patterns
TMPA-2017: Compositional Process Model Synthesis based on Interface PatternsTMPA-2017: Compositional Process Model Synthesis based on Interface Patterns
TMPA-2017: Compositional Process Model Synthesis based on Interface PatternsIosif Itkin
 
TMPA-2017: Vellvm - Verifying the LLVM
TMPA-2017: Vellvm - Verifying the LLVMTMPA-2017: Vellvm - Verifying the LLVM
TMPA-2017: Vellvm - Verifying the LLVMIosif Itkin
 
TMPA-2017: Using Functional Directives to Analyze Code Complexity and Communi...
TMPA-2017: Using Functional Directives to Analyze Code Complexity and Communi...TMPA-2017: Using Functional Directives to Analyze Code Complexity and Communi...
TMPA-2017: Using Functional Directives to Analyze Code Complexity and Communi...Iosif Itkin
 
TMPA-2017: Modeling of PLC-programs by High-level Coloured Petri Nets
TMPA-2017: Modeling of PLC-programs by High-level Coloured Petri NetsTMPA-2017: Modeling of PLC-programs by High-level Coloured Petri Nets
TMPA-2017: Modeling of PLC-programs by High-level Coloured Petri NetsIosif Itkin
 
TMPA-2017: The Quest for Average Response Time
TMPA-2017: The Quest for Average Response TimeTMPA-2017: The Quest for Average Response Time
TMPA-2017: The Quest for Average Response TimeIosif Itkin
 
TMPA-2017: Evolutionary Algorithms in Test Generation for digital systems
TMPA-2017: Evolutionary Algorithms in Test Generation for digital systemsTMPA-2017: Evolutionary Algorithms in Test Generation for digital systems
TMPA-2017: Evolutionary Algorithms in Test Generation for digital systemsIosif Itkin
 
TMPA-2017: Layered Layouts for Software Systems Visualization
TMPA-2017: Layered Layouts for Software Systems VisualizationTMPA-2017: Layered Layouts for Software Systems Visualization
TMPA-2017: Layered Layouts for Software Systems VisualizationIosif Itkin
 
TMPA-2017: Functional Parser of Markdown Language Based on Monad Combining an...
TMPA-2017: Functional Parser of Markdown Language Based on Monad Combining an...TMPA-2017: Functional Parser of Markdown Language Based on Monad Combining an...
TMPA-2017: Functional Parser of Markdown Language Based on Monad Combining an...Iosif Itkin
 
TMPA-2017: Predicate Abstraction Based Configurable Method for Data Race Dete...
TMPA-2017: Predicate Abstraction Based Configurable Method for Data Race Dete...TMPA-2017: Predicate Abstraction Based Configurable Method for Data Race Dete...
TMPA-2017: Predicate Abstraction Based Configurable Method for Data Race Dete...Iosif Itkin
 
TMPA-2017: Dl-Check: Dynamic Potential Deadlock Detection Tool for Java Programs
TMPA-2017: Dl-Check: Dynamic Potential Deadlock Detection Tool for Java ProgramsTMPA-2017: Dl-Check: Dynamic Potential Deadlock Detection Tool for Java Programs
TMPA-2017: Dl-Check: Dynamic Potential Deadlock Detection Tool for Java ProgramsIosif Itkin
 
TMPA-2017: Extended Context-Free Grammars Parsing with Generalized LL
TMPA-2017: Extended Context-Free Grammars Parsing with Generalized LLTMPA-2017: Extended Context-Free Grammars Parsing with Generalized LL
TMPA-2017: Extended Context-Free Grammars Parsing with Generalized LLIosif Itkin
 
TMPA-2017: Simple Type Based Alias Analysis for a VLIW Processor
TMPA-2017: Simple Type Based Alias Analysis for a VLIW ProcessorTMPA-2017: Simple Type Based Alias Analysis for a VLIW Processor
TMPA-2017: Simple Type Based Alias Analysis for a VLIW ProcessorIosif Itkin
 
TMPA-2015: The Verification of Functional Programs by Applying Statechart Dia...
TMPA-2015: The Verification of Functional Programs by Applying Statechart Dia...TMPA-2015: The Verification of Functional Programs by Applying Statechart Dia...
TMPA-2015: The Verification of Functional Programs by Applying Statechart Dia...Iosif Itkin
 
TMPA-2015: Implementing the MetaVCG Approach in the C-light System
TMPA-2015: Implementing the MetaVCG Approach in the C-light SystemTMPA-2015: Implementing the MetaVCG Approach in the C-light System
TMPA-2015: Implementing the MetaVCG Approach in the C-light SystemIosif Itkin
 
TMPA-2015: Formal Methods in Robotics
TMPA-2015: Formal Methods in RoboticsTMPA-2015: Formal Methods in Robotics
TMPA-2015: Formal Methods in RoboticsIosif Itkin
 
TMPA-2015: ClearTH: a Tool for Automated Testing of Post Trade Systems
TMPA-2015: ClearTH: a Tool for Automated Testing of Post Trade SystemsTMPA-2015: ClearTH: a Tool for Automated Testing of Post Trade Systems
TMPA-2015: ClearTH: a Tool for Automated Testing of Post Trade SystemsIosif Itkin
 

Destacado (20)

TMPA-2017: Generating Cost Aware Covering Arrays For Free
TMPA-2017: Generating Cost Aware Covering Arrays For Free TMPA-2017: Generating Cost Aware Covering Arrays For Free
TMPA-2017: Generating Cost Aware Covering Arrays For Free
 
TMPA-2017: Defect Report Classification in Accordance with Areas of Testing
TMPA-2017: Defect Report Classification in Accordance with Areas of TestingTMPA-2017: Defect Report Classification in Accordance with Areas of Testing
TMPA-2017: Defect Report Classification in Accordance with Areas of Testing
 
TMPA-2017: Static Checking of Array Objects in JavaScript
TMPA-2017: Static Checking of Array Objects in JavaScriptTMPA-2017: Static Checking of Array Objects in JavaScript
TMPA-2017: Static Checking of Array Objects in JavaScript
 
TMPA-2017: Conference Opening
TMPA-2017: Conference OpeningTMPA-2017: Conference Opening
TMPA-2017: Conference Opening
 
TMPA-2017: Compositional Process Model Synthesis based on Interface Patterns
TMPA-2017: Compositional Process Model Synthesis based on Interface PatternsTMPA-2017: Compositional Process Model Synthesis based on Interface Patterns
TMPA-2017: Compositional Process Model Synthesis based on Interface Patterns
 
TMPA-2017: Vellvm - Verifying the LLVM
TMPA-2017: Vellvm - Verifying the LLVMTMPA-2017: Vellvm - Verifying the LLVM
TMPA-2017: Vellvm - Verifying the LLVM
 
TMPA-2017: Using Functional Directives to Analyze Code Complexity and Communi...
TMPA-2017: Using Functional Directives to Analyze Code Complexity and Communi...TMPA-2017: Using Functional Directives to Analyze Code Complexity and Communi...
TMPA-2017: Using Functional Directives to Analyze Code Complexity and Communi...
 
TMPA-2017: Modeling of PLC-programs by High-level Coloured Petri Nets
TMPA-2017: Modeling of PLC-programs by High-level Coloured Petri NetsTMPA-2017: Modeling of PLC-programs by High-level Coloured Petri Nets
TMPA-2017: Modeling of PLC-programs by High-level Coloured Petri Nets
 
TMPA-2017: The Quest for Average Response Time
TMPA-2017: The Quest for Average Response TimeTMPA-2017: The Quest for Average Response Time
TMPA-2017: The Quest for Average Response Time
 
TMPA-2017: Evolutionary Algorithms in Test Generation for digital systems
TMPA-2017: Evolutionary Algorithms in Test Generation for digital systemsTMPA-2017: Evolutionary Algorithms in Test Generation for digital systems
TMPA-2017: Evolutionary Algorithms in Test Generation for digital systems
 
TMPA-2017: Layered Layouts for Software Systems Visualization
TMPA-2017: Layered Layouts for Software Systems VisualizationTMPA-2017: Layered Layouts for Software Systems Visualization
TMPA-2017: Layered Layouts for Software Systems Visualization
 
TMPA-2017: Functional Parser of Markdown Language Based on Monad Combining an...
TMPA-2017: Functional Parser of Markdown Language Based on Monad Combining an...TMPA-2017: Functional Parser of Markdown Language Based on Monad Combining an...
TMPA-2017: Functional Parser of Markdown Language Based on Monad Combining an...
 
TMPA-2017: Predicate Abstraction Based Configurable Method for Data Race Dete...
TMPA-2017: Predicate Abstraction Based Configurable Method for Data Race Dete...TMPA-2017: Predicate Abstraction Based Configurable Method for Data Race Dete...
TMPA-2017: Predicate Abstraction Based Configurable Method for Data Race Dete...
 
TMPA-2017: Dl-Check: Dynamic Potential Deadlock Detection Tool for Java Programs
TMPA-2017: Dl-Check: Dynamic Potential Deadlock Detection Tool for Java ProgramsTMPA-2017: Dl-Check: Dynamic Potential Deadlock Detection Tool for Java Programs
TMPA-2017: Dl-Check: Dynamic Potential Deadlock Detection Tool for Java Programs
 
TMPA-2017: Extended Context-Free Grammars Parsing with Generalized LL
TMPA-2017: Extended Context-Free Grammars Parsing with Generalized LLTMPA-2017: Extended Context-Free Grammars Parsing with Generalized LL
TMPA-2017: Extended Context-Free Grammars Parsing with Generalized LL
 
TMPA-2017: Simple Type Based Alias Analysis for a VLIW Processor
TMPA-2017: Simple Type Based Alias Analysis for a VLIW ProcessorTMPA-2017: Simple Type Based Alias Analysis for a VLIW Processor
TMPA-2017: Simple Type Based Alias Analysis for a VLIW Processor
 
TMPA-2015: The Verification of Functional Programs by Applying Statechart Dia...
TMPA-2015: The Verification of Functional Programs by Applying Statechart Dia...TMPA-2015: The Verification of Functional Programs by Applying Statechart Dia...
TMPA-2015: The Verification of Functional Programs by Applying Statechart Dia...
 
TMPA-2015: Implementing the MetaVCG Approach in the C-light System
TMPA-2015: Implementing the MetaVCG Approach in the C-light SystemTMPA-2015: Implementing the MetaVCG Approach in the C-light System
TMPA-2015: Implementing the MetaVCG Approach in the C-light System
 
TMPA-2015: Formal Methods in Robotics
TMPA-2015: Formal Methods in RoboticsTMPA-2015: Formal Methods in Robotics
TMPA-2015: Formal Methods in Robotics
 
TMPA-2015: ClearTH: a Tool for Automated Testing of Post Trade Systems
TMPA-2015: ClearTH: a Tool for Automated Testing of Post Trade SystemsTMPA-2015: ClearTH: a Tool for Automated Testing of Post Trade Systems
TMPA-2015: ClearTH: a Tool for Automated Testing of Post Trade Systems
 

Similar a TMPA-2017: Live testing distributed system fault tolerance with fault injection techniques

Mike Bartley - Innovations for Testing Parallel Software - EuroSTAR 2012
Mike Bartley - Innovations for Testing Parallel Software - EuroSTAR 2012Mike Bartley - Innovations for Testing Parallel Software - EuroSTAR 2012
Mike Bartley - Innovations for Testing Parallel Software - EuroSTAR 2012TEST Huddle
 
Foundational Design Patterns for Multi-Purpose Applications
Foundational Design Patterns for Multi-Purpose ApplicationsFoundational Design Patterns for Multi-Purpose Applications
Foundational Design Patterns for Multi-Purpose ApplicationsChing-Hwa Yu
 
Defects mining in exchanges - medvedev, klimakov, yamkovi
Defects mining in exchanges - medvedev, klimakov, yamkoviDefects mining in exchanges - medvedev, klimakov, yamkovi
Defects mining in exchanges - medvedev, klimakov, yamkoviDataFest Tbilisi
 
Process synchronization in Operating Systems
Process synchronization in Operating SystemsProcess synchronization in Operating Systems
Process synchronization in Operating SystemsRitu Ranjan Shrivastwa
 
Chapter 14 software testing techniques
Chapter 14 software testing techniquesChapter 14 software testing techniques
Chapter 14 software testing techniquesSHREEHARI WADAWADAGI
 
Metamorphic Security Testing for Web Systems
Metamorphic Security Testing for Web SystemsMetamorphic Security Testing for Web Systems
Metamorphic Security Testing for Web SystemsLionel Briand
 
Getting Deep on Orchestration: APIs, Actors, and Abstractions in a Distribute...
Getting Deep on Orchestration: APIs, Actors, and Abstractions in a Distribute...Getting Deep on Orchestration: APIs, Actors, and Abstractions in a Distribute...
Getting Deep on Orchestration: APIs, Actors, and Abstractions in a Distribute...Docker, Inc.
 
New software testing-techniques
New software testing-techniquesNew software testing-techniques
New software testing-techniquesFincy V.J
 
WTF is a Microservice - Rafael Schloming, Datawire
WTF is a Microservice - Rafael Schloming, DatawireWTF is a Microservice - Rafael Schloming, Datawire
WTF is a Microservice - Rafael Schloming, DatawireAmbassador Labs
 
Multi Layer Monitoring V1
Multi Layer Monitoring V1Multi Layer Monitoring V1
Multi Layer Monitoring V1Lahav Savir
 
[EUC2016] FFWD: latency-aware event stream processing via domain-specific loa...
[EUC2016] FFWD: latency-aware event stream processing via domain-specific loa...[EUC2016] FFWD: latency-aware event stream processing via domain-specific loa...
[EUC2016] FFWD: latency-aware event stream processing via domain-specific loa...Matteo Ferroni
 
Discrete event simulation
Discrete event simulationDiscrete event simulation
Discrete event simulationssusera970cc
 
Seii unit6 software-testing-techniques
Seii unit6 software-testing-techniquesSeii unit6 software-testing-techniques
Seii unit6 software-testing-techniquesAhmad sohail Kakar
 
Newsoftware testing-techniques-141114004511-conversion-gate01
Newsoftware testing-techniques-141114004511-conversion-gate01Newsoftware testing-techniques-141114004511-conversion-gate01
Newsoftware testing-techniques-141114004511-conversion-gate01Mr. Jhon
 
Provenance for Data Munging Environments
Provenance for Data Munging EnvironmentsProvenance for Data Munging Environments
Provenance for Data Munging EnvironmentsPaul Groth
 
Crash course on data streaming (with examples using Apache Flink)
Crash course on data streaming (with examples using Apache Flink)Crash course on data streaming (with examples using Apache Flink)
Crash course on data streaming (with examples using Apache Flink)Vincenzo Gulisano
 

TMPA-2017: Live testing distributed system fault tolerance with fault injection techniques

  • 1. Live Testing of Distributed System Fault Tolerance With Fault Injection Techniques. Alexey Vasyukov, Inventa; Vadim Zherder, Moscow Exchange
  • 2. Plan
      • Introduction: Distributed Trading System concepts
      • Distributed Consensus protocol
      • MOEX Fault Injection testing framework
  • 5. Transactions Processing
      • One incoming stream
      • Strictly ordered FIFO incoming stream
      • SeqNum
      • TimeStamp
      • Strictly ordered FIFO outgoing stream
      • Each transaction should get a reply
      • No loss, no duplicates
      • Multistage processing
      • No. 1: finished; No. 2, 3: in process; No. 4: incoming
      • Transaction processing result
      • ErrorCode, Message, Statuses, …
      • TransLog = Transactions Log
      • Transaction buffer
      • Large but finite. When it is exhausted, the TS stops processing transactions and becomes “unavailable”
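The ordering guarantees on this slide (strict FIFO, a reply for every transaction, no loss, no duplicates) can be checked mechanically on a captured reply stream. The following is a minimal sketch, not the MOEX implementation: the `Reply` record and the `check_reply_stream` helper are hypothetical names, and it assumes SeqNum values increase by exactly one per transaction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reply:
    seq_num: int     # SeqNum of the transaction this reply answers
    error_code: int  # processing result; 0 means success in this sketch


def check_reply_stream(first_seq: int, replies: list[Reply]) -> list[str]:
    """Return violations of 'strict FIFO, no loss, no duplicates'."""
    problems = []
    expected = first_seq
    seen = set()
    for reply in replies:
        if reply.seq_num in seen:
            problems.append(f"duplicate reply for SeqNum {reply.seq_num}")
            continue
        seen.add(reply.seq_num)
        if reply.seq_num < expected:
            # A reply arrived after later ones were already observed.
            problems.append(f"out-of-order reply for SeqNum {reply.seq_num}")
        elif reply.seq_num > expected:
            # One or more replies were skipped.
            problems.append(
                f"missing replies for SeqNum {expected}..{reply.seq_num - 1}"
            )
            expected = reply.seq_num + 1
        else:
            expected += 1
    return problems
```

An empty result means the captured stream satisfies all three properties; each string in a non-empty result names the first SeqNum at which a property was broken.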
  • 7. Messaging
      • UP: transactions
      • DOWN: responses and messages to nodes of the cluster
      • CRC (cyclic redundancy check) is used to control the transaction flow
      • Full history: “late join” is possible
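The slide does not say how the CRC is computed or what it covers. One plausible reading, shown here purely as an assumption, is a running CRC-32 over the ordered transaction payloads: two nodes that have consumed the same transactions in the same order end up with the same value, so a mismatch signals a lost, duplicated, or reordered message. The function name `running_crc` is illustrative.

```python
import zlib


def running_crc(payloads, start=0):
    """Fold an ordered sequence of byte payloads into a single CRC-32 value.

    Assumption: the flow is checked with CRC-32 chained across messages;
    the real system may use a different polynomial or scope.
    """
    crc = start
    for payload in payloads:
        # zlib.crc32 accepts the previous value, so the result depends on
        # both the content and the order of all payloads seen so far.
        crc = zlib.crc32(payload, crc)
    return crc
```

Because the value can be continued from any `start`, a late-joining node could in principle replay the full history and compare its final CRC with the cluster's current one.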
  • 8. Role: “Main”
      • “Main” = the main TS instance
      • Gets new transactions from the incoming stream
      • Broadcasts transactions within the cluster
      • Processes transactions
      • Checks the transaction result (compares it to the results obtained from other nodes)
      • Broadcasts results within the cluster
      • Publishes replies to clients
  • 9. Role: “Backup”
      • “Backup” = a special state of a TS instance
      • Can be switched to Main quickly
      • Gets all transactions from Main
      • Processes transactions independently
      • Checks the transaction result (compares it to the results from other nodes)
      • Writes its own TransLog
      • Does not send replies to clients
  • 10. Role: “Backup”
      • 2 modes: SYNC (“Hot Backup”) and ASYNC (“Warm Backup”)
      • If Main fails, SYNC can switch to Main automatically
      • SYNC publishes transaction results
      • If SYNC fails, ASYNC can switch to SYNC automatically
      • ASYNC does not publish transaction results
      • ASYNC can be switched to Main manually by the Operator
      • The number of SYNCs is a static parameter determined by the Operator
  • 11. Role: Governor
      • The Governor can
          • Force a node to change its role
          • Force a node to stop
          • Start Elections to assign a new Main
      • There is only one Governor in the cluster
      • The Governor can be assigned only by the Operator
      • The Governor role cannot be changed
      • If a node asks the Governor and the Governor is unavailable, the node stalls until it recovers connectivity to the Governor
      • The Governor can be recovered or restarted only by the Operator
  • 12. Roles Summary
      • Send table of states: Governor, Main, SYNC Backup, ASYNC Backup
      • Get client transaction: Main
      • Broadcast transaction: Main
      • Process transaction: Main, SYNC Backup, ASYNC Backup
      • Broadcast transaction result: Main, SYNC Backup
      • Compare transaction results: Main, SYNC Backup, ASYNC Backup
      • Send replies to clients: Main
      • Can switch to: SYNC Backup to Main; ASYNC Backup to SYNC
  • 13. If something goes wrong…
      • IF we detect
          • A mismatch in the transaction result
          • A node that does not respond
          • No new incoming transactions
          • A wrong CRC
          • The Governor does not respond
          • A mismatch in tables of states
          • …
      • THEN ask the Governor
  • 14. Elections
      • Elections start to assign a new Main
      • Transaction processing is stopped
      • Two-fold generation counter (G:S)
          • Initial value: (0:0)
          • Every successful election increases G and resets S to 0, giving (G:0)
          • Every round of elections increases S
          • Example: (1:0) -> (1:1) -> (1:2) -> (2:0)
      • The generation counter is included in every message to and from the Governor
      • Two-phase commit approach: the Governor sends the new table of states and waits for confirmation from all nodes
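The two-fold generation counter can be modelled as an ordered pair. This is a minimal sketch under one reading of the slide: an unsuccessful round bumps S, and a successful election bumps G and resets S. The class name `Generation`, its methods, and the `is_stale` rule (treating any message with an older counter as stale) are assumptions made for illustration, not part of the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Generation:
    g: int = 0  # incremented by every successful election
    s: int = 0  # incremented by every round of elections

    def next_round(self) -> "Generation":
        """A new round of elections starts within the same generation."""
        return Generation(self.g, self.s + 1)

    def elected(self) -> "Generation":
        """Elections succeeded: G increases and S drops to 0."""
        return Generation(self.g + 1, 0)

    def __str__(self) -> str:
        return f"({self.g}:{self.s})"


def is_stale(message_gen: Generation, current: Generation) -> bool:
    """Assumed rule: a message carrying an older counter is ignored."""
    return message_gen < current
```

Replaying the example from the slide, `Generation(1, 0).next_round().next_round().elected()` walks through (1:1) and (1:2) and ends at (2:0). Declaring the dataclass with `order=True` compares G first and S second, which matches the intuition that any message from generation 1 is older than one from generation 2.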
  • 16. MOEX Consensus Protocol (by Sergey Kostanbaev, MOEX)
      • Tables of states must be consistent at all nodes during normal operation
      • Tables of states must become consistent again after some nodes have failed
      • Table of States at each node: Node 1 Node 2 Node 3 Node 4 uuid1 S_MAIN S_GOVERNOR S_BACKUP_SYNC S_BACKUP_ASYNC uuid2 S_MAIN S_GOVERNOR S_BACKUP_SYNC S_BACKUP_ASYNC uuid3 S_MAIN S_GOVERNOR S_BACKUP_SYNC S_BACKUP_ASYNC uuid4 S_MAIN S_GOVERNOR S_BACKUP_SYNC S_BACKUP_ASYNC
  • 17. MOEX Consensus Protocol
      • Thus, it is an example of a distributed consensus protocol. Other examples:
          • Paxos, 1998, 2001, …: LAMPORT, L. Paxos made simple. ACM SIGACT News 32, 4 (Dec. 2001), 18–25.
          • RAFT, 2014: ONGARO, D., AND OUSTERHOUT, J. In search of an understandable consensus algorithm. In Proc. ATC’14, USENIX Annual Technical Conference (2014), USENIX. https://raft.github.io/raft.pdf
          • DNCP, 2016: https://tools.ietf.org/html/rfc7787
      • Open question: is MOEX CP equivalent to any of the known protocols?
      • Hypotheses on MOEX CP features
          • H1. Byzantine fault tolerance
          • H2. Safety
          • H3. No liveness
  • 18. Cluster Normal State Requirements
      • There is exactly 1 Governor in the cluster
      • There is exactly 1 Main in the cluster
      • Tables of states at all nodes are consistent
      • All active nodes in the cluster have the same value of the Generation Counter
      • The cluster is available (for client connections) and processes transactions
      • All nodes process the same sequence of transactions
      • Either the number of SYNCs equals the predefined value, or it is less than the predefined value and there are no ASYNCs
      • …
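Most of these requirements are predicates over a snapshot of the cluster, so they can be sketched as a single checker. The `NodeView` record, the role strings, and `normal_state_violations` below are hypothetical; the two requirements that need live traffic (availability and identical transaction sequences) are deliberately left out of this sketch.

```python
from dataclasses import dataclass


@dataclass
class NodeView:
    uuid: str
    role: str          # "GOVERNOR", "MAIN", "SYNC" or "ASYNC" (assumed labels)
    generation: tuple  # generation counter as (G, S)
    table: dict        # table of states as seen by this node: uuid -> role
    alive: bool = True


def normal_state_violations(nodes, expected_syncs):
    """Return the Normal-state requirements that the snapshot violates."""
    alive = [n for n in nodes if n.alive]
    if not alive:
        return ["no active nodes"]
    problems = []
    roles = [n.role for n in alive]
    if roles.count("GOVERNOR") != 1:
        problems.append("expected exactly 1 Governor")
    if roles.count("MAIN") != 1:
        problems.append("expected exactly 1 Main")
    if any(n.table != alive[0].table for n in alive):
        problems.append("tables of states differ between nodes")
    if len({n.generation for n in alive}) != 1:
        problems.append("generation counters differ between nodes")
    syncs, asyncs = roles.count("SYNC"), roles.count("ASYNC")
    # Either the SYNC count matches, or it is lower and no ASYNC is left
    # that could have been promoted.
    if not (syncs == expected_syncs or (syncs < expected_syncs and asyncs == 0)):
        problems.append("SYNC count does not match and ASYNC nodes remain")
    return problems
```

Returning a list of violated requirements rather than a single boolean makes it easier to see which requirement a fault scenario broke.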
  • 19. Main “Theorem”
      • Assume that the cluster is in Normal state and that either the Main or a Backup node fails. Then the cluster returns to Normal state within finite time.
  • 20. MOEX CP Testing
      • Investigate
          • Fault detection
          • Implementation correctness
          • Timing
          • Dependence on load profile
          • Dependence on environment configuration
          • Statistics
      • Integration with CI/CD processes
  • 21. Typical Test Scenario
      1. Start all
      2. Wait for normal state
      3. Start transactions generator
      4. Keep transactions flow for some time
      5. Fault injection: emulate a fault (single or multiple)
      6. Wait for normal state (check timeout)
      7. Check state at each node
      8. Get artifacts
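The eight steps map naturally onto a small test driver. The sketch below runs against `FakeCluster`, an in-memory stand-in invented for this example; the real framework starts the complete system on real infrastructure, and none of the method names here come from it.

```python
import time


class FakeCluster:
    """In-memory stand-in for a cluster driver (hypothetical API)."""

    def __init__(self):
        self.roles = {}
        self.log = []

    def start_all(self):
        self.roles = {"n0": "GOVERNOR", "n1": "MAIN", "n2": "SYNC", "n3": "ASYNC"}
        self.log.append("start")

    def start_load(self):
        self.log.append("load on")

    def stop_load(self):
        self.log.append("load off")

    def is_normal(self):
        return list(self.roles.values()).count("MAIN") == 1

    def inject_kill(self, node):
        self.log.append(f"kill {node}")
        if self.roles.pop(node) == "MAIN":
            # Pretend that elections promote the first SYNC backup.
            for name, role in self.roles.items():
                if role == "SYNC":
                    self.roles[name] = "MAIN"
                    break


def wait_for(predicate, timeout_s, poll_s=0.01):
    """Poll until the predicate holds or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(poll_s)
    return predicate()


def run_scenario(cluster, victim, timeout_s=1.0):
    cluster.start_all()                                 # 1. start all
    if not wait_for(cluster.is_normal, timeout_s):      # 2. wait for normal state
        raise RuntimeError("cluster did not reach normal state")
    cluster.start_load()                                # 3. start generator
    time.sleep(0.05)                                    # 4. keep the flow running
    cluster.inject_kill(victim)                         # 5. fault injection
    recovered = wait_for(cluster.is_normal, timeout_s)  # 6. wait, with timeout
    states = dict(cluster.roles)                        # 7. state at each node
    cluster.stop_load()
    return {"recovered": recovered, "states": states, "log": list(cluster.log)}  # 8
```

Calling `run_scenario(FakeCluster(), "n1")` kills the Main and reports whether the stand-in cluster returned to a state with exactly one Main before the timeout.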
  • 22. References
      • WIDDER, J. Introduction into Fault-tolerant Distributed Algorithms and their Modeling. TMPA (2014).
      • LAMPORT, L. Paxos made simple. ACM SIGACT News 32, 4 (Dec. 2001), 18–25.
      • ONGARO, D., AND OUSTERHOUT, J. In search of an understandable consensus algorithm. In Proc. ATC’14, USENIX Annual Technical Conference (2014), USENIX. https://raft.github.io/raft.pdf
      • ONGARO, D. Consensus: Bridging theory and practice. Doctoral dissertation, Stanford University, 2014.
  • 24. Fault Injection: Testing Implementation
  • 25. MOEX Fault Injection Framework Concepts
      • End-to-end testing of the cluster implementation
      • Starts the complete real system on real infrastructure
      • Provides modules to inject predictable faults on selected servers
      • Provides domain-specific libraries to write tests
      • System, network, and application issues are injected directly
      • Misconfiguration problems are tested indirectly (real infrastructure, config push before test start)
  • 27. Inject Techniques
      • OS processes
          • Kill (SIGKILL)
          • Hang (SIGSTOP for N seconds + SIGCONT)
      • Network
          • Interface “blink” (DROP 100% of packets for N seconds)
          • Interface “noise” (DROP X% of packets for N seconds)
          • Content filtering: allows a “smart” inject into the protocol, dropping selected messages from the flow
      • Application
          • Data corruption (with a gdb script): emulates application-level issues caused by incorrect calculation
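The process-level and network-level injects can be sketched with the Python standard library. The signal calls are standard POSIX. The slides do not name the packet-filtering tool, so the use of `iptables` with its `statistic` match is an assumption. `drop_packets_rule` only builds the command line and does not run it, since applying it requires root on the target host.

```python
import os
import signal
import time


def kill_process(pid):
    """'Kill' inject: terminate the process immediately with SIGKILL."""
    os.kill(pid, signal.SIGKILL)


def hang_process(pid, seconds):
    """'Hang' inject: SIGSTOP the process, wait, then resume with SIGCONT."""
    os.kill(pid, signal.SIGSTOP)
    try:
        time.sleep(seconds)
    finally:
        # Always resume, even if the sleep is interrupted.
        os.kill(pid, signal.SIGCONT)


def drop_packets_rule(interface, percent):
    """Build an iptables argv that drops `percent`% of inbound packets.

    percent == 100 gives the "blink" inject; anything lower gives "noise".
    Assumption: iptables is the filtering tool on the target hosts.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    rule = ["iptables", "-A", "INPUT", "-i", interface]
    if percent < 100:
        rule += ["-m", "statistic", "--mode", "random",
                 "--probability", f"{percent / 100:.2f}"]
    return rule + ["-j", "DROP"]
```

To end a network inject after N seconds, the same rule can be deleted by swapping `-A` for `-D` in the returned list.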
  • 28. Basic Cluster State Validations (number, code, description)
      • 00 ALIVE_ON_START: Cluster nodes should start correctly
      • 01 SINGLE_MAIN: Only one node should consider itself MAIN
      • 02 GW_OK: All gateways should be connected to the correct MAIN
      • 03 GEN_OK: All active cluster nodes should have the same generation
      • 04 TE_VIEW_OK: The current MAIN should be connected to all alive nodes
      • 05 CLU_VIEW_CONSISTENT: All alive nodes should have the same cluster view
      • 06 ELECTIONS_OK: The elections count during the test should match the inject scenario
      • 07 DEAD_NODES_OK: The number of lost nodes should match the inject scenario
      • 08 CLIENTS_ALIVE: Clients should not notice any issue; fault handling logic is completely hidden from them
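Several of these validations reduce to simple checks over a collected cluster snapshot. The sketch below implements five of them as named functions in a registry. The snapshot layout (`nodes`, `elections`) and the scenario fields (`expected_elections`, `expected_dead`) are hypothetical structures chosen for the example, not the framework's actual data model.

```python
def alive_nodes(snapshot):
    return [n for n in snapshot["nodes"].values() if n["alive"]]


def single_main(snapshot, scenario):
    return sum(n["role"] == "MAIN" for n in alive_nodes(snapshot)) == 1


def gen_ok(snapshot, scenario):
    return len({n["generation"] for n in alive_nodes(snapshot)}) == 1


def clu_view_consistent(snapshot, scenario):
    views = [n["view"] for n in alive_nodes(snapshot)]
    return all(view == views[0] for view in views)


def elections_ok(snapshot, scenario):
    return snapshot["elections"] == scenario["expected_elections"]


def dead_nodes_ok(snapshot, scenario):
    dead = sum(not n["alive"] for n in snapshot["nodes"].values())
    return dead == scenario["expected_dead"]


# Registry keyed by the validation codes from the slide.
VALIDATIONS = {
    "SINGLE_MAIN": single_main,
    "GEN_OK": gen_ok,
    "CLU_VIEW_CONSISTENT": clu_view_consistent,
    "ELECTIONS_OK": elections_ok,
    "DEAD_NODES_OK": dead_nodes_ok,
}


def validate(snapshot, scenario):
    """Run every registered validation and return code -> passed."""
    return {code: check(snapshot, scenario) for code, check in VALIDATIONS.items()}
```

Keeping the checks in a registry keyed by code means a per-test summary can list every validation with a pass or fail mark.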
  • 29. Test Targets
      • Basic system faults
      • Multiple system faults on different nodes
      • Application-level faults
      • Random network instabilities
      • Recovery after faults
      • Governor stability (failures, restarts, failures during elections)
  • 30. Test Summary
      • Logs from all nodes for root cause analysis
      • Cluster state validations summary
      • Cluster node states (Sync Backup is dead, Async Backup switched to Sync)
  • 31. Basic Fault: Overall System Behavior
      • Event log timeline
      • BS died, elections started
      • Elections, no transactions
      • Resumed operation
  • 32. Restore After Fault: Overall System Behavior
      • BS hung, elections started
      • Elections, no transactions
      • Resumed operation
      • BS is alive again
      • BS rejoins the cluster, receiving missed transactions
  • 33. Performance Metrics
      • Key performance data from all cluster nodes
      • How do faults influence service quality for consumers?
      • Compare configurations (indirectly, together with config push)
  • 34. Domain Specific Language
      • Useful for ad-hoc tests and quick analysis
      • Complements the set of 'default' tests (written in Python)
  • 35. Statistics
      • Multiple runs to identify problems without stable reproducers
      • Heatmap to quickly analyze both which tests and which validations fail
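The data behind such a heatmap is a test-by-validation matrix of failure rates aggregated over repeated runs. The following sketch assumes each run is recorded as a `(test_name, {validation_code: passed})` pair and that every run of a test reports the same set of validations. The function names and the text rendering are illustrative, and an actual heatmap would be drawn with a plotting library.

```python
from collections import defaultdict


def failure_rates(runs):
    """Aggregate runs into {test: {validation_code: failure_rate}}."""
    totals = defaultdict(int)
    failures = defaultdict(lambda: defaultdict(int))
    for test_name, results in runs:
        totals[test_name] += 1
        for code, passed in results.items():
            failures[test_name][code] += 0 if passed else 1
    return {
        test: {code: count / totals[test] for code, count in sorted(codes.items())}
        for test, codes in sorted(failures.items())
    }


def render(matrix):
    """Plain-text view of the matrix, one test per line."""
    lines = []
    for test, row in matrix.items():
        cells = "  ".join(f"{code}={rate:.0%}" for code, rate in row.items())
        lines.append(f"{test}: {cells}")
    return "\n".join(lines)
```

A cell that is neither 0% nor 100% marks a validation that fails only in some runs, which is the kind of problem without a stable reproducer that repeated runs are meant to reveal.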
  • 36. References
      • Similar tools:
          1. Netflix Simian Army; http://techblog.netflix.com/2011/07/netflix-simian-army.html
          2. Jepsen; https://jepsen.io/
      • Reading:
          1. Caitie McCaffrey. 2015. The Verification of a Distributed System. Queue 13, 9, pages 60 (December 2015), 11 pages. DOI=http://dx.doi.org/10.1145/2857274.2889274
          2. Alvaro, P., Rosen, J. and Hellerstein, J.M. 2015. Lineage-driven fault injection. http://www.cs.berkeley.edu/~palvaro/molly.pdf
          3. Yuan, D., Luo, Y., Zhuang, X., Rodrigues, G. R., Zhao, X., Zhang, Y., Jain, P. U., Stumm, M. 2014. Simple testing can prevent most critical failures: an analysis of production failures in distributed data-intensive systems; https://www.usenix.org/conference/osdi14/technical-sessions/presentation/yuan
          4. Ghosh S. et al. 1997. Software Fault Injection Testing on a Distributed System: A Case Study
          5. Lai, M.-Y., Wang S.Y. 1995. Software Fault Insertion Testing for Fault Tolerance. Software Fault Tolerance, edited by Lyu, Chapter 13.