1. Byzantine Fault-Tolerant MapReduce
in Cloud-of-Clouds
Joint work with: Miguel Correia, Marcelo Pasin,
Alysson Bessani, Fernando Ramos, Paulo Verissimo
Presenter: Pedro Costa
Navtalk
2. Motivation
• How to count the number of words on the Internet?
• How to do it with the help of a cloud-of-clouds
(i.e., several clouds)?
• Guarantee integrity and availability of data
• Guarantee integrity and availability of data
3. Outline
• Introduction
– MapReduce programming model
– Fault tolerance in Cloud-of-clouds
– Three problems with the basic scheme
• Our approach
– Byzantine fault-tolerant MapReduce in clouds-of-clouds
• Evaluation
5. What is MapReduce?
• Programming model + execution environment
• Introduced by Google in 2004
• Used for processing large data sets using clusters of servers
• A few implementations available, used by many companies
• Hadoop MapReduce, Apache's open-source implementation
• The most widely used, and the one we have been using
• Includes HDFS, a distributed file system for large files
6. MapReduce basic idea
[Diagram: a file with all the words on the Internet is split across TaskTracker servers; the map phase emits <word,1> pairs and the reduce phase aggregates them into <word,n>. The JobTracker detects and recovers crashed map/reduce tasks.]
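The word-count idea on this slide can be sketched in a few lines. This is a minimal, hypothetical illustration of the programming model, not the Hadoop API; the function names `map_phase` and `reduce_phase` are made up for this sketch.

```python
from collections import defaultdict

def map_phase(text):
    # Map: emit a <word, 1> pair for every word in the input split
    return [(word, 1) for word in text.split()]

def reduce_phase(pairs):
    # Reduce: sum the counts per word, producing <word, n> pairs
    counts = defaultdict(int)
    for word, one in pairs:
        counts[word] += one
    return dict(counts)

print(reduce_phase(map_phase("to be or not to be")))
# {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```

In a real deployment the pairs are partitioned by key, so each reduce task sums only its own subset of words.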
8. But there are more faults…
• Problem: Accidental faults may affect the correctness of the results
of MapReduce
• Task corruptions: memory errors, chipset errors, …
• Cloud outages: MapReduce job interruptions
(as reported in popular clouds)
• Our goal:
• guarantee integrity and availability (despite task corruptions and
cloud outages)
• Develop a new model to compute MapReduce in cloud-of-clouds
• Commercially feasible?
Yes, but out of scope of this presentation
Tobias Kurze et al., Cloud Federation. In Proceedings of the 2nd International Conference on Cloud Computing, GRIDs, and Virtualization (CLOUD COMPUTING 2011).
9. Byzantine fault-tolerant MapReduce
• Basic idea: replicate tasks in different clouds and vote on the
results returned by the replicas
• The set of clouds forms a cloud of clouds, hence cloud-of-clouds
• Inputs are initially stored in all clouds (i.e., not our problem)
[Diagram: the same job replicated across Cloud 1, Cloud 2, and Cloud 3.]
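The voting step can be sketched as follows. This is an illustrative sketch, not the authors' implementation; the function name `vote` and the string results are assumptions.

```python
from collections import Counter

def vote(results, t):
    # With 2t+1 replicas and at most t faulty clouds, at least t+1
    # results come from correct clouds, so a t+1 majority is trustworthy.
    value, count = Counter(results).most_common(1)[0]
    if count >= t + 1:
        return value
    raise RuntimeError("no value returned by t+1 replicas")

print(vote(["<the,3>", "<the,3>", "<garbage>"], t=1))  # <the,3>
```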
10. System model
• Client is correct (not part of MapReduce)
• Clouds: up to t clouds can arbitrarily corrupt all tasks and
other modules they execute
• Why use t and not f? t≤f
• Next:
• Basic BFT MapReduce scheme
• Three problems with the basic scheme
• Our approach: Full BFT MapReduce scheme
13. Improvements over basic version
• Three problems have arisen:
• Computation problem
• Communication problem
• Job execution control problem
• Three solutions: our BFT MapReduce can be thought of as the
basic version plus the following mechanisms:
• Deferred execution (computation problem)
• Digest communication (communication problem)
• Distributed Job tracker (job execution control problem)
14. Problem 1: computation
[Diagram: split 0 and part 0 replicated in different clouds.]
Tasks are executed 2t+1 times
15. Solution 1: Deferred execution
• Faults that cause the computation problem are uncommon
• The JobTracker replicates tasks across t+1 clouds (t clouds in standby)
• If results differ or one cloud stops, it requests 1 more replica (up to t times)
[Diagram: split 0 / part 0 executed in t+1 clouds; standby replicas start only on disagreement.]
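Deferred execution can be sketched as below. This is a hypothetical illustration under the slide's assumptions (t+1 initial replicas, standby clouds woken one at a time); `deferred_execute` and the `run` callback are invented names, not the authors' code.

```python
def deferred_execute(run, task, clouds, t):
    # Run the task in t+1 clouds first, keeping t clouds in standby
    results = [run(task, c) for c in clouds[:t + 1]]
    next_cloud = t + 1
    # While results disagree, wake one standby cloud at a time (up to t)
    while len(set(results)) > 1 and next_cloud < len(clouds):
        results.append(run(task, clouds[next_cloud]))
        next_cloud += 1
    # Accept a value returned by at least t+1 replicas
    for r in set(results):
        if results.count(r) >= t + 1:
            return r
    raise RuntimeError("fewer than t+1 matching results")

# One faulty cloud ("cloud-2"); the extra replica is launched on demand
run = lambda task, cloud: "corrupted" if cloud == "cloud-2" else "<the,3>"
print(deferred_execute(run, "wordcount", ["cloud-1", "cloud-2", "cloud-3"], t=1))
# <the,3>
```

In the fault-free common case the first t+1 results match, so the standby clouds never run and only t+1 executions are paid for instead of 2t+1.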
16. Problem 2: communication
[Diagram: split 0 and part 0 replicated in different clouds, exchanging intermediate data.]
All this communication goes through the Internet (delay, cost)!
17. Solution 2: Transferring Digests
• Reduce tasks must fetch the map task outputs
• Intra-cloud fetch: the output is fetched normally
• Inter-cloud fetch: only a hash of the output is fetched – the key idea
[Diagram: a reduce task fetches the full map output (split 0 / part 0) from the same cloud and only digests from other clouds.]
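The digest check can be sketched as follows. A minimal illustration, not the authors' implementation: the function name `fetch_map_output`, the use of SHA-256, and the `replica_outputs` mapping are all assumptions for this sketch.

```python
import hashlib

def digest(data):
    # Hash of a map task's output; only this crosses cloud boundaries
    return hashlib.sha256(data).hexdigest()

def fetch_map_output(replica_outputs, my_cloud, t):
    # Fetch the full output only from the replica in the reduce task's
    # own cloud; from the other clouds, fetch just the digests.
    local_data = replica_outputs[my_cloud]
    digests = [digest(d) for d in replica_outputs.values()]
    # Use the local copy only if t+1 replicas produced the same digest
    if digests.count(digest(local_data)) >= t + 1:
        return local_data
    raise RuntimeError("local copy not confirmed by t+1 digests")

outputs = {"cloud-1": b"<the,3><be,2>", "cloud-2": b"<the,3><be,2>"}
print(fetch_map_output(outputs, "cloud-1", t=1))  # b'<the,3><be,2>'
```

Only a few dozen bytes of digest cross the Internet per map output, while the bulky intermediate data stays inside each cloud.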
18. Problem 3: Job execution control
• Job tracker controls all task executions in the task trackers in
all clouds
• If the JobTracker is in one cloud, separated from many task
trackers by the Internet:
• Communication is slow
• Large timeouts are needed for detecting TaskTracker failures
• …and it is a single point of failure (this is the case in MapReduce and Hadoop MapReduce)
21. Setup and Test
Platform configuration
• 3 clouds
• Each cloud has 3 nodes
• 1 JobTracker (JT) and 3 TaskTrackers (TT) per cloud
• All JTs are interconnected
Job submitted (WordCount)
• Input data: 26 chunks of 64 MB (1.5 GB in total)
• Map tasks: 26
• Reduce tasks: 120, 180, 360, 400
22. Number of reduce tasks executed
(no faults, t=1)
Reduce tasks   Job duration (Official)   Job duration (CoC)   Diff
120            00:15:35                  00:17:13             00:02:35
180            00:19:35                  00:21:36             00:02:01
360            00:31:12                  00:33:30             00:02:18
400            00:33:37                  00:36:24             00:02:47
24. Conclusions
• Our method guarantees integrity and availability despite task
corruptions and cloud outages
• BFT MapReduce in cloud-of-clouds is feasible!
• No need to execute in all 2t+1 clouds
• Only digests sent through the Internet (no “big data”)
• Control job execution within each cloud
Thank you