Peter R. Pietzuch
prp@doc.ic.ac.uk
Integrating Scale Out and Fault Tolerance
in Stream Processing using
Operator State Management
with Raul Castro Fernandez*
Matteo Migliavacca+ and Peter Pietzuch*
*Imperial College London, +University of Kent
Big data …
… in numbers:
– 2.5 billion gigabytes of data every day (source: IBM)
– LSST telescope, Chile 2016, 30 TB nightly
… come from everywhere:
– web feeds, social networking
– mobile devices, sensors, cameras
– scientific instruments
– online transactions (public and private sectors)
… have value:
– Global Pulse forum for detecting human crises internationally
– real-time big data analytics in the UK: £25 billion → £216 billion in 2012-17
– recommendation applications (LinkedIn, Amazon)
→ processing infrastructure for big data analysis
A black-box approach for big data analysis
• users issue analysis queries with real-time semantics
• streams of data updates, time-varying rates, generated in real-time
• streams of result data
→ processing in near real time
[Figure: data streams flow over time into a Stream Processing System, which emits result streams]
• queries consist of operators (join, map, select, ..., user-defined operators (UDOs))
• operators form graphs
• operators process streams of tuples on-the-fly
• operators span nodes
Distributed Stream Processing System
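To make "operators process streams of tuples on-the-fly" concrete, here is a minimal single-process sketch, assuming a linear query graph and Python generators as stand-ins for operators. The names `select_op`, `map_op` and `run_query` are hypothetical, not SEEP's API, and the sketch ignores the network links a real DSPS needs when operators span nodes.

```python
from typing import Any, Callable, Iterable, Iterator


def select_op(pred: Callable[[Any], bool], stream: Iterable[Any]) -> Iterator[Any]:
    """Stateless select: forwards only the tuples that satisfy pred."""
    for t in stream:
        if pred(t):
            yield t


def map_op(fn: Callable[[Any], Any], stream: Iterable[Any]) -> Iterator[Any]:
    """Stateless map: transforms each tuple as soon as it arrives."""
    for t in stream:
        yield fn(t)


def run_query(source: Iterable[Any]) -> Iterator[Any]:
    """Query graph: source -> select(x > 5) -> map(x * 10) -> sink.

    Chaining generators means each tuple passes through the whole graph
    (or is dropped) before the next one is read; nothing is materialised.
    """
    return map_op(lambda x: x * 10, select_op(lambda x: x > 5, source))


print(list(run_query([7, 1, 5, 9, 9])))  # [70, 90, 90]
```

Because the operators are lazy, `run_query` also works on an unbounded source: it yields results as tuples arrive rather than waiting for the stream to end.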
Elastic DSPSs in the Cloud
Real-time big data analysis challenges traditional DSPSs:
? what about continuous workload surges?
? what about real-time resource allocation to workload variations?
? how to keep the state of stateful operators correct?
Massively scalable, cloud-based DSPSs [SIGMOD 2013]
1. gracefully handle stateful operators’ state
2. operator state management for combined scale out and
fault tolerance
3. SEEP system and evaluation
4. related work
5. future research directions
Stream Processing in the Cloud
• clouds provide seemingly infinite pools of resources
? How do we build a stream processing platform in the Cloud?
• Failure resilience:
– active fault-tolerance needs 2x resources
– passive fault-tolerance leads to long recovery times
• Intra-query parallelism:
– provisioning for workload peaks is unnecessarily conservative
→ dynamic scale out: increase resources when peaks appear
→ hybrid fault-tolerance: low resource overhead with fast recovery
→ both mechanisms must support stateful operators
Stateless vs Stateful Operators
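stateless (e.g. filter > 5):
✓ failure recovery
✓ scale out
stateful (e.g. word counter):
× failure recovery
× scale out
[Figure: a filter (> 5) on the stream 7 1 5 9 9 outputs 7 9 9 whether it runs as one operator or is split into two; a word counter that has already counted (the, 10), (with, 5) gives wrong results when a new, empty counter processes “the with the”: (the, 2) != 12 and (with, 1) != 6]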
operator state: a summary of past tuples’ processing
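The counter example above can be reproduced in a few lines. This is a sketch under simple assumptions (one word per tuple, counts held in a `collections.Counter`); `WordCounter` and `keep_above_5` are hypothetical names. It shows why adding an operator replica without moving its state breaks a stateful operator but not a stateless one.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


class WordCounter:
    """Stateful operator: its processing state summarises all past tuples."""

    def __init__(self, state: Optional[Counter] = None) -> None:
        # Copy so that the caller's counter is never mutated.
        self.state: Counter = Counter(state or {})

    def process(self, word: str) -> Tuple[str, int]:
        self.state[word] += 1
        return word, self.state[word]


def run(op: WordCounter, words: Iterable[str]) -> List[Tuple[str, int]]:
    return [op.process(w) for w in words]


def keep_above_5(xs: Iterable[int]) -> List[int]:
    """Stateless filter: a fresh replica gives the same output."""
    return [x for x in xs if x > 5]


history = Counter({"the": 10, "with": 5})  # state built from past tuples
new_tuples = ["the", "with", "the"]

# Operator that kept its state vs. a naively added replica with empty state.
correct = run(WordCounter(history), new_tuples)
naive = run(WordCounter(), new_tuples)

print(correct)  # [('the', 11), ('with', 6), ('the', 12)]
print(naive)    # [('the', 1), ('with', 1), ('the', 2)]
print(keep_above_5([7, 1, 5, 9, 9]))  # [7, 9, 9]
```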
State Management
processing state: summary of past tuples’ processing
routing state: routing of tuples
buffer state: tuples
→ operator state is an external entity managed by the DSPS
→ primitives for state management
→ mechanisms (scale out, failure recovery) on top of primitives
→ dynamic reconfiguration of the dataflow graph
[Figure: dataflow graph of operators A, B and C]
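One way to picture "operator state as an external entity managed by the DSPS" is a minimal sketch, assuming a word-count operator: the three kinds of state live in a plain data object owned by a DSPS-side store, and the operator logic only reads and writes the state it is handed. `OperatorState`, `StateStore` and `counter_logic` are hypothetical names, not SEEP classes, and the default downstream operator "B" is an illustrative assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class OperatorState:
    """The three kinds of operator state, held outside the operator logic."""
    # processing state: summary of past tuples, here word -> count
    processing: Dict[str, int] = field(default_factory=dict)
    # routing state: key -> downstream operator that should receive it
    routing: Dict[str, str] = field(default_factory=dict)
    # buffer state: (timestamp, tuple) pairs sent downstream, kept for replay
    buffer: List[Tuple[int, Any]] = field(default_factory=list)


class StateStore:
    """Hypothetical DSPS-side registry that owns every operator's state."""

    def __init__(self) -> None:
        self._states: Dict[str, OperatorState] = {}

    def get(self, op_id: str) -> OperatorState:
        return self._states.setdefault(op_id, OperatorState())


def counter_logic(state: OperatorState, ts: int, word: str) -> None:
    """Operator logic only touches the state object it is given."""
    state.processing[word] = state.processing.get(word, 0) + 1
    target = state.routing.get(word, "B")  # default downstream operator
    state.buffer.append((ts, (target, word, state.processing[word])))


store = StateStore()
a = store.get("A")
a.routing["with"] = "C"
for ts, w in enumerate(["the", "with", "the"], start=1):
    counter_logic(a, ts, w)
print(a.processing)  # {'the': 2, 'with': 1}
print(a.buffer[-1])  # (3, ('B', 'the', 2))
```

Keeping state in a separate object is what lets generic mechanisms copy, split or restore it without knowing the operator's logic.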
State Management Primitives
• checkpoint: takes a snapshot of state and makes it externally available
• backup / restore: moves a copy of state from one operator to another
• partition: splits state in a semantically correct fashion for parallel processing
[Figure: state moved between operators A and B]
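The four primitives can be sketched over a simple (key, value) processing state. This assumes in-memory dictionaries and a caller-supplied key-to-partition function; `StatefulOperator`, `BackupNode` and `partition` are hypothetical names for illustration, not SEEP's interfaces.

```python
import copy
from typing import Callable, Dict, List

State = Dict[str, int]


class StatefulOperator:
    def __init__(self, name: str) -> None:
        self.name = name
        self.state: State = {}

    def checkpoint(self) -> State:
        """Snapshot the state; a deep copy, so later updates do not leak in."""
        return copy.deepcopy(self.state)

    def restore(self, snapshot: State) -> None:
        """Install a copy of a snapshot as this operator's state."""
        self.state = copy.deepcopy(snapshot)


class BackupNode:
    """Holds the latest checkpoint of each operator, in memory."""

    def __init__(self) -> None:
        self.backups: Dict[str, State] = {}

    def backup(self, op_name: str, snapshot: State) -> None:
        self.backups[op_name] = snapshot

    def fetch(self, op_name: str) -> State:
        return self.backups[op_name]


def partition(snapshot: State, owner: Callable[[str], int], n: int) -> List[State]:
    """Split by key: each key (and all of its state) goes to exactly one
    part, which keeps per-key results correct when parts run in parallel."""
    parts: List[State] = [{} for _ in range(n)]
    for key, value in snapshot.items():
        parts[owner(key)][key] = value
    return parts


a = StatefulOperator("A")
a.state = {"computer": 3, "laboratory": 1, "cambridge": 2}
node = BackupNode()
node.backup("A", a.checkpoint())

# Keys starting a..k stay on A, l..z move to the new operator A'.
parts = partition(node.fetch("A"), lambda k: 0 if k[0] <= "k" else 1, 2)
a_prime = StatefulOperator("A'")
a.restore(parts[0])
a_prime.restore(parts[1])
print(a.state, a_prime.state)
```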
State Management: Scale Out of Stateful Operators
• periodically, stateful operators checkpoint and back up their state, in memory, to a designated upstream backup node
• the backup node therefore already has the state of the operator to be parallelised
• scale out of A into A and A’: checkpoint → backup → partition → restore
• upstream operators send unprocessed tuples to update the checkpointed state
[Figure: operator A with its upstream backup node and operator B; A is parallelised into A and A’]
→ How do we partition stateful operators?
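The scale-out sequence above can be sketched end to end for a word counter. Assumptions, none taken from SEEP itself: tuples carry increasing timestamps, the checkpoint records the timestamp of the last tuple it reflects, the upstream buffer still holds every tuple sent since then, and keys are split by first letter into two replicas. `WordCounter`, `owner` and `scale_out` are hypothetical names.

```python
from typing import Dict, List, Tuple

Tup = Tuple[int, str]  # (timestamp, word)


class WordCounter:
    def __init__(self) -> None:
        self.counts: Dict[str, int] = {}
        self.last_ts = 0  # timestamp of the last tuple reflected in counts

    def process(self, t: Tup) -> None:
        ts, word = t
        self.counts[word] = self.counts.get(word, 0) + 1
        self.last_ts = ts

    def checkpoint(self) -> Tuple[Dict[str, int], int]:
        return dict(self.counts), self.last_ts


def owner(word: str) -> int:
    """Hypothetical key-range split: a..k -> replica 0, l..z -> replica 1."""
    return 0 if word[0] <= "k" else 1


def scale_out(backup: Tuple[Dict[str, int], int],
              upstream_buffer: List[Tup]) -> List[WordCounter]:
    counts, ckpt_ts = backup
    replicas = [WordCounter(), WordCounter()]
    # partition + restore: each replica gets only the keys it will own
    for word, c in counts.items():
        replicas[owner(word)].counts[word] = c
    for r in replicas:
        r.last_ts = ckpt_ts
    # replay: upstream re-sends tuples not yet reflected in the checkpoint
    for t in upstream_buffer:
        if t[0] > ckpt_ts:
            replicas[owner(t[1])].process(t)
    return replicas


stream: List[Tup] = [(1, "computer"), (2, "laboratory"), (3, "cambridge"), (4, "logic")]
a = WordCounter()
for t in stream[:2]:
    a.process(t)
backup = a.checkpoint()  # checkpoint + backup to the upstream node after t=2
for t in stream[2:]:
    a.process(t)         # A keeps running, then becomes a bottleneck

r0, r1 = scale_out(backup, upstream_buffer=stream)
print(r0.counts, r1.counts)
```

The merged state of the two replicas equals what the single operator computed, which is the correctness condition scale out has to preserve.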
Partitioning Stateful Operators
1. Processing state is modelled as a (key, value) dictionary
2. State is partitioned according to the key k of tuples
3. Tuples are routed to the correct operator according to k
[Figure: word count example. A splitter receives t=1 “computer”, t=2 “laboratory”, t=3 “cambridge” and, using its routing state (a–k) → A, (l–z) → A’, sends key=c tuples to counter A and key=l tuples to counter A’. At t=3 the processing state of A is (c, computer:1, cambridge:1); at t=2 the processing state of A’ is (l, laboratory:1). The figure labels routing state, buffer state and processing state.]
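The splitter in the example can be sketched with routing state as a list of inclusive key ranges. This assumes, as the example suggests, that the key of a word is its first letter (so keys are single lower-case letters); `ROUTING`, `route` and `run` are hypothetical names.

```python
from typing import Dict, List, Tuple

# Routing state: inclusive key ranges mapped to downstream operators.
ROUTING: List[Tuple[str, str, str]] = [("a", "k", "A"), ("l", "z", "A'")]


def route(key: str) -> str:
    for low, high, op in ROUTING:
        if low <= key <= high:
            return op
    raise KeyError(f"no operator owns key {key!r}")


def run(words: List[str]) -> Dict[str, Dict[str, Dict[str, int]]]:
    """Splitter + counters; processing state per operator is key -> {word: count}."""
    state: Dict[str, Dict[str, Dict[str, int]]] = {op: {} for _, _, op in ROUTING}
    for word in words:
        key = word[0]  # key k of the tuple: its first letter
        per_key = state[route(key)].setdefault(key, {})
        per_key[word] = per_key.get(word, 0) + 1
    return state


print(run(["computer", "laboratory", "cambridge"]))
# {'A': {'c': {'computer': 1, 'cambridge': 1}}, "A'": {'l': {'laboratory': 1}}}
```

Because every tuple with a given key reaches the same operator, each counter's state stays complete for the keys it owns.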
Passive Fault-Tolerance Model
• recreate operator state by replaying tuples after failure:
– upstream backup: downstream operators send ACKs upstream for the tuples they have processed
• may result in long recovery times due to large buffers:
– system is reprocessing streams after failure → inefficient
[Figure: pipeline of operators A, B, C, D; data flows downstream and ACKs flow upstream]
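A minimal sketch of upstream backup, assuming timestamped tuples and a single downstream word counter; `UpstreamBackup` and its methods are hypothetical names. Since the counter's state depends on every tuple it has seen, it can never safely ACK, so the whole stream stays buffered and recovery means reprocessing all of it, which is the inefficiency noted above.

```python
from collections import deque
from typing import Any, Callable, Deque, Dict, Tuple

Downstream = Callable[[int, Any], None]


class UpstreamBackup:
    """Output buffer of an upstream operator under upstream backup."""

    def __init__(self) -> None:
        self.buffer: Deque[Tuple[int, Any]] = deque()

    def send(self, ts: int, payload: Any, downstream: Downstream) -> None:
        self.buffer.append((ts, payload))  # keep a copy until it is acked
        downstream(ts, payload)

    def ack(self, ts: int) -> None:
        """Downstream confirms tuples up to ts are no longer needed."""
        while self.buffer and self.buffer[0][0] <= ts:
            self.buffer.popleft()

    def replay(self, downstream: Downstream) -> int:
        """After a failure, re-send every unacked tuple; returns the count."""
        for ts, payload in self.buffer:
            downstream(ts, payload)
        return len(self.buffer)


counts: Dict[str, int] = {}


def counter(ts: int, word: str) -> None:
    counts[word] = counts.get(word, 0) + 1


up = UpstreamBackup()
for ts, w in enumerate(["the", "with", "the", "data", "the"], start=1):
    up.send(ts, w, counter)

before = dict(counts)
counts.clear()                     # downstream fails: its state is lost
replayed = up.replay(counter)      # state is recreated by reprocessing
print(replayed, counts == before)  # 5 True
```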
Recovering using State Management (R+SM)
• Benefit from state management primitives:
– use state periodically backed up on the upstream node to recover faster
– trim buffers at the backup node
– same primitives as in scale out
• state is restored and unprocessed tuples are replayed from the buffer
→ same primitives enable parallel recovery
[Figure: failed operator A is restored from its upstream backup; with parallel recovery it is replaced by A and A’]
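Recovery with state management can be sketched by combining a checkpoint with buffer trimming. Assumptions: one upstream node that both buffers output tuples and stores the downstream checkpoint, increasing timestamps, and a word counter downstream; `WordCounter`, `UpstreamNode`, `store_checkpoint` and `recover` are hypothetical names. The buffer only has to hold tuples newer than the last checkpoint, so far fewer tuples are replayed than under pure upstream backup.

```python
from typing import Dict, List, Optional, Tuple

Tup = Tuple[int, str]
Checkpoint = Tuple[Dict[str, int], int]  # (counts, timestamp of last tuple)


class WordCounter:
    def __init__(self, counts: Optional[Dict[str, int]] = None, last_ts: int = 0) -> None:
        self.counts: Dict[str, int] = counts if counts is not None else {}
        self.last_ts = last_ts

    def process(self, t: Tup) -> None:
        ts, word = t
        self.counts[word] = self.counts.get(word, 0) + 1
        self.last_ts = ts

    def checkpoint(self) -> Checkpoint:
        return dict(self.counts), self.last_ts


class UpstreamNode:
    """Upstream operator that buffers output and stores downstream checkpoints."""

    def __init__(self) -> None:
        self.buffer: List[Tup] = []
        self.backup: Checkpoint = ({}, 0)

    def send(self, t: Tup, op: WordCounter) -> None:
        self.buffer.append(t)
        op.process(t)

    def store_checkpoint(self, ckpt: Checkpoint) -> None:
        self.backup = ckpt
        # Trim: tuples already reflected in the checkpoint are never replayed.
        self.buffer = [t for t in self.buffer if t[0] > ckpt[1]]

    def recover(self) -> WordCounter:
        counts, ts = self.backup
        op = WordCounter(dict(counts), ts)  # restore from the backed-up state
        for t in self.buffer:               # replay only unprocessed tuples
            op.process(t)
        return op


up, a = UpstreamNode(), WordCounter()
stream = list(enumerate(["the", "with", "the", "data", "the"], start=1))
for t in stream[:3]:
    up.send(t, a)
up.store_checkpoint(a.checkpoint())  # periodic checkpoint after t=3
for t in stream[3:]:
    up.send(t, a)

expected = dict(a.counts)
recovered = up.recover()  # A fails; a new operator takes its place
print(len(up.buffer), recovered.counts == expected)  # 2 True
```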
State Management in Action: SEEP
(1) dynamic scale out: detect bottleneck, add new parallelised operator
(2) failure recovery: detect failure, replace with new operator
[Figure: SEEP architecture with a query manager (receives queries), deployment manager, VM pool, bottleneck detector with scaling policy (fed by EC2 stats), scale out coordinator, fault detector (fed by faults) and recovery coordinator]
Dynamic Scale Out: Detecting bottlenecks
[Figure: VMs send CPU utilisation reports (35%, 85%, 30%) to the bottleneck detector, which keeps a logical infrastructure view; the operator at 85% is the bottleneck]
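A threshold-based bottleneck detector can be sketched as follows. The 70% default mirrors the scale out policy quoted in the evaluation; requiring several consecutive reports above the threshold before flagging an operator is an assumption of this sketch, not something the slides state. `BottleneckDetector` is a hypothetical name.

```python
from collections import deque
from typing import Deque, Dict, List


class BottleneckDetector:
    """Flags operators whose CPU utilisation stays above a threshold.

    threshold=70.0 mirrors the 70% scale out policy in the evaluation;
    requiring `window` consecutive reports is a hypothetical guard against
    reacting to a single spike.
    """

    def __init__(self, threshold: float = 70.0, window: int = 3) -> None:
        self.threshold = threshold
        self.window = window
        self.history: Dict[str, Deque[float]] = {}

    def report(self, op: str, cpu_percent: float) -> None:
        self.history.setdefault(op, deque(maxlen=self.window)).append(cpu_percent)

    def bottlenecks(self) -> List[str]:
        return sorted(
            op for op, h in self.history.items()
            if len(h) == self.window and min(h) > self.threshold
        )


det = BottleneckDetector()
for _ in range(3):
    for op, cpu in {"A": 35.0, "B": 85.0, "C": 30.0}.items():
        det.report(op, cpu)
print(det.bottlenecks())  # ['B']
```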
The VM Pool: Adding operators
• problem: allocating new VMs takes minutes...
[Figure: the bottleneck detector and fault detector, driven by monitoring information, take VMs from a virtual machine pool (VM1, VM2, …); new VMs are provisioned from the cloud provider (order of minutes) and added to the pool, whose size is dynamic]
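The VM pool idea, handing out pre-provisioned VMs immediately and refilling the pool separately, can be sketched as below. `provision` is a stand-in for the slow cloud provider call and is synchronous here for simplicity; a real system would refill in the background. `VMPool`, `replenish`, `acquire` and `resize` are hypothetical names.

```python
from collections import deque
from itertools import count
from typing import Callable, Deque


class VMPool:
    """Keeps `target_size` VMs provisioned ahead of time."""

    def __init__(self, provision: Callable[[], str], target_size: int = 2) -> None:
        self.provision = provision  # slow cloud call (minutes in practice)
        self.target_size = target_size
        self.pool: Deque[str] = deque()
        self.replenish()

    def replenish(self) -> None:
        # In a real system this would run in the background.
        while len(self.pool) < self.target_size:
            self.pool.append(self.provision())

    def acquire(self) -> str:
        """Fast path: hand out a pre-provisioned VM; slow path if empty."""
        return self.pool.popleft() if self.pool else self.provision()

    def resize(self, new_size: int) -> None:
        """Dynamic pool size: grow or shrink the number of spare VMs."""
        self.target_size = new_size
        while len(self.pool) > new_size:
            self.pool.pop()  # a real system would also release the VM
        self.replenish()


ids = count(1)
pool = VMPool(lambda: f"VM{next(ids)}")
vm = pool.acquire()         # bottleneck or fault detector needs a VM now
pool.replenish()            # top the pool back up
print(vm, list(pool.pool))  # VM1 ['VM2', 'VM3']
```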
Experimental Evaluation
• Goals:
– investigate the effectiveness of the scale out mechanism
– measure recovery time after failure using R+SM
– measure the overhead of state management
• Scalable and Elastic Event Processing (SEEP):
– implemented in Java; Storm-like data flow model
• Sample queries + workload
– Linear Road Benchmark (LRB) to evaluate scale out [VLDB’04]
• provides an increasing stream workload over time
• query with 8 operators, 3 of which are stateful; SLA: results in < 5 s
– Windowed word count query (2 ops) to evaluate fault tolerance
• induce failure to observe performance impact
• Deployment on Amazon AWS EC2
– sources and sinks on high-memory double extra large instances
– operators on small instances
Scale Out: LRB Workload
• scales to load factor L=350 with 50 VMs on Amazon EC2 (automated query parallelisation, scale out policy at 70%)
• L=512 is the highest result [VLDB’12] (hand-crafted query on a cluster)
• scale out leads to latency peaks, but latency remains within the LRB SLA
→ SEEP scales out to increasing workload in the Linear Road Benchmark
Conclusions
• Stream processing will grow in importance:
– handles the data deluge
– enables real-time response and decision making
• Integrated approach for scale out and failure recovery:
– operator state as an independent entity
– primitives and mechanisms
• Efficient approach, extensible to additional operators:
– effectively applied to Amazon EC2 running LRB
– parallel recovery
