ScaleJoin: a Deterministic, Disjoint-Parallel and Skew-Resilient Stream Join
Vincenzo Gulisano, Yiannis Nikolakopoulos, Marina Papatriantafilou, Philippas Tsigas
2015-10-31
Chalmers University of Technology
Agenda
• What is a stream join?
• Which are the challenges of a parallel stream join?
• Why ScaleJoin?
• How well does ScaleJoin address the challenges of stream joins?
• Conclusions
Motivation
Applications in sensor networks and cyber-physical systems:
• Large and fluctuating volumes of data are generated continuously
These applications demand:
• Continuous processing of data streams
• In a real-time fashion
Store-then-process is not feasible!
What is a stream join?
Data stream: unbounded sequence of tuples
[Figure: two streams, R and S, each carrying tuples t1, t2, t3, t4, …; a sliding window of size WS over each stream (windows WR and WS); join predicate P]
Why parallel stream joins?
• WS = 600 seconds
• R receives 500 tuples/second
• S receives 500 tuples/second
• WR will contain 300,000 tuples
• WS will contain 300,000 tuples
• Each new tuple from R gets compared with
all the tuples in WS
• Each new tuple from S gets compared with
all the tuples in WR
… 300,000,000 comparisons/second!
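The arithmetic behind these figures can be checked in a few lines of Python. The variable names are illustrative and do not come from the slides.

```python
# Back-of-the-envelope load of a sequential stream join.
window_size_s = 600   # window size WS, in seconds
rate_r = 500          # tuples/second received on R
rate_s = 500          # tuples/second received on S

# Tuples held in each window once the window is full.
tuples_in_wr = rate_r * window_size_s
tuples_in_ws = rate_s * window_size_s

# Each new R tuple is compared with all of WS, each new S tuple with all of WR.
comparisons_per_second = rate_r * tuples_in_ws + rate_s * tuples_in_wr

print(tuples_in_wr, tuples_in_ws, comparisons_per_second)
# prints: 300000 300000 300000000
```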
[Figure: streams R and S with their sliding windows WR and WS]
Which are the challenges of a parallel stream join?
• Scalability
• High throughput
• Low latency
• Disjoint parallelism
• Skew resilience
• Determinism
The 3-step procedure (sequential stream join)
For each incoming tuple t:
1. compare t with all tuples in opposite window given predicate P
2. add t to its window
3. remove stale tuples from t’s window
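The three steps can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the name `make_join`, the `(timestamp, payload)` tuple layout and the staleness rule (a tuple is stale once it is at least one window size older than t) are all assumptions.

```python
from collections import deque

def make_join(window_size, predicate):
    """Return a function that processes one tuple with the 3-step procedure.

    Tuples are (timestamp, payload) pairs (an assumed layout); `stream` is
    "R" or "S". Results are (r_tuple, s_tuple) pairs.
    """
    windows = {"R": deque(), "S": deque()}

    def process(stream, t):
        # Step 1: compare t with all tuples in the opposite window.
        if stream == "R":
            results = [(t, s) for s in windows["S"] if predicate(t, s)]
        else:
            results = [(r, t) for r in windows["R"] if predicate(r, t)]
        # Step 2: add t to its own window.
        windows[stream].append(t)
        # Step 3: remove stale tuples from t's window (assumed staleness rule).
        own = windows[stream]
        while own and own[0][0] <= t[0] - window_size:
            own.popleft()
        return results

    return process

join = make_join(window_size=10, predicate=lambda r, s: r[1] == s[1])
print(join("R", (1, "a")))   # []
print(join("S", (2, "a")))   # [((1, 'a'), (2, 'a'))]
```

As on the slide, only t's own window is purged in step 3, so the sketch relies on tuples of both streams arriving at comparable rates.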
[Figure: producers Prod R and Prod S add tuples to R and S; a single PU runs the join; Cons consumes the results]
We assume each producer delivers tuples in timestamp order.
The 3-step procedure, is it enough?
Challenges: Scalability, High throughput, Low latency, Disjoint parallelism, Skew resilience, Determinism
[Figure: two runs over streams R and S with windows WR and WS, in which the same tuples (t1, t2, t3, t4) arrive in different orders]
Enforcing determinism in sequential stream joins
• Next tuple to process = earliest(tS,tR)
• The earliest(tS,tR) tuple is referred to as the next ready tuple
• Process ready tuples in timestamp order → Determinism
[Figure: a PU choosing between the head tuples tS and tR of its two inputs]
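Picking the next ready tuple as earliest(tS, tR) can be sketched as below. The function name, the `(timestamp, payload)` layout and the tie-breaking rule are assumptions made for illustration; the sketch also relies on each producer delivering tuples in timestamp order.

```python
from collections import deque

def next_ready_tuple(queue_r, queue_s):
    """Pop and return (stream, tuple) for the earliest head tuple, or None.

    Both queues hold (timestamp, payload) tuples in timestamp order. Nothing
    is ready while either queue is empty, because an earlier tuple could
    still arrive on the empty one.
    """
    if not queue_r or not queue_s:
        return None
    # Assumption: ties go to R, so the processing order is reproducible.
    if queue_r[0][0] <= queue_s[0][0]:
        return "R", queue_r.popleft()
    return "S", queue_s.popleft()

queue_r = deque([(1, "r1"), (4, "r2")])
queue_s = deque([(2, "s1"), (3, "s2")])
order = []
while (ready := next_ready_tuple(queue_r, queue_s)) is not None:
    order.append(ready)
print(order)
# prints: [('R', (1, 'r1')), ('S', (2, 's1')), ('S', (3, 's2'))]
```

The tuple (4, "r2") stays queued: it only becomes ready once S delivers a tuple with a timestamp of 4 or later.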
Deterministic 3-step procedure
Pick the next ready tuple t:
1. compare t with all tuples in opposite window given predicate P
2. add t to its window
3. remove stale tuples from t’s window
[Figure: same setup as the sequential stream join: Prod R and Prod S add tuples to R and S; a single PU runs the join; Cons consumes the results]
Shared-nothing parallel stream join
(state-of-the-art)
[Figure: Prod R and Prod S feed PU1, PU2, …, PUN; the outputs of the PUs go through Merge to Cons]
Prod R: choose a PU, add tuple to PUi R
Prod S: choose a PU, add tuple to PUi S
Each PUi: pick the next ready tuple t:
1. compare t with all tuples in opposite window given P
2. add t to its window
3. remove stale tuples from t’s window
Merge: take the next ready output tuple
Cons: consume results
Challenges: Scalability, High throughput, Low latency, Disjoint parallelism, Skew resilience, Determinism
Shared-nothing parallel stream join
(state-of-the-art)
[Figure: the same setup (Prod R, Prod S, PU1, PU2, …, PUN, Merge, Cons), with the components communicating through queues via enqueue() and dequeue()]
From coarse-grained to fine-grained synchronization
[Figure: Prod R, Prod S, PU1, PU2, …, PUN and Cons]
ScaleGate
addTuple(tuple, sourceID)
Allows a tuple from sourceID to be merged by ScaleGate into the resulting timestamp-sorted stream of ready tuples.
getNextReadyTuple(readerID)
Provides readerID with the earliest ready tuple that it has not yet consumed.
https://github.com/dcs-chalmers/ScaleGate_Java
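To make the semantics of the two methods concrete, here is a single-threaded toy model in Python. It is not the Java implementation linked above and ignores concurrency entirely. The class name, the constructor arguments and the readiness rule (a tuple is ready once every source has delivered a tuple with an equal or later timestamp) are assumptions made for this sketch.

```python
import bisect
import itertools

class ToyScaleGate:
    """Single-threaded toy model of addTuple / getNextReadyTuple."""

    def __init__(self, source_ids, reader_ids):
        self._latest = {s: None for s in source_ids}  # latest timestamp per source
        self._cursor = {r: 0 for r in reader_ids}     # next position per reader
        self._tuples = []                             # sorted by (timestamp, arrival)
        self._arrival = itertools.count()

    def add_tuple(self, tup, source_id):
        # Each source is assumed to add its tuples in timestamp order.
        timestamp, payload = tup
        self._latest[source_id] = timestamp
        bisect.insort(self._tuples, (timestamp, next(self._arrival), payload))

    def get_next_ready_tuple(self, reader_id):
        # Assumed readiness rule: timestamp <= min over sources of the
        # latest timestamp that source has delivered.
        if any(ts is None for ts in self._latest.values()):
            return None
        threshold = min(self._latest.values())
        position = self._cursor[reader_id]
        if position < len(self._tuples) and self._tuples[position][0] <= threshold:
            self._cursor[reader_id] = position + 1
            timestamp, _, payload = self._tuples[position]
            return (timestamp, payload)
        return None

gate = ToyScaleGate(source_ids=["R", "S"], reader_ids=["PU1", "PU2"])
gate.add_tuple((1, "r1"), "R")
gate.add_tuple((5, "r2"), "R")
print(gate.get_next_ready_tuple("PU1"))  # None: S has not delivered anything yet
gate.add_tuple((3, "s1"), "S")
print(gate.get_next_ready_tuple("PU1"))  # (1, 'r1')
print(gate.get_next_ready_tuple("PU1"))  # (3, 's1')
print(gate.get_next_ready_tuple("PU1"))  # None: (5, 'r2') is not ready yet
print(gate.get_next_ready_tuple("PU2"))  # (1, 'r1'): every reader sees every tuple
```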
ScaleJoin
[Figure: Prod R and Prod S add tuples to SGin; PU1, PU2, …, PUN get ready input tuples from SGin; Cons gets the next ready output tuple from SGout]
Steps for each PU:
Get the next ready input tuple t from SGin
1. compare t with all tuples in opposite window given P
2. add t to its window in a round-robin fashion
3. remove stale tuples from t’s window
ScaleJoin (example)
[Figure: sequential stream join, where t4 is compared with t1, t2 and t3 in WR, versus ScaleJoin with 3 PUs, where each PU compares t4 with the one tuple (t1, t2 or t3) held in its own share of WR]
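The idea can be simulated sequentially in Python: every PU sees every ready tuple, but stores only its round-robin share of each window. The class `ToyPU`, the per-stream turn counter and the `(timestamp, value)` layout are assumptions for this sketch; the real PUs run as parallel threads.

```python
from collections import deque

class ToyPU:
    """One ScaleJoin processing unit, simulated sequentially."""

    def __init__(self, index, num_pus, window_size, predicate):
        self.index, self.num_pus = index, num_pus
        self.window_size, self.predicate = window_size, predicate
        self.windows = {"R": deque(), "S": deque()}
        self.seen = {"R": 0, "S": 0}  # tuples seen per stream, for round-robin

    def process(self, stream, t):
        # Step 1: compare t with this PU's share of the opposite window.
        if stream == "R":
            out = [(t, s) for s in self.windows["S"] if self.predicate(t, s)]
        else:
            out = [(r, t) for r in self.windows["R"] if self.predicate(r, t)]
        # Step 2: store t only when it is this PU's turn (round-robin).
        if self.seen[stream] % self.num_pus == self.index:
            self.windows[stream].append(t)
        self.seen[stream] += 1
        # Step 3: remove stale tuples from t's window.
        own = self.windows[stream]
        while own and own[0][0] <= t[0] - self.window_size:
            own.popleft()
        return out

def match(r, s):
    return r[1] == s[1]

# Ready tuples in timestamp order: three R tuples, then one S tuple.
ready = [("R", (1, 10)), ("R", (2, 20)), ("R", (3, 10)), ("S", (4, 10))]

pus = [ToyPU(i, 3, window_size=100, predicate=match) for i in range(3)]
per_pu = [[out for stream, t in ready for out in pu.process(stream, t)] for pu in pus]
print(per_pu)
# prints: [[((1, 10), (4, 10))], [], [((3, 10), (4, 10))]]
```

Because all PUs see the same tuples in the same order, they agree on whose turn it is without exchanging messages, and the union of their outputs equals the output of a single PU (`ToyPU(0, 1, ...)`) running the sequential join.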
ScaleJoin
[Figure: several physical producers (Prod S, Prod S, Prod R) add tuples to SGin; PU1, PU2, …, PUN get ready input tuples from SGin; Cons gets the next ready output tuple from SGout]
Steps for PUi:
Get the next ready input tuple t from SGin
1. compare t with all tuples in opposite window given P
2. add t to its window in a round-robin fashion
3. remove stale tuples from t’s window
Challenges: Scalability, High throughput, Low latency, Disjoint parallelism, Skew resilience, Determinism
Evaluation setup
• Common benchmark
• Implemented in Java
• Evaluation platform
– NUMA architecture: 2.6 GHz AMD Opteron 6230 (48 cores over 4 sockets), 64 GB of memory
– Architecture with Hyper-Threading: 2.0 GHz Intel Xeon E5-2650 (16 cores over 2 sockets), 64 GB of memory
[Figure: input streams R and S]
R: <timestamp,x,y,z> S: <timestamp,a,b,c,d>
P: a−10≤x≤a+10 AND b−10≤y≤b+10
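The benchmark schemas and the band-join predicate P translate directly into code. In this sketch the class names and the numeric field types are assumptions; the slide gives only the attribute names.

```python
from typing import NamedTuple

class RTuple(NamedTuple):
    timestamp: int
    x: float
    y: float
    z: float

class STuple(NamedTuple):
    timestamp: int
    a: float
    b: float
    c: float
    d: float

def predicate(r: RTuple, s: STuple) -> bool:
    """P: a-10 <= x <= a+10 AND b-10 <= y <= b+10."""
    return s.a - 10 <= r.x <= s.a + 10 and s.b - 10 <= r.y <= s.b + 10

print(predicate(RTuple(1, 5, 5, 0), STuple(2, 10, 0, 0, 0)))   # True
print(predicate(RTuple(1, 25, 5, 0), STuple(2, 10, 0, 0, 0)))  # False: x is outside a±10
```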
ScaleJoin scalability
[Figure: comparisons/second versus number of PUs]
ScaleJoin latency
[Figure: latency in milliseconds versus number of PUs]
ScaleJoin skew-resilience
[Figure: constant distinct rates with peaks]
Conclusions
• ScaleJoin: a Deterministic, Disjoint-Parallel and Skew-Resilient Stream Join
• Challenges of parallel stream joins: Scalability, High throughput, Low latency, Disjoint parallelism, Skew resilience, Determinism
• Fine-grained synchronization (ScaleGate)
• Up to 4 billion comparisons/second, with latency lower than 60 milliseconds
Thank you! Questions?

Editor's notes

  1. Welcome…
  2. Agenda for this talk.
  3. Since this talk is about stream joins, it is legitimate to start by asking ourselves: what is data streaming?
  4. This is the IEEE Big Data conference, so I bet you have heard about Big Data before. What we know is that we have applications … and store-then-process is not always an option. The traditional database approach … is replaced by data streaming: main memory, first the query and then the data, continuously.
  5. So, why do we need scalable parallelization approaches for stream joins? Present the example.
  6. OK then, why ScaleJoin? Let’s look in detail at the state of the art and at what we did.
  7. So, let’s see how stream joins are actually implemented. Example. But wait, what happens if we get the tuples in another order? Hmm…
  8. So, let’s see how stream joins are actually implemented. Example. But wait, what happens if we get the tuples in another order? Hmm…
  9. OK, so the deterministic 3-step procedure looks like this. Now let’s try to parallelize it, and let’s do it as it has been done before.
  10. OK, so the deterministic 3-step procedure looks like this. Now let’s try to parallelize it, and let’s do it as it has been done before. First, we need to do more operations, and this certainly affects the latency. But what’s worse is that we introduce a new bottleneck: the output thread and the ready tuples. And this actually breaks disjoint parallelism too… Finally, it is not really skew-resilient. So, what’s the problem? Are we doing it the wrong way, or are we forgetting something? Look at the data structures: we parallelized the computation, but what about the communication?
  11. The queues! We parallelized the computation but overlooked the communication. We are still using a queue with its enqueue and dequeue methods.
  12. Let’s be creative: let’s assume they actually share something more powerful, which lets them communicate and synchronize more efficiently. What do we want from such a communication and synchronization data structure?
  13. Then we can do something like this…
  14. Here we can discuss why this addresses the different challenges, one by one… It gets even better: you can even have multiple physical producers for S and R! And this is important, because in the real world it will be like that.
  15. OK, so this is the benchmark we used… Implemented in Java. And we evaluated it on 2 different systems (SAY WHY TWO SYSTEMS IF THEY ASK, OR JUST SAY IT?)…
  16. Here we want to check the number of comparisons per second sustained by ScaleJoin. After checking the number obtained for a single thread, we computed the expected maximum and then the observed numbers for 3 different window sizes. As you can see… up to 4 billion comparisons/second!
  17. This is the processing latency we get (in milliseconds). As you can see, even with 48 PUs (and notice that this means more threads than cores, since we also have injectors and receivers…) it is less than 60 milliseconds. Actually, when we do not spawn too many threads, we are talking about 30 milliseconds. It might seem counterintuitive that latency grows with the number of PUs, but that’s because of determinism!
  18. In this case we have two different rates for R and S (notice the multiple physical streams!) and then peaks over time. As you can see, comparisons of course increase when we have a peak, but nevertheless the overall work is very well balanced: the standard deviation among the PUs is less than 0.2% even during the spikes! Skew-resilience!