Berkeley's TelegraphCQ. Details about TelegraphCQ. Adaptivity: Eddies, SteMs and STAIRs. Routing Policy: Lottery Scheduling 1 Friday, July 15, 2011. Alberto Minetti, Advanced Data Management @ DISI, Università degli Studi di Genova
Data Stream Management System Continuous, unbounded, rapid, time-varying streams of data elements Occur in a variety of modern applications Network monitoring and traffic engineering Sensor networks, RFID tags Telecom call records Financial applications Web logs and click-streams Manufacturing processes DSMS = Data Stream Management System 2
The Beginning: Telegraph. Several continuous queries, several data streams. At the beginning in Java, then C-based using PostgreSQL. No distributed scheduling. The level of adaptivity does not adjust under overload. Ignores system resources. Data management fully in memory 3
Redesign: TelegraphCQ. Developed by UC Berkeley. Written in C/C++. OpenBSD License. Based on PostgreSQL sources. Current version: 0.6 (PostgreSQL 7.3.2). Project closed in 2006. Several points of interest and features. Software: http://telegraph.cs.berkeley.edu Papers: http://db.cs.berkeley.edu/telegraph Commercial spinoff: Truviso 4
TelegraphCQ Architecture. PostgreSQL backends: many TelegraphCQ front-ends, only one TelegraphCQ back-end. Front-end: forks for every client connection; doesn't hold streams; parses continuous queries into shared memory. Back-end: has an eddy; joins query plans together; can be shared among queries; puts results into shared memory 5
TelegraphCQ Architecture (diagram): the front end (Listener, Parser, Planner, Mini-Executor, Proxy) and the back end (CQEddy, Modules, Split, Scans) communicate through shared memory (query plan queue, control queue, query result queues, buffer pool); wrappers run in the TelegraphCQ Wrapper ClearingHouse; catalog and disk sit below. 6 Taken from Michael Franklin, UC Berkeley
Module Types. Input and caching (relations and streams): interface to external data sources; wrappers for HTML, XML, filesystem, P2P proxies; remote databases with caching support. Query execution: non-blocking versions of classical operators (selection, projection); SteMs, STAIRs. Adaptive routing: reoptimize the plan during execution; Eddies choose the route tuple by tuple; Juggle orders on the fly (by value or timestamp); Flux routes among the computers of a cluster 7
Fjords: Framework in Java for Operators on Remote Data Streams. Interconnects modules. Supports queues between modules. Non-blocking. Support for relations and streams 8
Streams in TelegraphCQ. Unarchived stream: never written to disk; lives in shared memory between executor and wrapper. Archived stream: also written to disk. Append-only method of sending tuples to the system: no update, insert, or delete; aggregate queries use windows. tcqtime of type TIMESTAMP for window queries, with the constraint TIMESTAMPCOLUMN 9
DDL: Create Stream 10

CREATE STREAM measurements (
    tcqtime   TIMESTAMP TIMESTAMPCOLUMN,
    stationid INTEGER,
    speed     REAL)
TYPE ARCHIVED;

CREATE STREAM tinydb (
    tcqtime     TIMESTAMP TIMESTAMPCOLUMN,
    light       REAL,
    temperature REAL)
TYPE UNARCHIVED;

DROP STREAM measurements;
Data Acquisition. Sources must identify themselves before sending data. Wrapper: user-defined functions describing how to process the data, running inside the Wrapper ClearingHouse process. Push sources: open a connection to TelegraphCQ. Pull sources: the wrapper opens the connection. Data from different wrappers can be merged into the same stream. Heartbeat: a punctuation tuple carrying no data, only a timestamp; once a punctuation tuple is seen, no data with earlier timestamps will arrive 11
Wrappers in the WCH. Non-process-blocking over network sockets (TCP/IP). The wrapper function is called when there is data on the socket, or when there is data in the buffer. The function returns one tuple at a time (classic iterator), as an array of PostgreSQL Datum values. Init(WrapperState*): allocate resources and state. Next(WrapperState*): tuples are returned in the WrapperState. Done(WrapperState*): free resources and destroy state. All within PostgreSQL's memory infrastructure 12
DDL: Create Wrapper 13

CREATE FUNCTION measurements_init(INTEGER)
RETURNS BOOLEAN
AS 'libmeasurements.so,measurements_init'
LANGUAGE 'C';

CREATE WRAPPER mywrapper (
    init=measurements_init,
    next=measurements_next,
    done=measurements_done);
HtmlGet and WebQueryServer. HtmlGet allows the user to execute a series of HTML GET or POST requests to arrive at a page, and then to extract data out of the page using a TElegraph Screen Scraper definition file. Once extracted, this data can be output to a CSV file. TESS, the TElegraph Screen Scraper, is part of the Telegraph Project at UC Berkeley: a program that takes data from web forms (like search engines or database queries) and turns it into a representation usable by a database query processor. http://telegraph.cs.berkeley.edu/tess/ 14
Self-monitoring capability. Three special streams expose information about system state. Support for introspective queries. Dynamic catalog, queried like any other stream. tcq_queries(tcqtime, qrynum, qid, kind, qrystr) tcq_operators(tcqtime, opnum, opid, numqrs, kind, qid, opstr, opkind, opdesc) tcq_queues(tcqtime, opid, qkind, kind) 15
Example 16 Welcome to psql 0.2, the PostgreSQL interactive terminal. # CREATE SCHEMA traffic; # CREATE STREAM traffic.measurements (stationid INT, speed REAL, tcqtime TIMESTAMP TIMESTAMPCOLUMN) TYPE ARCHIVED; # ALTER STREAM traffic.measurements ADD WRAPPER csvwrapper; $ cat tcpdump.log | source.pl localhost 5533 csvwrapper,network.tcpdum (source.pl is the default TelegraphCQ Perl script that simulates sources sending CSV data; 5533 is the default port)
Load Shedding. CREATE STREAM … TYPE UNARCHIVED ON OVERLOAD ???? The overload policies: BLOCK: block the source (default). DROP: drop tuples. KEEP COUNTS: keep the count of dropped tuples. REGHIST: build a fixed-grid histogram of shed tuples. MYHIST: build an MHIST (multidimensional histogram). WAVELET wavelet-params: build a wavelet histogram. SAMPLE: keep a reservoir sample 17
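The SAMPLE policy keeps a reservoir sample of the shed tuples. A minimal sketch of classic reservoir sampling (Algorithm R) is below; the function and its names are illustrative, not TelegraphCQ's actual code:

```python
import random

def reservoir_sample(stream, k, rng=random.Random(42)):
    """Keep a uniform random sample of k items from a stream of unknown length."""
    sample = []
    for i, tup in enumerate(stream):
        if i < k:
            sample.append(tup)        # fill the reservoir first
        else:
            j = rng.randrange(i + 1)  # keep the new tuple with probability k/(i+1)
            if j < k:
                sample[j] = tup       # evict a random resident tuple
    return sample

print(reservoir_sample(range(1000), 5))
```

Every tuple seen so far ends up in the sample with equal probability, which is why a single pass suffices even though the stream length is unknown in advance.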
Load Shedding: Summary Streams. For a stream named schema.stream, two streams are created automatically: schema.__stream_dropped and schema.__stream_kept. For WAVELET, MYHIST, REGHIST, and COUNTS the schema contains the summary data and the summary interval. For SAMPLE it is the same schema with an extra column __samplemult, recording the number of actual tuples each sample tuple represents 18
Querying TelegraphCQ: StreaQuel 19 Continuous queries over: standard relations inherited from PostgreSQL; windowed data streams (sliding, hopping, jumping). RANGE BY specifies the window size. SLIDE BY specifies the update rate. START AT specifies when the query will begin (optional). SELECT stream.color, COUNT(*) FROM stream [RANGE BY '9' SLIDE BY '1'] GROUP BY stream.color (diagram: a window sliding over the stream, emitting output at each slide) Adapted from Jarle Søberg
Querying TelegraphCQ: StreaQuel (2). wtime(*) returns the last timestamp in the window. Recursive queries use WITH [SQL 1999]. StreaQuel doesn't allow subqueries 20 SELECT S.a, wtime(*) FROM S [RANGE BY '10 seconds' SLIDE BY '10 seconds'], R [RANGE BY '10 seconds' SLIDE BY '10 seconds'] WHERE S.a = R.a;
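The RANGE BY / SLIDE BY semantics above can be sketched over timestamped tuples; the helper below is a hypothetical model, not StreaQuel's implementation. RANGE BY is the window width, SLIDE BY the update rate; when the two are equal, the window hops and partitions the stream.

```python
def windows(tuples, range_by, slide_by):
    """Yield (window_start, contents) for a window over (timestamp, value)
    tuples: range_by is the window width (RANGE BY), slide_by the update
    rate (SLIDE BY)."""
    if not tuples:
        return
    start, last = tuples[0][0], tuples[-1][0]
    while start <= last:
        # window covers timestamps in [start, start + range_by)
        contents = [v for (ts, v) in tuples if start <= ts < start + range_by]
        yield (start, contents)
        start += slide_by

stream = [(0, 'a'), (1, 'b'), (2, 'c'), (3, 'd'), (4, 'e')]
for w in windows(stream, range_by=2, slide_by=2):
    print(w)   # hopping window: RANGE BY == SLIDE BY partitions the stream
```

Setting slide_by smaller than range_by makes consecutive windows overlap, i.e. a sliding window.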
Net Monitor Windowed Queries 21
All active connections:
SELECT (CASE WHEN outgoing = true THEN src_ip ELSE dst_ip END) AS inside_ip,
       (CASE WHEN outgoing = true THEN dst_ip ELSE src_ip END) AS outside_ip,
       sum(bytes_sent) + sum(bytes_recv) AS bytes
FROM flow [RANGE BY $w SLIDE BY $w]
GROUP BY inside_ip, outside_ip;
100 sources with the most connections:
SELECT src_ip, wtime(*), COUNT(DISTINCT(dst_ip||dst_port)) AS fanout
FROM flow [RANGE BY $w SLIDE BY $w]
WHERE outgoing = false
GROUP BY src_ip ORDER BY fanout DESC LIMIT 100 PER WINDOW;
100 most significant sources of traffic:
SELECT sum(bytes_sent), src_ip, wtime(*) AS now
FROM flow [RANGE BY $w SLIDE BY $w]
WHERE outgoing = false
GROUP BY src_ip ORDER BY sum(bytes_sent) DESC LIMIT 100 PER WINDOW;
Taken from: İlkay Ozan Kay
Adaptive Query Processing: from Evolutionary to Revolutionary 22
Static Plans: Traditional DBMS
Late Binding: Dynamic QEP, Parametric, Competitive
Inter-Operator: Query Scrambling, Mid-query Reopt., Progressive Opt.
Intra-Operator: XJoin, DPHJ, Convergent QP
Per-Tuple: Eddies
Taken from: Amol Deshpande, Vijayshankar Raman
Adaptive Query Processing: Current Background. Several plans: parametric queries, continuous queries. Focus on incremental output. Complex queries (10-20 relations in a join). Data streams and asynchronous data: statistics not available. Interactive queries and user preferences. Sharing of state across several operators. XML data and text. Wide-area federations 23
Adaptive Query Processing: System R. Repeat: observe the environment daily/weekly (runstats); choose behaviour (optimizer); if the current plan is not the best plan (analyzer), actuate the new plan (executor). Cost-based optimization. Runstats-optimize-execute gives very high coarseness: weekly adaptivity! Goal: per-tuple adaptivity, merging the four phases (Measure, Analyze, Plan, Actuate). Taken from: Avnur, Hellerstein 24
TelegraphCQ Executor: Eddy. Idea taken from fluid mechanics: a continuously adaptive query processing mechanism. The eddy is a routing operator: it decides which modules a tuple must visit, and in what order. After a tuple has visited all modules, it can be output. The eddy sees tuples before and after each module (operator), closing the Measure-Analyze-Plan-Actuate loop per tuple 25 Taken from: Amol Deshpande, Vijayshankar Raman
Eddies: Correctness. Every tuple carries two bit vectors; every position corresponds to an operator. Ready: identifies whether the tuple is ready for that operator, so the eddy can decide which tuples to send where. Done: identifies whether the tuple has been processed by that operator, so the eddy avoids sending it to the same operator twice. When all Done bits are set, the tuple is output. Joined tuples get the bitwise OR of their inputs' Ready and Done vectors. Simple selections have Ready = complement(Done) 26
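For simple selections, where Ready is always the complement of Done, the bookkeeping can be sketched as follows (illustrative Python, not the C implementation; operators and rows are made up):

```python
import random

def eddy_route(tuples, operators, rng=random.Random(0)):
    """Route each tuple through selection operators in an arbitrary order,
    tracking Done bits; output the tuple when all Done bits are set."""
    n = len(operators)
    out = []
    for tup in tuples:
        done = [False] * n
        alive = True
        while alive and not all(done):
            # Ready = complement(Done): any not-yet-done operator is eligible
            i = rng.choice([k for k in range(n) if not done[k]])
            if operators[i](tup):
                done[i] = True   # predicate passed: set the Done bit
            else:
                alive = False    # predicate failed: drop the tuple
        if alive:
            out.append(tup)      # all Done bits set -> output
    return out

ops = [lambda t: t['a'] > 10, lambda t: t['b'] < 15, lambda t: t['c'] == 15]
rows = [{'a': 15, 'b': 0, 'c': 15}, {'a': 5, 'b': 0, 'c': 15}]
print(eddy_route(rows, ops))  # only the first row satisfies all predicates
```

The routing order here is random; a real eddy picks it according to its routing policy, but the Ready/Done bits guarantee correctness for any order.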
Eddies: Simple Selection Routing 27 SELECT * FROM S WHERE S.a > 10 AND S.b < 15 AND S.c = 15 (diagram: the eddy routes a tuple with a=15, b=0, c=15 through the selections σa, σb, σc, flipping its Ready/Done bits as each predicate passes) Adapted from Manuel Hertlein
Relation Binary Join: R >< S >< T 28. Join order (R >< S) >< T works fine with direct access to the data (index or sequential scan). Taken from Jarle Søberg
Stream Binary Join: R >< S >< T 29. Join order (R >< S) >< T; but if the data are pushed by sources, blocking or dropping some tuples is inevitable! Taken from Jarle Søberg
Stream Binary Join: R >< S >< T 30. On-the-fly optimization is necessary: streams change often, and reoptimization takes a lot of time. Not dynamic enough! Taken from Jarle Søberg
Stream Binary Join: Eddies 31. Using an eddy (the initial behaviour of Telegraph): tuple-based adaptivity considers dynamic changes in the stream. Taken from Jarle Søberg
Eddies: Join Scheduling Problem 32. Scheduling on join selectivity alone doesn't work. Example: |S >< E| >> |E >< C|; E and C arrive early, S is delayed. Taken from Amol Deshpande
33 |S >< E| >> |E >< C|; E and C arrive early, S is delayed. The eddy decides to route E to E >< C. Once the prefix S0 has been sent and received, the observed sizes suggest that S >< E is the better option: the eddy learns the correct sizes, but too late! Taken from Amol Deshpande
34 State got embedded as a result of the earlier routing decisions: the E tuples already sit in the E >< C hash tables, so the query is executed using the worse plan. Too late! Taken from Amol Deshpande
STAIR: Storage, Transformation and Access for Intermediate Results 35. Exposes the internal state of joins to the eddy and provides primitives for state management: demotion and promotion, plus operations for insertion (build) and lookup (probe). Taken from Amol Deshpande
STAIR: Demotion 36. Demoting e2c1 to e2 is a simple projection: it can be thought of as undoing work. Adapted from Amol Deshpande
STAIR: Promotion 37. Promoting e1 using E >< C. Two arguments: the tuple to be promoted, and a join to be used to promote it. It can be thought of as precomputation of work. Adapted from Amol Deshpande
Demotion OR Promotion 38 Taken from Lifting the Burden of History from Adaptive Query Processing Amol Deshpande and Joseph M. Hellerstein
Demotion AND Promotion Taken from Lifting the Burden of History from Adaptive Query Processing Amol Deshpande and Joseph M. Hellerstein 39
40 |S >< E| >> |E >< C|; E and C arrive early, S is delayed. The eddy first decides to route E to E >< C; after seeing S0, it learns the correct selectivities and decides to migrate E by promoting it using E >< C. Adapted from Amol Deshpande
41 |S >< E| >> |E >< C|; E and C arrive early, S is delayed. After the migration, the remaining tuples S - S0 are routed against the promoted state, producing (S - S0) >< E >< C directly. Adapted from Amol Deshpande
42 The result is the UNION of S0 >< E >< C from the old routing and (S - S0) >< E >< C from the new one: most of the data is processed using the correct plan. Adapted from Amol Deshpande
STAIR: Correctness. Theorem [3.1]: An eddy with STAIRs always produces the correct query result in spite of arbitrary applications of the promotion and demotion operations. STAIRs will produce every result tuple, and there will be no spurious duplicates 43. Taken from Lifting the Burden of History from Adaptive Query Processing, Amol Deshpande and Joseph M. Hellerstein
State Module (SteM). A kind of temporary data repository: a half-join operator that keeps homogeneous tuples. The state inside the operators is decision-independent. Supports the operations insertion (build), lookup (probe), and deletion (eviction) [useful for windows]. Similar to an MJoin but more adaptive. Allows sharing state with other continuous queries; but since intermediate results are not stored, computation cost can increase significantly 44
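The three SteM operations can be sketched as a hash table keyed on the join attribute (an illustrative model with made-up names, not the actual module):

```python
from collections import defaultdict

class SteM:
    """Half-join state: homogeneous tuples hashed on one attribute."""
    def __init__(self, key):
        self.key = key
        self.table = defaultdict(list)

    def build(self, tup):
        """Insertion: add a tuple under its join-attribute value."""
        self.table[tup[self.key]].append(tup)

    def probe(self, tup):
        """Lookup: return the stored tuples matching tup's join attribute."""
        return list(self.table.get(tup[self.key], []))

    def evict(self, pred):
        """Deletion: drop tuples matching pred, e.g. expired window tuples."""
        for k in list(self.table):
            self.table[k] = [t for t in self.table[k] if not pred(t)]

stem = SteM('a')
stem.build({'a': 1, 'ts': 10})
stem.build({'a': 1, 'ts': 20})
print(stem.probe({'a': 1}))         # both tuples match on a = 1
stem.evict(lambda t: t['ts'] < 15)  # window expiry: drop tuples older than 15
print(stem.probe({'a': 1}))
```

A join R >< S becomes two SteMs: each arriving R tuple is built into the R SteM and probed against the S SteM, and symmetrically for S, which is what makes the state decision-independent.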
Eddies Join with SteMs 45. More adaptivity: the eddy knows the half-joins. Different access methods: index access, scan access. Can simulate several kinds of join. Under overload? Hash join (fast!), index join (memory-limited). Join family? Hash join (equi-join), B-tree join (<, <=, >). A parametric query can be thought of as a join. Adapted from Jarle Søberg
SteMs: Correctness 46. Correctness problem: possible duplicates! Solution: a globally unique sequence number; only the younger tuple may probe. Taken from Jarle Søberg
SteMs: Sliding Window 47 SELECT * FROM Str [RANGE BY 5 Second SLIDE BY 2 Second], Relation WHERE Relation.a = Str.a The SteM keeps the state for the sliding window (eviction). At time 18:49:40, what happens at 18:49:42? Instead of rebuilding the whole hash table, it removes only the old tuples (eviction of those before 18:49:37) and adds the new ones.
Binary Join, STAIR, SteM Comparison 48 SELECT * FROM customer c, orders o, lineitem l WHERE c.custkey = o.custkey AND o.orderkey = l.orderkey AND c.nationkey = 1 AND c.acctbal > 9000 AND l.shipdate > date '1996-01-01' lineitem arrives with ascending shipdate; initial routing (O >< L) >< C. Recomputation is necessary when not adaptive. Taken from Lifting the Burden of History from Adaptive Query Processing, Amol Deshpande and Joseph M. Hellerstein
Eddies: Routing Policy. How to choose the best plan? Via routing: every tuple is routed individually, and the routing policy determines the system's efficiency. The eddy has a tuple buffer with priorities: tuples start with low priority, and get a higher priority when exiting an operator. A tuple is sent to the output as early as possible, which avoids system memory congestion and keeps memory consumption low 49
Eddies' Routing Policy: (old) Back-Pressure Approach. Naive: route to the quick operator first 50 sel(s1) = sel(s2), cost(s2) = 5, cost(s1) changes; cost(s1) = cost(s2), sel(s2) = 50%, sel(s1) changes. Taken from: Avnur, Hellerstein
Eddies' Routing Policy: Lottery Scheduling. Waldspurger & Weihl, 1994: an algorithm for scheduling shared resources that can « rapidly focus available resources ». Every operator begins with N tickets. An operator receives another ticket when it takes a tuple: this promotes operators that consume tuples fast. An operator loses a ticket when it returns a tuple: this promotes operators with low selectivity, i.e. operators that return few tuples after processing many. When two operators compete for a tuple, the tuple is assigned to the lottery-winning operator. Never leave an operator without tickets, plus some random exploration 51
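The ticket dynamics can be sketched as follows (an illustrative simulation with made-up operator names and selectivities, not the TelegraphCQ code): tickets rise when an operator consumes a tuple and fall when it returns one, so low-selectivity operators accumulate tickets and win more lotteries.

```python
import random

def lottery_pick(tickets, rng):
    """Hold a lottery among competing operators, weighted by ticket count."""
    draw = rng.randrange(sum(tickets.values()))
    for op, t in tickets.items():
        if draw < t:
            return op
        draw -= t

rng = random.Random(1)
tickets = {'s1': 100, 's2': 100}      # every operator starts with N tickets
selectivity = {'s1': 0.9, 's2': 0.1}  # fraction of processed tuples returned

for _ in range(10000):
    op = lottery_pick(tickets, rng)
    tickets[op] += 1                  # +1 ticket: the operator took a tuple
    if rng.random() < selectivity[op] and tickets[op] > 1:
        tickets[op] -= 1              # -1 ticket: it returned a tuple
        # (never drop an operator to zero tickets)

print(tickets)
```

Over time the low-selectivity operator s2 ends up with far more tickets than s1, so most tuples are routed to the operator that filters best, which is exactly the "rapidly focus available resources" behaviour.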
Eddies' Routing Policy: Lottery Scheduling. Lottery scheduling is better than back-pressure 52 cost(s1) = cost(s2), sel(s2) = 50%, sel(s1) changes. Taken from: Avnur, Hellerstein
Experiment 53 Stream: x with mean 40 and standard deviation 10
54 Experiment: Stream variation Stream: x with mean 10 and standard deviation 0
55 Experiment: Stream variation (2) Stream: x with mean 10 and standard deviation 0
Other Works. Distributed Eddies. Freddies: DHT-Based Adaptive Query Processing via Federated Eddies. Content-based Routing. Partial Results for Online Query Processing. Flux: An Adaptive Partitioning Operator for Continuous Query Systems. Java Support for Data-Intensive Systems: Experiences Building the Telegraph Dataflow System. Ripple Join for Online Aggregation. Highly Available, Fault-Tolerant, Parallel Dataflows 56

Alice data acquisitionAlice data acquisition
Alice data acquisition
 
AndreaPetrucci_ACAT_2007
AndreaPetrucci_ACAT_2007AndreaPetrucci_ACAT_2007
AndreaPetrucci_ACAT_2007
 
Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015
Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015
Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015
 
Dataservices: Processing (Big) Data the Microservice Way
Dataservices: Processing (Big) Data the Microservice WayDataservices: Processing (Big) Data the Microservice Way
Dataservices: Processing (Big) Data the Microservice Way
 
Stream and Batch Processing in the Cloud with Data Microservices
Stream and Batch Processing in the Cloud with Data MicroservicesStream and Batch Processing in the Cloud with Data Microservices
Stream and Batch Processing in the Cloud with Data Microservices
 
Porting a Streaming Pipeline from Scala to Rust
Porting a Streaming Pipeline from Scala to RustPorting a Streaming Pipeline from Scala to Rust
Porting a Streaming Pipeline from Scala to Rust
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache Flink
 
Introduction to ns3
Introduction to ns3Introduction to ns3
Introduction to ns3
 

Más de Alberto Minetti

Gnutella Italian Printable
Gnutella Italian PrintableGnutella Italian Printable
Gnutella Italian PrintableAlberto Minetti
 
Development and analysis of a virtual keyboard optimized (Italian)
Development and analysis of a virtual keyboard optimized (Italian)Development and analysis of a virtual keyboard optimized (Italian)
Development and analysis of a virtual keyboard optimized (Italian)Alberto Minetti
 
High Level Synthesis Using Esterel
High Level Synthesis Using EsterelHigh Level Synthesis Using Esterel
High Level Synthesis Using EsterelAlberto Minetti
 

Más de Alberto Minetti (7)

Index for meshes 3d
Index for meshes 3dIndex for meshes 3d
Index for meshes 3d
 
Moodle for teachers
Moodle for teachersMoodle for teachers
Moodle for teachers
 
Gnutella Italian Printable
Gnutella Italian PrintableGnutella Italian Printable
Gnutella Italian Printable
 
Development and analysis of a virtual keyboard optimized (Italian)
Development and analysis of a virtual keyboard optimized (Italian)Development and analysis of a virtual keyboard optimized (Italian)
Development and analysis of a virtual keyboard optimized (Italian)
 
Inferno Limbo Italian
Inferno Limbo ItalianInferno Limbo Italian
Inferno Limbo Italian
 
Telegraph Cq Italian
Telegraph Cq ItalianTelegraph Cq Italian
Telegraph Cq Italian
 
High Level Synthesis Using Esterel
High Level Synthesis Using EsterelHigh Level Synthesis Using Esterel
High Level Synthesis Using Esterel
 

Berkeley’s TelegraphCQ Details

  • 1. Berkeley’s TelegraphCQ Details about TelegraphCQ Adaptivity: Eddies, SteMs and STAIRs Routing Policy: Lottery Scheduling 1 Friday, July 15, 2011 Alberto Minetti Advanced Data Management @ DISI Università degli Studi di Genova
  • 2. Data Stream Management System Continuous, unbounded, rapid, time-varying streams of data elements Occur in a variety of modern applications Network monitoring and traffic engineering Sensor networks, RFID tags Telecom call records Financial applications Web logs and click-streams Manufacturing processes DSMS = Data Stream Management System 2
  • 3. The Beginning: Telegraph Several continuous queries Several data streams At the beginning in Java Then C-based using PostgreSQL No distributed scheduling Level of adaptivity doesn’t change under overload Ignores system resources Data management fully in memory 3
  • 4. Redesign: TelegraphCQ Developed by Berkeley University Written in C/C++ OpenBSD License Based on PostgreSQL sources Current Version: 0.6 (PostgreSQL 7.3.2) Project closed in 2006 Several points of interest and features Software http://telegraph.cs.berkeley.edu Papers http://db.cs.berkeley.edu/telegraph Commercial spinoff Truviso 4
  • 5. TelegraphCQ Architecture PostgreSQL backends Many TelegraphCQ front-ends Only one TelegraphCQ back-end Front-end: forks for every client connection Doesn’t hold streams Parses continuous queries into shared memory Back-end: has an Eddy Joins query plans together Can be shared among queries Puts results into shared memory 5
  • 6. TelegraphCQ Architecture (diagram): shared memory holds the query plan queue, eddy control queue, query result queues and buffer pool; the TelegraphCQ front end (listener, parser, planner, mini-executor, proxy, catalog) communicates through it with the TelegraphCQ back ends (CQEddy, modules, splits, scans) and with the TelegraphCQ Wrapper ClearingHouse (wrappers, disk). 6 Taken from Michael Franklin, UC Berkeley
  • 7. Module Types Input and Caching (Relations and Streams) Interface to external data sources Wrappers for HTML, XML, FileSystem, P2P proxies Remote databases with caching support Query Execution Non-blocking versions of classical operators (sel, proj) SteMs, STAIRs Adaptive Routing Reoptimize the plan during execution Eddies: choose routing tuple by tuple Juggle: ordering on the fly (by value or timestamp), Flux: routing among the computers of a cluster 7 Fjords Framework
  • 8. Fjords: Framework in Java for Operators on Remote Data Streams Interconnects modules Supports queues among modules Non-blocking Support for Relations and Streams 8
  • 9. Streams in TelegraphCQ Unarchived Stream Never written to disk In shared memory between executor and wrapper Archived Stream Append-only method to send tuples to the system No update, insert, delete; aggregate queries use windows tcqtime of type TIMESTAMP for window queries With constraint TIMESTAMPCOLUMN 9
  • 10. DDL: Create Stream 10 CREATE STREAM measurements ( tcqtime TIMESTAMP TIMESTAMPCOLUMN, stationid INTEGER, speed REAL) TYPE ARCHIVED; CREATE STREAM tinydb ( tcqtime TIMESTAMP TIMESTAMPCOLUMN, light REAL, temperature REAL) TYPE UNARCHIVED; DROP STREAM measurements;
  • 11. Data Acquisition Sources must identify themselves before sending data Wrapper: user-defined functions How to process the data Inside the Wrapper Clearinghouse process Push sources Begin a connection to TelegraphCQ Pull sources The wrapper begins the connection Data from different wrappers can merge into the same stream Heartbeat: punctuated tuple without data, only a timestamp After a punctuated tuple, no earlier data will arrive 11
  • 12. Wrappers in the WCH Non-process-blocking over network sockets (TCP/IP) Wrapper functions are called when there is data on the socket Or when there is data in the buffer The function returns one tuple at a time (classic iterator) Returns an array of PostgreSQL Datum Init(WrapperState*) allocates resources and state Next(WrapperState*) tuples are in the WrapperState Done(WrapperState*) frees resources and destroys state All within PostgreSQL’s memory infrastructure 12
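The Init/Next/Done lifecycle can be sketched in Python. This is only an analogue: the real API is three C functions operating on a WrapperState*, and the CSV parsing and field layout here are illustrative assumptions, not TelegraphCQ code.

```python
# Hypothetical Python analogue of the TelegraphCQ wrapper lifecycle
# (the real API is C functions taking a WrapperState*).
class CSVWrapper:
    def init(self, state):
        # allocate resources and state: here, just a leftover-bytes buffer
        state["buffer"] = b""

    def next(self, state, data):
        # bytes arriving on the socket may hold partial tuples;
        # buffer the remainder and emit only complete lines
        state["buffer"] += data
        *lines, state["buffer"] = state["buffer"].split(b"\n")
        return [line.decode().split(",") for line in lines if line]

    def done(self, state):
        # free resources and destroy state
        state.clear()

w = CSVWrapper()
st = {}
w.init(st)
tuples = w.next(st, b"1,55.2\n2,48.9\n3,")
# the third tuple is incomplete and stays buffered until more data arrives
```

As the editor's notes point out, a call is not guaranteed to yield a tuple, which is why the buffer is part of the per-wrapper state.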
  • 13. DDL: Create Wrapper 13 CREATE FUNCTION measurements_init(INTEGER) RETURNS BOOLEAN AS ‘libmeasurements.so,measurements_init’ LANGUAGE ‘C’; CREATE WRAPPER mywrapper ( init=measurements_init, next=measurements_next, done=measurements_done);
  • 14. HtmlGet and WebQueryServer HtmlGet allows the user to execute a series of HTML GET or POST requests to arrive at a page, and then to extract data out of the page using a TElegraph Screen Scrapper definition file. Once extracted, this data can be output to a CSV file. Welcome to the TESS homepage. TESS is the TElegraph Screen Scrapper. It is part of The Telegraph Project at UC Berkeley. TESS is a program that takes data from web forms (like search engines or database queries) and turns it into a representation that is usable by a database query processor. http://telegraph.cs.berkeley.edu/tess/ 14
  • 15. Self-monitoring capability Three special streams: info about system state Support for introspective queries Dynamic catalog Queried like any other stream tcq_queries(tcqtime, qrynum, qid, kind, qrystr) tcq_operators(tcqtime, opnum, opid, numqrs, kind, qid, opstr, opkind, opdesc) tcq_queues(tcqtime, opid, qkind, kind) 15
  • 16. Example 16 Welcome to psql 0.2, the PostgreSQL interactive terminal. # CREATE SCHEMA traffic; # CREATE STREAM traffic.measurements (stationid INT, speed REAL, tcqtime TIMESTAMP TIMESTAMPCOLUMN ) TYPE ARCHIVED; # ALTER STREAM traffic.measurements ADD WRAPPER csvwrapper; $ cat tcpdump.log | source.pl localhost 5533 csvwrapper,network.tcpdum Default TelegraphCQ script written in Perl to simulate sources that send CSV data Default Port
  • 17. Load Shedding CREATE STREAM … TYPE UNARCHIVED ON OVERLOAD ???? BLOCK: stop it (default) DROP: drop tuples KEEP COUNTS: keep the count of dropped tuples REGHIST: build a fixed-grid histogram of shed tuples MYHIST: build a MHIST (multidimensional histogram) WAVELET wavelet params Build a wavelet histogram SAMPLE: keep a Reservoir Sample 17
  • 18. Load Shedding: Summary Streams For a stream named schema.stream Two streams are automatically created schema.__stream_dropped schema.__stream_kept For WAVELET, MYHIST, REGHIST, COUNTS Schema contains: Summary data Summary interval For SAMPLE Same schema with column __samplemult Keeps the number of actual tuples represented 18
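The SAMPLE policy keeps a Reservoir Sample of the shed tuples. A generic reservoir-sampling sketch (not TelegraphCQ's implementation; the fixed seed is only for reproducibility):

```python
import random

def reservoir_sample(stream, k, rng=random.Random(0)):
    """Maintain a uniform random sample of k tuples from an
    unbounded stream using a single pass and O(k) memory."""
    reservoir = []
    for n, tup in enumerate(stream):
        if n < k:
            reservoir.append(tup)
        else:
            # keep the new tuple with probability k / (n + 1),
            # evicting a uniformly chosen resident tuple
            j = rng.randrange(n + 1)
            if j < k:
                reservoir[j] = tup
    return reservoir

sample = reservoir_sample(range(10_000), k=100)
```

The `__samplemult` column mentioned above would then record how many real tuples each sampled tuple represents.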
  • 19. Querying TelegraphCQ: StreaQuel 19 Continuous queries over: Standard relations inherited from PostgreSQL Windowed data streams (sliding, hopping, jumping) RANGE BY specifies the window size SLIDE BY specifies the update rate START AT specifies when the query will begin (optional) SELECT stream.color, COUNT(*) FROM stream [RANGE BY ‘9’ SLIDE BY ‘1’] GROUP BY stream.color window START OUTPUT! 1 1 1 1 1 2 2 1 2 1 2 2 1 2 1 2 1 2 1 Adapted from Jarle Søberg
  • 20. Querying TelegraphCQ: StreaQuel (2) wtime(*) returns the last timestamp in the window Recursive queries using WITH [SQL 1999] StreaQuel doesn’t allow subqueries 20 SELECT S.a, wtime(*) FROM S [RANGE BY ’10 seconds’ SLIDE BY ’10 seconds’], R [RANGE BY ’10 seconds’ SLIDE BY ’10 seconds’] WHERE S.a = R.a; Data Stream Window … 10 seconds 10 seconds
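The RANGE BY / SLIDE BY semantics of the color-count query on slide 19 can be illustrated with a small Python sketch. This is a simplification: it uses a count-based window, while real StreaQuel windows are time-based.

```python
from collections import Counter, deque

def windowed_counts(stream, range_by, slide_by):
    """Sketch of [RANGE BY range_by SLIDE BY slide_by]: every
    slide_by tuples, emit per-color counts over the last
    range_by tuples (GROUP BY color with COUNT(*))."""
    window = deque(maxlen=range_by)   # old tuples fall out automatically
    for i, color in enumerate(stream, start=1):
        window.append(color)
        if i % slide_by == 0:         # the window slides -> emit output
            yield dict(Counter(window))

# the tuple sequence from slide 19, window of 9 sliding by 1
results = list(windowed_counts([1, 1, 1, 1, 1, 2, 2, 1, 2, 1],
                               range_by=9, slide_by=1))
```

After the ninth tuple the window covers 1,1,1,1,1,2,2,1,2, so the query outputs six 1s and three 2s, matching the "window START OUTPUT!" point in the slide.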
  • 21. Net Monitor Windowed Query 21 SELECT (CASE when outgoing = true then src_ip else dst_ip end) as inside_ip , (CASE when outgoing = true then dst_ip else src_ip end) as outside_ip, sum(bytes_sent) + sum(bytes_recv) as bytes FROM flow [RANGE BY $w SLIDE BY $w] GROUP BY inside_ip, outside_ip All active connections SELECT src_ip, wtime(*), COUNT(DISTINCT(dst_ip||dst_port)) AS fanout, FROM flow [RANGE BY $w SLIDE BY $w] WHERE outgoing = false GROUP BY src_ip ORDER BY fanout DESC LIMIT 100 PER WINDOW; 100 sources with the max number of connections SELECT sum(bytes_sent), src_ip, wtime(*) AS now FROM flow [RANGE BY $w SLIDE BY $w] WHERE outgoing = false GROUP BY src_ip ORDER BY sum(bytes_sent) DESC LIMIT 100 PER WINDOW; 100 most significant sources of traffic Taken from: İlkay Ozan Kay
  • 22. Adaptive Query Processing: Evolution From evolutionary to revolutionary: Static Plans → Late Binding → Inter-Operator → Intra-Operator → Per Tuple Systems along this spectrum: Traditional DBMS, Parametric, Dynamic QEP, Competitive, Query Scrambling, Mid-query Reopt., Progressive Opt., Xjoin, DPHJ, Convergent QP, Eddies Taken from: Amol Deshpande, Vijayshankar Raman 22
  • 23. Adaptive Query Processing: Current Background Several plans Parametric queries Continuous queries Focus on incremental output Complex queries (10-20 relations in a join) Data streams and asynchronous data Statistics not available Interactive queries and user preferences Sharing of state Several operators XML data and text Wide-area federations 23
  • 24. Adaptive Query Processing: System R Repeat: Observe the environment: daily/weekly (runstats) Choose behaviour (optimizer) If the current plan is not the best plan (analyzer) Actuate the new plan (executor) Cost-based optimization Runstats-optimize-execute -> high coarseness Weekly adaptivity! Goal: adaptivity per tuple Merge the 4 stages: Measure Analyze Plan Actuate Taken from: Avnur, Hellerstein 24
  • 25. TelegraphCQ Executor: Eddy Idea taken from fluid mechanics A continuously adaptive query processing mechanism The Eddy is a routing operator Decides which modules a tuple must visit, and in what order After a tuple has visited all modules, it can be output Sees tuples before and after each module (operator) 25 Measure Analyze Plan Actuate Taken from: Amol Deshpande, Vijayshankar Raman
  • 26. Eddies: Correctness Every tuple carries two bitvectors Each position corresponds to an operator Ready: indicates whether the tuple is ready for that operator The Eddy can decide which tuples to send to an operator Done: indicates whether the tuple was processed by that operator The Eddy avoids sending a tuple twice to the same operator When all Done bits are set -> output Joined tuples have their Ready and Done vectors combined with bitwise OR Simple selections have Ready = complement(Done) 26
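A minimal sketch of the Ready/Done mechanism for simple selections, where Ready is always the complement of Done. The predicates match slide 27's query, and the left-to-right routing policy is only a placeholder for a real policy such as lottery scheduling.

```python
# Minimal eddy sketch for simple selections: each tuple carries
# Ready/Done state; for selections, Ready = complement(Done).
preds = [lambda t: t["a"] > 10,    # sigma_a
         lambda t: t["b"] < 15,    # sigma_b
         lambda t: t["c"] == 15]   # sigma_c

def eddy(tup, route=lambda ready: min(ready)):
    done = set()                                 # Done bits
    while len(done) < len(preds):
        ready = set(range(len(preds))) - done    # Ready = ~Done
        op = route(ready)                        # routing policy decides
        if not preds[op](tup):
            return None                          # operator consumes the tuple
        done.add(op)                             # set that operator's Done bit
    return tup                                   # all Done bits set -> output

out = eddy({"a": 15, "b": 0, "c": 15})   # passes all three selections
drop = eddy({"a": 5, "b": 0, "c": 15})   # fails sigma_a, never output
```

Because routing consults only Ready and Done, the Eddy may visit the operators in any order without producing wrong or duplicate results.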
  • 27. Eddies: Simple Selection Routing 27 SELECT * FROM S WHERE S.a > 10 AND S.b < 15 AND S.c = 15 (Diagram: the Eddy routes tuple S1 (a=15, b=0, c=15) through σa (S.a > 10), σb (S.b < 15) and σc (S.c = 15), flipping its Ready and Done bits at each step.) Adapted from Manuel Hertlein
  • 28. Relation Binary Join: R >< S >< T Join order (R >< S) >< T Fine with direct access to the data (index or sequential) 28 S >< T R >< S T R S Taken from Jarle Søberg
  • 29. Stream Binary Join: R >< S >< T 29 Join order (R >< S) >< T But if data are sent by sources... S >< T R >< S Blocking or dropping some tuples is inevitable! Taken from Jarle Søberg
  • 30. Stream Binary Join: R >< S >< T On-the-fly optimization is necessary 30 Streams often change Reoptimizing takes a lot of time Not dynamic enough! R >< S S >< T Taken from Jarle Søberg
  • 31. Stream Binary Join: Eddies Using an Eddy The initial behaviour of Telegraph 31 S >< T eddy R >< S Tuple-based adaptivity Considers dynamic changes in the stream Taken from Jarle Søberg
  • 32. Eddies: Scheduling Join Problem Scheduling on the selectivity of joins doesn’t work Example 32 |S E| |EC| >> E and C arrive early; S is delayed S E C time Taken from Amol Deshpande
  • 33. 33 |S E| |E C| SE HashTable E.Name HashTable S.Name Eddy S E Output C HashTable C.Course HashTable E.Course Eddy decides to route E to EC EC >> E and C arrive early; S is delayed S0 sent and received suggests S join E is the better option S S E S0 S –S0 E C time C SE S0E (S –S0)E The Eddy learns the correct sizes Too late!! Taken from Amol Deshpande
  • 34. 34 State got embedded as a result of earlier routing decisions |S E| |EC| SE HashTable E.Name HashTable S.Name EC Eddy S E Output C HashTable C.Course HashTable E.Course SE EC >> E and C arrive early; S is delayed S E C C SE S E Execution plan used The query is executed using the worse plan! Too late!! Taken from Amol Deshpande
  • 35. STAIR: Storage, Transformation and Access for Intermediate Results 35 S.Name STAIR Build into S.Name STAIR HashTable E.Name STAIR HashTable Eddy S Output E C HashTable HashTable E.Course STAIR C.Course STAIR Probe into E.Name STAIR Exposes the internal state of the join to the Eddy Provides primitives for state management Demotion Promotion Operations for insertion (build) and lookup (probe) s1 s1 s1 s1 Taken from Amol Deshpande
  • 36. STAIR: Demotion 36 e1 e1 e2c1 e2 s1e1 e2c1 e2 s1e1 S.Name STAIR HashTable E.Name STAIR s1 Demoting e2c1 to e2: a simple projection HashTable e1 e2c1 e2 Eddy S s1e1 E Output C HashTable e2 s1e1 e2c1 HashTable c1 Can be thought of as undoing work E.Course STAIR C.Course STAIR Adapted from Amol Deshpande
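Demotion as projection can be sketched as follows. The schemas and tuple names are illustrative, and a real STAIR must also adjust the demoted tuple's routing state so the undone join can be redone elsewhere.

```python
# Demotion sketch: an intermediate (joined) tuple stored in a STAIR
# is replaced by its projection onto one base relation -
# "undoing work" done by an earlier routing decision.
def demote(stair, joined, base_attrs):
    stair.remove(joined)                          # drop the intermediate tuple
    stair.append({k: joined[k] for k in base_attrs})  # keep only base attrs

stair = [{"e": "e2", "c": "c1"}]        # e2 joined with c1, stored earlier
demote(stair, stair[0], base_attrs=["e"])
# the STAIR now holds only the base tuple e2
```

Promotion is the dual operation: eagerly joining a stored base tuple with another STAIR's contents, precomputing work before it is requested.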
  • 37.
  • 38. STAIR: Promotion A join to be used to promote this tuple HashTable e1 e1c1 e2c1 Eddy S E Output C HashTable e2 s1e1 HashTable c1 e1 Can be thought of as precomputation of work E.Course STAIR C.Course STAIR Adapted from Amol Deshpande
  • 39. Demotion OR Promotion 38 Taken from Lifting the Burden of History from Adaptive Query Processing Amol Deshpande and Joseph M. Hellerstein
  • 40. Demotion AND Promotion Taken from Lifting the Burden of History from Adaptive Query Processing Amol Deshpande and Joseph M. Hellerstein 39
  • 41. 40 S.Name STAIR HashTable |S E| |EC| S0 E E E HashTable E E E C C C S0E Eddy decides to route E to EC E.Course STAIR >> E and C arrive early; S is delayed S0 E.Name STAIR HashTable S E E Eddy S C E Output C time E HashTable C Eddy decides to migrate E The Eddy learns the correct selectivities By promoting E using EC C.Course STAIR Adapted from Amol Deshpande
  • 42. 41 |S E| |EC| HashTable E C S0E E.Course STAIR >> E and C arrive early; S is delayed S.Name STAIR HashTable S S0 E.Name STAIR HashTable S S –S0 S –S0 E Eddy S C (S –S0) EC E Output C time E HashTable C C.Course STAIR Adapted from Amol Deshpande
  • 43. 42 EC SE C S – S0 SE EC S0 E E C HashTable E C SE E.Course STAIR S.Name STAIR HashTable S E.Name STAIR HashTable UNION Eddy S E Output C E HashTable C Most of the data is processed using the correct plan C.Course STAIR Adapted from Amol Deshpande
  • 44. STAIR: Correctness Theorem [3.1]: An eddy with STAIRs always produces the correct query result in spite of arbitrary applications of the promotion and demotion operations. STAIRs will produce every result tuple There will be no spurious duplicates 43 Taken from Lifting the Burden of History from Adaptive Query Processing Amol Deshpande and Joseph M. Hellerstein
  • 45. State Module A kind of temporary data repository A half-join operator that keeps homogeneous tuples The state inside the operators is decision-independent Supports the operations Insertion (build) Lookup (probe) Deletion (eviction) [useful for windows] Similar to an MJoin but more adaptive Allows sharing of state among other continuous queries But does not store intermediate results Increases the computation cost significantly 44
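A hash-based sketch of a SteM's three operations: build (insertion), probe (lookup) and eviction (for windows). The field names, timestamps and the dictionary-merge output format are illustrative assumptions.

```python
from collections import defaultdict

class SteM:
    """Half-join state module over one join key (hash-based, equi-join)."""
    def __init__(self, key):
        self.key = key
        self.table = defaultdict(list)

    def build(self, tup):
        # insertion: store the tuple under its join-key value
        self.table[tup[self.key]].append(tup)

    def probe(self, tup):
        # lookup: return tuples matching on the join key, merged with tup
        return [{**s, **tup} for s in self.table.get(tup[self.key], [])]

    def evict(self, keep):
        # deletion: drop stored tuples that fell out of the window
        for k in list(self.table):
            self.table[k] = [t for t in self.table[k] if keep(t)]
            if not self.table[k]:
                del self.table[k]

stem_r = SteM(key="a")
stem_r.build({"a": 1, "r": "x", "ts": 36})
stem_r.build({"a": 1, "r": "y", "ts": 41})
matches = stem_r.probe({"a": 1, "s": "z"})     # half-join: two matches
stem_r.evict(keep=lambda t: t["ts"] > 37)      # window slides past ts=37
```

Because the SteM never stores joined intermediate tuples, its state stays independent of past routing decisions, which is what makes it safe for the Eddy to re-route freely.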
  • 46. Eddies Join with SteMs 45 T R S eddy R T S More adaptivity The Eddy knows the half-joins Different access methods Index access Scan access Simulates several joins On overload? Hash join (fast!!) Index join (memory limit) Join family? Hash join (equi-join) B-tree join (<, <=, >) A parametric query can be thought of as a join Adapted from Jarle Søberg
  • 47. SteMs: Correctness 46 R S Correctness problem! Possible duplicates!! Global unique sequence numbers Only younger tuples can probe Taken from Jarle Søberg
  • 48. SteMs Sliding Window 47 SELECT * FROM Str [RANGE BY 5 Second SLIDE BY 2 Second], Relation WHERE Relation.a = Str.a A|…….|18:49:36 R B|…….|18:49:36 C|…….|18:49:37 A|…….|18:49:38 Keeps the state for the sliding window (eviction) At time 40, what will happen at time 42? Instead of rebuilding the whole hash table, it removes only old tuples and adds new ones B|…….|18:49:39 C|…….|18:49:39 A|…….|18:49:40 B|…….|18:49:40 C|…….|18:49:40 Eviction! 18:49:37 A|…….|18:49:41 B|…….|18:49:41
  • 49. Binary Join, STAIR, SteM Comparison 48 select * from customer c, orders o, lineitem l where c.custkey = o.custkey and o.orderkey = l.orderkey and c.nationkey = 1 and c.acctbal > 9000 and l.shipdate > date ’1996-01-01’ Recomputation necessary NOT adaptive lineitem arriving with ascending shipdate Initial routing (O >< L) >< C Taken from Lifting the Burden of History from Adaptive Query Processing Amol Deshpande and Joseph M. Hellerstein
  • 50. Eddies: Routing Policy How to choose the best plan? Using routing Every tuple is routed individually The routing policy establishes the system’s efficiency The Eddy has a tuple buffer with priorities Tuples initially have low priority Exiting from an operator they get higher priority A tuple is sent to output as early as possible Avoids system memory congestion Allows low memory consumption 49
  • 51. Eddies’ Routing Policy: (old) Back-Pressure Approach Naive: quick operators first 50 sel(s1) = sel(s2) cost(s2) = 5 cost(s1) changes cost(s1) = cost(s2) sel(s2) = 50% sel(s1) changes Taken from: Avnur, Hellerstein
  • 52. Eddies’ Routing Policy: Lottery Scheduling Waldspurger & Weihl in 1994 An algorithm for scheduling shared resources that can «rapidly focus available resources» Every operator begins with N tickets An operator receives another ticket when it takes a tuple Promotes operators that consume tuples fast An operator loses a ticket when it returns a tuple Promotes operators with low selectivity: Operators that return few tuples after processing many When two operators compete for a tuple The tuple is assigned to the lottery-winning operator Never leave an operator without tickets + random exploration 51
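The ticket bookkeeping above can be sketched as follows. The predicates, the initial ticket count of one, and the weighted draw are illustrative assumptions; a net ticket is gained only when an operator consumes (drops) a tuple, so selective operators win lotteries more often.

```python
import random

class Eddy:
    """Lottery-scheduling sketch: +1 ticket on taking a tuple,
    -1 on returning it, weighted random draw among competitors."""
    def __init__(self, ops, rng=random.Random(0)):
        self.ops = ops                        # operator name -> predicate
        self.tickets = {name: 1 for name in ops}
        self.rng = rng

    def route(self, candidates):
        # hold a lottery among operators competing for the tuple
        weights = [self.tickets[c] for c in candidates]
        return self.rng.choices(candidates, weights=weights)[0]

    def run(self, tup):
        pending = list(self.ops)
        while pending:
            op = self.route(pending)
            self.tickets[op] += 1             # ticket for taking the tuple
            if not self.ops[op](tup):
                return None                   # tuple consumed: net +1 ticket
            self.tickets[op] -= 1             # returned the tuple: net 0
            pending.remove(op)
        return tup

eddy = Eddy({"s1": lambda t: t > 10, "s2": lambda t: t % 2 == 0})
kept = [t for t in range(100) if eddy.run(t) is not None]
```

Whichever selection drops more tuples on the current data accumulates tickets and is tried first, which is exactly the behaviour the lottery is meant to learn.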
  • 53. Eddies’ Routing Policy: Lottery Scheduling Lottery Scheduling is better than Back-Pressure 52 cost(s1) = cost(s2) sel(s2) = 50% sel(s1) changes Taken from: Avnur, Hellerstein
  • 54. Experiment 53 Stream: x with mean 40 and standard deviation 10
  • 55. 54 Experiment: Stream variation Stream: x with mean 10 and standard deviation 0
  • 56. 55 Experiment: Stream variation (2) Stream: x with mean 10 and standard deviation 0
  • 57. Other Works Distributed Eddies Freddies: DHT-Based Adaptive Query Processing via Federated Eddies Content-based Routing Partial Results for Online Query Processing Flux: An Adaptive Partitioning Operator for Continuous Query Systems Java Support for Data-Intensive Systems: Experiences Building the Telegraph Dataflow System Ripple Join for Online Aggregation Highly Available, Fault-Tolerant, Parallel Dataflows 56
  • 58. Bibliography TelegraphCQ: An Architectural Status Report Continuous Dataflow Processing for an Uncertain World Enabling Real-Time Querying of Live and Historical Stream Data Declarative Network Monitoring with an Underprovisioned Query Processor Lifting the Burden of History from Adaptive Query Processing [STAIRs] Eddies: Continuously Adaptive Query Processing Using State Modules for Adaptive Query Processing And others… http://telegraph.cs.berkeley.edu/papers.html Telegraph team @ UC Berkeley: Mike Franklin, Joe Hellerstein, Bryan Fulton, Sirish Chandrasekaran, Amol Deshpande, Ryan Huebsch, Edwin Mach, Garrett Jacobson, Sailesh Krishnamurthy, Boon Thau Loo, Nick Lanham, Sam Madden, Fred Reiss, Mehul Shah, Eric Shan, Kyle Stanek, Owen Cooper, David Culler, Lisa Hellerstein, Wei Hong, Scott Shenker, Torsten Suel, Ion Stoica, Doug Tygar, Hal Varian, Ron Avnur, David Yu Chen, Mohan Lakhamraju, Vijayshankar Raman Lottery Scheduling: Flexible Proportional-Share Resource Management Carl A. Waldspurger & William E. Weihl @ MIT 57

Editor's notes

  1. Open-source DBMS, PostgreSQL, as the starting point for implementing TelegraphCQ. Developed at Berkeley University. Written in C/C++. OpenBSD license. Based on the PostgreSQL codebase. Current version: 2.1 on PostgreSQL 7.3.2. Project closed in 2006. Important points of interest and features. Software http://telegraph.cs.berkeley.edu Papers http://db.cs.berkeley.edu/telegraph Commercial spinoff Truviso
  2. Non-blocking versions of classical operators (sel, proj). Eddies: decide routing tuple by tuple. Flux routes tuples across the machines of a cluster to support parallelism, load balancing and fault tolerance.
  3. Sources must identify themselves before sending data. Wrapper: user-defined functions that define how the data must be processed, inside the Wrapper Clearinghouse process. Push sources begin a connection to TelegraphCQ. Pull sources: the wrapper begins the connection; a pull wrapper, for example, connects to a mail server and checks the mail every minute. Data from different wrappers can merge into the same stream. Heartbeat: punctuated tuple without data, only a timestamp. When a punctuated tuple arrives, no earlier data will arrive.
  4. Arriving data may be only partial tuples, so buffering is necessary; a call is not guaranteed to yield a tuple. WrapperState lets user functions communicate with the WCH. If fewer fields arrive they default to NULL; if too many arrive they are truncated.
  5. A kind of temporary data repository. A half-join operator that stores homogeneous tuples. State independent of previous routing decisions (since it does not store the tuples). Supports the operations: insertion (build), lookup (probe), deletion (eviction), useful for windows. Similar to an MJoin but more adaptive. Similar to the simple routing policy for queries with only selections. Sharing of state with other continuous queries. Does not store intermediate results. Increases the computation cost.
  6. The opportunity to improve: optimizers pick a single plan for a query. However, different subsets of the data may have very different statistical properties. It may be more efficient to use different plans for different subsets of the data.