Workshop on TelegraphCQ:
Concept of a Data Stream Management System (DSMS).
TelegraphCQ: the DSMS developed at Berkeley; internal architecture.
Differences from traditional databases.
Adaptive query processing using the new concept of eddies as routing operators.
Problems with joining streams (for which no statistics are available) and relations, and the two solutions: STAIRs and SteMs.
STAIR: a join operator that allows its internal state to be changed through primitive functions visible to the eddy.
SteM: a half-join operator that keeps homogeneous tuples; its internal state is independent of prior routing decisions.
The eddy routing policy, implemented with lottery scheduling (Waldspurger & Weihl, 1994).
1. Berkeley's TelegraphCQ: details about TelegraphCQ; adaptivity: Eddies, SteMs and STAIRs; routing policy: lottery scheduling. Friday, July 15, 2011. Alberto Minetti, Advanced Data Management @ DISI, Università degli Studi di Genova
2. Data Stream Management System. Continuous, unbounded, rapid, time-varying streams of data elements occur in a variety of modern applications: network monitoring and traffic engineering; sensor networks, RFID tags; telecom call records; financial applications; web logs and click-streams; manufacturing processes. DSMS = Data Stream Management System
3. The Beginning: Telegraph. Several continuous queries over several data streams. Initially written in Java, then C-based using PostgreSQL. No distributed scheduling. The level of adaptivity does not change under overload. Ignores system resources. Data management fully in memory.
4. Redesign: TelegraphCQ. Developed by Berkeley University. Written in C/C++. OpenBSD license. Based on the PostgreSQL sources. Current version: 0.6 (PostgreSQL 7.3.2). Project closed in 2006. Several points of interest and features. Software: http://telegraph.cs.berkeley.edu. Papers: http://db.cs.berkeley.edu/telegraph. Commercial spinoff: Truviso.
5. TelegraphCQ Architecture. PostgreSQL backends: many TelegraphCQ front-ends, only one TelegraphCQ back-end. Front-end: forks for every client connection; does not hold streams; parses continuous queries into shared memory. Back-end: has an eddy; joins query plans together; plans can be shared among queries; puts results into shared memory.
6. TelegraphCQ Architecture. [Diagram: a TelegraphCQ front-end (listener, parser, planner, mini-executor, proxy) communicates through shared memory (query plan queue, eddy control queue, query result queues, buffer pool, catalog) with the TelegraphCQ back-end (CQEddies, modules, splits, scans) and with the TelegraphCQ Wrapper ClearingHouse (wrappers), which reads from disk.] Taken from Michael Franklin, UC Berkeley
7. Module Types. Input and caching (relations and streams): interfaces to external data sources; wrappers for HTML, XML, the filesystem, P2P proxies; remote databases with caching support. Query execution: non-blocking versions of the classical operators (sel, proj); SteMs, STAIRs. Adaptive routing: reoptimizes the plan during execution; Eddies choose the route tuple by tuple; Juggle orders on the fly (by value or timestamp); Flux routes tuples among the machines of a cluster. (Fjords framework)
8. Fjords: Framework in Java for Operators on Remote Data Streams. Interconnects modules; supports queues between modules; non-blocking; supports both relations and streams.
9. Streams in TelegraphCQ. Unarchived stream: never written to disk; lives in shared memory between the executor and the wrapper. Archived stream: an append-only method to send tuples to the system; no update, insert, or delete; queries aggregate over windows. The column tcqtime, of type TIMESTAMP with the constraint TIMESTAMPCOLUMN, is used for window queries.
10. DDL: Create Stream
CREATE STREAM measurements (
  tcqtime TIMESTAMP TIMESTAMPCOLUMN,
  stationid INTEGER,
  speed REAL) TYPE ARCHIVED;
CREATE STREAM tinydb (
  tcqtime TIMESTAMP TIMESTAMPCOLUMN,
  light REAL,
  temperature REAL) TYPE UNARCHIVED;
DROP STREAM measurements;
11. Data Acquisition. Sources must identify themselves before sending data. Wrapper: user-defined functions describing how the data is processed, running inside the Wrapper ClearingHouse process. Push sources: initiate a connection to TelegraphCQ. Pull sources: the wrapper initiates the connection. Data from different wrappers can merge into the same stream. Heartbeat: a punctuated tuple without data, carrying only a timestamp; once a punctuated tuple is seen, no earlier data will arrive.
12. Wrappers in the WCH. Non-blocking over network sockets (TCP/IP). A wrapper function is called when data is available on the socket, or when data is available in the buffer. Each function returns one tuple at a time (the classic iterator), as an array of PostgreSQL Datum values. Init(WrapperState*): allocates resources and state. Next(WrapperState*): the tuples are placed in the WrapperState. Done(WrapperState*): frees resources and destroys the state. Everything lives in PostgreSQL's memory infrastructure.
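The Init/Next/Done lifecycle above can be sketched as a Python analogy (this is not the real C API: the CSV framing, field layout, and the `data` parameter are assumptions for the example; in TelegraphCQ the bytes come from the socket and the result is an array of Datum values):

```python
# Illustrative sketch of the wrapper lifecycle: init allocates state,
# next yields one tuple per call (or None when no complete tuple is
# buffered yet), done frees the state.

class WrapperState:
    def __init__(self):
        self.buffer = ""      # partial data read from the socket
        self.tuples = []      # complete, parsed tuples ready to emit

def measurements_init(state):
    state.buffer, state.tuples = "", []
    return True

def measurements_next(state, data=""):
    """Feed newly arrived bytes; return one tuple per call, iterator-style."""
    state.buffer += data
    while "\n" in state.buffer:            # a full CSV line = a full tuple
        line, state.buffer = state.buffer.split("\n", 1)
        ts, station, speed = line.split(",")
        state.tuples.append((ts, int(station), float(speed)))
    return state.tuples.pop(0) if state.tuples else None

def measurements_done(state):
    state.buffer, state.tuples = "", []

s = WrapperState()
measurements_init(s)
t = measurements_next(s, "2003-07-15 10:00:00,42,88.5\n")
measurements_done(s)
```

Note how buffering handles the case (mentioned in the editor's notes) where only a fragment of a tuple has arrived: `next` simply returns None until a full line is available.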
13. DDL: Create Wrapper
CREATE FUNCTION measurements_init(INTEGER)
RETURNS BOOLEAN
AS 'libmeasurements.so', 'measurements_init'
LANGUAGE 'C';
CREATE WRAPPER mywrapper (
  init=measurements_init,
  next=measurements_next,
  done=measurements_done);
14. HtmlGet and WebQueryServer. HtmlGet allows the user to execute a series of HTML GET or POST requests to arrive at a page, and then to extract data out of the page using a TElegraph Screen Scraper definition file. Once extracted, this data can be output to a CSV file. From the TESS homepage: "TESS is the TElegraph Screen Scraper. It is part of The Telegraph Project at UC Berkeley. TESS is a program that takes data from web forms (like search engines or database queries) and turns it into a representation that is usable by a database query processor." http://telegraph.cs.berkeley.edu/tess/
15. Self-Monitoring Capability. Three special streams give information about the system state. Support for introspective queries: a dynamic catalog, queried like any other stream. tcq_queries(tcqtime, qrynum, qid, kind, qrystr); tcq_operators(tcqtime, opnum, opid, numqrs, kind, qid, opstr, opkind, opdesc); tcq_queues(tcqtime, opid, qkind, kind)
16. Example
Welcome to psql 0.2, the PostgreSQL interactive terminal.
# CREATE SCHEMA traffic;
# CREATE STREAM traffic.measurements (
    stationid INT,
    speed REAL,
    tcqtime TIMESTAMP TIMESTAMPCOLUMN ) TYPE ARCHIVED;
# ALTER STREAM traffic.measurements ADD WRAPPER csvwrapper;
$ cat tcpdump.log | source.pl localhost 5533 csvwrapper,network.tcpdum
source.pl is the default TelegraphCQ script, written in Perl, that simulates sources sending CSV data; 5533 is the default port.
17. Load Shedding. CREATE STREAM … TYPE UNARCHIVED ON OVERLOAD … with one of: BLOCK: stop reading (default); DROP: drop tuples; KEEP COUNTS: keep a count of the dropped tuples; REGHIST: build a fixed-grid histogram of the shed tuples; MYHIST: build an MHIST (multidimensional histogram); WAVELET wavelet-params: build a wavelet histogram; SAMPLE: keep a reservoir sample.
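The SAMPLE option keeps a reservoir sample of the shed tuples. A minimal sketch of reservoir sampling (Vitter's Algorithm R, a standard way to draw a uniform sample from an unbounded stream; the seeded generator is only there to make the sketch reproducible):

```python
import random

def reservoir_sample(stream, k, rng=None):
    """Keep a uniform random sample of k tuples from a stream of
    unknown length, touching each tuple exactly once."""
    rng = rng or random.Random(0)     # seeded for reproducibility
    sample = []
    for i, tup in enumerate(stream):
        if i < k:
            sample.append(tup)        # fill the reservoir first
        else:
            j = rng.randint(0, i)     # tuple i kept with probability k/(i+1)
            if j < k:
                sample[j] = tup       # evict a random resident
    return sample

sample = reservoir_sample(range(1000), 10)
```

Each arriving tuple has the same chance of ending up in the final sample, which is why the summary stream's `__samplemult` column (next slide) can scale each kept tuple back up to the number of dropped tuples it represents.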
18. Load Shedding: Summary Streams. For a stream named schema.stream, two streams are created automatically: schema.__stream_dropped and schema.__stream_kept. For WAVELET, MYHIST, REGHIST and COUNTS, the schema contains the summary data and the summary interval. For SAMPLE, it is the same schema with an extra column __samplemult, recording how many actual tuples each sampled tuple represents.
19. Querying TelegraphCQ: StreaQuel. Continuous queries over standard relations (inherited from PostgreSQL) and windowed data streams (sliding, hopping, jumping). RANGE BY specifies the window size; SLIDE BY specifies the update rate; START AT (optional) specifies when the query will begin.
SELECT stream.color, COUNT(*)
FROM stream [RANGE BY '9' SLIDE BY '1']
GROUP BY stream.color
[Diagram: a window of size 9 sliding one tuple at a time over a stream of 1s and 2s, emitting counts at each slide.] Adapted from Jarle Søberg
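The RANGE BY / SLIDE BY semantics of the query above can be sketched as follows (a simplification: the window here is counted in tuples, whereas TelegraphCQ windows are defined over the tcqtime timestamps):

```python
from collections import Counter, deque

def windowed_counts(stream, range_by, slide_by):
    """Every slide_by tuples, emit COUNT(*) per value over the
    last range_by tuples, mimicking GROUP BY over a sliding window."""
    window = deque(maxlen=range_by)   # oldest tuples fall out automatically
    results = []
    for i, color in enumerate(stream, start=1):
        window.append(color)
        if i >= range_by and (i - range_by) % slide_by == 0:
            results.append(dict(Counter(window)))
    return results

results = windowed_counts([1, 1, 1, 1, 1, 2, 2, 1, 2], 9, 1)
# -> [{1: 6, 2: 3}]
```

With SLIDE BY smaller than RANGE BY the windows overlap (sliding); equal values give hopping windows, and larger values leave gaps (jumping).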
20. Querying TelegraphCQ: StreaQuel (2). wtime(*) returns the last timestamp in the window. Recursive queries use WITH [SQL:1999]. StreaQuel does not allow subqueries.
SELECT S.a, wtime(*)
FROM S [RANGE BY '10 seconds' SLIDE BY '10 seconds'],
     R [RANGE BY '10 seconds' SLIDE BY '10 seconds']
WHERE S.a = R.a;
[Diagram: a 10-second window over each data stream.]
21. Net Monitor Windowed Queries.
All active connections:
SELECT (CASE WHEN outgoing = true THEN src_ip ELSE dst_ip END) AS inside_ip,
       (CASE WHEN outgoing = true THEN dst_ip ELSE src_ip END) AS outside_ip,
       sum(bytes_sent) + sum(bytes_recv) AS bytes
FROM flow [RANGE BY $w SLIDE BY $w]
GROUP BY inside_ip, outside_ip
The 100 sources with the largest number of connections:
SELECT src_ip, wtime(*), COUNT(DISTINCT(dst_ip||dst_port)) AS fanout
FROM flow [RANGE BY $w SLIDE BY $w]
WHERE outgoing = false
GROUP BY src_ip
ORDER BY fanout DESC LIMIT 100 PER WINDOW;
The 100 most significant sources of traffic:
SELECT sum(bytes_sent), src_ip, wtime(*) AS now
FROM flow [RANGE BY $w SLIDE BY $w]
WHERE outgoing = false
GROUP BY src_ip
ORDER BY sum(bytes_sent) DESC LIMIT 100 PER WINDOW;
Taken from: İlkay Ozan Kay
22. Adaptive Query Processing: Evolution. A spectrum from evolutionary to revolutionary: static plans, late binding, inter-operator adaptivity, intra-operator adaptivity, per-tuple adaptivity. Examples along the way: traditional DBMS, dynamic QEP, parametric, query scrambling, mid-query reoptimization, XJoin, DPHJ, convergent QP, competitive, progressive optimization, Eddies. Taken from: Amol Deshpande, Vijayshankar Raman
23. Adaptive Query Processing: Current Background. Several plans; parametric queries. Continuous queries: focus on incremental output. Complex queries (10 to 20 relations in a join). Data streams and asynchronous data: statistics not available. Interactive queries and user preferences. Sharing of state among several operators. XML data and text. Wide-area federations.
24. Adaptive Query Processing: System R. Repeat: observe the environment daily/weekly (runstats); choose a behaviour (optimizer); if the current plan is not the best plan (analyzer), actuate the new plan (executor). Cost-based optimization. The runstats-optimize-execute loop is very coarse-grained: weekly adaptivity! Goal: adaptivity per tuple, merging the four phases: measure, analyze, plan, actuate. Taken from: Avnur, Hellerstein
25. TelegraphCQ Executor: Eddy. The idea is taken from fluid mechanics: a continuously adaptive query processing mechanism. An eddy is a routing operator: it decides which modules a tuple must visit, and in what order; once a tuple has visited all modules, it can be output. The eddy sees tuples before and after each module (operator), closing the measure-analyze-plan-actuate loop per tuple. Taken from: Amol Deshpande, Vijayshankar Raman
26. Eddies: Correctness. Every tuple carries two bit vectors, with one position per operator. Ready: tells whether the tuple is ready for that operator, so the eddy can decide which tuples to send where. Done: tells whether the tuple was already processed by that operator, so the eddy never sends a tuple to the same operator twice. When all Done bits are set, the tuple is output. A joined tuple's Ready and Done vectors are the bitwise OR of its inputs'; for simple selections, Ready = complement(Done).
27. Eddies: Simple Selection Routing. SELECT * FROM S WHERE S.a > 10 AND S.b < 15 AND S.c = 15. [Diagram: the eddy routes each S tuple through σa (S.a > 10), σb (S.b < 15) and σc (S.c = 15) in any order; a tuple with a = 15, b = 0, c = 15 starts with Ready = 111 and Done = 000, and each selection it visits flips the corresponding bits.] Adapted from Manuel Hertlein
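The selection-routing example above can be sketched in a few lines. This is a toy model of the Done bookkeeping, not TelegraphCQ's implementation: the routing choice here is uniformly random, whereas a real eddy applies a policy such as lottery scheduling, and Ready is left implicit since for pure selections Ready = complement(Done):

```python
import random

# Predicates from the slide's query: S.a > 10 AND S.b < 15 AND S.c = 15
ops = [lambda t: t["a"] > 10,
       lambda t: t["b"] < 15,
       lambda t: t["c"] == 15]

def eddy(tuples, rng=None):
    rng = rng or random.Random(0)
    out = []
    for t in tuples:
        done = [False] * len(ops)
        alive = True
        while alive and not all(done):
            ready = [i for i, d in enumerate(done) if not d]
            i = rng.choice(ready)   # routing choice (random stand-in policy)
            done[i] = True          # never route to the same operator twice
            alive = ops[i](t)       # a failed selection drops the tuple
        if alive:
            out.append(t)           # all Done bits set: output the tuple
    return out

survivors = eddy([{"a": 15, "b": 0, "c": 15},   # passes all predicates
                  {"a": 5, "b": 20, "c": 15}])  # fails a>10 (and b<15)
```

Whatever order the predicates are visited in, the same tuples survive; the order only affects how much work is done, which is exactly what the routing policy optimizes.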
28. Relational Binary Join: R ⋈ S ⋈ T. Join order (R ⋈ S) ⋈ T. This works fine with direct access to the data (index or sequential scan). Taken from Jarle Søberg
29. Stream Binary Join: R ⋈ S ⋈ T. Join order (R ⋈ S) ⋈ T. But if the data is pushed by the sources, blocking or dropping some tuples is inevitable! Taken from Jarle Søberg
30. Stream Binary Join: R ⋈ S ⋈ T. On-the-fly optimization is necessary: streams change often, and reoptimization takes a lot of time. Not dynamic enough! Taken from Jarle Søberg
31. Stream Binary Join: Eddies. Using an eddy (the initial approach of Telegraph) gives tuple-based adaptivity and takes dynamic changes in the stream into account. Taken from Jarle Søberg
32. Eddies: Scheduling Join Problem. Scheduling on the selectivity of the joins alone does not work. Example: |S ⋈ E| << |E ⋈ C|; E and C arrive early, S is delayed. Taken from Amol Deshpande
33. [Diagram: with |S ⋈ E| << |E ⋈ C|, E and C arriving early and S delayed, the eddy decides to route E to E ⋈ C, filling hash tables on E.Course and C.Course. The prefix S0 of S sent and received so far suggests that S ⋈ E is the better option, and the eddy eventually learns the correct sizes, but too late: the state S0 ⋈ E and (S − S0) ⋈ E is already committed.] Taken from Amol Deshpande
34. [Diagram: state got embedded in the join hash tables as a result of the earlier routing decisions, so despite |S ⋈ E| << |E ⋈ C|, the execution plan actually used joins E with C first: the query is executed using the worse plan. Too late!] Taken from Amol Deshpande
35. STAIR: Storage, Transformation and Access for Intermediate Results. Exposes the internal state of the join to the eddy and provides primitive functions for state management: demotion and promotion, in addition to the usual operations of insertion (build) and lookup (probe). [Diagram: the eddy connects S, E and C to the S.Name, E.Name, E.Course and C.Course STAIRs; a tuple s1 is built into the S.Name STAIR and probes the E.Name STAIR.] Taken from Amol Deshpande
36. STAIR: Demotion. Demoting e2c1 to e2 is a simple projection: the intermediate tuple e2c1 in a STAIR is replaced by its base component e2. Demotion can be thought of as undoing work. [Diagram: hash tables on the S.Name, E.Name, E.Course and C.Course STAIRs holding e1, s1e1, e2c1 and c1.] Adapted from Amol Deshpande
37. STAIR: Promotion. A join is used to promote a tuple: e.g. e1 in the E.Course STAIR is promoted to e1c1 using the C.Course STAIR. Promotion can be thought of as precomputation of work. [Diagram: hash tables holding e1c1, e2c1, s1e1 and c1.] Adapted from Amol Deshpande
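The two STAIR primitives can be sketched on a toy state (this is only an illustration of the demote/undo and promote/precompute idea: real STAIRs are hash indexes and the names e2, c1, e2c1 mirror the slides):

```python
def demote(stair, joined, base):
    """Demotion: replace an intermediate tuple by its base component
    (a simple projection). Can be thought of as undoing work."""
    stair.remove(joined)
    stair.append(base)

def promote(stair, base, other_stair, join):
    """Promotion: eagerly join a base tuple against another STAIR's
    state and store the results instead. Precomputation of work."""
    stair.remove(base)
    matches = [join(base, o) for o in other_stair]
    stair.extend(m for m in matches if m is not None)

# Toy state: the E.Course STAIR holds the intermediate tuple e2c1.
e_course = ["e2c1"]
demote(e_course, "e2c1", "e2")       # undo: keep only the E component
after_demote = list(e_course)        # -> ["e2"]

c_course = ["c1"]                    # C.Course STAIR state
promote(e_course, "e2", c_course,    # redo the work via E x C
        lambda e, c: e + c)          # toy join: concatenate names
```

Because demotion followed by promotion reconstructs exactly the tuples a join would have produced, the eddy can move state between plans mid-query without losing or duplicating results (Theorem 3.1 below).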
39. Demotion OR Promotion. Taken from: Lifting the Burden of History from Adaptive Query Processing, Amol Deshpande and Joseph M. Hellerstein
40. Demotion AND Promotion. Taken from: Lifting the Burden of History from Adaptive Query Processing, Amol Deshpande and Joseph M. Hellerstein
41. [Diagram: |S ⋈ E| << |E ⋈ C|; E and C arrive early, S is delayed. The eddy first decides to route E to E ⋈ C, building S0 ⋈ E and E ⋈ C state; once it learns the correct selectivities, it decides to migrate E by promoting E using E ⋈ C.] Adapted from Amol Deshpande
42. [Diagram: the remaining tuples S − S0 are then processed against the migrated state, producing (S − S0) ⋈ E ⋈ C.] Adapted from Amol Deshpande
43. [Diagram: the partial results S0 ⋈ E ⋈ C and (S − S0) ⋈ E ⋈ C are combined by a UNION: most of the data is processed using the correct plan.] Adapted from Amol Deshpande
44. STAIRs: Correctness. Theorem [3.1]: an eddy with STAIRs always produces the correct query result in spite of arbitrary applications of the promotion and demotion operations. STAIRs will produce every result tuple, and there will be no spurious duplicates. Taken from: Lifting the Burden of History from Adaptive Query Processing, Amol Deshpande and Joseph M. Hellerstein
45. State Module (SteM). A kind of temporary data repository: a half-join operator that keeps homogeneous tuples. The state inside the operator is independent of routing decisions. It supports the operations: insertion (build), lookup (probe), and deletion (eviction), the latter useful for windows. Similar to an MJoin but more adaptive. Allows sharing state among other continuous queries; but since intermediate results are not stored, the computation cost increases significantly.
46. Eddies Join with SteMs. More adaptivity: the eddy knows about the half-joins. Different access methods: index access, scan access. Can simulate several kinds of join. Under overload? Hash join (fast!). Under memory limits? Index join. Join family? Hash join (equi-join), B-tree join (<, <=, >). A parametric query can be thought of as a join. Adapted from Jarle Søberg
47. SteMs: Correctness. When R and S probe each other there is a correctness problem: possible duplicates! Solution: a globally unique sequence number per tuple; only the younger tuple may probe. Taken from Jarle Søberg
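The build/probe/evict operations and the younger-probes-only rule can be sketched together (a toy model: tuples are dicts, the hash index is a defaultdict, and the `seq`/`a` field names are made up for the example):

```python
from collections import defaultdict

class SteM:
    """Half-join state module: stores homogeneous tuples, supports
    build, probe, and eviction. Each tuple carries a globally unique
    sequence number; a probe only matches OLDER stored tuples, so no
    join result can be produced twice from both sides."""
    def __init__(self, key):
        self.key = key
        self.table = defaultdict(list)   # hash index on the join key

    def build(self, tup):
        self.table[tup[self.key]].append(tup)

    def probe(self, tup):
        return [s for s in self.table.get(tup[self.key], [])
                if s["seq"] < tup["seq"]]   # only the younger tuple probes

    def evict(self, oldest_seq):
        for k in list(self.table):          # window slid: drop old tuples
            self.table[k] = [s for s in self.table[k]
                             if s["seq"] >= oldest_seq]

# R join S on attribute a: every arriving tuple is built into its own
# SteM, then probes the other side's SteM.
r, s = SteM("a"), SteM("a")
r.build({"a": 1, "seq": 0})
s.build({"a": 1, "seq": 1})
hits_s = r.probe({"a": 1, "seq": 1})   # S tuple (younger) probes R
r.build({"a": 1, "seq": 2})
hits_r = s.probe({"a": 1, "seq": 2})   # next R tuple probes S
r.evict(2)                             # window slides past seq 0 and 1
```

Each matching pair is reported exactly once, by whichever tuple arrived later; eviction removes only the expired tuples instead of rebuilding the whole hash table, which is the point of the sliding-window slide below.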
48. SteMs: Sliding Window.
SELECT * FROM Str [RANGE BY '5 seconds' SLIDE BY '2 seconds'], Relation
WHERE Relation.a = Str.a
The SteM keeps the state for the sliding window (eviction). At time 40, what will happen at time 42? Instead of rebuilding the whole hash table, it only removes the old tuples and adds the new ones. [Diagram: tuples A, B, C arriving with timestamps 18:49:36 to 18:49:41; when the window slides, the tuples with timestamp 18:49:37 and older are evicted.]
49. Binary Join, STAIR, SteM: Comparison.
SELECT * FROM customer c, orders o, lineitem l
WHERE c.custkey = o.custkey
  AND o.orderkey = l.orderkey
  AND c.nationkey = 1
  AND c.acctbal > 9000
  AND l.shipdate > date '1996-01-01'
lineitem arrives with ascending shipdate; initial routing (O ⋈ L) ⋈ C. The binary join needs recomputation and is not adaptive. Taken from: Lifting the Burden of History from Adaptive Query Processing, Amol Deshpande and Joseph M. Hellerstein
50. Eddies: Routing Policy. How to choose the best plan? Through routing: every tuple is routed individually, so the routing policy determines the system's efficiency. The eddy has a tuple buffer with priorities: tuples enter with low priority and gain a higher priority when they exit an operator, so each tuple is sent to the output as early as possible. This avoids congesting system memory and keeps memory consumption low.
52. Eddies' Routing Policy: Lottery Scheduling. Waldspurger & Weihl, 1994: an algorithm for scheduling shared resources that can "rapidly focus available resources". Every operator begins with N tickets. An operator receives a ticket when it takes a tuple: this promotes operators that consume tuples quickly. An operator loses a ticket when it returns a tuple: this promotes operators with low selectivity (low: operators that return few tuples after processing many). When two operators compete for a tuple, the tuple is assigned to the operator that wins the lottery. Never leave an operator without tickets, and add some random exploration.
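The ticket bookkeeping above can be sketched as follows (a minimal model: operator names, the starting ticket count of 1, and the seeded generator are assumptions for the example; the real policy also mixes in random exploration beyond what weighted lotteries already provide):

```python
import random

class LotteryRouter:
    """Lottery-scheduling routing policy: an operator gains a ticket
    when it takes a tuple and loses one when it returns a tuple, so
    fast, low-selectivity operators win more lotteries."""
    def __init__(self, operators, rng=None):
        self.tickets = {op: 1 for op in operators}  # everyone starts equal
        self.rng = rng or random.Random(0)

    def choose(self, eligible):
        # hold a lottery among the operators competing for this tuple,
        # weighted by their current ticket counts
        pool = [op for op in eligible for _ in range(self.tickets[op])]
        return self.rng.choice(pool)

    def on_take(self, op):
        self.tickets[op] += 1        # rewards operators that consume fast

    def on_return(self, op):
        if self.tickets[op] > 1:     # never leave an operator ticketless
            self.tickets[op] -= 1    # penalizes high-selectivity operators

router = LotteryRouter(["s1", "s2"])
for _ in range(10):                  # s1 consumes ten tuples...
    router.on_take("s1")
router.on_return("s1")               # ...and returns only one
```

After this history s1 holds 10 tickets to s2's 1, so s1 wins roughly 10 of every 11 lotteries: the eddy has "rapidly focused" its tuples on the cheaper, more selective operator.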
53. Eddies' Routing Policy: Lottery Scheduling. Lottery scheduling performs better than back-pressure. [Graph: cost(s1) = cost(s2), sel(s2) = 50%, sel(s1) varies.] Taken from: Avnur, Hellerstein
57. Other Works. Distributed Eddies. Freddies: DHT-Based Adaptive Query Processing via Federated Eddies. Content-Based Routing. Partial Results for Online Query Processing. Flux: An Adaptive Partitioning Operator for Continuous Query Systems. Java Support for Data-Intensive Systems: Experiences Building the Telegraph Dataflow System. Ripple Join for Online Aggregation. Highly Available, Fault-Tolerant, Parallel Dataflows.
58. Bibliography. TelegraphCQ: An Architectural Status Report; Continuous Dataflow Processing for an Uncertain World; Enabling Real-Time Querying of Live and Historical Stream Data; Declarative Network Monitoring with an Underprovisioned Query Processor; Lifting the Burden of History from Adaptive Query Processing [STAIRs]; Eddies: Continuously Adaptive Query Processing; Using State Modules for Adaptive Query Processing. And others: http://telegraph.cs.berkeley.edu/papers.html. Telegraph team @ UC Berkeley: Mike Franklin, Joe Hellerstein, Bryan Fulton, Sirish Chandrasekaran, Amol Deshpande, Ryan Huebsch, Edwin Mach, Garrett Jacobson, Sailesh Krishnamurthy, Boon Thau Loo, Nick Lanham, Sam Madden, Fred Reiss, Mehul Shah, Eric Shan, Kyle Stanek, Owen Cooper, David Culler, Lisa Hellerstein, Wei Hong, Scott Shenker, Torsten Suel, Ion Stoica, Doug Tygar, Hal Varian, Ron Avnur, David Yu Chen, Mohan Lakhamraju, Vijayshankar Raman. Lottery Scheduling: Flexible Proportional-Share Resource Management, Carl A. Waldspurger & William E. Weihl @ MIT
Editor's notes
Open-source DBMS, PostgreSQL, taken as the starting point for implementing TelegraphCQ. Developed at Berkeley University. Written in C/C++. OpenBSD license. Based on the PostgreSQL code. Current version: 2.1 on PostgreSQL 7.3.2. Project closed in 2006. Important points of interest and features. Software: http://telegraph.cs.berkeley.edu. Papers: http://db.cs.berkeley.edu/telegraph. Commercial spinoff Truviso.
Non-blocking versions of the classical operators (sel, proj). Eddies: decide the routing tuple by tuple. Flux: routes tuples among the machines of a cluster to support parallelism, load balancing, and fault tolerance.
Sources must identify themselves before sending data. Wrapper: user-defined functions describing how the data must be processed, inside the Wrapper ClearingHouse process. Push sources: initiate a connection to TelegraphCQ. Pull sources: the wrapper initiates the connection; a pull wrapper might, for example, connect to a mail server and check the mail every minute. Data from different wrappers can merge into the same stream. Heartbeat: a punctuated tuple without data, only a timestamp. When a punctuated tuple arrives, no earlier data will follow.
The arriving data may be only fragments of tuples, so buffering is necessary; a call is not guaranteed to yield a tuple. WrapperState lets the user functions communicate with the WCH. If fewer fields arrive, the missing ones default to NULL; if too many arrive, they are truncated.
A kind of temporary data repository. A half-join operator that stores homogeneous tuples. Its state is independent of previous routing decisions (since it does not store intermediate tuples). It supports the operations: insertion (build), lookup (probe), deletion (eviction), useful for windows. Similar to MJoins but more adaptive. Similar to the simple routing policy for selection-only queries. State sharing among other continuous queries. Does not store intermediate results. Increased computation cost.
The opportunity to improve: optimizers pick a single plan for a query; however, different subsets of the data may have very different statistical properties, so it may be more efficient to use different plans for different subsets of the data.