CS 542 Database Management Systems Query Execution J Singh  March 21, 2011
This meeting Data Models for NoSQL Databases Preliminaries What are we shooting for? Reference Material for Benchmarks posted in blog Some slides from TPC-C SIGMOD ‘97 Presentation Query Execution Sort: Chapter 15 Join: Sections 16.1 – 16.4
Data Models for NoSQL Databases Class Discussion at Next Meeting. How would you represent many-to-many relationships? Also many-to-one and one-to-one. Cassandra: Brian Card. MongoDB: Annies Ductan. Redis: Jonathan Glumac. Google App Engine: Sahel Mastoureshgh. Amazon SimpleDB: Zahid Mian. CouchDB: Robert Van Reenen. 3-minute presentation (on 3/21) for 20 bonus points
What are we shooting for? Good benchmarks: define the playing field, set the performance agenda, measure release-to-release progress, set goals (e.g., 10,000 tpmC, < 50 $/tpmC), something managers can understand (!). Benchmark abuse: benchmarketing; benchmark wars (more $ on ads than development). To keep abuses to a minimum, benchmarks are defined with precision and read like legal documents (example). Some companies include specific prohibitions against publishing benchmark results in their license agreements
Benchmarks have a Lifetime Good benchmarks drive industry and technology forward. At some point, all reasonable advances have been made, and benchmarks can become counterproductive by encouraging artificial optimizations. So, even good benchmarks become obsolete over time.
Database Benchmarks Relational Database (OLTP) Benchmarks TPC = Transaction Processing Performance Council De facto industry standards body for OLTP performance Most TPC specs, info, results are on the web page: http://www.tpc.org TPC-C has been the workhorse of the industry, more in a minute TPC-E is more comprehensive Different problem spaces require different benchmarks Other benchmarks for analytics / decision support systems Two papers referenced on the course website on NoSQL / MapReduce Benchmarks define the problem set, not the technology E.g., if managing documents, create and use a document management benchmark, not one that was created to show off the capabilities of your DB.
TPC-C’s Five Transactions Workload Definition Transactions operate against a database of nine tables Transactions: New-order: enter a new order from a customer Payment: update customer balance to reflect a payment Delivery: deliver orders (done as a batch transaction) Order-status: retrieve status of customer’s most recent order Stock-level: monitor warehouse inventory Specifies size of each table Specifies # of users and workflow (next slide) Specifies configuration requirements (must be ACID, failure tolerant, distributed, …) Response time requirement: 90% of each type of transaction must have a response time <= 5 seconds, except stock-level which is <= 20 seconds. Result: How many TPC-C transactions can be supported? What is the $/tpm cost?
TPC-C Workflow 1. Select txn from menu: New-Order 45%, Payment 43%, Order-Status 4%, Delivery 4%, Stock-Level 4%. 2. Measure menu Response Time; input screen; keying time. 3. Measure txn Response Time; output screen; think time; go back to 1. Cycle Time Decomposition (typical values, in seconds, for the weighted-average txn): Menu = 0.3, Keying = 9.6, Txn RT = 2.1, Think = 11.4; average cycle time = 23.4.
TPC-C Results (by DBMS, as of 5/9/97) Stating the obvious… These results are not a comparison of databases They are a comparison of databases for the specific problem specified by the TPC-C benchmark Ensuring a level playing field is essential when defining a benchmark and conducting measurements Witness the Pavlo/Dean debate
Benchmarks for Other Databases Class Discussion at Next Meeting. What benchmarks are appropriate for Key-value stores? Document databases? Network databases? Geospatial databases? Genomic databases? Time series databases? Other? General discussion, no bonus points Please let me know if I may call on you, and for which topic.
Overview of Query Execution
An example to work with But first we must revisit Relational Algebra… Database: City, Country, CountryLanguage. Example query: all cities in Finland with a population at least double that of Aruba SELECT [xyz] FROM City, Country WHERE City.CountryCode = 'fin' AND Country.Code = 'abw' AND City.population > 2*Country.population;
Relational Operators Selection basics: idempotent, commutative. Selection conjunctions: useful when pruning. Selection disjunctions: equivalent to UNIONs.
Selection and Cross Product When a Selection follows a Cross Product, σ_A(R × S), break A into three conditions such that A = A_r ⋀ A_s ⋀ A_rs, where A_r mentions only attributes of R, A_s mentions only attributes of S, and A_rs mentions attributes of both R and S. Then the following holds: σ_A(R × S) = σ_{A_r ⋀ A_s ⋀ A_rs}(R × S) = σ_{A_rs}(σ_{A_r}(R) × σ_{A_s}(S)). In case you forgot: R ⋈_A S = σ_A(R × S). This result helps us compute theta-joins! Review Chapter 2 of the textbook for more; back to the example…
An example to work with Database: City, Country, CountryLanguage. Example query: all cities in Finland with a population at least double that of Aruba SELECT [xyz] FROM City, Country WHERE City.CountryCode = 'fin' AND Country.Code = 'abw' AND City.population > 2*Country.population; Algebra Representation: π_xyz(σ_{T.cc = 'fin' ⋀ Y.cc = 'abw' ⋀ T.pop > 2*Y.pop}(T × Y)), or continued…
Example: Algebra Manipulation Algebra Representation: π_xyz(σ_{T.cc = 'fin' ⋀ Y.cc = 'abw' ⋀ T.pop > 2*Y.pop}(T × Y)), or π_xyz(σ_{T.pop > 2*Y.pop}(σ_{T.cc = 'fin'}(T) × σ_{Y.cc = 'abw'}(Y))) Graphical Representation of Plan
Visualizing Plan Execution The plan is a set of ‘operators’ The operators operate in parallel On different machines? On different processors? In different processes? In different threads? Yes, depends on the architecture. Each operator feeds its output to the next operator The “parallel operators” visualization allows for pipelining The output of one operator is the input to the next An operator can block if its inputs are not ready Design goal is for the operators to pipeline (if possible) Would like to start operating with partial data Takes advantage of as much parallelism as the problem allows
Common Elements Key metrics of each component: How much RAM does it consume? How much Disk I/O does it require? Each component is implemented as an Iterator: a base class for each operator with three methods. Open(): may block if the input is not ready (unable to proceed until all required data has been received). GetNext(): returns the next tuple; may block if the next tuple is not ready; returns NotFound when exhausted. Close(): performs any cleanup and terminates.
Example: Table-scan operator
Open():
  pass
GetNext():
  for b in blocks:
    for t in tuples of b:
      if t is valid: return t
  return NotFound
Close():
  pass
Key Metrics: RAM: 1 block. Disk I/O: number of blocks. Notes: represents the operands T (= City) and Y (= Country); used only if appropriate indexes don’t exist; can use prefetching (not shown here).
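To make the Open()/GetNext()/Close() protocol concrete, here is a minimal Python sketch of an iterator base class and a table-scan operator built on it. The Operator class, the NOT_FOUND sentinel, and the blocks-as-lists-of-tuples representation are illustrative assumptions, not a real DBMS API.

    NOT_FOUND = object()  # sentinel returned when an operator is exhausted

    class Operator:
        def open(self): raise NotImplementedError      # may block until inputs are ready
        def get_next(self): raise NotImplementedError  # next tuple, or NOT_FOUND
        def close(self): raise NotImplementedError     # cleanup

    class TableScan(Operator):
        # RAM: one block at a time; Disk I/O: one read per block of the table.
        def __init__(self, blocks, valid=lambda t: True):
            self.blocks, self.valid = blocks, valid
        def open(self):
            self._it = (t for b in self.blocks for t in b if self.valid(t))
        def get_next(self):
            return next(self._it, NOT_FOUND)
        def close(self):
            self._it = None

A consumer drives it the same way a parent operator would: call open(), loop on get_next() until it returns NOT_FOUND, then call close().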
Summary so far Benchmarks are critical for defining performance goals of the database TPC-C is a widely-used benchmark, TPC-E is broader in scope but less widespread Need to choose benchmarks to fit the problem at hand A query can be parsed into primitives for execution Parallelism & pipelining are essential for performance
CS-542 Database Management Systems Query Execution Algorithms
One-pass Algorithms Lend themselves nicely to pipelining (with minimum blocking) Good for Table-scans (as seen) Tuple-at-a-time operations (selection and projection) Full-relation binary operations (∪, ∩, −, ⋈, ×) as long as one of the operands can fit in memory Considering JOIN next, read others from book
Example: JOIN(R,S)
Open():
  read S into memory
GetNext():
  for b in blocks of R:
    for t in tuples of b:
      for each tuple s of S that matches t:
        return join(t,s)
  return NotFound
Close():
  pass
Key Metrics: RAM: Blocks(S) + 1 block. Disk I/O: Blocks(R) + Blocks(S). Notes: can use prefetching for R (not shown here).
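As a sketch: assuming S fits in memory, the one-pass join can build an in-memory hash on S's join key and stream R past it. The (key, payload) tuple shape and the key-extractor arguments are assumptions for illustration.

    from collections import defaultdict

    def one_pass_join(blocks_of_r, s_tuples, r_key, s_key):
        index = defaultdict(list)          # in-memory hash of S on the join key
        for s in s_tuples:
            index[s_key(s)].append(s)
        for b in blocks_of_r:              # one block of R in memory at a time
            for t in b:
                for s in index.get(r_key(t), []):
                    yield t + s            # join(t, s): concatenate the tuples

For example, list(one_pass_join([[(1, 'a')]], [(1, 'x')], r_key=lambda t: t[0], s_key=lambda s: s[0])) yields [(1, 'a', 1, 'x')].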
Nested-Loop Joins What if all of S won’t fit into memory? We can do it chunk-by-chunk, where a ‘chunk’ is as many blocks of S as will fit. Algorithm sketch, with a runnable version below: GetNext(): for c in chunks of S: for b in blocks of R: for t in tuples of b: for s in tuples of c: if t matches s: return join(t,s); return NotFound. Key Metrics RAM: M Disk I/O: Blocks(S) + k * Blocks(R), where k is the number of chunks of S, roughly Blocks(S)/(M − 1). Note how quickly performance deteriorates! We can do better
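A minimal Python sketch of the chunked (block) nested-loop join, with chunk_size standing in for the M − 1 memory blocks available to hold S; all names are illustrative.

    def nested_loop_join(blocks_of_r, blocks_of_s, match, chunk_size):
        for i in range(0, len(blocks_of_s), chunk_size):
            # load one chunk of S into memory
            chunk = [s for b in blocks_of_s[i:i + chunk_size] for s in b]
            for b in blocks_of_r:          # rescan all of R once per chunk
                for t in b:
                    for s in chunk:
                        if match(t, s):
                            yield t + s

The I/O pattern matches the metric above: S is read once, and R is read once per chunk of S.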
Two-pass algorithms Sort-based two-pass algorithms The first pass does a sort on some parameter(s) of each operand The second pass algorithm relies on the sort results and can be pipelined Hash-based two-pass algorithms Do a prep-pass and write the result back to disk Compute the result in the second pass
Two-pass idea: sort example For each of C chunks of M blocks: sort the chunk and write it back. (In the example, we have 4 chunks of 6 blocks each.) Then merge the result. Key Metrics For the first pass: RAM: M, Disk I/O: 2 * Blocks(R). For the 2nd pass: RAM: C, Disk I/O: Blocks(R).
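A compact sketch of the idea, with lists standing in for disk blocks and heapq.merge playing the role of the second (merge) pass:

    import heapq

    def two_pass_sort(blocks, chunk_size):
        runs = []
        for i in range(0, len(blocks), chunk_size):   # pass 1: sort each chunk
            chunk = [t for b in blocks[i:i + chunk_size] for t in b]
            runs.append(sorted(chunk))                # stands in for writing a sorted run back to disk
        return heapq.merge(*runs)                     # pass 2: merge the sorted runs

For example, list(two_pass_sort([[3, 1], [4, 1], [5, 9]], chunk_size=2)) returns [1, 1, 3, 4, 5, 9].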
Naïve two-pass JOIN Sort R and S on the common attributes of the JOIN Merge the sorted R and S on the common attributes See section 15.4.9 of book for more details Also known as Sort-Join Key Metrics Sort RAM: M Disk I/O:          4 * (Blocks(R) + Blocks(S)) 4, not 3 because we wrote the sort results back Join RAM: 2 Disk I/O:          (Blocks(R) + Blocks(S)) Total Operation RAM: M Disk I/O:          5 * (Blocks(R) + Blocks(S))
Efficient two-pass JOIN Main idea: combine pass 2 of the sort with the join. Key Metrics Sort (pass 1 only) RAM: M Disk I/O: 2 * (Blocks(R) + Blocks(S)) Join RAM: 2 Disk I/O: Blocks(R) + Blocks(S) (just reading the sorted runs back; nothing additional is written) Total Operation RAM: M Disk I/O: 3 * (Blocks(R) + Blocks(S))
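A hedged sketch of the merging join on already-sorted inputs (pass 2 of the sort combined with the join); it assumes single-attribute join keys and in-memory lists standing in for sorted runs.

    def merge_join(r_sorted, s_sorted, r_key, s_key):
        i = j = 0
        while i < len(r_sorted) and j < len(s_sorted):
            kr, ks = r_key(r_sorted[i]), s_key(s_sorted[j])
            if kr < ks:
                i += 1
            elif kr > ks:
                j += 1
            else:
                j0 = j                  # remember the start of the equal-key group in S
                while j < len(s_sorted) and s_key(s_sorted[j]) == kr:
                    j += 1
                while i < len(r_sorted) and r_key(r_sorted[i]) == kr:
                    for s in s_sorted[j0:j]:
                        yield r_sorted[i] + s   # emit the cross product of the two equal-key groups
                    i += 1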
Hash Join Main Idea: Pass 1: divide the tuples of R and S into m hash buckets: read a block of R (or S); for each tuple in that block, compute its hash i and move it to hash bucket i; keep one block for each hash bucket in memory, writing it out to disk when full. Pass 2: for each i, read buckets Ri and Si and do their join. Key Metrics RAM: M Disk I/O: 3 * (Blocks(R) + Blocks(S)) Disk I/O can be less if we hash the bigger relation first and expect that many of the buckets will still be in memory
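A minimal sketch of the two passes, with lists standing in for the on-disk bucket files; the bucket count m and the key extractors are illustrative assumptions.

    def hash_join(r_tuples, s_tuples, r_key, s_key, m=4):
        r_buckets = [[] for _ in range(m)]
        s_buckets = [[] for _ in range(m)]
        for t in r_tuples:                            # pass 1: partition R
            r_buckets[hash(r_key(t)) % m].append(t)
        for s in s_tuples:                            # pass 1: partition S
            s_buckets[hash(s_key(s)) % m].append(s)
        for i in range(m):                            # pass 2: join matching bucket pairs
            index = {}
            for s in s_buckets[i]:
                index.setdefault(s_key(s), []).append(s)
            for t in r_buckets[i]:
                for s in index.get(r_key(t), []):
                    yield t + s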
Index-based Algorithms Refresher course on indexes and clustering The basic idea: Use the index to locate records and thus cut down on I/O
Index-based Selection Consider the selection σ_{T.cc = 'fin'}(T). If the relation T has a clustering index on cc: all matching tuples will be contiguous; Disk I/O: Blocks(T)/V(T,cc), where V(T,cc) is the number of distinct values of cc (sort of; this assumes the values are evenly distributed). If the relation T does not have a clustering index on cc: matching tuples could be scattered, roughly one block each; Disk I/O: Tuples(T)/V(T,cc). Big difference!
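A quick numeric sketch of that difference, using made-up statistics (Blocks(T) = 10,000, Tuples(T) = 200,000, V(T,cc) = 200):

    blocks_t, tuples_t, v_cc = 10_000, 200_000, 200
    print(blocks_t / v_cc)    # clustering index: about 50 blocks read
    print(tuples_t / v_cc)    # no clustering index: about 1,000 blocks read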
Index-based JOIN Consider the JOIN R(X,Y) ⋈ S(Y,Z), where Y is the common set of attributes of R and S. If, say, R has an index on Y: same as a two-pass JOIN except that we don’t have to first sort/hash R. If it is a clustering index, Disk I/O: Blocks(R)/V(R,Y) + 3 * Blocks(S); otherwise, Tuples(R)/V(R,Y) + 3 * Blocks(S). If both R and S are indexed, Disk I/O is reduced even further.
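A hedged sketch of the probe side of an index-based join: r_index, a dict mapping a Y value to the matching R tuples, is an assumption standing in for a real B-tree or hash index.

    def index_join(s_tuples, r_index, s_key):
        for s in s_tuples:                       # scan S once
            for t in r_index.get(s_key(s), []):  # probe R's index instead of scanning R
                yield t + s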
Summary Execution primitives for pipelining One-pass algorithms should be used wherever possible Two-pass algorithms can usually be used no matter how big the problem Indexes help and should be taken advantage of where possible
Query Optimization Based on slides from Prof. Garcia-Molina
Desired Endpoint Example Physical Query Plans for σ_{x=1 AND y=2 AND z<5}(R) and R ⋈ S ⋈ U. (Plan sketches: the selection becomes IndexScan(R, y=2) feeding Filter(x=1 AND z<5), materialized; the three-way join becomes TableScan(R), TableScan(S), and TableScan(U) combined by two two-pass hash-joins with 101 buffers.)
Outline Convert SQL query to a parse tree Semantic checking: attributes, relation names, types Convert to a logical query plan (relational algebra expression) deal with subqueries Improve the logical query plan use algebraic transformations group together certain operators evaluate logical plan based on estimated size of relations  Convert to a physical query plan search the space of physical plans  choose order of operations complete the physical query plan
Improving the Logical Query Plan There are numerous algebraic laws concerning relational algebra operations By applying them to a logical query plan judiciously, we can get an equivalent query plan that can be executed more efficiently Next we'll survey some of these laws
Relational Operators (revisited) Selection basics: idempotent, commutative. Selection conjunctions: useful when pruning. Selection disjunctions: equivalent to UNIONs.
Laws Involving Selection Selections usually reduce the size of the relation Usually good to do selections early,  i.e., "push them down the tree" Also can be helpful to break up a complex selection into parts
Selection and Binary Operators Must push selection to both arguments: σ_C(R ∪ S) = σ_C(R) ∪ σ_C(S). Must push to the first argument, optional for the 2nd: σ_C(R − S) = σ_C(R) − S, or σ_C(R − S) = σ_C(R) − σ_C(S). Push to at least one argument having all attributes mentioned in C: product, natural join, theta join, intersection. E.g., σ_C(R × S) = σ_C(R) × S, if R has all the attributes in C.
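A quick sanity check of the first two laws, with Python sets standing in for relations; sel plays the role of σ_C with C = "first field > 1" (all names illustrative):

    sel = lambda rel: {t for t in rel if t[0] > 1}   # σ_C
    R = {(1, 'x'), (2, 'y')}
    S = {(2, 'y'), (3, 'z')}
    assert sel(R | S) == sel(R) | sel(S)   # union: push to both arguments
    assert sel(R - S) == sel(R) - S        # difference: push to the first argument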
Pushing Selection Up the Tree Suppose we have relations StarsIn(title,year,starName) Movie(title,year,len,inColor,studioName) and a view CREATE VIEW MoviesOf1996 AS 			SELECT * 			FROM Movie 			WHERE year = 1996; and the query SELECT starName, studioName 	FROM MoviesOf1996 NATURAL JOIN StarsIn;
The Straightforward Tree Remember the rule σ_C(R ⋈ S) = σ_C(R) ⋈ S? (Tree: π_{starName,studioName} over the join of σ_{year=1996}(Movie) with StarsIn.)
The Improved Logical Query Plan Starting from π_{starName,studioName}(σ_{year=1996}(Movie) ⋈ StarsIn), push the selection up the tree to get π_{starName,studioName}(σ_{year=1996}(Movie ⋈ StarsIn)), then push it back down both branches to get π_{starName,studioName}(σ_{year=1996}(Movie) ⋈ σ_{year=1996}(StarsIn)).
Laws Involving Projections Adding a projection lower in the tree can improve performance, since often tuple size is reduced Usually not as helpful as pushing selections down Consult textbook for details, will not be on the exam
Joins and Products Recall from the definitions of relational algebra: R ⋈_C S = σ_C(R × S) (theta join); for a natural join, C equates the same-name attributes in R and S. To improve a logical query plan, replace a product followed by a selection with a join. Join algorithms are usually faster than doing a product followed by a selection.
Summary of LQP Improvements Selections: push down tree as far as possible if condition is an AND, split and push separately sometimes need to push up before pushing down Projections: can be pushed down (sometimes, read book) Selection/product combinations: can sometimes be replaced with join
Outline Convert SQL query to a parse tree Semantic checking: attributes, relation names, types Convert to a logical query plan (relational algebra expression) deal with subqueries Improve the logical query plan use algebraic transformations group together certain operators evaluate logical plan based on estimated size of relations  Convert to a physical query plan search the space of physical plans  choose order of operations complete the physical query plan
Grouping Assoc/Comm Operators Group together adjacent joins, adjacent unions, and adjacent intersections as siblings in the tree Sets up the logical QP for future optimization when the physical QP is constructed: determine the best order for doing a sequence of joins (or unions or intersections). (Tree diagrams: cascades of binary operators, e.g. over D, E, F and over A, B, C, regrouped into single multiway nodes.)
Evaluating Logical Query Plans The transformations discussed so far intuitively seem like good ideas But how can we evaluate them more scientifically? Estimate size of relations, also helpful in evaluating physical query plans Coming up next…
CS-542 Database Management Systems Plan Estimation, based on slides from Prof. Garcia-Molina
Estimating Sizes of Relations Used in two places: to help decide between competing logical query plans, and to help decide between competing physical query plans. Notation review: T(R): number of tuples in relation R. B(R): minimum number of blocks needed to store R (so far, we’ve spelled it out as Blocks(R)). V(R,a): number of distinct values in R of attribute a.
Requirements for Estimation Rules Give accurate estimates Are easy (fast) to compute Are logically consistent: the estimated size should not depend on how the relation is computed Here we describe some simple heuristics. All we really need is a scheme that properly ranks competing plans.
Estimating Size of Selection (p1) Suppose selection condition is A = c, where A is an attribute and c is a constant. A reasonable estimate of the number of tuples in the result is: T(R)/V(R,A), i.e., original number of tuples divided by number of different values of A Good approximation if values of A are evenly distributed Also good approximation in some other, common, situations (see textbook)
Estimating Size of Selection (p2) If the condition is A < c: a good estimate is T(R)/3; the intuition is that usually you ask about something that is true of less than half the tuples. If the condition is A ≠ c: a good estimate is T(R). If the condition is the AND of several equalities and inequalities, estimate in series (multiply the individual selectivities).
Example Consider relation R(a,b,c) with 10,000 tuples and 50 different values for attribute a. Consider selecting all tuples from R with a = 10 and b < 20. Estimate of number of resulting tuples is  10,000*(1/50)*(1/3) = 67.
Estimating Size of Selection (p3) If the condition has the form C1 OR C2, use: the sum of the estimates for C1 and C2, unless that sum exceeds T(R), in which case use T(R); or, assuming C1 and C2 are independent, T(R)*(1 − (1 − f1)*(1 − f2)), where f1 is the fraction of R satisfying C1 and f2 is the fraction of R satisfying C2
Example Consider relation R(a,b) with 10,000 tuples and 50 different values for a. Consider selecting all tuples from R with a = 10 or b < 20. Estimate for a = 10 is 10,000/50 = 200 Estimate for b < 20 is 10,000/3 = 3333 Estimate for the combined condition is 200 + 3333 = 3533, or 10,000*(1 − (1 − 1/50)*(1 − 1/3)) = 3466. Different, but not materially.
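The selection heuristics above fit in a few lines; this sketch reproduces both worked examples (T(R) = 10,000, V(R,a) = 50; all names are illustrative):

    def est_eq(n, v): return n / v               # A = c: T(R)/V(R,A)
    def est_lt(n): return n / 3                  # A < c: T(R)/3
    def est_or(n, f1, f2): return n * (1 - (1 - f1) * (1 - f2))

    n, v_a = 10_000, 50
    print(n * (1 / v_a) * (1 / 3))               # a = 10 AND b < 20: 66.7, the slide rounds to 67
    print(min(n, est_eq(n, v_a) + est_lt(n)))    # sum rule: 3533.3
    print(est_or(n, 1 / v_a, 1 / 3))             # independence rule: 3466.7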
Estimating Size of Natural Join Assume join is on a single attribute Y. Some possibilities: R and S have disjoint sets of Y values, so size of join is 0 Y is the key of S and a foreign key of R, so size of join is T(R) All the tuples of R and S have the same Y value, so size of join is T(R)*T(S) We need some assumptions…
Join Estimation Rule Expected number of tuples in result is T(R)*T(S) / max(V(R,Y),V(S,Y)) Why?  Suppose V(R,Y) ≤ V(S,Y). There are T(R) tuples in R. Each of them has a 1/V(S,Y) chance of joining with a given tuple of S, creating T(S)/V(S,Y) new tuples
Example Suppose we have R(a,b) with T(R) = 1000 and V(R,b) = 20 S(b,c) with T(S) = 2000, V(S,b) = 50, and V(S,c) = 100 U(c,d) with T(U) = 5000 and V(U,c) = 500 What is the estimated size of R ⋈S ⋈U? First join R and S (on attribute b):  estimated size of result, X, is T(R)*T(S)/max(V(R,b),V(S,b)) = 40,000 number of values of c in X is the same as in S, namely 100 Then join X with U (on attribute c):  estimated size of result is T(X)*T(U)/max(V(X,c),V(U,c)) = 400,000
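The chain of estimates, as a sketch of the rule T(R)*T(S)/max(V(R,Y), V(S,Y)) applied twice:

    def est_join(t_r, t_s, v_r_y, v_s_y):
        return t_r * t_s / max(v_r_y, v_s_y)

    t_x = est_join(1000, 2000, 20, 50)         # R ⋈ S on b: 40,000
    print(t_x, est_join(t_x, 5000, 100, 500))  # then X ⋈ U on c: 400,000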
Summary of Estimation Rules Projection: exactly computable Product: exactly computable Selection: reasonable heuristics Join: reasonable heuristics The other operators are harder to estimate…
Estimating Size Parameters Estimating the size of a relation depends on knowing T(R) and the V(R,a)'s. Estimating the cost of a physical algorithm also depends on knowing B(R). How can the query compiler learn them? Scan the relation to learn T and the V's, and then calculate B. Can also keep a histogram of the values of attributes; makes estimating join results more accurate. Recomputed periodically, after some time or some number of updates, or if the DB administrator thinks the optimizer isn't choosing good plans
Heuristics to Reduce Cost of LQP For each transformation of the tree being considered, estimate the "cost" before and after doing the transformation At this point, "cost" only refers to sizes of intermediate relations (we don't yet know about number of disk I/O's) Sum of sizes of all intermediate relations is the heuristic:  if this sum is smaller after the transformation, then incorporate it
Why couldn’t we…  A few questions to explore NoSQL has also been described as NoJOIN Could we use the techniques discussed here to implement JOINs on a NoSQL database? Could we implement the parallel operators as MapReduce jobs? Suitable topics in case you have not yet chosen a project
Update on Projects Consider including benchmark results in your presentation There is no need to submit your code: it will not be evaluated. Key fragments can be included in your report, as seen in numerous papers, and do include the design of the code in your report. Pace yourself Plan to finish up your project coding in 2 weeks (by 4/4) Plan to write and perfect your report and PPT after that Budget your presentation time carefully. How is it going?
Next week Query Optimization Suggested topic? We have half-a-lecture open to cover any topics of interest to everyone

Más contenido relacionado

La actualidad más candente

Search algorithms for discrete optimization
Search algorithms for discrete optimizationSearch algorithms for discrete optimization
Search algorithms for discrete optimizationSally Salem
 
Access to non local names
Access to non local namesAccess to non local names
Access to non local namesVarsha Kumar
 
Parallel sorting algorithm
Parallel sorting algorithmParallel sorting algorithm
Parallel sorting algorithmRicha Kumari
 
Oracle Parallel Distribution and 12c Adaptive Plans
Oracle Parallel Distribution and 12c Adaptive PlansOracle Parallel Distribution and 12c Adaptive Plans
Oracle Parallel Distribution and 12c Adaptive PlansFranck Pachot
 
Hadoop combiner and partitioner
Hadoop combiner and partitionerHadoop combiner and partitioner
Hadoop combiner and partitionerSubhas Kumar Ghosh
 
Mapreduce advanced
Mapreduce advancedMapreduce advanced
Mapreduce advancedChirag Ahuja
 
Cloud schedulers and Scheduling in Hadoop
Cloud schedulers and Scheduling in HadoopCloud schedulers and Scheduling in Hadoop
Cloud schedulers and Scheduling in HadoopPallav Jha
 
Cupdf.com introduction to-data-structures-and-algorithm
Cupdf.com introduction to-data-structures-and-algorithmCupdf.com introduction to-data-structures-and-algorithm
Cupdf.com introduction to-data-structures-and-algorithmTarikuDabala1
 
Multi layered perceptron (mlp)
Multi layered perceptron (mlp)Multi layered perceptron (mlp)
Multi layered perceptron (mlp)Handson System
 
Hadoop and HBase experiences in perf log project
Hadoop and HBase experiences in perf log projectHadoop and HBase experiences in perf log project
Hadoop and HBase experiences in perf log projectMao Geng
 
Apache Flink Deep-Dive @ Hadoop Summit 2015 in San Jose, CA
Apache Flink Deep-Dive @ Hadoop Summit 2015 in San Jose, CAApache Flink Deep-Dive @ Hadoop Summit 2015 in San Jose, CA
Apache Flink Deep-Dive @ Hadoop Summit 2015 in San Jose, CARobert Metzger
 
Parallel algorithms
Parallel algorithmsParallel algorithms
Parallel algorithmsguest084d20
 
Chapter 4: Parallel Programming Languages
Chapter 4: Parallel Programming LanguagesChapter 4: Parallel Programming Languages
Chapter 4: Parallel Programming LanguagesHeman Pathak
 

La actualidad más candente (20)

Chapter 7 Run Time Environment
Chapter 7   Run Time EnvironmentChapter 7   Run Time Environment
Chapter 7 Run Time Environment
 
Hadoop job chaining
Hadoop job chainingHadoop job chaining
Hadoop job chaining
 
Matrix Multiplication Report
Matrix Multiplication ReportMatrix Multiplication Report
Matrix Multiplication Report
 
Run time
Run timeRun time
Run time
 
Search algorithms for discrete optimization
Search algorithms for discrete optimizationSearch algorithms for discrete optimization
Search algorithms for discrete optimization
 
Access to non local names
Access to non local namesAccess to non local names
Access to non local names
 
Parallel sorting algorithm
Parallel sorting algorithmParallel sorting algorithm
Parallel sorting algorithm
 
Oracle Parallel Distribution and 12c Adaptive Plans
Oracle Parallel Distribution and 12c Adaptive PlansOracle Parallel Distribution and 12c Adaptive Plans
Oracle Parallel Distribution and 12c Adaptive Plans
 
Hadoop combiner and partitioner
Hadoop combiner and partitionerHadoop combiner and partitioner
Hadoop combiner and partitioner
 
Interpreting the Data:Parallel Analysis with Sawzall
Interpreting the Data:Parallel Analysis with SawzallInterpreting the Data:Parallel Analysis with Sawzall
Interpreting the Data:Parallel Analysis with Sawzall
 
Mapreduce advanced
Mapreduce advancedMapreduce advanced
Mapreduce advanced
 
Cloud schedulers and Scheduling in Hadoop
Cloud schedulers and Scheduling in HadoopCloud schedulers and Scheduling in Hadoop
Cloud schedulers and Scheduling in Hadoop
 
Cupdf.com introduction to-data-structures-and-algorithm
Cupdf.com introduction to-data-structures-and-algorithmCupdf.com introduction to-data-structures-and-algorithm
Cupdf.com introduction to-data-structures-and-algorithm
 
Multi layered perceptron (mlp)
Multi layered perceptron (mlp)Multi layered perceptron (mlp)
Multi layered perceptron (mlp)
 
Hadoop and HBase experiences in perf log project
Hadoop and HBase experiences in perf log projectHadoop and HBase experiences in perf log project
Hadoop and HBase experiences in perf log project
 
Apache Flink Deep-Dive @ Hadoop Summit 2015 in San Jose, CA
Apache Flink Deep-Dive @ Hadoop Summit 2015 in San Jose, CAApache Flink Deep-Dive @ Hadoop Summit 2015 in San Jose, CA
Apache Flink Deep-Dive @ Hadoop Summit 2015 in San Jose, CA
 
Parallel algorithms
Parallel algorithmsParallel algorithms
Parallel algorithms
 
04 pig data operations
04 pig data operations04 pig data operations
04 pig data operations
 
Chapter 4: Parallel Programming Languages
Chapter 4: Parallel Programming LanguagesChapter 4: Parallel Programming Languages
Chapter 4: Parallel Programming Languages
 
Heap Management
Heap ManagementHeap Management
Heap Management
 

Destacado

Query Execution Time and Query Optimization.
Query Execution Time and Query Optimization.Query Execution Time and Query Optimization.
Query Execution Time and Query Optimization.Radhe Krishna Rajan
 
Understanding Query Execution
Understanding Query ExecutionUnderstanding Query Execution
Understanding Query Executionwebhostingguy
 
U-SQL Query Execution and Performance Basics (SQLBits 2016)
U-SQL Query Execution and Performance Basics (SQLBits 2016)U-SQL Query Execution and Performance Basics (SQLBits 2016)
U-SQL Query Execution and Performance Basics (SQLBits 2016)Michael Rys
 
Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversa...
Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversa...Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversa...
Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversa...Olaf Hartig
 
U-SQL Query Execution and Performance Tuning
U-SQL Query Execution and Performance TuningU-SQL Query Execution and Performance Tuning
U-SQL Query Execution and Performance TuningMichael Rys
 
"Query Execution: Expectation - Reality (Level 300)" Денис Резник
"Query Execution: Expectation - Reality (Level 300)" Денис Резник"Query Execution: Expectation - Reality (Level 300)" Денис Резник
"Query Execution: Expectation - Reality (Level 300)" Денис РезникFwdays
 

Destacado (7)

Query Execution Time and Query Optimization.
Query Execution Time and Query Optimization.Query Execution Time and Query Optimization.
Query Execution Time and Query Optimization.
 
Understanding Query Execution
Understanding Query ExecutionUnderstanding Query Execution
Understanding Query Execution
 
Query execution
Query executionQuery execution
Query execution
 
U-SQL Query Execution and Performance Basics (SQLBits 2016)
U-SQL Query Execution and Performance Basics (SQLBits 2016)U-SQL Query Execution and Performance Basics (SQLBits 2016)
U-SQL Query Execution and Performance Basics (SQLBits 2016)
 
Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversa...
Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversa...Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversa...
Zero-Knowledge Query Planning for an Iterator Implementation of Link Traversa...
 
U-SQL Query Execution and Performance Tuning
U-SQL Query Execution and Performance TuningU-SQL Query Execution and Performance Tuning
U-SQL Query Execution and Performance Tuning
 
"Query Execution: Expectation - Reality (Level 300)" Денис Резник
"Query Execution: Expectation - Reality (Level 300)" Денис Резник"Query Execution: Expectation - Reality (Level 300)" Денис Резник
"Query Execution: Expectation - Reality (Level 300)" Денис Резник
 

Similar a CS 542 -- Query Execution

Query Optimization - Brandon Latronica
Query Optimization - Brandon LatronicaQuery Optimization - Brandon Latronica
Query Optimization - Brandon Latronica"FENG "GEORGE"" YU
 
Advanced Data Science on Spark-(Reza Zadeh, Stanford)
Advanced Data Science on Spark-(Reza Zadeh, Stanford)Advanced Data Science on Spark-(Reza Zadeh, Stanford)
Advanced Data Science on Spark-(Reza Zadeh, Stanford)Spark Summit
 
Migration To Multi Core - Parallel Programming Models
Migration To Multi Core - Parallel Programming ModelsMigration To Multi Core - Parallel Programming Models
Migration To Multi Core - Parallel Programming ModelsZvi Avraham
 
Scaling PostgreSQL With GridSQL
Scaling PostgreSQL With GridSQLScaling PostgreSQL With GridSQL
Scaling PostgreSQL With GridSQLJim Mlodgenski
 
Large-scale Recommendation Systems on Just a PC
Large-scale Recommendation Systems on Just a PCLarge-scale Recommendation Systems on Just a PC
Large-scale Recommendation Systems on Just a PCAapo Kyrölä
 
Introduction to Structured Streaming
Introduction to Structured StreamingIntroduction to Structured Streaming
Introduction to Structured StreamingKnoldus Inc.
 
No more struggles with Apache Spark workloads in production
No more struggles with Apache Spark workloads in productionNo more struggles with Apache Spark workloads in production
No more struggles with Apache Spark workloads in productionChetan Khatri
 
PostgreSQL 8.4 TriLUG 2009-11-12
PostgreSQL 8.4 TriLUG 2009-11-12PostgreSQL 8.4 TriLUG 2009-11-12
PostgreSQL 8.4 TriLUG 2009-11-12Andrew Dunstan
 
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...Databricks
 
PPT on Data Science Using Python
PPT on Data Science Using PythonPPT on Data Science Using Python
PPT on Data Science Using PythonNishantKumar1179
 
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)Serban Tanasa
 
Cost-Based Optimizer in Apache Spark 2.2
Cost-Based Optimizer in Apache Spark 2.2 Cost-Based Optimizer in Apache Spark 2.2
Cost-Based Optimizer in Apache Spark 2.2 Databricks
 
Moving Towards a Streaming Architecture
Moving Towards a Streaming ArchitectureMoving Towards a Streaming Architecture
Moving Towards a Streaming ArchitectureGabriele Modena
 
January 2016 Meetup: Speeding up (big) data manipulation with data.table package
January 2016 Meetup: Speeding up (big) data manipulation with data.table packageJanuary 2016 Meetup: Speeding up (big) data manipulation with data.table package
January 2016 Meetup: Speeding up (big) data manipulation with data.table packageZurich_R_User_Group
 
Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...
Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...
Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...Databricks
 
r,rstats,r language,r packages
r,rstats,r language,r packagesr,rstats,r language,r packages
r,rstats,r language,r packagesAjay Ohri
 
Webinar: Strength in Numbers: Introduction to ClickHouse Cluster Performance
Webinar: Strength in Numbers: Introduction to ClickHouse Cluster PerformanceWebinar: Strength in Numbers: Introduction to ClickHouse Cluster Performance
Webinar: Strength in Numbers: Introduction to ClickHouse Cluster PerformanceAltinity Ltd
 

Similar a CS 542 -- Query Execution (20)

Oct.22nd.Presentation.Final
Oct.22nd.Presentation.FinalOct.22nd.Presentation.Final
Oct.22nd.Presentation.Final
 
Query Optimization - Brandon Latronica
Query Optimization - Brandon LatronicaQuery Optimization - Brandon Latronica
Query Optimization - Brandon Latronica
 
Ab ap faq
Ab ap faqAb ap faq
Ab ap faq
 
Advanced Data Science on Spark-(Reza Zadeh, Stanford)
Advanced Data Science on Spark-(Reza Zadeh, Stanford)Advanced Data Science on Spark-(Reza Zadeh, Stanford)
Advanced Data Science on Spark-(Reza Zadeh, Stanford)
 
Flink internals web
Flink internals web Flink internals web
Flink internals web
 
Migration To Multi Core - Parallel Programming Models
Migration To Multi Core - Parallel Programming ModelsMigration To Multi Core - Parallel Programming Models
Migration To Multi Core - Parallel Programming Models
 
Scaling PostgreSQL With GridSQL
Scaling PostgreSQL With GridSQLScaling PostgreSQL With GridSQL
Scaling PostgreSQL With GridSQL
 
Large-scale Recommendation Systems on Just a PC
Large-scale Recommendation Systems on Just a PCLarge-scale Recommendation Systems on Just a PC
Large-scale Recommendation Systems on Just a PC
 
Introduction to Structured Streaming
Introduction to Structured StreamingIntroduction to Structured Streaming
Introduction to Structured Streaming
 
No more struggles with Apache Spark workloads in production
No more struggles with Apache Spark workloads in productionNo more struggles with Apache Spark workloads in production
No more struggles with Apache Spark workloads in production
 
PostgreSQL 8.4 TriLUG 2009-11-12
PostgreSQL 8.4 TriLUG 2009-11-12PostgreSQL 8.4 TriLUG 2009-11-12
PostgreSQL 8.4 TriLUG 2009-11-12
 
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
Everyday I'm Shuffling - Tips for Writing Better Spark Programs, Strata San J...
 
PPT on Data Science Using Python
PPT on Data Science Using PythonPPT on Data Science Using Python
PPT on Data Science Using Python
 
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
Get up to Speed (Quick Guide to data.table in R and Pentaho PDI)
 
Cost-Based Optimizer in Apache Spark 2.2
Cost-Based Optimizer in Apache Spark 2.2 Cost-Based Optimizer in Apache Spark 2.2
Cost-Based Optimizer in Apache Spark 2.2
 
Moving Towards a Streaming Architecture
Moving Towards a Streaming ArchitectureMoving Towards a Streaming Architecture
Moving Towards a Streaming Architecture
 
January 2016 Meetup: Speeding up (big) data manipulation with data.table package
January 2016 Meetup: Speeding up (big) data manipulation with data.table packageJanuary 2016 Meetup: Speeding up (big) data manipulation with data.table package
January 2016 Meetup: Speeding up (big) data manipulation with data.table package
 
Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...
Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...
Cost-Based Optimizer in Apache Spark 2.2 Ron Hu, Sameer Agarwal, Wenchen Fan ...
 
r,rstats,r language,r packages
r,rstats,r language,r packagesr,rstats,r language,r packages
r,rstats,r language,r packages
 
Webinar: Strength in Numbers: Introduction to ClickHouse Cluster Performance
Webinar: Strength in Numbers: Introduction to ClickHouse Cluster PerformanceWebinar: Strength in Numbers: Introduction to ClickHouse Cluster Performance
Webinar: Strength in Numbers: Introduction to ClickHouse Cluster Performance
 

Más de J Singh

OpenLSH - a framework for locality sensitive hashing
OpenLSH  - a framework for locality sensitive hashingOpenLSH  - a framework for locality sensitive hashing
OpenLSH - a framework for locality sensitive hashingJ Singh
 
Designing analytics for big data
Designing analytics for big dataDesigning analytics for big data
Designing analytics for big dataJ Singh
 
Open LSH - september 2014 update
Open LSH  - september 2014 updateOpen LSH  - september 2014 update
Open LSH - september 2014 updateJ Singh
 
PaaS - google app engine
PaaS  - google app enginePaaS  - google app engine
PaaS - google app engineJ Singh
 
Mining of massive datasets using locality sensitive hashing (LSH)
Mining of massive datasets using locality sensitive hashing (LSH)Mining of massive datasets using locality sensitive hashing (LSH)
Mining of massive datasets using locality sensitive hashing (LSH)J Singh
 
Data Analytic Technology Platforms: Options and Tradeoffs
Data Analytic Technology Platforms: Options and TradeoffsData Analytic Technology Platforms: Options and Tradeoffs
Data Analytic Technology Platforms: Options and TradeoffsJ Singh
 
Facebook Analytics with Elastic Map/Reduce
Facebook Analytics with Elastic Map/ReduceFacebook Analytics with Elastic Map/Reduce
Facebook Analytics with Elastic Map/ReduceJ Singh
 
Big Data Laboratory
Big Data LaboratoryBig Data Laboratory
Big Data LaboratoryJ Singh
 
The Hadoop Ecosystem
The Hadoop EcosystemThe Hadoop Ecosystem
The Hadoop EcosystemJ Singh
 
Social Media Mining using GAE Map Reduce
Social Media Mining using GAE Map ReduceSocial Media Mining using GAE Map Reduce
Social Media Mining using GAE Map ReduceJ Singh
 
High Throughput Data Analysis
High Throughput Data AnalysisHigh Throughput Data Analysis
High Throughput Data AnalysisJ Singh
 
NoSQL and MapReduce
NoSQL and MapReduceNoSQL and MapReduce
NoSQL and MapReduceJ Singh
 
CS 542 -- Concurrency Control, Distributed Commit
CS 542 -- Concurrency Control, Distributed CommitCS 542 -- Concurrency Control, Distributed Commit
CS 542 -- Concurrency Control, Distributed CommitJ Singh
 
CS 542 -- Failure Recovery, Concurrency Control
CS 542 -- Failure Recovery, Concurrency ControlCS 542 -- Failure Recovery, Concurrency Control
CS 542 -- Failure Recovery, Concurrency ControlJ Singh
 
CS 542 Putting it all together -- Storage Management
CS 542 Putting it all together -- Storage ManagementCS 542 Putting it all together -- Storage Management
CS 542 Putting it all together -- Storage ManagementJ Singh
 
CS 542 Parallel DBs, NoSQL, MapReduce
CS 542 Parallel DBs, NoSQL, MapReduceCS 542 Parallel DBs, NoSQL, MapReduce
CS 542 Parallel DBs, NoSQL, MapReduceJ Singh
 
CS 542 Database Index Structures
CS 542 Database Index StructuresCS 542 Database Index Structures
CS 542 Database Index StructuresJ Singh
 
CS 542 Controlling Database Integrity and Performance
CS 542 Controlling Database Integrity and PerformanceCS 542 Controlling Database Integrity and Performance
CS 542 Controlling Database Integrity and PerformanceJ Singh
 
CS 542 Overview of query processing
CS 542 Overview of query processingCS 542 Overview of query processing
CS 542 Overview of query processingJ Singh
 
CS 542 Introduction
CS 542 IntroductionCS 542 Introduction
CS 542 IntroductionJ Singh
 

Más de J Singh (20)

OpenLSH - a framework for locality sensitive hashing
OpenLSH  - a framework for locality sensitive hashingOpenLSH  - a framework for locality sensitive hashing
OpenLSH - a framework for locality sensitive hashing
 
Designing analytics for big data
Designing analytics for big dataDesigning analytics for big data
Designing analytics for big data
 
Open LSH - september 2014 update
Open LSH  - september 2014 updateOpen LSH  - september 2014 update
Open LSH - september 2014 update
 
PaaS - google app engine
PaaS  - google app enginePaaS  - google app engine
PaaS - google app engine
 
Mining of massive datasets using locality sensitive hashing (LSH)
Mining of massive datasets using locality sensitive hashing (LSH)Mining of massive datasets using locality sensitive hashing (LSH)
Mining of massive datasets using locality sensitive hashing (LSH)
 
Data Analytic Technology Platforms: Options and Tradeoffs
Data Analytic Technology Platforms: Options and TradeoffsData Analytic Technology Platforms: Options and Tradeoffs
Data Analytic Technology Platforms: Options and Tradeoffs
 
Facebook Analytics with Elastic Map/Reduce
Facebook Analytics with Elastic Map/ReduceFacebook Analytics with Elastic Map/Reduce
Facebook Analytics with Elastic Map/Reduce
 
Big Data Laboratory
Big Data LaboratoryBig Data Laboratory
Big Data Laboratory
 
The Hadoop Ecosystem
The Hadoop EcosystemThe Hadoop Ecosystem
The Hadoop Ecosystem
 
Social Media Mining using GAE Map Reduce
Social Media Mining using GAE Map ReduceSocial Media Mining using GAE Map Reduce
Social Media Mining using GAE Map Reduce
 
High Throughput Data Analysis
High Throughput Data AnalysisHigh Throughput Data Analysis
High Throughput Data Analysis
 
NoSQL and MapReduce
NoSQL and MapReduceNoSQL and MapReduce
NoSQL and MapReduce
 
CS 542 -- Concurrency Control, Distributed Commit
CS 542 -- Concurrency Control, Distributed CommitCS 542 -- Concurrency Control, Distributed Commit
CS 542 -- Concurrency Control, Distributed Commit
 
CS 542 -- Failure Recovery, Concurrency Control
CS 542 -- Failure Recovery, Concurrency ControlCS 542 -- Failure Recovery, Concurrency Control
CS 542 -- Failure Recovery, Concurrency Control
 
CS 542 Putting it all together -- Storage Management
CS 542 Putting it all together -- Storage ManagementCS 542 Putting it all together -- Storage Management
CS 542 Putting it all together -- Storage Management
 
CS 542 Parallel DBs, NoSQL, MapReduce
CS 542 Parallel DBs, NoSQL, MapReduceCS 542 Parallel DBs, NoSQL, MapReduce
CS 542 Parallel DBs, NoSQL, MapReduce
 
CS 542 Database Index Structures
CS 542 Database Index StructuresCS 542 Database Index Structures
CS 542 Database Index Structures
 
CS 542 Controlling Database Integrity and Performance
CS 542 Controlling Database Integrity and PerformanceCS 542 Controlling Database Integrity and Performance
CS 542 Controlling Database Integrity and Performance
 
CS 542 Overview of query processing
CS 542 Overview of query processingCS 542 Overview of query processing
CS 542 Overview of query processing
 
CS 542 Introduction
CS 542 IntroductionCS 542 Introduction
CS 542 Introduction
 

CS 542 -- Query Execution

  • 1. CS 542 Database Management Systems Query Execution J Singh March 21, 2011
  • 2. This meeting Data Models for NoSQL Databases Preliminaries What are we shooting for? Reference Material for Benchmarks posted in blog Some slides from TPC-C SIGMOD ‘97 Presentation Query Execution Sort: Chapter 15 Join: Sections 16.1 – 16.4
  • 3. Data Models forNoSQL Databases Class Discussion at Next Meeting. How would you represent many-to-many relationships? Also many-to-one and one-to-one. Cassandra. Brian Card MongoDB. AnniesDuctan Redis. Jonathan Glumac Google App Engine. Sahel Mastoureshgh Amazon SimpleDB. ZahidMian CouchDB. Robert Van Reenen 3-minute presentation (on 3/21) for 20 bonus points
  • 4. What are we shooting for? Good benchmarks Define the playing field Set the performance agenda Measure release-to-release progress Set goals (e.g., 10,000 tpmC, < 50 $/tpmC) Something managers can understand (!) Benchmark abuse Benchmarketing Benchmark wars more $ on ads than development To keep abuses to a minimum, Benchmarks are defined with precision and read like they are legal documents (example). Some companies include specific prohibitions against publishing benchmark results in their license agreements
  • 5. Benchmarks have a Lifetime Good benchmarks drive industry and technology forward. At some point, all reasonable advances have been made. Benchmarks can become counter productive by encouraging artificial optimizations. So, even good benchmarks become obsolete over time.
  • 6. Database Benchmarks Relational Database (OLTP) Benchmarks TPC = Transaction Processing Performance Council De facto industry standards body for OLTP performance Most TPC specs, info, results are on the web page: http://www.tpc.org TPC-C has been the workhorse of the industry, more in a minute TPC-E is more comprehensive Different problem spaces require different benchmarks Other benchmarks for analytics / decision support systems Two papers referenced on the course website on NoSQL / MapReduce Benchmarks define the problem set, not the technology E.g., if managing documents, create and use a document management benchmark, not one that was created to show off the capabilities of your DB.
  • 7. TPC-C’s Five Transactions Workload Definition Transactions operate against a database of nine tables Transactions: New-order: enter a new order from a customer Payment: update customer balance to reflect a payment Delivery: deliver orders (done as a batch transaction) Order-status: retrieve status of customer’s most recent order Stock-level: monitor warehouse inventory Specifies size of each table Specifies # of users and workflow (next slide) Specifies configuration requirements must be ACID, failure tolerant, distributed, … Response time requirement: 90% of each type of transaction must have a response time <= 5 seconds, except stock-level which is <= 20 seconds. Result: How many TPC-C transactions can be supported? What is the $/tpm cost
  • 8. TPC-C Workflow 1 Select txn from menu: 1. New-Order 45% 2. Payment 43% 3. Order-Status 4% 4. Delivery 4% 5. Stock-Level 4% Cycle Time Decomposition (typical values, in seconds, for weighted average txn) Menu = 0.3 Keying = 9.6 Txn RT = 2.1 Think = 11.4 Average cycle time = 23.4 2 Measure menu Response Time Input screen Keying time 3 Measure txn Response Time Output screen Think time Go back to 1
  • 9. TPC-C Results (by DBMS, as of 5/9/97) Stating the obvious… These results are not a comparison of databases They are a comparison of databases for the specific problem specified by the TPC-C benchmark Ensuring a level playing field is essential when defining a benchmark and conducting measurements Witness the Pavlo/Dean debate
  • 10. Benchmarks for Other Databases Class Discussion at Next Meeting. What benchmarks are appropriate for Key-value stores? Document databases? Network databases? Geospatial databases? Genomic databases? Time series databases? Other? General discussion, no bonus points Please let me know if I may call on you, and for which?
  • 11. Overview of Query Execution
  • 12. An example to work with But first we must revisit Relational Algebra… Database: City, Country, CountryLanguage database. Example query: All cities in Finland with a population at least double of Aruba SELECT [xyz] FROM City, Country WHERE City.CountryCode = 'fin' AND Country.Code = 'abw' AND City.population > 2*Country.population;
  • 13. Relational Operators Selection Basics Idempotent Commutative Selection Conjunctions Useful when pruning Selection Disjunctions Equivalent to UNIONS
  • 14. Selection and Cross Product When Selection is followed by a Cross Product, for A(R  S), Break A into three conditions such that A = r⋀ s⋀rs where r only has the set of attributes only in R s only has the set of attributes only in S rs, has the set of attributes in both R and S Then, the following holds: A(R  S) = r⋀ s⋀ rs(R  S) = rs(r(R)  s(S)) In case you forgot… R ⋈A S = A(R  S) This result helps us compute Theta-joins! Review Chapter 2 of the textbook for more; back to the example…
  • 15. An example to work with Database: City, Country, CountryLanguage database. Example query: All cities in Finland with a population at least double of Aruba SELECT [xyz] FROM City, Country WHERE City.CountryCode = 'fin' AND Country.Code = 'abw' AND City.population > 2*Country.population; Algebra Representation xyz((T.cc = 'fin' ⋀ Y.cc = 'abw' ⋀ T.pop > 2*Y.pop) (T  Y)), or continued…
  • 16. Example: Algebra Manipulation Algebra Representation xyz((T.cc = 'fin' ⋀ Y.cc = 'abw'⋀T.pop > 2*Y.pop) (T  Y)), or xyz( ( T.pop > 2*Y.pop) ( (T.cc = 'fin' ) (T)   (Y.cc= 'abw' ) (Y) ) Graphical Representation of Plan
  • 17. Visualizing Plan Execution The plan is a set of ‘operators’ The operators operate in parallel On different machines? On different processors? In different processes? In different threads? Yes, depends on the architecture. Each operator feeds its input to the next operator The “parallel operators” visualization allows for pipelining The output of one operator is the input to the next A operator can block if its inputs are not ready Design goal is for the operators to pipeline (if possible) Would like to start operating with partial data Takes advantage of as much parallelism as the problem allows
  • 18. Common Elements Key metrics of each component: How much RAM does it consume? How much Disk I/O does it require? Each component is implemented as an Iterator Base class for each operator. Three methods: Open(). May block if Input is not ready Unable to proceed till all data has been received GetNext(). Returns the next tuple. May block if the next tuple is not ready Returns NotFound when exhausted Close() Performs any cleanup and terminates
  • 19. Example: Table-scan operator Open(): pass GetNext(): for b in blocks: for t in tuples of b: if valid t: return t return NotFound Close(): pass Key Metrics: RAM: 1 block Disk I/O: Number of blocks Notes: Represents the operations T(=City) and Y(=Country) Used only if appropriate indexes don’t exist Can use prefetching Not shown here
  • 20. Summary so far Benchmarks are critical for defining performance goals of the database TPC-C is a widely-used benchmark, TPC-E is broader in scope but less widespread Need to choose benchmarks to fit the problem at hand A query can be parsed into primitives for execution Parallelism & pipelining are essential for performance
  • 21. CS-542 Database Management Systems Query Execution Algorithms
  • 22. One-pass Algorithms Lend themselves nicely to pipelining (with minimum blocking) Good for Table-scans (as seen) Tuple-at-a-time operations (selection and projection) Full-relation binary operations (∪, ∩, -, ⋈, ) as long as one of the operands can fit in memory Considering JOIN next, read others from book
  • 23. Open(): read S into memory GetNext(): for b in blocks of R: for t in tuples of b: if t matches tuple s: return join (t,s) return NotFound Close(): pass Example: JOIN (R,S) Key Metrics: RAM: Blocks(S) + 1 block Disk I/O: Blocks(R) + Blocks(S) Notes: Can use prefetching for R Not shown here
  • 24. Nested-Loop Joins What if all of S won’t fit into memory? We can do it chunk-by-chunk, a ‘chunk’ is as many blocks of S that will fit Algorithm sketch: (I/O operations shown in bold) GetNext(): for c in chunks of S: for b in blocks of R: for t in tuples of b: for s in tuples of c: return join(t,s) return NotFound Key Metrics RAM: M Disk I/O: Blocks(S) + k * Blocks(R) where k = (size(S)/#chunks) Note how quickly performance deteriorates! We can do better
  • 25. Two-pass algorithms Sort-based two-pass algorithms The first pass does a sort on some parameter(s) of each operand The second pass algorithm relies on the sort results and can be pipelined Hash-based two-pass algorithms Do a prep-pass and write the result back to disk Compute the result in the second pass
  • 26. Two-pass idea: sort example For each of C chunks of M blocks, sort each chunk and write it back In the example, we have 4 chunks, each 6 blocks Merge the result Key Metrics For the first pass: RAM: M Disk I/O: 2 * Blocks(R) For the 2nd pass: RAM: C Disk I/O: Blocks(R)
  • 27. Naïve two-pass JOIN Sort R and S on the common attributes of the JOIN Merge the sorted R and S on the common attributes See section 15.4.9 of book for more details Also known as Sort-Join Key Metrics Sort RAM: M Disk I/O: 4 * (Blocks(R) + Blocks(S)) 4, not 3 because we wrote the sort results back Join RAM: 2 Disk I/O: (Blocks(R) + Blocks(S)) Total Operation RAM: M Disk I/O: 5 * (Blocks(R) + Blocks(S))
  • 28. Efficient two-pass JOIN Key Metrics Sort (only pass 1) RAM: M Disk I/O: 2 * (Blocks(R) + Blocks(S)) Join RAM: 2 Disk I/O: None additional (Blocks(R) + Blocks(S)) Total Operation RAM: M Disk I/O: 3 * (Blocks(R) + Blocks(S)) Main idea: Combine pass 2 of the sort with join
  • 29. Hash Join Main Idea: Pass 1: Dividetuples in R and S into m hash buckets Read a block of R (or S) For each tuple in that block, find its hash i and move it to hash bucket i. Keep one block for each hash bucket in memory Write it out to disk when full Pass 2: For each i Read buckets Ri and Si and do their join. Key Metrics RAM: M Disk I/O: 3 * (Blocks(R) + Blocks(S)) Disk I/O can be less if: Hash the bigger relation first Expect that many of the buckets will still be in memory
  • 30. Index-based Algorithms Refresher course on indexes and clustering The basic idea: Use the index to locate records and thus cut down on I/O
  • 31. Index-based Selection If the relation T has a clustering index on cc, All tuples will be contiguous Disk I/O: Blocks(T)/V(T, 'fin') Where V(T,cc) is the number of tuples with cc = 'fin‘ Sort of… If the relation T does not have a clustering index on cc, Tuples could be scattered Disk I/O: Tuples(T)/V(T, 'fin') Big difference! Consider the selection  (T.cc= 'fin' ) (T)
  • 32. Index-based JOIN If, say, R has an index on Y, Same as a two-pass JOIN except that we don’t have to first sort/hash on R If clustering index, Disk I/O, Blocks(R)/V(R,Y) + 3 * Blocks(S) Otherwise, Tuples(R)/V(R,Y) + 3 * Blocks(S) If both R and S are indexed, Disk I/O is reduced even further Consider the JOIN R(X,Y) ⋈ S(Y,Z), where Y is the common set of attributes of R and S
  • 33. Summary Execution primitives forpipelining One-pass algorithms should be used wherever possible Two-pass algorithms can usually be used no matter how big the problem Indexes help and should be taken advantage of where possible
  • 34. Query Optimization Based on slides from Prof. Garcia-Molina
  • 35. Desired Endpoint  x=1 AND y=2 AND z<5 (R) R ⋈ S ⋈ U Example Physical Query Plans two-pass hash-join 101 buffers Filter(x=1 AND z<5) materialize IndexScan(R,y=2) two-pass hash-join 101 buffers TableScan(U) TableScan(R) TableScan(S)
  • 36. Outline Convert SQL query to a parse tree Semantic checking: attributes, relation names, types Convert to a logical query plan (relational algebra expression) deal with subqueries Improve the logical query plan use algebraic transformations group together certain operators evaluate logical plan based on estimated size of relations Convert to a physical query plan search the space of physical plans choose order of operations complete the physical query plan
  • 37. Improving the Logical Query Plan There are numerous algebraic laws concerning relational algebra operations By applying them to a logical query plan judiciously, we can get an equivalent query plan that can be executed more efficiently Next we'll survey some of these laws
  • 38. Relational Operators (revisited) Selection Basics Idempotent Commutative Selection Conjunctions Useful when pruning Selection Disjunctions Equivalent to UNIONS
  • 39. Laws Involving Selection Selections usually reduce the size of the relation Usually good to do selections early, i.e., "push them down the tree" Also can be helpful to break up a complex selection into parts
  • 40. Selection and Binary Operators Must push selection to both arguments: C (R U S) = C (R) U C (S) Must push to first arg, optional for 2nd: C (R - S) = C (R) - S C (R - S) = C (R) - C (S) Push to at least one arg with all attributes mentioned in C: product, natural join, theta join, intersection e.g., C (R X S) = C (R) X S, if R has all the attributes in C
  • 41. Pushing Selection Up the Tree Suppose we have relations StarsIn(title,year,starName) Movie(title,year,len,inColor,studioName) and a view CREATE VIEW MoviesOf1996 AS SELECT * FROM Movie WHERE year = 1996; and the query SELECT starName, studioName FROM MoviesOf1996 NATURAL JOIN StarsIn;
  • 42. The Straightforward Tree Remember the rule C(R ⋈S) = C(R) ⋈S ? starName,studioName year=1996 StarsIn Movie
  • 43. The Improved Logical Query Plan starName,studioName starName,studioName starName,studioName year=1996 year=1996 year=1996 year=1996 StarsIn StarsIn Movie StarsIn Movie push selection up tree push selection down tree Movie
  • 44. Laws Involving Projections Adding a projection lower in the tree can improve performance, since often tuple size is reduced Usually not as helpful as pushing selections down Consult textbook for details, will not be on the exam
  • 45. Joins and Products Recall from the definitions of relational algebra: R ⋈_C S = σ_C(R X S) (theta join), and the natural join is the same with C equating the same-name attributes of R and S (followed by a projection removing the duplicate columns). To improve a logical query plan, replace a product followed by a selection with a join. Join algorithms are usually faster than doing a product followed by a selection.
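A hedged illustration of the theta-join definition on invented data, computing σ_C(R X S) directly:

```python
from itertools import product

R = [(1, 'a'), (2, 'b')]    # R(x, y)
S = [('a', 10), ('c', 20)]  # S(y, z)

# Theta join R join_C S computed as sigma_C(R X S), with C equating the y's:
theta = [(x, ry, sy, z) for (x, ry), (sy, z) in product(R, S) if ry == sy]
print(theta)  # [(1, 'a', 'a', 10)]
```

A real join algorithm produces the same tuples without materializing the full product, which is why the rewrite pays off.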
  • 46. Summary of LQP Improvements Selections: push down tree as far as possible if condition is an AND, split and push separately sometimes need to push up before pushing down Projections: can be pushed down (sometimes, read book) Selection/product combinations: can sometimes be replaced with join
  • 47. Outline Convert SQL query to a parse tree Semantic checking: attributes, relation names, types Convert to a logical query plan (relational algebra expression) deal with subqueries Improve the logical query plan use algebraic transformations group together certain operators evaluate logical plan based on estimated size of relations Convert to a physical query plan search the space of physical plans choose order of operations complete the physical query plan
  • 48. Grouping Assoc/Comm Operators Group together adjacent joins, adjacent unions, and adjacent intersections as siblings in the tree. Sets up the logical QP for future optimization when the physical QP is constructed: determine the best order for doing a sequence of joins (or unions or intersections). (Figure: a cascade of binary unions over A, B, C, D, E, F regrouped into a single multiway union with A, B, C, D, E, F as siblings.)
  • 49. Evaluating Logical Query Plans The transformations discussed so far intuitively seem like good ideas But how can we evaluate them more scientifically? Estimate size of relations, also helpful in evaluating physical query plans Coming up next…
  • 50. CS-542 Database Management Systems Plan Estimation, based on slides from Prof. Garcia-Molina
  • 51. Estimating Sizes of Relations Used in two places: to help decide between competing logical query plans, and to help decide between competing physical query plans. Notation review: T(R): number of tuples in relation R. B(R): minimum number of blocks needed to store R; so far we’ve spelled it out as Blocks(R). V(R,a): number of distinct values in R of attribute a.
  • 52. Requirements for Estimation Rules Give accurate estimates. Are easy (fast) to compute. Are logically consistent: the estimated size should not depend on how the relation is computed. Here we describe some simple heuristics. All we really need is a scheme that properly ranks competing plans.
  • 53. Estimating Size of Selection (p1) Suppose selection condition is A = c, where A is an attribute and c is a constant. A reasonable estimate of the number of tuples in the result is: T(R)/V(R,A), i.e., original number of tuples divided by number of different values of A Good approximation if values of A are evenly distributed Also good approximation in some other, common, situations (see textbook)
  • 54. Estimating Size of Selection (p2) If the condition is A < c: a good estimate is T(R)/3; the intuition is that usually you ask about something that is true of less than half the tuples. If the condition is A ≠ c: a good estimate is T(R). If the condition is the AND of several equalities and inequalities, apply the estimates in series.
  • 55. Example Consider relation R(a,b,c) with 10,000 tuples and 50 different values for attribute a. Consider selecting all tuples from R with a = 10 and b < 20. Estimate of number of resulting tuples is 10,000*(1/50)*(1/3) = 67.
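The same arithmetic as a quick sketch; the variable names are illustrative:

```python
T_R, V_a = 10_000, 50

estimate = T_R * (1 / V_a) * (1 / 3)   # a = 10, then b < 20, in series
print(round(estimate))                 # 67
```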
  • 56. Estimating Size of Selection (p3) If the condition has the form C1 OR C2, use: the sum of the estimates for C1 and C2, unless that sum is > T(R), in which case use T(R); or, assuming C1 and C2 are independent, T(R)*(1 − (1 − f1)*(1 − f2)), where f1 is the fraction of R satisfying C1 and f2 is the fraction of R satisfying C2.
  • 57. Example Consider relation R(a,b) with 10,000 tuples and 50 different values for a. Consider selecting all tuples from R with a = 10 or b < 20. Estimate for a = 10 is 10,000/50 = 200. Estimate for b < 20 is 10,000/3 = 3333. Estimate for the combined condition is 200 + 3333 = 3533, or 10,000*(1 − (1 − 1/50)*(1 − 1/3)) = 3466. Different, but not by much.
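Both estimates as a sanity check in code (all names invented for the example):

```python
T_R, V_a = 10_000, 50
f1 = 1 / V_a   # fraction of R satisfying a = 10
f2 = 1 / 3     # fraction of R satisfying b < 20

sum_rule   = min(T_R, T_R * f1 + T_R * f2)    # capped sum of estimates
indep_rule = T_R * (1 - (1 - f1) * (1 - f2))  # independence formula
print(int(sum_rule), int(indep_rule))         # 3533 3466
```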
  • 58. Estimating Size of Natural Join Assume join is on a single attribute Y. Some possibilities: R and S have disjoint sets of Y values, so size of join is 0 Y is the key of S and a foreign key of R, so size of join is T(R) All the tuples of R and S have the same Y value, so size of join is T(R)*T(S) We need some assumptions…
  • 59. Join Estimation Rule Expected number of tuples in result is T(R)*T(S) / max(V(R,Y),V(S,Y)) Why? Suppose V(R,Y) ≤ V(S,Y). There are T(R) tuples in R. Each of them has a 1/V(S,Y) chance of joining with a given tuple of S, creating T(S)/V(S,Y) new tuples
  • 60. Example Suppose we have R(a,b) with T(R) = 1000 and V(R,b) = 20 S(b,c) with T(S) = 2000, V(S,b) = 50, and V(S,c) = 100 U(c,d) with T(U) = 5000 and V(U,c) = 500 What is the estimated size of R ⋈S ⋈U? First join R and S (on attribute b): estimated size of result, X, is T(R)*T(S)/max(V(R,b),V(S,b)) = 40,000 number of values of c in X is the same as in S, namely 100 Then join X with U (on attribute c): estimated size of result is T(X)*T(U)/max(V(X,c),V(U,c)) = 400,000
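A sketch reproducing the chained estimate; `join_size` is a hypothetical helper, not a standard API:

```python
def join_size(T_R, T_S, V_R_Y, V_S_Y):
    """T(R)*T(S) / max(V(R,Y), V(S,Y)) for a natural join on Y."""
    return T_R * T_S / max(V_R_Y, V_S_Y)

X = join_size(1_000, 2_000, 20, 50)     # R join S on b
print(X)                                # 40000.0
print(join_size(X, 5_000, 100, 500))    # X join U on c: 400000.0
```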
  • 61. Summary of Estimation Rules Projection: exactly computable Product: exactly computable Selection: reasonable heuristics Join: reasonable heuristics The other operators are harder to estimate…
  • 62. Estimating Size Parameters Estimating the size of a relation depends on knowing T(R) and the V(R,a)'s. Estimating the cost of a physical algorithm also depends on knowing B(R). How can the query compiler learn them? Scan the relation to learn T and the V's, then calculate B. It can also keep a histogram of the values of each attribute, which makes estimates of join results more accurate. Statistics are recomputed periodically: after some time, after some number of updates, or when the DB administrator thinks the optimizer isn't choosing good plans.
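As a hedged sketch of why histograms help, here is a toy equi-width histogram estimator; the bucket layout and counts are invented:

```python
from bisect import bisect_left

def estimate_equals(bounds, counts, c):
    """Estimate |sigma_{a=c}(R)| from an equi-width histogram on integer a.

    bounds -- sorted upper bounds of the buckets, e.g. (0,25], (25,50], ...
    counts -- tuples per bucket; values inside a bucket assumed uniform
    """
    i = min(bisect_left(bounds, c), len(counts) - 1)
    lo = bounds[i - 1] if i else 0
    return counts[i] / (bounds[i] - lo)   # spread the count over the bucket

# Invented histogram: most of the 5,000 tuples cluster in (25,50].
print(estimate_equals([25, 50, 75, 100], [100, 4000, 800, 100], 30))  # 160.0
# A flat T(R)/V(R,a) guess with 100 distinct values would give 50;
# the histogram captures the skew.
```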
  • 63. Heuristics to Reduce Cost of LQP For each transformation of the tree being considered, estimate the "cost" before and after doing the transformation At this point, "cost" only refers to sizes of intermediate relations (we don't yet know about number of disk I/O's) Sum of sizes of all intermediate relations is the heuristic: if this sum is smaller after the transformation, then incorporate it
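A minimal sketch of this heuristic, with made-up intermediate-size estimates standing in for the rules of the previous slides:

```python
def lqp_cost(intermediate_sizes):
    """Heuristic cost of a logical plan: sum of intermediate result sizes."""
    return sum(intermediate_sizes)

# Invented estimates: joining first yields a 40,000-tuple intermediate;
# pushing the selection down first shrinks it to 200 tuples.
before = lqp_cost([40_000, 800])
after  = lqp_cost([200, 800])
print(before, after)   # keep the transformation if after < before
```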
  • 64. Why couldn’t we… A few questions to explore NoSQL has also been described as NoJOIN Could we use the techniques discussed here to implement JOINs on a NoSQL database? Could we implement the parallel operators as MapReduce jobs? Suitable topics in case you have not yet chosen a project
  • 65. Update on Projects Consider including benchmark results in your presentation. There is no need to submit your code: key fragments can be included in your report, as seen in numerous papers. Do include the design of the code in your report, but do not submit the code itself; it will not be evaluated. Pace yourself: plan to finish your project coding in 2 weeks (by 4/4), then write and perfect your report and PPT after that. Budget your presentation time carefully. How is it going?
  • 66. Next week Query Optimization Suggested topic? We have half-a-lecture open to cover any topics of interest to everyone