1. CS 542 Database Management Systems Query Execution J Singh March 21, 2011
2. This meeting Data Models for NoSQL Databases Preliminaries What are we shooting for? Reference Material for Benchmarks posted in blog Some slides from TPC-C SIGMOD ‘97 Presentation Query Execution Sort: Chapter 15 Join: Sections 16.1 – 16.4
3. Data Models for NoSQL Databases Class Discussion at Next Meeting. How would you represent many-to-many relationships? Also many-to-one and one-to-one. Cassandra. Brian Card MongoDB. Annies Ductan Redis. Jonathan Glumac Google App Engine. Sahel Mastoureshgh Amazon SimpleDB. Zahid Mian CouchDB. Robert Van Reenen 3-minute presentation (on 3/21) for 20 bonus points
4. What are we shooting for? Good benchmarks Define the playing field Set the performance agenda Measure release-to-release progress Set goals (e.g., 10,000 tpmC, < 50 $/tpmC) Something managers can understand (!) Benchmark abuse Benchmarketing Benchmark wars more $ on ads than development To keep abuses to a minimum, Benchmarks are defined with precision and read like they are legal documents (example). Some companies include specific prohibitions against publishing benchmark results in their license agreements
5. Benchmarks have a Lifetime Good benchmarks drive industry and technology forward. At some point, all reasonable advances have been made. Benchmarks can become counterproductive by encouraging artificial optimizations. So even good benchmarks become obsolete over time.
6. Database Benchmarks Relational Database (OLTP) Benchmarks TPC = Transaction Processing Performance Council De facto industry standards body for OLTP performance Most TPC specs, info, results are on the web page: http://www.tpc.org TPC-C has been the workhorse of the industry, more in a minute TPC-E is more comprehensive Different problem spaces require different benchmarks Other benchmarks for analytics / decision support systems Two papers referenced on the course website on NoSQL / MapReduce Benchmarks define the problem set, not the technology E.g., if managing documents, create and use a document management benchmark, not one that was created to show off the capabilities of your DB.
7. TPC-C’s Five Transactions Workload Definition Transactions operate against a database of nine tables Transactions: New-order: enter a new order from a customer Payment: update customer balance to reflect a payment Delivery: deliver orders (done as a batch transaction) Order-status: retrieve status of customer’s most recent order Stock-level: monitor warehouse inventory Specifies size of each table Specifies # of users and workflow (next slide) Specifies configuration requirements: must be ACID, failure tolerant, distributed, … Response time requirement: 90% of each type of transaction must have a response time <= 5 seconds, except stock-level which is <= 20 seconds. Result: How many TPC-C transactions can be supported? What is the $/tpmC cost?
8. TPC-C Workflow 1 Select txn from menu: 1. New-Order 45% 2. Payment 43% 3. Order-Status 4% 4. Delivery 4% 5. Stock-Level 4% Cycle Time Decomposition (typical values, in seconds, for weighted average txn) Menu = 0.3 Keying = 9.6 Txn RT = 2.1 Think = 11.4 Average cycle time = 23.4 2 Measure menu Response Time Input screen Keying time 3 Measure txn Response Time Output screen Think time Go back to 1
9. TPC-C Results (by DBMS, as of 5/9/97) Stating the obvious… These results are not a comparison of databases They are a comparison of databases for the specific problem specified by the TPC-C benchmark Ensuring a level playing field is essential when defining a benchmark and conducting measurements Witness the Pavlo/Dean debate
10. Benchmarks for Other Databases Class Discussion at Next Meeting. What benchmarks are appropriate for Key-value stores? Document databases? Network databases? Geospatial databases? Genomic databases? Time series databases? Other? General discussion, no bonus points Please let me know if I may call on you, and for which?
12. An example to work with But first we must revisit Relational Algebra… Database: the City, Country, CountryLanguage database. Example query: all cities in Finland with a population at least double that of Aruba SELECT [xyz] FROM City, Country WHERE City.CountryCode = 'fin' AND Country.Code = 'abw' AND City.population > 2*Country.population;
13. Relational Operators Selection Basics Idempotent Commutative Selection Conjunctions Useful when pruning Selection Disjunctions Equivalent to UNIONS
14. Selection and Cross Product When Selection is followed by a Cross Product, for σ_A(R × S), break A into three conditions such that A = A_r ⋀ A_s ⋀ A_rs, where A_r mentions only attributes of R A_s mentions only attributes of S A_rs mentions attributes of both R and S Then the following holds: σ_A(R × S) = σ_{A_r ⋀ A_s ⋀ A_rs}(R × S) = σ_{A_rs}(σ_{A_r}(R) × σ_{A_s}(S)) In case you forgot… R ⋈_A S = σ_A(R × S) This result helps us compute Theta-joins! Review Chapter 2 of the textbook for more; back to the example…
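The identity above can be checked mechanically on toy data. A minimal sketch in Python, with made-up mini-relations and condition names (everything here is illustrative, not from the slide):

```python
# Check sigma_A(R x S) == sigma_rs(sigma_r(R) x sigma_s(S))
# on two toy relations represented as lists of dicts.
from itertools import product

R = [{"a": 1}, {"a": 3}, {"a": 5}]     # attribute a belongs to R
S = [{"b": 2}, {"b": 4}]               # attribute b belongs to S

cond_r  = lambda t: t["a"] > 1         # mentions only R's attributes
cond_s  = lambda t: t["b"] < 4         # mentions only S's attributes
cond_rs = lambda t: t["a"] > t["b"]    # mentions attributes of both

def cross(r, s):
    return [{**x, **y} for x, y in product(r, s)]

def select(cond, rel):
    return [t for t in rel if cond(t)]

# Left side: filter the full cross product on the conjunction.
lhs = select(lambda t: cond_r(t) and cond_s(t) and cond_rs(t), cross(R, S))

# Right side: push the single-relation conditions inside the product.
rhs = select(cond_rs, cross(select(cond_r, R), select(cond_s, S)))

assert lhs == rhs
```

The right-hand side filters each input before the product, so far fewer combined tuples are ever materialized, which is the whole point of the rewrite.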
15. An example to work with Database: City, Country, CountryLanguage database. Example query: All cities in Finland with a population at least double that of Aruba SELECT [xyz] FROM City, Country WHERE City.CountryCode = 'fin' AND Country.Code = 'abw' AND City.population > 2*Country.population; Algebra Representation: π_xyz(σ_{T.cc = 'fin' ⋀ Y.cc = 'abw' ⋀ T.pop > 2*Y.pop}(T × Y)), where T = City and Y = Country continued…
17. Visualizing Plan Execution The plan is a set of ‘operators’ The operators operate in parallel On different machines? On different processors? In different processes? In different threads? Yes, depends on the architecture. Each operator feeds its input to the next operator The “parallel operators” visualization allows for pipelining The output of one operator is the input to the next An operator can block if its inputs are not ready Design goal is for the operators to pipeline (if possible) Would like to start operating with partial data Takes advantage of as much parallelism as the problem allows
18. Common Elements Key metrics of each component: How much RAM does it consume? How much Disk I/O does it require? Each component is implemented as an Iterator Base class for each operator. Three methods: Open(): may block if input is not ready (unable to proceed until all data has been received) GetNext(): returns the next tuple; may block if the next tuple is not ready; returns NotFound when exhausted Close(): performs any cleanup and terminates
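The three-method interface can be sketched concretely. A toy Python version, with a scan over an in-memory list standing in for a real operator (the class names and the NOT_FOUND sentinel are illustrative, not from any particular engine):

```python
# Minimal sketch of the Open/GetNext/Close iterator interface.
NOT_FOUND = object()   # sentinel returned when the operator is exhausted

class Operator:
    """Base class: every physical operator exposes the same three methods."""
    def open(self):            # may block until inputs are ready
        raise NotImplementedError
    def get_next(self):        # returns one tuple, or NOT_FOUND when done
        raise NotImplementedError
    def close(self):           # release buffers, close files, etc.
        raise NotImplementedError

class ListScan(Operator):
    """A toy scan over an in-memory list, standing in for a table scan."""
    def __init__(self, rows):
        self.rows = rows
    def open(self):
        self.pos = 0
    def get_next(self):
        if self.pos >= len(self.rows):
            return NOT_FOUND
        row = self.rows[self.pos]
        self.pos += 1
        return row
    def close(self):
        self.rows = None

# Driving an operator: open, pull tuples until NOT_FOUND, close.
scan = ListScan([("Helsinki",), ("Espoo",)])
scan.open()
out = []
while (t := scan.get_next()) is not NOT_FOUND:
    out.append(t)
scan.close()
```

Because every operator speaks the same pull-based protocol, operators can be stacked arbitrarily, which is what makes pipelining possible.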
19. Example: Table-scan operator Open(): pass GetNext(): for b in blocks: for t in tuples of b: if valid t: return t return NotFound Close(): pass Key Metrics: RAM: 1 block Disk I/O: Number of blocks Notes: Represents the operations T(=City) and Y(=Country) Used only if appropriate indexes don’t exist Can use prefetching Not shown here
20. Summary so far Benchmarks are critical for defining performance goals of the database TPC-C is a widely-used benchmark, TPC-E is broader in scope but less widespread Need to choose benchmarks to fit the problem at hand A query can be parsed into primitives for execution Parallelism & pipelining are essential for performance
22. One-pass Algorithms Lend themselves nicely to pipelining (with minimum blocking) Good for Table-scans (as seen) Tuple-at-a-time operations (selection and projection) Full-relation binary operations (∪, ∩, −, ⋈, ×) as long as one of the operands can fit in memory Considering JOIN next; read about the others in the book
23. Example: JOIN (R,S) Open(): read S into memory GetNext(): for b in blocks of R: for t in tuples of b: for s in tuples of S (in memory): if t matches s: return join(t,s) return NotFound Close(): pass Key Metrics: RAM: Blocks(S) + 1 block Disk I/O: Blocks(R) + Blocks(S) Notes: Can use prefetching for R Not shown here
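A runnable sketch of the one-pass join, assuming S fits in memory. Open() corresponds to loading S into a dict keyed on the join attribute; GetNext() is folded into a generator for brevity. All names are illustrative:

```python
# One-pass join: read S fully into memory, then make a single pass over R.
from collections import defaultdict

def one_pass_join(R, S, key):
    """Join two lists of dicts on attribute `key`; S is the in-memory side."""
    index = defaultdict(list)           # Open(): read S into memory
    for s in S:
        index[s[key]].append(s)
    for t in R:                         # one scan of R, block by block
        for s in index.get(t[key], []):
            yield {**s, **t}

R = [{"y": 1, "x": "a"}, {"y": 2, "x": "b"}]
S = [{"y": 1, "z": "p"}, {"y": 3, "z": "q"}]
result = list(one_pass_join(R, S, "y"))
```

Using a hash index rather than scanning S's tuples for every tuple of R keeps the CPU cost per R-tuple constant; the I/O cost is unchanged, Blocks(R) + Blocks(S).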
24. Nested-Loop Joins What if all of S won’t fit into memory? We can do it chunk-by-chunk; a ‘chunk’ is as many blocks of S as will fit Algorithm sketch: (I/O operations shown in bold) GetNext(): for c in chunks of S: for b in blocks of R: for t in tuples of b: for s in tuples of c: if t matches s: return join(t,s) return NotFound Key Metrics RAM: M Disk I/O: Blocks(S) + k * Blocks(R), where k is the number of chunks of S Note how quickly performance deteriorates! We can do better
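A runnable sketch of the chunk-by-chunk nested-loop join. Each chunk of S stands in for "as many blocks of S as fit in memory", and R is rescanned once per chunk, which is exactly where the k * Blocks(R) I/O term comes from. Assumes, as a toy simplification, that S's join-attribute values are unique within a chunk:

```python
def nested_loop_join(R, S, key, chunk_size):
    """Join lists of dicts on `key`, loading S chunk_size tuples at a time."""
    out = []
    for i in range(0, len(S), chunk_size):        # outer loop over chunks of S
        chunk = {s[key]: s for s in S[i:i + chunk_size]}
        for t in R:                               # full rescan of R per chunk
            if t[key] in chunk:
                out.append({**chunk[t[key]], **t})
    return out

rows = nested_loop_join(
    [{"y": 1, "x": "a"}, {"y": 3, "x": "b"}],
    [{"y": 1, "z": "p"}, {"y": 2, "z": "q"}, {"y": 3, "z": "r"}],
    "y", chunk_size=2)
```

With 3 tuples of S and a chunk size of 2, R is scanned twice; halving the chunk size would double the scans of R, which is the deterioration the slide warns about.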
25. Two-pass algorithms Sort-based two-pass algorithms The first pass does a sort on some parameter(s) of each operand The second pass algorithm relies on the sort results and can be pipelined Hash-based two-pass algorithms Do a prep-pass and write the result back to disk Compute the result in the second pass
26. Two-pass idea: sort example For each of C chunks of M blocks, sort each chunk and write it back In the example, we have 4 chunks, each 6 blocks Merge the result Key Metrics For the first pass: RAM: M Disk I/O: 2 * Blocks(R) For the 2nd pass: RAM: C Disk I/O: Blocks(R)
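The two-pass sort can be sketched in a few lines of Python. Pass 1 sorts memory-sized chunks into "sorted runs"; pass 2 does a k-way merge, keeping one block per run in memory. The write-back to disk is simulated by keeping the runs in a list (all names illustrative):

```python
# Two-pass (external merge) sort sketch.
import heapq

def two_pass_sort(R, chunk_size):
    # Pass 1: sort each chunk and "write it back" (Disk I/O: 2 * Blocks(R)).
    runs = [sorted(R[i:i + chunk_size]) for i in range(0, len(R), chunk_size)]
    # Pass 2: k-way merge of the sorted runs (Disk I/O: Blocks(R)).
    return list(heapq.merge(*runs))

data = [5, 1, 4, 2, 8, 7, 3, 6]
assert two_pass_sort(data, 3) == sorted(data)
```

heapq.merge consumes the runs lazily, which mirrors the real algorithm's one-buffered-block-per-run memory footprint in pass 2.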
27. Naïve two-pass JOIN Sort R and S on the common attributes of the JOIN Merge the sorted R and S on the common attributes See section 15.4.9 of book for more details Also known as Sort-Join Key Metrics Sort RAM: M Disk I/O: 4 * (Blocks(R) + Blocks(S)) 4, not 3 because we wrote the sort results back Join RAM: 2 Disk I/O: (Blocks(R) + Blocks(S)) Total Operation RAM: M Disk I/O: 5 * (Blocks(R) + Blocks(S))
28. Efficient two-pass JOIN Key Metrics Sort (only pass 1) RAM: M Disk I/O: 2 * (Blocks(R) + Blocks(S)) Join RAM: 2 Disk I/O: None additional (Blocks(R) + Blocks(S)) Total Operation RAM: M Disk I/O: 3 * (Blocks(R) + Blocks(S)) Main idea: Combine pass 2 of the sort with join
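A toy sketch of the combined merge/join pass (sort-merge join). Here the sorting is done in memory, standing in for the sorted runs produced by pass 1; the merge logic pairs up groups of equal keys on both sides. All names are illustrative:

```python
def sort_merge_join(R, S, key):
    """Join lists of dicts on `key` by sorting both sides and merging."""
    R = sorted(R, key=lambda t: t[key])
    S = sorted(S, key=lambda t: t[key])
    out, i, j = [], 0, 0
    while i < len(R) and j < len(S):
        if R[i][key] < S[j][key]:
            i += 1
        elif R[i][key] > S[j][key]:
            j += 1
        else:
            # Found a common key: collect the full group of equal-keyed
            # S-tuples, then pair every equal-keyed R-tuple with that group.
            k = R[i][key]
            j_end = j
            while j_end < len(S) and S[j_end][key] == k:
                j_end += 1
            while i < len(R) and R[i][key] == k:
                out.extend({**s, **R[i]} for s in S[j:j_end])
                i += 1
            j = j_end
    return out
```

Because both inputs arrive sorted, each side is read exactly once in this pass, which is why the total I/O drops from 5x to 3x the combined block count.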
29. Hash Join Main Idea: Pass 1: Divide the tuples in R and S into m hash buckets Read a block of R (or S) For each tuple in that block, compute its hash i and move it to hash bucket i Keep one block for each hash bucket in memory Write it out to disk when full Pass 2: For each i: Read buckets Ri and Si and do their join Key Metrics RAM: M Disk I/O: 3 * (Blocks(R) + Blocks(S)) Disk I/O can be less if: Hash the bigger relation first Expect that many of the buckets will still be in memory
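A runnable sketch of the two-pass hash join. Pass 1 partitions both relations into m buckets using the same hash function; pass 2 joins bucket i of R only against bucket i of S, each bucket pair being small enough for a one-pass join. The function and variable names are mine, not the slide's:

```python
def hash_join(R, S, key, m):
    """Two-pass (partitioned) hash join of two lists of dicts on `key`."""
    def partition(rel):
        buckets = [[] for _ in range(m)]
        for t in rel:                        # pass 1: read once, write once
            buckets[hash(t[key]) % m].append(t)
        return buckets

    r_buckets, s_buckets = partition(R), partition(S)
    out = []
    for ri, si in zip(r_buckets, s_buckets):  # pass 2: matching buckets only
        index = {}
        for s in si:
            index.setdefault(s[key], []).append(s)
        for t in ri:
            out.extend({**s, **t} for s in index.get(t[key], []))
    return out

joined = hash_join([{"y": 1, "x": "a"}, {"y": 4, "x": "b"}],
                   [{"y": 1, "z": "p"}, {"y": 2, "z": "q"}], "y", m=3)
```

Tuples with equal keys always land in the same bucket number on both sides, so no cross-bucket comparisons are ever needed.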
30. Index-based Algorithms Refresher course on indexes and clustering The basic idea: Use the index to locate records and thus cut down on I/O
31. Index-based Selection Consider the selection σ_{T.cc = 'fin'}(T) If the relation T has a clustering index on cc, all matching tuples will be contiguous Disk I/O: Blocks(T)/V(T,cc), where V(T,cc) is the number of distinct values of cc Sort of… If the relation T does not have a clustering index on cc, the tuples could be scattered Disk I/O: Tuples(T)/V(T,cc) Big difference!
32. Index-based JOIN If, say, R has an index on Y, Same as a two-pass JOIN except that we don’t have to first sort/hash on R If clustering index, Disk I/O, Blocks(R)/V(R,Y) + 3 * Blocks(S) Otherwise, Tuples(R)/V(R,Y) + 3 * Blocks(S) If both R and S are indexed, Disk I/O is reduced even further Consider the JOIN R(X,Y) ⋈ S(Y,Z), where Y is the common set of attributes of R and S
33. Summary Execution primitives for pipelining One-pass algorithms should be used wherever possible Two-pass algorithms can usually be used no matter how big the problem Indexes help and should be taken advantage of where possible
35. Desired Endpoint Example Physical Query Plans [Slide figure: two plan trees. For σ_{x=1 AND y=2 AND z<5}(R): Filter(x=1 AND z<5) over materialize over IndexScan(R, y=2). For R ⋈ S ⋈ U: a two-pass hash-join (101 buffers) over TableScan(R) and TableScan(S), whose output feeds a second two-pass hash-join (101 buffers) with TableScan(U).]
36. Outline Convert SQL query to a parse tree Semantic checking: attributes, relation names, types Convert to a logical query plan (relational algebra expression) deal with subqueries Improve the logical query plan use algebraic transformations group together certain operators evaluate logical plan based on estimated size of relations Convert to a physical query plan search the space of physical plans choose order of operations complete the physical query plan
37. Improving the Logical Query Plan There are numerous algebraic laws concerning relational algebra operations By applying them to a logical query plan judiciously, we can get an equivalent query plan that can be executed more efficiently Next we'll survey some of these laws
38. Relational Operators (revisited) Selection Basics Idempotent Commutative Selection Conjunctions Useful when pruning Selection Disjunctions Equivalent to UNIONS
39. Laws Involving Selection Selections usually reduce the size of the relation Usually good to do selections early, i.e., "push them down the tree" Also can be helpful to break up a complex selection into parts
40. Selection and Binary Operators Must push selection to both arguments: σ_C(R ∪ S) = σ_C(R) ∪ σ_C(S) Must push to first arg, optional for 2nd: σ_C(R − S) = σ_C(R) − S σ_C(R − S) = σ_C(R) − σ_C(S) Push to at least one arg with all attributes mentioned in C: product, natural join, theta join, intersection e.g., σ_C(R × S) = σ_C(R) × S, if R has all the attributes in C
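The "push to both arguments" law for union can be verified on toy sets. A minimal Python check, with made-up values (sets of integers stand in for relations, a predicate for the selection condition C):

```python
# Check sigma_C(R ∪ S) == sigma_C(R) ∪ sigma_C(S) on toy sets.
R = {1, 4, 7, 10}
S = {2, 4, 9}
C = lambda x: x > 3        # the selection condition

lhs = {x for x in (R | S) if C(x)}                      # select after union
rhs = {x for x in R if C(x)} | {x for x in S if C(x)}   # push into both args
assert lhs == rhs
```

Note that pushing into only one argument of a union would be wrong here, unlike the difference and product cases on the slide: the unfiltered argument would let non-qualifying tuples leak into the result.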
41. Pushing Selection Up the Tree Suppose we have relations StarsIn(title,year,starName) Movie(title,year,len,inColor,studioName) and a view CREATE VIEW MoviesOf1996 AS SELECT * FROM Movie WHERE year = 1996; and the query SELECT starName, studioName FROM MoviesOf1996 NATURAL JOIN StarsIn;
42. The Straightforward Tree Remember the rule σ_C(R ⋈ S) = σ_C(R) ⋈ S? [Slide figure: π_{starName,studioName} over (σ_{year=1996}(Movie) ⋈ StarsIn)]
43. The Improved Logical Query Plan [Slide figure: starting from π_{starName,studioName}(σ_{year=1996}(Movie) ⋈ StarsIn), first push the selection up the tree to get π_{starName,studioName}(σ_{year=1996}(Movie ⋈ StarsIn)), then push it down into both branches: π_{starName,studioName}(σ_{year=1996}(Movie) ⋈ σ_{year=1996}(StarsIn)).]
44. Laws Involving Projections Adding a projection lower in the tree can improve performance, since often tuple size is reduced Usually not as helpful as pushing selections down Consult textbook for details, will not be on the exam
45. Joins and Products Recall from the definitions of relational algebra: R ⋈_C S = σ_C(R × S) (theta join); for a natural join, C equates the same-name attributes in R and S To improve a logical query plan, replace a product followed by a selection with a join Join algorithms are usually faster than doing a product followed by a selection
46. Summary of LQP Improvements Selections: push down tree as far as possible if condition is an AND, split and push separately sometimes need to push up before pushing down Projections: can be pushed down (sometimes, read book) Selection/product combinations: can sometimes be replaced with join
47. Outline Convert SQL query to a parse tree Semantic checking: attributes, relation names, types Convert to a logical query plan (relational algebra expression) deal with subqueries Improve the logical query plan use algebraic transformations group together certain operators evaluate logical plan based on estimated size of relations Convert to a physical query plan search the space of physical plans choose order of operations complete the physical query plan
48. Grouping Assoc/Comm Operators Group together adjacent joins, adjacent unions, and adjacent intersections as siblings in the tree Sets up the logical QP for future optimization when the physical QP is constructed: determine the best order for doing a sequence of joins (or unions or intersections) [Slide figure: a cascade of binary unions over A, B, C, D, E, F regrouped into a single multiway union with all six relations as siblings.]
49. Evaluating Logical Query Plans The transformations discussed so far intuitively seem like good ideas But how can we evaluate them more scientifically? Estimate size of relations, also helpful in evaluating physical query plans Coming up next…
51. Estimating Sizes of Relations Used in two places: to help decide between competing logical query plans to help decide between competing physical query plans Notation review: T(R): number of tuples in relation R B(R): minimum number of blocks needed to store R (so far we’ve spelled this out as Blocks(R)) V(R,a): number of distinct values in R of attribute a
52. Requirements for Estimation Rules Give accurate estimates Are easy (fast) to compute Are logically consistent: estimated size should not depend on how the relation is computed Here describe some simple heuristics. All we really need is a scheme that properly ranks competing plans.
53. Estimating Size of Selection (p1) Suppose selection condition is A = c, where A is an attribute and c is a constant. A reasonable estimate of the number of tuples in the result is: T(R)/V(R,A), i.e., original number of tuples divided by number of different values of A Good approximation if values of A are evenly distributed Also good approximation in some other, common, situations (see textbook)
54. Estimating Size of Selection (p2) If condition is A < c: a good estimate is T(R)/3; the intuition is that usually you ask about something that is true of less than half the tuples If condition is A ≠ c: a good estimate is T(R) If condition is the AND of several equalities and inequalities, apply the estimates in series.
55. Example Consider relation R(a,b,c) with 10,000 tuples and 50 different values for attribute a. Consider selecting all tuples from R with a = 10 and b < 20. Estimate of number of resulting tuples is 10,000*(1/50)*(1/3) = 67.
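The example's arithmetic is easy to reproduce: an equality contributes a 1/V(R,a) factor, an inequality a 1/3 factor, and the factors multiply for an AND. A small sketch (function name is mine):

```python
def estimate_and(t_r, factors):
    """Estimate result size of an AND-selection.
    t_r = T(R); factors = per-condition selectivity fractions."""
    est = t_r
    for f in factors:
        est *= f
    return est

T_R, V_a = 10_000, 50
est = estimate_and(T_R, [1 / V_a, 1 / 3])   # a = 10 AND b < 20
assert round(est) == 67                      # matches the slide's estimate
```

Applying the factors "in series" like this implicitly assumes the conditions are independent, the same assumption the OR rule on the next slide makes explicit.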
56. Estimating Size of Selection (p3) If the condition has the form C1 OR C2, use the smaller of: the sum of the estimates for C1 and C2 (unless that sum is > T(R)), or, assuming C1 and C2 are independent, T(R)*(1 − (1 − f1)*(1 − f2)), where f1 is the fraction of R satisfying C1 and f2 is the fraction of R satisfying C2
57. Example Consider relation R(a,b) with 10,000 tuples and 50 different values for a. Consider selecting all tuples from R with a = 10 or b < 20. Estimates: Estimate for a = 10 is 10,000/50 = 200 Estimate for b < 20 is 10,000/3 = 3333 Estimate for the combined condition is 200 + 3333 = 3533, or 10,000*(1 − (1 − 1/50)*(1 − 1/3)) ≈ 3466 Different, but not materially
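Both OR estimates from the example, reproduced in Python (the slide truncates the independence formula's 3466.67 to 3466; rounding gives 3467):

```python
T_R, V_a = 10_000, 50

# Estimate 1: sum of the individual estimates, capped at T(R).
e1 = T_R / V_a          # a = 10  -> 200
e2 = T_R / 3            # b < 20  -> 3333.33...
sum_estimate = min(e1 + e2, T_R)

# Estimate 2: independence formula T(R) * (1 - (1 - f1) * (1 - f2)).
f1, f2 = 1 / V_a, 1 / 3
indep_estimate = T_R * (1 - (1 - f1) * (1 - f2))
```

The independence formula is always the smaller of the two whenever both fractions are nonzero, since it subtracts the overlap the plain sum double-counts.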
58. Estimating Size of Natural Join Assume join is on a single attribute Y. Some possibilities: R and S have disjoint sets of Y values, so size of join is 0 Y is the key of S and a foreign key of R, so size of join is T(R) All the tuples of R and S have the same Y value, so size of join is T(R)*T(S) We need some assumptions…
59. Join Estimation Rule Expected number of tuples in result is T(R)*T(S) / max(V(R,Y),V(S,Y)) Why? Suppose V(R,Y) ≤ V(S,Y). There are T(R) tuples in R. Each of them has a 1/V(S,Y) chance of joining with a given tuple of S, creating T(S)/V(S,Y) new tuples
60. Example Suppose we have R(a,b) with T(R) = 1000 and V(R,b) = 20 S(b,c) with T(S) = 2000, V(S,b) = 50, and V(S,c) = 100 U(c,d) with T(U) = 5000 and V(U,c) = 500 What is the estimated size of R ⋈S ⋈U? First join R and S (on attribute b): estimated size of result, X, is T(R)*T(S)/max(V(R,b),V(S,b)) = 40,000 number of values of c in X is the same as in S, namely 100 Then join X with U (on attribute c): estimated size of result is T(X)*T(U)/max(V(X,c),V(U,c)) = 400,000
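The three-way join estimate above, reproduced as a small Python sketch (the helper name is mine):

```python
def join_size(t_r, t_s, v_r, v_s):
    """Expected tuples in a single-attribute join:
    T(R) * T(S) / max(V(R,Y), V(S,Y))."""
    return t_r * t_s / max(v_r, v_s)

# R(a,b): T=1000, V(R,b)=20;  S(b,c): T=2000, V(S,b)=50, V(S,c)=100
x = join_size(1000, 2000, 20, 50)
assert x == 40_000

# X keeps S's 100 distinct c-values;  U(c,d): T=5000, V(U,c)=500
assert join_size(x, 5000, 100, 500) == 400_000
```

Carrying V(S,c) = 100 over to the intermediate result X is itself an estimation rule: joining on b is assumed not to change the set of c-values inherited from S.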
61. Summary of Estimation Rules Projection: exactly computable Product: exactly computable Selection: reasonable heuristics Join: reasonable heuristics The other operators are harder to estimate…
62. Estimating Size Parameters Estimating the size of a relation depended on knowing T(R) and V(R,a)'s Estimating cost of a physical algorithm depends on also knowing B(R). How can the query compiler learn them? Scan relation to learn T, V's, and then calculate B Can also keep a histogram of the values of attributes. Makes estimating join results more accurate Recomputed periodically, after some time or some number of updates, or if DB administrator thinks optimizer isn't choosing good plans
63. Heuristics to Reduce Cost of LQP For each transformation of the tree being considered, estimate the "cost" before and after doing the transformation At this point, "cost" only refers to sizes of intermediate relations (we don't yet know about number of disk I/O's) Sum of sizes of all intermediate relations is the heuristic: if this sum is smaller after the transformation, then incorporate it
64. Why couldn’t we… A few questions to explore NoSQL has also been described as NoJOIN Could we use the techniques discussed here to implement JOINs on a NoSQL database? Could we implement the parallel operators as MapReduce jobs? Suitable topics in case you have not yet chosen a project
65. Update on Projects Consider including benchmark results in your presentation There is no need to submit your code Key fragments can be included in your report, as seen in numerous papers Do include the design of the code in your report Do not submit code; it will not be evaluated Pace yourself Plan to finish your project coding in 2 weeks (by 4/4) Plan to write and perfect your report and PPT after that Budget your presentation time carefully. How is it going?
66. Next week Query Optimization Suggested topic? We have half-a-lecture open to cover any topics of interest to everyone