# Query trees

Dec 3, 2018

• 1. DDBMS Assignment
  Topic: Query Trees, Distributed Query Optimization
  Name: Shefa Idrees #101631049
  Assignment submitted to: BSCS fall-2016, Post Graduate College for Women, Samnabad, Lahore
• 2. Query Trees
  • A query tree is a tree structure that corresponds to a relational algebra expression such that:
    o Each leaf node represents an input relation.
    o Each internal node represents a relation obtained by applying one relational operator to its child nodes.
    o The root relation represents the answer to the query.
  • It specifies which operations to apply and the order in which to apply them, but not how to implement them.
  • A logical query tree does not select a particular algorithm to implement each relational operator.
  • Two query trees are equivalent if their root relations (the query results) are the same.
  • A query tree may have different execution plans.
  • Some query trees and plans are more efficient to execute than others.
  Representation
  • A query tree is a representation of a SELECT statement: the canonical form of the SELECT.
  • Order of operations in the canonical representation, from the root down to the leaves:
    o Project
    o Select
    o Join
  The Parser
  • The role of the parser is to convert an SQL statement, represented as a string of characters, into a parse tree.
  • A parse tree consists of nodes, and each node is either:
    o An atom: a lexical element such as a keyword (WHERE), an attribute or relation name, a constant, or an operator symbol.
    o A syntactic category: a name for a query subpart. E.g. <SFW> represents a query in select-from-where form.
  • Nodes that are atoms have no children. Nodes that correspond to categories have children based on one of the rules of the grammar for the language.
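
The definition of a query tree on this slide can be made concrete with a small sketch. The node classes, the `evaluate` function, and the `emp`/`dept` sample data below are illustrative assumptions, not part of the original slides. The sketch shows two different trees whose root relations are the same, which is what the slide calls equivalent trees.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union

Row = Dict[str, object]

@dataclass
class Relation:            # leaf node: an input relation
    name: str

@dataclass
class Select:              # internal node: sigma_C(child)
    predicate: Callable[[Row], bool]
    child: "Node"

@dataclass
class Project:             # internal node: pi_L(child)
    attrs: List[str]
    child: "Node"

@dataclass
class Product:             # internal node: left x right
    left: "Node"
    right: "Node"

Node = Union[Relation, Select, Project, Product]

def evaluate(node: Node, db: Dict[str, List[Row]]) -> List[Row]:
    """Naive bottom-up evaluation; a real plan would pick an algorithm per operator."""
    if isinstance(node, Relation):
        return list(db[node.name])
    if isinstance(node, Select):
        return [r for r in evaluate(node.child, db) if node.predicate(r)]
    if isinstance(node, Project):
        return [{a: r[a] for a in node.attrs} for r in evaluate(node.child, db)]
    if isinstance(node, Product):
        right = evaluate(node.right, db)
        return [{**l, **r} for l in evaluate(node.left, db) for r in right]
    raise TypeError(f"unknown node type: {node!r}")

# Hypothetical sample data (attribute names are kept distinct across relations).
db = {
    "emp": [{"ename": "Ann", "dno": 1}, {"ename": "Bob", "dno": 2}],
    "dept": [{"dnum": 1, "dname": "R&D"}, {"dnum": 2, "dname": "Ops"}],
}

# Canonical tree: project over select over product.
tree_a = Project(["ename"], Select(
    lambda r: r["dno"] == r["dnum"] and r["dname"] == "Ops",
    Product(Relation("emp"), Relation("dept"))))

# Equivalent tree: the filter on dname is applied before the product.
tree_b = Project(["ename"], Select(
    lambda r: r["dno"] == r["dnum"],
    Product(Relation("emp"), Select(lambda r: r["dname"] == "Ops", Relation("dept")))))

print(evaluate(tree_a, db))  # [{'ename': 'Bob'}]
print(evaluate(tree_b, db))  # [{'ename': 'Bob'}]
```

Both trees produce the same root relation, but the second builds a smaller intermediate product, which illustrates why some equivalent trees are cheaper to execute than others.
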
• 3. General Example: Parse Trees to Logical Query Trees
  Converting Simplest Parse Trees to Logical Query Trees
  The simplest parse tree to convert is one where there is only one select-from-where (<SFW>) construct, and the <Condition> construct has no nested queries. The logical query tree produced consists of:
  • The cross-product (×) of all relations mentioned in the <FromList>, which is the input to:
  • A selection operator, σC, where C is the <Condition> expression in the construct being replaced, which is the input to:
  • A projection, πL, where L is the list of attributes in the <SelList>.
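
The conversion rule on this slide can be sketched in a few lines. The function names `sfw_to_logical_tree` and `render`, the tuple encoding of tree nodes, and the `Student`/`Enrolled` query are assumptions made for illustration only. The sketch takes the three parts of a simple <SFW> construct and builds the product, selection, and projection in that order.

```python
from functools import reduce

def sfw_to_logical_tree(sel_list, from_list, condition):
    """Build pi_L(sigma_C(R1 x R2 x ...)) for a simple, non-nested <SFW> query."""
    # Cross-product of all relations in the <FromList> (left-deep).
    product = reduce(lambda left, right: ("×", left, right), from_list)
    # Selection on the <Condition>, taking the product as input.
    selection = ("σ", condition, product)
    # Projection on the <SelList>, taking the selection as input (the root).
    return ("π", tuple(sel_list), selection)

def render(tree):
    """Print the tree as a relational algebra expression."""
    if isinstance(tree, str):          # leaf: a relation name
        return tree
    op = tree[0]
    if op == "×":
        return f"{render(tree[1])} × {render(tree[2])}"
    if op == "σ":
        return f"σ[{tree[1]}]({render(tree[2])})"
    if op == "π":
        return f"π[{', '.join(tree[1])}]({render(tree[2])})"
    raise ValueError(f"unknown operator: {op}")

# Hypothetical query: SELECT name FROM Student, Enrolled WHERE Student.sid = Enrolled.sid
tree = sfw_to_logical_tree(["name"], ["Student", "Enrolled"], "Student.sid = Enrolled.sid")
print(render(tree))
# π[name](σ[Student.sid = Enrolled.sid](Student × Enrolled))
```
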
• 5. Converting Nested Parse Trees to Logical Query Trees
  Converting a parse tree that contains a nested query is slightly more challenging. A nested query is correlated with the outside query if it must be re-computed for every tuple produced by the outside query. Otherwise, it is uncorrelated, and the nested query can be converted to a non-nested query using joins.
  The nested subquery translation algorithm involves defining a tree from root to leaves as follows:
  • The root node is a projection, πL, where L is the list of attributes in the <SelList> of the outer query.
  • The child of the root is a selection operator, σC, where C is the <Condition> expression in the outer query, ignoring the subquery.
  • Below that is the two-operand selection operator σ, whose left child is the cross-product (×) of all relations mentioned in the <FromList> of the outer query and whose right child is the <Condition> expression for the subquery.
  • The subquery itself, involved in the <Condition> expression, is translated to relational algebra.
  Example 3:
• 6. Uncorrelated
  • Now, we must remove the two-operand selection and replace it by relational algebra operators.
  • Rule for replacing the two-operand selection (uncorrelated):
    o Let R be the first operand, and let the second operand be a <Condition> of the form t IN S, where S is an uncorrelated subquery.
    o Replace the <Condition> by the tree that is the expression for S. This may require applying duplicate elimination if the expression has duplicates.
    o Replace the two-operand selection by a one-argument selection, σC, where C is the condition that equates each component of the tuple t to the corresponding attribute of relation S.
    o Give σC an argument that is the product of R and S.
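
The rewrite rule for an uncorrelated t IN S condition can be checked on a small example. The function names, the `eliminate_duplicates` flag, and the sample relations below are hypothetical and serve only to illustrate the rule. The sketch compares the direct meaning of the two-operand selection with the rewritten form, a one-argument selection over the product of R and the duplicate-free S.

```python
def in_subquery_direct(R, t_attrs, S, s_attrs):
    """Direct meaning of the two-operand selection: keep r when t IN S holds."""
    s_values = {tuple(s[a] for a in s_attrs) for s in S}
    return [r for r in R if tuple(r[a] for a in t_attrs) in s_values]

def in_subquery_rewritten(R, t_attrs, S, s_attrs, eliminate_duplicates=True):
    """Rewritten form: sigma_C(R x S), with C equating t to the attributes of S."""
    s_rows = [tuple(s[a] for a in s_attrs) for s in S]
    if eliminate_duplicates:
        # Duplicate elimination on S, preserving first-seen order.
        s_rows = list(dict.fromkeys(s_rows))
    # Product of R and S.
    product = [(r, s) for r in R for s in s_rows]
    # One-argument selection sigma_C; only the R part of each pair is returned here,
    # since the outer projection would drop the attributes of S anyway.
    return [r for r, s in product if tuple(r[a] for a in t_attrs) == s]

# Hypothetical data: S contains a duplicate value.
R = [{"name": "Ann", "city": "Lahore"},
     {"name": "Bob", "city": "Karachi"},
     {"name": "Cy", "city": "Lahore"}]
S = [{"hq": "Lahore"}, {"hq": "Lahore"}]

direct = in_subquery_direct(R, ["city"], S, ["hq"])
rewritten = in_subquery_rewritten(R, ["city"], S, ["hq"])
no_dedup = in_subquery_rewritten(R, ["city"], S, ["hq"], eliminate_duplicates=False)

print([r["name"] for r in direct])     # ['Ann', 'Cy']
print([r["name"] for r in rewritten])  # ['Ann', 'Cy']
print([r["name"] for r in no_dedup])   # ['Ann', 'Ann', 'Cy', 'Cy']
```

The last line shows why the rule mentions duplicate elimination: without it, each matching tuple of R appears once per duplicate in S, which changes the result.
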
• 7. So, example 3 becomes:
  Correlated
  Translating correlated subqueries is more difficult because the result of the subquery depends on a value defined outside the query itself. In general, correlated subqueries may require the subquery to be evaluated for each tuple of the outside relation, as an attribute of each tuple is used as the parameter for the subquery. We will not study translation of correlated subqueries.
  Distributed Query Optimization
  Dynamic Approach
  • The dynamic approach can be illustrated with the algorithm of Distributed INGRES.
  • The objective function of the algorithm is to minimize a combination of both the communication time and the response time.
  • However, these two objectives may be conflicting. For instance, increasing communication time (by means of parallelism) may well decrease response time.
  • This query optimization algorithm ignores the cost of transmitting the data to the result site.
• 8.
  • The algorithm also takes advantage of fragmentation, but only horizontal fragmentation is handled, for simplicity.
  • Since both general and broadcast networks are considered, the optimizer takes into account the network topology.
  • In broadcast networks, the same data unit can be transmitted from one site to all the other sites in a single transfer, and the algorithm explicitly takes advantage of this capability. For example, broadcasting is used to replicate fragments and then to maximize the degree of parallelism.
  Static Approach
  • The static approach can be illustrated with the algorithm of R*.
  • This algorithm performs an exhaustive search of all alternative strategies in order to choose the one with the least cost.
  • Although predicting and enumerating these strategies may be costly, the overhead of exhaustive search is rapidly amortized if the query is executed frequently.
  • Two methods are supported for inter-site data transfers.
    o Ship-whole. The entire relation is shipped to the join site and stored in a temporary relation before being joined. If the join algorithm is merge join, the relation does not need to be stored, and the join site can process incoming tuples in a pipeline mode, as they arrive.
    o Fetch-as-needed. The external relation is sequentially scanned, and for each tuple the join value is sent to the site of the internal relation, which selects the internal tuples matching the value and sends the selected tuples to the site of the external relation. This method is equivalent to the semijoin of the internal relation with each external tuple.
  Semijoin-based Approach
  • The semijoin-based approach can be illustrated with the algorithm of SDD-1, which takes full advantage of the semijoin to minimize communication cost.
  Hybrid Approach
  • The hybrid query optimization technique using dynamic query execution plans (QEPs) is general enough to incorporate site and copy selection decisions.
• 9.
  • Several hybrid techniques have been proposed to optimize queries in distributed systems.
  • They essentially rely on the following two-step approach:
    o At compile time, generate a static plan that specifies the ordering of operations and the access methods, without considering where relations are stored.
    o At startup time, generate an execution plan by carrying out site and copy selection and allocating the operations to the sites.
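
The trade-off between the ship-whole and fetch-as-needed transfer methods described under the static approach can be illustrated with a toy simulation. This is only a sketch under stated assumptions: it assumes the internal relation is the one shipped in the ship-whole case, it counts one message per transfer, request, or reply, and the function names and sample relations are hypothetical. It does not reproduce the actual cost model of R*.

```python
def ship_whole(external, internal, key):
    """Ship the entire internal relation to the external relation's site, then join there."""
    messages = 1                      # assumption: the whole relation goes in one transfer
    tuples_shipped = len(internal)
    result = [(e, i) for e in external for i in internal if e[key] == i[key]]
    return result, messages, tuples_shipped

def fetch_as_needed(external, internal, key):
    """Scan the external relation; for each tuple, fetch matching internal tuples remotely."""
    messages = 0
    tuples_shipped = 0
    result = []
    for e in external:
        messages += 1                 # request carrying the join value
        matches = [i for i in internal if i[key] == e[key]]
        messages += 1                 # reply carrying the selected internal tuples
        tuples_shipped += len(matches)
        result.extend((e, i) for i in matches)
    return result, messages, tuples_shipped

# Hypothetical relations: a small external relation, a larger internal one.
external = [{"k": 1, "x": "p"}, {"k": 2, "x": "q"}]
internal = [{"k": 1, "v": "a"}, {"k": 3, "v": "b"}, {"k": 3, "v": "c"},
            {"k": 4, "v": "d"}, {"k": 2, "v": "e"}]

sw_result, sw_msgs, sw_tuples = ship_whole(external, internal, "k")
fn_result, fn_msgs, fn_tuples = fetch_as_needed(external, internal, "k")

print(sw_msgs, sw_tuples)      # 1 5
print(fn_msgs, fn_tuples)      # 4 2
print(sw_result == fn_result)  # True
```

In this example ship-whole moves more data in fewer messages, while fetch-as-needed moves only the matching tuples at the price of two messages per external tuple. Which one costs less depends on the relation sizes, the join selectivity, and the per-message overhead.
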