Hey friends, here is my "query tree" assignment. :-) I searched a lot to put it together, and I hope, God willing, that it helps you more than other documents on the subject. Have a good day :-)
Query Trees
• A query tree is a tree structure that corresponds to a relational algebra expression such that:
o Each leaf node represents an input relation.
o Each internal node represents a relation obtained by applying one relational operator to its child nodes.
o The root relation represents the answer to the query.
• It specifies what operations to apply, and the order to apply them, but not how to actually
implement the operations.
• A logical query tree does not select a particular algorithm to implement each relational
operator.
• Two query trees are equivalent if their root relations are the same, i.e., they produce the same query result for every valid database state.
• A single query tree may be implemented by several different execution plans (for example, with different algorithms for the same operator).
• Some query trees and plans are more efficient to execute than others.
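To make the definition concrete, here is a minimal Python sketch, assuming relations are small in-memory lists of dicts. The class names (Leaf, Product, Select, Project) and the Emp/Dept data are hypothetical, chosen only for illustration. Evaluating the root recursively evaluates its children, so the root's relation is the answer. The two trees differ only in where the dname selection sits; they return the same result here, and by the usual selection push-down rule they are equivalent on every instance.

```python
from dataclasses import dataclass
from typing import Callable, Union

# Illustrative assumption: a relation is a list of dicts (attribute -> value).
Relation = list[dict]


@dataclass
class Leaf:
    """Leaf node: an input relation."""
    name: str
    rows: Relation

    def evaluate(self) -> Relation:
        return self.rows


@dataclass
class Product:
    """Internal node: left × right (attribute names assumed disjoint)."""
    left: "Node"
    right: "Node"

    def evaluate(self) -> Relation:
        return [{**lrow, **rrow} for lrow in self.left.evaluate() for rrow in self.right.evaluate()]


@dataclass
class Select:
    """Internal node: σ_pred applied to the child's relation."""
    pred: Callable[[dict], bool]
    child: "Node"

    def evaluate(self) -> Relation:
        return [row for row in self.child.evaluate() if self.pred(row)]


@dataclass
class Project:
    """Internal node: π_attrs with duplicate elimination (set semantics)."""
    attrs: tuple
    child: "Node"

    def evaluate(self) -> Relation:
        seen, out = set(), []
        for row in self.child.evaluate():
            key = tuple(row[a] for a in self.attrs)
            if key not in seen:
                seen.add(key)
                out.append(dict(zip(self.attrs, key)))
        return out


Node = Union[Leaf, Product, Select, Project]


def as_set(rel: Relation) -> set:
    """Compare results as sets of tuples, ignoring row order."""
    return {tuple(sorted(row.items())) for row in rel}


emp = Leaf("Emp", [{"ename": "Ann", "dno": 1}, {"ename": "Bob", "dno": 2}])
dept = Leaf("Dept", [{"dnum": 1, "dname": "R&D"}, {"dnum": 2, "dname": "Sales"}])

# Tree A: the whole condition is applied on top of the product.
tree_a = Project(("ename",), Select(lambda r: r["dno"] == r["dnum"] and r["dname"] == "R&D",
                                    Product(emp, dept)))
# Tree B: the dname filter is pushed down onto Dept before the product.
tree_b = Project(("ename",), Select(lambda r: r["dno"] == r["dnum"],
                                    Product(emp, Select(lambda r: r["dname"] == "R&D", dept))))

print(tree_a.evaluate())                                        # [{'ename': 'Ann'}]
print(as_set(tree_a.evaluate()) == as_set(tree_b.evaluate()))   # True
```

Tree B is usually the cheaper plan because the product it builds is smaller, which is exactly why some equivalent trees are more efficient than others.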
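Representation
• A query tree is a representation of a SELECT statement, in the canonical form of the SELECT.
• Order of operations in the canonical representation, from the root down to the leaves:
o Project
o Select
o Join (cross-product of the relations in the FROM clause)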
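Written as a single relational algebra expression, the canonical form of one SELECT-FROM-WHERE block looks as follows, where L is the SELECT list, C the WHERE condition, and R1, ..., Rn the relations in the FROM clause (symbol names introduced here for illustration). The second line instantiates it for a hypothetical Emp/Dept schema.

```latex
\pi_{L}\Bigl(\sigma_{C}\bigl(R_1 \times R_2 \times \cdots \times R_n\bigr)\Bigr)

% e.g. SELECT ename FROM Emp, Dept WHERE dno = dnum AND dname = 'R&D'
\pi_{\mathit{ename}}\Bigl(\sigma_{\mathit{dno} = \mathit{dnum} \,\wedge\, \mathit{dname} = \text{'R\&D'}}\bigl(\mathit{Emp} \times \mathit{Dept}\bigr)\Bigr)
```

Reading inside out gives the evaluation order (product, then selection, then projection); reading outside in gives the root-to-leaves order listed above.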
The Parser
The role of the parser is to convert an SQL statement represented as a string of characters
into a parse tree.
A parse tree consists of nodes, and each node is either:
o An atom: a lexical element such as a keyword (e.g., WHERE), an attribute or relation name, a constant, or an operator symbol.
o A syntactic category: a name for a query subpart. E.g., <SFW> represents a query in select-from-where form.
Nodes that are atoms have no children. Nodes that correspond to categories have children
based on one of the rules of the grammar for the language.
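The atom/category distinction can be sketched with a tiny recursive-descent parser in Python. This is a sketch under stated assumptions: the toy grammar in the comments, the names Node, tokenize, and parse, and the flat treatment of <Condition> are all hypothetical simplifications; a real SQL parser handles far more.

```python
import re
from dataclasses import dataclass, field

# Hypothetical toy grammar (an assumption, far smaller than real SQL):
#   <Query>    ::= <SFW>
#   <SFW>      ::= SELECT <SelList> FROM <FromList> WHERE <Condition>
#   <SelList>  ::= <Attribute> | <Attribute> , <SelList>
#   <FromList> ::= <Relation>  | <Relation> , <FromList>
# <Condition> is kept as a flat sequence of atoms to keep the sketch short.

TOKEN = re.compile(r"\s*(\w+|'[^']*'|<>|<=|>=|[=<>,*])")


@dataclass
class Node:
    label: str                                        # category name or atom text
    children: list["Node"] = field(default_factory=list)

    @property
    def is_atom(self) -> bool:
        return not self.children                      # atoms have no children


def tokenize(sql: str) -> list[str]:
    sql, pos, tokens = sql.strip(), 0, []
    while pos < len(sql):
        m = TOKEN.match(sql, pos)
        if m is None:
            raise SyntaxError(f"unexpected input at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def parse(sql: str) -> Node:
    toks, i = tokenize(sql), 0

    def keyword(word: str) -> Node:
        nonlocal i
        if i >= len(toks) or toks[i].upper() != word:
            raise SyntaxError(f"expected {word}")
        i += 1
        return Node(word)                             # keyword atom

    def comma_list(category: str, item: str) -> Node:
        # Flattened form of the right-recursive list rules above.
        nonlocal i
        node = Node(category)
        while True:
            node.children.append(Node(item, [Node(toks[i])]))
            i += 1
            if i < len(toks) and toks[i] == ",":
                node.children.append(Node(","))
                i += 1
            else:
                return node

    sfw = Node("<SFW>")
    sfw.children.append(keyword("SELECT"))
    sfw.children.append(comma_list("<SelList>", "<Attribute>"))
    sfw.children.append(keyword("FROM"))
    sfw.children.append(comma_list("<FromList>", "<Relation>"))
    sfw.children.append(keyword("WHERE"))
    sfw.children.append(Node("<Condition>", [Node(t) for t in toks[i:]]))
    return Node("<Query>", [sfw])


def show(node: Node, depth: int = 0) -> None:
    print("  " * depth + node.label)
    for child in node.children:
        show(child, depth + 1)


show(parse("SELECT ename FROM Emp, Dept WHERE dno = dnum"))
```

Each category node's children follow one grammar rule, while keywords, names, commas, and operators end up as childless atoms.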
General Example:
Parse Trees to Logical Query Trees
Converting Simplest Parse Trees to Logical Query Trees
The simplest parse tree to convert is one where there is only one select-from-where (<SFW>)
construct, and the <Condition> construct has no nested queries.
The logical query tree produced consists of, from the leaves up:
o The cross-product (×) of all relations mentioned in the <FromList>, which is the input to:
o A selection operator, σC, where C is the <Condition> expression in the construct being replaced, which in turn is the input to:
o A projection, πL, where L is the list of attributes in the <SelList>.
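The conversion rule can be sketched in a few lines of Python. As assumptions for illustration, the parsed <SelList>, <FromList>, and <Condition> arrive as plain Python values, and the hypothetical helpers sfw_to_logical and to_text represent trees as nested tuples.

```python
from functools import reduce


def sfw_to_logical(sel_list, from_list, condition):
    """Build π_L(σ_C(R1 × ... × Rn)) for one SFW block with no nested queries.

    Trees are nested tuples: ("π", attrs, child), ("σ", cond, child),
    ("×", left, right); a bare string is a leaf (relation name).
    from_list must be non-empty.
    """
    product = reduce(lambda left, right: ("×", left, right), from_list)  # all <FromList> relations
    selection = ("σ", condition, product)                                # the whole <Condition>
    return ("π", tuple(sel_list), selection)                             # attributes of <SelList>


def to_text(node):
    """Render a tree in relational algebra notation."""
    if isinstance(node, str):
        return node
    op, arg, *rest = node
    if op == "×":
        # × is associative, so a left-deep chain can be printed without parentheses.
        return f"{to_text(arg)} × {to_text(rest[0])}"
    if op == "σ":
        return f"σ[{arg}]({to_text(rest[0])})"
    if op == "π":
        return f"π[{', '.join(arg)}]({to_text(rest[0])})"
    raise ValueError(f"unknown operator {op!r}")


tree = sfw_to_logical(["ename"], ["Emp", "Dept"], "dno = dnum")
print(to_text(tree))  # π[ename](σ[dno = dnum](Emp × Dept))
```

The tree is deliberately naive: optimization (e.g., pushing selections down or turning σ over × into a join) is a separate step applied to this logical tree afterwards.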
Converting Nested Parse Trees to Logical Query Trees
Converting a parse tree that contains a nested query is slightly more challenging. A nested query is correlated with the outer query if it must be recomputed for every tuple produced by the outer query, because it refers to attributes of that tuple. Otherwise, it is uncorrelated, and the nested query can be converted to a non-nested query using joins.
The nested subquery translation algorithm defines a tree from root to leaves as follows:
o The root node is a projection, πL, where L is the list of attributes in the <SelList> of the outer query.
o The child of the root is a selection operator, σC, where C is the <Condition> expression of the outer query, ignoring the subquery.
o The child of σC is a two-operand selection operator σ, whose left child is the cross-product (×) of all relations mentioned in the <FromList> of the outer query and whose right child is the <Condition> expression for the subquery.
o The subquery itself, which appears in the <Condition> expression, is translated to relational algebra.
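The shape of the resulting tree can be sketched in Python for a hypothetical query, SELECT ename FROM Emp WHERE salary > 50000 AND dno IN (SELECT dnum FROM Dept WHERE dname = 'R&D'). The Op class and the helpers render, product, simple_sfw, and nested_sfw are illustrative assumptions that only build and print the tree; they do not evaluate it.

```python
from dataclasses import dataclass, field
from functools import reduce


@dataclass
class Op:
    label: str
    children: list["Op"] = field(default_factory=list)


def render(node: Op, depth: int = 0) -> list[str]:
    """Return the tree as indented lines, root first."""
    lines = ["  " * depth + node.label]
    for child in node.children:
        lines += render(child, depth + 1)
    return lines


def product(relations: list[str]) -> Op:
    """Left-deep cross-product of the relations in a <FromList>."""
    return reduce(lambda a, b: Op("×", [a, b]), [Op(r) for r in relations])


def simple_sfw(sel, frm, cond) -> Op:
    """A subquery with no nesting translates to π_L(σ_C(product))."""
    return Op(f"π[{', '.join(sel)}]", [Op(f"σ[{cond}]", [product(frm)])])


def nested_sfw(sel, frm, outer_cond, t, subquery: Op) -> Op:
    """Outer query whose WHERE is:  outer_cond AND t IN (subquery)."""
    condition = Op(f"{t} IN", [subquery])                  # right child: subquery <Condition>
    two_operand = Op("σ (two-operand)", [product(frm), condition])
    return Op(f"π[{', '.join(sel)}]", [Op(f"σ[{outer_cond}]", [two_operand])])


sub = simple_sfw(["dnum"], ["Dept"], "dname = 'R&D'")
tree = nested_sfw(["ename"], ["Emp"], "salary > 50000", "dno", sub)
print("\n".join(render(tree)))
```

The printed tree has the projection at the root, the outer selection below it, and the two-operand selection whose children are Emp and the condition "dno IN" over the translated subquery.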
Example 3:
Uncorrelated
Now we must remove the two-operand selection and replace it with relational algebra operators.
Rule for replacing a two-operand selection (uncorrelated):
• Let R be the first operand, and let the second operand be a <Condition> of the form t IN S, where S is an uncorrelated subquery.
o Replace the <Condition> by the tree that is the expression for S. If this expression can produce duplicates, apply duplicate elimination (δ) to it; otherwise a tuple of R would appear once for every duplicate match in S.
o Replace the two-operand selection by a one-argument selection, σC, where C is the condition that equates each component of the tuple t to the corresponding attribute of relation S.
o Give σC an argument that is the product of R and S.
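The rule, including why δ is needed, can be checked on data with a small Python sketch. The assumptions: S is the already-evaluated subquery result given as a list of single values, t is one attribute of R, and the helper names are hypothetical. For brevity the sketch keeps only the R part of each product tuple.

```python
def in_semantics(R, t, S_values):
    """Reference meaning of: tuples of R WHERE t IN S (each R tuple at most once)."""
    return [r for r in R if r[t] in S_values]


def dedup(values):
    """δ: duplicate elimination (keeps first-occurrence order)."""
    return list(dict.fromkeys(values))


def rewrite(R, t, S_values, eliminate_duplicates=True):
    """One-argument selection σ_{t = S.a}(R × S) replacing the two-operand selection."""
    S = dedup(S_values) if eliminate_duplicates else S_values
    product = [(r, s) for r in R for s in S]          # R × S
    return [r for (r, s) in product if r[t] == s]      # σC: equate t with S's attribute


emp = [{"ename": "Ann", "dno": 1}, {"ename": "Bob", "dno": 2}, {"ename": "Cy", "dno": 3}]
s = [1, 1, 2]  # subquery result containing a duplicate

print([r["ename"] for r in in_semantics(emp, "dno", s)])                        # ['Ann', 'Bob']
print([r["ename"] for r in rewrite(emp, "dno", s)])                             # ['Ann', 'Bob']
print([r["ename"] for r in rewrite(emp, "dno", s, eliminate_duplicates=False)])  # ['Ann', 'Ann', 'Bob']
```

Without δ, Ann matches both copies of 1 and is returned twice, which the IN semantics never does.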
So, Example 3 becomes:
Correlated
Translating correlated subqueries is more difficult because the result of the subquery depends on a value defined outside the subquery itself.
In general, a correlated subquery may have to be evaluated once for each tuple of the outer relation, because an attribute of that tuple is used as a parameter of the subquery.
We will not study translation of correlated subqueries.
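Although translation is out of scope, the per-tuple evaluation described above is easy to see in a nested-loop sketch. The query (employees earning more than their own department's average) and the Emp data are hypothetical, and this is plain tuple-at-a-time evaluation, not a translation to relational algebra.

```python
def dept_average(emp, dno):
    """Inner query, parameterized by the outer tuple's dno:
    SELECT AVG(salary) FROM Emp WHERE dno = :dno"""
    salaries = [r["salary"] for r in emp if r["dno"] == dno]
    return sum(salaries) / len(salaries)


def correlated_query(emp):
    """Nested-loop evaluation: the inner query is re-run for every outer tuple."""
    result, inner_evaluations = [], 0
    for e in emp:                                       # scan the outer relation
        inner_evaluations += 1
        if e["salary"] > dept_average(emp, e["dno"]):   # e.dno is the parameter
            result.append(e["ename"])
    return result, inner_evaluations


emp = [
    {"ename": "Ann", "dno": 1, "salary": 100},
    {"ename": "Bob", "dno": 1, "salary": 60},
    {"ename": "Cy", "dno": 2, "salary": 70},
    {"ename": "Di", "dno": 2, "salary": 90},
]
print(correlated_query(emp))  # (['Ann', 'Di'], 4)
```

The inner query runs four times for four outer tuples, even though only two distinct dno values exist; avoiding that repetition is what makes translating correlated subqueries worthwhile but harder.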
Distributed Query Optimization
Dynamic Approach
• The dynamic approach can be illustrated with the algorithm of Distributed INGRES.
• The objective function of the algorithm is to minimize a combination of both the
communication time and the response time.
• However, these two objectives may conflict. For instance, using parallelism increases the total communication time but may well decrease the response time.
• This query optimization algorithm ignores the cost of transmitting the data to the result
site.
• The algorithm also takes advantage of fragmentation, but only horizontal fragmentation is
handled for simplicity.
• Since both general and broadcast networks are considered, the optimizer takes into account
the network topology.
• In broadcast networks, the same data unit can be transmitted from one site to all the other
sites in a single transfer, and the algorithm explicitly takes advantage of this capability. For
example, broadcasting is used to replicate fragments and then to maximize the degree of
parallelism.
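The conflict between total communication time and response time, and the benefit of broadcasting, can be illustrated with toy arithmetic in Python. This is not the Distributed INGRES algorithm itself: the linear cost model C_MSG + C_TR × size, the constants, the fragment sizes, and the three strategies are hypothetical, and local processing and link contention are ignored.

```python
from dataclasses import dataclass

C_MSG = 10   # hypothetical fixed cost per message (ms)
C_TR = 1     # hypothetical cost per KB transmitted (ms)


def transfer_time(kb: int) -> int:
    return C_MSG + C_TR * kb


@dataclass
class Strategy:
    name: str
    # Each inner list is a sequence of transfers made one after another by one
    # sender; different inner lists run in parallel (no contention assumed).
    parallel_streams: list[list[int]]

    def total_communication_time(self) -> int:
        return sum(transfer_time(kb) for stream in self.parallel_streams for kb in stream)

    def response_time(self) -> int:
        return max(sum(transfer_time(kb) for kb in stream) for stream in self.parallel_streams)


R_FRAGMENT_KB, S_KB = 1000, 500  # R has 3 horizontal fragments; S sits at one site

strategies = [
    Strategy("ship R1, R2, R3 to S's site in parallel", [[R_FRAGMENT_KB] for _ in range(3)]),
    Strategy("send S point-to-point to the 3 fragment sites", [[S_KB] * 3]),
    Strategy("broadcast S once to all fragment sites", [[S_KB]]),
]
for s in strategies:
    print(f"{s.name}: total={s.total_communication_time()} response={s.response_time()}")
```

Under these assumptions the parallel strategy has the highest total communication time (3030) but a lower response time (1010) than sending S point-to-point (1530 for both), while a single broadcast (510) beats both on both measures.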
Static Approach
• The static approach can be illustrated with the algorithm of R*.
• This algorithm performs an exhaustive search of all alternative strategies in order to choose
the one with the least cost.
• Although predicting and enumerating these strategies may be costly, the overhead of
exhaustive search is rapidly amortized if the query is executed frequently.
• Two methods are supported for inter-site data transfers.
o Ship-whole. The entire relation is shipped to the join site and stored in a temporary
relation before being joined. If the join algorithm is merge join, the relation does
not need to be stored, and the join site can process incoming tuples in a pipeline
mode, as they arrive.
o Fetch-as-needed. The external relation is sequentially scanned, and for each tuple
the join value is sent to the site of the internal relation, which selects the internal
tuples matching the value and sends the selected tuples to the site of the external
relation. This method is equivalent to the semijoin of the internal relation with each
external tuple.
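The two transfer methods can be compared with a small Python simulation. The assumptions: the join site is the external relation's site, so ship-whole moves the internal relation there; the join attribute is a hypothetical field called "key"; and every send, even one returning no tuples, counts as one message.

```python
from collections import defaultdict


def ship_whole(external, internal):
    """Ship the entire internal relation to the join site (here: the external site)."""
    messages, tuples_shipped = 1, len(internal)     # one bulk transfer
    by_key = defaultdict(list)                      # temporary relation at the join site
    for t in internal:
        by_key[t["key"]].append(t)
    result = [(e, i) for e in external for i in by_key[e["key"]]]
    return result, messages, tuples_shipped


def fetch_as_needed(external, internal):
    """Scan the external relation; fetch matching internal tuples one join value at a time."""
    messages, tuples_shipped, result = 0, 0, []
    for e in external:
        messages += 1                               # join value sent to the internal site
        matches = [i for i in internal if i["key"] == e["key"]]   # selection there
        messages += 1                               # matching tuples sent back
        tuples_shipped += len(matches)
        result.extend((e, i) for i in matches)
    return result, messages, tuples_shipped


internal = [{"key": k, "payload": f"p{k}"} for k in range(100)]
external = [{"key": 1, "name": "a"}, {"key": 2, "name": "b"}, {"key": 2, "name": "c"}]

print(ship_whole(external, internal)[1:])       # (1, 100)
print(fetch_as_needed(external, internal)[1:])  # (6, 3)
```

Both methods produce the same join; fetch-as-needed ships far fewer tuples here because the external relation is small, but pays with two messages per external tuple, so it loses when the external relation is large.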
Semijoin-based Approach
• The semijoin-based approach can be illustrated with the algorithm of SDD-1 which takes
full advantage of the semijoin to minimize communication cost.
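A single semijoin reduction can be sketched in Python. SDD-1 itself selects a whole sequence of semijoins from cost estimates; this sketch shows only one, uses hypothetical Emp/Dept data, and crudely counts every shipped join value or tuple as one unit.

```python
def semijoin(S, join_values, attr):
    """S ⋉ R, given join_values = π_attr(R): keep S tuples that have a partner in R."""
    return [s for s in S if s[attr] in join_values]


def join_via_semijoin(R, S, attr):
    """Compute R ⋈ S at R's site, shipping a semijoin-reduced S instead of all of S."""
    join_values = {r[attr] for r in R}            # 1. π_attr(R), shipped to S's site
    reduced_S = semijoin(S, join_values, attr)    # 2. S ⋉ R, computed at S's site
    # 3. ship reduced_S back to R's site and do the join there
    result = [{**r, **s} for r in R for s in reduced_S if r[attr] == s[attr]]
    units_shipped = len(join_values) + len(reduced_S)
    return result, units_shipped


emp = [{"ename": "Ann", "dno": 1}, {"ename": "Bob", "dno": 2}, {"ename": "Cy", "dno": 2}]
dept = [{"dno": d, "dname": f"D{d}"} for d in range(50)]

result, shipped = join_via_semijoin(emp, dept, "dno")
print(len(result), shipped, "vs", len(dept), "tuples to ship Dept whole")
```

Here 4 units cross the network instead of 50. The semijoin pays off only when it removes many tuples; if most of Dept matched, shipping the projection would be pure overhead.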
Hybrid Approach
• The hybrid query optimization technique, which uses dynamic query execution plans (QEPs), is general enough to incorporate site and copy selection decisions.
• Several hybrid techniques have been proposed to optimize queries in distributed systems.
• They essentially rely on the following two-step approach:
o At compile time, generate a static plan that specifies the ordering of operations and
the access methods, without considering where relations are stored.
o At startup time, generate an execution plan by carrying out site and copy selection
and allocating the operations to the sites.
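The two-step approach can be sketched in Python. Everything concrete here is hypothetical: the plan steps, the access methods, the replica placement, and the "least-loaded site" heuristic used for copy selection and operation placement. The point is only that the same compile-time plan can receive different site allocations at startup.

```python
from dataclasses import dataclass


@dataclass
class Step:
    op: str             # "scan" or "join"
    inputs: tuple       # a relation name for scans, earlier step indices for joins
    access_method: str  # fixed at compile time


def compile_static_plan() -> list:
    """Compile time: fix the operation order and access methods, but no sites."""
    return [
        Step("scan", ("Emp",), "index on dno"),
        Step("scan", ("Dept",), "sequential scan"),
        Step("join", (0, 1), "hash join"),
    ]


def allocate_at_startup(plan, replicas, load) -> dict:
    """Startup time: pick a copy of each relation and a site for each operation.

    Hypothetical heuristic: always prefer the currently least-loaded site.
    """
    site_of = {}
    for i, step in enumerate(plan):
        if step.op == "scan":
            copies = replicas[step.inputs[0]]                 # copy selection
            site_of[i] = min(copies, key=lambda s: load[s])
        else:
            candidates = [site_of[j] for j in step.inputs]    # run where an input is
            site_of[i] = min(candidates, key=lambda s: load[s])
    return site_of


plan = compile_static_plan()
replicas = {"Emp": ["S1", "S2"], "Dept": ["S2", "S3"]}
print(allocate_at_startup(plan, replicas, {"S1": 0.2, "S2": 0.8, "S3": 0.5}))
print(allocate_at_startup(plan, replicas, {"S1": 0.9, "S2": 0.3, "S3": 0.5}))
```

With the first load snapshot the scans run at S1 and S3 and the join at S1; with the second, everything runs at S2, while the compile-time ordering and access methods stay unchanged.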