Query optimization using case-based reasoning in ubiquitous environments
1. Query Optimization Using
Case-based Reasoning in
Ubiquitous Environments
Lourdes Angelica Martinez-Medina
Christophe Bobineau
Jose Luis Zechinelli-Martini
2009 Mexican International Conference on Computer Science (ENC '09)
2011/05/16 - Ria Mae Borromeo
2. Introduction
Query Optimization
Rely on cost models that are dependent on metadata (statistics,
cardinality estimates)
Typically restricted to execution time estimation
Problem
There are computational environments where metadata
acquisition and support are expensive.
e.g. ubiquitous environments
Proposed Solution
Query Optimization technique based on learning, particularly
case-based reasoning
3. Ubiquitous Environment
Integrates information from
different computational tools and
applications
Characteristics
1. Heterogeneity
• extensive range of computational
resources and electronic devices
• devices have different physical and logical
characteristics
2. Dynamicity
• resources change continuously due to
mobility
• communication network properties and
the resources that interact with it vary
4. Ubiquitous Environment
3. Distribution
• resources are distributed within a physical space, so the information used by these
resources is also distributed
4. Autonomy
• resources can change their availability status anytime
6. Physical Constraints
• e.g. processing and storage capability, energy consumption, location
7. Metadata lack
• Constant changes --> Expensive maintenance --> No global schema
5. Classical Query Optimization
The typical optimization process is composed of three phases: logical, global, and physical
Logical and physical optimization phases are related to centralized environments
Global optimization is required in distributed environments
Evaluation cost models used by most classical query optimization techniques are
tightly tied to metadata
Each phase requires different metadata types and has different
optimization objectives
Figure 1. Phases of the optimization process
6. Classical Query Optimization
Logical Optimization
Aims to reduce the number of tuples combined as
intermediate results
Appropriate order for applying selection, projection and join
operators must be decided
Uses heuristics and metadata
Result:
Figure 2. Algebraic query trees
7. Classical Query Optimization
Global Optimization
Aims to minimize communication cost related to interactions
among resources and a set of views
Global optimizer: decides where to perform each part of the
execution tree
Result: new execution tree with communication operators
8. Classical Query Optimization
Physical Optimization
Aims to reduce disk access for retrieving requested data and
minimize execution time for executing query plans
Metadata related to execution context is required
Figure 2. Algebraic query trees
Figure 3. Query execution plan
9. Contribution of the Paper
Proposes a query optimization technique for ubiquitous
environments
Allows query optimization according to user requirements
Query optimization based on learning
Goal: Improve or acquire new capabilities from experience
related to some specific tasks
10. Query Optimization Based on Learning
Learn from past experience!
Experience : the knowledge gained from a problem resolution
Learning : the acquisition of knowledge in order to improve the
behavior or to acquire new capabilities from previous
experiences
Machine Learning : a sub-discipline of AI that is in charge of
designing and developing methods that allow computers to
automatically learn in order to improve or create specific
capabilities
11. Case-based Reasoning
Proposes a reasoning process that aims to solve new
problems using the experience gained when similar
problems are solved
Case: minimum unit of reasoning
• Problem description
• Solution
• Set of annotations that describe how the solution was derived
12. Case-based Reasoning Process
Case-based reasoning has been formalized as a four-step process: retrieve, reuse, review and
retain [7].
(1) Get relevant cases
(2) Adjust the solution of the relevant case to the problem
(3) New solution must be verified in the real world (simulation)
(4) Store as a new case in the memory
Figure 4. Case-based reasoning process
13. Case-based Reasoning Adaptation to
Query Optimization
Adapts case-based reasoning to provide optimal execution plans
for new queries
Uses the knowledge acquired from experience to optimize and
execute similar queries
The solution is represented by the current execution plan:
1. Query
2. Problem
3. Case
4. Reasoning Process
14. 1. Query
Modular part of knowledge in the definition of a problem and a case
The most important piece of knowledge that links a problem with the existing cases
Clauses of a query: selectClause, fromClause, whereClause
The whereClause specifies the set of conditions (for data selection and data combination or join)
that the data must satisfy to form part of the query result
In a query, selection and join operations are the most frequent
Figure 5. Query representation (UML diagram)
15. 1. Query
Query Operation
Type
Select condition(attrexp, cnstexp)
Join condition(attrexp.a, attrexp.b)
Set of attributes
Specific Condition
Q = {O1, O2, O3, O4 }
SELECT Resto.nom
FROM Resto, Ville, Region
WHERE Region.nom = ‘RA’ O1
AND Resto.spec = ‘IT’ O2
AND Resto.vil = Ville.nom O3
AND Ville.numDep = Region.numDep O4
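The operation set Q above can be written down as plain data. The following is only a minimal sketch: the `Operation` class and its field names are hypothetical and do not come from the paper; they simply mirror the two condition forms on this slide (selection against a constant, join between two attributes).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Operation:
    """One condition of the whereClause (hypothetical representation)."""
    kind: str                     # "select" or "join"
    left: str                     # attribute expression, e.g. "Region.nom"
    operator: str                 # comparison operator, e.g. "="
    right: Optional[str] = None   # second attribute (joins only)
    value: Optional[str] = None   # constant (selections only)


# Q = {O1, O2, O3, O4} from the example query
Q = [
    Operation("select", "Region.nom", "=", value="RA"),             # O1
    Operation("select", "Resto.spec", "=", value="IT"),             # O2
    Operation("join", "Resto.vil", "=", right="Ville.nom"),         # O3
    Operation("join", "Ville.numDep", "=", right="Region.numDep"),  # O4
]

kinds = [op.kind for op in Q]
```

Keeping the operator and the constant as separate fields makes it easy to ignore them later, when only the operation type and the attributes matter.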
16. 1. Query
Operation Family
All queries are associated to a set of operation families
Used to group operations that include the same condition
applied to the same attributes (and, for this reason, the same relations)
Two operations Ox and Oy are from the same operation family if they have:
• the same operation type (selection or join)
• the same attributes
(1) ⟨R.an⟩ = {on | on = condition(R.an, value)}
The operation family ⟨R.an⟩ is composed of the set of operations on with a condition of the form
condition(R.an, value), where an is an attribute that pertains to the relation R
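A family can be treated as a key computed from an operation. The sketch below is an illustration under assumptions, not the paper's implementation: the function name `operation_family` is hypothetical, and treating a join family as independent of attribute order is an assumption made here for convenience.

```python
from typing import Optional, Tuple


def operation_family(kind: str, left: str,
                     right: Optional[str] = None) -> Tuple[str, Tuple[str, ...]]:
    """Return a family key built from the operation type and its attributes.

    The comparison operator and the constant value are deliberately not part
    of the key. Sorting join attributes (assumption) makes the key independent
    of the order in which the two attributes are written.
    """
    if kind == "join":
        if right is None:
            raise ValueError("a join needs two attributes")
        return (kind, tuple(sorted((left, right))))
    return (kind, (left,))


fam_a = operation_family("select", "Resto.spec")            # Resto.spec = 'IT'
fam_b = operation_family("select", "Resto.spec")            # same attribute, other operator or value
fam_c = operation_family("join", "Resto.vil", "Ville.nom")
fam_d = operation_family("join", "Ville.nom", "Resto.vil")
```

Here `fam_a` and `fam_b` are equal because only the type and the attribute are compared, and `fam_c` equals `fam_d` because of the sorting assumption.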
17. 1. Query
The condition of an operation may use any of the comparison operators: Equal, EqualOrLower,
Lower, GreaterOrEqual, Greater and Different
The whereClause of a query Q is defined by an operations set
These operations are members of different operation families
Operation families associated to each operation in Q:
(2) Q = { ⟨R1.a1⟩, ⟨R2.a2⟩, ⟨R1.a3,R2.a4⟩, ⟨R2.a4,R3.a5⟩ }
Class Description (Cn)
Each different combination of operation families ⟨R.an⟩ forms a class description,
i.e. the class Cn defined by the operation families in (3)
The queries are classified in a set of classes
(3) Cn = { ⟨Rn.an⟩, ⟨Rm.am⟩, ⟨Rn.ap,Rm.aq⟩, ⟨R2.a4,R3.a5⟩ }
The class Cn is composed of all queries Qn that contain
at least one operation that pertains to each of the specified families
18. 1. Query
(4) Qn ∈ Cn iff (∀ ⟨Rn.an⟩ ∈ Cn) ∃ on ((on ∈ Qn) ∧ (on ∈ ⟨Rn.an⟩))
A query Qn pertains to the class Cn if and only if, for every operation family F
that describes Cn, there exists an operation on in Qn of the form of F
Q = {O1, O2, O3, O4 }
SELECT Resto.nom
FROM Resto, Ville, Region
WHERE Region.nom = ‘RA’ O1
AND Resto.spec = ‘IT’ O2
AND Resto.vil = Ville.nom O3
AND Ville.numDep = Region.numDep O4
With R1 = Ville, R2 = Resto and R3 = Region:
• selection O1 pertains to operation family ⟨R3.a5⟩
• selection O2 pertains to operation family ⟨R2.a2⟩
• join O3 pertains to operation family ⟨R2.a4,R1.a3⟩
• join O4 pertains to operation family ⟨R1.a1,R3.a6⟩
These operation families make up a class:
a) C = { ⟨R3.a5⟩, ⟨R2.a2⟩, ⟨R2.a4,R1.a3⟩, ⟨R1.a1,R3.a6⟩ }
Any query composed of operations that pertain to these families pertains to the same class:
b) q ∈ C iff (∀ ⟨Rn.an⟩ ∈ C) ∃ on ((on ∈ q) ∧ (on ∈ ⟨Rn.an⟩))
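The membership rule (4) reduces to a simple check once each operation of a query has been mapped to its family. The sketch below assumes families are plain strings such as "R2.a2" and that this mapping has already been done; the function name `belongs_to_class` is hypothetical.

```python
from typing import Iterable, Set


def belongs_to_class(query_families: Iterable[str], class_description: Set[str]) -> bool:
    """Rule (4): every family describing the class must be matched by
    at least one operation of the query."""
    present = set(query_families)
    return all(family in present for family in class_description)


# Class C from the example (families written as strings, an assumption)
C = {"R3.a5", "R2.a2", "R2.a4,R1.a3", "R1.a1,R3.a6"}

# Families of O1..O4 for the example query Q
q_families = ["R3.a5", "R2.a2", "R2.a4,R1.a3", "R1.a1,R3.a6"]

# A query without the Ville-Region join does not reach every family of C
other_families = ["R3.a5", "R2.a2", "R2.a4,R1.a3"]

q_in_c = belongs_to_class(q_families, C)
other_in_c = belongs_to_class(other_families, C)
```

Note that rule (4) as stated only requires the class families to be covered; it does not forbid a query from having additional operations.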
19. 2. Problem
Specifies the query to be optimized, the optimization parameters and
measures related to the computational resources available for query
execution
A problem is composed of a query, an execution context representation, and an optimization objective:
• query
• execution context
• optimization target
Figure 6. Problem representation (UML diagram)
20. 2. Problem
Context - represents a measure of the computational resources
available when the query is executed, e.g. available memory and remaining energy
Optimization Objective - indicates the resource or set of
resources that will be optimized, e.g. minimize energy consumption
Figure 7. An example of a problem instance
Context = { <memory, 400>, <CPU, 75>, <energy, 70> }
Typically, optimization means minimizing the utilization of these resources; in the
example, the optimization objective is to minimize the memory consumption, specified by F(memory)
21. 3. Case
Specifies an optimized query, the solution to solve the query and
the measures related to computational resources that were
consumed by the query execution
A case is composed of a query, a solution (query plan) and a set of evaluation measures
used to express the optimization objective of a query
Figure 8. Case representation (UML diagram)
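The problem and case structures can be sketched as two small records. This is an illustration only: the class and field names are hypothetical, the case values are invented for the example, and the `fits_context` check is an assumption about how consumed resources might be compared with the available context.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Problem:
    query: str                  # query to be optimized
    context: Dict[str, float]   # resources available at execution time
    objective: str              # resource whose consumption is minimized


@dataclass
class Case:
    query: str                    # query that was solved
    solution: List[str]           # ordered operations of the query plan
    evaluation: Dict[str, float]  # resources consumed during execution


def fits_context(case: Case, problem: Problem) -> bool:
    """Assumed check: the plan of a past case is usable only if it consumed
    no more of each resource than the problem context now offers."""
    return all(case.evaluation.get(resource, 0) <= available
               for resource, available in problem.context.items())


# Context values taken from the Figure 7 example; the case values are made up.
problem = Problem("Q", {"memory": 400, "CPU": 75, "energy": 70}, "memory")
case = Case("Q", ["select Region", "join Ville", "join Resto"],
            {"memory": 250, "CPU": 40, "energy": 30})
usable = fits_context(case, problem)
```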
22. 3. Case
Query - optimization target that has been evaluated and solved
Solution - physical execution plan that solves the query,
which is an ordered and pertinent sequence of selection,
projection, sort, and join operations for accessing a set of data sources
Evaluation - set of measures collected during query execution,
represented as couples of the form <attribute, value> that express the computational resources
(e.g. memory, CPU, or energy) consumed by the query execution
Figure 9. An example of a case
23. 4. Reasoning Process
Retrieval
* Get relevant cases using a similarity function
* If there is no relevant case in the case base, a new query plan
must be pseudo-randomly generated to increase the query optimizer knowledge
Reuse
* Adjust the solution of the relevant case to the problem
* The matching process depends on the cases’ similarity
Review
* Execution plan is verified
Retention
* The query class, query plan and consumption measures
are stored in form of a case within the case base
Figure 4. Case-based reasoning process
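The four steps can be read as one loop. The sketch below is a toy version under stated assumptions: queries are sets of operation labels, similarity is a Jaccard ratio rather than the paper's functions, and the `threshold` parameter, the helper names and the fake `execute` measures are all hypothetical.

```python
import random


def jaccard(a, b):
    """Toy similarity between two sets of operation labels (assumption)."""
    return len(a & b) / len(a | b) if a | b else 1.0


def random_plan(query, rng=random.Random(0)):
    """Pseudo-random plan: the query operations in a shuffled order."""
    ops = sorted(query)
    rng.shuffle(ops)
    return ops


def adapt(plan, query):
    """Keep the operations that still apply, then append the missing ones."""
    kept = [op for op in plan if op in query]
    return kept + sorted(query - set(kept))


def execute(plan):
    """Stand-in for real execution: returns fake consumption measures."""
    return {"memory": 10 * len(plan)}


def optimize(query, case_base, threshold=0.5):
    # Retrieve: most similar stored case, if any
    best = max(case_base, key=lambda c: jaccard(query, c["query"]), default=None)
    if best is not None and jaccard(query, best["query"]) >= threshold:
        plan = adapt(best["plan"], query)      # Reuse
    else:
        plan = random_plan(query)              # no relevant case
    measures = execute(plan)                   # Review
    case_base.append({"query": query, "plan": plan, "measures": measures})  # Retain
    return plan


case_base = []
q1 = frozenset({"o1", "o2", "o3"})
first_plan = optimize(q1, case_base)    # generated pseudo-randomly
second_plan = optimize(q1, case_base)   # adapted from the stored case
```

On the second call the stored case is an exact match, so the adapted plan is the same as the first one and the case base grows to two entries.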
24. Similarity Function
Inter-class Similarity Function
* used to define membership of a query
Intra-class Similarity Function
* used to retrieve most relevant case [10][11]
Both are based on the contrast model of similarity proposed by Tversky [12]
Uses a feature-matching function
Similarity increases with the number of common features and decreases with the
number of distinctive features [13]
(5) S (a, b) = θf(A ∩ B) - αf(A - B) - βf(B - A)
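Formula (5) translates directly into set operations. This is a minimal sketch, assuming that the feature sets are Python sets and that f simply counts features (`len`); the default weights are arbitrary illustration values, not values from the paper.

```python
def contrast_similarity(a, b, theta=1.0, alpha=0.5, beta=0.5, f=len):
    """Tversky-style contrast model:
    S(a, b) = theta*f(A ∩ B) - alpha*f(A - B) - beta*f(B - A)."""
    a, b = set(a), set(b)
    return theta * f(a & b) - alpha * f(a - b) - beta * f(b - a)


same = contrast_similarity({"x", "y", "z"}, {"x", "y", "z"})     # 3 common, 0 distinctive
close = contrast_similarity({"x", "y", "z"}, {"x", "y", "w"})    # 2 common, 1 + 1 distinctive
skewed = contrast_similarity({"x", "y", "z"}, {"x"}, alpha=1.0, beta=0.0)
```

With the default weights, `same` is 3.0 and `close` is 1.0; in `skewed`, only the features missing from the second set are penalized, which shows how the weights change the balance between similarities and differences.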
25. Inter-class similarity
Increasing function of common operation families
Decreasing function of distinctive families, i.e. families that pertain to one query but not the other
Can be applied to two classes, each one defined by a set of operation families, or to a query and a class
Determine operation families related to the involved operations
(6) S(C1, Q) = θf(C1 ∩ Q) - αf(C1 - Q) - βf(Q - C1)
Similarity between C1 and Q is defined in terms of:
• C1 ∩ Q: operation families common to C1 and Q
• C1 - Q: features that pertain to C1 only
• Q - C1: features that pertain to Q only
The function f refers particularly to operation families
26. Inter-class similarity
For practical purposes, suppose that we know the class c
of the query q and the definition of the classes c1 and c2
q = {o1, o2, o3}, q ∈ c, c = { ⟨R.a1⟩, ⟨R.a2⟩, ⟨R.a3,R.a4⟩ }
c1 = { ⟨R.a1⟩, ⟨R.a2⟩, ⟨R.a3,R.a4⟩ }
c2 = { ⟨R.a1⟩, ⟨R.a2⟩, x }
Compute the intersections of c with c1 and c2:
c1 ∩ q = { ⟨R.a1⟩, ⟨R.a2⟩, ⟨R.a3,R.a4⟩ }
c2 ∩ q = { ⟨R.a1⟩, ⟨R.a2⟩ }
From these intersections, it is possible to state that the query class c is similar to c1
27. Intra-class Similarity
Aims to find the most similar queries with respect to a new query
that is to be optimized, within the same class
All compared queries have the same operations (operation type and involved attributes)
Comparison operators or attribute values may differ
Increasing function of common operations
(identical operations in terms of type, attributes and operators)
(7) S (Q1 , Q2 ) = θo(Q1 ∩ Q2 ) - αo(Q1 - Q2 ) - βo(Q2 - Q1 )
• Q1 ∩ Q2: operations that are common to Q1 and Q2
• Q1 - Q2: features that pertain to Q1 but not to Q2
• Q2 - Q1: features that pertain to Q2 but not to Q1
Find the query that contains the maximum number of operation
mappings!
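Picking the most relevant case inside a class can be sketched as a search for the candidate with the most operation mappings. The sketch simplifies (7) to a count of common operations, which is an assumption; operations are written as hypothetical (type, attributes, operator) tuples, and the candidate names and contents are invented for the example.

```python
def operation_mappings(q1, q2):
    """Number of operations identical in type, attributes and operator."""
    return len(set(q1) & set(q2))


def most_similar(new_query, candidates):
    """Name of the stored query with the most operation mappings.
    On a tie, the first candidate in insertion order wins."""
    return max(candidates, key=lambda name: operation_mappings(new_query, candidates[name]))


new_query = {
    ("select", "Region.nom", "="),
    ("select", "Resto.spec", "="),
    ("join", "Resto.vil,Ville.nom", "="),
}
candidates = {
    "q2": {  # differs only in the join operator
        ("select", "Region.nom", "="),
        ("select", "Resto.spec", "="),
        ("join", "Resto.vil,Ville.nom", "<>"),
    },
    "q5": {  # differs in both selection operators
        ("select", "Region.nom", "<>"),
        ("select", "Resto.spec", "<>"),
        ("join", "Resto.vil,Ville.nom", "="),
    },
}
winner = most_similar(new_query, candidates)
```

Here `q2` shares two operations with the new query and `q5` only one, so `q2` is selected.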
28. Query Optimizer Architecture
Two main modules: the case-based reasoner and the execution plan generator
A. Case-based Reasoner
Reutilizes the solutions related to queries that have been solved
1. Smart Search Engine
2. Adapter
3. Execution Manager
4. Case Base Manager
B. Execution Plan Generator
Generates new solutions in a pseudo-random way; simpler and probably faster,
but it does not apply machine learning techniques
Figure 10. Optimizer architecture
29. Case-based Reasoner
Adapts solutions of similar queries to the new situation
1. Smart Search Engine
• retrieves relevant cases
• applies Inter and Intra-class Similarity functions
• selects the query that minimizes the optimization parameters
2. Adapter
• adapts the query plan included in the relevant case to query
problem specifications
• used to facilitate and minimize the cost of the adaptation
process
30. Case-based Reasoner
3. Execution Engine
• tests the new query execution plan created by the adaptation
module
4. Case Base Manager
• retains new knowledge in the form of a case
• similarity function is also used
This is basically the summary of the entire paper
Result: algebraic query tree that optimizes the order in which operators must be applied.
Tree A is not the best plan because the selection operation is applied after the join.
Tree B is the optimal algebraic plan because all selection and projection operations are applied as soon as possible.
In a ubiquitous environment, there are no global views because maintaining them is expensive!
Given: algebraic tree (from logical optimization)
Result: all corresponding execution plans that specify the implementation of each algebraic operator
- Classical query optimization techniques typically generate execution plans that are optimized according to a single dimension, query execution time.
- Useful knowledge must be obtained from previously executed queries and be managed and exploited by means of automatic learning techniques
- GOAL: improve or acquire new capabilities from experience related to some specific tasks
- Query evaluation time is no longer the main optimization objective
Given a new query Q, an existing query plan is retrieved if it can be adapted to Q. It is also necessary to verify that its execution can be accomplished with the computational resources available at the moment of query execution (memory, CPU, energy)
It is also necessary to pay attention to the computational resources consumed by the query and those that are available at the moment the new query is executed, as well as to the optimization objective, which can change each time the query is executed.
The operator and the attribute value are not important for determining the operation family to which a specific operation pertains; the important knowledge is the operation type and the attribute(s) included in the operation
Similarity of a and b is defined in terms of the features common to a and b,
minus the features that pertain to a but not to b
and those that pertain to b but not to a
theta, alpha, beta: non-negative parameters that determine the relative weight of the three components of similarity
- they provide flexibility to change the importance of similarities or differences according to the area of application