A cost-based query optimizer works as follows. First, it generates all possible query execution plans.
Next, it estimates the cost of each plan. Finally, it chooses the plan with the lowest estimated cost.
Because the decision is based on estimated cost values, the chosen plan may not actually be optimal.
The quality of the optimizer's decisions depends on the complexity and accuracy of the cost functions
used. The approach can use techniques such as dynamic programming to decide on the best plan. Its main
drawback is that generating and costing every plan is very expensive; as a result, most optimizers do
not employ this exhaustive strategy. A cost estimation technique is needed so that a cost may be
assigned to each plan in the search space. Intuitively, this cost is an estimate of the resources
needed to execute the plan.
1. Generates all possible query execution plans and then calculates the cost of each.
2. Quality depends on the complexity and accuracy of the cost function.
Query Optimization
We divide query optimization into two types: Heuristic (sometimes called Rule
based) and Systematic (Cost based).
Heuristic Query Optimization
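EMPLOYEE TABLE (partial; only these column headings and the SALARY values are recoverable):
SSN  BDATE  ADDRESS  S  SALARY
SALARY values, in row order: 30000, 40000, 25000, 43000, 38000, 25000, 25000, 55000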
DEPARTMENT TABLE:
DNAME           DNUMBER MGRSSN    MGRSTARTD
--------------- ------- --------- ---------
HEADQUARTERS    1       888665555 19-JUN-71
ADMINISTRATION  4       987654321 01-JAN-85
RESEARCH        5       333445555 22-MAY-78

PROJECT TABLE:
PNAME            PNUMBER PLOCATION DNUM
---------------- ------- --------- ----
ProductX         1       Bellaire  5
ProductY         2       Sugarland 5
ProductZ         3       Houston   5
Computerization  10      Stafford  4
Reorganization   20      Houston   1
NewBenefits      30      Stafford  4

WORKS_ON TABLE:
ESSN      PNO HOURS
--------- --- -----
123456789 1   32.5
123456789 2   7.5
666884444 3   40.0
453453453 1   20.0
453453453 2   20.0
333445555 2   10.0
333445555 3   10.0
333445555 10  10.0
333445555 20  10.0
999887777 30  30.0
999887777 10  10.0
987987987 10  35.0
987987987 30  5.0
987654321 30  20.0
987654321 20  15.0
888665555 20  null
Note the two cross product operations. These require lots of space and time
(nested loops) to build.
After the two cross products, we have a temporary table with 144 records (6
projects * 3 departments * 8 employees).
An overall rule for heuristic query optimization is to perform as many select
and project operations as possible before doing any joins.
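The effect of this rule on intermediate result size can be checked with a little arithmetic. The sketch below uses the row counts quoted above (6 projects, 3 departments, 8 employees) and the PROJECT rows from the sample data. The selection PLOCATION = 'Stafford' is a hypothetical condition chosen for illustration; the text does not state the query's actual conditions.

```python
from math import prod

# PROJECT rows (PNAME, PNUMBER, PLOCATION, DNUM) from the sample data.
PROJECT = [
    ("ProductX", 1, "Bellaire", 5),
    ("ProductY", 2, "Sugarland", 5),
    ("ProductZ", 3, "Houston", 5),
    ("Computerization", 10, "Stafford", 4),
    ("Reorganization", 20, "Houston", 1),
    ("NewBenefits", 30, "Stafford", 4),
]
DEPARTMENT_ROWS = 3  # row count quoted in the text
EMPLOYEE_ROWS = 8    # row count quoted in the text

# Plan A: build both cross products first and select afterwards.
late_selection = prod([len(PROJECT), DEPARTMENT_ROWS, EMPLOYEE_ROWS])

# Plan B: apply the (hypothetical) selection before the cross products.
stafford = [row for row in PROJECT if row[2] == "Stafford"]
early_selection = prod([len(stafford), DEPARTMENT_ROWS, EMPLOYEE_ROWS])

print(late_selection, early_selection)  # 144 48
```

Under this assumed condition, selecting first shrinks the temporary table from 144 rows to 48 before any further work is done.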
There are a number of transformation rules that can be used to transform a
query:
1. Cascading selections. A list of conjunctive conditions can be broken up
into separate individual conditions.
2. Commutativity of the selection operation.
3. Cascading projections. All but the last projection can be ignored.
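The three rules listed above can be checked on the sample PROJECT data. This is a minimal sketch: `select` and `project` are hypothetical helpers that model relations as lists of dictionaries, and the two conditions are made up for illustration.

```python
# PROJECT rows from the sample data, one dict per row.
PROJECT = [
    {"PNAME": "ProductX", "PNUMBER": 1, "PLOCATION": "Bellaire", "DNUM": 5},
    {"PNAME": "ProductY", "PNUMBER": 2, "PLOCATION": "Sugarland", "DNUM": 5},
    {"PNAME": "ProductZ", "PNUMBER": 3, "PLOCATION": "Houston", "DNUM": 5},
    {"PNAME": "Computerization", "PNUMBER": 10, "PLOCATION": "Stafford", "DNUM": 4},
    {"PNAME": "Reorganization", "PNUMBER": 20, "PLOCATION": "Houston", "DNUM": 1},
    {"PNAME": "NewBenefits", "PNUMBER": 30, "PLOCATION": "Stafford", "DNUM": 4},
]

def select(rows, predicate):
    """Keep the rows that satisfy the predicate."""
    return [r for r in rows if predicate(r)]

def project(rows, columns):
    """Keep only the named columns (duplicates are not eliminated here)."""
    return [{c: r[c] for c in columns} for r in rows]

def in_dept_4(r):
    return r["DNUM"] == 4

def in_stafford(r):
    return r["PLOCATION"] == "Stafford"

# Rule 1: a conjunctive condition equals a cascade of single conditions.
combined = select(PROJECT, lambda r: in_dept_4(r) and in_stafford(r))
cascaded = select(select(PROJECT, in_dept_4), in_stafford)

# Rule 2: the order of two selections does not matter.
swapped = select(select(PROJECT, in_stafford), in_dept_4)

# Rule 3: only the last (outermost) projection matters.
nested = project(project(PROJECT, ["PNAME", "PNUMBER", "DNUM"]), ["PNAME"])
direct = project(PROJECT, ["PNAME"])

rules_hold = combined == cascaded == swapped and nested == direct
print(rules_hold)  # True
```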
Just looking at the syntax of the query may not give the whole picture; we also need to
look at the data.
Several cost components need to be considered:
1. Access cost to secondary storage (hard disk)
2. Storage cost for intermediate result sets
3. Computation costs: CPU, memory transfers, etc. for performing in-memory operations.
4. Communication costs to ship data around a network, e.g., in a
distributed or client/server database.
Of these, Access cost is the most crucial in a centralized DBMS. The more
work we can do with data in cache or in memory, the better.
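One simple way to combine the four components is a weighted sum. The sketch below is an assumption made for illustration, not a formula from the text: the class name, the weights, and the sample figures are all hypothetical. The weights only express the idea that access cost dominates in a centralized DBMS, where communication cost is absent.

```python
from dataclasses import dataclass

@dataclass
class CostEstimate:
    disk_access: float    # 1. access to secondary storage
    storage: float        # 2. space for intermediate result sets
    computation: float    # 3. CPU and memory-transfer work
    communication: float  # 4. data shipped across a network

    def total(self, weights):
        """Weighted sum of the four components (weights are assumptions)."""
        parts = (self.disk_access, self.storage,
                 self.computation, self.communication)
        return sum(w * p for w, p in zip(weights, parts))

# Hypothetical weights for a centralized DBMS: disk access counts most,
# and nothing is shipped over a network.
CENTRALIZED = (10.0, 1.0, 0.5, 0.0)

estimate = CostEstimate(disk_access=50, storage=20,
                        computation=300, communication=0)
total = estimate.total(CENTRALIZED)
print(total)  # 670.0
```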
Access Routines are algorithms that are used to access and aggregate data in a
database.
An RDBMS may have a collection of general purpose access routines that can
be combined to implement a query execution plan.
We are interested in access routines for selection, projection, join and set
operations such as union, intersection, set difference, cartesian product, etc.
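As an illustration of two interchangeable access routines for the same operation, the sketch below joins WORKS_ON with PROJECT in two ways. Nested-loop join and hash join are standard join algorithms, but the text does not say which routines a given RDBMS provides, so treat both functions and their names as assumed examples. Only a few sample rows are used.

```python
from collections import defaultdict

# A few rows from the sample data.
WORKS_ON = [  # (ESSN, PNO, HOURS)
    ("123456789", 1, 32.5),
    ("123456789", 2, 7.5),
    ("666884444", 3, 40.0),
    ("453453453", 1, 20.0),
]
PROJECT = [  # (PNAME, PNUMBER)
    ("ProductX", 1),
    ("ProductY", 2),
    ("ProductZ", 3),
]

def nested_loop_join(left, right, left_key, right_key):
    """Compare every left row with every right row."""
    return [l + r for l in left for r in right
            if l[left_key] == r[right_key]]

def hash_join(left, right, left_key, right_key):
    """Build a hash table on the right input, then probe it per left row."""
    buckets = defaultdict(list)
    for r in right:
        buckets[r[right_key]].append(r)
    return [l + r for l in left for r in buckets.get(l[left_key], [])]

# Join condition: WORKS_ON.PNO = PROJECT.PNUMBER
by_loops = nested_loop_join(WORKS_ON, PROJECT, 1, 1)
by_hash = hash_join(WORKS_ON, PROJECT, 1, 1)
print(by_loops == by_hash)  # True
```

Both routines return the same rows; they differ only in how much work they do, which is what the optimizer has to estimate.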
As with heuristic optimization, there can be many different plans that lead to
the same result.
In general, if a query contains n operations, there can be up to n! possible plans, one for
each ordering of the operations. However, not all plans will make sense. We should consider:
1. Perform all simple selections first
2. Perform joins next
3. Perform projection last
Overview of the Cost Based optimization process
1. Enumerate all of the legitimate plans (call these P1...Pn), where each plan
contains a set of operations O1...Ok.
2. Select a plan.
3. For each operation Oi in the plan, enumerate the access routines.
4. For each possible access routine for Oi, estimate the cost.
Select the access routine with the lowest cost.
5. Repeat the previous 2 steps until an efficient access routine has been
selected for each operation.
Sum up the costs of each access routine to determine a total cost for the
plan.
6. Repeat steps 2 through 5 for each plan and choose the plan with the
lowest total cost.
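The steps above can be sketched in a few lines. Everything concrete here is hypothetical: the plan names, the access routine names, and the cost figures are made-up inputs, and a real optimizer would derive its estimates from statistics about the data.

```python
# Step 1: the enumerated plans. Each plan is a list of operations, and each
# operation maps its candidate access routines to an estimated cost.
PLANS = {
    "plan_1": [
        {"linear scan": 100, "index lookup": 12},
        {"nested-loop join": 400, "hash join": 90},
    ],
    "plan_2": [
        {"nested-loop join": 900, "hash join": 250},
        {"linear scan": 40},
    ],
}

def cheapest_routine(routines):
    """Steps 3-4: pick the access routine with the lowest estimated cost."""
    name = min(routines, key=routines.get)
    return name, routines[name]

def plan_cost(operations):
    """Step 5: choose a routine for every operation and sum their costs."""
    choices = [cheapest_routine(routines) for routines in operations]
    total = sum(cost for _, cost in choices)
    return total, [name for name, _ in choices]

def choose_plan(plans):
    """Steps 2 and 6: cost every plan and keep the cheapest one."""
    costed = {name: plan_cost(ops) for name, ops in plans.items()}
    best = min(costed, key=lambda name: costed[name][0])
    return best, costed[best]

best_plan, (best_cost, chosen_routines) = choose_plan(PLANS)
print(best_plan, best_cost, chosen_routines)
# plan_1 102 ['index lookup', 'hash join']
```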
Example outline: Assume 4 operations (one projection, two selections and a
join): P1, S1, S2 and J1.
In general, perform the selections first, then the join, and finally the
projection.
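For this example the number of orderings is small enough to enumerate directly. The sketch below lists every ordering of the four operations and keeps only those that follow the selections, then joins, then projections rule. The ranking table and function name are assumptions introduced for illustration.

```python
from itertools import permutations

OPERATIONS = ["P1", "S1", "S2", "J1"]

# Heuristic order by operation type: selections, then joins, then projections.
RANK = {"S": 0, "J": 1, "P": 2}

def follows_heuristic(order):
    """True if operation types appear in non-decreasing rank order."""
    ranks = [RANK[op[0]] for op in order]
    return ranks == sorted(ranks)

all_orders = list(permutations(OPERATIONS))
sensible = [order for order in all_orders if follows_heuristic(order)]

print(len(all_orders))  # 24
print(sensible)
# [('S1', 'S2', 'J1', 'P1'), ('S2', 'S1', 'J1', 'P1')]
```

Of the 4! = 24 orderings, only two follow the rule; they differ only in which selection runs first.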