
Cost Based Optimization:

A cost-based query optimizer works as follows. First, it generates the possible query execution plans.
Next, it estimates the cost of each plan. Finally, it chooses the plan with the lowest estimated cost.
Because the decision is made using estimated cost values, the plan chosen may not actually be optimal.
The quality of the optimizer's decisions depends on the complexity and accuracy of the cost functions
used. Cost-based optimization can use techniques such as dynamic programming to decide on the best
plan. Its main drawback is that enumerating and costing every possible plan is very expensive, so most
optimizers do not search the plan space exhaustively. A cost estimation technique is needed so that a
cost can be assigned to each plan in the search space. Intuitively, this cost is an estimate of the
resources needed to execute the plan.
1. Generates all possible query execution plans and then calculates the cost of each.
2. Quality depends on the complexity and accuracy of the cost function.

SELECT p.pname, d.dname FROM Patients p, Doctors d
WHERE p.doctor = d.dname AND d.dgender = 'M'

Query Optimization
Query optimization is divided into two types: Heuristic (sometimes called Rule
based) and Systematic (Cost based).
Heuristic Query Optimization

Oracle calls this Rule Based optimization.


A query can be represented as a tree data structure. Operations are at the
interior nodes and data items (tables, columns) are at the leaves.
The query is evaluated in a depth-first pattern.
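As a rough illustration of depth-first evaluation, the sketch below builds a query tree for the Patients/Doctors query shown earlier and lists the nodes in the order they would be processed. The `Node` class, the `evaluation_order` helper and the node labels are illustrative names chosen here, and the tree shape (one cross product, then a selection, then a projection) is an assumed unoptimized form, not a plan produced by any real DBMS.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a query tree: an operation (interior) or a table (leaf)."""
    label: str
    children: list = field(default_factory=list)


def evaluation_order(node):
    """Return node labels in depth-first (post-order) sequence.

    Children are visited before their parent, so every operation
    is listed only after all of its inputs.
    """
    order = []
    for child in node.children:
        order.extend(evaluation_order(child))
    order.append(node.label)
    return order


# Assumed unoptimized tree: tables at the leaves, operations above them.
tree = Node("PROJECT pname, dname", [
    Node("SELECT p.doctor = d.dname AND d.dgender = 'M'", [
        Node("CROSS PRODUCT", [Node("Patients"), Node("Doctors")]),
    ]),
])

for step, label in enumerate(evaluation_order(tree), start=1):
    print(step, label)
```

The two tables are listed first, then the cross product, the selection and finally the projection at the root.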
Consider this query from Elmasri/Navathe:

SELECT PNUMBER, DNUM, LNAME
FROM PROJECT, DEPARTMENT, EMPLOYEE
WHERE DNUM = DNUMBER AND MGRSSN = SSN AND
      PLOCATION = 'Stafford';

Or, in relational algebra:

on the following schema:


EMPLOYEE TABLE:

FNAME    MI LNAME   SSN       BDATE     ADDRESS                  S SALARY SUPERSSN  DNO
-------- -- ------- --------- --------- ------------------------ - ------ --------- ---
JOHN     B  SMITH   123456789 09-JAN-55 731 FONDREN, HOUSTON, TX   30000  333445555 5
FRANKLIN T  WONG    333445555 08-DEC-45 638 VOSS,HOUSTON TX        40000  888665555 5
ALICIA   J  ZELAYA  999887777 19-JUL-58 3321 CASTLE, SPRING, TX    25000  987654321 4
JENNIFER S  WALLACE 987654321 20-JUN-31 291 BERRY, BELLAIRE, TX    43000  888665555 4
RAMESH   K  NARAYAN 666884444 15-SEP-52 975 FIRE OAK, HUMBLE, TX   38000  333445555 5
JOYCE    A  ENGLISH 453453453 31-JUL-62 5631 RICE, HOUSTON, TX     25000  333445555 5
AHMAD    V  JABBAR  987987987 29-MAR-59 980 DALLAS, HOUSTON, TX    25000  987654321 4
JAMES    E  BORG    888665555 10-NOV-27 450 STONE, HOUSTON, TX     55000            1

DEPARTMENT TABLE:

DNAME           DNUMBER MGRSSN    MGRSTARTD
--------------- ------- --------- ---------
HEADQUARTERS    1       888665555 19-JUN-71
ADMINISTRATION  4       987654321 01-JAN-85
RESEARCH        5       333445555 22-MAY-78

PROJECT TABLE:

PNAME            PNUMBER PLOCATION DNUM
---------------- ------- --------- ----
ProductX         1       Bellaire  5
ProductY         2       Sugarland 5
ProductZ         3       Houston   5
Computerization  10      Stafford  4
Reorganization   20      Houston   1
NewBenefits      30      Stafford  4

WORKS_ON TABLE:

ESSN      PNO HOURS
--------- --- -----
123456789 1   32.5
123456789 2   7.5
666884444 3   40.0
453453453 1   20.0
453453453 2   20.0
333445555 2   10.0
333445555 3   10.0
333445555 10  10.0
333445555 20  10.0
999887777 30  30.0
999887777 10  10.0
987987987 10  35.0
987987987 30  5.0
987654321 30  20.0
987654321 20  15.0
888665555 20  null

Which of the following query trees is more efficient?

The left hand tree is evaluated in steps as follows:

The right hand tree is evaluated in steps as follows:

Note the two cross product operations. These require lots of space and time
(nested loops) to build.
After the two cross products, we have a temporary table with 144 records (6
projects * 3 departments * 8 employees).
An overall rule for heuristic query optimization is to perform as many select
and project operations as possible before doing any joins.
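To make the size difference concrete, the sketch below evaluates the Stafford query twice over the sample rows: once with two cross products followed by the selection, and once with the selection on PLOCATION applied first. Only the columns the query needs are kept, and the function names `naive_plan` and `selection_first_plan` are illustrative, not part of any DBMS.

```python
from itertools import product

# (PNUMBER, PLOCATION, DNUM)
PROJECT = [(1, "Bellaire", 5), (2, "Sugarland", 5), (3, "Houston", 5),
           (10, "Stafford", 4), (20, "Houston", 1), (30, "Stafford", 4)]
# (DNUMBER, MGRSSN)
DEPARTMENT = [(1, "888665555"), (4, "987654321"), (5, "333445555")]
# (SSN, LNAME)
EMPLOYEE = [("123456789", "SMITH"), ("333445555", "WONG"),
            ("999887777", "ZELAYA"), ("987654321", "WALLACE"),
            ("666884444", "NARAYAN"), ("453453453", "ENGLISH"),
            ("987987987", "JABBAR"), ("888665555", "BORG")]


def naive_plan():
    """Two cross products first, then all three conditions at once."""
    temp = list(product(PROJECT, DEPARTMENT, EMPLOYEE))
    result = [(p[0], p[2], e[1]) for p, d, e in temp
              if p[2] == d[0] and d[1] == e[0] and p[1] == "Stafford"]
    return len(temp), result


def selection_first_plan():
    """Select Stafford projects first, then join one table at a time."""
    stafford = [p for p in PROJECT if p[1] == "Stafford"]
    with_dept = [(p, d) for p in stafford for d in DEPARTMENT if p[2] == d[0]]
    result = [(p[0], p[2], e[1]) for p, d in with_dept
              for e in EMPLOYEE if d[1] == e[0]]
    return len(stafford), len(with_dept), result


naive_size, naive_rows = naive_plan()
stafford_size, join_size, pushed_rows = selection_first_plan()
print(naive_size, naive_rows)                 # temporary table of 144 rows
print(stafford_size, join_size, pushed_rows)  # intermediates of 2 and 2 rows
```

Both versions return the same two rows (projects 10 and 30, department 4, manager WALLACE), but the largest intermediate result shrinks from 144 rows to 2.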
There are a number of transformation rules that can be used to transform a
query:
1. Cascading selections. A list of conjunctive conditions can be broken up
into separate individual conditions.
2. Commutativity of the selection operation.
3. Cascading projections. All but the last projection can be ignored.
4. Commuting selection and projection. If a selection condition only
involves attributes contained in a projection clause, the two can be
commuted.
5. Commutativity of Join and Cross Product.
6. Commuting selection with Join.
7. Commuting projection with Join.
8. Commutativity of set operations. Union and Intersection are
commutative.
9. Associativity of Union, Intersection, Join and Cross Product.
10. Commuting selection with set operations.
11. Commuting projection with set operations.
12. Logical transformation of selection conditions. For example, using
DeMorgan's law, etc.
13. Combine Selection and Cartesian product to form Joins.
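A few of these rules can be checked directly on the PROJECT rows. The sketch below is only an illustration: the `select` helper and the two predicates are names chosen here, and the second condition (DNUM = 4) is an assumed example that is not part of the original query. It shows rule 1 (a conjunction equals a cascade of selections), rule 2 (the cascade can be applied in either order) and rule 12 (DeMorgan's law).

```python
PROJECT = [
    {"PNAME": "ProductX", "PNUMBER": 1, "PLOCATION": "Bellaire", "DNUM": 5},
    {"PNAME": "ProductY", "PNUMBER": 2, "PLOCATION": "Sugarland", "DNUM": 5},
    {"PNAME": "ProductZ", "PNUMBER": 3, "PLOCATION": "Houston", "DNUM": 5},
    {"PNAME": "Computerization", "PNUMBER": 10, "PLOCATION": "Stafford", "DNUM": 4},
    {"PNAME": "Reorganization", "PNUMBER": 20, "PLOCATION": "Houston", "DNUM": 1},
    {"PNAME": "NewBenefits", "PNUMBER": 30, "PLOCATION": "Stafford", "DNUM": 4},
]


def select(predicate, rows):
    """Relational selection: keep the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def is_stafford(row):
    return row["PLOCATION"] == "Stafford"


def in_dept_4(row):
    return row["DNUM"] == 4


# Rule 1: one conjunctive selection equals a cascade of single selections.
conjunctive = select(lambda r: is_stafford(r) and in_dept_4(r), PROJECT)
cascade_ab = select(is_stafford, select(in_dept_4, PROJECT))
# Rule 2: the selections in the cascade can be applied in either order.
cascade_ba = select(in_dept_4, select(is_stafford, PROJECT))
# Rule 12: NOT (a OR b) is equivalent to (NOT a) AND (NOT b).
not_either = select(lambda r: not (is_stafford(r) or in_dept_4(r)), PROJECT)
neither = select(lambda r: not is_stafford(r) and not in_dept_4(r), PROJECT)

print([r["PNUMBER"] for r in conjunctive])
print(conjunctive == cascade_ab == cascade_ba)
print(not_either == neither)
```

Because each rewritten form returns exactly the same rows, an optimizer is free to pick whichever form is cheaper to evaluate.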
These transformations can be used in various combinations to optimize queries.
Some general steps follow:
1. Using rule 1, break up conjunctive selection conditions and chain them
together.
2. Using the commutativity rules, move the selection operations as far
down the tree as possible.
3. Using the associativity rules, rearrange the leaf nodes so that the most
restrictive selection conditions are executed first. For example, an
equality condition is likely more restrictive than an inequality condition
(range query).
4. Combine cartesian product operations with associated selection
conditions to form a single Join operation.
5. Using the commutativity of Projection rules, move the projection
operations down the tree to reduce the sizes of intermediate result sets.
6. Finally, identify subtrees that can be executed using a single efficient
access method.

Example of Heuristic Query Optimization


1. Original Query Tree

2. Use Rule 1 to Break up Cascading Selections

3. Commute Selection with Cross Product

4. Combine Cross Product and Selection to form Joins
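The four stages can be traced in a small tree-rewriting sketch. Everything here is an assumption made for illustration: the node classes, the `push_down` and `show` helpers, and the starting tree shape (two cross products over PROJECT, DEPARTMENT and EMPLOYEE, with only the columns EMPLOYEE needs for this query). The final projection is left out to keep the example short.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Relation:
    name: str
    attrs: frozenset


@dataclass(frozen=True)
class Cond:
    text: str
    attrs: frozenset  # attributes the condition refers to


@dataclass(frozen=True)
class Select:
    cond: Cond
    child: object


@dataclass(frozen=True)
class Product:
    left: object
    right: object


@dataclass(frozen=True)
class Join:
    cond: Cond
    left: object
    right: object


def attrs_of(node):
    """Attributes available in the output of a subtree."""
    if isinstance(node, Relation):
        return node.attrs
    if isinstance(node, Select):
        return attrs_of(node.child)
    return attrs_of(node.left) | attrs_of(node.right)


def push_down(cond, node):
    """Place one condition as low in the tree as its attributes allow."""
    if isinstance(node, Select):
        # Selections commute, so slide below an existing selection.
        return Select(node.cond, push_down(cond, node.child))
    if isinstance(node, (Product, Join)):
        if cond.attrs <= attrs_of(node.left):
            return replace(node, left=push_down(cond, node.left))
        if cond.attrs <= attrs_of(node.right):
            return replace(node, right=push_down(cond, node.right))
        if isinstance(node, Product):
            # Condition spans both inputs: product + selection becomes a join.
            return Join(cond, node.left, node.right)
    return Select(cond, node)  # a table leaf, or an existing join


def show(node):
    if isinstance(node, Relation):
        return node.name
    if isinstance(node, Select):
        return f"SELECT[{node.cond.text}]({show(node.child)})"
    if isinstance(node, Join):
        return f"({show(node.left)} JOIN[{node.cond.text}] {show(node.right)})"
    return f"({show(node.left)} X {show(node.right)})"


PROJECT = Relation("PROJECT", frozenset({"PNAME", "PNUMBER", "PLOCATION", "DNUM"}))
DEPARTMENT = Relation("DEPARTMENT", frozenset({"DNAME", "DNUMBER", "MGRSSN", "MGRSTARTD"}))
EMPLOYEE = Relation("EMPLOYEE", frozenset({"LNAME", "SSN"}))

# Stage 2: the conjunctive WHERE clause broken into separate conditions.
conditions = [
    Cond("DNUM=DNUMBER", frozenset({"DNUM", "DNUMBER"})),
    Cond("MGRSSN=SSN", frozenset({"MGRSSN", "SSN"})),
    Cond("PLOCATION='Stafford'", frozenset({"PLOCATION"})),
]

# Stage 1: assumed original tree with two cross products.
tree = Product(Product(PROJECT, DEPARTMENT), EMPLOYEE)
# Stages 3 and 4: push each condition down, forming joins where it spans both sides.
for cond in conditions:
    tree = push_down(cond, tree)
print(show(tree))
```

The printed tree applies the Stafford selection directly to PROJECT and replaces both cross products with joins.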

Systematic (Cost based) Query Optimization


Note: The following notes are based upon materials presented in the Connolly/Begg 3rd edition. The notations differ
between textbooks.

Looking only at the syntax of the query may not give the whole picture; we also
need to look at the data.
Several Cost components to consider:
1. Access cost to secondary storage (hard disk)
2. Storage Cost for intermediate result sets
3. Computation costs: CPU, memory transfers, etc. for performing in-memory operations.
4. Communications Costs to ship data around a network. e.g., in a
distributed or client/server database.
Of these, Access cost is the most crucial in a centralized DBMS. The more
work we can do with data in cache or in memory, the better.
Access Routines are algorithms that are used to access and aggregate data in a
database.
An RDBMS may have a collection of general purpose access routines that can
be combined to implement a query execution plan.
We are interested in access routines for selection, projection, join and set
operations such as union, intersection, set difference, cartesian product, etc.
As with heuristic optimization, there can be many different plans that lead to
the same result.
In general, if a query contains n operations, there can be up to n! possible orderings of those operations.
However, not all plans will make sense. We should consider:
Perform all simple selections first
Perform joins next
Perform projection last
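As a small check of the counting argument, the sketch below enumerates every ordering of four operations and keeps only those that follow the three guidelines. The operation names (one projection P1, two selections S1 and S2, one join J1) and the `is_legitimate` helper are illustrative choices made here.

```python
from itertools import permutations

OPERATIONS = ["P1", "S1", "S2", "J1"]
# Guideline order: selections first, joins next, projection last.
STAGE = {"S": 0, "J": 1, "P": 2}


def is_legitimate(plan):
    """True when the plan never runs a later-stage operation before an earlier one."""
    stages = [STAGE[op[0]] for op in plan]
    return stages == sorted(stages)


all_orderings = list(permutations(OPERATIONS))
legitimate = [plan for plan in all_orderings if is_legitimate(plan)]

print(len(all_orderings))  # 4! = 24 orderings in total
for plan in legitimate:
    print(" ".join(plan))
```

Of the 24 orderings, only two survive the guidelines, and they differ only in which selection runs first.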
Overview of the Cost Based optimization process
1. Enumerate all of the legitimate plans (call these P1...Pn) where each plan
contains a set of operations O1...Ok
2. Select a plan
3. For each operation Oi in the plan, enumerate the access routines
4. For each possible Access routine for Oi, estimate the cost
Select the access routine with the lowest cost
5. Repeat previous 2 steps until an efficient access routine has been
selected for each operation
Sum up the costs of each access routine to determine a total cost for the
plan
6. Repeat steps 2 through 5 for each plan and choose the plan with the
lowest total cost.
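The process above can be sketched as a pair of nested minimizations. This is a minimal illustration, not how any particular optimizer is implemented: the function names, the plan names P1 and P2, the operations O1 and O2, and every cost figure in `PLANS` are made-up placeholders.

```python
def cheapest_routine(routine_costs):
    """Steps 3-4: pick the access routine with the lowest estimated cost."""
    routine = min(routine_costs, key=routine_costs.get)
    return routine, routine_costs[routine]


def plan_cost(plan):
    """Step 5: choose a routine for every operation and sum the costs."""
    choices = {op: cheapest_routine(costs) for op, costs in plan.items()}
    total = sum(cost for _, cost in choices.values())
    return total, choices


def choose_plan(plans):
    """Step 6: cost every plan and return the cheapest one."""
    costed = {name: plan_cost(plan) for name, plan in plans.items()}
    best = min(costed, key=lambda name: costed[name][0])
    return best, costed


# Hypothetical estimates (in blocks): plan -> operation -> routine -> cost.
PLANS = {
    "P1": {
        "O1": {"linear search": 100, "indexed search": 12},
        "O2": {"nested loop join": 300, "indexed join": 80},
    },
    "P2": {
        "O1": {"linear search": 40, "indexed search": 15},
        "O2": {"nested loop join": 90, "indexed join": 95},
    },
}

best, costed = choose_plan(PLANS)
for name, (total, choices) in costed.items():
    print(name, total, choices)
print("best plan:", best)
```

Note that the cheapest routine for an operation can differ between plans, because the estimates depend on what the earlier operations in the plan have already filtered out.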
Example outline: Assume 4 operations (one projection, two selections and a
join): P1, S1, S2 and J1
In general, perform the selections first, and then the join and finally the
projection

1. Enumerate the plans.
Note there are two orderings of selection that are possible so the two
plans become:
Plan A: S1 S2 J1 P1
Plan B: S2 S1 J1 P1
2. Choose a plan (let us start with Plan A)
3. For each operation enumerate the access routines:
Operation S1 has possible access routines: Linear Search and binary
search
Operation S2 has possible access routines: Linear Search and indexed
search
Operation J1 has possible access routines: Nested Loop join and indexed
join
4. Choose the least cost access routine for each operation
Operation S1 least cost access routine is binary search at a cost of 10
blocks
Operation S2 least cost access routine is linear search at a cost of 20
blocks
Operation J1 least cost access routine is indexed join at a cost of 40
blocks
5. The sum of the costs for each access routine is: 10 + 20 + 40 = 70
Thus the total cost for Plan A will be: 70
6. In repeating steps 2 through 5 for Plan B, we come up with:
Operation S2 least cost access routine is indexed search at a cost of 20
blocks
Operation S1 least cost access routine is binary search at a cost of 5
blocks
Operation J1 least cost access routine is indexed join at a cost of 30
blocks
The sum of the costs for each access routine is: 20 + 5 + 30 = 55
Thus the total cost for Plan B will be: 55
Final result: Plan B would be the best plan - pass Plan B along to the query
code generator.
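The arithmetic in this example can be rechecked with a few lines. The block costs below are the ones quoted in the example; the projection P1 is given no cost there, so it is left out. The variable names are illustrative.

```python
# Least-cost access routine per operation, in blocks, as given in the example.
PLAN_COSTS = {
    "Plan A": {"S1": 10, "S2": 20, "J1": 40},
    "Plan B": {"S2": 20, "S1": 5, "J1": 30},
}

# Total cost of a plan is the sum of its per-operation costs.
totals = {plan: sum(costs.values()) for plan, costs in PLAN_COSTS.items()}
best_plan = min(totals, key=totals.get)

print(totals)     # {'Plan A': 70, 'Plan B': 55}
print(best_plan)  # Plan B
```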
