You are on page 1of 31

GLOBAL QUERY OPTIMIZATION

Query optimization process of producing a query execution plan which represents an execution strategy for the query. Input: Fragment query

Find the best (not necessarily optimal) global schedule


Minimize a cost function

Query Optimization Process

Search Space
Restrict by means of heuristics

Perform unary operations before binary operations Restrict the shape of the join tree linear versus bushy trees Linear tree is a tree such that at least one operand of each operator node is a base relation. Bushy tree both operands are intermediate relations

Search Strategy
The most popular search strategy used by query optimizers is dynamic programming

Deterministic
Start from base relations and build plans by adding one relation at each step Dynamic programming: breadth-first Greedy: depth-first

Randomized
Search for optimalities around a particular starting point Trade optimization time for execution time Better when > 5-6 relations

Distributed cost model


An optimizers cost model includes cost functions to predict the cost of operators, statistics and base data and formulas to evaluate the sizes of intermediate results. Cost Function The cost of a distributed execution strategy can be expressed with respect to either the total time or the response time Total time - The sum of all time components Response time - Elapsed time from the initiation to the completion of the query.

Total Time =

Tcpu * #insts + TI/Os * I/Os + Tmsg * #msgs + TTR * #bytes


Tcpu - Time of a CPU instruction
TI/Os - Time of a disk I/O

Tmsg - Fixed tune of initiating and receiving a message


TTR - Time it takes to transmit a data unit from one site to

another

Response_time =

Tcpu * seq_#insts + TI/Os * seq_#I/Os + Tmsg * seq_#msgs + TTR * seq_#bytes Seq.#x x can be instructions(insts), I/O, messages or bytes

Database Statistics
The primary factor affecting the performance of an execution strategy is the size of the intermediate relations that are produced during the execution. For each relation R defined over the attributes A= {A1, A2,.., An} fragmented as R1, , Rr the statistical data length of each attribute: length(Ai) the number of distinct values for each attribute in each fragment: card(Ai (Rj)) maximum and minimum values in the domain of each attribute: min(Ai), max(Ai) the cardinality of each domain: card(dom[Ai]) the cardinality of each fragment: card(Rj)

Selectivity factor of each operation for relations The join selectivity factor denoted SFj of relations R and S is a real value between 0 and 1

For eg., SFj of 0.5 corresponds to a very large joined

relation 0.001 corresponds to a small one.


These statistics are useful to predict the size of intermediate relations

The size of an Intermediate Relation R as follows :

size(R) = card(R) * length(R)


Length(R) the length of a tuple of R computed from the length of its attributes Cardinalities of Intermediate results Selection :The cardinality of selection is

Centralized Query Optimization


INGRES dynamic System R static exhaustive search

INGRES Algorithm
Decompose each multi-variable query into a sequence of mono-variable queries with a common variable Process each by a one variable query processor

Choose an initial execution plan (heuristics)


Order the rest by considering intermediate relation

sizes

INGRES AlgorithmDecomposition
Replace an n variable query q by a series of queries q1 q2 qn where qi uses the result of qi-1. Detachment Query q decomposed into q' q" where q' and q" have a common variable which is the result of q' Tuple substitution Replace the value of each tuple with actual values and simplify the query q(V1, V2, ... Vn) (q' (t1, V2, V3, ... , Vn), t1 R)

System R Algorithm
Simple (i.e., mono-relation) queries are executed according to the best access path Execute joins
2.1 Determine the possible ordering of joins 2.2 Determine the cost of each ordering 2.3 Choose the join ordering with minimal cost

You might also like