Optimizer
So let's start by examining the CBO optimizer modes and the Oracle
parameters that influence the CBO.
The CBO and optimizer modes. In Oracle9i Database there are four
optimizer modes, all determined by the value of the optimizer_mode
parameter: rule, choose, all_rows, and first_rows. The rule and choose modes
invoke the obsolete rule-based optimizer, so we will focus on the CBO
modes here.
Let's assume that this execution plan starts delivering results in .0001
seconds and requires 10,000 db_block_gets. But what if your goal is to
minimize computing resources? If this SQL is inside a batch program,
then it is not important to start returning rows quickly, and a different
execution plan would take fewer resources. In this example, a
parallel full-table scan followed by a back-end sort will require fewer
machine resources and less I/O, because blocks do not have to be
reread to deliver the rows in sorted order (see Figure 2). We expect
this plan to take longer to deliver its first row (no rows are returned
until the sort is complete), but total I/O falls sharply.
Let's assume that this execution plan delivers the result in 10 seconds
with 5,000 db_block_gets.
Oracle offers several optimizer modes that allow you to choose your
definition of the "best" execution plan:
• optimizer_mode=first_rows — This CBO mode will return rows as soon as
possible, even if the overall query runs longer or consumes more computing
resources than other plans. The first_rows mode usually favors an index scan
over a full-table scan because index access will return rows quickly. Because it
favors index scans, first_rows is most appropriate for OLTP systems where the end
user needs to see small result sets as quickly as possible.
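The optimizer mode can also be changed for a single session or a single
statement. As a sketch (the emp table and the deptno predicate are
assumptions for illustration, not from the original text):

-- Session-level override of the instance default:
alter session set optimizer_mode = first_rows;

-- Statement-level override, via a hint:
select /*+ first_rows */
   ename
from
   emp
where
   deptno = 10;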
The most important key to success with the CBO is to carefully define
and manage your statistics. In order for the CBO to make an intelligent
decision about the best execution plan for your SQL, it must
have information about the tables and indexes that participate in the
query. When the CBO knows the size of the tables and the distribution,
cardinality, and selectivity of column values, the CBO can make an
informed decision and almost always generates the best execution
plan.
Let's examine the following areas of CBO statistics and see how to
gather top-quality statistics for the CBO and how to create an
appropriate CBO environment for your database.
exec dbms_stats.gather_schema_stats( -
ownname => 'SCOTT', -
options => 'GATHER AUTO', -
estimate_percent => dbms_stats.auto_sample_size, -
method_opt => 'for all columns size repeat', -
degree => 34 -
)
There are several values for the OPTIONS parameter that we need to
know about:
• GATHER STALE — only reanalyzes tables with more than 10 percent modifications
(inserts, updates, and deletes)
• GATHER AUTO — reanalyzes objects that currently have no statistics and objects
with stale statistics. Using GATHER AUTO is like combining GATHER STALE and
GATHER EMPTY.
Note that both GATHER STALE and GATHER AUTO require monitoring. If you
issue the ALTER TABLE XXX MONITORING command, Oracle tracks changed
tables in the dba_tab_modifications view. Below we see that the exact
numbers of inserts, updates, and deletes are tracked since the last
statistics analysis:
SQL> desc dba_tab_modifications;
 Name                 Type
 -------------------  -------------
 TABLE_OWNER          VARCHAR2(30)
 TABLE_NAME           VARCHAR2(30)
 PARTITION_NAME       VARCHAR2(30)
 SUBPARTITION_NAME    VARCHAR2(30)
 INSERTS              NUMBER
 UPDATES              NUMBER
 DELETES              NUMBER
 TIMESTAMP            DATE
 TRUNCATED            VARCHAR2(3)
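Once monitoring is enabled, a sketch of a query against this view (the
SCOTT schema is an assumption for illustration) might look like this:

alter table emp monitoring;

select
   table_name,
   inserts,
   updates,
   deletes
from
   dba_tab_modifications
where
   table_owner = 'SCOTT';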
The most interesting of these options is GATHER STALE. Because
statistics become stale quickly in a busy OLTP database, remember
that the rule for GATHER STALE is more than a 10 percent row
change (based on num_rows at statistics collection time). Hence, in a
volatile database almost every table except read-only tables will be
reanalyzed, which makes the GATHER STALE option best suited to systems
that are largely read-only. For example, if only five percent of the
database tables get significant updates, then only five percent of
the tables will be reanalyzed with the GATHER STALE option.
Automating sample size with dbms_stats. The better the quality of the
statistics, the better the job that the CBO will do when determining
your execution plans. Unfortunately, a complete analysis of a
large database could take days, so most shops must sample their
database to get CBO statistics. The goal is to take a large enough
sample of the database to provide top-quality data for the CBO.
Now that we see how the dbms_stats option works, let's see how to
specify an adequate sample size for dbms_stats.
In earlier releases, the DBA had to guess what percentage of the
database provided the best sample size and sometimes underanalyzed
the schema. Starting with Oracle9i Database, the estimate_percent
argument is a great way to allow Oracle's dbms_stats to automatically
estimate the "best" percentage of a segment to sample when
gathering statistics:
estimate_percent => dbms_stats.auto_sample_size
After collecting automatic sample sizes, you can verify the accuracy of
the automatic statistics sampling by looking at the sample_size column
on any of these data dictionary views:
• DBA_ALL_TABLES
• DBA_INDEXES
• DBA_IND_PARTITIONS
• DBA_IND_SUBPARTITIONS
• DBA_OBJECT_TABLES
• DBA_PART_COL_STATISTICS
• DBA_SUBPART_COL_STATISTICS
• DBA_TABLES
• DBA_TAB_COLS
• DBA_TAB_COLUMNS
• DBA_TAB_COL_STATISTICS
• DBA_TAB_PARTITIONS
• DBA_TAB_SUBPARTITIONS
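For example, a quick check of the sampled row counts for one schema
(the SCOTT owner is an assumption for illustration) might look like:

select
   table_name,
   num_rows,
   sample_size
from
   dba_tables
where
   owner = 'SCOTT';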
The key to success with the CBO is stability, and ensuring that
stability involves several important infrastructure issues:
• Reanalyze statistics only when necessary. One of the most common mistakes
made by Oracle DBAs is to frequently re-analyze the schema. Remember, the sole
purpose of doing that is to change the execution plans for your SQL, and if it ain't
broke, don't fix it. If you are satisfied with your current SQL performance, re-
analyzing a schema could cause significant performance problems and undo the
tuning efforts of the development staff. In practice, very few shops are sufficiently
dynamic to require periodic schema re-analysis.
• Force developers to tune their SQL. Many developers falsely assume that their
sole goal is to write SQL statements that deliver the correct data from Oracle. In
reality, formulating the SQL is only half their job. Successful Oracle shops always
require that developers ensure that their SQL accesses the database in an optimal
fashion and require migration forms that include the execution plan for all new
SQL.
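One common way to capture the execution plan for such a migration form,
assuming Oracle9i Release 2 or later with the plan table installed (the
emp query is an assumption for illustration), is:

explain plan for
select ename from emp where deptno = 10;

select * from table(dbms_xplan.display);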
• Carefully manage CBO statistics. Successful Oracle shops carefully manage the
CBO statistics to ensure that the CBO works the same in their test and production
environments. A savvy DBA will collect high-quality statistics and migrate their
production statistics into their test environments. This approach ensures that all
SQL migrating into production has the same execution plan as it did in the test
database.
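As a sketch of this statistics migration (the schema and statistics
table names are assumptions for illustration), production statistics can
be exported into a transportable statistics table and imported into test:

-- In production: create a statistics table and export the schema statistics
exec dbms_stats.create_stat_table(ownname => 'SCOTT', stattab => 'PROD_STATS');
exec dbms_stats.export_schema_stats(ownname => 'SCOTT', stattab => 'PROD_STATS');

-- Copy the PROD_STATS table to the test database (e.g., with export/import), then:
exec dbms_stats.import_schema_stats(ownname => 'SCOTT', stattab => 'PROD_STATS');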
• Rarely change CBO parameters. The CBO parameters are very dangerous
because a single parameter change could adversely affect the performance of an
entire enterprise. Changes to critical CBO parameters such as optimizer_mode,
optimizer_index_cost_adj, and optimizer_index_caching should only be made
after careful system testing.
• Ensure static execution plans. Nobody likes surprises, and successful CBO shops
lock down SQL execution plans by carefully controlling CBO statistics, using
stored outlines (optimizer plan stability), or adding detailed hints to their SQL.
Let's take a closer look at these issues.
Re-analyze statistics only when necessary. It is very rare for the
fundamental nature of a schema to change; large tables remain large,
and index columns rarely change distribution, cardinality, and skew.
You should only consider periodically re-analyzing your total schema
statistics if your database matches these criteria:
• Highly volatile databases. In these rare cases, the sizes of tables and the
characteristics of index column data change radically. For example, if you have a
table that has 100 rows one week and 10,000 rows the next week, then you may
want to consider a periodic reanalysis of statistics.
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=1 Card=1 Bytes=64)
1 0 FILTER
2 1 TABLE ACCESS (FULL) OF 'BOOK' (Cost=1 Card=1 Bytes=64)
3 1 TABLE ACCESS (FULL) OF 'SALES' (Cost=1 Card=5 Bytes=25)
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=3 Card=100 Bytes=8200)
1 0 FILTER
2 1 FILTER
3 2 HASH JOIN (OUTER)
4 3 TABLE ACCESS (FULL) OF 'BOOK' (Cost=1 Card=20 Bytes=1280)
5 3 TABLE ACCESS (FULL) OF 'SALES' (Cost=1 Card=100 Bytes=1800)
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=1 Card=1 Bytes=59)
1 0 FILTER
2 1 TABLE ACCESS (FULL) OF 'BOOK' (Cost=1 Card=1 Bytes=59)
3 1 FILTER
4 3 NESTED LOOPS (Cost=6 Card=1 Bytes=82)
5 4 TABLE ACCESS (FULL) OF 'SALES' (Cost=1 Card=5 Bytes=90)
6 4 TABLE ACCESS (BY INDEX ROWID) OF 'BOOK' (Cost=1 Card=1)
7 6 INDEX (UNIQUE SCAN) OF 'PK_BOOK' (UNIQUE)
• Forcing all SQL that is migrating into production to have verification that the
SQL has been tuned.
• The DBA can control the overall behavior of the CBO with several Oracle
parameters, but they should only be changed under limited circumstances.
• The CBO relies on statistics to determine the optimal execution plan for SQL
statements, and statistics should always be collected with the dbms_stats
package.
The auto option is used when monitoring is implemented (alter table xxx
monitoring;) and creates histograms based upon data distribution and
the manner in which the column is accessed by the application (i.e.,
the workload on the column as determined by monitoring). Using
method_opt=>'for all columns size auto' is similar to using GATHER AUTO
in the options parameter of dbms_stats:
begin
dbms_stats.gather_schema_stats(
ownname => 'SCOTT',
estimate_percent => dbms_stats.auto_sample_size,
method_opt => 'for all columns size auto',
degree => 7
);
end;
/
For example, here we see a query using a column value to filter the
result set:
select
customer_name
from
customer
where
customer_state = 'Rhode Island';
In this example, the choice to use an index versus a full-table scan is
influenced by the proportion of customers in Rhode Island. If there is
a very small proportion of customers in Rhode Island and the values
are clustered on the data blocks, then an index scan might be the
fastest execution plan for this query.
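If the distribution of customer_state values is skewed, the CBO can only
see that skew when a histogram exists on the column. As a hedged sketch
(the schema name and bucket count are assumptions for illustration), a
histogram might be gathered with:

exec dbms_stats.gather_table_stats( -
ownname    => 'SCOTT', -
tabname    => 'CUSTOMER', -
method_opt => 'for columns customer_state size 254' -
)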
Many Oracle developers are perplexed when the optimizer chooses a
full-table scan when they are only retrieving a small number of rows,
not realizing that the optimizer is considering the clustering of the
column values within the table.
Oracle provides a column called clustering_factor in the dba_indexes view
that tells the optimizer how synchronized the table rows are with
the index. When the clustering factor is close to the number of
data blocks, the table rows are synchronized with the index.
The selectivity of a column value, the db_block_size, the avg_row_len, and
the cardinality all work together in helping the optimizer decide
whether to use an index or a full-table scan. If a
data column has high selectivity and a low clustering_factor, then an
index scan is usually the fastest execution method
(see Figure 4).
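To see how well each index is synchronized with its table, we can
compare clustering_factor with the table's block count; a sketch (the
SCOTT schema is an assumption for illustration):

select
   i.index_name,
   i.clustering_factor,
   t.blocks,
   t.num_rows
from
   dba_indexes i,
   dba_tables  t
where
   i.table_owner = t.owner
and
   i.table_name = t.table_name
and
   t.owner = 'SCOTT';

A clustering_factor close to blocks suggests well-ordered rows, while a
value approaching num_rows suggests the rows are scattered across the
table blocks.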
• CPU_COST — The optimizer can now estimate the number of machine cycles
required for an operation, and factors this cost into the execution plan calculation.
The CPU costs associated with servicing an Oracle query depends upon the
current server configuration (which Oracle cannot see). In Oracle Database 10g,
CPU costing is the default behavior because of the importance of considering the
CPU costs associated with each SQL execution phase—thus, the savvy Oracle
professional should enable CPU costing by gathering system statistics with
dbms_stats.gather_system_stats.
CPU costs are generally not important unless the entire Oracle instance is using
excessive CPU resources.
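As a sketch, system statistics might be gathered over a representative
workload window (the 60-minute interval is an assumption for
illustration):

exec dbms_stats.gather_system_stats( -
gathering_mode => 'interval', -
interval       => 60 -
)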
According to the Oracle documentation, the I/O and CPU costs are
evaluated as follows:

Cost = (#SRds * sreadtim + #MRds * mreadtim + #CPUCycles / cpuspeed) / sreadtim

where:

#SRds      - number of single-block reads
#MRds      - number of multiblock reads
#CPUCycles - number of CPU cycles
sreadtim   - single-block read time
mreadtim   - multiblock read time
cpuspeed   - CPU cycles per second
Let's take a quick look at how hints are used to alter optimizer
execution plans: An optimizer hint is an optimizer directive placed inside
comments in your SQL statement and used in those rare cases
where the optimizer makes an incorrect decision about the execution
plan. Because hints are inside comments, it is important to ensure that
the hint name is spelled correctly and that the hint is appropriate to
the query.
For example, the following hint is invalid because first_rows access and
parallel access are mutually exclusive. That's because parallel always
assumes a full-table scan and first_rows favors index access.
-- An invalid hint
select /*+ first_rows parallel(emp,8)*/
emp_name
from
emp
order by
ename;
-- A valid hint
select /*+ full(emp) parallel(emp,35) */
emp_name
from
emp
order by
ename;
Now that we have the general concept of hints, let's take a look at one
of the most important hints for optimizer tuning.
The ordered hint determines the driving table for the query execution
and also specifies the order that tables are joined together. The ordered
hint requests that the tables should be joined in the order that they are
specified in the from clause, with the first table in the from clause
specifying the driving table. Using the ordered hint can save a huge
amount of parse time and speed SQL execution because you are telling
the optimizer the best order to join the tables.
For example, the following query uses the ordered hint to join the
tables in their specified order in the from clause. In this example, we
further refine the execution plan by specifying that the emp to dept
join use a hash join and the sal to bonus join use a nested loop join:
select
/*+ ordered use_hash (emp, dept) use_nl (sal, bon) */
from
emp,
dept,
sal,
bon
where . . .
• Sub-optimal index access to a table — This problem occurs when the optimizer
cannot find an index or the most restrictive where clause in the SQL is not
matched with an index. When the optimizer cannot find an appropriate index to
access table rows, the optimizer will always invoke a full-table scan, reading
every row in the table. Hence, a large-table full-table scan might indicate a sub-
optimal SQL statement that can be tuned by adding an index that matches the
where clause of the query.
• Sub-optimal join methods — The optimizer has many join methods available,
including the merge join, nested loop join, hash join, and star join. To choose the
right join method, the optimizer must guess at the size of the intermediate result
sets from multi-way table joins. To make this guess, the optimizer has incomplete
information. Even if histograms are present, the optimizer cannot know for certain
the exact number of rows returned from a join. The most common remedy is to
use hints to change the join (use_nl, use_hash) or re-analyze the statistics on the
target tables.
Let's examine how the v$sql_plan view can help us locate SQL tuning
opportunities. When searching for tuning opportunities, we start by
interrogating the v$sql_plan view to find these large-table full-table
scans as shown in Listing 3.
Listing 3:
--*****************************************************
-- Object Access script report
--
-- (c) 2003 by Donald K. Burleson
-- This is freeware, Never to be sold
--*****************************************************
ttitle 'full table scans and counts| |The "K" indicates that the table is in the KEEP Pool (Oracle8).'
select
p.owner,
p.name,
t.num_rows,
ltrim(t.cache) ch,
decode(t.buffer_pool,'KEEP','Y','DEFAULT','N') K,
s.blocks blocks,
sum(a.executions) nbr_FTS
from
dba_tables t,
dba_segments s,
v$sqlarea a,
(select distinct
address,
object_owner owner,
object_name name
from
v$sql_plan
where
operation = 'TABLE ACCESS'
and
options = 'FULL') p
where
a.address = p.address
and
t.owner = s.owner
and
t.table_name = s.segment_name
and
t.table_name = p.name
and
t.owner = p.owner
and
t.owner not in ('SYS','SYSTEM')
group by
p.owner, p.name, t.num_rows, t.cache, t.buffer_pool, s.blocks
having
sum(a.executions) > 9
order by
sum(a.executions) desc;
Then, we extract the corresponding SQL and see if the full-table scan is
warranted or due to a missing index.
How can we locate small tables that are subject to full-table scans?
One method is to search the SQL that is currently in the library cache.
Oracle can then generate a report that lists all the full-table scans in
the database at that time. The script in Listing 3 examines the
execution plans from v$sql_plan and reports on the frequency of full-
table scans.
The report (see Listing 4) gives us the following insight:
• Tables & indexes for the KEEP pool—SQL speed may benefit from placing
small tables (and associated indexes) that have frequent full-table scans in the
KEEP pool. The report above shows full-table scans on both large and small tables.
Examining this report, we can quickly identify possible candidates for the KEEP
pool by selecting the tables with less than 50 blocks that have no "K" designation.
Assigning a table, partition, or index to the KEEP pool is easy, and objects can be
added or removed at will with alter commands:

alter table CUSTOMER storage (buffer_pool KEEP);
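Conversely, an object can be returned to the default buffer pool at any
time with the same syntax (the table name follows the earlier example):

alter table CUSTOMER storage (buffer_pool DEFAULT);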