Histogram
Histograms are feature in CBO( cost based optimizer ) and it helps to optimizer to determine how data are skewed(distributed) with in the column.Histogram is good to create for the column which are included in the WHERE clause where the column is highly skewed.Histogram helps to optimizer to decide whether to use an index or full-table scan or help the optimizer determine the fastest table join order.
Advantages:
1. Histograms are useful for Oracle optimizer to choose the right access method in a table.
2. It is also useful for optimizer to decide the correct table join order. When we join multiple tables, histogram helps to minimize the intermediate result set.
Since the smaller size of the intermediate result set will improve the performance.
Method_opt Parameter: This is the parameter which tells about creating histogram while collecting the statistics.
The default is FOR ALL COLUMNS SIZE AUTO in Oracle10g.
We have one table containing 3.6 million record, and one columns emp_status is highly skewed, it two distinct values (N,Y). we have bitmap index on emp_status.
1. Let generate the statistics without any histogram. without histogram oracle assume that data is evenly distributed and optimizer think that we have
1.8 million record for emp_status=y and 1.8 for emp_status=n
SQL> select count(*),emp_status from pardeep.emp 2 group by emp_status;
COUNT(*) E
---------- -
1 N
3000000 Y
SQL> execute DBMS_STATS.GATHER_TABLE_STATS(OWNNAME => 'pardeep', TABNAME => 'EMP',ESTIMATE_PERCENT =>10, METHOD_OPT => 'FOR ALL COLUMNS SIZE 1',CASCADE => TRUE);
PL/SQL procedure successfully completed.
SQL> select ename from pardeep.emp where emp_status='Y';
3000000 rows selected.
Examplain plain
Id Operation Name Rows Bytes Cost (%CPU) Time
--------------------------------------------------------------------------
0 SELECT STATEMENT 1832K 15M 5374 (5) 00:01:05
* 1 TABLE ACCESS FULL EMP 1832K 15M 5374 (5) 00:01:05
SQL> select ename from scott.emp where emp_status='N';
Examplain plain
Id Operation Name Rows Bytes Cost (%CPU) Time
--------------------------------------------------------------------------
0 SELECT STATEMENT 1832K 15M 5374 (5) 00:01:05
* 1 TABLE ACCESS FULL EMP 1832K 15M 5374 (5) 00:01:05
Conclusion : Optimizer is using full scan for the query which return 3000000 as well as it using full table scan for query which return only 1 record.
2. Let us generate the statistics with histogram and see what kind of execution path optimizer is using
FOR COLUMN SIZE 2 EMP_STATUS will create two bucket for column emp_status.
If we are not sure the distinct number of values in the column, then we can use AUTO option to collect histogram.
SQL> execute DBMS_STATS.GATHER_TABLE_STATS(OWNNAME => 'pardeep', TABNAME => 'EMP',ESTIMATE_PERCENT =>10, METHOD_OPT => 'FOR COLUMNS SIZE 2 EMP_STATUS',CASCADE => TRUE);
PL/SQL procedure successfully completed.
SQL> select ename from pardeep.emp where emp_status='Y';
3670016 rows selected.
Examplain plain
--------------------------------------------------------------------------
Id Operation Name Rows Bytes Cost (%CPU) Time
--------------------------------------------------------------------------
0 SELECT STATEMENT 3681K 31M 5375 (5) 00:01:05
* 1 TABLE ACCESS FULL EMP 3681K 31M 5375 (5) 00:01:05
SQL> select ename from pardeep.emp where emp_status='N';
Examplain plain
--------------------------------------------------------------------------
Id Operation Name Rows Bytes Cost (%CPU) Time
--------------------------------------------------------------------------
0 SELECT STATEMENT 1 9 1 (0) 00:00:01
1 TABLE ACCESS BY INDEX ROWID EMP 1 9 1 (0) 00:00:01
2 BITMAP CONVERSION TO ROWIDS
* 3 BITMAP INDEX SINGLE VALUE IDX_EMP
Conclusion : Optimizer is using full scan for the query which return 3000000 records. optimizer is using index scan for other query which retrun 1 record.
Data dictionary objects for Histogram:
- user_histograms
- user_part_histograms
- user_subpart_histograms
- user_tab_histograms
- user_tab_col_statistics
if you find this information useful, please comment.
No comments:
Post a Comment