When the CBO evaluates the cost of using an index it uses the following formula:
cost =
blevel +
ceiling(leaf_blocks * effective index selectivity) +
ceiling(clustering_factor * effective table selectivity)
The third line of the above formula multiplies the index clustering factor by the effective table selectivity. This means that the more the index clustering factor value is big the less is the chance to see this index used by the CBO.
But wait a moment: is this formula still valid for a unique index?
What about a unique index with an enormous clustering factor? Will its chance to be used by the CBO be altered by its enormous clustering factor?
I am going to answer:
The clustering factor makes sense only when you access more than one row from a table block.
Do you think that this answer is 100% correct? Not really.
The clustering factor is still considered by the CBO even when you access only one row from a table block but you do that via an index range scan.
Is it still not clear?
Let me then try to be more precise:
The clustering factor does not influence the CBO when
1. you access one row from a table block and
2. when the CBO knows that it will access this one row via a unique index scan.
This blog article aims to demonstrate the above points
This is the model (thanks Jonathan !)
SQL> create table t1
as
with generator as (
select --+ materialize
rownum id
from dual
connect by
level >= 10000
)
select
rownum id1,
trunc(dbms_random.value(1,10000)) n1,
lpad(rownum,10,'0') small_vc,
rpad('x',100) padding
from
generator v1,
generator v2
where
rownum <= 1000000;
SQL > create index ind_id1 on t1(id1);
Index created.
SQL> create index ind_id1_n1 on t1(id1,n1);
Index created.
SQL> exec dbms_stats.gather_table_stats(user,'t1', cascade =>; true);
PL/SQL procedure successfully completed.
SQL> select INDEX_NAME, uniqueness, LEAF_BLOCKS, NUM_ROWS
,clustering_factor
from USER_INDEXES
where table_name = 'T1';
INDEX_NAME UNIQUENES LEAF_BLOCKS NUM_ROWS CLUSTERING_FACTOR
------------------------------ --------- ----------- ---------- -----------------
IND_ID1_N1 NONUNIQUE 2784 1000000 17544
IND_ID1 NONUNIQUE 2226 1000000 17544
SQL> select *
from t1
where id1 = 42;
ID1 N1 SMALL_VC PADDING
---------- ---------- ---------- -----------------------------------------
42 491 0000000042 x
SQL> select * from table(dbms_xplan.display_cursor);
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------
SQL_ID 08gbtm7g2pns8, child number 0
-------------------------------------
select * from t1 where id1 = 42
Plan hash value: 3333317548
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 4 (100)| |
| 1 | TABLE ACCESS BY INDEX ROWID| T1 | 1 | 121 | 4 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IND_ID1 | 1 | | 3 (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("ID1"=42)
So far, the CBO chooses to range scan ind_id1 (id1) index to get only one row from table t1. What will happen if we alter the clustering factor of this index to a dramatic value?
SQL> exec DBMS_STATS.SET_INDEX_STATS(user,'ind_id1',clstfct => 1e7);
PL/SQL procedure successfully completed.
SQL> select INDEX_NAME, uniqueness, LEAF_BLOCKS, NUM_ROWS, clustering_factor
from USER_INDEXES
where table_name = 'T1';
INDEX_NAME UNIQUENES LEAF_BLOCKS NUM_ROWS CLUSTERING_FACTOR
------------------------------ --------- ----------- ---------- ------------
IND_ID1_N1 NONUNIQUE 2784 1000000 17544
IND_ID1 NONUNIQUE 2226 1000000 10000000
SQL> select * from t1 where id1 = 42;
ID1 N1 SMALL_VC PADDING
---------- ---------- ---------- ---------------------------------------------------------
42 4000 0000000042 x
SQL> select * from table(dbms_xplan.display_cursor);
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------
SQL_ID 60qd5vf8jadn9, child number 0
-------------------------------------
select * from t1 where id1 = 42
Plan hash value: 862204649
------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 4 (100)| |
| 1 | TABLE ACCESS BY INDEX ROWID| T1 | 1 | 121 | 4 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IND_ID1_N1 | 1 | | 3 (0)| 00:00:01 |
------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("ID1"=42)
We see that, even when accessing only one row, the CBO has decided to use the second index (ind_id1_n1) which possess a better clustering factor value. What will happen if we make the index ind_id1 unique?
SQL> drop index ind_id1;
Index dropped.
SQL> create unique index ind_id1 on t1(id1);
Index created.
SQL> exec DBMS_STATS.SET_INDEX_STATS(user,'ind_id1',clstfct => 1e7);
PL/SQL procedure successfully completed.
SQL> select * from t1 where id1 = 42;
ID1 N1 SMALL_VC PADDING
---------- ---------- ---------- --------------------------------------------
42 4000 0000000042 x
SQL> select * from table(dbms_xplan.display_cursor);
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------
SQL_ID 60qd5vf8jadn9, child number 0
-------------------------------------
select * from t1 where id1 = 42
Plan hash value: 2434187540
---------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 3 (100)| |
| 1 | TABLE ACCESS BY INDEX ROWID| T1 | 1 | 121 | 3 (0)| 00:00:01 |
|* 2 | INDEX UNIQUE SCAN | IND_ID1 | 1 | | 2 (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("ID1"=42)
Even when we set the clustering factor of the unique index ind_id1 to the dramatic value of 1e7, it is still prefered and used by the CBO.
Bottom line: when you know that your index will be created on a column which is unique from the business point of view you had better then to let the CBO knows this information by creating this index as a UNIQUE index. As such you will enhance the propability of using this index when selecting one row from a table whatever will be the value of the clustering factor of this unique index
Mohamed, you said “….when you know that your index will be created on a column which is unique from the business point of view you had better then let the CBO knows this information by creating this index as a UNIQUE index”.
If your business knows a column is going to be unique, why haven’t you already created a UNIQUE CONSTRAINT on it right from the very beginning, when the column was first added to the table?
Surely, that is a fundamental step in the build process of your system, and one you have missed out !
You should have recognised its uniqueness at the design stage, and added that fact to your logical model.
If you don’t have a unique constraint on your column, performance is the least of your worries. You now have data integrity issues because you believed that column to contain unique data across your system, but failed to tell the DB that fact !!!
Having data integrity issues is a very serious affair because you now cannot trust your data. You will believe it to be one thing (in this case, unique), whilst the DB thinks it is something else (non-unique). You have become out of synch with each other at a fundamental level.
Comment by Martin Rose — November 5, 2012 @ 2:00 pm |
Hi Martin,
You are absolutely right.
What I wanted to emphasize is that when an index is unique it has to be created as a unique index. Because having a unique index in place, we will not be disturbed by a change of its clustering factor. The CBO knows that our index is unique and hence will not consider its clustering factor in the cost evaluation of the index usage.
Of course that a UNIQUE CONSTRAINT has to be first created. I agree with you. But for those who want to police a unique constraint with a non unique index they need to be aware of the content of this blog. -:)
Best Regards
Comment by hourim — November 5, 2012 @ 2:10 pm |
Ah, you didn’t make that clear initially. You’re making a point where a unique constraint may be enforced with a non-unique index, such as with deferred constraints.
Comment by Martin Rose — November 5, 2012 @ 2:41 pm |
Hi. INDEX UNIQUE SCAN is a special optimizer operation which returns only single rowid and gives optimizer the idea as ” will access one table block ” (however one row can be distribute more than one blocks but as far as i know optimizer don`t aware that). But clustering factor(CF) indicate how your data “ORDERED” so when you walk over index blocks then how much times table blocks “switching” happened?. Due to cost formula for INDEX UNIQUE SCAN is not contain CF, so :
COST(INDEX UNIQUE SCAN) = BLEVEL OF THE INDEX + 1 (if cost model is IO).
Comment by Chinar Aliyev — November 5, 2012 @ 5:13 pm |
Chinar,
Thanks for the additional precisions you brought to this article. What I wanted to express from this blog article is that even when the CBO knows that it will access only one block via a single index rowid including an access to a table by this ROWID, if it will do this operation using an index RANGE scan, than the clustering factor of this index will be considered into the index cost. Whereas if the CBO will do the same one block access via a single index rowid using an index UNIQUE scan operation, than the clustering factor of this unique index will not be considered in the index cost as you’ve shown throw the above formula.
Best regards
Comment by hourim — November 6, 2012 @ 11:08 am |