Mohamed Houri’s Oracle Notes

November 2, 2012

Index unique scan and the Clustering Factor: does it matter?

Filed under: Oracle — hourim @ 10:58 am

When the CBO evaluates the cost of using an index it uses the following formula:

cost =
    blevel +
    ceiling(leaf_blocks * effective index selectivity) +
    ceiling(clustering_factor * effective table selectivity)

The third line of the above formula multiplies the index clustering factor by the effective table selectivity. This means that the more the index clustering factor value is big the less is the chance to see this index used by the CBO.

But wait a moment: is this formula still valid for a unique index?
What about a unique index with an enormous clustering factor? Will its chance to be used by the CBO be altered by its enormous clustering factor?

I am going to answer:

The clustering factor makes sense only when you access more than one row from a table block.

Do you think that this answer is 100% correct? Not really.

The clustering factor is still considered by the CBO even when you access only one row from a table block but you do that via an index range scan.

Is it still not clear?

Let me then try to be more precise:

The clustering factor does not influence the CBO when

1. you access one row from a table block and
2. when the CBO knows that it will access this one row via a unique index scan.

This blog article aims to demonstrate the above points

This is the model (thanks Jonathan !)

SQL> create table t1
    as
    with generator as (
        select  --+ materialize
            rownum id
        from dual
       connect by
            level >= 10000
    )
   select
       rownum          id1,
       trunc(dbms_random.value(1,10000)) n1,
       lpad(rownum,10,'0') small_vc,
       rpad('x',100)       padding
   from
       generator   v1,
       generator   v2
   where
       rownum  <= 1000000;

SQL > create index ind_id1 on t1(id1);
Index created.

SQL> create index ind_id1_n1 on t1(id1,n1);
Index created.

SQL> exec dbms_stats.gather_table_stats(user,'t1', cascade =>; true);
PL/SQL procedure successfully completed.

SQL> select INDEX_NAME, uniqueness, LEAF_BLOCKS, NUM_ROWS
           ,clustering_factor
      from USER_INDEXES
      where table_name = 'T1';

INDEX_NAME                     UNIQUENES LEAF_BLOCKS   NUM_ROWS CLUSTERING_FACTOR
------------------------------ --------- ----------- ----------  -----------------
IND_ID1_N1                     NONUNIQUE        2784    1000000              17544
IND_ID1                        NONUNIQUE        2226    1000000              17544

SQL> select *
    from t1
    where id1 = 42;
       ID1         N1 SMALL_VC   PADDING
---------- ---------- ---------- -----------------------------------------
        42        491 0000000042 x

SQL> select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------
SQL_ID  08gbtm7g2pns8, child number 0
-------------------------------------
select * from t1 where id1 = 42
Plan hash value: 3333317548
---------------------------------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |       |       |     4 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1      |     1 |   121 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_ID1 |     1 |       |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ID1"=42)

So far, the CBO chooses to range scan ind_id1 (id1) index to get only one row from table t1. What will happen if we alter the clustering factor of this index to a dramatic value?

SQL> exec DBMS_STATS.SET_INDEX_STATS(user,'ind_id1',clstfct => 1e7);

PL/SQL procedure successfully completed.

SQL> select INDEX_NAME, uniqueness, LEAF_BLOCKS, NUM_ROWS, clustering_factor
        from USER_INDEXES
        where table_name = 'T1';

INDEX_NAME                     UNIQUENES LEAF_BLOCKS   NUM_ROWS CLUSTERING_FACTOR
------------------------------ --------- ----------- ----------  ------------
IND_ID1_N1                     NONUNIQUE        2784    1000000           17544
IND_ID1                        NONUNIQUE        2226    1000000           10000000

SQL> select * from t1 where id1 = 42;
       ID1         N1 SMALL_VC   PADDING
---------- ---------- ---------- ---------------------------------------------------------
        42       4000 0000000042 x

SQL> select * from table(dbms_xplan.display_cursor);
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------
SQL_ID  60qd5vf8jadn9, child number 0
-------------------------------------
select * from t1 where id1 = 42
Plan hash value: 862204649
------------------------------------------------------------------------------------------
| Id  | Operation                   | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |            |       |       |     4 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1         |     1 |   121 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | IND_ID1_N1 |     1 |       |     3   (0)| 00:00:01 |
------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ID1"=42)

We see that, even when accessing only one row, the CBO has decided to use the second index (ind_id1_n1) which possess a better clustering factor value. What will happen if we make the index ind_id1 unique?

SQL> drop index ind_id1;
Index dropped.

SQL> create unique index ind_id1 on t1(id1);
Index created.

SQL> exec DBMS_STATS.SET_INDEX_STATS(user,'ind_id1',clstfct => 1e7);
PL/SQL procedure successfully completed.

SQL> select * from t1 where id1 = 42;
       ID1         N1 SMALL_VC   PADDING
---------- ---------- ---------- --------------------------------------------
       42       4000 0000000042 x

SQL> select * from table(dbms_xplan.display_cursor);
PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------
SQL_ID  60qd5vf8jadn9, child number 0
-------------------------------------
select * from t1 where id1 = 42
Plan hash value: 2434187540
---------------------------------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |       |       |     3 (100)|          |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1      |     1 |   121 |     3   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN         | IND_ID1 |     1 |       |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ID1"=42)

Even when we set the clustering factor of the unique index ind_id1 to the dramatic value of 1e7, it is still prefered and used by the CBO.

Bottom line: when you know that your index will be created on a column which is unique from the business point of view you had better then to let the CBO knows this information by creating this index as a UNIQUE index. As such you will enhance the propability of using this index when selecting one row from a table whatever will be the value of the clustering factor of this unique index

5 Comments »

  1. Mohamed, you said “….when you know that your index will be created on a column which is unique from the business point of view you had better then let the CBO knows this information by creating this index as a UNIQUE index”.

    If your business knows a column is going to be unique, why haven’t you already created a UNIQUE CONSTRAINT on it right from the very beginning, when the column was first added to the table?

    Surely, that is a fundamental step in the build process of your system, and one you have missed out !

    You should have recognised its uniqueness at the design stage, and added that fact to your logical model.

    If you don’t have a unique constraint on your column, performance is the least of your worries. You now have data integrity issues because you believed that column to contain unique data across your system, but failed to tell the DB that fact !!!

    Having data integrity issues is a very serious affair because you now cannot trust your data. You will believe it to be one thing (in this case, unique), whilst the DB thinks it is something else (non-unique). You have become out of synch with each other at a fundamental level.

    Comment by Martin Rose — November 5, 2012 @ 2:00 pm | Reply

  2. Hi Martin,

    You are absolutely right.

    What I wanted to emphasize is that when an index is unique it has to be created as a unique index. Because having a unique index in place, we will not be disturbed by a change of its clustering factor. The CBO knows that our index is unique and hence will not consider its clustering factor in the cost evaluation of the index usage.

    Of course that a UNIQUE CONSTRAINT has to be first created. I agree with you. But for those who want to police a unique constraint with a non unique index they need to be aware of the content of this blog. -:)

    Best Regards

    Comment by hourim — November 5, 2012 @ 2:10 pm | Reply

  3. Ah, you didn’t make that clear initially. You’re making a point where a unique constraint may be enforced with a non-unique index, such as with deferred constraints.

    Comment by Martin Rose — November 5, 2012 @ 2:41 pm | Reply

  4. Hi. INDEX UNIQUE SCAN is a special optimizer operation which returns only single rowid and gives optimizer the idea as ” will access one table block ” (however one row can be distribute more than one blocks but as far as i know optimizer don`t aware that). But clustering factor(CF) indicate how your data “ORDERED” so when you walk over index blocks then how much times table blocks “switching” happened?. Due to cost formula for INDEX UNIQUE SCAN is not contain CF, so :

    COST(INDEX UNIQUE SCAN) = BLEVEL OF THE INDEX + 1 (if cost model is IO).

    Comment by Chinar Aliyev — November 5, 2012 @ 5:13 pm | Reply

  5. Chinar,

    Thanks for the additional precisions you brought to this article. What I wanted to express from this blog article is that even when the CBO knows that it will access only one block via a single index rowid including an access to a table by this ROWID, if it will do this operation using an index RANGE scan, than the clustering factor of this index will be considered into the index cost. Whereas if the CBO will do the same one block access via a single index rowid using an index UNIQUE scan operation, than the clustering factor of this unique index will not be considered in the index cost as you’ve shown throw the above formula.

    Best regards

    Comment by hourim — November 6, 2012 @ 11:08 am | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Create a free website or blog at WordPress.com.

Tony's Oracle Tips

Tony Hasler's light hearted approach to learning about Oracle

Richard Foote's Oracle Blog

Focusing Specifically On Oracle Indexes, Database Administration and Some Great Music

Hatem Mahmoud Oracle's blog

Just another Oracle blog : Database topics and techniques

Mohamed Houri’s Oracle Notes

Qui se conçoit bien s’énonce clairement

Oracle Diagnostician

Performance troubleshooting as exact science

Raheel's Blog

Things I have learnt as Oracle DBA

Coskan's Approach to Oracle

What I learned about Oracle

So Many Oracle Manuals, So Little Time

“Books to the ceiling, Books to the sky, My pile of books is a mile high. How I love them! How I need them! I'll have a long beard by the time I read them”—Lobel, Arnold. Whiskers and Rhymes. William Morrow & Co, 1988.

EU Careers info

Your career in the European Union

Carlos Sierra's Tools and Tips

Tools and Tips for Oracle Performance and SQL Tuning

Oracle Scratchpad

Just another Oracle weblog

OraStory

Dominic Brooks on Oracle Performance, Tuning, Data Quality & Sensible Design ... (Now with added Sets Appeal)

%d bloggers like this: