Mohamed Houri’s Oracle Notes

March 2, 2014

Index design

Filed under: Index — hourim @ 8:35 am

Here it is a very brief discussion I have had with one of my colleagues about index design

Colleague: what kind of index would you suggest to cover the following query?

SELECT
     rowid
    ,a.*
FROM  message_out a
WHERE sms_status    in (700, 707)
AND  (scheduled_time is null
      OR scheduled_time   <= :1)
AND   provider_id     in (0,0)
ORDER BY
      priority_level desc,
      creation_time asc;

----------------------------------------------------------------------------
| Id | Operation         | Name        | Rows | Bytes |TempSpc| Cost (%CPU)|
----------------------------------------------------------------------------
| 0  | SELECT STATEMENT  |             | 5529 | 1376K |       | 4769 (1)   |
| 1  | SORT ORDER BY     |             | 5529 | 1376K | 1856K | 4769 (1)   |
|* 2 |  TABLE ACCESS FULL| MESSAGE_OUT | 5529 | 1376K |       | 4462 (1)   |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(("SMS_STATUS"=700 OR "SMS_STATUS"=707)
AND ("SCHEDULED_TIME" IS NULL OR "SCHEDULED_TIME"<=:1)
AND "PROVIDER_ID"=0)

Me: and what have you ended up with until now?

Colleague: here my suggested index and the related execution plan

CREATE INDEX virtual_index ON MESSAGE_OUT(sms_status,scheduled_time,provider_id) ;

---------------------------------------------------------------------------------------
| Id | Operation                   | Name          | Rows | Bytes |TempSpc| Cost (%CPU)|
---------------------------------------------------------------------------------------
| 0  | SELECT STATEMENT            |               | 5529 | 1376K |       | 446 (1)    |
| 1  | SORT ORDER BY               |               | 5529 | 1376K | 1856K | 446 (1)    |
| 2  | INLIST ITERATOR             |               |      |       |       |            |
|* 3 |  TABLE ACCESS BY INDEX ROWID| MESSAGE_OUT   | 5529 | 1376K |       | 140 (0)    |
|* 4 |   INDEX RANGE SCAN          | VIRTUAL_INDEX | 5529 |       |       | 6 (0)      |
---------------------------------------------------------------------------------------

Me: I would not have created the same index

Me: here it is the index I would have created (after several questions regarding the data distribution, the table data volume, the use of bind variables, etc…)

create index mho_ind on MESSAGE_OUT (status, provider_id, scheduled_time);

Me: and if sms_status contains repetitive values then I would have added a compress command to that index creation

Colleague: there is no difference in the execution plan either by using my index or your index

------------------------------------------------------------------------------------------
| Id | Operation                     | Name          | Rows | Bytes |TempSpc| Cost (%CPU)|
------------------------------------------------------------------------------------------
| 0  | SELECT STATEMENT              |               | 5529 | 1376K |       | 446 (1)    |
| 1  | SORT ORDER BY                 |               | 5529 | 1376K | 1856K | 446 (1)    |
| 2  |  INLIST ITERATOR              |               |      |       |       |            |
|* 3 |   TABLE ACCESS BY INDEX ROWID | MESSAGE_OUT   | 5529 | 1376K |       | 140 (0)    |
|* 4 |    INDEX RANGE SCAN           | VIRTUAL_INDEX | 5529 |       |       | 6 (0)      |
------------------------------------------------------------------------------------------

Me: no, it is not the same plan. Please always consider the predicate part

Me: what is the predicate part of the plan using your index

Colleague: this is my index predicate part


Predicate Information (identified by operation id):
 ---------------------------------------------------
 3 - filter("SCHEDULED_TIME" IS NULL OR "SCHEDULED_TIME"<=:1)
 4 - access(("SMS_STATUS"=700 OR "SMS_STATUS"=707) AND "PROVIDER_ID"=0)
     filter("PROVIDER_ID"=0) --> additional filter operation

Colleague: and this is your index predicate part

Predicate Information (identified by operation id):
---------------------------------------------------
 3 - filter("SCHEDULED_TIME" IS NULL OR "SCHEDULED_TIME"<=:1)
 4 - access(("SMS_STATUS"=700 OR "SMS_STATUS"=707) AND "PROVIDER_ID"=0)
--> no additional filter operation

Me: and did you pointed out the difference or not yet?

Colleague: no, same plan, same cost and same execution time

Me: there is a fundamental difference between your plan and mine. In your plan there is a double operation on your engineered index:  “ACCESS + FILTER” operation while my engineered index needs only one precise operation :  “ACCESS

Me: and when it comes to performance you always prefers a precise index ACCESS operation to that double ACCESS and FILTER operations.

Me: your engineered index has a second columns on which an inequality predicate is applied

SCHEDULED_TIME <= :1

You should always start your index by the columns on which an equality predicate is applied. In my case, I put the SCHEDULED_TIME column at the trailing edge of my index and doing as such I have avoided a costly filter operation on my index while your engineered index has been subject to that costly filter operation

If you want to test this behaviour then below is an example to play with. I hope you will enjoy it

SQL> create table t1
     (id number,
     n_1000 number,
     n_5000 number,
     n_10000 number,
     small_vc varchar2(20),
     padding varchar2(100)
     );

Table created.

SQL> insert into t1
  with generator as (
  select --+ materialize
  rownum id
  from dual
  connect by
  rownum <= 10000
  )
  select
    rownum id,
    mod(rownum,1000) n_1000,
    mod(rownum,5000) n_5000,
    mod(rownum,10000) n_10000,
    lpad(rownum,10,'0') small_vc,
    rpad('x',100) padding
  from
    generator v1,
    generator v2
  where
  rownum <= 100000
  ;

SQL> create index my_ind on t1(id, n_5000, n_1000);

Index created.

SQL> create index colleague_ind on t1(id, n_1000, n_5000);

Index created.

SQL> alter index my_ind invisible;

Index altered.

SQL> exec dbms_stats.gather_table_stats(user, 't1');

PL/SQL procedure successfully completed.

SQL> select
        a.*
  from t1 a
  where id in (112,120)
  and (n_1000 is null
  or n_1000 <= 3000)
  and n_5000 in (120);

Statistics
------------------------------------------------------
 65 recursive calls
 0 db block gets
 95 consistent gets ---> spot this
 0 physical reads
 0 redo size
 1005 bytes sent via SQL*Net to client
 543 bytes received via SQL*Net from client
 2 SQL*Net roundtrips to/from client
 6 sorts (memory)
 0 sorts (disk)
 1 rows processed

SQL_ID 7d6ag1m1ztpgr, child number 1
-------------------------------------
select a.* from t1 a where id in (112,120) and (n_1000 is null
or n_1000 <= 3000) and n_5000 in (120)

Plan hash value: 3644584748
-------------------------------------------------------------------------------------------
| Id  | Operation                            | Name          | Rows  | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |               |       |       |     4 (100)|
|   1 |  INLIST ITERATOR                     |               |       |       |            |
|*  2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1            |     2 |   258 |     4   (0)|
|*  3 |    INDEX RANGE SCAN                  | COLLEAGUE_IND |     2 |       |     3   (0)|
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
 2 - filter(("N_1000"<=3000 OR "N_1000" IS NULL))
 3 - access((("ID"=112 OR "ID"=120)) AND "N_5000"=120)
 filter("N_5000"=120) ---> spot this

SQL> alter index colleague_ind invisible;

Index altered.

SQL> alter index my_ind visible;

Index altered.

SQL> select
       a.*
  from t1 a
  where id in (112,120)
  and (n_1000 is null
  or n_1000 <= 3000)
  and n_5000 in (120);

Statistics
------------------------------------------------------
 33 recursive calls
 0 db block gets
 49 consistent gets --> spot the reduction
 0 physical reads
 0 redo size
 1005 bytes sent via SQL*Net to client
 543 bytes received via SQL*Net from client
 2 SQL*Net roundtrips to/from client
 6 sorts (memory)
 0 sorts (disk)
 1 rows processed

SQL_ID 7d6ag1m1ztpgr, child number 1
-------------------------------------
select a.* from t1 a where id in (112,120) and (n_1000 is null
or n_1000 <= 3000) and n_5000 in (120)

Plan hash value: 4286547933</pre>
------------------------------------------------------------------------------------
| Id  | Operation                            | Name   | Rows  | Bytes | Cost (%CPU)|
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |        |       |       |     4 (100)|
|   1 |  INLIST ITERATOR                     |        |       |       |            |
|*  2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1     |     2 |   258 |     4   (0)|
|*  3 |    INDEX RANGE SCAN                  | MY_IND |     2 |       |     3   (0)|
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
 2 - filter(("N_1000"<=3000 OR "N_1000" IS NULL))
 3 - access((("ID"=112 OR "ID"=120)) AND "N_5000"=120)
     --> spot the absence of filter on index
About these ads

9 Comments »

  1. Mohamed,

    There’s an anomaly with both execution plans that might be worth further investigation – all three predicate columns are in the index, so why is one of the predicate tests postponed until after the table visit ?
    The same effect appears in 11.2.0.4; the anomaly doesn’t appear if you change the n_1000 predicate to nvl(n_1000,3000) <= 3000

    Comment by Jonathan Lewis — March 2, 2014 @ 9:28 am | Reply

    • Jonathan,

      Yes, you are right. When using nvl(n_1000,3000) <= 3000, the filter is done only on the index.

      Looking to the corresponding 10053 trace files I have emphasized the following sections:

      Orginal case:

      id=0 frofkks[i] (index start key) predicate="A"."ID"=112 OR "A"."ID"=120
      id=0 frofkks[i] (index start key) predicate="A"."N_5000"=120
      id=0 frofkke[i] (index stop key) predicate="A"."ID"=112 OR "A"."ID"=120
      id=0 frofkke[i] (index stop key) predicate="A"."N_5000"=120
      id=0 frofand predicate="A"."N_1000"<=3000 OR "A"."N_1000" IS NULL
      

      Where I guess that the frofand is related to the table filter operation

      Case with modified predicate as nvl(n_1000,3000) <= 3000:

      id=0 frofkks[i] (index start key) predicate="A"."ID"=112 OR "A"."ID"=120
      id=0 frofkks[i] (index start key) predicate="A"."N_5000"=120
      id=0 frofkke[i] (index stop key) predicate="A"."ID"=112 OR "A"."ID"=120
      id=0 frofkke[i] (index stop key) predicate="A"."N_5000"=120
      id=0 froiand (index only filter) predicate=NVL("A"."N_1000",3000)<=3000
      

      Where the filter is this time applied on the index only

      But the question is why?

      I think that this is due to the fact that null are not indexed and the CBO needs to go to the table in order to filter n_1000 null values. And when we’ve changed the predicate it is as if we have allowed the CBO to use the index in order to check n_1000 null values because we have “replaced” those null n_1000 values by 3000. This is the unique explanation I came up. This explanation might be comforted by the following example where I have set the n_1000 column to be not null

      SQL> alter table t1 modify n_1000 not null;
      
      Table altered.
      
      SQL> select
        2          a.*
        3    from t1 a
        4    where id in (112,120)
        5    and (n_1000 is null
        6    or n_1000 <= 3000)
        7    and n_5000 in (120);
      
      SQL_ID  gtqawgpwprz7n, child number 0
      -------------------------------------
      select         a.*   from t1 a   where id in (112,120)   and (n_1000 is
      null   or n_1000 <= 3000)   and n_5000 in (120)
      
      Plan hash value: 4286547933
      
      -----------------------------------------------------------------------------------------------
      | Id  | Operation                            | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
      -----------------------------------------------------------------------------------------------
      |   0 | SELECT STATEMENT                     |        |       |       |     4 (100)|          |
      |   1 |  INLIST ITERATOR                     |        |       |       |            |          |
      |   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1     |     2 |   258 |     4   (0)| 00:00:01 |
      |*  3 |    INDEX RANGE SCAN                  | MY_IND |     2 |       |     3   (0)| 00:00:01 |
      -----------------------------------------------------------------------------------------------
      
      Predicate Information (identified by operation id):
      ---------------------------------------------------
         3 - access((("ID"=112 OR "ID"=120)) AND "N_5000"=120 AND "N_1000"<=3000) 
      

      Thanks a lot for giving me this opportunity to investigate. I am hungry of such cases to look at.

      Best regards

      Comment by hourim — March 2, 2014 @ 4:42 pm | Reply

      • Mohamed,

        If you change the predicate to just: and n_1000 is null you’ll get the filter applied in the index. There’s no reason why Oracle should have trouble finding the NULL in the index just because it’s in an OR.

        You might also try declaring ID as not null with the OR predicate in case there’s some feature of the optimizer code that decides (on splitting the OR) that the NULL part actually can’t be resolved safely in the index because it believes (incorrectly) that there could be such entries that don’t appear in the index. What is you select only the three columns in the index rather than * ? Does this become an index-only query, or do you still see the OR predicate applied at the table ?

        Comment by Jonathan Lewis — March 2, 2014 @ 5:04 pm

      • Jonathan

        Definitely it needs a deep investigation

        Selecting only indexed columns: there is no access to the table

        SQL> select
                a.id
               ,a.n_5000
               ,a.n_1000
            from t1 a
            where id in (112,120)
            and (n_1000 is null
                 or n_1000 <= 3000)
            and n_5000 in (120);
        
        SQL_ID  dc4sxrk5sac9q, child number 0
        -------------------------------------
        select     a.id    ,a.n_5000    ,a.n_1000 from t1 a where id in
        (112,120) and (n_1000 is null      or n_1000 <= 3000) and n_5000 in
        (120)
        
        Plan hash value: 3247889462
        --------------------------------------------------------------------------------------
        | Id  | Operation         | Name   | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
        --------------------------------------------------------------------------------------
        |   0 | SELECT STATEMENT  |        |      1 |        |      1 |00:00:00.01 |       5 |
        |   1 |  INLIST ITERATOR  |        |      1 |        |      1 |00:00:00.01 |       5 |
        |*  2 |   INDEX RANGE SCAN| MY_IND |      2 |      2 |      1 |00:00:00.01 |       5 |
        --------------------------------------------------------------------------------------
        
        Predicate Information (identified by operation id):
        ---------------------------------------------------
           2 - access((("ID"=112 OR "ID"=120)) AND "N_5000"=120)
               filter(("N_1000"<=3000 OR "N_1000" IS NULL))
        

        Declaring only id column as not null: there is no filter on the table

        SQL> alter table t1 modify id not null;
        
        Table altered.
        
        SQL> select
                a.*
            from t1 a
            where id in (112,120)
            and (n_1000 is null
                 or n_1000 <= 3000)
            and n_5000 in (120);
        
        SQL_ID  7d6ag1m1ztpgr, child number 1
        -------------------------------------
        select     a.* from t1 a where id in (112,120) and (n_1000 is null
        or n_1000 <= 3000) and n_5000 in (120)
        
        Plan hash value: 4286547933
        --------------------------------------------------------------------------------------------
        | Id  | Operation                            | Name   | Starts | E-Rows | A-Rows | Buffers |
        --------------------------------------------------------------------------------------------
        |   0 | SELECT STATEMENT                     |        |      1 |        |      1 |       6 |
        |   1 |  INLIST ITERATOR                     |        |      1 |        |      1 |       6 |
        |   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1     |      2 |      2 |      1 |       6 |
        |*  3 |    INDEX RANGE SCAN                  | MY_IND |      2 |      2 |      1 |       5 |
        --------------------------------------------------------------------------------------------
        
        Predicate Information (identified by operation id):
        ---------------------------------------------------
           3 - access((("ID"=112 OR "ID"=120)) AND "N_5000"=120)
               filter(("N_1000"<=3000 OR "N_1000" IS NULL))
        

        Best regards

        Comment by hourim — March 2, 2014 @ 6:36 pm

  2. Mohamed,
    Both plans have the same PHV ans PHV2 ?

    Comment by Laroussi — March 2, 2014 @ 3:00 pm | Reply

    • Laroussi

      They don’t have the same plan hash value; you can see this in the plans posted above. Having different indexes (and indexes names) there is a little chance for them to share the same phv2.

      And here below how to check (sql_id should be in the memory)

      Using my_ind index:

      SQL> SELECT
        2         p.sql_id
        3        ,p.plan_hash_value
        4        ,t.phv2
        5  FROM   v$sql_plan p
        6        ,xmltable('for $i in /other_xml/info
        7                   where $i/@type eq "plan_hash_2"
        8                   return $i'
        9                   passing xmltype(p.other_xml)
       10                   columns phv2 number path '/') t
       11      WHERE p.sql_id = '7d6ag1m1ztpgr'
       12      and   p.other_xml is not null
       13      ;
      
      SQL_ID        PLAN_HASH_VALUE       PHV2
      ------------- --------------- ----------
      7d6ag1m1ztpgr      4286547933 3312611661
      

      Using colleague_ind index:

      SQL> SELECT
        2         p.sql_id
        3        ,p.plan_hash_value
        4        ,t.phv2
        5  FROM   v$sql_plan p
        6        ,xmltable('for $i in /other_xml/info
        7                   where $i/@type eq "plan_hash_2"
        8                   return $i'
        9                   passing xmltype(p.other_xml)
       10                   columns phv2 number path '/') t
       11      WHERE p.sql_id = '7d6ag1m1ztpgr'
       12      and   p.other_xml is not null
       13      ;
      
      SQL_ID        PLAN_HASH_VALUE       PHV2
      ------------- --------------- ----------
      7d6ag1m1ztpgr      3644584748 2343153009 
      

      Comment by hourim — March 2, 2014 @ 5:08 pm | Reply

  3. I think, it’s not so easy, because there are too many very different cases.
    Of course, it’s great if all your queries has predicates like that, but it’s quite often when you have also big number other queries without predicates by one of the columns. And if it would be provider_id, you will get many “index skip scans”.
    Or if you have queries with big number inlist values like that:
    scheduled_time between :1 and :2
    AND (sms_status,provider_id) in (
    (700,:v1),(700,v2), ..(700.v1000)
    , (707,:v1),(707,v2), ..(707.v1000))

    Therefore, in such cases, I always ask about the other queries and their frequency distribution.
    Also, it’s very important to check the amount of load, to be sure that new index will not cause index contentions – may be, would be better to use global partitioned index.

    Best regards,
    Sayan Malakshinov

    Comment by Sayan Malakshinov — March 3, 2014 @ 7:43 pm | Reply

  4. Sayan,

    As it is very often the case in Oracle (and in many other domains) see how many questions can be raised from a simple discussion about the best index to cover a single query. And in the path to get the best possible index, if we observe attentively to what we came up with (as Jonathan Lewis did on the unnecessary table visit) we might end up opening a new question which is completely different from the one that originated the initial discussion.

    Best regards

    Comment by hourim — March 4, 2014 @ 11:38 am | Reply

  5. […] the importance of the leading index column that should be the ones on which an equality predicate is […]

    Pingback by Index design | Mohamed Houri’s Oracle Notes — September 3, 2014 @ 2:56 pm | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Theme: Rubric. Get a free blog at WordPress.com

Mohamed Houri’s Oracle Notes

Qui se conçoit bien s’énonce clairement

Oracle Database 11g

Oracle Database

Raheel's Blog

Things I have learnt as Oracle DBA

Coskan's Approach to Oracle

What I learned about Oracle

So Many Oracle Manuals, So Little Time

“Books to the ceiling, Books to the sky, My pile of books is a mile high. How I love them! How I need them! I'll have a long beard by the time I read them”—Lobel, Arnold. Whiskers and Rhymes. William Morrow & Co, 1988.

EU Careers info

Your career in the European Union

Carlos Sierra's Tools and Tips

Tools and Tips for Oracle Performance and SQL Tuning

Oracle Scratchpad

Just another Oracle weblog

Tanel Poder's blog: Responsible data management

Linux, Oracle, Exadata and Hadoop.

OraStory

Dominic Brooks on Oracle Performance, Tuning, Data Quality & Sensible Design ... (Now with added Sets Appeal)

Follow

Get every new post delivered to your Inbox.

Join 101 other followers

%d bloggers like this: