Index design

March 2, 2014

Index design

Filed under: Index — hourim @ 8:35 am

Here it is a very brief discussion I have had with one of my colleagues about index design

Colleague: what kind of index would you suggest to cover the following query?

SELECT
     rowid
    ,a.*
FROM  message_out a
WHERE sms_status    in (700, 707)
AND  (scheduled_time is null
      OR scheduled_time   <= :1)
AND   provider_id     in (0,0)
ORDER BY
      priority_level desc,
      creation_time asc;

----------------------------------------------------------------------------
| Id | Operation         | Name        | Rows | Bytes |TempSpc| Cost (%CPU)|
----------------------------------------------------------------------------
| 0  | SELECT STATEMENT  |             | 5529 | 1376K |       | 4769 (1)   |
| 1  | SORT ORDER BY     |             | 5529 | 1376K | 1856K | 4769 (1)   |
|* 2 |  TABLE ACCESS FULL| MESSAGE_OUT | 5529 | 1376K |       | 4462 (1)   |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(("SMS_STATUS"=700 OR "SMS_STATUS"=707)
AND ("SCHEDULED_TIME" IS NULL OR "SCHEDULED_TIME"<=:1)
AND "PROVIDER_ID"=0)

Me: and what have you ended up with until now?

Colleague: here my suggested index and the related execution plan

CREATE INDEX virtual_index ON MESSAGE_OUT(sms_status,scheduled_time,provider_id) ;

---------------------------------------------------------------------------------------
| Id | Operation                   | Name          | Rows | Bytes |TempSpc| Cost (%CPU)|
---------------------------------------------------------------------------------------
| 0  | SELECT STATEMENT            |               | 5529 | 1376K |       | 446 (1)    |
| 1  | SORT ORDER BY               |               | 5529 | 1376K | 1856K | 446 (1)    |
| 2  | INLIST ITERATOR             |               |      |       |       |            |
|* 3 |  TABLE ACCESS BY INDEX ROWID| MESSAGE_OUT   | 5529 | 1376K |       | 140 (0)    |
|* 4 |   INDEX RANGE SCAN          | VIRTUAL_INDEX | 5529 |       |       | 6 (0)      |
---------------------------------------------------------------------------------------

Me: I would not have created the same index

Me: here it is the index I would have created (after several questions regarding the data distribution, the table data volume, the use of bind variables, etc…)

create index mho_ind on MESSAGE_OUT (status, provider_id, scheduled_time);

Me: and if sms_status contains repetitive values then I would have added a compress command to that index creation

Colleague: there is no difference in the execution plan either by using my index or your index

------------------------------------------------------------------------------------------
| Id | Operation                     | Name          | Rows | Bytes |TempSpc| Cost (%CPU)|
------------------------------------------------------------------------------------------
| 0  | SELECT STATEMENT              |               | 5529 | 1376K |       | 446 (1)    |
| 1  | SORT ORDER BY                 |               | 5529 | 1376K | 1856K | 446 (1)    |
| 2  |  INLIST ITERATOR              |               |      |       |       |            |
|* 3 |   TABLE ACCESS BY INDEX ROWID | MESSAGE_OUT   | 5529 | 1376K |       | 140 (0)    |
|* 4 |    INDEX RANGE SCAN           | VIRTUAL_INDEX | 5529 |       |       | 6 (0)      |
------------------------------------------------------------------------------------------

Me: no, it is not the same plan. Please always consider the predicate part

Me: what is the predicate part of the plan using your index

Colleague: this is my index predicate part


Predicate Information (identified by operation id):
 ---------------------------------------------------
 3 - filter("SCHEDULED_TIME" IS NULL OR "SCHEDULED_TIME"<=:1)
 4 - access(("SMS_STATUS"=700 OR "SMS_STATUS"=707) AND "PROVIDER_ID"=0)
     filter("PROVIDER_ID"=0) --> additional filter operation

Colleague: and this is your index predicate part

Predicate Information (identified by operation id):
---------------------------------------------------
 3 - filter("SCHEDULED_TIME" IS NULL OR "SCHEDULED_TIME"<=:1)
 4 - access(("SMS_STATUS"=700 OR "SMS_STATUS"=707) AND "PROVIDER_ID"=0)
--> no additional filter operation

Me: and did you pointed out the difference or not yet?

Colleague: no, same plan, same cost and same execution time

Me: there is a fundamental difference between your plan and mine. In your plan there is a double operation on your engineered index: “ACCESS + FILTER” operation while my engineered index needs only one precise operation : “ACCESS”

Me: and when it comes to performance you always prefers a precise index ACCESS operation to that double ACCESS and FILTER operations.

Me: your engineered index has a second columns on which an inequality predicate is applied

SCHEDULED_TIME <= :1

You should always start your index by the columns on which an equality predicate is applied. In my case, I put the SCHEDULED_TIME column at the trailing edge of my index and doing as such I have avoided a costly filter operation on my index while your engineered index has been subject to that costly filter operation

If you want to test this behaviour then below is an example to play with. I hope you will enjoy it

SQL> create table t1
     (id number,
     n_1000 number,
     n_5000 number,
     n_10000 number,
     small_vc varchar2(20),
     padding varchar2(100)
     );

Table created.

SQL> insert into t1
  with generator as (
  select --+ materialize
  rownum id
  from dual
  connect by
  rownum <= 10000
  )
  select
    rownum id,
    mod(rownum,1000) n_1000,
    mod(rownum,5000) n_5000,
    mod(rownum,10000) n_10000,
    lpad(rownum,10,'0') small_vc,
    rpad('x',100) padding
  from
    generator v1,
    generator v2
  where
  rownum <= 100000
  ;

SQL> create index my_ind on t1(id, n_5000, n_1000);

Index created.

SQL> create index colleague_ind on t1(id, n_1000, n_5000);

Index created.

SQL> alter index my_ind invisible;

Index altered.

SQL> exec dbms_stats.gather_table_stats(user, 't1');

PL/SQL procedure successfully completed.

SQL> select
        a.*
  from t1 a
  where id in (112,120)
  and (n_1000 is null
  or n_1000 <= 3000)
  and n_5000 in (120);

Statistics
------------------------------------------------------
 65 recursive calls
 0 db block gets
 95 consistent gets ---> spot this
 0 physical reads
 0 redo size
 1005 bytes sent via SQL*Net to client
 543 bytes received via SQL*Net from client
 2 SQL*Net roundtrips to/from client
 6 sorts (memory)
 0 sorts (disk)
 1 rows processed

SQL_ID 7d6ag1m1ztpgr, child number 1
-------------------------------------
select a.* from t1 a where id in (112,120) and (n_1000 is null
or n_1000 <= 3000) and n_5000 in (120)

Plan hash value: 3644584748
-------------------------------------------------------------------------------------------
| Id  | Operation                            | Name          | Rows  | Bytes | Cost (%CPU)|
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |               |       |       |     4 (100)|
|   1 |  INLIST ITERATOR                     |               |       |       |            |
|*  2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1            |     2 |   258 |     4   (0)|
|*  3 |    INDEX RANGE SCAN                  | COLLEAGUE_IND |     2 |       |     3   (0)|
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
 2 - filter(("N_1000"<=3000 OR "N_1000" IS NULL))
 3 - access((("ID"=112 OR "ID"=120)) AND "N_5000"=120)
 filter("N_5000"=120) ---> spot this

SQL> alter index colleague_ind invisible;

Index altered.

SQL> alter index my_ind visible;

Index altered.

SQL> select
       a.*
  from t1 a
  where id in (112,120)
  and (n_1000 is null
  or n_1000 <= 3000)
  and n_5000 in (120);

Statistics
------------------------------------------------------
 33 recursive calls
 0 db block gets
 49 consistent gets --> spot the reduction
 0 physical reads
 0 redo size
 1005 bytes sent via SQL*Net to client
 543 bytes received via SQL*Net from client
 2 SQL*Net roundtrips to/from client
 6 sorts (memory)
 0 sorts (disk)
 1 rows processed

SQL_ID 7d6ag1m1ztpgr, child number 1
-------------------------------------
select a.* from t1 a where id in (112,120) and (n_1000 is null
or n_1000 <= 3000) and n_5000 in (120)

Plan hash value: 4286547933</pre>
------------------------------------------------------------------------------------
| Id  | Operation                            | Name   | Rows  | Bytes | Cost (%CPU)|
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |        |       |       |     4 (100)|
|   1 |  INLIST ITERATOR                     |        |       |       |            |
|*  2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1     |     2 |   258 |     4   (0)|
|*  3 |    INDEX RANGE SCAN                  | MY_IND |     2 |       |     3   (0)|
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
 2 - filter(("N_1000"<=3000 OR "N_1000" IS NULL))
 3 - access((("ID"=112 OR "ID"=120)) AND "N_5000"=120)
     --> spot the absence of filter on index

Comments (9)

9 Comments »

Mohamed,

There’s an anomaly with both execution plans that might be worth further investigation – all three predicate columns are in the index, so why is one of the predicate tests postponed until after the table visit ?
The same effect appears in 11.2.0.4; the anomaly doesn’t appear if you change the n_1000 predicate to nvl(n_1000,3000) <= 3000

Comment by Jonathan Lewis — March 2, 2014 @ 9:28 am | Reply

Jonathan,

Yes, you are right. When using nvl(n_1000,3000) <= 3000, the filter is done only on the index.

Looking to the corresponding 10053 trace files I have emphasized the following sections:

Orginal case:

id=0 frofkks[i] (index start key) predicate="A"."ID"=112 OR "A"."ID"=120
id=0 frofkks[i] (index start key) predicate="A"."N_5000"=120
id=0 frofkke[i] (index stop key) predicate="A"."ID"=112 OR "A"."ID"=120
id=0 frofkke[i] (index stop key) predicate="A"."N_5000"=120
id=0 frofand predicate="A"."N_1000"<=3000 OR "A"."N_1000" IS NULL

Where I guess that the frofand is related to the table filter operation

Case with modified predicate as nvl(n_1000,3000) <= 3000:

id=0 frofkks[i] (index start key) predicate="A"."ID"=112 OR "A"."ID"=120
id=0 frofkks[i] (index start key) predicate="A"."N_5000"=120
id=0 frofkke[i] (index stop key) predicate="A"."ID"=112 OR "A"."ID"=120
id=0 frofkke[i] (index stop key) predicate="A"."N_5000"=120
id=0 froiand (index only filter) predicate=NVL("A"."N_1000",3000)<=3000

Where the filter is this time applied on the index only

But the question is why?

I think that this is due to the fact that null are not indexed and the CBO needs to go to the table in order to filter n_1000 null values. And when we’ve changed the predicate it is as if we have allowed the CBO to use the index in order to check n_1000 null values because we have “replaced” those null n_1000 values by 3000. This is the unique explanation I came up. This explanation might be comforted by the following example where I have set the n_1000 column to be not null

SQL> alter table t1 modify n_1000 not null;

Table altered.

SQL> select
  2          a.*
  3    from t1 a
  4    where id in (112,120)
  5    and (n_1000 is null
  6    or n_1000 <= 3000)
  7    and n_5000 in (120);

SQL_ID  gtqawgpwprz7n, child number 0
-------------------------------------
select         a.*   from t1 a   where id in (112,120)   and (n_1000 is
null   or n_1000 <= 3000)   and n_5000 in (120)

Plan hash value: 4286547933

-----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |        |       |       |     4 (100)|          |
|   1 |  INLIST ITERATOR                     |        |       |       |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1     |     2 |   258 |     4   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | MY_IND |     2 |       |     3   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access((("ID"=112 OR "ID"=120)) AND "N_5000"=120 AND "N_1000"<=3000)

Thanks a lot for giving me this opportunity to investigate. I am hungry of such cases to look at.

Best regards

Comment by hourim — March 2, 2014 @ 4:42 pm | Reply

Mohamed,

If you change the predicate to just: and n_1000 is null you’ll get the filter applied in the index. There’s no reason why Oracle should have trouble finding the NULL in the index just because it’s in an OR.

You might also try declaring ID as not null with the OR predicate in case there’s some feature of the optimizer code that decides (on splitting the OR) that the NULL part actually can’t be resolved safely in the index because it believes (incorrectly) that there could be such entries that don’t appear in the index. What is you select only the three columns in the index rather than * ? Does this become an index-only query, or do you still see the OR predicate applied at the table ?

Comment by Jonathan Lewis — March 2, 2014 @ 5:04 pm

Jonathan

Definitely it needs a deep investigation

Selecting only indexed columns: there is no access to the table

SQL> select
        a.id
       ,a.n_5000
       ,a.n_1000
    from t1 a
    where id in (112,120)
    and (n_1000 is null
         or n_1000 <= 3000)
    and n_5000 in (120);

SQL_ID  dc4sxrk5sac9q, child number 0
-------------------------------------
select     a.id    ,a.n_5000    ,a.n_1000 from t1 a where id in
(112,120) and (n_1000 is null      or n_1000 <= 3000) and n_5000 in
(120)

Plan hash value: 3247889462
--------------------------------------------------------------------------------------
| Id  | Operation         | Name   | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |        |      1 |        |      1 |00:00:00.01 |       5 |
|   1 |  INLIST ITERATOR  |        |      1 |        |      1 |00:00:00.01 |       5 |
|*  2 |   INDEX RANGE SCAN| MY_IND |      2 |      2 |      1 |00:00:00.01 |       5 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access((("ID"=112 OR "ID"=120)) AND "N_5000"=120)
       filter(("N_1000"<=3000 OR "N_1000" IS NULL))

Declaring only id column as not null: there is no filter on the table

SQL> alter table t1 modify id not null;

Table altered.

SQL> select
        a.*
    from t1 a
    where id in (112,120)
    and (n_1000 is null
         or n_1000 <= 3000)
    and n_5000 in (120);

SQL_ID  7d6ag1m1ztpgr, child number 1
-------------------------------------
select     a.* from t1 a where id in (112,120) and (n_1000 is null
or n_1000 <= 3000) and n_5000 in (120)

Plan hash value: 4286547933
--------------------------------------------------------------------------------------------
| Id  | Operation                            | Name   | Starts | E-Rows | A-Rows | Buffers |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |        |      1 |        |      1 |       6 |
|   1 |  INLIST ITERATOR                     |        |      1 |        |      1 |       6 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1     |      2 |      2 |      1 |       6 |
|*  3 |    INDEX RANGE SCAN                  | MY_IND |      2 |      2 |      1 |       5 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access((("ID"=112 OR "ID"=120)) AND "N_5000"=120)
       filter(("N_1000"<=3000 OR "N_1000" IS NULL))

Best regards

Comment by hourim — March 2, 2014 @ 6:36 pm

Mohamed,
Both plans have the same PHV ans PHV2 ?

Comment by Laroussi — March 2, 2014 @ 3:00 pm | Reply

Laroussi

They don’t have the same plan hash value; you can see this in the plans posted above. Having different indexes (and indexes names) there is a little chance for them to share the same phv2.

And here below how to check (sql_id should be in the memory)

Using my_ind index:

SQL> SELECT
  2         p.sql_id
  3        ,p.plan_hash_value
  4        ,t.phv2
  5  FROM   v$sql_plan p
  6        ,xmltable('for $i in /other_xml/info
  7                   where $i/@type eq "plan_hash_2"
  8                   return $i'
  9                   passing xmltype(p.other_xml)
 10                   columns phv2 number path '/') t
 11      WHERE p.sql_id = '7d6ag1m1ztpgr'
 12      and   p.other_xml is not null
 13      ;

SQL_ID        PLAN_HASH_VALUE       PHV2
------------- --------------- ----------
7d6ag1m1ztpgr      4286547933 3312611661

Using colleague_ind index:

SQL> SELECT
  2         p.sql_id
  3        ,p.plan_hash_value
  4        ,t.phv2
  5  FROM   v$sql_plan p
  6        ,xmltable('for $i in /other_xml/info
  7                   where $i/@type eq "plan_hash_2"
  8                   return $i'
  9                   passing xmltype(p.other_xml)
 10                   columns phv2 number path '/') t
 11      WHERE p.sql_id = '7d6ag1m1ztpgr'
 12      and   p.other_xml is not null
 13      ;

SQL_ID        PLAN_HASH_VALUE       PHV2
------------- --------------- ----------
7d6ag1m1ztpgr      3644584748 2343153009

Comment by hourim — March 2, 2014 @ 5:08 pm | Reply

I think, it’s not so easy, because there are too many very different cases.
Of course, it’s great if all your queries has predicates like that, but it’s quite often when you have also big number other queries without predicates by one of the columns. And if it would be provider_id, you will get many “index skip scans”.
Or if you have queries with big number inlist values like that:
scheduled_time between :1 and :2
AND (sms_status,provider_id) in (
(700,:v1),(700,v2), ..(700.v1000)
, (707,:v1),(707,v2), ..(707.v1000))

Therefore, in such cases, I always ask about the other queries and their frequency distribution.
Also, it’s very important to check the amount of load, to be sure that new index will not cause index contentions – may be, would be better to use global partitioned index.

Best regards,
Sayan Malakshinov

Comment by Sayan Malakshinov — March 3, 2014 @ 7:43 pm | Reply
Sayan,

As it is very often the case in Oracle (and in many other domains) see how many questions can be raised from a simple discussion about the best index to cover a single query. And in the path to get the best possible index, if we observe attentively to what we came up with (as Jonathan Lewis did on the unnecessary table visit) we might end up opening a new question which is completely different from the one that originated the initial discussion.

Best regards

Comment by hourim — March 4, 2014 @ 11:38 am | Reply
[…] the importance of the leading index column that should be the ones on which an equality predicate is […]

Pingback by Index design | Mohamed Houri’s Oracle Notes — September 3, 2014 @ 2:56 pm | Reply

RSS feed for comments on this post. TrackBack URI

Mohamed Houri’s Oracle Notes

March 2, 2014

Index design

9 Comments »

Leave a reply to hourim Cancel reply

Follow Blog via Email

Categories

Blog Stats

Blogs I Follow

Archives

Mohamed Houri’s Oracle Notes

March 2, 2014

Index design

Share this:

Related

9 Comments »

Leave a reply to hourim Cancel reply

Follow Blog via Email

Categories

Blog Stats

Blogs I Follow

Archives