Here it is a very brief discussion I have had with one of my colleagues about index design
Colleague: what kind of index would you suggest to cover the following query?
SELECT rowid ,a.* FROM message_out a WHERE sms_status in (700, 707) AND (scheduled_time is null OR scheduled_time <= :1) AND provider_id in (0,0) ORDER BY priority_level desc, creation_time asc; ---------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| ---------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 5529 | 1376K | | 4769 (1) | | 1 | SORT ORDER BY | | 5529 | 1376K | 1856K | 4769 (1) | |* 2 | TABLE ACCESS FULL| MESSAGE_OUT | 5529 | 1376K | | 4462 (1) | ---------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter(("SMS_STATUS"=700 OR "SMS_STATUS"=707) AND ("SCHEDULED_TIME" IS NULL OR "SCHEDULED_TIME"<=:1) AND "PROVIDER_ID"=0)
Me: and what have you ended up with until now?
Colleague: here my suggested index and the related execution plan
CREATE INDEX virtual_index ON MESSAGE_OUT(sms_status,scheduled_time,provider_id) ; --------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| --------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 5529 | 1376K | | 446 (1) | | 1 | SORT ORDER BY | | 5529 | 1376K | 1856K | 446 (1) | | 2 | INLIST ITERATOR | | | | | | |* 3 | TABLE ACCESS BY INDEX ROWID| MESSAGE_OUT | 5529 | 1376K | | 140 (0) | |* 4 | INDEX RANGE SCAN | VIRTUAL_INDEX | 5529 | | | 6 (0) | ---------------------------------------------------------------------------------------
Me: I would not have created the same index
Me: here it is the index I would have created (after several questions regarding the data distribution, the table data volume, the use of bind variables, etc…)
create index mho_ind on MESSAGE_OUT (status, provider_id, scheduled_time);
Me: and if sms_status contains repetitive values then I would have added a compress command to that index creation
Colleague: there is no difference in the execution plan either by using my index or your index
------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| ------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 5529 | 1376K | | 446 (1) | | 1 | SORT ORDER BY | | 5529 | 1376K | 1856K | 446 (1) | | 2 | INLIST ITERATOR | | | | | | |* 3 | TABLE ACCESS BY INDEX ROWID | MESSAGE_OUT | 5529 | 1376K | | 140 (0) | |* 4 | INDEX RANGE SCAN | VIRTUAL_INDEX | 5529 | | | 6 (0) | ------------------------------------------------------------------------------------------
Me: no, it is not the same plan. Please always consider the predicate part
Me: what is the predicate part of the plan using your index
Colleague: this is my index predicate part
Predicate Information (identified by operation id): --------------------------------------------------- 3 - filter("SCHEDULED_TIME" IS NULL OR "SCHEDULED_TIME"<=:1) 4 - access(("SMS_STATUS"=700 OR "SMS_STATUS"=707) AND "PROVIDER_ID"=0) filter("PROVIDER_ID"=0) --> additional filter operation
Colleague: and this is your index predicate part
Predicate Information (identified by operation id): --------------------------------------------------- 3 - filter("SCHEDULED_TIME" IS NULL OR "SCHEDULED_TIME"<=:1) 4 - access(("SMS_STATUS"=700 OR "SMS_STATUS"=707) AND "PROVIDER_ID"=0) --> no additional filter operation
Me: and did you pointed out the difference or not yet?
Colleague: no, same plan, same cost and same execution time
Me: there is a fundamental difference between your plan and mine. In your plan there is a double operation on your engineered index: “ACCESS + FILTER” operation while my engineered index needs only one precise operation : “ACCESS”
Me: and when it comes to performance you always prefers a precise index ACCESS operation to that double ACCESS and FILTER operations.
Me: your engineered index has a second columns on which an inequality predicate is applied
SCHEDULED_TIME <= :1
You should always start your index by the columns on which an equality predicate is applied. In my case, I put the SCHEDULED_TIME column at the trailing edge of my index and doing as such I have avoided a costly filter operation on my index while your engineered index has been subject to that costly filter operation
If you want to test this behaviour then below is an example to play with. I hope you will enjoy it
SQL> create table t1 (id number, n_1000 number, n_5000 number, n_10000 number, small_vc varchar2(20), padding varchar2(100) ); Table created. SQL> insert into t1 with generator as ( select --+ materialize rownum id from dual connect by rownum <= 10000 ) select rownum id, mod(rownum,1000) n_1000, mod(rownum,5000) n_5000, mod(rownum,10000) n_10000, lpad(rownum,10,'0') small_vc, rpad('x',100) padding from generator v1, generator v2 where rownum <= 100000 ; SQL> create index my_ind on t1(id, n_5000, n_1000); Index created. SQL> create index colleague_ind on t1(id, n_1000, n_5000); Index created. SQL> alter index my_ind invisible; Index altered. SQL> exec dbms_stats.gather_table_stats(user, 't1'); PL/SQL procedure successfully completed. SQL> select a.* from t1 a where id in (112,120) and (n_1000 is null or n_1000 <= 3000) and n_5000 in (120); Statistics ------------------------------------------------------ 65 recursive calls 0 db block gets 95 consistent gets ---> spot this 0 physical reads 0 redo size 1005 bytes sent via SQL*Net to client 543 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 6 sorts (memory) 0 sorts (disk) 1 rows processed SQL_ID 7d6ag1m1ztpgr, child number 1 ------------------------------------- select a.* from t1 a where id in (112,120) and (n_1000 is null or n_1000 <= 3000) and n_5000 in (120) Plan hash value: 3644584748 ------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| ------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | | | 4 (100)| | 1 | INLIST ITERATOR | | | | | |* 2 | TABLE ACCESS BY INDEX ROWID BATCHED| T1 | 2 | 258 | 4 (0)| |* 3 | INDEX RANGE SCAN | COLLEAGUE_IND | 2 | | 3 (0)| ------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter(("N_1000"<=3000 OR "N_1000" IS NULL)) 3 - access((("ID"=112 OR "ID"=120)) AND "N_5000"=120) filter("N_5000"=120) ---> spot this SQL> alter index colleague_ind invisible; Index altered. SQL> alter index my_ind visible; Index altered. SQL> select a.* from t1 a where id in (112,120) and (n_1000 is null or n_1000 <= 3000) and n_5000 in (120); Statistics ------------------------------------------------------ 33 recursive calls 0 db block gets 49 consistent gets --> spot the reduction 0 physical reads 0 redo size 1005 bytes sent via SQL*Net to client 543 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 6 sorts (memory) 0 sorts (disk) 1 rows processed SQL_ID 7d6ag1m1ztpgr, child number 1 ------------------------------------- select a.* from t1 a where id in (112,120) and (n_1000 is null or n_1000 <= 3000) and n_5000 in (120) Plan hash value: 4286547933</pre> ------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| ------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | | | 4 (100)| | 1 | INLIST ITERATOR | | | | | |* 2 | TABLE ACCESS BY INDEX ROWID BATCHED| T1 | 2 | 258 | 4 (0)| |* 3 | INDEX RANGE SCAN | MY_IND | 2 | | 3 (0)| ------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter(("N_1000"<=3000 OR "N_1000" IS NULL)) 3 - access((("ID"=112 OR "ID"=120)) AND "N_5000"=120) --> spot the absence of filter on index
Mohamed,
There’s an anomaly with both execution plans that might be worth further investigation – all three predicate columns are in the index, so why is one of the predicate tests postponed until after the table visit ?
The same effect appears in 11.2.0.4; the anomaly doesn’t appear if you change the n_1000 predicate to nvl(n_1000,3000) <= 3000
Comment by Jonathan Lewis — March 2, 2014 @ 9:28 am |
Jonathan,
Yes, you are right. When using nvl(n_1000,3000) <= 3000, the filter is done only on the index.
Looking to the corresponding 10053 trace files I have emphasized the following sections:
Orginal case:
Where I guess that the frofand is related to the table filter operation
Case with modified predicate as nvl(n_1000,3000) <= 3000:
Where the filter is this time applied on the index only
But the question is why?
I think that this is due to the fact that null are not indexed and the CBO needs to go to the table in order to filter n_1000 null values. And when we’ve changed the predicate it is as if we have allowed the CBO to use the index in order to check n_1000 null values because we have “replaced” those null n_1000 values by 3000. This is the unique explanation I came up. This explanation might be comforted by the following example where I have set the n_1000 column to be not null
Thanks a lot for giving me this opportunity to investigate. I am hungry of such cases to look at.
Best regards
Comment by hourim — March 2, 2014 @ 4:42 pm |
Mohamed,
If you change the predicate to just: and n_1000 is null you’ll get the filter applied in the index. There’s no reason why Oracle should have trouble finding the NULL in the index just because it’s in an OR.
You might also try declaring ID as not null with the OR predicate in case there’s some feature of the optimizer code that decides (on splitting the OR) that the NULL part actually can’t be resolved safely in the index because it believes (incorrectly) that there could be such entries that don’t appear in the index. What is you select only the three columns in the index rather than * ? Does this become an index-only query, or do you still see the OR predicate applied at the table ?
Comment by Jonathan Lewis — March 2, 2014 @ 5:04 pm
Jonathan
Definitely it needs a deep investigation
Selecting only indexed columns: there is no access to the table
Declaring only id column as not null: there is no filter on the table
Best regards
Comment by hourim — March 2, 2014 @ 6:36 pm
Mohamed,
Both plans have the same PHV ans PHV2 ?
Comment by Laroussi — March 2, 2014 @ 3:00 pm |
Laroussi
They don’t have the same plan hash value; you can see this in the plans posted above. Having different indexes (and indexes names) there is a little chance for them to share the same phv2.
And here below how to check (sql_id should be in the memory)
Using my_ind index:
Using colleague_ind index:
Comment by hourim — March 2, 2014 @ 5:08 pm |
I think, it’s not so easy, because there are too many very different cases.
Of course, it’s great if all your queries has predicates like that, but it’s quite often when you have also big number other queries without predicates by one of the columns. And if it would be provider_id, you will get many “index skip scans”.
Or if you have queries with big number inlist values like that:
scheduled_time between :1 and :2
AND (sms_status,provider_id) in (
(700,:v1),(700,v2), ..(700.v1000)
, (707,:v1),(707,v2), ..(707.v1000))
Therefore, in such cases, I always ask about the other queries and their frequency distribution.
Also, it’s very important to check the amount of load, to be sure that new index will not cause index contentions – may be, would be better to use global partitioned index.
Best regards,
Sayan Malakshinov
Comment by Sayan Malakshinov — March 3, 2014 @ 7:43 pm |
Sayan,
As it is very often the case in Oracle (and in many other domains) see how many questions can be raised from a simple discussion about the best index to cover a single query. And in the path to get the best possible index, if we observe attentively to what we came up with (as Jonathan Lewis did on the unnecessary table visit) we might end up opening a new question which is completely different from the one that originated the initial discussion.
Best regards
Comment by hourim — March 4, 2014 @ 11:38 am |
[…] the importance of the leading index column that should be the ones on which an equality predicate is […]
Pingback by Index design | Mohamed Houri’s Oracle Notes — September 3, 2014 @ 2:56 pm |