Index compression: two birds with one stone

March 9, 2015

Index compression: two birds with one stone

Filed under: Index — hourim @ 7:52 pm

Have you ever been asked to trouble shoot a performance issue of a complex query with the following restrictions:

You are not allowed to change the code of the query because it belongs to a third party software
You are not allowed to create a new index because of disk space stress
You are supposed to solve the issue without using neither a SPM baseline nor a SQL Profile

What do you think you still have on your hands to tackle this issue?

I was quite confident that the performance issue was coming, as almost always, from a poor or not representative statistics which ultimately have biased the Optimizer choosing a wrong execution path. The row source execution plan taken this time from the Real Time Sql Monitoring report confirms my initial feeling about non representative statistics as shown by the several 1 estimated cardinality below (Rows Estim):

SQL Plan Monitoring Details (Plan Hash Value=2278065992)
==============================================================================================
| Id |                Operation                 |          Name |  Rows   | Execs |   Rows   |
|    |                                          |               | (Estim) |       | (Actual) |
==============================================================================================
|  0 | SELECT STATEMENT                         |               |         |     1 |     1059 |
|  1 |   HASH GROUP BY                          |               |       1 |     1 |     1059 |
|  2 |    FILTER                                |               |         |     1 |     135K |
|  3 |     NESTED LOOPS                         |               |       1 |     1 |     135K |
|  4 |      NESTED LOOPS                        |               |       1 |     1 |     135K |
|  5 |       NESTED LOOPS OUTER                 |               |       1 |     1 |     135K |
|  6 |        NESTED LOOPS                      |               |       1 |     1 |     135K |
|  7 |         HASH JOIN OUTER                  |               |       1 |     1 |     145K |
|  8 |          NESTED LOOPS                    |               |         |     1 |     145K |
|  9 |           NESTED LOOPS                   |               |       1 |     1 |     145K |
| 10 |            NESTED LOOPS                  |               |       1 |     1 |     146K |
| 11 |             NESTED LOOPS                 |               |       1 |     1 |     146K |
| 12 |              NESTED LOOPS                |               |       1 |     1 |        1 |
| 13 |               NESTED LOOPS               |               |       1 |     1 |        1 |
| 14 |                FAST DUAL                 |               |       1 |     1 |        1 |
| 15 |                FAST DUAL                 |               |       1 |     1 |        1 |
| 16 |               FAST DUAL                  |               |       1 |     1 |        1 |
| 17 |              TABLE ACCESS BY INDEX ROWID | TABLE_XY      |       1 |     1 |     146K |
| 18 |               INDEX RANGE SCAN           | IDX_TABLE_XY23|       1 |     1 |      12M |
==============================================================================================
17 - filter(("AU"."NEW_VALUE"=:SYS_B_119 AND "AU"."ATTRIBUTE_NAME"=:SYS_B_118))
18 - access("AU"."UPDATED_DATE">=TO_DATE(:SYS_B_001||TO_CHAR(EXTRACT(MONTH FROM
              ADD_MONTHS(CURRENT_DATE,(-:SYS_B_000))))||:SYS_B_002||TO_CHAR(EXTRACT(YEAR FROM
              ADD_MONTHS(CURRENT_DATE,(-:SYS_B_000))))||:SYS_B_003,:SYS_B_004)
              AND "AU"."COURSE_NAME"=:SYS_B_117
              AND "AU"."UPDATED_DATE"<=TO_DATE(TO_CHAR(LAST_DAY(ADD_MONTHS(CURRENT_DATE,(-
             :SYS_B_000))),:SYS_B_005)||:SYS_B_006,:SYS_B_007))
     filter("AU"."COURSE_NAME"=:SYS_B_117)

Collecting statistics adequately was of course the right path to follow; however looking at the above execution plan I have realized that the most consuming operation was an index range scan followed by a table access by index rowid. The index operation at line 18 was supplying its parent operation at line 17 with 12 million worth of rows from which the filter at this line has allowed only 2% (146K) of the rows to survive its elimination. This is a classical problem of imprecise index wasting a lot of time and energy in throwing away rows that should have been eliminated earlier. There is also another indication about the imprecision of the index used at operation line 18. Its predicate part contains both an access and a filter operation. In order to make the picture clear here below is the index definition and the predicate part used in the problematic query:

IDX_TABLE_XY23 (UPDATED_DATE, COURSE_NAME)

FROM TABLE_XY     
WHERE AU.COURSE_NAME = ‘Point’
AND AU.UPDATED_DATE  >= PidDates.END_DATE

The index has been defined to start with the column on which an inequality is applied. We should instead always place at the leading edge of the index the column on which we have the intention to apply an equality predicate. One solution would be to reverse the above index columns. But I was not going to do that without checking the whole application looking for queries using the UPDATED_DATE column in equality predicate so that reversing the columns of that index will harm them. Hopefully there were no such queries and the way was already paved for me to proceed to that index columns order reversing proposition.

Few minutes before going ahead I remembered that this application is suffering from a disk space shortage so that compressing the index would certainly help. Moreover the “new” leading index column is the less repetitive one which hopefully will give a better level of compressibility:

SQL> select column_name, num_distinct
     from all_tab_col_statistics
     where table_name = 'TABLE_XY'
     and column_name in ('UPDATED_DATE','COURSE_NAME');

COLUMN_NAME                    NUM_DISTINCT
------------------------------ ------------
UPDATED_DATE                   1309016064
COURSE_NAME                    63

SQL> create index idx_TABLE_XYmh on TABLE_XY(COURSE_NAME, UPDATED_DATE) compress 1;

And here are the results

Before the new index

1059 rows selected.
Elapsed: 00:32:37.34

Global Stats
================================================================
| Elapsed |   Cpu   |    IO    | Fetch | Buffer | Read | Read  |
| Time(s) | Time(s) | Waits(s) | Calls |  Gets  | Reqs | Bytes |
================================================================
|    2020 |     154 |     1867 |    72 |    10M |   1M |   9GB |
================================================================

After the new index

1059 rows selected.
Elapsed: 00:19:56.08

Global Stats
========================================================
| Elapsed |   Cpu   |    IO    | Buffer | Read | Read  |
| Time(s) | Time(s) | Waits(s) |  Gets  | Reqs | Bytes |
========================================================
|    1204 |      70 |     1134 |     9M | 463K |   4GB |
========================================================

The query execution time dropped from 33 minutes to 19 minutes and so did the logical and physical I/O.

Of course I am still throwing the same amount of rows at the table level a shown below

SQL Plan Monitoring Details (Plan Hash Value=2577971998)
===================================================================
| Id    |  Operation                 |          Name   |   Rows   |
|       |                            |                 | (Actual) |
===================================================================
|    17 |TABLE ACCESS BY INDEX ROWID | TABLE_XY        |     146K |
|    18 | INDEX RANGE SCAN           | IDX_TABLE_XYMH  |      12M |
===================================================================
17 - filter(("AU"."NEW_VALUE"=:SYS_B_119 AND "AU"."ATTRIBUTE_NAME"=:SYS_B_118))
18 - access("AU"."COURSE_NAME"=:SYS_B_117 AND
           "AU"."UPDATED_DATE">=TO_DATE(:SYS_B_001||TO_CHAR(EXTRACT(MONTH FROM
            ADD_MONTHS(CURRENT_DATE,(-:SYS_B_000))))||:SYS_B_002||TO_CHAR(EXTRACT(YEAR FROM
            ADD_MONTHS(CURRENT_DATE,(-:SYS_B_000))))||:SYS_B_003,:SYS_B_004) AND
           "AU"."UPDATED_DATE"<=TO_DATE(TO_CHAR(LAST_DAY(ADD_MONTHS(CURRENT_DATE,(-:SYS_B_000)))
           ,:SYS_B_005)||:SYS_B_006,:SYS_B_007))

But I have drastically reduced the part of the index I was traversing before reversing the columns order and I have got rid of the filter at the index predicate. And finally, because it is the title of this article, by compressing the index I have gained 67GB

SQL> select segment_name, trunc(bytes/1024/1024/1024) GB
    from dba_segments
    where segment_type = 'INDEX'
    and segment_name in ('IDX_TABLE_XYMH','IDX_TABLE_XY23');

SEGMENT_NAME                           GB
------------------------------ ----------
IDX_TABLE_XYMH                        103
IDX_TABLE_XY23                        170

Comments (6)

6 Comments »

Hi,
Thanks for sharing but be careful , updating of compressed index can be very slow …performance issue !
Kais

Comment by kaisyasmine — March 10, 2015 @ 5:30 am | Reply
- Kais
  
  Yes you’are right. Compressed index may cause an increase in CPU during inserts and updates. Jonathan Lewis explained this side effects in the following article
  
  http://allthingsoracle.com/compression-in-oracle-part-5-costs-of-index-compression/
  
  Best regards
  Mohamed
  
  Comment by hourim — March 10, 2015 @ 8:27 am | Reply
Hi Mohamed,

in 12c you have now sql translation framework … perhaps can be useful with the set of restrictions you listed at the beginning of your article
( dbms_sql_translator ) …

Regards,
Frank

Comment by Frank — March 10, 2015 @ 7:04 am | Reply
- Frank
  
  Thanks for this information. I’ve just read Kerry Osborne article about the 12c SQL translation feature (http://kerryosborne.oracle-guy.com/2013/07/sql-translation-framework/). Interesting. I will try to start investigating it as soon as possible
  
  Best regards
  Mohamed
  
  Comment by hourim — March 10, 2015 @ 8:51 am | Reply
Mohamed,
any chance you would be allowed to spend the gained 67GB into adding one of the step 17 filter attributes to the index? Of course this could have an impact on other queries, and of course the 67GB could be insufficient to add an attribute – but step 17 just hurts…

Regards
Martin

Comment by Martin Preiss — March 10, 2015 @ 9:29 am | Reply
Hi Martin,

That was my initial index proposition. No filter on the table at all. However, there is such an over indexing that we are trying to get rid carefully of the redundant ones I have already identified. For the moment we put a “hold on” on new indexes.

Comment by hourim — March 10, 2015 @ 9:58 am | Reply

RSS feed for comments on this post. TrackBack URI

Mohamed Houri’s Oracle Notes

March 9, 2015