Mohamed Houri’s Oracle Notes

March 9, 2015

Index compression: two birds with one stone

Filed under: Index — hourim @ 7:52 pm

Have you ever been asked to trouble shoot a performance issue of a complex query with the following restrictions:

  • You are not allowed to change the code of the query because it belongs to a third party software
  • You are not allowed to create a new index because of disk space stress
  • You are supposed to solve the issue without using  neither a SPM baseline nor a SQL Profile

What do you think you still have on your hands to tackle this issue?

I was quite confident that the performance issue was coming, as almost always, from a poor or not representative statistics which ultimately have biased the Optimizer choosing a wrong execution path. The row source execution plan taken this time from the Real Time Sql Monitoring report confirms my initial feeling about non representative statistics as shown by the several 1 estimated cardinality below (Rows Estim):

SQL Plan Monitoring Details (Plan Hash Value=2278065992)
==============================================================================================
| Id |                Operation                 |          Name |  Rows   | Execs |   Rows   |
|    |                                          |               | (Estim) |       | (Actual) |
==============================================================================================
|  0 | SELECT STATEMENT                         |               |         |     1 |     1059 |
|  1 |   HASH GROUP BY                          |               |       1 |     1 |     1059 |
|  2 |    FILTER                                |               |         |     1 |     135K |
|  3 |     NESTED LOOPS                         |               |       1 |     1 |     135K |
|  4 |      NESTED LOOPS                        |               |       1 |     1 |     135K |
|  5 |       NESTED LOOPS OUTER                 |               |       1 |     1 |     135K |
|  6 |        NESTED LOOPS                      |               |       1 |     1 |     135K |
|  7 |         HASH JOIN OUTER                  |               |       1 |     1 |     145K |
|  8 |          NESTED LOOPS                    |               |         |     1 |     145K |
|  9 |           NESTED LOOPS                   |               |       1 |     1 |     145K |
| 10 |            NESTED LOOPS                  |               |       1 |     1 |     146K |
| 11 |             NESTED LOOPS                 |               |       1 |     1 |     146K |
| 12 |              NESTED LOOPS                |               |       1 |     1 |        1 |
| 13 |               NESTED LOOPS               |               |       1 |     1 |        1 |
| 14 |                FAST DUAL                 |               |       1 |     1 |        1 |
| 15 |                FAST DUAL                 |               |       1 |     1 |        1 |
| 16 |               FAST DUAL                  |               |       1 |     1 |        1 |
| 17 |              TABLE ACCESS BY INDEX ROWID | TABLE_XY      |       1 |     1 |     146K |
| 18 |               INDEX RANGE SCAN           | IDX_TABLE_XY23|       1 |     1 |      12M |
==============================================================================================
17 - filter(("AU"."NEW_VALUE"=:SYS_B_119 AND "AU"."ATTRIBUTE_NAME"=:SYS_B_118))
18 - access("AU"."UPDATED_DATE">=TO_DATE(:SYS_B_001||TO_CHAR(EXTRACT(MONTH FROM
              ADD_MONTHS(CURRENT_DATE,(-:SYS_B_000))))||:SYS_B_002||TO_CHAR(EXTRACT(YEAR FROM
              ADD_MONTHS(CURRENT_DATE,(-:SYS_B_000))))||:SYS_B_003,:SYS_B_004)
              AND "AU"."COURSE_NAME"=:SYS_B_117
              AND "AU"."UPDATED_DATE"<=TO_DATE(TO_CHAR(LAST_DAY(ADD_MONTHS(CURRENT_DATE,(-
             :SYS_B_000))),:SYS_B_005)||:SYS_B_006,:SYS_B_007))
     filter("AU"."COURSE_NAME"=:SYS_B_117)

Collecting statistics adequately was of course the right path to follow; however looking at the above execution plan I have realized that the most consuming operation was an index range scan followed by a table access by index rowid. The index operation at line 18 was supplying its parent operation at line 17 with 12 million worth of rows from which the filter at this line has allowed only 2% (146K) of the rows to survive its elimination. This is a classical problem of imprecise index wasting a lot of time and energy in throwing away rows that should have been eliminated earlier.  There is also another indication about the imprecision of the index used at operation line 18. Its predicate part contains both an access and a filter operation. In order to make the picture clear here below is the index definition and the predicate part used in the problematic query:

IDX_TABLE_XY23 (UPDATED_DATE, COURSE_NAME)

FROM TABLE_XY     
WHERE AU.COURSE_NAME = ‘Point’
AND AU.UPDATED_DATE  >= PidDates.END_DATE

The index has been defined to start with the column on which an inequality is applied. We should instead always place at the leading edge of the index the column on which we have the intention to apply an equality predicate. One solution would be to reverse the above index columns. But I was not going to do that without checking the whole application looking for queries using the UPDATED_DATE column in equality predicate so that reversing the columns of that index will harm them. Hopefully there were no such queries and the way was already paved for me to proceed to that index columns order reversing proposition.

Few minutes before going ahead I remembered that this application is suffering from a disk space shortage so that compressing the index would certainly help. Moreover the “new” leading index column is the less repetitive one which hopefully will give a better level of compressibility:

SQL> select column_name, num_distinct
     from all_tab_col_statistics
     where table_name = 'TABLE_XY'
     and column_name in ('UPDATED_DATE','COURSE_NAME');

COLUMN_NAME                    NUM_DISTINCT
------------------------------ ------------
UPDATED_DATE                   1309016064
COURSE_NAME                    63
SQL> create index idx_TABLE_XYmh on TABLE_XY(COURSE_NAME, UPDATED_DATE) compress 1;

And here are the results

Before the new index

1059 rows selected.
Elapsed: 00:32:37.34

Global Stats
================================================================
| Elapsed |   Cpu   |    IO    | Fetch | Buffer | Read | Read  |
| Time(s) | Time(s) | Waits(s) | Calls |  Gets  | Reqs | Bytes |
================================================================
|    2020 |     154 |     1867 |    72 |    10M |   1M |   9GB |
================================================================

After the new index

1059 rows selected.
Elapsed: 00:19:56.08

Global Stats
========================================================
| Elapsed |   Cpu   |    IO    | Buffer | Read | Read  |
| Time(s) | Time(s) | Waits(s) |  Gets  | Reqs | Bytes |
========================================================
|    1204 |      70 |     1134 |     9M | 463K |   4GB |
========================================================

The query execution time dropped from 33 minutes to 19 minutes and so did the logical and physical I/O.

Of course I am still throwing the same amount of rows at the table level a shown below

SQL Plan Monitoring Details (Plan Hash Value=2577971998)
===================================================================
| Id    |  Operation                 |          Name   |   Rows   |
|       |                            |                 | (Actual) |
===================================================================
|    17 |TABLE ACCESS BY INDEX ROWID | TABLE_XY        |     146K |
|    18 | INDEX RANGE SCAN           | IDX_TABLE_XYMH  |      12M |
===================================================================
17 - filter(("AU"."NEW_VALUE"=:SYS_B_119 AND "AU"."ATTRIBUTE_NAME"=:SYS_B_118))
18 - access("AU"."COURSE_NAME"=:SYS_B_117 AND
           "AU"."UPDATED_DATE">=TO_DATE(:SYS_B_001||TO_CHAR(EXTRACT(MONTH FROM
            ADD_MONTHS(CURRENT_DATE,(-:SYS_B_000))))||:SYS_B_002||TO_CHAR(EXTRACT(YEAR FROM
            ADD_MONTHS(CURRENT_DATE,(-:SYS_B_000))))||:SYS_B_003,:SYS_B_004) AND
           "AU"."UPDATED_DATE"<=TO_DATE(TO_CHAR(LAST_DAY(ADD_MONTHS(CURRENT_DATE,(-:SYS_B_000)))
           ,:SYS_B_005)||:SYS_B_006,:SYS_B_007))

But I have drastically reduced the part of the index I was traversing before reversing the columns order and I have got rid of the filter at the index predicate.  And finally, because it is the title of this article, by compressing the index I have gained 67GB

SQL> select segment_name, trunc(bytes/1024/1024/1024) GB
    from dba_segments
    where segment_type = 'INDEX'
    and segment_name in ('IDX_TABLE_XYMH','IDX_TABLE_XY23');

SEGMENT_NAME                           GB
------------------------------ ----------
IDX_TABLE_XYMH                        103
IDX_TABLE_XY23                        170

 

6 Comments »

  1. Hi,
    Thanks for sharing but be careful , updating of compressed index can be very slow …performance issue !
    Kais

    Comment by kaisyasmine — March 10, 2015 @ 5:30 am | Reply

  2. Hi Mohamed,

    in 12c you have now sql translation framework … perhaps can be useful with the set of restrictions you listed at the beginning of your article
    ( dbms_sql_translator ) …

    Regards,
    Frank

    Comment by Frank — March 10, 2015 @ 7:04 am | Reply

  3. Mohamed,
    any chance you would be allowed to spend the gained 67GB into adding one of the step 17 filter attributes to the index? Of course this could have an impact on other queries, and of course the 67GB could be insufficient to add an attribute – but step 17 just hurts…

    Regards
    Martin

    Comment by Martin Preiss — March 10, 2015 @ 9:29 am | Reply

  4. Hi Martin,

    That was my initial index proposition. No filter on the table at all. However, there is such an over indexing that we are trying to get rid carefully of the redundant ones I have already identified. For the moment we put a “hold on” on new indexes.

    Comment by hourim — March 10, 2015 @ 9:58 am | Reply


RSS feed for comments on this post. TrackBack URI

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Create a free website or blog at WordPress.com.

Tony's Oracle Tips

Tony Hasler's light hearted approach to learning about Oracle

Richard Foote's Oracle Blog

Focusing Specifically On Oracle Indexes, Database Administration and Some Great Music

Hatem Mahmoud's blog

Just another blog : Databases, Linux and other stuffs

Mohamed Houri’s Oracle Notes

Qui se conçoit bien s’énonce clairement

Oracle Diagnostician

Performance troubleshooting as exact science

Raheel's Blog

Things I have learnt as Oracle DBA

Coskan's Approach to Oracle

What I learned about Oracle

So Many Oracle Manuals, So Little Time

“Books to the ceiling, Books to the sky, My pile of books is a mile high. How I love them! How I need them! I'll have a long beard by the time I read them”—Lobel, Arnold. Whiskers and Rhymes. William Morrow & Co, 1988.

Carlos Sierra's Tools and Tips

Tools and Tips for Oracle Performance and SQL Tuning

Oracle Scratchpad

Just another Oracle weblog

OraStory

Dominic Brooks on Oracle Performance, Tuning, Data Quality & Sensible Design ... (Now with added Sets Appeal)