Mohamed Houri’s Oracle Notes

September 3, 2014

Index design I

Filed under: Uncategorized — hourim @ 2:56 pm

Very often questions about the best position a column should be put in within a composite index come out into forums and Oracle discussions. The last question I have contributed to and tried to answer has been raised up in a French Forum. The original poster was wondering whether it is a good idea to place the very often repeated column (contains duplicates) at the leading edge of the index or not.

First of all, in contrast to my usual style of blogging I am not going to provide a SQL model on which I am going to expand my knowledge of index design. However, for those who want to learn and master indexes I would encourage them to read the world expert person in this field, Richard Foote. He has an excellent blog with several articles about almost all what one has to know about indexes and not only the widely used b-tree indexes but on all other types of indexes including bitmap indexes, function based indexes, partitioned indexes, exadata storage indexes etc..

The second reference is as always Jonathan Lewis blog in which you can find several articles about  index design, index efficiency and index maintenance. In addition, it is not sufficient to know how to design precise index; you need to know as well how your index will evolve with the delete, insert or update operations their underlined tables will undergo during the lifecycle of the application they participate to its normal functioning.

The third reference is the book Relational Database Index Design and the Optimizers which extends the index design to several databases including DB2, Oracle and SQL Server

I, from time to time, come to read very interesting articles about indexes in this web site that I am following via twitter. It contains valuable index design information which, according to what I have read up to now, is pertinent, correct and back up all what I have learned from Jonathan Lewis, Richard Foote and from my own professional experience.

That’s said, I will post here below few of my answers and articles(and Jonathan Lewis articles) about index design as a small answer to a lot of questions about index design

  1. On the importance of the leading index columns that should be the ones on which an equality predicate is applied
  2. Indexing Foreign keys
  3. Redundant Indexes
  4. Global or Local Partitioned Index
  5. Compressing indexes basic part and cost of index compression

I am planning to write several other articles on indexes and I will be completing the above list as far as I will go with this publishing task

My answer to the original poster question about the importance of the number of distinct values property of an index candidate column is that the starting index column decision is not driven by its number of distinct values. It is instead driven by:

  • The nature of the query where clause he has to cover
  • The nature of the predicate (equality, inequality, etc..) applied on the starting column
  • The constant desire to cover with the same index one or a couple of other queries
  • The constant desire to cover with the same index a foreign key deadlock threat : sometime just by reversing the columns order we succeed to cover the current query and an existing foreign key
  • The constant desire to avoid redundant indexes

And finally comes up the reason for which one has to consider placing the column with the small number of distinct values at the leading edge of the index: Compression. If you start your index with the more often duplicated column you will make a good index compression reducing, efficiently, the size of that index which means it will be very quickly put into the buffer cache and will be kept there more often due to its small size.

July 30, 2014

On table constraint execution order

Filed under: Oracle — hourim @ 8:03 pm

Introduction

Before to be left behind due to my infrequent blogging activity these days, here it is a simple note about the order of execution of constraints and triggers placed on a given table. Basically, I am inclined, according to my very recent tests done on 11.2.0.3 and shown below, to claimn that that order of execution obeys the following steps:

  1. Before (insert) trigger fires first
  2. Then CHECK or UNIQUE constraint follows. We will see herein after in what order
  3. Then it will be the turn of after (insert) trigger
  4.  And finally checking Foreign Key integrity constraint comes at the trailing edge of the process

The execution order of the check and unique constraints seems to be closely related to the order of the columns (in the table) with which the constraints has been implemented, as shown by Karthick_App in the above mentioned otn thread.

 Details

This is the model on which I have made the above observations:


drop table c purge;
drop table p purge;

create table p as select rownum n1, trunc((rownum-1)/3) n2
from dual connect by level <= 10;

alter table p add constraint p_pk primary key (n1);

create table c (m1 number not null, m2 number, m3 number);

alter table c add constraint child_fk foreign key (m2) references p(n1);

alter table c add constraint child_ck check (m3 in (1,2));

alter table c add constraint child_uk unique (m1);
  
create or replace trigger c_bfi_trg before insert on c
for each row
begin
    dbms_output.put_line('Trigger before insert fired');
end;
/

create or replace trigger c_afi_trg after insert on c
for each row
begin
    dbms_output.put_line('Trigger after insert fired');
end;
/       

The following select shows the order of the column within the table


select table_name, column_name, column_id
from user_tab_columns
where table_name = 'C'
order
by 1, 3;

TABLE_NAME                     COLUMN_NAME                     COLUMN_ID
------------------------------ ------------------------------ ----------
C                              M1                                      1
C                              M2                                      2
C                              M3                                      3

Note particularly that I have:

  • A not null CHECK constraint on the first column (m1) because it has been declared as the primary key of the child table c
  • A UNIQUE constraint on the first column (m1) because it has been declared as the primary key of the child table c
  • A FOREIGN Key constraint on the second column (m2) because it has been declared as a foreign key to the parent table p
  • A CHECK constraint on the third column(m3) to allow only 1 and 2 as possible values

A check constraint (being it unique or check on possible values) other than a foreign key constraint is executed according to the column order in the table. For example, the not null CHECK constraint will be the first to be verified because it has been implemented on column m1 which is the leading column of the table

SQL> set serveroutput on

SQL> insert into c values (null, 11, 3);

Trigger before insert fired
insert into c values (null, 11, 3)
                      *
ERROR at line 1:
ORA-01400: cannot insert NULL into ("XXX"."C"."M1")

Note that in the above insert, I have 3 reasons for which the insert ought to be refused

  • Inserting a null value into the primary key (m1)
  • Inserting a non-existing parent key (m2 = 11)
  • Inserting a non-acceptable value into m3 (m3 in (1,2))

Among all those refusal reasons, it is the ORA-01400 that has fired up because the NOT NULL constraint has been implemented on the first column of the table (m1). The same observation can be made when the unique constraint placed on the first column is violated as shown below:

SQL> insert into c values (1, 10,1);
Trigger before insert fired
Trigger after insert fired
1 row created.

SQL> insert into c values (1, 11,3);
Trigger before insert fired
insert into c values (1, 11,3)
*
ERROR at line 1:
ORA-00001: unique constraint (XXX.CHILD_UK) violated

When we work around the not null refusal reason by supplying a not null value to the m1 column and re-insert the same statement, this time it is the check constraint m3 in (1,2) that raises up

SQL> insert into c values (1, 11,3);
Trigger before insert fired
insert into c values (1, 11,3)
*
ERROR at line 1:
ORA-02290: check constraint (XXX.CHILD_CK) violated

Despite the Foreign key is placed on the second column of the table, it is the check constraint on the third column that fires up.

And when all constraints are valid, the foreign key integrity constraint is finally checked as shown below:

SQL> insert into c values (1, 11,1);

Trigger before insert fired
Trigger after insert fired

insert into c values (1, 11,1)
*
ERROR at line 1:
ORA-02291: integrity constraint (XXX.CHILD_FK) violated - parent key not found

Conclusion

I have been very often asking myself (or being asked) the same question when confronted to such a kind of situation: what is the order of the constraint/trigger execution? And for each time I was obliged to spend few minutes creating a simple model in order to come up with a reliable answer. From now and on, I am sure that I will not redo this work as far as I definitely know that I wrote the answer into an article that has been published somewhere on the net.

In passing, spot why I am encouraging people to write:

  •  They will definitely not redo the same work again
  •  They will definitely find easily what they have done in this context
  • They  will definitely let other readers validate or correct their answer

July 2, 2014

Execution plan of a cancelled query

Filed under: explain plan — hourim @ 2:47 pm

I have been reminded by this otn thread that I have to write something about the usefulness of an execution plan pulled from memory of a query that has been cancelled due to its long running execution time. Several months ago I have been confronted to the same issue where one of my queries was desperately running sufficiently slow so that I couldn’t resist the temptation to cancel it.

It is of course legitimate to ask in such a situation about the validity of the corresponding execution plan? Should we wait until the query has finished before starting investigating where things went wrong? What if the query takes more than 20 minutes to complete? Will we wait 20 minutes each time before getting an exploitable execution plan?

The answer of course is no. You shouldn’t wait until the end of the query. Once the query has started you could get its execution plan from memory using its sql_id. And you will be surprised that you might already be able to found a clue of what is wrong with your query even in this ‘’partial’’ execution plan.

Let me model an example and show you what I am meaning.

    create table t1
       as
       with generator as (
           select  --+ materialize
               rownum id
           from dual
           connect by
               level <= 10000)
       select
          rownum                  id,
          trunc(dbms_random.value(1,1000))    n1,
          lpad(rownum,10,'0') small_vc,
          rpad('x',100)       padding
      from
          generator   v1,
          generator   v2
      where
          rownum <= 1000000;

   create index t1_n1 on t1(id, n1);

   create table t2
       as
       with generator as (
           select  --+ materialize
               rownum id
           from dual
           connect by
               level <= 10000)
       select
          rownum                  id,
          trunc(dbms_random.value(10001,20001))   x1,
          lpad(rownum,10,'0') small_vc,
          rpad('x',100)       padding
      from
          generator   v1,
          generator   v2
      where
          rownum <= 1000000;
    
   create index t2_i1 on t2(x1);

   exec dbms_stats.gather_table_stats(user, 't2', method_opt => 'FOR ALL COLUMNS SIZE 1');
   exec dbms_stats.gather_table_stats(user, 't1', method_opt => 'FOR ALL COLUMNS SIZE 1');
   

I have two simple indexed tables t1 and t2 against which I engineered the following query

    select *
       from t1
        where id in
                (Select id from t2 where x1 = 17335)
       order by id ;
   

The above query contains an order by clause on an indexed column (id). In order to make this query running sufficiently slow so that I can cancel it I will change the optimizer mode from all_rows to first_rows so that the CBO will prefer doing an index full scan and avoid the order by operation whatever the cost of the index full scan operation is.

   SQL> alter session set statistics_level=all;

   SQL> alter session set optimizer_mode=first_rows;

   SQL> select *
       from t1
        where id in
                (Select id from t2 where x1 = 17335)
       order by id ;

   107 rows selected.

   Elapsed: 00:02:14.70 -- more than 2 minutes

   SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));

   SQL_ID  9gv186ag1mz0c, child number 0
  -------------------------------------
  Plan hash value: 1518369540
  ------------------------------------------------------------------------------------------------
  | Id  | Operation                    | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
  ------------------------------------------------------------------------------------------------
  |   0 | SELECT STATEMENT             |       |      1 |        |    107 |00:02:39.85 |     108M|
  |   1 |  NESTED LOOPS SEMI           |       |      1 |    100 |    107 |00:02:39.85 |     108M|
  |   2 |   TABLE ACCESS BY INDEX ROWID| T1    |      1 |   1000K|   1000K|00:00:01.03 |   20319 |
  |   3 |    INDEX FULL SCAN           | T1_N1 |      1 |   1000K|   1000K|00:00:00.31 |    2774 |
  |*  4 |   TABLE ACCESS BY INDEX ROWID| T2    |   1000K|      1 |    107 |00:02:37.70 |     108M|
  |*  5 |    INDEX RANGE SCAN          | T2_I1 |   1000K|    100 |    106M|00:00:30.73 |    1056K|
  ------------------------------------------------------------------------------------------------

  Predicate Information (identified by operation id):
  ---------------------------------------------------
     4 - filter("ID"="ID")
     5 - access("X1"=17335)
 

And the plan_hash_value2 of this query is:

  SQL> @phv2 9gv186ag1mz0c

  SQL_ID        PLAN_HASH_VALUE CHILD_NUMBER       PHV2
  ------------- --------------- ------------ ----------
  9gv186ag1mz0c      1518369540            0 2288661314
  

The query took more than 2 minutes to complete and for the purpose of this blog I didn’t cancelled it in order to get its final execution plan shown above. Now I am going to re-execute it again and cancel it after few seconds.

    SQL> select *
       from t1
        where id in
                (Select id from t2 where x1 = 17335)
       order by id ;

   ^C
   C:\>
   

And here below is the corresponding partial plan pulled from memory

  SQL> select * from table(dbms_xplan.display_cursor('9gv186ag1mz0c',0,'ALLSTATS LAST'));

  SQL_ID  9gv186ag1mz0c, child number 0
  -------------------------------------
  Plan hash value: 1518369540

  ------------------------------------------------------------------------------------------------
  | Id  | Operation                    | Name  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
  ------------------------------------------------------------------------------------------------
  |   0 | SELECT STATEMENT             |       |      1 |        |     17 |00:00:18.13 |      13M|
  |   1 |  NESTED LOOPS SEMI           |       |      1 |    100 |     17 |00:00:18.13 |      13M|
  |   2 |   TABLE ACCESS BY INDEX ROWID| T1    |      1 |   1000K|    129K|00:00:00.12 |    2631 |
  |   3 |    INDEX FULL SCAN           | T1_N1 |      1 |   1000K|    129K|00:00:00.04 |     361 |
  |*  4 |   TABLE ACCESS BY INDEX ROWID| T2    |    129K|      1 |     17 |00:00:18.26 |      13M|
  |*  5 |    INDEX RANGE SCAN          | T2_I1 |    129K|    100 |     13M|00:00:03.63 |     136K|
  ------------------------------------------------------------------------------------------------

  Predicate Information (identified by operation id):
  ---------------------------------------------------
     4 - filter("ID"="ID")
     5 - access("X1"=17335)

  SQL>  @phv2 9gv186ag1mz0c

  SQL_ID        PLAN_HASH_VALUE CHILD_NUMBER       PHV2
  ------------- --------------- ------------ ----------
  9gv186ag1mz0c      1518369540            0 2288661314
  

The main observations are:

  1.   Both queries followed the same plan hash value
  2.   Both queries have the same phv2
  3.   Both execution plans contain the same operations at the same order of execution
  4.   Both execution plans contain the same Estimations (E-Rows) which is a clear indication that this is obtained at hard parse time

Either from the complete plan or from the ”partial” one we can have the same clue of what is going wrong here (even thought that in this particular case the performance penalty is due to the first_rows mode which has the tendency to avoid an order by on an indexed column at any price). Indeed it is the operation number 4 which is the most time consuming operation and from where there is the most important deviation between the CBO estimations and the Actual rows

So there is nothing that impeaches hurried developers from cancelling a query and getting its execution plan from memory. They can end up with valuable information from the cancelled query plan as well as from the complete plan.

This article is also a reminder for experienced Oracle DBA performance specialists that, there are cases where a difference between estimations done by the CBO and the actual rows obtained during execution time, does not necessary imply an absence of fresh and representative statistics. This difference could be due to a plan coming from a cancelled query.

By the way, I know another situation where an excellent optimal plan is showing a value of Starts*E-Rows diverging greatly from A-Rows and where statistics are perfect.  That will be explained, I hope,  in a different blog article.

July 1, 2014

DDL optimization : french version

Filed under: Oracle — hourim @ 7:16 pm

I wrote an article about DDL optimization which I have submitted to Oracle otn and which will be hopefully published very soon. As you might also know I have been invited to be a member of the Oracle World team in order to spread the use of the Oracle technology in Latin America. This is why the English (original) article has been translated into Portuguese and published in the otn Latin America. The Spanish version will be published very soon as well.

I have also translated the original article to French and have been looking unsuccessfully for a known place where I would have published it so that it will have reached many persons in several countries like France, Belgium, Luxembourg, Switzerland,  North Africa and several other countries too. Having been unable to find that place, I decided to publish it here in my blog. So here it is  DDL Optimization

June 2, 2014

OTN Mena Tour 2014

Filed under: Oracle — hourim @ 9:26 am

This is a brief note to signal that it has been a great pleasure for me to participate in the first Oracle Technology Network Middle East and North Africa tour in Tunisia this may 2014. I prepared a presentation on Adaptive Cursor Sharing in English when few minutes before the start of my presentation the organisation committee asked me to service it in French. The majority of the audience seems not to be at its complete ease with Shakespeare Language. So it was my pleasure to do it in Molière language with few words in English for what seems to be the minority in this conference. The moral of the story is that the Oracle French community is growing so that it needs more than often to have expert speaking French.

 

May 12, 2014

Disjunctive subquery

Filed under: Trouble shooting — hourim @ 1:29 pm

Here it is a SQL query called from a .Net web service which is ”time outing” and to which I have been asked to bring a fix.

SELECT      d.col_id,
            d.col_no,
            d.col_dost_id,
            d.col_r_id,
            d.xxx,
            d.yyy,
            ……..
            d.zzz
      FROM table_mho d
      WHERE (d.COL_UK = ‘LRBRE-12052014’
          OR EXISTS (select 1
                     from table_mho d1
                     where d1.col_id = d.col_id
                       and exists (select 1
                                   from table_mho d2
                                   where d2.COL_UK = ‘LRBRE-12052014’
                                     and d1.master_col_id = d2.col_id
                                     and d2.col_type = 'M' )
                       and d1.col_type = 'S'
                       )
              )
    order by d.col_id;

Looking carefully at the content of this query I have immediately got a clue on what might be happening here: Disjunctive Subquery.

A disjunctive subquery represents a subquery that appears in an OR predicate (disjunction). And the above query has indeed an OR predicate followed by an EXISTS clause:

          OR EXISTS (select 1
                     from table_mho d1
                     where d1.col_id = d.col_id
                       and exists (select 1
                                   from table_mho d2
                                   where d2.COL_UK = ‘LRBRE-12052014’
                                     and d1.master_col_id = d2.col_id
                                     and d2.col_type = 'M' )
                       and d1.col_type = 'S'
                       )

I am not going to dig in the details of disjunctive subqueries and their inability to be unnested by the CBO for releases prior to 12c. I will be writing in a near future (I hope) a general article in which disjunctive subqueries will be explained and popularized via a reproducible model. The goal of this brief blog post is just to show how I have been successful to trouble shoot the above web service performance issue by transforming a disjunctive subquery into an UNION ALL SQL statement so that I gave the CBO an opportunity to choose an optimal plan.

Here it is the sub-optimal plan for the original query

---------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name             | Starts | E-Rows | A-Rows |   A-Time   |
---------------------------------------------------------------------------------------------------
|   1 |  SORT ORDER BY                 |                  |      1 |  46854 |      1 |00:00:10.70 |
|*  2 |   FILTER                       |                  |      1 |        |      1 |00:00:10.70 |
|   3 |    TABLE ACCESS FULL           | TABLE_MHO        |      1 |    937K|    937K|00:00:00.94 |
|   4 |    NESTED LOOPS                |                  |    937K|      1 |      0 |00:00:07.26 |
|*  5 |     TABLE ACCESS BY INDEX ROWID| TABLE_MHO        |    937K|      1 |     60 |00:00:06.65 |
|*  6 |      INDEX UNIQUE SCAN         | COL_MHO_PK       |    937K|      1 |    937K|00:00:04.14 |
|*  7 |     TABLE ACCESS BY INDEX ROWID| TABLE_MHO        |     60 |      1 |      0 |00:00:00.01 |
|*  8 |      INDEX UNIQUE SCAN         | COL_MHO_UK       |     60 |      1 |     60 |00:00:00.01 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("D"."COL_UK"=‘LRBRE-12052014’ OR  IS NOT NULL))
   5 - filter(("D1"."MASTER_COL_ID" IS NOT NULL AND "D1"."COL_TYPE"='S'))
   6 - access("D1"."COL_ID"=:B1)
   7 - filter(("D2"."COL_TYPE"='M' AND "D1"."MASTER_COL_ID"="D2"."COL_ID"))
   8 - access("D2"."COL_UK"=‘LRBRE-12052014’)

Statistics
---------------------------------------------------
          0  recursive calls
          0  db block gets
    3771234  consistent gets
      22748  physical reads
          0  redo size
       1168  bytes sent via SQL*Net to client
        244  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          1  rows processed

Note the apparition of the FILTER operation (n°2) which is less efficient. One of the dramatic consequences of that is the NESTED LOOP operation (n°4) which has been started 937,000 times without producing any rows but nevertheless generating almost 4 millions of buffer gets. Because of this disjunctive subquery, Oracle is not able to merge the subquery clause with the rest of the query in order to consider another optimal path.

There is a simple technique if you want to re-write the above query in order to get rid of the disjunctive subquery: use of an UNION ALL as I did for my original query (bear in mind that in my actual case COL_UK column is NOT NULL)

SELECT ww.**
FROM
(SELECT     d.col_id,
            d.col_no,
            d.col_dost_id,
            d.col_r_id,
            d.xxx,
            d.yyy,
            ……..
            d.zzz
      FROM table_mho d
      WHERE d.COL_UK = ‘LRBRE-12052014’
UNION ALL
SELECT      d.col_id,
            d.col_no,
            d.col_dost_id,
            d.col_r_id,
            d.xxx,
            d.yyy,
            ……..
            d.zzz
      FROM table_mho d
      WHERE d.COL_UK != ‘LRBRE-12052014’
      AND EXISTS (select 1
                     from table_mho d1
                     where d1.col_id = d.col_id
                       and exists (select 1
                                   from table_mho d2
                                   where d2.COL_UK = ‘LRBRE-12052014’
                                     and d1.master_col_id = d2.col_id
                                     and d2.col_type = 'M' )
                       and d1.col_type = 'S'
                       )
              )
) ww
 order by ww.col_id;

And here it is the new corresponding optimal plan

------------------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name                  | Starts | E-Rows | A-Rows |   A-Time   |
------------------------------------------------------------------------------------------------------------
|   1 |  SORT ORDER BY                     |                       |      1 |      2 |      1 |00:00:00.01 |
|   2 |   VIEW                             |                       |      1 |      2 |      1 |00:00:00.01 |
|   3 |    UNION-ALL                       |                       |      1 |        |      1 |00:00:00.01 |
|   4 |     TABLE ACCESS BY INDEX ROWID    | TABLE_MHO             |      1 |      1 |      1 |00:00:00.01 |
|*  5 |      INDEX UNIQUE SCAN             | COL_MHO_UK            |      1 |      1 |      1 |00:00:00.01 |
|   6 |     NESTED LOOPS                   |                       |      1 |      1 |      0 |00:00:00.01 |
|   7 |      VIEW                          | VW_SQ_1               |      1 |      1 |      0 |00:00:00.01 |
|   8 |       HASH UNIQUE                  |                       |      1 |      1 |      0 |00:00:00.01 |
|   9 |        NESTED LOOPS                |                       |      1 |      1 |      0 |00:00:00.01 |
|* 10 |         TABLE ACCESS BY INDEX ROWID| TABLE_MHO             |      1 |      1 |      0 |00:00:00.01 |
|* 11 |          INDEX UNIQUE SCAN         | COL_MHO_UK            |      1 |      1 |      1 |00:00:00.01 |
|* 12 |         TABLE ACCESS BY INDEX ROWID| TABLE_MHO             |      0 |      1 |      0 |00:00:00.01 |
|* 13 |          INDEX RANGE SCAN          | COL_COL_MHO_FK_I      |      0 |     62 |      0 |00:00:00.01 |
|* 14 |      TABLE ACCESS BY INDEX ROWID   | TABLE_MHO             |      0 |      1 |      0 |00:00:00.01 |
|* 15 |       INDEX UNIQUE SCAN            | COL_MHO_PK            |      0 |      1 |      0 |00:00:00.01 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access("D"."COL_UK"=‘LRBRE-12052014’)
  10 - filter("D2"."COL_TYPE"='M')
  11 - access("D2"."COL_UK"=‘LRBRE-12052014’)
  12 - filter("D1"."COL_TYPE"='S')
  13 - access("D1"."MASTER_COL_ID"="D2"."COL_ID")
       filter("D1"."MASTER_COL_ID" IS NOT NULL)
  14 - filter("D"."COL_UK"<>‘LRBRE-12052014’)
  15 - access("COL_ID"="D"."COL_ID")

Statistics
------------------------------------------------------
          0  recursive calls
          0  db block gets
          8  consistent gets
          0  physical reads
          0  redo size
       1168  bytes sent via SQL*Net to client
        403  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          1  rows processed

I went from a massif 3,771,234 of consistent gets to just 8 logical I/O. The subquery has been transformed into an in-line view (VW_SQ_1) while the not always good FILTER operation disappeared letting the place to rapid operations accurately estimated by the CBO without re-computing the statistics in between

Bottom Line:  Trouble shooting a query performance problem can sometime be achieved by reformulating the query so that you give the CBO a way to circumvent a transformation it can’t do with the original query.

May 7, 2014

ORA-04030 when parsing XML messages

Filed under: Trouble shooting — hourim @ 2:56 pm

Two days ago I have been asked to look at the following error which occurred in a PRE-PRODUCTION data base just few weeks before going live:

ORA-04030: out of process memory when trying to allocate 4032 bytes (qmxdGetElemsBy,qmemNextBuf:alloc)

The aim of this article is to explain how (with the help of my colleagues) I succeeded to circumvent this ORA-04030 error.

First, I was aware that ORA-04031 is linked to an issue with the SGA while the current ORA-04030 is linked with an abnormal consummation of PGA memory.Unfortunately, having no prior acquaintances with this application makes the issue hard to solve. All what I have been said is that when the corresponding PRE-PRODUCTION application has been triggered to parse 10 XML messages with 3MB of size for each message, it crashes during the parsing process of the 6th xml message with the above error.

I asked, nevertheless, the customer whether treating simultaneously 10 x 3MB XML messages is a situation that is possible to happen in their real life PRODUCTION application or they are just trying to fudge the database with extreme situations. And the answer was, to my disappointment, YES. This kind of XML load is very plausible once the application will go live.

I asked (again) if I can have those 10 XML messages in order to inject them into a DEVELOPMENT data base. Once they handled me a copy of the desired messages, I tried to parse them in the DEVELOPMENT instance which, hopefully (yes hopefully) crashes with the same error.

There is an ORA-04030 diagnostic tool developed by Oracle Support which, if you give it the corresponding alert log file, will generate for your system a set of recommendations that if implemented might solve the issue.

This tool when provided with the corresponding trace file, suggested us, among many other suggestions, to:

  1. Set the hidden parameter _realfree_heap_pagesize_hint
  2. Use bulk collect with the limit clause and gave us a link showing how to do that following Steven Feuerstein  article published on oracle magazine

Taken into account my ignorance of the application business and how the code has been exactly managed to parse the xml messages, I was looking to use a tactical fix, which will be safe, necessitate a minimum of tests and with a big chance of no side effects. As far as I was in a development database I instrumented (added extra logging) the PL/SQL code until I have clearly isolated the piece of code which is causing the error:


BEGIN
   -- New parser
       lx_parser := xmlparser.newParser;

   -- Parse XML file
       xmlparser.parseClob(lx_parser,pil_xml_file);

 
   --pil_xml_file);
       lxd_docxml := xmlparser.getDocument(lx_parser);

   -- get all elements
         lxd_element := xmldom.GETDOCUMENTELEMENT(lxd_docxml);

   FOR li_metadata_id IN 1..pit_metadata.COUNT
   LOOP
     IF pit_metadata(li_metadata_id).MXTA_FATHER_ID IS NULL
     THEN
       gi_num_row_table := 1;
       P_PARSE_XML_ELEMENTS(lxd_element
                           ,NULL
                           ,pit_metadata
                           ,li_metadata_id
                           ,pit_metadata(li_metadata_id).TAG_PATH
                            );
     END IF;
   END LOOP;
   …./….
 END;

The above piece of code is called in a loop and is traumatizing the PGA memory.

With the help of a colleague, we decided to use a tactical fix which consists of freeing up the pga memory each time a xml document is treated using the Oracle xmldom.freedocument API. In other words, I added the following piece of code at the end of the above stored procedure:


     END IF;

   END LOOP;

   …./….

xmldom.freedocument(lxd_docxml); -- added this

END;

And guess what?

The ORA-04030 ceases to disturb us.

Bottom Line: When troubleshooting an issue in a real life production application try first to reproduce it in a TEST environment. Then instrument again the PL/SQL code in this TEST environment until you narrow the error close to its calling program. Then look to a tactical fix that will not need a lot of non-regression tests and which should be of limited side effect.

April 22, 2014

Trouble shooting a performance issue

Filed under: Trouble shooting — hourim @ 2:34 pm

Here it is a case of Trouble shooting a performance issue taken from a real life production system that I wanted to share with you. There is an Oracle stored procedure continuously activated via a scheduled job. Its goal is to parse incoming XML messages before accepting them into or rejecting them from a production application. 80% of the XML messages are treated in an acceptable execution time. But 20% of those messages when injected into the queue are hanging and slowing down the database with a side effect on the execution time of the normal messages. My strategy was to ask for one of those problematic XML messages from PRODUCTION instance and execute it in the TEST data base with a 10046 trace file. So I did, and among many observations, I stopped at the following select I wanted to share with you the method I followed to trouble shoot it:

SELECT *
FROM
 ( SELECT WD.WAGD_ID,WH.WAGH_ID,WD.WAG_ID,WH.ARRIVAL_DATE,WH.DEPARTURE_DATE,
  WD.CHECK_REPERAGE ,ROW_NUMBER() OVER (PARTITION BY WH.WAGD_ID ORDER BY
  WH.ARRIVAL_DATE) RN FROM SRI_ORDER_DETAILS WD ,SRI_ORDER_HISTORIC WH
  WHERE WD.FROM_COM_ID = :B1 AND WD.WAGD_ID = WH.WAGD_ID AND WH.PDES_ID IS
  NULL ) WHERE RN = 1 

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        2      0.00       0.00          0          0          0           0
Execute   3124      0.28       0.27          0          0          0           0
Fetch     3124     31.79      32.05          0   10212356          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total     6250     32.07      32.33          0   10212356          0           0

Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 151     (recursive depth: 1)
Number of plan statistics captured: 2

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         0          0          0  VIEW  (cr=3269 pr=0 pw=0 time=14483 us cost=709 size=1245 card=15)
         0          0          0   WINDOW SORT PUSHED RANK (cr=3269 pr=0 pw=0 time=14480 us cost=709 size=660 card=15)
         0          0          0    MERGE JOIN  (cr=3269 pr=0 pw=0 time=14466 us cost=708 size=660 card=15)
         0          0          0     TABLE ACCESS BY INDEX ROWID SRI_ORDER_DETAILS (cr=3269 pr=0 pw=0 time=14464 us cost=532
     15139      15139      15139       INDEX FULL SCAN SRI_WAGD_PK (cr=2822 pr=0 pw=0 time=2305 us cost=80 size=0
         0          0          0     SORT JOIN (cr=0 pr=0 pw=0 time=0 us cost=176 size=320528 card=12328)
         0          0          0      TABLE ACCESS FULL SRI_ORDER_HISTORIC (cr=0 pr=0 pw=0 time=0 us cost=67 size=320528
********************************************************************************

Notice that the above select undergone 3,124 executions which have necessitated more than one million of logical reads to end up finally by producing 0 records. There is for sure a way to reduce this enormous waste of time and energy in manipulating non necessary millions of logical I/O. My experience in trouble shooting performance issues shows me that this kind of excessive logical I/O can be very often linked to two situations:

  •  Use of a not very precise index to access a table via an index rowid and throwing away the majority of rows
  •  The first operation of the plan is the one that manipulated the big number of rows in the entire row source execution plan

Coincidently, the first operation in my row source plan is an INDEX FULL SCAN and it is the operation generating the maximum number of rows (15,139). Well, now I can say that in order to tune this select I need first to have a more precise access to the SRI_ORDER_DETAILS table. Here how I managed to solve this issue:

1. Get a real time execution plan from memory

The desired execution plan confirming the row source stats from the Tkprof file is given here below:

----------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name                | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                     |      1 |        |      1 |00:00:00.03 |    3469 |
|*  1 |  VIEW                          |                     |      1 |     13 |      1 |00:00:00.03 |    3469 |
|*  2 |   WINDOW SORT PUSHED RANK      |                     |      1 |     13 |      1 |00:00:00.03 |    3469 |
|   3 |    MERGE JOIN                  |                     |      1 |     13 |      1 |00:00:00.03 |    3469 |
|*  4 |     TABLE ACCESS BY INDEX ROWID| SRI_ORDER_DETAILS   |      1 |     13 |      1 |00:00:00.02 |    3269 |
|   5 |      INDEX FULL SCAN           | SRI_WAGD_PK         |      1 |  15139 |  15139 |00:00:00.01 |    2822 |
|*  6 |     SORT JOIN                  |                     |      1 |  13410 |      1 |00:00:00.01 |     200 |
|*  7 |      TABLE ACCESS FULL         | SRI_ORDER_HISTORIC  |      1 |  13410 |  13410 |00:00:00.01 |     200 |
----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("RN"=1)
   2 - filter(ROW_NUMBER() OVER ( PARTITION BY "WH"."WAGD_ID" ORDER BY "WH"."ARRIVAL_DATE")<=1)
   4 - filter("WD"."FROM_COM_ID"=3813574)
   6 - access("WD"."WAGD_ID"="WH"."WAGD_ID")
       filter("WD"."WAGD_ID"="WH"."WAGD_ID")
   7 - filter("WH"."PDES_ID" IS NULL)

We identify a lot of data (15,139) by FULL SCANNING the primary key index SRI_WAGD_PK . The identified rows via the single PK column (wagd_id) index (operation n°5) are sent back to their parent operation (n°4) which wasted a lot of effort throwing away the entire set of generated rows (except one row for the bind variable I used during my test) by applying filter n°4 (FROM_COM_ID). Notice also how the CBO has made a complete mess of the cardinality estimate for the SORT JOIN operation n°6.

Now that the symptoms have been clearly identified next will be presented the solution

2. Create a new precise index

The solution that immediately came to my mind is to create an index with the FROM_COM_ID column. I was just wondering on how best is to design it?

  •  Using a single column index
  • Or a composite index
  •  And  in case of a composite index will the FROM_COM_ID be put at the beginning of the index or not?

As far as the single PK index contains the WAGD_ID column I was tended to create a composite index starting by the FROM_COM_ID. Something which resembles to this:

SQL> create index mho_test on SRI_ORDER_DETAILS(FROM_COM_ID, wagd_id);

Index created.

SQL> exec dbms_stats.gather_table_stats(user, 'SRI_ORDER_DETAILS', cascade => true, no_invalidate=> false);

PL/SQL procedure successfully completed.

When I re-queried again, the new row source execution plan looked like:

-----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name                      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |                           |      1 |        |      1 |00:00:00.01 |       7 |
|*  1 |  VIEW                           |                           |      1 |     13 |      1 |00:00:00.01 |       7 |
|*  2 |   WINDOW SORT PUSHED RANK       |                           |      1 |     13 |      1 |00:00:00.01 |       7 |
|   3 |    NESTED LOOPS                 |                           |      1 |        |      1 |00:00:00.01 |       7 |
|   4 |     NESTED LOOPS                |                           |      1 |     13 |      1 |00:00:00.01 |       6 |
|   5 |      TABLE ACCESS BY INDEX ROWID| SRI_ORDER_DETAILS         |      1 |     13 |      1 |00:00:00.01 |       3 |
|*  6 |       INDEX RANGE SCAN          | MHO_TEST                  |      1 |     13 |      1 |00:00:00.01 |       2 |
|*  7 |      INDEX RANGE SCAN           | SRI_WAGH_ARRIVAL_DATE_UK  |      1 |      1 |      1 |00:00:00.01 |       3 |
|*  8 |     TABLE ACCESS BY INDEX ROWID | SRI_ORDER_HISTORIC        |      1 |      1 |      1 |00:00:00.01 |       1 |
-----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("RN"=1)
   2 - filter(ROW_NUMBER() OVER ( PARTITION BY "WH"."WAGD_ID" ORDER BY "WH"."ARRIVAL_DATE")<=1)
   6 - access("WD"."FROM_COM_ID"=3813574)
   7 - access("WD"."WAGD_ID"="WH"."WAGD_ID")
   8 - filter("WH"."PDES_ID" IS NULL)

Thanks to this new precise index(MHO_TEST ) we have drastically reduced the total number of logical reads (from 3,469 to 7) and the estimations done by the CBO are almost acceptable which made the CBO prefering a more convenient NESTED LOOPS instead of the previous inadequately estimated SORT JOIN operation.

3. Conclusion

When trouble shooting performance issue in real life production system, look carefully at INDEX access operations that generate a lot of rows and for which the parent table access operation is discarding the majority of those corresponding rows. In such case there is certainly a way to create a precise index which not only will avoid that waste of effort but might conduct the CBO to have better estimations and hence an optimal plan

April 10, 2014

DBMS_SCHEDULER and CALL statement

Filed under: Dbms_scheduler — hourim @ 3:10 pm

This is a brief reminder for those using intensively Oracle dbms_scheduler package to schedule a launch of a stored PL/SQL procedure. Recently I was investigating a wide range performance problem via a 60 minutes AWR snapshot until the following action that appears in the TOP SQL ordered by Gets has captured my attention:

SQL ordered by Gets

Buffer Gets Executions Gets per Exec %Total Elapsed Time (s) %CPU %IO SQL Id SQL Module SQL Text
858,389,984 2 429,194,992.00 89.58 3,606.79 97.2 1.9 6cjn7jnzpc160 DBMS_SCHEDULER call xxx_PA_REDPC.P_EXPORT_DA…
542,731,679 32,021 16,949.24 56.64 3,201.40 98.4 .6 4bnh2nc4shkz3 w3wp.exe SELECT WAGD_ID FROM S1

It is not the enormous Logical I/O done by this scheduled stored procedure that has retained my attention but it is the appearance of the call statement in the corresponding SQL Text.

Where does this come from?

Let’s get the DDL of the corresponding program

 SELECT dbms_metadata.get_ddl('PROCOBJ','XXX_PARSE_MESSAGE_PRG', 'SXXX') from dual;

Which gives this

 BEGIN
  DBMS_SCHEDULER.CREATE_PROGRAM
    (program_name         => 'XXX_PARSE_MESSAGE_PRG'
    ,program_type         => 'STORED_PROCEDURE'
    ,program_action       => 'XXX_PA_REDPC.P_EXPORT_DA.P_xxxx'
    ,number_of_arguments  => 0
    ,enabled              => FALSE
    ,comments             => NULL);
 COMMIT;
 END;
 /
 

I have  arranged a little bit the generated DDL script for clarity.

Now things become clear.

When you define your program using

     program_type         => 'STORED_PROCEDURE'
 

Then your job will be executed using the SQL command call

    call XXX_PA_REDPC.P_EXPORT_DA..'
 

This is in contrast to when you define your program using

  program_type         => 'PLSQL_BLOCK'
 

which has the consequence of making your job being executed using an anonymous PL/SQL block

 BEGIN
    XXX_PA_REDPC.P_EXPORT_DA..
 END;
 

And now the question: how would you prefer your scheduled stored procedure to be executed?

  1. via the  SQL call statement
  2. via the anonymous PL/SQL block

Well, after a brief research on My Oracle support I found a bug that seems closely related to it

DBMS_SCHEDULER Is Suppressing NO_DATA_FOUND Exceptions for Jobs that Execute Stored Procedures (Doc ID 1331778.1)

In fact, there is one fundamental threat when opting for the call statement. Consider this

 SQL> create or replace procedure p1 as
    begin
     insert into t values (1,2);
      raise no_data_found;
      commit;
    end;
 /
Procedure created.

SQL> select count(1) from t;   

   COUNT(1)
   ----------        
    0

I am going to call this procedure using the call statement which normally will raise a no_data_found exception and the inserted data into table t will be rolled back

SQL> call p1();

Call completed.

SQL> select * from t;

        N1         N2
---------- ----------
         1          2

Despite the raise of the no_data_found exception inserted data has been committed and the exception ignored. This will not have happened if I have managed to execute the stored procedure using an anonymous PL/SQL block as shown below:

SQL> truncate table t;

Table truncated.

SQL> begin
p1();
end;
/
begin
*
ERROR at line 1:
ORA-01403: no data found
ORA-06512: at "XXX.P1", line 4
ORA-06512: at line 2

SQL> select count(1) from t;

COUNT(1)
----------
0

So, please be informed :-)

March 24, 2014

Redundant Indexes

Filed under: Index — hourim @ 11:28 am

I am very often warning:  dropping redundant indexes in production is not 100% safe. I have instead always been advocating paying a careful attention during design time to avoid creating redundant indexes. In my professional experience I have realized that, it is very often when creating indexes to cover the lock threat of unindexed foreign key constraints, that developers are creating unintentionally redundant indexes. It has been irritating me so that I have created a script which checks if a given foreign key is already indexed or not before creating supplementary non wanted indexes damaging the DML part of the application.

Having said that I still have not defined what is redundant indexes

Indexes ind_1 and ind_2 are said redundant when leading columns of one index are a superset of the leading columns of the other one

For example

Ind_1 (a,b) and ind_2(a,b,c) are redundant because ind_2 contains index ind_1.

If you are at design time it is obvious that you should not create index ind_1. However, once in production, it is not 100% safe to drop index ind_1 without any impact. There are, for sure, occasions were the clustering factor of ind_2 is so dramatic when compared to index ind_1 so that if the later index is dropped the optimizer will opt for a full table scan traumatizing the queries that were perfectly happy with the dropped redundant index.

I can also show you another type of what people might consider redundant indexes while they aren’t. Consider the following model where I have created a range partitioned table (mho_date being the partition key and I have created 1493 partitions) and two indexes as shown below

desc partitioned_tab

Name                            Null?    Type
------------------------------- -------- ------------
1      MHO_ID                          NOT NULL NUMBER(10)
2      MHO_DATE                        NOT NULL DATE
3      MHO_CODE                        NOT NULL VARCHAR2(1)
4      MHO_TYP_ID                      NOT NULL NUMBER(10)

create index local_ind_1 on partitioned_tab (mho_typ_id,mho_code) local;

create index global_ind_1 on partitioned_tab (mho_typ_id);

I am going to execute a simple query against the above engineered partitioned table.

select * from partitioned_tab where mho_typ_id = 0;

Which, in the presence of the above two indexes is honored via the following execution plan

----------------------------------------------------------------------
Plan hash value: 3042058313
----------------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name            | Rows  |Cost (%CPU)| Pstart| Pstop |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |                 |  1493 | 1496   (0)|       |       |
|   1 |  TABLE ACCESS BY GLOBAL INDEX ROWID BATCHED| PARTITIONED_TAB |  1493 | 1496   (0)| ROWID | ROWID |
|*  2 |   INDEX RANGE SCAN                         | GLOBAL_IND_1    |  1493 |    4   (0)|       |       |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("MHO_TYP_ID"=0)

Statistics
-------------------------------------------------------
48    recursive calls
0     db block gets
3201  consistent gets
0     physical reads
0     redo size
51244 bytes sent via SQL*Net to client
1632  bytes received via SQL*Net from client
101   SQL*Net roundtrips to/from client
7     sorts (memory)
0     sorts (disk)
1493  rows processed

As you might already know I am one of the fans of SQLT tool developed by Carlos Sierra. And here below what this tool said about redundant indexes in this particular case

Redundant_indexes

It is clearly suggesting considering dropping the redundant index global_ind_1 index

So let’s follow this advice and see what happens. Hopefully with the recent Oracle release I will first make the index invisible ( by the way that’s a good point to signal for Carlos Sierra and Mauro Pagano for them to suggest setting first the index invisible before considering dropping it)

alter index global_ind_1 invisible;

select * from partitioned_tab where mho_typ_id = 0;

-------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name            | Rows  | Bytes | Cost (%CPU)| Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |                 |  1493 | 23888 |  2985   (1)|       |       |
|   1 |  PARTITION RANGE ALL                       |                 |  1493 | 23888 |  2985   (1)|     1 |  1493 |
|   2 |   TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| PARTITIONED_TAB |  1493 | 23888 |  2985   (1)|     1 |  1493 |
|*  3 |    INDEX RANGE SCAN                        | LOCAL_IND_1     |  1493 |       |  1492   (0)|     1 |  1493 |
-------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("MHO_TYP_ID"=0)

Statistics
--------------------------------------------------------
48    recursive calls
0     db block gets
4588  consistent gets
0     physical reads
0     redo size
47957 bytes sent via SQL*Net to client
1632  bytes received via SQL*Net from client
101   SQL*Net roundtrips to/from client
7     sorts (memory)
0     sorts (disk)
1493  rows processed

By making the ”redundant” index invisible we went from a smooth one global index range scan with a cost of 4 to 1493 index range scans with a cost of 1492 with an additional 1387 logical reads.

Bottom line: You should always consider dropping (avoiding) redundant index at design time. Once in production consider making them invisible first before thinking carefully about dropping them.

PS : if you want to create the table the script can be found partitioned table

Next Page »

Theme: Rubric. Get a free blog at WordPress.com

Mohamed Houri’s Oracle Notes

Qui se conçoit bien s’énonce clairement

Oracle Database 11g

Oracle Database

Raheel's Blog

Things I have learnt as Oracle DBA

Coskan's Approach to Oracle

What I learned about Oracle

So Many Oracle Manuals, So Little Time

“Books to the ceiling, Books to the sky, My pile of books is a mile high. How I love them! How I need them! I'll have a long beard by the time I read them”—Lobel, Arnold. Whiskers and Rhymes. William Morrow & Co, 1988.

EU Careers info

Your career in the European Union

Carlos Sierra's Tools and Tips

Tools and Tips for Oracle Performance and SQL Tuning

Oracle Scratchpad

Just another Oracle weblog

Tanel Poder's blog: Responsible data management

Linux, Oracle, Exadata and Hadoop.

OraStory

Dominic Brooks on Oracle Performance, Tuning, Data Quality & Sensible Design ... (Now with added Sets Appeal)

Follow

Get every new post delivered to your Inbox.

Join 83 other followers