April | 2014 | Mohamed Houri’s Oracle Notes

April 22, 2014

Trouble shooting a performance issue

Filed under: Trouble shooting — hourim @ 2:34 pm

Here it is a case of Trouble shooting a performance issue taken from a real life production system that I wanted to share with you. There is an Oracle stored procedure continuously activated via a scheduled job. Its goal is to parse incoming XML messages before accepting them into or rejecting them from a production application. 80% of the XML messages are treated in an acceptable execution time. But 20% of those messages when injected into the queue are hanging and slowing down the database with a side effect on the execution time of the normal messages. My strategy was to ask for one of those problematic XML messages from PRODUCTION instance and execute it in the TEST data base with a 10046 trace file. So I did, and among many observations, I stopped at the following select I wanted to share with you the method I followed to trouble shoot it:

SELECT *
FROM
 ( SELECT WD.WAGD_ID,WH.WAGH_ID,WD.WAG_ID,WH.ARRIVAL_DATE,WH.DEPARTURE_DATE,
  WD.CHECK_REPERAGE ,ROW_NUMBER() OVER (PARTITION BY WH.WAGD_ID ORDER BY
  WH.ARRIVAL_DATE) RN FROM SRI_ORDER_DETAILS WD ,SRI_ORDER_HISTORIC WH
  WHERE WD.FROM_COM_ID = :B1 AND WD.WAGD_ID = WH.WAGD_ID AND WH.PDES_ID IS
  NULL ) WHERE RN = 1 

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        2      0.00       0.00          0          0          0           0
Execute   3124      0.28       0.27          0          0          0           0
Fetch     3124     31.79      32.05          0   10212356          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total     6250     32.07      32.33          0   10212356          0           0

Misses in library cache during parse: 0
Optimizer mode: ALL_ROWS
Parsing user id: 151     (recursive depth: 1)
Number of plan statistics captured: 2

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         0          0          0  VIEW  (cr=3269 pr=0 pw=0 time=14483 us cost=709 size=1245 card=15)
         0          0          0   WINDOW SORT PUSHED RANK (cr=3269 pr=0 pw=0 time=14480 us cost=709 size=660 card=15)
         0          0          0    MERGE JOIN  (cr=3269 pr=0 pw=0 time=14466 us cost=708 size=660 card=15)
         0          0          0     TABLE ACCESS BY INDEX ROWID SRI_ORDER_DETAILS (cr=3269 pr=0 pw=0 time=14464 us cost=532
     15139      15139      15139       INDEX FULL SCAN SRI_WAGD_PK (cr=2822 pr=0 pw=0 time=2305 us cost=80 size=0
         0          0          0     SORT JOIN (cr=0 pr=0 pw=0 time=0 us cost=176 size=320528 card=12328)
         0          0          0      TABLE ACCESS FULL SRI_ORDER_HISTORIC (cr=0 pr=0 pw=0 time=0 us cost=67 size=320528
********************************************************************************

Notice that the above select undergone 3,124 executions which have necessitated more than one million of logical reads to end up finally by producing 0 records. There is for sure a way to reduce this enormous waste of time and energy in manipulating non necessary millions of logical I/O. My experience in trouble shooting performance issues shows me that this kind of excessive logical I/O can be very often linked to two situations:

Use of a not very precise index to access a table via an index rowid and throwing away the majority of rows
The first operation of the plan is the one that manipulated the big number of rows in the entire row source execution plan

Coincidently, the first operation in my row source plan is an INDEX FULL SCAN and it is the operation generating the maximum number of rows (15,139). Well, now I can say that in order to tune this select I need first to have a more precise access to the SRI_ORDER_DETAILS table. Here how I managed to solve this issue:

1. Get a real time execution plan from memory

The desired execution plan confirming the row source stats from the Tkprof file is given here below:

----------------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name                | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                     |      1 |        |      1 |00:00:00.03 |    3469 |
|*  1 |  VIEW                          |                     |      1 |     13 |      1 |00:00:00.03 |    3469 |
|*  2 |   WINDOW SORT PUSHED RANK      |                     |      1 |     13 |      1 |00:00:00.03 |    3469 |
|   3 |    MERGE JOIN                  |                     |      1 |     13 |      1 |00:00:00.03 |    3469 |
|*  4 |     TABLE ACCESS BY INDEX ROWID| SRI_ORDER_DETAILS   |      1 |     13 |      1 |00:00:00.02 |    3269 |
|   5 |      INDEX FULL SCAN           | SRI_WAGD_PK         |      1 |  15139 |  15139 |00:00:00.01 |    2822 |
|*  6 |     SORT JOIN                  |                     |      1 |  13410 |      1 |00:00:00.01 |     200 |
|*  7 |      TABLE ACCESS FULL         | SRI_ORDER_HISTORIC  |      1 |  13410 |  13410 |00:00:00.01 |     200 |
----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("RN"=1)
   2 - filter(ROW_NUMBER() OVER ( PARTITION BY "WH"."WAGD_ID" ORDER BY "WH"."ARRIVAL_DATE")<=1)
   4 - filter("WD"."FROM_COM_ID"=3813574)
   6 - access("WD"."WAGD_ID"="WH"."WAGD_ID")
       filter("WD"."WAGD_ID"="WH"."WAGD_ID")
   7 - filter("WH"."PDES_ID" IS NULL)

We identify a lot of data (15,139) by FULL SCANNING the primary key index SRI_WAGD_PK . The identified rows via the single PK column (wagd_id) index (operation n°5) are sent back to their parent operation (n°4) which wasted a lot of effort throwing away the entire set of generated rows (except one row for the bind variable I used during my test) by applying filter n°4 (FROM_COM_ID). Notice also how the CBO has made a complete mess of the cardinality estimate for the SORT JOIN operation n°6.

Now that the symptoms have been clearly identified next will be presented the solution

2. Create a new precise index

The solution that immediately came to my mind is to create an index with the FROM_COM_ID column. I was just wondering on how best is to design it?

Using a single column index
Or a composite index
And in case of a composite index will the FROM_COM_ID be put at the beginning of the index or not?

As far as the single PK index contains the WAGD_ID column I was tended to create a composite index starting by the FROM_COM_ID. Something which resembles to this:

SQL> create index mho_test on SRI_ORDER_DETAILS(FROM_COM_ID, wagd_id);

Index created.

SQL> exec dbms_stats.gather_table_stats(user, 'SRI_ORDER_DETAILS', cascade => true, no_invalidate=> false);

PL/SQL procedure successfully completed.

When I re-queried again, the new row source execution plan looked like:

-----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name                      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |                           |      1 |        |      1 |00:00:00.01 |       7 |
|*  1 |  VIEW                           |                           |      1 |     13 |      1 |00:00:00.01 |       7 |
|*  2 |   WINDOW SORT PUSHED RANK       |                           |      1 |     13 |      1 |00:00:00.01 |       7 |
|   3 |    NESTED LOOPS                 |                           |      1 |        |      1 |00:00:00.01 |       7 |
|   4 |     NESTED LOOPS                |                           |      1 |     13 |      1 |00:00:00.01 |       6 |
|   5 |      TABLE ACCESS BY INDEX ROWID| SRI_ORDER_DETAILS         |      1 |     13 |      1 |00:00:00.01 |       3 |
|*  6 |       INDEX RANGE SCAN          | MHO_TEST                  |      1 |     13 |      1 |00:00:00.01 |       2 |
|*  7 |      INDEX RANGE SCAN           | SRI_WAGH_ARRIVAL_DATE_UK  |      1 |      1 |      1 |00:00:00.01 |       3 |
|*  8 |     TABLE ACCESS BY INDEX ROWID | SRI_ORDER_HISTORIC        |      1 |      1 |      1 |00:00:00.01 |       1 |
-----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("RN"=1)
   2 - filter(ROW_NUMBER() OVER ( PARTITION BY "WH"."WAGD_ID" ORDER BY "WH"."ARRIVAL_DATE")<=1)
   6 - access("WD"."FROM_COM_ID"=3813574)
   7 - access("WD"."WAGD_ID"="WH"."WAGD_ID")
   8 - filter("WH"."PDES_ID" IS NULL)

Thanks to this new precise index(MHO_TEST ) we have drastically reduced the total number of logical reads (from 3,469 to 7) and the estimations done by the CBO are almost acceptable which made the CBO prefering a more convenient NESTED LOOPS instead of the previous inadequately estimated SORT JOIN operation.

3. Conclusion

When trouble shooting performance issue in real life production system, look carefully at INDEX access operations that generate a lot of rows and for which the parent table access operation is discarding the majority of those corresponding rows. In such case there is certainly a way to create a precise index which not only will avoid that waste of effort but might conduct the CBO to have better estimations and hence an optimal plan

Comments (5)

April 10, 2014

DBMS_SCHEDULER and CALL statement

Filed under: Dbms_scheduler — hourim @ 3:10 pm

This is a brief reminder for those using intensively Oracle dbms_scheduler package to schedule a launch of a stored PL/SQL procedure. Recently I was investigating a wide range performance problem via a 60 minutes AWR snapshot until the following action that appears in the TOP SQL ordered by Gets has captured my attention:

SQL ordered by Gets

Buffer Gets	Executions	Gets per Exec	%Total	Elapsed Time (s)	%CPU	%IO	SQL Id	SQL Module	SQL Text
858,389,984	2	429,194,992.00	89.58	3,606.79	97.2	1.9	6cjn7jnzpc160	DBMS_SCHEDULER	call xxx_PA_REDPC.P_EXPORT_DA…
542,731,679	32,021	16,949.24	56.64	3,201.40	98.4	.6	4bnh2nc4shkz3	w3wp.exe	SELECT WAGD_ID FROM S1

It is not the enormous Logical I/O done by this scheduled stored procedure that has retained my attention but it is the appearance of the call statement in the corresponding SQL Text.

Where does this come from?

Let’s get the DDL of the corresponding program

 SELECT dbms_metadata.get_ddl('PROCOBJ','XXX_PARSE_MESSAGE_PRG', 'SXXX') from dual;

Which gives this

 BEGIN
  DBMS_SCHEDULER.CREATE_PROGRAM
    (program_name         => 'XXX_PARSE_MESSAGE_PRG'
    ,program_type         => 'STORED_PROCEDURE'
    ,program_action       => 'XXX_PA_REDPC.P_EXPORT_DA.P_xxxx'
    ,number_of_arguments  => 0
    ,enabled              => FALSE
    ,comments             => NULL);
 COMMIT;
 END;
 /

I have arranged a little bit the generated DDL script for clarity.

Now things become clear.

When you define your program using

     program_type         => 'STORED_PROCEDURE'

Then your job will be executed using the SQL command call

    call XXX_PA_REDPC.P_EXPORT_DA..'

This is in contrast to when you define your program using

  program_type         => 'PLSQL_BLOCK'

which has the consequence of making your job being executed using an anonymous PL/SQL block

 BEGIN
    XXX_PA_REDPC.P_EXPORT_DA..
 END;

And now the question: how would you prefer your scheduled stored procedure to be executed?

via the SQL call statement
via the anonymous PL/SQL block

Well, after a brief research on My Oracle support I found a bug that seems closely related to it

DBMS_SCHEDULER Is Suppressing NO_DATA_FOUND Exceptions for Jobs that Execute Stored Procedures (Doc ID 1331778.1)

In fact, there is one fundamental threat when opting for the call statement. Consider this

 SQL> create or replace procedure p1 as
    begin
     insert into t values (1,2);
      raise no_data_found;
      commit;
    end;
 /
Procedure created.

SQL> select count(1) from t;   

   COUNT(1)
   ----------        
    0

I am going to call this procedure using the call statement which normally will raise a no_data_found exception and the inserted data into table t will be rolled back

SQL> call p1();

Call completed.

SQL> select * from t;

        N1         N2
---------- ----------
         1          2

Despite the raise of the no_data_found exception inserted data has been committed and the exception ignored. This will not have happened if I have managed to execute the stored procedure using an anonymous PL/SQL block as shown below:

SQL> truncate table t;

Table truncated.

SQL> begin
p1();
end;
/
begin
*
ERROR at line 1:
ORA-01403: no data found
ORA-06512: at "XXX.P1", line 4
ORA-06512: at line 2

SQL> select count(1) from t;

COUNT(1)
----------
0

So, please be informed 🙂

Comments (3)

Mohamed Houri’s Oracle Notes

April 22, 2014

Trouble shooting a performance issue

1. Get a real time execution plan from memory

2. Create a new precise index

3. Conclusion

April 10, 2014

DBMS_SCHEDULER and CALL statement

SQL ordered by Gets

Follow Blog via Email

Categories

Blog Stats

Blogs I Follow

Archives