Mohamed Houri’s Oracle Notes

June 16, 2011

Bind variable, shared pool and cursor sharing parameter

Filed under: Trouble shooting — hourim @ 6:23 pm

I was asked to investigate a real life production problem concerning the shared pool (library cache) of the SGA which come to be full so that the only solution which remains available was to stop and restart the data base.  I did immediately start thinking if I have not already been confronted to a similar case until I do realize that I have read something close to this situation in Chapter 4 – Memory Structures – of Tom Kyte book Expert Oracle Data Base Architecture 9i and 10g Programming Techniques and Solution;

So my first step in the trouble shooting path of this memory problem was to issue the following select against the v$sql dynamic view:

SELECT sql_text
      ,executions
FROM v$sql
WHERE sql_text LIKE '%Package_Name.P_UPDATE%'
AND executions <= 2
;

Which returns the following data picture:

sql_text                                           executions
BEGIN Package_Name.P_UPDATE_procedure(p_in_p1          1
BEGIN Package_Name.P_UPDATE_procedure(p_in_p1          1
BEGIN Package_Name.P_UPDATE_procedure(p_in_p1          1
BEGIN Package_Name.P_UPDATE_procedure(p_in_p1          1
BEGIN Package_Name.P_UPDATE_procedure(p_in_p1          1
BEGIN Package_Name.P_UPDATE_procedure(p_in_p1          1
BEGIN Package_Name.P_UPDATE_procedure(p_in_p1          1
BEGIN Package_Name.P_UPDATE_procedure(p_in_p1          1
BEGIN Package_Name.P_UPDATE_procedure(p_in_p1          1
BEGIN Package_Name.P_UPDATE_procedure(p_in_p1          1
BEGIN Package_Name.P_UPDATE_procedure(p_in_p1          1
BEGIN Package_Name.P_UPDATE_procedure(p_in_p1          1
BEGIN Package_Name.P_UPDATE_procedure(p_in_p1          1
BEGIN Package_Name.P_UPDATE_procedure(p_in_p1          1
BEGIN Package_Name.P_UPDATE_procedure(p_in_p1          1
BEGIN Package_Name.P_UPDATE_procedure(p_in_p1          1
BEGIN Package_Name.P_UPDATE_procedure(p_in_p1          1
BEGIN Package_Name.P_UPDATE_procedure(p_in_p1          1
BEGIN Package_Name.P_UPDATE_procedure(p_in_p1          1
BEGIN Package_Name.P_UPDATE_procedure(p_in_p1          1
BEGIN Package_Name.P_UPDATE_procedure(p_in_p1          1
BEGIN Package_Name.P_UPDATE_procedure(p_in_p1          1
BEGIN Package_Name.P_UPDATE_procedure(p_in_p1          1
…/…
BEGIN Package_Name.P_UPDATE_procedure(p_in_p1          1

Thousands calls of the same instructions

BEGIN Package_Name.P_UPDATE_procedure(p_in_p1 ..) END;

executed once for each call. It is clear that  this application is suffering from not using bind variables and is dramatically filling up the library cache of the shared pool. This needs absolutely to be fixed.

Then I asked myself how could they fill up the shared pool while all what they are doing is calling a stored PL/SQL procedure? We all know that when using PL/SQL (static SQL) we should not encounter issues related to bind variables. PL/SQL itself takes care of our code and uses bind variables behind the scene. The only situation where we have to look carefully to the use of bind variable within PL/SQL is when we use Dynamic sql into stored procedures or functions (this is in fact another reason to avoid using dynamic SQL).

So how could they arrive to such a shared pool filled by thousands of same non re-executed PL/SQL calls?

And here where the problem resides: it is true that when using static PL/SQL stored objects, you don’t have to care about using bind variables inside those stored programs but you have to care about the third party which will call your PL/SQL stored procedure; they should do that using input parameters as bind variables; otherwise you will have your shared pool (library cache) full of calls to your stored PL/SQL objects as it is the case here.

Knowing this, I started figuring out the solution. I have identified the application responsible of those thousands of PL/SQL calls.  It is a non Oracle application developed with a language named SNAP and which is calling our PL/SQL stored objects without using bind variables. I explained them that they need to use OCI interface to prepare their statement, and implement it correctly using bind variables via OCIStmtPrepare2 and OCIDefineByPos (http://www.amazon.fr/Troubleshooting-Oracle-Performance-Christian-Antognini/dp/1590599179

Although they agreed doing that they have asked me to find a work around in between. The only dirty work around I was able to suggest them is to alter the session they are connecting to (and via which they are calling the stored PL/SQL procedure) so that the corresponding cursor_sharing value will be set to “FORCE”.

alter session set cursor_sharing=force;

Which they did immediately.

However, the same sql against the v$sql executed after this change was still showing the same picture of thousands of non re-executed calls

sql_text                                          executions
BEGIN Package_Name.P_UPDATE_procedure(p_in_p1          1
BEGIN Package_Name.P_UPDATE_procedure(p_in_p1          1
BEGIN Package_Name.P_UPDATE_procedure(p_in_p1          1
BEGIN Package_Name.P_UPDATE_procedure(p_in_p1          1
BEGIN Package_Name.P_UPDATE_procedure(p_in_p1          1
BEGIN Package_Name.P_UPDATE_procedure(p_in_p1          1
BEGIN Package_Name.P_UPDATE_procedure(p_in_p1          1
BEGIN Package_Name.P_UPDATE_procedure(p_in_p1          1
BEGIN Package_Name.P_UPDATE_procedure(p_in_p1          1
BEGIN Package_Name.P_UPDATE_procedure(p_in_p1          1
BEGIN Package_Name.P_UPDATE_procedure(p_in_p1          1
BEGIN Package_Name.P_UPDATE_procedure(p_in_p1          1
BEGIN Package_Name.P_UPDATE_procedure(p_in_p1          1
BEGIN Package_Name.P_UPDATE_procedure(p_in_p1          1
BEGIN Package_Name.P_UPDATE_procedure(p_in_p1          1
BEGIN Package_Name.P_UPDATE_procedure(p_in_p1          1
BEGIN Package_Name.P_UPDATE_procedure(p_in_p1          1
BEGIN Package_Name.P_UPDATE_procedure(p_in_p1          1
BEGIN Package_Name.P_UPDATE_procedure(p_in_p1          1
BEGIN Package_Name.P_UPDATE_procedure(p_in_p1          1
BEGIN Package_Name.P_UPDATE_procedure(p_in_p1          1
BEGIN Package_Name.P_UPDATE_procedure(p_in_p1          1
BEGIN Package_Name.P_UPDATE_procedure(p_in_p1          1
…/…

What went wrong?

I verified that they have really altered their session to set the cursor_sharing parameter to FORCE!!! So what’s going on here?

The problem in this case is that even if you set the cursor_sharing parameter to FORCE if you still keep the calls like:

BEGIN
Package_Name.P_UPDATE_procedure(p_in_p1,…);
END;

Oracle will not perform bind variables substitution (even when cursor sharing is set to force). In order for this substitution to occur we need to change the above code to be as follows (see http://apress.com/book/view/9781590596364)

   CALL Package_Name.P_UPDATE_procedure(p_in_p1,…);

So they did the change and so we obtain the new picture taken from v$sql:

SELECT
    substr(sql_text,1,55)
   ,executions
FROM v$sql
WHERE sql_text LIKE '%Package_Name.P_UPDATE%'
AND executions <= 2
;

which gives now the new following picture:

sql_text                                                 executions
------------------------------------------------------- ----------
CALL Package_Name.P_UPDATE_procedure (:"SYS_B_00",TO        897
CALL Package_Name.P_UPDATE_procedure (:"SYS_B_00",TO       8380
CALL Package_Name.P_UPDATE_procedure (:"SYS_B_00",TO         24
CALL Package_Name.P_UPDATE_procedure (:"SYS_B_00",TO      18671

Woo. See how the number of executions has gone so high 18.671. The shared pool becomes very attractive now.

It is also known that setting cursor_sharing to force may have unexpected side effects; for example when there exist a function based index having literals in its definition (substr (col_nam, 4, 2) for example), in this case you will notice that your query which was using your function based index starts not using it since you’ve set your cursor sharing to force. This is simply because oracle will change your query predicate from substr (col_name, 4,2) to something like substr (col_name,:SYS_B_0,SYS_B_1) which will not match your query predicate anymore and hence will not use your index.

The bottom line is that setting the cursor_sharing to force has to be a temporary solution and that the definitive solution should be to use correctly bind variables when calling PL/SQL stored object.

June 7, 2011

Trouble shooting read by other sessions wait event

Filed under: Oracle — hourim @ 6:02 pm

Have you ever been asked how you trouble shoot an Oracle performance problem? The first answer to this question that should pop up immediately  is: “is this performance problem instance wide or localized in a SQL or PL/SQL piece of code?”  For the former you need (if licensed to) an AWR report and in order to diagnose where time is spent. For the later you should investigate where time is spent by having an explain plan taken from the memory or by investigating the design and/or the statistics of related data base objects.

But,  bear in mind that in case of instance wide performance problem you should collect an AWR report with a snapshot that corresponds to the time interval when the application was performing badly.  In other words, if you are asked to investigate an instance wide performance problem you should ask the end user how long he has been suffering from this performance problem? If the end user answer is 15 minutes then you should ask to generate an AWR report for a period of 15 minutes and during a period where the application is suffering.

Several weeks ago I received a phone call from one of my customer relating an instance wide performance problem (he said that from time to time the application hangs during 15-20 minutes). Below is the first AWR report I have been sent for investigation

Top 5 Timed Events

Event                          Waits   Time(s)    Avg Wait(ms)    % Total Call Time     Wait Class</span>
enq: TX - row lock contention  8,468    25,344       2,993            78.7             Application
read by other session          714,628  4,604         6               14.3             User I/O
db file sequential read        323,264  1,977         6               6.1              User I/O
CPU time                                171                           .5
db file scattered read          1,885   49           26               .2               User I/O

When I saw that 78% of the response time has been spent on a transaction row lock contention I immediately verified that there are no bitmap indexes nor unindexed foreign keys (using Tom Kyte script http://tkyte.blogspot.com/2009/10/httpasktomoraclecomtkyteunindex.html) on this OLTP 10gR2 data base.

So what else then?

In the process of trouble shooting you should remember; remember what have you read in the past few months or sometime in the past couple of years ago?  Oh yes, I did remember that Jonathan Lewis has blogged on a similar case (http://jonathanlewis.wordpress.com/2008/02/06/trouble-shooting-2/#more-320) .  Then I did understand that the TX – row lock contention wait event could be just a side effect of this read by other session wait event on which I should concentrate my efforts. Typical scenario is that an end user starts an update and locks a couple of rows. This update is followed by a critical query which, in normal situation completes in a fraction of seconds, starts doing an extreme amount of physical I/O. In the meantime other sessions will request the same data from the same block and start waiting for the physical I/O of the first session to complete putting data in the buffer cache. This is the read by other session wait event.

The second day, I was still thinking about the first AWR report when I received a second AWR corresponding to a fresh problem that occurred right an hour ago. And here below is that new report:

Top 5 Timed Events                                         Avg %Total
54     ~~~~~~~~~~~~~~~~~~                                        wait   Call
55     Event                                 Waits    Time (s)   (ms)   Time Wait Class
56     ------------------------------ ------------ ----------- ------ ------ ----------
57     read by other session               896,860       5,670      6   67.7   User I/O
58     db file sequential read             417,699       2,551      6   30.4   User I/O
59     CPU time                                            184           2.2
60     control file parallel write           2,407           3      1    0.0 System I/O
61     log file parallel write               2,921           2      1    0.0 System I/O

All this seems now very coherent. There is no TX-row lock contention wait event in this new report. There is more than 67% of the response time have been spent on the read by other session wait event while the remaining 30% has been allocated to the db file sequential read wait event.

A “read by other session” is a wait event that occurs when one session wants to read a block another session is reading from disk and loading into the memory (buffer cache). In short, I want to read a block from disk and someone else is reading it for me; hence I have to wait. But in my actual situation I have many sessions trying to read the same block from the disk. This means that the object those sessions are trying to read can’t be kept in the buffer cache or the cache is filled by other process so that our hot object can’t stay in the buffer cache generating this “read by other session wait” event and consequently more physical I/O.

So what else then?

Generally, there is a correlation between physical I/O, read by other session and “Sql Ordered by read” part of the AWR. This is why I’ve drilled down to this part to observe this:

SQL ordered by Reads           DB/Inst: xxxx/xxxx  Snaps: 13491-13492
-> Total Disk Reads:         506,742
-> Captured SQL account for      2.0% of Total
Reads              CPU     Elapsed
Physical Reads  Executions    per Exec   %Total Time (s)  Time (s)    SQL Id
-------------- ----------- ------------- ------ -------- --------- -------------
5,203           7         743.3    1.0    10.48     39.39 a6hkd1d84h84p
Module: JDBC Thin Client
SELECT  tab1.ID
      , tab1.col2
      , tab1.id2
      , tab1.col1
      , tab1.col3
      , tab1.col4
      , tab1.col5
      , tab1.col6
      , tab1.col7
      , tab1.col8
      , tab1.col9
      , tab1.col10
      , tab1.col11
      , tab2.ID
      , tab3.col3
     , tab4.col4
FROM tab1 LEFT OUTER JOIN tab2
    ON tab1.acl_id = tab2.ID
LEFT OUTER JOIN tab3
     ON tab1.ID = tab3.node_id
LEFT OUTER JOIN tab4
     ON tab1.ID = tab4.node_id
WHERE tab1.ID = :1
AND tab1.id2 IN (:2, :3, :4, :5, :6, :7, :8);

Ah we’ve narrowed the problem now up to the above query.

So what else then?

Since we’ve the SQL responsible of the important part of physical I/O, let’s drill down to the ‘Segment per physical read’ part of the AWR in order to localize the ‘hot’ object:

Segments by Physical Reads      DB/Inst: xxxx/xxxx  Snaps: 13491-13492
-> Total Physical Reads:         506,742
-> Captured Segments account for    98.8% of Total

Tablespace                      Subobject  Obj.      Physical
Owner         Name    Object Name            Name     Type         Reads  %Total
---------- ---------- -------------------- ---------- ----- ------------ -------
xxxx        xxxx       SYS_C0010893                    INDEX      472,853   93.31
xxxx        xxxx       TABLE_T3                        TABLE        9,051    1.79
xxxx        xxxx       FK_TABLE_T3                     INDEX        7,036    1.39
xxxx        xxxx       SYS_C0010896                    INDEX        2,918     .58
xxxx        xxxx       TABLE_T1                        TABLE        2,252     .44
-------------------------------------------------------------

Oh yes; there is an index there (SYS_C0010893 ) which is consuming 93% of the whole physical I/O.

Finally, I asked to have the explain plan of the above SQL query which I have reproduced here below (explain plan has been taken with the help of DBMS_DISPLAY_AWR package):

-----------------------------------------------------------------------------------------------------------
| Id  | Operation                       | Name                    | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |                         |       |       |  4891 (100)|          |
|   1 |  NESTED LOOPS OUTER             |                         |  8487 |  2320K|  4891   (1)| 00:00:59 |
|   2 |   NESTED LOOPS OUTER            |                         |   758 |   167K|  1313   (1)| 00:00:16 |
|   3 |    NESTED LOOPS OUTER           |                         |   187 | 40392 |   751   (0)| 00:00:10 |
|   4 |     INLIST ITERATOR             |                         |       |       |            |          |
|   5 |      TABLE ACCESS BY INDEX ROWID| TABLE_T1                |   187 | 26367 |   377   (0)| 00:00:05 |
|   6 |       INDEX UNIQUE SCAN         | SYS_C0010893            |   187 |       |   193   (0)| 00:00:03 |
|   7 |     TABLE ACCESS BY INDEX ROWID | TABLE_T2                |     1 |    75 |     2   (0)| 00:00:01 |
|   8 |      INDEX UNIQUE SCAN          | SYS_C0010774            |     1 |       |     1   (0)| 00:00:01 |
|   9 |    INDEX RANGE SCAN             | SYS_C0010896            |     4 |    40 |     3   (0)| 00:00:01 |
|  10 |   TABLE ACCESS BY INDEX ROWID   | TABLE_T3                |    11 |   594 |     5   (0)| 00:00:01 |
|  11 |    INDEX RANGE SCAN             | FK_TABLE_T3             |    11 |       |     3   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------

Where I see the presence of this hot SYS_C0010893 index.
Oh I will not finish without displaying the buffer pool advisory part of the problematic AWR:

            Snap Id      Snap Time         Sessions Curs/Sess
            --------- ------------------- --------- ----------
Begin Snap:   13491     25-Mei-11 09:00:56        36       6.1
End Snap:     13492     25-Mei-11 10:00:30        36       6.1
Elapsed:                59.56 (mins)
DB Time:                109.69 (mins)

 

Cache Sizes
~~~~~~~~~~~                       Begin        End
                                 ---------- ----------
               Buffer Cache:       776M       776M  Std Block Size:         8K
               Shared Pool Size:   96M        96M      Log Buffer:     6,924K

Load Profile
~~~~~~~~~~~~                            Per Second       Per Transaction
                                        ---------------  ---------------
                Redo size:              1,432.57                159.56
                Logical reads:            984.03                109.60
                Block changes:              9.01                  1.00
                Physical reads:           141.79                 15.79
                Physical writes:            1.09                  0.12
                User calls:                47.98                  5.34
                Parses:                     7.37                  0.82
                Hard parses:                0.63                  0.07
                Sorts:                      3.66                  0.41
                Logons:                     0.05                  0.01
                Executes:                  14.46                  1.61
                Transactions:               8.98

% Blocks changed per Read:    0.92    Recursive Call %:    75.36
Rollback per transaction %:   97.99       Rows per Sort:     8.40

Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Buffer Nowait             %:   76.32       Redo NoWait %:  100.00
Buffer  Hit               %:   85.59    In-memory Sort %:  100.00
Library Hit               %:   88.79        Soft Parse %:   91.43
Execute to Parse          %:   49.02         Latch Hit %:   91.43
Parse CPU to Parse Elapsd %:   44.68     % Non-Parse CPU:   98.06

Shared Pool Statistics        Begin    End
                             ------  ------
            Memory Usage %:   78.60   77.65
   % SQL with executions>1:   80.50   85.53
 % Memory for SQL w/exec>1:   76.65   82.61

Buffer Pool Advisory                   DB/Inst: xxxx/xxxx  Snap: 13492
-> Only rows with estimated physical reads >0 are displayed
-> ordered by Block Size, Buffers For Estimate
Est
Phys
Size for   Size      Buffers for   Read          Estimated
P    Est (M) Factor         Estimate Factor     Physical Reads
--- -------- ------ ---------------- ------ ------------------
D         72     .1            8,991    1.5          1,955,905
D        144     .2           17,982    1.5          1,919,097
D        216     .3           26,973    1.5          1,905,107
D        288     .4           35,964    1.4          1,869,577
D        360     .5           44,955    1.4          1,852,229
D        432     .6           53,946    1.3          1,743,851
D        504     .6           62,937    1.3          1,738,194
D        576     .7           71,928    1.3          1,727,611
D        648     .8           80,919    1.1          1,367,322
D        720     .9           89,910    1.0          1,295,649
D        776    1.0           96,903    1.0          1,293,144   ----- actual size
D        792    1.0           98,901    1.0          1,292,411
D        864    1.1          107,892    1.0          1,242,738
D        936    1.2          116,883    0.9          1,212,406
D      1,008    1.3          125,874    0.9          1,132,996
D      1,080    1.4          134,865    0.8          1,029,577
D      1,152    1.5          143,856    0.7            929,565
D      1,224    1.6          152,847    0.6            732,968  ----- suggested size
D      1,296    1.7          161,838    0.6            720,034
D      1,368    1.8          170,829    0.6            714,496
D      1,440    1.9          179,820    0.6            712,110
-------------------------------------------------------------

Knowing the limitation I have that is  (a) I can’t change the query (b) I can’t create a new index and as far as it is a problem of physical I/O perpetuated by a “read by other sessions” wait event, I have suggested  the client  to increase the buffer cache size from 776MB to 1224MB.

I am still waiting for their feedback.

Blog at WordPress.com.

Tony's Oracle Tips

Tony Hasler's light hearted approach to learning about Oracle

Richard Foote's Oracle Blog

Focusing Specifically On Oracle Indexes, Database Administration and Some Great Music

Hatem Mahmoud's blog

Just another blog : Databases, Linux and other stuffs

Mohamed Houri’s Oracle Notes

Qui se conçoit bien s’énonce clairement

Oracle Diagnostician

Performance troubleshooting as exact science

Raheel's Blog

Things I have learnt as Oracle DBA

Coskan's Approach to Oracle

What I learned about Oracle

So Many Oracle Manuals, So Little Time

“Books to the ceiling, Books to the sky, My pile of books is a mile high. How I love them! How I need them! I'll have a long beard by the time I read them”—Lobel, Arnold. Whiskers and Rhymes. William Morrow & Co, 1988.

Carlos Sierra's Tools and Tips

Tools and Tips for Oracle Performance and SQL Tuning

Oracle Scratchpad

Just another Oracle weblog

OraStory

Dominic Brooks on Oracle Performance, Tuning, Data Quality & Sensible Design ... (Now with added Sets Appeal)