Mohamed Houri’s Oracle Notes

April 29, 2017

Index and mixed order by

Filed under: Index,Tuning — hourim @ 12:10 pm

The orderly arrangement of the index keys in the leaf blocks makes it possible for Oracle to traverse the index both backward and forward. It renders, as such, the CBO in the plausible capacity of eliminating a costly order by operation. This order by elimination is, however, possible provided the index is used and the query order is wanted in the exact direction of the index columns backward or forward.

Let’s clarify this with a simple example

The model

create table t1 
as select 
    rownum n1
   ,trunc((rownum-1)/3) n2
   ,mod(rownum,5) n3
   ,lpad('x',5) v1
from dual 
connect by level <=1e3;

create index t1_idx on t1(n1, n2, n3);

explain plan for 
select * from t1 
where n1 <=5 
order by 
      n1,n2, n3;

select * from table(dbms_xplan.display);

------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | 
------------------------------------------------------
|   0 | SELECT STATEMENT            |        |     5 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1     |     5 |
|*  2 |   INDEX RANGE SCAN          | T1_IDX |     5 |
------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("N1"<=5)

You should note in the above expected execution plan the absence of a SORT ORDER BY operation. This has been possible because the order in which the query result set is wanted matches perfectly the order in which the columns have been arranged in the index. Oracle is also able to avoid a SORT operation if the order of the query result set is wanted in the exact reverse direction of the index columns as shown below:

explain plan for 
select * from t1 
where n1 <=5 
order by 
      n1 desc,n2 desc, n3 desc;

select * from table(dbms_xplan.display);

-------------------------------------------------------
| Id  | Operation                    | Name   | Rows  |
-------------------------------------------------------
|   0 | SELECT STATEMENT             |        |     5 |
|   1 |  TABLE ACCESS BY INDEX ROWID | T1     |     5 |
|*  2 |   INDEX RANGE SCAN DESCENDING| T1_IDX |     5 |
-------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("N1"<=5)

Again the SORT ORDER BY operation has been avoided thanks to the ability of Oracle to traverse the index in a descending order (backward). However, if the result set is wanted in an order that is neither in the index direction nor in its opposite one, then it will be impossible for Oracle to avoid a SORT operation even when the index is used as the following demonstrates:

explain plan for 
select * from t1 
where n1 <=5 
order by 
      n1 ,n2 desc, n3 ;

select * from table(dbms_xplan.display);

---------------------------------------------------------------
| Id  | Operation                            | Name   | Rows  |
---------------------------------------------------------------
|   0 | SELECT STATEMENT                     |        |     5 |
|   1 |  SORT ORDER BY                       |        |     5 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1     |     5 |
|*  3 |    INDEX RANGE SCAN                  | T1_IDX |     5 |
---------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("N1"<=5)

As you can see, this time, in contrast to the first two cases, a SORT ORDER BY operation kicks in. This operation becomes mandatory since the result set has been desired in an order that doesn’t match the order of the index columns. Spot in passing that when Oracle avoids a SORT ORDER BY it doesn’t visit the T1 table in a BATCHED mode and vice versa. I have already wrote a note about this here.

If you are in a situation where you want absolutely to avoid a costly SORT ORDER BY operation then you need to create a new matching index like the following one:

alter index t1_idx invisible;

create index t1_idx2 on t1(n1,n2 desc, n3);

explain plan for 
select * from t1 
where n1 <=5 
order by 
      n1 ,n2 desc, n3 ;

select * from table(dbms_xplan.display);

---------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | 
---------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |     5 |    85 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1      |     5 |    85 |
|*  2 |   INDEX RANGE SCAN          | T1_IDX2 |     5 |       |
---------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("N1"<=5)

The same desired order could also be achieved without a SORT ORDER BY operation by creating the following index:

drop index t1_idx2;

create index t1_idx2 on t1(n1 desc, n2, n3 desc);

explain plan for 
select * from t1 
where n1 <=5 
order by 
      n1 ,n2 desc, n3 ;

select * from table(dbms_xplan.display);

--------------------------------------------------------
| Id  | Operation                    | Name    | Rows  | 
--------------------------------------------------------
|   0 | SELECT STATEMENT             |         |     5 | 
|   1 |  TABLE ACCESS BY INDEX ROWID | T1      |     5 |  
|*  2 |   INDEX RANGE SCAN DESCENDING| T1_IDX2 |     9 | 
--------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access(SYS_OP_DESCEND("N1")>=HEXTORAW('3EF9FF') AND SYS_OP_DESCEND("N1") 
              IS NOT NULL)
       filter(SYS_OP_UNDESCEND(SYS_OP_DESCEND("N1"))<=5)

Looking at the above exotic predicate I am not sure that this last case will always outperform the preceding one.

So What?

Since you know roughly how an index should be designed to make a sort operation avoidable, are you going to change the ORDER BY clause to suit an existing index or create a new index to match the ORDER BY clause? The answer to this question has been magnificently given by Peter Gulutzan and Trudy Pelzer in their SQL Performance Tuning book “In both cases, we would have to say No. If you did either one, you’d be crossing the boundary between “taking advantage of a side effect” and “depending on a side effect.”

Because indexes exist mainly to support WHERE clause rather than ORDER BY I would very probably suggest to not change the existing index. Particularly when real life experience confirms that it is easier to add an index than to drop an existing one. But when we know how a well designed index can precisely cover a query and eventually avoid a sort operation, we can either challenge the client for the order in which he wants his result set or, all things being equal, design an index to match the order by clause.

Summary

Watch out your critical queries necessitating a certain order. If you are going to design an index covering one of their predicates, and all things being equal, think about designing this index so that its columns will match the query order by clause either backward or forward. But you shouldn’t being always forcing the CBO to use an index merely because it will avoid a SORT operation.

PS: this post has been inspired by an excellent blog on PostgreSQL by Bruce Momjian

Advertisements

April 8, 2015

The dark side of using bind variables : sharing everything

Filed under: Tuning — hourim @ 5:48 pm

An interesting academic situation happened last week which I, honestly believe, is worth a blog article as far as experienced DBA have spent time trying to solve it without success. An overnight job was running for hours in the night from 01/04 to 02/04. The on call DBA spent all the night killing and re-launching the job (sql_id) several attempts without any success. When I arrived at work the next day I was asked to help. As far as this job was still running, I generated the Real Time SQL monitoring report (RTSM) for the corresponding sql_id which showed the classical NESTED LOOP having a huge outer data row set driving an inner data set in which at least 50 different operations have been started 519K times while one operation has been executed 2M times. The corresponding execution plan contains 213 operations. The underlying query uses 628 user bind variables and 48 system generated bind variables (thanks to cursor sharing set to FORCE)

SQL Plan Monitoring Details (Plan Hash Value=1511784243)

Global Information
------------------------------
 Status              :  EXECUTING               
 Instance ID         :  2                       
 Session             :  xxxxx (350:9211)   
 SQL ID              :  dmh5vhkcm877v           
 SQL Execution ID    :  33554436                
 Execution Started   :  04/02/2015 07:52:03     
 First Refresh Time  :  04/02/2015 07:52:47     
 Last Refresh Time   :  04/02/2015 10:04:28     
 Duration            :  7947s                   
 Module/Action       :  wwwww
 Service             :  zzzzz               
 Program             :  wwwww  
 DOP Downgrade       :  100%    

Global Stats
===================================================================================
| Elapsed |   Cpu   |    IO    | Application | Concurrency | Buffer | Read | Read  |
| Time(s) | Time(s) | Waits(s) |  Waits(s)   |  Waits(s)   |  Gets  | Reqs | Bytes |
====================================================================================
|    7900 |    7839 |       20 |        0.00 |        0.82 |   243M | 5946 | 659MB |
====================================================================================

The above 7,839 seconds spent consuming CPU with almost no user wait time represents the classical wrong NESTED LOOP operation starting many times several inner operations as mentioned above.

The job was running without any sign of improvement, the client was waiting for its critical report and I have a query with 700 bind variables honored via an execution plan of 213 operations to figure out how to make this report finishing smoothly as soon as possible.

I was dissecting the execution plan when the end user send me an e-mail saying that the same job ran successfully yesterday within 6 min. With that information in mind I have managed to get the RTSM of the yesterday successful job. The first capital information was that the yesterday query and the today not ending one used the same plan_hash_value (same execution plan). Comparing the 628 input bind variable values of both runs, I found that the yesterday job ran for a one month period (monthly job) while the current job is running for a one day interval (daily job).Of course the end user has not supplied any information about the kind of job they are currently running compared to the previous one. All what I have been told is the yesterday job completed in 6 minutes. It is only until I’ve found the difference in the input bind variable values that the end user said “the current run is for the daily job while the previous one was for the monthly job”.

And the sun starts rising. I was able to figure out that the two set of bind variables are not doing the same amount of work and sharing the same execution plan is probably not a good idea.  This is why I have suggested the DBA to do the following:

  • Kill the not ending session
  • Purge the sql_id from the shared pool
  • Ask the end user to re-launch the job
  • Cross fingers 🙂

And you know what?  The job completed within a couple of hundreds of seconds:

SQL Plan Monitoring Details (Plan Hash Value=2729107228)

Global Information
------------------------------
 Status              :  DONE (ALL ROWS)         
 Instance ID         :  2                       
 Session             :  xxxxx (1063:62091) 
 SQL ID              :  dmh5vhkcm877v           
 SQL Execution ID    :  33554437                
 Execution Started   :  04/02/2015 10:43:17     
 First Refresh Time  :  04/02/2015 10:43:20     
 Last Refresh Time   :  04/02/2015 10:47:38     
 Duration            :  261s                    
 Module/Action       :  wwwww
 Service             :  zzzzz
 Program             :  wwwww
 Fetch Calls         :  57790     

Global Stats
==============================================================================
| Elapsed |   Cpu   |    IO    | Concurrency | Fetch | Buffer | Read | Read  |
| Time(s) | Time(s) | Waits(s) |  Waits(s)   | Calls |  Gets  | Reqs | Bytes |
==============================================================================
|     134 |     107 |     7.10 |          17 | 57790 |    18M | 7857 | 402MB |
==============================================================================

This is the dark side of using bind variable: when sharing resource we share also execution plan. The current daily job was running using the plan optimized for the monthly job. The solution was to force the CBO compiling a new execution plan for the new input bind variable. The new plan (2729107228) is still showing 200 operations and several operations started 578K times. I have the intention to study both execution plans to know exactly from where the enhancement is coming. The clue here might be that in the first shared monthly execution plan the query, for a reason I am unable to figure out, run serially

 DOP Downgrade       :  100%   

While the new hard parsed execution has been executed in parallel:

 Parallel Execution Details (DOP=4 , Servers Allocated=20)

Bottom Line: when you have the intention to run a critical report once per day (and once per month) then it is worth to let the CBO compiling a new execution plan for each execution. All what you will have is one hard parse for one execution. This will never hurt from a memory and CPU point of view

April 6, 2015

BIND_EQUIV_FAILURE – Or when you will regret using Adaptive Cursor Sharing

Filed under: Tuning — hourim @ 4:42 pm

Previously, when I was asked to define the Adaptive Cursor Sharing(ACS) feature I’ve often used the following definition: “it represents an answer to the always threating and challenging Oracle task of sharing cursors and optimizing SQL”.

Time passes and I have altered a little bit this definition to become: “it represents a short answer to the always threating and challenging Oracle task of sharing cursors and optimizing SQL’’.

Time passes again and I ended up by drastically altering my initial ACS definition to become:“In certain very plausible situations, It might represent a serious threat for your application where you will be happy to disable it provided you have enough experience to identify the link between ACS and your threat”.

If you want to know what has changed my mind about ACS, then follow this situation taken from a real life running system I am going to summarize:

When you see something like this in the library cache (11.2.0.3.0)

SQL> select
        sql_id
       ,count(1)
     from
        v$sql
     where executions < 2
     group by sql_id
     having count(1) > 10
     order by 2 desc;

SQL_ID          COUNT(1)
------------- ----------
7zwq7z1nj7vga      44217

You start wondering what makes this sql_id having such a big count of different versions in memory.

After few minutes of investigation you end up by ruling out the bind variable hypothesis. And then you finish by asking yourself what the heck is this sql_id?

Hopefully Tanel Poder nonshared script shed a small light on that:

SQL> @nonshared 7zwq7z1nj7vga
Show why existing SQL child cursors were not reused (V$SQL_SHARED_CURSOR)...

-----------------
SQL_ID               : 7zwq7z1nj7vga
ADDRESS              : 000000406DBB30F8
CHILD_ADDRESS        : 00000042CE36F7E8
CHILD_NUMBER         : 0
BIND_EQUIV_FAILURE   : Y
REASON               :<ChildNode><ChildNumber>0</ChildNumber><ID>40</ID>
                      <reason>Bindmismatch(33)</reason><size>2x4</size>
                      <init_ranges_in_first_pass>0</init_ranges_in_first_pass>
                       <selectivity>1097868685</selectivity>
                      </ChildNode>

-----------------
SQL_ID               : 7zwq7z1nj7vga
ADDRESS              : 000000406DBB30F8
CHILD_ADDRESS        : 00000045B5C5E478
CHILD_NUMBER         : 1
BIND_EQUIV_FAILURE   : Y
REASON               :<ChildNode><ChildNumber>1</ChildNumber><ID>40</ID>
                      <reason>Bindmismatch(33)</reason><size>2x4</size>
                      <init_ranges_in_first_pass>0</init_ranges_in_first_pass>
                      <selectivity>915662630</selectivity>
                      </ChildNode>
-----------------
SQL_ID               : 7zwq7z1nj7vga
ADDRESS              : 000000406DBB30F8
CHILD_ADDRESS        : 00000038841E2868
CHILD_NUMBER         : 2
BIND_EQUIV_FAILURE   : Y
REASON               :<ChildNode><ChildNumber>2</ChildNumber><ID>40</ID>
                      <reason>Bindmismatch(33)</reason><size>2x4</size>
                      <init_ranges_in_first_pass>0</init_ranges_in_first_pass>
                      <selectivity>163647208</selectivity>
                      </ChildNode>
-----------------
SQL_ID               : 7zwq7z1nj7vga
ADDRESS              : 000000406DBB30F8
CHILD_ADDRESS        : 00000038841E2708
CHILD_NUMBER         : 3
BIND_EQUIV_FAILURE   : Y
REASON               :<ChildNode><ChildNumber>3</ChildNumber><ID>40</ID>
                      <reason>Bindmismatch(33)</reason><size>2x4</size>
                      <init_ranges_in_first_pass>0</init_ranges_in_first_pass>
                      <selectivity>4075662961</selectivity>
                      </ChildNode>

…/…

-----------------
SQL_ID               : 7zwq7z1nj7vga
ADDRESS              : 000000406DBB30F8
CHILD_ADDRESS        : 00000045B5C5E5D8
CHILD_NUMBER         : 99
BIND_EQUIV_FAILURE   : Y
REASON               :<ChildNode><ChildNumber>99</ChildNumber><ID>40</ID>
                      <reason>Bindmismatch(33)</reason><size>2x4</size>
                      <init_ranges_in_first_pass>0</init_ranges_in_first_pass>
                      <selectivity>3246589452</selectivity>
                      </ChildNode>

Moreover a direct select on the v$sql_shared_cursor shows this:

SQL> select
       count(1)
     from
         v$sql_shared_cursor
     where
        sql_id = '7zwq7z1nj7vga';

  COUNT(1)
----------
     45125

SQL> select
        count(1)
     from
       v$sql_shared_cursor
     where
        sql_id = '7zwq7z1nj7vga'
     and BIND_EQUIV_FAILURE = 'Y';

  COUNT(1)
----------
     45121

Hmmm…. A huge count of non shared child cursors due to BIND_EQUIV_FAILURE.

The official Oracle documentation about BIND_EQUIV_FAILURE says : the bind value’s selectivity does not match that used to optimize the existing child cursor.This definition together with the selectivity xml tag mentioned above gave me a first clue: Adaptive Cursor Sharing (in this case Extended Cursor Sharing).

SQL> select
       count(1)
    from
        v$sql_cs_selectivity
    where
      sql_id = '7zwq7z1nj7vga';

  COUNT(1)
----------
  16,847,320

That is an impressive number of records in this dynamic view. For a single sql_id we have about 17 million of rows in this ACS monitoring view!!! This is dramatically altering the execution time of the underlying sql_id query.

If you don’t know what v$sql_cs_selectivity view stands for then look:

Once a cursor becomes bind aware, each time this cursor is executed, the Extended Cursor Sharing layer code peeks at the bind variable values (and in this particular case there are 9 bind variables), and execute, behind the scene, a select against v$sql_cs_selectivity view in order to check if any existing child cursor already covers the selectivity of the peeked bind variables. If a child cursor is found it will be shared. If not then a new child cursor is optimized and inserted into v$sql_cs_selectivity with a new range of bind variable value selectivity.

In this particular case each time the Extended Cursor Sharing layer code fails to find a child cursor in v$sql_cs_selectivity with an adequate range of selectivity(BIND_EQUIV_FAILURE) and compile a new execution plan ending up by filling dramatically v$sql view with multiple “optimal” plans.

We have been asked to use ACS to answer the need of sharing cursor and optimizing SQL. We end up by having neither the first nor the second desire in this particular and very plausible case.

Few extra words about this runing system case:

  • Query is using 9 bind variables in 9 different predicates
  • Columns on which bind variables are used have histograms(Frequency and Height Balanced) collected on them
  • Query is a simple select on a single heap table
  • Table has 4 indexes (which produces 4 distinct execution plans among those 99 ones)

 

March 16, 2015

Diagnosing the past

Filed under: Tuning — hourim @ 8:08 pm

This is a simple note about diagnosing a situation that happened in the past.

A running application suffered a delay in many of its multi-user insert statements blocked on an enq: TM-row lock contention.

Capture2

The on call DBA was just killing the culprit blocking session when I received an e-mail asking to investigate the root cause of this lock . As far as this was a task of diagnosing a very recent past, using v$active_session_history imposes itself:

select sql_id,event, count(1)
from v$active_session_history
where sample_time > to_date('05032015 10:00:00','ddmmrrrr hh24:mi:ss')
and   sample_time < to_date('05032015 11:00:00','ddmmrrrr hh24:mi:ss')
and event like '%TM%'
group by sql_id,event
order by 3 desc;

3fbwp7qdqxk9v     enq: TM - contention    1

Surprisingly there was only 1 recorded enq: TM-row lock contention wait event during the exact same period of blocked insert statements; and the corresponding sql_id has nothing to do with the blocked inserts.

I stumped few minutes looking bizarrely to the above select and started thinking about the basics which say:

  • v$active_session_history is a sample of all active sessions taken every second.
  • dba_hist_active_sess_history isa one-in-ten samples of v$active_session_history

I knew as well that v$active_session_history being an in-memory buffer the retention period is henceforth depending on the size of the buffer and the volume of active sessions. But, I was diagnosing almost a real time situation; so why there were no  “functional” enq: TM-row lock contention in v$active_session_history?

Finally, I decided to use the less precise one-in-ten sample table and this is what I’ve got:

select sql_id,event, count(1)
from dba_hist_active_sess_history
where sample_time > to_date('05032015 10:00:00','ddmmrrrr hh24:mi:ss')
and   sample_time < to_date('05032015 11:00:00','ddmmrrrr hh24:mi:ss')
and event like '%TM%'
group by sql_id,event
order by 3 desc;

53xthsbv8d7yk     enq: TM - contention    4878
1w95zpw2fy021     enq: TM - contention    340
35ghv3bugv22a     enq: TM - contention    264
8b9nqpzs24n0t     enq: TM - contention    163
aqdaq2ybqkrpa     enq: TM - contention    156
50gygyqsha3nr     enq: TM - contention    103
fzfvzhjg0p6y0     enq: TM - contention    82
bs359cfsq4fvc     enq: TM - contention    80
15xpc3um0c3a2     enq: TM - contention    58
d0rndrymh0b18     enq: TM - contention    49
864jbgkbpvcnf     enq: TM - contention    40
9cn21y7hbya46     enq: TM - contention    36
8419w8jnhfa3m     enq: TM - contention    33
f71jbkdy94pph     enq: TM - contention    5
2zpyy8wbnp5d0     enq: TM - contention    3
0d6gq7b9j522p     enq: TM - contention    2

Normally what we can see in dba_hist_active_sess_history has certainly travelled via v$active_session_history; and the more recent is the situation the more is the chance we have to find this situation mentioned in v$active_session_history. Why then I have not found what I was expecting? Before answering this simple question let me tell you few words about how I have explained this TM lock and the solution I have proposed to get rid of it.

TM-enqueue is almost always related to an unindexed foreign key. The session killed by the DBA was deleting from a parent table (parent_tab). The underlying child table (child_tab) was pointing to this “deleted” parent table via an unindexed foreign key which consequently has been locked. In the meantime, the application was inserting concurrently into other tables. Those inserts add child values coming from the above locked child_tab table (which becomes a parent table in the eyes of the insert statements) which Oracle has to check their existence in the locked “parent” child_tab table and hence the lock sensation reported by the end user during their insert statements. The solution was simply to index the foreign key in the child_tab.

Back to the reason that prompted me to write this note, why v$active_session_history is not showing the same very recent TM lock as the dba_hist_active_sess_history? The answer is simply because I forget that the application is running under RAC instance and I was pointing to the other node of the RAC so that when I issued the same select against gv$ the discrepancy between the two view ceases immediately as shown below:

SQL> select sql_id,event, count(1)
    from gv$active_session_history
    where sample_time > to_date('05032015 10:00:00','ddmmrrrr hh24:mi:ss')
    and   sample_time < to_date('05032015 11:00:00','ddmmrrrr hh24:mi:ss')
    and event like '%TM%'
    group by sql_id,event
    order by 3 desc;

SQL_ID        EVENT                       COUNT(1)
------------- --------------------------- ---------
53xthsbv8d7yk enq: TM - contention        48483
1w95zpw2fy021 enq: TM - contention        3370
35ghv3bugv22a enq: TM - contention        2635
8b9nqpzs24n0t enq: TM - contention        1660
aqdaq2ybqkrpa enq: TM - contention        1548
50gygyqsha3nr enq: TM - contention        1035
fzfvzhjg0p6y0 enq: TM - contention        821
bs359cfsq4fvc enq: TM - contention        801
15xpc3um0c3a2 enq: TM - contention        585
d0rndrymh0b18 enq: TM - contention        491
864jbgkbpvcnf enq: TM - contention        378
9cn21y7hbya46 enq: TM - contention        366
8419w8jnhfa3m enq: TM - contention        331
f71jbkdy94pph enq: TM - contention        46
2zpyy8wbnp5d0 enq: TM - contention        33
0d6gq7b9j522p enq: TM - contention        15
dmpafdd7anvrw enq: TM - contention        1
3fbwp7qdqxk9v enq: TM - contention        1

select sql_id,event, count(1)
from dba_hist_active_sess_history
where sample_time > to_date('05032015 10:00:00','ddmmrrrr hh24:mi:ss')
and   sample_time < to_date('05032015 11:00:00','ddmmrrrr hh24:mi:ss')
and event like '%TM%'
group by sql_id,event
order by 3 desc;

53xthsbv8d7yk     enq: TM - contention    4878
1w95zpw2fy021     enq: TM - contention    340
35ghv3bugv22a     enq: TM - contention    264
8b9nqpzs24n0t     enq: TM - contention    163
aqdaq2ybqkrpa     enq: TM - contention    156
50gygyqsha3nr     enq: TM - contention    103
fzfvzhjg0p6y0     enq: TM - contention    82
bs359cfsq4fvc     enq: TM - contention    80
15xpc3um0c3a2     enq: TM - contention    58
d0rndrymh0b18     enq: TM - contention    49
864jbgkbpvcnf     enq: TM - contention    40
9cn21y7hbya46     enq: TM - contention    36
8419w8jnhfa3m     enq: TM - contention    33
f71jbkdy94pph     enq: TM - contention    5
2zpyy8wbnp5d0     enq: TM - contention    3
0d6gq7b9j522p     enq: TM - contention    2

Spot by the way the formidable illustration of the basics mentioned earlier in this article i.e. dba_hist_active_sess_history is a one-in-ten samples of v$active_session_history

48483/10 ~ 4878 
3370/10  ~  340
2635/10  ~ 264
1660/10  ~ 163
1548/10  ~ 156

February 14, 2015

Resumable time out

Filed under: Tuning — hourim @ 9:47 am

I was trying to create a very big table through a create table as select from a join between two huge tables forcing both parallel DDL and parallel QUERY and was mainly concerned about the amount of temp space the hash join between those two big tables will require to get my table smoothly created. I said hash join because my first attempt to create this table used a nested loop join which was driving a dramatic 100 million starts of table access by index rowid since more than 10 hours when I decided to kill the underlying session. The create table being a one-shot process I decided to hint the optimizer so that it will opt for a hash join operation instead of the initial nested loop. But my concern has been transferred from a long running non finishing SQL statement to how much my create table will need of TEMP space in order to complete successfully.

I launched the create table and started monitoring it with Tanel Poder snapper script, v$active_session_history and the Real Time SQL Monitoring feature (RTSM). This is what the monitoring was showing

SQL> select 
          event
	 ,count(1) 
      from 
          v$active_session_history 
      where sql_id = '0xhm7700rq6tt' 
      group by event 
      order by 2 desc;


EVENT                                                              COUNT(1)
---------------------------------------------------------------- ----------
direct path read                                                       3202
direct path write temp                                                 1629
                                                                        979
db file scattered read                                                    8
CSS initialization                                                        7
CSS operation: query                                                      1

However few minutes later Tanel Poder snapper started showing the following dominant wait event

SQL> @snapper ash 5 1 all

----------------------------------------------------------------------------------------------------
Active% | INST | SQL_ID          | SQL_CHILD | EVENT                               | WAIT_CLASS
----------------------------------------------------------------------------------------------------
   800% |    1 | 0xhm7700rq6tt   | 0         | statement suspended, wait error to  | Configuration

--  End of ASH snap 1, end=2015-02-12 09:12:57, seconds=5, samples_taken=45

Event which keeps incrementing in v$active_session_history as shown below:

EVENT                                                              COUNT(1)
---------------------------------------------------------------- ----------
statement suspended, wait error to be cleared                          7400
direct path read                                                       3202
direct path write temp                                                 1629
                                                                        979
db file scattered read                                                    8
CSS initialization                                                        7
CSS operation: query                                                      1

7 rows selected.

SQL> /

EVENT                                                              COUNT(1)
---------------------------------------------------------------- ----------
statement suspended, wait error to be cleared                          7440
direct path read                                                       3202
direct path write temp                                                 1629
                                                                        979
db file scattered read                                                    8
CSS initialization                                                        7
CSS operation: query                                                      1

7 rows selected.

SQL> /

EVENT                                                              COUNT(1)
---------------------------------------------------------------- ----------
statement suspended, wait error to be cleared                          7480
direct path read                                                       3202
direct path write temp                                                 1629
                                                                        979
db file scattered read                                                    8
CSS initialization                                                        7
CSS operation: query                                                      1

EVENT                                                              COUNT(1)
---------------------------------------------------------------- ----------
statement suspended, wait error to be cleared                         11943
direct path read                                                       3202
direct path write temp                                                 1629
                                                                        980
db file scattered read                                                    8
CSS initialization                                                        7
CSS operation: query                                                      1

The unique wait event that was incrementing is that bizarre statement suspended, wait error to be cleared. The session from which I launched the create table statment was hanging without reporting any error.

As far as I was initially concerned by the TEMP space I checked the available space there

SQL> select 
       tablespace_name,
       total_blocks,
       used_blocks,
       free_blocks
     from v$sort_segment;

TABLESPACE_NAME                 TOTAL_BLOCKS USED_BLOCKS FREE_BLOCKS
------------------------------- ------------ ----------- -----------
TEMP                                 8191744     8191744           0    

No available free space left in TEMP tablespace

The RTSM of the corresponding sql_id was showing the following

=====================================================================================
| Id    |              Operation              |        Name        |  Rows   | Temp |
|       |                                     |                    |(Actual) |      |
=====================================================================================
|     0 | CREATE TABLE STATEMENT              |                    |       0 |      |
|     1 |   PX COORDINATOR                    |                    |         |      |
|     2 |    PX SEND QC (RANDOM)              | :TQ10003           |         |      |
|     3 |     LOAD AS SELECT                  |                    |         |      |
|     4 |      HASH GROUP BY                  |                    |         |      |
|     5 |       PX RECEIVE                    |                    |         |      |
|     6 |        PX SEND HASH                 | :TQ10002           |         |      |
|  -> 7 |         HASH JOIN                   |                    |       0 |  66G | --> spot this 
|       |                                     |                    |         |      |
|       |                                     |                    |         |      |
|       |                                     |                    |         |      |
|     8 |          PX RECEIVE                 |                    |    626M |      |
|     9 |           PX SEND BROADCAST         | :TQ10001           |    626M |      |
|    10 |            HASH JOIN                |                    |     78M |      |
|    11 |             PX RECEIVE              |                    |    372K |      |
|    12 |              PX SEND BROADCAST      | :TQ10000           |    372K |      |
|    13 |               PX BLOCK ITERATOR     |                    |   46481 |      |
|    14 |                INDEX FAST FULL SCAN | INDX_12453656_TABL |   46481 |      |
|    15 |             PX BLOCK ITERATOR       |                    |     88M |      |
|    16 |              TABLE ACCESS FULL      | TABL1              |     88M |      |
|       |                                     |                    |         |      |
| -> 17 |          PX BLOCK ITERATOR          |                    |    682M |      |
| -> 18 |           TABLE ACCESS FULL         | TABLE123456        |    682M |      |
|       |                                     |                    |         |      |
=====================================================================================

The HASH JOIN operation at line 7 has already gone above the available TEMP tablespace size which is 64G. I was wondering then, why my session has not been stopped with an ORA-01652 error instead of hanging there and letting the wait event statement suspended, wait error to be cleared popping up in my different monitoring tools.

After few minutes of googling tuning steps I ended up by executing the following query

SQL>select
        coord_session_id
       ,substr(sql_text, 1,10)
       ,error_msg
    from dba_resumable;

SESSION_ID  SQL        ERROR_MSG
----------- ---------  -------------------------------------------------------------------
146         create /*+  ORA-01652: unable to extend temp segment by 128 in tablespace TEMP

This 146 SID is my create table session! It is in error but hanging and waiting on that statement suspend wait event.

After few minutes of googling tuning again I ended up by checking the following parameter:

SQL> sho parameter resumable

NAME                           TYPE                             VALUE
------------------------------ -------------------------------- ------
resumable_timeout              integer                          9000

Notice now how I finally have made a link between the title and the purpose of this article: resumable_timeout. The Oracle documentation states that when this parameter is set to a nonzero value the resumable space allocation might kicks in.

Put is simply what happens in my case is that since this parameter has been set to 9000 minutes and since my create table as select has encountered an ORA-01652 temp tablespace error, Oracle decided to put my session in a resumable state until someone will point out that space shortage,correct the situation and allow the session to continue its normal processing

The next step I did is to abort my session from its resumable state

SQL> exec dbms_resumable.abort(146);
 
PL/SQL procedure successfully completed.

Unfortunately I waited a couple of minutes in front of my sqlplus session waiting for it to resume without success so that I decided to use the brute force

ERROR:
ORA-03113: end-of-file on communication channel
Process ID: 690975
Session ID: 146 Serial number: 3837

Summary
I would suggest to do not set the resumable_timeout parameter to a nonzero value. Otherwise you will be hanging on a statement suspended, wait error to be cleared event during an amount of time indicated by this resumable_timeout parameter looking around for what is making your SQL statement hanging.

Footnote
Here below few extra words from Oracle documentation about Resumable space allocation

“Resumable space allocation avoids headaches and saves time by suspending, instead
of terminating, a large database operation requiring more disk space than is currently
available. While the operation is suspended, you can allocate more disk space on
the destination tablespace or increase the quota for the user. Once the low space
condition is addressed, the large database operation automatically picks up where it
left off.
As you might expect, the statements that resume are known as resumable statements.
The suspended statement, if it is part of a transaction, also suspends the transaction.
When disk space becomes available and the suspended statement resumes, the
transaction can be committed or rolled back whether or not any statements in
the transactions were suspended. The following conditions can trigger resumable
space allocation:
■ Out of disk space in a permanent or temporary tablespace
■ Maximum extents reached on a tablespace
■ User space quota exceeded
You can also control how long a statement can be suspended. The default time
interval is two hours, at which point the statement fails and returns an error message
to the user or application as if the statement was not suspended at all.
There are four general categories of commands that can be resumable:
(1) SELECT statements, (2) DML commands, (3) SQL*Loader operations, and
(4) DDL statements that allocate disk space.”

September 14, 2012

Tuning by TKPROF: a case study

Filed under: Tuning — hourim @ 11:21 am

One of our end users complains about a query that was suffering a performance pain. He sent me the query statement and its corresponding “tkrpofed” 10046 trace events and asked for help. This blog article represents a summary of my step by step tuning investigations of this performance problem

First of all what can we diagnose just with the following bit of information extracted from the tkprof trace?

 
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        2      0.02       0.02          0          0          0           0
Execute      2      0.00       0.00          0          0          0           0
Fetch       44     34.35     420.94     132860     161300          0         624
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total       48     34.37     420.96     132860     161300          0         624

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS	
Parsing user id: 122  

Nothing?

Not really. Several important information can be derived from the above trace file

1. We have almost the same number of physical reads (132860) and buffer reads (161300). This is a clear symptom of and a pointer to the presence of aggressive full table scans that should be investigated
2. There is a big difference between elapsed time (420.96 s) and CPU time (43.37). This means that the query spent a large amount of time waiting for something that should be investigated
3. The query used a big number of physical and logical read in order to generate a limited number of 624 rows via 2 executions. This means that this query starts by selecting a big number of rows and then threw away a large amount of this initial big number of selected rows. This is in contradiction with the mantra ‘’start small and keep small’’
4. There was one miss in library cache during parse which means that a hard parse have been done during the execution of the query

So, have you noticed how much of information we can diagnose simply by looking at those 4 lines of trace file? In fact there is still another important information which can be derived from the above 4 lines that I have preferred to ignore for the moment. We will be back to it in a moment.

Let’s continue the investigation using the next part of tkprof trace file. What can we derive from the following bit of information?

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                      46        0.00          0.00
  SQL*Net message from client                    46      128.81        387.60
  db file scattered read                       4844        1.28        387.95
  db file sequential read                        76        0.72          1.89

Nothing?

Not really. Oracle spent 387,95 seconds generating data through full table scan (db file scattered read, remember what I’ve said above about the number of physical and logical I/O) and about 387,60 seconds waiting for the client to ask this generated data (SQL*Net message from client) in 46 batches(Times Waited) of 15 rows

You might ask from where I have derived that number of 15 rows per client/server round trip? This is the information which can be immediately derived from the first 4 lines of trace and that I’ve preferred to delay it until this section.

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        2      0.02       0.02          0          0          0           0
Execute      2      0.00       0.00          0          0          0           0
Fetch       44     34.35     420.94     132860     161300          0         624
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total       48     34.37     420.96     132860     161300          0         624

Oracle generated 624 rows using 2 executions and 44 fetches:

rows/Fetch = 624/44=14,18 ~ 15

This number represent the default arraysize the query used to exchange information between the server and the client:

So, the time Oracle spent generating data equals the time it waited for the client to ask for it!!! This is a symptom of a problem in the network activity. Changing the default arraysize from its default value of 15 to a higher value might help enhancing the response time of this query.

Look how much information we’ve already collected and remember: we still have not yet investigated the content of the SQL query!!!

Let’s continue our tkprof trace file investigation. What can we identify from the tkprof row source operation reproduced here below?

Rows     Row Source Operation
-------  ---------------------------------------------------
    312  HASH JOIN OUTER (cr=80650 pr=66430 pw=0 time=216,681,765 us)
    312   VIEW  (cr=40325 pr=33215 pw=0 time=114,999,408 us)
    312    MERGE JOIN CARTESIAN (cr=40325 pr=33215 pw=0 time=114,999,403 us)
     26     VIEW  (cr=40325 pr=33215 pw=0 time=114,998,980 us)
     26      HASH GROUP BY (cr=40325 pr=33215 pw=0 time=114,998,974 us)
  19827       HASH JOIN SEMI (cr=40325 pr=33215 pw=0 time=114,911,666 us)
  45426        PARTITION RANGE SINGLE PARTITION: 10 10 (cr=33455 pr=33215 pw=0 time=112,077,047 us)
  45426         TABLE ACCESS FULL ORDERS_MANAGMENTS PARTITION: 10 10 (cr=33455 pr=33215 pw=0 time=112,031,597 us)
1721227        PARTITION RANGE SINGLE PARTITION: 3 3 (cr=6870 pr=0 pw=0 time=7670 us)
1721227         TABLE ACCESS FULL ORDERS_SETTLMENT_TBS PARTITION: 3 3 (cr=6870 pr=0 pw=0 time=7646 us)
    312     BUFFER SORT (cr=0 pr=0 pw=0 time=208 us)
     12      VIEW  (cr=0 pr=0 pw=0 time=72 us)
     12       UNION-ALL  (cr=0 pr=0 pw=0 time=54 us)
      1        FAST DUAL  (cr=0 pr=0 pw=0 time=3 us)
      1        FAST DUAL  (cr=0 pr=0 pw=0 time=2 us)
      1        FAST DUAL  (cr=0 pr=0 pw=0 time=0 us)
      1        FAST DUAL  (cr=0 pr=0 pw=0 time=0 us)
      1        FAST DUAL  (cr=0 pr=0 pw=0 time=0 us)
      1        FAST DUAL  (cr=0 pr=0 pw=0 time=0 us)
      1        FAST DUAL  (cr=0 pr=0 pw=0 time=2 us)
      1        FAST DUAL  (cr=0 pr=0 pw=0 time=1 us)
      1        FAST DUAL  (cr=0 pr=0 pw=0 time=1 us)
      1        FAST DUAL  (cr=0 pr=0 pw=0 time=0 us)
      1        FAST DUAL  (cr=0 pr=0 pw=0 time=0 us)
      1        FAST DUAL  (cr=0 pr=0 pw=0 time=1 us)
    136   VIEW  (cr=40325 pr=33215 pw=0 time=101,678,318 us)
    136    SORT GROUP BY (cr=40325 pr=33215 pw=0 time=101,678,162 us)
  19827     HASH JOIN  (cr=40325 pr=33215 pw=0 time=101,446,253 us)
  45426      PARTITION RANGE SINGLE PARTITION: 10 10 (cr=33455 pr=33215 pw=0 time=98,542,613 us)
  45426       TABLE ACCESS FULL ORDERS_MANAGMENTS PARTITION: 10 10 (cr=33455 pr=33215 pw=0 time=98,542,597 us)
1721227      PARTITION RANGE SINGLE PARTITION: 3 3 (cr=6870 pr=0 pw=0 time=1,729,734 us)
1721227       TABLE ACCESS FULL ORDERS_SETTLMENT_TBS PARTITION: 3 3 (cr=6870 pr=0 pw=0 time=8491 us)

The row source operation represents the real execution plan followed by Oracle during the execution of the query. Let me divide (divide to conquer) this explain plan into 2 mains parts:

part 1

    312   VIEW  (cr=40325 pr=33215 pw=0 time=114,999,408 us)
    312    MERGE JOIN CARTESIAN (cr=40325 pr=33215 pw=0 time=114,999,403 us)
     26     VIEW  (cr=40325 pr=33215 pw=0 time=114,998,980 us)
     26      HASH GROUP BY (cr=40325 pr=33215 pw=0 time=114,998,974 us)
  19827       HASH JOIN SEMI (cr=40325 pr=33215 pw=0 time=114,911,666 us)
  45426        PARTITION RANGE SINGLE PARTITION: 10 10 (cr=33455 pr=33215 pw=0 time=112,077,047 us)
  45426         TABLE ACCESS FULL ORDERS_MANAGMENTS PARTITION: 10 10 (cr=33455 pr=33215 pw=0 time=112,031,597 us)
1721227        PARTITION RANGE SINGLE PARTITION: 3 3 (cr=6870 pr=0 pw=0 time=7670 us)
1721227         TABLE ACCESS FULL ORDERS_SETTLMENT_TBS PARTITION: 3 3 (cr=6870 pr=0 pw=0 time=7646 us)
    312     BUFFER SORT (cr=0 pr=0 pw=0 time=208 us)
     12      VIEW  (cr=0 pr=0 pw=0 time=72 us)
     12       UNION-ALL  (cr=0 pr=0 pw=0 time=54 us)
      1        FAST DUAL  (cr=0 pr=0 pw=0 time=3 us)
      1        FAST DUAL  (cr=0 pr=0 pw=0 time=2 us)
      1        FAST DUAL  (cr=0 pr=0 pw=0 time=0 us)
      1        FAST DUAL  (cr=0 pr=0 pw=0 time=0 us)
      1        FAST DUAL  (cr=0 pr=0 pw=0 time=0 us)
      1        FAST DUAL  (cr=0 pr=0 pw=0 time=0 us)
      1        FAST DUAL  (cr=0 pr=0 pw=0 time=2 us)
      1        FAST DUAL  (cr=0 pr=0 pw=0 time=1 us)
      1        FAST DUAL  (cr=0 pr=0 pw=0 time=1 us)
      1        FAST DUAL  (cr=0 pr=0 pw=0 time=0 us)
      1        FAST DUAL  (cr=0 pr=0 pw=0 time=0 us)
      1        FAST DUAL  (cr=0 pr=0 pw=0 time=1 us)

part 2

     136  VIEW  (cr=40325 pr=33215 pw=0 time=101,678,318 us)
    136    SORT GROUP BY (cr=40325 pr=33215 pw=0 time=101,678,162 us)
  19827     HASH JOIN  (cr=40325 pr=33215 pw=0 time=101,446,253 us)
  45426      PARTITION RANGE SINGLE PARTITION: 10 10 (cr=33455 pr=33215 pw=0 time=98,542,613 us)
  45426       TABLE ACCESS FULL ORDERS_MANAGMENTS PARTITION: 10 10 (cr=33455 pr=33215 pw=0 time=98,542,597 us)
1721227      PARTITION RANGE SINGLE PARTITION: 3 3 (cr=6870 pr=0 pw=0 time=1,729,734 us)
1721227       TABLE ACCESS FULL ORDERS_SETTLMENT_TBS PARTITION: 3 3 (cr=6870 pr=0 pw=0 time=8491 us)

The two parts are hash joined together via the first operation

Rows     Row Source Operation
-------  ---------------------------------------------------
    312  HASH JOIN OUTER (cr=80650 pr=66430 pw=0 time=216,681,765 us)

Producing 312 rows in 216 seconds. Multiply this by 2 executions and you will have the image reported by the tkprof summary i.e. 624 rows in 420 seconds.

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        2      0.02       0.02          0          0          0           0
Execute      2      0.00       0.00          0          0          0           0
Fetch       44     34.35     420.94     132860     161300          0         624
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total       48     34.37     420.96     132860     161300          0         624

If I look carefully to both part1 and part2 I can easily point out that the most consuming operations are:

Rows     Row Source Operation
-------  ------------------------------------------------------------------------------------------------------
45426         TABLE ACCESS FULL ORDERS_MANAGMENTS PARTITION: 10 10 (cr=33455 pr=33215 pw=0 time=112,031,597 us)
45426       TABLE ACCESS FULL ORDERS_MANAGMENTS PARTITION: 10 10 (cr=33455 pr=33215 pw=0 time=98,542,597 us)

Which are consuming 112 and 98 seconds respectively.Well, only 2 operations generating 200 seconds while the entire row source shows a total time of 216 seconds. It is clear that reviewing those two table scans will help enhancing the performance of your query.

You might have already pointed out the MERGE JOIN CARTESIAN operation done on a built table generating 26 rows. I will be back to this later in a new post.

What else? Do we have reached the point at which the SQL query content became essential?
Not really. But we are very close to the border of that necessity. Looking at the number of rows generated by this table access full operation (45426), an immediate question came to my mind: how many rows are there in this table?

  select Count(*) from ORDERS_MANAGMENTS  ----> 8,999,341 

Spot the obvious: the full scan operation is selecting 45426 records from a table having 8,999,341 records using a FULL TABLE SCAN!!! 0,005% of records via a FULL TABLE SCAN. Something is wrong here, isn’t it?
And now we can’t go further ahead without asking the content of the query and other information related to the table and its indexes. But that will be dealt with in the next post.

Create a free website or blog at WordPress.com.

Tony's Oracle Tips

Tony Hasler's light hearted approach to learning about Oracle

Richard Foote's Oracle Blog

Focusing Specifically On Oracle Indexes, Database Administration and Some Great Music

Hatem Mahmoud Oracle's blog

Just another Oracle blog : Database topics and techniques

Mohamed Houri’s Oracle Notes

Qui se conçoit bien s’énonce clairement

Oracle Diagnostician

Performance troubleshooting as exact science

Raheel's Blog

Things I have learnt as Oracle DBA

Coskan's Approach to Oracle

What I learned about Oracle

So Many Oracle Manuals, So Little Time

“Books to the ceiling, Books to the sky, My pile of books is a mile high. How I love them! How I need them! I'll have a long beard by the time I read them”—Lobel, Arnold. Whiskers and Rhymes. William Morrow & Co, 1988.

EU Careers info

Your career in the European Union

Carlos Sierra's Tools and Tips

Tools and Tips for Oracle Performance and SQL Tuning

Oracle Scratchpad

Just another Oracle weblog

OraStory

Dominic Brooks on Oracle Performance, Tuning, Data Quality & Sensible Design ... (Now with added Sets Appeal)