Mohamed Houri’s Oracle Notes

September 28, 2016

AWR and superfluous historical statistics

Filed under: ASH — hourim @ 6:11 pm

An active sql_id is subject to a capture and a load into ASH (gv$active_session_history). As long as this sql_id is present in ASH there is a 1/10 chance for it to be captured into AWR persisted historical tables as well. In addition to the ASH gv$active_session_history and its AWR alter-ego dba_hist_active_sess_history table, performance and tuning specialists are extensively using the dba_hist_sqlstat table into which snapshots of gv$sql are periodically sent by the MMON Lite process via the SGA-ASH circular buffer.

One of the queries I am using against this table is the following one which I am referring to it as histstats.sql (I think I have originally hijacked from the internet):

SELECT 
sn.snap_id,
plan_hash_value,
st.sql_profile,
executions_delta execs,
TRUNC(elapsed_time_delta/1e6/DECODE(executions_delta, 0, 1, executions_delta)) avg_etime,
  ROUND(disk_reads_delta    /DECODE(executions_delta,0,1, executions_delta),1) avg_pio,
  ROUND(buffer_gets_delta   /DECODE(executions_delta,0,1, executions_delta), 1) avg_lio ,
  ROUND(rows_processed_delta/DECODE(executions_delta,0, 1, executions_delta), 1) avg_rows
FROM 
  dba_hist_sqlstat st,
  dba_hist_snapshot sn
WHERE st.snap_id        = sn.snap_id
AND sql_id              = '&sql_id'
AND begin_interval_time >= to_date('&date','ddmmyyyy')
ORDER BY 1 ASC;

That’s said how would you read and interpret the following output of the above query taken from a running system for a particular sql_id?

SNAP_ID     PLAN_HASH_VALUE SQL_PROFILE                     EXECS AVG_ETIME AVG_LIO 
----------  --------------- ------------------------------ ----- ---------- ----------      
30838        726354567                                      0       7227    3945460      
30838        726354567                                      0       7227    3945460      
30839       4211030025      prf_2yhzvghb06vh4_4211030025    1          3      28715      
30839       4211030025      prf_2yhzvghb06vh4_4211030025    1          3      28715      
30839        726354567                                      0       7140    5219336      
30839        726354567                                      0       7140    5219336 
30840        726354567                                      0       7203    9389840      
30840        726354567                                      0       7203    9389840      
30840       4211030025      prf_2yhzvghb06vh4_4211030025    0       2817    7831649      
30840       4211030025      prf_2yhzvghb06vh4_4211030025    0       2817    7831649      
30841        726354567                                      0       7192    5200201      
30841        726354567                                      0       7192    5200201      
30841       4211030025      prf_2yhzvghb06vh4_4211030025    0          0          0            
30841       4211030025      prf_2yhzvghb06vh4_4211030025    0          0          0            
30842       4211030025      prf_2yhzvghb06vh4_4211030025    0          0          0      
30842       4211030025      prf_2yhzvghb06vh4_4211030025    0          0          0      
30842        726354567                                      0       4956    3183667      
30842        726354567                                      0       4956    3183667  

Or, to make things less confused, how would you interpret the different rows of snapshot 30841 reproduced below?

SNAP_ID     PLAN_HASH_VALUE SQL_PROFILE                     EXECS AVG_ETIME AVG_LIO 
----------  --------------- ------------------------------ ----- ---------- ----------      
30841        726354567                                      0       7192    5200201      
30841        726354567                                      0       7192    5200201      
30841       4211030025      prf_2yhzvghb06vh4_4211030025    0          0          0            
30841       4211030025      prf_2yhzvghb06vh4_4211030025    0          0          0            

What do those zeros values in the line with a not-null SQL Profile actually mean?

The answer to this question resides in the way Oracle dumps statistics of a SQL statement from standard memory into AWR tables. When an active sql_id having multiple child cursors in gv$sql, the MMON Lite process will (if the sql_id qualify for the capture of course) dump into the dba_hist_sqlstat view statistics of all corresponding distinct child cursors including those not used by the active sql_id at the capture time (even the statistics of non-shareable cursors are dumped into this view provided these cursors are present in gv$sql at the capture moment).

For example in the above list you can see that we have 4 execution statistics at snapshot 30841 of which the two first ones are actually using effectively the plan_hash_value 726354567. According to their elapsed time (7192) these executions did span multiple snapshots (this is a serial execution plan by the way). But the remaining lines, those showing a SQL Profile, are in fact superfluous and confusing. They have been captured only because they correspond to the presence of a second shareable child cursor (plan_hash_value 4211030025) in gv$sql at the moment of the capture (snapshot 30841). One of the indication that they are superfluous is the zeros statistics in their average logical I/O, physical I/O, executions and number of rows processed.

Now that the scene has been set we need a reproducible example. For that purpose any SQL statement having multiple child cursors in gv$sql will do the trick. An easy way of engineering such a case is to use Adaptive Cursor Sharing (in 12.1.0.2.0) among other different scenarios:

The model

INSERT INTO t1
SELECT level n1 ,
  CASE
    WHEN level = 1
    THEN 1
    WHEN level > 1
    AND level <= 101
    THEN 100
    WHEN level > 101
    AND level <= 1101
    THEN 1000
    WHEN level > 10001
    AND level <= 11000
    THEN 10000
    ELSE 1000000
  END n2
FROM dual
  CONNECT BY level < 1200150;
  
CREATE INDEX t1_i1 ON t1(n2);

BEGIN
  dbms_stats.gather_table_stats 
         (user 
         ,'t1' 
         ,method_opt => 'for all columns size auto'
         ,cascade => true 
         ,estimate_percent => dbms_stats.auto_sample_size 
         );
END;
/ 

Generating multiple child cursors using ACS

var ln2 number;
exec :ln2 := 100;

SQL> select count(1) FROm t1 WHERE n2 <= :ln2;

  COUNT(1)
----------
       101

exec :ln2 := 1000000
SQL> select count(1) FROm t1 WHERE n2 <= :ln2;

  COUNT(1)
----------
   1200149

SQL_ID  9sp9wvczrvpty, child number 0
-------------------------------------
select count(1) FROm t1 WHERE n2 <= :ln2

Plan hash value: 2432955788

---------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes |
---------------------------------------------------
|   0 | SELECT STATEMENT  |       |       |       |
|   1 |  SORT AGGREGATE   |       |     1 |     3 |
|*  2 |   INDEX RANGE SCAN| T1_I1 |   218 |   654 |
---------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("N2"<=:LN2)

In order for the above sql_id to be bind aware and get a new optimal execution plan at each execution, it needs to be executed once again (more details can be found here):

SQL> select count(1) FROm t1 WHERE n2 <= :ln2;

  COUNT(1)
----------
   1200149

SQL_ID  9sp9wvczrvpty, child number 1
-------------------------------------
select count(1) FROm t1 WHERE n2 <= :ln2

Plan hash value: 3724264953
----------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |
----------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |
|   1 |  SORT AGGREGATE    |      |     1 |     3 |
|*  2 |   TABLE ACCESS FULL| T1   |  1191K|  3489K|
----------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("N2"<=:LN2)

Notice now that a new child cursor n°1 has been generated while the existing child cursor n° 0 switched to a non-shareable status. In order to wake it up, we need to run again the same query at the corresponding bind variable value:

exec :ln2 := 100;

select count(1) FROm t1 WHERE n2 <= :ln2;

SQL_ID  9sp9wvczrvpty, child number 2
-------------------------------------
select count(1) FROm t1 WHERE n2 <= :ln2

Plan hash value: 2432955788
---------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes |
---------------------------------------------------
|   0 | SELECT STATEMENT  |       |       |       |
|   1 |  SORT AGGREGATE   |       |     1 |     3 |
|*  2 |   INDEX RANGE SCAN| T1_I1 |   218 |   654 |
---------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 – access("N2"<=:LN2)

Finally, we have achieved the initial goal of having multiple execution plans in gv$sql for the above SQL statement:

SELECT 
  sql_id ,
  child_number ,
  first_load_time ,
  last_load_time ,
  plan_hash_value ,
  executions execs ,
  rows_processed ,
  trunc(buffer_gets/executions) lio ,
  is_shareable
FROM gv$sql
WHERE sql_id = '9sp9wvczrvpty' 
--and is_shareable = 'Y'    
order by last_load_time desc; 

SQL_ID        CHILD_NUMBER FIRST_LOAD_TIME     LAST_LOAD_TIME      PLAN_HASH_VALUE  EXECS ROWS_PROCESSED        LIO I
------------- ------------ ------------------- ------------------- --------------- ------ -------------- ---------- -
9sp9wvczrvpty            0 2016-09-28/07:19:21 2016-09-28/07:19:21      2432955788      2              2       1225 N
9sp9wvczrvpty            1 2016-09-28/07:19:21 2016-09-28/07:20:44      3724264953      1              1       2222 Y
9sp9wvczrvpty            2 2016-09-28/07:19:21 2016-09-28/07:22:03      2432955788      1              1          3 Y

Notice that the sql_id (9sp9wvczrvpty) has three child cursors of which child cursor 0 is not shareable anymore. There is still nothing sent into AWR table as shown below:

SQL>  @histstats
Enter value for sql_id: 9sp9wvczrvpty
Enter value for date: 27092016

no rows selected

Forcing AWR to dump SQL statistics
In order to make Oracle dumping statistics of this SQL statement we are going to force a manual AWR snapshot capture by means of the following package call:

SQL> exec dbms_workload_repository.create_snapshot;  

The call will create a snapshot and dump the above content of gv$sql into dba_hist_sqlstat table as shown below:

SQL>  @histstats
Enter value for sql_id: 9sp9wvczrvpty
Enter value for date: 27092016

   SNAP_ID SQL_ID        PLAN_HASH_VALUE SQL_PROFILE                     EXECS  AVG_ETIME    AVG_LIO  AVG_PIO  AVG_ROWS
---------- ------------- --------------- ------------------------------ ------ ---------- ---------- -------- ---------
      4305 9sp9wvczrvpty      2432955788                                     3          0        817        0         1
      4305 9sp9wvczrvpty      3724264953                                     1          0       2222        0         1

That what was clearly expected: Oracle has captured into dba_hist_sqlstat table 4 executions at plan_hash_value 2432955788 and 1 execution at plan_hash_value 3724264953.

Now that we have multiple child cursors present in gv$sql, if AWR is to dump the sql_id it will then dump all its child cursor even those not used during the captured time. For example if I run again the same query with bind variable (ln2= 1000000) Oracle will use the plan with plan_hash_value 3724264953; but it will, nevertheless, dump statistics of the other non-used plan_hash_value into dba_hist_sqlstat as well.

exec :ln2 := 1000000

select count(1) FROm t1 WHERE n2 <= :ln2;

exec dbms_workload_repository.create_snapshot;

SQL> @histstats
Enter value for sql_id: 9sp9wvczrvpty
Enter value for date: 27092016

SNAP_ID SQL_ID        PLAN_HASH_VALUE SQL_PROFILE    EXECS  AVG_ETIME    AVG_LIO  AVG_PIO  AVG_ROWS
----- ------------- --------------- ------------- ------ ---------- ---------- -------- ---------
 4305 9sp9wvczrvpty      2432955788                    3          0        817        0         1
 4305 9sp9wvczrvpty      3724264953                    1          0       2222        0         1
 
 4306 9sp9wvczrvpty      2432955788                    0          0          0        0         0
 4306 9sp9wvczrvpty      3724264953                    1          0       2222        0         1

That’s it. Notice how, as expected, Oracle has dumped two rows for the new snapshot 4306 of which the first one with plan_hash_value (2432955788) is superfluous and has been captured in AWR tables only because it was present in gv$sql at the snapshot capture time.

Now that we know that Oracle is dumping all plan_hash_value present in gv$sql at the snapshot capture time even those not used during this snapshot, instead of having the superfluous lines appearing in the historical execution statistics I have added a minor where clause to the above extensively used script to obtain this:

SELECT
 *
FROM
 (SELECT
   sn.snap_id
   ,plan_hash_value
   ,sql_profile
   ,executions_delta execs
   ,trunc(elapsed_time_delta/1e6/decode(executions_delta, 0, 1, executions_delta)) avg_etime
   ,round(disk_reads_delta/decode(executions_delta,0,1, executions_delta),1) avg_pio
   ,round(buffer_gets_delta/decode(executions_delta,0,1, executions_delta), 1) avg_lio
   ,round(px_servers_execs_delta/decode(executions_delta,0,1, executions_delta), 1) avg_px
   ,round(rows_processed_delta/decode(executions_delta,0, 1, executions_delta), 1) avg_rows
  FROM
     dba_hist_sqlstat st,
     dba_hist_snapshot sn
  WHERE st.snap_id = sn.snap_id
  AND sql_id = '&sql_id'
  AND begin_interval_time > to_date('&from_date','ddmmyyyy')
 )
 WHERE avg_lio != 0 – added clause
 ORDER by 1 asc;
SQL> @histstats2
Enter value for sql_id: 9sp9wvczrvpty
Enter value for from_date: 27092016

SNAP_ID PLAN_HASH_VALUE SQL_PROFILE  EXECS  AVG_ETIME  AVG_PIO    AVG_LIO     AVG_PX  AVG_ROWS
------- --------------- ----------- ------ ---------- -------- ---------- ---------- ---------
   4305      2432955788                  3          0        0      817.7          0         1
   4305      3724264953                  1          0        0       2222          0         1

   4306      3724264953                  1          0        0       2222          0         1

The above added where clause excludes SQL statements with 0 logical I/O.

But wait a moment!!!

This might, at the same time, exclude vital information from the historical statistics. For example the following situation will not consume any logical I/O but can spent a lot time locked waiting for a resource to be released:

SQL-1 > select * from t1 where rownum <=  3 for update;

        N1         N2
---------- ----------
      1825    1000000
      1826    1000000
      1827    1000000

SQL-2> lock table t1 in exclusive mode;  

The second session will obviously hang. Wait a couple of seconds, go back to session , issue a rollback and watch out the new situation of session n°2

SQL-1> rollback;

SQL-2> lock table t1 in exclusive mode;

Table(s) Locked.

All what remains to do is to take a manual snapshot and check out how many logical I/O have been consumed by the SQL statement which has tried to lock the table in exclusive mode:

SQL> exec dbms_workload_repository.create_snapshot;

select sql_id
from gv$sql where sql_text like '%lock table t1%'
and sql_text not like '%v$sql%';

SQL_ID
-------------
a9nb52un8wqf4

SQL> @histstats
Enter value for sql_id: a9nb52un8wqf4
Enter value for date: 27092016

SNAP_ID SQL_ID        PLAN_HASH_VALUE SQL_PROFILE  EXECS  AVG_ETIME    AVG_LIO  AVG_PIO  AVG_ROWS
------- ------------- --------------- ----------- ------ ---------- ---------- -------- ---------
   4308 a9nb52un8wqf4               0                  1        159          0        0         0

That’s how a SQL can take a long time to complete while consuming 0 Logical I/O. This is why I have added a second extra where clause to the histstats2 script as shown below:

SELECT 
*
FROM
  (SELECT
    sn.snap_id ,
    plan_hash_value ,
    executions_delta execs ,
TRUNC(elapsed_time_delta/1e6/DECODE(executions_delta, 0, 1, executions_delta)) avg_etime ,
    ROUND(disk_reads_delta      /DECODE(executions_delta,0,1, executions_delta),1) avg_pio ,
    ROUND(buffer_gets_delta     /DECODE(executions_delta,0,1, executions_delta), 1) avg_lio ,
    ROUND(px_servers_execs_delta/DECODE(executions_delta,0,1, executions_delta), 1) avg_px ,
    ROUND(rows_processed_delta  /DECODE(executions_delta,0, 1, executions_delta), 1) avg_rows
  FROM 
    dba_hist_sqlstat st,
    dba_hist_snapshot sn
  WHERE st.snap_id        = sn.snap_id
  AND sql_id              = '&sql_id'
  AND begin_interval_time > to_date('&from_date','ddmmyyyy')
  )
WHERE avg_lio != 0 
     OR (avg_lio   =0 AND avg_etime > 0) 
 ORDER by 1 asc;  
     

This new script (histats3.sql), when applied to the two sql_id we have investigated in this post gives respectively:

SQL> @HistStats3
Enter value for sql_id: 9sp9wvczrvpty
Enter value for from_date: 27092016

   SNAP_ID PLAN_HASH_VALUE  EXECS  AVG_ETIME  AVG_PIO    AVG_LIO     AVG_PX  AVG_ROWS
---------- --------------- ------ ---------- -------- ---------- ---------- ---------
      4305      2432955788      3          0        0      817.7          0         1
      4305      3724264953      1          0        0       2222          0         1
      4306      3724264953      1          0        0       2222          0         1

SQL> @HistStats3
Enter value for sql_id: a9nb52un8wqf4
Enter value for from_date: 27092016

   SNAP_ID PLAN_HASH_VALUE  EXECS  AVG_ETIME  AVG_PIO    AVG_LIO     AVG_PX  AVG_ROWS
---------- --------------- ------ ---------- -------- ---------- ---------- ---------
      4308               0      1        159        0          0          0         0

It goes without saying that I will be very grateful to have feedback about the added where clause. I might have neglected a situation in which both Logical I/O and elapsed time are not greater than 0 and that are important to appear in the historical statistics.

Last but not least you might have already pointed out the apparition of the AVG_PX column in the above output? That will concern the next article about ASH-AWR statistics for parallel queries. Stay tuned I have a nice example from a running system to share with you and where this column reveals to be very helpful. I may also show a version of the script which includes the end_of_fetch_count column which reveals also to be a valuable information to exploit when trying to understand average elapsed time of queries coming from a web-service and where a time out has been implemented.

September 27, 2016

TEMP Table transformation and PQ_SLAVE_mismatch

Filed under: Oracle — hourim @ 7:20 pm

A SQL Profiled query opted for a different execution plan despite the Note at the bottom of its execution plan indicating that a SQL Profile has been used. The new plan makes the query failing with the classical parallel query error due to a lack of TEMP space following a massive parallel broadcast distribution

A SQL Profiled query opted for a different execution plan despite the Note at the bottom of its execution plan indicating that a SQL Profile has been used. The new plan makes the query failing with the classical parallel query error due to a lack of TEMP space following a massive parallel broadcast distribution

ORA-12801: error signaled in parallel query server P013
ORA-01652: unable to extend temp segment by 128 in tablespace TEMP

Initially this query fails when it reached 32GB of TEMP space. When waked up, the on call DBA augmented the TEMP space but, unfortunately, the next run failed as well when it reached 137GB of TEMP space.

When it was my turn to investigate this issue, instead of continuing enlarging the TEMP tablespace, or changing the parallel distribution method, I decided to figure out why Oracle is refusing to use the SQL Profile and was compiling a new plan practically at each execution. Applying Tanel Poder nonshared script to the sql_id of the real world query gives this:

SQL> @nonshared 1b7g55gx40k79
Show why existing SQL child cursors were not reused (V$SQL_SHARED_CURSOR)...

INST_ID			              : 1
SQL_ID                        : gx28sa7z20btn
ADDRESS                       : 00007FF92115E830
CHILD_ADDRESS                 : 00007FF92115E6B0
CHILD_NUMBER                  : 1
REASON                        : <reason>PQ Slave mismatch(5)</reason>

-----------------
INST_ID			              : 1
SQL_ID                        : gx28sa7z20btn
ADDRESS                       : 00007FF92115E830
CHILD_ADDRESS                 : 00007FF92115E6B0
CHILD_NUMBER                  : 2
REASON                        : <reason>PQ Slave mismatch(5)</reason>
PQ_SLAVE_MISMATCH             :  Y

-----------------
INST_ID			              : 1
SQL_ID                        : gx28sa7z20btn
ADDRESS                       : 00007FF92115E830
CHILD_ADDRESS                 : 00007FF92115E6B0
CHILD_NUMBER                  : 3
REASON                        : <reason>PQ Slave mismatch(5)</reason>
PQ_SLAVE_MISMATCH             :  Y

-----------------
INST_ID			              : 1
SQL_ID                        : gx28sa7z20btn
ADDRESS                       : 00007FF92115E830
CHILD_ADDRESS                 : 00007FF92115E6B0
CHILD_NUMBER                  : 4
REASON                        : <reason>PQ Slave mismatch(5)</reason>
PQ_SLAVE_MISMATCH             :  Y
-----------------
INST_ID			              : 1
SQL_ID                        : gx28sa7z20btn
ADDRESS                       : 00007FF92115E830
CHILD_ADDRESS                 : 00007FF92115E6B0
CHILD_NUMBER                  : 5
REASON                        : <reason>PQ Slave mismatch(5)</reason>
PQ_SLAVE_MISMATCH             :  Y
-----------------

What does this PQ_SLAVE_MISMATCH non sharing reason mean?
Oracle defines it as

(Y|N) Top-level slave decides not to share cursor

This very short definition seems indicating that a parallel slave refused to share the execution plan of its Query Coordinator and decided to hard parse its proper execution plan even though they are both (the QC and the PX slaves) running in the same instance in an 11.2.0.4 release.

I spent a couple of minutes looking at the query trying to simplify it until I found the part of it causing the execution plan mismatch. Having got a clue of what is happening in this real life query I engineered the following model with which I have reproduced the same behaviour in 11.2.0.4 and 12.1.0.1.0. Look at the following setup (where you will recognize one of the Jonathan Lewis table scripts):

select banner from gv$version where rownum=1;

BANNER
-----------------------------------------------------------------------------
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production

create table t1
as
 with generator as (
      select  --+ materialize
	     rownum id
	  from dual
	  connect by level <=1000
)
select 
    trunc((rownum -1)/2) product_id
   ,mod(rownum-1,2)      flag1
   ,mod(rownum-1,3)      flag2
   ,rownum               n1
   ,lpad(rownum,30)      v1
from
    generator v1
   ,generator v2
 where rownum <= 1e6;
 
create table t2
as 
select 

     level id
     ,date  '2012-06-07' + mod((level-1)*5,10)+ interval '5' minute start_date
     ,rpad( 'xx',10) padding
from 
   dual
connect by level <=1e6;
 
begin
 dbms_stats.gather_table_stats(user, 't1', method_opt => 'for all columns size 1');
 dbms_stats.gather_table_stats(user, 't2', method_opt => 'for all columns size 1');
 end;
 /

And here’s below the query I will be using in order to reproduce the non-sharing reason:

with got_my_dates 
 as (select /*+ materialize */
          id
		 ,padding
		 ,start_date
	 from 
	      t2
	 where 
	   start_date   > to_date('07/06/2012','dd/mm/yyyy')
	 and start_date <= to_date('10/06/2012','dd/mm/yyyy')
	 )
select
     /*+ parallel(4) */
     t1.*
	,cte.padding
	,cte.start_date
from
     t1
	,got_my_dates cte
where
    t1.product_id = cte.id
and t1.product_id <= 1e3;

A simple remark before starting the experiment. In the real life query the “with subquery” has been automatically materialized by Oracle because it is called two times in the main query. This is why the materialize hint I used above might not be absolutely necessary for the behaviour, I will be explaining hereinafter, to happen.
Let’s also confirm, before starting the experiment, that this query has not been previously parsed and as such is completely unknown:

SQL> select sql_id
    from gv$sql
    where sql_text like '%got_my_dates%'
    and sql_text not like '%v$sql%';

no rows selected

In the following I will execute the above query, get it execution plan, and show how many child cursor it has used during this very first execution:

SQL> – run query

SQL_ID  1b7g55gx40k79, child number 0
-------------------------------------

Plan hash value: 2708956082
-------------------------------------------------------------------------------------------------
| Id  | Operation                  | Name                       | Rows  | Bytes |    TQ  |IN-OUT|
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |                            |       |       |        |      |
|   1 |  TEMP TABLE TRANSFORMATION |                            |       |       |        |      |
|   2 |   PX COORDINATOR           |                            |       |       |        |      |
|   3 |    PX SEND QC (RANDOM)     | :TQ10000                   |   994K|    22M|  Q1,00 | P->S |
|   4 |     LOAD AS SELECT         |                            |       |       |  Q1,00 | PCWP |
|   5 |      PX BLOCK ITERATOR     |                            |   994K|    22M|  Q1,00 | PCWC |
|*  6 |       TABLE ACCESS FULL    | T2                         |   994K|    22M|  Q1,00 | PCWP |
|   7 |   PX COORDINATOR           |                            |       |       |        |      |
|   8 |    PX SEND QC (RANDOM)     | :TQ20001                   |  1830 |   134K|  Q2,01 | P->S |
|*  9 |     HASH JOIN              |                            |  1830 |   134K|  Q2,01 | PCWP |
|  10 |      JOIN FILTER CREATE    | :BF0000                    |  1830 | 84180 |  Q2,01 | PCWP |
|  11 |       PX RECEIVE           |                            |  1830 | 84180 |  Q2,01 | PCWP |
|  12 |        PX SEND BROADCAST   | :TQ20000                   |  1830 | 84180 |  Q2,00 | P->P |
|  13 |         PX BLOCK ITERATOR  |                            |  1830 | 84180 |  Q2,00 | PCWC |
|* 14 |          TABLE ACCESS FULL | T1                         |  1830 | 84180 |  Q2,00 | PCWP |
|* 15 |      VIEW                  |                            |   994K|    27M|  Q2,01 | PCWP |
|  16 |       JOIN FILTER USE      | :BF0000                    |   994K|    22M|  Q2,01 | PCWP |
|  17 |        PX BLOCK ITERATOR   |                            |   994K|    22M|  Q2,01 | PCWC |
|* 18 |         TABLE ACCESS FULL  | SYS_TEMP_0FD9D6633_140F243 |   994K|    22M|  Q2,01 | PCWP |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   6 - access(:Z>=:Z AND :Z<=:Z)
       filter(("START_DATE">TO_DATE(' 2012-06-07 00:00:00', 'syyyy-mm-dd hh24:mi:ss') 
              AND "START_DATE"<=TO_DATE(' 2012-06-10 00:00:00', 'syyyy-mm-dd hh24:mi:ss')))
   9 - access("T1"."PRODUCT_ID"="CTE"."ID")
  14 - access(:Z>=:Z AND :Z<=:Z)
       filter("T1"."PRODUCT_ID"<=1000)
  15 - filter("CTE"."ID"<=1000)
  18 - access(:Z>=:Z AND :Z<=:Z)
       filter(SYS_OP_BLOOM_FILTER(:BF0000,"C0"))
 
Note
-----
- Degree of Parallelism is 4 because of hint


SQL> @gv$sql
Enter value for sql_id: 1b7g55gx40k79

SQL_ID        PLAN_HASH_VALUE CHILD_NUMBER LAST_LOAD_TIME      EXECUTIONS
------------- --------------- ------------ ------------------- ----------
1b7g55gx40k79      2708956082            0 2016-09-26/07:30:23          1
1b7g55gx40k79      2708956082            1 2016-09-26/07:30:24          0

Notice how a completely new query produces, during its very first execution, two child cursors. Interestingly, the number of executions seems to indicate that Oracle used the child cursor 0 while the number of executions of the child cursor n°1 has not been incremented.

The reason for this double child cursors is:

SQL> @nonshared 1b7g55gx40k79
Show why existing SQL child cursors were not reused (V$SQL_SHARED_CURSOR)...

SQL_ID                        : 1b7g55gx40k79
ADDRESS                       : 00007FF92115E830
CHILD_ADDRESS                 : 00007FF92115E6B0
CHILD_NUMBER                  : 0
REASON                        : <reason>PQ Slave mismatch(5)</reason>
CON_ID                        : 1

-----------------

SQL_ID                        : 1b7g55gx40k79
ADDRESS                       : 00007FF92115E830
CHILD_ADDRESS                 : 00007FF9214A40E8
CHILD_NUMBER                  : 1
PQ_SLAVE_MISMATCH             : Y
REASON                        : <reason>PQ Slave mismatch(5)</reason>
CON_ID                        : 1
-----------------

So here we are: exactly at the same situation as the real world query.

Another remark which is worth to be mentioned here is that the execution plan of child cursor n°1 is exactly identical to the child cursor n° 0 shown above except this bizarre Note at the bottom:

SQL_ID  1b7g55gx40k79, child number 1
-------------------------------------
with got_my_dates  as (select /*+ materialize */           id    
,padding    ,start_date   from        t2   where     start_date   > 
to_date('07/06/2012','dd/mm/yyyy')   and start_date <= 
to_date('10/06/2012','dd/mm/yyyy')   ) select      /*+ parallel(4) */   
   t1.*  ,cte.padding  ,cte.start_date from      t1  ,got_my_dates cte 
where     t1.product_id = cte.id and t1.product_id <= 1e3
 
Note
-----
- automatic DOP: Computed Degree of Parallelism is 4

I don’t clearly get where this automatic DOP is coming from since the auto DOP parameter is not set in my personal laptop nor in the real world application:

SQL> show parameter parallel_degree_policy

PARAMETER_NAME                  TYPE        VALUE
------------------------------- ----------- -------
parallel_degree_policy          string      MANUAL

Last but not least, while the query was always requesting 2*DOP (8) parallel serves, Oracle managed, systematically, to give it 12 (and sometimes 16) parallel servers:

SQL> select
         sql_id
         ,process_name px_slave
     from gv$sql_monitor
     where sql_id = '1b7g55gx40k79'
     and sql_exec_id =16777216
     and  trunc(sql_exec_start) =to_date('27092016','ddmmyyyy')
     and sql_text is null
     order by 2 ;

SQL_ID        PX_SL
------------- -----
1b7g55gx40k79 p000
1b7g55gx40k79 p000
1b7g55gx40k79 p001
1b7g55gx40k79 p001
1b7g55gx40k79 p002
1b7g55gx40k79 p002
1b7g55gx40k79 p003
1b7g55gx40k79 p003
1b7g55gx40k79 p004
1b7g55gx40k79 p005
1b7g55gx40k79 p006
1b7g55gx40k79 p007

12 rows selected.


Global Information
------------------------------
 Status              :  DONE (ALL ROWS)     
 Instance ID         :  1                   
 Session             :  C##MHOURI (7:4495)  
 SQL ID              :  1b7g55gx40k79       
 SQL Execution ID    :  16777216            
 Execution Started   :  09/27/2016 07:29:34 
 First Refresh Time  :  09/27/2016 07:29:34 
 Last Refresh Time   :  09/27/2016 07:29:45 
 Duration            :  11s                 
 Module/Action       :  SQL*Plus/-          
 Service             :  orcl                
 Program             :  sqlplus.exe         
 Fetch Calls         :  3                

Parallel Execution Details (DOP=4 , Servers Allocated=12)

This information of Servers Allocated that are greater than the requested servers is not an innocent information and should always kept your attention.

That’s said, as you might have already guessed via the title of this post, the simplification of the real word query shows that the parallel slave is refusing to share the QC execution plan because of the materialisation of the Common Table Expression. This is why when I pre-empted the materialisation of the CTE via the /*+ inline */ hint the parallel slave shared the execution plan of its QC as shown below:

with got_my_dates
 as (select /*+ inline */
          id
            ,padding
            ,start_date
    from
         t2
    where
      start_date   > to_date('07/06/2012','dd/mm/yyyy')
    and start_date <= to_date('10/06/2012','dd/mm/yyyy')
    )
select
     /*+ parallel(4) */
     t1.*
   ,cte.padding
   ,cte.start_date
from
     t1
   ,got_my_dates cte
where
    t1.product_id = cte.id
and t1.product_id <= 1e3;

1000 rows selected.

SQL> start xpsimp

SQL_ID  4h1qa708b9p3j, child number 0
-------------------------------------
Plan hash value: 2637578939
--------------------------------------------------------------------------------
| Id  | Operation                   | Name     | Rows  | Bytes |    TQ  |IN-OUT|
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |       |       |        |      |
|   1 |  PX COORDINATOR             |          |       |       |        |      |
|   2 |   PX SEND QC (RANDOM)       | :TQ10002 |   681 | 47670 |  Q1,02 | P->S |
|*  3 |    HASH JOIN BUFFERED       |          |   681 | 47670 |  Q1,02 | PCWP |
|   4 |     JOIN FILTER CREATE      | :BF0000  |   681 | 16344 |  Q1,02 | PCWP |
|   5 |      PX RECEIVE             |          |   681 | 16344 |  Q1,02 | PCWP |
|   6 |       PX SEND HYBRID HASH   | :TQ10000 |   681 | 16344 |  Q1,00 | P->P |
|   7 |        STATISTICS COLLECTOR |          |       |       |  Q1,00 | PCWC |
|   8 |         PX BLOCK ITERATOR   |          |   681 | 16344 |  Q1,00 | PCWC |
|*  9 |          TABLE ACCESS FULL  | T2       |   681 | 16344 |  Q1,00 | PCWP |
|  10 |     PX RECEIVE              |          |  1830 | 84180 |  Q1,02 | PCWP |
|  11 |      PX SEND HYBRID HASH    | :TQ10001 |  1830 | 84180 |  Q1,01 | P->P |
|  12 |       JOIN FILTER USE       | :BF0000  |  1830 | 84180 |  Q1,01 | PCWP |
|  13 |        PX BLOCK ITERATOR    |          |  1830 | 84180 |  Q1,01 | PCWC |
|* 14 |         TABLE ACCESS FULL   | T1       |  1830 | 84180 |  Q1,01 | PCWP |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."PRODUCT_ID"="ID")
   9 - access(:Z>=:Z AND :Z<=:Z)
       filter(("ID"<=1000 AND "START_DATE">TO_DATE(' 2012-06-07 00:00:00', 'syyyy-mm-dd
               hh24:mi:ss') 
       AND  "START_DATE"<=TO_DATE(' 2012-06-10 00:00:00', 'syyyy-mm-dd hh24:mi:ss')))
  14 - access(:Z>=:Z AND :Z<=:Z)
       filter(("T1"."PRODUCT_ID"<=1000 AND
       SYS_OP_BLOOM_FILTER(:BF0000,"T1"."PRODUCT_ID")))
 
Note
-----
   - Degree of Parallelism is 4 because of hint
 

Notice below how, now that the CTE is not materialized, the query is using a single child cursor :

SQL> @gv$sql2
Enter value for sql_id: 4h1qa708b9p3j

SQL_ID        PLAN_HASH_VALUE CHILD_NUMBER LAST_LOAD_TIME      EXECUTIONS END_OF_FETCH_COUNT
------------- --------------- ------------ ------------------- ---------- ------------------
4h1qa708b9p3j      2637578939            0 2016-09-27/18:00:54          1                  1

SQL> @nonshared 4h1qa708b9p3j
Show why existing SQL child cursors were not reused (V$SQL_SHARED_CURSOR)...

SQL_ID                        : 4h1qa708b9p3j
ADDRESS                       : 00007FF921744A58
CHILD_ADDRESS                 : 00007FF921744698
CHILD_NUMBER                  : 0
REASON                        :
CON_ID                        : 1
-----------------

And spot as well that the non materialization of the CTE is so that the number of requested parallel server (2*DOP) equals the number of allocated servers as shown below :

SQL> select
        sql_id
       ,process_name px_slave
     from gv$sql_monitor
     where sql_id = '4h1qa708b9p3j'
     and sql_exec_id =16777216
     and trunc(sql_exec_start) = trunc(sysdate)
     and sql_text is null
     order by 2 ;

SQL_ID        PX_SL
------------- -----
4h1qa708b9p3j p000
4h1qa708b9p3j p001
4h1qa708b9p3j p002
4h1qa708b9p3j p003
4h1qa708b9p3j p004
4h1qa708b9p3j p005
4h1qa708b9p3j p006
4h1qa708b9p3j p007

8 rows selected.

Global Information
------------------------------
 Status              :  DONE (ALL ROWS)     
 Instance ID         :  1                   
 Session             :  C##MHOURI (7:4495)  
 SQL ID              :  4h1qa708b9p3j       
 SQL Execution ID    :  16777216            
 Execution Started   :  09/27/2016 18:00:54 
 First Refresh Time  :  09/27/2016 18:00:54 
 Last Refresh Time   :  09/27/2016 18:00:57 
 Duration            :  3s                  
 Module/Action       :  SQL*Plus/-          
 Service             :  orcl                
 Program             :  sqlplus.exe         
 Fetch Calls         :  3                 

Parallel Execution Details (DOP=4 , Servers Allocated=8)

And for those who can’t deal without execution plan here’s below the new execution plan

Plan hash value: 2637578939
--------------------------------------------------------------------------------
| Id  | Operation                   | Name     | Rows  | Bytes |    TQ  |IN-OUT|
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |       |       |        |      |
|   1 |  PX COORDINATOR             |          |       |       |        |      |
|   2 |   PX SEND QC (RANDOM)       | :TQ10002 |   738 | 51660 |  Q1,02 | P->S |
|*  3 |    HASH JOIN BUFFERED       |          |   738 | 51660 |  Q1,02 | PCWP |
|   4 |     JOIN FILTER CREATE      | :BF0000  |   737 | 17688 |  Q1,02 | PCWP |
|   5 |      PX RECEIVE             |          |   737 | 17688 |  Q1,02 | PCWP |
|   6 |       PX SEND HYBRID HASH   | :TQ10000 |   737 | 17688 |  Q1,00 | P->P |
|   7 |        STATISTICS COLLECTOR |          |       |       |  Q1,00 | PCWC |
|   8 |         PX BLOCK ITERATOR   |          |   737 | 17688 |  Q1,00 | PCWC |
|*  9 |          TABLE ACCESS FULL  | T2       |   737 | 17688 |  Q1,00 | PCWP |
|  10 |     PX RECEIVE              |          |  1986 | 91356 |  Q1,02 | PCWP |
|  11 |      PX SEND HYBRID HASH    | :TQ10001 |  1986 | 91356 |  Q1,01 | P->P |
|  12 |       JOIN FILTER USE       | :BF0000  |  1986 | 91356 |  Q1,01 | PCWP |
|  13 |        PX BLOCK ITERATOR    |          |  1986 | 91356 |  Q1,01 | PCWC |
|* 14 |         TABLE ACCESS FULL   | T1       |  1986 | 91356 |  Q1,01 | PCWP |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."PRODUCT_ID"="ID")
   9 - access(:Z>=:Z AND :Z<=:Z)
       filter(("ID"<=1000 AND "START_DATE">TO_DATE(' 2012-06-07 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND 
              "START_DATE"<=TO_DATE(' 2012-06-10 00:00:00', 'syyyy-mm-dd hh24:mi:ss')))
  14 - access(:Z>=:Z AND :Z<=:Z)
       filter(("T1"."PRODUCT_ID"<=1000 AND SYS_OP_BLOOM_FILTER(:BF0000,"T1"."PRODUCT_ID")))
 
Note
-----
- Degree of Parallelism is 4 because of hint

While the initial execution plan uses two DFO trees, the new one managed to use only a single DFO trees with 3 DFOs. Having multiple parallelisers (or DFOs) might sometimes create issues like in this current case of TEMP TABLE transformation and PQ_SLAVE_MISMATCH.

BOTTOM LINE

Even though this issue doesn’t reproduce in 12.0.1.2, for previous Oracle releases, watch out your parallel queries using a temp table transformation. You might be confronted to an execution plan instability due to the PQ_SLAVE_MISMATCH non-sharing reason where a parallel slave refuses to share the execution plan of its query coordinator. Bear in mind as well that in such situation even a SQL Profile will not succeed to definitely guarantee the same execution plan at each execution.

Create a free website or blog at WordPress.com.

Tony's Oracle Tips

Tony Hasler's light hearted approach to learning about Oracle

Richard Foote's Oracle Blog

Focusing Specifically On Oracle Indexes, Database Administration and Some Great Music

Hatem Mahmoud's blog

Just another blog : Databases, Linux and other stuffs

Mohamed Houri’s Oracle Notes

Qui se conçoit bien s’énonce clairement

Oracle Diagnostician

Performance troubleshooting as exact science

Raheel's Blog

Things I have learnt as Oracle DBA

Coskan's Approach to Oracle

What I learned about Oracle

So Many Oracle Manuals, So Little Time

“Books to the ceiling, Books to the sky, My pile of books is a mile high. How I love them! How I need them! I'll have a long beard by the time I read them”—Lobel, Arnold. Whiskers and Rhymes. William Morrow & Co, 1988.

Carlos Sierra's Tools and Tips

Tools and Tips for Oracle Performance and SQL Tuning

Oracle Scratchpad

Just another Oracle weblog

OraStory

Dominic Brooks on Oracle Performance, Tuning, Data Quality & Sensible Design ... (Now with added Sets Appeal)