Mohamed Houri’s Oracle Notes

May 24, 2020

Why my hint has not been used?

Filed under: Hint,Oracle — hourim @ 12:20 pm

While waiting for Tanel Poder hacking session about sql hint invalidity which I encourage everyone to subscribe and to follow, here’re few particular cases where hints are not obeyed by the Oracle Cost Based Optimizer. It happens when the hint asks the CBO to do something which is either impossible or not yet implemented in its algorithm. When two hints are contradictory one of them will be ignored by the CBO to the detriment of the other one. We will see that the victim hint seems not to be cost based dependent but rather rule based. Let’s see this via a reproducible example

JPPD and HASH JOIN

To allow data filtering as early as possible, Oracle can use a JPPD transformation in order to apply a predicate from the outer query block (or main query block) into the contained non-mergeable query block. As such, a precise index range scan can be used and data are filtered much earlier.

Here’s a simple model with which I will illustrate this transformation:

SQL> select banner from v$version;

BANNER
----------------------------------------------------------------------
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 – Production

create table t1 as 
  select
      rownum n1
	 ,mod(rownum,5) n2
	 ,trunc( (rownum - 1 / 3) ) n3
  from
     dual
connect by level <= 1e2;

create table t2 as
  select
     rownum n1
    ,mod(rownum,3) n2
	,trunc( (rownum - 1 / 5) ) n3
from
    dual
 connect by level <= 1e3;

create table t3 as 
   select 
     rownum n1
	,lpad('x',4000,'x') vc
from dual
connect by level <= 100;

alter table t2 add constraint t2_pk primary key (n1);
alter table t3 add constraint t3_pk primary key (n1);

begin
  dbms_stats.gather_table_stats(user, 't1' );
  dbms_stats.gather_table_stats(user, 't2' );
  dbms_stats.gather_table_stats(user, 't3' );
end;
/

Let’s now execute the following query and get its corresponding execution plan

select 
   t1.n1, t1.n3
from 
   t1
  left join
   (select t2.n1 from t2, t3
    where t2.n1 = t3.n1
	) vw_1
on t1.n1 = vw_1.n1; 

Plan hash value: 3070139659
--------------------------------------------------------------
| Id  | Operation               | Name  | Rows  | Cost (%CPU)|
--------------------------------------------------------------
|   0 | SELECT STATEMENT        |       |       |     3 (100)|
|   1 |  NESTED LOOPS OUTER     |       |   100 |     3   (0)|
|   2 |   TABLE ACCESS FULL     | T1    |   100 |     3   (0)|
|   3 |   VIEW PUSHED PREDICATE |       |     1 |     0   (0)|  --> JPPD used
|   4 |    NESTED LOOPS         |       |     1 |     0   (0)|
|*  5 |     INDEX UNIQUE SCAN   | T3_PK |     1 |     0   (0)|
|*  6 |     INDEX UNIQUE SCAN   | T2_PK |     1 |     0   (0)|
--------------------------------------------------------------
Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('19.1.0')
      DB_VERSION('19.1.0')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$639F1A6F")
      PUSH_PRED(@"SEL$776AA54E" "VW_1"@"SEL$1" 1)
      OUTLINE_LEAF(@"SEL$776AA54E")
      MERGE(@"SEL$8812AA4E" &gt;"SEL$E8571221")
      OUTLINE(@"SEL$2")
      OUTLINE(@"SEL$E8571221")
      ANSI_REARCH(@"SEL$3")
      OUTLINE(@"SEL$8812AA4E")
      ANSI_REARCH(@"SEL$1")
      OUTLINE(@"SEL$3")
      OUTLINE(@"SEL$1")
      FULL(@"SEL$776AA54E" "T1"@"SEL$1")
      NO_ACCESS(@"SEL$776AA54E" "VW_1"@"SEL$1")
      LEADING(@"SEL$776AA54E" "T1"@"SEL$1" "VW_1"@"SEL$1")
      USE_NL(@"SEL$776AA54E" "VW_1"@"SEL$1")
      INDEX(@"SEL$639F1A6F" "T3"@"SEL$2" ("T3"."N1"))
      INDEX(@"SEL$639F1A6F" "T2"@"SEL$2" ("T2"."N1"))
      LEADING(@"SEL$639F1A6F" "T3"@"SEL$2" "T2"@"SEL$2")
      USE_NL(@"SEL$639F1A6F" "T2"@"SEL$2")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access("T3"."N1"="T1"."N1")
   6 - access("T2"."N1"="T1"."N1")

Notice at operation Id n°5 that the predicate

on t1.n1 = vw_1.n1; 

Has been pushed inside the view vw_1 to become

("T3"."N1"="T1"."N1")

This is simply what a JPPD transformation is.

Let’s now take the NESTED LOOPS hint from the above execution plan outline

USE_NL(@"SEL$776AA54E" "VW_1"@"SEL$1")

change it into a HASH JOIN

USE_HASH(@"SEL$776AA54E" "VW_1"@"SEL$1")

and use it in the above query as shown below:

select /*+ 
         USE_HASH(@"SEL$776AA54E" "VW_1"@"SEL$1")
	  */
   t1.n1, t1.n3
from 
   t1
  left join
   (select t2.n1 from t2, t3
    where t2.n1 = t3.n1
	) vw_1
on t1.n1 = vw_1.n1; 

----------------------------------------------------------
| Id  | Operation            | Name  | Rows  |Cost (%CPU)|
----------------------------------------------------------
|   0 | SELECT STATEMENT     |       |       |    4 (100)|
|*  1 |  HASH JOIN OUTER     |       |   100 |    4   (0)| --> HASH JOIN used
|   2 |   TABLE ACCESS FULL  | T1    |   100 |    3   (0)|
|   3 |   VIEW               |       |   100 |    1   (0)|
|   4 |    NESTED LOOPS      |       |   100 |    1   (0)|
|   5 |     INDEX FULL SCAN  | T3_PK |   100 |    1   (0)|
|*  6 |     INDEX UNIQUE SCAN| T2_PK |     1 |    0   (0)|
----------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."N1"="VW_1"."N1")
   6 - access("T2"."N1"="T3"."N1")

Notice that, while the HASH JOIN hint has been obeyed, the JPPD transformation has, however, not been used.

Let’s then force the JPPD transformation via the PUSH_PRED hint and see what happens

select /*+ 
         USE_HASH(@"SEL$776AA54E" "VW_1"@"SEL$1")
	 PUSH_PRED(@"SEL$776AA54E" "VW_1"@"SEL$1" 1)
	*/
   t1.n1, t1.n3
from 
   t1
  left join
   (select t2.n1 from t2, t3
    where t2.n1 = t3.n1
	) vw_1
on t1.n1 = vw_1.n1; 

-------------------------------------------------------------
| Id  | Operation               | Name  | Rows  |Cost (%CPU)|
-------------------------------------------------------------
|   0 | SELECT STATEMENT        |       |       |    3 (100)|
|   1 |  NESTED LOOPS OUTER     |       |   100 |    3   (0)| --> NESTED LOOPS vs HASH JOIN
|   2 |   TABLE ACCESS FULL     | T1    |   100 |    3   (0)|
|   3 |   VIEW PUSHED PREDICATE |       |     1 |    0   (0)| --> JPPD used
|   4 |    NESTED LOOPS         |       |     1 |    0   (0)|
|*  5 |     INDEX UNIQUE SCAN   | T3_PK |     1 |    0   (0)|
|*  6 |     INDEX UNIQUE SCAN   | T2_PK |     1 |    0   (0)|
-------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access("T3"."N1"="T1"."N1")
   6 - access("T2"."N1"="T1"."N1")

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 1 (U - Unused (1))
---------------------------------------------------------------------------
   1 -  SEL$776AA54E / VW_1@SEL$1
         U -  USE_HASH(@"SEL$776AA54E" "VW_1"@"SEL$1")   

Bizarrely, the HASH JOIN hint has, this time, not been used. This non-utilization of the HASH JOIN hint is also confirmed by the 19c Hint Report:

Total hints for statement: 1 (U - Unused (1)

Why?

Oracle decided to ignore the HASH JOIN hint in profit of the PUSH_PRED hint simply because JPPD transformation is not possible with HASH JOIN. When Oracle decides to follow the JPPD directive it knows that this will be possible only under a NESTED LOOPS join operation. Consequently it has to ignore the HASH JOIN hint.

The 10053 trace file is not very verbose in this case. Here’s below what we can link to the non-obeyed HASH JOIN hint from this trace file:

*********************************
Number of join permutations tried: 1
*********************************
Enumerating distribution method (advanced)
--- Distribution method for 
join between T1[T1](serial) and VW_1[from$_subquery$_002](serial); jm = 1; right side access path = TableScan
---- serial Hash-Join ---> NONE     

In the Adaptive plan section of the same trace file we can read that the HASH JOIN is not legal

AP: adaptive joins bypassed for table VW_1 @ SEL$776AA54E due to Hash join is not legal

Probably through this illegality Oracle is indicating that it is impossible to obey the HASH JOIN in presence of a PUSH PREDICATE hint.

It looks like, whatever the cost of the JPPD is, it will be always prefered to the HASH JOIN as the JPPD has certainly the advantage of filtering data much earlier in the query execution path.

JPPD and ROWNUM

I think that I have already explained why one should always consider using the rownum pseudo-column with careful attention. Here’re two rownum usage side effects that I am aware of:

  • Where rownum =1 might automatically switch to the buggy FIRST_ROWS mode
  •  Using a rownum stops view merging and JPPD transformation

Let’s examine the second case via the above query

select /*+ 
	  PUSH_PRED(@"SEL$776AA54E" "VW_1"@"SEL$1" 1)
	*/
   t1.n1, t1.n3
from 
   t1
  left join
   (select t2.n1 from t2, t3
    where t2.n1 = t3.n1
	) vw_1
on t1.n1 = vw_1.n1; 

--------------------------------------------------------------
| Id  | Operation               | Name  | Rows  | Cost (%CPU)|
--------------------------------------------------------------
|   0 | SELECT STATEMENT        |       |       |    11 (100)|
|   1 |  NESTED LOOPS OUTER     |       | 10000 |    11  (19)|
|   2 |   TABLE ACCESS FULL     | T1    | 10000 |     9   (0)|
|   3 |   VIEW PUSHED PREDICATE |       |     1 |     0   (0)| --> JPPD used
|   4 |    NESTED LOOPS         |       |     1 |     0   (0)|
|*  5 |     INDEX UNIQUE SCAN   | T3_PK |     1 |     0   (0)|
|*  6 |     INDEX UNIQUE SCAN   | T2_PK |     1 |     0   (0)|
--------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access("T3"."N1"="T1"."N1")
   6 - access("T2"."N1"="T1"."N1")
select /*+ 
	  PUSH_PRED(@"SEL$776AA54E" "VW_1"@"SEL$1" 1)
	*/
   t1.n1, t1.n3
from 
   t1
  left join
   (select t2.n1 from t2, t3
    where t2.n1 = t3.n1
    and rownum >1    --------------> spot this here
	) vw_1
on t1.n1 = vw_1.n1; 

-------------------------------------------------------------
| Id  | Operation              | Name  | Rows  | Cost (%CPU)|
-------------------------------------------------------------
|   0 | SELECT STATEMENT       |       |       |    10 (100)|
|*  1 |  HASH JOIN RIGHT OUTER |       | 10000 |    10   (0)|
|   2 |   VIEW                 |       |   100 |     1   (0)|
|   3 |    COUNT               |       |       |            |
|*  4 |     FILTER             |       |       |            |
|   5 |      NESTED LOOPS      |       |   100 |     1   (0)|
|   6 |       INDEX FULL SCAN  | T3_PK |   100 |     1   (0)|
|*  7 |       INDEX UNIQUE SCAN| T2_PK |     1 |     0   (0)|
|   8 |   TABLE ACCESS FULL    | T1    | 10000 |     9   (0)|
-------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."N1"="VW_1"."N1")
   4 - filter(ROWNUM>1)
   7 - access("T2"."N1"="T3"."N1")

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 1 (U - Unused (1))
---------------------------------------------------------------------------
   1 -  SEL$776AA54E / VW_1@SEL$1
         U -  PUSH_PRED(@"SEL$776AA54E" "VW_1"@"SEL$1" 1)

Notice how the use of the ROWNUM pseudo-column in the contained query block impeached the JPPD hint to be obeyed. Hopefully in this case the corresponding 10053 trace file shows clearly why the JPPD transformation (hint) has been ignored

---------------------
QUERY BLOCK SIGNATURE
---------------------
  signature (): qb_name=SEL$E8571221 nbfros=1 flg=0
    fro(0): flg=1 objn=0 hint_alias="from$_subquery$_005"@"SEL$3"

JPPD:     JPPD bypassed: View has unsupported constructs.
Check Basic Validity for Non-Union View for query block SEL$2 (#0)
JPPD:     JPPD bypassed: View contains a rownum reference.

JPPD:     JPPD bypassed: View contains a rownum reference. ------------> here
FPD: Considering simple filter push in query block SEL$2 (#0)
"T2"."N1"="T3"."N1" AND ROWNUM >1

Summary

There are many other situations like the ones exposed above out of which the SUBQUERY_PRUNING hint which forces a PARTITION RANGE SUBQUERY but only under a HASH JOIN operation or under a NESTED LOOPS but only when the partitioned table is on the right hand side of the join (inner row source). The combination of SUBQUERY_PRUNING and USE_NL hint would be impossible to be obeyed at the same time since this will force a recursive SQL ( SELECT distinct TBL$OR$IDX$PART$NUM) that would have been executed as many times as there are rows in the outer row source.

I may continue blogging about similar non-obeyed hints in a separate blog article to keep the current one about 5 minutes of read.

May 16, 2020

SQL*Net break/reset to client

Filed under: Oracle — hourim @ 6:10 pm

This is a draft I wrote in 2015. I don’t know why I have not published it by that time? probably I was looking for something to add in this context. Anyway, I decided to give it a rebirth right now  after almost 5 years of dormant time.

A SQL*Net break/reset to client wait event was popping out at a dramatic rate so that the client application ASH data was abnormally flushed out from memory. It was then vital to find what is causing this wait event to show up at this high rate. This article aims to explain the steps I have followed trying to figure out the sql_id at the origin of this wait event.

The first article, dealing with SQL*Net break/reset to client error, that pops up when you Google for it, is this one by Tanel Poder.  It explains clearly what this event represents and gives a reproducible example.  In the real life case I’ve faced  ASH was showing this:

SQL> select event, count(1)
    from gv$active_session_history
    where sample_time between to_date('10072015 11:00:00', 'ddmmyyyy hh24:mi:ss')
                      and     to_date('10072015 12:00:00', 'ddmmyyyy hh24:mi:ss')
    group by event
    order by 2 desc;

EVENT                                                              COUNT(1)
---------------------------------------------------------------- ----------
SQL*Net break/reset to client                                        164205
                                                                      10438
db file sequential read                                                2960
SQL*Net message from dblink                                            2499
direct path read                                                       1322
log file sync                                                           427
log file parallel write                                                 411
SQL*Net more data from client                                           183
control file sequential read                                             98
Backup: MML create a backup piece                                        82
…/…

Let’s try to find, naively, the sql_id behind this dramatic 164,205 SQL*Net break/reset to client recorded waits

SQL>  select sql_id, count(1)
      from gv$active_session_history
      where sample_time between to_date('10072015 11:00:00', 'ddmmyyyy hh24:mi:ss')
                        and     to_date('10072015 12:00:00', 'ddmmyyyy hh24:mi:ss')
      and event = 'SQL*Net break/reset to client'
      group by sql_id
      order by 2 desc;

SQL_ID          COUNT(1)
------------- ----------
                  164205 

Damn!! It is a NULL sql_id

What about the sessions experimenting this wait event

SQL> select session_id, count(1)
     from gv$active_session_history
     where sample_time between to_date('10072015 11:00:00', 'ddmmyyyy hh24:mi:ss')
                       and     to_date('10072015 12:00:00', 'ddmmyyyy hh24:mi:ss')
     and event = 'SQL*Net break/reset to client'
     group by session_id
     order by 2 desc;

SESSION_ID   COUNT(1)
---------- ----------
       151       3575
      2846       3575
       465       3575
      1791       3575
      2465       3575
       853       3575
      1216       3575
      1270       3575
      1397       3575
      2912       3575
        90       3575
       412       3575
      1155       3575
      1838       3575
      1957       3575
      2777       3575
      1038       3575 
      1290       3575
      1455       3575
      2648       3575
      2860       3575
      2988       3575
       344       3575
       533       3575
       337       3575
      2424       3575
      1028       3575
      1779       3575
      2047       3575
      2412       3575
       136       3575
       591       3575
       773       3575
       900       3575
       913       3575
      1335       3575
      1657       3575
      2101       3575
      2363       3575
       202       3575
      2226       3575
        16       3575
        79       3575
      1916       3575
      2362       3575
      1705       1429
       908       1181 

More than 45 sessions experimenting this wait event. Let’s then try to get the sql_id of one of those sessions:

SQL> select sql_id, count(1)
    from gv$active_session_history
    where sample_time between to_date('10072015 11:00:00', 'ddmmyyyy hh24:mi:ss')
                      and     to_date('10072015 12:00:00', 'ddmmyyyy hh24:mi:ss')
    and session_id = 151
    group by sql_id
    order by 2 desc;

SQL_ID          COUNT(1)
------------- ----------
                    3515 

Damn!! It is a NULL sql_id again.

Let’s now get the list of wait events the above session (151) was experimenting


SQL> select event, count(1)
     from gv$active_session_history
     where sample_time between to_date('10072015 11:00:00', 'ddmmyyyy hh24:mi:ss')
                       and     to_date('10072015 12:00:00', 'ddmmyyyy hh24:mi:ss')
     and session_id = 151
     group by event
     order by 2 desc;

EVENT                            COUNT(1)
-------------------------------- -------
SQL*Net break/reset to client    3453

I am going around a vicious circle! Isn’t it?

What if I try, now, to join those particular sessions with v$process view to see, at least, what their process are accomplishing

SQL> with got_my_sid
    as(
       select
            session_id
          , count(1)
       from
           v$active_session_history
       where sample_time between to_date('10072015 11:00:00', 'ddmmyyyy hh24:mi:ss')
                         and     to_date('10072015 12:00:00', 'ddmmyyyy hh24:mi:ss')
      and event = 'SQL*Net break/reset to client'
      group by session_id
      )
   select
          p.username
         ,p.tracefile
         ,p.pga_used_mem
         ,p.pga_alloc_mem
    from
         v$process p
        ,v$session s
   where
         p.addr=s.paddr
    and exists (select null
               from got_my_sid a
               where s.sid = a.session_id);

USERNAME     TRACEFILE                                                                        PGA_USED_MEM PGA_ALLOC_MEM
------------ -------------------------------------------------------------------------------- ------------ -------------
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_116986.trc               44210323      46756230
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_58127.trc                22468107      26308998
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_58151.trc                51969971      56979846
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_69733.trc                42309691      45969798
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_69747.trc               103456803     115241350
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_69759.trc                 5183995       8810886
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_58169.trc               122509563     126316934
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_94674.trc                 4810835      10711430
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_77524.trc                 4867723      10711430
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_77727.trc                48775851      57307526
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_77837.trc                 5697963       6517126
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_77962.trc                 4926947       7500166
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_76614.trc               142568163     170160518
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_76622.trc                10887843      12743046
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_76634.trc                13751019      16675206
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_76662.trc                17715523      18772358
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_76670.trc                 4852883       7631238
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_76676.trc                 5040315       6517126
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_76734.trc                20039043      22114694
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_76741.trc               110836667     127889798
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_76749.trc                29546107      33255814
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_76792.trc                 5336891       6975878
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_11735.trc               141510371     167014790
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_11741.trc                50941459      54882694
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_11759.trc                13313739      15102342
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_11772.trc                45210595      56390022
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_11784.trc                16259099      18641286
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_58289.trc                86902963     103838086
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_76815.trc                10668723      13005190
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_11808.trc                13386251      15036806
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_11844.trc                64225827      68252038
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_11856.trc                50413091      54620550
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_117006.trc               38196715      46952838
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_117012.trc               27739851      30241158
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_11862.trc               114249963     120418694
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_11868.trc                90697083      95973766
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_117042.trc               78140859      79982982
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_11892.trc               188237195     226324870
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_117078.trc              141404363     171209094
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_117092.trc              142334875     166883718
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_117098.trc               44335187      46887302
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_117112.trc               98170435     117469574
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_11898.trc                 5171715       7696774
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_11904.trc                 4970579       5534086
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_95446.trc               142625387     164524422
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_58609.trc                 5278147       7041414
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_58689.trc                 5316451       6910342 

Finally I have something to get my teeth into.

It looks like sessions that were suffering from the SQL*Net break/reset to client wait event have generated several trace files. And, hopefully, one of those trace files shows the following error:

ORA-04030: out of process memory when trying to allocate 824504 bytes (pga heap,kco buffer)
ORA-04030: out of process memory when trying to allocate 3032 bytes (kkoutlCreatePh,kkotrb : kkoabr) 

The same trace file shows, as well, the SQL Statement which has originated this very high number of  SQL*Net break/reset to client wait event

dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)
----- Current SQL Statement for this session (sql_id=d6jgtxn25741y) -----
select * from ( select alias.cols as skey87_0_, alias.skey as skey81_1_ 

This is how I drilled down to the root cause of this wait event.

 

 

May 10, 2020

Why my execution plan has not been shared: Part 6

Filed under: Oracle — hourim @ 9:57 am

In this sixth installment of the series of articles about the non-sharing reasons that prevent Oracle from using an existing execution plan, I am going to dig into a new supplementary reason which is Session Specific Cursor Session Mismatch. For the sake of completeness here’s below the list of already investigated non-sharing reasons:

I didn’t find any definition from the official Oracle documentation for the Session Specific Cursor non-sharing reason. So let’s start immediately by building a test case with which we will provoke the apparition of this non-sharing reason

SQL> select banner from v$version;

BANNER
----------------------------------------------------------------------
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production

SQL> select distinct sid from v$mystat;

       SID
----------
        16
		
SQL> create global temporary table t1 (owner varchar2(128), object_type varchar2(30)) on commit preserve rows;

Table created.

SQL> insert into t1 select owner, object_type from all_objects where owner ! = 'SYS';

19296 rows created.

SQL> commit;

SQL> col prefs format a20
SQL> select dbms_stats.get_prefs('GLOBAL_TEMP_TABLE_STATS') prefs from dual;

PREFS
--------------------
SESSION

SQL> exec dbms_stats.gather_table_stats(user, 't1');

Using session id n°16, I have created a simple global temporary table, inserted into it few rows and ended up by gathering statistics on this table using the default private SESSION preference. Simply put, statistics gathered under this preference are not propagated to all other sessions. This is why it is also known as PRIVATE statistic. The current session global temporary table (GTT) statistics are not shared with other sessions using the same GTT.

This is clearly shown via the following select


SQL> SELECT
        table_name,
        blocks,
        num_rows,
        scope
    FROM
        user_tab_statistics
    WHERE
        table_name = 'T1';

TABLE_NAME       BLOCKS   NUM_ROWS SCOPE
------------ ---------- ---------- -------
T1                                 SHARED
T1                   55      19296 SESSION

As you can see, the only available statistics are those that are local to the current session.

Let’s now execute a simple query and get its execution plan

SQL>select owner, count(1)
    from t1
    where object_type = 'TABLE'
    group by owner
    order by 2 desc;

OWNER                            COUNT(1)
------------------------------ ----------
MDSYS                                 151
SYSTEM                                131
ORDDATA                                90
XDB                                    56
CTXSYS                                 53
DVSYS                                  44
GSMADMIN_INTERNAL                      42
WMSYS                                  40
LBACSYS                                22
DBSNMP                                 20
OJVMSYS                                 6
C##MHOURI                               5
APPQOSSYS                               5
ORDSYS                                  4
DBSFWUSER                               3
OUTLN                                   3
OLAPSYS                                 2
AUDSYS                                  1

18 rows selected.

SQL> select * from table(dbms_xplan.display_cursor);

SQL_ID  c1j862cvqgh99, child number 0
-------------------------------------
select owner, count(1) from t1 where object_type = 'TABLE' group by
owner order by 2 desc

Plan hash value: 2808104874
----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |       |       |    19 (100)|          |
|   1 |  SORT ORDER BY      |      |    24 |   360 |    19  (11)| 00:00:01 |
|   2 |   HASH GROUP BY     |      |    24 |   360 |    19  (11)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL| T1   |   772 | 11580 |    17   (0)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("OBJECT_TYPE"='TABLE')

Note
-----
   - Global temporary table session private statistics used
   

The above Note at the bottom of the execution plan is clearly indicating that Oracle has used the session private statistics of the GTT.

To end up the setup I am going to run the same query but from a different session as shown below:


SQL> select distinct sid from v$mystat;

       SID
----------
       394
 
SQL> select count(1) from t1;

  COUNT(1)
----------
         0
		 
SQL> insert into t1 select owner, object_type from all_objects where owner  = 'CTXSYS';
	   		
SQL>select owner, count(1)
    from t1
    where object_type = 'TABLE'
    group by owner
    order by 2 desc;

OWNER            COUNT(1)
-------------- ----------
CTXSYS                 53

SQL> select * from table(dbms_xplan.display_cursor);

SQL_ID  c1j862cvqgh99, child number 1  --> new child cursor
-------------------------------------
select owner, count(1) from t1 where object_type = 'TABLE' group by
owner order by 2 desc

Plan hash value: 2808104874
----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |       |       |     4 (100)|          |
|   1 |  SORT ORDER BY      |      |    53 |  4399 |     4  (50)| 00:00:01 |
|   2 |   HASH GROUP BY     |      |    53 |  4399 |     4  (50)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL| T1   |    53 |  4399 |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("OBJECT_TYPE"='TABLE')

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

As you can see the same query has refused to share the previous execution plan (child cursor 0 from session 16) and has hard parsed a new child cursor n°1 (in session 394). This is simply due to the following reason:

SQL> @nonshared c1j862cvqgh99

Show why existing SQL child cursors were not reused (V$SQL_SHARED_CURSOR)...

SQL_ID                        : c1j862cvqgh99
ADDRESS                       : 00007FFCCD36B030
CHILD_ADDRESS                 : 00007FFCCD3698D8
CHILD_NUMBER                  : 0
REASON                        : <ChildNode><ChildNumber>0</ChildNumber><ID>46</ID>
                                <reason>Session Specific Cursor SessionMismatch(1)</reason>
								<size>7x4</size><session_mismatch>1</session_mismatch>
								<current_qcsesidx>394</current_qcsesidx>       --> here the current SID
								<current_qcsesser>58570</current_qcsesser><current_qcinst>1</current_qcinst>
								<stored_qcsesidx>16</stored_qcsesidx> --> the SID of the original child cursor
								<stored_qcsesser>58601</stored_qcsesser>
								<stored_qcinst>1</stored_qcinst></ChildNode>
CON_ID                        : 1
-----------------
SQL_ID                        : c1j862cvqgh99
ADDRESS                       : 00007FFCCD36B030
CHILD_ADDRESS                 : 00007FFCCD348328
CHILD_NUMBER                  : 1
REASON                        :
CON_ID                        : 1
-----------------

PL/SQL procedure successfully completed.

Conclusion

In light of what I have demonstrated above I can, finally, define this non-sharing reason as:

(Y|N) The session specific cursor environment does not match the existing child cursor

With the first session mismatch being:

  • SessionMismatch(1) –> points to GTT private statistics

March 7, 2020

SQL Monitor flaw

Filed under: SQL Monitor — hourim @ 8:13 am

Observe the following real-life real time SQL Monitor report and try to figure out why the Duration is not equal to the Elapsed time

 Global Information
------------------------------
 Status              :  DONE (ERROR)              
 Instance ID         :  3                         
 Session             :  XXXX (1016:28171) 
 SQL ID              :  94dp7vscw26sf             
 SQL Execution ID    :  50331648                  
 Execution Started   :  01/21/2020 13:50:26       
 First Refresh Time  :  01/21/2020 13:50:26       
 Last Refresh Time   :  01/21/2020 13:50:30       
 Duration            :  4s                        
 Module/Action       :  SQL*Plus/-                
 Service             :  YYYYY                   
 Program             :  sqlplus.exe               

Global Stats
===================================================================================
| Elapsed |   Cpu   |    IO    | Application | Cluster  | Buffer |    Offload     |
| Time(s) | Time(s) | Waits(s) |  Waits(s)   | Waits(s) |  Gets  | Returned Bytes |
===================================================================================
|    1981 |     249 |     1687 |        0.00 |       45 |     8M |           92MB |
===================================================================================

In the following series of articles about the SQL monitor Duration and Elapsed Time:

I’ve explained why we should always rely on the Duration information as it represents the wall clock time of the query. The Elapsed time doesn’t include time spent by Idle session fetching data. And, when a SQL statement runs parallelly, the Elapsed time includes execution time of both the Query Coordinator and all its parallel server slaves.

So, If I tell you that there is no parsing effect in the above query and that there are no fetch calls, you would normally infer that it has really spent 4 seconds to complete; right?

What if I tell you that this query took approximatively 1981 seconds? Will you believe me?

Indeed, this is a parallel query using DOP 8 which took about 1981 seconds and where the Duration (4 seconds) represents only the time taken by the Query Coordinator. This is due to a flaw I have already explained in this article but where I’ve not been so smart, at that time, to point out the Duration flaw as well.

DONE (ERROR)

In contrast to what the status is saying, this query completes without error. And that is the first clue that should always put you on the right path when trouble shooting performance issue via SQL monitor reports. When this 12cR2 SQL query has been handled to me for investigation I immediately launched it again and here’s below the different phases reported by the SQL Monitor:

Global Information
------------------------------
 Status              :  EXECUTING                 
 Instance ID         :  3                         
 Session             :  XXXX (1016:28171) 
 SQL ID              :  94dp7vscw26sf             
 SQL Execution ID    :  50331648                  
 Execution Started   :  01/21/2020 13:50:26       
 First Refresh Time  :  01/21/2020 13:50:26       
 Last Refresh Time   :  01/21/2020 13:50:30       
 Duration            :  119s                      
 Module/Action       :  SQL*Plus/-                
 Service             :  YYYYY
 Program             :  sqlplus.exe               

Global Stats
===================================================================================
| Elapsed |   Cpu   |    IO    | Application | Cluster  | Buffer |    Offload     |
| Time(s) | Time(s) | Waits(s) |  Waits(s)   | Waits(s) |  Gets  | Returned Bytes |
===================================================================================
|     126 |      43 |       82 |        0.00 |     1.98 |   459K |           84MB |
===================================================================================
Parallel Execution Details (DOP=8 , Servers Allocated=16)

../..

Global Information
------------------------------
 Status              :  EXECUTING                 
 Instance ID         :  3                         
 Session             :  XXXX (1016:28171) 
 SQL ID              :  94dp7vscw26sf             
 SQL Execution ID    :  50331648                  
 Execution Started   :  01/21/2020 13:50:26       
 First Refresh Time  :  01/21/2020 13:50:26       
 Last Refresh Time   :  01/21/2020 13:50:30       
 Duration            :  1026s                     
 Module/Action       :  SQL*Plus/-                
 Service             :  YYYYY
 Program             :  sqlplus.exe               

Global Stats
==========================================================================
| Elapsed |   Cpu   |    IO    | Application | Cluster  |    Offload     |
| Time(s) | Time(s) | Waits(s) |  Waits(s)   | Waits(s) | Returned Bytes |
==========================================================================
|    1049 |     146 |      879 |        0.00 |       24 |           88MB |
==========================================================================
Parallel Execution Details (DOP=8 , Servers Allocated=16)

So far so good right?

In fact, if you look closely there is already something odd? Isn’t it?

Have you spotted out that the Last Refresh Time column is not changing?

Anyway, while the query was still running I launched the following query to see how the workload is distributed among the different parallel server slaves

@ getTimeSqlmon.sql
col exec_start format a25
col process    format a10
col status     format a15
select
    -- sql_id
    sql_exec_id   
   ,to_char(sql_exec_start,'dd/mm/yyyy hh24:mi:ss') exec_start
   ,to_char(sysdate,'dd/mm/yyyy hh24:mi:ss') current_time   
   ,process_name process
   ,round((sysdate-sql_exec_start) * 60 * 24,2) "minutes"
   ,round(elapsed_time/1e6,3) "elaps(sec)"
   ,round(cpu_time/1e6,3) "cpu(sec)"
   ,sql_plan_hash_value   "plan hash value"
   ,status
from gv$sql_monitor
where
  sql_id = '&sql_id'
order by exec_start,action;

SQL> @GetTimeSQLMon2
Enter value for sql_id: 94dp7vscw26sf

SQL_EXEC_ID EXEC_START            PROCE    mm elaps(sec)   cpu(sec) SQL_PLAN_HASH_VALUE STATUS
----------- --------------------- ----- ----- ---------- ---------- ------------------- ------
   50331648 21/01/2020 13:50:26   p002  16.35          0          0        524852868 EXECUTING
   50331648 21/01/2020 13:50:26   p003  16.35          0          0        524852868 EXECUTING
   50331648 21/01/2020 13:50:26   p008  16.35      3.364       .478        524852868 EXECUTING
   50331648 21/01/2020 13:50:26   p000  16.35          0          0        524852868 EXECUTING
   50331648 21/01/2020 13:50:26   p00f  16.35          0          0        524852868 EXECUTING
   50331648 21/01/2020 13:50:26   p007  16.35          0          0        524852868 EXECUTING
   50331648 21/01/2020 13:50:26   p00d  16.35          0          0        524852868 EXECUTING
   50331648 21/01/2020 13:50:26   p005  16.35          0          0        524852868 EXECUTING
   50331648 21/01/2020 13:50:26   p004  16.35          0          0        524852868 EXECUTING
   50331648 21/01/2020 13:50:26   p00c  16.35          0          0        524852868 EXECUTING
   50331648 21/01/2020 13:50:26   p001  16.35          0          0        524852868 EXECUTING
   50331648 21/01/2020 13:50:26   p009  16.35    978.915    136.371        524852868 EXECUTING ---------->>>>>>
   50331648 21/01/2020 13:50:26   p00e  16.35          0          0        524852868 EXECUTING
   50331648 21/01/2020 13:50:26   p006  16.35          0          0        524852868 EXECUTING
   50331648 21/01/2020 13:50:26   ora   16.35       .152      4.125        524852868 EXECUTING
   50331648 21/01/2020 13:50:26   p00b  16.35          0          0        524852868 EXECUTING
   50331648 21/01/2020 13:50:26   p00a  16.35          0          0        524852868 EXECUTIN
                                  
 17 rows selected.   
SQL> @GetTimeSQLMon2
Enter value for sql_id: 94dp7vscw26sf

SQL_EXEC_ID EXEC_START           PROCESS  minutes elaps(sec)   cpu(sec) plan hash value STATUS   
----------- -------------------- -------- ------- ---------- ---------- --------------- ---------
   50331648 21/01/2020 13:50:26  p002       30.28          0          0       524852868 EXECUTING
   50331648 21/01/2020 13:50:26  p003       30.28          0          0       524852868 EXECUTING
   50331648 21/01/2020 13:50:26  p008       30.28      3.364       .478       524852868 EXECUTING
   50331648 21/01/2020 13:50:26  p000       30.28          0          0       524852868 EXECUTING
   50331648 21/01/2020 13:50:26  p00f       30.28          0          0       524852868 EXECUTING
   50331648 21/01/2020 13:50:26  p007       30.28          0          0       524852868 EXECUTING
   50331648 21/01/2020 13:50:26  p00d       30.28          0          0       524852868 EXECUTING
   50331648 21/01/2020 13:50:26  p005       30.28          0          0       524852868 EXECUTING
   50331648 21/01/2020 13:50:26  p004       30.28          0          0       524852868 EXECUTING
   50331648 21/01/2020 13:50:26  p00c       30.28          0          0       524852868 EXECUTING
   50331648 21/01/2020 13:50:26  p001       30.28          0          0       524852868 EXECUTING
   50331648 21/01/2020 13:50:26  p009       30.28   1813.916    225.829       524852868 EXECUTING ---------->>>>>>
   50331648 21/01/2020 13:50:26  p00e       30.28          0          0       524852868 EXECUTING
   50331648 21/01/2020 13:50:26  p006       30.28          0          0       524852868 EXECUTING
   50331648 21/01/2020 13:50:26  ora        30.28       .152      4.125       524852868 EXECUTING
   50331648 21/01/2020 13:50:26  p00b       30.28          0          0       524852868 EXECUTING
   50331648 21/01/2020 13:50:26  p00a       30.28          0          0       524852868 EXECUTING
                                             
17 rows selected.

As you can see there is only a single parallel slave which is working (p009). All other slaves are waiting.

I managed to display, roughly, the number of minutes spent by each process not doing anything i.e. waiting. But if you read this article you will know that when a process waits for more than 30 minutes, its status will change to DONE (ERROR). Spot below how this is true:

SQL> @GetTimeSQLMon2
Enter value for sql_id: 94dp7vscw26sf

SQL_EXEC_ID EXEC_START            PROCESS  minutes elaps(sec)   cpu(sec)  STATUS
----------- --------------------- ------- -------- ---------- ----------  ------------
   50331648 21/01/2020 13:50:26   p002       30.58          0          0  DONE (ERROR) ---------->>>>>> IN ERROR after 30 min waiting
   50331648 21/01/2020 13:50:26   p003       30.58          0          0  DONE (ERROR)
   50331648 21/01/2020 13:50:26   p008       30.58      3.364       .478  DONE (ERROR)
   50331648 21/01/2020 13:50:26   p000       30.58          0          0  DONE (ERROR)
   50331648 21/01/2020 13:50:26   p00f       30.58          0          0  DONE (ERROR)
   50331648 21/01/2020 13:50:26   p007       30.58          0          0  DONE (ERROR)
   50331648 21/01/2020 13:50:26   p00d       30.58          0          0  DONE (ERROR)
   50331648 21/01/2020 13:50:26   p005       30.58          0          0  DONE (ERROR)
   50331648 21/01/2020 13:50:26   p004       30.58          0          0  DONE (ERROR)
   50331648 21/01/2020 13:50:26   p00c       30.58          0          0  DONE (ERROR)
   50331648 21/01/2020 13:50:26   p001       30.58          0          0  DONE (ERROR)
   50331648 21/01/2020 13:50:26   p009       30.58   1833.977    228.496  EXECUTING    ---------->>>>>>
   50331648 21/01/2020 13:50:26   p00e       30.58          0          0  DONE (ERROR)
   50331648 21/01/2020 13:50:26   p006       30.58          0          0  DONE (ERROR)
   50331648 21/01/2020 13:50:26   ora        30.58       .152      4.125  DONE (ERROR)
   50331648 21/01/2020 13:50:26   p00b       30.58          0          0  DONE (ERROR)
   50331648 21/01/2020 13:50:26   p00a       30.58          0          0  DONE (ERROR)
                                          
17 rows selected.

And so on until the p009 process ends successfully

SQL> @GetTimeSQLMon2
Enter value for sql_id: 94dp7vscw26sf

SQL_EXEC_ID EXEC_START            PROCESS       minutes elaps(sec)   cpu(sec) STATUS
----------- --------------------- ---------- ---------- ---------- ---------- ---------------
   50331648 21/01/2020 13:50:26   p002            42.88          0          0 DONE (ERROR)
   50331648 21/01/2020 13:50:26   p003            42.88          0          0 DONE (ERROR)
   50331648 21/01/2020 13:50:26   p008            42.88      3.364       .478 DONE (ERROR)
   50331648 21/01/2020 13:50:26   p000            42.88          0          0 DONE (ERROR)
   50331648 21/01/2020 13:50:26   p00f            42.88          0          0 DONE (ERROR)
   50331648 21/01/2020 13:50:26   p007            42.88          0          0 DONE (ERROR)
   50331648 21/01/2020 13:50:26   p00d            42.88          0          0 DONE (ERROR)
   50331648 21/01/2020 13:50:26   p005            42.88          0          0 DONE (ERROR)
   50331648 21/01/2020 13:50:26   p004            42.88          0          0 DONE (ERROR)
   50331648 21/01/2020 13:50:26   p00c            42.88          0          0 DONE (ERROR)
   50331648 21/01/2020 13:50:26   p001            42.88          0          0 DONE (ERROR)
   50331648 21/01/2020 13:50:26   p009            42.88   1942.557    244.777 DONE (ALL ROWS) ---------->>>>>>
   50331648 21/01/2020 13:50:26   p00e            42.88          0          0 DONE (ERROR)
   50331648 21/01/2020 13:50:26   p006            42.88          0          0 DONE (ERROR)
   50331648 21/01/2020 13:50:26   ora             42.88       .152      4.125 DONE (ERROR)
   50331648 21/01/2020 13:50:26   p00b            42.88          0          0 DONE (ERROR)
   50331648 21/01/2020 13:50:26   p00a            42.88          0          0 DONE (ERROR)

17 rows selected.

A couple of minutes before the p009 PX slave finishes its work, the Duration was almost reporting a correct time as the following proves:

Global Information
------------------------------
 Status              :  EXECUTING                 
 Instance ID         :  3                         
 Session             :  XXXX (1016:28171) 
 SQL ID              :  94dp7vscw26sf             
 SQL Execution ID    :  50331648                  
 Execution Started   :  01/21/2020 13:50:26       
 First Refresh Time  :  01/21/2020 13:50:26       
 Last Refresh Time   :  01/21/2020 13:50:30       
 Duration            :  1283s   ------>>>>>   spot the Duration here                        
 Module/Action       :  SQL*Plus/-                
 Service             :  YYYYY
 Program             :  sqlplus.exe               

Global Stats
========================================================================
| Elapsed |   Cpu   |    IO    | Application | Buffer |    Offload     |
| Time(s) | Time(s) | Waits(s) |  Waits(s)   |  Gets  | Returned Bytes |
========================================================================
|    1309 |     174 |     1106 |        0.00 |     5M |           88MB |
========================================================================
Parallel Execution Details (DOP=8 , Servers Allocated=16)

Unfortunately, at the end of the query, the Duration is wrongly updated to show 4 seconds instead of the wall clock time which should be around 1981 seconds

Global Information
------------------------------
 Status              :  DONE (ERROR)              
 Instance ID         :  3                         
 Session             :  XXXX (1016:28171) 
 SQL ID              :  94dp7vscw26sf             
 SQL Execution ID    :  50331648                  
 Execution Started   :  01/21/2020 13:50:26       
 First Refresh Time  :  01/21/2020 13:50:26       
 Last Refresh Time   :  01/21/2020 13:50:30       
 Duration            :  4s   ------>>>>>   and spot the Duration at the end of the query                   
 Module/Action       :  SQL*Plus/-                
 Service             :  YYYYY
 Program             :  sqlplus.exe               

Global Stats
=====================================================================
| Elapsed |   Cpu   |    IO    | Cluster  | Buffer |    Offload     |
| Time(s) | Time(s) | Waits(s) | Waits(s) |  Gets  | Returned Bytes |
=====================================================================
|    1981 |     249 |     1687 |       45 |     8M |           92MB |
=====================================================================
Parallel Execution Details (DOP=8 , Servers Allocated=16)

Those 4 seconds represent either the time difference between the start of the query and the last refresh time or the time spent by the query coordinator which is 3.364+0.478 = 3.842 ~ 4 seconds

In passing here how the workload has been distributed when I fixed the issue

SQL> @getTimeSQLMon2
Enter value for sql_id: 2hrxjvurz5c0p

SQL_EXEC_ID EXEC_START            PROCESS minutes elaps(sec)   cpu(sec)  STATUS
----------- --------------------- ------- ------- ---------- ----------  ---------------
   50331648 21/01/2020 16:09:40   p002      10.88       .403        .52  DONE (ALL ROWS)
   50331648 21/01/2020 16:09:40   p009      10.88     38.332      5.065  DONE (ALL ROWS)
   50331648 21/01/2020 16:09:40   p000      10.88       .425       .533  DONE (ALL ROWS)
   50331648 21/01/2020 16:09:40   p008      10.88     38.138      4.623  DONE (ALL ROWS)
   50331648 21/01/2020 16:09:40   p003      10.88       .468       .628  DONE (ALL ROWS)
   50331648 21/01/2020 16:09:40   p00b      10.88     38.345      4.993  DONE (ALL ROWS)
   50331648 21/01/2020 16:09:40   p00d      10.88     38.206      4.362  DONE (ALL ROWS)
   50331648 21/01/2020 16:09:40   p005      10.88       .453       .613  DONE (ALL ROWS)
   50331648 21/01/2020 16:09:40   p00f      10.88     38.279      5.089  DONE (ALL ROWS)
   50331648 21/01/2020 16:09:40   p007      10.88       .429       .537  DONE (ALL ROWS)
   50331648 21/01/2020 16:09:40   p00c      10.88     38.211      5.443  DONE (ALL ROWS)
   50331648 21/01/2020 16:09:40   p004      10.88       .474       .632  DONE (ALL ROWS)
   50331648 21/01/2020 16:09:40   ora       10.88       2.12      1.507  DONE (ALL ROWS)
   50331648 21/01/2020 16:09:40   p00e      10.88     38.348      6.344  DONE (ALL ROWS)
   50331648 21/01/2020 16:09:40   p006      10.88       .413       .613  DONE (ALL ROWS)
   50331648 21/01/2020 16:09:40   p001      10.88       .432       .589  DONE (ALL ROWS)
   50331648 21/01/2020 16:09:40   p00a      10.88     38.624      5.481  DONE (ALL ROWS)

17 rows selected.

Summary

When you retrospectively analyze a parallel query performance issue through its corresponding SQL monitoring showing a DONE (ERROR) status, then you should log in your brain the followings:

• There is a probability that the time reported by the Duration is wrong as it either reports the QC time or the time difference between the start of the query and the last time it has been refreshed
• Don’t trust the DONE (ERROR) status. The query might have completed successfully, and this status value is simply indicating that the QC and one or many of its parallel slaves have been waiting for more than 30 minutes

January 5, 2020

PHV2

Filed under: Tuning — hourim @ 2:47 pm

PLAN_HASH_VALUE_2 (Phv2) includes the hash value of the execution(PLAN_HASH_VALUE) and the hash value of its predicate part. Whenever, I have a doubt about the difference between two similar execution plans I use the following query to get their corresponding Phv2:

-- phv2.sql
SELECT
  p.sql_id
 ,p.plan_hash_value
 ,p.child_number
 ,t.phv2
FROM 
  gv$sql_plan p
 ,xmltable('for $i in /other_xml/info
             where $i/@type eq "plan_hash_2"
             return $i'
             passing xmltype(p.other_xml)
             columns phv2 number path '/'
            ) t
WHERE 
     p.sql_id = '&sql_id'
AND p.other_xml is not null;

Phv2 is important when trying to understand why an enabled and accepted SPM baseline plan has not been used. Indeed, the CBO will use the SPM plan provided it will be able to produce, at query execution time, an execution plan having a PHV2 that equals the PLAN_ID of the SPM plan stored in the SPM baseline as shown below in the CBO-SPM plan selection diagram:

There exist different reasons which make two execution plans having the same plan_hash_value but a different Phv2 of which I can list two:

There is, though, an effort deployed by Oracle to make Phv2 independent from the generated CBO view names like it is already the case for the materialized CTE SYS_TEMP table transformation view as explained by Dominic Brooks in this blog post.

That’s said, a couple of weeks ago I observed an interesting practical real life case :

SQL> @phv2
Enter value for sql_id: 9m80uw87u330r

SQL_ID        PLAN_HASH_VALUE CHILD_NUMBER       PHV2
------------- --------------- ------------ ----------
9m80uw87u330r      1858552262            1  513024182
9m80uw87u330r      1858552262            0  513024182

As you can see, the above parent SQL_ID has two child cursors n°0 and n°1 having the same PLAN_HASH_VALUE and the same PHV2. But they are performing differently as shown by the following execution statistics where child n°0 consumes more physical I/O then child n°1 :

SQL> @sqlstats
Enter value for sql_id: 9m80uw87u330r

     CHILD PLAN_HASH_VALUE   AVG_GETS   AVG_PIOS  AVG_ETIME     EXECS
---------- --------------- ---------- ---------- ---------- ---------
         0      1858552262    1786506    1275800 453.184926         1
         1      1858552262    3408017       4095 18.6310735         4

I would have loved to direct you to the excellent presentation done by Mauro Pagano about same plan but different performance to have an idea about the performance difference between these two child cursors. But despite they have the same PLAN_HASH_VALUE and the same PHV2 they are, in this case, two really different execution plans as the followings prove:

SQL> select * from table(dbms_xplan.display_cursor('9m80uw87u330r',null));

SQL_ID  9m80uw87u330r, child number 0
-------------------------------------
Plan hash value: 1858552262
------------------------------------------------------------------------------------
| Id  | Operation                     | Name                    | Rows  | Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                         |       |          |
|   1 |  CONCATENATION                |                         |       |          |
|*  2 |   HASH JOIN                   |                         |     1 | 00:03:17 |
|*  3 |    TABLE ACCESS FULL          | UB_XYZBLTYVG_VALLEE_PAD | 46302 | 00:00:01 |
|*  4 |    TABLE ACCESS BY INDEX ROWID| RB_SALA                 |    34M| 00:01:50 | --> Notice the presence of *
|*  5 |     INDEX RANGE SCAN          | RB_SALA_CER_IDX         |    34M| 00:00:05 |
|*  6 |   HASH JOIN                   |                         |     1 | 00:09:03 |
|*  7 |    TABLE ACCESS FULL          | UB_XYZBLTYVG_VALLEE_PAD | 46302 | 00:00:01 |
|*  8 |    TABLE ACCESS BY INDEX ROWID| RB_SALA                 |    93M| 00:05:02 |
|*  9 |     INDEX RANGE SCAN          | RB_SALA_CER_IDX         |    93M| 00:00:15 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter("D"."SUB_ACCT"<>(-:SYS_B_21))
   5 - access("D"."STATE"=:SYS_B_15 AND "D"."CERMETHOD"=:SYS_B_16)
       filter("D"."CREATEDBY"<>:SYS_B_20)
   9 - access("D"."STATE"=:SYS_B_14)
       filter(("D"."CREATEDBY"<>:SYS_B_20 AND (LNNVL("D"."STATE"=:SYS_B_15) OR
              LNNVL("D"."CERMETHOD"=:SYS_B_16))))
Note
-----
   - SQL profile prf_9m80uw87u330r_2229773696 used for this statement
   - SQL patch "PATCH_9m80uw87u330r" used for this statement


SQL_ID  9m80uw87u330r, child number 1
-------------------------------------
Plan hash value: 1858552262
------------------------------------------------------------------------------------
| Id  | Operation                     | Name                    | Rows  | Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                         |       |          |
|   1 |  CONCATENATION                |                         |       |          |
|*  2 |   HASH JOIN                   |                         |     1 | 00:04:58 |
|*  3 |    TABLE ACCESS FULL          | UB_XYZBLTYVG_VALLEE_PAD | 46302 | 00:00:01 |
|   4 |    TABLE ACCESS BY INDEX ROWID| RB_SALA                 |    34M| 00:03:30 | --> Notice the absence of *
|*  5 |     INDEX RANGE SCAN          | RB_SALA_CER_IDX         |    34M| 00:00:05 |
|*  6 |   HASH JOIN                   |                         |     1 | 00:13:38 |
|*  7 |    TABLE ACCESS FULL          | UB_XYZBLTYVG_VALLEE_PAD | 46302 | 00:00:01 |
|   8 |    TABLE ACCESS BY INDEX ROWID| RB_SALA                 |    93M| 00:09:37 |
|*  9 |     INDEX RANGE SCAN          | RB_SALA_CER_IDX         |    93M| 00:00:13 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access("D"."STATE"=:SYS_B_15 AND "D"."CERMETHOD"=:SYS_B_16)
       filter(("D"."CREATEDBY"<>:SYS_B_20 AND "D"."SUB_ACCT"<>(-:SYS_B_21)))   
   9 - access("D"."STATE"=:SYS_B_14)
       filter(("D"."CREATEDBY"<>:SYS_B_20 AND "D"."SUB_ACCT"<>(-:SYS_B_21) AND
              (LNNVL("D"."STATE"=:SYS_B_15) OR LNNVL("D"."CERMETHOD"=:SYS_B_16))))

Note
-----
   - SQL profile prf_9m80uw87u330r_2229773696 used for this statement
   - SQL patch "PATCH_9m80uw87u330r" used for this statement

I have deleted from the two execution plans predicates n°2,3,6, 7 and 8 since they are not playing any role in this context.

Two important points can be pointed out here:

  • The first interesting point to emphasize is that there is no filter predicate on the RB_SALA table at line id n° 4 in the second execution plan (absence of *)
  •  The second difference resides in the filter predicates n° 5 and 9 of the second execution plan. They have an extra clause on the SUB_ACCT column.

The drastic performance enhancement I brought to this query, was particularly due to the elimination of the filter predicate n°4 on the big table RB_SALA. Filtering at the index level reduces the total physical I/O consumption (4,095 versus 1,275,800) making the execution time of cursor child n°1 very interesting (18 seconds versus 453).

If you want to know the whole story, then here’s what I did:

One of my clients asked me to look at a critical query that was not very well performing following an upgrade from 11gR2 to 12cR1. This query was using the execution plan of child n°0. I knew immediately that if I can get rid of the filter predicate from the RB_SALA at line n°4 I will make this query performing very well. Therefore, I created the following index :

SQL> create /*+ parallel(4) */ index RB_SALA_CER_IDX_MHO on RB_SALA(STATE, CERMETHOD, CREATEDBY, SUB_ACCT) compress 3;

By the way spot how I managed to create the index using parallelism without having to unset the DOP of the created index afterward.

For the sake of completeness there exist in this application the following index which is used by a SQL Profile

RB_SALA_CER_IDX on RB_SALA(STATE, CERMETHOD, CREATEDBY);

I disabled the SQL patch and the SQL Profile and asked the client to launch the query, which unfortunately refused to use the new index and was, this time, dramatically slow.

After several failed tests trying to make the CBO using the new index I decided to use the brute force

prompt renaming existing RB_SALA_CER_IDX index used in SQL Profile
alter index RB_SALA_CER_IDX rename to RB_SALA_CER_IDX_OLD;
--
prompt setting existing RB_SALA_CER_IDX index invisible
alter index RB_SALA_CER_IDX_OLD invisible;
--
prompt rename newly created index to the old existing one so that it will be used by the SQL Profile 
alter index RB_SALA_CER_IDX_MHO rename to RB_SALA_CER_IDX;

With the above three commands I produced the performant and acceptable execution plan represented by child n°1 having the same PLAN_HASH_VALUE and the same PHV2 as that of the bad performing execution plan

How to model this case

Here’s an easy way to model this case:

SQL> create table t1 as select rownum n1, trunc((rownum -1)/3) n2 , 'xY' vc 
    from dual connect by level <=10; 

SQL> create index t1_idx on t1(n2);

SQL> select * from t1 where n2=0 and vc='xY';

        N1         N2 VC
---------- ---------- --
         1          0 xY
         2          0 xY
         3          0 xY

SQL_ID  gzfq0fm0jf4v9, child number 0
-------------------------------------
select * from t1 where n2=0 and vc='xY'

Plan hash value: 1775246573
--------------------------------------------------------------
| Id  | Operation                           | Name   | Rows  |
--------------------------------------------------------------
|   0 | SELECT STATEMENT                    |        |       |
|*  1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1     |     3 |
|*  2 |   INDEX RANGE SCAN                  | T1_IDX |     3 |
--------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("VC"='xY')
   2 - access("N2"=0)

SQL> @Phv2
Enter value for sql_id: gzfq0fm0jf4v9

SQL_ID        PLAN_HASH_VALUE CHILD_NUMBER       PHV2
------------- --------------- ------------ ----------
gzfq0fm0jf4v9      1775246573            0  409377851

SQL> alter index t1_idx invisible;

SQL> alter index t1_idx rename to t1_idx_old;

SQL> create index t1_idx on t1(n2, vc);

SQL> select * from t1 where n2=0 and vc='xY';

        N1         N2 VC
---------- ---------- --
         1          0 xY
         2          0 xY
         3          0 xY

SQL_ID  gzfq0fm0jf4v9, child number 0
-------------------------------------
select * from t1 where n2=0 and vc='xY'

Plan hash value: 1775246573
--------------------------------------------------------------
| Id  | Operation                           | Name   | Rows  |
--------------------------------------------------------------
|   0 | SELECT STATEMENT                    |        |       |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1     |     3 |
|*  2 |   INDEX RANGE SCAN                  | T1_IDX |     3 |
--------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("N2"=0 AND "VC"='xY')

SQL> @phv2
Enter value for sql_id: gzfq0fm0jf4v9

SQL_ID        PLAN_HASH_VALUE CHILD_NUMBER       PHV2
------------- --------------- ------------ ----------
gzfq0fm0jf4v9      1775246573            0  409377851

Bottom Line

Don’t be surprised to see two non-identical execution plans having the same PLAN_HASH_VALUE, the same PHV2, generating the same rows but performing differently. This occurs particularly when

  •  a filter predicate on table access operation is deleted
  •  a filter predicate is enriched by an extra condition

December 23, 2019

PUSH SUBQUERY

Filed under: Tuning — hourim @ 8:27 am

In the previous article we examined the performance benefit brought by the scalar subquery caching effect when a call to a PL/SQL function is wrapped into a select from dual. Scalar subquery caching, almost always, reduces the number of PL/SQL function recursive calls. In this article I wanted to share with you the observations I have made in the same real-life query when I replaced the call to the PL/SQL function by its content.

The real-life query running under a 12cR1 Exadata machine and using a PL/SQL function in the where clause looks like the following:

SQL> SELECT /*+ gather_plan_statistics */
2     a.xy_etp_dat_deb,
3     a.xy_etp_num,
4     a.xy_bat_id
5  FROM
6     ps_xy_exp_etp_exe a
7  WHERE
8       a.xy_bat_id = f_get_id('BJOBD176')
9   AND a.xy_etp_dat_deb = (select
10                               max(b.xy_etp_dat_deb)
11                           from ps_xy_exp_etp_exe b
12                           where a.xy_bat_id = b.xy_bat_id
13                           )
14   AND a.xy_etp_num     = (select
15                              max(c.xy_etp_num)
16                           from ps_xy_exp_etp_exe c
17                           where c.xy_bat_id    = a.xy_bat_id
18                           and c.xy_etp_dat_deb = a.xy_etp_dat_deb
19                           );

Elapsed: 00:00:01.80

Statistics
---------------------------------------------------------
18605  recursive calls
0  db block gets
297806  consistent gets
0  physical reads
0  redo size
482  bytes sent via SQL*Net to client
372  bytes received via SQL*Net from client
2  SQL*Net roundtrips to/from client
1  sorts (memory)
0  sorts (disk)
1  rows processed

Instead of surrounding the PL/SQL function at line 8 by a “select from dual” as explained in the previous article, I decided to use its content directly into the where clause as shown below at line 8:

SQL> SELECT /*+ gather_plan_statistics */
2     a.xy_etp_dat_deb,
3     a.xy_etp_num,
4     a.xy_bat_id
5  FROM
6     ps_xy_exp_etp_exe a
7  WHERE
8       a.xy_bat_id = (select xy_bat_id from ps_xy_exp_bat where xy_bat_nom = 'BJOBD176')
9  AND a.xy_etp_dat_deb = (select
10                               max(b.xy_etp_dat_deb)
11                           from ps_xy_exp_etp_exe b
12                           where a.xy_bat_id = b.xy_bat_id
13                          )
14   AND a.xy_etp_num     = (select
15                              max(c.xy_etp_num)
16                           from ps_xy_exp_etp_exe c
17                           where c.xy_bat_id    = a.xy_bat_id
18                           and c.xy_etp_dat_deb = a.xy_etp_dat_deb
19                           );
Elapsed: 00:00:00.04

Statistics
-------------------------------------------------------
0  recursive calls
0  db block gets
142  consistent gets
0  physical reads
0  redo size
482  bytes sent via SQL*Net to client
372  bytes received via SQL*Net from client
2  SQL*Net roundtrips to/from client
1  sorts (memory)
0  sorts (disk)
1  rows processed

As you can see, replacing the PL/SQL function call by its content has eliminated the number of recursive calls as we went from 18506 recursive to 0 call and from 2 seconds to 4 milliseconds.

Let’s try to understand what happened so that we got such a performant improvement. First the row source execution plan (for the sake of simplicity I am using the predicate part taken from the explain plan for command):

Plan hash value: 408154845
------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name              | Starts | E-Rows | A-Rows |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |                   |      1 |        |      1 |
|*  1 |  FILTER                                 |                   |      1 |        |      1 |
|*  2 |   HASH JOIN                             |                   |      1 |      1 |      1 |
|*  3 |    TABLE ACCESS STORAGE FULL            | PS_XY_EXP_ETP_EXE |      1 |     63 |     76 |
|*  4 |     TABLE ACCESS STORAGE FULL FIRST ROWS| PS_XY_EXP_BAT     |      1 |      1 |      1 |
|   5 |    VIEW                                 | VW_SQ_1           |      1 |    296 |    296 |
|   6 |     SORT GROUP BY                       |                   |      1 |    296 |    296 |
|   7 |      TABLE ACCESS STORAGE FULL          | PS_XY_EXP_ETP_EXE |      1 |  18697 |  18605 |
|   8 |   SORT AGGREGATE                        |                   |      1 |      1 |      1 |
|*  9 |    TABLE ACCESS STORAGE FULL            | PS_XY_EXP_ETP_EXE |      1 |      1 |      1 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("A"."XY_ETP_NUM"= (SELECT MAX("C"."XY_ETP_NUM") FROM "PS_XY_EXP_ETP_EXE" "C" WHERE
"C"."XY_ETP_DAT_DEB"=:B1 AND "C"."XY_BAT_ID"=:B2))
2 - access("A"."XY_ETP_DAT_DEB"="MAX(B.XY_ETP_DAT_DEB)" AND "A"."XY_BAT_ID"="ITEM_1")
3 - storage("A"."XY_BAT_ID"= (SELECT "XY_BAT_ID" FROM "PS_XY_EXP_BAT" "PS_XY_EXP_BAT" WHERE
"XY_BAT_NOM"='BJOBD176'))
filter("A"."XY_BAT_ID"= (SELECT "XY_BAT_ID" FROM "PS_XY_EXP_BAT" "PS_XY_EXP_BAT" WHERE
"XY_BAT_NOM"='BJOBD176'))
4 - storage("XY_BAT_NOM"='BJOBD176')
filter("XY_BAT_NOM"='BJOBD176')
9 - filter("C"."XY_ETP_DAT_DEB"=:B1 AND "C"."XY_BAT_ID"=:B2)

That’s a weird non-common execution plan, right?

How could it be differently when we see these two unusual operations at lines Id n°3 and n°4?

Usually, a PARENT-CHILD operation is, for example, an index feeding its parent table with ROWIDS which, then, get filtered by the parent table access by index ROWID. But, here, we have a parent TABLE ACCESS STORAGE FULL of PS_XY_EXP_ETP_EXE at line Id n°3 and an unrelated child operation TABLE ACCESS STORAGE FULL FIRST ROWS of table PS_XY_EXP_BAT at line Id n°4.

What I, very often do, when trying to understand what cunning transformation has been done by Oracle behind the scenes is to check the outline section of the execution plan:

Outline Data
-------------

/*+
BEGIN_OUTLINE_DATA
IGNORE_OPTIM_EMBEDDED_HINTS
OPTIMIZER_FEATURES_ENABLE('12.1.0.2')
DB_VERSION('12.1.0.2')
../..
OUTLINE(@"SEL$3")
OUTLINE(@"SEL$BD9E0841")
OUTLINE(@"SEL$1")
FULL(@"SEL$B9151BA2" "A"@"SEL$1")
NO_ACCESS(@"SEL$B9151BA2" "VW_SQ_1"@"SEL$BD9E0841")
LEADING(@"SEL$B9151BA2" "A"@"SEL$1" "VW_SQ_1"@"SEL$BD9E0841")
USE_HASH(@"SEL$B9151BA2" "VW_SQ_1"@"SEL$BD9E0841")
PUSH_SUBQ(@"SEL$2")                              --> this is where the subuqery is pushed
PQ_FILTER(@"SEL$B9151BA2" SERIAL)
FULL(@"SEL$291F8F59" "B"@"SEL$3")
FULL(@"SEL$4" "C"@"SEL$4")
END_OUTLINE_DATA
*/

Indeed, there is that hint PUSH_SUBQ(@”SEL$2″) which indicates that he scalar subquery against PS_XY_EXP_BAT table has been first evaluated once (Starts =1, Id operation n°4) and then, the result of this evaluation, has been pushed as a predicate against the PS_XY_EXP_ETP_EXE table (Id operation n°3). And this is exactly what made PS_XY_EXP_BAT table a child subordinate action to PS_XY_EXP_ETP_EXE parent table.

If Oracle had not pushed the scalar subquery down the execution plan tree, it would have produced the following execution plan:

explain plan for
SELECT
a.xy_etp_dat_deb,
a.xy_etp_num,
a.xy_bat_id
FROM
ps_xy_exp_etp_exe a
WHERE
a.xy_bat_id = (select /*+ NO_PUSH_SUBQ(@"SEL$2")*/ xy_bat_id from ps_xy_exp_bat where xy_bat_nom = 'BJOBD176')
AND a.xy_etp_dat_deb = (select
max(b.xy_etp_dat_deb)
from ps_xy_exp_etp_exe b
where a.xy_bat_id = b.xy_bat_id)
AND a.xy_etp_num     = (select
max(c.xy_etp_num)
from ps_xy_exp_etp_exe c
where c.xy_bat_id    = a.xy_bat_id
and c.xy_etp_dat_deb = a.xy_etp_dat_deb);

Plan hash value: 1136506203
----------------------------------------------------------------------------------------------
| Id  | Operation                             | Name              | Starts | E-Rows | A-Rows |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                   |      1 |        |      1 |
|*  1 |  FILTER                               |                   |      1 |        |      1 |
|*  2 |   HASH JOIN                           |                   |      1 |    296 |    296 |
|*  3 |    HASH JOIN                          |                   |      1 |    296 |    318 |
|   4 |     VIEW                              | VW_SQ_1           |      1 |    296 |    296 |
|   5 |      SORT GROUP BY                    |                   |      1 |    296 |    296 |
|   6 |       TABLE ACCESS STORAGE FULL       | PS_XY_EXP_ETP_EXE |      1 |  18697 |  18605 |--1
|   7 |     TABLE ACCESS STORAGE FULL         | PS_XY_EXP_ETP_EXE |      1 |  18697 |  18605 |
|   8 |    VIEW                               | VW_SQ_2           |      1 |  18697 |  17933 |
|   9 |     SORT GROUP BY                     |                   |      1 |  18697 |  17933 |
|  10 |      TABLE ACCESS STORAGE FULL        | PS_XY_EXP_ETP_EXE |      1 |  18697 |  18605 |
|* 11 |   TABLE ACCESS STORAGE FULL FIRST ROWS| PS_XY_EXP_BAT     |      1 |      1 |      1 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("A"."XY_BAT_ID"= (SELECT /*+ NO_PUSH_SUBQ */ "XY_BAT_ID" FROM "PS_XY_EXP_BAT"
"PS_XY_EXP_BAT" WHERE "XY_BAT_NOM"='BJOBD176'))
2 - access("A"."XY_ETP_NUM"="MAX(C.XY_ETP_NUM)" AND "ITEM_2"="A"."XY_BAT_ID" AND
"ITEM_3"="A"."XY_ETP_DAT_DEB")
3 - access("A"."XY_ETP_DAT_DEB"="MAX(B.XY_ETP_DAT_DEB)" AND "A"."XY_BAT_ID"="ITEM_1")
11 - storage("XY_BAT_NOM"='BJOBD176')
filter("XY_BAT_NOM"='BJOBD176')

Notice now how the scalar subquery has been left until all the other predicates have been applied to the parent query block (operation Id n°1). This is the case where the subquery is evaluated as a FILTER predicate for every row in the result set. In the PUSH_SUBQ former case, Oracle runs the subquery for each row in PS_XY_EXP_ETP_EXE table and join the 76 survived rows with the rest of the data. This means that when the subquery has been pushed, the first operation in the execution plan tree has eliminated almost all rows and kept only 76. However, in the NO_PUSH_SUBQ case, the first operation(id n°6 in plan with plan hash value 1136506203) in the execution plan produced 18605 rows. Since Oracle started bigger, more data have then to be carried out through the rest of the plan causing, consequently, more work and more time.

Summary

Through two distinct articles, we’ve examined a performance issue of a real-life query using a PL/SQL function in one of its WHERE clause to FILTER rows from the parent query block. In the first article we outlined how we can take benefit from scalar subquery cache effect simply by surrounding the PL/SQL function call with a SELECT FROM DUAL. In the second article (current one) we examined how sometimes it is better to replace the PL/SQL function by its content in the scalar subquery WHERE clause. Proceeding as such, we can offer the Oracle optimizer engine the possibility to push the result of the scalar subquery where clause deeper in the execution plan path; filtering, as such, non-qualified rows much earlier and carrying less data through the rest of the execution plan.

December 18, 2019

Scalar subquery caching : the select from dual trick

Filed under: Oracle — hourim @ 3:00 pm

This short note explains why you should think about wrapping your PL/SQL function calls in a SELECT FROM DUAL so that you will get the performance benefit of scalar subquery caching whenever possible.

Recursive calls

Here’s below a real-life query running under a 12cR1 Exadata machine and using a call to a PL/SQL function in line number 8 below:

SQL> SELECT /*+ gather_plan_statistics */
2     a.xy_etp_dat_deb,
3     a.xy_etp_num,
4     a.xy_bat_id
5  FROM
6     sp_xy_exp_etp_exe a
7  WHERE
8       a.xy_bat_id = f_get_id('BJOBD176')
9   AND a.xy_etp_dat_deb = (select
10                               max(b.xy_etp_dat_deb)
11                           from sp_xy_exp_etp_exe b
12                           where a.xy_bat_id = b.xy_bat_id
13                           )
14   AND a.xy_etp_num     = (select
15                              max(c.xy_etp_num)
16                           from sp_xy_exp_etp_exe c
17                           where c.xy_bat_id    = a.xy_bat_id
18                           and c.xy_etp_dat_deb = a.xy_etp_dat_deb
19                           );

Elapsed: 00:00:01.82

Statistics
----------------------------------------------------------
18605  recursive calls
0  db block gets
297806  consistent gets
0  physical reads
0  redo size
482  bytes sent via SQL*Net to client
372  bytes received via SQL*Net from client
2  SQL*Net roundtrips to/from client
1  sorts (memory)
0  sorts (disk)
1  rows processed

I was immediately intrigued by the 18605 recursive calls that were systematically appearing at each execution of the above query. Therefore, I started trying to figure out the root cause of those recursive calls. The corresponding row-source execution plan is:

Plan hash value: 1967369873
--------------------------------------------------------------------------------------
| Id  | Operation                     | Name              | Starts | E-Rows | A-Rows |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                   |      1 |        |      1 |
|*  1 |  FILTER                       |                   |      1 |        |      1 |
|*  2 |   HASH JOIN                   |                   |      1 |      1 |      1 |
|*  3 |    TABLE ACCESS STORAGE FULL  | SP_XY_EXP_ETP_EXE |      1 |     63 |     76 | --> this
|   4 |    VIEW                       | VW_SQ_1           |      1 |    296 |    296 |
|   5 |     SORT GROUP BY             |                   |      1 |    296 |    296 |
|   6 |      TABLE ACCESS STORAGE FULL| SP_XY_EXP_ETP_EXE |      1 |  18697 |  18605 | --> this
|   7 |   SORT AGGREGATE              |                   |      1 |      1 |      1 |
|*  8 |    TABLE ACCESS STORAGE FULL  | SP_XY_EXP_ETP_EXE |      1 |      1 |      1 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("A"."XY_ETP_NUM"=)
2 - access("A"."XY_ETP_DAT_DEB"="MAX(B.XY_ETP_DAT_DEB)" AND "A"."XY_BAT_ID"="ITEM_1")
3 - filter("A"."XY_BAT_ID"="F_GET_ID"('BJOBD176'))
8 - filter(("C"."XY_ETP_DAT_DEB"=:B1 AND "C"."XY_BAT_ID"=:B2))

I don’t think that the 18605 A-Rows at operation Id n°6 and the 18605 recursive calls are a result of a mere coincidence. They are inevitably related to each other. Let’s then try to find out this relationship. One way to do that is to look at a 10046-trace of the above query:

SQL> alter session set tracefile_identifier ='QueryRecursiveCalls';

SQL> @46on 12
SQL –- execute query here
SQL> @46off

The TKRPOFED trace file shows the followings:

SELECT /*+ gather_plan_statistics */
a.xy_etp_dat_deb,
a.xy_etp_num,
a.xy_bat_id
FROM
sp_xy_exp_etp_exe a
WHERE
a.xy_bat_id = f_get_id('BJOBD176')
AND a.xy_etp_dat_deb = (select
max(b.xy_etp_dat_deb)
from sp_xy_exp_etp_exe b
where a.xy_bat_id = b.xy_bat_id)
AND a.xy_etp_num     = (select
max(c.xy_etp_num)
from sp_xy_exp_etp_exe c
where c.xy_bat_id    = a.xy_bat_id
and c.xy_etp_dat_deb = a.xy_etp_dat_deb)

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.02       0.02          0         12          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        0      0.00       0.00          0          0          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        2      0.02       0.02          0         12          0           0

SQL ID: 5gb8jzxq0zcj4 Plan Hash: 1613371837

SELECT XY_BAT_ID
FROM
SP_XY_EXP_BAT WHERE XY_BAT_NOM = :B1

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute  11146      0.93       0.95          0          0          0           0
Fetch    11146      0.83       0.90          0     178336          0       11146
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    22293      1.77       1.85          0     178336          0       11146

As you we can see, there is a recursive SELECT against SP_XY_EXP_BAT table consuming almost all the total time of the query (1.85 seconds). This recursive SELECT, identified by SQL_ID 5gb8jzxq0zcj4, is nothing else than the content of the PL/SQL function used in line number 8 of the original main query and which I am reproducing here below:

WHERE
a.xy_bat_id = f_get_id('BJOBD176');

The capital bind variable :B1 in the TKPROF trace file represents another indication of a SQL embedded in PL/SQL. I can even get the captured value of this bind variable as shown below:

col value_string format a10
col last_captured format a20
select
last_captured
,value_string
from
v$sql_bind_capture
where sql_id = '5gb8jzxq0zcj4';

LAST_CAPTURED        VALUE_STRI
-------------------- ----------
05/08/19             BJOBD176

While the number of executions of the recursive SELECT shown in the TKPROF (11,146) is not exactly the number I was expecting i.e. 18,605, I can, nevertheless, see that the number of executions of the corresponding SQL_ID (5gb8jzxq0zcj4) stored in memory is exactly 18,605

SQL> @gv$sql
Enter value for sql_id: 5gb8jzxq0zcj4

SQL_ID        CHILD_NUMBER PLAN_HASH_VALUE TO_CHAR(P.LAST_ACTI EXECUTIONS
------------- ------------ --------------- ------------------- ----------
5gb8jzxq0zcj4            0      1613371837 05/08/2019 11:27:19      18605

So far, we have identified that the 18605 recursive calls represent simply the 18605 executions of predicate n°3 above which involves the call to the PL/SQL function. But what is, exactly, the relationship between those 18605 recursive calls and the 18605 A-Rows of line 6 in the above execution plan?

The 18605 recursive calls are strongly related to the number of rows present in the SP_XY_EXP_ETP_EXE table.

SQL> select count(1) from SP_XY_EXP_ETP_EXE;

COUNT(1)
----------
18605

Indeed, the PL/SQL function was called as many times as there are rows in this table producing 76 rows as confirmed via the A-Rows column of operation at line n°3 in the above execution plan:

SQL> select count(1) from SP_XY_EXP_ETP_EXE
where xy_bat_id = f_get_id('BJOBD176');

COUNT(1)
----------
76

How to fix the problem while still using the PL/SQL function?

The problem resorts to the multiple executions (18605) of the predicate n°3 on table SP_XY_EXP_ETP_EXE:

3 - filter("A"."XY_BAT_ID"="F_GET_ID"('BJOBD176'))

This happens even though the input to the function, BJOBD176, is constant.

If we surround the call to that function with a SELECT FROM DUAL Oracle will use scalar subquery caching producing, as such, a massive reduction in the number of calls to that function as the following proves:

SELECT /*+ gather_plan_statistics */
a.xy_etp_dat_deb,
a.xy_etp_num,
a.xy_bat_id
FROM
sp_xy_exp_etp_exe a
WHERE
a.xy_bat_id = (select f_get_id('BJOBD176') from dual)
AND a.xy_etp_dat_deb = (select
max(b.xy_etp_dat_deb)
from sp_xy_exp_etp_exe b
where a.xy_bat_id = b.xy_bat_id
)
AND a.xy_etp_num     = (select
max(c.xy_etp_num)
from sp_xy_exp_etp_exe c
where c.xy_bat_id    = a.xy_bat_id
and c.xy_etp_dat_deb = a.xy_etp_dat_deb
);

Elapsed: 00:00:00.02 --> drastic execution time reduction
--------------------------------------------------------------------------------------
| Id  | Operation                     | Name              | Starts | E-Rows | A-Rows |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                   |      1 |        |      1 |
|*  1 |  FILTER                       |                   |      1 |        |      1 |
|*  2 |   HASH JOIN                   |                   |      1 |      1 |      1 |
|*  3 |    TABLE ACCESS STORAGE FULL  | SP_XY_EXP_ETP_EXE |      1 |     63 |     76 |
|   4 |     FAST DUAL                 |                   |      1 |      1 |      1 |
|   5 |    VIEW                       | VW_SQ_1           |      1 |    296 |    296 |
|   6 |     SORT GROUP BY             |                   |      1 |    296 |    296 |
|   7 |      TABLE ACCESS STORAGE FULL| SP_XY_EXP_ETP_EXE |      1 |  18697 |  18605 |
|   8 |   SORT AGGREGATE              |                   |      1 |      1 |      1 |
|*  9 |    TABLE ACCESS STORAGE FULL  | SP_XY_EXP_ETP_EXE |      1 |      1 |      1 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("A"."XY_ETP_NUM"=)
2 - access("A"."XY_ETP_DAT_DEB"="MAX(B.XY_ETP_DAT_DEB)" AND"A"."XY_BAT_ID"="ITEM_1")
3 - storage("A"."XY_BAT_ID"=)
filter("A"."XY_BAT_ID"=)
9 - filter(("C"."XY_ETP_DAT_DEB"=:B1 AND "C"."XY_BAT_ID"=:B2))

Statistics
----------------------------------------------------------
1  recursive calls
0  db block gets
142  consistent gets
0  physical reads
0  redo size
482  bytes sent via SQL*Net to client
372  bytes received via SQL*Net from client
2  SQL*Net roundtrips to/from client
1  sorts (memory)
0  sorts (disk)
1  rows processed

Surrounding the call to the function with a select from dual allows Oracle to cache the result of the function for identical results. Therefore, the number of recursive calls drops from 18605 to only a single call and the query completes in about 2 milli-seconds instead of 2 seconds when scalar subquery caching was not possible.

Summary

As explained by Tom Kyte in this excellent article I can confirm that I have, almost always, improved the performance of my queries when I wrapped the PL/SQL function call into a SELECT FROM DUAL. This trick allows us to get the performance benefit of the scalar subquery cache.
In the next article I will show you what happens to the real-life query when I have replaced the PL/SQL function call by its content.

December 9, 2019

AWR bulk collect

Filed under: Oracle — hourim @ 5:08 pm

There are many posts out there explaining how to collect a bunch of AWR reports which I could have just referenced and leave it there, but as much for my own documentation as anything else, I decided to write this simple note as an answer to my client request of generating and analyzing 49 one-hour AWR reports.

First, the SQL script to generate a bulk of AWR reports between two dates:

-- +----------------------------------------------------------------------------+
-- | Author : Mohamed Houri                                                     |
-- |----------------------------------------------------------------------------|
-- | DATABASE : 12cR1                                                           |
-- | Name     : AwrBulkcollect.sql                                              |
-- | PURPOSE  : Dynamically create a SQL script to generate a list of AWR       |
-- |            reports between two snapshots dates.                            |
-- | NOTE     : As with any code, ensure to test this script in a development   |
-- |            environment before attempting to run it in production.          |
-- | Remarks : CHR(10) adds a new line                                          |
-- |           SET termout off so that AWR content will not be displayed        |
-- |           For RAC configuration consider the instance number               |       
-- |          Enter input date in this format :'mmddyyyy hh24:mi:ss'            |
-- |                                                                            |
-- ------------------------------------------------------------------------------
set termout off
set head off
set define off
spool collectAWRs.sql

SELECT
    'spool awr_XYZ_inst_1_'
    || t.si
    || '_'
    || t.se
    || '.text '
    || CHR(10)
    || 'SELECT * FROM TABLE(dbms_workload_repository.awr_report_text('
    || t.dbid
    || ','
    || t.instance_number
    || ','
    || t.si
    || ','
    || t.se
    || '));'
    || CHR(10)
    || ' spool off;'
FROM
    (SELECT
            dbid,
            snap_id si,
            snap_id + 1 se,
            instance_number
     FROM
            dba_hist_snapshot
     WHERE
            begin_interval_time >   TO_DATE('27102019 18:00:00', 'ddmmyyyy hh24:mi:ss')
      AND end_interval_time   <=  TO_DATE('27102019 19:30:00', 'ddmmyyyy hh24:mi:ss')
      AND  instance_number = 1
      ) t;
	
spool off;
-- +-------------------------------------+	
  prompt running collectAWRs.sql script |
-- +-------------------------------------+
@collectAWRs.sql

Depending on the number of snapshots found between the two entered dates the output of the script might look like the following:

spool awr_XYZ_inst_1_11923_11924.text                                                                                   
SELECT * FROM TABLE(dbms_workload_repository.awr_report_text(4209020461,1,11923,11924));                                
spool off;                                                                                                             
                                                                                                                        
spool awr_XYZ_inst_1_11922_11923.text                                                                                   
SELECT * FROM TABLE(dbms_workload_repository.awr_report_text(4209020461,1,11922,11923));                                
spool off;     

etc.…

When the script ends you can see such a list of AWR files

Mode                LastWriteTime         Length Name
----                -------------         ------ ----
-a----       10/28/2019   2:45 PM         451751 awr_XYZ_inst_1_11946_11947.text
-a----       10/28/2019   2:46 PM         464089 awr_XYZ_inst_1_11952_11953.text
-a----       10/28/2019   2:47 PM         446377 awr_XYZ_inst_1_11945_11946.text
-a----       10/28/2019   2:48 PM         494261 awr_XYZ_inst_1_11949_11950.text
-a----       10/28/2019   2:49 PM         447109 awr_XYZ_inst_1_11954_11955.text
-a----       10/28/2019   2:50 PM         440269 awr_XYZ_inst_1_11956_11957.text
-a----       10/28/2019   2:51 PM         448697 awr_XYZ_inst_1_11955_11956.text
-a----       10/28/2019   2:52 PM         441979 awr_XYZ_inst_1_11942_11943.text
-a----       10/28/2019   2:53 PM         444789 awr_XYZ_inst_1_11943_11944.text
-a----       10/28/2019   2:54 PM         424755 awr_XYZ_inst_1_11947_11948.text
-a----       10/28/2019   2:55 PM         441857 awr_XYZ_inst_1_11951_11952.text
-a----       10/28/2019   2:55 PM         445155 awr_XYZ_inst_1_11944_11945.text
-a----       10/28/2019   2:56 PM         468731 awr_XYZ_inst_1_11948_11949.text
-a----       10/28/2019   2:57 PM         449185 awr_XYZ_inst_1_11950_11951.text
-a----       10/28/2019   2:58 PM         428541 awr_XYZ_inst_1_11953_11954.text
-a----       10/28/2019   2:59 PM         442711 awr_XYZ_inst_1_11957_11958.text

Top 10 Foreground Events

This list of AWR reports has been requested by the Oracle support so that tuning suggestions can be given to my client. One of the suggestions given by Oracle was that reducing the number of log file sync wait event will help improving the performance of the application. I was curious to know what in the AWR reports list makes Oracle support suggesting this log file sync issue. So, I decided to examine the 45 Top 10 Foreground Events to see how much the log file sync wait event participate to the % of the total DB time.

For that, I used the following simple PowerShell script to generate a single file containing only the Top 10 Foreground Events of the totality of the 45 AWR reports:

$SearchString = 'Top 10 Foreground Events by Total Wait Time'
$AllAwrs=Get-ChildItem -Recurse -Include *.text	
for ($i=0; $i -lt $AllAwrs.Count; $i++)
 {
	$result = Select-String $SearchString $AllAwrs[$i].FullName -Context 0, 14 | % {$_.Context.PostContext} 
    $result |  Out-File Top10ForegroundWaitevents.txt   -Append	
 }     	


And, finally, all what I have to do was to analyze the Top10ForegroundWaitevents.txt file

 Event                                 Waits Time Avg(ms)   time Wait Class 
------------------------------ ------------ ---- ------- ------ ---------- 
DB CPU                                      1820           55.2            
db file scattered read               44,406 1032      23   31.3 User I/O   
db file sequential read           2,346,969 735.       0   22.3 User I/O   
direct path write temp                4,085 50.1      12    1.5 User I/O   
log file sync                         3,688 18.3       5     .6 Commit     --> only 0.6%
direct path read temp                13,173 14.2       1     .4 User I/O   
enq: TX - row lock contention           133    1       7     .0 Applicatio 
control file sequential read         11,846   .6       0     .0 System I/O 
db file parallel read                    13   .6      43     .0 User I/O   
direct path read                      2,556   .5       0     .0 User I/O   
                                                                           
                                            Tota    Wait   % DB            
Event                                 Waits Time Avg(ms)   time Wait Class 
------------------------------ ------------ ---- ------- ------ ---------- 
db file scattered read              212,869 3439      16   94.1 User I/O   
DB CPU                                      422.           11.5            
control file sequential read         11,846   .5       0     .0 System I/O 
OJVM: Generic                           139   .5       3     .0 Other      
db file sequential read               5,270   .3       0     .0 User I/O   
log file sync                            67   .3       4     .0 Commit     --> almost 0.0% 
Disk file operations I/O                373    0       0     .0 User I/O   
direct path read                        108    0       0     .0 User I/O   
SQL*Net message to client             1,706    0       0     .0 Network    
direct path sync                          1    0       2     .0 User I/O   
                                                                           
                                            Tota    Wait   % DB            
Event                                 Waits Time Avg(ms)   time Wait Class 
------------------------------ ------------ ---- ------- ------ ---------- 
DB CPU                                      3918           91.8            
db file scattered read               11,553  235      20    5.5 User I/O   
db file sequential read              28,353 87.4       3    2.0 User I/O   
log file sync                         9,537   78       8    1.8 Commit     --> only 1.8% 
resmgr:become active                      4   10    2512     .2 Scheduler  
enq: TX - row lock contention           344  4.3      13     .1 Applicatio 
direct path read temp                 1,241    1       1     .0 User I/O   
direct path read                      6,324    1       0     .0 User I/O   
direct path write temp                  130   .8       6     .0 User I/O   
control file sequential read         11,871   .7       0     .0 System I/O

Etc.….

Bottom-line

In this very simple note I gave a SQL script to generate multiple AWR reports between two dates. I have also examined, via a PowerShell script, how to loop over the content of those AWR reports and extract only an AWR section into a single file for a better and clean analysis.

November 28, 2019

DOP Downgrade

Filed under: Oracle — hourim @ 5:53 pm

Suppose you are confronted to a performance issue due to an initially optimized parallel execution plan that has been 100% DOP downgraded. Suppose also that this DOP downgrade is due to an Oracle Resource Manager max DOP limit directive.
Would you be able to point out this DOP downgrade root cause using gv$sql_monitor view? Would you be able as well to say, a posteriori (using AWR tables), that your issue was due to ORM max DOP limit?

This blog post aims to answer the above two questions.

1.Replaying the issue using a reproducible model

While it is not so straightforward to link a real-life performance issue to a DOP downgrade due to a resource manager maxim allowed DOP, it is nevertheless, very easy to model and reproduce this case at will.

RDEV>  create table t1 as select a.* from dba_objects a,dba_objects where rownum<= 1e6;

RDEV2> alter table t1 parallel 4;

RDEV2> select count(1) from t1 where owner >= 'A';

Parallel Execution Details (DOP=4 , Servers Allocated=4)
===================================================================
|      Name      | Type  | Server# | Elapsed |   Cpu   |    IO    |
|                |       |         | Time(s) | Time(s) | Waits(s) |
===================================================================
| PX Coordinator | QC    |         |    0.12 |    0.01 |          |
| p000           | Set 1 |       1 |    0.20 |    0.03 |     0.15 |
| p001           | Set 1 |       2 |    0.20 |    0.03 |     0.15 |
| p002           | Set 1 |       3 |    0.20 |    0.04 |     0.15 |
| p003           | Set 1 |       4 |    0.20 |    0.04 |     0.12 |
===================================================================
-------------------------------------------------------------------
| Id  | Operation              | Name     | Rows  |    TQ  |IN-OUT|
-------------------------------------------------------------------
|   0 | SELECT STATEMENT       |          |       |        |      |
|   1 |  SORT AGGREGATE        |          |     1 |        |      |
|   2 |   PX COORDINATOR       |          |       |        |      |
|   3 |    PX SEND QC (RANDOM) | :TQ10000 |     1 |  Q1,00 | P->S |
|   4 |     SORT AGGREGATE     |          |     1 |  Q1,00 | PCWP |
|   5 |      PX BLOCK ITERATOR |          |  1000K|  Q1,00 | PCWC |
|*  6 |       TABLE ACCESS FULL| T1       |  1000K|  Q1,00 | PCWP |
-------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   6 - access(:Z>=:Z AND :Z<=:Z)
       filter("OWNER">='A')

Note
-----
   - Degree of Parallelism is 4 because of table property
 

As you can see, the Degree Of Parallelism 4 has been honored and the SQL query has been executed using Parallelism.
However, if under the hood, I create an Oracle Resource Manager which will limit the maximum DOP allowed to be used by the current user (RDEV) then the same query will be DOP downgraded causing performance pain like
in one of my real-life case (a blog post will follow):

BEGIN
-- clear pending area
  DBMS_RESOURCE_MANAGER.clear_pending_area();
  DBMS_RESOURCE_MANAGER.create_pending_area();
  
-- create resource plan
  DBMS_RESOURCE_MANAGER.create_plan(
    plan    => 'DOPDOWNGRAD_RPL',
    comment => 'Resource Plan limiting DOP');
    
-- create consumer group
  DBMS_RESOURCE_MANAGER.create_consumer_group(
    consumer_group => 'DOPDONWGRAD_RSG',
    comment        => 'LOW Parallel Priority group');

-- assign the DOP limitation to the consumer group
  DBMS_RESOURCE_MANAGER.create_plan_directive (
    plan                     => 'DOPDOWNGRAD_RPL',
    group_or_subplan         => 'DOPDONWGRAD_RSG',
    comment                  => 'LOW PDOP Priority',    
    parallel_degree_limit_p1 => 1);

-- This is mandatory to avoid ORA-29377: consumer group OTHER_GROUPS is not part of top-plan DOPDOWNGRAD_RPL
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    PLAN             => 'DOPDOWNGRAD_RPL',
    GROUP_OR_SUBPLAN => 'OTHER_GROUPS',
    MGMT_P1          => 10,
    COMMENT          => 'Directive for OTHER_GROUPS (mandatory)');
    
 -- validate the resource plan
  DBMS_RESOURCE_MANAGER.validate_pending_area;
  DBMS_RESOURCE_MANAGER.submit_pending_area();
END;
/

The preceding anonymous PL/SLQ block creates the DOPDOWNGRAD_RPL resource plan, attaches to it the DOPDONWGRAD_RSG consumer group and assigns a directive to this consumer group so that maximum DOP allowed by this consumer group would be 1.

To finish the setup, I will assign the user RDEV to this consumer group and set the DOPDOWNGRAD_RPL plan as the default resource plan:

-- Affect user RDEV to resource group
BEGIN
    DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA();
    DBMS_RESOURCE_MANAGER.SET_CONSUMER_GROUP_MAPPING(
       ATTRIBUTE      => DBMS_RESOURCE_MANAGER.ORACLE_USER,
       VALUE          => 'RDEV',
       CONSUMER_GROUP => 'DOPDONWGRAD_RSG');
	  
   DBMS_RESOURCE_MANAGER.VALIDATE_PENDING_AREA();
   DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA();
END;
/

We are now ready to run again the same query and check whether it has been executed parallelly or it has been pre-empted by the DOPDOWNGRAD_RPL resource plan to use parallelism:

RDEV> alter system set resource_manager_plan='DOPDOWNGRAD_RPL';

System altered.

RDEV> show parameter resource_manager_plan

NAME                    TYPE        VALUE
----------------------- ----------- ----------------
resource_manager_plan   string      DOPDOWNGRAD_RPL

Let’s execute the same query using RDEV user. Remember that this user belongs to an oracle resource manager group with a parallel degree limit directive as shown above:

SQL> SELECT /*+ monitor */ count(1) from t1 where owner >= 'A';

  COUNT(1)
----------
   1000000

RDEV> @xpsimp

SQL_ID  b60wdaqys7wf9, child number 0
-------------------------------------
SELECT /*+ monitor */ count(1) from t1 where owner >= 'A'

Plan hash value: 3110199320
---------------------------------------------------------------------------------------
| Id  | Operation              | Name     | Rows  | Bytes |   TQ  |IN-OUT| PQ Distrib |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |          |       |       |       |      |            |
|   1 |  SORT AGGREGATE        |          |     1 |     4 |       |      |            |
|   2 |   PX COORDINATOR       |          |       |       |       |      |            |
|   3 |    PX SEND QC (RANDOM) | :TQ10000 |     1 |     4 | Q1,00 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE     |          |     1 |     4 | Q1,00 | PCWP |            |
|   5 |      PX BLOCK ITERATOR |          |  1000K|  3906K| Q1,00 | PCWC |            |
|*  6 |       TABLE ACCESS FULL| T1       |  1000K|  3906K| Q1,00 | PCWP |            |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   6 - access(:Z>=:Z AND :Z<=:Z)
       filter("OWNER">='A')

The absence of the Note line at the bottom of the execution plan represents, already, a first indication that the query has not been executed using parallelism. Indeed, this is confirmed via the corresponding SQL Monitoring report:

Global Information
------------------------------
 Status              :  DONE (ALL ROWS)     
 Instance ID         :  1                   
 Session             :  RDEV (273:24189)    
 SQL ID              :  b60wdaqys7wf9       
 SQL Execution ID    :  16777216            
 Execution Started   :  11/07/2019 09:02:53 
 First Refresh Time  :  11/07/2019 09:02:53 
 Last Refresh Time   :  11/07/2019 09:02:53 
 Duration            :  .414398s            
 Module/Action       :  SQL*Plus/-          
 Service             :  diiqazbx            
 Program             :  sqlplus.exe         
 DOP Downgrade       :  100%        --> here            
 Fetch Calls         :  1   

The ACTIVE type of the same SQL Monitoring report shows the reason:

Other Plan Line Statistics

   Min DOP after downgrade 1
         downgrade reason  352

Thanks to Yasin Baskan, I knew that the description of the downgrade reason codes differs from release to release. We need to use the following query in the corresponding database (12cR2 in the current case) to get the actual significance of each code:

-- +-----------------------------------|
-- query supplied by Yasin Baskan
-- SQL> desc x$qksxa_reason
--		 Name           Type
--		 -------------- ---------------|
--		 ADDR           RAW(8)
--		 INDX           NUMBER
--		 INST_ID        NUMBER
--		 CON_ID         NUMBER
--		 QKSXAREASONS   VARCHAR2(255)
-- +-----------------------------------|
col qksxareasons format a100
col indx         format 999
SELECT
    indx
   ,qksxareasons
FROM
    x$qksxa_reason
WHERE
    qksxareasons like '%DOP downgrade%';   	


INDX QKSXAREASONS
---- -----------------------------------------------------
 351 DOP downgrade due to adaptive DOP
 352 DOP downgrade due to resource manager max DOP
 353 DOP downgrade due to insufficient number of processes
 354 DOP downgrade because slaves failed to join

We finally, have found the real root cause of the simulated performance issue: 352 DOP downgrade due to resource manager max DOP

2. Back to the aim of the article

The first questions that came to my mind when I have started troubleshooting the DOP Downgrade real-life case were:

• Can I find any Oracle Resource manager related information in the gv$sql_monitor view ?
• Can I find historical (AWR) execution statistics of any active Oracle Resource manager actions ?

2.1. v$sql_monitor

As per regards to the first question, hopefully the gv$sql_monitor view contains 4 columns which seem to be strictly related to the Oracle resource manager as shown below with column name starting by RM_:

SQL> @desc gv$sql_monitor

 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 INST_ID                                            NUMBER
 KEY                                                NUMBER
 REPORT_ID                                          NUMBER
 STATUS                                             VARCHAR2(19)
 USER#                                              NUMBER
 USERNAME                                           VARCHAR2(128)
../..
 RM_LAST_ACTION                                     VARCHAR2(48)
 RM_LAST_ACTION_REASON                              VARCHAR2(128)
 RM_LAST_ACTION_TIME                                DATE
 RM_CONSUMER_GROUP                                  VARCHAR2(128)

Let’s see then what Oracle has stored in those four RM_ columns:

RDEV> SELECT
 decode(process_name, 'ora', 'QC', process_name) as process_name
 ,rm_consumer_group
 ,rm_last_action
 ,rm_last_action_reason
 ,rm_last_action_time
     FROM
        gv$sql_monitor
    WHERE 
        sql_id = 'b60wdaqys7wf9';

PROCES RM_CONSUMER_GROUP RM_LAST_ACTION RM_LAST_ACTION_REASO RM_LAST_ACTION_TIME
------ ----------------- -------------- -------------------- --------------------
QC     DOPDONWGRAD_RSG   {null}               {null}               {null}

Unfortunately, it looks like only the column RM_CONSUMER_GROUP is filled up by Oracle. The remaining RM_ columns are always null.

However, it looks like that whenever the column RM_CONSUMER_GROUP contains a value which is not equal to OTHER_GROUPS (like in the current example where RM_CONSUMER_GROUP= DOPDONWGRAD_RSG), we can infer that the current monitored query
is subject to a resource manager directive which might influence its execution.

For example, let’s execute the same query when connected with a different user (RDEV2) and see what information will be stored in the gv$sql_monitor view:

RDEV2> SELECT /*+ monitor */ count(1) from t1 where owner >= 'A'

COUNT(1)
----------
   1000000

SQL_ID  b60wdaqys7wf9, child number 2
-------------------------------------
SELECT /*+ monitor */ count(1) from t1 where owner >= 'A'
Plan hash value: 3110199320
--------------------------------------------------------------------------------
| Id  | Operation              | Name     | Rows  |    TQ  |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |          |       |        |      |            |
|   1 |  SORT AGGREGATE        |          |     1 |        |      |            |
|   2 |   PX COORDINATOR       |          |       |        |      |            |
|   3 |    PX SEND QC (RANDOM) | :TQ10000 |     1 |  Q1,00 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE     |          |     1 |  Q1,00 | PCWP |            |
|   5 |      PX BLOCK ITERATOR |          |  1000K|  Q1,00 | PCWC |            |
|*  6 |       TABLE ACCESS FULL| T1       |  1000K|  Q1,00 | PCWP |            |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   6 - access(:Z>=:Z AND :Z<=:Z)
       filter("OWNER">='A')

Note
-----
   - Degree of Parallelism is 4 because of table property

RDEV2> SELECT  
   username
  ,decode(process_name, 'ora', 'QC', process_name) as process_name
  ,rm_consumer_group
  ,rm_last_action
  ,rm_last_action_reason
  ,rm_last_action_time
      FROM
        gv$sql_monitor
      WHERE 
        sql_id = 'b60wdaqys7wf9'
order by last_refresh_time;

USERNAME  PROCES  RM_CONSUMER_GROUP  RM_LAST_ACTION  RM_LAST_ACTION_REASO  RM_LAST_ACTION_TIME
-------- -------- ----------------- ---------------- -------------------- --------------------
RDEV      QC     DOPDONWGRAD_RSG    {null}           {null}                {null}
RDEV2     QC     OTHER_GROUPS       {null}           {null}                {null}                 
{null}    p002   OTHER_GROUPS       {null}           {null}                {null}                 
{null}    p000   OTHER_GROUPS       {null}           {null}                {null}                 
{null}    p003   OTHER_GROUPS       {null}           {null}                {null}                 
{null}    p001   OTHER_GROUPS       {null}           {null}                {null}     

When the same query is executed by user RDEV2 there is no DOP downgrade because this user doesn’t belong to any user created resource manager group. There are two information backing this claim:

• User RDEV2 has OTHER_GROUPS as a default consumer group
• There is 4 PX server associated with the QC of user RDEV2

3.2. AWR and Oracle Resource manager
The second investigation was to see whether I can, a posteriori, diagnose that my query underwent a resource manager directive; particularly those concerning parallel degree limit. For that reason, I wrote the following script

SQL> @getConsGrpHist
SELECT
   begin_time
  ,execs 
  ,avg_etime
  ,avg_px
 -- ,trunc(avg_etime/decode(avg_px,0,1,avg_px) ,2) avg_px_time
  ,plan_name
  ,cpu_managed
  ,parallel   
FROM
   (SELECT 	   
        sn.begin_interval_time      begin_time
       ,st.executions_delta         execs 
       ,st.end_of_fetch_count_delta end_of_fetch 
       ,trunc(st.elapsed_time_delta/1e6/decode(st.executions_delta, 0, 1,st.executions_delta))   avg_etime
       ,round(st.px_servers_execs_delta/decode(st.executions_delta,0,1, st.executions_delta), 1) avg_px	  
       ,sr.plan_name
       ,sr.cpu_managed
       ,sr.parallel_execution_managed parallel    
    FROM 
         dba_hist_rsrc_plan sr
        ,dba_hist_snapshot  sn
        ,dba_hist_sqlstat   st
    WHERE
	sr.snap_id = sn.snap_id
    AND sn.snap_id = st.snap_id
    AND sn.begin_interval_time between to_date('&from_date', 'ddmmyyyy hh24:mi:ss')
				      and    to_date('&to_date'  , 'ddmmyyyy hh24:mi:ss')
    AND st.sql_id ='&sql_id'
    AND plan_name != 'DEFAULT_MAINTENANCE_PLAN'	
	) 
order by begin_time asc;      

SQL> @GetConsGrpHist
Enter value for from_date: 08112019 15:07:00
Enter value for to_date: 08112019 15:09:00
Enter value for sql_id: b60wdaqys7wf9

BEGIN_TIME                 EXECS  AVG_ETIME  AVG_PX PLAN_NAME       CPU_MANAGED PARALLEL
------------------------- ------ ---------- ------- --------------- ----------- ----------
08-NOV-19 03.07.08.369 PM      1          0       0 DOPDOWNGRAD_RPL ON          FULL
08-NOV-19 03.07.08.369 PM      1          0       0 DOPDOWNGRAD_RPL ON          FULL

Indeed, I can see that my query has used a non-default resource plan (DOPDOWNGRAD_RPL) in which a parallel statement queuing is enabled as indicated by the FULL value of the PARALLEL column.

Bottom-line

In this simple Note I outlined two main points:

• Show how to identify a DOP downgrade and its reason
• Show how to link a non-default Oracle resource manager to a SQL_ID by checking the value of the column RM_CONSUMER_GROUP from the gv$sql_monitor view
• Show that the remaining 3 resource manager columns (RM_LAST_ACTION, RM_LAST_ACTION_REASON, RM_LAST_ACTION_TIME) in the gv$sql_monitor view are, unfortunately, not maintained by Oracle
• Show how to look in AWR historical execution for queries subject to user defined Oracle Resource Manager directives

November 21, 2019

Incremental, histogram and fix control

Filed under: Statistics — hourim @ 5:35 pm

Due to my zeal to understand the curious comportment of the non-usage of histogram in a partitioned table under incremental mode, I decided to write this 4th part article in the series of blog posts on wrong cardinality estimation due to the non-utilization of histogram during query optimization:

  •  Part I where I wrongly put the blame of wrong cardinality estimation on corrupted histogram
  •  Part II where I examined via a real-life example the impact of this wrong cardinality estimation on the query performance
  •  Part III where I showed that, in fact, the wrong cardinality estimation is due to the incremental mode which, under certain circumstances, forces Oracle to ignore the usage of histogram

In this 4th Part, I would like to share with you a couple of ways you can use to make Oracle not ignoring the usage of histogram while keeping the incremental mode up (TRUE).

1. Fix Control 13583722

In Part III I mentioned that the dbms_stats trace file shows the presence of the 13583722 fix control

DBMS_STATS:          Need Actual Values (DSC_EAVS)                    
DBMS_STATS:          Partition: XZ_JOB_T_12                           
DBMS_STATS:          Histogram Type: HYBRID Data Type: 2              
DBMS_STATS:          Histogram Flags: 4 Histogram Gathering Flags: 10 
DBMS_STATS:          Incremental: TRUE Fix Control 13583722: 1        

The definition of this fix control is :

SQL> select  bugno, description from V$SYSTEM_FIX_CONTROL where bugno = '13583722';

     BUGNO DESCRIPTION
---------- ----------------------------------------------------------------
  13583722 preserve partition level histograms & actual values for incremen

Let’s see what happens if I disable this fix control?

But first, here’s below the cardinality estimation before modifying the fix control :

select /*+ full(a) */
         count(1)
     from
        XZ_JOB a
     where EMPLID>= 'XZ990501004'
     and   EMPLID<= 'XZ990562000'; 

  COUNT(1)
----------
     44724

Plan hash value: 2916242030
----------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Pstart| Pstop |
----------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |     1 |       |       |
|   1 |  SORT AGGREGATE             |        |     1 |       |       |
|   2 |   PARTITION RANGE SINGLE    |        |     7 |    42 |    42 |
|*  3 |    TABLE ACCESS STORAGE FULL| XZ_JOB |     7 |    42 |    42 |
----------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - storage("EMPLID">='XZ990501004' AND "EMPLID"<='XZ990562000')
       filter("EMPLID">='XZ990501004' AND "EMPLID"<='XZ990562000')

This wrong cardinality estimation of 7 instead of 44K, is due to Oracle ignoring the usage of histogram when visiting partition n°42

@getPartTab
compute sum label 'Total num_rows' of num_rows on report
break   on report
set verify off
select 
    table_name
   ,partition_name     pname
   ,partition_position ppos
   ,num_rows
from    
  (select
       table_name
      ,partition_name      
      ,partition_position  	
      ,num_rows	   
   from
      all_tab_partitions
  where 
    table_owner = upper('&owner')
  and
    table_name = upper('&table_name')
);

Enter value for owner: sysadm
Enter value for table_name: XZ_JOB

Enter value for owner: sysadm
Enter value for table_name: XZ_JOB

TABLE_NAME   PNAME             PPOS   NUM_ROWS 
------------ --------------- ------ ---------- 
XZ_JOB       XZ_JOB_T_1           1     721565 
XZ_JOB       XZ_JOB_T_2           2     213356 
XZ_JOB       XZ_JOB_T_3           3      35906 
../..
XZ_JOB       XZ_JOB_T_42         42      46919  -->
XZ_JOB       XZ_JOB_T_43         43      39403 
XZ_JOB       XZ_JOB_T_48         48     110980 
XZ_JOB       XZ_JOB_T_49         49    1342747 
                                      ----------
Total num_rows                         5118455

@PartHist
SQL> select
       partition_name
      ,num_distinct
      ,density
      ,histogram
       ,notes
    from
        all_part_col_statistics
    where owner    = 'SYSADM'
    and table_name = 'XZ_JOB'
    and column_name = 'EMPLID';

PARTITION_NAME  NUM_DISTINCT    DENSITY HISTOGRAM  NOTES
--------------- ------------ ---------- ---------- ---------------------
XZ_JOB_T_1             14954    ,000067 HYBRID
XZ_JOB_T_2              5141    ,000035 HYBRID
XZ_JOB_T_3              1059    ,000941 HYBRID     HIST_FOR_INCREM_STATS
../..
XZ_JOB_T_42             6372     ,00008 HYBRID     HIST_FOR_INCREM_STATS -->
XZ_JOB_T_43             6071    ,000165 HYBRID     HIST_FOR_INCREM_STATS
XZ_JOB_T_48            21812    ,000046 HYBRID     HIST_FOR_INCREM_STATS
XZ_JOB_T_49           243840    ,000004 HYBRID     HIST_FOR_INCREM_STATS

49 rows selected.

E-Rows = partition num_rows/ NDV = 46919/6372 = 7.36~7

And here’s the new cardinality estimation after the fix control change

SQL> alter session set "_fix_control" = '13583722:0';

SQL> exec dbms_stats.delete_table_stats
          (ownname=>'SYSADM',tabname=>'XZ_JOB',partname=>'XZ_JOB_T_42');

SQL> exec dbms_stats.gather_table_stats
        (ownname=>'SYSADM',tabname=>'XZ_JOB',          
         partname=>'XZ_JOB_T_42',granularity=>'PARTITION');

@PartHist
PARTITION_NAME  NUM_DISTINCT    DENSITY HISTOGRAM NOTES
--------------- ------------ ---------- --------- ----------------------
XZ_JOB_T_1             14954    ,000067 HYBRID
XZ_JOB_T_2              5141    ,000035 HYBRID
XZ_JOB_T_3              1059    ,000941 HYBRID    HIST_FOR_INCREM_STATS
../..
XZ_JOB_T_42             6372    ,000082 HYBRID
XZ_JOB_T_43             6071    ,000165 HYBRID    HIST_FOR_INCREM_STATS
XZ_JOB_T_48            21812    ,000046 HYBRID    HIST_FOR_INCREM_STATS
XZ_JOB_T_49           243840    ,000004 HYBRID

49 rows selected.

select /*+ full(a) */
         count(1)
     from
        XZ_JOB a
     where EMPLID>= 'XZ990501004'
     and   EMPLID<= 'XZ990562000'
  COUNT(1)
----------
     44724

----------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Pstart| Pstop |
----------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |       |       |       |
|   1 |  SORT AGGREGATE             |        |     1 |       |       |
|   2 |   PARTITION RANGE SINGLE    |        | 44649 |    42 |    42 |
|*  3 |    TABLE ACCESS STORAGE FULL| XZ_JOB | 44649 |    42 |    42 |
----------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - storage(("EMPLID">='XZ990501004' AND "EMPLID"<='XZ990562000'))
       filter(("EMPLID">='XZ990501004' AND "EMPLID"<='XZ990562000'))

As you can see the fix control 13583722 has made Oracle not ignoring the usage of histogram when visiting partition n°42 (Notes column is null) which explains why the new cardinality estimation is excellent.

2. Fix Control 16726844

As far as fix control goes I decided to check whether I can find any other fix control that might be related to this incremental-histogram issue:

SQL> select bugno, value, description
     from v$system_fix_control
     where description like '%incremen%';

     BUGNO      VALUE DESCRIPTION
---------- ---------- ----------------------------------------------------------------
  10175079          1 increment kafcoref to simulate obsolete fropqop list
  13583722          1 preserve partition level histograms & actual values for incremen
  16726844          1 consider histogram request in incremental stats gathering

The fix control 16726844 seems to be also related to the usage of histogram when incremental mode is set to TRUE. Let’s change its value and check again (before changing the fix control I have of course reproduced the wrong cardinality estimation of 7):

SQL> alter session set "_fix_control" = '16726844:0';

SQL> exec dbms_stats.delete_table_stats(ownname=>'SYSADM',tabname=>'XZ_JOB',partname=>'XZ_JOB_T_42');

SQL> exec dbms_stats.gather_table_stats(ownname=>'SYSADM',tabname=>'XZ_JOB', partname=>'XZ_JOB_T_42',granularity=>'PARTITION');

----------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Pstart| Pstop |
----------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |       |       |       |
|   1 |  SORT AGGREGATE             |        |     1 |       |       |
|   2 |   PARTITION RANGE SINGLE    |        | 44598 |    42 |    42 |
|*  3 |    TABLE ACCESS STORAGE FULL| XZ_JOB | 44598 |    42 |    42 |
----------------------------------------------------------------------

Predicate Information (identified by operation id):
--------------------------------------------------
   3 - storage(("EMPLID">='XZ990501004' AND "EMPLID"<='XZ990562000'))
       filter(("EMPLID">='XZ990501004' AND "EMPLID"<='XZ990562000')))

Indeed, this fix control has also made Oracle using the histogram when visiting partition n°42.

3. For columns size 254 EMPLID

The last option I have checked and used to workaround this issue of not using histogram in presence of incremental mode is to force histograms to be kept by manually specifying bucket size 254:

  method_opt => 'FOR ALL COLUMNS SIZE AUTO, FOR COLUMNS SIZE 254 EMPLID'
SQL> exec dbms_stats.delete_table_stats(ownname=>'SYSADM',tabname=>'XZ_JOB');

SQL> BEGIN 
        dbms_stats.gather_table_stats
        (ownname=>'SYSADM',tabname=>'XZ_JOB'
         ,method_opt => 'for all columns size auto, for columns size 254 emplid'
	 ,degree => 4);
     END;
     /

SQL> @GlobHist

TABLE_NAME   COLUMN_NAME   HISTOGRAM       NOTES
------------ ------------- --------------- ------------
XZ_JOB       EMPLID        HYBRID          INCREMENTAL

SQL> @PartHist

PARTITION_NAME  NUM_DISTINCT    DENSITY HISTOGRAM       NOTES
--------------- ------------ ---------- --------------- -----------
XZ_JOB_T_1             14954    ,000067 HYBRID
XZ_JOB_T_10             1232    ,000805 HYBRID
XZ_JOB_T_11             1191    ,000832 HYBRID
XZ_JOB_T_12             1153    ,000861 HYBRID
XZ_JOB_T_13             1187    ,000837 HYBRID
XZ_JOB_T_14             1390    ,000703 HYBRID
XZ_JOB_T_15             1733    ,000555 HYBRID
XZ_JOB_T_16             1950    ,000476 HYBRID
XZ_JOB_T_17             1842    ,000512 HYBRID
XZ_JOB_T_18             1584    ,000611 HYBRID
XZ_JOB_T_19             1714     ,00056 HYBRID
XZ_JOB_T_2              5141    ,000031 HYBRID
XZ_JOB_T_20             1410    ,000695 HYBRID
XZ_JOB_T_21             1460    ,000671 HYBRID
XZ_JOB_T_22             1899    ,000493 HYBRID
XZ_JOB_T_23             1916    ,000489 HYBRID
XZ_JOB_T_24             1568    ,000618 HYBRID
XZ_JOB_T_25             1550    ,000627 HYBRID
XZ_JOB_T_26             1697    ,000568 HYBRID
XZ_JOB_T_27             1637    ,000593 HYBRID
XZ_JOB_T_28             1528    ,000639 HYBRID
XZ_JOB_T_29             1629    ,000595 HYBRID
XZ_JOB_T_3              1059    ,000941 HYBRID
XZ_JOB_T_30             1479    ,000662 HYBRID
XZ_JOB_T_31             1480    ,000661 HYBRID
XZ_JOB_T_32             1839    ,000514 HYBRID
XZ_JOB_T_33             3083    ,000319 HYBRID
XZ_JOB_T_34             7756    ,000028 HYBRID
XZ_JOB_T_35             2789    ,000359 HYBRID
XZ_JOB_T_36             3669    ,000273 HYBRID
XZ_JOB_T_37             7947    ,000071 HYBRID
XZ_JOB_T_38             3277    ,000305 HYBRID
XZ_JOB_T_39             8466    ,000118 HYBRID
XZ_JOB_T_4               960     ,00104 HYBRID
XZ_JOB_T_40             7836    ,000044 HYBRID
XZ_JOB_T_41             4873    ,000162 HYBRID
XZ_JOB_T_42             6372    ,000077 HYBRID
XZ_JOB_T_43             6071    ,000165 HYBRID
XZ_JOB_T_44             9642    ,000104 HYBRID
XZ_JOB_T_45             9121     ,00011 HYBRID
XZ_JOB_T_46             8028    ,000125 HYBRID
XZ_JOB_T_47             5635    ,000177 HYBRID
XZ_JOB_T_48            21812    ,000046 HYBRID
XZ_JOB_T_49           243840    ,000004 HYBRID
XZ_JOB_T_5              1764    ,000541 HYBRID
XZ_JOB_T_6              1379    ,000712 HYBRID
XZ_JOB_T_7              1050    ,000949 HYBRID
XZ_JOB_T_8              1016    ,000982 HYBRID
XZ_JOB_T_9              1032    ,000965 HYBRID
                
49 rows selected.
SQL> select /*+ full(a) */
  2           count(1)
  3       FROM
  4          XZ_JOB a
  5       where EMPLID>= 'XZ990501004'
  6       and   EMPLID<= 'XZ990562000'; 
  COUNT(1)
----------
     44724

----------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Pstart| Pstop |
----------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |       |       |       |
|   1 |  SORT AGGREGATE             |        |     1 |       |       |
|   2 |   PARTITION RANGE SINGLE    |        | 44645 |    42 |    42 |
|*  3 |    TABLE ACCESS STORAGE FULL| XZ_JOB | 44645 |    42 |    42 |
----------------------------------------------------------------------

Outline Data
-------------

  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('12.1.0.2')
      DB_VERSION('12.1.0.2')
      OPT_PARAM('_optimizer_extended_cursor_sharing' 'none')
      OPT_PARAM('_gby_hash_aggregation_enabled' 'false')
      OPT_PARAM('_optimizer_extended_cursor_sharing_rel' 'none')
      OPT_PARAM('_optimizer_use_feedback' 'false')
      OPT_PARAM('_fix_control' '18302923:1')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      FULL(@"SEL$1" "A"@"SEL$1")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - storage(("EMPLID">='XZ990501004' AND "EMPLID"<='XZ990562000'))
       filter(("EMPLID">='XZ990501004' AND "EMPLID"<='XZ990562000'))

Again, changing the method_opt parameter value so that histogram on the EMPLID column is forced has made Oracle not only create the HYBRID histogram but to use them during query optimization even when the incremental mode is set to TRUE.

4. Bottom-line

Understandably, when incremental mode is set to TRUE Oracle might ignore the usage of histogram during query optimization. To avoid wrong cardinality estimation in such a situation you can use one of the following fixes:

  •  Set the preference incremental mode to FALSE (at table or global level)
  •  Unset the fix control 13583722
  •  Unset the fix control 16726844
  •  Force Oracle to create histogram on the desired column by manually specifying the bucket size 254:
     method_opt => 'for all columns size auto, for columns size 254 COL1' 
Next Page »

Create a free website or blog at WordPress.com.

Tony's Oracle Tips

Tony Hasler's light hearted approach to learning about Oracle

Richard Foote's Oracle Blog

Focusing Specifically On Oracle Indexes, Database Administration and Some Great Music

Hatem Mahmoud Oracle's blog

Just another Oracle blog : Database topics and techniques

Mohamed Houri’s Oracle Notes

Qui se conçoit bien s’énonce clairement

Oracle Diagnostician

Performance troubleshooting as exact science

Raheel's Blog

Things I have learnt as Oracle DBA

Coskan's Approach to Oracle

What I learned about Oracle

So Many Oracle Manuals, So Little Time

“Books to the ceiling, Books to the sky, My pile of books is a mile high. How I love them! How I need them! I'll have a long beard by the time I read them”—Lobel, Arnold. Whiskers and Rhymes. William Morrow & Co, 1988.

Carlos Sierra's Tools and Tips

Tools and Tips for Oracle Performance and SQL Tuning

Oracle Scratchpad

Just another Oracle weblog

OraStory

Dominic Brooks on Oracle Performance, Tuning, Data Quality & Sensible Design ... (Now with added Sets Appeal)