Mohamed Houri’s Oracle Notes

September 21, 2020

OR-Expansion: bypassed – query block has subquery correlated to non-parent

Filed under: Oracle — hourim @ 6:04 pm

It looks like Oracle 19c release comes up with a new heuristic impeachment for the OR-expansion transformation:

  ORE: bypassed - query block has subquery correlated to non-parent.

Let’s see this in action via a reproducible model based on a real-life query which completes instantaneously in 12cR2 but needs 2 minutes when ran in 19c

1. Reproducible model

First, the three tables approaching as much as possible the real-life case.

SQL> select banner_full from v$version;

BANNER_FULL
------------------------------------------------------------------------
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.3.0.0.0 
 create table t1 as 
    select rownum id
          , trunc((rownum -1/3)) n2
          , trunc(rownum -1/5) n3
          , mod(rownum,10) n4
          , lpad('x',10) vc 
    from dual 
connect by level <=1e3;

create table t2 as 
     select rownum id
          , trunc((rownum -1/6)) n2
          , trunc(rownum -1/8) n3
          , mod(rownum,5) n4
          , lpad('z',10) vc 
      from dual 
      connect by level <=1e3;
      
create table t3 as 
       select rownum id
            , trunc(sysdate + rownum -1/3) d1
            , trunc(dbms_random.value(1,1000)) r 
        from dual 
        connect by level <=20;

And then the query

explain plan for
  select /*+ or_expand */
      t1.id
	 ,t1.n2
	 ,t2.vc
  from
     t1, t2
  where
     t1.id = t2.id
  and trim(t1.vc) = 'x'
  or t1.n3 = (select
                  t3.r
              from t3
              where d1 = trunc(sysdate +1)
              )
  and t1.n4 = (select
                   max(n4)
               from t1 e
               where e.n4 = t1.n3
               and e.n2 = (select 
                               max(f.n2)
                           from t1 f
                           where f.n3 = t1.n3 -- this is the root cause of the or expansion bypass
						   )
			   );

As you can see this query is already in a DNF (Disjunctive Normal Form) since it has two distinct ORed conjuncts as the following shows:

  select 
      {list of columns}
  from
     t1, t2
  where
     t1.id = t2.id
  and 
    (trim(t1.vc) = 'x' –- conjunct n°1
  or 
     t1.n3 = (select   –- conjunct n°2
                  t3.r
              from t3
              where d1 = trunc(sysdate +1)
              )    
     )
  and t1.n4 = (select
                   max(n4)
               from t1 e
               where e.n4 = t1.n3
               and e.n2 = (select 
                               max(f.n2)
                           from t1 f
                           where f.n3 = t1.n3)
			   );

For such kind of queries, Oracle doesn’t need to have a DNF beforehand transformation prior to envisage the OR- expansion.

Now, here’re below, respectively, the 12cR2 and 19c execution plan of the above query

-------------------------------------------------------------------------------------------
| Id  | Operation               | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |                 |    11 |   363 |  3024   (1)| 00:00:01 |
|   1 |  VIEW                   | VW_ORE_F79C84EE |    11 |   363 |  3024   (1)| 00:00:01 |
|   2 |   UNION-ALL             |                 |       |       |            |          |
|*  3 |    FILTER               |                 |       |       |            |          |
|   4 |     MERGE JOIN CARTESIAN|                 |  1000 | 26000 |     6   (0)| 00:00:01 |
|*  5 |      TABLE ACCESS FULL  | T1              |     1 |    15 |     3   (0)| 00:00:01 |
|*  6 |       TABLE ACCESS FULL | T3              |     1 |    12 |     2   (0)| 00:00:01 |
|   7 |      BUFFER SORT        |                 |  1000 | 11000 |     3   (0)| 00:00:01 |
|   8 |       TABLE ACCESS FULL | T2              |  1000 | 11000 |     3   (0)| 00:00:01 |
|   9 |     SORT AGGREGATE      |                 |     1 |     7 |            |          |
|* 10 |      FILTER             |                 |       |       |            |          |
|* 11 |       TABLE ACCESS FULL | T1              |   100 |   700 |     3   (0)| 00:00:01 |
|  12 |       SORT AGGREGATE    |                 |     1 |     8 |            |          |
|* 13 |        TABLE ACCESS FULL| T1              |     1 |     8 |     3   (0)| 00:00:01 |
|* 14 |    FILTER               |                 |       |       |            |          |
|* 15 |     HASH JOIN           |                 |    10 |   410 |     6   (0)| 00:00:01 |
|* 16 |      TABLE ACCESS FULL  | T1              |    10 |   260 |     3   (0)| 00:00:01 |
|  17 |      TABLE ACCESS FULL  | T2              |  1000 | 15000 |     3   (0)| 00:00:01 |
|* 18 |     TABLE ACCESS FULL   | T3              |     1 |    12 |     2   (0)| 00:00:01 |
|  19 |     SORT AGGREGATE      |                 |     1 |     7 |            |          |
|* 20 |      FILTER             |                 |       |       |            |          |
|* 21 |       TABLE ACCESS FULL | T1              |   100 |   700 |     3   (0)| 00:00:01 |
|  22 |       SORT AGGREGATE    |                 |     1 |     8 |            |          |
|* 23 |        TABLE ACCESS FULL| T1              |     1 |     8 |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("T1"."N4"= (SELECT MAX("N4") FROM "T1" "E" WHERE "E"."N2"= (SELECT
              MAX("F"."N2") FROM "T1" "F" WHERE "F"."N3"=:B1) AND "E"."N4"=:B2))
   5 - filter("T1"."N3"= (SELECT "T3"."R" FROM "T3" "T3" WHERE
              "D1"=TRUNC(SYSDATE@!+1)))
   6 - filter("D1"=TRUNC(SYSDATE@!+1))
  10 - filter("E"."N2"= (SELECT MAX("F"."N2") FROM "T1" "F" WHERE "F"."N3"=:B1))
  11 - filter("E"."N4"=:B1)
  13 - filter("F"."N3"=:B1)
  14 - filter(LNNVL("T1"."N3"= (SELECT "T3"."R" FROM "T3" "T3" WHERE
              "D1"=TRUNC(SYSDATE@!+1))) OR LNNVL("T1"."N4"= (SELECT MAX("N4") FROM "T1" "E"
              WHERE "E"."N2"= (SELECT MAX("F"."N2") FROM "T1" "F" WHERE "F"."N3"=:B1) AND
              "E"."N4"=:B2)))
  15 - access("T1"."ID"="T2"."ID")
  16 - filter(TRIM("T1"."VC")='x')
  18 - filter("D1"=TRUNC(SYSDATE@!+1))
  20 - filter("E"."N2"= (SELECT MAX("F"."N2") FROM "T1" "F" WHERE "F"."N3"=:B1))
  21 - filter("E"."N4"=:B1)
  23 - filter("F"."N3"=:B1)
------------------------------------------------------------------------------
| Id  | Operation             | Name | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |      |   110 |  4510 |  2201   (2)| 00:00:01 |
|*  1 |  FILTER               |      |       |       |            |          |
|   2 |   MERGE JOIN CARTESIAN|      |  1000K|    39M|  2201   (2)| 00:00:01 |
|   3 |    TABLE ACCESS FULL  | T1   |  1000 | 26000 |     4   (0)| 00:00:01 |
|   4 |    BUFFER SORT        |      |  1000 | 15000 |  2197   (2)| 00:00:01 |
|   5 |     TABLE ACCESS FULL | T2   |  1000 | 15000 |     2   (0)| 00:00:01 |
|*  6 |   TABLE ACCESS FULL   | T3   |     1 |    12 |     3   (0)| 00:00:01 |
|   7 |   SORT AGGREGATE      |      |     1 |     7 |            |          |
|*  8 |    FILTER             |      |       |       |            |          |
|*  9 |     TABLE ACCESS FULL | T1   |   100 |   700 |     4   (0)| 00:00:01 |
|  10 |     SORT AGGREGATE    |      |     1 |     8 |            |          |
|* 11 |      TABLE ACCESS FULL| T1   |     1 |     8 |     4   (0)| 00:00:01 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("T1"."ID"="T2"."ID" AND TRIM("T1"."VC")='x' OR "T1"."N3"=
              (SELECT "T3"."R" FROM "T3" "T3" WHERE "D1"=TRUNC(SYSDATE@!+1)) AND
              "T1"."N4"= (SELECT MAX("N4") FROM "T1" "E" WHERE "E"."N2"= (SELECT
              MAX("F"."N2") FROM "T1" "F" WHERE "F"."N3"=:B1) AND "E"."N4"=:B2))
   6 - filter("D1"=TRUNC(SYSDATE@!+1))
   8 - filter("E"."N2"= (SELECT MAX("F"."N2") FROM "T1" "F" WHERE
              "F"."N3"=:B1))
   9 - filter("E"."N4"=:B1)
  11 - filter("F"."N3"=:B1)

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 1 (U - Unused (1))
---------------------------------------------------------------------------

   1 -  SEL$1
         U -  or_expand 

Notice how the OR-expansion has been used in 12cR2 and ignored in 19c.

The impeachment reason recorded in the 10053 trace file is

 ORE: bypassed - query block has subquery correlated to non-parent. 

It appears 6 times in the trace file


ORE: Predicate chain after QB validity check - SEL$4
"F"."N3"="T1"."N3"
ORE: bypassed - query block has subquery correlated to non-parent

ORE: Predicate chain after QB validity check - SEL$3
"E"."N4"="T1"."N3" AND "E"."N2"= (SELECT MAX("F"."N2") FROM "T1" "F")
ORE: bypassed - query block has subquery correlated to non-parent.

ORE: Predicate chain after QB validity check - SEL$1
"T1"."ID"="T2"."ID" AND TRIM("T1"."VC")='x' OR "T1"."N3"= (SELECT "T3"."R" FROM "T3" "T3") 
AND "T1"."N4"= (SELECT MAX("E"."N4") FROM "T1" "E")
ORE: bypassed - query block has subquery correlated to non-parent.

ORE: Predicate chain after QB validity check - SEL$4
"F"."N3"=:B1
ORE: bypassed - query block has subquery correlated to non-parent.


ORE: Predicate chain after QB validity check - SEL$3
"E"."N4"=:B1 AND "E"."N2"= (SELECT MAX("F"."N2") FROM "T1" "F")
ORE: bypassed - query block has subquery correlated to non-parent.

ORE: Predicate chain after QB validity check - SEL$1
"T1"."ID"="T2"."ID" AND TRIM("T1"."VC")='x' OR "T1"."N3"= (SELECT "T3"."R" FROM "T3" "T3") 
AND "T1"."N4"= (SELECT MAX("E"."N4") FROM "T1" "E")
ORE: bypassed - query block has subquery correlated to non-parent.

But it is fairly likely that the impeachment reason is due to this


ORE: Predicate chain after QB validity check - SEL$4
"F"."N3"="T1"."N3"
ORE: bypassed - query block has subquery correlated to non-parent

Because if I comment this part of the query then the OR expansion will be used


select /*+ or_expand */
      t1.id
     ,t1.n2
     ,t2.vc
  from
     t1, t2
  where
     t1.id = t2.id
  and trim(t1.vc) = 'x'
  or t1.n3 = (select
                  t3.r
              from t3
              where d1 = trunc(sysdate +1)
              )
  and t1.n4 = (select
                   max(n4)
               from t1 e
               where e.n4 = t1.n3
               and e.n2 = (select
                               max(f.n2)
                           from t1 f
                           --where f.n3 = t1.n3
                           )
               ); 

---------------------------------------------------------------------------------------------
| Id  | Operation                 | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |                 |       |       |    31 (100)|          |
|   1 |  VIEW                     | VW_ORE_F79C84EE |  1010 | 33330 |    31   (4)| 00:00:01 |
|   2 |   UNION-ALL               |                 |       |       |            |          |
|   3 |    MERGE JOIN CARTESIAN   |                 |  1000 | 52000 |    17   (6)| 00:00:01 |
|*  4 |     HASH JOIN             |                 |     1 |    41 |    13   (8)| 00:00:01 |
|   5 |      VIEW                 | VW_SQ_1         |     1 |    26 |     9  (12)| 00:00:01 |
|   6 |       HASH GROUP BY       |                 |     1 |     7 |     9  (12)| 00:00:01 |
|*  7 |        TABLE ACCESS FULL  | T1              |     1 |     7 |     4   (0)| 00:00:01 |
|   8 |         SORT AGGREGATE    |                 |     1 |     4 |            |          |
|   9 |          TABLE ACCESS FULL| T1              |  1000 |  4000 |     4   (0)| 00:00:01 |
|* 10 |      TABLE ACCESS FULL    | T1              |     1 |    15 |     4   (0)| 00:00:01 |
|* 11 |       TABLE ACCESS FULL   | T3              |     1 |    12 |     3   (0)| 00:00:01 |
|  12 |     BUFFER SORT           |                 |  1000 | 11000 |    13   (8)| 00:00:01 |
|  13 |      TABLE ACCESS FULL    | T2              |  1000 | 11000 |     4   (0)| 00:00:01 |
|* 14 |    FILTER                 |                 |       |       |            |          |
|* 15 |     HASH JOIN             |                 |    10 |   410 |     8   (0)| 00:00:01 |
|* 16 |      TABLE ACCESS FULL    | T1              |    10 |   260 |     4   (0)| 00:00:01 |
|  17 |      TABLE ACCESS FULL    | T2              |  1000 | 15000 |     4   (0)| 00:00:01 |
|* 18 |     TABLE ACCESS FULL     | T3              |     1 |    12 |     3   (0)| 00:00:01 |
|  19 |     SORT AGGREGATE        |                 |     1 |     7 |            |          |
|* 20 |      TABLE ACCESS FULL    | T1              |     1 |     7 |     4   (0)| 00:00:01 |
|  21 |       SORT AGGREGATE      |                 |     1 |     4 |            |          |
|  22 |        TABLE ACCESS FULL  | T1              |  1000 |  4000 |     4   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("T1"."N4"="MAX(N4)" AND "ITEM_1"="T1"."N3")
   7 - filter("E"."N2"=)
  10 - filter("T1"."N3"=)
  11 - filter("D1"=TRUNC(SYSDATE@!+1))
  14 - filter((LNNVL("T1"."N3"=) OR LNNVL("T1"."N4"=)))
  15 - access("T1"."ID"="T2"."ID")
  16 - filter(TRIM("T1"."VC")='x')
  18 - filter("D1"=TRUNC(SYSDATE@!+1))
  20 - filter(("E"."N4"=:B1 AND "E"."N2"=))

2. Work-around

In my 19c real life query which was taking 150 seconds because the OR-expansion has been refused, I’ve simply used the /*+ use_concat */ hint to get the 12cR2 execution time i.e few milliseconds:

select /*+ use_concat */
      t1.id
     ,t1.n2
     ,t2.vc
  from
     t1, t2
  where
     t1.id = t2.id
  and trim(t1.vc) = 'x'
  or t1.n3 = (select
                  t3.r
              from t3
              where d1 = trunc(sysdate +1)
              )
  and t1.n4 = (select
                   max(n4)
               from t1 e
               where e.n4 = t1.n3
               and e.n2 = (select
                               max(f.n2)
                           from t1 f
                           where f.n3 = t1.n3
                           )
               );
-------------------------------------------------------------------------------
| Id  | Operation              | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |      |  1010 | 41410 |    30   (0)| 00:00:01 |
|   1 |  CONCATENATION         |      |       |       |            |          |
|*  2 |   FILTER               |      |       |       |            |          |
|   3 |    MERGE JOIN CARTESIAN|      |  1000 | 41000 |     8   (0)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL  | T1   |     1 |    26 |     4   (0)| 00:00:01 |
|*  5 |      TABLE ACCESS FULL | T3   |     1 |    12 |     3   (0)| 00:00:01 |
|   6 |     BUFFER SORT        |      |  1000 | 15000 |     4   (0)| 00:00:01 |
|   7 |      TABLE ACCESS FULL | T2   |  1000 | 15000 |     4   (0)| 00:00:01 |
|   8 |    SORT AGGREGATE      |      |     1 |     7 |            |          |
|*  9 |     FILTER             |      |       |       |            |          |
|* 10 |      TABLE ACCESS FULL | T1   |   100 |   700 |     4   (0)| 00:00:01 |
|  11 |      SORT AGGREGATE    |      |     1 |     8 |            |          |
|* 12 |       TABLE ACCESS FULL| T1   |     1 |     8 |     4   (0)| 00:00:01 |
|* 13 |   FILTER               |      |       |       |            |          |
|* 14 |    HASH JOIN           |      |    10 |   410 |     8   (0)| 00:00:01 |
|* 15 |     TABLE ACCESS FULL  | T1   |    10 |   260 |     4   (0)| 00:00:01 |
|  16 |     TABLE ACCESS FULL  | T2   |  1000 | 15000 |     4   (0)| 00:00:01 |
|* 17 |    TABLE ACCESS FULL   | T3   |     1 |    12 |     3   (0)| 00:00:01 |
|  18 |    SORT AGGREGATE      |      |     1 |     7 |            |          |
|* 19 |     FILTER             |      |       |       |            |          |
|* 20 |      TABLE ACCESS FULL | T1   |   100 |   700 |     4   (0)| 00:00:01 |
|  21 |      SORT AGGREGATE    |      |     1 |     8 |            |          |
|* 22 |       TABLE ACCESS FULL| T1   |     1 |     8 |     4   (0)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("T1"."N4"= (SELECT MAX("N4") FROM "T1" "E" WHERE
              "E"."N2"= (SELECT MAX("F"."N2") FROM "T1" "F" WHERE "F"."N3"=:B1) AND
              "E"."N4"=:B2))
   4 - filter("T1"."N3"= (SELECT "T3"."R" FROM "T3" "T3" WHERE
              "D1"=TRUNC(SYSDATE@!+1)))
   5 - filter("D1"=TRUNC(SYSDATE@!+1))
   9 - filter("E"."N2"= (SELECT MAX("F"."N2") FROM "T1" "F" WHERE
              "F"."N3"=:B1))
  10 - filter("E"."N4"=:B1)
  12 - filter("F"."N3"=:B1)
  13 - filter(LNNVL("T1"."N3"= (SELECT "T3"."R" FROM "T3" "T3" WHERE
              "D1"=TRUNC(SYSDATE@!+1))) OR LNNVL("T1"."N4"= (SELECT MAX("N4") FROM
              "T1" "E" WHERE "E"."N2"= (SELECT MAX("F"."N2") FROM "T1" "F" WHERE
              "F"."N3"=:B1) AND "E"."N4"=:B2)))
  14 - access("T1"."ID"="T2"."ID")
  15 - filter(TRIM("T1"."VC")='x')
  17 - filter("D1"=TRUNC(SYSDATE@!+1))
  19 - filter("E"."N2"= (SELECT MAX("F"."N2") FROM "T1" "F" WHERE
              "F"."N3"=:B1))
  20 - filter("E"."N4"=:B1)
  22 - filter("F"."N3"=:B1)

2. Conclusion

It looks like 19c has introduced a new heuristics or-expansion limit:

ORE: bypassed - query block has subquery correlated to non-parent

So, if you come to encounter it, then you know a couple of details on how to reproduce it when it happens, why it has been implemented, and how to work around it

August 19, 2020

A 2 min tuning case

Filed under: Oracle — hourim @ 1:20 pm

Here’s a practical performance troubleshooting case which took me a couple of minutes to fix but about one hour to write down 😊. It comes via an end user complaint about a load process (insert/select) triggered by Oracle Warehouse Builder(OWB) which was running since the last 3 days. When the on-call DBA handled this request to me, he said that he has already run a SQL tuning advisor and he was still unable to find the root cause of this performance degradation. Before I started the investigation I already knew what was happening in this case. Indeed, almost always, if a query lasts a couple of days without erroring out, then there is a big chance that a NESTED LOOPS is certainly for something in this delay. And, in this case, the CBO would have probably messed up its cardinality estimation by opting for the wrong JOIN method. That is why, based upon the DBA explanation, I have had, already, in my mind that this ETL load process is suffering from a lack of table/index/column statistics

Anyway, below is exactly the steps I followed to fix this issue

-- Get the SQL_ID of the running process
SQL> select sql_id, sql_exec_id from gv$sql_monitor where status = 'EXECUTING';

Hopefully, there was only a single SQL_ID in an EXECUTION phase.

-- Get the corresponding SQL Monitoring report
SQL> SELECT DBMS_SQLTUNE.report_sql_monitor(
  		sql_id       => ' bsgj02zjrfdbj',
  		type         => 'TEXT',
  		report_level => 'ALL') AS report
FROM dual;

Global Information
------------------------------
 Status                                 :  EXECUTING           
 Instance ID                            :  1                   
 Session                                :  XYZDATA (245:14813) 
 SQL ID                                 :  bsgj02zjrfdbj       
 SQL Execution ID                       :  16777227            
 Execution Started                      :  10/31/2019 14:26:21 
 First Refresh Time                     :  10/31/2019 14:26:33 
 Last Refresh Time                      :  11/04/2019 10:56:29 
 Duration                               :  333009s             
 Module/Action                          :  CCC_CCS/EXECUTION   
 Service                                :  MYAPL_1             
 Program                                :  JDBC Thin Client    
 PLSQL Entry Ids (Object/Subprogram)    :  10328167,7          
 PLSQL Current Ids (Object/Subprogram)  :  10328167,7          

Global Stats
======================================================================
| Elapsed |   Cpu   |    IO    | Concurrency | Buffer | Read | Read  |
| Time(s) | Time(s) | Waits(s) |  Waits(s)   |  Gets  | Reqs | Bytes |
======================================================================
|  553389 |  329894 |   223477 |          18 |     4G |   4G | 104TB |
======================================================================

SQL Plan Monitoring Details (Plan Hash Value=2281702856)

================================================================================================
| Id   |               Operation                |      Name       |  Rows   | Execs |   Rows   |
|      |                                        |                 | (Estim) |       | (Actual) |
================================================================================================
|    0 | INSERT STATEMENT                       |                 |         |     1 |          |
|    1 |   LOAD TABLE CONVENTIONAL              |                 |         |     1 |        0 |
|      |                                        |                 |         |       |          |
|    2 |    NESTED LOOPS OUTER                  |                 |      2M |     1 |    19932 |
|    3 |     PARTITION LIST OR                  |                 |      2M |     1 |    19932 |
| -> 4 |      TABLE ACCESS FULL                 | TT_XYZ_BAE      |      2M |     1 |    19932 |
|    5 |     PARTITION RANGE ITERATOR           |                 |       1 | 19932 |    19932 |
|    6 |      TABLE ACCESS BY LOCAL INDEX ROWID | T1_TAB          |       1 | 19932 |    19932 |
|      |                                        |                 |         |       |          |
|      |                                        |                 |         |       |          |
| -> 7 |       BITMAP CONVERSION TO ROWIDS      |                 |         | 19932 |       4G |
| -> 8 |        BITMAP INDEX SINGLE VALUE       | XY99_T1_TAB_BRI |         | 19932 |       1M |
|      |                                        |                 |         |       |          |
================================================================================================

As I’ve expected, there is a NESTED LOOPS operation in the mix(operation Id n°2). In addition, the cardinality estimation of 1 at line operation 6 is almost always an indication of a wrong estimation done by the CBO. Add to this the 4G rows generated when visiting the XY99_T1_TAB_BRI bitmap index and you end up by suspecting a statistic issue with the T1_TAB table.

So, the next immediate step I did was to check the statistics of the T1_TAB table and the XY99_T1_TAB_BRI bitmap index :

SQL> @GetPartTab
Enter value for owner: XYZDATA
Enter value for table_name: T1_TAB

TABLE_NAME   PNAME                PPOS   NUM_ROWS TABLESPACE_NAME  LAST_ANAL
------------ ------------------ ------ ---------- ---------------- -------------------
T1_TAB       P_T1_TAB_200001         1          0 XY99_TBS_DONN    26/10/2019 15:26:55
T1_TAB       P_T1_TAB_201708         2    3934618 XY99_TBS_DONN    26/10/2019 15:21:50
T1_TAB       P_T1_TAB_201709         3    3948128 XY99_TBS_DONN    26/10/2019 14:35:11
T1_TAB       P_T1_TAB_201710         4    4078620 XY99_TBS_DONN    26/10/2019 13:47:20
T1_TAB       P_T1_TAB_201711         5    4093099 XY99_TBS_DONN    26/10/2019 12:52:13
T1_TAB       P_T1_TAB_201712         6    4059354 XY99_TBS_DONN    26/10/2019 11:53:52
T1_TAB       P_T1_TAB_201801         7    4080096 XY99_TBS_DONN    26/10/2019 10:58:33
T1_TAB       P_T1_TAB_201802         8    4096904 XY99_TBS_DONN    26/10/2019 09:58:16
T1_TAB       P_T1_TAB_201803         9    4115548 XY99_TBS_DONN    26/10/2019 08:59:04
T1_TAB       P_T1_TAB_201804        10    4175894 XY99_TBS_DONN    26/10/2019 07:54:15
T1_TAB       P_T1_TAB_201805        11    4187923 XY99_TBS_DONN    26/10/2019 07:05:42
T1_TAB       P_T1_TAB_201806        12    4224300 XY99_TBS_DONN    26/10/2019 06:21:28
T1_TAB       P_T1_TAB_201807        13    4233116 XY99_TBS_DONN    26/10/2019 05:30:46
T1_TAB       P_T1_TAB_201808        14    4242821 XY99_TBS_DONN    26/10/2019 04:42:50
T1_TAB       P_T1_TAB_201809        15    4291935 XY99_TBS_DONN    26/10/2019 03:49:06
T1_TAB       P_T1_TAB_201810        16    2596486 XY99_TBS_DONN    26/10/2019 02:57:36
T1_TAB       P_T1_TAB_201811        17    4305042 XY99_TBS_DONN    26/10/2019 02:08:20
T1_TAB       P_T1_TAB_201812        18    4314987 XY99_TBS_DONN    26/10/2019 01:12:00
T1_TAB       P_T1_TAB_201901        19    4333586 XY99_TBS_DONN    26/10/2019 00:21:52
T1_TAB       P_T1_TAB_201902        20    4428940 XY99_TBS_DONN    25/10/2019 23:22:28
T1_TAB       P_T1_TAB_201903        21    4493339 XY99_TBS_DONN    25/10/2019 22:17:44
T1_TAB       P_T1_TAB_201904        22    4420400 XY99_TBS_DONN    25/10/2019 20:54:34
T1_TAB       P_T1_TAB_201905        23    4408412 XY99_TBS_DONN    25/10/2019 19:43:10
T1_TAB       P_T1_TAB_201906        24    4277786 XY99_TBS_DONN    25/10/2019 18:49:03
T1_TAB       P_T1_TAB_201907        25    4250320 XY99_TBS_DONN    25/10/2019 17:39:26
T1_TAB       P_T1_TAB_201908        26          0 XY99_TBS_DONN    25/10/2019 16:53:29  --> looks suspicious
                                       ----------
Total num_rows                         99591654

SQL> @GetPartInd
Enter value for owner: XYZDATA
Enter value for index_name: XY99_T1_TAB_BRI

INDEX_NAME        PNAME              PPOS   NUM_ROWS TABLESPACE_NAME LAST_ANAL
----------------- ---------------- ------ ---------- --------------- -------------------
XY99_T1_TAB_BRI   P_T1_TAB_200001       1          0 XY99_TBS_INDX   26/10/2019 15:26:56
XY99_T1_TAB_BRI   P_T1_TAB_201708       2         67 XY99_TBS_INDX   26/10/2019 15:22:13
XY99_T1_TAB_BRI   P_T1_TAB_201709       3         64 XY99_TBS_INDX   26/10/2019 14:35:34
XY99_T1_TAB_BRI   P_T1_TAB_201710       4         66 XY99_TBS_INDX   26/10/2019 13:47:40
XY99_T1_TAB_BRI   P_T1_TAB_201711       5         67 XY99_TBS_INDX   26/10/2019 12:52:32
XY99_T1_TAB_BRI   P_T1_TAB_201712       6         66 XY99_TBS_INDX   26/10/2019 11:54:15
XY99_T1_TAB_BRI   P_T1_TAB_201801       7         66 XY99_TBS_INDX   26/10/2019 10:58:57
XY99_T1_TAB_BRI   P_T1_TAB_201802       8         67 XY99_TBS_INDX   26/10/2019 09:58:44
XY99_T1_TAB_BRI   P_T1_TAB_201803       9         67 XY99_TBS_INDX   26/10/2019 08:59:31
XY99_T1_TAB_BRI   P_T1_TAB_201804      10         68 XY99_TBS_INDX   26/10/2019 07:54:36
XY99_T1_TAB_BRI   P_T1_TAB_201805      11         68 XY99_TBS_INDX   26/10/2019 07:06:05
XY99_T1_TAB_BRI   P_T1_TAB_201806      12         69 XY99_TBS_INDX   26/10/2019 06:21:52
XY99_T1_TAB_BRI   P_T1_TAB_201807      13         69 XY99_TBS_INDX   26/10/2019 05:31:09
XY99_T1_TAB_BRI   P_T1_TAB_201808      14         69 XY99_TBS_INDX   26/10/2019 04:43:13
XY99_T1_TAB_BRI   P_T1_TAB_201809      15         72 XY99_TBS_INDX   26/10/2019 03:49:29
XY99_T1_TAB_BRI   P_T1_TAB_201810      16        200 XY99_TBS_INDX   26/10/2019 02:57:51
XY99_T1_TAB_BRI   P_T1_TAB_201811      17         70 XY99_TBS_INDX   26/10/2019 02:08:46
XY99_T1_TAB_BRI   P_T1_TAB_201812      18         70 XY99_TBS_INDX   26/10/2019 01:12:25
XY99_T1_TAB_BRI   P_T1_TAB_201901      19         70 XY99_TBS_INDX   26/10/2019 00:22:16
XY99_T1_TAB_BRI   P_T1_TAB_201902      20         72 XY99_TBS_INDX   25/10/2019 23:22:55
XY99_T1_TAB_BRI   P_T1_TAB_201903      21         73 XY99_TBS_INDX   25/10/2019 22:18:13
XY99_T1_TAB_BRI   P_T1_TAB_201904      22         72 XY99_TBS_INDX   25/10/2019 20:55:06
XY99_T1_TAB_BRI   P_T1_TAB_201905      23         72 XY99_TBS_INDX   25/10/2019 19:43:34
XY99_T1_TAB_BRI   P_T1_TAB_201906      24         70 XY99_TBS_INDX   25/10/2019 18:49:27
XY99_T1_TAB_BRI   P_T1_TAB_201907      25         70 XY99_TBS_INDX   25/10/2019 17:39:51
XY99_T1_TAB_BRI   P_T1_TAB_201908      26          0 XY99_TBS_INDX   25/10/2019 16:53:31 --> looks suspicious
                                          ----------
Total num_rows                                  1784

The 26th partition looks suspicious. For the sake of simplicity and quick fix, I asked the end user whether their process deals with the last created partition? and the answer was YES.

Then, I killed the session, gathered the statistics, and asked the end-user to relaunch the process

SQL> @GetPartTab
Enter value for owner: XYZDATA
Enter value for table_name: T1_TAB

TABLE_NAME    PNAME               PPOS   NUM_ROWS TABLESPACE_NAME  LAST_ANAL
------------- ----------------- ------ ---------- ---------------- -------------------
T1_TAB        P_T1_TAB_200001        1          0 XY99_TBS_DONN    26/10/2019 15:26:55
T1_TAB        P_T1_TAB_201708        2    3934618 XY99_TBS_DONN    26/10/2019 15:21:50
T1_TAB        P_T1_TAB_201709        3    3948128 XY99_TBS_DONN    26/10/2019 14:35:11
T1_TAB        P_T1_TAB_201710        4    4078620 XY99_TBS_DONN    26/10/2019 13:47:20
T1_TAB        P_T1_TAB_201711        5    4093099 XY99_TBS_DONN    26/10/2019 12:52:13
T1_TAB        P_T1_TAB_201712        6    4059354 XY99_TBS_DONN    26/10/2019 11:53:52
T1_TAB        P_T1_TAB_201801        7    4080096 XY99_TBS_DONN    26/10/2019 10:58:33
T1_TAB        P_T1_TAB_201802        8    4096904 XY99_TBS_DONN    26/10/2019 09:58:16
T1_TAB        P_T1_TAB_201803        9    4115548 XY99_TBS_DONN    26/10/2019 08:59:04
T1_TAB        P_T1_TAB_201804       10    4175894 XY99_TBS_DONN    26/10/2019 07:54:15
T1_TAB        P_T1_TAB_201805       11    4187923 XY99_TBS_DONN    26/10/2019 07:05:42
T1_TAB        P_T1_TAB_201806       12    4224300 XY99_TBS_DONN    26/10/2019 06:21:28
T1_TAB        P_T1_TAB_201807       13    4233116 XY99_TBS_DONN    26/10/2019 05:30:46
T1_TAB        P_T1_TAB_201808       14    4242821 XY99_TBS_DONN    26/10/2019 04:42:50
T1_TAB        P_T1_TAB_201809       15    4291935 XY99_TBS_DONN    26/10/2019 03:49:06
T1_TAB        P_T1_TAB_201810       16    2596486 XY99_TBS_DONN    26/10/2019 02:57:36
T1_TAB        P_T1_TAB_201811       17    4305042 XY99_TBS_DONN    26/10/2019 02:08:20
T1_TAB        P_T1_TAB_201812       18    4314987 XY99_TBS_DONN    26/10/2019 01:12:00
T1_TAB        P_T1_TAB_201901       19    4333586 XY99_TBS_DONN    26/10/2019 00:21:52
T1_TAB        P_T1_TAB_201902       20    4428940 XY99_TBS_DONN    25/10/2019 23:22:28
T1_TAB        P_T1_TAB_201903       21    4493339 XY99_TBS_DONN    25/10/2019 22:17:44
T1_TAB        P_T1_TAB_201904       22    4420400 XY99_TBS_DONN    25/10/2019 20:54:34
T1_TAB        P_T1_TAB_201905       23    4408412 XY99_TBS_DONN    25/10/2019 19:43:10
T1_TAB        P_T1_TAB_201906       24    4277786 XY99_TBS_DONN    25/10/2019 18:49:03
T1_TAB        P_T1_TAB_201907       25    4250320 XY99_TBS_DONN    25/10/2019 17:39:26
T1_TAB        P_T1_TAB_201908       26    4185925 XY99_TBS_DONN    04/11/2019 18:51:48
                                       ----------                  
Total num_rows                         103777579                   


SQL> @getPartInd
Enter value for owner: XYZDATA
Enter value for index_name: XY99_T1_TAB_BRI

INDEX_NAME        PNAME               PPOS   NUM_ROWS TABLESPACE_NAME   LAST_ANAL
----------------- ----------------- ------ ---------- ----------------- -------------------
XY99_T1_TAB_BRI   P_T1_TAB_200001        1          0 XY99_TBS_INDX     26/10/2019 15:26:56
XY99_T1_TAB_BRI   P_T1_TAB_201708        2         67 XY99_TBS_INDX     26/10/2019 15:22:13
XY99_T1_TAB_BRI   P_T1_TAB_201709        3         64 XY99_TBS_INDX     26/10/2019 14:35:34
XY99_T1_TAB_BRI   P_T1_TAB_201710        4         66 XY99_TBS_INDX     26/10/2019 13:47:40
XY99_T1_TAB_BRI   P_T1_TAB_201711        5         67 XY99_TBS_INDX     26/10/2019 12:52:32
XY99_T1_TAB_BRI   P_T1_TAB_201712        6         66 XY99_TBS_INDX     26/10/2019 11:54:15
XY99_T1_TAB_BRI   P_T1_TAB_201801        7         66 XY99_TBS_INDX     26/10/2019 10:58:57
XY99_T1_TAB_BRI   P_T1_TAB_201802        8         67 XY99_TBS_INDX     26/10/2019 09:58:44
XY99_T1_TAB_BRI   P_T1_TAB_201803        9         67 XY99_TBS_INDX     26/10/2019 08:59:31
XY99_T1_TAB_BRI   P_T1_TAB_201804       10         68 XY99_TBS_INDX     26/10/2019 07:54:36
XY99_T1_TAB_BRI   P_T1_TAB_201805       11         68 XY99_TBS_INDX     26/10/2019 07:06:05
XY99_T1_TAB_BRI   P_T1_TAB_201806       12         69 XY99_TBS_INDX     26/10/2019 06:21:52
XY99_T1_TAB_BRI   P_T1_TAB_201807       13         69 XY99_TBS_INDX     26/10/2019 05:31:09
XY99_T1_TAB_BRI   P_T1_TAB_201808       14         69 XY99_TBS_INDX     26/10/2019 04:43:13
XY99_T1_TAB_BRI   P_T1_TAB_201809       15         72 XY99_TBS_INDX     26/10/2019 03:49:29
XY99_T1_TAB_BRI   P_T1_TAB_201810       16        200 XY99_TBS_INDX     26/10/2019 02:57:51
XY99_T1_TAB_BRI   P_T1_TAB_201811       17         70 XY99_TBS_INDX     26/10/2019 02:08:46
XY99_T1_TAB_BRI   P_T1_TAB_201812       18         70 XY99_TBS_INDX     26/10/2019 01:12:25
XY99_T1_TAB_BRI   P_T1_TAB_201901       19         70 XY99_TBS_INDX     26/10/2019 00:22:16
XY99_T1_TAB_BRI   P_T1_TAB_201902       20         72 XY99_TBS_INDX     25/10/2019 23:22:55
XY99_T1_TAB_BRI   P_T1_TAB_201903       21         73 XY99_TBS_INDX     25/10/2019 22:18:13
XY99_T1_TAB_BRI   P_T1_TAB_201904       22         72 XY99_TBS_INDX     25/10/2019 20:55:06
XY99_T1_TAB_BRI   P_T1_TAB_201905       23         72 XY99_TBS_INDX     25/10/2019 19:43:34
XY99_T1_TAB_BRI   P_T1_TAB_201906       24         70 XY99_TBS_INDX     25/10/2019 18:49:27
XY99_T1_TAB_BRI   P_T1_TAB_201907       25         70 XY99_TBS_INDX     25/10/2019 17:39:51
XY99_T1_TAB_BRI   P_T1_TAB_201908       26         68 XY99_TBS_INDX     04/11/2019 18:52:14
                                           ----------
Total num_rows                                   1852

When this insert/select has been relaunched it has completed in less than one hour as shown by the corresponding SQL monitoring report:

Global Information
------------------------------
 Status                                 :  EXECUTING           
 Instance ID                            :  1                   
 Session                                :  XYZDATA (245:14813) 
 SQL ID                                 :  bsgj02zjrfdbj       
 SQL Execution ID                       :  16777216           
 Execution Started                      :  11/05/2019 10:02:17
 First Refresh Time                     :  11/05/2019 10:02:24
 Last Refresh Time                      :  11/05/2019 11:00:03
 Duration                               :  3466s     -------------------> The new execution time        
 Module/Action                          :  CCC_CCS/EXECUTION   
 Service                                :  MYAPL_1             
 Program                                :  JDBC Thin Client    
 PLSQL Entry Ids (Object/Subprogram)    :  10328167,7          
 PLSQL Current Ids (Object/Subprogram)  :  10328167,7          

-- The new execution plan
Global Stats
======================================================================
| Elapsed |   Cpu   |    IO    | Concurrency | Buffer | Read | Read  |
| Time(s) | Time(s) | Waits(s) |  Waits(s)   |  Gets  | Reqs | Bytes |
======================================================================
|  3766   |  607    |   3158   |         0.00|      5M|  219K|  142GB|
======================================================================

SQL Plan Monitoring Details (Plan Hash Value=2281702856)
================================================================================================
| Id   |               Operation                |      Name       |  Rows   | Execs |   Rows   |
|      |                                        |                 | (Estim) |       | (Actual) |
================================================================================================
|    0 | INSERT STATEMENT                       |                 |         |     1 |          |
|    1 |   LOAD TABLE CONVENTIONAL              |                 |         |     1 |          |
|    2 |    HASH JOIN OUTER                     |                 |      2M |     1 |        0 |
|    3 |     PARTITION LIST OR                  |                 |      2M |     1 |     971K |
|    4 |      TABLE ACCESS FULL                 | TT_XYZ_BAE      |      2M |     2 |     971K |
|    5 |     PARTITION RANGE ITERATOR           |                 |     104M|     1 |     100M |
|    6 |      TABLE ACCESS FULL                 | T1_TAB          |     104M|    26 |     100M |
================================================================================================

Bottom Line

This simple note shows how diagnosing and fixing a real-life performance issue can, sometimes, be accomplished in a couple of minutes (in this case the time fixing equals to the duration of the call to dbms_stats package). It will not be always as obvious as the current case tends to indicate but, I can say that I’ve very often spent more time answering, explaining, and writing on how I have fixed a performance issue than the time it took me to diagnose and fix this performance pain.

August 9, 2020

SET operation to join

Filed under: Oracle — hourim @ 10:06 am

In this article about semi-join in modern relational databases I’ve shown that, neither Oracle nor Postgres, are able to transform a SET operation like INTERSECT or MINUS into a SEMI-JOIN. Jonathan Lewis has, however, explained in a comment to this article, that Oracle has been, in fact, able to handle conversions from INTERSECT (and minus) to semi join (and anti-join) since 10g. But the conversion is only possible by changing the default value of the _convert_set_to_join parameter from FALSE to TRUE. Starting from 19.1.0.1 the conversion is now done automatically because Oracle has finally decided to let this transformation happening by default as outlined by Tanel Poder cofep script

SQL> @cofep 12.2.0.1 19.1.0.1
Compare Optimizer_Features_Enable Parameter differences
for values 12.2.0.1 and 19.1.0.1

PARAMETER              '12.2.0.1'    '19.1.0.1'   DESCRIPTION
---------------------- ------------- ------------ --------------------------------------------
_convert_set_to_join   FALSE         TRUE         enables conversion of set operator to join

Here’s below a simple demonstration

create table t1
as 
select
   rownum n1
  ,mod(rownum,5) n2
  ,trunc( (rownum -1/3)) n3
from
  dual
connect by
  level <= 1e4;  
  
create table t2
as 
select
   rownum n1
  ,mod(rownum,3) n2
  ,trunc( (rownum -1/5)) n3
from
  dual
connect by
  level <= 5;    

exec dbms_stats.gather_table_stats(user, ‘t1’) ;
exec dbms_stats.gather_table_stats(user, ‘t2’) ;
SQL> show parameter optimizer_features

PARAMETER_NAME                  TYPE        VALUE
------------------------------- ----------- ---------
optimizer_features_enable       string      19.1.0

SQL> @pd _convert_set
Show all parameters and session values from x$ksppi/x$ksppcv...

       NUM N_HEX NAME                  VALUE    DESCRIPTION
---------- ----- --------------------- -------- ------------------------------------------
      3718   E86 _convert_set_to_join  FALSE    enables conversion of set operator to join

SQL> select t2.n1 from t2 intersect select t1.n1 from t1;

        N1
----------
         1
         2
         3
         4
         5

SQL> select * from table(dbms_xplan.display_cursor);

----------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes |
----------------------------------------------------
|   0 | SELECT STATEMENT    |      |       |       |
|   1 |  INTERSECTION       |      |       |       |
|   2 |   SORT UNIQUE       |      |     5 |    15 |
|   3 |    TABLE ACCESS FULL| T2   |     5 |    15 |
|   4 |   SORT UNIQUE       |      | 10000 | 40000 |
|   5 |    TABLE ACCESS FULL| T1   | 10000 | 40000 |
----------------------------------------------------

As you can see the intersect operation has not been transformed into a semi-join because the corresponding hidden parameter is by default set to FALSE.

But let’s now do the same experiment under the new optimizer 19.1.0.1 and check the difference:

SQL> alter session set optimizer_features_enable='19.1.0.1';

Session altered.

SQL> select t2.n1 from t2 intersect select t1.n1 from t1;

        N1
----------
         1
         2
         4
         5
         3

SQL> @xpsimp

SQL_ID  0atbzgvynmj51, child number 1
-------------------------------------
select t2.n1 from t2 intersect select t1.n1 from t1

Plan hash value: 3703458891
----------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes |
----------------------------------------------------
|   0 | SELECT STATEMENT    |      |       |       |
|   1 |  HASH UNIQUE        |      |     5 |    35 |
|*  2 |   HASH JOIN SEMI    |      |     5 |    35 |
|   3 |    TABLE ACCESS FULL| T2   |     5 |    15 |
|   4 |    TABLE ACCESS FULL| T1   | 10000 | 40000 |
----------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access(SYS_OP_MAP_NONNULL("T2"."N1")=SYS_OP_MAP_NONNULL("T1"."N1"))

Spot now how the INTERSECT set operation has been automatically transformed into a HASH JOIN SEMI operation. This becomes possible by default because, starting from optimier_features_enable 19.1.0.1, the default value of the hidden parameter _convert_set_to_join is set to TRUE.

While I was trying to print out the execution plan non-sharing reason I spotted out something interesting:

SQL> @nonshared 0atbzgvynmj51
Show why existing SQL child cursors were not reused (V$SQL_SHARED_CURSOR)...

SQL_ID                        : 0atbzgvynmj51
ADDRESS                       : 00007FFB65E682E0
CHILD_ADDRESS                 : 00007FFB65E65D80
CHILD_NUMBER                  : 0
HASH_MATCH_FAILED             : Y
REASON                        : <ChildNode><ChildNumber>0</ChildNumber><ID>3</ID>
                                <reason>Optimizer mismatch(12)</reason><size>2x476</size>
								<optimizer_features_enable> 19.1.0  19.1.0.1  </optimizer_features_enable>
								<_optimizer_undo_cost_change> 19.1.0  19.1.0.1 </_optimizer_undo_cost_change>
								<_convert_set_to_join> false true              </_convert_set_to_join>
							    <_subquery_pruning_mv_enabled> false  true     </_subquery_pruning_mv_enabled>
							    <_optimizer_push_down_distinct> 0   5   </_optimizer_push_down_distinct>
							    <_fix_control_key> 0  260074645         </_fix_control_key>
							    <_px_shared_hash_join> false  true      </_px_shared_hash_join>
							    </ChildNode>
CON_ID                        : 1
-----------------
SQL_ID                        : 0atbzgvynmj51
ADDRESS                       : 00007FFB65E682E0
CHILD_ADDRESS                 : 00007FFB65C23F38
CHILD_NUMBER                  : 1
OPTIMIZER_MISMATCH            : Y
REASON                        :
CON_ID                        : 1
-----------------

PL/SQL procedure successfully completed.

It looks like Oracle is now showing all the new optimizer parameters that have changed between the optimizer used to compile child number 0 and that used to compile child cursor n°1. This is confirmed using Tanel Poder cofep script

SQL> @cofep 19.1.0 19.1.0.1
Compare Optimizer_Features_Enable Parameter differences
for values 19.1.0 and 19.1.0.1

PARAMETER                       '19.1.0'   '19.1.0.1' DESCRIPTION
------------------------------- ---------- ---------- ----------------------------------------------------------------------
_optimizer_push_down_distinct   0          5          push down distinct from query block to table
optimizer_features_enable       19.1.0     19.1.0.1   optimizer plan compatibility parameter
_optimizer_undo_cost_change     19.1.0     19.1.0.1   optimizer undo cost change
_subquery_pruning_mv_enabled    FALSE      TRUE       enable the use of subquery predicates with MVs to perform pruning
_convert_set_to_join            FALSE      TRUE       enables conversion of set operator to join
_px_shared_hash_join            FALSE      TRUE       enable/disable shared hash join

6 rows selected.

August 6, 2020

Historical column histogram

Filed under: Oracle — hourim @ 4:03 pm

Tracking column histogram modifications is not a common task among the Oracle DBA and developer’s sphere. And, why one would want to know the past column histogram values? In this blog post I intend to give a script serving the first purpose and show an example where this script can be of a great help.

1. Historical column histogram values

Here’s below the script which I have named h_hist1 (you will see later why I decided to use this script name):

/* ----------------------------------------------------------------------------------|
|Author : Mohamed Houri                                                              |
|Date   : 03/07/2020                                                                 |
|Scope  : This script gives historical column histogram values                       |
|          -- I am using sys.WRI$_OPTSTAT_HISTHEAD_HISTORY for this purpose			 |		
|          -- I am only able to say whether, previously, there was HISTOGRAM or not  |
|          -- I can't show the historical type of Histogram 				         |
|																					 |
|Usage  :  SQL> @h_hist1                                                             |
|			Enter value for table_name: t1                                           |
|			Enter value for owner: test                                              |
|			Enter value for col_name: n2     				                         |
-------------------------------------------------------------------------------------|*/
col object_name   	    format a20
col column_name  		format a12
col last_analyzed 		format a20
col prev_last_analyzed  format a20
col histogram           format a16
col prev_histogram      format a16
WITH sq AS 
    (
     SELECT
	      object_id
		 ,object_name
		 ,subobject_name
	 FROM
	     dba_objects
	 WHERE
	     object_name    = upper ('&&table_name')
	 AND owner          = upper('&&owner')
	 AND subobject_name IS NULL
	 )
SELECT
	 object_name
	,column_name
	,lead(prev_histogram,1,histogram) over (order by last_analyzed) histogram
	,last_analyzed
	,prev_histogram
	,prev_last_analyzed
FROM
   (
     SELECT
	     object_name
		,column_name
		,(select histogram from all_tab_col_statistics where owner = upper('&&owner') 
		  and table_name = upper('&&table_name') and column_name = upper('&&col_name')) histogram
		,last_analyzed
		,stat_time prev_last_analyzed
		,row_number() over (order by last_analyzed) rn
		,case when round(derivedDensity,9)= round(density,9) then 'NONE' else 'HISTOGRAM' end prev_histogram
	 FROM
	    (
		 SELECT
		     object_name
			,column_name
			,to_char(savtime ,'dd/mm/yyyy hh24:mi:ss')     last_analyzed
			,to_char(timestamp# ,'dd/mm/yyyy hh24:mi:ss') stat_time
			,density
			,1/distcnt derivedDensity
			,row_number() over (order by savtime) rn
			,lag(case when round(1/distcnt,9) = round(density,9) then 'NONE' else 'HISTOGRAM' end) over(order by savtime) hist_histogram
		 FROM
		    sys.WRI$_OPTSTAT_HISTHEAD_HISTORY
			INNER JOIN sq ON object_id = obj#
			INNER JOIN (SELECT 
			                column_id
						   ,column_name
						FROM
						    dba_tab_columns
						WHERE
						    column_name = upper('&&col_name')
						AND table_name  = upper('&&table_name') 
			            AND owner       = upper('&&owner')
						) ON intcol# = column_id
	)
WHERE
   rn >= 1 --exlcude/include the very first dbms_stat
   )
ORDER BY
    last_analyzed;

And here’s below a simple illustration usage:

SQL> exec dbms_stats.gather_table_stats(user, 't1', method_opt => 'for all columns size auto');
SQL> exec dbms_stats.gather_table_stats(user, 't1', method_opt => 'for all columns size 1');
SQL> exec dbms_stats.gather_table_stats(user, 't1', method_opt => 'for all columns size 3');

SQL> @h_hist1
Enter value for table_name: t1
Enter value for owner: c##mhouri
Enter value for col_name: n2

OBJECT_NAME  COLUMN_NAME  HISTOGRAM        LAST_ANALYZED        PREV_HISTOGRAM   PREV_LAST_ANALYZED
------------ ------------ ---------------- -------------------- ---------------- --------------------
T1           N2           HISTOGRAM        06/08/2020 15:39:00  HISTOGRAM        25/06/2020 18:42:06
T1           N2           NONE             06/08/2020 15:43:58  HISTOGRAM        06/08/2020 15:38:59
T1           N2           TOP-FREQUENCY    06/08/2020 16:03:19  NONE             06/08/2020 15:43:58

As you can see we went from HISTOGRAM on August the 6th at 15:39 to NONE at 15:43 then to TOP-FREQUENCY at 16:03. If I gather again statistics on the same table then here’s what the h_hist1 script will show:

SQL> exec dbms_stats.gather_table_stats(user, 't1', method_opt => 'for all columns size auto');
SQL> @h_hist1

OBJECT_NAME COLUMN_NAME  HISTOGRAM        LAST_ANALYZED        PREV_HISTOGRAM   PREV_LAST_ANALYZED
----------- ------------ ---------------- -------------------- ---------------- --------------------
T1          N2           HISTOGRAM        06/08/2020 15:39:00  HISTOGRAM        25/06/2020 18:42:06
T1          N2           NONE             06/08/2020 15:43:58  HISTOGRAM        06/08/2020 15:38:59
T1          N2           HISTOGRAM        06/08/2020 16:03:19  NONE             06/08/2020 15:43:58
T1          N2           FREQUENCY        06/08/2020 16:23:00  HISTOGRAM        06/08/2020 16:03:18

While I lost the histogram type (TOP-FREQENCY) as I go back through the historical path of the n2 column, I am, however, able to say whether, previously, this column was having a histogram or not.

When you Google using the words “historical histogram Oracle”, the first hit is a blog post by Nigel Bayliss: How do I get histogram history? The script shared by Nigel has been named h_hist. Here’s below the output of this script when applied to the current t1/n2 case:

SQL> @h_hist t1 c##mhouri
Table : C##MHOURI.T1

Column: N2                                         Last analyzed: 2020-06-08 16:22:59 [Current: FREQUENCY]
-     2020-06-08 15:39:00     0 ->     5 buckets CHANGE
-     2020-06-08 15:43:58     5 ->     0 buckets CHANGE
-     2020-06-08 16:03:19     0 ->     3 buckets CHANGE
-     2020-06-08 16:22:59     3 ->     5 buckets CHANGE

PL/SQL procedure successfully completed.

It coincides perfectly with my script but, for practical reasons, I do prefer using my script 🙂

2. When you might need this script?

Here’s an example to illustrate the utility of the h_hist1 script

SQL> select count(1) from t where n2= 100;

  COUNT(1)
----------
       100

SQL_ID  2k5g2apy78hj8, child number 1
-------------------------------------
Plan hash value: 1882749816
------------------------------------------------------
| Id  | Operation         | Name     | Rows  | Bytes |
------------------------------------------------------
|   0 | SELECT STATEMENT  |          |       |       |
|   1 |  SORT AGGREGATE   |          |     1 |     3 |
|*  2 |   INDEX RANGE SCAN| T_ACS_I1 |   100 |   300 |
------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("N2"=:SYS_B_1)

SQL> select count(1) from t where n2= 1e6;

  COUNT(1)
----------
   1099049

SQL_ID  2k5g2apy78hj8, child number 2
-------------------------------------
Plan hash value: 2966233522
---------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |
---------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |
|   1 |  SORT AGGREGATE    |      |     1 |     3 |
|*  2 |   TABLE ACCESS FULL| T    |  1099K|  3219K|
---------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("N2"=:SYS_B_1)

We have a query switching from a FULL TABLE SCAN to an INDEX RANGE SCAN depending on the number of rows processed by each bind variable value. So far so good until you are asked to trouble shoot a performance issue caused by this query since it is sticking into an INDEX RANGE SCAN execution plan whatever the bind variable value is

SQL> select count(1) from t where n2= 1e6;

  COUNT(1)
----------
   1099049

------------------------------------------------------
| Id  | Operation         | Name     | Rows  | Bytes |
------------------------------------------------------
|   0 | SELECT STATEMENT  |          |       |       |
|   1 |  SORT AGGREGATE   |          |     1 |     3 |
|*  2 |   INDEX RANGE SCAN| T_ACS_I1 |   240K|   703K|
------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("N2"=:SYS_B_1)

This performance pain occurred because someone somewhere has deleted the histogram of column n2 from table t as the following proves:

SQL> @h_hist1

OBJECT_NAME COLUMN_NAME  HISTOGRAM        LAST_ANALYZED        PREV_HISTOGRAM   PREV_LAST_ANALYZED
----------- ------------ ---------------- -------------------- ---------------- --------------------
T           N2           HISTOGRAM        06/08/2020 17:10:44  HISTOGRAM
T           N2           NONE             06/08/2020 17:32:45  HISTOGRAM        06/08/2020 17:10:43 

Thanks to the FREQUENCY histogram on the column n2, the previous cursor was bind sensitive and bind aware allowing, as such, the query to have the best execution plan possible per regards to the bind variable value. But as soon as the n2 column has lost its histogram and the cursor has been flushed from memory, the new cursor was neither bind sensitive nor bind aware explaining why the corresponding query was sticking in the INDEX RANGE SCAN execution plan.

select
   dbms_stats.report_stats_operations(since => sysdate -1) text_line
from dual;

----------------------------------------------------------------------------------------
| Operation Id | Operation              | Target              | Start Time             |
----------------------------------------------------------------------------------------
| 5219         | gather_table_stats     | "C##MHOURI"."T"     | 06/08/20               |
|              |                        |                     | 17:32:45,333000 +02:00 |
----------------------------------------------------------------------------------------

select
   dbms_stats.report_single_stats_operation
      (opid         => 5219
      ,detail_level => 'ALL')
from dual;

---------------------------------------------------------------------------------------------
| Operation | Operation          | Target          | Session | Additional Info              |
| Id        |                    |                 | Id      |                              |
---------------------------------------------------------------------------------------------
| 5219      | gather_table_stats | "C##MHOURI"."T" | 395     | Parameters: [block_sample:   |
|           |                    |                 |         | FALSE] [cascade: NULL]       |
|           |                    |                 |         | [concurrent: FALSE] [degree: |
|           |                    |                 |         | NULL] [estimate_percent:     |
|           |                    |                 |         | DBMS_STATS.AUTO_SAMPLE_SIZE] |
|           |                    |                 |         | [force: FALSE] [granularity: |
|           |                    |                 |         | AUTO] [method_opt: for all   |
|           |                    |                 |         | columns size 1]              |
|           |                    |                 |         | [no_invalidate: NULL]        |
|           |                    |                 |         | [ownname: C##MHOURI]         |
|           |                    |                 |         | [partname: ]                 |
|           |                    |                 |         | [reporting_mode: FALSE]      |
|           |                    |                 |         | [statid: ] [statown: ]       |
|           |                    |                 |         | [stattab: ] [stattype: DATA] |
|           |                    |                 |         | [tabname: t]                 |
---------------------------------------------------------------------------------------------

Spot how the last dbms_stat run (operation id n°5219) did not gather histogram because it used:

[method_opt: for all columns size 1] 

3. Conclusion

Tracking column histogram modifications can be very handy in certain circumstances. The h_hist1 script provided in this blog post could help in such case of situations

August 5, 2020

On the impact of stale statistics when enabling Foreign keys

Filed under: Oracle,Statistics — hourim @ 12:55 pm

If you are going to enable a foreign key integrity constraint, then make sure you have fresh and representative statistics on both child and parent table ends. If this is not so, then you might lose a lot of time enabling your constraint because of a non-easy to spot recursive query.

Here’s below a summary of a real life 11.2.0.4 case:

SQL Text
------------------------------
BEGIN XZA_DDL.ENABLE_FK(owner => 'ABCDXY', nomtab => 'TABLE_ABC_C', nomfk => NULL); END;

Global Information
------------------------------
 Status              :  EXECUTING           
 Instance ID         :  1                   
 Session             :  XYZ_ABC(13:25335)  
 SQL ID              :  46bn1bvfkkpvn       
 SQL Execution ID    :  16777216            
 Execution Started   :  01/13/2020 16:54:37 
 First Refresh Time  :  01/13/2020 16:54:43 
 Last Refresh Time   :  01/14/2020 15:23:35 
 Duration            :  80939s              
 Module/Action       :  SQL Developer/-     
 Service             :  SYS$USERS           
 Program             :  SQL Developer       

Global Stats
===========================================================================================
| Elapsed |   Cpu   |    IO    | Concurrency | PL/SQL  |  Other   | Buffer | Read | Read  |
| Time(s) | Time(s) | Waits(s) |  Waits(s)   | Time(s) | Waits(s) |  Gets  | Reqs | Bytes |
===========================================================================================
|   80937 |   79809 |     3.27 |        1.74 |    0.00 |     1124 |     2G |  215 |  21MB |
===========================================================================================

The very long enabling FK process was still in EXECUTING status after more than 22 hours as shown above. Let’s check what ASH can tell us about this situation:

SQL> select sql_id, count(1)
  2  from gv$active_session_history
  3  where
  4      sample_time between to_date('14012020 12:00:00', 'ddmmyyyy hh24:mi:ss')
  5                  and     to_date('14012020 15:45:00', 'ddmmyyyy hh24:mi:ss')
  6  group by sql_id
  7  order by 2 desc
  8  ;

SQL_ID          COUNT(1)
------------- ----------
                   17801
608srf0vf5f3q      12937 -->  alter table ABCDXY. TABLE_ABC_C
a8gtvr24afy70      10919
1vtd595xys9fp       9135
7q4kq2auz89x1       4203
fdxa2ph5250x1       3058

Manifestly it seems that we must focus our attention on the 608srf0vf5f3q SQL_ID.

From a wait event point of view, ASH shows this:

SQL> select event, count(1)
  2  from gv$active_session_history
  3  where
  4      sample_time between to_date('14012020 12:00:00', 'ddmmyyyy hh24:mi:ss')
  5                  and     to_date('14012020 15:45:00', 'ddmmyyyy hh24:mi:ss')
  6  group by event
  7  order by 2 desc;

EVENT                      COUNT(1)
------------------------ ----------
                              48756
row cache lock                12194
db file parallel write         6867
db file scattered read         6519
db file sequential read        5356
direct path read               1794
…/../

A lot of CPU burning followed by the row cache lock wait event. However, the SQL_ID 608srf0vf5f3q is reported to be only burning CPU as the followings proves:

SQL> select event, count(1)
  2  from gv$active_session_history
  3  where
  4      sample_time between to_date('14012020 12:00:00', 'ddmmyyyy hh24:mi:ss')
  5                  and     to_date('14012020 15:45:00', 'ddmmyyyy hh24:mi:ss')
  6  and sql_id = '608srf0vf5f3q'
  7  group by event
  8  order by 2 desc;

EVENT     COUNT(1)
------- ----------
             12937

Since the enabling process was still running I decided to “snap’’ the SID (13) of the PL/SQL stored package which I got from the corresponding gv$sql_monitor

SQL> @snapper all 5 1 13
Sampling SID 13 with interval 5 seconds, taking 1 snapshots...

-- Session Snapper v4.26 - by Tanel Poder ( http://blog.tanelpoder.com/snapper ) 
- Enjoy the Most Advanced Oracle Troubleshooting Script on the Planet! 

------------------------------------------------------------------------------------------------
SID, USERNAME, TYPE, STATISTIC                            ,         DELTA, HDELTA/SEC,    %TIME,
------------------------------------------------------------------------------------------------
 13, XYZ_ABC, STAT, session logical reads                ,        142350,     27.03k,         ,
 13, XYZ_ABC, STAT, consistent gets                      ,        142351,     27.03k,         ,
 13, XYZ_ABC, STAT, consistent gets from cache           ,        142352,     27.03k,         ,
 13, XYZ_ABC, STAT, consistent gets from cache (fastpath),        142352,     27.03k,         ,
 13, XYZ_ABC, STAT, logical read bytes from cache        ,    1166147584,    221.41M,         , --> this looks interesting
 13, XYZ_ABC, STAT, calls to kcmgcs                      ,            22,       4.18,         ,
 13, XYZ_ABC, STAT, no work - consistent read gets       ,        142334,     27.02k,         ,
 13, XYZ_ABC, STAT, table scans (short tables)           ,             3,        .57,         ,
 13, XYZ_ABC, STAT, table scan rows gotten               ,      23484525,      4.46M,         ,
 13, XYZ_ABC, STAT, table scan blocks gotten             ,        142335,     27.02k,         ,
 13, XYZ_ABC, STAT, buffer is pinned count               ,             3,        .57,         ,
 13, XYZ_ABC, TIME, DB CPU                               ,       6003174,      1.14s,   114.0%,
 13, XYZ_ABC, TIME, sql execute elapsed time             ,       6003624,      1.14s,   114.0%,
 13, XYZ_ABC, TIME, DB time                              ,       6003624,      1.14s,   114.0%,

--  End of Stats snap 1, end=2020-01-14 15:30:20, seconds=5.3

----------------------------------------------------------------------------------
  ActSes   %Thread | INST | SQL_ID          | SQL_CHILD | EVENT  | WAIT_CLASS
----------------------------------------------------------------------------------
    1.00    (100%) |    1 | 608srf0vf5f3q   | 0         | ON CPU | ON CPU

--  End of ASH snap 1, end=2020-01-14 15:30:20, seconds=5, samples_taken=48, AAS=1

Nonchalantly, all the investigations are showing that the culprit query is 608srf0vf5f3q which is burning a lot of CPU. The 221 M of logical read bytes from cache could very well correlate with high CPU consumption. But we still don’t know why and who is reading those hundreds of millions of logical reads from cache?

In addition, the ash_wait_chain script applied to this SQL_ID 608srf0vf5f3q shows that this query is not blocked by any other process:

SQL> @ash_wait_chains event2 sql_id='608srf0vf5f3q' "timestamp'2020-01-14 12:00:00'" "timestamp'2020-01-14 15:45:00'"

-- Display ASH Wait Chain Signatures script v0.4 BETA by Tanel Poder ( http://blog.tanelpoder.com )

%This     SECONDS     AAS WAIT_CHAIN    LAST_SEEN
------ ---------- ------- ------------- --------------------
 100%       12937     1.0 -> ON CPU     2020-01-14 15:44:59

That’s weird. Isn’t it?

After a couple of minutes going round and round in circles, I decided to use the brute force by:

  • Killing the process
  •  Activating the 10046 trace
  •  Running the process for a couple of minutes
  •  Stopping again the process
  •  Analyzing the trace file
alter session set tracefile_identifier = EnablingForeignKey;
@46on 12
exec XZA_DDL.ENABLE_FK(owner => 'ABCDXY', nomtab => 'TABLE_ABC_C', nomfk => NULL);
@46off

And here’s below what I found in the TKPROFED generated trace file

SQL ID: 56tj8jdcw15jb Plan Hash: 2257352843

select /*+ all_rows ordered dynamic_sampling(2) */ A.rowid, :1, :2, :3 
from
 "ABCDXY"."TABLE_ABC_C" A , "ABCDXY"."TABLE_PARENT_ABC" B where( 
  "A"."COL_1" is not null and "A"."COL_2" is not null and "A"."COL_3" 
  is not null and "A"."COL_4" is not null) and( "B"."COL_1" (+)= 
  "A"."COL_1" and "B"."COL_2" (+)= "A"."COL_2" and "B"."COL_3" (+)=
   "A"."COL_3" and "B"."COL_4" (+)= "A"."COL_4") and( "B"."COL_1" is 
  null or "B"."COL_2" is null or "B"."COL_3" is null or "B"."COL_4" is 
  null)


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        1    217.83     224.03          0    6179071          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3    217.83     224.03          0    6179071          0           0

This recursive query was enough for me to understand and fix this issue. I’ve then started by checking the historical execution of this recursive query as shown below:

SQL> @SqlHistStat
Enter value for sql_id: 56tj8jdcw15jb
Enter value for from_date: 01012020

SNAP_BEGIN                PLAN_HASH_VALUE    EXECS END_OF_FETCH  AVG_ETIME     AVG_LIO   AVG_ROWS
------------------------- --------------- -------- ------------ ----------  ---------- ----------
13-JAN-20 04.00.44.563 PM      2257352843        1            0        381    10934401          0
13-JAN-20 05.00.56.481 PM      2257352843        0            0       3551   104544327          0
13-JAN-20 06.00.05.981 PM      2257352843        0            0       3609   104844651          0
13-JAN-20 07.00.15.180 PM      2257352843        0            0       3619    84153912          0
13-JAN-20 08.00.29.468 PM      2257352843        0            0       3610   104607796          0
13-JAN-20 09.00.38.801 PM      2257352843        0            0       3619    93027006          0
13-JAN-20 10.00.49.488 PM      2257352843        0            0       3595   102089647          0
13-JAN-20 11.00.59.446 PM      2257352843        0            0       3545    99557582          0
14-JAN-20 12.00.09.039 AM      2257352843        0            0       3609   104246681          0
14-JAN-20 01.00.18.823 AM      2257352843        0            0       3608   106861170          0
14-JAN-20 02.00.28.120 AM      2257352843        0            0       3611   103242627          0
14-JAN-20 03.00.37.711 AM      2257352843        0            0       3610   106157348          0
14-JAN-20 04.00.47.161 AM      2257352843        0            0       3608   103770559          0
14-JAN-20 05.00.56.531 AM      2257352843        0            0       3547   103304809          0
14-JAN-20 06.00.05.487 AM      2257352843        0            0       3609   105243323          0
14-JAN-20 07.00.14.675 AM      2257352843        0            0       3608   107326829          0
14-JAN-20 08.00.23.442 AM      2257352843        0            0       3608   108137854          0
14-JAN-20 09.00.32.133 AM      2257352843        0            0       3609   107183542          0
14-JAN-20 10.00.41.111 AM      2257352843        0            0       3610   104855775          0
14-JAN-20 11.00.50.825 AM      2257352843        0            0       3554   100223228          0
14-JAN-20 12.00.00.751 PM      2257352843        0            0       3610    99540252          0
14-JAN-20 01.00.11.282 PM      2257352843        0            0       3614   101153086          0
14-JAN-20 02.00.21.974 PM      2257352843        0            0       3610    98943412          0
14-JAN-20 03.00.32.525 PM      2257352843        1            0        436    12358214          0

24 rows selected.

As you can see, through the END_OF_FETCH column, this query has spanned 24 one-hour snapshots without completing. It uses the following execution plan:

--------------------------------------------------------------------------------
| Id  | Operation                  | Name              | Rows  | Pstart| Pstop |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |                   |       |       |       |
|*  1 |  FILTER                    |                   |       |       |       |
|   2 |   NESTED LOOPS OUTER       |                   |     1 |       |       |
|   3 |    PARTITION RANGE ALL     |                   |     1 |     1 |     2 |
|   4 |     TABLE ACCESS FULL      | TABLE_ABC_C       |     1 |     1 |     2 |
|   5 |    PARTITION RANGE ITERATOR|                   |     1 |   KEY |   KEY |
|*  6 |     TABLE ACCESS FULL      | TABLE_PARENT_ABC  |     1 |   KEY |   KEY |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(("B"."COL_1" IS NULL OR "B"."COL_2" IS NULL OR "B"."COL_3" IS NULL OR
              "B"."COL_4" IS NULL))
   6 - filter(("B"."COL_1"="A"."COL_1" AND "B"."COL_2"="A"."COL_2" AND
               "B"."COL_3"="A"."COL_3" AND "B"."COL_4"="A"."COL_4"))

I don’t know why whenever I have to fix a very long non-ending process, I almost always end up by finding that a NESTED LOOPS is for something in this pain 😊

It is clear now that the TABLE_ABC_C child table and its parent TABLE_PARENT_ABC need to have fresh statistics before relaunching the FK enabling process. I always keep saying that, when an estimated cardinality of 1 is reported in the execution plan, then there is a big change that this is due to a non-fresh or non-representative statistic.

SQL> exec dbms_stats.gather_table_stats ('ABCDXY', 'TABLE_ABC_C', cascade => true,         no_invalidate => false);
SQL> exec dbms_stats.gather_table_stats ('ABCDXY', 'TABLE_PARENT_ABC', cascade => true, no_invalidate => false);

SQL> @HistStat
Enter value for owner: ABCDXY
Enter value for table_name: TABLE_ABC_C

OBJECT_NAME    OBJECT_TYPE  PREV_STAT_TIME           ROWCNT SAMPLESIZE ESTIMATE_PCT
-------------- ------------ -------------------- ---------- ---------- ------------
TABLE_ABC_C  TABLE       
TABLE_ABC_C  TABLE        09/01/2020 16:59:25           0          0            0
TABLE_ABC_C  TABLE        14/01/2020 16:34:17      946798     946798          100
TABLE_ABC_C  TABLE        14/01/2020 17:09:02      946798     946798          100

SQL> @HistStat
Enter value for owner: ABCDXY
Enter value for table_name: TABLE_PARENT_ABC

OBJECT_NAME         OBJECT_TYPE  PREV_STAT_TIME        ROWCNT SAMPLESIZE ESTIMATE_PCT
------------------- ------------ -------------------- ------- ---------- ------------
TABLE_PARENT_ABC   TABLE       
TABLE_PARENT_ABC   TABLE        09/01/2020 16:59:25        0          0            0
TABLE_PARENT_ABC   TABLE        14/01/2020 16:34:43  9032840    9032840          100
TABLE_PARENT_ABC   TABLE        14/01/2020 17:08:52  9032840    9032840          100

And finally, the coast is clear to re-enable the foreign key in about 10 seconds as shown below:

SQL> exec DBA_DDL.ENABLE_FK(owner => 'ABCDXY', nomtab => 'TABLE_ABC_C', nomfk => NULL);

PL/SQL procedure successfully completed.

Elapsed: 00:00:10.29 --> 10 seconds!

This has been possible because the new execution plan switched from that dramatic NL to a more suitable HASH JOIN as shown below:

SQL_ID  56tj8jdcw15jb, child number 0
-------------------------------------
Plan hash value: 1651427782
------------------------------------------------------------------------------------------
| Id  | Operation            | Name              | Rows  | Bytes |TempSpc| Pstart| Pstop |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |                   |       |       |       |       |       |
|   1 |  PARTITION RANGE ALL |                   |     1 |    90 |       |     1 |     3 |
|*  2 |   FILTER             |                   |       |       |       |       |       |
|*  3 |    HASH JOIN OUTER   |                   |     1 |    90 |    18M|       |       |
|   4 |     TABLE ACCESS FULL| TABLE_ABC_C       |   946K|    46M|       |     1 |     3 |
|   5 |     TABLE ACCESS FULL| TABLE_PARENT_ABC  |  9032K|   335M|       |     1 |     3 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("B"."COL_1" IS NULL OR "B"."COL_2" IS NULL OR "B"."COL_3" IS NULL 
              OR "B"."COL_4" IS NULL))
   3 - access("B"."COL_4"="A"."COL_4" AND "B"."COL_1"="A"."COL_1" AND
              "B"."COL_2"="A"."COL_2" AND "B"."COL_3"="A"."COL_3")

SQL> @sqlstats
Enter value for sql_id: 56tj8jdcw15jb

     CHILD PLAN_HASH_VALUE   AVG_GETS   AVG_PIOS  AVG_ETIME     EXECS
---------- --------------- ---------- ---------- ---------- ---------
         0      1651427782     133519     109286  27.018412         1

Bottom Line

Regardless of the kind of Oracle SQL operations you are executing you should always make sure you have given, beforehand, Oracle enough statistics information about your table/index/column. If this is not so, then be sure that you will certainly end up by stopping to be lucky and by facing a performance trouble. And, in the case of enabling Foreign key constraint, this performance pain will be exacerbated by a recursive query involving the parent-child tables. If this recursive query has a wrong execution plan because of lack of statistics, then it will drastically delay the FK enabling execution time and it will be very hard to point out that the root cause of the performance pain is due to that recursive query.

July 9, 2020

Oracle LuxOUG – Virtual Tech Days

Filed under: Oracle — hourim @ 2:36 pm

My friend Rodrigo Mufalani asked me to speak about the Adaptive and Extensive Cursor Sharing during the LuxOUG – Virtual Tech Days in English, and for a Luxembourg local team in French. Here’re below the English presentation and its corresponding slide

I will update this blog post when the French presentation will be done

May 24, 2020

Why my hint has not been used?

Filed under: Hint,Oracle — hourim @ 12:20 pm

While waiting for Tanel Poder hacking session about sql hint invalidity which I encourage everyone to subscribe and to follow, here’re few particular cases where hints are not obeyed by the Oracle Cost Based Optimizer. It happens when the hint asks the CBO to do something which is either impossible or not yet implemented in its algorithm. When two hints are contradictory one of them will be ignored by the CBO to the detriment of the other one. We will see that the victim hint seems not to be cost based dependent but rather rule based. Let’s see this via a reproducible example

JPPD and HASH JOIN

To allow data filtering as early as possible, Oracle can use a JPPD transformation in order to apply a predicate from the outer query block (or main query block) into the contained non-mergeable query block. As such, a precise index range scan can be used and data are filtered much earlier.

Here’s a simple model with which I will illustrate this transformation:

SQL> select banner from v$version;

BANNER
----------------------------------------------------------------------
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 – Production

create table t1 as 
  select
      rownum n1
	 ,mod(rownum,5) n2
	 ,trunc( (rownum - 1 / 3) ) n3
  from
     dual
connect by level <= 1e2;

create table t2 as
  select
     rownum n1
    ,mod(rownum,3) n2
	,trunc( (rownum - 1 / 5) ) n3
from
    dual
 connect by level <= 1e3;

create table t3 as 
   select 
     rownum n1
	,lpad('x',4000,'x') vc
from dual
connect by level <= 100;

alter table t2 add constraint t2_pk primary key (n1);
alter table t3 add constraint t3_pk primary key (n1);

begin
  dbms_stats.gather_table_stats(user, 't1' );
  dbms_stats.gather_table_stats(user, 't2' );
  dbms_stats.gather_table_stats(user, 't3' );
end;
/

Let’s now execute the following query and get its corresponding execution plan

select 
   t1.n1, t1.n3
from 
   t1
  left join
   (select t2.n1 from t2, t3
    where t2.n1 = t3.n1
	) vw_1
on t1.n1 = vw_1.n1; 

Plan hash value: 3070139659
--------------------------------------------------------------
| Id  | Operation               | Name  | Rows  | Cost (%CPU)|
--------------------------------------------------------------
|   0 | SELECT STATEMENT        |       |       |     3 (100)|
|   1 |  NESTED LOOPS OUTER     |       |   100 |     3   (0)|
|   2 |   TABLE ACCESS FULL     | T1    |   100 |     3   (0)|
|   3 |   VIEW PUSHED PREDICATE |       |     1 |     0   (0)|  --> JPPD used
|   4 |    NESTED LOOPS         |       |     1 |     0   (0)|
|*  5 |     INDEX UNIQUE SCAN   | T3_PK |     1 |     0   (0)|
|*  6 |     INDEX UNIQUE SCAN   | T2_PK |     1 |     0   (0)|
--------------------------------------------------------------
Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('19.1.0')
      DB_VERSION('19.1.0')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$639F1A6F")
      PUSH_PRED(@"SEL$776AA54E" "VW_1"@"SEL$1" 1)
      OUTLINE_LEAF(@"SEL$776AA54E")
      MERGE(@"SEL$8812AA4E" &gt;"SEL$E8571221")
      OUTLINE(@"SEL$2")
      OUTLINE(@"SEL$E8571221")
      ANSI_REARCH(@"SEL$3")
      OUTLINE(@"SEL$8812AA4E")
      ANSI_REARCH(@"SEL$1")
      OUTLINE(@"SEL$3")
      OUTLINE(@"SEL$1")
      FULL(@"SEL$776AA54E" "T1"@"SEL$1")
      NO_ACCESS(@"SEL$776AA54E" "VW_1"@"SEL$1")
      LEADING(@"SEL$776AA54E" "T1"@"SEL$1" "VW_1"@"SEL$1")
      USE_NL(@"SEL$776AA54E" "VW_1"@"SEL$1")
      INDEX(@"SEL$639F1A6F" "T3"@"SEL$2" ("T3"."N1"))
      INDEX(@"SEL$639F1A6F" "T2"@"SEL$2" ("T2"."N1"))
      LEADING(@"SEL$639F1A6F" "T3"@"SEL$2" "T2"@"SEL$2")
      USE_NL(@"SEL$639F1A6F" "T2"@"SEL$2")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access("T3"."N1"="T1"."N1")
   6 - access("T2"."N1"="T1"."N1")

Notice at operation Id n°5 that the predicate

on t1.n1 = vw_1.n1; 

Has been pushed inside the view vw_1 to become

("T3"."N1"="T1"."N1")

This is simply what a JPPD transformation is.

Let’s now take the NESTED LOOPS hint from the above execution plan outline

USE_NL(@"SEL$776AA54E" "VW_1"@"SEL$1")

change it into a HASH JOIN

USE_HASH(@"SEL$776AA54E" "VW_1"@"SEL$1")

and use it in the above query as shown below:

select /*+ 
         USE_HASH(@"SEL$776AA54E" "VW_1"@"SEL$1")
	  */
   t1.n1, t1.n3
from 
   t1
  left join
   (select t2.n1 from t2, t3
    where t2.n1 = t3.n1
	) vw_1
on t1.n1 = vw_1.n1; 

----------------------------------------------------------
| Id  | Operation            | Name  | Rows  |Cost (%CPU)|
----------------------------------------------------------
|   0 | SELECT STATEMENT     |       |       |    4 (100)|
|*  1 |  HASH JOIN OUTER     |       |   100 |    4   (0)| --> HASH JOIN used
|   2 |   TABLE ACCESS FULL  | T1    |   100 |    3   (0)|
|   3 |   VIEW               |       |   100 |    1   (0)|
|   4 |    NESTED LOOPS      |       |   100 |    1   (0)|
|   5 |     INDEX FULL SCAN  | T3_PK |   100 |    1   (0)|
|*  6 |     INDEX UNIQUE SCAN| T2_PK |     1 |    0   (0)|
----------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."N1"="VW_1"."N1")
   6 - access("T2"."N1"="T3"."N1")

Notice that, while the HASH JOIN hint has been obeyed, the JPPD transformation has, however, not been used.

Let’s then force the JPPD transformation via the PUSH_PRED hint and see what happens

select /*+ 
         USE_HASH(@"SEL$776AA54E" "VW_1"@"SEL$1")
	 PUSH_PRED(@"SEL$776AA54E" "VW_1"@"SEL$1" 1)
	*/
   t1.n1, t1.n3
from 
   t1
  left join
   (select t2.n1 from t2, t3
    where t2.n1 = t3.n1
	) vw_1
on t1.n1 = vw_1.n1; 

-------------------------------------------------------------
| Id  | Operation               | Name  | Rows  |Cost (%CPU)|
-------------------------------------------------------------
|   0 | SELECT STATEMENT        |       |       |    3 (100)|
|   1 |  NESTED LOOPS OUTER     |       |   100 |    3   (0)| --> NESTED LOOPS vs HASH JOIN
|   2 |   TABLE ACCESS FULL     | T1    |   100 |    3   (0)|
|   3 |   VIEW PUSHED PREDICATE |       |     1 |    0   (0)| --> JPPD used
|   4 |    NESTED LOOPS         |       |     1 |    0   (0)|
|*  5 |     INDEX UNIQUE SCAN   | T3_PK |     1 |    0   (0)|
|*  6 |     INDEX UNIQUE SCAN   | T2_PK |     1 |    0   (0)|
-------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access("T3"."N1"="T1"."N1")
   6 - access("T2"."N1"="T1"."N1")

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 1 (U - Unused (1))
---------------------------------------------------------------------------
   1 -  SEL$776AA54E / VW_1@SEL$1
         U -  USE_HASH(@"SEL$776AA54E" "VW_1"@"SEL$1")   

Bizarrely, the HASH JOIN hint has, this time, not been used. This non-utilization of the HASH JOIN hint is also confirmed by the 19c Hint Report:

Total hints for statement: 1 (U - Unused (1)

Why?

Oracle decided to ignore the HASH JOIN hint in profit of the PUSH_PRED hint simply because JPPD transformation is not possible with HASH JOIN. When Oracle decides to follow the JPPD directive it knows that this will be possible only under a NESTED LOOPS join operation. Consequently it has to ignore the HASH JOIN hint.

The 10053 trace file is not very verbose in this case. Here’s below what we can link to the non-obeyed HASH JOIN hint from this trace file:

*********************************
Number of join permutations tried: 1
*********************************
Enumerating distribution method (advanced)
--- Distribution method for 
join between T1[T1](serial) and VW_1[from$_subquery$_002](serial); jm = 1; right side access path = TableScan
---- serial Hash-Join ---> NONE     

In the Adaptive plan section of the same trace file we can read that the HASH JOIN is not legal

AP: adaptive joins bypassed for table VW_1 @ SEL$776AA54E due to Hash join is not legal

Probably through this illegality Oracle is indicating that it is impossible to obey the HASH JOIN in presence of a PUSH PREDICATE hint.

It looks like, whatever the cost of the JPPD is, it will be always prefered to the HASH JOIN as the JPPD has certainly the advantage of filtering data much earlier in the query execution path.

JPPD and ROWNUM

I think that I have already explained why one should always consider using the rownum pseudo-column with careful attention. Here’re two rownum usage side effects that I am aware of:

  • Where rownum =1 might automatically switch to the buggy FIRST_ROWS mode
  •  Using a rownum stops view merging and JPPD transformation

Let’s examine the second case via the above query

select /*+ 
	  PUSH_PRED(@"SEL$776AA54E" "VW_1"@"SEL$1" 1)
	*/
   t1.n1, t1.n3
from 
   t1
  left join
   (select t2.n1 from t2, t3
    where t2.n1 = t3.n1
	) vw_1
on t1.n1 = vw_1.n1; 

--------------------------------------------------------------
| Id  | Operation               | Name  | Rows  | Cost (%CPU)|
--------------------------------------------------------------
|   0 | SELECT STATEMENT        |       |       |    11 (100)|
|   1 |  NESTED LOOPS OUTER     |       | 10000 |    11  (19)|
|   2 |   TABLE ACCESS FULL     | T1    | 10000 |     9   (0)|
|   3 |   VIEW PUSHED PREDICATE |       |     1 |     0   (0)| --> JPPD used
|   4 |    NESTED LOOPS         |       |     1 |     0   (0)|
|*  5 |     INDEX UNIQUE SCAN   | T3_PK |     1 |     0   (0)|
|*  6 |     INDEX UNIQUE SCAN   | T2_PK |     1 |     0   (0)|
--------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access("T3"."N1"="T1"."N1")
   6 - access("T2"."N1"="T1"."N1")
select /*+ 
	  PUSH_PRED(@"SEL$776AA54E" "VW_1"@"SEL$1" 1)
	*/
   t1.n1, t1.n3
from 
   t1
  left join
   (select t2.n1 from t2, t3
    where t2.n1 = t3.n1
    and rownum >1    --------------> spot this here
	) vw_1
on t1.n1 = vw_1.n1; 

-------------------------------------------------------------
| Id  | Operation              | Name  | Rows  | Cost (%CPU)|
-------------------------------------------------------------
|   0 | SELECT STATEMENT       |       |       |    10 (100)|
|*  1 |  HASH JOIN RIGHT OUTER |       | 10000 |    10   (0)|
|   2 |   VIEW                 |       |   100 |     1   (0)|
|   3 |    COUNT               |       |       |            |
|*  4 |     FILTER             |       |       |            |
|   5 |      NESTED LOOPS      |       |   100 |     1   (0)|
|   6 |       INDEX FULL SCAN  | T3_PK |   100 |     1   (0)|
|*  7 |       INDEX UNIQUE SCAN| T2_PK |     1 |     0   (0)|
|   8 |   TABLE ACCESS FULL    | T1    | 10000 |     9   (0)|
-------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."N1"="VW_1"."N1")
   4 - filter(ROWNUM>1)
   7 - access("T2"."N1"="T3"."N1")

Hint Report (identified by operation id / Query Block Name / Object Alias):
Total hints for statement: 1 (U - Unused (1))
---------------------------------------------------------------------------
   1 -  SEL$776AA54E / VW_1@SEL$1
         U -  PUSH_PRED(@"SEL$776AA54E" "VW_1"@"SEL$1" 1)

Notice how the use of the ROWNUM pseudo-column in the contained query block impeached the JPPD hint to be obeyed. Hopefully in this case the corresponding 10053 trace file shows clearly why the JPPD transformation (hint) has been ignored

---------------------
QUERY BLOCK SIGNATURE
---------------------
  signature (): qb_name=SEL$E8571221 nbfros=1 flg=0
    fro(0): flg=1 objn=0 hint_alias="from$_subquery$_005"@"SEL$3"

JPPD:     JPPD bypassed: View has unsupported constructs.
Check Basic Validity for Non-Union View for query block SEL$2 (#0)
JPPD:     JPPD bypassed: View contains a rownum reference.

JPPD:     JPPD bypassed: View contains a rownum reference. ------------> here
FPD: Considering simple filter push in query block SEL$2 (#0)
"T2"."N1"="T3"."N1" AND ROWNUM >1

Summary

There are many other situations like the ones exposed above out of which the SUBQUERY_PRUNING hint which forces a PARTITION RANGE SUBQUERY but only under a HASH JOIN operation or under a NESTED LOOPS but only when the partitioned table is on the right hand side of the join (inner row source). The combination of SUBQUERY_PRUNING and USE_NL hint would be impossible to be obeyed at the same time since this will force a recursive SQL ( SELECT distinct TBL$OR$IDX$PART$NUM) that would have been executed as many times as there are rows in the outer row source.

I may continue blogging about similar non-obeyed hints in a separate blog article to keep the current one about 5 minutes of read.

May 16, 2020

SQL*Net break/reset to client

Filed under: Oracle — hourim @ 6:10 pm

This is a draft I wrote in 2015. I don’t know why I have not published it by that time? probably I was looking for something to add in this context. Anyway, I decided to give it a rebirth right now  after almost 5 years of dormant time.

A SQL*Net break/reset to client wait event was popping out at a dramatic rate so that the client application ASH data was abnormally flushed out from memory. It was then vital to find what is causing this wait event to show up at this high rate. This article aims to explain the steps I have followed trying to figure out the sql_id at the origin of this wait event.

The first article, dealing with SQL*Net break/reset to client error, that pops up when you Google for it, is this one by Tanel Poder.  It explains clearly what this event represents and gives a reproducible example.  In the real life case I’ve faced  ASH was showing this:

SQL> select event, count(1)
    from gv$active_session_history
    where sample_time between to_date('10072015 11:00:00', 'ddmmyyyy hh24:mi:ss')
                      and     to_date('10072015 12:00:00', 'ddmmyyyy hh24:mi:ss')
    group by event
    order by 2 desc;

EVENT                                                              COUNT(1)
---------------------------------------------------------------- ----------
SQL*Net break/reset to client                                        164205
                                                                      10438
db file sequential read                                                2960
SQL*Net message from dblink                                            2499
direct path read                                                       1322
log file sync                                                           427
log file parallel write                                                 411
SQL*Net more data from client                                           183
control file sequential read                                             98
Backup: MML create a backup piece                                        82
…/…

Let’s try to find, naively, the sql_id behind this dramatic 164,205 SQL*Net break/reset to client recorded waits

SQL>  select sql_id, count(1)
      from gv$active_session_history
      where sample_time between to_date('10072015 11:00:00', 'ddmmyyyy hh24:mi:ss')
                        and     to_date('10072015 12:00:00', 'ddmmyyyy hh24:mi:ss')
      and event = 'SQL*Net break/reset to client'
      group by sql_id
      order by 2 desc;

SQL_ID          COUNT(1)
------------- ----------
                  164205 

Damn!! It is a NULL sql_id

What about the sessions experimenting this wait event

SQL> select session_id, count(1)
     from gv$active_session_history
     where sample_time between to_date('10072015 11:00:00', 'ddmmyyyy hh24:mi:ss')
                       and     to_date('10072015 12:00:00', 'ddmmyyyy hh24:mi:ss')
     and event = 'SQL*Net break/reset to client'
     group by session_id
     order by 2 desc;

SESSION_ID   COUNT(1)
---------- ----------
       151       3575
      2846       3575
       465       3575
      1791       3575
      2465       3575
       853       3575
      1216       3575
      1270       3575
      1397       3575
      2912       3575
        90       3575
       412       3575
      1155       3575
      1838       3575
      1957       3575
      2777       3575
      1038       3575 
      1290       3575
      1455       3575
      2648       3575
      2860       3575
      2988       3575
       344       3575
       533       3575
       337       3575
      2424       3575
      1028       3575
      1779       3575
      2047       3575
      2412       3575
       136       3575
       591       3575
       773       3575
       900       3575
       913       3575
      1335       3575
      1657       3575
      2101       3575
      2363       3575
       202       3575
      2226       3575
        16       3575
        79       3575
      1916       3575
      2362       3575
      1705       1429
       908       1181 

More than 45 sessions experimenting this wait event. Let’s then try to get the sql_id of one of those sessions:

SQL> select sql_id, count(1)
    from gv$active_session_history
    where sample_time between to_date('10072015 11:00:00', 'ddmmyyyy hh24:mi:ss')
                      and     to_date('10072015 12:00:00', 'ddmmyyyy hh24:mi:ss')
    and session_id = 151
    group by sql_id
    order by 2 desc;

SQL_ID          COUNT(1)
------------- ----------
                    3515 

Damn!! It is a NULL sql_id again.

Let’s now get the list of wait events the above session (151) was experimenting


SQL> select event, count(1)
     from gv$active_session_history
     where sample_time between to_date('10072015 11:00:00', 'ddmmyyyy hh24:mi:ss')
                       and     to_date('10072015 12:00:00', 'ddmmyyyy hh24:mi:ss')
     and session_id = 151
     group by event
     order by 2 desc;

EVENT                            COUNT(1)
-------------------------------- -------
SQL*Net break/reset to client    3453

I am going around a vicious circle! Isn’t it?

What if I try, now, to join those particular sessions with v$process view to see, at least, what their process are accomplishing

SQL> with got_my_sid
    as(
       select
            session_id
          , count(1)
       from
           v$active_session_history
       where sample_time between to_date('10072015 11:00:00', 'ddmmyyyy hh24:mi:ss')
                         and     to_date('10072015 12:00:00', 'ddmmyyyy hh24:mi:ss')
      and event = 'SQL*Net break/reset to client'
      group by session_id
      )
   select
          p.username
         ,p.tracefile
         ,p.pga_used_mem
         ,p.pga_alloc_mem
    from
         v$process p
        ,v$session s
   where
         p.addr=s.paddr
    and exists (select null
               from got_my_sid a
               where s.sid = a.session_id);

USERNAME     TRACEFILE                                                                        PGA_USED_MEM PGA_ALLOC_MEM
------------ -------------------------------------------------------------------------------- ------------ -------------
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_116986.trc               44210323      46756230
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_58127.trc                22468107      26308998
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_58151.trc                51969971      56979846
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_69733.trc                42309691      45969798
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_69747.trc               103456803     115241350
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_69759.trc                 5183995       8810886
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_58169.trc               122509563     126316934
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_94674.trc                 4810835      10711430
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_77524.trc                 4867723      10711430
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_77727.trc                48775851      57307526
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_77837.trc                 5697963       6517126
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_77962.trc                 4926947       7500166
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_76614.trc               142568163     170160518
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_76622.trc                10887843      12743046
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_76634.trc                13751019      16675206
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_76662.trc                17715523      18772358
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_76670.trc                 4852883       7631238
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_76676.trc                 5040315       6517126
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_76734.trc                20039043      22114694
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_76741.trc               110836667     127889798
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_76749.trc                29546107      33255814
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_76792.trc                 5336891       6975878
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_11735.trc               141510371     167014790
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_11741.trc                50941459      54882694
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_11759.trc                13313739      15102342
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_11772.trc                45210595      56390022
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_11784.trc                16259099      18641286
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_58289.trc                86902963     103838086
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_76815.trc                10668723      13005190
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_11808.trc                13386251      15036806
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_11844.trc                64225827      68252038
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_11856.trc                50413091      54620550
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_117006.trc               38196715      46952838
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_117012.trc               27739851      30241158
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_11862.trc               114249963     120418694
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_11868.trc                90697083      95973766
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_117042.trc               78140859      79982982
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_11892.trc               188237195     226324870
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_117078.trc              141404363     171209094
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_117092.trc              142334875     166883718
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_117098.trc               44335187      46887302
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_117112.trc               98170435     117469574
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_11898.trc                 5171715       7696774
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_11904.trc                 4970579       5534086
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_95446.trc               142625387     164524422
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_58609.trc                 5278147       7041414
oracle       /oracle/base/diag/rdbms/pwrwdp1/CLSWPROD1/trace/CLSWPROD1_ora_58689.trc                 5316451       6910342 

Finally I have something to get my teeth into.

It looks like sessions that were suffering from the SQL*Net break/reset to client wait event have generated several trace files. And, hopefully, one of those trace files shows the following error:

ORA-04030: out of process memory when trying to allocate 824504 bytes (pga heap,kco buffer)
ORA-04030: out of process memory when trying to allocate 3032 bytes (kkoutlCreatePh,kkotrb : kkoabr) 

The same trace file shows, as well, the SQL Statement which has originated this very high number of  SQL*Net break/reset to client wait event

dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)
----- Current SQL Statement for this session (sql_id=d6jgtxn25741y) -----
select * from ( select alias.cols as skey87_0_, alias.skey as skey81_1_ 

This is how I drilled down to the root cause of this wait event.

 

 

May 10, 2020

Why my execution plan has not been shared: Part 6

Filed under: Oracle — hourim @ 9:57 am

In this sixth installment of the series of articles about the non-sharing reasons that prevent Oracle from using an existing execution plan, I am going to dig into a new supplementary reason which is Session Specific Cursor Session Mismatch. For the sake of completeness here’s below the list of already investigated non-sharing reasons:

I didn’t find any definition from the official Oracle documentation for the Session Specific Cursor non-sharing reason. So let’s start immediately by building a test case with which we will provoke the apparition of this non-sharing reason

SQL> select banner from v$version;

BANNER
----------------------------------------------------------------------
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production

SQL> select distinct sid from v$mystat;

       SID
----------
        16
		
SQL> create global temporary table t1 (owner varchar2(128), object_type varchar2(30)) on commit preserve rows;

Table created.

SQL> insert into t1 select owner, object_type from all_objects where owner ! = 'SYS';

19296 rows created.

SQL> commit;

SQL> col prefs format a20
SQL> select dbms_stats.get_prefs('GLOBAL_TEMP_TABLE_STATS') prefs from dual;

PREFS
--------------------
SESSION

SQL> exec dbms_stats.gather_table_stats(user, 't1');

Using session id n°16, I have created a simple global temporary table, inserted into it few rows and ended up by gathering statistics on this table using the default private SESSION preference. Simply put, statistics gathered under this preference are not propagated to all other sessions. This is why it is also known as PRIVATE statistic. The current session global temporary table (GTT) statistics are not shared with other sessions using the same GTT.

This is clearly shown via the following select


SQL> SELECT
        table_name,
        blocks,
        num_rows,
        scope
    FROM
        user_tab_statistics
    WHERE
        table_name = 'T1';

TABLE_NAME       BLOCKS   NUM_ROWS SCOPE
------------ ---------- ---------- -------
T1                                 SHARED
T1                   55      19296 SESSION

As you can see, the only available statistics are those that are local to the current session.

Let’s now execute a simple query and get its execution plan

SQL>select owner, count(1)
    from t1
    where object_type = 'TABLE'
    group by owner
    order by 2 desc;

OWNER                            COUNT(1)
------------------------------ ----------
MDSYS                                 151
SYSTEM                                131
ORDDATA                                90
XDB                                    56
CTXSYS                                 53
DVSYS                                  44
GSMADMIN_INTERNAL                      42
WMSYS                                  40
LBACSYS                                22
DBSNMP                                 20
OJVMSYS                                 6
C##MHOURI                               5
APPQOSSYS                               5
ORDSYS                                  4
DBSFWUSER                               3
OUTLN                                   3
OLAPSYS                                 2
AUDSYS                                  1

18 rows selected.

SQL> select * from table(dbms_xplan.display_cursor);

SQL_ID  c1j862cvqgh99, child number 0
-------------------------------------
select owner, count(1) from t1 where object_type = 'TABLE' group by
owner order by 2 desc

Plan hash value: 2808104874
----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |       |       |    19 (100)|          |
|   1 |  SORT ORDER BY      |      |    24 |   360 |    19  (11)| 00:00:01 |
|   2 |   HASH GROUP BY     |      |    24 |   360 |    19  (11)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL| T1   |   772 | 11580 |    17   (0)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("OBJECT_TYPE"='TABLE')

Note
-----
   - Global temporary table session private statistics used
   

The above Note at the bottom of the execution plan is clearly indicating that Oracle has used the session private statistics of the GTT.

To end up the setup I am going to run the same query but from a different session as shown below:


SQL> select distinct sid from v$mystat;

       SID
----------
       394
 
SQL> select count(1) from t1;

  COUNT(1)
----------
         0
		 
SQL> insert into t1 select owner, object_type from all_objects where owner  = 'CTXSYS';
	   		
SQL>select owner, count(1)
    from t1
    where object_type = 'TABLE'
    group by owner
    order by 2 desc;

OWNER            COUNT(1)
-------------- ----------
CTXSYS                 53

SQL> select * from table(dbms_xplan.display_cursor);

SQL_ID  c1j862cvqgh99, child number 1  --> new child cursor
-------------------------------------
select owner, count(1) from t1 where object_type = 'TABLE' group by
owner order by 2 desc

Plan hash value: 2808104874
----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |       |       |     4 (100)|          |
|   1 |  SORT ORDER BY      |      |    53 |  4399 |     4  (50)| 00:00:01 |
|   2 |   HASH GROUP BY     |      |    53 |  4399 |     4  (50)| 00:00:01 |
|*  3 |    TABLE ACCESS FULL| T1   |    53 |  4399 |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("OBJECT_TYPE"='TABLE')

Note
-----
   - dynamic statistics used: dynamic sampling (level=2)

As you can see the same query has refused to share the previous execution plan (child cursor 0 from session 16) and has hard parsed a new child cursor n°1 (in session 394). This is simply due to the following reason:

SQL> @nonshared c1j862cvqgh99

Show why existing SQL child cursors were not reused (V$SQL_SHARED_CURSOR)...

SQL_ID                        : c1j862cvqgh99
ADDRESS                       : 00007FFCCD36B030
CHILD_ADDRESS                 : 00007FFCCD3698D8
CHILD_NUMBER                  : 0
REASON                        : <ChildNode><ChildNumber>0</ChildNumber><ID>46</ID>
                                <reason>Session Specific Cursor SessionMismatch(1)</reason>
								<size>7x4</size><session_mismatch>1</session_mismatch>
								<current_qcsesidx>394</current_qcsesidx>       --> here the current SID
								<current_qcsesser>58570</current_qcsesser><current_qcinst>1</current_qcinst>
								<stored_qcsesidx>16</stored_qcsesidx> --> the SID of the original child cursor
								<stored_qcsesser>58601</stored_qcsesser>
								<stored_qcinst>1</stored_qcinst></ChildNode>
CON_ID                        : 1
-----------------
SQL_ID                        : c1j862cvqgh99
ADDRESS                       : 00007FFCCD36B030
CHILD_ADDRESS                 : 00007FFCCD348328
CHILD_NUMBER                  : 1
REASON                        :
CON_ID                        : 1
-----------------

PL/SQL procedure successfully completed.

Conclusion

In light of what I have demonstrated above I can, finally, define this non-sharing reason as:

(Y|N) The session specific cursor environment does not match the existing child cursor

With the first session mismatch being:

  • SessionMismatch(1) –> points to GTT private statistics

March 7, 2020

SQL Monitor flaw

Filed under: SQL Monitor — hourim @ 8:13 am

Observe the following real-life real time SQL Monitor report and try to figure out why the Duration is not equal to the Elapsed time

 Global Information
------------------------------
 Status              :  DONE (ERROR)              
 Instance ID         :  3                         
 Session             :  XXXX (1016:28171) 
 SQL ID              :  94dp7vscw26sf             
 SQL Execution ID    :  50331648                  
 Execution Started   :  01/21/2020 13:50:26       
 First Refresh Time  :  01/21/2020 13:50:26       
 Last Refresh Time   :  01/21/2020 13:50:30       
 Duration            :  4s                        
 Module/Action       :  SQL*Plus/-                
 Service             :  YYYYY                   
 Program             :  sqlplus.exe               

Global Stats
===================================================================================
| Elapsed |   Cpu   |    IO    | Application | Cluster  | Buffer |    Offload     |
| Time(s) | Time(s) | Waits(s) |  Waits(s)   | Waits(s) |  Gets  | Returned Bytes |
===================================================================================
|    1981 |     249 |     1687 |        0.00 |       45 |     8M |           92MB |
===================================================================================

In the following series of articles about the SQL monitor Duration and Elapsed Time:

I’ve explained why we should always rely on the Duration information as it represents the wall clock time of the query. The Elapsed time doesn’t include time spent by Idle session fetching data. And, when a SQL statement runs parallelly, the Elapsed time includes execution time of both the Query Coordinator and all its parallel server slaves.

So, If I tell you that there is no parsing effect in the above query and that there are no fetch calls, you would normally infer that it has really spent 4 seconds to complete; right?

What if I tell you that this query took approximatively 1981 seconds? Will you believe me?

Indeed, this is a parallel query using DOP 8 which took about 1981 seconds and where the Duration (4 seconds) represents only the time taken by the Query Coordinator. This is due to a flaw I have already explained in this article but where I’ve not been so smart, at that time, to point out the Duration flaw as well.

DONE (ERROR)

In contrast to what the status is saying, this query completes without error. And that is the first clue that should always put you on the right path when trouble shooting performance issue via SQL monitor reports. When this 12cR2 SQL query has been handled to me for investigation I immediately launched it again and here’s below the different phases reported by the SQL Monitor:

Global Information
------------------------------
 Status              :  EXECUTING                 
 Instance ID         :  3                         
 Session             :  XXXX (1016:28171) 
 SQL ID              :  94dp7vscw26sf             
 SQL Execution ID    :  50331648                  
 Execution Started   :  01/21/2020 13:50:26       
 First Refresh Time  :  01/21/2020 13:50:26       
 Last Refresh Time   :  01/21/2020 13:50:30       
 Duration            :  119s                      
 Module/Action       :  SQL*Plus/-                
 Service             :  YYYYY
 Program             :  sqlplus.exe               

Global Stats
===================================================================================
| Elapsed |   Cpu   |    IO    | Application | Cluster  | Buffer |    Offload     |
| Time(s) | Time(s) | Waits(s) |  Waits(s)   | Waits(s) |  Gets  | Returned Bytes |
===================================================================================
|     126 |      43 |       82 |        0.00 |     1.98 |   459K |           84MB |
===================================================================================
Parallel Execution Details (DOP=8 , Servers Allocated=16)

../..

Global Information
------------------------------
 Status              :  EXECUTING                 
 Instance ID         :  3                         
 Session             :  XXXX (1016:28171) 
 SQL ID              :  94dp7vscw26sf             
 SQL Execution ID    :  50331648                  
 Execution Started   :  01/21/2020 13:50:26       
 First Refresh Time  :  01/21/2020 13:50:26       
 Last Refresh Time   :  01/21/2020 13:50:30       
 Duration            :  1026s                     
 Module/Action       :  SQL*Plus/-                
 Service             :  YYYYY
 Program             :  sqlplus.exe               

Global Stats
==========================================================================
| Elapsed |   Cpu   |    IO    | Application | Cluster  |    Offload     |
| Time(s) | Time(s) | Waits(s) |  Waits(s)   | Waits(s) | Returned Bytes |
==========================================================================
|    1049 |     146 |      879 |        0.00 |       24 |           88MB |
==========================================================================
Parallel Execution Details (DOP=8 , Servers Allocated=16)

So far so good right?

In fact, if you look closely there is already something odd? Isn’t it?

Have you spotted out that the Last Refresh Time column is not changing?

Anyway, while the query was still running I launched the following query to see how the workload is distributed among the different parallel server slaves

@ getTimeSqlmon.sql
col exec_start format a25
col process    format a10
col status     format a15
select
    -- sql_id
    sql_exec_id   
   ,to_char(sql_exec_start,'dd/mm/yyyy hh24:mi:ss') exec_start
   ,to_char(sysdate,'dd/mm/yyyy hh24:mi:ss') current_time   
   ,process_name process
   ,round((sysdate-sql_exec_start) * 60 * 24,2) "minutes"
   ,round(elapsed_time/1e6,3) "elaps(sec)"
   ,round(cpu_time/1e6,3) "cpu(sec)"
   ,sql_plan_hash_value   "plan hash value"
   ,status
from gv$sql_monitor
where
  sql_id = '&sql_id'
order by exec_start,action;

SQL> @GetTimeSQLMon2
Enter value for sql_id: 94dp7vscw26sf

SQL_EXEC_ID EXEC_START            PROCE    mm elaps(sec)   cpu(sec) SQL_PLAN_HASH_VALUE STATUS
----------- --------------------- ----- ----- ---------- ---------- ------------------- ------
   50331648 21/01/2020 13:50:26   p002  16.35          0          0        524852868 EXECUTING
   50331648 21/01/2020 13:50:26   p003  16.35          0          0        524852868 EXECUTING
   50331648 21/01/2020 13:50:26   p008  16.35      3.364       .478        524852868 EXECUTING
   50331648 21/01/2020 13:50:26   p000  16.35          0          0        524852868 EXECUTING
   50331648 21/01/2020 13:50:26   p00f  16.35          0          0        524852868 EXECUTING
   50331648 21/01/2020 13:50:26   p007  16.35          0          0        524852868 EXECUTING
   50331648 21/01/2020 13:50:26   p00d  16.35          0          0        524852868 EXECUTING
   50331648 21/01/2020 13:50:26   p005  16.35          0          0        524852868 EXECUTING
   50331648 21/01/2020 13:50:26   p004  16.35          0          0        524852868 EXECUTING
   50331648 21/01/2020 13:50:26   p00c  16.35          0          0        524852868 EXECUTING
   50331648 21/01/2020 13:50:26   p001  16.35          0          0        524852868 EXECUTING
   50331648 21/01/2020 13:50:26   p009  16.35    978.915    136.371        524852868 EXECUTING ---------->>>>>>
   50331648 21/01/2020 13:50:26   p00e  16.35          0          0        524852868 EXECUTING
   50331648 21/01/2020 13:50:26   p006  16.35          0          0        524852868 EXECUTING
   50331648 21/01/2020 13:50:26   ora   16.35       .152      4.125        524852868 EXECUTING
   50331648 21/01/2020 13:50:26   p00b  16.35          0          0        524852868 EXECUTING
   50331648 21/01/2020 13:50:26   p00a  16.35          0          0        524852868 EXECUTIN
                                  
 17 rows selected.   
SQL> @GetTimeSQLMon2
Enter value for sql_id: 94dp7vscw26sf

SQL_EXEC_ID EXEC_START           PROCESS  minutes elaps(sec)   cpu(sec) plan hash value STATUS   
----------- -------------------- -------- ------- ---------- ---------- --------------- ---------
   50331648 21/01/2020 13:50:26  p002       30.28          0          0       524852868 EXECUTING
   50331648 21/01/2020 13:50:26  p003       30.28          0          0       524852868 EXECUTING
   50331648 21/01/2020 13:50:26  p008       30.28      3.364       .478       524852868 EXECUTING
   50331648 21/01/2020 13:50:26  p000       30.28          0          0       524852868 EXECUTING
   50331648 21/01/2020 13:50:26  p00f       30.28          0          0       524852868 EXECUTING
   50331648 21/01/2020 13:50:26  p007       30.28          0          0       524852868 EXECUTING
   50331648 21/01/2020 13:50:26  p00d       30.28          0          0       524852868 EXECUTING
   50331648 21/01/2020 13:50:26  p005       30.28          0          0       524852868 EXECUTING
   50331648 21/01/2020 13:50:26  p004       30.28          0          0       524852868 EXECUTING
   50331648 21/01/2020 13:50:26  p00c       30.28          0          0       524852868 EXECUTING
   50331648 21/01/2020 13:50:26  p001       30.28          0          0       524852868 EXECUTING
   50331648 21/01/2020 13:50:26  p009       30.28   1813.916    225.829       524852868 EXECUTING ---------->>>>>>
   50331648 21/01/2020 13:50:26  p00e       30.28          0          0       524852868 EXECUTING
   50331648 21/01/2020 13:50:26  p006       30.28          0          0       524852868 EXECUTING
   50331648 21/01/2020 13:50:26  ora        30.28       .152      4.125       524852868 EXECUTING
   50331648 21/01/2020 13:50:26  p00b       30.28          0          0       524852868 EXECUTING
   50331648 21/01/2020 13:50:26  p00a       30.28          0          0       524852868 EXECUTING
                                             
17 rows selected.

As you can see there is only a single parallel slave which is working (p009). All other slaves are waiting.

I managed to display, roughly, the number of minutes spent by each process not doing anything i.e. waiting. But if you read this article you will know that when a process waits for more than 30 minutes, its status will change to DONE (ERROR). Spot below how this is true:

SQL> @GetTimeSQLMon2
Enter value for sql_id: 94dp7vscw26sf

SQL_EXEC_ID EXEC_START            PROCESS  minutes elaps(sec)   cpu(sec)  STATUS
----------- --------------------- ------- -------- ---------- ----------  ------------
   50331648 21/01/2020 13:50:26   p002       30.58          0          0  DONE (ERROR) ---------->>>>>> IN ERROR after 30 min waiting
   50331648 21/01/2020 13:50:26   p003       30.58          0          0  DONE (ERROR)
   50331648 21/01/2020 13:50:26   p008       30.58      3.364       .478  DONE (ERROR)
   50331648 21/01/2020 13:50:26   p000       30.58          0          0  DONE (ERROR)
   50331648 21/01/2020 13:50:26   p00f       30.58          0          0  DONE (ERROR)
   50331648 21/01/2020 13:50:26   p007       30.58          0          0  DONE (ERROR)
   50331648 21/01/2020 13:50:26   p00d       30.58          0          0  DONE (ERROR)
   50331648 21/01/2020 13:50:26   p005       30.58          0          0  DONE (ERROR)
   50331648 21/01/2020 13:50:26   p004       30.58          0          0  DONE (ERROR)
   50331648 21/01/2020 13:50:26   p00c       30.58          0          0  DONE (ERROR)
   50331648 21/01/2020 13:50:26   p001       30.58          0          0  DONE (ERROR)
   50331648 21/01/2020 13:50:26   p009       30.58   1833.977    228.496  EXECUTING    ---------->>>>>>
   50331648 21/01/2020 13:50:26   p00e       30.58          0          0  DONE (ERROR)
   50331648 21/01/2020 13:50:26   p006       30.58          0          0  DONE (ERROR)
   50331648 21/01/2020 13:50:26   ora        30.58       .152      4.125  DONE (ERROR)
   50331648 21/01/2020 13:50:26   p00b       30.58          0          0  DONE (ERROR)
   50331648 21/01/2020 13:50:26   p00a       30.58          0          0  DONE (ERROR)
                                          
17 rows selected.

And so on until the p009 process ends successfully

SQL> @GetTimeSQLMon2
Enter value for sql_id: 94dp7vscw26sf

SQL_EXEC_ID EXEC_START            PROCESS       minutes elaps(sec)   cpu(sec) STATUS
----------- --------------------- ---------- ---------- ---------- ---------- ---------------
   50331648 21/01/2020 13:50:26   p002            42.88          0          0 DONE (ERROR)
   50331648 21/01/2020 13:50:26   p003            42.88          0          0 DONE (ERROR)
   50331648 21/01/2020 13:50:26   p008            42.88      3.364       .478 DONE (ERROR)
   50331648 21/01/2020 13:50:26   p000            42.88          0          0 DONE (ERROR)
   50331648 21/01/2020 13:50:26   p00f            42.88          0          0 DONE (ERROR)
   50331648 21/01/2020 13:50:26   p007            42.88          0          0 DONE (ERROR)
   50331648 21/01/2020 13:50:26   p00d            42.88          0          0 DONE (ERROR)
   50331648 21/01/2020 13:50:26   p005            42.88          0          0 DONE (ERROR)
   50331648 21/01/2020 13:50:26   p004            42.88          0          0 DONE (ERROR)
   50331648 21/01/2020 13:50:26   p00c            42.88          0          0 DONE (ERROR)
   50331648 21/01/2020 13:50:26   p001            42.88          0          0 DONE (ERROR)
   50331648 21/01/2020 13:50:26   p009            42.88   1942.557    244.777 DONE (ALL ROWS) ---------->>>>>>
   50331648 21/01/2020 13:50:26   p00e            42.88          0          0 DONE (ERROR)
   50331648 21/01/2020 13:50:26   p006            42.88          0          0 DONE (ERROR)
   50331648 21/01/2020 13:50:26   ora             42.88       .152      4.125 DONE (ERROR)
   50331648 21/01/2020 13:50:26   p00b            42.88          0          0 DONE (ERROR)
   50331648 21/01/2020 13:50:26   p00a            42.88          0          0 DONE (ERROR)

17 rows selected.

A couple of minutes before the p009 PX slave finishes its work, the Duration was almost reporting a correct time as the following proves:

Global Information
------------------------------
 Status              :  EXECUTING                 
 Instance ID         :  3                         
 Session             :  XXXX (1016:28171) 
 SQL ID              :  94dp7vscw26sf             
 SQL Execution ID    :  50331648                  
 Execution Started   :  01/21/2020 13:50:26       
 First Refresh Time  :  01/21/2020 13:50:26       
 Last Refresh Time   :  01/21/2020 13:50:30       
 Duration            :  1283s   ------>>>>>   spot the Duration here                        
 Module/Action       :  SQL*Plus/-                
 Service             :  YYYYY
 Program             :  sqlplus.exe               

Global Stats
========================================================================
| Elapsed |   Cpu   |    IO    | Application | Buffer |    Offload     |
| Time(s) | Time(s) | Waits(s) |  Waits(s)   |  Gets  | Returned Bytes |
========================================================================
|    1309 |     174 |     1106 |        0.00 |     5M |           88MB |
========================================================================
Parallel Execution Details (DOP=8 , Servers Allocated=16)

Unfortunately, at the end of the query, the Duration is wrongly updated to show 4 seconds instead of the wall clock time which should be around 1981 seconds

Global Information
------------------------------
 Status              :  DONE (ERROR)              
 Instance ID         :  3                         
 Session             :  XXXX (1016:28171) 
 SQL ID              :  94dp7vscw26sf             
 SQL Execution ID    :  50331648                  
 Execution Started   :  01/21/2020 13:50:26       
 First Refresh Time  :  01/21/2020 13:50:26       
 Last Refresh Time   :  01/21/2020 13:50:30       
 Duration            :  4s   ------>>>>>   and spot the Duration at the end of the query                   
 Module/Action       :  SQL*Plus/-                
 Service             :  YYYYY
 Program             :  sqlplus.exe               

Global Stats
=====================================================================
| Elapsed |   Cpu   |    IO    | Cluster  | Buffer |    Offload     |
| Time(s) | Time(s) | Waits(s) | Waits(s) |  Gets  | Returned Bytes |
=====================================================================
|    1981 |     249 |     1687 |       45 |     8M |           92MB |
=====================================================================
Parallel Execution Details (DOP=8 , Servers Allocated=16)

Those 4 seconds represent either the time difference between the start of the query and the last refresh time or the time spent by the query coordinator which is 3.364+0.478 = 3.842 ~ 4 seconds

In passing here how the workload has been distributed when I fixed the issue

SQL> @getTimeSQLMon2
Enter value for sql_id: 2hrxjvurz5c0p

SQL_EXEC_ID EXEC_START            PROCESS minutes elaps(sec)   cpu(sec)  STATUS
----------- --------------------- ------- ------- ---------- ----------  ---------------
   50331648 21/01/2020 16:09:40   p002      10.88       .403        .52  DONE (ALL ROWS)
   50331648 21/01/2020 16:09:40   p009      10.88     38.332      5.065  DONE (ALL ROWS)
   50331648 21/01/2020 16:09:40   p000      10.88       .425       .533  DONE (ALL ROWS)
   50331648 21/01/2020 16:09:40   p008      10.88     38.138      4.623  DONE (ALL ROWS)
   50331648 21/01/2020 16:09:40   p003      10.88       .468       .628  DONE (ALL ROWS)
   50331648 21/01/2020 16:09:40   p00b      10.88     38.345      4.993  DONE (ALL ROWS)
   50331648 21/01/2020 16:09:40   p00d      10.88     38.206      4.362  DONE (ALL ROWS)
   50331648 21/01/2020 16:09:40   p005      10.88       .453       .613  DONE (ALL ROWS)
   50331648 21/01/2020 16:09:40   p00f      10.88     38.279      5.089  DONE (ALL ROWS)
   50331648 21/01/2020 16:09:40   p007      10.88       .429       .537  DONE (ALL ROWS)
   50331648 21/01/2020 16:09:40   p00c      10.88     38.211      5.443  DONE (ALL ROWS)
   50331648 21/01/2020 16:09:40   p004      10.88       .474       .632  DONE (ALL ROWS)
   50331648 21/01/2020 16:09:40   ora       10.88       2.12      1.507  DONE (ALL ROWS)
   50331648 21/01/2020 16:09:40   p00e      10.88     38.348      6.344  DONE (ALL ROWS)
   50331648 21/01/2020 16:09:40   p006      10.88       .413       .613  DONE (ALL ROWS)
   50331648 21/01/2020 16:09:40   p001      10.88       .432       .589  DONE (ALL ROWS)
   50331648 21/01/2020 16:09:40   p00a      10.88     38.624      5.481  DONE (ALL ROWS)

17 rows selected.

Summary

When you retrospectively analyze a parallel query performance issue through its corresponding SQL monitoring showing a DONE (ERROR) status, then you should log in your brain the followings:

• There is a probability that the time reported by the Duration is wrong as it either reports the QC time or the time difference between the start of the query and the last time it has been refreshed
• Don’t trust the DONE (ERROR) status. The query might have completed successfully, and this status value is simply indicating that the QC and one or many of its parallel slaves have been waiting for more than 30 minutes

Next Page »

Blog at WordPress.com.

Tony's Oracle Tips

Tony Hasler's light hearted approach to learning about Oracle

Richard Foote's Oracle Blog

Focusing Specifically On Oracle Indexes, Database Administration and Some Great Music

Hatem Mahmoud Oracle's blog

Just another Oracle blog : Database topics and techniques

Mohamed Houri’s Oracle Notes

Qui se conçoit bien s’énonce clairement

Oracle Diagnostician

Performance troubleshooting as exact science

Raheel's Blog

Things I have learnt as Oracle DBA

Coskan's Approach to Oracle

What I learned about Oracle

So Many Oracle Manuals, So Little Time

“Books to the ceiling, Books to the sky, My pile of books is a mile high. How I love them! How I need them! I'll have a long beard by the time I read them”—Lobel, Arnold. Whiskers and Rhymes. William Morrow & Co, 1988.

Carlos Sierra's Tools and Tips

Tools and Tips for Oracle Performance and SQL Tuning

Oracle Scratchpad

Just another Oracle weblog

OraStory

Dominic Brooks on Oracle Performance, Tuning, Data Quality & Sensible Design ... (Now with added Sets Appeal)