Mohamed Houri’s Oracle Notes

September 2, 2015

CBO decision: unique or non-unique index?

Filed under: Oracle — hourim @ 5:43 pm

I have been asked to look at one of those particularly frustrating situations that only live running systems can produce: an update of a single table using the complete set of its primary key columns to locate and update a unique row. The update looks like this:

UPDATE T1
SET 
  {list of columns}
WHERE 
    T1_DATE    = :B9
AND T1_I_E_ID  = :B8
AND T1_TYPE    = :B7
AND DATE_TYPE  = :B6
AND T1_AG_ID   = :B5
AND T1_ACC_ID  = :B4
AND T1_SEC_ID  = :B3
AND T1_B_ID    = :B2
AND T1_FG_ID   = :B1;

The 9 columns in the above where clause represent the primary key of the T1 table. You might be surprised to learn that this update didn't use the primary key index and instead preferred a range scan of an existing 3-column index plus a table access by index rowid to locate and update a unique row:

----------------------------------------------------------------------------
| Id  | Operation                    | Name   | Rows  | Bytes | Cost (%CPU)|
----------------------------------------------------------------------------
|   0 | UPDATE STATEMENT             |        |       |       |     1 (100)|
|   1 |  UPDATE                      | T1     |       |       |            |
|*  2 |   TABLE ACCESS BY INDEX ROWID| T1     |     1 |   122 |     1   (0)|
|*  3 |    INDEX RANGE SCAN          | IDX_T1 |     1 |       |     1   (0)|
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("T1_TYPE"=:B7 AND "DATE_TYPE"=:B6 AND "T1_I_E_ID"=:B8 AND 
              "T1_AG_ID"=TO_NUMBER(:B5) AND "T1_DATE"=TO_TIMESTAMP(:B9) AND 
              "T1_FG_ID"=TO_NUMBER(:B1))
   3 - access("T1_SEC_ID"=TO_NUMBER(:B3) AND "T1_B_ID"=TO_NUMBER(:B2) AND 
              "T1_ACC_ID"=TO_NUMBER(:B4))

The same update, when hinted with the primary key index, uses the following, much more desirable, execution plan:

------------------------------------------------------------------
| Id  | Operation          | Name   | Rows  | Bytes | Cost (%CPU)|
------------------------------------------------------------------
|   0 | UPDATE STATEMENT   |        |     1 |   126 |     1   (0)|
|   1 |  UPDATE            | T1     |       |       |            |
|*  2 |   INDEX UNIQUE SCAN| PK_T19 |     1 |   126 |     1   (0)|
------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1_I_E_ID"=:B8 AND "T1_TYPE"=:B7 AND "DATE_TYPE"=:B6 
              AND "T1_AG_ID"=TO_NUMBER(:B5) AND "T1_ACC_ID"=TO_NUMBER(:B4) AND 
              "T1_SEC_ID"=TO_NUMBER(:B3) AND "T1_B_ID"=TO_NUMBER(:B2) AND 
              "T1_FG_ID"=TO_NUMBER(:B1) AND "T1_DATE"=TO_TIMESTAMP(:B9))

I don't know how Oracle manages to come up with the same cost (Cost = 1) for two completely different indexes: one with 9 columns and one with only 3 (a subset of the first, though not necessarily in the same starting order).

So why has Oracle not selected the primary key unique index?

First, here are the available statistics on the primary key columns:

SQL> select
      column_name
     ,num_distinct
     ,num_nulls
     ,histogram
    from
      all_tab_col_statistics
    where
     table_name = 'T1'
   and
    column_name in ('T1_DATE'
                   ,'T1_I_E_ID'
                   ,'T1_TYPE'
                   ,'DATE_TYPE'
                   ,'T1_AG_ID'
                   ,'T1_ACC_ID'
                   ,'T1_SEC_ID'
                   ,'T1_B_ID'
                   ,'T1_FG_ID' );

COLUMN_NAME         NUM_DISTINCT  NUM_NULLS HISTOGRAM
------------------- ------------ ---------- ---------------
T1_I_E_ID                      2          0 FREQUENCY
T1_TYPE                        5          0 FREQUENCY
DATE_TYPE                      5          0 FREQUENCY
T1_AG_ID                     106          0 FREQUENCY
T1_ACC_ID                    182          0 FREQUENCY
T1_DATE                     2861          0 HEIGHT BALANCED
T1_SEC_ID                3092480          0 NONE
T1_B_ID                     1452          0 HEIGHT BALANCED
T1_FG_ID                       1          0 FREQUENCY

And here is the corresponding 10053 trace file for this update, restricted to the part relevant to my investigation:

***************************************
BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table: T1  Alias: T1
    #Rows: 387027527  #Blks:  6778908  AvgRowLen:  126.00  ChainCnt:  0.00

Index Stats::
Index: IDX_T1  Col#: 7 8 5
 LVLS: 3  #LB: 1568007  #DK: 3314835  LB/K: 1.00   DB/K: 25.00  CLUF: 84758374.00

Index: PK_T19  Col#: 1 2 3 4 5 7 8 37 6
 LVLS: 4  #LB: 4137117  #DK: 377310281  LB/K: 1.00  DB/K: 1.00  CLUF: 375821219.00

And here is the part of the same trace file where the index choice is made:

Access Path: index (UniqueScan)
    Index: PK_T19
    resc_io: 5.00  resc_cpu: 37647 ----------------------> spot this
    ix_sel: 0.000000  ix_sel_with_filters: 0.000000 
    Cost: 1.00  Resp: 1.00  Degree: 1
    ColGroup Usage:: PredCnt: 3  Matches Full:  Partial: 
    ColGroup Usage:: PredCnt: 3  Matches Full:  Partial:

Access Path: index (AllEqRange)
    Index: IDX_T1
    resc_io: 5.00  resc_cpu: 36797 ----------------------> spot this
    ix_sel: 0.000000  ix_sel_with_filters: 0.000000 
    Cost: 1.00  Resp: 1.00  Degree: 1

Best:: AccessPath: IndexRange
  Index: IDX_T1
    Cost: 1.00  Degree: 1  Resp: 1.00  Card: 0.00  Bytes: 0

Looking closely at the above trace file, I couldn't find any difference in the index costing figures (ix_sel_with_filters, resc_io, Cost) that would favour the non-unique IDX_T1 index over the PK_T19 primary key unique index, except the resc_cpu value: 36797 for the former versus 37647 for the latter. I didn't consider the clustering factor because the index I wanted the CBO to use is a unique one. So the two indexes end up with the same cost in this case. What extra information does the CBO use here to prefer the non-unique index over the unique one?

This issue reminded me of an old OTN thread in which the Original Poster said that, under the default CPU costing model, when two indexes have the same cost, Oracle will consider using the less CPU-expensive index.

Since I had a practical case of two different indexes with the same cost, I decided to check this assumption by changing the costing model from CPU to I/O:

SQL> alter session set "_optimizer_cost_model"=io;

SQL> explain plan for 
UPDATE T1
SET 
  {list of columns}
WHERE 
    T1_DATE   = :B9
AND T1_I_E_ID = :B8
AND T1_TYPE   = :B7
AND DATE_TYPE = :B6
AND T1_AG_ID  = :B5
AND T1_ACC_ID = :B4
AND T1_SEC_ID = :B3
AND T1_B_ID   = :B2
AND T1_FG_ID  = :B1;

SQL> select * from table(dbms_xplan.display);

Plan hash value: 704748203
-------------------------------------------------------------
| Id  | Operation          | Name   | Rows  | Bytes | Cost  |
-------------------------------------------------------------
|   0 | UPDATE STATEMENT   |        |     1 |   126 |     1 |
|   1 |  UPDATE            | T1     |       |       |       |
|*  2 |   INDEX UNIQUE SCAN| PK_T19 |     1 |   126 |     1 |
-------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1_I_E_ID"=:B8 AND "T1_TYPE"=:B7 AND
              "DATE_TYPE"=:B6 AND "T1_AG_ID"=TO_NUMBER(:B5) AND
              "T1_ACC_ID"=TO_NUMBER(:B4) AND "T1_SEC_ID"=TO_NUMBER(:B3) AND
              "T1_B_ID"=TO_NUMBER(:B2) AND "T1_FG_ID"=TO_NUMBER(:B1) AND
              "T1_DATE"=TO_TIMESTAMP(:B9))

Note
-----
   - cpu costing is off (consider enabling it)

Spot on. We get the desired primary key index without any help.

However, changing the default costing model was not acceptable in the client's production database. Continuing the root cause investigation, I got the feeling that histograms were messing up the CBO's index choice. This is why I decided to give it a try, get rid of the histograms, and analyse the corresponding 10053 trace file:

SQL> exec dbms_stats.gather_table_stats (user, 'T1', method_opt => 'for all columns size 1');


***************************************
BASE STATISTICAL INFORMATION
***********************
Table Stats::
  Table: T1  Alias: T1
    #Rows: 387029216  #Blks:  6778908  AvgRowLen:  126.00  ChainCnt:  0.00

Index Stats::
  Index: IDX_T1  Col#: 7 8 5
  LVLS: 3  #LB: 1625338  #DK: 3270443  LB/K: 1.00   DB/K: 26.00  CLUF: 87831324.00
  
  Index: PK_T19  Col#: 1 2 3 4 5 7 8 37 6
  LVLS: 4  #LB: 4335908  #DK: 395902926  LB/K: 1.00  DB/K: 1.00  CLUF: 394578898.00

Access Path: index (UniqueScan)
    Index: PK_T19
    resc_io: 5.00  resc_cpu: 37647
    ix_sel: 0.000000  ix_sel_with_filters: 0.000000 
    Cost: 1.00  Resp: 1.00  Degree: 1
  ColGroup Usage:: PredCnt: 3  Matches Full: #1  Partial:  Sel: 0.0000
  ColGroup Usage:: PredCnt: 3  Matches Full: #1  Partial:  Sel: 0.0000

  Access Path: index (AllEqRange)
    Index: IDX_T1
    resc_io: 31.00  resc_cpu: 370081
    ix_sel: 0.000000  ix_sel_with_filters: 0.000000 
    Cost: 6.20  Resp: 6.20  Degree: 1 ------> spot the cost

Access Path: index (AllEqUnique)
    Index: PK_T19
    resc_io: 5.00  resc_cpu: 37647
    ix_sel: 0.000000  ix_sel_with_filters: 0.000000 
    Cost: 1.00  Resp: 1.00  Degree: 1
 One row Card: 1.000000

  Best:: AccessPath: IndexUnique
  Index: PK_T19
    Cost: 1.00  Degree: 1  Resp: 1.00  Card: 1.00  Bytes: 0

And now that the cost of accessing the non-unique index has become roughly six times greater (Cost = 6.2) than that of the unique index (Cost = 1), Oracle prefers the primary key index without any help:

============
Plan Table
============
------------------------------------------------------------------------
| Id  | Operation           | Name   | Rows  | Bytes | Cost  | Time     |
------------------------------------------------------------------------
| 0   | UPDATE STATEMENT    |        |       |       |     1 |          |
| 1   |  UPDATE             | T1     |       |       |       |          |
| 2   |   INDEX UNIQUE SCAN | PK_T19 |     1 |   126 |     1 | 00:00:01 |
------------------------------------------------------------------------

The final, acceptable, decision to solve this issue was to hint an instance of the same query so that it uses the primary key index, and to attach a SQL profile to the original packaged query using the plan_hash_value of that hinted execution plan.
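For illustration, here is a sketch of one way such a profile can be attached. The profile name, the hint text and the query block name (UPD$1 is the usual default query block name for an UPDATE) are my assumptions, and dbms_sqltune.import_sql_profile is only one of several mechanisms for this; it is not necessarily the exact script used to fix the issue.

```sql
-- Sketch only: attach a SQL profile forcing the PK index onto the
-- un-hinted packaged query. Names and hint text are assumptions.
DECLARE
  l_sql_text CLOB;
BEGIN
  SELECT sql_fulltext INTO l_sql_text
  FROM   v$sqlarea
  WHERE  sql_id = '&original_sql_id';   -- the packaged update

  dbms_sqltune.import_sql_profile(
     sql_text    => l_sql_text
    ,profile     => sqlprof_attr('INDEX(@"UPD$1" "T1"@"UPD$1" "PK_T19")')
    ,name        => 'PROF_T1_PK'
    ,force_match => TRUE   -- match all literal variations of the text
  );
END;
/
```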

Bottom Line: under the default CPU costing model, when two (or more) indexes have the same cost, Oracle will prefer the index it expects to consume the smaller amount of CPU (resc_cpu). And, before collecting histograms (particularly Height Balanced ones) by default, be aware that they participate in Oracle's perception of how much CPU the different indexes will need, and ultimately in the CBO's index preference.

August 21, 2015

Cardinality Feedback: a practical case

Filed under: Oracle — hourim @ 5:36 pm

Here is an interesting case of cardinality feedback collected from an 11.2.0.3 running system. A simple query against a single table has a perfect first-execution response time with, to the human eye, a quite acceptable difference between Oracle's cardinality estimates and the actual rows, as shown below:

SELECT 
   tr_id
FROM 
    t1 t1
WHERE 
     t1.t1_col_name= 'GroupID'
AND  t1.t1_col_value= '6276931'
AND EXISTS(SELECT 
               1 
            FROM  
                t1 t2
            WHERE t1.tr_id   = t2.tr_id
            AND   t2.t1_col_name= 'TrRangeOrder'
            AND   t2.t1_col_value= 'TrOrderPlace'
           );

SQL_ID  8b3tv5uh8ckfb, child number 0
-------------------------------------

Plan hash value: 1066392926
--------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                    | Starts | E-Rows | A-Rows |   A-Time   |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                         |      1 |        |      1 |00:00:00.14 |
|   1 |  NESTED LOOPS SEMI           |                         |      1 |      1 |      1 |00:00:00.14 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1                      |      1 |      1 |      6 |00:00:00.07 |
|*  3 |    INDEX RANGE SCAN          | IDX_T1_NAME_VALUE       |      1 |      1 |      6 |00:00:00.03 |
|*  4 |   TABLE ACCESS BY INDEX ROWID| T1                      |      6 |      1 |      1 |00:00:00.07 |
|*  5 |    INDEX UNIQUE SCAN         | T1_PK                   |      6 |      1 |      6 |00:00:00.07 |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."T1_COL_NAME"='GroupID' AND "T1"."T1_COL_VALUE"='6276931')
   4 - filter("T2"."T1_COL_VALUE"='TrOrderPlace')
   5 - access("T1"."TR_ID"="T2"."TR_ID" AND 
               "T2"."T1_COL_NAME"='TrRangeOrder')

And here is the second, dramatically worse, execution plan and response time, due this time to the cardinality feedback optimisation:


SQL_ID  8b3tv5uh8ckfb, child number 1
-------------------------------------
Plan hash value: 3786385867
----------------------------------------------------------------------------------------------------------
| Id  | Operation                      | Name                    | Starts | E-Rows | A-Rows |   A-Time   |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |                         |      1 |        |      1 |00:10:40.14 |
|   1 |  NESTED LOOPS                  |                         |      1 |        |      1 |00:10:40.14 |
|   2 |   NESTED LOOPS                 |                         |      1 |      1 |    787K|00:09:31.00 |
|   3 |    SORT UNIQUE                 |                         |      1 |      1 |    787K|00:02:44.83 |
|   4 |     TABLE ACCESS BY INDEX ROWID| T1                      |      1 |      1 |    787K|00:02:41.58 |
|*  5 |      INDEX RANGE SCAN          | IDX_T1_NAME_VALUE       |      1 |      1 |    787K|00:00:36.46 |
|*  6 |    INDEX UNIQUE SCAN           | T1_PK                   |    787K|      1 |    787K|00:06:45.25 |
|*  7 |   TABLE ACCESS BY INDEX ROWID  | T1                      |    787K|      1 |      1 |00:05:00.24 |
----------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access("T2"."T1_COL_NAME"='TrRangeOrder' AND "T2"."T1_COL_VALUE"='TrOrderPlace')
   6 - access("T1"."TR_ID"="T2"."TR_ID" AND "T1"."T1_COL_NAME"='GroupID')
   7 - filter("T1"."T1_COL_VALUE"='6276931')

Note
-----
   - cardinality feedback used for this statement

There is no really noticeable difference between estimated and actual rows in the first run of the query (E-Rows = 1 versus A-Rows = 6), nothing that should imply a re-optimisation. Yet Oracle re-optimised anyway, and marked child cursor n°0 as a candidate for cardinality feedback:

SQL> select
       sql_id
      ,child_number
      ,use_feedback_stats
    from
      v$sql_shared_cursor
    where
      sql_id = '8b3tv5uh8ckfb';

SQL_ID        CHILD_NUMBER U
------------- ------------ -
8b3tv5uh8ckfb            0 Y

The bad news with this Oracle decision, however, is that we went from a quasi-instantaneous response time to a catastrophic 10 minutes. In the first plan the always-suspicious estimated cardinality of 1 is not significantly far from the actual rows (6), so why has Oracle decided to re-optimize the first cursor? It might be that when Oracle rounds its cardinality estimate up to 1 for a cursor being monitored for cardinality feedback, it flags that cursor somewhere for re-optimization at its next execution, whatever the actual rows turn out to be (close to 1 or not).

Fortunately, this second execution has also been marked for re-optimisation:

SQL> select
       sql_id
      ,child_number
      ,use_feedback_stats
    from
      v$sql_shared_cursor
    where
      sql_id = '8b3tv5uh8ckfb';

SQL_ID        CHILD_NUMBER U
------------- ------------ -
8b3tv5uh8ckfb            0 Y
8b3tv5uh8ckfb            1 Y

And the third execution of the query produced the following interesting execution plan:

SQL_ID  8b3tv5uh8ckfb, child number 2
-------------------------------------

Plan hash value: 1066392926
--------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                    | Starts | E-Rows | A-Rows |   A-Time   |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                         |      1 |        |      1 |00:00:00.01 |
|   1 |  NESTED LOOPS SEMI           |                         |      1 |      1 |      1 |00:00:00.01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1                      |      1 |      6 |      6 |00:00:00.01 |
|*  3 |    INDEX RANGE SCAN          | IDX_T1_NAME_VALUE       |      1 |      6 |      6 |00:00:00.01 |
|*  4 |   TABLE ACCESS BY INDEX ROWID| T1                      |      6 |      1 |      1 |00:00:00.01 |
|*  5 |    INDEX UNIQUE SCAN         | T1_PK                   |      6 |      1 |      6 |00:00:00.01 |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."T1_COL_NAME"='GroupID' AND "T1"."T1_COL_VALUE"='6276931')
   4 - filter("T2"."T1_COL_VALUE"='TrOrderPlace')
   5 - access("T1"."TR_ID"="T2"."TR_ID" AND
              "T2"."T1_COL_NAME"='TrRangeOrder')

Note
-----
   - cardinality feedback used for this statement   

Oracle is back to its first execution plan. The new estimates coincide perfectly with the actuals, so Oracle decided to stop monitoring this cursor with cardinality feedback, as shown below:

SQL> select
       sql_id
      ,child_number
      ,use_feedback_stats
    from
      v$sql_shared_cursor
    where
      sql_id = '8b3tv5uh8ckfb';

SQL_ID        CHILD_NUMBER U
------------- ------------ -
8b3tv5uh8ckfb            0 Y
8b3tv5uh8ckfb            1 Y
8b3tv5uh8ckfb            2 N

Several questions come to my mind at this stage of the investigation:

  1. Under what circumstances does Oracle mark a cursor for cardinality feedback optimisation?
  2. How does Oracle decide that E-Rows is significantly different from A-Rows, so that a cursor re-optimization will be done? In other words, is E-Rows = 1 significantly different from A-Rows = 6? Or does that suspicious cardinality of 1 participate in Oracle's decision to re-optimize a cursor monitored with cardinality feedback?

Let's try to answer the first question. Only one table is involved in this query, with two conjunctive predicates. The two predicate columns have the following statistics:

SQL> select
        column_name
       ,num_distinct
       ,density
       ,histogram
     from 
	    all_tab_col_statistics
     where
        table_name = 'T1'
     and
       column_name in ('T1_COL_NAME','T1_COL_VALUE');

COLUMN_NAME     NUM_DISTINCT    DENSITY HISTOGRAM
--------------- ------------ ---------- ---------------
T1_COL_NAME           103     4,9781E-09 FREQUENCY
T1_COL_VALUE      14833664   ,000993049  HEIGHT BALANCED

The presence of histograms, particularly the HEIGHT BALANCED one, on these two columns strongly contributes to Oracle's decision to monitor the cursor for cardinality feedback. To be sure of it, I decided to get rid of the histograms on both columns and run the query again:
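The histogram removal itself can be done with something along these lines (a sketch, not the exact command used at the time; restricting method_opt to the two predicate columns rather than all columns is my assumption):

```sql
-- Drop histograms on the two predicate columns only, keeping base
-- column statistics (size 1 = no histogram).
BEGIN
  dbms_stats.gather_table_stats(
     ownname       => user
    ,tabname       => 'T1'
    ,method_opt    => 'for columns T1_COL_NAME size 1, T1_COL_VALUE size 1'
    ,cascade       => true
    ,no_invalidate => false  -- invalidate dependent cursors immediately
  );
END;
/
```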

SQL> select
        column_name
       ,num_distinct
       ,density
       ,histogram
     from 
	    all_tab_col_statistics
     where
        table_name = 'T1'
     and
       column_name in ('T1_COL_NAME','T1_COL_VALUE');

COLUMN_NAME     NUM_DISTINCT    DENSITY HISTOGRAM
--------------- ------------ ---------- ---------
T1_COL_NAME           103    ,009708738 NONE
T1_COL_VALUE      15477760   6,4609E-08 NONE

The new cursor is no longer monitored by cardinality feedback, as shown below:

SQL_ID  fakc7vfbu1mam, child number 0
-------------------------------------

Plan hash value: 739349168
--------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                    | Starts | E-Rows | A-Rows |   A-Time   |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                         |      1 |        |      1 |00:02:00.68 |
|*  1 |  HASH JOIN SEMI              |                         |      1 |      6 |      1 |00:02:00.68 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1                      |      1 |      6 |      6 |00:00:00.01 |
|*  3 |    INDEX RANGE SCAN          | IDX_T1_NAME_VALUE       |      1 |      6 |      6 |00:00:00.01 |
|   4 |   TABLE ACCESS BY INDEX ROWID| T1                      |      1 |      6 |    787K|00:02:00.14 |
|*  5 |    INDEX RANGE SCAN          | IDX_T1_NAME_VALUE       |      1 |      6 |    787K|00:00:12.36 |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."TR_ID"="T2"."TR_ID")
   3 - access("T1"."T1_COL_NAME"='GroupID' AND "T1"."T1_COL_VALUE"='6276931')
   5 - access("T2"."T1_COL_NAME"='TrRangeOrder' AND "T2"."T1_COL_VALUE"='TrOrderPlace')

   SQL> select
         sql_id
        ,child_number
        ,use_feedback_stats
    from
       v$sql_shared_cursor
    where
        sql_id = 'fakc7vfbu1mam';

SQL_ID        CHILD_NUMBER U
------------- ------------ -
fakc7vfbu1mam            0 N --> cursor not re-optimisable 

Without histograms on the two columns, Oracle has not monitored the query for cardinality feedback. Unfortunately, getting rid of the histograms was not an option the client would accept, nor was changing this packaged query to stop the optimizer from unnesting the EXISTS subquery into its parent query, given that the parent query always generates a couple of rows which would not hurt performance when filtered by the EXISTS subquery. Attaching a SQL profile was also discarded because several copies of the same query exist in the packaged application, which would have required a couple of extra SQL profiles.
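Also worth noting, though not retained here, is the blunter option of switching the feature off altogether, assuming the hidden parameter _optimizer_use_feedback is available on the version at hand (it is in 11.2, but hidden parameters should only be touched with Oracle Support's blessing):

```sql
-- Hypothetical sketch: disable cardinality feedback for the whole
-- session. For a packaged application this would need a logon
-- trigger or a system-level change, with the caveats that implies.
ALTER SESSION SET "_optimizer_use_feedback" = false;
```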

The last option that remained in my hands was to collect extended statistics, so that Oracle would get accurate estimates and henceforth stop using cardinality feedback:

SQL> SELECT
       dbms_stats.create_extended_stats
       (ownname   => user
       ,tabname   => 'T1'
       ,extension => '(T1_COL_NAME,T1_COL_VALUE)'
      )
  FROM dual;

DBMS_STATS.CREATE_EXTENDED_STATS(
---------------------------------
SYS_STUE3EBVNLB6M1SYS3A07$LD52

SQL> begin
      dbms_stats.gather_table_stats
            (user
           ,'T1'
           ,method_opt    => 'for columns SYS_STUE3EBVNLB6M1SYS3A07$LD52 size skewonly'
           ,cascade       => true
           ,no_invalidate => false
            );
    end;
    /

SQL> select
       column_name
      ,num_distinct
      ,density
      ,histogram
    from all_tab_col_statistics
    where
        table_name = 'T1'
    and column_name in ('T1_COL_NAME','T1_COL_VALUE', 'SYS_STUE3EBVNLB6M1SYS3A07$LD52');

COLUMN_NAME                    NUM_DISTINCT    DENSITY HISTOGRAM
------------------------------ ------------ ---------- ---------------
SYS_STUE3EBVNLB6M1SYS3A07$LD52     18057216 ,000778816 HEIGHT BALANCED
T1_COL_NAME                          103    4,9781E-09 FREQUENCY
T1_COL_VALUE                     14833664   ,000993049 HEIGHT BALANCED


SQL_ID  dn6p58b9b6348, child number 0
-------------------------------------
Plan hash value: 1066392926
--------------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name                    | Starts | E-Rows | A-Rows |   A-Time   |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                         |      1 |        |      1 |00:00:00.01 |
|   1 |  NESTED LOOPS SEMI           |                         |      1 |      3 |      1 |00:00:00.01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1                      |      1 |      3 |      6 |00:00:00.01 |
|*  3 |    INDEX RANGE SCAN          | IDX_T1_NAME_VALUE       |      1 |      3 |      6 |00:00:00.01 |
|*  4 |   TABLE ACCESS BY INDEX ROWID| T1                      |      6 |    832K|      1 |00:00:00.01 |
|*  5 |    INDEX UNIQUE SCAN         | T1_PK                   |      6 |      1 |      6 |00:00:00.01 |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."T1_COL_NAME"='GroupID' AND "T1"."T1_COL_VALUE"='6276931')
   4 - filter("T2"."T1_COL_VALUE"='TrOrderPlace')
   5 - access("T1"."TR_ID"="T2"."TR_ID" AND
              "T2"."T1_COL_NAME"='TrRangeOrder')

SQL> select
       sql_id
      ,child_number
      ,use_feedback_stats
    from
      v$sql_shared_cursor
    where
      sql_id = 'dn6p58b9b6348';
   
SQL_ID        CHILD_NUMBER U
------------- ------------ -
dn6p58b9b6348            0 N

This time, for E-Rows = 3 and A-Rows = 6, Oracle decided that there is no significant difference between the cardinality estimate and the actual rows, so the cursor is no longer subject to cardinality feedback optimization.

You might have noticed that I forced the extended statistics column to have a histogram; without it, cardinality feedback kicks in again. In fact, I conducted several experiments to see when cardinality feedback occurs and when it does not, depending on the existence or absence of the column group extension, the type of its statistics, and the statistics gathered on the two underlying predicate columns:

[cardinality feedback experiment matrix]

August 12, 2015

Adaptive Cursor Sharing triggering mechanism

Filed under: cursor sharing — hourim @ 10:08 pm

Inspired by Dominic Brooks' last post on SQL Plan Management choices, I decided to do the same exercise and summarise my thoughts on the Adaptive and Extended Cursor Sharing triggering mechanism:

[ACS triggering diagram]

Once a cursor is bind aware, and therefore subject to a possible plan re-optimisation at each execution, keep a careful eye on the number of child cursors the Extended Cursor Sharing layer is going to produce.
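One way of keeping that eye on things is a sketch like the following, using the usual ACS flag columns of v$sql; a bind-aware cursor spawning many shareable children is the pattern to watch for:

```sql
-- List the child cursors of a statement together with their
-- Adaptive Cursor Sharing flags.
SELECT sql_id
      ,child_number
      ,is_bind_sensitive
      ,is_bind_aware
      ,is_shareable
      ,executions
FROM   v$sql
WHERE  sql_id = '&sql_id'
ORDER  BY child_number;
```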

August 5, 2015

Flash back causing library cache: mutex X

Filed under: Oracle — hourim @ 5:28 pm

Recently one of our applications suffered from a severe performance issue. The application runs on a database (11.2.0.4.0) used to validate a pre-production release, and this performance issue delayed the test campaign and the validation process by more than 3 days. The ASH data taken during the affected period shows this:

SQL> select event, count(1)
    from gv$active_session_history
    where sample_time between to_date('15072015 16:00:00', 'ddmmyyyy hh24:mi:ss')
                      and     to_date('15072015 16:30:00', 'ddmmyyyy hh24:mi:ss')
    group by event
    order by 2 desc;

EVENT                                COUNT(1)
----------------------------------- ----------
library cache: mutex X                    3928
kksfbc child completion                    655
cursor: pin S wait on X                    580
PX Deq: Slave Session Stats                278
                                           136
db file sequential read                    112
cursor: pin S                               35
null event                                  26
latch: shared pool                          15
cursor: mutex S                             13
library cache lock                          11
read by other session                       10
log file parallel write                      5
PX Deq: Signal ACK EXT                       3
os thread startup                            3
log file sync                                2
latch free                                   1
db file parallel write                       1
SQL*Net more data from client                1
enq: PS - contention                         1
cursor: mutex X                              1
direct path read                             1
control file sequential read                 1
CSS operation: action                        1

As you can notice, the dominant wait event is:

EVENT                          COUNT(1)
------------------------------ -------
library cache: mutex X         3928

The library cache: mutex X wait event is a concurrency wait event, one of a family of six mutex-related wait events:

	cursor: pin S
	cursor: pin X
	cursor: pin S wait on X
	cursor: mutex S
	cursor: mutex X
	library cache: mutex X

Mutexes are similar to locks except that they protect objects in shared memory rather than rows in tables and indexes. Whenever a session wants to read or modify a library cache object (generally a cursor), it needs to pin that object and acquire a mutex on it. If another session simultaneously wants the same piece of memory, it too will try to acquire the mutex, and may then wait on one of those library or cursor mutex wait events because the first session has not yet released the mutex.
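To see which mutex type and code location are accumulating those waits, something like the following sketch can help (v$mutex_sleep aggregates sleeps per mutex type and location since instance startup):

```sql
-- Which mutex type / code location slept the most?
SELECT mutex_type
      ,location
      ,sleeps
      ,wait_time
FROM   v$mutex_sleep
ORDER  BY sleeps DESC;
```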

So, back to my actual case: what got so out of hand that library cache: mutex X waits made the database unusable?

SQL> select
       sql_id
      ,sql_child_number
      ,session_id
      ,in_parse
      ,in_sql_execution
    from
      gv$active_session_history
    where sample_time between to_date('15072015 16:00:00', 'ddmmyyyy hh24:mi:ss')
                      and     to_date('15072015 16:30:00', 'ddmmyyyy hh24:mi:ss')
    and event = 'library cache: mutex X'
    order by sql_id; 

SQL_ID        SQL_CHILD_NUMBER SESSION_ID IN_PARSE   IN_SQL_EXECUTION
------------- ---------------- ---------- ---------- ----------------
20f2kut7fg4g0               -1          8 Y          N
20f2kut7fg4g0               -1         20 Y          N
20f2kut7fg4g0                1         24 N          Y
20f2kut7fg4g0               -1         40 Y          N
20f2kut7fg4g0               -1         60 Y          N
20f2kut7fg4g0               -1         88 Y          N
20f2kut7fg4g0                1         89 N          Y
20f2kut7fg4g0                1         92 N          Y
20f2kut7fg4g0               -1        105 Y          N
20f2kut7fg4g0               -1        106 Y          N
20f2kut7fg4g0                1        109 N          Y
20f2kut7fg4g0               -1        124 Y          N
20f2kut7fg4g0               -1        128 Y          N
20f2kut7fg4g0               -1        143 Y          N
20f2kut7fg4g0               -1        157 Y          N
20f2kut7fg4g0                1        159 N          Y
20f2kut7fg4g0               -1        160 Y          N
20f2kut7fg4g0               -1        161 Y          N
20f2kut7fg4g0                1        172 N          Y
20f2kut7fg4g0                1        178 N          Y
20f2kut7fg4g0               -1        191 Y          N
20f2kut7fg4g0               -1        192 Y          N
20f2kut7fg4g0               -1        194 Y          N
20f2kut7fg4g0                1        209 N          Y
20f2kut7fg4g0               -1        223 Y          N
20f2kut7fg4g0                1        229 N          Y
20f2kut7fg4g0               -1        241 Y          N
20f2kut7fg4g0               -1        246 Y          N
20f2kut7fg4g0               -1        258 Y          N
20f2kut7fg4g0                1        259 N          Y
20f2kut7fg4g0                1        280 N          Y
20f2kut7fg4g0                1        294 N          Y
20f2kut7fg4g0               -1        309 Y          N
20f2kut7fg4g0               -1        310 Y          N
20f2kut7fg4g0               -1        328 Y          N
20f2kut7fg4g0                1        348 N          Y
20f2kut7fg4g0               -1        382 Y          N
20f2kut7fg4g0               -1        413 Y          N
20f2kut7fg4g0               -1        415 Y          N
20f2kut7fg4g0                1        428 N          Y
20f2kut7fg4g0               -1        449 Y          N
20f2kut7fg4g0               -1        450 Y          N
20f2kut7fg4g0               -1        462 Y          N
20f2kut7fg4g0                1        467 N          Y
20f2kut7fg4g0               -1        480 Y          N
20f2kut7fg4g0               -1        484 Y          N
20f2kut7fg4g0                1        516 N          Y
20f2kut7fg4g0               -1        533 Y          N
20f2kut7fg4g0                1        535 N          Y
20f2kut7fg4g0               -1        546 Y          N
20f2kut7fg4g0               -1        565 Y          N
20f2kut7fg4g0                1        568 N          Y
20f2kut7fg4g0               -1        584 Y          N
20f2kut7fg4g0               -1        585 Y          N
20f2kut7fg4g0               -1        601 Y          N
20f2kut7fg4g0               -1        602 Y          N
20f2kut7fg4g0               -1        615 Y          N
20f2kut7fg4g0               -1        619 Y          N
20f2kut7fg4g0               -1        635 Y          N
20f2kut7fg4g0               -1        652 Y          N
20f2kut7fg4g0               -1        667 Y          N
20f2kut7fg4g0               -1        668 Y          N
20f2kut7fg4g0               -1        687 Y          N
20f2kut7fg4g0               -1        705 Y          N
20f2kut7fg4g0               -1        717 Y          N
20f2kut7fg4g0               -1        721 Y          N
20f2kut7fg4g0               -1        733 Y          N
20f2kut7fg4g0               -1        735 Y          N
20f2kut7fg4g0               -1        753 Y          N
20f2kut7fg4g0               -1        754 Y          N
20f2kut7fg4g0                1        770 N          Y
20f2kut7fg4g0               -1        773 Y          N
20f2kut7fg4g0               -1        785 Y          N
20f2kut7fg4g0               -1        786 Y          N
20f2kut7fg4g0               -1        804 Y          N

75 rows selected.

I have limited the output to just one sql_id (20f2kut7fg4g0) in order to keep the explanation clear and simple.

What does this particular sql_id represent, and why is it executed by 75 different sessions that are sometimes parsing and sometimes executing?

SQL> with got_my_sql_id
    as ( select sql_id, count(1)
    from gv$active_session_history
    where sample_time between to_date('16072015 09:30:00', 'ddmmyyyy hh24:mi:ss')
                      and     to_date('16072015 10:30:00', 'ddmmyyyy hh24:mi:ss')
    and event  = 'library cache: mutex X'
    group by sql_id)
    select distinct sql_id, sql_text
    from v$sql b
   where exists (select null
                 from got_my_sql_id a
                 where a.sql_id = b.sql_id)
   order by sql_id; 

SQL_ID        SQL_TEXT
------------ ----------------------------------------------------------------------
20f2kut7fg4g0 /* Flashback Table */ INSERT /*+ PARALLEL(S, DEFAULT) PARALLEL(T,
                 DEFAULT) */ INTO "DEV_ZXX"."CLOSED_DAY" SELECT
              /*+ USE_NL(S) ORDERED PARALLEL(S, DEFAULT) PARALLEL(T, DEFAULT) */
              S.* FROM SYS_TEMP_FBT T , "DEV_ZXX"."CLOSED_DAY"
              as of SCN :1 S WHERE T.rid = S.
              rowid and T.action = 'I' and T.object# = :2

The above piece of SQL code is generated by Oracle behind the scenes when flashing back the content of a given table. And this is exactly what this client was doing: at the end of their pre-production test campaign they flash back a certain number of tables to the data those tables contained at the beginning of the test. And since the generated code requests a parallel run at the default degree, it produced this kind of monitored execution plan:


Parallel Execution Details (DOP=96 , Servers Allocated=96) 

SQL Plan Monitoring Details (Plan Hash Value=4258977226)
===============================================================================
| Id |        Operation        |      Name       |  Rows   | Execs |   Rows   |
|    |                         |                 | (Estim) |       | (Actual) |
===============================================================================
|  0 | INSERT STATEMENT        |                 |         |     1 |          |
|  1 |   LOAD AS SELECT        |                 |         |     1 |          |
|  2 |    PX COORDINATOR       |                 |         |    82 |          |
|  3 |     PX SEND QC (RANDOM) | :TQ10000        |     322 |    81 |          |
|  4 |      PX BLOCK ITERATOR  |                 |     322 |    81 |          |
|  5 |       TABLE ACCESS FULL | TABLE_RULE_SUP  |     322 |     4 |          |
===============================================================================
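For context, here is a hedged sketch of the kind of client-side statement that makes Oracle generate the recursive parallel insert above; the SCN is purely illustrative, and FLASHBACK TABLE additionally requires row movement to be enabled on the table:

```sql
-- Hypothetical illustration of the triggering statement
ALTER TABLE dev_zxx.closed_day ENABLE ROW MOVEMENT;
FLASHBACK TABLE dev_zxx.closed_day TO SCN 123456789;
```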

For every flashed back table, Oracle started 96 parallel servers (96 sessions) just to run a simple insert statement, causing the observed library cache: mutex X wait event. The DOP of 96 is the maximum DOP, which is in fact the default DOP determined by the following simplified formula:

DOP = PARALLEL_THREADS_PER_CPU x CPU_COUNT
DOP = 2 x 48 = 96
SQL> show parameter parallel

NAME                               TYPE        VALUE
---------------------------------- ----------- ----------
fast_start_parallel_rollback       string      LOW
parallel_adaptive_multi_user       boolean     TRUE
parallel_automatic_tuning          boolean     FALSE
parallel_degree_limit              string      CPU
parallel_degree_policy             string      MANUAL
parallel_execution_message_size    integer     16384
parallel_force_local               boolean     FALSE
parallel_instance_group            string
parallel_io_cap_enabled            boolean     FALSE
parallel_max_servers               integer     100
parallel_min_percent               integer     0
parallel_min_servers               integer     0
parallel_min_time_threshold        string      AUTO
parallel_server                    boolean     FALSE
parallel_server_instances          integer     1
parallel_servers_target            integer     100
_parallel_syspls_obey_force        boolean     TRUE
parallel_threads_per_cpu           integer     2
recovery_parallelism               integer     0
SQL> show parameter cpu

NAME                           TYPE        VALUE
------------------------------ ----------- -----
cpu_count                       integer     48
parallel_threads_per_cpu        integer     2
resource_manager_cpu_allocation integer     48

Having no way to hint the internal flashback code so that it would not execute in parallel, all I was left with was to prevent Oracle from starting a huge number of parallel processes by limiting the parallel_max_servers parameter to 8; as such, the maximum DOP is bounded at 8 whatever the cpu_count is.
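The change itself is a one-liner; a sketch (whether it needs a scope or sid clause, or a restart, depends on the environment and on whether an spfile is in use):

```sql
-- Sketch: cap the number of parallel servers instance-wide,
-- which in turn caps the effective default DOP
ALTER SYSTEM SET parallel_max_servers = 8;
```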

Once this was done I observed the following new situation for one flashed back sql_id (a5u912v53t11t):


Global Information
------------------------------
 Status              :  DONE
 Instance ID         :  1
 Session             :  XXXXX (172:40851)
 SQL ID              :  a5u912v53t11t
 SQL Execution ID    :  16777236
 Execution Started   :  07/16/2015 11:21:55
 First Refresh Time  :  07/16/2015 11:21:55
 Last Refresh Time   :  07/16/2015 11:21:55
 Duration            :  .011388s
 Module/Action       :  JDBC Thin Client/-
 Service             :  SYS$USERS
 Program             :  JDBC Thin Client
 DOP Downgrade       :  92%                       

Global Stats
=======================================================
| Elapsed |   Cpu   | Concurrency |  Other   | Buffer |
| Time(s) | Time(s) |  Waits(s)   | Waits(s) |  Gets  |
=======================================================
|    0.05 |    0.00 |        0.04 |     0.00 |     19 |
=======================================================

Parallel Execution Details (DOP=8 , Servers Requested=96 , Servers Allocated=8)
==============================================================================
|      Name      | Type  | Server# | Elapsed |   Cpu   | Concurrency |Buffer |
|                |       |         | Time(s) | Time(s) |  Waits(s)   | Gets  |
==============================================================================
| PX Coordinator | QC    |         |    0.00 |    0.00 |             |     4 |
| p000           | Set 1 |       1 |    0.01 |         |        0.01 |     3 |
| p001           | Set 1 |       2 |    0.01 |         |        0.01 |     3 |
| p002           | Set 1 |       3 |    0.00 |         |        0.00 |     3 |
| p003           | Set 1 |       4 |    0.00 |         |        0.00 |     3 |
| p004           | Set 1 |       5 |    0.00 |    0.00 |        0.00 |     3 |
| p005           | Set 1 |       6 |    0.01 |         |        0.01 |       |
| p006           | Set 1 |       7 |    0.00 |         |        0.00 |       |
| p007           | Set 1 |       8 |    0.01 |         |        0.01 |       |
==============================================================================

SQL Plan Monitoring Details (Plan Hash Value=96405358)
============================================================
| Id |        Operation        |   Name   |  Rows   | Cost |
|    |                         |          | (Estim) |      |
============================================================
|  0 | INSERT STATEMENT        |          |         |      |
|  1 |   LOAD AS SELECT        |          |         |      |
|  2 |    PX COORDINATOR       |          |         |      |
|  3 |     PX SEND QC (RANDOM) | :TQ10000 |     409 |    2 |
|  4 |      PX BLOCK ITERATOR  |          |     409 |    2 |
|  5 |       TABLE ACCESS FULL | TABLE_CLS|     409 |    2 |
============================================================

Notice how Oracle serviced the insert statement with 8 parallel servers instead of the 96 requested. This is a clear demonstration of how to bound the default DOP:

Parallel Execution Details (DOP=8, Servers Requested=96, Servers Allocated=8)
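Such downgrades can also be tracked instance-wide: v$sysstat maintains cumulative counters of downgraded parallel operations. A hedged sketch (statistic names can vary slightly between versions):

```sql
-- Sketch: cumulative parallel-operation downgrade statistics
-- since instance startup
SELECT name, value
FROM   v$sysstat
WHERE  name LIKE 'Parallel operations%downgraded%'
ORDER  BY name;
```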

Unfortunately, despite this implicit limitation of the parallel run, the application was still suffering from the same library cache symptoms (less severe than before, though) as shown below:

SQL> select event, count(1)
    from gv$active_session_history
    where sample_time between to_date('16072015 10:57:00', 'ddmmyyyy hh24:mi:ss')
                      and     to_date('16072015 11:30:00', 'ddmmyyyy hh24:mi:ss')
    group by event
    order by 2 desc;

EVENT                                  COUNT(1)
------------------------------------ ----------
library cache: mutex X                      518
                                            382
db file sequential read                     269
read by other session                        42
kksfbc child completion                      37
null event                                   31
log file parallel write                      18
cursor: pin S wait on X                      12
latch: shared pool                            7
cursor: pin S                                 7
log file sync                                 5
latch free                                    5
enq: RO - fast object reuse                   3
SQL*Net more data from client                 2
db file parallel write                        2
enq: CR - block range reuse ckpt              1
os thread startup                             1
SQL> select sql_id, session_id,in_parse, in_sql_execution
    from gv$active_session_history
    where sample_time between to_date('16072015 10:57:00', 'ddmmyyyy hh24:mi:ss')
                      and     to_date('16072015 11:30:00', 'ddmmyyyy hh24:mi:ss')
    and event = 'library cache: mutex X'
    order by sql_id;

SQL_ID        SESSION_ID IN_PARSE IN_SQL_EXECUTION
------------- ---------- -------- ----------------
a5u912v53t11t        516 Y        N
a5u912v53t11t        494 Y        N
a5u912v53t11t        343 Y        N
a5u912v53t11t        482 Y        N

Finally we agreed with the client to disable parallelism (by setting the parallel_max_servers parameter to 1) so that the flashback treatment would go serially:

SQL> show parameter parallel_max_servers

NAME                           TYPE        VALUE
------------------------------ ----------- -----
parallel_max_servers           integer     1

Once this was done, the test campaign finally started to perform very quickly, with the following picture from ASH:

SQL> select event, count(1)
    from gv$active_session_history
    where sample_time between to_date('16072015 14:15:00', 'ddmmyyyy hh24:mi:ss')
                      and     to_date('16072015 15:30:00', 'ddmmyyyy hh24:mi:ss')
    group by event
    order by 2 desc;

EVENT                                  COUNT(1)
------------------------------------- ----------
                                             966
db file sequential read                      375
db file scattered read                        49
log file parallel write                       46
log file sync                                 22
db file parallel write                        13
null event                                     8
local write wait                               7
SQL*Net more data from client                  5
os thread startup                              3
reliable message                               3
enq: PS - contention                           3
enq: RO - fast object reuse                    3
cursor: pin S wait on X                        1
direct path read                               1
Disk file operations I/O                       1
enq: CR - block range reuse ckpt               1
enq: TX - row lock contention                  1

The flashback treatment ceased completely to run in parallel, and the test campaign started to perform quickly again.

This is not an invitation to apply such a drastic and brutal workaround to reduce the effect of the many sessions woken up by a very high degree of parallelism, itself due to the default maximum DOP. It is rather a demonstration of:

  • how a high degree of parallelism can affect the locking in the library cache
  • how the parallel_max_servers parameter can bound the DOP of your query

August 4, 2015

Degree of Parallelism is 16 because of table property

Filed under: Oracle — hourim @ 10:11 am

I have been pleasantly surprised by the following Note at the bottom of an execution plan coming from a 12.1.0.2.0 Oracle instance


SQL> select * from v$version;

BANNER                                                                               CON_ID
-------------------------------------------------------------------------------- ----------
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production              0
PL/SQL Release 12.1.0.2.0 - Production                                                    0
CORE    12.1.0.2.0      Production                                                        0
TNS for Linux: Version 12.1.0.2.0 - Production                                            0
NLSRTL Version 12.1.0.2.0 - Production                                                    0


SQL> create table t_par as select rownum n1, trunc((rownum -1/3)) n2, mod(rownum, 5) n3
    from dual
    connect by level<=1e6;
  
SQL> create index t_part_idx on t_par(n1);

Index created.
 
SQL> alter table t_par parallel 16;

Table altered.  

SQL> select count(1) from t_par where n1> 1;

  COUNT(1)
----------
    999999

SQL> select * from table(dbms_xplan.display_cursor);
----------------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |          |       |       |    48 (100)|          |        |      |            |
|   1 |  SORT AGGREGATE        |          |     1 |     5 |            |          |        |      |            |
|   2 |   PX COORDINATOR       |          |       |       |            |          |        |      |            |
|   3 |    PX SEND QC (RANDOM) | :TQ10000 |     1 |     5 |            |          |  Q1,00 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE     |          |     1 |     5 |            |          |  Q1,00 | PCWP |            |
|   5 |      PX BLOCK ITERATOR |          |   999K|  4882K|    48   (3)| 00:00:01 |  Q1,00 | PCWC |            |
|*  6 |       TABLE ACCESS FULL| T_PAR    |   999K|  4882K|    48   (3)| 00:00:01 |  Q1,00 | PCWP |            |
----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   6 - access(:Z>=:Z AND :Z<=:Z)
       filter("N1">1)

Note
-----
   - Degree of Parallelism is 16 because of table property

As you can see, thanks to the above Note we immediately know that the Optimizer decided to run the query in parallel because the T_PAR table has been decorated with a DOP of 16:

SQL> select table_name,degree
  2  from user_tables
  3  where table_name = 'T_PAR';

TABLE_NAME    DEGREE
------------ -------
T_PAR        16 

A nice 12c addition.

A couple of months ago, a query running on 11.2.0.3 which used to run very quickly suddenly started deviating dangerously from its habitual execution time. The end user told me that they hadn't changed anything and asked me to investigate the root cause of this performance degradation. The corresponding SQL real-time monitoring report looked like this:

Global Stats
======================================================================================
| Elapsed |   Cpu   |    IO    | Concurrency | Buffer | Read | Read  | Write | Write |
| Time(s) | Time(s) | Waits(s) |  Waits(s)   |  Gets  | Reqs | Bytes | Reqs  | Bytes |
======================================================================================
|     799 |     443 |      356 |        0.01 |     3M | 398K |  11GB |  122K |  24GB |
======================================================================================

Parallel Execution Details (DOP=4 , Servers Allocated=8)
SQL Plan Monitoring Details (Plan Hash Value=637438362)
========================================================================================================================
| Id    |                 Operation                  |       Name        |  Rows   |Execs |   Rows   | Temp | Activity |
|       |                                            |                   | (Estim) |      | (Actual) |      |   (%)    |
========================================================================================================================
|     0 | SELECT STATEMENT                           |                   |         |    9 |        0 |      |     0.13 |
|     1 |   PX COORDINATOR                           |                   |         |    9 |          |      |          |
|     2 |    PX SEND QC (RANDOM)                     | :TQ10003          |     19M |    4 |          |      |          |
|     3 |     HASH JOIN RIGHT SEMI                   |                   |     19M |    4 |        0 |      |          |
|     4 |      PX RECEIVE                            |                   |      3M |    4 |     1853 |      |          |
|     5 |       PX SEND HASH                         | :TQ10002          |      3M |    4 |     1853 |      |          |
|     6 |        VIEW                                | VW_NSO_1          |      3M |    4 |     1853 |      |          |
|     7 |         FILTER                             |                   |         |    4 |     1853 |      |          |
|     8 |          NESTED LOOPS                      |                   |      3M |    4 |     1853 |      |          |
|     9 |           BUFFER SORT                      |                   |         |    4 |       38 |      |          |
|    10 |            PX RECEIVE                      |                   |         |    4 |       38 |      |          |
|    11 |             PX SEND ROUND-ROBIN            | :TQ10000          |         |    1 |       38 |      |          |
|    12 |              HASH JOIN                     |                   |   69556 |    1 |       38 |      |          |
|    13 |               INLIST ITERATOR              |                   |         |    1 |     6258 |      |          |
|    14 |                TABLE ACCESS BY INDEX ROWID | TAB_001X          |   69556 |  840 |     6258 |      |          |
|    15 |                 INDEX RANGE SCAN           | IDX_TAB_001X25    |   69556 |  840 |     6258 |      |          |
|    16 |               INDEX FAST FULL SCAN         | PK_TAB_00X13      |     18M |    1 |      19M |      |     0.27 |
|    17 |           INDEX RANGE SCAN                 | PK_IDX_MAIN_TAB   |      36 |   38 |     1853 |      |          |
| -> 18 |      BUFFER SORT                           |                   |         |    4 |        0 |  26G |    34.18 |
| -> 19 |       PX RECEIVE                           |                   |    648M |    4 |     566M |      |     4.14 |
| -> 20 |        PX SEND HASH                        | :TQ10001          |    648M |    1 |     566M |      |    13.89 |
| -> 21 |         TABLE ACCESS FULL                  | MAIN_TABLE_001    |    648M |    1 |     566M |      |    47.40 |
========================================================================================================================

The BUFFER SORT operation at line 18 was killing the performance of this query, since it was buffering 566M rows and spilling 26GB to temp.

Looking back at the previous execution plans shows that they were serial plans! What made this new plan run in parallel? I was practically sure where this was coming from. I know that this application rebuilds indexes from time to time, and that very often a parallel rebuild is used to accelerate the operation. But I also know that, very often, DBAs forget to set the indexes back to their default degree at the end of the rebuild process. Indeed, the primary key index PK_IDX_MAIN_TAB was at a DOP of 4 while it shouldn't have been. Setting this index back to degree 1 restored the serial execution plan the underlying query used to follow in the past:

Global Stats
=================================================
| Elapsed |   Cpu   |  Other   | Fetch | Buffer |
| Time(s) | Time(s) | Waits(s) | Calls |  Gets  |
=================================================
|      43 |      43 |     0.02 |    11 |     4M |
=================================================

SQL Plan Monitoring Details (Plan Hash Value=1734192894)
============================================================================================
| Id |              Operation              |       Name        |  Rows   |Execs |   Rows   |
|    |                                     |                   | (Estim) |      | (Actual) |
============================================================================================
|  0 | SELECT STATEMENT                    |                   |         |    1 |      108 |
|  1 |   HASH JOIN RIGHT SEMI              |                   |     19M |    1 |      108 |
|  2 |    VIEW                             | VW_NSO_1          |    701K |    1 |      108 |
|  3 |     FILTER                          |                   |         |    1 |      108 |
|  4 |      NESTED LOOPS                   |                   |    701K |    1 |      108 |
|  5 |       HASH JOIN                     |                   |   19387 |    1 |        3 |
|  6 |        INLIST ITERATOR              |                   |         |    1 |        3 |
|  7 |         TABLE ACCESS BY INDEX ROWID | TAB_001X          |   19387 |  168 |        3 |
|  8 |          INDEX RANGE SCAN           | IDX_TAB_001X25    |   19387 |  168 |        3 |
|  9 |        INDEX FAST FULL SCAN         | PK_TAB_00X13      |     18M |    1 |      19M |
| 10 |       INDEX RANGE SCAN              | PK_IDX_MAIN_TAB   |      36 |    3 |      108 |
| 11 |    TABLE ACCESS FULL                | MAIN_TABLE_001    |    648M |    1 |     677M |
============================================================================================ 
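The fix applied above boils down to a single statement; a hedged sketch:

```sql
-- Reset the degree decoration left over from a parallel rebuild
ALTER INDEX pk_idx_main_tab NOPARALLEL;
-- equivalently: ALTER INDEX pk_idx_main_tab PARALLEL 1;
```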

In this context of rebuilt indexes left at a DOP > 1, and given the nice 12c Note about the reason why Oracle decided on a parallel run, I was curious to know whether the 12c Note would show the same information when the parallel plan is due to an index having a DOP > 1:

SQL> alter table t_par noparallel;

SQL> alter index T_PART_IDX parallel 16;

SQL> select count(1) from t_par where n1> 1;

  COUNT(1)
----------
    999999

SQL> select * from table(dbms_xplan.display_cursor);

SQL_ID  4s7n5z52gun33, child number 0
-------------------------------------
---------------------------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name       | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
---------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |            |       |       |   610 (100)|          |        |      |            |
|   1 |  SORT AGGREGATE           |            |     1 |     5 |            |          |        |      |            |
|   2 |   PX COORDINATOR          |            |       |       |            |          |        |      |            |
|   3 |    PX SEND QC (RANDOM)    | :TQ10000   |     1 |     5 |            |          |  Q1,00 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE        |            |     1 |     5 |            |          |  Q1,00 | PCWP |            |
|   5 |      PX BLOCK ITERATOR    |            |   999K|  4882K|   610   (1)| 00:00:01 |  Q1,00 | PCWC |            |
|*  6 |       INDEX FAST FULL SCAN| T_PART_IDX |   999K|  4882K|   610   (1)| 00:00:01 |  Q1,00 | PCWP |            |
---------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   6 - access(:Z>=:Z AND :Z<=:Z)
       filter("N1">1)

Unfortunately there is no Note indicating that the above parallel execution plan is due to the parallel degree of the index T_PART_IDX.
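Since no Note will betray a forgotten parallel index decoration, a periodic dictionary check is the safer route; a hedged sketch (degree is stored as a string and may be blank-padded in some versions, hence the TRIM):

```sql
-- Sketch: list indexes still decorated with a parallel degree
SELECT index_name, degree
FROM   user_indexes
WHERE  TRIM(degree) NOT IN ('1', 'DEFAULT');
```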

July 9, 2015

Stressed ASH

Filed under: Oracle — hourim @ 5:29 pm

It is well known that any record found in dba_hist_active_sess_history has inevitably been routed there from v$active_session_history. If so, then how should we interpret the following cut & paste from a running production system?

ASH first

SQL> select event, count(1)
    from gv$active_session_history
    where sample_time between to_date('06072015 18:30:00', 'ddmmyyyy hh24:mi:ss')
                      and     to_date('06072015 19:30:00', 'ddmmyyyy hh24:mi:ss')
    group by event
    order by 2 desc;

EVENT                                                              COUNT(1)
---------------------------------------------------------------- ----------
                                                                        372
direct path read                                                        185
log file parallel write                                                  94
Disk file Mirror Read                                                    22
control file sequential read                                             20
control file parallel write                                              18
direct path write temp                                                   16
Streams AQ: qmn coordinator waiting for slave to start                   12
db file parallel read                                                    11
gc cr multi block request                                                 6
enq: KO - fast object checkpoint                                          4
db file sequential read                                                   3
ges inquiry response                                                      3
os thread startup                                                         2
PX Deq: Signal ACK RSG                                                    2
enq: CF - contention                                                      1
PX Deq: Slave Session Stats                                               1
Disk file operations I/O                                                  1
IPC send completion sync                                                  1
reliable message                                                          1
null event                                                                1
enq: CO - master slave det                                                1
db file parallel write                                                    1
gc current block 2-way                                                    1

AWR next

SQL> select event, count(1)
    from dba_hist_active_sess_history
    where sample_time between to_date('06072015 18:30:00', 'ddmmyyyy hh24:mi:ss')
                      and     to_date('06072015 19:30:00', 'ddmmyyyy hh24:mi:ss')
    group by event
    order by 2 desc;

EVENT                                                              COUNT(1)
---------------------------------------------------------------- ----------
SQL*Net break/reset to client                                         12950
enq: TM - contention                                                  12712
                                                                        624
db file sequential read                                                 386
enq: TX - row lock contention                                           259
SQL*Net message from dblink                                              74
direct path read                                                         62
SQL*Net more data from dblink                                            27
log file parallel write                                                  26
log file sync                                                            15
SQL*Net more data from client                                             9
control file sequential read                                              7
Disk file Mirror Read                                                     6
gc cr grant 2-way                                                         5
db file parallel write                                                    4
read by other session                                                     3
control file parallel write                                               3
Streams AQ: qmn coordinator waiting for slave to start                    3
log file sequential read                                                  2
direct path read temp                                                     2
enq: KO - fast object checkpoint                                          2
gc cr multi block request                                                 1
CSS initialization                                                        1
gc current block 2-way                                                    1
reliable message                                                          1
db file parallel read                                                     1
gc buffer busy acquire                                                    1
ges inquiry response                                                      1
direct path write temp                                                    1
rdbms ipc message                                                         1
os thread startup                                                         1

12,950 samples of SQL*Net break/reset to client and 12,712 samples of enq: TM - contention wait events appear in AWR but are nowhere to be found in ASH. How can we interpret this situation?

This 11.2.0.4.0 database runs on a RAC infrastructure with 2 instances. Let's look at the ASH of the two instances separately.

Instance 1 first

SQL> select event, count(1)
    from v$active_session_history
    where sample_time between to_date('06072015 18:30:00', 'ddmmyyyy hh24:mi:ss')
                      and     to_date('06072015 19:30:00', 'ddmmyyyy hh24:mi:ss')
    group by event
    order by 2 desc;

 no rows selected

Instance 2 next

SQL> select event, count(1)
    from v$active_session_history
    where sample_time between to_date('06072015 18:30:00', 'ddmmyyyy hh24:mi:ss')
                      and     to_date('06072015 19:30:00', 'ddmmyyyy hh24:mi:ss')
    group by event
    order by 2 desc;

EVENT                                                              COUNT(1)
---------------------------------------------------------------- ----------
                                                                        372
direct path read                                                        185
log file parallel write                                                  94
Disk file Mirror Read                                                    22
control file sequential read                                             20
control file parallel write                                              18
direct path write temp                                                   16
Streams AQ: qmn coordinator waiting for slave to start                   12
db file parallel read                                                    11
gc cr multi block request                                                 6
enq: KO - fast object checkpoint                                          4
db file sequential read                                                   3
ges inquiry response                                                      3
os thread startup                                                         2
PX Deq: Signal ACK RSG                                                    2
enq: CF - contention                                                      1
PX Deq: Slave Session Stats                                               1
Disk file operations I/O                                                  1
IPC send completion sync                                                  1
reliable message                                                          1
null event                                                                1
enq: CO - master slave det                                                1
db file parallel write                                                    1
gc current block 2-way                                                    1

Everything sampled in ASH during that specific time interval comes from the second instance, while the first instance reports no records at all for the corresponding interval. This inevitably calls into question either the ASH buffer size of instance 1 or an imbalanced workload between the two instances:

ASH size first

SQL> select
  2        inst_id
  3        ,total_size
  4      from gv$ash_info;

   INST_ID TOTAL_SIZE
---------- ----------
         1  100663296
         2  100663296

ASH Activity next

SQL> select
        inst_id
       ,total_size
       ,awr_flush_emergency_count
     from gv$ash_info;

   INST_ID TOTAL_SIZE AWR_FLUSH_EMERGENCY_COUNT
---------- ---------- -------------------------
         1  100663296                       136
         2  100663296                         0

In effect the activity is mainly directed at instance 1, and the abnormal and unusual 12,950 SQL*Net break/reset to client wait events have inflated the rate of inserts into the ASH buffer of instance 1, generating the 136 awr_flush_emergency_count and, as such, the discrepancy between ASH and AWR.
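
As a rough illustration of why a wait-event storm empties ASH so quickly, here is a toy retention model (the per-row size and session counts below are invented figures, not Oracle internals): ASH samples every active session once per second into a fixed circular buffer, so retention shrinks in proportion to the number of active sessions.

```python
# Toy model of ASH circular-buffer retention (illustrative only:
# row_bytes and the session counts are assumptions, not Oracle internals)
def ash_retention_seconds(buffer_bytes, row_bytes, active_sessions):
    # ASH stores ~1 row per active session per second, so the buffer
    # covers roughly buffer_bytes / (row_bytes * active_sessions) seconds
    return buffer_bytes / (row_bytes * active_sessions)

buffer_bytes = 100663296   # total_size reported by gv$ash_info

quiet = ash_retention_seconds(buffer_bytes, 200, 5)    # normal activity
storm = ash_retention_seconds(buffer_bytes, 200, 150)  # break/reset storm

print(round(quiet / 3600, 1), "h of retention vs", round(storm / 3600, 1), "h")
```

When the buffer can no longer keep up, ASH data is flushed to AWR ahead of schedule, which is what awr_flush_emergency_count records.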

This is also confirmed by the difference in the ASH retention period between the two instances

Instance 1 first, where barely 3 hours of ASH data are kept

SQL> select min(sample_time), max(sample_time)
  2  from v$active_session_history;

MIN(SAMPLE_TIME)                         MAX(SAMPLE_TIME)
---------------------------------------  -------------------------
08-JUL-15 05.51.20.502 AM                08-JUL-15 08.35.48.233 AM

Instance 2 next, where several days' worth of ASH data are still present

SQL> select min(sample_time), max(sample_time)
  2  from v$active_session_history;

MIN(SAMPLE_TIME)                         MAX(SAMPLE_TIME)
---------------------------------------  -------------------------
25-JUN-15 20.01.43                       08-JUL-15 08.37.17.233 AM
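
The gap is easy to quantify with simple date arithmetic on the MIN/MAX sample times above (fractional seconds dropped for brevity):

```python
from datetime import datetime, timedelta

fmt12 = "%d-%b-%y %I.%M.%S %p"   # instance 1 timestamps carry AM/PM
fmt24 = "%d-%b-%y %H.%M.%S"      # instance 2's MIN came out in 24h format

# Instance 1: not even 3 hours of ASH history survive
inst1 = (datetime.strptime("08-JUL-15 08.35.48 AM", fmt12)
         - datetime.strptime("08-JUL-15 05.51.20 AM", fmt12))

# Instance 2: more than 12 days of ASH history are still there
inst2 = (datetime.strptime("08-JUL-15 08.37.17 AM", fmt12)
         - datetime.strptime("25-JUN-15 20.01.43", fmt24))

print(inst1, "vs", inst2)   # 2:44:28 vs 12 days, 12:35:34
```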

The solution would be one of the following points (listed, I think, in order of priority):

  • Solve the SQL*Net break/reset to client issue, which is dramatically filling up the ASH buffer and causing unexpectedly rapid flushes of important, more precise data
  • Balance the workload between the two instances
  • Increase the ASH buffer size of instance 1 by means of alter system set “_ash_size”=25165824;

In the next article I will explain how I identified what was causing these unusual SQL*Net break/reset to client wait events.

July 2, 2015

Don’t pre-empt the CBO from doing its work

Filed under: Oracle — hourim @ 2:03 pm

This is the last part of the parallel insert/select saga. As a reminder, here are the two preceding episodes:

  •  Part 1: where I explained why I was unable to get the corresponding SQL monitoring report because of the _sqlmon_max_planlines parameter.
  •  Part 2: where I explained the oddity shown by the SQL monitoring report when parallel servers stay inactive for more than 30 minutes.

In Part 3 I will share with you how I solved this issue and convinced people not to pre-empt the Oracle optimizer from doing its work.

Thanks to the monitoring of this insert/select I succeeded in isolating the part of the execution plan that absolutely needed tuning:

Error: ORA-12805
------------------------------
ORA-12805: parallel query server died unexpectedly

Global Information
------------------------------
 Status                                 :  DONE (ERROR)
 Instance ID                            :  2
 SQL ID                                 :  bg7h7s8sb5mnt
 SQL Execution ID                       :  33554432
 Execution Started                      :  06/24/2015 05:06:14
 First Refresh Time                     :  06/24/2015 05:06:21
 Last Refresh Time                      :  06/24/2015 09:05:10
 Duration                               :  14336s
 DOP Downgrade                          :  50%                 

Global Stats
============================================================================================
| Elapsed |   Cpu   |    IO    | Concurrency | Cluster  |  Other   | Buffer | Read | Read  |
| Time(s) | Time(s) | Waits(s) |  Waits(s)   | Waits(s) | Waits(s) |  Gets  | Reqs | Bytes |
============================================================================================
|   38403 |   35816 |     0.42 |        2581 |     0.16 |     6.09 |     7G |  103 | 824KB |
============================================================================================

SQL Plan Monitoring Details (Plan Hash Value=3668294770)
======================================================================================================
| Id  |                Operation         |             Name  |  Rows   | Execs |   Rows   | Activity |
|     |                                  |                   | (Estim) |       | (Actual) |   (%)    |
======================================================================================================
| 357 |VIEW PUSHED PREDICATE             | NAEHCE            |      59 | 23570 |    23541 |          |
| 358 | NESTED LOOPS                     |                   |      2M | 23570 |    23541 |     0.05 |
| 359 |  INDEX FAST FULL SCAN            | TABLEIND1         |   27077 | 23570 |     667M |     0.19 |
| 360 |  VIEW                            | VW_JF_SET$E6DCA8A3|       1 |  667M |    23541 |     0.10 |
| 361 |   UNION ALL PUSHED PREDICATE     |                   |         |  667M |    23541 |    30.59 |
| 362 |    NESTED LOOPS                  |                   |       1 |  667M |     1140 |     0.12 |
| 363 |     TABLE ACCESS BY INDEX ROWID  | TABLE2            |       1 |  667M |    23566 |     1.25 |
| 364 |      INDEX UNIQUE SCAN           | IDX_TABLE2        |       1 |  667M |     667M |    17.81 |
| 365 |     TABLE ACCESS BY INDEX ROWID  | TABLE3            |       1 | 23566 |     1140 |          |
| 366 |      INDEX RANGE SCAN            | IDX_TABLE3        |      40 | 23566 |     174K |          |
| 367 |    NESTED LOOPS                  |                   |       1 |  667M |    22401 |     0.11 |
| 368 |     TABLE ACCESS BY INDEX ROWID  | TABLE2            |       1 |  667M |    23566 |     1.27 |
| 369 |      INDEX UNIQUE SCAN           | IDX_TABLE2        |       1 |  667M |     667M |    17.72 |
| 370 |     TABLE ACCESS BY INDEX ROWID  | TABLE3            |       1 | 23566 |    22401 |     0.01 |
| 371 |      INDEX RANGE SCAN            | TABLE31           |      36 | 23566 |       4M |          |

The NESTED LOOPS operation at line 358 has an INDEX FAST FULL SCAN (TABLEIND1) as its outer data source, driving an inner row source represented by an internal view (VW_JF_SET$E6DCA8A3) built by Oracle on the fly. Reduced to the bare minimum it resembles this:

SQL Plan Monitoring Details (Plan Hash Value=3668294770)
=====================================================================================
| Id  |                 Operation |             Name   |  Rows   | Execs |   Rows   |
|     |                           |                    | (Estim) |       | (Actual) |
=====================================================================================
| 358 |  NESTED LOOPS             |                    |      2M | 23570 |    23541 |
| 359 |   INDEX FAST FULL SCAN    | TABLEIND1          |   27077 | 23570 |     667M |
| 360 |   VIEW                    | VW_JF_SET$E6DCA8A3 |       1 |  667M |    23541 |

Observe carefully the operation at line 359, upon which Oracle bases its join method choice. Very often a NESTED LOOPS operation is wrongly chosen by the optimizer because of inaccurate estimates at the first operation of the join. Let's check the accuracy of Oracle's estimate for the operation at line 359:

   Rows(Estim) * Execs = 27077 * 23570 = 638204890 ~ 638M
   Rows(Actual)        = 667M
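
That back-of-the-envelope check, redone in a couple of lines with the figures taken straight from the report:

```python
# Figures copied from operation 359 of the monitoring report
rows_estim, execs = 27077, 23570

estimated = rows_estim * execs   # rows the CBO expects the outer source to supply
actual = 667_000_000             # Rows (Actual) for the same operation

print(estimated)                                   # 638204890 ~ 638M
print(round(abs(estimated - actual) / actual, 3))  # 0.043, i.e. within ~5%
```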

The optimizer's estimates at this step are good. So why on earth would Oracle opt for a NESTED LOOPS operation when it knows before the execution that the outer row source will produce 667M rows, forcing the inner operations to be executed 667M times? There is no way Oracle would choose this plan unless instructed to do so. And indeed, looking at the huge insert/select statement I found, among a tremendous number of hints, a use_nl (o h) hint which dictates that the optimizer join the TABLEIND1 index with the rest of the view using a NESTED LOOPS operation. It was then a battle to convince the client to get rid of that hint. What made the client hesitate is that very often the same insert/select statement (including the use_nl hint) completes in an acceptable time. I was therefore obliged to explain why, despite the presence of the use_nl hint (which I was suggesting to be the cause of the performance degradation), the insert/select very often completes in an acceptable time. To explain this situation it suffices to get the execution plan of an acceptable execution (reduced to the bare minimum) and spot the obvious:

SQL Plan Monitoring Details (Plan Hash Value=367892000)
====================================================================================
| Id  |                Operation |             Name   |  Rows   | Execs |   Rows   |
|     |                          |                    | (Estim) |       | (Actual) |
====================================================================================
| 168 |VIEW PUSHED PREDICATE     | NAEHCE             |       1 | 35118 |    35105 |
| 169 | NESTED LOOPS             |                    |       2 | 35118 |    35105 |
| 170 |  VIEW                    | VW_JF_SET$86BE946E |       2 | 35118 |    35105 |
| 182 |  INDEX UNIQUE SCAN       | TABLEIND1          |       1 | 35105 |    35105 |

The join order switched from (TABLEIND1, VW_JF_SET$86BE946E) to (VW_JF_SET$86BE946E, TABLEIND1). Since the use_nl (o h) hint is not accompanied by a leading (h o) hint indicating in what order Oracle has to join these two objects, the choice of the all-important outer row source is left to Oracle. When the index is chosen as the outer operation, the insert/select statement performs very poorly. However, when the same index is used as the inner operation of the join, the insert/select completes in an acceptable time.
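
A toy cost model shows why the join order matters as much as the join method (the probe counts below use figures from the two reports; the model itself is purely illustrative): in a nested loop the inner row source is executed once per outer row.

```python
# Toy nested-loop cost model: the inner side runs once per outer row
def nested_loop_probes(outer_rows):
    return outer_rows

# Index as the outer row source: 667M executions of the inner view
bad = nested_loop_probes(667_000_000)
# View as the outer row source: only ~35K probes of the unique index
good = nested_loop_probes(35_105)

print(bad // good)   # ~19000 times more work with the wrong join order
```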

With that explained, the client was convinced, the hints were disabled, and the re-launched insert/select completed within a few seconds thanks to the appropriate HASH JOIN operation chosen by the optimizer:

Global Information
------------------------------
 Status                                 :  DONE
 Instance ID                            :  2
 SQL ID                                 :  9g2a3gstkr7dv
 SQL Execution ID                       :  33554432
 Execution Started                      :  06/24/2015 12:53:49
 First Refresh Time                     :  06/24/2015 12:53:52
 Last Refresh Time                      :  06/24/2015 12:54:05
 Duration                               :  16s                      

Global Stats
============================================================================================
| Elapsed |   Cpu   |    IO    | Concurrency | Cluster  |  Other   | Buffer | Read | Read  |
| Time(s) | Time(s) | Waits(s) |  Waits(s)   | Waits(s) | Waits(s) |  Gets  | Reqs | Bytes |
============================================================================================
|      23 |      21 |     0.91 |        0.03 |     0.22 |     0.31 |     1M |  187 |   1MB |
============================================================================================

SQL Plan Monitoring Details (Plan Hash Value=3871743977)
=================================================================================================
| Id  |                           Operation   |             Name   |  Rows   | Execs |   Rows   |
|     |                                       |                    | (Estim) |       | (Actual) |
=================================================================================================
| 153 |       VIEW                            | NAEHCE             |      2M |     1 |       2M |
| 154 |        HASH JOIN                      |                    |      2M |     1 |       2M |
| 155 |         INDEX FAST FULL SCAN          | TABLEIND1          |   27077 |     1 |    28320 |
| 156 |         VIEW                          | VW_JF_SET$86BE946E |      2M |     1 |       2M |

Notice as well that when the optimizer opted for a HASH JOIN, the VIEW PUSHED PREDICATE operation and the underlying JPPD (JOIN PREDICATE PUSH DOWN) transformation ceased to be used, because JPPD occurs only with NESTED LOOPS joins.

Bottom line: always try to supply Oracle with fresh and representative statistics and let it do its job. Don't pre-empt it from doing its normal work by systematically hinting it whenever you are confronted with a performance issue. And when you do decide to use hints, make sure to hint completely, particularly by fixing the outer (build) and inner (probe) row sources when hinting a NESTED LOOPS (HASH JOIN) operation.

June 23, 2015

Real Time SQL Monitoring oddity

Filed under: Oracle,Sql Plan Managment — hourim @ 1:45 pm

This is a small note about a situation I encountered which I thought worth sharing. An insert/select was executing with parallel DOP 16 on an 11.2.0.3 Oracle database, and the end user was complaining about the exceptionally long time it was taking without completing. Since the job was still running I got its Real Time SQL Monitoring report:


Global Information
------------------------------
 Status              :  DONE (ERROR)        
 Instance ID         :  1                   
 Session             :  XXXXX (392:229)    
 SQL ID              :  bbccngk0nn2z2       
 SQL Execution ID    :  16777216            
 Execution Started   :  06/22/2015 11:57:06 
 First Refresh Time  :  06/22/2015 11:57:06 
 Last Refresh Time   :  06/22/2015 11:57:46 
 Duration            :  40s                 

Global Stats
=================================================================================================
| Elapsed |   Cpu   |    IO    | Concurrency |  Other   | Buffer | Read | Read  | Write | Write |
| Time(s) | Time(s) | Waits(s) |  Waits(s)   | Waits(s) |  Gets  | Reqs | Bytes | Reqs  | Bytes |
=================================================================================================
|   15315 |   15220 |       54 |        0.38 |       40 |     2G | 8601 |   2GB |  5485 |   1GB |
=================================================================================================

According to the above report summary, the insert/select is DONE (ERROR).
So why is the end user still complaining about a batch job that never ends? And why didn't he receive an error?

After ruling out the resumable timeout hypothesis I came back to v$sql_monitor and issued the following two selects:

SQL> SELECT
  2    sql_id,
  3    process_name,
  4    status
  5  FROM v$sql_monitor
  6  WHERE sql_id = 'bbccngk0nn2z2'
  7  AND status   ='EXECUTING'
  8  ORDER BY process_name ;

SQL_ID        PROCE STATUS
------------- ----- ------------
bbccngk0nn2z2 p000  EXECUTING
bbccngk0nn2z2 p001  EXECUTING
bbccngk0nn2z2 p002  EXECUTING
bbccngk0nn2z2 p003  EXECUTING
bbccngk0nn2z2 p004  EXECUTING
bbccngk0nn2z2 p005  EXECUTING
bbccngk0nn2z2 p006  EXECUTING
bbccngk0nn2z2 p007  EXECUTING
bbccngk0nn2z2 p008  EXECUTING
bbccngk0nn2z2 p009  EXECUTING
bbccngk0nn2z2 p010  EXECUTING
bbccngk0nn2z2 p011  EXECUTING
bbccngk0nn2z2 p012  EXECUTING
bbccngk0nn2z2 p013  EXECUTING
bbccngk0nn2z2 p014  EXECUTING
bbccngk0nn2z2 p015  EXECUTING
bbccngk0nn2z2 p019  EXECUTING
bbccngk0nn2z2 p031  EXECUTING

SQL> SELECT
  2    sql_id,
  3    process_name,
  4    status
  5  FROM v$sql_monitor
  6  WHERE sql_id = 'bbccngk0nn2z2'
  7  AND status   ='DONE (ERROR)'
  8  ORDER BY process_name ;

SQL_ID        PROCE STATUS
------------- ----- -------------------
bbccngk0nn2z2 ora   DONE (ERROR)
bbccngk0nn2z2 p016  DONE (ERROR)
bbccngk0nn2z2 p017  DONE (ERROR)
bbccngk0nn2z2 p018  DONE (ERROR)
bbccngk0nn2z2 p020  DONE (ERROR)
bbccngk0nn2z2 p021  DONE (ERROR)
bbccngk0nn2z2 p022  DONE (ERROR)
bbccngk0nn2z2 p023  DONE (ERROR)
bbccngk0nn2z2 p024  DONE (ERROR)
bbccngk0nn2z2 p025  DONE (ERROR)
bbccngk0nn2z2 p026  DONE (ERROR)
bbccngk0nn2z2 p027  DONE (ERROR)
bbccngk0nn2z2 p028  DONE (ERROR)
bbccngk0nn2z2 p029  DONE (ERROR)
bbccngk0nn2z2 p030  DONE (ERROR)

Among the 32 parallel servers, half are executing and half are in error! How is this possible? Until now I had only been confronted with parallel executions that terminate in their entirety when a single parallel server errors out. For example, I have encountered several times the following error, caused by a broadcast distribution of a large row source blowing up the TEMP tablespace:

ERROR at line 1:
ORA-12801: error signaled in parallel query server P013
ORA-01652: unable to extend temp segment by 128 in tablespace TEMP

A simple select against v$active_session_history confirmed that the insert/select was still running and consuming CPU:

SQL> select sql_id, count(1)
  2  from gv$active_session_history
  3  where sample_time between to_date('22062015 12:30:00', 'ddmmyyyy hh24:mi:ss')
  4                    and     to_date('22062015 13:00:00', 'ddmmyyyy hh24:mi:ss')
  5  group by  sql_id
  6  order by 2 desc;

SQL_ID          COUNT(1)
------------- ----------
bbccngk0nn2z2       2545
                       4
0uuczutvk6jqj          1
8f1sjvfxuup9w          1

SQL> select decode(event,null, 'on cpu', event), count(1)
  2  from gv$active_session_history
  3  where sample_time between to_date('22062015 12:30:00', 'ddmmyyyy hh24:mi:ss')
  4                    and     to_date('22062015 13:00:00', 'ddmmyyyy hh24:mi:ss')
  5  and sql_id = 'bbccngk0nn2z2'
  6  group by  event
  7  order by 2 desc;

DECODE(EVENT,NULL,'ONCPU',EVENT)  COUNT(1)
--------------------------------  ---------
on cpu                            5439
db file sequential read           3

SQL> /

DECODE(EVENT,NULL,'ONCPU',EVENT)  COUNT(1)
--------------------------------- ---------
on cpu                            5460
db file sequential read           3

SQL> /

DECODE(EVENT,NULL,'ONCPU',EVENT)  COUNT(1)
--------------------------------  ---------
on cpu                            5470
db file sequential read           3

And after a while


SQL> /

DECODE(EVENT,NULL,'ONCPU',EVENT)   COUNT(1)
---------------------------------- ---------
on cpu                             15152
db file sequential read            9

While the parallel insert was still running I took several SQL monitoring reports, of which here are two:

Parallel Execution Details (DOP=16 , Servers Allocated=32)
============================================================================================
|      Name      | Type  | Server# | Elapsed |Buffer | Read  |         Wait Events         |
|                |       |         | Time(s) | Gets  | Bytes |         (sample #)          |
============================================================================================
| PX Coordinator | QC    |         |    0.48 |  2531 | 16384 |                             |
| p000           | Set 1 |       1 |    1049 |  128M |  63MB | direct path read (1)        |
| p001           | Set 1 |       2 |    1518 |  222M |  61MB |                             |
| p002           | Set 1 |       3 |     893 |  109M |  59MB |                             |
| p003           | Set 1 |       4 |    1411 |  194M |  62MB | direct path read (1)        |
| p004           | Set 1 |       5 |     460 |   64M |  62MB | direct path read (1)        |
| p005           | Set 1 |       6 |     771 |   87M | 322MB | direct path read (1)        |
|                |       |         |         |       |       | direct path read temp (5)   |
| p006           | Set 1 |       7 |     654 |   67M |  62MB | direct path read (1)        |
| p007           | Set 1 |       8 |     179 |   24M |  55MB | direct path read (1)        |
| p008           | Set 1 |       9 |    1638 |  235M |  70MB |                             |
| p009           | Set 1 |      10 |     360 |   46M |  54MB | direct path read (1)        |
| p010           | Set 1 |      11 |    1920 |  294M | 337MB | direct path read temp (6)   | --> 1920s
| p011           | Set 1 |      12 |     289 |   30M |  69MB |                             |
| p012           | Set 1 |      13 |     839 |   98M |  66MB | direct path read (1)        |
| p013           | Set 1 |      14 |     524 |   63M |  55MB |                             |
| p014           | Set 1 |      15 |    1776 |  263M |  69MB |                             |
| p015           | Set 1 |      16 |    1016 |  130M |  61MB | direct path read (1)        |
| p016           | Set 2 |       1 |    0.22 |  1166 |   3MB |                             |
| p017           | Set 2 |       2 |    1.36 |  6867 |  51MB |                             |
| p018           | Set 2 |       3 |    1.02 |  1298 |  36MB |                             |
| p019           | Set 2 |       4 |    6.71 |  2313 | 129MB | direct path read temp (2)   |
| p020           | Set 2 |       5 |    0.40 |   978 |  16MB |                             |
| p021           | Set 2 |       6 |    1.32 |  8639 |  41MB | direct path read temp (1)   |
| p022           | Set 2 |       7 |    0.18 |   896 |   2MB |                             |
| p023           | Set 2 |       8 |    0.23 |   469 |   9MB |                             | --> 0.23s
| p024           | Set 2 |       9 |    0.52 |  3635 |  19MB |                             | --> 0.52s
| p025           | Set 2 |      10 |    0.33 |  1163 |   3MB |                             |
| p026           | Set 2 |      11 |    0.65 |   260 |  31MB | db file sequential read (1) |
| p027           | Set 2 |      12 |    0.21 |  1099 |   6MB |                             |
| p028           | Set 2 |      13 |    0.58 |   497 |  20MB |                             |
| p029           | Set 2 |      14 |    1.43 |  4278 |  54MB |                             |
| p030           | Set 2 |      15 |    0.30 |  3481 |   8MB |                             |
| p031           | Set 2 |      16 |    2.86 |   517 |  91MB |                             |
============================================================================================


Parallel Execution Details (DOP=16 , Servers Allocated=32)
=============================================================================================
|      Name      | Type  | Server# | Elapsed | Buffer | Read  |         Wait Events         |
|                |       |         | Time(s) |  Gets  | Bytes |         (sample #)          |
=============================================================================================
| PX Coordinator | QC    |         |    0.48 |   2531 | 16384 |                             |
| p000           | Set 1 |       1 |    1730 |   202M |  63MB | direct path read (1)        |
| p001           | Set 1 |       2 |    2416 |   351M |  61MB |                             |
| p002           | Set 1 |       3 |    1094 |   133M |  59MB |                             |
| p003           | Set 1 |       4 |    2528 |   348M |  64MB | direct path read (1)        |
| p004           | Set 1 |       5 |     965 |   129M |  63MB | direct path read (1)        |
| p005           | Set 1 |       6 |    1089 |   129M | 322MB | direct path read (1)        |
|                |       |         |         |        |       | direct path read temp (5)   |
| p006           | Set 1 |       7 |    1459 |   165M |  62MB | direct path read (1)        |
| p007           | Set 1 |       8 |     221 |    30M |  55MB | direct path read (1)        |
| p008           | Set 1 |       9 |    2640 |   357M |  70MB |                             |
| p009           | Set 1 |      10 |     952 |   115M |  54MB | direct path read (1)        |
| p010           | Set 1 |      11 |    3117 |   471M | 337MB | direct path read temp (6)   | --> 3117s
| p011           | Set 1 |      12 |     400 |    42M |  69MB |                             |
| p012           | Set 1 |      13 |    1621 |   195M |  66MB | direct path read (1)        |
| p013           | Set 1 |      14 |    1126 |   132M |  55MB |                             |
| p014           | Set 1 |      15 |    2662 |   370M |  72MB |                             |
| p015           | Set 1 |      16 |    1194 |   147M |  61MB | direct path read (1)        |
| p016           | Set 2 |       1 |    0.22 |   1166 |   3MB |                             |
| p017           | Set 2 |       2 |    1.36 |   6867 |  51MB |                             |
| p018           | Set 2 |       3 |    1.02 |   1298 |  36MB |                             |
| p019           | Set 2 |       4 |    6.72 |   2313 | 131MB | direct path read temp (2)   |
| p020           | Set 2 |       5 |    0.40 |    978 |  16MB |                             |
| p021           | Set 2 |       6 |    1.32 |   8639 |  41MB | direct path read temp (1)   |
| p022           | Set 2 |       7 |    0.18 |    896 |   2MB |                             |
| p023           | Set 2 |       8 |    0.23 |    469 |   9MB |                             | --> 0.23s
| p024           | Set 2 |       9 |    0.52 |   3635 |  19MB |                             | --> 0.52s
| p025           | Set 2 |      10 |    0.33 |   1163 |   3MB |                             |
| p026           | Set 2 |      11 |    0.65 |    260 |  31MB | db file sequential read (1) |
| p027           | Set 2 |      12 |    0.21 |   1099 |   6MB |                             |
| p028           | Set 2 |      13 |    0.58 |    497 |  20MB |                             |
| p029           | Set 2 |      14 |    1.43 |   4278 |  54MB |                             |
| p030           | Set 2 |      15 |    0.30 |   3481 |   8MB |                             |
| p031           | Set 2 |      16 |    2.89 |    517 |  92MB |                             |
=============================================================================================

If you look carefully at the above reports you will notice that the elapsed time of the parallel servers reported in ERROR (p016-p030) is not increasing, in contrast to that of the parallel servers reported EXECUTING (p000-p015), which grows continuously.

Thanks to Randolf Geist (again) I learned that there is a bug in Real Time SQL Monitoring which occurs when a parallel server has not been doing any work for more than 30 minutes: the report starts showing those parallel servers in ERROR, confusing the situation.
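
The symptom boils down to a diff between two report snapshots (server names and elapsed times copied from the two reports above; the comparison itself is just a sketch):

```python
# Elapsed Time(s) per parallel server in two successive monitoring reports
snap1 = {"p010": 1920, "p014": 1776, "p023": 0.23, "p024": 0.52}
snap2 = {"p010": 3117, "p014": 2662, "p023": 0.23, "p024": 0.52}

# Servers whose elapsed time no longer moves are exactly the ones the
# report ends up flagging as DONE (ERROR) after ~30 idle minutes
frozen = sorted(name for name in snap1 if snap1[name] == snap2[name])
print(frozen)   # ['p023', 'p024']
```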

Since I was able to reproduce the issue, I started the process again at 16h03 and kept executing the following select from time to time, getting no rows for each execution:

SELECT 
  sql_id,
  process_name,
  status
FROM v$sql_monitor
WHERE sql_id = '5np4u0m0h69jx' -- sql_id changed slightly
AND status   ='DONE (ERROR)'
ORDER BY process_name ;

no rows selected

Until, at around 16h37, i.e. after a little more than 30 minutes of execution, the above select started showing processes in error:

SQL> SELECT
  2    sql_id,
  3    process_name,
  4    status
  5  FROM v$sql_monitor
  6  WHERE sql_id = '5np4u0m0h69jx'
  7  AND status   ='DONE (ERROR)'
  8  ORDER BY process_name ;

SQL_ID        PROCE STATUS
------------- ----- ---------------
5np4u0m0h69jx ora   DONE (ERROR)
5np4u0m0h69jx p016  DONE (ERROR)
5np4u0m0h69jx p017  DONE (ERROR)
5np4u0m0h69jx p018  DONE (ERROR)
5np4u0m0h69jx p020  DONE (ERROR)
5np4u0m0h69jx p021  DONE (ERROR)
5np4u0m0h69jx p022  DONE (ERROR)
5np4u0m0h69jx p023  DONE (ERROR)
5np4u0m0h69jx p024  DONE (ERROR)
5np4u0m0h69jx p025  DONE (ERROR)
5np4u0m0h69jx p026  DONE (ERROR)
5np4u0m0h69jx p027  DONE (ERROR)
5np4u0m0h69jx p028  DONE (ERROR)
5np4u0m0h69jx p029  DONE (ERROR)
5np4u0m0h69jx p030  DONE (ERROR)

At the very beginning of the process several parallel servers were idle while several others were busy. And when the first busy parallel server (p010 in this case) exceeded 1800 seconds of elapsed time (1861 seconds in this case), Real Time SQL Monitoring started showing the idle parallel servers in ERROR.

Bottom line: don’t be confused (as I have been) by that DONE (ERROR) status, your SQL statement might still be running consuming time and energy despite this wrong real time SQL monitoring reporting status

June 12, 2015

Why Dynamic Sampling has not been used?

Filed under: Oracle — hourim @ 10:15 am

Experienced tuning specialists are known for their pronounced sense for details that others very often ignore. This is why I always pay attention to their answers on the OTN forums and the oracle-l list. Last week I was asked to look at a badly performing query, monitored via the following execution plan:

Global Information
------------------------------
 Status              :  EXECUTING
 Instance ID         :  1
 SQL ID              :  8114dqz1k5arj
 SQL Execution ID    :  16777217            

Global Stats
=============================================================================================
| Elapsed |   Cpu   |    IO    | Concurrency | Cluster  |  Other   | Buffer | Read  | Read  |
| Time(s) | Time(s) | Waits(s) |  Waits(s)   | Waits(s) | Waits(s) |  Gets  | Reqs  | Bytes |
=============================================================================================
|  141842 |  140516 |       75 |        5.82 |       69 |     1176 |    21G | 26123 | 204MB |
=============================================================================================

SQL Plan Monitoring Details (Plan Hash Value=3787402507)
===========================================================================================
| Id   |             Operation             |      Name       |  Rows   | Execs |   Rows   |
|      |                                   |                 | (Estim) |       | (Actual) |
===========================================================================================
|    0 | SELECT STATEMENT                  |                 |         |     1 |          |
|    1 |   SORT ORDER BY                   |                 |       1 |     1 |          |
|    2 |    FILTER                         |                 |         |     1 |          |
|    3 |     NESTED LOOPS                  |                 |         |     1 |        0 |
| -> 4 |      NESTED LOOPS                 |                 |       1 |     1 |       4G |
| -> 5 |       TABLE ACCESS BY INDEX ROWID | TABLEXXX        |       1 |     1 |     214K |
| -> 6 |        INDEX RANGE SCAN           | IDX_MESS_RCV_ID |      2M |     1 |     233K |
| -> 7 |       INDEX RANGE SCAN            | VGY_TEST2       |       1 |  214K |       4G |->
|    8 |      TABLE ACCESS BY INDEX ROWID  | T_TABL_YXZ      |       1 |    4G |        0 |->
|      |                                   |                 |         |       |          |
===========================================================================================

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(TO_DATE(:SYS_B_2,:SYS_B_3)<=TO_DATE(:SYS_B_4,:SYS_B_5))
   5 - filter(("TABLEXXX"."T_NAME"=:SYS_B_6 AND
              "TABLEXXX"."M_TYPE"=:SYS_B_0 AND
              "TABLEXXX"."A_METHOD"=:SYS_B_7 AND
              "TABLEXXX"."M_STATUS"<>:SYS_B_8))
   6 - access("TABLEXXX"."R_ID"=:SYS_B_1)
   7 - access("T_TABL_YXZ"."SX_DATE">=TO_DATE(:SYS_B_2,:SYS_B_3) AND
              "T_TABL_YXZ"."SX_DATE"<=TO_DATE(:SYS_B_4,:SYS_B_5))
   8 - filter("T_TABL_YXZ"."T_ID"="TABLEXXX"."T_ID")

Those 214K and 4G executions (Execs) of operations 7 and 8 respectively are the classic symptom of a wrong NESTED LOOP join, which the CBO chose because of the incorrect cardinality estimate at operation n°5 (the double NESTED LOOPS operation is the effect of the NLJ_BATCHING optimisation).
There was no historical plan_hash_value for this particular sql_id to compare with the current execution plan. But the report had certainly been executed in the past without any complaint from the end user.
The outline_data section of the execution plan is where I usually look when trying to understand what the optimizer has done behind the scenes:

Outline Data
-------------
   /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('11.2.0.3')
      DB_VERSION('11.2.0.3')
      OPT_PARAM('_b_tree_bitmap_plans' 'false')
      OPT_PARAM('optimizer_dynamic_sampling' 4) ---------------------------> spot this
      OPT_PARAM('optimizer_index_cost_adj' 20)
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      INDEX_RS_ASC(@"SEL$1" "TABLEXXX"@"SEL$1" ("TABLEXXX"."R_ID"))
      INDEX(@"SEL$1" "T_TABL_YXZ"@"SEL$1" ("T_TABL_YXZ"."SX_DATE"
              "T_TABL_YXZ"."GL_ACCOUNT_ID" "T_TABL_YXZ"."CASH_ACCOUNT_ID"))
      LEADING(@"SEL$1" "TABLEXXX"@"SEL$1" "T_TABL_YXZ"@"SEL$1")
      USE_NL(@"SEL$1" "T_TABL_YXZ"@"SEL$1")
      NLJ_BATCHING(@"SEL$1" "T_TABL_YXZ"@"SEL$1")
      END_OUTLINE_DATA
  */

As you can see, apart from the optimizer_index_cost_adj parameter, which we should never change from its default, one thing caught my attention: optimizer_dynamic_sampling. Since the outline shows that the optimizer used dynamic sampling, why is there no Note about dynamic sampling at the bottom of the corresponding execution plan?
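As a side note, the Outline Data section shown above can be obtained directly from the cursor cache via the +OUTLINE format option of DBMS_XPLAN.DISPLAY_CURSOR, for example:

```sql
-- Pull the execution plan together with its outline from the cursor cache
-- (the sql_id is the one of the monitored report).
SELECT *
FROM   table(dbms_xplan.display_cursor('8114dqz1k5arj', NULL,
                                       'TYPICAL +OUTLINE'));
```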
I decided to run the same query in a CLONE database (cloned via RMAN). Below is the corresponding execution plan for the same set of input parameters:

Global Information
------------------------------
 Status              :  DONE (ALL ROWS)
 Instance ID         :  1
 SQL ID              :  8114dqz1k5arj
 SQL Execution ID    :  16777217
 Duration            :  904s           

SQL Plan Monitoring Details (Plan Hash Value=2202725716)
========================================================================================
| Id |            Operation             |      Name       |  Rows   | Execs |   Rows   |
|    |                                  |                 | (Estim) |       | (Actual) |
========================================================================================
|  0 | SELECT STATEMENT                 |                 |         |     1 |      280 |
|  1 |   SORT ORDER BY                  |                 |    230K |     1 |      280 |
|  2 |    FILTER                        |                 |         |     1 |      280 |
|  3 |     HASH JOIN                    |                 |    230K |     1 |      280 |
|  4 |      TABLE ACCESS BY INDEX ROWID | T_TABL_YXZ      |    229K |     1 |     301K |
|  5 |       INDEX RANGE SCAN           | VGY_TEST2       |       1 |     1 |     301K |
|  6 |      TABLE ACCESS BY INDEX ROWID | TABLEXXX        |    263K |     1 |       2M |
|  7 |       INDEX RANGE SCAN           | IDX_MESS_RCV_ID |      2M |     1 |       2M |
========================================================================================

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(TO_DATE(:SYS_B_2,:SYS_B_3)<=TO_DATE(:SYS_B_4,:SYS_B_5))
   3 - access("T_TABL_YXZ"."T_ID"="TABLEXXX"."T_ID")
   5 - access("T_TABL_YXZ"."SX_DATE">=TO_DATE(:SYS_B_2,:SYS_B_3) AND
              "T_TABL_YXZ"."SX_DATE"<=TO_DATE(:SYS_B_4,:SYS_B_5))
   6 - filter(("TABLEXXX"."T_NAME"=:SYS_B_6 AND
              "TABLEXXX"."M_TYPE"=:SYS_B_0 AND
              "TABLEXXX"."A_METHOD"=:SYS_B_7 AND
              "TABLEXXX"."M_STATUS"<>:SYS_B_8))
   7 - access("TABLEXXX"."R_ID"=:SYS_B_1)

Note
-----
   - dynamic sampling used for this statement (level=4)

In this CLONED database, in contrast to the Production database, the optimizer used dynamic sampling at level 4 and came up with different estimates when visiting the TABLEXXX (263K instead of 1) and T_TABL_YXZ (229K instead of 1) tables, so that it judiciously opted for a HASH JOIN instead of that dramatic production NESTED LOOP, making the query complete in 904 seconds.

The fundamental question then turns from "why is the report performing badly?" to "why did the optimizer ignore dynamic sampling at level 4?"

There are several ways to answer this question: (a) a 10053 trace file, (b) a 10046 trace file, or (c) tracing dynamic sampling directly, as suggested to me by Stefan Koehler:

SQL> alter session set events 'trace[RDBMS.SQL_DS] disk=high';
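Whichever tracing method you pick, the trace file the current session is writing to can be located via v$diag_info (11g and later):

```sql
-- Locate the trace file for the current session.
SELECT value AS trace_file
FROM   v$diag_info
WHERE  name = 'Default Trace File';
```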

The corresponding 10053 optimizer trace shows the following lines related to dynamic sampling:

10053 of the COPY database

*** 2015-06-03 11:05:43.701
** Executed dynamic sampling query:
    level : 4
    sample pct. : 0.000489
    actual sample size : 837
    filtered sample card. : 1
    orig. card. : 220161278
    block cnt. table stat. : 6272290
    block cnt. for sampling: 6345946
    max. sample block cnt. : 32
sample block cnt. : 31
min. sel. est. : 0.00000000
** Using single table dynamic sel. est. : 0.00119474
  Table: TABLEXXX  Alias: TABLEXXX
    Card: Original: 220161278.000000  Rounded: 263036  Computed: 263036.17  Non Adjusted: 263036.17

In the COPY database, the optimizer used dynamic sampling at level 4 and came up with a cardinality estimate for TABLEXXX of 263K, which obviously led the CBO to opt for the reasonable HASH JOIN operation.

10053 of the PRODUCTION database

*** 2015-06-03 13:39:03.992
** Executed dynamic sampling query:
    level : 4
    sample pct. : 0.000482
    actual sample size : 1151
    filtered sample card. : 0  ------------------>  spot this information
    orig. card. : 220161278
    block cnt. table stat. : 6272290
    block cnt. for sampling: 6435970
    max. sample block cnt. : 32
sample block cnt. : 31
min. sel. est. : 0.00000000
** Not using dynamic sampling for single table sel. or cardinality.
DS Failed for : ----- Current SQL Statement for this session (sql_id=82x3mm8jqn5ah) -----
  Table: TABLEXXX  Alias: TABLEXXX
    Card: Original: 220161278.000000  Rounded: 1  Computed: 0.72  Non Adjusted: 0.72

In the PRODUCTION database, the CBO failed to use dynamic sampling at level 4, as clearly shown by the following lines taken from the above 10053 trace file:

** Not using dynamic sampling for single table sel. or cardinality.
DS Failed for : ----- Current SQL Statement for this session (sql_id=82x3mm8jqn5ah)

PS: the 10053 trace was taken on the important part of the query, which is
    why the sql_id is not the same as the one mentioned above.

Thanks to Randolf Geist I learnt that the internal code of the dynamic sampling algorithm is such that, when the predicate part applied to a sample of TABLEXXX returns 0 rows

filtered sample card. : 0

the optimizer ignores dynamic sampling at level 4 and falls back to the available object statistics, producing a 1-row cardinality estimate and hence the dramatically wrong NESTED LOOP operation. By the way, had this been a 12c database, the STATISTICS COLLECTOR placed above the first operation of the NESTED LOOP join would have reached its inflection point and would, hopefully, have switched to a HASH JOIN during execution.
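As an aside, in 12c the presence of such a STATISTICS COLLECTOR (and the plan branches it switched off) can be made visible with the +ADAPTIVE format option of DBMS_XPLAN, for example:

```sql
-- 12c only: +ADAPTIVE reveals the STATISTICS COLLECTOR row source and the
-- operations that were discarded by adaptive plan resolution.
SELECT *
FROM   table(dbms_xplan.display_cursor(NULL, NULL, 'ALLSTATS LAST +ADAPTIVE'));
```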

A quick fix for this very critical report was to raise the dynamic sampling level. And, since this query belongs to third-party software, I decided to use Kerry Osborne's script to inject a dynamic sampling hint via a SQL profile, as shown below:

SQL>@create_1_hint_sql_profile.sql
Enter value for sql_id: 8114dqz1k5arj
Enter value for profile_name (PROFILE_sqlid_MANUAL):
Enter value for category (DEFAULT):
Enter value for force_matching (false): true
Enter value for hint: dynamic_sampling(6)
Profile PROFILE_8114dqz1k5arj_MANUAL created.
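For what it's worth, such scripts are typically thin wrappers around DBMS_SQLTUNE.IMPORT_SQL_PROFILE; a hand-rolled sketch of the same idea might look like this (profile name reused from above, error handling omitted):

```sql
-- Attach a DYNAMIC_SAMPLING(6) hint to the statement through a SQL profile,
-- force-matched so that all literal variations of the text benefit from it.
DECLARE
  l_sql_text CLOB;
BEGIN
  SELECT sql_fulltext
  INTO   l_sql_text
  FROM   v$sqlarea
  WHERE  sql_id = '8114dqz1k5arj';

  dbms_sqltune.import_sql_profile(
    sql_text    => l_sql_text,
    profile     => sqlprof_attr('DYNAMIC_SAMPLING(6)'),
    name        => 'PROFILE_8114dqz1k5arj_MANUAL',
    force_match => TRUE,
    replace     => TRUE);
END;
/
```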

Once this was done, the end user re-launched the report, which completed within 303 seconds instead of those never-ending 141,842 seconds:


Global Information
------------------------------
 Status              :  DONE (ALL ROWS)
 Instance ID         :  1
 SQL ID              :  8114dqz1k5arj
 SQL Execution ID    :  16777216
 Execution Started   :  06/10/2015 11:40:39
 First Refresh Time  :  06/10/2015 11:40:45
 Last Refresh Time   :  06/10/2015 11:45:39
 Duration            :  300s           

SQL Plan Monitoring Details (Plan Hash Value=2202725716)
========================================================================================
| Id |            Operation             |      Name       |  Rows   | Execs |   Rows   |
|    |                                  |                 | (Estim) |       | (Actual) |
========================================================================================
|  0 | SELECT STATEMENT                 |                 |         |     1 |     2989 |
|  1 |   SORT ORDER BY                  |                 |    234K |     1 |     2989 |
|  2 |    FILTER                        |                 |         |     1 |     2989 |
|  3 |     HASH JOIN                    |                 |    234K |     1 |     2989 |
|  4 |      TABLE ACCESS BY INDEX ROWID | T_TABL_YXZ      |    232K |     1 |     501K |
|  5 |       INDEX RANGE SCAN           | VGY_TEST2       |       1 |     1 |     501K |
|  6 |      TABLE ACCESS BY INDEX ROWID | TABLEXXX        |    725K |     1 |       2M |
|  7 |       INDEX RANGE SCAN           | IDX_MESS_RCV_ID |      2M |     1 |       2M |
========================================================================================

Note
-----
   - dynamic sampling used for this statement (level=6)
   - SQL profile PROFILE_8114dqz1k5arj_MANUAL used for this statement

June 6, 2015

SUBQ INTO VIEW FOR COMPLEX UNNEST

Filed under: Oracle — hourim @ 8:42 am

If you are a regular reader of Jonathan Lewis's blog you will probably have come across this article, in which the author explains why an "OR subquery" prevents the optimizer from unnesting the subquery and merging it with its parent query to open up an optimal join path. Because unnesting is impossible, the "OR subquery" is executed as a FILTER predicate, which, when applied to a huge row source, dramatically penalizes the performance of the whole query. In the same article you will hopefully also have learned how, by rewriting the query with a UNION ALL (and taking care of the ever-threatening NULLs via the LNNVL() function), you can open a new path allowing the CBO to unnest the subquery.

Unfortunately, third-party software, where changing the SQL code is not possible, is nowadays ubiquitous, so I hoped the optimizer was capable of automatically refactoring a disjunctive subquery and unnesting it using the UNION ALL workaround.

I was under the impression that the optimizer never fulfilled this hope, until last week when I received an e-mail from my friend Ahmed Aangour showing a particular disjunctive subquery that the optimizer had unnested without any rewrite of the original query by the developer. I found the case so interesting that I decided to model it and share it with you. Take a look at the query and its execution plan first under 11.2.0.2 (the table creation script is supplied at the end of the article):

SQL> alter session set statistics_level=all;

SQL> alter session set optimizer_features_enable='11.2.0.2';

SQL> select
       a.id1,
       a.n1,
       a.start_date
     from t1 a
     where (a.id1 in (select b.id
                      from   t2 b
                      where  b.status = 'COM')
         or a.id1 in (select c.id1
                      from   t2 c
                      where  c.status = 'ERR'));

SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

-------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |   9890 |00:00:02.23 |     742K|<--
|*  1 |  FILTER            |      |      1 |        |   9890 |00:00:02.23 |     742K|
|   2 |   TABLE ACCESS FULL| T1   |      1 |  10000 |  10000 |00:00:00.01 |    1686 |
|*  3 |   TABLE ACCESS FULL| T2   |  10000 |      1 |   9890 |00:00:02.16 |     725K|
|*  4 |   TABLE ACCESS FULL| T2   |    110 |      1 |      0 |00:00:00.05 |   15400 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
 1 - filter(( IS NOT NULL OR IS NOT NULL))
 3 - filter(("B"."ID"=:B1 AND "B"."STATUS"='COM'))
 4 - filter(("C"."ID1"=:B1 AND "C"."STATUS"='ERR'))

The double full scan of table t2 plus the FILTER operation clearly indicate that the OR clause has not been combined with the parent query. If you want to know what is behind filter predicate n°1 above, the "not so famous" explain plan for command helps in this case:

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   975 | 15600 |   462   (0)| 00:00:01 |
|*  1 |  FILTER            |      |       |       |            |          |
|   2 |   TABLE ACCESS FULL| T1   | 10000 |   156K|   462   (0)| 00:00:01 |
|*  3 |   TABLE ACCESS FULL| T2   |     1 |     8 |    42   (0)| 00:00:01 |
|*  4 |   TABLE ACCESS FULL| T2   |     1 |     7 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter( EXISTS (SELECT 0 FROM "T2" "B" WHERE "B"."ID"=:B1 AND
              "B"."STATUS"='COM') OR  EXISTS (SELECT 0 FROM "T2" "C" WHERE
              "C"."ID1"=:B2 AND "C"."STATUS"='ERR'))
   3 - filter("B"."ID"=:B1 AND "B"."STATUS"='COM')
   4 - filter("C"."ID1"=:B1 AND "C"."STATUS"='ERR')

Notice how the subquery has been executed as a FILTER operation which sometimes (if not often) represents a real performance threat. 
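For reference, the manual workaround discussed in Jonathan Lewis's article amounts to rewriting the disjunction as a UNION ALL, with the second branch excluding rows already returned by the first. A sketch against this model (NOT IN is safe here only because t2.id is generated from rownum and therefore cannot be NULL; otherwise NOT EXISTS would be required):

```sql
-- Branch 1: rows satisfying the first subquery.
select a.id1, a.n1, a.start_date
from   t1 a
where  a.id1 in (select b.id from t2 b where b.status = 'COM')
union all
-- Branch 2: rows satisfying the second subquery but not the first,
-- so rows matching both predicates are not returned twice.
select a.id1, a.n1, a.start_date
from   t1 a
where  a.id1 in     (select c.id1 from t2 c where c.status = 'ERR')
and    a.id1 not in (select b.id  from t2 b where b.status = 'COM');
```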

However, when I executed the same query with optimizer_features_enable set to '11.2.0.3', I got the following interesting execution plan:

SQL> alter session set optimizer_features_enable='11.2.0.3';

--------------------------------------------------------------------------------------------
| Id  | Operation             | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |          |      1 |        |   9890 |00:00:00.03 |    1953 |<--
|*  1 |  HASH JOIN            |          |      1 |   5000 |   9890 |00:00:00.03 |    1953 |
|   2 |   VIEW                | VW_NSO_1 |      1 |   5000 |   9890 |00:00:00.01 |     282 |
|   3 |    HASH UNIQUE        |          |      1 |   5000 |   9890 |00:00:00.01 |     282 |
|   4 |     UNION-ALL         |          |      1 |        |   9900 |00:00:00.01 |     282 |
|*  5 |      TABLE ACCESS FULL| T2       |      1 |   2500 |     10 |00:00:00.01 |     141 |
|*  6 |      TABLE ACCESS FULL| T2       |      1 |   2500 |   9890 |00:00:00.01 |     141 |
|   7 |   TABLE ACCESS FULL   | T1       |      1 |  10000 |  10000 |00:00:00.01 |    1671 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("A"."ID1"="ID1")
   5 - filter("C"."STATUS"='ERR')
   6 - filter("B"."STATUS"='COM')

Notice now how the new plan shows a HASH JOIN between an internal view (VW_NSO_1) and table t1 coming from the parent query block. Notice as well the HASH JOIN condition (access("A"."ID1"="ID1")) that appears as predicate n°1. The optimizer has done a double transformation:

  • created an internal view VW_NSO_1 representing a UNION-ALL of the two subqueries present in the where clause
  • joined the newly created view with table t1 present in the parent query block

Looking at the corresponding 10053 trace file, I found how the CBO transformed the initial query:

select a.id1 id1,
  a.n1 n1,
  a.start_date start_date
from (
  (select c.id1 id1 from c##mhouri.t2 c where c.status='ERR')
union
  (select b.id id from c##mhouri.t2 b where b.status='COM')
     ) vw_nso_1,
  c##mhouri.t1 a
where a.id1= vw_nso_1.id1;

In fact the optimizer first combined the two subqueries into a view and finished by unnesting it into the parent query. This is a transformation which the Oracle optimizer seems to name SUBQ INTO VIEW FOR COMPLEX UNNEST.

In the same 10053 trace file we can spot the following lines:

*****************************
Cost-Based Subquery Unnesting
*****************************
Query after disj subq unnesting:******* UNPARSED QUERY IS *******

SU:   Transform an ANY subquery to semi-join or distinct.
Registered qb: SET$7FD77EFD 0x15b5d4d0 (SUBQ INTO VIEW FOR COMPLEX UNNEST SET$E74BECDC)

SU: Will unnest subquery SEL$3 (#2)
SU: Will unnest subquery SEL$2 (#3)
SU: Reconstructing original query from best state.
SU: Considering subquery unnest on query block SEL$1 (#1).
SU:   Checking validity of unnesting subquery SEL$2 (#3)
SU:   Checking validity of unnesting subquery SEL$3 (#2)
Query after disj subq unnesting:******* UNPARSED QUERY IS *******

SU:   Checking validity of unnesting subquery SET$E74BECDC (#6)
SU:   Passed validity checks.

This is a clear enhancement to the optimizer's query transformations that will help improve the performance of disjunctive subqueries automatically, without any external intervention.

I was about to end this article when I realized that, although I was testing this case under a 12.1.0.1 database release, I had not yet executed the same query under the 12.1.0.1.1 optimizer feature set:

SQL> alter session set optimizer_features_enable='12.1.0.1.1';
SQL> -- execute the query again
-------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |   9890 |00:00:03.84 |     716K|
|*  1 |  FILTER            |      |      1 |        |   9890 |00:00:03.84 |     716K|
|   2 |   TABLE ACCESS FULL| T1   |      1 |  10000 |  10000 |00:00:00.01 |    1686 |
|*  3 |   TABLE ACCESS FULL| T2   |  10000 |      2 |   9890 |00:00:03.81 |     715K|
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter( IS NOT NULL)
   3 - filter((("B"."ID1"=:B1 AND "B"."STATUS"='ERR') OR ("B"."ID"=:B2 AND
              "B"."STATUS"='COM')))

The automatic unnesting of the disjunctive subquery has been removed from the 12.1.0.1.1 optimizer model.

If you want to reproduce and test this case, here below is the model (I would be interested to hear whether the disjunctive subquery is unnested or not in the 12.1.0.1.2 release):

create table t1
   as select
    rownum                id1,
    trunc((rownum-1)/3)   n1,
    date '2012-06-07' + mod((level-1)*2,5) start_date,
    lpad(rownum,10,'0')   small_vc,
    rpad('x',1000)        padding
from dual
connect by level <= 1e4;   

create table t2
as select
    rownum id
    ,mod(rownum,5) + mod(rownum,10)* 10  as id1
    ,case
       when mod(rownum, 1000) = 7 then 'ERR'
       when rownum <= 9900 then 'COM'
       when mod(rownum,10) between 1 and 5 then 'PRP'
     else
       'UNK'
     end status
     ,lpad(rownum,10,'0')    as small_vc
     ,rpad('x',70)           as padding
from dual
connect by level <= 1e4;

alter table t1 add constraint t1_pk primary key (id1);
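After building the model, a quick sanity check of the status distribution in t2 helps relate the data to the plans above (rows 1 to 9,900 are COM except every 1,000th-plus-7 row, which is ERR):

```sql
-- Distribution of statuses in t2: expect 9,890 COM, 10 ERR, and the
-- remaining 100 rows split between PRP and UNK.
SELECT   status, COUNT(*) cnt
FROM     t2
GROUP BY status
ORDER BY status;
```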