Mohamed Houri’s Oracle Notes

August 12, 2017

Unnesting of coalesced subqueries

Filed under: Oracle — hourim @ 3:49 pm

The 12cR1 way used by Oracle to coalesce two correlated disjunctive subqueries has prevented a real life query from unnesting its two subqueries causing a serious performance issue. This article examines a representative model of this real life query. The first paragraph explains the any-to-exists subquery conversion. The second paragraph shows how two correlated exists subqueries can be coalesced to form a single subquery. Once these two transformations introduced the third paragraph will examine how the previously coalesced subqueries can be unnested. It will particularly show that the ability of Oracle to unnest two coalesced correlated subqueries depends strongly on the coalescing method as this one differs from version to version.

1. Any-to-exists subquery conversion

Before coalescing two (or more) subqueries Oracle starts first by transforming ANY and ALL subqueries into EXISTS and NOT EXISTS respectively. Consider, for example, the following query:

select
    a.id1
   ,a.n1
   ,a.start_date
from 
    t1 a
where 
   (a.id1 in
         (select b.id
          from t2 b
          where b.status = 'COM')
   or a.id1 in
         (select c.id1
          from t2 c
          where c.status = 'ERR')
     );

If you examine its corresponding 10053 CBO trace file you will find that its two disjunctive subqueries have been first converted from an any-to-exists subqueries as shown below:

Query After Exists Conversion:******* UNPARSED QUERY IS *******

select
  a.id1 id1,
  a.n1 n1,
  a.start_date start_date
from
  t1 a
where
  exists
  ( select b.id id
    from t2 b
    where b.status='COM'
    and   a.id1 =b.id)
or exists
  ( select c.id1 id1
    from t2 c
    where c.status='ERR'
    and a.id1 =c.id1)
  );
 

This is generally the preduled to or, a preliminary part of, a subquery coalescing transformation presented in the next section

2. Subquery Coalescing

The preceding any-to-exists transformed query is nothing else than an instance of the following simplified one:

select 
  {list of columns}
from
  table
where 
  predicate

The predicate in this particular case is acting as two disjunctive correlated subqueries of the same type:

-- first subquery
exists
  ( select b.id id
    from t2 b
    where b.status = 'COM'
    and   a.id1    = b.id)

-- second subquery
 exists
  ( select c.id1 id1
    from t2 c
    where c.status ='ERR'
    and a.id1 = c.id1)

If these two subqueries are not coalesced they will probably generate two accesses to the same table t2. This is why Oracle can, under certain conditions, coalesce two conjunctive or disjucntive subqueries into a single one. Applied to the current query the coalescing process yields the following new transformed query:

Query After Ex/Any Disj SQ coalescing:******* UNPARSED QUERY IS *******

select
  a.id1 id1,
  a.n1 n1,
  a.start_date start_date
from
  t1 a
where
  exists
  (
    select
      0 id
    from
      t2 b
    where
      b.status  = 'COM'
    and a.id1   = b.id
    or b.status ='ERR'
    and a.id1   = b.id1
  );

The execution plan of the doubly transformed query is shown below:

alter session set statistics_level=all;
select * from table(dbms_xplan.display_cursor(null,null, 'allstats last'));

--------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |
--------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |   9890 |
|*  1 |  FILTER            |      |      1 |        |   9890 |
|   2 |   TABLE ACCESS FULL| T1   |      1 |  10000 |  10000 |
|*  3 |   TABLE ACCESS FULL| T2   |  10000 |      1 |   9890 |
--------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter( IS NOT NULL)
   3 - filter((("B"."ID"=:B1 AND "B"."STATUS"='COM') OR ("B"."ID1"=:B2 AND
              "B"."STATUS"='ERR')))

If you still have any doubts about whether the two subqueries have been coalesced or not then the single apparition of table T2 in the plan together with the predicate n°3 will certainly clear out this doubt for good. Indeed, it is the subquery coalescing process that resulted into Oracle visiting table T2 only one time.

In addition in the outline data used to generate the above execution plan reproduced here below:

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('12.2.0.1')
      DB_VERSION('12.2.0.1')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$ACD206C8")
      COALESCE_SQ(@"SEL$3")
      OUTLINE_LEAF(@"SEL$1")
      OUTLINE(@"SEL$7C83FF7B")
      COALESCE_SQ(@"SEL$2")
      OUTLINE(@"SEL$3")
      OUTLINE(@"SEL$2")
      FULL(@"SEL$1" "A"@"SEL$1")
      PQ_FILTER(@"SEL$1" SERIAL)
      FULL(@"SEL$ACD206C8" "B"@"SEL$2")
      END_OUTLINE_DATA
  */

The COALESCE_SQ(@”SEL$3″) and COALESCE_SQ(@”SEL$2″)represent the hints used to force a subquery coalescing transformation. If you want to cancel such a subquery coalescing transformation then the NO_COALESCE_SQ(qb_name) hint will be the one you should use. You can also change the hidden parameter _optimizer_coalesce_subqueries governing this transformation which defaults to true in 12cR1.

3. Subquery unnesting
        3.1. The 12cR1 effect

The any-to-exist conversion followed by the subquery coalescing should have normally paved the way for subquery unnesting transformation. Unfortunately, the real life query, which the actual model is a true representation of, the coalesced subqueries have not been unnested altering dramatically the performance of the original query as shown via its corresponding sql monitor report:

-- Get the report_id corresponding to the triplet (sql_id,sql_exec_id, sql_exec_start)
 select 
      report_id
    , key1 sql_id
    , key2 sql_exec_id
    , key3 sql_exec_start
  from
     dba_hist_reports
  where
     component_name ='sqlmonitor'
  and key1 = '97r22vwap7x9t';

--  Use this report_id to get the historical SQL monitor report
select
     dbms_auto_report.report_repository_detail
     (rid   => 8934
     ,type => 'text')
  from dual;

Global Information
---------------------  
 Status                :  DONE (ALL ROWS)       
 Instance ID           :  1                     
 Session               :  skskslsls (1343:42100) 
 SQL ID                :  97r22vwap7x9t         
 SQL Execution ID      :  16777216              
 Execution Started     :  06/27/2017 11:11:35   
 First Refresh Time    :  06/27/2017 11:11:41 
 Last Refresh Time     :  06/27/2017 12:02:57 
 Duration              :  3082s                    
 Module/Action         :  xxxxxxxxxxxx          
 Service               :  xxxxxxxxx          
 Program               :  xxxxxxxxxxx                  
 Fetch Calls           :  13             

Global Stats
=================================================
| Elapsed |   Cpu   |  Other   | Fetch | Buffer |
| Time(s) | Time(s) | Waits(s) | Calls |  Gets  |
=================================================
|    3083 |    3079 |     3.67 |    13 |  294M  |
=================================================

SQL Plan Monitoring Details (Plan Hash Value=4071256796)
========================================================================
| Id |               Operation |   Name   |  Rows   | Execs |   Rows   |
|    |                         |          | (Estim) |       | (Actual) |
========================================================================
|  0 | SELECT STATEMENT        |          |         |     1 |     1010 |
|  1 |   FILTER                |          |         |     1 |     1010 |
|  2 |    TABLE ACCESS FULL    | TABLE1   |   89457 |     1 |    89457 |
|  3 |    TABLE ACCESS FULL    | TABLE2   |       1 | 89457 |     1010 |
========================================================================

The FILTER at operation n°1 acts exactly as NESTED LOOPS join would do: for each row from table TABLE1 (A-Rows=89,457) looks for rows in table TABLE2 (Starts = 89,457) that satisfy the filter predicate n°3.

This is exactly what happens when a correlated subquery is not unnested: it will be evaluated as many times as there are rows in the outer query block.

       3.2. The 11gR2 effect

The on call DBA told me that this query was performing very well before the upgrade. My immediate reflex was then to run the same query under the previous 14 OFE(Optimizer Features Enable) . Useless to say that it completed very quickly using the following execution plan:

Plan hash value: 3372386476
-------------------------------------------------------------
| Id  | Operation             | Name     | Rows  | Time     |
-------------------------------------------------------------
|   0 | SELECT STATEMENT      |          |       |          |
|*  1 |  HASH JOIN            |          |  2202 | 00:00:01 |
|   2 |   VIEW                | VW_NSO_1 |  2202 | 00:00:01 |
|   3 |    HASH UNIQUE        |          |  2202 | 00:00:01 |
|   4 |     UNION-ALL         |          |       |          |
|*  5 |      TABLE ACCESS FULL| TABLE2   |  1101 | 00:00:01 |
|*  6 |      TABLE ACCESS FULL| TABLE2   |  9890 | 00:00:01 |
|   7 |   TABLE ACCESS FULL   | TABLE1   | 89457 | 00:00:01 |
-------------------------------------------------------------

The presence of the term VW_NSO_1 at operation id n°2 in the previous plan is a sign of Oracle having unnested the two correlated subqueries in order to eliminate the prolific FILTER effect of multi-starting the TABLE2outer full scan operation.

However, the presence of TABLE2 twice in the execution plan tends to suggest that Oracle has not used the 12cR1 technic to coalesce the two correlated subqueries. Indeed in contrast to 12cR1, in 11gR2 Oracle has merged the two subqueries into a single one but without eliminating the two select against TABLE2. In the corresponding 10053 trace file we can read the following lines:

Final query after transformations:******* UNPARSED QUERY IS *******

select
  a.id1 id1,
  a.n1 n1,
  a.start_date start_date
from
  (  (select c.id1 id1 from t2 c where c.status='ERR')
    union
     (select b.id id from t2 b where b.status='COM')
  ) vw_nso_1,
  t1 a
where
  a.id1= vw_nso_1.id1;

-- 11.2.0.4
*****************************
Cost-Based Subquery Unnesting
*****************************
SU: Unnesting query blocks in query block SEL$1 (#1) that are valid to unnest.
Subquery removal for query block SEL$3 (#2)
RSW: Not valid for subquery removal SEL$3 (#2)
Subquery unchanged.
Subquery Unnesting on query block SEL$1 (#1)SU: Performing unnesting that does not require costing.
SU: Considering subquery unnest on query block SEL$1 (#1).
SU:   Checking validity of unnesting subquery SEL$2 (#3)
SU:   Checking validity of unnesting subquery SEL$3 (#2)
../..
SU:   Checking validity of unnesting subquery SET$E74BECDC (#6)
SU:   Passed validity checks.
SU:   Transform an ANY subquery to semi-join or distinct.

With the OFE set to 11.2.0.4 Oracle has transformed the two disjunctive subqueries into a single subquery in which the dijsunction has been replaced by a union operation (instead of union all because a distinct operation has to be applied for the in-list elements). The result of this transformation has been transformed into a inline view before being unnested with the outer block query:

Registered qb: SET$7FD77EFD 0x1c710c88 (SUBQ INTO VIEW FOR COMPLEX UNNEST SET$E74BECDC)

Thanks to the unnesting of the inline view ( VW_NSO_1) a more peformant and suitable hash join (at least for the real life query) has been used instead of the mutli-executed filter operation.

4. Why the unnesting has not been possible in 12cR1?

As always the 10053 trace file reproduced below (reduced to the bare minimum) gives an answer to the posed question:

-- 12.1.0.1
*****************************
Cost-Based Subquery Unnesting
*****************************
SU: Unnesting query blocks in query block SEL$1 (#1) that are valid to unnest.
Subquery Unnesting on query block SEL$1 (#1)SU: Performing unnesting that does not require costing.
SU: Considering subquery unnest on query block SEL$1 (#1).
SU: Checking validity of unnesting subquery SEL$ACD206C8 (#2)
SU: SU bypassed: Invalid correlated predicates.
SU: Validity checks failed.
*******************************

It seems that Oracle has been unable to unnest the two subqueries it has coalesced because they are using two different correlated predicates as shown below:

where
  exists
  (
    select
      0 id
    from
      t2 b
    where
      b.status  = 'COM'
    and a.id1   = b.id  --1st correlated predicate
    or b.status ='ERR'
    and a.id1   = b.id1 -- 2nd correlated predicate
  );

Had those two correlated predicates been identical the unnesting of the coalesced disjunctive correlated subqueries would have been possible as demonstrated below:

explain plan for
select
    a.id1
   ,a.n1
   ,a.start_date
from 
    t1 a
where 
   (a.id1 in
         (select b.id1 –- spot the change here
          from t2 b
          where b.status = 'COM')
   or a.id1 in
         (select c.id1
          from t2 c
          where c.status = 'ERR')
        );

select * from table(dbms_xplan.display(format =>'+outline'));

------------------------------------------------------
| Id  | Operation                    | Name  | Rows  |
------------------------------------------------------
|   0 | SELECT STATEMENT             |       |    10 |
|   1 |  NESTED LOOPS                |       |    10 |
|   2 |   NESTED LOOPS               |       |    10 |
|   3 |    SORT UNIQUE               |       |  9890 |
|*  4 |     TABLE ACCESS FULL        | T2    |  9890 |
|*  5 |    INDEX UNIQUE SCAN         | T1_PK |     1 |
|   6 |   TABLE ACCESS BY INDEX ROWID| T1    |     1 |
------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      SEMI_TO_INNER(@"SEL$82F4A621" "B"@"SEL$2")
      NLJ_BATCHING(@"SEL$82F4A621" "A"@"SEL$1")
      USE_NL(@"SEL$82F4A621" "A"@"SEL$1")
      LEADING(@"SEL$82F4A621" "B"@"SEL$2" "A"@"SEL$1")
      INDEX(@"SEL$82F4A621" "A"@"SEL$1" ("T1"."ID1"))
      FULL(@"SEL$82F4A621" "B"@"SEL$2")
      OUTLINE(@"SEL$2")
      OUTLINE(@"SEL$3")
      COALESCE_SQ(@"SEL$2")
      OUTLINE(@"SEL$7C83FF7B")
      COALESCE_SQ(@"SEL$3")
      OUTLINE(@"SEL$ACD206C8")
      OUTLINE(@"SEL$1")
      UNNEST(@"SEL$ACD206C8")
      OUTLINE_LEAF(@"SEL$82F4A621")
      ALL_ROWS
      DB_VERSION('12.2.0.1')
      OPTIMIZER_FEATURES_ENABLE('12.2.0.1')
      IGNORE_OPTIM_EMBEDDED_HINTS
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - filter("B"."STATUS"='COM' OR "B"."STATUS"='ERR')
   5 - access("A"."ID1"="B"."ID1")

Note
-----
   - this is an adaptive plan

5. Summary

The Oracle way of coalescing subqueries seems to be going back and forth between different versions. Two dijsunctive correlated subqueries have been differently coalesced in 11.2.0.4 and 12.1.0.1. While in the former they are classically transformed into a single subquery combining two branches of a union of two tables, in the later release they have been coalesced to form a single subquery accesing a single table. The 12cR1 and 12cR2 way of coalescing two correlated subqueries introduces a new requirement for their unnesting: they need to have the same correlated predicates. Thereby you should watch out carrefully your 12c queries having two (or more) correlated subqueries. If they are using different (invalid:-)) predicates they might be still coalesced but they will not be unnested. If you judge that the unnesting of your subqueries is better than using their coalesced version as a predicate of a FILTER operation then you have at your dispoable the NO_COALESCE_SQ(qb_name)or the _optimizer_coalesce_subqueries hidden parameter to cancel the subqueries coalescing transformation.

6. Bonus (more…)

Advertisements

August 5, 2017

Parallel index rebuild: Oracle versus MS-SQL Server

Filed under: Oracle — hourim @ 4:57 pm

A couple of months ago I have been prompted by this MS-SQL Server article about rebuilding indexes in parallel. The bottom line of this article is that MS-SQL Server engine will not use parallel run when rebuilding a unique index starting with a column having a single distinct value. This article will first demonstrate this claim using MS-SQL Server 2016 and then extend the same demonstration to Oracle in its latest version.

1. SQL Server

The followings will create a table and its unique three columns composite clustered index

select @@version
Microsoft SQL Server 2016 (RTM-GDR) (KB3210111) - 13.0.1728.2 (X64)  
Dec 13 2016 04:40:28   Copyright (c) Microsoft Corporation 
Developer Edition (64-bit) on Windows 8.1 6.3 <X64> (Build 9600: ) 

-- create table
create table t (c1 int, c2 int, c3 int);

set nocount on
begin tran
declare @i int = 0
while @i < 10000000
begin
    insert into t values (@i, @i, @i)
    set @i = @i + 1
end
commit tran

-- create a unique clustered index
create unique clustered index ix on t(c1,c2,c3)

Now that we are done with the model we are going to rebuild the above clustered index using parallel run

-- This will use parallel plan
ALTER INDEX ix on [dbo].t rebuild WITH (maxdop=10, online=on)

Before rebuilding the index I have activated the ”Include Actual Execution Plan” icon of SSMS (SQL Server Management Studio) so that I got the following “row source” plan where the two yellow arrows indicate that the rebuild has been, indeed, parallelized:

However has the first column c1 of the index been single valued the index would have not been rebuild parallely by the SQL Server engine as the followings demonstrate:

-- set the leading column of the clustered index to a single value
 update t set c1 = 1

-- update statistics with 100% sample
update statistics t with fullscan

–- This will use serial plan
ALTER INDEX ix on [dbo].t rebuild WITH (maxdop=10, online=on)

The execution plan of the second index rebuild show that this time MS-SQL refuses to obey the parallel hint and decides to go serially as shown below:

The MS-SQL documentation stipulates that when the leading key of the index of a non-partitioned table has a single distinct value it becomes useless to give a parallel thread a range of single value to process them. In other words the leading column should have multiple distinct values in order for parallelism to be worth a utilization.

Now that we have set the scene for MS-SQL let’s reproduce it for Oracle

2. Oracle

SQL> select banner from v$version where rownum=1;

BANNER
-----------------------------------------------------------------------------
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production

SQL> create table t (c1 int, c2 int, c3 int);

SQL> begin
       for j in 1..10000000
     loop
        insert /*+ append */ into t values (j, j, j);
     end loop;
     commit;
     end;
     /

SQL> create unique index ix on t(c1,c2,c3);

SQL> exec dbms_stats.gather_table_stats(user, 't');

SQL> select 
         column_name
         ,num_distinct
         ,histogram
    from 
        all_tab_col_statistics
    where table_name = 'T'
    order by 2;

COLUMN_NAM NUM_DISTINCT HISTOGRAM
---------- ------------ ---------------
C1              9914368 NONE
C3              9914368 NONE
C2              9914368 NONE

It’s now time to use parallelism to rebuild the above index

SQL> alter index ix rebuild parallel 4;

select
    dfo_number, tq_id, server_type, instance, process, num_rows
from
   v$pq_tqstat
order by
    dfo_number, tq_id, server_type, instance, process;

DFO_NUMBER      TQ_ID SERVER_TYPE    INSTANCE PROCESS    NUM_ROWS
---------- ---------- ------------ ---------- -------- ----------
         1          0 Consumer              1 P000        2923571
         1          0 Consumer              1 P001        2623371
         1          0 Consumer              1 P002        2270919
         1          0 Consumer              1 P003        2182139
         1          0 Producer              1 P004        2638901
         1          0 Producer              1 P005        2637316
         1          0 Producer              1 P006        2611722
         1          0 Producer              1 P007        2114251
         1          0 Ranger                1 QC               12
         1          1 Consumer              1 QC                4
         1          1 Producer              1 P000              1
         1          1 Producer              1 P001              1
         1          1 Producer              1 P002              1
         1          1 Producer              1 P003              1


Parallel Execution Details (DOP=4 , Servers Allocated=8)
===================================================================
|      Name      | Type  | Server# | Elapsed |   Cpu   |    IO    |
|                |       |         | Time(s) | Time(s) | Waits(s) |
===================================================================
| PX Coordinator | QC    |         |    0.10 |    0.02 |     0.01 |
| p000           | Set 1 |       1 |    5.53 |    4.12 |     0.02 |
| p001           | Set 1 |       2 |    5.48 |    3.60 |     0.01 |
| p002           | Set 1 |       3 |    4.87 |    3.20 |     0.03 |
| p003           | Set 1 |       4 |    4.97 |    3.03 |     0.01 |
| p004           | Set 2 |       1 |    1.05 |    1.05 |          |
| p005           | Set 2 |       2 |    1.01 |    0.83 |          |
| p006           | Set 2 |       3 |    1.04 |    1.03 |          |
| p007           | Set 2 |       4 |    0.99 |    0.84 |          |
===================================================================

SQL Plan Monitoring Details (Plan Hash Value=3277881472)
===========================================================================
| Id |         Operation          |   Name   |  Rows   | Cost |   Time    |
|    |                            |          | (Estim) |      | Active(s) |
===========================================================================
|  0 | CREATE INDEX STATEMENT     |          |         |      |         1 |
|  1 |   PX COORDINATOR           |          |         |      |         1 |
|  2 |    PX SEND QC (ORDER)      | :TQ10001 |      82 |      |         1 |
|  3 |     INDEX BUILD UNIQUE     | IX       |         |      |         4 |
|  4 |      SORT CREATE INDEX     |          |      82 |      |         6 |
|  5 |       PX RECEIVE           |          |      82 |    2 |         4 |
|  6 |        PX SEND RANGE       | :TQ10000 |      82 |    2 |         2 |
|  7 |         PX BLOCK ITERATOR  |          |      82 |    2 |         1 |
|  8 |          TABLE ACCESS FULL | T        |      82 |    2 |         1 |
===========================================================================

As expected Oracle, in accordance with MS-SQL Server, has obeyed the instruction of rebuilding the index in parallel.

Let’s now unify the distinct values of the leading column c1 of the index and then try rebuilding it parallely:

 SQL> update t set c1 = 1;

SQL> exec dbms_stats.gather_table_stats(user, 't');

SQL> select 
         column_name
         ,num_distinct
         ,histogram
      from 
        all_tab_col_statistics
      where table_name = 'T'
      order by 2;

COLUMN_NAM NUM_DISTINCT HISTOGRAM
---------- ------------ ----------
C1                    1 NONE
C3              9914368 NONE
C2              9914368 NONE

SQL> alter index ix rebuild parallel 4;

select
    dfo_number, tq_id, server_type, instance, process, num_rows
from
   v$pq_tqstat
order by
    dfo_number, tq_id, server_type, instance, process;
DFO_NUMBER      TQ_ID SERVER_TYPE     INSTANCE PROCESS     NUM_ROWS
---------- ---------- ------------- ---------- --------- ----------
         1          0 Consumer               1 P000         2923571
         1          0 Consumer               1 P001         2623371
         1          0 Consumer               1 P002         2270919
         1          0 Consumer               1 P003         2182139
         1          0 Producer               1 P004         2694213
         1          0 Producer               1 P005         2542784
         1          0 Producer               1 P006         2814202
         1          0 Producer               1 P007         1950991
         1          0 Ranger                 1 QC                12
         1          1 Consumer               1 QC                 4
         1          1 Producer               1 P000               1
         1          1 Producer               1 P001               1
         1          1 Producer               1 P002               1
         1          1 Producer               1 P003               1

Parallel Execution Details (DOP=4 , Servers Allocated=8)
===================================================================
|      Name      | Type  | Server# | Elapsed |   Cpu   |    IO    |
|                |       |         | Time(s) | Time(s) | Waits(s) |
===================================================================
| PX Coordinator | QC    |         |    0.78 |    0.02 |     0.01 |
| p000           | Set 1 |       1 |    5.16 |    3.95 |     0.01 |
| p001           | Set 1 |       2 |    5.10 |    3.56 |     0.01 |
| p002           | Set 1 |       3 |    4.09 |    3.15 |     0.01 |
| p003           | Set 1 |       4 |    4.70 |    3.23 |     0.00 |
| p004           | Set 2 |       1 |    1.38 |    1.08 |     0.02 |
| p005           | Set 2 |       2 |    1.26 |    1.23 |     0.03 |
| p006           | Set 2 |       3 |    1.32 |    1.12 |     0.03 |
| p007           | Set 2 |       4 |    1.31 |    0.99 |     0.03 |
===================================================================

SQL Plan Monitoring Details (Plan Hash Value=3277881472)
===========================================================================
| Id |         Operation          |   Name   |  Rows   | Cost |   Time    |
|    |                            |          | (Estim) |      | Active(s) |
===========================================================================
|  0 | CREATE INDEX STATEMENT     |          |         |      |         2 |
|  1 |   PX COORDINATOR           |          |         |      |         7 |
|  2 |    PX SEND QC (ORDER)      | :TQ10001 |     10M |      |         1 |
|  3 |     INDEX BUILD UNIQUE     | IX       |         |      |         4 |
|  4 |      SORT CREATE INDEX     |          |     10M |      |         5 |
|  5 |       PX RECEIVE           |          |     10M | 2397 |         3 |
|  6 |        PX SEND RANGE       | :TQ10000 |     10M | 2397 |         2 |
|  7 |         PX BLOCK ITERATOR  |          |     10M | 2397 |         1 |
|  8 |          TABLE ACCESS FULL | T        |     10M | 2397 |         1 |
===========================================================================		

In contrast to MS-SQL server which refuses to use parallelism when rebuilding a unique clustered index starting with a column having a unique distinct value, Oracle has successfully rebuild the same kind of unique index in approximatively the same execution time.

3. SUMMARY

We’ve considered two instances of the same unique composite index:

  1. A unique composite index starting with a column having a unique distinct value
  2. A unique composite index starting with a more selective column

It is interesting to see that Oracle, in contrast to MS-SQL Server, is in the capacity of using parallelism when rebuilding both type of indexes. In addition Oracle rebuild execution time of the second type of indexes is as good as that of the first type.

June 24, 2017

ORA-54032 : column to be renamed is used in a virtual column expression

Filed under: Oracle,virtual column — hourim @ 3:45 pm

This is a simple note re-explaining what this 12cR1 ORA-54032 error is as it kicked in again a couple of days ago at one of my customer sites and showing that it has definitely been solved in 12cR2.

Here’s a how to reproduce it at will:

12cR1

SQL> select banner from v$version where rownum=1;

BANNER
----------------------------------------------------------------------------
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production

SQL> create table t1(x number, y number, z number);

Table created.

SQL> alter table t1 rename column x to x_bis;

Table altered.

As you can see we renamed column x to x_bis without any issue. However if we create a column group extension out of (x_bis, y) columns we will not be able to rename neither x_bis nor y column as shown below:

SQL> SELECT
          dbms_stats.create_extended_stats
           (ownname   => user
           ,tabname   => 't1'
          ,extension => '(x_bis,y)'
           ) ext
    FROM dual;

-------------------------------
SYS_STU3L40_B_DWBMIA8PMJRGR$S0

SQL> alter table t1 rename column x_bis to x;
alter table t1 rename column x_bis to x
                             *
ERROR at line 1:
ORA-54032: column to be renamed is used in a virtual column expression

That’s exactly what happened to my client with the sole difference that the client has never explicitly created any virtual column hence its great stupefaction. I had to explain him that this is because Oracle has created, behind the scenes, an extended column group, which is nothing else than a virtual column. This creation is very probably a response to a SQL Plan Directive request.

And indeed, the column group extension I have manually created above is a virtual column created out of a combination of x_bis and y column as clearly shown by the default value of this virtual column:

SQL> select 
         column_name
        ,data_default
        ,data_length
     from
       user_tab_cols
     where
       table_name = 'T1'
     and hidden_column = 'YES';

COLUMN_NAME                    DATA_DEFAULT                      DATA_LENGTH
------------------------------ --------------------------------- -----------
SYS_STU3L40_B_DWBMIA8PMJRGR$S0 SYS_OP_COMBINED_HASH("X_BIS","Y")   22

We are unable to rename the x_bis column to x because x_bis is used in the definition of the above virtual column. But that was before the arrival of 12cR2.

12cR2

Execute the same experiment in 12cR2 and you will realize that things have changed as the followings demonstrate:

SQL> select banner from v$version where rownum=1;
BANNER
--------------------------------------------------------------------------------
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production

SQL> create table t1(x number, y number, z number);

Table T1 created.

SQL> alter table t1 rename column x to x_bis;

Table T1 altered.

  SELECT
      dbms_stats.create_extended_stats
       (ownname   => user
       ,tabname   => 't1'
       ,extension => '(x_bis,y)'
       ) ext
  FROM dual;

EXT
-------------------------------
SYS_STU3L40_B_DWBMIA8PMJRGR$S0

SQL> alter table t1 rename column x_bis to x;

Table T1 altered.

As of 12cR2 we are not anymore annoyed by the error when altering a table column used in a virtual column expression. This is because Oracle will take the alter column from the table level and extend it to the virtual column where it can be found as shown below:

SQL> select 
         column_name
        ,data_default
        ,data_length
     from
       user_tab_cols
     where
       table_name = 'T1'
     and hidden_column = 'YES';

COLUMN_NAME                    DATA_DEFAULT                   DATA_LENGTH
------------------------------ ------------------------------ -----------
SYS_STU3L40_B_DWBMIA8PMJRGR$S0 SYS_OP_COMBINED_HASH("X","Y")           22

Spot how, in the data_default value of the generated column group extension, the new modified name of the column has been changed as well.

SUMMARY

Again you know that an automatic extended column group extension, which might be created on behalf of you, generates a border corner where you are not anymore able to rename any of the columns that composed the extend virtual column group. Fortunately as of Oracle 12cR2 Oracle extends the column renaming directly to the extended virtual column and doesn’t raise the ORA-054023 anymore.

March 25, 2017

De-Correlated Lateral view: VW_DCL_mmm

Filed under: Oracle — hourim @ 5:18 pm

Oracle Cost Based Optimizer has a number of query rewrites and transformations both simple and complex, old and new. Wether you know very few of them or only the most common ones, I believe it is worth keeping yourself well updated with the new ones as it might introduce strange performance issue like the one I have encountered very recently. This new 12c transformation I want to explain here is known as De-Correlated Lateral view. It appears in execution plans as VW_DCL_mmm. This article examines this transformation in the context of a real life production query.

A critical query, executed several hundred of thousands of times within a two hours batch job consumes 67K of Logical I/O per execution. It made the job systematically fail with a time out error. The average execution time of this query being less than one second, I ruled out the possibility of using SQL monitoring report to spot where most of these Logical I/O are coming from. Hopefully the high number of times this query is launched during the batch job makes its activity sampled into ASH. As such, via the following simple query I knew exactly at which operations of the corresponding execution plan this query is spending most of its time and resource:

select
   sql_plan_line_id
  ,count(1)
from
   gv$active_session_history
where
 sample_time between to_date('22032017 23:00:18', 'ddmmyyyy hh24:mi:ss')
             and     to_date('22032017 23:50:18', 'ddmmyyyy hh24:mi:ss')
and
 sql_id = '94jkz74mvwmbj'
group by
   sql_plan_line_id
order by 2 desc;

SQL_PLAN_LINE_ID   COUNT(1)
---------------- ----------
              48         41
              47         9
	          11         8
              30         1
SQL> select * from table(dbms_xplan.display_curosr('94jkz74mvwmbj'));                                                                              
----------------------------------------------------------------          
| Id  | Operation                     | Name            | Rows  |                
-----------------------------------------------------------------                                                                                   
|*  11|  HASH JOIN OUTER              |                 |      3|     
|   12|  NESTED LOOP OUTER            |                 |      3|                                           
|   33|   TABLE ACCESS BY INDEX ROWID | T1              |      1|  
|*  34|   INDEX UNIQUE SCAN           | T1_PK           |      1|           
|   47|  VIEW                         | VW_DCL_52812513 |  5340K|                                           
|*  48|    TABLE ACCESS FULL          | T2              |  5340K|                                           
-----------------------------------------------------------------         

Reduced to the bare minimum, the above execution plan is the result of a left outer join between table T1 and table T2 using a primary-foreign key relationship (predicate n°11) and a mix of an OR and an AND predicate applied on table T2 (operation n° 48). We will see later in this article that the two bolded words above represent the key words of a Lateral View.

As per regards to the real life query I was practically sure that a NESTED LOOP with table T1 as the outer row source probing the inner row source via an existing foreign key index will end up by filtering table T2 using the OR and the AND predicate but only on the rows that survived the primary-foreign key join.

But what the heck is this VW_DCL_52812513 view? It is useless to say that I didn’t find any related information both using Google and MyOracle Support.

After a couple of hour of investigation at the client site I ended up finding a work around and pushed an urgent fix into PRODUCTION. Notice with me how the fix has transformed the statistics of this query and its batch job:

SQL> @sqlstats.sql 
Enter value for sql_id:  94jkz74mvwmbj

CHILD_NUMBER SQL_PROFILE                   PLAN_HASH_VALUE  AVG_GETS  EXECS
----------- ------------------------------ --------------- ---------- ------
4                                           1521027974       66288     1976  → old execution plan
6           PROFILE_94jkz74mvwmbj_MANUAL     553415384          90    10092  → new execution plan

In order to definitely understand this new transformation and share it with you I’ve engineered the following model:

Setting the Scenes

create table t1
as
 with generator as (
   select --+ materialize
      rownum id
   from dual
   connect by level <= 1000
)
select
    rownum id1,
    mod(rownum-1,2) flag1,
    mod(rownum-1,3) flag2,
    rownum          n1,
    lpad(rownum,30) v1
from
   generator v1,
   generator v2
where
   rownum <= 1e4;
   
alter table t1 add constraint t1_pk primary key (id1);

create table t2
as 
 select
    level id1
   ,trunc((rownum+2)/2) product_t1
   ,date '2012-06-07' + mod((level-1)*5,10) + interval '5' minute start_date
   ,date '2012-06-08' + mod((level-1)*5,10) + interval '5' minute end_date
   ,rpad('xx',10) padding
from
   dual
connect by level <=1e4;

alter table t2 add constraint t2_pk primary key (id1);
alter table t2 add constraint t2_t1_fk foreign key (product_t1) references t1(id1);

-- creating an index covering the FK deadlock threat and other business requirements
create index idx_t2_usr_1 on t2(product_t1, start_date);

Here’s below the query (and its execution plan) with which I have reproduced the interesting part of the client’s real life query:

explain plan for
select
   t1.*
  ,t2.*
from
   t1
left outer join
   t2
on
  (t1.id1 = t2.product_t1
  and
     (t2.start_date <= :1 and t2.end_date >= :2
        or (t2.id1 > 100)
     )
  );

SQL> select * from table(dbms_xplan.display);                                                                                                  
-------------------------------------------------------                                           
| Id  | Operation           | Name            | Rows  |                                           
------------------------------------------------------                                           
|   0 | SELECT STATEMENT    |                 | 14925 |                                           
|*  1 |  HASH JOIN OUTER    |                 | 14925 |                                             
|   2 |   TABLE ACCESS FULL | T1              | 10000 |                                           
|   3 |   VIEW              | VW_DCL_C83A7ED5 |  9926 |                                           
|*  4 |    TABLE ACCESS FULL| T2              |  9926 |                                           
------------------------------------------------------                                           
                                                                                                                                  
Predicate Information (identified by operation id):                                                                               
---------------------------------------------------                                                                                                                                                                                         
   1 - access("T1"."ID1"="ITEM_2"(+))                                                                                             
   4 - filter("T2"."ID1">100 OR "T2"."START_DATE"<=:1 
         AND "T2"."END_DATE">=:2) 

Spot again how the predicate part can be so crucial to indicate the way you should pursue when troubleshooting query performance issues:

Predicate Information (identified by operation id):                                                                               
--------------------------------------                                                                                                                                                                                        
   1 - access("T1"."ID1"="ITEM_2"(+))                

While I recognize the (+) symbol as the Oracle way of re-architecting and ANSI outer join, I admit that I have no idea what the heck is this ITEM_2 in predicate n°1?

When all else fails then a 10053 trace file might help

alter session set tracefile_identifier='VW_DCL_MHO_XXXX';
alter session set events '10053 trace name context forever, level 1';
explain plan for
select
   t1.*
  ,t2.*
from
   t1
left outer join
   t2
on
  (t1.id1 = t2.product_t1
  and
     (t2.start_date <= :1 and t2.end_date >= :2
        or (t2.id1 > 100)
     )
  );
alter session set events '10053 trace name context off';           

Digging a little bit into the generated trace file I found that Oracle has transformed the original query into the following one:

SELECT 
  t1.id1 id1,
  t1.flag1 flag1,
  t1.flag2 flag2,
  t1.n1 n1,
  t1.v1 v1,
  vw_dcl_1b0973d4.item_1 id1,
  vw_dcl_1b0973d4.item_2 product_t1,
  vw_dcl_1b0973d4.item_3 start_date,
  vw_dcl_1b0973d4.item_4 end_date,
  vw_dcl_1b0973d4.item_5 padding
FROM c##mhouri.t1 t1,
  (SELECT 
    t2.id1 item_1_0,
    t2.product_t1 item_2_1,
    t2.start_date item_3_2,
    t2.end_date item_4_3,
    t2.padding item_5_4
  FROM c##mhouri.t2 t2
  WHERE t2.start_date<=:b1 AND t2.end_date >=:b2
  OR t2.id1           >100
  ) VW_DCL_1B0973D4
WHERE t1.id1=VW_DCL_1B0973D4.item_2(+) 

Simply put Oracle did two things:
1. Created a Lateral view
2. De-Correlated this lateral view by excluding the join predicate with table T1 from the Lateral view

This is clearly backed up by the following lines in the same trace file

DCL: Checking validity of lateral view decorrelation SEL$BCD4421C (#1)
DCL: Passed decorrelation validity for lateral view block SEL$BCD4421C (#1)
DCL: Decorrelation of lateral view query block SEL$BCD4421C (#1).
Registered qb: SEL$6226B99A 0x693ad4d0 (VIEW DECORRELATED SEL$BCD4421C; SEL$BCD4421C)

Since I have finally succeeded to understand what this transformation is I knew what else I have to do:

explain plan for
select /*+ optimizer_features_enable('11.2.0.4') */
   t1.*
  ,t2.*
from
   t1
left outer join
   t2
on
  (t1.id1 = t2.product_t1
  and
     (t2.start_date <= :1 and t2.end_date >= :2
        or (t2.id1 > 100)
     )
  );

SQL> select * from table(dbms_xplan.display);                                                                                                                                                                                                       
--------------------------------------------------------------                       
| Id  | Operation                     | Name         | Rows  |                                   
--------------------------------------------------------------                                   
|   0 | SELECT STATEMENT              |              | 10000 |                                   
|   1 |  NESTED LOOPS OUTER           |              | 10000 |                                   
|   2 |   TABLE ACCESS FULL           | T1           | 10000 |                                   
|   3 |   VIEW                        |              |     1 |                                   
|*  4 |    TABLE ACCESS BY INDEX ROWID| T2           |     1 |                                   
|*  5 |     INDEX RANGE SCAN          | IDX_T2_USR_1 |     1 |                                   
--------------------------------------------------------------                                   
                                                                                                                                  
Predicate Information (identified by operation id):                                                                               
---------------------------------------------------                                                                                                                                                                                                           
   4 - filter("T2"."ID1">100 OR "T2"."START_DATE"<=:1 AND "T2"."END_DATE">=:2)                                                    
   5 - access("T1"."ID1"="T2"."PRODUCT_T1")                   

This exactly what I was expecting at the beginning of my investigations. Annihilating the effect of the de-correlated lateral view will open a much better path at least for my real life query case.

Summary

As of now I hope that this article can help identifying what a VW_DCL_mmm transformation is. I hope as well that google will hist this article when asked about this particular transformation.

March 18, 2017

12cR2: SPM and cursor bind awareness property

Filed under: Oracle — hourim @ 7:24 am

In the previous article we knew that, as of Oracle 12cR2, it is now possible to capture SPM baselined plans from AWR historical tables. In this article we are going to see how in 12cR2 a cursor will immediately stop to be bind aware when it is protected by a single SPM plan. The basic idea driving this new implementation is that, since the Adaptive Cursor Sharing main goal is to generate multiple optimal execution plans, if you decide to constrain it with a single SPM plan, you are implicitly asking Oracle to stop generating multiple execution plans.

Let’s demonstrate this new ACS-SPM relationship using the same model as that of the preceding article

Setting the Scenes

create table t_acs(n1 number, n2 number);

BEGIN
 for j in 1..1200150 loop
  if j = 1 then
    insert into t_acs values (j, 1);
  elsif j>1 and j<=101 then
    insert into t_acs values(j, 100);
  elsif j>101 and j<=1101 then
    insert into t_acs values (j, 1000);
  elsif j>10001 and j<= 110001 then
    insert into t_acs values(j,10000); 
  else
    insert into t_acs values(j, 1000000); end if;
 end loop; 
commit;
END;
/

create index t_acs_i1 on t_acs(n2);

BEGIN 
  dbms_stats.gather_table_stats
     (user
     ,'t_acs'
     ,method_opt => 'for all columns size skewonly' 
     ,cascade => true
     ,estimate_percent => dbms_stats.auto_sample_size );
END; 
/

var ln2 number;
exec :ln2 := 100
select count(1) from t_acs where n2 = :ln2;

exec :ln2 := 1e6
select count(1) from t_acs where n2 = :ln2;
select count(1) from t_acs where n2 = :ln2;

exec :ln2 := 100
select count(1) from t_acs where n2 = :ln2;


Observation

The execution of the above script will end up by producing two bind sensitive, bind aware and shareable cursors as shown below:

select 
    sql_id
  ,child_number
  ,is_bind_sensitive 
  ,is_bind_aware
  ,executions
from
   v$sql 
where
   sql_id = 'f2pmwazy1rnfd'
and is_shareable = 'Y';

SQL_ID        CHILD_NUMBER I I EXECUTIONS
------------- ------------ - - ----------
f2pmwazy1rnfd            1 Y Y          1 → full scan plan
f2pmwazy1rnfd            2 Y Y          1 → index range scan plan

So far so good.
What if I decide to create a SPM execution plan in order to constrain the above query to use always the index range scan plan?

SQL> alter session set optimizer_capture_sql_plan_baselines = TRUE;

SQL> select count(1) from t_acs where n2 = :ln2;

  COUNT(1)
----------
       100

SQL> select count(1) from t_acs where n2 = :ln2;

  COUNT(1)
----------
       100

SQL> alter session set optimizer_capture_sql_plan_baselines = FALSE;

SQL> col plan_name format a40
SQL> select 
        plan_name ,origin, accepted, enabled
     from dba_sql_plan_baselines;

PLAN_NAME                        ORIGIN          ACC ENA
------------------------------- ---------------- --- ---
SQL_PLAN_fn4mhg52jx5z125348c47  AUTO-CAPTURE     YES YES

As such the next time I will run this query it will use the index range scan SPM baselined plan as shown below:

SQL> select count(1) from t_acs where n2 = :ln2;

SQL> select * from table(dbms_xplan.display_cursor);

SQL_ID  f2pmwazy1rnfd, child number 1                                                                                                                                              
-------------------------------------                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
Plan hash value: 1882749816                                                                                                                                                                                                                                                                                                                                    
----------------------------------------------                                                                                                    
| Id  | Operation         | Name     | Rows  |                                                                                                   
----------------------------------------------                                                                                                    
|   0 | SELECT STATEMENT  |          |       |                                                                                                    
|   1 |  SORT AGGREGATE   |          |     1 |                                                                                                     
|*  2 |   INDEX RANGE SCAN| T_ACS_I1 |   100 |                                                                                                  
----------------------------------------------                                                                                                    
                                                                                                                                                                                   
Predicate Information (identified by operation id):                                                                                                                                
---------------------------------------------------                                                                                                                                                                                                                                                                                              
   2 - access("N2"=:LN2)                                                                                                                                                           
                                                                                                                                                                                   
Note                                                                                                                                                                               
-----                                                                                                                                                                              
- SQL plan baseline SQL_PLAN_fn4mhg52jx5z125348c47 used for this statement  

But what you might ignore is that, following the creation of this SPM baselined plan, Oracle did something behind the scene as shown via the following cursor new situation:

select 
    sql_id
  ,child_number
  ,is_bind_sensitive 
  ,is_bind_aware
  ,executions
from
   v$sql 
where
   sql_id = 'f2pmwazy1rnfd'
and is_shareable = 'Y';

SQL_ID        CHILD_NUMBER I I EXECUTIONS
------------- ------------ - - ----------
f2pmwazy1rnfd            1 N N          1

If you analyze carefully the new situation compared to the old one you will realize that the following actions have been done behind the scenes

1. The previous child cursors n°1 and n°2 has been flushed out.
2. A new no bind sensitive and no bind aware child cursor n°1 has been created

It is the number of execution (only 1) of child cursor n°1 which clearly explains that the previous two bind aware cursors have been flushed out from memory due to the new SPM baselined plan. The child cursor n°0 is still in the shared pool but is in a non shareable status.

SQL> select 
        sql_id
      ,child_number
      ,is_bind_sensitive 
      ,is_bind_aware
      ,is_shareable
      ,executions
    from
       v$sql 
   where
      sql_id = 'f2pmwazy1rnfd';

SQL_ID        CHILD_NUMBER I I I EXECUTIONS
------------- ------------ - - - ----------
f2pmwazy1rnfd            0 Y N N          2 → non shareable
f2pmwazy1rnfd            1 N N Y          1

That’s the new 12cR2 implementation managing the relationship between ACS and SPM. ACS is disabled when it is constrained by a single accepted and enabled SPM plan.

If we disable the previous SPM baseline the cursor becomes bind aware after a warm up period as usual

SQL> @disSPM.sql 
Enter value for plan_name: SQL_PLAN_fn4mhg52jx5z125348c47

SQL> select 
          plan_name ,origin, enabled
      from dba_sql_plan_baselines;

PLAN_NAME                       ORIGIN       ENA
------------------------------- ------------ ---
SQL_PLAN_fn4mhg52jx5z125348c47  AUTO-CAPTURE NO 

exec :ln2 :=1e6
select count(1) from t_acs where n2 = :ln2;

exec :ln2 := 100
select count(1) from t_acs where n2 = :ln2;

etc…./...

SQL> select 
        sql_id
      ,child_number
      ,is_bind_sensitive 
      ,is_bind_aware
      ,is_shareable
      ,executions
    from
       v$sql 
   where
      sql_id = 'f2pmwazy1rnfd'
  ;

SQL_ID        CHILD_NUMBER I I I EXECUTIONS
------------- ------------ - - - ----------
f2pmwazy1rnfd            0 Y N N          2
f2pmwazy1rnfd            1 Y N N          3
f2pmwazy1rnfd            2 Y Y Y          1 → bind aware full scan
f2pmwazy1rnfd            3 Y Y Y          1 → bind aware range scan

Summary

As of Oracle 12cR2 when an ACS cursor having multiple optimal execution plans is constrained by a single enabled and accepted SPM plan, Oracle will age it out from memory and cancel its bind sensitive and bind aware properties. This will remain intact until the SPM is dropped or disabled. In the next article we will see what happens to an ACS cursor when one of its multiple accepted and enabled SPM plan is rendered unreproducible.

March 17, 2017

12cR2 : Capturing SPM plans from AWR

Filed under: Oracle,Sql Plan Managment — hourim @ 6:45 pm

A couple of years ago I set myself a best practice goal of preferring SPM baselines over SQL-Profiles. I must recognize that I failed to achieve this goal. Fortunately as of Oracle 12cR2 it becomes possible to load SPM baselined plans directly from AWR tables. This is why I am now unexcused to do not operate the switch.

Let’s see first how we can load SPM plans using AWR historical tables

Setting the Scenes

create table t_acs(n1 number, n2 number);

BEGIN
 for j in 1..1200150 loop
  if j = 1 then
    insert into t_acs values (j, 1);
  elsif j>1 and j<=101 then
    insert into t_acs values(j, 100);
  elsif j>101 and j<=1101 then
    insert into t_acs values (j, 1000);
  elsif j>10001 and j<= 110001 then
    insert into t_acs values(j,10000); 
  else
    insert into t_acs values(j, 1000000); end if;
 end loop; 
commit;
END;
/

create index t_acs_i1 on t_acs(n2);

BEGIN 
  dbms_stats.gather_table_stats
     (user
     ,'t_acs'
     ,method_opt => 'for all columns size skewonly' 
     ,cascade => true
     ,estimate_percent => dbms_stats.auto_sample_size );
END; 
/

exec dbms_workload_repository.create_snapshot;
var ln2 number;
exec :ln2 := 100
select count(1) from t_acs where n2 = :ln2;

exec :ln2 := 1e6
select count(1) from t_acs where n2 = :ln2;
select count(1) from t_acs where n2 = :ln2;

exec :ln2 := 100
select count(1) from t_acs where n2 = :ln2;

exec dbms_workload_repository.create_snapshot;

If you copy and past the above SQL code into a SQL PLUS session and issue the following select you should find that you have already two bind aware cursors:

select 
    sql_id
  ,child_number
  ,is_bind_sensitive 
  ,is_bind_aware
  ,executions
from
   v$sql 
where
   sql_id = 'f2pmwazy1rnfd'
and is_shareable = 'Y';

SQL_ID        CHILD_NUMBER I I EXECUTIONS
------------- ------------ - - ----------
f2pmwazy1rnfd            1 Y Y          1
f2pmwazy1rnfd            2 Y Y          1

The bind awareness property of the above cursor has nothing to do with the bottom line of this article. It is here just because I will use the same model in my next article where this time this particular property becomes necessary.

If I would have decided to create a SQL profile over the above cursor I would have then opted for Carlos Sierra coe_xfr_sql_profile.sql script as shown below:

SQL> @coe_xfr_sql_profile.sql

Parameter 1:
SQL_ID (required)

Enter value for 1: f2pmwazy1rnfd


PLAN_HASH_VALUE AVG_ET_SECS
--------------- -----------
      535703726 ,054       
     1882749816 ,085       

Parameter 2:
PLAN_HASH_VALUE (required)

Enter value for 2: 1882749816

Values passed:
~~~~~~~~~~~~~
SQL_ID         : "f2pmwazy1rnfd"
PLAN_HASH_VALUE: "1882749816"

Execute coe_xfr_sql_profile_f2pmwazy1rnfd_1882749816.sql
on TARGET system in order to create a custom SQL Profile
with plan 1882749816 linked to adjusted sql_text.

SQL>@coe_xfr_sql_profile_f2pmwazy1rnfd_1882749816.sql
... manual custom SQL Profile has been created

COE_XFR_SQL_PROFILE_f2pmwazy1rnfd_1882749816 completed

As such the next time I will run this query it will use the fixed SQL Profile as shown below:

SQL> select count(1) from t_acs where n2 = :ln2;

SQL> select * from table(dbms_xplan.display_cursor);

Plan hash value: 1882749816                                                     
                                                                                
------------------------------------------------------ 
| Id  | Operation         | Name     | Rows  | Bytes |  
------------------------------------------------------ 
|   0 | SELECT STATEMENT  |          |       |       |  
|   1 |  SORT AGGREGATE   |          |     1 |     3 |   
|*  2 |   INDEX RANGE SCAN| T_ACS_I1 |   100 |   300 |  
------------------------------------------------------
                                                                                
Predicate Information (identified by operation id):                             
---------------------------------------------------                                                                                         
   2 - access("N2"=:LN2)                                                        
                                                                                
Note                                                                            
-----                                                                           
- SQL profile coe_f2pmwazy1rnfd_1882749816 used for this statement

However until the arrival of 12cR2 it was impossible to load a SPM plan baseline for the above cursor using corresponding AWR historical execution plan. Hopefully it is now possible. This is below how to capture SPM plan from AWR:

SQL>@LoadSPMfromAwr.sql
Listing latest AWR snapshots ...

   SNAP_ID END_INTERVAL_TIME          
---------- ---------------------------
        14 08/03/17 13:20:09,251000000
        15 08/03/17 14:00:13,233000000
        16 08/03/17 15:07:46,465000000
        17 09/03/17 01:13:41,092000000
        18 09/03/17 12:11:26,748000000
        19 10/03/17 01:07:36,836000000
        20 10/03/17 10:08:54,214000000
        21 10/03/17 12:35:26,590000000
        22 11/03/17 01:04:40,947000000
        23 12/03/17 12:41:35,578000000
        24 12/03/17 15:03:55,730000000
        25 13/03/17 02:21:01,517000000
        26 13/03/17 12:53:22,204000000
        27 13/03/17 15:43:46,522000000
        28 14/03/17 13:13:07,716000000
        29 15/03/17 12:55:54,089000000
        30 16/03/17 12:59:39,201000000
        31 17/03/17 02:09:14,047000000
        32 17/03/17 10:38:33,520000000
        33 17/03/17 12:50:59,072000000

20 rows selected. 

Enter begin snapshot id: 14
Enter end   snapshot id: 27
Enter value for sql_filter: sql_text like ''select count(1) from t_acs%''

SQL> select 
      plan_name ,origin
    from dba_sql_plan_baselines;

PLAN_NAME                                ORIGIN                       
---------------------------------------- ---------------------
SQL_PLAN_fn4mhg52jx5z125348c47           MANUAL-LOAD-FROM-AWR         
SQL_PLAN_fn4mhg52jx5z13069e6f9           MANUAL-LOAD-FROM-AWR   


SQL> select count(1) from t_acs where n2 = :ln2;

SQL> select * from table(dbms_xplan.display_cursor);


SQL_ID  f2pmwazy1rnfd, child number 1
-------------------------------------
select count(1) from t_acs where n2 = :ln2
 
Plan hash value: 1882749816
 
------------------------------------------------------
| Id  | Operation         | Name     | Rows  | Bytes | 
------------------------------------------------------
|   0 | SELECT STATEMENT  |          |       |       |
|   1 |  SORT AGGREGATE   |          |     1 |     3 | 
|*  2 |   INDEX RANGE SCAN| T_ACS_I1 |   100 |   300 | 
-------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("N2"=:LN2)
 
Note
-----
   - SQL profile coe_f2pmwazy1rnfd_1882749816 used for this statement
   - SQL plan baseline SQL_PLAN_fn4mhg52jx5z125348c47 used for this statement

You have to choose the snap interval that includes historical details of your sql_id

Here’s below the content of the script I used to capture SPM plan from AWR

PROMPT Listing latest AWR snapshots ...
SELECT snap_id, end_interval_time 
FROM dba_hist_snapshot 
WHERE end_interval_time > SYSDATE - 30
ORDER BY end_interval_time;

ACCEPT bsnapid NUMBER PROMPT "Enter begin snapshot id: "
ACCEPT esnapid NUMBER PROMPT "Enter end   snapshot id: "

SET TERMOUT OFF PAGESIZE 0 HEADING OFF LINESIZE 1000 TRIMSPOOL ON TRIMOUT ON TAB OFF

declare
  rs pls_integer;
begin
  rs := dbms_spm.load_plans_from_awr('&bsnapid', '&esnapid', '&sql_filter');
end;
/

SET TERMOUT ON PAGESIZE 5000 HEADING ON

October 1, 2016

SQL Profile: or when the hint exceeds 500 bytes

Filed under: Oracle — hourim @ 5:13 pm

A couple of days ago I have been in a weird situation following a 12c upgrade:

  • a critical query (46h7mfaac03yv) started to perform very poorly impacting the overall response time of a critical report
  • there were no ”good” execution plans in the historical execution of this query to use for fixing a SQL Profile.
  • The current execution plan has more than a hundred of operations making the issue very hard to solve very quickly
  • the real time sql monitoring report shows several parts of the plan contributing to the alteration of the response time.

One thing I have the good habit to do in many situations like the one exposed here is to check the historical execution plans of the same query in TEST environment. I also sometimes backup outlines of critical queries into a dedicated windows directory to use them if the need arises. Hopefully one of the TEST environment contains an execution plans with very good average elapsed time. Having this plan at my disposal, I used the following strategy to fix the good plan for the bad performing query:

  • I took the outline of the TEST execution plan put it into the sql_text of the PRODUCTION sql_id
  • I used the real time sql monitoring report to fill up the corresponding bind variables values
  • I opened a SQLPlus session in PRODUCTION and executed the new hinted query

And as expected the hinted query identified by the tandem(sql_id: 3ts967mzugyxw, child number:0) completes in few seconds. All what remains to do before announcing the good news for the client was to use a custom sql script with which I will transfer the execution plan of the hinted query to the production non hinted one. Something resembling to this:

create table t1 as select rownum n1 from dual connect by level <=1e2;
select count(1) from t1 where n1 <= 5;

---------------------------------------
SQL_ID  addzft9frsckw, child number 0
-------------------------------------
select count(1) from t1 where n1 <= 5

Plan hash value: 3724264953
-------------------------------------------
| Id  | Operation          | Name | Rows  |
-------------------------------------------
|   0 | SELECT STATEMENT   |      |       |
|   1 |  SORT AGGREGATE    |      |     1 |
|*  2 |   TABLE ACCESS FULL| T1   |     5 |
-------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 – filter("N1"<=5)


SELECT count(1) from t1 where n1 <= 5;

----------------------------------------
SQL_ID  2w9a295mxcjgx, child number 0
-------------------------------------
SELECT count(1) from t1 where n1 <= 5

Plan hash value: 3724264953
-------------------------------------------
| Id  | Operation          | Name | Rows  |
-------------------------------------------
|   0 | SELECT STATEMENT   |      |       |
|   1 |  SORT AGGREGATE    |      |     1 |
|*  2 |   TABLE ACCESS FULL| T1   |     5 |
-------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("N1"<=5)   

Creating a SQL profile for the first sql_id using the metadata of the second one is accomplished by means of the following call:

SQL> @fixProfilefromCache
Enter value for sql_id_from: 2w9a295mxcjgx
Enter value for child_no_from: 0
Enter value for sql_id_to: addzft9frsckw
Enter value for sql_id_to: addzft9frsckw
Enter value for sql_id_to: addzft9frsckw

PL/SQL procedure successfully completed.
select count(1) from t1 where n1 <= 5;

----------------------------------------
SQL_ID  addzft9frsckw, child number 0
-------------------------------------
select count(1) from t1 where n1 <= 5

Plan hash value: 3724264953
-------------------------------------------
| Id  | Operation          | Name | Rows  |
-------------------------------------------
|   0 | SELECT STATEMENT   |      |       |
|   1 |  SORT AGGREGATE    |      |     1 |
|*  2 |   TABLE ACCESS FULL| T1   |     5 |
-------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("N1"<=5)

Note
-----
   - SQL profile profile_addzft9frsckw_attach used for this statement
 

Unfortunately there are situations where you will stop to be lucky. Look to that weird situation I have been faced to when I applied the same script for the real life query:

SQL> @fixProfilefromCache
Enter value for sql_id_from: 3ts967mzugyxw
Enter value for child_no_from: 0
Enter value for sql_id_to: 46h7mfaac03yv
Enter value for sql_id_to: 46h7mfaac03yv
Enter value for sql_id_to: 46h7mfaac03yv
declare
*
ERROR at line 1:
ORA-05602: PL/SQL: numeric or value error: Bulk Bind: Truncated Bind
ORA-06512: at line 11

Few minutes of PL/SQL investigations reveal that the above error occurs around the following piece of code in the FixProfilefromCache.sql script

declare
   ar_profile_hints sys.sqlprof_attr;
   cl_sql_text clob;
begin
   select
      extractvalue(value(d), '/hint') as outline_hints
         bulk collect into ar_profile_hints
   from
      xmltable('/*/outline_data/hint'
         passing (
            select
               xmltype(other_xml) as xmlval
            from
               gv$sql_plan
            where
               sql_id = '&sql_id_from'
               and child_number = &child_no_from
               and other_xml is not null
			   and rownum =1
         )
      ) d;

And, to be more precise, exactly at this line

   select
      extractvalue(value(d), '/hint') as outline_hints
        bulk collect into ar_profile_hints

What is the definition of the sys.sqlprof_attr object type I am using very often without being preoccupied by its data type so far?

   desc SQLPROF_ATTR
        SQLPROF_ATTR VARRAY(2000) OF VARCHAR2(500)

It’s a list of strings not allowed to exceed 500 bytes each. Is this meaning that one of my real life query outline hints exceeded 500 bytes? Let’s check:

  select
    substr(outline_hints,1,45) outline_hints
   ,outline_hints_length
from
 (
   select
      extractvalue(value(d), '/hint') as outline_hints  
     ,length(extractvalue(value(d), '/hint')) as outline_hints_length
   from
      xmltable('/*/outline_data/hint'
         passing (
            select
               xmltype(other_xml) as xmlval
            from
               gv$sql_plan
            where
               sql_id = '3ts967mzugyxw'
               and child_number = 0
               and other_xml is not null
			   and rownum =1
         )
      ) d
  order by outline_hints_length desc
  )
where rownum <= 1;

OUTLINE_HINTS                                 OUTLINE_HINTS_LENGTH
--------------------------------------------- --------------------
USE_CONCAT(@"SEL$C59E9DD6" 8 OR_PREDICATES(3)                  508

That’s it. One of the outline hints exceeds the 500 bytes upper limit imposed by the sys.sqlprof_attr type. Don’t try to create your proper sys.sqlprof_attr type allowing to store more than 500 bytes. The import_sql_profile procedure of the dbms_sqltune package doesn’t allow a parameter with a different data type:

 PROCEDURE import_sql_profile(
                   sql_text      IN CLOB,
                   profile       IN sqlprof_attr,
                   name          IN VARCHAR2 := NULL,
                   description   IN VARCHAR2 := NULL,
                   category      IN VARCHAR2 := NULL,
                   validate      IN BOOLEAN  := TRUE,
                   replace       IN BOOLEAN  := FALSE,
                   force_match   IN BOOLEAN  := FALSE);

Hopefully it seems that Oracle has already foreseen this kind of situation and has overloaded the above procedure in order to accept the hint as a CLOB data type via the proxfile_xml parameter:

PROCEDURE import_sql_profile(
                   sql_text      IN CLOB,
                   profile_xml   IN CLOB,
                   name          IN VARCHAR2 := NULL,
                   description   IN VARCHAR2 := NULL,
                   category      IN VARCHAR2 := NULL,
                   validate      IN BOOLEAN  := TRUE,
                   replace       IN BOOLEAN  := FALSE,
                   force_match   IN BOOLEAN  := FALSE);

Finally I have created a new script which I have named fixprofilexmlfromcache and which I have successfully used to transfer the good plan to the sql_id of the real life query as shown below:

SQL> @FixProfileFromXmlFromCache
Enter value for sql_id_from: 3ts967mzugyxw
Enter value for child_no_from: 0
Enter value for sql_id_to: 46h7mfaac03yv
Enter value for sql_id_to: 46h7mfaac03yv
Enter value for sql_id_to: 46h7mfaac03yv

PL/SQL procedure successfully completed.

SUMMARY

While it is extremely rare to use a SQL Profile with hints exceeding 500 bytes it is however good to know that there is an alternative to overcome this limit by using the second overloaded import_sql_profile procedure of the dbms_sqltune package which accepts the outline hints as a CLOB instead of a varray of 500 bytes.

September 27, 2016

TEMP Table transformation and PQ_SLAVE_mismatch

Filed under: Oracle — hourim @ 7:20 pm

A SQL Profiled query opted for a different execution plan despite the Note at the bottom of its execution plan indicating that a SQL Profile has been used. The new plan makes the query failing with the classical parallel query error due to a lack of TEMP space following a massive parallel broadcast distribution

A SQL Profiled query opted for a different execution plan despite the Note at the bottom of its execution plan indicating that a SQL Profile has been used. The new plan makes the query failing with the classical parallel query error due to a lack of TEMP space following a massive parallel broadcast distribution

ORA-12801: error signaled in parallel query server P013
ORA-01652: unable to extend temp segment by 128 in tablespace TEMP

Initially this query fails when it reached 32GB of TEMP space. When waked up, the on call DBA augmented the TEMP space but, unfortunately, the next run failed as well when it reached 137GB of TEMP space.

When it was my turn to investigate this issue, instead of continuing enlarging the TEMP tablespace, or changing the parallel distribution method, I decided to figure out why Oracle is refusing to use the SQL Profile and was compiling a new plan practically at each execution. Applying Tanel Poder nonshared script to the sql_id of the real world query gives this:

SQL> @nonshared 1b7g55gx40k79
Show why existing SQL child cursors were not reused (V$SQL_SHARED_CURSOR)...

INST_ID			              : 1
SQL_ID                        : gx28sa7z20btn
ADDRESS                       : 00007FF92115E830
CHILD_ADDRESS                 : 00007FF92115E6B0
CHILD_NUMBER                  : 1
REASON                        : <reason>PQ Slave mismatch(5)</reason>

-----------------
INST_ID			              : 1
SQL_ID                        : gx28sa7z20btn
ADDRESS                       : 00007FF92115E830
CHILD_ADDRESS                 : 00007FF92115E6B0
CHILD_NUMBER                  : 2
REASON                        : <reason>PQ Slave mismatch(5)</reason>
PQ_SLAVE_MISMATCH             :  Y

-----------------
INST_ID			              : 1
SQL_ID                        : gx28sa7z20btn
ADDRESS                       : 00007FF92115E830
CHILD_ADDRESS                 : 00007FF92115E6B0
CHILD_NUMBER                  : 3
REASON                        : <reason>PQ Slave mismatch(5)</reason>
PQ_SLAVE_MISMATCH             :  Y

-----------------
INST_ID			              : 1
SQL_ID                        : gx28sa7z20btn
ADDRESS                       : 00007FF92115E830
CHILD_ADDRESS                 : 00007FF92115E6B0
CHILD_NUMBER                  : 4
REASON                        : <reason>PQ Slave mismatch(5)</reason>
PQ_SLAVE_MISMATCH             :  Y
-----------------
INST_ID			              : 1
SQL_ID                        : gx28sa7z20btn
ADDRESS                       : 00007FF92115E830
CHILD_ADDRESS                 : 00007FF92115E6B0
CHILD_NUMBER                  : 5
REASON                        : <reason>PQ Slave mismatch(5)</reason>
PQ_SLAVE_MISMATCH             :  Y
-----------------

What does this PQ_SLAVE_MISMATCH non sharing reason mean?
Oracle defines it as

(Y|N) Top-level slave decides not to share cursor

This very short definition seems indicating that a parallel slave refused to share the execution plan of its Query Coordinator and decided to hard parse its proper execution plan even though they are both (the QC and the PX slaves) running in the same instance in an 11.2.0.4 release.

I spent a couple of minutes looking at the query trying to simplify it until I found the part of it causing the execution plan mismatch. Having got a clue of what is happening in this real life query I engineered the following model with which I have reproduced the same behaviour in 11.2.0.4 and 12.1.0.1.0. Look at the following setup (where you will recognize one of the Jonathan Lewis table scripts):

select banner from gv$version where rownum=1;

BANNER
-----------------------------------------------------------------------------
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production

create table t1
as
 with generator as (
      select  --+ materialize
	     rownum id
	  from dual
	  connect by level <=1000
)
select 
    trunc((rownum -1)/2) product_id
   ,mod(rownum-1,2)      flag1
   ,mod(rownum-1,3)      flag2
   ,rownum               n1
   ,lpad(rownum,30)      v1
from
    generator v1
   ,generator v2
 where rownum <= 1e6;
 
create table t2
as 
select 

     level id
     ,date  '2012-06-07' + mod((level-1)*5,10)+ interval '5' minute start_date
     ,rpad( 'xx',10) padding
from 
   dual
connect by level <=1e6;
 
begin
 dbms_stats.gather_table_stats(user, 't1', method_opt => 'for all columns size 1');
 dbms_stats.gather_table_stats(user, 't2', method_opt => 'for all columns size 1');
 end;
 /

And here’s below the query I will be using in order to reproduce the non-sharing reason:

with got_my_dates 
 as (select /*+ materialize */
          id
		 ,padding
		 ,start_date
	 from 
	      t2
	 where 
	   start_date   > to_date('07/06/2012','dd/mm/yyyy')
	 and start_date <= to_date('10/06/2012','dd/mm/yyyy')
	 )
select
     /*+ parallel(4) */
     t1.*
	,cte.padding
	,cte.start_date
from
     t1
	,got_my_dates cte
where
    t1.product_id = cte.id
and t1.product_id <= 1e3;

A simple remark before starting the experiment. In the real life query the “with subquery” has been automatically materialized by Oracle because it is called two times in the main query. This is why the materialize hint I used above might not be absolutely necessary for the behaviour, I will be explaining hereinafter, to happen.
Let’s also confirm, before starting the experiment, that this query has not been previously parsed and as such is completely unknown:

SQL> select sql_id
    from gv$sql
    where sql_text like '%got_my_dates%'
    and sql_text not like '%v$sql%';

no rows selected

In the following I will execute the above query, get it execution plan, and show how many child cursor it has used during this very first execution:

SQL> – run query

SQL_ID  1b7g55gx40k79, child number 0
-------------------------------------

Plan hash value: 2708956082
-------------------------------------------------------------------------------------------------
| Id  | Operation                  | Name                       | Rows  | Bytes |    TQ  |IN-OUT|
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |                            |       |       |        |      |
|   1 |  TEMP TABLE TRANSFORMATION |                            |       |       |        |      |
|   2 |   PX COORDINATOR           |                            |       |       |        |      |
|   3 |    PX SEND QC (RANDOM)     | :TQ10000                   |   994K|    22M|  Q1,00 | P->S |
|   4 |     LOAD AS SELECT         |                            |       |       |  Q1,00 | PCWP |
|   5 |      PX BLOCK ITERATOR     |                            |   994K|    22M|  Q1,00 | PCWC |
|*  6 |       TABLE ACCESS FULL    | T2                         |   994K|    22M|  Q1,00 | PCWP |
|   7 |   PX COORDINATOR           |                            |       |       |        |      |
|   8 |    PX SEND QC (RANDOM)     | :TQ20001                   |  1830 |   134K|  Q2,01 | P->S |
|*  9 |     HASH JOIN              |                            |  1830 |   134K|  Q2,01 | PCWP |
|  10 |      JOIN FILTER CREATE    | :BF0000                    |  1830 | 84180 |  Q2,01 | PCWP |
|  11 |       PX RECEIVE           |                            |  1830 | 84180 |  Q2,01 | PCWP |
|  12 |        PX SEND BROADCAST   | :TQ20000                   |  1830 | 84180 |  Q2,00 | P->P |
|  13 |         PX BLOCK ITERATOR  |                            |  1830 | 84180 |  Q2,00 | PCWC |
|* 14 |          TABLE ACCESS FULL | T1                         |  1830 | 84180 |  Q2,00 | PCWP |
|* 15 |      VIEW                  |                            |   994K|    27M|  Q2,01 | PCWP |
|  16 |       JOIN FILTER USE      | :BF0000                    |   994K|    22M|  Q2,01 | PCWP |
|  17 |        PX BLOCK ITERATOR   |                            |   994K|    22M|  Q2,01 | PCWC |
|* 18 |         TABLE ACCESS FULL  | SYS_TEMP_0FD9D6633_140F243 |   994K|    22M|  Q2,01 | PCWP |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   6 - access(:Z>=:Z AND :Z<=:Z)
       filter(("START_DATE">TO_DATE(' 2012-06-07 00:00:00', 'syyyy-mm-dd hh24:mi:ss') 
              AND "START_DATE"<=TO_DATE(' 2012-06-10 00:00:00', 'syyyy-mm-dd hh24:mi:ss')))
   9 - access("T1"."PRODUCT_ID"="CTE"."ID")
  14 - access(:Z>=:Z AND :Z<=:Z)
       filter("T1"."PRODUCT_ID"<=1000)
  15 - filter("CTE"."ID"<=1000)
  18 - access(:Z>=:Z AND :Z<=:Z)
       filter(SYS_OP_BLOOM_FILTER(:BF0000,"C0"))
 
Note
-----
- Degree of Parallelism is 4 because of hint


SQL> @gv$sql
Enter value for sql_id: 1b7g55gx40k79

SQL_ID        PLAN_HASH_VALUE CHILD_NUMBER LAST_LOAD_TIME      EXECUTIONS
------------- --------------- ------------ ------------------- ----------
1b7g55gx40k79      2708956082            0 2016-09-26/07:30:23          1
1b7g55gx40k79      2708956082            1 2016-09-26/07:30:24          0

Notice how a completely new query produces, during its very first execution, two child cursors. Interestingly, the number of executions seems to indicate that Oracle used the child cursor 0 while the number of executions of the child cursor n°1 has not been incremented.

The reason for this double child cursors is:

SQL> @nonshared 1b7g55gx40k79
Show why existing SQL child cursors were not reused (V$SQL_SHARED_CURSOR)...

SQL_ID                        : 1b7g55gx40k79
ADDRESS                       : 00007FF92115E830
CHILD_ADDRESS                 : 00007FF92115E6B0
CHILD_NUMBER                  : 0
REASON                        : <reason>PQ Slave mismatch(5)</reason>
CON_ID                        : 1

-----------------

SQL_ID                        : 1b7g55gx40k79
ADDRESS                       : 00007FF92115E830
CHILD_ADDRESS                 : 00007FF9214A40E8
CHILD_NUMBER                  : 1
PQ_SLAVE_MISMATCH             : Y
REASON                        : <reason>PQ Slave mismatch(5)</reason>
CON_ID                        : 1
-----------------

So here we are: exactly at the same situation as the real world query.

Another remark which is worth to be mentioned here is that the execution plan of child cursor n°1 is exactly identical to the child cursor n° 0 shown above except this bizarre Note at the bottom:

SQL_ID  1b7g55gx40k79, child number 1
-------------------------------------
with got_my_dates  as (select /*+ materialize */           id    
,padding    ,start_date   from        t2   where     start_date   > 
to_date('07/06/2012','dd/mm/yyyy')   and start_date <= 
to_date('10/06/2012','dd/mm/yyyy')   ) select      /*+ parallel(4) */   
   t1.*  ,cte.padding  ,cte.start_date from      t1  ,got_my_dates cte 
where     t1.product_id = cte.id and t1.product_id <= 1e3
 
Note
-----
- automatic DOP: Computed Degree of Parallelism is 4

I don’t clearly get where this automatic DOP is coming from since the auto DOP parameter is not set in my personal laptop nor in the real world application:

SQL> show parameter parallel_degree_policy

PARAMETER_NAME                  TYPE        VALUE
------------------------------- ----------- -------
parallel_degree_policy          string      MANUAL

Last but not least, while the query was always requesting 2*DOP (8) parallel serves, Oracle managed, systematically, to give it 12 (and sometimes 16) parallel servers:

SQL> select
         sql_id
         ,process_name px_slave
     from gv$sql_monitor
     where sql_id = '1b7g55gx40k79'
     and sql_exec_id =16777216
     and  trunc(sql_exec_start) =to_date('27092016','ddmmyyyy')
     and sql_text is null
     order by 2 ;

SQL_ID        PX_SL
------------- -----
1b7g55gx40k79 p000
1b7g55gx40k79 p000
1b7g55gx40k79 p001
1b7g55gx40k79 p001
1b7g55gx40k79 p002
1b7g55gx40k79 p002
1b7g55gx40k79 p003
1b7g55gx40k79 p003
1b7g55gx40k79 p004
1b7g55gx40k79 p005
1b7g55gx40k79 p006
1b7g55gx40k79 p007

12 rows selected.


Global Information
------------------------------
 Status              :  DONE (ALL ROWS)     
 Instance ID         :  1                   
 Session             :  C##MHOURI (7:4495)  
 SQL ID              :  1b7g55gx40k79       
 SQL Execution ID    :  16777216            
 Execution Started   :  09/27/2016 07:29:34 
 First Refresh Time  :  09/27/2016 07:29:34 
 Last Refresh Time   :  09/27/2016 07:29:45 
 Duration            :  11s                 
 Module/Action       :  SQL*Plus/-          
 Service             :  orcl                
 Program             :  sqlplus.exe         
 Fetch Calls         :  3                

Parallel Execution Details (DOP=4 , Servers Allocated=12)

This information of Servers Allocated that are greater than the requested servers is not an innocent information and should always kept your attention.

That’s said, as you might have already guessed via the title of this post, the simplification of the real word query shows that the parallel slave is refusing to share the QC execution plan because of the materialisation of the Common Table Expression. This is why when I pre-empted the materialisation of the CTE via the /*+ inline */ hint the parallel slave shared the execution plan of its QC as shown below:

with got_my_dates
 as (select /*+ inline */
          id
            ,padding
            ,start_date
    from
         t2
    where
      start_date   > to_date('07/06/2012','dd/mm/yyyy')
    and start_date <= to_date('10/06/2012','dd/mm/yyyy')
    )
select
     /*+ parallel(4) */
     t1.*
   ,cte.padding
   ,cte.start_date
from
     t1
   ,got_my_dates cte
where
    t1.product_id = cte.id
and t1.product_id <= 1e3;

1000 rows selected.

SQL> start xpsimp

SQL_ID  4h1qa708b9p3j, child number 0
-------------------------------------
Plan hash value: 2637578939
--------------------------------------------------------------------------------
| Id  | Operation                   | Name     | Rows  | Bytes |    TQ  |IN-OUT|
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |       |       |        |      |
|   1 |  PX COORDINATOR             |          |       |       |        |      |
|   2 |   PX SEND QC (RANDOM)       | :TQ10002 |   681 | 47670 |  Q1,02 | P->S |
|*  3 |    HASH JOIN BUFFERED       |          |   681 | 47670 |  Q1,02 | PCWP |
|   4 |     JOIN FILTER CREATE      | :BF0000  |   681 | 16344 |  Q1,02 | PCWP |
|   5 |      PX RECEIVE             |          |   681 | 16344 |  Q1,02 | PCWP |
|   6 |       PX SEND HYBRID HASH   | :TQ10000 |   681 | 16344 |  Q1,00 | P->P |
|   7 |        STATISTICS COLLECTOR |          |       |       |  Q1,00 | PCWC |
|   8 |         PX BLOCK ITERATOR   |          |   681 | 16344 |  Q1,00 | PCWC |
|*  9 |          TABLE ACCESS FULL  | T2       |   681 | 16344 |  Q1,00 | PCWP |
|  10 |     PX RECEIVE              |          |  1830 | 84180 |  Q1,02 | PCWP |
|  11 |      PX SEND HYBRID HASH    | :TQ10001 |  1830 | 84180 |  Q1,01 | P->P |
|  12 |       JOIN FILTER USE       | :BF0000  |  1830 | 84180 |  Q1,01 | PCWP |
|  13 |        PX BLOCK ITERATOR    |          |  1830 | 84180 |  Q1,01 | PCWC |
|* 14 |         TABLE ACCESS FULL   | T1       |  1830 | 84180 |  Q1,01 | PCWP |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."PRODUCT_ID"="ID")
   9 - access(:Z>=:Z AND :Z<=:Z)
       filter(("ID"<=1000 AND "START_DATE">TO_DATE(' 2012-06-07 00:00:00', 'syyyy-mm-dd
               hh24:mi:ss') 
       AND  "START_DATE"<=TO_DATE(' 2012-06-10 00:00:00', 'syyyy-mm-dd hh24:mi:ss')))
  14 - access(:Z>=:Z AND :Z<=:Z)
       filter(("T1"."PRODUCT_ID"<=1000 AND
       SYS_OP_BLOOM_FILTER(:BF0000,"T1"."PRODUCT_ID")))
 
Note
-----
   - Degree of Parallelism is 4 because of hint
 

Notice below how, now that the CTE is not materialized, the query is using a single child cursor :

SQL> @gv$sql2
Enter value for sql_id: 4h1qa708b9p3j

SQL_ID        PLAN_HASH_VALUE CHILD_NUMBER LAST_LOAD_TIME      EXECUTIONS END_OF_FETCH_COUNT
------------- --------------- ------------ ------------------- ---------- ------------------
4h1qa708b9p3j      2637578939            0 2016-09-27/18:00:54          1                  1

SQL> @nonshared 4h1qa708b9p3j
Show why existing SQL child cursors were not reused (V$SQL_SHARED_CURSOR)...

SQL_ID                        : 4h1qa708b9p3j
ADDRESS                       : 00007FF921744A58
CHILD_ADDRESS                 : 00007FF921744698
CHILD_NUMBER                  : 0
REASON                        :
CON_ID                        : 1
-----------------

And spot as well that the non materialization of the CTE is so that the number of requested parallel server (2*DOP) equals the number of allocated servers as shown below :

SQL> select
        sql_id
       ,process_name px_slave
     from gv$sql_monitor
     where sql_id = '4h1qa708b9p3j'
     and sql_exec_id =16777216
     and trunc(sql_exec_start) = trunc(sysdate)
     and sql_text is null
     order by 2 ;

SQL_ID        PX_SL
------------- -----
4h1qa708b9p3j p000
4h1qa708b9p3j p001
4h1qa708b9p3j p002
4h1qa708b9p3j p003
4h1qa708b9p3j p004
4h1qa708b9p3j p005
4h1qa708b9p3j p006
4h1qa708b9p3j p007

8 rows selected.

Global Information
------------------------------
 Status              :  DONE (ALL ROWS)     
 Instance ID         :  1                   
 Session             :  C##MHOURI (7:4495)  
 SQL ID              :  4h1qa708b9p3j       
 SQL Execution ID    :  16777216            
 Execution Started   :  09/27/2016 18:00:54 
 First Refresh Time  :  09/27/2016 18:00:54 
 Last Refresh Time   :  09/27/2016 18:00:57 
 Duration            :  3s                  
 Module/Action       :  SQL*Plus/-          
 Service             :  orcl                
 Program             :  sqlplus.exe         
 Fetch Calls         :  3                 

Parallel Execution Details (DOP=4 , Servers Allocated=8)

And for those who can’t deal without execution plan here’s below the new execution plan

Plan hash value: 2637578939
--------------------------------------------------------------------------------
| Id  | Operation                   | Name     | Rows  | Bytes |    TQ  |IN-OUT|
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |          |       |       |        |      |
|   1 |  PX COORDINATOR             |          |       |       |        |      |
|   2 |   PX SEND QC (RANDOM)       | :TQ10002 |   738 | 51660 |  Q1,02 | P->S |
|*  3 |    HASH JOIN BUFFERED       |          |   738 | 51660 |  Q1,02 | PCWP |
|   4 |     JOIN FILTER CREATE      | :BF0000  |   737 | 17688 |  Q1,02 | PCWP |
|   5 |      PX RECEIVE             |          |   737 | 17688 |  Q1,02 | PCWP |
|   6 |       PX SEND HYBRID HASH   | :TQ10000 |   737 | 17688 |  Q1,00 | P->P |
|   7 |        STATISTICS COLLECTOR |          |       |       |  Q1,00 | PCWC |
|   8 |         PX BLOCK ITERATOR   |          |   737 | 17688 |  Q1,00 | PCWC |
|*  9 |          TABLE ACCESS FULL  | T2       |   737 | 17688 |  Q1,00 | PCWP |
|  10 |     PX RECEIVE              |          |  1986 | 91356 |  Q1,02 | PCWP |
|  11 |      PX SEND HYBRID HASH    | :TQ10001 |  1986 | 91356 |  Q1,01 | P->P |
|  12 |       JOIN FILTER USE       | :BF0000  |  1986 | 91356 |  Q1,01 | PCWP |
|  13 |        PX BLOCK ITERATOR    |          |  1986 | 91356 |  Q1,01 | PCWC |
|* 14 |         TABLE ACCESS FULL   | T1       |  1986 | 91356 |  Q1,01 | PCWP |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."PRODUCT_ID"="ID")
   9 - access(:Z>=:Z AND :Z<=:Z)
       filter(("ID"<=1000 AND "START_DATE">TO_DATE(' 2012-06-07 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND 
              "START_DATE"<=TO_DATE(' 2012-06-10 00:00:00', 'syyyy-mm-dd hh24:mi:ss')))
  14 - access(:Z>=:Z AND :Z<=:Z)
       filter(("T1"."PRODUCT_ID"<=1000 AND SYS_OP_BLOOM_FILTER(:BF0000,"T1"."PRODUCT_ID")))
 
Note
-----
- Degree of Parallelism is 4 because of hint

While the initial execution plan uses two DFO trees, the new one managed to use only a single DFO trees with 3 DFOs. Having multiple parallelisers (or DFOs) might sometimes create issues like in this current case of TEMP TABLE transformation and PQ_SLAVE_MISMATCH.

BOTTOM LINE

Even though this issue doesn’t reproduce in 12.0.1.2, for previous Oracle releases, watch out your parallel queries using a temp table transformation. You might be confronted to an execution plan instability due to the PQ_SLAVE_MISMATCH non-sharing reason where a parallel slave refuses to share the execution plan of its query coordinator. Bear in mind as well that in such situation even a SQL Profile will not succeed to definitely guarantee the same execution plan at each execution.

August 6, 2016

Primary Key non unique unusable index and direct path load:again

Filed under: direct path,Oracle,SQL Server — hourim @ 6:10 am

This combination of aprimary key policed by a non-unique unusable index on table direct path loaded raising error ORA-26026 has occurred again following a recent upgrade from 11.2.0.3 to 11.0.2.4

I know that many authors like Randolph Geist and Richard Foote has already wrote something about this subject. I, however, for the sake of my proper documentation, decided to summarise this issue in my blog.

The story comes when a developer sends me an e-mail saying that he has observed the following error in one of his overnight batch jobs in one of the TEST databases:

ORA-26026 : unique index xxx.t1_non_unique_idx initially in unusable state

Notice how the error message is pointing to a unique index error on a non-unique index. That was, for me, the first clue to what is really happening in the developer situation. The name of the developer index in the original error text was not so suggestive about the uniqueness of the index as the one I used in the above error message. The developer batch job was accomplishing the following steps:

  • disable all non-unique indexes on the target table
  • parallel direct path load into the target table

The developer says that his job was running quite smoothly in 11.2.0.3 and started failing because of the ORA-26026 following a fresh upgrade to 11.2.0.4. Here’s below the developer set-up you can use and play with at will:

create table t1 as select
         rownum              n1
        ,trunc((rownum-1/3)) n2
        ,mod(rownum,10)      n3
    from dual
    connect by level <= 1e3;

create index t1_non_unique_idx on t1(n1,n2);

alter table t1 add constraint t1_pk primary key (n1) using index;

create unique index t1_pk on t1(n1);

 select index_name, uniqueness
     from user_indexes
     where table_name = 'T1';
INDEX_NAME                     UNIQUENES
------------------------------ ---------
T1_PK                          UNIQUE
T1_NON_UNIQUE_IDX              NONUNIQUE

select
        constraint_name
       ,index_name
    from user_constraints
    where table_name = 'T1'
    and constraint_type = 'P';

CONSTRAINT INDEX_NAME
---------- ---------------------
T1_PK      T1_NON_UNIQUE_IDX

With this set-up in place the primary key constraint is policed by the non-unique index t1_non_unique_idx. The batch continues then by disabling all non-unique indexes and direct path load into the target table. Something resembling to the following piece of SQL code:

create table t2 as select * from t1;
truncate table t1;
alter index t1_non_unique_idx unusable;

Execute the following insert/select piece of code in 11.2.0.3 and you will not encounter any error:

insert /*+ append */ into t1 select * from t2;

Upgrade to 11.2.0.4 (or 12c) and you will immediately hit the ORA-26026:

ERROR at line 1:
ORA-26026: unique index C##MHOURI.T1_NON_UNIQUE_IDX initially in unusable state

In the developer case, despite the existence of a perfect unique index to cover the Primary key, this constraint was enforced via a non-unique index. It seems that the developer has created the T1_PK index after he has created the non-unique index and the primary key constraint.

To solve this issue all what I did is to change the index enforcing the primary key as follows:

alter table t1 modify constraint t1_pk using index t1_pk;

select
         constraint_name
        ,index_name
    from user_constraints
    where table_name = 'T1'
    and constraint_type = 'P';

CONSTRAINT INDEX_NAME
---------- ----------
T1_PK      T1_PK

insert /*+ append */ into t1 select * from t2;

1000 rows created.

Bottom line : when you are direct path loading data into a table after you have disabled its non-unique indexes be sure that your primary key is not enforced via one of those disabled non-unique indexes. Otherwise, depending on the Oracle release you are using, your insert will be rejected because of the ORA-26026 error.

July 16, 2016

SQL Server 2016 : parallel DML

Filed under: Oracle — hourim @ 2:16 pm

In contrast to Oracle where parallel DML is possible since a long time in SQL Server it is only until the 2016 last release where a parallel insert-select has been made possible. This article aims to illustrate this parallel operation in SQL Server and show how the resemblance with Oracle is very accurate.

I am going first to create the source and the target tables necessary for the demonstration

drop table t1;
drop table t2;

create table t1 (id   INT,
                 Nom  VARCHAR(100),
  	             Prenom  VARCHAR(100),
		        Ville   VARCHAR(100)
		 );

insert into t1(id, Nom, Prenom, Ville)
select TOP 1000000 ROW_NUMBER() over (order by a.name) ROWID,
                   'Bruno',
		   CASE WHEN ROW_NUMBER() over (order by a.name)%2 =1 THEN 'Smith'
		   ELSE 'Mohamed' END,
		   CASE WHEN ROW_NUMBER() over (order by a.name)%10 =1 THEN 'Paris'
		   WHEN ROW_NUMBER() over (order by a.name)%10 =5 THEN 'Lille'
		   WHEN ROW_NUMBER() over (order by a.name)%10 =3 THEN 'Marseille'
		   ELSE 'Londres' END
FROM sys.all_objects a
cross join sys.all_objects b;

create table t2 (id     INT,
                 Nom    VARCHAR(100),
	            Prenom  VARCHAR(100),
	      	    Ville   VARCHAR(100)
		 );

I have created table t1 with 1,000,000 rows and have cloned it into an empty t2 table. I am going below to insert the content of t1 into t2 table hoping that this insert will operate in parallel.

Here’s below the SQL Server version I am using:

select @@version;

Microsoft SQL Server 2016 (RTM) - 13.0.1601.5 (X64)
Copyright (c) Microsoft Corporation  Developer Edition

One of the parallel DML pre-requisites in SQL Server is the compatibility_level parameter which should have a value set to 130. Let’s verify this parameter value before trying the insert/select operation:

select name, compatibility_level from sys.databases where name = 'master';

name	compatibility_level
master	130

If the value of this parameter is not equal to 130 you can set it using the following command (I don’t know the impact of this change on your application so don’t change it without measuring its possible side effects ):

 alter database master set compatibility_level = 130;

Finally I am now ready to launch the insert/select operation and gather its corresponding execution plan:

insert into t2 WITH (TABLOCK)
select * from t1;

parallel DML3

Notice that I have added a WITH (TABLOCK) clause to this insert so that it can operates in parallel. In close comparison with Oracle direct path load where a lock is automatically acquired on the inserted table (until the transaction is committed) it seems that in SQL Server we need to explicitly acquire this lock with the (TABLOCK) clause.

You might ask how I have figured out that the insert/select operation has been done in parallel. In fact the two yellow arrows in the above execution plan indicate the parallel nature of the execution. However the two yellow arrows do not indicate that the DOP (Degree Of Parallelism) is 2. If you want to know the actual DOP used by this operation you have to hover over the first operation (towards the left : INSERT) to see a tooltip showing that degree of parallelism if 4. However I still have not found a convenient way to capture a mouse tooltip using greenshot. When dealing with SQL Server execution plan I very often prefer the free version of the SQL Sentry plan explorer from which I have captured the following part of the above execution plan where you can see that the number of executions (Actual Executions or Starts in Oracle terms) equals 4 indicating that the DOP is 4:

Sentry Plan Explorer 3 - parallel_dml.sqlplan

Notice by the way that in contrast to the graphical execution plan where the INSERT operation doesn’t seem to be done in parallel the SQL Sentry Plan explorer is clearly indicating that the insert has been operated in parallel.

If you know how parallel process is handled in Oracle you will certainly not be disappointed when you start dealing with parallel processing in SQL Server. Almost all the parallel concepts are identical. That is:

  • The maximum number of 2 concurrent DFO  active per DFO tree
  • The number of parallel servers (thread) which is 2 * DOP
  • The different data parallel distribution between servers
  • etc…

I have mentioned above that, one of the pre-requisites for a parallel insert/select operation to be successfully accomplished is the explicit lock of the inserted table. This has paved for me the way to check whether the Oracle direct path impeaching reasons: triggers and foreign keys can also restrict the parallel insert in SQL Server. Here’s then the demo; first with a trigger and second with a foreign key implemented on the t2 table:

CREATE TRIGGER t2_A_Ins_ ON  t2
FOR INSERT
AS
begin
    declare @Firstname nvarchar(50)
    set @Firstname = 'Unknown'
END;
truncate table t2;

insert into t2 WITH (TABLOCK)
select * from t1;

parallel DML trigger
As you can point it out the insert trigger annihilates the parallel insert operations in SQL Server.

drop trigger master.t2_A_Ins;
ALTER TABLE t1 ADD CONSTRAINT t1_pk
UNIQUE (id);

ALTER TABLE t2 ADD CONSTRAINT t2_t1_fk
FOREIGN KEY (id) references t1(id);

And now an insert on table t2 having a foreign key

truncate table t2;

insert into t2 WITH (TABLOCK)
select * from t1;

Plan with FK

Again the existence of a Foreign key on the t2 table pre-empted a parallel insert/select operation.
Funny enough the resemblance with Oracle.

Before pushing the “Publish” button I have played again with the model and from time to time I was unable to reproduce exactly the conclusions made in this article about the parallel insert impeachment reasons. I will certainly be back to this article when my investigations will be finished

Next Page »

Blog at WordPress.com.

Tony's Oracle Tips

Tony Hasler's light hearted approach to learning about Oracle

Richard Foote's Oracle Blog

Focusing Specifically On Oracle Indexes, Database Administration and Some Great Music

Hatem Mahmoud Oracle's blog

Just another Oracle blog : Database topics and techniques

Mohamed Houri’s Oracle Notes

Qui se conçoit bien s’énonce clairement

Oracle Diagnostician

Performance troubleshooting as exact science

Raheel's Blog

Things I have learnt as Oracle DBA

Coskan's Approach to Oracle

What I learned about Oracle

So Many Oracle Manuals, So Little Time

“Books to the ceiling, Books to the sky, My pile of books is a mile high. How I love them! How I need them! I'll have a long beard by the time I read them”—Lobel, Arnold. Whiskers and Rhymes. William Morrow & Co, 1988.

EU Careers info

Your career in the European Union

Carlos Sierra's Tools and Tips

Tools and Tips for Oracle Performance and SQL Tuning

Oracle Scratchpad

Just another Oracle weblog

OraStory

Dominic Brooks on Oracle Performance, Tuning, Data Quality & Sensible Design ... (Now with added Sets Appeal)