Mohamed Houri’s Oracle Notes

February 24, 2015

Parallel Query, Broadcast distribution and TEMP space

Filed under: Parallel Processing — hourim @ 6:24 pm

When it comes to parallel operations there are fundamentally two data distribution methods between individual parallel (PX) server sets: hash-hash and broadcast distribution. Suppose that you are going to run a parallel HASH JOIN between two full table scans using a Degree of Parallelism (DOP) of 2. Since the HASH JOIN operation needs two distinct sets of PX servers, there will always be 2*DOP = 4 slaves acting in parallel, that is two slaves per PX set: (PX1 and PX2) for the first set and (PX3 and PX4) for the second set. Each table in the join is read in parallel by one of the PX server sets. When both PX server sets have finished collecting their data sets they need to distribute them to the subsequent parallel HASH JOIN operation. This data distribution is commonly done using either a hash-hash or a broadcast distribution. Using the former method (hash-hash), the result sets gathered by the parallel scans of the build and the probe tables in the join are both redistributed to the parallel server set responsible for the hash join operation. Using a BROADCAST distribution method, instead of redistributing rows from both result sets, Oracle sends the smaller result set to every parallel server slave of the set responsible for the subsequent parallel hash join operation.
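As an aside, here is a minimal, hypothetical sketch of how each of the two distribution methods can be requested explicitly with the pq_distribute hint (the orders/order_lines tables and their join column are made up for illustration; they are not part of the case described below):

-- hash-hash: both row sources are redistributed by hashing the join key,
-- so each hash-join slave receives a disjoint slice of both sides
select /*+ parallel(2) pq_distribute(l hash hash) */ count(*)
from   orders o
join   order_lines l on l.order_id = o.order_id;

-- broadcast: the row source on the other side of the join (orders, acting
-- as the build table) is duplicated to every hash-join slave, while
-- order_lines itself is not redistributed
select /*+ parallel(2) pq_distribute(l broadcast none) */ count(*)
from   orders o
join   order_lines l on l.order_id = o.order_id;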

I bolded the words smaller result set to emphasize that smaller is relative to the other row source in the join. Would you consider a build table with 78 million rows a small data set? It might be considered smaller in the eyes of the Oracle optimizer when the probe table is estimated to generate 770 million rows. This is exactly what happened to me.

If this concept of parallel data distribution is still not clear to you and you want to understand what I encountered and how I managed to pull myself out of this nightmare, then continuing to read this article might be worth the effort.

I was asked to create a big table based on a join between two other big tables, so that doing this serially was practically impossible to complete in an acceptable execution time. I decided, therefore, to create it in parallel, after having enabled parallel DML and forced parallel DDL.
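For reference, these are roughly the session settings I mean; a minimal sketch rather than the exact commands from the original session:

alter session enable parallel dml;
alter session force parallel ddl parallel 8;

I then launched the following create table statement with a DOP of 8 for both DML and DDL: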

SQL> create /*+ parallel(8) */
       table table_of_dates
       tablespace dtbs
       pctfree 0
    as
    select
        /*+ parallel(8)
          full(t)
          full(tr)
      */
       t.tr_tabl_id
     , t.table1_date_time
     , t.writing_date
     , min(tr.arrivl_date)
     , max(tr.arrivl_date)
   from
      table1 t
   left outer join table2 tr
   on t.tr_tabl_id = tr.tr_tabl_id
   join table3
   on t.order_id = table3.order_id
   and tr.status not in ('CANCELED')
   where t.writing_date <= to_date('17.06.2011', 'dd.mm.yyyy')
   and table3.order_type = 'Broadcast'
   group by
         t.tr_tabl_id
       , t.table1_date_time
       , t.writing_date;

create /*+ parallel(8) */
*
ERROR at line 1:
ORA-12801: error signaled in parallel query server P013
ORA-01652: unable to extend temp segment by 128 in tablespace TEMP

As you can notice, it failed with an ORA-01652 error. Below is the corresponding Real Time SQL Monitoring report showing that the HASH JOIN operation at line 7 reached 67G of TEMP, which is beyond the size of the physical TEMP tablespace (64G — a query to check this size is sketched after the report), hence the ORA-01652 error.

Parallel Execution Details (DOP=8, Servers Allocated=16)
SQL Plan Monitoring Details (Plan Hash Value=3645515647)
=====================================================================================
| Id |              Operation              |        Name | Execs |   Rows   | Temp  |
|    |                                     |             |       | (Actual) | (Max) |
=====================================================================================
|  0 | CREATE TABLE STATEMENT              |             |    17 |          |       |
|  1 |   PX COORDINATOR                    |             |    17 |          |       |
|  2 |    PX SEND QC (RANDOM)              | :TQ10003    |       |          |       |
|  3 |     LOAD AS SELECT                  |             |       |          |       |
|  4 |      HASH GROUP BY                  |             |       |          |       |
|  5 |       PX RECEIVE                    |             |       |          |       |
|  6 |        PX SEND HASH                 | :TQ10002    |     8 |          |       |
|  7 |         HASH JOIN                   |             |     8 |        0 |   67G |
|  8 |          PX RECEIVE                 |             |     8 |     626M |       |
|  9 |           PX SEND BROADCAST         | :TQ10001    |     8 |     626M |       |
| 10 |            HASH JOIN                |             |     8 |      78M |       |
| 11 |             PX RECEIVE              |             |     8 |     372K |       |
| 12 |              PX SEND BROADCAST      | :TQ10000    |     8 |     372K |       |
| 13 |               PX BLOCK ITERATOR     |             |     8 |    46481 |       |
| 14 |                INDEX FAST FULL SCAN | IDX_TABLE3_3|   182 |    46481 |       |
| 15 |             PX BLOCK ITERATOR       |             |     8 |      88M |       |
| 16 |              TABLE ACCESS FULL      | TABLE1      |   120 |      88M |       |
| 17 |          PX BLOCK ITERATOR          |             |     8 |     717M |       |
| 18 |           TABLE ACCESS FULL         | TABLE2      |   233 |     717M |       |
=====================================================================================
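As a side note, the 64G TEMP size mentioned above can be checked with a simple query against dba_temp_files; a minimal sketch:

select tablespace_name
     , round(sum(bytes)/1024/1024/1024) size_gb
from   dba_temp_files
group  by tablespace_name;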

I spent a few minutes puzzling over the above execution plan and finally decided to try a second create table with a reduced degree of parallelism (4 instead of 8). Here is what I got:

SQL> create /*+ parallel(4) */
       table table_of_dates
       tablespace dtbs
       pctfree 0
    as
    select
        /*+ parallel(4)
         full(t)
         full(tr)
      */
       t.tr_tabl_id
     , t.table1_date_time
     , t.writing_date
     , min(tr.arrivl_date)
     , max(tr.arrivl_date)
   from
      table1 t
   left outer join table2 tr
   on t.tr_tabl_id = tr.tr_tabl_id
   join table3
   on t.order_id = table3.order_id
   and tr.status not in ('CANCELED')
   where t.writing_date <= to_date('17.06.2011', 'dd.mm.yyyy')
   and table3.order_type = 'Broadcast'
   group by
         t.tr_tabl_id
       , t.table1_date_time
       , t.writing_date;

Table created.
Elapsed: 00:31:42.29

This time the table was successfully created, in approximately 32 minutes.

Before moving to the next issue in the pipe, I wanted to understand why reducing the Degree Of Parallelism (DOP) from 8 to 4 made the create statement succeed. The obvious thing to attempt was to compare the DOP 8 execution plan with the DOP 4 one. The first plan has already been shown above. The second one is presented below (reduced to the information that is vital to the aim of this article):

Parallel Execution Details (DOP=4 , Servers Allocated=8)
SQL Plan Monitoring Details (Plan Hash Value=326881411)
=============================================================================================
| Id |                 Operation                  |       Name     | Execs |   Rows   |Temp |
|    |                                            |                |       | (Actual) |(Max)|
=============================================================================================
|  0 | CREATE TABLE STATEMENT                     |                |     9 |        4 |     |
|  1 |   PX COORDINATOR                           |                |     9 |        4 |     |
|  2 |    PX SEND QC (RANDOM)                     | :TQ10003       |     4 |        4 |     |
|  3 |     LOAD AS SELECT                         |                |     4 |        4 |     |
|  4 |      HASH GROUP BY                         |                |     4 |      75M |     |
|  5 |       PX RECEIVE                           |                |     4 |     168M |     |
|  6 |        PX SEND HASH                        | :TQ10002       |     4 |     168M |     |
|  7 |         HASH JOIN                          |                |     4 |     168M | 34G |
|  8 |          PX RECEIVE                        |                |     4 |     313M |     |
|  9 |           PX SEND BROADCAST                | :TQ10001       |     4 |     313M |     |
| 10 |            HASH JOIN                       |                |     4 |      78M |     |
| 11 |             BUFFER SORT                    |                |     4 |     186K |     |
| 12 |              PX RECEIVE                    |                |     4 |     186K |     |
| 13 |               PX SEND BROADCAST            | :TQ10000       |     1 |     186K |     |
| 14 |                TABLE ACCESS BY INDEX ROWID | TABLE3         |     1 |    46481 |     |
| 15 |                 INDEX RANGE SCAN           | IDX_ORDER_TYPE |     1 |    46481 |     |
| 16 |             PX BLOCK ITERATOR              |                |     4 |      88M |     |
| 17 |              TABLE ACCESS FULL             | TABLE1         |   115 |      88M |     |
| 18 |          PX BLOCK ITERATOR                 |                |     4 |     770M |     |
| 19 |           TABLE ACCESS FULL                | TABLE2         |   256 |     770M |     |
=============================================================================================

They don't share exactly the same execution plan (they have two different plan hash values). The irritating question was: why, by halving the degree of parallelism from 8 to 4, did the same SQL statement need almost half the amount of TEMP space (34G) and complete successfully?

The answer resides in the parallel distribution (PQ Distrib) method used by the parallel server sets to distribute their collected rows to the subsequent parallel server set (the one doing the hash join).

In the DOP 4 execution plan above we can see that we have 4 PX parallel server sets, each one responsible for filling one of the virtual TQ tables: TQ10000, TQ10001, TQ10002 and TQ10003. Here is how to read the above execution plan:

  1. The first set of slaves, PX1, reads TABLE3 and broadcasts its data set to the second set of slaves, PX2, through the TQ10000 virtual table.
  2. The PX2 set reads TABLE1, hash joins it with the data set it has received from PX1 (TQ10000) and broadcasts its result to the next parallel server set, PX3, via the second virtual table, TQ10001.
  3. The PX3 set of parallel slaves probes TABLE2 via a parallel full scan and hash joins it with the build result set (TQ10001) it has received from the PX2 set. This operation ends up filling the third virtual table, TQ10002, and sending it to the next and last PX set, PX4, using a hash distribution.
  4. Finally, the PX4 set of slaves receives the TQ10002 data, hash groups it, fills the last virtual table (TQ10003) and sends it to the query coordinator (QC), which ends up creating the table_of_dates table.

That is a simple way of reading a parallel execution plan. By listing the above parallel operations I aimed to emphasize that the data produced by the PX2 set of slaves (HASH JOIN operation at line 10) is broadcast (PX SEND BROADCAST operation at line 9) to the next parallel set, PX3. This means that each slave of the PX2 set passes every row it has produced to every slave of the PX3 set. A typical reason for choosing this distribution method is that the data set of the first row source (78M rows in this case) is "smaller" than the second row source it is joined with (770M rows).
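If you want to check, after the fact, how many rows each slave sent and received on these virtual TQ tables, v$pq_tqstat can be queried from the same session right after the parallel statement completes; a minimal sketch:

select dfo_number, tq_id, server_type, process, num_rows
from   v$pq_tqstat
order  by dfo_number, tq_id, server_type desc, process;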

In order to make the picture clear, let's zoom in on the hash join operation in the DOP 4 plan:

Parallel Execution Details (DOP=4 , Servers Allocated=8)
======================================================================
| Id |        Operation    |       Name     | Execs |   Rows   |Temp |
|    |                     |                |       | (Actual) |(Max)|
======================================================================
|  7 |HASH JOIN            |                |     4 |     168M | 34G |
|  8 | PX RECEIVE          |                |     4 |     313M |     |
|  9 |  PX SEND BROADCAST  | :TQ10001       |     4 |     313M |     |
| 10 |   HASH JOIN         |                |     4 |      78M |     |
| 18 | PX BLOCK ITERATOR   |                |     4 |     770M |     |
| 19 |  TABLE ACCESS FULL  | TABLE2         |   256 |     770M |     |
======================================================================

The "smaller" result set produced by the PX2 set of slaves at line 10 is 78M rows. Since Oracle decided to broadcast those 78M rows towards the PX3 set, they were duplicated as many times as there are slaves in the PX3 set receiving the broadcast data. Since we have a DOP of 4, this means that the HASH JOIN operation at line 7 received roughly 4*78M = 313M rows (operation at line 8), which had to be built and hashed in memory in order to probe the result set produced by the second parallel server set coming from operations 18-19.

Multiply the DOP by 2 and keep the same parallel distribution method, and the same HASH JOIN operation will have to build and hash a table of 8*78M = 626M rows, which ultimately required more than 67GB of TEMP space (instead of the 34GB of the DOP 4 run), as shown below:

Parallel Execution Details (DOP=8, Servers Allocated=16)
===================================================================
| Id |     Operation     |        Name | Execs |   Rows   | Temp  |
|    |                   |             |       | (Actual) | (Max) |
===================================================================
|  7 |HASH JOIN          |             |     8 |        0 |   67G |
|  8 | PX RECEIVE        |             |     8 |     626M |       |
|  9 |  PX SEND BROADCAST| :TQ10001    |     8 |     626M |       |
| 10 |   HASH JOIN       |             |     8 |      78M |       |
| 17 | PX BLOCK ITERATOR |             |     8 |     717M |       |
| 18 |  TABLE ACCESS FULL| TABLE2      |   233 |     717M |       |
===================================================================

Now that I know where the initial ORA-01652 error came from, changing the parallel distribution method from BROADCAST to HASH might do the job without requiring more than 64GB of TEMP space. A simple way to accomplish this is to hint the parallel select to use the desired parallel distribution method, and the appropriate hints can be taken from the outline of the successful DOP 4 execution plan.
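Such an outline can be retrieved, for instance, with dbms_xplan; here is a minimal sketch (the sql_id is a placeholder, not the real one from this case):

select *
from   table(dbms_xplan.display_cursor('an_sql_id', null, 'ADVANCED'));
-- the "Outline Data" section of the output lists the complete set of hints,
-- among them the PQ_DISTRIBUTE and PX_JOIN_FILTER hints reused below

From that outline I took the appropriate hint (pq_distribute(alias hash hash)), adapted it and issued the following create table: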

SQL> create /*+ parallel(8) */
       table table_of_dates
       tablespace dtbs
       pctfree 0
    as
    select
        /*+ parallel(8)
          full(t)
          full(tr)
	  pq_distribute(@"SEL$E07F6F7C" "T"@"SEL$2" hash hash)
          px_join_filter(@"SEL$E07F6F7C" "T"@"SEL$2")
          pq_distribute(@"SEL$E07F6F7C" "TR"@"SEL$1" hash hash)
          px_join_filter(@"SEL$E07F6F7C" "TR"@"SEL$1")
      */
       t.tr_tabl_id
     , t.table1_date_time
     , t.writing_date
     , min(tr.arrivl_date)
     , max(tr.arrivl_date)
   from
      table1 t
   left outer join table2 tr
   on t.tr_tabl_id = tr.tr_tabl_id
   join table3
   on t.order_id = table3.order_id
   and tr.status not in ('CANCELED')
   where t.writing_date <= to_date('17.06.2011', 'dd.mm.yyyy')
   and table3.order_type = 'Broadcast'
   group by
         t.tr_tabl_id
       , t.table1_date_time
       , t.writing_date;	   

Table created.
Elapsed: 00:12:46.33

And the job was done in less than 13 minutes instead of the roughly 32 minutes of the DOP 4 run.

The new execution plan is:

SQL Plan Monitoring Details (Plan Hash Value=5257928)
====================================================================================
| Id |              Operation              |        Name |Execs |   Rows   | Temp  |
|    |                                     |             |      | (Actual) | (Max) |
====================================================================================
|  0 | CREATE TABLE STATEMENT              |             |   17 |        8 |       |
|  1 |   PX COORDINATOR                    |             |   17 |        8 |       |
|  2 |    PX SEND QC (RANDOM)              | :TQ10004    |    8 |        8 |       |
|  3 |     LOAD AS SELECT                  |             |    8 |        8 |       |
|  4 |      HASH GROUP BY                  |             |    8 |      75M |   10G |
|  5 |       HASH JOIN                     |             |    8 |     168M |       |
|  6 |        JOIN FILTER CREATE           | :BF0000     |    8 |      78M |       |
|  7 |         PX RECEIVE                  |             |    8 |      78M |       |
|  8 |          PX SEND HASH               | :TQ10002    |    8 |      78M |       |
|  9 |           HASH JOIN BUFFERED        |             |    8 |      78M |    3G |
| 10 |            JOIN FILTER CREATE       | :BF0001     |    8 |    46481 |       |
| 11 |             PX RECEIVE              |             |    8 |    46481 |       |
| 12 |              PX SEND HASH           | :TQ10000    |    8 |    46481 |       |
| 13 |               PX BLOCK ITERATOR     |             |    8 |    46481 |       |
| 14 |                INDEX FAST FULL SCAN | IDX_TABLE3_3|  181 |    46481 |       |
| 15 |            PX RECEIVE               |             |    8 |      88M |       |
| 16 |             PX SEND HASH            | :TQ10001    |    8 |      88M |       |
| 17 |              JOIN FILTER USE        | :BF0001     |    8 |      88M |       |
| 18 |               PX BLOCK ITERATOR     |             |    8 |      88M |       |
| 19 |                TABLE ACCESS FULL    | TABLE1      |  121 |      88M |       |
| 20 |        PX RECEIVE                   |             |    8 |     770M |       |
| 21 |         PX SEND HASH                | :TQ10003    |    8 |     770M |       |
| 22 |          JOIN FILTER USE            | :BF0000     |    8 |     770M |       |
| 23 |           PX BLOCK ITERATOR         |             |    8 |     770M |       |
| 24 |            TABLE ACCESS FULL        | TABLE2      |  245 |     770M |       |
====================================================================================

Changing the parallel distribution method from broadcast to hash-hash has not only reduced the TEMP space usage but has also more than halved the execution time of the create table.

Conclusion
Watch carefully the degree of parallelism you are going to use. With a broadcast distribution method applied to a relatively big "smaller result set" you might end up devouring a huge amount of TEMP space and failing with ORA-01652: unable to extend temp segment by 128 in tablespace TEMP. Fortunately, with the arrival of the 12c adaptive distribution and the new HYBRID HASH parallel distribution method, the STATISTICS COLLECTOR operation placed below the distribution operation will dynamically switch from a BROADCAST distribution to a HASH distribution whenever the number of rows to be distributed exceeds a threshold of 2*DOP. With my initial DOP of 8 I would have had a threshold of 16, far below the 78 million rows due to be distributed further up.

February 14, 2015

Resumable time out

Filed under: Tuning — hourim @ 9:47 am

I was trying to create a very big table through a create table as select based on a join between two huge tables, forcing both parallel DDL and parallel query, and was mainly concerned about the amount of TEMP space the hash join between those two big tables would require for my table to be created smoothly. I say hash join because my first attempt to create this table used a nested loop join which was driving a dramatic 100 million starts of a table access by index rowid; it had been running for more than 10 hours when I decided to kill the underlying session. The create table being a one-shot process, I decided to hint the optimizer so that it would opt for a hash join operation instead of the initial nested loop. But my concern was then transferred from a long running, never finishing SQL statement to how much TEMP space my create table would need in order to complete successfully.

I launched the create table and started monitoring it with Tanel Poder's snapper script, v$active_session_history and the Real Time SQL Monitoring feature (RTSM). This is what the monitoring was showing:

SQL> select 
          event
	 ,count(1) 
      from 
          v$active_session_history 
      where sql_id = '0xhm7700rq6tt' 
      group by event 
      order by 2 desc;


EVENT                                                              COUNT(1)
---------------------------------------------------------------- ----------
direct path read                                                       3202
direct path write temp                                                 1629
                                                                        979
db file scattered read                                                    8
CSS initialization                                                        7
CSS operation: query                                                      1

However, a few minutes later Tanel Poder's snapper started showing the following dominant wait event:

SQL> @snapper ash 5 1 all

----------------------------------------------------------------------------------------------------
Active% | INST | SQL_ID          | SQL_CHILD | EVENT                               | WAIT_CLASS
----------------------------------------------------------------------------------------------------
   800% |    1 | 0xhm7700rq6tt   | 0         | statement suspended, wait error to  | Configuration

--  End of ASH snap 1, end=2015-02-12 09:12:57, seconds=5, samples_taken=45

This event kept incrementing in v$active_session_history as shown below:

EVENT                                                              COUNT(1)
---------------------------------------------------------------- ----------
statement suspended, wait error to be cleared                          7400
direct path read                                                       3202
direct path write temp                                                 1629
                                                                        979
db file scattered read                                                    8
CSS initialization                                                        7
CSS operation: query                                                      1

7 rows selected.

SQL> /

EVENT                                                              COUNT(1)
---------------------------------------------------------------- ----------
statement suspended, wait error to be cleared                          7440
direct path read                                                       3202
direct path write temp                                                 1629
                                                                        979
db file scattered read                                                    8
CSS initialization                                                        7
CSS operation: query                                                      1

7 rows selected.

SQL> /

EVENT                                                              COUNT(1)
---------------------------------------------------------------- ----------
statement suspended, wait error to be cleared                          7480
direct path read                                                       3202
direct path write temp                                                 1629
                                                                        979
db file scattered read                                                    8
CSS initialization                                                        7
CSS operation: query                                                      1

EVENT                                                              COUNT(1)
---------------------------------------------------------------- ----------
statement suspended, wait error to be cleared                         11943
direct path read                                                       3202
direct path write temp                                                 1629
                                                                        980
db file scattered read                                                    8
CSS initialization                                                        7
CSS operation: query                                                      1

The only wait event that was incrementing was that bizarre statement suspended, wait error to be cleared. The session from which I launched the create table statement was hanging without reporting any error.

Since I was initially concerned about the TEMP space, I checked the available space there:

SQL> select 
       tablespace_name,
       total_blocks,
       used_blocks,
       free_blocks
     from v$sort_segment;

TABLESPACE_NAME                 TOTAL_BLOCKS USED_BLOCKS FREE_BLOCKS
------------------------------- ------------ ----------- -----------
TEMP                                 8191744     8191744           0    

There was no free space left in the TEMP tablespace.

The RTSM report for the corresponding sql_id was showing the following:

=====================================================================================
| Id    |              Operation              |        Name        |  Rows   | Temp |
|       |                                     |                    |(Actual) |      |
=====================================================================================
|     0 | CREATE TABLE STATEMENT              |                    |       0 |      |
|     1 |   PX COORDINATOR                    |                    |         |      |
|     2 |    PX SEND QC (RANDOM)              | :TQ10003           |         |      |
|     3 |     LOAD AS SELECT                  |                    |         |      |
|     4 |      HASH GROUP BY                  |                    |         |      |
|     5 |       PX RECEIVE                    |                    |         |      |
|     6 |        PX SEND HASH                 | :TQ10002           |         |      |
|  -> 7 |         HASH JOIN                   |                    |       0 |  66G | --> spot this 
|       |                                     |                    |         |      |
|       |                                     |                    |         |      |
|       |                                     |                    |         |      |
|     8 |          PX RECEIVE                 |                    |    626M |      |
|     9 |           PX SEND BROADCAST         | :TQ10001           |    626M |      |
|    10 |            HASH JOIN                |                    |     78M |      |
|    11 |             PX RECEIVE              |                    |    372K |      |
|    12 |              PX SEND BROADCAST      | :TQ10000           |    372K |      |
|    13 |               PX BLOCK ITERATOR     |                    |   46481 |      |
|    14 |                INDEX FAST FULL SCAN | INDX_12453656_TABL |   46481 |      |
|    15 |             PX BLOCK ITERATOR       |                    |     88M |      |
|    16 |              TABLE ACCESS FULL      | TABL1              |     88M |      |
|       |                                     |                    |         |      |
| -> 17 |          PX BLOCK ITERATOR          |                    |    682M |      |
| -> 18 |           TABLE ACCESS FULL         | TABLE123456        |    682M |      |
|       |                                     |                    |         |      |
=====================================================================================

The HASH JOIN operation at line 7 had already gone above the available TEMP tablespace size, which is 64G. I was wondering, then, why my session had not been stopped with an ORA-01652 error instead of hanging there and letting the statement suspended, wait error to be cleared wait event pop up in my different monitoring tools.

After a few minutes of googling I ended up executing the following query:

SQL>select
        coord_session_id
       ,substr(sql_text, 1,10)
       ,error_msg
    from dba_resumable;

SESSION_ID  SQL        ERROR_MSG
----------- ---------  -------------------------------------------------------------------
146         create /*+  ORA-01652: unable to extend temp segment by 128 in tablespace TEMP

This SID 146 is my create table session! It is in error but is hanging, waiting on that statement suspended wait event.

After a few more minutes of googling I ended up checking the following parameter:

SQL> sho parameter resumable

NAME                           TYPE                             VALUE
------------------------------ -------------------------------- ------
resumable_timeout              integer                          9000

Notice how I have finally made the link between the title and the purpose of this article: resumable_timeout. The Oracle documentation states that when this parameter is set to a nonzero value, resumable space allocation might kick in.

Put simply, what happened in my case is that since this parameter had been set to 9000 (the value is expressed in seconds, so 2.5 hours) and since my create table as select encountered an ORA-01652 temp tablespace error, Oracle decided to put my session in a resumable (suspended) state until someone noticed the space shortage, corrected the situation and allowed the session to continue its normal processing.
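Had someone been monitoring dba_resumable at that moment, the fix would simply have been to give the TEMP tablespace more room so that the suspended statement could resume on its own. A minimal sketch (the file names and sizes are hypothetical):

alter tablespace temp add tempfile '/u01/oradata/MYDB/temp02.dbf' size 32g;
-- or, if the existing tempfile still has room to grow on disk:
alter database tempfile '/u01/oradata/MYDB/temp01.dbf' resize 96g;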

The next step was to abort my session from its resumable state:

SQL> exec dbms_resumable.abort(146);
 
PL/SQL procedure successfully completed.

Unfortunately, after waiting a couple of minutes in front of my sqlplus session for it to resume, without success, I decided to use brute force and kill the session:

ERROR:
ORA-03113: end-of-file on communication channel
Process ID: 690975
Session ID: 146 Serial number: 3837

Summary
I would suggest not setting the resumable_timeout parameter to a nonzero value. Otherwise you will hang on a statement suspended, wait error to be cleared event for the amount of time indicated by this parameter, looking around for what is making your SQL statement hang.
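If you do find yourself in this situation, resumable space allocation can be switched off at the system or session level; a minimal sketch:

-- stop new statements from being suspended, system wide
alter system set resumable_timeout = 0;
-- or disable resumable space allocation for the current session only
alter session disable resumable;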

Footnote
Here below are a few extra words from the Oracle documentation about resumable space allocation:

“Resumable space allocation avoids headaches and saves time by suspending, instead
of terminating, a large database operation requiring more disk space than is currently
available. While the operation is suspended, you can allocate more disk space on
the destination tablespace or increase the quota for the user. Once the low space
condition is addressed, the large database operation automatically picks up where it
left off.
As you might expect, the statements that resume are known as resumable statements.
The suspended statement, if it is part of a transaction, also suspends the transaction.
When disk space becomes available and the suspended statement resumes, the
transaction can be committed or rolled back whether or not any statements in
the transactions were suspended. The following conditions can trigger resumable
space allocation:
■ Out of disk space in a permanent or temporary tablespace
■ Maximum extents reached on a tablespace
■ User space quota exceeded
You can also control how long a statement can be suspended. The default time
interval is two hours, at which point the statement fails and returns an error message
to the user or application as if the statement was not suspended at all.
There are four general categories of commands that can be resumable:
(1) SELECT statements, (2) DML commands, (3) SQL*Loader operations, and
(4) DDL statements that allocate disk space.”

January 6, 2015

Approximate_ndv

Filed under: Oracle,Statistics — hourim @ 8:17 pm

A couple of months ago I was asked to do a pro-active tuning task on an application running Oracle 11.2.0.3.0 under Linux x86 64-bit which is going live in production within the next couple of months. I was supplied a 60-minute pre-production AWR report covering a period during which a critical batch job was running. To my surprise, the following SQL statement popped up at the top of several of the SQL Ordered by sections of the AWR report:

select
  /*+ no_parallel(t) 
      no_parallel_index(t) 
      dbms_stats cursor_sharing_exact 
      use_weak_name_resl 
      dynamic_sampling(0) 
      no_monitoring 
      no_substrb_pad 
   */
    count(*),
    count(distinct "SABR_ID"),
    sum(sys_op_opnsize("SABR_ID")),
    substrb(dump(min("SABR_ID"), 16, 0, 32), 1, 120),
    substrb(dump(max("SABR_ID"), 16, 0, 32), 1, 120),
    count(distinct "TYPE_ID"),
    sum(sys_op_opnsize("TYPE_ID")),
    substrb(dump(min(substrb("TYPE_ID", 1, 32)), 16, 0, 32), 1, 120),
    substrb(dump(max(substrb("TYPE_ID", 1, 32)), 16, 0, 32), 1, 120)
from 
    "XXX"."SOM_ET_ORDER_KS" sample (33.0000000000) t

This is a call issued by the dbms_stats package while collecting statistics for the SOM_ET_ORDER_KS table.
So far so good right?
Hmmm……
Trained and experienced eyes would already have been caught by two things when observing the above SQL statement:

 count(distinct)
 sample (33.0000000000)

Do you know what those two points mean in an 11gR2 database?
The count(distinct) indicates that the application is collecting statistics with the global preference approximate_ndv set to false, which I immediately checked via this command:

SQL> select dbms_stats.get_prefs ('approximate_ndv') ndv from dual;
NDV
------
FALSE

While the sample (33) clause indicates that the estimate_percent parameter used by this application when collecting statistics is set to 33%:

     (estimate_percent => 33)

Is this the best way of collecting statistics?
I don’t think so.
Let’s build a proof of concept. Below is the table on which I will be collecting statistics with and without approximate_ndv

create table t_ndv
as select
      rownum n1
     ,trunc((rownum-1)/3) n2
     ,trunc(dbms_random.value(1,100)) n3
     ,dbms_random.string('L',dbms_random.value(1,5))||rownum v4
  from dual
  connect by level <=1e6;

There are 1 million rows in this table, with 1 million distinct n1 values and 1 million distinct v4 values. There are, however, 333,334 distinct n2 values and only 99 distinct n3 values.
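A quick sanity check of those figures can be done directly against the table; a minimal sketch:

select count(*)           total_rows
     , count(distinct n1) ndv_n1
     , count(distinct n2) ndv_n2
     , count(distinct n3) ndv_n3
     , count(distinct v4) ndv_v4
from   t_ndv;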
Let's check Oracle's behaviour when approximate_ndv is disabled. All the following tests have been done on an Oracle 12c database instance:


BANNER                                                                               
---------------------------------------------------------------------------
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production        
PL/SQL Release 12.1.0.1.0 - Production                                                    
CORE    12.1.0.1.0      Production                                                                
TNS for 64-bit Windows: Version 12.1.0.1.0 - Production                                   
NLSRTL Version 12.1.0.1.0 - Production                                                    
SQL> select dbms_stats.get_prefs ('approximate_ndv') ndv from dual;

NDV
-----
FALSE

SQL> alter session set events '10046 trace name context forever, level 12';

SQL> begin
  2      dbms_stats.gather_table_stats
  3      (user
  4      ,'t_ndv'
  5      ,method_opt => 'FOR ALL COLUMNS SIZE 1'
  6      ,estimate_percent => 57);
  7  end;
 8  /

SQL> alter session set events '10046 trace name context off';                                               

The corresponding tkprof'ed trace file shows a piece of code which is, as expected, similar to the SQL I had found at the top of the SQL Ordered by sections of the AWR report I was asked to analyse:

select /*+  no_parallel(t) 
            no_parallel_index(t) 
            dbms_stats 
            cursor_sharing_exact 
            use_weak_name_resl 
            dynamic_sampling(0) 
            no_monitoring 
            xmlindex_sel_idx_tbl 
           no_substrb_pad  
       */
  count(*)
, count("N1")
, count(distinct "N1")
, sum(sys_op_opnsize("N1"))
, substrb(dump(min("N1"),16,0,64),1,240)
, substrb(dump(max("N1"),16,0,64),1,240)
, count("N2")
, count(distinct "N2"), sum(sys_op_opnsize("N2")) 
 …etc...
from
 "C##MHOURI"."T_NDV" sample (57.0000000000)  t 

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      8.03       8.03          0       3861          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4      8.03       8.04          0       3861          0           1

And the statistics collected on the t_ndv table are:

SQL> select num_rows, blocks, avg_row_len, sample_size 
     from user_tables 
     where table_name = 'T_NDV';

  NUM_ROWS     BLOCKS AVG_ROW_LEN SAMPLE_SIZE
---------- ---------- ----------- -----------
   1000586       3924          23      570334

SQL> select
         column_name
        ,num_distinct
        ,sample_size
    from user_tab_columns
    where table_name = 'T_NDV';

COLUMN_NAME          NUM_DISTINCT SAMPLE_SIZE
-------------------- ------------ -----------
V4                        1001467      570836
N3                             99      570836
N2                         333484      570836
N1                        1001467      570836

When I did the same experiment but with approximate_ndv set to true, this is what I obtained:

SQL> exec dbms_stats.set_global_prefs('approximate_ndv','TRUE');

SQL> exec dbms_stats.delete_table_stats(user ,'t_ndv');

SQL> alter session set events '10046 trace name context forever, level 12';

SQL> begin
  2      dbms_stats.gather_table_stats
  3      (user
  4      ,'t_ndv'
  5      ,method_opt       => 'FOR ALL COLUMNS SIZE 1'
         ,estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE
  6     );
  7  end;
  8  /

SQL> alter session set events '10046 trace name context off';

And finally, here is the corresponding trace file, particularly the part that corresponds to gathering the number of distinct values:

select /*+  full(t)    
            no_parallel(t) 
            no_parallel_index(t) 
            dbms_stats 
            cursor_sharing_exact 
            use_weak_name_resl 
            dynamic_sampling(0) 
            no_monitoring 
            xmlindex_sel_idx_tbl 
            no_substrb_pad  */
  to_char(count("N1")),
  to_char(substrb(dump(min("N1"),16,0,64),1,240)),
  to_char(substrb(dump(max("N1"),16,0,64),1,240)),
  to_char(count("N2")),
  to_char(substrb(dump(min("N2"),16,0,64),1,240)),
  to_char(substrb(dump(max("N2"),16,0,64),1,240)),
  to_char(count("N3")),
  to_char(substrb(dump(min("N3"),16,0,64),1,240)),
  to_char(substrb(dump(max("N3"),16,0,64),1,240)),
  to_char(count("V4")),
  to_char(substrb(dump(min("V4"),16,0,64),1,240)),
  to_char(substrb(dump(max("V4"),16,0,64),1,240)) 
from
 "C##MHOURI"."T_NDV" t  /* NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL,NDV,NIL,NIL*/
 
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.01       0.00          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        1      2.85       2.88          0       3861          0           1
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Total        3      2.87       2.88          0       3861          0           1

SQL> select num_rows, blocks, avg_row_len, sample_size 
     from user_tables 
    where table_name = 'T_NDV';

  NUM_ROWS     BLOCKS AVG_ROW_LEN SAMPLE_SIZE
---------- ---------- ----------- -----------
   1000000       3924          23     1000000

SQL> select
        column_name
       ,num_distinct
       ,sample_size
    from user_tab_columns
    where table_name = 'T_NDV';

COLUMN_NAME          NUM_DISTINCT SAMPLE_SIZE
-------------------- ------------ -----------
V4                         990208     1000000
N3                             99     1000000
N2                         337344     1000000
N1                        1000000     1000000

Several important remarks can be made here. First, the elapsed time (which is almost entirely CPU time) has been reduced from 8 seconds to less than 3 seconds. Second, the sample size automatically used by Oracle is 100% (in response to the dbms_stats.auto_sample_size value I set during my second statistics gathering attempt), which has the consequence of computing the exact number of rows (num_rows) actually present in the t_ndv table (1,000,000).

With less time and less CPU consumption, Oracle, with the approximate_ndv preference set to true and estimate_percent set to dbms_stats.auto_sample_size, produced a perfect estimation, in contrast to the time and resources it needed when the same preference was set to false.

When approximate_ndv is enabled Oracle is both fast and accurate. If you compare the SQL code used when this preference is disabled to the corresponding SQL when it is enabled, you will notice that in the latter case (enabled) there is no longer any call to the costly count(distinct) function. There is instead a strange NDV, NIL, NIL annotation, as shown below:

from
 "C##MHOURI"."T_NDV" t  /* NDV,NIL,NIL
                          ,NDV,NIL,NIL
                          ,NDV,NIL,NIL
                          ,NDV,NIL,NIL
                         */

which I have intentionally arranged to show you that the number of NDV, NIL, NIL entries coincides with the number of columns (4) of the t_ndv table:

SQL> desc t_ndv
     Name      Type
-------------- ------------------
N1             NUMBER
N2             NUMBER
N3             NUMBER     
V4             VARCHAR2(4000)

And immediately after this piece of code, the trace file shows the following SQL statement:

SELECT /*+ parallel */ TO_NUMBER(EXTRACTVALUE(VALUE(T), 
  '/select_list_item/pos') + 1) POS, EXTRACTVALUE(VALUE(T), 
  '/select_list_item/value') VAL, TO_NUMBER(EXTRACTVALUE(VALUE(T), 
  '/select_list_item/nonnulls')) NONNULLS, TO_NUMBER(EXTRACTVALUE(VALUE(T), 
  '/select_list_item/ndv')) NDV, TO_NUMBER(EXTRACTVALUE(VALUE(T), 
  '/select_list_item/split')) SPLIT, TO_NUMBER(EXTRACTVALUE(VALUE(T), 
  '/select_list_item/rsize')) RSIZE, TO_NUMBER(EXTRACTVALUE(VALUE(T), 
  '/select_list_item/rowcnt')) ROWCNT, TO_NUMBER(EXTRACTVALUE(VALUE(T), 
  '/select_list_item/topncnt')) TOPNCNT, EXTRACT(VALUE(T), 
  '/select_list_item/topn_values').GETCLOBVAL() TOPN, NULL MINFREQ, NULL 
  MAXFREQ, NULL AVGFREQ, NULL STDDEVFREQ 
FROM
 TABLE(XMLSEQUENCE(EXTRACT(:B1 , '/process_result/select_list_item'))) T 
  ORDER BY TOPNCNT DESC

which means that Oracle internally uses an in-memory XML structure when calculating the number of distinct values of each table column, making this process fast and accurate.

Bottom line: I was not asked to look at the way the local DBAs collect statistics but, being curious about everything I look at when diagnosing performance issues, my attention was caught by this top SQL statement. One might have considered its presence a pure coincidence and not analysed it, while it allowed me to discuss with the local DBAs and to change the way they are collecting statistics.
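For completeness, the change itself boils down to reverting to the default, approximate-NDV-friendly settings; a minimal sketch of the preference changes I mean (not the exact commands run on that system):

exec dbms_stats.set_global_prefs('approximate_ndv', 'TRUE');
exec dbms_stats.set_global_prefs('estimate_percent', 'DBMS_STATS.AUTO_SAMPLE_SIZE');
-- subsequent gathers can then simply rely on the defaults, for example:
exec dbms_stats.gather_table_stats(user, 'SOM_ET_ORDER_KS');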

December 30, 2014

2014 in review

Filed under: Oracle — hourim @ 8:59 am

In my 2013 annual blogging review I wished to publish 100 blog articles in 2014. Instead of those hoped-for 100 posts, there have been only 36. In my defence I can cite the list of articles I have published for Oracle OTN, AllThingsOracle and Toad World, summarized here. Anyway, my 2014 blogging activity was not what I expected it to be. I hope that the next year will be more fruitful and stimulating, as there are several challenging issues in front of me to investigate, and all I need is:

  • a reproducible model
  • few tests
  • few tests again with different Oracle releases
  • observations
  • consolidations

And to end up writing a document and submitting it for critique and review.

Here below is my 2014 blogging activity summary. I wish you a happy new year.

Here’s an excerpt:

The concert hall at the Sydney Opera House holds 2,700 people. This blog was viewed about 31,000 times in 2014. If it were a concert at Sydney Opera House, it would take about 11 sold-out performances for that many people to see it.

Click here to see the complete report.

December 25, 2014

Subquery Coalescing

Filed under: Oracle — hourim @ 9:01 am

It's fairly clear that to improve the performance of a SQL query we sometimes need to rewrite it so that we end up either reducing the number of scanned tables or giving the CBO an alternative path the original query was preventing it from taking. In a previous article, I looked at ways of turning a disjunctive subquery into a conjunctive subquery so that it can be unnested and merged with its parent query block. In this article I aim to explain how to improve performance by coalescing two distinct subqueries. This tuning strategy will be preceded by a definition of the conjunction, disjunction and containment concepts for subqueries. Finally, simple examples of subquery coalescing will be presented and explained. I also aim to show that sometimes de-coalescing can prove to be a good tuning strategy, as it might open a new path towards an optimal execution plan. This article is based on the Oracle published paper entitled Enhanced Subquery Optimization in Oracle.

Conjunction, Disjunction and Containment property

Two subqueries are eligible to be coalesced provided they verify the containment property. A subquery subq1 is said to contain a second subquery subq2 if the result of subq2 is a subset of the subq1 result. In this case subq1 is called the container query block while subq2 is called the contained query block. The same two subqueries also verify the containment property when subq2 contains a conjunctive predicate which, if suppressed, makes subq1 and subq2 equivalent.

Let's picture the containment property to make it crystal clear. Below are three distinct query blocks:

select 
     t1.*
from 
     t1,t2
where t2.id1 = t1.id1;

9000 rows selected.

select 
     t1.*
from 
     t1,t2
where t2.id1 = t1.id1
and   t2.status = 'COM';

8900 rows selected.

select 
     t1.*
from 
     t1,t2
where t2.id1 = t1.id1
and   t2.id1 = t2.id;

4 rows selected.

If we get rid of the predicate on the status column in the second query block, the first and the second queries become equivalent. They thus verify the containment property and can therefore be coalesced. If we look at the second and the third query blocks, they also verify the containment property, since the result set of the third query is a subset of the result set of the second one. That is the containment property definition.

A conjunction is the action of linking two subqueries with an AND operator. The conjunction of the two subqueries is true if the two subqueries are simultaneously true; otherwise it is false.

A disjunction is the action of linking two subqueries with an OR operator. The disjunction of two subqueries is true when at least one of the two subqueries is true; it is false only when both are simultaneously false.
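Expressed against the t1 and t2 tables of the model created below, a conjunction and a disjunction of two exists subqueries look roughly like this (an illustrative sketch only, not queries taken from the test case):

-- conjunction: a t1 row survives only if both subqueries are satisfied
select t1.*
from   t1
where  exists (select null from t2 where t2.id1 = t1.id1 and t2.status = 'COM')
and    exists (select null from t2 where t2.id1 = t1.id1 and t2.id1 = t2.id);

-- disjunction: a t1 row survives as soon as one of the subqueries is satisfied
select t1.*
from   t1
where  exists (select null from t2 where t2.id1 = t1.id1 and t2.status = 'COM')
or     exists (select null from t2 where t2.id1 = t1.id1 and t2.id1 = t2.id);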

Coalescing two subqueries of the same type (exists)

Two conjunctive subqueries satisfying the containment property can be coalesced. Consider the following reproducible model (11.2.0.3)

create table t1
   as select 
    rownum                id1, 
    trunc((rownum-1/3))   n1, 
    date '2012-06-07' + mod((level-1)*2,5) start_date,
    lpad(rownum,10,'0')   small_vc, 
    rpad('x',1000)        padding 
from dual
connect by level <= 1e4;   

create table t2
as select 
    rownum id
    ,mod(rownum,5) + mod(rownum,10)* 10  as id1
    ,case
       when mod(rownum, 1000) = 7 then 'ERR'
       when rownum <= 9900 then 'COM'
       when mod(rownum,10) between 1 and 5 then 'PRP'
     else
       'UNK'
     end status
     ,lpad(rownum,10,'0')    as small_vc  
     ,rpad('x',70)           as padding
from dual
connect by level <= 1e4;

And the following query with two exists subqueries of the same type (exists)

select 
     start_date
    ,count(1) 
from t1 
where 
     start_date >= to_date('10062012','ddmmyyyy')
and (exists (select null
             from t2
             where t2.id1 = t1.id1
             and t2.status = 'COM'
             )
  or exists (select null
             from t2
             where t2.id1 = t1.id1
             and   t2.id1 = t2.id
             )
     )
group by start_date;     

START_DATE          COUNT(1)
----------------- ----------
20120611 00:00:00          2
20120610 00:00:00          1

--------------------------------------------------------------------------------------
| Id  | Operation             | Name    | Starts | E-Rows | A-Rows | Buffers | Reads  |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |         |      1 |        |      2 |    1950 |   1665 |
|   1 |  HASH GROUP BY        |         |      1 |      1 |      2 |    1950 |   1665 |
|*  2 |   HASH JOIN SEMI      |         |      1 |      1 |      3 |    1950 |   1665 |
|*  3 |    TABLE ACCESS FULL  | T1      |      1 |   4908 |   4000 |    1670 |   1665 |
|   4 |    VIEW               | VW_SQ_1 |      1 |  11843 |   9894 |     280 |      0 |
|   5 |     UNION-ALL         |         |      1 |        |   9894 |     280 |      0 |
|*  6 |      TABLE ACCESS FULL| T2      |      1 |      1 |      4 |     140 |      0 |
|*  7 |      TABLE ACCESS FULL| T2      |      1 |  11842 |   9890 |     140 |      0 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("VW_COL_1"="T1"."ID1")
   3 - filter("START_DATE">=TO_DATE(' 2012-06-10 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   6 - filter("T2"."ID1"="T2"."ID")
   7 - filter("T2"."STATUS"='COM')

Note
-----
   - dynamic sampling used for this statement (level=2)

Since the above two exists subqueries verify the containment property, let's coalesce them and note the performance consequences of this transformation:

select 
     start_date
    ,count(1) 
from t1 
where 
     start_date >= to_date('10062012','ddmmyyyy')
and (exists (select null
            from t2
            where t2.id1 = t1.id1
            and(t2.status = 'COM'
                or t2.id1 = t2.id)
            )                 
          )
group by start_date;     

----------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows | Buffers | Reads  |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      2 |    1808 |   1665 |
|   1 |  HASH GROUP BY      |      |      1 |      1 |      2 |    1808 |   1665 |
|*  2 |   HASH JOIN SEMI    |      |      1 |     10 |      3 |    1808 |   1665 |
|*  3 |    TABLE ACCESS FULL| T1   |      1 |   4000 |   4000 |    1669 |   1665 |
|*  4 |    TABLE ACCESS FULL| T2   |      1 |   9890 |   9890 |     139 |      0 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T2"."ID1"="T1"."ID1")
   3 - filter("START_DATE">=TO_DATE(' 2012-06-10 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   4 - filter(("T2"."STATUS"='COM' OR "T2"."ID1"="T2"."ID"))

Thanks to this coalescing operation we are now doing only a single full scan of the t2 table instead of the initial two full scans of the same table. As such, we saved about 140 logical I/Os, dropping the total number of buffers from 1950 to 1808.

Coalescing two subqueries of different type (exists and not exists)
Real life examples can show cases of subqueries verifying the containment property but having different types. An example of such a situation is presented in the query below:

select 
     start_date,
     id1,
     n1
from t1 
where 
     start_date >= to_date('10062012','ddmmyyyy')
and exists     (select null
                from t2 a
                where a.id1 = t1.id1                
                )
and not exists (select null
                from t2 b
                where b.id1 = t1.id1
                and   b.id1 != 83
                );

START_DATE               ID1         N1
----------------- ---------- ----------
20120611 00:00:00         83         82

----------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows | Buffers | Reads  |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      1 |    1947 |   1665 |
|*  1 |  HASH JOIN ANTI     |      |      1 |      7 |      1 |    1947 |   1665 |
|*  2 |   HASH JOIN SEMI    |      |      1 |     10 |      3 |    1808 |   1665 |
|*  3 |    TABLE ACCESS FULL| T1   |      1 |   4000 |   4000 |    1669 |   1665 |
|   4 |    TABLE ACCESS FULL| T2   |      1 |  10000 |  10000 |     139 |      0 |
|*  5 |   TABLE ACCESS FULL | T2   |      1 |   9000 |   9000 |     139 |      0 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("B"."ID1"="T1"."ID1")
   2 - access("A"."ID1"="T1"."ID1")
   3 - filter("START_DATE">=TO_DATE(' 2012-06-10 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   5 - filter("B"."ID1"<>83)

The above query contains two distinct subqueries:

exists (select null
        from t2 a
        where a.id1 = t1.id1                
        )

not exists (select null
            from t2 b
            where b.id1 = t1.id1
            and   b.id1 != 83
            );

They verify the containment property because if we get rid of the predicate (b.id1 != 83) from the second one it becomes equivalent to the first one. The sole difference is that they are of different types: the first one is an exists subquery while the second one is a not exists subquery. Coalescing this kind of subquery pair yields the following query:

select 
     start_date,
     id1,
     n1
from t1 
where 
     start_date >= to_date('10062012','ddmmyyyy')
and exists     (select null
                from t2 a
                where a.id1 = t1.id1                
                having sum (case when  a.id1 != 83 
                            then 1 else 0 end) = 0
               );
START_DATE               ID1         N1
----------------- ---------- ----------
20120611 00:00:00         83         82

-----------------------------------------------------------------------------------
| Id  | Operation            | Name | Starts | E-Rows | A-Rows | Buffers | Reads  |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |      1 |        |      1 |     557K|   1665 |
|*  1 |  FILTER              |      |      1 |        |      1 |     557K|   1665 |
|*  2 |   TABLE ACCESS FULL  | T1   |      1 |   4000 |   4000 |    1670 |   1665 |
|*  3 |   FILTER             |      |   4000 |        |      1 |     556K|      0 |
|   4 |    SORT AGGREGATE    |      |   4000 |      1 |   4000 |     556K|      0 |
|*  5 |     TABLE ACCESS FULL| T2   |   4000 |   1000 |   3000 |     556K|      0 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter( IS NOT NULL)
   2 - filter("START_DATE">=TO_DATE(' 2012-06-10 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   3 - filter(SUM(CASE  WHEN "A"."ID1"<>83 THEN 1 ELSE 0 END )=0)
   5 - filter("A"."ID1"=:B1)

One of the immediate and obvious observations is that this coalescing produces a dramatic performance degradation: we went from a clean 1947 logical I/Os to a terrifying 557 thousand buffers. This is a clear lesson that eliminating a full table access from the picture does not automatically translate into a performance improvement. Sometimes visiting a table twice allows the CBO to generate an optimal execution plan. The collateral effect of the above coalescing is that the CBO has been unable to unnest the new exists subquery and therefore executes it as a FILTER operation, even though the most resource-expensive operation is operation 5. The important information in the new suboptimal plan is the huge number of executions (4000) shown in the Starts column. The operation at line 2 produces 4000 rows, and this dictates the number of times the operation at line 3 (and its child operations at lines 4 and 5) will be executed.

The CBO has been unable to unnest the new coalesced exists subquery because of the presence of the aggregate function (SUM), as shown by the corresponding 10053 trace file:

*****************************
Cost-Based Subquery Unnesting
*****************************
SU: Unnesting query blocks in query block SEL$1 (#1) that are valid to unnest.
Subquery Unnesting on query block SEL$1 (#1)SU: Performing unnesting that 
does not require costing.
SU: Considering subquery unnest on query block SEL$1 (#1).
SU:   Checking validity of unnesting subquery SEL$2 (#2)
SU:     SU bypassed: Failed aggregate validity checks.
SU:     SU bypassed: Failed basic validity checks.
SU:   Validity checks failed.

Since unnesting is impossible, de-coalescing might represent a good strategy in this case. And since the containment property lets us go back from the coalesced query to the original one, we can re-open the unnesting path leading to the original optimal plan, the very plan we thought we were going to improve by coalescing the two subqueries.
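For reference, the de-coalesced form is simply the original conjunction of the two subqueries quoted at the beginning of this section (reconstructed here; the full original statement appears earlier in the post):

select
     start_date,
     id1,
     n1
from t1
where
     start_date >= to_date('10062012','ddmmyyyy')
and  exists     (select null
                 from t2 a
                 where a.id1 = t1.id1)
and  not exists (select null
                 from t2 b
                 where b.id1 = t1.id1
                 and   b.id1 != 83);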

Given this subquery unnesting impossibility, I was thinking of taking advantage of a virtual column to hide the aggregation function used in the coalesced exists subquery. Unfortunately it is impossible to create a virtual column based on an aggregation function, as shown below:

alter table t2 add aggr_virt number generated always 
                    as (sum(case when id1 != 83 then 1 else 0 end)) virtual;
alter table t2 add aggr_virt number generated always as (sum(case when id1 != 83 then 1 else 0 end)) virtual
ERROR at line 1:
ORA-00934: group function is not allowed here

The Oracle documentation says the following about creating virtual columns

When you define a virtual column, the defining column_expr must refer only to columns of the subject table that have already been defined, in the current statement or in a prior statement.
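For contrast, a plain non-aggregate expression can be virtualized; the sketch below uses a hypothetical column named flag_83 and only hides the CASE expression, so the SUM, and with it the unnesting limitation, remains:

alter table t2 add flag_83 number generated always
                   as (case when id1 != 83 then 1 else 0 end) virtual;

-- the coalesced subquery would then read: having sum(a.flag_83) = 0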

Conclusion

Coalescing two subqueries of the same type might drastically reduce the number of logical I/Os because it can eliminate an entire table access. Coalescing two subqueries of different types, however, might prevent the CBO from taking advantage of the unnesting transformation. Fortunately, if you know how to coalesce two different subqueries, you also know how to de-coalesce them and allow the CBO to unnest the subqueries with their main query blocks.

Always keep an open eye on the concepts of conjunction and disjunction with subqueries. The CBO is unable to unnest a subquery when it appears in a disjunction, which leads to several executions of the subquery as a FILTER operation applied on the result set of the main query block. The CBO is, however, able to transform a conjunction of two subqueries of the same type into a single subquery and unnest it with its parent query block, as illustrated below.
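To illustrate the two shapes (reusing the t1/t2 tables of this post; a sketch, not a tuned query):

-- disjunction: per the above, the CBO cannot unnest and each subquery
-- is executed as a FILTER against the main query block's result set
select start_date, id1, n1
from   t1
where  exists (select null from t2 a where a.id1 = t1.id1)
   or  exists (select null from t2 b where b.id1 = t1.id1 and b.id1 != 83);

-- conjunction of two subqueries of the same type: a candidate for coalescing
-- into a single subquery and for unnesting with the parent query block
select start_date, id1, n1
from   t1
where  exists (select null from t2 a where a.id1 = t1.id1)
  and  exists (select null from t2 b where b.id1 = t1.id1 and b.id1 != 83);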

December 17, 2014

Adaptive plan

Filed under: Oracle — hourim @ 1:12 pm

Have you ever seen such kind of execution plan

------------------------------------------------------------------------------------------
| Id  | Operation                    | Name     | Starts | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |          |      1 |      5 |00:00:00.04 |       4 |
|   1 |  MAT_VIEW REWRITE ACCESS FULL| T1_T2_MV |      1 |      5 |00:00:00.04 |       4 |
------------------------------------------------------------------------------------------

Note
-----
   - this is an adaptive plan

Or this one:

SQL> select * from table(dbms_xplan.display_cursor(null, null, 'allstats last +adaptive'));

-----------------------------------------------------------------------------
| Id  | Operation                    | Name     | Starts | A-Rows | Buffers |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |          |      1 |      5 |       4 |
|   1 |  MAT_VIEW REWRITE ACCESS FULL| T1_T2_MV |      1 |      5 |       4 |
-----------------------------------------------------------------------------

Note
-----
   - this is an adaptive plan (rows marked '-' are inactive)    

Where are those inactive rows marked ‘-‘ in the plan?

I was writing an article on tuning disjunctive subqueries when I was prompted to check something about materialized views. Coincidentally, at the time the materialized view question kicked in, I was finishing the model for the disjunctive subquery article. I therefore decided to create a materialized view using this model, and here is what happened.
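The model itself is not reproduced here; a minimal sketch of the kind of tables and indexes implied by the plans below (object names taken from the plans, data volumes and distribution assumed) would be:

create table t1 as
  select rownum id, mod(rownum, 1000) n1, rpad('x', 100) pad
  from   dual connect by level <= 1e4;

create table t2 as
  select rownum id, mod(rownum, 1000) x1, rpad('x', 100) pad
  from   dual connect by level <= 1e4;

create index t1_id_i1 on t1(id);
create index t2_x1_i1 on t2(x1);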

select /*+ qb_name(parent) */
     id,
     n1
from
     t1 
where
     n1 = 100
and  exists
      (select /*+ qb_name(child) */  
            null 
       from t2 
       where 
            t2.id = t1.id
       and  t2.x1 = 100
       );     
-------------------------------------------------------------------------------------------
|   Id  | Operation                                | Name     | Starts | A-Rows | Buffers |
-------------------------------------------------------------------------------------------
|     0 | SELECT STATEMENT                         |          |      1 |      5 |      13 |
|- *  1 |  HASH JOIN                               |          |      1 |      5 |      13 |
|     2 |   NESTED LOOPS                           |          |      1 |      5 |      13 |
|     3 |    NESTED LOOPS                          |          |      1 |      5 |      11 |
|-    4 |     STATISTICS COLLECTOR                 |          |      1 |      5 |       3 |
|     5 |      SORT UNIQUE                         |          |      1 |      5 |       3 |
|     6 |       TABLE ACCESS BY INDEX ROWID BATCHED| T2       |      1 |      5 |       3 |
|  *  7 |        INDEX RANGE SCAN                  | T2_X1_I1 |      1 |      5 |       2 |
|  *  8 |     INDEX RANGE SCAN                     | T1_ID_I1 |      5 |      5 |       8 |
|  *  9 |    TABLE ACCESS BY INDEX ROWID           | T1       |      5 |      5 |       2 |
|- * 10 |   TABLE ACCESS FULL                      | T1       |      0 |      0 |       0 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."ID"="T1"."ID")
   7 - access("T2"."X1"=100)
   8 - access("T2"."ID"="T1"."ID")
   9 - filter("N1"=100)
  10 - filter("N1"=100)

Note
-----
   - this is an adaptive plan (rows marked '-' are inactive)

The materialized view built on top of the above query looks like this:

create materialized view t1_t2_mv as
select /*+ qb_name(parent) */
     id,
     n1
from
     t1 
where
     n1 = 100
and  exists
      (select /*+ qb_name(child) */  
            null 
       from t2 
       where 
            t2.id = t1.id
       and  t2.x1 = 100
       );     
Materialized view created.

And I have finished the setup by enabling query rewrite on the materialized view:

alter materialized view t1_t2_mv enable query rewrite;
Materialized view altered.

Finally, I re-executed the initial query and got the corresponding execution plan as usual:

select /*+ qb_name(parent) */
     id,
     n1
from
     t1 
where
     n1 = 100
and  exists
      (select /*+ qb_name(child) */  
            null 
       from t2 
       where 
            t2.id = t1.id
       and  t2.x1 = 100
       );     
------------------------------------------------------------------------------------------
| Id  | Operation                    | Name     | Starts | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |          |      1 |      5 |00:00:00.04 |       4 |
|   1 |  MAT_VIEW REWRITE ACCESS FULL| T1_T2_MV |      1 |      5 |00:00:00.04 |       4 |
------------------------------------------------------------------------------------------

Note
-----
   - this is an adaptive plan

And the corresponding outline is:

Outline Data:
  /*+
    BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('12.1.0.1')
      DB_VERSION('12.1.0.1')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$35C14E55")
      REWRITE(@"SEL$F743C7BF" "T1_T2_MV")
      OUTLINE(@"SEL$F743C7BF")
      REWRITE(@"PARENT" "T1_T2_MV")
      OUTLINE(@"PARENT")
      FULL(@"SEL$35C14E55" "T1_T2_MV"@"SEL$518C1272")
    END_OUTLINE_DATA
  */	   

Note in passing how the number of Buffers drops from 13 to 4 once the CBO decides to use the materialized view instead of the initial query.

Bottom line: The Note about adaptive plan you might see when materialized views are used is related to the SQL behind the materialized view and not to the materialized view itself.

December 10, 2014

Bind Aware – Part III

Filed under: Oracle — hourim @ 12:30 pm

After having explained

  • In Part I, how Oracle monitors a cursor that is a candidate for the bind aware status via the number of rows each bind variable value processes, and how this is linked with the count of the three different bucket_ids
  • In Part II, how Oracle exploits the pair (bucket_id, count) to decide when the time has come to make a cursor bind aware. In that part I think I succeeded in figuring out the secret sauce the Adaptive Cursor Sharing layer code uses to mark a cursor bind aware in two cases: (a) when only two adjacent bucket_ids are concerned (the count of the third bucket_id = 0) and (b) when only the two distant bucket_ids are concerned (that is, the count of bucket_id n°1 = 0).

I left the third case, which concerns all bucket_ids (that is, the count of every bucket_id != 0), for the current blog article.

I will immediately temper your enthusiasm and say that, unfortunately, I didn't succeed in figuring out the secret sauce Oracle uses to mark a cursor bind aware in this third case. Nevertheless, this failure didn't prevent me from sharing with you all the observations and tests I have done, hoping that someone will take over the task and come up with a nice conclusion.

You will have noticed that I have managed to write two articles on Adaptive Cursor Sharing without showing a single execution plan! This is because, through this series of articles, I aimed to show how Oracle internally manages and monitors this feature rather than its goal, which is to produce several optimal execution plans for a SQL statement that uses bind variables and fulfils particular pre-requisites.

Finally, you might also have noticed that, despite three articles, I still have not exposed nor explained what happens when the cursor is finally made bind aware. In fact, once a cursor is made bind aware, a new piece of internal Oracle code takes over the monitoring job. This is the Extended Cursor Sharing (ECS) feature, which monitors the selectivity (or a range of selectivities) for each child cursor that has previously been made bind aware by the ACS internal code. While the ACS feature uses the pair (bucket_id, count) for its internal secret sauce (half secret from now on), the ECS feature is based on the v$sql_cs_selectivity view.

Back now to the content of this third and last part of the series, which exposes my tests when trying to decipher the ACS working mechanism when all bucket_ids are concerned.

I have split my tests into 3 categories (in all of them every bucket_id has a count != 0):

  • The first case corresponds to a cursor marked bind aware following an increment of the count of bucket_id n°0
  • The second case is when a cursor is marked bind aware following an increment of the count of bucket_id n°1
  • The third category corresponds to a cursor marked bind aware following an increment of the count of bucket_id n°2

Based on those different observations, I was aiming to derive an empirical algorithm that might explain how Oracle takes its decision to mark a cursor bind aware when all bucket_ids are concerned. Unfortunately, in contrast to the cases exposed in Part II of this series, I have been unable to derive that algorithm. Nevertheless, I've decided to share all my tests with you.

1.    First category of tests

As exposed above, the first category of my tests concerns a cursor marked bind aware following an execution done with a bind variable value that increments the count of bucket_id n°0. For example, consider the below situation:

SQL> @cntbukt 6f6wzmu3yzm43

CHILD_NUMBER  BUCKET_ID  COUNT
------------ ---------- ----------
0             0          5 ---> +1 execution
0             1          8
0             2          3

Looking at the above picture, you can infer that I executed my query in the following order:

  • bind variable value ‘j10000’ 8 times  (increment count of bucket_id 1 to 8)
  • bind variable value ‘j100’ 5 times    (increment count of bucket_id 0 to 5)
  • bind variable value ‘j>million’ 3 times (increment count of bucket_id 2 to 3)

Of course, these executions have been done in a way that avoids two adjacent bucket_ids reaching the same count while the remaining bucket_id has a count of 0, and more generally avoids the bind aware triggers exposed in Part II; this is why I made sure all bucket_ids were incremented. To accomplish this you can, for example, start by executing the query with the ‘j10000’ bind variable value 8 times, then jump to the adjacent bucket_id n°0 with the bind variable value ‘j100’ and execute it 5 times, then move to the bind variable value ‘j>million’ and execute it 3 times, as sketched below. Proceeding like this, you arrive exactly at the situation exposed above in v$sql_cs_histogram for the 6f6wzmu3yzm43 sql_id.
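A sketch of that replay, using the same SQL*Plus bind variable and the cntbukt script shown at the end of Part II (each select has to be repeated the indicated number of times):

SQL> var lvc varchar2(10)
SQL> exec :lvc := 'j10000'
SQL> select count(1) from t_acs where vc2 = :lvc;   -- run 8 times --> count of bucket_id 1 = 8
SQL> exec :lvc := 'j100'
SQL> select count(1) from t_acs where vc2 = :lvc;   -- run 5 times --> count of bucket_id 0 = 5
SQL> exec :lvc := 'j>million'
SQL> select count(1) from t_acs where vc2 = :lvc;   -- run 3 times --> count of bucket_id 2 = 3
SQL> @cntbukt 6f6wzmu3yzm43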

At this stage of the experiment, if you execute the same query using the first bind variable value ‘j100’, you will see that Oracle has made your cursor bind aware, as shown below (a new child cursor n°1 appears):

SQL> @cntbukt 6f6wzmu3yzm43

CHILD_NUMBER  BUCKET_ID COUNT
------------ ---------- ----------
1             0          1
1             1          0
1             2          0
0             0          5
0             1          8
0             2          3

2.    Second category of tests

The second category of my tests corresponds to a cursor becoming bind aware following an execution at bucket_id n°1:

SQL> @cntbukt 6f6wzmu3yzm43

CHILD_NUMBER  BUCKET_ID COUNT
------------ ---------- ----------
0             0          8
0             1          3 + 1 execution
0             2          2

And after that extra execution a new child cursor n°1 appears:

CHILD_NUMBER  BUCKET_ID COUNT
------------ ---------- ----------
1             0          0
1             1          1
1             2          0
0             0          8
0             1          3
0             2          2

The same precautions have been taken so that the bind aware triggers exposed in Part II of the series are avoided before the counts of all bucket_ids have been incremented.

3.    Third category of tests

Finally, the third category corresponds to a cursor becoming bind aware following an execution at bucket_id n°2, as shown below:

SQL> @cntbukt 6f6wzmu3yzm43

CHILD_NUMBER  BUCKET_ID  COUNT
------------ ---------- ----------
0             0          8
0             1          3
0             2          2  ----> + 1 execution

And here again the extra execution produces a new child cursor n°1:

CHILD_NUMBER  BUCKET_ID  COUNT
------------ ---------- ----------
1             0          0
1             1          0
1             2          1
0             0          8
0             1          3
0             2          2

Looking at the above situation, there is no apparent clue indicating what secret sauce ACS uses to mark a cursor bind aware when all bucket_ids are concerned.

I have also attached a "tests with 3 buckets" document where you can find several similar cases I have run and that did not put me in the right direction either.

I hope that these 3 articles help someone.

November 12, 2014

Dynamic sampling: unlearning

Filed under: Statistics — hourim @ 3:45 pm

Here is the execution plan that served as the path for the Oracle SQL engine to execute a simple two-table join:

 SQL> SELECT
          count(t1.small_vc)
         ,count(t2.padding)
    FROM
          t1, t2
    WHERE
          t1.id1 = t2.id
        ;

SQL> start xstat

SQL_ID  b5yc4wnt59bxn, child number 0
-------------------------------------
--------------------------------------------------------------------------
| Id  | Operation                     | Name  | Starts | E-Rows | A-Rows |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |      1 |        |      1 |
|   1 |  SORT AGGREGATE               |       |      1 |      1 |      1 |
|   2 |   NESTED LOOPS                |       |      1 |        |     10 |
|   3 |    NESTED LOOPS               |       |      1 |     10 |     10 |
|   4 |     TABLE ACCESS FULL         | T2    |      1 |     10 |     10 |
|*  5 |     INDEX UNIQUE SCAN         | T1_PK |     10 |      1 |     10 |
|   6 |    TABLE ACCESS BY INDEX ROWID| T1    |     10 |      1 |     10 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access("T1"."ID1"="T2"."ID")

Note
-----
   - dynamic sampling used for this statement (level=2)

What does this last Note about dynamic sampling imply?

That the involved tables have stale statistics?
No, it doesn't imply that.

That the involved tables have no statistics at all?
No, it doesn't exactly imply that either.

The above Note indicates that, at least for the default level 2, at least one of the tables involved in the above query has no statistics at all. It doesn't mean that all tables in the join have no statistics. It means that at least one of the two tables is NOT ANALYZED, to use the 10053 trace file terminology, as shown below:


***************************************
BASE STATISTICAL INFORMATION
***************************************
Table Stats::
  Table: T2  Alias: T2  (NOT ANALYZED) --- spot this
    #Rows: 327  #Blks:  4  AvgRowLen:  100.00  ChainCnt:  0.00
  Column (#1): ID(  NO STATISTICS (using defaults)
    AvgLen: 13 NDV: 10 Nulls: 0 Density: 0.097859

Index Stats::
  Index: T2_I_FK  Col#: 1
    LVLS: 0  #LB: 1  #DK: 10  LB/K: 1.00  DB/K: 1.00  CLUF: 1.00
***********************

In fact, what I did not show you is the model I have used, which is:

create table t1
    as select
              rownum                     id1,
              trunc(dbms_random.value(1,1000)) id2,
              lpad(rownum,10,'0')              small_vc,
              rpad('x',1000)                   padding
              from dual
    connect by level <= 1e4;

alter table t1 add constraint t1_pk primary key (id1);

create index t1_i1 on t1(id2);

create table t2
   as select *
      from ( select
               rownum                              as id
              ,mod(t1.id1,5) + mod(rownum,10)* 10  as id1
              ,lpad(rownum,10,'0')                 as small_vc
              ,rpad('x',70)                        as padding
            from t1
            where rownum <= 10
           )
            order by id1;

alter table t2 add constraint t2_fk foreign key(id) references t1(id1);

create index t2_i_fk on t2(id);

And, most importantly, here are the statistics I have collected, for table t1 only:

 BEGIN
  dbms_stats.gather_table_stats(USER
                           ,'t1'
                           ,method_opt => 'FOR ALL COLUMNS SIZE 1'
                           ,estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE
                           ,CASCADE => true);  
END;                          
/

For table t1, in contrast to table t2 on which I haven't collected statistics, the CBO 10053 trace file echoes the following lines, which indicate that the CBO used t1's dictionary statistics when compiling the execution plan:

***********************
Table Stats::
  Table: T1  Alias: T1
    #Rows: 10000  #Blks:  1460  AvgRowLen:  1020.00  ChainCnt:  0.00
  Column (#1): ID1(
    AvgLen: 4 NDV: 10000 Nulls: 0 Density: 0.000100 Min: 1 Max: 10000
Index Stats::
  Index: T1_I1  Col#: 2
    LVLS: 1  #LB: 21  #DK: 999  LB/K: 1.00  DB/K: 9.00  CLUF: 9968.00
  Index: T1_PK  Col#: 1
    LVLS: 1  #LB: 20  #DK: 10000  LB/K: 1.00  DB/K: 1.00  CLUF: 1429.00
Access path analysis for T1
***************************************
SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for T1[T1]
  Table: T1  Alias: T1
    Card: Original: 10000.000000  Rounded: 10000  Computed: 10000.00  Non Adjusted: 10000.00
  Access Path: TableScan
    Cost:  447.51  Resp: 447.51  Degree: 0
      Cost_io: 439.00  Cost_cpu: 12297302
      Resp_io: 439.00  Resp_cpu: 12297302
  Best:: AccessPath: TableScan
         Cost: 447.51  Degree: 1  Resp: 447.51  Card: 10000.00  Bytes: 0

Bottom line: when you see a Note about dynamic sampling at level 2 in an execution plan, bear in mind that this Note doesn't imply that all tables involved in the underlying query lack statistics. It indicates that at least one table has no statistics.
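A quick way to check which of the joined tables is missing statistics (a sketch; for a table that has never been analyzed, num_rows and last_analyzed are NULL in the dictionary):

SQL> select table_name, num_rows, last_analyzed
     from   user_tables
     where  table_name in ('T1','T2');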

I’ve also attached to this blog the corresponding documentation about dynamic (statistics) sampling

November 8, 2014

Bind Aware – Part II

Filed under: cursor sharing — hourim @ 1:49 pm

In Part I we saw how the count of each bucket_id in the v$sql_cs_histogram view is incremented according to the number of rows the corresponding child cursor execution has processed, which depends on the bind variable value. We saw that this count-bucket_id relationship follows this algorithm:

  • If the number of processed rows is <= 1,000, increment the count of bucket_id n°0
  • If the number of processed rows is between 1,000 and 1,000,000, increment the count of bucket_id n°1
  • If the number of processed rows is > 1,000,000, increment the count of bucket_id n°2

In part II we will see how the decision to mark a cursor bind-aware is made. I have found three situations in which the CBO is triggered to compile a new execution plan:

    • the counts of two adjacent bucket_ids are incremented (0-1 or 1-2)
    • the counts of the two distant bucket_ids are incremented (0-2)
    • the counts of all three bucket_ids are incremented (0-1-2)

1. Two neighbour bucket_ids

Let's start with the first situation. For this I will use the ‘j100’ and ‘j10000’ bind variable values, since these two values increment two adjacent bucket_ids, namely 0 and 1. I will be using the same table and the same query as in Part I.

SQL> exec :lvc := 'j100'

SQL> select count(1) from t_acs where vc2 = :lvc;

SQL> @cntbukt 6f6wzmu3yzm43 -- see script at the end

CHILD_NUMBER  BUCKET_ID      COUNT
------------ ---------- ----------
           0          0          1
           0          1          0
           0          2          0

Next, I will execute the same query 6 extra times so that I will get the following cursor monitoring picture:

SQL> @cntbukt 6f6wzmu3yzm43

CHILD_NUMBER  BUCKET_ID      COUNT
------------ ---------- ----------
           0          0          7
           0          1          0
           0          2          0

Seven executions using the same bind variable value (‘j100’) have incremented the count of bucket_id n°0, as expected.
Now I will change the bind variable value so that it is bucket_id n°1 that gets incremented, and execute the same query 7 times:

SQL> exec :lvc := 'j10000'

SQL>select count(1) from t_acs where vc2 = :lvc; -- repeat this 7 times

SQL> @cntbukt 6f6wzmu3yzm43 

CHILD_NUMBER  BUCKET_ID      COUNT
------------ ---------- ----------
           0          0          7
           0          1          7
           0          2          0

After 7 executions using the new bind variable value we are still sharing the same child cursor n°0. However, Oracle has, again as expected, incremented the count of bucket_id n°1 seven times. Let's add an extra run of the same query and see what happens:

SQL> select count(1) from t_acs where vc2 = :lvc;

SQL> @cntbukt 6f6wzmu3yzm43

CHILD_NUMBER  BUCKET_ID      COUNT
------------ ---------- ----------
           1          0          0
           1          1          1
           1          2          0
           0          0          7 -- equals its neighbour
           0          1          7 -- equals its neighbour
           0          2          0

Finally, at the 8th execution, Oracle has decided that it is now time to compile a new plan (new child cursor n°1). This allows me to state the first observation:

For the same child cursor, when the count of a bucket_id reaches the count of its neighbour bucket_id, the next execution will mark the original cursor as bind aware and a new child cursor is compiled.

This conclusion is of course valid only before the appearance of the first bind aware cursor in the cursor cache.
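As a quick check that the cursor has indeed been marked bind aware, the following query (a shortened version of the v$sql query used in Part I of this series) can be run:

SQL> select child_number, is_bind_sensitive, is_bind_aware, is_shareable
     from   v$sql
     where  sql_id = '6f6wzmu3yzm43';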

2. Two distant bucket_ids

In order to investigate the second case, I will start by flushing the shared pool and repeating the same experiment with two bind variable values that map to distant bucket_ids, ‘j100’ and ‘j>million’, starting with 6 executions using the first value.

SQL> alter system flush shared_pool;
SQL> exec :lvc := 'j100';
SQL> select count(1) from t_acs where vc2 = :lvc;

SQL> @cntbukt 6f6wzmu3yzm43

CHILD_NUMBER  BUCKET_ID      COUNT
------------ ---------- ----------
           0          0          6
           0          1          0
           0          2          0

As expected, the count of bucket_id n°0 has been incremented 6 times. Now I will execute the same query twice using the ‘j>million’ bind variable value, which maps to the non-adjacent bucket_id n°2:

SQL> exec :lvc := 'j>million'
SQL> select count(1) from t_acs where vc2 = :lvc;

SQL> @cntbukt 6f6wzmu3yzm43
CHILD_NUMBER  BUCKET_ID      COUNT
------------ ---------- ----------
           0          0          6
           0          1          0
           0          2          2

As expected too, the count of bucket_id n°2 has been incremented twice. But I am still sharing the same child cursor n°0. Let's try an extra execution with the same bind variable value:

SQL> select count(1) from t_acs where vc2 = :lvc;

SQL> @cntbukt 6f6wzmu3yzm43

CHILD_NUMBER  BUCKET_ID      COUNT
------------ ---------- ----------
           1          0          0
           1          1          0
           1          2          1
           0          0          6
           0          1          0
           0          2          2

And finally a new child cursor n°1 has been created indicating that the cursor has been marked bind aware.

At this point I can state the second observation:

For the same child cursor, when the count of a bucket_id becomes greater than or equal to trunc(count/2) - 1, where count is the count of its distant non-adjacent bucket_id, the next execution will mark the original cursor as bind aware and a new child cursor is compiled.

We have 6 executions at bucket_id n°0 and 2 executions at bucket_id n°2 for the same child cursor n°0. We reach the situation where 2 >= trunc(6/2) - 1 = 2, which indicates that the next execution at bucket_id n°2 (the 3rd one) will compile a new execution plan (child cursor n°1).

What happens when the number of executions is odd?

By flushing the shared pool and setting again the bind variable to ‘j100’ and executing the same query 11 times I obtained the following picture:

CHILD_NUMBER  BUCKET_ID      COUNT
------------ ---------- ----------
           0          0         11
           0          1          0
           0          2          0

Then by setting the bind variable value to ‘j>million’ and executing the same query 4 times I arrived to the following picture:

CHILD_NUMBER  BUCKET_ID      COUNT
------------ ---------- ----------
           0          0         11
           0          1          0
           0          2          4

The above picture indicates that 4 >= trunc(11/2) - 1 = 4. This means that the next execution (the 5th one) will compile a new plan:

SQL> select count(1) from t_acs where vc2 = :lvc;

SQL> @cntbukt 6f6wzmu3yzm43

CHILD_NUMBER  BUCKET_ID      COUNT
------------ ---------- ----------
           1          0          0
           1          1          0
           1          2          1
           0          0         11
           0          1          0
           0          2          4

The two observations seem to be valid in both directions, i.e. the initially compiled cursor can belong to bucket_id n°0, 1 or 2. I have run the above tests starting with the bind variable value ‘j>million’ (or ‘j10000’) and have come up with the same conclusions.

In my constant desire to keep my articles short and simple, I've decided to treat the last case (all three bucket_ids) in Part III.

PS:  cntbukt script

select
        child_number
        ,bucket_id
        ,count
    from
         v$sql_cs_histogram
    where sql_id = '&1';

Update 11/11/2014

I did an extra bunch of tests today with two distant bucket_ids and have updated the corresponding conclusion: when the number of executions at bucket_id n°2 reaches ceil((number of executions at bucket_id n°0)/3), the next execution at bucket_id n°2 will mark the cursor as bind aware and compile a new execution plan. This matches the figures above: with 6 executions at bucket_id n°0 the threshold is ceil(6/3) = 2, and with 11 executions it is ceil(11/3) = 4.

I have also attached here a Bind Aware test file on which I gathered my tests related to two distant bucket_ids (0 and 2)

November 6, 2014

Bind Aware – Part I

Filed under: cursor sharing — hourim @ 4:14 pm

If you don't want to use the /*+ bind_aware */ hint to make your cursor immediately bind aware, you may find that your initial cursor needs a certain number of executions to reach the bind aware status. How many such initial executions does your cursor need during its warm-up period? And how is this number of executions monitored? Those are the two questions I aim to answer in this series of articles.

In my answering path I will start by creating the following t_acs table with a special data pattern:

create table t_acs(n1  number, vc2  varchar2(10));

BEGIN
     for j in 1..1200150 loop
      if j = 1 then
       insert into t_acs values (j, 'j1');
      elsif j>1 and j<=101 then
       insert into t_acs values(j, 'j100');
      elsif j>101 and j<=1101 then
       insert into t_acs values (j, 'j1000');
      elsif j>10001 and j<= 110001 then
      insert into t_acs values(j,'j10000');
     else
      insert into t_acs values(j, 'j>million');
     end if;
    end loop;
   commit;
END;
/

create index t_acs_i1 on t_acs(vc2);

select vc2, count(1) from t_acs group by vc2 order by 2;

VC2          COUNT(1)
---------- ----------
j1                  1
j100              100
j1000            1000
j10000         100000
j>million     1099049

As you can notice above, the values of the vc2 column reflect the number of rows in which they occur in the t_acs table:

   ‘j1’        means   select count(1) from t_acs where vc2 = ‘j1’         generates 1 row
   ‘j100’      means   select count(1) from t_acs where vc2 = ‘j100’       generates 100 rows
   ‘j1000’     means   select count(1) from t_acs where vc2 = ‘j1000’      generates 1,000 rows
   ‘j10000’    means   select count(1) from t_acs where vc2 = ‘j10000’     generates 100,000 rows
   ‘j>million’ means   select count(1) from t_acs where vc2 = ‘j>million’  generates more than a million rows

I will explain later why I built such a skewed data set and why I chose this naming for the vc2 column values.
There are three views used to monitor a cursor that is a candidate for the Adaptive Cursor Sharing feature:

SQL> desc v$sql_cs_statistics

           Name                            Null?    Type
           ------------------------------- -------- --------------------
    1      ADDRESS                                  RAW(8)
    2      HASH_VALUE                               NUMBER
    3      SQL_ID                                   VARCHAR2(13)
    4      CHILD_NUMBER                             NUMBER        --> emphasize this
    5      BIND_SET_HASH_VALUE                      NUMBER
    6      PEEKED                                   VARCHAR2(1)
    7      EXECUTIONS                               NUMBER
    8      ROWS_PROCESSED                           NUMBER        --> emphasize this
    9      BUFFER_GETS                              NUMBER
   10      CPU_TIME                                 NUMBER

SQL> desc v$sql_cs_histogram

           Name                            Null?    Type
           ------------------------------- -------- --------------------
    1      ADDRESS                                  RAW(8)
    2      HASH_VALUE                               NUMBER
    3      SQL_ID                                   VARCHAR2(13)
    4      CHILD_NUMBER                             NUMBER        --> emphasize this
    5      BUCKET_ID                                NUMBER        --> emphasize this
    6      COUNT                                    NUMBER        --> emphasize this

SQL> desc v$sql_cs_selectivity

           Name                            Null?    Type
           ------------------------------- -------- --------------------
    1      ADDRESS                                  RAW(8)
    2      HASH_VALUE                               NUMBER
    3      SQL_ID                                   VARCHAR2(13)
    4      CHILD_NUMBER                             NUMBER
    5      PREDICATE                                VARCHAR2(40)
    6      RANGE_ID                                 NUMBER
    7      LOW                                      VARCHAR2(10)
    8      HIGH                                     VARCHAR2(10)

The third view starts to be useful only once the cursor becomes bind aware. At this stage of the article, let's concentrate only on the rows_processed column of the first view and on the pair of columns (bucket_id, count) of the second one. They go hand in hand in the monitoring of the same child cursor.

Let's investigate how these two views interact with each other. Throughout this article I will be using the following query:

SQL> select count(1) from t_acs where vc2 = :lvc;

Where :lvc is a bind variable which I will declare and set (the first time) as follows:

SQL> var lvc varchar2(10);
SQL> exec :lvc := 'j100';

For a cursor to become bind aware, the column used in its predicate must have a histogram:

BEGIN
       dbms_stats.gather_table_stats
                   (user
                   ,'t_acs'
                   ,method_opt       => 'for all columns size skewonly'
                   ,estimate_percent => dbms_stats.auto_sample_size
                   ,cascade          => true
                   ,no_invalidate    => false
                   );
END;
/

I used ''size skewonly'' for the method_opt parameter in order to collect a histogram on the vc2 column. Since no previous query had used the vc2 column in a where clause, ''size auto'' would not have produced that histogram, which is why I did not use it here.
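A quick way to verify that the histogram is now in place (a sketch using the standard dictionary view; column names are those of the Oracle data dictionary):

SQL> select column_name, histogram, num_buckets
     from   user_tab_col_statistics
     where  table_name = 'T_ACS';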

As mentioned above ‘j100’ means that my initial cursor will be processing 100 rows. This is what we can see in the two monitoring views when we execute the above query:

SQL> select
         child_number
         ,executions
         ,rows_processed
    from v$sql_cs_statistics
    where sql_id = '6f6wzmu3yzm43' ;

SQL> select
        child_number
        ,bucket_id
        ,count
    from
         v$sql_cs_histogram
    where  sql_id = '6f6wzmu3yzm43' ;

CHILD_NUMBER EXECUTIONS ROWS_PROCESSED
------------ ---------- --------------
           0          1            101

CHILD_NUMBER  BUCKET_ID      COUNT
------------ ---------- ----------
           0          0          1 --> incremented
           0          1          0
           0          2          0

The child cursor n° 0 processed 101 rows and Oracle incremented the count of its bucket_id n° 0.

What happens now if I execute the same query using a different bind variable value (‘j10000’)?

SQL> exec :lvc := 'j10000';

CHILD_NUMBER EXECUTIONS ROWS_PROCESSED
------------ ---------- --------------
           0          1         101

I know that the ‘j10000’ bind variable value produces around 100,000 rows; so why is my shared cursor n°0 still showing only 101 rows?

The figure is misleading. We will see below that, in contrast to the v$sql_cs_histogram view, the rows_processed column of the v$sql_cs_statistics view is updated only when a new execution plan is compiled. This means the rows_processed column is not updated when a child cursor is shared. At this step we are still sharing child cursor n°0, which is why the number of processed rows of the second (last) execution is not correct.

Fortunately, we have the v$sql_cs_histogram view:

CHILD_NUMBER  BUCKET_ID      COUNT
------------ ---------- ----------
           0          0          1
           0          1          1 --> incremented
           0          2          0

Oracle is showing, via this view, that we are still sharing the same child cursor n°0 but, in contrast to the first execution where it was bucket_id n°0 that saw its count incremented to 1, for the second query execution it is bucket_id n°1 that has been incremented.

What does this mean?

Oracle is telling us that the second query execution has actually processed more than the indicated 101 rows. We will see at the end of this article the proof of that claim.

Now, let's use a new bind variable value (‘j>million’) and re-execute the same query:

SQL> exec :lvc := 'j>million';

CHILD_NUMBER EXECUTIONS ROWS_PROCESSED
----------- ---------- --------------
           1          1        1099050
           0          1         101

CHILD_NUMBER  BUCKET_ID      COUNT
------------ ---------- ----------
           1          0          0
           1          1          0
           1          2          1 --> incremented
           0          0          1
           0          1          1
           0          2          0

Notice this time that Oracle produces a new child cursor n°1, which is a sign that our cursor is now bind aware. It has processed 1,099,050 rows, and it is bucket_id n°2 of this newly compiled child cursor n°1 that has seen its count incremented to 1.

In order to show you that the bind variable ‘j10000’ has effectively generated more than 10,000 rows, let’s re-execute the initial query using this bind variable and observe what the monitoring views will show us:

SQL> exec :lvc := 'j10000';

CHILD_NUMBER EXECUTIONS ROWS_PROCESSED
------------ ---------- --------------
           2          1         100001
           1          1        1099050
           0          1            101

CHILD_NUMBER  BUCKET_ID      COUNT
------------ ---------- ----------
           2          0          0
           2          1          1 --> incremented
           2          2          0
           1          0          0
           1          1          0
           1          2          1
           0          0          1
           0          1          1
           0          2          0

A new child cursor n°2 has been compiled for the ‘j10000’ bind variable value, and this is why we now see the correct number of rows processed (100,001) in the v$sql_cs_statistics view. Note also that, for this child cursor n°2, it is the count of bucket_id n°1 that has been incremented to 1.

Set aside, for the moment, the fact that child cursors n°1 and n°2 are bind aware and that child cursor n°0 has become obsolete (not shareable anymore), as shown below:

select  sql_id
       ,child_number
       ,is_bind_aware
       ,is_bind_sensitive
       ,is_shareable
       ,executions
from   v$sql
where  sql_id ='6f6wzmu3yzm43';

SQL_ID        CHILD_NUMBER I I I EXECUTIONS
------------- ------------ - - - ----------
6f6wzmu3yzm43            0 N Y N          2
6f6wzmu3yzm43            1 Y Y Y          1
6f6wzmu3yzm43            2 Y Y Y          1

With what precedes, I hope I have demonstrated the relationship between a child cursor, the number of rows it has processed and the corresponding incremented bucket_id. This relationship obeys the following algorithm:

        0 < ROWS_PROCESSED <= 1000  --> COUNT of BUCKET_ID 0 will be incremented
     1000 < ROWS_PROCESSED <= 1e6   --> COUNT of BUCKET_ID 1 will be incremented
            ROWS_PROCESSED >  1e6   --> COUNT of BUCKET_ID 2 will be incremented
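The same mapping expressed as a SQL CASE, purely for illustration (my own formulation, not Oracle's internal code):

select child_number,
       rows_processed,
       case
         when rows_processed <= 1e3 then 0
         when rows_processed <= 1e6 then 1
         else 2
       end as expected_bucket_id
from   v$sql_cs_statistics
where  sql_id = '6f6wzmu3yzm43';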

In Part II of this series we will see how the pair (bucket_id, count) influences the bind aware status of the initial child cursor. This pair of columns ceases to play its initial role once the cursor becomes bind aware; at that point the third view (v$sql_cs_selectivity) enters the scene and starts using the selectivity of the bind variable to decide whether to share an existing child cursor or optimize a new one.
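For completeness, here is the kind of query that lets you glimpse what ECS records in that third view (a sketch based on the column list shown earlier in this post):

SQL> select child_number, predicate, range_id, low, high
     from   v$sql_cs_selectivity
     where  sql_id = '6f6wzmu3yzm43';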
