November | 2014 | Mohamed Houri’s Oracle Notes

November 12, 2014

Dynamic sampling: unlearning

Filed under: Statistics — hourim @ 3:45 pm

Here is an execution plan that served as a path for the Oracle SQL engine to execute a simple two table join

 SQL> SELECT
          count(t1.small_vc)
         ,count(t2.padding)
    FROM
          t1, t2
    WHERE
          t1.id1 = t2.id
        ;

SQL> start xstat

SQL_ID  b5yc4wnt59bxn, child number 0
-------------------------------------
--------------------------------------------------------------------------
| Id  | Operation                     | Name  | Starts | E-Rows | A-Rows |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |       |      1 |        |      1 |
|   1 |  SORT AGGREGATE               |       |      1 |      1 |      1 |
|   2 |   NESTED LOOPS                |       |      1 |        |     10 |
|   3 |    NESTED LOOPS               |       |      1 |     10 |     10 |
|   4 |     TABLE ACCESS FULL         | T2    |      1 |     10 |     10 |
|*  5 |     INDEX UNIQUE SCAN         | T1_PK |     10 |      1 |     10 |
|   6 |    TABLE ACCESS BY INDEX ROWID| T1    |     10 |      1 |     10 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access(&quot;T1&quot;.&quot;ID1&quot;=&quot;T2&quot;.&quot;ID&quot;)

Note
-----
   - dynamic sampling used for this statement (level=2)

What does this last Note about dynamic sampling imply?

That involved tables have stale statistics?
No it doesn’t imply such a conclusion

That involved tables have no statistics at all?
No it doesn’t exactly imply such a conclusion

The above Note indicates that at least one (particularly for the default level 2) of the involved tables in the above query has no statistics at all. It doesn’t mean as well that all tables in the join query have no statistics. It means that at least one of the two tables has been NOT ANALYZED using the 10053 trace file terminology as shown below:


***************************************
BASE STATISTICAL INFORMATION
***************************************
Table Stats::
  Table: T2  Alias: T2  (NOT ANALYZED) --- spot this
    #Rows: 327  #Blks:  4  AvgRowLen:  100.00  ChainCnt:  0.00
  Column (#1): ID(  NO STATISTICS (using defaults)
    AvgLen: 13 NDV: 10 Nulls: 0 Density: 0.097859

Index Stats::
  Index: T2_I_FK  Col#: 1
    LVLS: 0  #LB: 1  #DK: 10  LB/K: 1.00  DB/K: 1.00  CLUF: 1.00
***********************

In fact, what I did not show you is the model I’ve used which is

create table t1
    as select
              rownum                     id1,
              trunc(dbms_random.value(1,1000)) id2,
              lpad(rownum,10,'0')              small_vc,
              rpad('x',1000)                   padding
              from dual
    connect by level &lt;= 1e4;

alter table t1 add constraint t1_pk primary key (id1);

create index t1_i1 on t1(id2);

create table t2
   as select *
      from ( select
               rownum                              as id
              ,mod(t1.id1,5) + mod(rownum,10)* 10  as id1
              ,lpad(rownum,10,'0')                 as small_vc
              ,rpad('x',70)                        as padding
            from t1
            where rownum &lt;= 10
           )
            order by id1;

alter table t2 add constraint t2_fk foreign key(id) references t1(id1);

create index t2_i_fk on t2(id);

And most importantly the statistics I have collected for table t1 only

 BEGIN
  dbms_stats.gather_table_stats(USER
                           ,'t1'
                           ,method_opt =&gt; 'FOR ALL COLUMNS SIZE 1'
                           ,estimate_percent =&gt; DBMS_STATS.AUTO_SAMPLE_SIZE
                           ,CASCADE =&gt; true);  
END;                          
/

The CBO 10053 trace file echoes for table t1, in contrast to t2 table on which I haven’t collected statistics, the following lines which indicate that the CBO has used table t1 corresponding dictionary statistics during the execution plan compilation time

***********************
Table Stats::
  Table: T1  Alias: T1
    #Rows: 10000  #Blks:  1460  AvgRowLen:  1020.00  ChainCnt:  0.00
  Column (#1): ID1(
    AvgLen: 4 NDV: 10000 Nulls: 0 Density: 0.000100 Min: 1 Max: 10000
Index Stats::
  Index: T1_I1  Col#: 2
    LVLS: 1  #LB: 21  #DK: 999  LB/K: 1.00  DB/K: 9.00  CLUF: 9968.00
  Index: T1_PK  Col#: 1
    LVLS: 1  #LB: 20  #DK: 10000  LB/K: 1.00  DB/K: 1.00  CLUF: 1429.00
Access path analysis for T1
***************************************
SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for T1[T1]
  Table: T1  Alias: T1
    Card: Original: 10000.000000  Rounded: 10000  Computed: 10000.00  Non Adjusted: 10000.00
  Access Path: TableScan
    Cost:  447.51  Resp: 447.51  Degree: 0
      Cost_io: 439.00  Cost_cpu: 12297302
      Resp_io: 439.00  Resp_cpu: 12297302
  Best:: AccessPath: TableScan
         Cost: 447.51  Degree: 1  Resp: 447.51  Card: 10000.00  Bytes: 0

Bottom line: when you see a Note about dynamic sampling at level 2 in an execution plan then bear in mind that this Note doesn’t imply that all tables involved in the underlying query have no statistics. It indicates that there is at least one table without statistics.

I’ve also attached to this blog the corresponding documentation about dynamic (statistics) sampling

Comments (4)

November 8, 2014

Bind Aware – Part II

Filed under: cursor sharing — hourim @ 1:49 pm

In part I we saw how the count of the bucket_id in the v$cs_histogram view is incremented according to the number of rows their corresponding child cursor execution has processed depending on the bind variable value. We saw that this count-bucket_id relationship follows the following algorithm

If the number of processed rows < 1,000 then increment count of bucket_id n°0
If the number of processed rows < 1,000,000 then increment count of bucket_id n°1
If the number of processed rows > 1,000,000 then increment count of bucket_id n°2

In part II we will see how the decision to mark a cursor bind-aware is made. I have found three situations in which the CBO is triggered to compile a new execution plan:

increment the count of two adjacent bucket_id (0-1 or 1-2)
increment the count of the two distant bucket_id (0-2)
increment the count of the three bucket_id (0-1-2)

1. Two neighbour bucket_ids

Let’s start with the first situation. For this I will be using ‘j100’ and ‘j10000’ bind variable values as far as these two values increment two adjacent bucket_id which are 0 and 1. I will be using the same table and the same query as in part I.

SQL> exec :lvc := 'j100'

SQL> select count(1) from t_acs where vc2 = :lvc;

SQL> @cntbukt 6f6wzmu3yzm43 -- see script at the end

CHILD_NUMBER  BUCKET_ID      COUNT
------------ ---------- ----------
           0          0          1
           0          1          0
           0          2          0

Next, I will execute the same query 6 extra times so that I will get the following cursor monitoring picture:

SQL> @cntbukt 6f6wzmu3yzm43

CHILD_NUMBER  BUCKET_ID      COUNT
------------ ---------- ----------
           0          0          7
           0          1          0
           0          2          0

7 executions using the same bind variable value (‘j100’) which has incremented the count of bucket_id n° 0 as expected.
Now, I will change the bind variable value so that it is the bucket_id n°1 that will be incremented and execute the same query 7 times

SQL> exec :lvc := 'j10000'

SQL>select count(1) from t_acs where vc2 = :lvc; -- repeat this 7 times

SQL> @cntbukt 6f6wzmu3yzm43 

CHILD_NUMBER  BUCKET_ID      COUNT
------------ ---------- ----------
           0          0          7
           0          1          7
           0          2          0

After 7 executions using the new bind variable we are still sharing the same child cursor n°0. However,Oracle has, as expected too, incremented the bucket_id n°1 7 times. Let’s add an extra run of the same query and see what happens:

SQL> select count(1) from t_acs where vc2 = :lvc;

SQL> @cntbukt 6f6wzmu3yzm43

CHILD_NUMBER  BUCKET_ID      COUNT
------------ ---------- ----------
           1          0          0
           1          1          1
           1          2          0
           0          0          7 -- equals its neighbour
           0          1          7 -- equals its neighbour
           0          2          0

Finally after at the 8th execution Oracle has decided that it is now time to compile a new plan (new child cursor n°1). This allows me to write the first below observation:

For the same child cursor, when the count of a bucket_id reaches the count of its neighbour bucket_id, the next execution will mark the original cursor as bind aware and a new child cursor is compiled.

This conclusion is of course valid only before the apparition of the first bind aware cursor in the cursor cache.

2. Two distant bucket_ids

In order to investigate the second case I will start by flushing the shared pool and repeat the same experiment with two distant (bucket_id) bind variables, ‘j100’ and ‘j>million’ starting by 6 cursor executions using the first bind variable.

SQL> alter system flush shared_pool;
SQL> exec :lvc := 'j100';
SQL> select count(1) from t_acs where vc2 = :lvc;

SQL> @cntbukt 6f6wzmu3yzm43

CHILD_NUMBER  BUCKET_ID      COUNT
------------ ---------- ----------
           0          0          6
           0          1          0
           0          2          0

As expected, the count of bucket_id number 0 has been incremented 6 times. Now, I will execute the same query two times using the ‘j>million’ bind variable which favours the non-adjacent bucket_id n°2

SQL> exec :lvc := 'j>million'
SQL> select count(1) from t_acs where vc2 = :lvc;

SQL> @cntbukt 6f6wzmu3yzm43
CHILD_NUMBER  BUCKET_ID      COUNT
------------ ---------- ----------
           0          0          6
           0          1          0
           0          2          2

As expected too, the count of bucket_id number 2 has been incremented 2 times. But I am still sharing the same child cursor n°0. Let’s try an extra execution with the same bind variable

SQL> select count(1) from t_acs where vc2 = :lvc;

SQL> @cntbukt 6f6wzmu3yzm43

CHILD_NUMBER  BUCKET_ID      COUNT
------------ ---------- ----------
           1          0          0
           1          1          0
           1          2          1
           0          0          6
           0          1          0
           0          2          2

And finally a new child cursor n°1 has been created indicating that the cursor has been marked bind aware.

At this step I can make the second conclusion:

For the same child cursor, when the count of a bucket_id is greater or equal to (trunc(count/2) – 1) of its distant non-adjacent bucket_id, the next execution will mark the original cursor as bind aware and a new child cursor is compiled.

We have 6 executions at bucket_id 0 and 2 executions at bucket_id n° 2 of the same child cursor n°0. We reaches the situation where 2>= (trunc(6/2)-1). This has the tendency to indicate that the the next execution (the 3rd one) will compile a new execution plan (child cursor n°1)

What happens when the number of executions is odd?

By flushing the shared pool and setting again the bind variable to ‘j100’ and executing the same query 11 times I obtained the following picture:

CHILD_NUMBER  BUCKET_ID      COUNT
------------ ---------- ----------
           0          0         11
           0          1          0
           0          2          0

Then by setting the bind variable value to ‘j>million’ and executing the same query 4 times I arrived to the following picture:

CHILD_NUMBER  BUCKET_ID      COUNT
------------ ---------- ----------
           0          0         11
           0          1          0
           0          2          4

The above pictures indicates that 4 >= trunc(11/2) – 1 = 4. This means that the next execution(5th one) will compile a new plan:

SQL> select count(1) from t_acs where vc2 = :lvc;

SQL> @cntbukt 6f6wzmu3yzm43

CHILD_NUMBER  BUCKET_ID      COUNT
------------ ---------- ----------
           1          0          0
           1          1          0
           1          2          1
           0          0         11
           0          1          0
           0          2          4

The two observations seems to be valid in both directions.i.e. the initial compiled cursor can belong to the bucket_id n°0 or 1 or 2. I have experimented the above tests starting by bind variable ‘j>million’ (or ‘j10000’) and have came up with the same conclusions.

In my incessant desire to make my article short and simple, I’ve decided to treat the last case (mixture bucket_ids) in Part III

PS: cntbukt script

select
        child_number
        ,bucket_id
        ,count
    from
         v$sql_cs_histogram
    where sql_id = '&amp;1' ;

Update 11/11/2014

I did today an extra bunch of test with two distant bucket id and have updated the corresponding conclusion. When the number of cursor executions at bucket_id 2 reaches ceil(number of execution at bucket_id 0)/3) then the next execution at bucket_id 2 will mark the cursor as bind aware and compile a new execution plan

I have also attached here a Bind Aware test file on which I gathered my tests related to two distant bucket_ids (0 and 2)

Comments (3)

November 6, 2014

Bind Aware – Part I

Filed under: cursor sharing — hourim @ 4:14 pm

If you don’t want to use the /*+ bind-aware */ hint to make your cursor immediately bind aware, you may find that your initial cursor needs a certain number of executions to reach the bind awareness status. How many of such initial executions your cursor needs in its warming up period? And how this number of executions is monitored? Those are the two questions I aim to answer via a two parts articles

In my answering path I will start by creating the following t_cs table with a special data pattern

create table t_acs(n1  number, vc2  varchar2(10));

BEGIN
     for j in 1..1200150 loop
      if j = 1 then
       insert into t_acs values (j, 'j1');
      elsif j>1 and j<=101 then
       insert into t_acs values(j, 'j100');
      elsif j>101 and j<=1101 then
       insert into t_acs values (j, 'j1000');
      elsif j>10001 and j<= 110001 then
      insert into t_acs values(j,'j10000');
     else
      insert into t_acs values(j, 'j>million');
     end if;
    end loop;
   commit;
END;
/

create index t_acs_i1 on t_acs(vc2);

select vc2, count(1) from t_acs group by vc2 order by 2;

VC2          COUNT(1)
---------- ----------
j1                  1
j100              100
j1000            1000
j10000         100000
j>million     1099049

As you can notice above, the values of the vc2 column indicate the number of their occurrence in the t_cs table:

	‘j1’        means   select count(1) from t_acs where vc2 = ‘j1’         generates  1 rows
	‘j100’      means   select count(1) from t_acs where vc2 = ‘j100’       generates 100 rows
	‘j1000’     means   select count(1) from t_acs where vc2 = ‘j1000’      generates 1000 rows
	‘j10000’    means   select count(1) from t_acs where vc2 = ‘j10000’     generates 10000 rows
	‘j>million’ means   select count(1) from t_acs where vc2 = ‘j>million’  generates more than a million of rows

I will explain later why I built such a data skew set and such a naming of the vc2 column values.
There are three views that are used to monitor a cursor candidate to the adaptive cursor sharing feature:

SQL> desc v$sql_cs_statistics

           Name                            Null?    Type
           ------------------------------- -------- --------------------
    1      ADDRESS                                  RAW(8)
    2      HASH_VALUE                               NUMBER
    3      SQL_ID                                   VARCHAR2(13)
    4      CHILD_NUMBER                             NUMBER        --> emphasize this
    5      BIND_SET_HASH_VALUE                      NUMBER
    6      PEEKED                                   VARCHAR2(1)
    7      EXECUTIONS                               NUMBER
    8      ROWS_PROCESSED                           NUMBER        --> emphasize this
    9      BUFFER_GETS                              NUMBER
   10      CPU_TIME                                 NUMBER

SQL> desc v$sql_cs_histogram

           Name                            Null?    Type
           ------------------------------- -------- --------------------
    1      ADDRESS                                  RAW(8)
    2      HASH_VALUE                               NUMBER
    3      SQL_ID                                   VARCHAR2(13)
    4      CHILD_NUMBER                             NUMBER        --> emphasize this
    5      BUCKET_ID                                NUMBER        --> emphasize this
    6      COUNT                                    NUMBER        --> emphasize this

SQL> desc v$sql_cs_selectivity

           Name                            Null?    Type
           ------------------------------- -------- --------------------
    1      ADDRESS                                  RAW(8)
    2      HASH_VALUE                               NUMBER
    3      SQL_ID                                   VARCHAR2(13)
    4      CHILD_NUMBER                             NUMBER
    5      PREDICATE                                VARCHAR2(40)
    6      RANGE_ID                                 NUMBER
    7      LOW                                      VARCHAR2(10)
    8      HIGH                                     VARCHAR2(10)

The third view starts to be useful only when the cursor becomes bind aware. At this step of the article, let’s concentrate only on the rows_processed column of the first view and on the couple (bucket_id, count) columns of the second one. They go hand-in-hand in the monitoring process of the same child cursor

Let’s investigate how these two views interact with each other. I will be using, all over this article, the below query which is:

SQL> select count(1) from t_acs where vc2 = :lvc;

Where :lvc is a bind variable which I will declare and set (the first time) as follows:

SQL> var lvc varchar2(10);
SQL> exec :lvc := 'j100';

To be bind aware, a cursor requires from the column used in its predicate part to possess a histogram:

BEGIN
       dbms_stats.gather_table_stats
                   (user
                   ,'t_acs'
                   ,method_opt       => 'for all columns size skewonly'
                   ,estimate_percent => dbms_stats.auto_sample_size
                   ,cascade          => true
                   ,no_invalidate    => false
                   );
END;
/

I used ‘’size skewonly’’ for the method_opt parameter to collect histogram on the vc2 column. There was no previous query using the vc2 column in a where clause so that using the ‘’size auto’’ parameter in the above dbms_stats package call would have been more appropriate.

As mentioned above ‘j100’ means that my initial cursor will be processing 100 rows. This is what we can see in the two monitoring views when we execute the above query:

SQL> select
         child_number
         ,executions
         ,rows_processed
    from v$sql_cs_statistics
    where sql_id = '6f6wzmu3yzm43' ;

SQL> select
        child_number
        ,bucket_id
        ,count
    from
         v$sql_cs_histogram
    where  sql_id = '6f6wzmu3yzm43' ;

CHILD_NUMBER EXECUTIONS ROWS_PROCESSED
------------ ---------- --------------
           0          1            101

CHILD_NUMBER  BUCKET_ID      COUNT
------------ ---------- ----------
           0          0          1 --> incremented
           0          1          0
           0          2          0

The child cursor n° 0 processed 101 rows and Oracle incremented the count of its bucket_id n° 0.

What happens now if I execute the same query using a different bind variable value (‘j10000’)?

SQL> exec :lvc := 'j10000';

CHILD_NUMBER EXECUTIONS ROWS_PROCESSED
------------ ---------- --------------
           0          1         101

I know that ‘j10000’ bind variable produces 10,000 rows; so why my shared cursor n°0 is showing only 101 rows?

This is not completely true. We will see below, that in contrast to v$sql_cs_histogram view, the rows_processed column of the v$sql_cs_statistics view is updated only when a new execution plan is compiled. This means that, the rows_processed column is not updated when a child cursor is shared. At this step we are still sharing the child cursor n°0 and this is why the number of processed rows of the last second (last) execution is not corect.

Hopefully, we have the v$sql_cs_histogram:

CHILD_NUMBER  BUCKET_ID      COUNT
------------ ---------- ----------
           0          0          1
           0          1          1 --> incremented
           0          2          0

Oracle is showing, via this view, that we are still sharing the same child cursor n° 0, but, in contrast to the first execution where it was the bucket_id n° 0 that has seen its count incremented to 1, for the second query execution, it is actually the bucket_id n°1 that has been incremented.

What does this mean?

Oracle is telling us that the second query execution has actually processed more than the indicated 101 rows. We will see at the end of this article the proof of that claim.

Now, let’s use a new bind variable value (‘j>million’) and re-execute the same query?

SQL> exec :lvc := 'j>million';

CHILD_NUMBER EXECUTIONS ROWS_PROCESSED
----------- ---------- --------------
           1          1        1099050
           0          1         101

CHILD_NUMBER  BUCKET_ID      COUNT
------------ ---------- ----------
           1          0          0
           1          1          0
           1          2          1 --> incremented
           0          0          1
           0          1          1
           0          2          0

Notice this time that Oracle produces a new child cursor n°1, which is a sign that our cursor is now bind aware. It has processed 1,099,050 rows and it is the bucket_id number 2 of this new compiled child cursor n° 1 that has seen its count incremented to 1.

In order to show you that the bind variable ‘j10000’ has effectively generated more than 10,000 rows, let’s re-execute the initial query using this bind variable and observe what the monitoring views will show us:

SQL> exec :lvc := 'j10000';

CHILD_NUMBER EXECUTIONS ROWS_PROCESSED
------------ ---------- --------------
           2          1         100001
           1          1        1099050
           0          1            101

CHILD_NUMBER  BUCKET_ID      COUNT
------------ ---------- ----------
           2          0          0
           2          1          1 --> incremented
           2          2          0
           1          0          0
           1          1          0
           1          2          1
           0          0          1
           0          1          1
           0          2          0

A new child cursor n°2 has been compiled for the ‘j10000’ bind variable value and this is why we see the correct number of rows_processed (10,001) in the v$sql_cs_statistics view. Note also that for this child cursor n°2, it is the bucket_id n° 1 that has been incremented to 1

Keep away, for this moment, the fact that cursor child n°1 and 2 are bind aware and that cursor child n°0 became obsolete (not shareable anymore) as shown below:

select  sql_id
       ,child_number
       ,is_bind_aware
       ,is_bind_sensitive
       ,is_shareable
       ,executions
from   v$sql
where  sql_id ='6f6wzmu3yzm43';

SQL_ID        CHILD_NUMBER I I I EXECUTIONS
------------- ------------ - - - ----------
6f6wzmu3yzm43            0 N Y N          2
6f6wzmu3yzm43            1 Y Y Y          1
6f6wzmu3yzm43            2 Y Y Y          1

And notice that, with what preceds, I hope that I have demonstrated the relationship between the child cursor, the number of rows it has processed and the corresponding incremented bucket_id. This relationship obeys to the following algorithm:

             0   < ROWS_PROCESSED <= 1000  --> COUNT of BUCKET_ID  0 will be incremented
	       1000 < ROWS_PROCESSED <= 1e6   --> COUNT of BUCKET_ID  1 will be incremented
	              ROWS_PROCESSED > 1e6   --> COUNT of BUCKET_ID  2  will be incremented

In part II of this series we will see how the couple (bucket_id, count) influences the bind aware status of the initial child cursor. This couple of column ceases to play their initial role when the cursor becomes bind aware; in which case the third v$view (v$sql_cs_selectivity) enters the bind awareness scene and starts using the selectivity of the bind variable to decide whether to share an existing child cursor or optimize a new one.

Comments (10)

Mohamed Houri’s Oracle Notes

November 12, 2014

Dynamic sampling: unlearning

November 8, 2014

November 6, 2014

Follow Blog via Email

Categories

Blog Stats

Blogs I Follow

Archives