Mohamed Houri’s Oracle Notes

January 20, 2016

Natural and Adjusted Hybrid Histogram

Filed under: Oracle,Statistics — hourim @ 7:16 pm

I was going to write a comment in this Jonathan Lewis article and have finally decided to write a blog article because it turned to be a long comment.  In the above mentioned article a reader was wondering why the bucket size of his modified data set is not obeying the minimum bucket size explained by Jonathan. Coincidentally I am writing a second article on Hybrid histogram for allthingsOracle where I have used my proper terminology to define two types of Hybrid histogram: a first type, which I have named ‘’Natural Hybrid’, is close to Jonathan’s original data set. And a second type, which I have named ‘’Adjusted Hybrid’’ is of the same vein as the reader modified data set. By  ‘’Natural Hybrid’’ type  I refer to a data set a data set that doesn’t qualify for a TOP-Frequency histogram because the threshold is greater than the naturally non-adjusted TopNRows. By ‘Adjusted Hybrid’type, I am refering to a data set that initially satisfies the TOP-Frequency threshold but  fails to qualify for a TOP-Frequency histogram because Oracle finds at the middle of the process that the Adjusted TopNRows is greater than the threshold .

Let’s explain with example. Firstly here’s a model that qualify for a “Natural Hybrid” histogram:

SQL> create table t1 (id number, n1 number);
SQL> start InsT1.sql (see downloadable script at the end)

If we gather histogram for this data set of 37 distinct values using 20 buckets we will obtain a HYBRID histogram because the TOP-Frequency threshold accounts for 95 rows while the TOP-20 rows account for 80 rows as shown below:

SQL> select round ((20-1)/20 * 100) threshold from dual;

 THRESHOLD
----------
        95
SQL> select
         sum (cnt) TopNRows
    from (select
            n1
           ,count(*) cnt
         from t1
         group by n1
          order by count(*) desc
          )
   where rownum <= 20;

  TOPNROWS
----------
        80

In order to compute the Hybrid histogram information, Oracle will, in this case, use the bucket-frequency method explained by Jonathan Lewis. This method uses the minimum bucket size of 5 (not valid at the end of the data set though) and the unchanged initial number of bucket of 20 condition is respected. This is what I prefer labelling a Natural Hybrid histogram.

Let’s now use the reader model

create table t_jar (id number, n1 number);
insert into t_jar values (1,5 );
insert into t_jar values (1,5 );
insert into t_jar values (1,7 );
insert into t_jar values (1,7 );
insert into t_jar values (1,7 );
insert into t_jar values (1,7 );
insert into t_jar values (1,10 );
insert into t_jar values (1,12 );
insert into t_jar values (1,15 );
insert into t_jar values (1,15 );
insert into t_jar values (1,15 );
insert into t_jar values (1,20 );
commit;

This new data set of 6 distinct values over 12 rows will normally qualify for a TOP-3 Frequency histogram as it satisfies the threshold formula:

SQL> select round ((3-1)/3 * 12) threshold from dual;

 THRESHOLD
----------
         8
SQL> select
          sum (cnt) TopNRows
     from (select
             n1
            ,count(*) cnt
          from t_jar
          group by n1
          order by count(*) desc
             )
     where rownum >= 3;
  TOPNROWS
----------
         9

However, Oracle will not accept this at face value. It has to check if the low and high values are among the TOP-3 distinct values. If one of these values (or both) are not in the TOP-3, oracle will force it into the histogram, exclude the least repetitive value from the TOP-3, adjust the TopNRows and check again whether theses modifications have not altered the data set so that it still qualify or not for a TOP-Frequency histogram.

Here’s below a snippet of a corresponding dbms_stats trace file

SQL> exec dbms_stats.set_global_prefs ('TRACE', to_char (1+16));

SQL> BEGIN
        dbms_stats.gather_table_stats
          (user, 't_jar', method_opt =>; 'for columns n1 size 3');
     END;
    /
SQL> exec dbms_stats.set_global_prefs('TRACE', null);
DBMS_STATS:  NNV  NDV  AVG  MMX  HST  EP   RP   NNNP IND  CNDV HSTN HSTR  COLNAME
DBMS_STATS:   Y    Y    Y    Y    Y    Y    Y                        Y    N1
DBMS_STATS: Approximate NDV Options
DBMS_STATS: ACL,TOPN,NIL,NIL,RWID,U,U3U
DBMS_STATS: start processing top n values for column N1
DBMS_STATS: topn sql (len: 415):
DBMS_STATS: +++ select /*+
                       no_parallel(t) no_parallel_index(t)
			   dbms_stats cursor_sharing_exact use_weak_name_resl
                       dynamic_sampling(0) no_monitoring
			 xmlindex_sel_idx_tbl no_substrb_pad
			  */
		   substrb(N1,16,0,64),1,240) val,
                 rowidtochar(rowid) rwid
		  from "XXXX&".T_JAR& t
		  where rowid in(chartorowid('AAAJ4MAAEAACN4EAAA')
		              ,chartorowid('AAAJ4MAAEAACN4EAAC')
		    	       ,chartorowid('AAAJ4MAAEAACN4EAAI'))
		order by N1;
DBMS_STATS: remove last bucket: Typ=2 Len=2: c1,10 add: Typ=2 Len=2: c1,15
DBMS_STATS: removal_count: 1 total_nonnull_rows: 12 mnb:  3
DBMS_STATS: adjusted coverage: .667

Here’s the data set

SQL> select *
        from
       (select n1, count(1) cnt
         from t_jar
        group by n1
         order by n1);

        N1        CNT
---------- ----------
         5          2 -> low value
         7          4
        10          1
        12          1
        15          3
        20          1 -> high value
6 rows selected.

And here are the TOP-3 rows of the same data set:

SQL> select
          *
     from (select
             n1
            ,count(*) cnt
          from t_jar
          group by n1
          order by count(*) desc)
     where rownum <= 3;

        N1        CNT
---------- ----------
         7          4
        15          3
         5          2
3 rows selected.

Since the high value is not in the TOP-3 it will be forced into the histogram to the cost of the exclusion of the least repetitive TOP-3 values which is 5 in this case (frequency =2). But, before doing this task, Oracle has to check if, after this high value forcing, the data set is still qualifying for a Top-Frequency using the AdjustedTopNRows

 AdjustedTopNRows = TopNRows – 2 + 1 = 9-2+1 = 8

The AdjustedTopNRows is not any more greater than the threshold of 8 which signifies that Oracle will finally stop collecting TOP-Frequency and transform what it has already gathered into  a HYBRID histogram as shown in the same trace file (Trying to convert frequency histogram to hybrid)

DBMS_STATS: adjusted coverage: .667
DBMS_STATS: hist_type in exec_get_topn: 2048 ndv:6 mnb:3
DBMS_STATS: Evaluating frequency histogram for col: N1
DBMS_STATS:  number of values = 4, max # of buckects = 3, pct = 100, ssize = 12
DBMS_STATS:  Trying to convert frequency histogram to hybrid
DBMS_STATS:  > cdn 10, popFreq 4, popCnt 1, bktSize 4, bktSzFrc 0
DBMS_STATS:  Evaluating hybrid histogram:  cht.count 3, mnb 3, ssize 10, min_ssize 12, appr_ndv  TRUE, ndv 4, selNdv 1,
selFreq 4,
pct 100, avg_bktsize 3, csr.hreq TRUE, normalize TRUE
DBMS_STATS:   Histogram gathering flags: 7
DBMS_STATS:  Accepting histogram
DBMS_STATS: done_hist in process_topn: TRUE  csr.ccnt: 1
DBMS_STATS: Mark column N1 as top N computed
DBMS_STATS: Skip topn computed column N1 numhist: 0

This is what I have labelled an ‘’Adjusted Hybrid’’ histogram which is collected using a method that seems not involving the bucket size and which tends to be a top-frequency to hybrid conversion.

SELECT
        (row_number() over(order by ept_nbr)-1) NumBucket
        ,ept_nbr
        ,ept_act_val
        ,rpt_cnt
        ,ept_nbr - (lag(ept_nbr,1,0) over(order by ept_nbr)) "new bucket size"
        ,bucket_size "original bucket_size"
    FROM
        (SELECT
             ah.endpoint_number            ept_nbr
            ,ah.endpoint_actual_value      ept_act_val
            ,lag(ah.endpoint_number,1,0) over(order by ah.endpoint_number) ept_lag
            ,ah.endpoint_repeat_count rpt_cnt
            ,at.sample_size/at.num_buckets bucket_size
         FROM
            user_tab_histograms      ah
           ,user_tab_col_statistics  at
         WHERE ah.table_name  = at.table_name
         AND ah.column_name = at.column_name
         AND ah.table_name  = 'T_JAR'
         AND ah.column_name = 'N1'
       ) ORDER BY ept_nbr;

NUMBUCKET    EPT_NBR EPT_ACT_VA    RPT_CNT new bucket size original bucket_size
---------- ---------- ---------- ---------- --------------- --------------------
         0          2 5                   2               2                    4
         1          6 7                   4               4                    4
         2         10 20                  1               4                    4

In passing, the adjusted coverage mentioned in the above trace file is nothing than

AdjustedTopNRows/num_rows = 8/12 = 0.667
Advertisements

Leave a Comment »

No comments yet.

RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Blog at WordPress.com.

Tony's Oracle Tips

Tony Hasler's light hearted approach to learning about Oracle

Richard Foote's Oracle Blog

Focusing Specifically On Oracle Indexes, Database Administration and Some Great Music

Hatem Mahmoud Oracle's blog

Just another Oracle blog : Database topics and techniques

Mohamed Houri’s Oracle Notes

Qui se conçoit bien s’énonce clairement

Oracle Diagnostician

Performance troubleshooting as exact science

Raheel's Blog

Things I have learnt as Oracle DBA

Coskan's Approach to Oracle

What I learned about Oracle

So Many Oracle Manuals, So Little Time

“Books to the ceiling, Books to the sky, My pile of books is a mile high. How I love them! How I need them! I'll have a long beard by the time I read them”—Lobel, Arnold. Whiskers and Rhymes. William Morrow & Co, 1988.

EU Careers info

Your career in the European Union

Carlos Sierra's Tools and Tips

Tools and Tips for Oracle Performance and SQL Tuning

Oracle Scratchpad

Just another Oracle weblog

OraStory

Dominic Brooks on Oracle Performance, Tuning, Data Quality & Sensible Design ... (Now with added Sets Appeal)

%d bloggers like this: