Mohamed Houri’s Oracle Notes

August 29, 2013

INDEX FULL SCAN , NLS_SORT and ORDER BY

Filed under: Oracle — hourim @ 12:38 pm

If I was to ask you how an INDEX FULL SCAN is operated by the SQL engine you would certainly answer that it will go to the first leaf block of the index and walk down up to the last leaf block in the index key order using typically a db file sequential read.

I have strengthened the words  index key order because it is specifically that operation which interests me here in this article. Each time I see an INDEX FULL SCAN operation in an execution plan, I immediately try to know if the CBO did took an advantage of this typical access to avoid an eventual supplementary ORDER BY operation. There is a bug, under the FIRST_ROWS mode, where an INDEX FULL SCAN is preferred, whatever its cost is, in order to avoid an ORDER BY operation. Hence, in the presence of such index operation, I also try to verify the CBO mode and/or the presence of a where clause such as where rownum=1 which makes, behind the scene, the CBO behaving as if it was running under FIRST_ROWS mode.

Recently an excellent question comes up in a French forum where the Original Poster (OP) was wondering why the CBO was making a wrong decision. Several very good interventions by very nice peoples motivated me to write two articles, the first one related to the relationship that might exist between an INDEX FULL SCAN and an ORDER BY operation while the second article will look on the effect the optimizer_index_cost_adj parameter might have on the choice of a good or wrong execution path.

The  OP query and execution plan  are shown below:

SELECT
     colonne1
FROM matable
GROUP BY colonne1
ORDER BY colonne1 ASC NULLS LAST;
Plan hash value: 2815412565

------------------------------------------------------------------------------------------
| Id  | Operation         | Name                 | Starts | E-Rows | A-Rows |   A-Time   |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                      |      1 |        |      3 |00:00:21.60 |
|   1 |  SORT ORDER BY    |                      |      1 |      3 |      3 |00:00:21.60 |
|   2 |   HASH GROUP BY   |                      |      1 |      3 |      3 |00:00:21.60 |
|   3 |    INDEX FULL SCAN| MATABLE_PK           |      1 |   2923K|   2928K|00:00:21.99 |
------------------------------------------------------------------------------------------

The query takes more than 20 seconds to complete. And when he instructs the CBO to use a FULL table scans the response time fells down to about 4 seconds

SELECT /*+ NO_INDEX(matable matable_pk) */
colonne1
FROM matable
GROUP BY colonne1
ORDER BY colonne1 ASC NULLS LAST;

-----------------------------------------------------------------------------------------
| Id  | Operation           | Name              | Starts | E-Rows | A-Rows |   A-Time   |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |                   |      1 |        |      3 |00:00:04.03 |
|   1 |  SORT ORDER BY      |                   |      1 |      3 |      3 |00:00:04.03 |
|   2 |   HASH GROUP BY     |                   |      1 |      3 |      3 |00:00:04.03 |
|   3 |    TABLE ACCESS FULL| MATABLE           |      1 |   2923K|   2928K|00:00:03.19 |
-----------------------------------------------------------------------------------------

The next blog article will discuss the reason of this wrong execution plan choice.  For this moment, let me just spot with you the duplicate sort operation the OP has got.

------------------------------------------------------------------------------------------
| Id  | Operation         | Name                 | Starts | E-Rows | A-Rows |   A-Time   |
------------------------------------------------------------------------------------------
|   1 |  SORT ORDER BY    |                      |      1 |      3 |      3 |00:00:21.60 |
|   2 |   HASH GROUP BY   |                      |      1 |      3 |      3 |00:00:21.60 |
|   3 |    INDEX FULL SCAN| MATABLE_PK           |      1 |   2923K|   2928K|00:00:21.99 |
------------------------------------------------------------------------------------------

An ordered INDEX FULL SCAN (on the leading PK column) access followed by a SORT ORDER BY of this PK column.

Why?

This is the aim of the current blog article.

First let me present the model

CREATE TABLE t
(c1 VARCHAR2(64), c2 CHAR(15), d1 DATE);

INSERT INTO t
  SELECT
       mod(ABS(dbms_random.random),3)+ 1||chr(ascii('Y')) ,
       dbms_random.string('L',dbms_random.value(1,5))||rownum ,
       to_date(TO_CHAR(to_date('01/01/1980','dd/mm/yyyy'),'J') + TRUNC(dbms_random.value(1,11280)),'J')
FROM dual
CONNECT BY level <= 2e6;

ALTER TABLE t ADD CONSTRAINT t_pk PRIMARY KEY (c1,c2) USING INDEX;

EXEC dbms_stats.gather_table_stats (USER, 't', CASCADE => true, method_opt => 'FOR ALL COLUMNS SIZE 1');

And now the query on 11.2.0.3.0 – 64bit Production

SQL > SELECT  c1
 2    FROM t
 3    GROUP BY c1
 4   ORDER BY c1 ASC NULLS LAST;

C1
 --------------------------
 1Y
 2Y
 3Y

--------------------------------------
 SQL_ID  0nfhzk4r58zuw, child number 1
 -------------------------------------
 SELECT  c1   FROM t   GROUP BY c1  ORDER BY c1 ASC NULLS LAST

Plan hash value: 2111031280
 -----------------------------------------------------------------------------
 | Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
 -----------------------------------------------------------------------------
 |   0 | SELECT STATEMENT     |      |       |       |  2069 (100)|          |
 |   1 |  SORT GROUP BY NOSORT|      |     3 |     9 |  2069   (5)| 00:00:06 |
 |   2 |   INDEX FULL SCAN    | T_PK |  2000K|  5859K|  2069   (5)| 00:00:06 |
 -----------------------------------------------------------------------------

As I have expected, an ordered INDEX FULL SCAN on the leading primary key column which allows the CBO to avoid the ORDER BY c1 operation as clearly shown by the operation 1 SORT GROUP BY NOSORT

So what is the difference between my model and the OP one? Or more precisely what is the difference between my environment and the OP one? It should exist something that makes the difference. Fortunately the thread was under good hands and someone cleverly asked to get the execution plan with the advanced option thought that his intention was to see the cost. Nevertheless, the advanced option shows that the OP was using a French NLS_SORT parameter.

Hmmmm…

Let me then change my nls_sort to FRENCH and see what happens to my engineered query


SQL> show parameter nls_sort

NAME                                 TYPE        VALUE
------------------------------------ ----------- ----------
nls_sort                             string      BINARY

SQL> alter session set nls_sort=FRENCH;

Session altered.

SQL> SELECT  c1
2    FROM t
3    GROUP BY c1
4   ORDER BY c1 ASC NULLS LAST;

C1
------------------------
1Y
2Y
3Y

SQL_ID  0nfhzk4r58zuw, child number 3
-------------------------------------
SELECT  c1   FROM t   GROUP BY c1  ORDER BY c1 ASC NULLS LAST

Plan hash value: 1760210272

------------------------------------------------------------------------------
| Id  | Operation             | Name | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |      |       |       |  2451 (100)|          |
|   1 |  SORT ORDER BY        |      |     3 |     9 |  2451  (20)| 00:00:07 |
|   2 |   SORT GROUP BY NOSORT|      |     3 |     9 |  2451  (20)| 00:00:07 |
|   3 |    INDEX FULL SCAN    | T_PK |  2000K|  5859K|  2069   (5)| 00:00:06 |
------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$1
3 - SEL$1 / T@SEL$1

Outline Data
-------------
/*+
BEGIN_OUTLINE_DATA
IGNORE_OPTIM_EMBEDDED_HINTS
OPTIMIZER_FEATURES_ENABLE('11.2.0.3')
DB_VERSION('11.2.0.3')
ALL_ROWS
OUTLINE_LEAF(@"SEL$1")
INDEX(@"SEL$1" "T"@"SEL$1" ("T"."C1" "T"."C2"))
END_OUTLINE_DATA
*/

Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - (#keys=1) NLSSORT("C1",'nls_sort=''FRENCH''')[2000],"C1"[VARCHAR2,256]
2 - (#keys=1) "C1"[VARCHAR2,256]
3 - "C1"[VARCHAR2,256] 

The column projection gives an interesting information on what’s going on here (nls_sort= french)


Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - (#keys=1) NLSSORT("C1",'nls_sort=''FRENCH''')[2000],"C1"[VARCHAR2,256]

On contrast to the situation where my column c1 would have been declared as of a NUMBER data type, the nls_sort parameter value would not have played any effect as shown below:


SQL> describe t1

Name                            Null?    Type
------------------------------- -------- ----------
1      C1                              NOT NULL NUMBER
2      C2                              NOT NULL CHAR(15)
3      D1                                       DATE

SQL> show parameter nls_sort

NAME                                 TYPE        VALUE
------------------------------------ ----------- -----------
nls_sort                             string      FRENCH

SQL>SELECT  c1
2    FROM t1
3    GROUP BY c1
4   ORDER BY c1 ASC NULLS LAST;

C1
----------
1
2
3

------------------------------------------------------------------------------
| Id  | Operation            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |       |       |       |  2105 (100)|          |
|   1 |  SORT GROUP BY NOSORT|       |  1754K|    21M|  2105   (5)| 00:00:06 |
|   2 |   INDEX FULL SCAN    | T1_PK |  1754K|    21M|  2105   (5)| 00:00:06 |
------------------------------------------------------------------------------

SQL> alter session set nls_sort = BINARY;

Session altered.

SQL> SELECT  c1
2    FROM t1
3    GROUP BY c1
4   ORDER BY c1 ASC NULLS LAST;

C1
----------
1
2
3

------------------------------------------------------------------------------
| Id  | Operation            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |       |       |       |  2105 (100)|          |
|   1 |  SORT GROUP BY NOSORT|       |  1754K|    21M|  2105   (5)| 00:00:06 |
|   2 |   INDEX FULL SCAN    | T1_PK |  1754K|    21M|  2105   (5)| 00:00:06 |
------------------------------------------------------------------------------

Footnote: When you see in your execution plan two ordered operations like and INDEX FULL SCAN followed by an ORDER BY on the leading index column then check the nls_sort parameter. It might be due to the difference of the session nls_sort parameter and the sort parameter used internally by Oracle when reading the INDEX FULL SCAN keys.

August 7, 2013

Indexed virtual column and DML error logging : ORA-03113: end-of-file on communication channel

Filed under: Oracle — hourim @ 2:57 pm

This is a brief note to show you that when combining virtual column and DML error logging you might be in trouble as I have been.  First the model

SQL> select * from v$version;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
PL/SQL Release 11.2.0.3.0 - Production
CORE    11.2.0.3.0      Production
TNS for Linux: Version 11.2.0.3.0 - Production
NLSRTL Version 11.2.0.3.0 – Production


create table t1(n1 number, dat1 timestamp(6));

alter table t1 add virt_n1 generated always as (case when dat1 is null then n1 else null end) virtual;

create index ind_virt_n1 on t1(virt_n1);

create table t2(n1 number, n2 number);

insert into t1 (N1,DAT1) values (1067597,to_timestamp('05-AUG-13 10.44.09.456703000 AM','DD-MON-RR HH.MI.SS.FF AM'));
insert into t1 (N1,DAT1) values (1067597,to_timestamp('05-AUG-13 10.44.09.456703000 AM','DD-MON-RR HH.MI.SS.FF AM'));
insert into t1 (N1,DAT1) values (1067597,to_timestamp('05-AUG-13 10.44.09.456703000 AM','DD-MON-RR HH.MI.SS.FF AM'));
insert into t1 (N1,DAT1) values (1067597,to_timestamp('05-AUG-13 10.44.09.456703000 AM','DD-MON-RR HH.MI.SS.FF AM'));
insert into t1 (N1,DAT1) values (1067597,to_timestamp('05-AUG-13 10.44.09.456703000 AM','DD-MON-RR HH.MI.SS.FF AM'));
insert into t1 (N1,DAT1) values (36869,to_timestamp('05-AUG-13 10.44.09.456703000 AM','DD-MON-RR HH.MI.SS.FF AM'));
insert into t1 (N1,DAT1) values (36869,to_timestamp('05-AUG-13 10.44.09.456703000 AM','DD-MON-RR HH.MI.SS.FF AM'));
insert into t1 (N1,DAT1) values (170012,to_timestamp('05-AUG-13 10.44.09.456703000 AM','DD-MON-RR HH.MI.SS.FF AM'));
insert into t1 (N1,DAT1) values (170012,to_timestamp('05-AUG-13 10.44.09.456703000 AM','DD-MON-RR HH.MI.SS.FF AM'));
insert into t1 (N1,DAT1) values (170012,to_timestamp('05-AUG-13 10.44.09.456703000 AM','DD-MON-RR HH.MI.SS.FF AM'));
insert into t1 (N1,DAT1) values (170012,null);

insert into t2 select distinct n1, n1+1 from t1;

commit;

exec dbms_errlog.create_error_log (dml_table_name => 't1');

I have created two tables, t1 and t2.  I enriched t1 with a virtual column (virt_n1) and a non unique index (ind_virt_n1) on this virtual column. And finally, I create a DML error logging table (err$_t1) in order to log rejected records when inserting into t1;

And now the problem

insert into t1
    (n1
    ,dat1)
select
   t2.n1
   ,systimestamp
from t2
log errors into ERR$_t1 reject limit unlimited;
*
ERROR at line 1:
ORA-03113: end-of-file on communication channel

It took me a couple of minutes to figure out that this error is due the DML logging error because when I get rid of this error logging the insert works perfectly

insert into t1
(n1
,dat1)
select  t2.n1
,systimestamp
from t2;

3 rows created.

But wait; I was not going to let down my DML error just because of this end-of-file on communication channel?  I was curious to know the reason that impeaches my insert to work when combined with the DML error logging. The way I did approached this issue is to ask myself what is the difference between the actual situation and the old and many situations where I did smoothly used the DML error logging. You might have already guessed the answer because it is in the title of this blog article: virtual column. So, I immediately described the err$_t1 which shows the presence of the virtual column


SQL> desc err$_t1
Name                            Null?    Type
------------------------------- -------- ---------------
1      ORA_ERR_NUMBER$                          NUMBER
2      ORA_ERR_MESG$                            VARCHAR2(2000)
3      ORA_ERR_ROWID$                           ROWID
4      ORA_ERR_OPTYP$                           VARCHAR2(2)
5      ORA_ERR_TAG$                             VARCHAR2(2000)
6      N1                                       VARCHAR2(4000)
7      DAT1                                     VARCHAR2(4000)
8      VIRT_N1                                  VARCHAR2(4000) -- virtual column

By simply dropping the virtual column from the err$_t1 the insert works perfectly  in the presence of DML error logging table as shown below:

alter table err$_t1 drop column virt_n1;
Table altered.

insert into t1
(n1
,dat1)
select  t2.n1
,systimestamp
from t2
log errors into ERR$_t1 reject limit unlimited;

3 rows created.

I was still curious to know why the presence of a virtual column in the error table impeaches things to work correctly. There might be another reason. This is why I started as an electrician who is having an electrical problem in his installation but doesn’t know exactly which is the culprit electrical devise is.  I modeled my real life tables and started dropping object by object (foreign key, unique key, index) until I found the culprit object:  ind_virt_n1 the index on the virtual column.

If you re-create the model presented above and, this time, drop the index ind_virt_n1 and let the virtual column in the err$_t1 table the insert will this time work perfectly as shown below:

drop index ind_virt_n1;
Index dropped.

insert into t1
    (n1
    ,dat1)
select
    t2.n1
   ,systimestamp
from t2
log errors into ERR$_t1 reject limit unlimited;

3 rows created.

The bottom line of this article is to show that mixing indexed virtual column and DML error logging might not work without error. In my case I opted for dropping the virtual column from the err$_t1 instead of dropping the virtual index because of the performance gain this index brings to my application.

Blog at WordPress.com.

Tony's Oracle Tips

Tony Hasler's light hearted approach to learning about Oracle

Richard Foote's Oracle Blog

Focusing Specifically On Oracle Indexes, Database Administration and Some Great Music

Hatem Mahmoud's blog

Just another blog : Databases, Linux and other stuffs

Mohamed Houri’s Oracle Notes

Qui se conçoit bien s’énonce clairement

Oracle Diagnostician

Performance troubleshooting as exact science

Raheel's Blog

Things I have learnt as Oracle DBA

Coskan's Approach to Oracle

What I learned about Oracle

So Many Oracle Manuals, So Little Time

“Books to the ceiling, Books to the sky, My pile of books is a mile high. How I love them! How I need them! I'll have a long beard by the time I read them”—Lobel, Arnold. Whiskers and Rhymes. William Morrow & Co, 1988.

Carlos Sierra's Tools and Tips

Tools and Tips for Oracle Performance and SQL Tuning

Oracle Scratchpad

Just another Oracle weblog

OraStory

Dominic Brooks on Oracle Performance, Tuning, Data Quality & Sensible Design ... (Now with added Sets Appeal)