Mohamed Houri’s Oracle Notes

June 21, 2018

SQL Monitor and parsing

Filed under: SQL Monitor — hourim @ 6:32 pm

In part I of this short series on the SQL Monitor Duration and Elapsed time information we saw that, for a query fetching data to the client, we had better rely on the former rather than on the latter. Indeed, the Elapsed time of the SQL monitor Global Stats section doesn't account for the time spent passing data to the client. Piotr Wrzosek, Tanel Poder and Robert have emphasized that, while Duration represents the wall clock time, Elapsed time in SQL monitor refers to the DB time measured for non-idle sessions that are either waiting or actively working on CPU. This is why time spent outside the database by idle sessions is not measured by the SQL monitor Elapsed time information.

In part 2 we will demonstrate that, all things being equal, the SQL monitor kicks in only during the query execution phase. Any preliminary step occurring between the launch of the query and its execution phase will not be monitored. Typically, time spent in the hard parse phase will appear neither in the Duration nor in the Elapsed time of the SQL monitor report.

Here is a quick demonstration.

Parsing

I started looking for a way to produce a sufficiently long hard parse within a single query. Very quickly a few ideas germinated in my mind, like extreme Adaptive Cursor Sharing, or several SQL Plan Management baselined execution plans which Oracle has to reproduce and cost before selecting the best costed one. But I finally ended up using a simple join between two tables of 3 million rows each, combined with an immense IN-LIST predicate:

create table t1
as select
    rownum n1
   ,trunc((rownum-1)/3) n2
   ,mod(rownum,10) n3
from dual
connect by level <= 3e6;


create table t2
as select
    2*rownum n1
   ,trunc(rownum,5) n2
   ,lpad('x',5) vc
from dual
connect by level <= 3e6;

alter table t1 add constraint t1_pk primary key (n1);
alter table t2 add constraint t2_pk primary key (n1);

select /*+ monitor */
     t1.n1
    ,t1.n2
    ,t1.n3
from
     t1
where
     t1.n1 in
           (select t2.n2
           from t2
           where
                (
                 (
                  (
                   (
                          (t2.n1 in (33183, 31917, 31405)
                           or t2.n1 in (33383, 33917, 38405)
                           or t2.n1 in (150, 234, 456)
                           or t2.n1 in (309, 33456, 675)
                           or t2.n1 in (10, 2345, 42367)
                            or t2.n1 in (...)
                            -- etc.: many more similar OR'ed IN-LIST predicates elided
                     )
                    )
                   )
                  )
                 )
             );
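Such a statement is, of course, generated rather than hand-typed. Here is a minimal sketch of a generator that builds and hard parses a query of this shape (the 10,000 iterations and the generated values are arbitrary assumptions, not the exact lists used in this test):

SQL> set serveroutput on

declare
   l_sql clob := 'select /*+ monitor */ t1.n1, t1.n2, t1.n3 from t1 '
              || 'where t1.n1 in (select t2.n2 from t2 where t2.n1 in (-1';
   l_cnt pls_integer;
begin
   -- append thousands of OR'ed IN-LIST predicates to inflate the parse tree
   for i in 1 .. 10000
   loop
      l_sql := l_sql || ') or t2.n1 in (' || to_char(3 * i);
   end loop;
   l_sql := l_sql || '))';
   -- the hard parse of the huge statement happens here
   execute immediate 'select count(*) from (' || l_sql || ')' into l_cnt;
   dbms_output.put_line(l_cnt || ' rows');
end;
/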

Just before launching this query I activated the 10046 trace event as shown below:


SQL> set timing on

SQL> @46on 12

SQL>@MyLongParsingQuery

        N1         N2         N3
---------- ---------- ----------
         5          1          5
        75         24          5
       117         38          7
       228         75          8
     16728       5575          8

Elapsed: 00:00:14.39

SQL> @46off
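For reference, the @46on and @46off scripts are just thin wrappers around the extended SQL trace event; they presumably boil down to something like this (the actual scripts may differ):

-- @46on 12 : extended SQL trace at level 12 (binds + waits)
SQL> alter session set events '10046 trace name context forever, level 12';

-- @46off : switch the trace off
SQL> alter session set events '10046 trace name context off';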

As you can see, the query took 14 seconds of wall clock time to complete. And you have to believe me when I say that the same query, executed without tracing, lasted 14 seconds too.

Since the query execution phase is super-fast (well under the 5-second threshold below which serial statements are not automatically monitored), I forced it to be monitored using the /*+ monitor */ hint. Below is the corresponding SQL monitor report (reduced to the Global Information and Global Stats sections):

Global Information
------------------------------
 Status              :  DONE (ALL ROWS)
 Instance ID         :  1                     
 SQL ID              :  1w1jpyxgk0asj                    
 Execution Started   :  06/08/2018 12:16:31   
 First Refresh Time  :  06/08/2018 12:16:31  
 Last Refresh Time   :  06/08/2018 12:16:32   
 Duration            :  1s                  
                 
Global Stats
=================================================
| Elapsed |   Cpu   |  IO      | Fetch | Buffer |
| Time(s) | Time(s) | Waits(s) | Calls |  Gets  |
=================================================
|  0.08   |  0.02   |   0.06   |   2   |   542  |
=================================================
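For the record, this text report can be extracted with the DBMS_SQLTUNE package along these lines (a sketch; the sql_id is the one from this run):

SQL> set long 1000000
SQL> set pagesize 0 linesize 210
SQL> select dbms_sqltune.report_sql_monitor(
              sql_id       => '1w1jpyxgk0asj'
             ,type         => 'TEXT'
             ,report_level => 'ALL') report
     from dual;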

It is interesting to note that neither the Duration nor the Elapsed time shows the 14 seconds actually taken by the query.

However, the TKPROF output of the 10046 trace file generated above clearly shows where most of those 14 seconds have been spent:

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1     12.89      14.22          0          0          3           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.02       0.08         40        542          0           4
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4     12.91      14.30         40        542          3           4

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  PGA memory operation                         1179        0.00          0.01
  acknowledge over PGA limit                    117        0.10          1.30
  SQL*Net message to client                       2        0.00          0.00
  db file sequential read                        40        0.01          0.06
  SQL*Net message from client                     2        0.00          0.01
********************************************************************************
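For reference, the TKPROF summary above was produced from the raw trace file with something along these lines (the trace file name is a placeholder):

tkprof <instance>_ora_<spid>.trc MyLongParsingQuery_tkprof.txt sys=no sort=prsela,exeela,fchela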

Thanks to the above TKPROF lines we now have a clear image of the query execution time distribution:

 elapsed time    = 14.30 seconds (of which 14.22 for parsing)
 cpu consumption = 12.91 seconds (of which 12.89 for parsing)
 wait events     = 0.01 + 1.30 + 0.00 + 0.06 + 0.01 = 1.38 seconds
 elapsed - cpu   = 14.30 - 12.91 = 1.39 seconds

Summary

In this second part of the series about Duration and Elapsed time information displayed by the SQL monitor report, we have demonstrated the following two points:

• SQL monitor monitors only the execution phase of the query
• Any preliminary step preceding the execution phase is ignored by the SQL monitor. Typically, the time taken by a long hard parse shows up neither in the Duration nor in the Elapsed time.


June 9, 2018

ASH, SQL Monitor and Extended trace

Filed under: ASH,SQL Monitor — hourim @ 3:21 pm

Without any analogy with Sergio Leone’s famous movie “The Good, the Bad and the Ugly” which marked my teenage years, I would rather define the above three Oracle features as “The Good, the Good and the Good” as they have very often saved my consultancy days. This article has been inspired by the latest Nenad Noveljic blog post about negative unaccounted-for time, where he mentioned that the SQL Monitor Global Stats section might show a wrong value in the Elapsed time. In Nenad’s case this misleading piece of information was due to unaccounted-for time. In fact, I know three other situations where this time discrepancy might occur: when fetching data, during a long hard parse and when using parallel execution. In this article I will start by demonstrating the fetching data case. Two other articles covering the parse and parallel cases will follow to complete the series.

Fetching data

I executed a query which takes a few minutes to complete but spends a considerable amount of time fetching a large number of columns over quite a large number of rows. While the query was running I took several snapshots of its corresponding 12cR2 SQL monitor report, which I present below:

Global Information
------------------------------
 Status              :  EXECUTING             
 Instance ID         :  1                     
 SQL ID              :  g6px76dmjv1jy                    
 Execution Started   :  06/05/2018 09:54:19   
 First Refresh Time  :  06/05/2018 09:54:25  
 Last Refresh Time   :  06/05/2018 09:54:29   
 Duration            :  11s                  
                 
Global Stats
=========================================
| Elapsed |   Cpu   |  IO      | Buffer |
| Time(s) | Time(s) | Waits(s) |  Gets  |
=========================================
|    10   |    8.36 |     1.34 |   2M   |
=========================================

SQL Plan Monitoring Details (Plan Hash Value=2278376452)
===========================================================================
| Id    |                 Operation |       Name       | Execs |   Rows   |
|       |                           |                  |       | (Actual) |
===========================================================================
|     0 | SELECT STATEMENT          |                  |     1 |          |
|     1 |   NESTED LOOPS OUTER      |                  |     1 |          |
|     2 |    NESTED LOOPS           |                  |     1 |          |
|     3 |     FILTER                |                  |     1 |          |

As you can see, both Elapsed time and Duration are showing the same time of 10 seconds. But notice as well that at this stage of the query execution, data had not yet been sent to the client (the Actual Rows column is empty).

At exactly the same moment, ASH is also showing approximately the same elapsed time as shown below:

column  cnt   format 9999
compute sum label 'Total Elapsed' of cnt on report

break   on report

select 
   sql_plan_line_id
  ,count(1) cnt
from
   gv$active_session_history
 where
    sample_time between to_date('&datefrom', 'mm/dd/yyyy hh24:mi:ss')
                    and to_date('&dateto', 'mm/dd/yyyy hh24:mi:ss')
and sql_id = '&sql_id'
group by sql_plan_line_id
order by 2 desc;

SQL> @ashplanlineId
Enter value for datefrom:  06/05/2018 09:54:19
Enter value for dateto:  06/05/2018 09:54:29
Enter value for sql_id: g6px76dmjv1jy


SQL_PLAN_LINE_ID   CNT
---------------- -----
              10    7
              13    1
              11    1
                 -----
Total Elapsed       9

However, as soon as Oracle started sending data to the client, a discrepancy between the Elapsed time of the Global Stats section and the Duration appeared, as shown below:

Global Information
------------------------------
 Status              :  EXECUTING             
 Instance ID         :  1                     
 SQL ID              :  g6px76dmjv1jy                    
 Execution Started   :  06/05/2018 09:54:19   
 First Refresh Time  :  06/05/2018 09:54:25  
 Last Refresh Time   :  06/05/2018 09:56:18   
 Duration            :  120s                  
 Fetch Calls         :  375            
    
Global Stats
=================================================
| Elapsed |   Cpu   |  IO      | Fetch | Buffer |
| Time(s) | Time(s) | Waits(s) | Calls |  Gets  |
=================================================
|    32   |    11   |     21   |   375 |   3M   |
=================================================

SQL Plan Monitoring Details (Plan Hash Value=2278376452)
===========================================================================
| Id    |                 Operation |       Name       | Start |   Rows   |
|       |                           |                  | Active| (Actual) |
===========================================================================
| ->  0 | SELECT STATEMENT          |                  |   +30 |     5611 |
| ->  1 |   NESTED LOOPS OUTER      |                  |   +30 |     5611 |
| ->  2 |    NESTED LOOPS           |                  |   +30 |     5611 |
| ->  3 |     FILTER                |                  |   +30 |          |

ASH, in accordance with the SQL monitor Elapsed time, also started to report a wrong elapsed time, as shown below:

SQL> @ashplanlineId
Enter value for datefrom:  06/05/2018 09:54:19
Enter value for dateto:  06/05/2018 09:56:18
Enter value for sql_id: g6px76dmjv1jy


SQL_PLAN_LINE_ID   CNT
---------------- -----
              13    25
              10     7
               4     1
              11     1
                 -----
Total Elapsed       34

And so on, until the query finished fetching the entire data set after about 527 seconds, while both ASH and the Elapsed time of the SQL monitor Global Stats section were still showing roughly 34 seconds, as shown below respectively:

Global Information
------------------------------
 Status              :  DONE (ALL ROWS)             
 Instance ID         :  1                     
 SQL ID              :  g6px76dmjv1jy                    
 Execution Started   :  06/05/2018 09:54:19   
 First Refresh Time  :  06/05/2018 09:54:25  
 Last Refresh Time   :  06/05/2018 10:03:06   
 Duration            :  527s                  
 Fetch Calls         :  2085            
    
Global Stats
=================================================
| Elapsed |   Cpu   |  IO      | Fetch | Buffer |
| Time(s) | Time(s) | Waits(s) | Calls |  Gets  |
=================================================
|    34   |    12   |     22   |  2085 |   3M   |
=================================================

SQL> @ashplanlineId
Enter value for datefrom:  06/05/2018 09:54:19
Enter value for dateto:  06/05/2018 10:03:06
Enter value for sql_id: g6px76dmjv1jy


SQL_PLAN_LINE_ID   CNT
---------------- -----
              13    26
              10     7
               4     1
              11     1
               5     1
                 -----
Total Elapsed       36

Using Nenad's query, I can also report the same wrong elapsed time from ASH.

column  event format a30
column  cnt   format 9999
compute sum label 'Total Elapsed' of cnt on report

break   on report
select 
   session_state, event
  ,count(1) cnt
from
   gv$active_session_history
 where
   sql_id = '&sql_id'
 and sql_exec_id = (select max(sql_exec_id) 
                    from gv$active_session_history
                    where sql_id = '&sql_id'
                    )
group by event, session_state
;
SESSION EVENT                            CNT
------- ------------------------------ -----
WAITING direct path read                  26
ON CPU                                    11
                                       -----
Total Elapsed                             37

TKPROF

In a second attempt to make this time difference crystal clear, I traced another execution of the same query using the 10046 trace event. Below are the corresponding SQL monitor report and the TKPROF summary, respectively:

Global Information
------------------------------
 Status              :  DONE (ALL ROWS)             
 Instance ID         :  1                     
 SQL ID              :  g6px76dmjv1jy                    
 Execution Started   :  06/05/2018 09:27:23   
 First Refresh Time  :  06/05/2018 09:27:30  
 Last Refresh Time   :  06/05/2018 09:39:21   
 Duration            :  718s                  
 Fetch Calls         :  2085            
    
Global Stats
=================================================
| Elapsed |   Cpu   |  IO      | Fetch | Buffer |
| Time(s) | Time(s) | Waits(s) | Calls |  Gets  |
=================================================
|    198  |    15   |     183  |  2085 |   3M   |
=================================================


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.21       0.21          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch     2085     15.18     198.03     647106    2918455          0       31246
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total     2087     15.39     198.24     647106    2918455          0       31246


Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                    2085        0.00          0.00
  SQL*Net more data to client                     2        0.00          0.00
  Disk file operations I/O                       90        0.00          0.00
  db file sequential read                     30313        0.09        158.42
  reliable message                                1        0.00          0.00  
  enq: KO - fast object checkpoint                1        0.00          0.00
  direct path read                             1220        0.09         24.76
  SQL*Net message from client                  2085        0.93        519.45
********************************************************************************

As you can see, the Duration of the SQL monitor shows that the query took 718 seconds while the Elapsed time shows only 198 seconds (15 sec of CPU and 183 sec of I/O waits).
However, the TKPROF file confirms the reliability of the SQL monitor Duration information, since it shows that the total elapsed time of the query is:

158.42 + 24.76      = 183.18 seconds for I/O waits
183.18 + 519.45     = 702.63 seconds for I/O waits and SQL*Net message from client
702.63 + 15.39(cpu) = 718.02 seconds in total

Incidentally, the 2085 fetch calls are consistent with the 31,246 fetched rows at the default SQL*Plus arraysize of 15 (31246/15 ≈ 2083).

Naturally, ASH can't display the correct picture since a large part of the query time was spent on the idle SQL*Net message from client wait event:

SQL> @ashplanlineId
Enter value for datefrom:  06/05/2018 09:27:23
Enter value for dateto:  06/05/2018 09:39:21
Enter value for sql_id: g6px76dmjv1jy

SQL_PLAN_LINE_ID   CNT
---------------- -----
              12   153
              13    28
              10     9
              11     6
              14     1
                 -----
Total Elapsed      197
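This is expected: ASH samples only active (non-idle) sessions, so the idle SQL*Net message from client waits leave no rows in it. A quick sketch to verify (the count should come back as 0):

SQL> select count(*)
     from gv$active_session_history
     where sql_id = 'g6px76dmjv1jy'
     and event = 'SQL*Net message from client';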

Summary

We have clearly shown that when a query fetches a large amount of data to the client, the transfer time is not accounted for in the Elapsed time of the SQL monitor Global Stats section. This is why, as mentioned by Nenad in his article, you should always rely on the time reported by the Duration in the Global Information section of the SQL monitor rather than on the Elapsed time of the Global Stats section.

June 7, 2018

12cR2 Partitioning: auto-list and the disk space threat

Filed under: Partitioning — hourim @ 5:55 pm

Oracle 12cR2 comes with the auto-list partitioning option; it allows system-named partitions to be created on the fly as data arrives:

SQL> select banner from v$version where rownum=1;

BANNER
--------------------------------------------------------------------------------
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production

SQL> create table auto_list_part
  	( n1 number  
       ,n2 number 
       ,vc1 varchar2(30)
      )
     partition by list(n1)
     AUTOMATIC  
     (partition p1 values(1,2,3,4,5)
     ,partition p2 values(6,7,8,9,10)
     ,partition p3 values(11,12,13,14,15) 
      )
     /

You only need to add the AUTOMATIC clause to the old way of creating list partitioned tables to have this feature enabled.
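Note that, in 12cR2, an existing list partitioned table can presumably also be converted to auto-list after the fact, along these lines (a sketch; as with CREATE, a table having a DEFAULT partition cannot be converted, as we will see right below):

SQL> alter table auto_list_part set partitioning automatic;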

There is, however, a subtlety to consider: the AUTOMATIC clause doesn’t work with the DEFAULT clause:

SQL> create table auto_list_part
  	( n1 number  
       ,n2 number 
       ,vc1 varchar2(30)
      )
     partition by list(n1)
     AUTOMATIC
     (partition p1 values(1,2,3,4,5)
     ,partition p2 values(6,7,8,9,10)
     ,partition p3 values(11,12,13,14,15) 
     ,partition p_default values (default)
      )
     /
create table auto_list_part
*
ERROR at line 1:
ORA-14851: DEFAULT [sub]partition cannot be specified for AUTOLIST [sub]partitioned objects.

This is perfectly coherent: you can't ask Oracle to create automatic list partitions for each newly arrived partition key value and, at the same time, ask it to put those values into the DEFAULT partition.

That said, let's insert into the above automatic list partitioned table both rows that go into the pre-created partitions and rows that need partitions which don't exist yet:

SQL> insert into auto_list_part
     select
          rownum
         ,mod(rownum,5)
         ,'v'
     from dual
     connect by level <= 20;

 20 rows created.

SQL> exec dbms_stats.gather_table_stats(user, 'auto_list_part');

PL/SQL procedure successfully completed.



SQL> @getPartTab
Enter value for table_name: auto_list_part

TABLE_NAME           PARTITION_NAME  PARTITION_POSITION   NUM_ROWS TABLESPACE_NAME
-------------------- --------------- ------------------ ---------- ---------------
AUTO_LIST_PART       P1                               1          5 USERS
AUTO_LIST_PART       P2                               2          5 USERS
AUTO_LIST_PART       P3                               3          5 USERS
AUTO_LIST_PART       SYS_P3054                        4          1 USERS
AUTO_LIST_PART       SYS_P3055                        5          1 USERS
AUTO_LIST_PART       SYS_P3056                        6          1 USERS
AUTO_LIST_PART       SYS_P3057                        7          1 USERS
AUTO_LIST_PART       SYS_P3058                        8          1 USERS
                                                        ----------
Total num_rows                                                  20

8 rows selected.

I inserted 20 rows, of which 5 went into 5 new SYS partitions created automatically to store the unforeseen partition key values greater than 15. It is important to note as well that the AUTOMATIC clause creates one partition per distinct partition key value. This is why you should absolutely take care to base your partitioning strategy upon a data load pattern that inserts a reasonable number of rows per partition key; a quick sanity check is sketched below.
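Such a sanity check could look like the following sketch, run against the data you intend to load (names are those of this example):

SQL> select count(distinct n1)                   distinct_keys
           ,count(*)                             total_rows
           ,round(count(*)/count(distinct n1))   avg_rows_per_key
     from auto_list_part;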

Let’s now get the size of the partitioned table:

SQL> @SizeBySegNameMB
Enter value for segment_name: auto_list_part
Enter value for owner: c##mhouri

SEGMENT_TYPE       TABLESPACE_NAME  SEGMENT_NAME         PARTITION_NAME          MB
------------------ ---------------- -------------------- --------------- ----------
TABLE PARTITION    USERS            AUTO_LIST_PART       P1                       8
TABLE PARTITION    USERS            AUTO_LIST_PART       P2                       8
TABLE PARTITION    USERS            AUTO_LIST_PART       P3                       8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3054                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3055                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3056                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3057                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3058                8
                                                                         ----------
Total Segment Size                                                               64

8 rows selected.

As you can see, the 8 partitions occupy 64MB of disk space since each has an 8MB initial extent, regardless of the tiny number of rows it contains (8MB being the default initial extent of partitioned segments, driven by the hidden parameter _partition_large_extents we will meet below):

SQL> select 
        initial_extent/power(1024,2) initial_extentMB
       ,next_extent/power(1024,2)    next_extentMB
       ,extents
     from
        sys_dba_segs
     where
        segment_name = 'AUTO_LIST_PART'
     and
        owner        = 'C##MHOURI'
     and rownum      = 1   -- show only for one partition 
   ; 
INITIAL_EXTENTMB NEXT_EXTENTMB    EXTENTS
---------------- ------------- ----------
               8             1          1

And now the question that motivated this blog post: what happens when a very big insert into an auto-list partitioned table fails and is rolled back (by the transaction itself or via an explicit user rollback)?

SQL> insert into auto_list_part
     select
          rownum
         ,mod(rownum,5)
         ,'w'
     from dual
     connect by level <= 100;

100 rows created.

SQL> rollback;

Rollback complete.

SQL> @sizebySegnameMB
Enter value for segment_name: auto_list_part
Enter value for owner: c##mhouri

SEGMENT_TYPE       TABLESPACE_NAME  SEGMENT_NAME         PARTITION_NAME          MB
------------------ ---------------- -------------------- --------------- ----------
TABLE PARTITION    USERS            AUTO_LIST_PART       P1                       8
TABLE PARTITION    USERS            AUTO_LIST_PART       P2                       8
TABLE PARTITION    USERS            AUTO_LIST_PART       P3                       8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3054                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3055                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3056                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3057                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3058                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3069                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3070                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3071                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3072                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3073                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3074                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3075                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3076                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3077                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3078                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3079                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3080                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3081                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3082                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3083                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3084                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3085                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3086                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3087                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3088                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3089                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3090                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3091                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3092                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3093                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3094                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3095                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3096                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3097                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3098                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3099                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3100                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3101                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3102                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3103                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3104                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3105                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3106                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3107                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3108                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3109                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3110                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3111                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3112                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3113                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3114                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3115                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3116                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3117                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3118                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3119                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3120                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3121                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3122                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3123                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3124                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3125                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3126                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3127                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3128                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3129                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3130                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3131                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3132                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3133                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3134                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3135                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3136                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3137                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3138                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3139                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3140                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3141                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3142                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3143                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3144                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3145                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3146                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3147                8
TABLE PARTITION    USERS            AUTO_LIST_PART       SYS_P3148                8
                                                                         ----------
Total Segment Size                                                              704

88 rows selected.

As you might have already noticed, it doesn't matter whether I commit or roll back: Oracle creates, on the fly, as many partitions as there are distinct partition key values I tried to insert. The immediate consequence is that I ended up with an almost empty partitioned table occupying 704MB worth of disk space.

If you want to reclaim this "wasted" space you can move those partitions so that they use a smaller initial extent:

SQL> alter table auto_list_part move partition SYS_P3148 
          storage(initial 65536 next 1M);

Table altered.
SQL> @sizeBySegNameMB
Enter value for segment_name: auto_list_part
Enter value for owner: c##mhouri

SEGMENT_TYPE       TABLESPACE_NAME SEGMENT_NAME         PARTITION_NAME          MB
------------------ --------------- -------------------- --------------- ----------
TABLE PARTITION    USERS           AUTO_LIST_PART       P1                       8
TABLE PARTITION    USERS           AUTO_LIST_PART       P2                       8
TABLE PARTITION    USERS           AUTO_LIST_PART       P3                       8
TABLE PARTITION    USERS           AUTO_LIST_PART       SYS_P3054                8
TABLE PARTITION    USERS           AUTO_LIST_PART       SYS_P3055                8
../..
TABLE PARTITION    USERS           AUTO_LIST_PART       SYS_P3148              ,06
                                                                        ----------
Total Segment Size                                                          696,06
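Alternatively, since the partitions created by the rolled back insert contain no rows, you can simply drop them one by one (a sketch using one of the partition names from the listing above):

SQL> alter table auto_list_part drop partition SYS_P3069;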

You can also prevent such a disk space explosion for the future automatically created list partitions by un-setting the following hidden parameter (make sure you have read about its implications before changing it):

SQL> alter system set "_partition_large_extents"=false;

SQL> insert into auto_list_part values (9999, 3, 'z');

SQL>@sizebysegNameMB
Enter value for segment_name: auto_list_part
Enter value for owner: c##mhouri

SEGMENT_TYPE       TABLESPACE_NAME SEGMENT_NAME         PARTITION_NAME          MB
------------------ --------------- -------------------- --------------- ----------
TABLE PARTITION    USERS           AUTO_LIST_PART       P1                       8
TABLE PARTITION    USERS           AUTO_LIST_PART       P2                       8
TABLE PARTITION    USERS           AUTO_LIST_PART       P3                       8
TABLE PARTITION    USERS           AUTO_LIST_PART       SYS_P3054                8
../..
TABLE PARTITION    USERS           AUTO_LIST_PART       SYS_P3148              ,06
TABLE PARTITION    USERS           AUTO_LIST_PART       SYS_P3149              ,06
                                                                        ----------
Total Segment Size                                                          696,12

Bottom Line
While the new 12cR2 auto-list partitioning option frees you from having to foresee the creation of new partitions as data arrives, it puts you under two threats: (1) a rolled back insert can leave you with a bunch of empty partitions, each potentially occupying 8MB of disk space, increasing the total space occupied by your partitioned table; (2) since this automatic feature creates one partition per distinct partition key value, you should make sure to have many rows per partition key; otherwise one row per partition doesn't make sense.

May 25, 2018

Indexing Intelligibly

Filed under: Index — hourim @ 2:10 pm

Here's again a simple but very practical tuning case taken from a real-life running system. It started with a client complaint about a product insertion process that had started to take a considerable amount of time (60 seconds per product). According to the client, it was not performing so badly a couple of days earlier. And, a detail sufficiently rare to be worthy of a note, the client supplied the sql_id of the query he had identified as the root cause of the delay. In the next section I will explain, step by step, what I did to make my client very happy.

Get SQL monitoring report of the sql_id

To make my task easier, the sql_id was still in v$sql_monitor, so I immediately got its corresponding Real-Time SQL Monitoring (RTSM) report, shown below:

Global Information
------------------------------
 Status              :  DONE (ALL ROWS)                
 Instance ID         :  1                    
 Session             :  xxxx (15:32901) 
 SQL ID              :  7bm6m1r2xsj5f        
 SQL Execution ID    :  16777216             
 Execution Started   :  04/05/2018 14:08:32  
 First Refresh Time  :  04/05/2018 14:08:38
 Last Refresh Time   :  04/05/2018 14:09:13
 Duration            :  41s                  
 Module/Action       :  JDBC Thin Client/-           
 Service             :  xxxxx            
 Program             :  JDBC Thin Client/-

Global Stats
=================================================
| Elapsed |   Cpu   |  Other   | Fetch | Buffer |
| Time(s) | Time(s) | Waits(s) | Calls |  Gets  |
=================================================
|    42   |    5.72 |     36   |    1  |  1M    |
=================================================

SQL Plan Monitoring Details (Plan Hash Value=4071256796)
========================================================================
| Id |               Operation |   Name   |  Rows   | Execs |   Rows   |
|    |                         |          | (Estim) |       | (Actual) |
========================================================================
|  0 | SELECT STATEMENT        |          |         |     1 |        1 |
|  1 |   NESTED LOOPS OUTER    |          |       1 |     1 |        1 |
|  2 |    NESTED LOOPS OUTER   |          |       1 |     1 |        1 |
|  3 |     NESTED LOOPS OUTER  |          |       1 |     1 |        1 |
.../...
| 12 |      TABLE ACCESS FULL  | T1       |       1 |    42 |        1 |
.../...
| 33 |   INDEX RANGE SCAN      | IDX-XXX  |       1 |     2 |          |
========================================================================

I didn't have to search a lot. The query completes in 42 seconds, practically all of which are spent in the TABLE ACCESS FULL operation at line n°12, started 42 times.

But as interesting as this RTSM report is, it has a key limitation: it doesn't show the predicate section. And as always, the predicate information is of crucial importance to check which filters are applied during the above full table scan.

select * from table(dbms_xplan.display_cursor('7bm6m1r2xsj5f'));

Predicate Information (identified by operation id):
---------------------------------------------------
   12 - filter("XXX"."COL1" = :3 AND "XXX"."END_DATE" >= :5
               AND "XXX"."START_DATE" <= :4)
       

You don’t like speculation? Neither do I. This is why I immediately checked the following points:

  • There is no index starting with COL1 in table T1
  • I executed the following query to check the pertinence of indexing the COL1 column:
SQL> select /*+ parallel(4) */ 
         col1, count(1)
     from t1
     group by col1
     order by 2 desc;

      COL1   COUNT(1)
---------- ----------
             60594499
    WTCLZ5         49
     LCOXS         47
    WTCLK1         47
     ../..

As you can see there is manifestly a design issue here, as more than 99% of the COL1 values are null. After I got confirmation from the client that the null value of COL1 is never used, I was about to create a single-column index on COL1 when one of my DBA colleagues asked me the following question:

Why don’t you create a composite index on (COL1, END_DATE, START_DATE)?

And that is precisely the question that motivated this blog post. I provided the following answer:

  • If we exclude null values, a COL1 value matches at most 49 rows of table T1
  • While an inequality is applied on the END_DATE and START_DATE columns, an equality predicate is used against COL1
  • If we opt for the composite index (COL1, END_DATE, START_DATE) we lose the formidable opportunity of a very small and attractive index on COL1, since rows with a null COL1 would then get entries in the composite index (a single-column B-tree index, in contrast, contains no entry for null keys, as the sketch below illustrates)
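The third point relies on the fact that a single-column B-tree index contains no entry for rows where that column is null. Here is a minimal sketch, using a hypothetical table, illustrating it:

SQL> create table t_nulls as
        select case when rownum <= 49 then 'ABC'||rownum end col1
              ,rpad('x',100) padding
        from dual connect by level <= 1e5;

SQL> create index t_nulls_idx on t_nulls(col1);

SQL> exec dbms_stats.gather_table_stats(user, 't_nulls')

SQL> select i.num_rows index_entries
           ,t.num_rows table_rows
     from user_indexes i, user_tables t
     where i.index_name = 'T_NULLS_IDX'
     and   t.table_name = 'T_NULLS';

-- index_entries should show only the 49 non-null keys,
-- while table_rows shows the full 100,000 rows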

Having said that, here's what happened next:

SQL> create index idx_t1_col1 on t1(COL1) parallel 4;

SQL> alter index idx_t1_col1 noparallel ;

The new index weighs only 0.39GB against 8.7GB for the table, as shown below:

SQL> @sizeBySegName
Enter value for segment_name: T1
Enter value for owner: xxxx

SEGMENT_TYPE       TABLESPACE_NAME   SEGMENT_NAME  PARTITION_NAME   GB
------------------ ----------------- ------------- --------------- -------
TABLE              XXXX               T1                            8.7012
                                                                   -------
Total Segment Size                                                  8.7012


SQL> @sizeBySegName
Enter value for segment_name: idx_t1_col1
Enter value for owner: xxxx

SEGMENT_TYPE       TABLESPACE_NAME   SEGMENT_NAME  PARTITION_NAME   GB
------------------ ----------------- ------------- --------------- ------
INDEX              XXXX               IDX_T1_COL1                  0.387
                                                                   ------
Total Segment Size                                                 0.387

Naturally, the new execution plan uses the new index, and this resulted in drastically improved query performance, as the new execution plan and the new elapsed time show:

SQL_ID  7bm6m1r2xsj5f, child number 0
-------------------------------------
Plan hash value: 874603108

------------------------------------------------------------------------
| Id  | Operation                               | Name      | Rows     |
------------------------------------------------------------------------
|  0 | SELECT STATEMENT                         |            |         |
|  1 |   NESTED LOOPS OUTER                     |            |       1 |
|  2 |    NESTED LOOPS OUTER                    |            |       1 |
|  3 |     NESTED LOOPS OUTER                   |            |       1 |
.../...
| 12 |      TABLE ACCESS BY INDEX ROWID BATCHED | T1         |       1 |
| 13 |       INDEX RANGE SCAN                   | IDX_T1_COL1|       1 | 
|.../...                                                               |
------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   12 - filter("XXX"."END_DATE" >= :6 AND "XXX"."START_DATE" <= :5)
   13 - access("XXX"."COL1" = :4)

SQL> @sqlstats
Enter value for sql_id: 7bm6m1r2xsj5f

     CHILD  PLAN_HASH_VALUE   GETS      ELAPS   EXECUTIONS
----------  --------------- ---------- ------  -----------
         0       874603108        11      0      897

Thanks to the new index, the execution time of the client's critical query dropped from 42 seconds to practically zero, and the logical I/O consumption from 1 million to only 11 buffer gets per execution.

Bottom Line

Through this simple real-life example I wanted to show how crucial the first column of an index is when used in an equality predicate. I wanted also to emphasize that, sometimes, and all things being equal, it might be better to prefer a single-column index over a composite one covering the whole where clause, provided this single column is very selective and contains a bunch of null values, making the index very small and thereby very attractive to the CBO.

May 19, 2018

Merge join in modern Relational Database Management Systems

Filed under: Oracle — hourim @ 2:07 pm

There are many posts out there explaining what a MERGE JOIN is, how it works and why it is less popular than the NESTED LOOPS and HASH JOIN physical operations. In a nutshell, a MERGE JOIN compares two sets of sorted data on the merge column and outputs matched rows. It reads both data sets only once. This is why it is known as an unrelated combined operation, as explained by Christian Antognini in his latest book.

1. Oracle merge join

Here’s a simple Oracle example illustrating the different subtleties of the MERGE JOIN using Oracle 12cR2:

SQL> create table t1 as 
        select rownum*2 n1, rownum*5 n2, rownum n3
     from dual
     connect by level <=2e1; 

SQL> create table t2 as 
        select rownum*3 n1, rownum*5 n2, rownum n3
     from dual
     connect by level <=1e2; 

SQL> select
        /*+ use_merge(t1,t2) */
        t1.n1 t1n1
       ,t2.n1 t2n1
     from t1
        join t2
     on t1.n1 = t2.n1;

      T1N1       T2N1
---------- ----------
         6          6
        12         12
        18         18
        24         24
        30         30
        36         36

6 rows selected.

---------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |
---------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      6 |
|   1 |  MERGE JOIN         |      |      1 |     20 |      6 |
|   2 |   SORT JOIN         |      |      1 |     20 |     20 |
|   3 |    TABLE ACCESS FULL| T1   |      1 |     20 |     20 |
|*  4 |   SORT JOIN         |      |     20 |    100 |      6 | 
|   5 |    TABLE ACCESS FULL| T2   |      1 |    100 |    100 |
---------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T1"."N1"="T2"."N1")
       filter("T1"."N1"="T2"."N1")

The number of rows (A-Rows=20) generated by the first child operation (line Id n°2) of the MERGE JOIN operation at line Id n°1, and the number of times its second child operation (line Id n°4) has been executed (Starts=20), suggest that the MERGE JOIN is somehow using a NESTED LOOPS kind of algorithm. But we can see clearly that, in contrast to NESTED LOOPS, the two merge join inputs (T1 and T2) are scanned only once (Starts = 1 at lines Id n°3 and 5). So what does the Starts=20 of operation n°4 mean exactly? Clearly we haven't made 20 separate sorts, as the following proves:

set autotrace on stat

select
        /*+ use_merge(t1,t2) */
        t1.n1 t1n1
       ,t2.n1 t2n1
     from t1
        join t2
     on t1.n1 = t2.n1;

Statistics
---------------------------------------------------
          0  recursive calls
          4  db block gets
         14  consistent gets
          0  physical reads
          0  redo size
        708  bytes sent via SQL*Net to client
        608  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          2  sorts (memory)          --> only two sorts corresponding to operations n°2 and n°4 
          0  sorts (disk)                
          6  rows processed

In effect, the MERGE JOIN parent operation gets a row from each sorted input and compares them. Here it has 20 sorted rows from T1 and 100 sorted rows from T2. It takes the first row of each input and compares them using predicate n°4; if they join, the corresponding rows are returned. If not, the MERGE JOIN discards the lower of the two values, gets the next row from that input and continues the comparison until there are no more rows to process. This algorithm can be simplified as follows:

Get 20  sorted rows from T1
Get 100 sorted rows from T2
LOOP until no more rows to compare
  if join value of T1 = join value of T2
  then
      output the joined rows
      discard join value of T2
      get next join value of T1
      get next join value of T2
  elsif join value of T1 < join value of T2
  then
      discard join value of T1
      get next join value of T1
  elsif join value of T1 > join value of T2
  then
      discard join value of T2
      get next join value of T2
  end if;
END LOOP;

So we can infer that the Starts = 20 of the operation at line Id n°4 represents Oracle comparing each of the 20 join column values of T1 with its counterpart in the ordered join column of T2 (first row of T1 with first row of T2, and so on until there are no more rows in T1 to compare).

But let's now change the order of the join so that table T2 becomes the first data set input of the merge join operation:

select
  /*+ leading (t2, t1) use_merge(t2,t1) */
   t1.n1 t1n1
  ,t2.n1 t2n1
from t1
join t2
on t1.n1 = t2.n1;

      T1N1       T2N1
---------- ----------
         6          6
        12         12
        18         18
        24         24
        30         30
        36         36

6 rows selected.
---------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |
---------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      6 |
|   1 |  MERGE JOIN         |      |      1 |     20 |      6 |
|   2 |   SORT JOIN         |      |      1 |    100 |     14 | -- why 14 rows?
|   3 |    TABLE ACCESS FULL| T2   |      1 |    100 |    100 |
|*  4 |   SORT JOIN         |      |     14 |     20 |      6 |
|   5 |    TABLE ACCESS FULL| T1   |      1 |     20 |     20 |
---------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T1"."N1"="T2"."N1")
       filter("T1"."N1"="T2"."N1")

We are still generating the same number of rows, 6, but this time the operation at line Id n°4 is started 14 times. Why 14 and not 100, the total number of T2 rows?

The answer to this question is that the merge join doesn't necessarily need to scan every row from both inputs. It stops scanning as soon as:

  • it reaches the end of either input
  • or it reaches a join value from the first input that is greater than the highest join value from the second input.

When I made T2 the driving table of the merge join, Oracle declared the end of the join process as soon as it reached the 14th row of T2 (A-Rows=14). This is simply because the 14th ordered row of T2 is greater than any join value from T1, as the following proves:

-- get the minimum value of t2.n1 that is greater than the max value of t1.n1
SQL> select min(n1) from t2 where n1 > (select max(n1) from t1);

   MIN(N1)
----------
        42

This means that starting from t2.n1 = 42 there is no join possibility, since 42 is greater than all join column values of the lesser input (T1). As such, the remaining join values from T1 are discarded according to the join algorithm (T2 being the first input of the merge join):

elsif join value of T2 > join value of T1
then
      discard join value of T1
      get next join value of T1
end if;

And how many rows should the merge join have already processed when it reaches this stop point? 14, naturally:

SQL> select 
         count(1) 
     from t2 
     where 
         n1 <= (select min(n1) from t2 where n1 > (select max(n1) from t1) 
                );
 COUNT(1)
---------
       14

If we redo the same demonstration for the case where T1 is the driving table, we will obviously find that the merge join has to go through the entire set of T1 rows (A-Rows = 20), because there is no join value in T1 that is greater than the largest join value in T2.

SQL> select n1 from t1 where n1 > (select max(n1) from t2);
no rows selected

2. One-to-many and many-to-many merge join

In the above setup we unofficially examined the one-to-many version of the merge join. This type of join occurs when the optimizer knows that there are no duplicate join column values in the first input data set. Although I know t1.n1 is unique, I didn't supply any extra information, such as a unique index, for the optimizer to opt for the one-to-many option. This is why, officially, we have been using a many-to-many merge join in the preceding examples.

       2.1. MS-SQL Server
In contrast to Oracle, the MS SQL Server execution plan makes a clear distinction between these two types of merge join, as the following shows:

create table t1(n1 int, n2 int);

insert into t1(n1,n2)
       select top 20
	     2* row_number() over (order by a.name) 
		,abs(row_number() over (order by a.name) -1) 
from sys.all_objects a cross join sys.all_objects;

create table t2(n1 int, n2 int) ;

insert into t2(n1,n2)
       select top 100
	     3* row_number() over (order by a.name) 
		,abs(row_number() over (order by a.name) -1) 
from sys.all_objects a cross join sys.all_objects;

-- many to many
select
   t1.n1 t1n1
  ,t2.n1 t2n1
from
   t1
join t2
on t1.n1 = t2.n1
option (merge join);

But if I create a unique index on T1, indicating to the optimizer the absence of duplicate values in the join column, I will obviously obtain a one-to-many merge join, as the following execution plan shows:

create unique index t1_uk on t1(n1);

-- one-to-many join
select
   t1.n1 t1n1
  ,t2.n1 t2n1
from
   t1
join t2
on t1.n1 = t2.n1
option (merge join);


In a one-to-many join, when two rows join, the optimizer outputs them, discards the join value from the second input (T2), gets the next join value from the first input (T1) and continues the merge process. The optimizer can safely discard the joined value from T2 because it knows that no duplicate row from T1 will ever join with the discarded T2 row.

In a many-to-many join, the merge join algorithm very probably keeps track of the discarded T2 row somewhere in a memory structure. If the next iteration finds that the current T1 row is a duplicate, it compares it with the saved in-memory row. If, instead, the next row from T1 turns out to be a new one, the optimizer can safely delete the saved in-memory T2 row. This approach is backed up by the merge join algorithm displayed above, which shows that the merge process always goes forward; it never needs to step backward in the data set. In the complex many-to-many case this “always walk down” can be ensured by looking up a previously compared join row stored in memory, and probably not by stepping backward. If an extra filter is present in the query, it will be replayed to check whether the saved joined row satisfies the filter predicate or not.

     2.2. PostgreSQL
Using PostgreSQL we can have both textual and graphical execution plans. But instead of many-to-many or one-to-many merge joins, PostgreSQL uses a different terminology, namely Inner Unique (true or false) respectively, as illustrated below.
First, the data model:

create table t1 (n1 int, n2 int, n3 int);
create table t2 (n1 int, n2 int, n3 int);	

with got_my_data (j)
as
 (select generate_series(1, 20)
 )
 insert into t1(n1, n2, n3)
 select 
    j*2
   ,j*5
   ,j
  from 
   got_my_data;
   
with got_my_data (j)
as
 (select generate_series(1, 100)
 )
 insert into t2(n1, n2, n3)
 select 
    j*3
   ,j*5
   ,j
  from 
   got_my_data;

Since there is no hint in PostgreSQL with which I can force a merge join operation, I will disable the hash join possibility, run the query and get the graphical execution plan using pgAdmin4:

postgres=# set enable_hashjoin=false;
SET

 explain analyze
 select
        t1.n1 t1n1
       ,t2.n1 t2n1
     from t1
        join t2
     on t1.n1 = t2.n1;


As you can see this is a many-to-many join, as indicated by Inner Unique set to false. If I create a unique index on T2, re-run the query and get the corresponding execution plan, this is what I observe:

postgres=# create unique index t2_uk on t2(n1);
CREATE INDEX

explain analyze verbose select
       t1.n1 t1n1
      ,t2.n1 t2n1
    from t1
       join t2
    on t1.n1 = t2.n1;

I don't have enough experience in reading PostgreSQL execution plans, but according to the actual rows generated by the second sort in the plan (rows = 14) it seems, very probably, that the query planner has used table T2 as the first input of the merge join. This is why the unique index on this table has triggered a one-to-many join while a unique index on T1 hasn't (not shown here, but tested). For a one-to-many merge join to occur, uniqueness is required on the join column of the first input.
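The counter-test mentioned above (not shown in the original run) would look like the following sketch:

postgres=# drop index t2_uk;
postgres=# create unique index t1_uk on t1(n1);
postgres=# explain analyze
           select t1.n1 t1n1, t2.n1 t2n1
           from t1 join t2 on t1.n1 = t2.n1;
-- per the test described above, the plan keeps showing Inner Unique: false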

2.3. Oracle
Let's add a duplicate row into T1 and re-execute the same merge join query using Oracle Database:

SQL> insert into t1 values (6, -1, -1);
1 row created.

SQL> commit;

select
    /*+ use_merge(t1,t2) */
  t1.n1 t1n1
 ,t2.n1 t2n1
from t1
join t2
on t1.n1 = t2.n1;

      T1N1       T2N1
---------- ----------
         6          6
         6          6
        12         12
        18         18
        24         24
        30         30
        36         36

7 rows selected.
---------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |
---------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      7 |
|   1 |  MERGE JOIN         |      |      1 |     20 |      7 |
|   2 |   SORT JOIN         |      |      1 |     20 |     21 |
|   3 |    TABLE ACCESS FULL| T1   |      1 |     20 |     21 |
|*  4 |   SORT JOIN         |      |     21 |    100 |      7 |
|   5 |    TABLE ACCESS FULL| T2   |      1 |    100 |    100 |
---------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T1"."N1"="T2"."N1")
       filter("T1"."N1"="T2"."N1")

As expected, the duplicated join value from T1 has been returned by the merge join. But there is no clue in the execution plan about whether this is a many-to-many or a one-to-many join. Even if I delete the inserted duplicate row and create a unique index on t1.n1, I will still find nothing related to the type of merge join in the corresponding execution plan, as shown below:

SQL> delete from t1 where n1=6 and n2 =-1;

SQL> create unique index t1_uk on t1(n1);

select
    /*+ use_merge(t1,t2) */
  t1.n1 t1n1
 ,t2.n1 t2n1
from t1
join t2
on t1.n1  = t2.n1;

      T1N1       T2N1
---------- ----------
         6          6
        12         12
        18         18
        24         24
        30         30
        36         36

6 rows selected.

----------------------------------------------------------------
| Id  | Operation           | Name  | Starts | E-Rows | A-Rows |
----------------------------------------------------------------
|   0 | SELECT STATEMENT    |       |      1 |        |      6 |
|   1 |  MERGE JOIN         |       |      1 |     20 |      6 |
|   2 |   INDEX FULL SCAN   | T1_UK |      1 |     20 |     20 |
|*  3 |   SORT JOIN         |       |     20 |    100 |      6 |
|   4 |    TABLE ACCESS FULL| T2    |      1 |    100 |    100 |
----------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"="T2"."N1")
       filter("T1"."N1"="T2"."N1")

There is no clue about the type of merge join even after making the join column of the first input unique. However, we can observe that, thanks to the INDEX FULL SCAN operation, rows from the first input are acquired pre-sorted and don’t need the usual extra SORT JOIN operation.

Finally, we can’t finish this merge join investigation without creating a unique index on the second data set input and seeing what this changes in the execution plan:

SQL> create unique index t2_uk on t2(n1);

SQL> select
        /*+ use_merge(t1,t2) */
        t1.n1 t1n1
       ,t2.n1 t2n1
    from t1
    join t2
    on t1.n1 = t2.n1;

      T1N1       T2N1
---------- ----------
         6          6
        12         12
        18         18
        24         24
        30         30
        36         36

6 rows selected.
--------------------------------------------------------------
| Id  | Operation         | Name  | Starts | E-Rows | A-Rows |
--------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |      1 |        |      6 |
|   1 |  MERGE JOIN       |       |      1 |     20 |      6 |
|   2 |   INDEX FULL SCAN | T2_UK |      1 |    100 |     14 |
|*  3 |   SORT JOIN       |       |     14 |     20 |      6 |
|   4 |    INDEX FULL SCAN| T1_UK |      1 |     20 |     20 |
--------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"="T2"."N1")
       filter("T1"."N1"="T2"."N1")

If you look closely at the above execution plan you will immediately spot two important points:

  • Oracle has inverted the join order as T2 becomes the first input and T1 the second one

The Cost Based Optimizer is very clever here: by switching the join order, it can declare the end of the query much earlier by stopping the scan as soon as it reaches the 14th row of the first input (T2), as explained above.

  • The second point is related to the extra SORT JOIN operation at line n°3.

This operation receives an already pre-sorted data set via the INDEX FULL SCAN operation at line n°4. So why does Oracle need to apply an extra SORT to already ordered data?

In fact, as already explained in section 1 above, the SORT JOIN operation at line n°3, as its name doesn’t suggest, is responsible for applying the join condition (access and filter predicate n°3) to the right input for each row produced by the left input (14 rows in the current case). This is why, regardless of how its rows are acquired, the SORT JOIN operation always needs to be applied to the right data set input. The same doesn’t apply to the first input data set, where the SORT JOIN operation can be skipped whenever this data is retrieved already sorted.

3. Summary

In this article I tried to explain how the merge join algorithm is implemented in modern relational database systems. I demonstrated that the merge join doesn’t necessarily need to scan every row from both inputs: it stops scanning as soon as it reaches the end of either input, or when it reaches a join value from the first input that is greater than the highest join value of the second input. I have outlined, using MS-SQL Server and PostgreSQL, the concepts of one-to-many and many-to-many joins and how a unique index on the first input data set of the join can switch from a costly many-to-many to a less aggressive one-to-many form of the join. Although I have not shown it here, a merge join can work with an inequality join predicate, and it supports outer (MERGE JOIN OUTER), semi (MERGE JOIN SEMI) and anti (MERGE JOIN ANTI) logical join operations.

March 31, 2018

From bind sensitive to bind aware

Filed under: adaptive cursor sharing,cursor sharing — hourim @ 1:20 pm

In the book I have co-authored I wrote chapter 4 on Adaptive Cursor Sharing, which you can read here free of charge. In this chapter I said that Oracle uses an algorithm to transition a cursor from bind sensitive to bind aware. I have demonstrated that this algorithm handles three different cases:

  • when only two adjacent buckets are involved
  • when only two distinct buckets are involved
  • when all three buckets are involved

In this article and in this one I have reliably explained how the first two cases are handled. But I had always failed to decipher how Oracle manages the third case. I think that this article will finally explain the nature of the algorithm used by Oracle to mark bind aware a bind sensitive cursor when all buckets are involved. Here’s a function I have implemented for that purpose:

create or replace function fv_is_bind_aware
          (pin_first_bucket in number
          ,piv_sql_id       in varchar2
          ) 
return varchar2
-- ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
-- Date      : January 2018
-- Scope     : say whether the next execution at any bucket will
--             mark the underlying cursor bind aware or not.
--             This function is valid only when the bind sensitive
--             cursor has undergone executions at three buckets 0,1 and 2
--             without  being already bind aware 
-- Parameters: 
--      1) pin_first_bucket is the bucket at which the very first execution
--             has been done
--      2) piv_sql_id is the sql_id of the underlying cursor
--
--Usage      : select fv_is_bind_aware(0, '7ck8k47bnqpnv') from dual;
--             if 'Y' then the next execution will mark cursor bind aware
--             if 'N' then the cursor will not be hard parsed during the next
--             execution
-- ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
is
  ln_nbr_of_distantexecs number  := 0;
  ln_nbr_of_totalexecs   number := 0;
begin
  for x in (select 
               bucket_id
              ,count
            from 
               sys.v_$sql_cs_histogram
            where 
               sql_id = piv_sql_id 
            order by bucket_id
            )
  loop
    ln_nbr_of_totalexecs   := ln_nbr_of_totalexecs   +  x.count;
    ln_nbr_of_distantexecs := ln_nbr_of_distantexecs 
                            + (x.count *abs(x.bucket_id - pin_first_bucket));
  end loop;
  --                      
  if ln_nbr_of_distantexecs/ln_nbr_of_totalexecs >= 0.5 
  then 
     return 'Y';
  else 
     return 'N';
  end if;
end fv_is_bind_aware;
/

In plain English, the above function can be summarized as:

  • When the number of executions done at distant buckets reaches half that of total executions at all buckets then mark the cursor bind aware during the next execution at any bucket.

The only subtlety here is to figure out what a distant bucket represents in this context. When you know how ACS works internally, you know the importance Oracle gives to the very first execution of the bind sensitive cursor. Since there are three buckets 0, 1 and 2, the distant buckets are the two buckets at which the very first execution didn’t occur. So Oracle will always keep track of the first bucket at which the cursor has been executed and compute the total number of executions done at the other two buckets (the adjacent one and the distant one). And for the sake of completeness Oracle considers:

  • total number of executions = sum(v_$sql_cs_histogram.count)
  • the number of executions done at the distant (non-adjacent) bucket is doubled:

ln_nbr_of_distantexecs := ln_nbr_of_distantexecs 
                            + (x.count * abs(x.bucket_id - pin_first_bucket));
-- Since bucket ids 0 and 2 are the two distant buckets we have
abs(0-2) = abs(2-0) = 2

All cases I have tested so far have confirmed the reliability of this function. Here are a few of those test cases below (the model, as always, can be found here):

alter system flush shared_pool;

select count(1) from t_acs where n2 = 1; -- very first execution at bucket 0
select count(1) from t_acs where n2 = 1;
select count(1) from t_acs where n2 = 1;
--
select count(1) from t_acs where n2 = 1000;
select count(1) from t_acs where n2 = 1000;
--
select count(1) from t_acs where n2 = 1e6;

SQL_ID  7ck8k47bnqpnv, child number 0
-------------------------------------
------------------------------------------------
| Id  | Operation         | Name       | Rows  |
------------------------------------------------
|   0 | SELECT STATEMENT  |            |       |
|   1 |  SORT AGGREGATE   |            |     1 |
|*  2 |   INDEX RANGE SCAN| T_ACS_IDX1 |     1 |
------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("N2"=:SYS_B_1)

select
    child_number
   ,bucket_id
   ,count
 from
   gv$sql_cs_histogram
 where
    sql_id ='7ck8k47bnqpnv';

CHILD_NUMBER  BUCKET_ID      COUNT
------------ ---------- ----------
           0          0          3 → 3 executions at bucket n°0
           0          1          2 → 2 executions at bucket n°1
           0          2          1 → 1 execution at bucket n°2

Will the next execution at any bucket mark the cursor bind aware and hard parse a new execution plan? Let’s ask the function:

SQL> select fv_is_bind_aware(0, '7ck8k47bnqpnv') iba from dual;

IBA
---
Y
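The arithmetic behind this answer, applied to the histogram above (very first execution at bucket n°0), goes as follows:

ln_nbr_of_distantexecs = 3*abs(0-0) + 2*abs(1-0) + 1*abs(2-0) = 0 + 2 + 2 = 4
ln_nbr_of_totalexecs   = 3 + 2 + 1 = 6
-- 4/6 = 0.67 >= 0.5 => 'Y'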

The function says that the next execution will mark the cursor bind aware. Let’s check:

select count(1) from t_acs where n2 = 100;

select
    child_number
   ,bucket_id
   ,count
 from
   gv$sql_cs_histogram
 where
    sql_id ='7ck8k47bnqpnv';
CHILD_NUMBER  BUCKET_ID      COUNT
------------ ---------- ----------
           1          0          1  -- a new bind aware child cursor 
           1          1          0
           1          2          0
           0          0          3
           0          1          2
           0          2          1

Indeed, the function has correctly predicted the transition from bind sensitive to bind aware.

Do you want another example? Here you are:

alter system flush shared_pool;
-- execute this 7 times
-- first execution occurred at bucket n°2
select count(1) from t_acs where n2 = 1e6;

-- execute this 2 times
select count(1) from t_acs where n2 = 1000;

-- execute this 2 times 
select count(1) from t_acs where n2 = 1;

select
    child_number
   ,bucket_id
   ,count
 from
   gv$sql_cs_histogram
 where
    sql_id ='7ck8k47bnqpnv';

CHILD_NUMBER  BUCKET_ID      COUNT
------------ ---------- ----------
           0          0          2
           0          1          2
           0          2          7

Since the very first execution occurred at bucket n°2 we are going to call the function as follows:


select fv_is_bind_aware(2, '7ck8k47bnqpnv') iba from dual;

IBA
---
Y

select count(1) from t_acs where n2 = 1e4;

select
    child_number
   ,bucket_id
   ,count
 from
   gv$sql_cs_histogram
 where
    sql_id ='7ck8k47bnqpnv';

CHILD_NUMBER  BUCKET_ID      COUNT
------------ ---------- ----------
           1          0          0
           1          1          1
           1          2          0
           0          0          2
           0          1          2
           0          2          7

Notice how the function has again correctly predicted the transition.

Here’s another example where the very first execution has been done at bucket n°1:


alter system flush shared_pool;
-- run this 7 times
select count(1) from t_acs where n2 = 1000;

-- run this 2 times
select count(1) from t_acs where n2 = 1e6;

-- run this 2 times
select count(1) from t_acs where n2 = 100;

select
    child_number
   ,bucket_id
   ,count
 from
   gv$sql_cs_histogram
 where
    sql_id ='7ck8k47bnqpnv';

CHILD_NUMBER  BUCKET_ID      COUNT
------------ ---------- ----------
           0          0          2
           0          1          7
           0          2          2
-- check the transition using the function
select fv_is_bind_aware(1, '7ck8k47bnqpnv') iba from dual;

IBA
---
N
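Here the arithmetic, with the very first execution at bucket n°1, stays below the threshold:

ln_nbr_of_distantexecs = 2*abs(0-1) + 7*abs(1-1) + 2*abs(2-1) = 2 + 0 + 2 = 4
ln_nbr_of_totalexecs   = 2 + 7 + 2 = 11
-- 4/11 = 0.36 < 0.5 => 'N'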

According to the function the next execution will not mark the cursor bind aware:

select count(1) from t_acs where n2 = 1e4;

select
    child_number
   ,bucket_id
   ,count
 from
   gv$sql_cs_histogram
 where
    sql_id ='7ck8k47bnqpnv';
CHILD_NUMBER  BUCKET_ID      COUNT
------------ ---------- ----------
           0          0          2
           0          1          8
           0          2          2

That’s again a correct expectation.

Let me emphasize here the importance of the first parameter of the function fv_is_bind_aware, which represents the bucket id at which the very first execution of the cursor was done. Had I used the wrong bucket id (0 or 2), the function would have made a wrong prediction, as shown below (these tests were done before the 8th execution above):

select fv_is_bind_aware(0, '7ck8k47bnqpnv') iba from dual;

IBA
---
Y

select fv_is_bind_aware(2, '7ck8k47bnqpnv') iba from dual;

IBA
---
Y
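These wrong answers are easy to understand arithmetically: with a wrong first bucket, the 7 executions actually done at the real first bucket n°1 get counted as adjacent or distant ones (figures taken on the 2, 7, 2 histogram):

-- pin_first_bucket = 0 : 2*abs(0-0) + 7*abs(1-0) + 2*abs(2-0) = 11 ; 11/11 = 1 >= 0.5 => 'Y'
-- pin_first_bucket = 2 : 2*abs(0-2) + 7*abs(1-2) + 2*abs(2-2) = 11 ; 11/11 = 1 >= 0.5 => 'Y'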

Now will the next execution mark the cursor bind aware?

select fv_is_bind_aware(1, '7ck8k47bnqpnv') iba from dual;

IBA
---
N

select count(1) from t_acs where n2 = 100;

select
    child_number
   ,bucket_id
   ,count
 from
   gv$sql_cs_histogram
 where
    sql_id ='7ck8k47bnqpnv';

CHILD_NUMBER  BUCKET_ID      COUNT
------------ ---------- ----------
           0          0          3
           0          1          8
           0          2          2

And the next execution?

select fv_is_bind_aware(1, '7ck8k47bnqpnv') iba from dual;

IBA
---
N

select count(1) from t_acs where n2 = 1e6;

select
    child_number
   ,bucket_id
   ,count
 from
   gv$sql_cs_histogram
 where
    sql_id ='7ck8k47bnqpnv';

CHILD_NUMBER  BUCKET_ID      COUNT
------------ ---------- ----------
           0          0          3
           0          1          8
           0          2          3

And so on, until the function says it is time for the cursor to switch:

select fv_is_bind_aware(1, '7ck8k47bnqpnv') iba from dual;

IBA
---
N

select count(1) from t_acs where n2 = 100;
select count(1) from t_acs where n2 = 1e6;

select fv_is_bind_aware(1, '7ck8k47bnqpnv') iba from dual;

IBA
---
Y
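The arithmetic finally hits the threshold exactly, the counts having reached 4, 8 and 4 (as the histogram below confirms):

ln_nbr_of_distantexecs = 4*abs(0-1) + 8*abs(1-1) + 4*abs(2-1) = 4 + 0 + 4 = 8
ln_nbr_of_totalexecs   = 4 + 8 + 4 = 16
-- 8/16 = 0.5 >= 0.5 => 'Y'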

select count(1) from t_acs where n2 = 1;

select
    child_number
   ,bucket_id
   ,count
 from
   gv$sql_cs_histogram
 where
    sql_id ='7ck8k47bnqpnv';

CHILD_NUMBER  BUCKET_ID      COUNT
------------ ---------- ----------
           1          0          1
           1          1          0
           1          2          0
           0          0          4
           0          1          8
           0          2          4

Summary

Adaptive Cursor Sharing in real life systems is known to cause performance pain rather than to solve the conflicting goals of sharing and optimizing it was created to reconcile. But it is always nice to know how it works, particularly to understand the transition of a cursor from bind sensitive to bind aware.

March 20, 2018

DDL optimisation is not working in 12cR2: really?

Filed under: Oracle — hourim @ 7:32 pm

If you want to know what DDL optimisation is then you can read this article I wrote a couple of years ago.

As the title already suggests, I was surprised, during the preparation of a database upgrade script from 11gR2 to 12cR2, to see the following alter table taking 5 hours instead of what should have been an instantaneous operation:

SQL> select banner from v$version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production


SQL> alter table t1 add
  	(col1 number  default 0 null
       ,col2 number default 0 null
       ,col3 number default 0 null
       ,col4 number default 0 null
       ,col5 number default 0 null
       ,col6 number default 0 null
       ,col7 number default 0 null
      )
      /
Table altered.

Elapsed: 04:51:24.48

Oracle should have used the DDL optimisation technique, which was extended to nullable columns in 12cR1.

So what the heck is this? Where am I going to start troubleshooting this issue?

The first thing that came to my mind was to check the value of the hidden parameter, _add_col_optim_enabled, driving this feature. It might be possible that the DBA had changed its default value during the database upgrade:

SQL> select
         n.ksppinm
        ,c.ksppstvl
        ,n.ksppdesc
       from
        sys.x$ksppi n
        ,sys.x$ksppcv c
       where n.indx=c.indx
       and n.ksppinm = '_add_col_optim_enabled';

KSPPINM                KSPPSTVL   KSPPDESC
---------------------- ---------- -----------------------------------
_add_col_optim_enabled TRUE       Allows new add column optimization

The DDL optimisation parameter is correctly set. Had this parameter been changed to FALSE, the DDL optimisation feature would naturally have been disabled, as the following demonstrates:

create table t as select rownum n1 from dual connect by level <=10;

-- the following alter table will use the DDL optimisation
-- as the predicate of the execution plan confirms

alter table t add col1 number default 0 null;

select count(1) from t where col1=42;

select * from table(dbms_xplan.display_cursor);					   

---------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |
---------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |
|   1 |  SORT AGGREGATE    |      |     1 |     3 |
|*  2 |   TABLE ACCESS FULL| T    |     1 |     3 |
---------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(DECODE(TO_CHAR(SYS_OP_VECBIT("SYS_NC00002$",0)),NULL
             ,NVL("COL1",0),'0',NVL("COL1",0),'1',"COL1")=42)

-- change the hidden parameter driving the DDL optimisation
alter session set "_add_col_optim_enabled"=false;

-- and add a new equivalent column
alter table t add col2 number default 0 null;

select count(1) from t where col2=42;

select * from table(dbms_xplan.display_cursor);	

---------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |
---------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |
|   1 |  SORT AGGREGATE    |      |     1 |     3 |
|*  2 |   TABLE ACCESS FULL| T    |     1 |     3 |
---------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("COL2"=42) -- DDL optimisation didn't occur.

Having ruled out the hidden parameter hypothesis, I embarked on a search of the official Oracle documentation to see whether the DDL optimisation had actually been withdrawn in 12cR2 for nullable added columns having a default value. Needless to say, this effort was pointless.

Nowadays, when all else fails, Twitter might be there for quick help. Unfortunately not everyone is lucky: no answer 🙂

There was, nevertheless, something in the overnight SQL batch log that should have given me a hint about what was happening under the hood:

SQL> alter table t1 add (colX numeric default 0 not null);

Table altered.

Elapsed: 00:00:00.06

While adding a nullable column with a default value was taking an eternity, adding a not null column with a default value to the same table was instantaneous. This database was reacting exactly as if it were running an 11gR2 version. But I completely neglected this option and continued exploring other possibilities until I decided to close my laptop and leave for the weekend.

On the Monday of the following week I shared this issue with one of my friends, who suggested that I check the compatible parameter. No sooner said than done:

SQL> show parameter compatible

NAME                                 TYPE        VALUE
------------------------------------ ----------- ----------
compatible                           string      11.2.0.4.0

Now that I had this value in front of my eyes, bringing together the pieces of the puzzle became easy. I provisioned a fresh copy of the production database, upgraded it to 12.2.0.1.0, checked the compatible parameter:

SQL> show parameter compatible

NAME                                 TYPE        VALUE
------------------------------------ ----------- --------
compatible                           string      12.2.0.1

and launched again the same alter table, which obviously completed in less than a second:

SQL> alter table t1 add
  	(col1 number  default 0 null
       ,col2 number default 0 null
       ,col3 number default 0 null
       ,col4 number default 0 null
       ,col5 number default 0 null
       ,col6 number default 0 null
       ,col7 number default 0 null
      )
      /
Table altered.

Elapsed: 00:00:00.10

Bottom Line

It doesn’t matter what Oracle version you are running: the only features you will be allowed to use are those enabled by the version you set in the compatible parameter. So don’t be surprised to see new features not kicking in if you set the compatible parameter to a prior release value in which these features were not yet implemented.
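A quick sanity check worth scripting before relying on any version-dependent feature is to compare the binary release with the compatible setting; here is a minimal sketch (and remember that raising compatible is a one-way operation, so do it only after due diligence):

SQL> select banner from v$version;
SQL> show parameter compatible

-- once you are sure you will never need to downgrade:
-- alter system set compatible = '12.2.0.1' scope=spfile;
-- (followed by a database restart)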

March 17, 2018

Library cache lock

Filed under: Oracle — hourim @ 7:55 am

This is a very simple note, as much for my own documentation as anything else, showing how I very rapidly identified what was causing a database application to wait on the library cache lock wait event.

The ASH data of my customer’s case was showing the following list of predominant wait events:

select event, count(1)
from gv$active_session_history
where
    sample_time between to_date('09032018 00:46:00', 'ddmmyyyy hh24:mi:ss')
                and     to_date('09032018 10:44:00', 'ddmmyyyy hh24:mi:ss')
group by event
order by 2 desc;

EVENT                                  COUNT(1)
------------------------------------ ----------
library cache lock                       350174
library cache: mutex X                    54474
cursor: pin S wait on X                   10896
                                           2964

Naturally I wanted to know which sql_id was responsible for these library cache wait events, first via ASH:

select sql_id, count(1)
from gv$active_session_history
where
    sample_time between to_date('09032018 00:46:00', 'ddmmyyyy hh24:mi:ss')
                and     to_date('09032018 10:44:00', 'ddmmyyyy hh24:mi:ss')
and event in 
 ('library cache lock','library cache: mutex X','cursor: pin S wait on X')
group by sql_id
order by 2 desc;

SQL_ID            COUNT(1)
----------------- --------
6tcs65pchhp71       147902
9mqzzppbpa64j         7373
                         3	

And then via the classical dbms_xplan to get the corresponding execution plan, and v$sql to get the SQL text, respectively:

SQL> select * from table(dbms_xplan.display_cursor('6tcs65pchhp71',null));

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------
SQL_ID: 6tcs65pchhp71 cannot be found
SQL> select sql_fulltext, executions, end_of_fetch_count
    from gv$sql
    where sql_id = '6tcs65pchhp71';

no rows selected

Let’s summarize: there is a sql_id which is responsible for a dramatic library cache wait event, which I am monitoring in real time, and which:

  • has no execution plan in memory
  • and is not present in gv$sql

The above two points are manifestly symptoms of a SQL query which hasn’t gone beyond the parse phase. In other words, it might be a query which Oracle is not able to soft parse and which has thereby never reached the hard parse phase nor the execution step. Fortunately, ASH can clearly show this:

select 
     in_parse
    ,in_hard_parse
    ,in_sql_execution
    ,count(1)
from gv$active_session_history
where
    sample_time between to_date('09032018 00:46:00', 'ddmmyyyy hh24:mi:ss')
                and     to_date('09032018 10:44:00', 'ddmmyyyy hh24:mi:ss')
and
   sql_id = '6tcs65pchhp71'
group by 
     in_parse
    ,in_hard_parse
    ,in_sql_execution
order by 4 desc;

I I I   COUNT(1)
- - - ----------
Y N N     162758
Y Y N        385

Indeed, this query is almost always in the “parse” phase. So the initial question of what was causing this dramatic library cache lock turned into: why is this query stuck at the parse phase?

I don’t know why I decided to look at dba_hist_sqltext, but it made my day:

SQL> select sql_text from dba_hist_sqltext where sql_id = '6tcs65pchhp71';

SQL_TEXT
--------------------------------------------------------------------------------
select col1, col2,...,coln from t1;

I was again lucky that this query has no where clause and no bind variable values to supply, so I tried to run it as-is, and here is what I found:

select col1, col2,...,coln 
from t1;
     *
ERROR at line 15:
ORA-00600 : internal error code, arguments: [qksvcGetGuardCol:2], [50574], [0], [],
[],[],[],[],[],[],[],[],[],

The multi-user concurrent system was repeating the same query several times, and since this query was always failing during the parse phase, we observed this library cache lock at the top of the wait events.

March 11, 2018

Which execution plan will be used?

Filed under: Sql Plan Managment — hourim @ 4:08 pm

Following a question raised by Franck Pachot on Twitter, I was surprised that 43% of the answers were wrong. The question was terribly simple: an SQL Plan Baseline contains two execution plans:

  • Plan 1: is enabled but not accepted
  • Plan 2: is not enabled but accepted

If the Cost Based Optimiser comes up with Plan 1, which execution plan will the query finally be allowed to use?

There are very simple rules to know when it comes to the interaction between the CBO and the presence of SPM baselined plans:

  • Not enabled SPM plans will not be used regardless of their acceptance status
  • Enabled SPM plans will not be used if they are not accepted

From Franck’s question we understand that SPM Plan 2 is not enabled, so we can rule it out from the correct answer possibilities. We can see as well that SPM Plan 1 is not accepted, which makes it unusable by the CBO. Since both SPM execution plans are unusable because of their enabled or accepted properties, the CBO will use the plan it comes up with: Plan 1 in this case.

Here is the demonstration below (the model can be found here):

alter session set cursor_sharing= force;
alter session set optimizer_capture_sql_plan_baselines=true;

select count(1) from t_acs where n2 = 1e6;
select count(1) from t_acs where n2 = 1e6;

alter session set optimizer_capture_sql_plan_baselines=false;

select  sql_handle, plan_name, enabled, accepted
from dba_sql_plan_baselines
where signature = '7542536418437113777';

SQL_HANDLE             PLAN_NAME                      ENA ACC
---------------------- ------------------------------ --- ---
SQL_68ac78e9166427b1   SQL_PLAN_6jb3sx4b689xj3069e6f9 YES YES

DECLARE
   l_p  PLS_INTEGER;
BEGIN
  l_p := DBMS_SPM.alter_sql_plan_baseline(
            sql_handle      => 'SQL_68ac78e9166427b1',
            plan_name       => 'SQL_PLAN_6jb3sx4b689xj3069e6f9',
            attribute_name  => 'enabled',
            attribute_value => 'NO');
END;
/

select  sql_handle, plan_name, enabled, accepted
from dba_sql_plan_baselines
where signature = '7542536418437113777';

SQL_HANDLE                     PLAN_NAME                      ENA ACC
------------------------------ ------------------------------ --- ---
SQL_68ac78e9166427b1           SQL_PLAN_6jb3sx4b689xj3069e6f9 NO  YES

We have our Plan 2: Accepted but not Enabled.

Next I will generate Plan 1: Enabled but not Accepted

select count(1) from t_acs where n2 = 1;

SQL> select count(1) from t_acs where n2 = 1;

  COUNT(1)
----------
         1

SQL> select * from table(dbms_xplan.display_cursor)


SQL_ID  7ck8k47bnqpnv, child number 1

An uncaught error happened in prepare_sql_statement : ORA-01403: no data found

NOTE: cannot fetch plan for SQL_ID: 7ck8k47bnqpnv, CHILD_NUMBER: 1
      Please verify value of SQL_ID and CHILD_NUMBER;
      It could also be that the plan is no longer in cursor cache (check v$sql_plan)

Whenever I see this error I know that it has been raised because the CBO came up with an execution plan that doesn’t match any of the SPM baselined plans:

select  sql_handle, plan_name, enabled, accepted
from dba_sql_plan_baselines
where signature = '7542536418437113777';

SQL_HANDLE                     PLAN_NAME                      ENA ACC
------------------------------ ------------------------------ --- ---
SQL_68ac78e9166427b1           SQL_PLAN_6jb3sx4b689xj3069e6f9 NO  YES
SQL_68ac78e9166427b1           SQL_PLAN_6jb3sx4b689xjf5f8c88e YES NO

Here we go: we have an SPM baseline with two execution plans, exactly as asked by Franck:

select * from table(dbms_xplan.display_sql_plan_baseline(sql_handle => 'SQL_68ac78e9166427b1'));
 
--------------------------------------------------------------------------------
SQL handle: SQL_68ac78e9166427b1
SQL text: select count(:"SYS_B_0") from t_acs where n2 = :"SYS_B_1"
--------------------------------------------------------------------------------
 
--------------------------------------------------------------------------------
Plan name: SQL_PLAN_6jb3sx4b689xjf5f8c88e         Plan id: 4126722190
Enabled: YES     Fixed: NO      Accepted: NO      Origin: AUTO-CAPTURE
Plan rows: From dictionary
--------------------------------------------------------------------------------
 
Plan hash value: 1687207741
--------------------------------------------------------------------------------
| Id  | Operation         | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |            |     1 |     3 |     3   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE   |            |     1 |     3 |            |          |
|*  2 |   INDEX RANGE SCAN| T_ACS_IDX1 |     1 |     3 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
 
   2 - access("N2"=:SYS_B_1)
 
--------------------------------------------------------------------------------
Plan name: SQL_PLAN_6jb3sx4b689xj3069e6f9         Plan id: 812246777
Enabled: NO      Fixed: NO      Accepted: YES     Origin: AUTO-CAPTURE
Plan rows: From dictionary
--------------------------------------------------------------------------------
 
Plan hash value: 535703726
----------------------------------------------------------------------------
| Id  | Operation          | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |       |     1 |     3 |   628   (3)| 00:00:01 |
|   1 |  SORT AGGREGATE    |       |     1 |     3 |            |          |
|*  2 |   TABLE ACCESS FULL| T_ACS |  1099K|  3219K|   628   (3)| 00:00:01 |
----------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("N2"=:SYS_B_1)

Next I will execute the above query with a bind variable value that favors the index range scan plan, which is present in the SPM baseline and is enabled but not accepted:

SQL> @53on
alter session set events '10053 trace name context forever, level 1';
alter session set "_optimizer_trace"=all;

select count(1) from t_acs where n2 = 100;

SQL_ID  7ck8k47bnqpnv, child number 1

Plan hash value: 1687207741
--------------------------------------------------------
| Id  | Operation         | Name       | Rows  | Bytes |
--------------------------------------------------------
|   0 | SELECT STATEMENT  |            |       |       |
|   1 |  SORT AGGREGATE   |            |     1 |     3 |
|*  2 |   INDEX RANGE SCAN| T_ACS_IDX1 |   100 |   300 |
--------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("N2"=:SYS_B_1)

SQL> @53off
alter session set events '10053 trace name context off';

The absence of the Note at the bottom of the above execution plan indicates that the CBO didn’t use any execution plan from the SPM baseline, but rather the execution plan it came up with during the compilation phase. In the corresponding 10053 trace file we can read the following lines:

SPM: statement found in SMB

SPM: finding a match for the generated plan, planId = 4126722190
SPM: plan baseline non-existant or empty (no accepted plans) so using cost-based plan, planId = 4126722190

The developer at Oracle could not be clearer than this: non-existant or empty (no accepted plans) so using cost-based plan
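As a side note, had we wanted the CBO to be allowed to use the index range scan plan stored in the baseline, that plan would first have to be accepted. Here is a minimal sketch using DBMS_SPM.EVOLVE_SQL_PLAN_BASELINE with the handle and plan name from above (verification is bypassed here for brevity; in real life you would let Oracle verify the plan’s performance):

SQL> set serveroutput on
DECLARE
   l_report clob;
BEGIN
   l_report := DBMS_SPM.evolve_sql_plan_baseline(
                  sql_handle => 'SQL_68ac78e9166427b1',
                  plan_name  => 'SQL_PLAN_6jb3sx4b689xjf5f8c88e',
                  verify     => 'NO',
                  commit     => 'YES');
   dbms_output.put_line(l_report);
END;
/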

March 5, 2018

SEMI JOIN in modern relational databases

Filed under: Oracle — hourim @ 8:17 pm

A semi join returns rows only from the first table, provided at least one matching row is found in the second table. The difference between a conventional join and a semi join is that rows from the first table are returned at most once, even when the join column of the second table contains duplicate rows. There is no ANSI SQL way to invoke a semi join; the following syntax doesn’t exist:

  SQL>  select 
          t1.* 
        from t1 
          semi join t2 
        on t1.n1= t2.n1

In almost all modern relational databases, the semi join logical operation is triggered via the predicates IN, ANY, SOME, EXISTS and DISTINCT. MS-SQL Server has an extra particularity which makes it use a semi join during an INTERSECT operation, as we will see later in this article.

The following queries illustrate these different semi join cases in a 12.2.0.1 Oracle release:

SQL> create table t1
     as
        select
            rownum n1,
            mod(rownum,5) n2,
            trunc( (rownum - 1 / 3) ) n3
        from
            dual
        connect by
            level <= 1e4;

SQL> create table t2
     as
        select
            rownum n1,
            mod(rownum,3) n2,
            trunc( (rownum - 1 / 5) ) n3
        from
            dual
        connect by
        level <= 5;

1. Using IN, ANY and SOME

SQL> select 
       t1.n1
     from
       t1
     where
       t1.n1 in (select t2.n1 from t2);

SQL> select 
       t1.n1
     from
       t1
     where
      t1.n1 = any (select t2.n1 from t2);

SQL> select 
       t1.n1
     from
       t1
     where
    t1.n1 = some (select t2.n1 from t2);
   
Plan hash value: 1275841967
----------------------------------------------------------------
| Id  | Operation            | Name | Starts | E-Rows | A-Rows |
----------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |      1 |        |      5 |
|*  1 |  HASH JOIN RIGHT SEMI|      |      1 |      5 |      5 |
|   2 |   TABLE ACCESS FULL  | T2   |      1 |      5 |      5 |
|   3 |   TABLE ACCESS FULL  | T1   |      1 |  10000 |  10000 |
----------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."N1"="T2"."N1")

2. Using EXISTS

select 
   t1.n1
 from
   t1
 where
   exists (select null from t2 where t2.n1 = t1.n1);

Plan hash value: 1275841967
----------------------------------------------------------------
| Id  | Operation            | Name | Starts | E-Rows | A-Rows |
----------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |      1 |        |      5 |
|*  1 |  HASH JOIN RIGHT SEMI|      |      1 |      5 |      5 |
|   2 |   TABLE ACCESS FULL  | T2   |      1 |      5 |      5 |
|   3 |   TABLE ACCESS FULL  | T1   |      1 |  10000 |  10000 |
----------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."N1"="T1"."N1")

3. Using DISTINCT in INNER join and the effect of the 12cR1 Partial join

select 
  distinct
   t1.n1
 from
   t1
 inner join t2
 on (t1.n1 = t2.n1);

Plan hash value: 3703458891
---------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |
---------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      5 |
|   1 |  HASH UNIQUE        |      |      1 |      5 |      5 |
|*  2 |   HASH JOIN SEMI    |      |      1 |      5 |      5 |
|   3 |    TABLE ACCESS FULL| T2   |      1 |      5 |      5 |
|   4 |    TABLE ACCESS FULL| T1   |      1 |  10000 |    256 | -- only 256 rows
---------------------------------------------------------------

Outline Data
-------------

  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('12.2.0.1')
      DB_VERSION('12.2.0.1')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$58A6D7F6")
      MERGE(@"SEL$1" >"SEL$2")
      OUTLINE(@"SEL$2")
      OUTLINE(@"SEL$1")
      FULL(@"SEL$58A6D7F6" "T2"@"SEL$1")
      FULL(@"SEL$58A6D7F6" "T1"@"SEL$1")
      LEADING(@"SEL$58A6D7F6" "T2"@"SEL$1" "T1"@"SEL$1")
      USE_HASH(@"SEL$58A6D7F6" "T1"@"SEL$1")
      USE_HASH_AGGREGATION(@"SEL$58A6D7F6")
      PARTIAL_JOIN(@"SEL$58A6D7F6" "T1"@"SEL$1")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."N1"="T2"."N1")

The appearance of the semi join in the above execution plan is the result of the partial join transformation introduced by Oracle as of 12cR1. Earlier versions were not able to use this partial join transformation and, as such, scanned the entire content of the t1 table instead of a tiny 256 rows, as shown below:

select 
   /*+ NO_PARTIAL_JOIN(@"SEL$58A6D7F6" "T1"@"SEL$1") */
  distinct
   t1.n1
 from
   t1
 inner join t2
 on (t1.n1 = t2.n1);

Plan hash value: 975306333

---------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |
---------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      5 |
|   1 |  HASH UNIQUE        |      |      1 |      5 |      5 |
|*  2 |   HASH JOIN         |      |      1 |      5 |      5 |
|   3 |    TABLE ACCESS FULL| T2   |      1 |      5 |      5 |
|   4 |    TABLE ACCESS FULL| T1   |      1 |  10000 |  10000 | -- 10,000 rows
---------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."N1"="T2"."N1")

So far all the semi join logical operations have been executed via a HASH JOIN physical operation. This doesn’t mean that a semi join can’t be used in a NESTED LOOPS physical operation. Here’s the demonstration below:

SQL> create index t2_idx on t2(n1);
SQL> select 
       t1.n1
     from
       t1
     where
      t1.n1 in (select t2.n1 from t2);

----------------------------------------------------------------
| Id  | Operation          | Name   | Starts | E-Rows | A-Rows |
----------------------------------------------------------------
|   0 | SELECT STATEMENT   |        |      1 |        |      5 |
|   1 |  NESTED LOOPS SEMI |        |      1 |      5 |      5 |
|   2 |   TABLE ACCESS FULL| T1     |      1 |  10000 |  10000 |
|*  3 |   INDEX RANGE SCAN | T2_IDX |  10000 |      1 |      5 |
----------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"="T2"."N1")

4. SEMI join and uniqueness
I started this article by specifying that a semi join guarantees that at most one row (per join value) will be returned from the left table even if there are duplicate rows in the join column of the second table. An example being worth a thousand words, here we go:

SQL> -- insert first a duplicate row in the right table
SQL> insert into t2 values(5,42,42);
SQL> commit;

-- first we will start using a conventional join
SQL> select 
   t1.n1
 from
   t1
 inner join t2
 on (t1.n1 = t2.n1);
        N1
----------
         1
         2
         3
         4
         5
         5
6 rows selected

----------------------------------------------------------------
| Id  | Operation          | Name   | Starts | E-Rows | A-Rows |
----------------------------------------------------------------
|   0 | SELECT STATEMENT   |        |      1 |        |      6 |
|   1 |  NESTED LOOPS      |        |      1 |      5 |      6 |
|   2 |   TABLE ACCESS FULL| T1     |      1 |  10000 |  10000 |
|*  3 |   INDEX RANGE SCAN | T2_IDX |  10000 |      1 |      6 |
----------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"="T2"."N1")

-- and now a semi join
SQL> select 
       t1.n1
     from
       t1
     where
      t1.n1 in (select t2.n1 from t2);

        N1
----------
         1
         2
         3
         4
         5

5 rows selected
----------------------------------------------------------------
| Id  | Operation          | Name   | Starts | E-Rows | A-Rows |
----------------------------------------------------------------
|   0 | SELECT STATEMENT   |        |      1 |        |      5 |
|   1 |  NESTED LOOPS SEMI |        |      1 |      5 |      5 |
|   2 |   TABLE ACCESS FULL| T1     |      1 |  10000 |  10000 |
|*  3 |   INDEX RANGE SCAN | T2_IDX |  10000 |      1 |      5 |
----------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"="T2"."N1")

It is for this reason that the semi join has been implemented: do not duplicate rows from the outer table when they join multiple times with rows from the inner table.

Now that we know the semi join’s main goal, we can infer that, as long as Oracle is sure that the right table has no duplicate values in the join column, it will fall back to a simple inner join. Let’s check:

SQL> drop index t2_idx;
SQL> create unique index t2_uq_idx on t2(n1);
SQL> select
       t1.n1
     from
       t1
     where
       exists (select null from t2 where t2.n1 = t1.n1);

        N1
     ------
         1
         2
         3
         4
         5
-------------------------------------------------------------------
| Id  | Operation          | Name      | Starts | E-Rows | A-Rows |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |      1 |        |      5 |
|   1 |  NESTED LOOPS      |           |      1 |      5 |      5 |
|   2 |   TABLE ACCESS FULL| T1        |      1 |  10000 |  10000 |
|*  3 |   INDEX UNIQUE SCAN| T2_UQ_IDX |  10000 |      1 |      5 |
-------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T2"."N1"="T1"."N1")

Notice that we have no semi join anymore. Thanks to the unique index on t2(n1), Oracle knows that there are no duplicate rows in the join column of the t2 table. It can then reliably use an inner join, because each row from the t1 table will join at most once with rows from t2.

There exists another way to make Oracle fall back to an inner join instead of a semi join, even when there are duplicate rows in the join column of the t2 table. In fact, whenever Oracle uses an operation that eliminates duplicate rows from that table (for example a sort unique or a group by), it can confidently use an inner join in place of a semi join. Here’s one case demonstrated below:

SQL> drop index t2_uq_idx;
SQL> insert into t2 values(5,42,42);

SQL> create unique index t1_uq_idx on t1(n1);

SQL> select
       t1.n1
     from
       t1
     where
       exists (select null from t2 where t2.n1 = t1.n1);

        N1
     ------
         1
         2
         3
         4
         5
--------------------------------------------------------------------
| Id  | Operation           | Name      | Starts | E-Rows | A-Rows |
--------------------------------------------------------------------
|   0 | SELECT STATEMENT    |           |      1 |        |      5 |
|   1 |  NESTED LOOPS       |           |      1 |      5 |      5 |
|   2 |   SORT UNIQUE       |           |      1 |      5 |      5 |
|   3 |    TABLE ACCESS FULL| T2        |      1 |      5 |      6 |
|*  4 |   INDEX UNIQUE SCAN | T1_UQ_IDX |      5 |      1 |      5 | -- 5 rows
--------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T2"."N1"="T1"."N1")

As you can see, Oracle didn’t use a semi join because it has scanned all rows from the t2 table, ordered them and eliminated duplicate rows via the SORT UNIQUE operation. As such, when joining with the t1 table, Oracle knows that each t2 row can join at most exactly once. Notice in passing that while the semi join had full scanned the t1 table (10,000 rows), the inner join touched only 5 distinct index keys.

5. SEMI join and foreign key integrity

Suppose now that we want to select rows from a child table matching their parent rows in the parent table.

-- drop and recreate tables t1 and t2
SQL> create index t1_idx on t1(n1);
SQL> alter table t1 add constraint t1_uq unique (n1);
SQL> alter table t2 add constraint t2_t1_fk foreign key (n1) references t1(n1);

The setup is now done. But let’s first disable the foreign key and the unique constraint and try a semi join query:

SQL> alter table t1 disable constraint t1_uq;
SQL> alter table t2 disable constraint t2_t1_fk;

SQL> select
        t2.n1
     from
        t2
     where
       exists (select null from t1 where t1.n1 = t2.n1);

-----------------------------------------------------------------------
|   Id  | Operation               | Name   | Starts | E-Rows | A-Rows |
-----------------------------------------------------------------------
|     0 | SELECT STATEMENT        |        |      1 |        |      5 |
|- *  1 |  HASH JOIN SEMI         |        |      1 |      5 |      5 |
|     2 |   NESTED LOOPS SEMI     |        |      1 |      5 |      5 |
|-    3 |    STATISTICS COLLECTOR |        |      1 |        |      5 |
|     4 |     TABLE ACCESS FULL   | T2     |      1 |      5 |      5 |
|  *  5 |    INDEX RANGE SCAN     | T1_IDX |      5 |  10000 |      5 |
|-    6 |   INDEX FAST FULL SCAN  | T1_IDX |      0 |  10000 |      0 |
-----------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."N1"="T2"."N1")
   5 - access("T1"."N1"="T2"."N1")

Note
-----
   - this is an adaptive plan (rows marked '-' are inactive)

As expected a classical semi join operation.

Let’s now enable the unique key of the parent table t1:

SQL> alter table t1 enable constraint t1_uq;

SQL> select
        t2.n1
     from
        t2
     where
       exists (select null from t1 where t1.n1 = t2.n1);

----------------------------------------------------------------
| Id  | Operation          | Name   | Starts | E-Rows | A-Rows |
----------------------------------------------------------------
|   0 | SELECT STATEMENT   |        |      1 |        |      5 |
|   1 |  NESTED LOOPS      |        |      1 |      5 |      5 |
|   2 |   TABLE ACCESS FULL| T2     |      1 |      5 |      5 |
|*  3 |   INDEX RANGE SCAN | T1_IDX |      5 |      1 |      5 |
----------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"="T2"."N1")

The semi join has gone because Oracle knows that there is a unique constraint on the join column of the t1 table; it can then safely use an inner join.

Finally let’s enable the foreign key and re-execute the semi join query:

SQL> alter table t2 enable constraint t2_t1_fk;

SQL> select
        t2.n1
     from
        t2
     where
       exists (select null from t1 where t1.n1 = t2.n1);

-------------------------------------------------------------
| Id  | Operation         | Name | Starts | E-Rows | A-Rows |
-------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |      1 |        |      5 |
|*  1 |  TABLE ACCESS FULL| T2   |      1 |      5 |      5 |
-------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("T2"."N1" IS NOT NULL)

Oracle can use a trusted foreign key relationship to remove the logical semi join altogether and access only the t2 table.
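Note that the remaining filter("T2"."N1" IS NOT NULL) predicate is there because t2.n1 is a nullable column: a null foreign key value has no parent row and must be filtered out. A hypothetical way to test this (not run here) would be to declare the column not null, after which the filter should disappear as well:

SQL> alter table t2 modify n1 not null;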

6. SEMI join and INTERSECT

In SQL Server, an intersect set operation can also be implemented via a semi join, as the following shows:

create table t1(n1 int, n2 int,n3 int) ;
create table t2(n1 int, n2 int, n3 int) ;

insert into t1(n1,n2,n3)
select top 10000
  row_number() over (order by a.name)
 ,row_number() over (order by a.name)%5
 ,(row_number() over (order by a.name) -1) /3
from sys.all_objects a cross join sys.all_objects;

insert into t2(n1,n2,n3)
select top 5
  row_number() over (order by a.name )
 ,row_number() over (order by a.name)%3
 ,(row_number() over (order by a.name)-1)/5
from sys.all_objects a cross join sys.all_objects;

select t2.n1 from t2 intersect select t1.n1 from t1;


But neither Oracle nor PostgreSQL uses a semi join when evaluating an intersect set operation, as the following plans show respectively:

-- Oracle 12cR2
SQL> select t2.n1 from t2 intersect select t1.n1 from t1;

---------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |
---------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      5 |
|   1 |  INTERSECTION       |      |      1 |        |      5 |
|   2 |   SORT UNIQUE       |      |      1 |      6 |      6 |
|   3 |    TABLE ACCESS FULL| T2   |      1 |      6 |      6 |
|   4 |   SORT UNIQUE       |      |      1 |  10000 |  10000 |
|   5 |    TABLE ACCESS FULL| T1   |      1 |  10000 |  10000 |
---------------------------------------------------------------
-- PostgreSQL 10.1
create table t1 as
  select 
     generate_series(1, 1e4) n1
	,generate_series(1, 1e4)%5 n2
	,(generate_series(1, 1e4)-1)/3 n3;
	
create table t2 as
  select 
     generate_series(1, 5) n1
	,generate_series(1, 5)%3 n2
	,(generate_series(1, 5)-1)/5 n3;
	
select t2.n1 from t2 intersect select t1.n1 from t1;


postgres=# explain analyze select t2.n1 from t2 intersect select t1.n1 from t1;
                                                           QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 HashSetOp Intersect  (cost=0.00..347.00 rows=200 width=36) (actual time=13.186..13.189 rows=5 loops=1)
   ->  Append  (cost=0.00..316.90 rows=12040 width=36) (actual time=0.363..9.571 rows=10005 loops=1)
         ->  Subquery Scan on "*SELECT* 1"  (cost=0.00..55.90 rows=2040 width=36) (actual time=0.362..0.369 rows=5 loops=1)
               ->  Seq Scan on t2  (cost=0.00..30.40 rows=2040 width=4) (actual time=0.050..0.053 rows=5 loops=1)
         ->  Subquery Scan on "*SELECT* 2"  (cost=0.00..261.00 rows=10000 width=9) (actual time=0.031..7.743 rows=10000 loops=1)
               ->  Seq Scan on t1  (cost=0.00..161.00 rows=10000 width=5) (actual time=0.029..4.478 rows=10000 loops=1)
 Planning time: 0.555 ms
 Execution time: 13.911 ms

7. SUMMARY

We have seen in this article how a semi join logical operation is triggered when using IN, ANY, SOME, EXISTS and DISTINCT in several modern relational databases. We’ve outlined as well that, in contrast to Oracle and PostgreSQL, SQL Server can use a semi join during an INTERSECT set operation. We’ve demonstrated that while regular joins can duplicate rows, semi joins do not produce any duplicates; they are defined to return (or not) rows only from one table of the join. We’ve also shown that, as long as the CBO is sure that the right table has no duplicate values in the join column, it will fall back to a simple inner join, where extra transformations become possible and where the left table might not be fully scanned.
