May | 2018 | Mohamed Houri’s Oracle Notes

May 25, 2018

Indexing Intelligibly

Filed under: Index — hourim @ 2:10 pm

Here’s again a simple but very practical tuning case taken from a real life running system. It started by a client complaint about a product inserting process that started to take a considerable amount of time (60 seconds per product). According to the client, it was not performing so badly a couple of days ago. And, this is sufficiently rare to be worthy of a note, the client supplied the sql_id of the query he found to be the root cause of this delay. In the next section I will explain, step by step, what I did to make my client very happy.

Get SQL monitoring report of the sql_id

And to make my task easy the sql_id was still in v$sql_monitor so that I immediately got its corresponding report (RTSM) shown below:

Global Information
------------------------------
 Status              :  DONE (ALL ROWS)                
 Instance ID         :  1                    
 Session             :  xxxx (15:32901) 
 SQL ID              :  7bm6m1r2xsj5f        
 SQL Execution ID    :  16777216             
 Execution Started   :  04/05/2018 14:08:32  
 First Refresh Time  :  04/05/2018 14:08:38
 Last Refresh Time   :  04/05/2018 14:09:13
 Duration            :  41s                  
 Module/Action       :  JDBC Thin Client/-           
 Service             :  xxxxx            
 Program             :  JDBC Thin Client/-

Global Stats
=================================================
| Elapsed |   Cpu   |  Other   | Fetch | Buffer |
| Time(s) | Time(s) | Waits(s) | Calls |  Gets  |
=================================================
|    42   |    5.72 |     36   |    1  |  1M    |
=================================================

SQL Plan Monitoring Details (Plan Hash Value=4071256796)
========================================================================
| Id |               Operation |   Name   |  Rows   | Execs |   Rows   |
|    |                         |          | (Estim) |       | (Actual) |
========================================================================
|  0 | SELECT STATEMENT        |          |         |     1 |        1 |
|  1 |   NESTED LOOPS OUTER    |          |       1 |     1 |        1 |
|  2 |    NESTED LOOPS OUTER   |          |       1 |     1 |        1 |
|  3 |     NESTED LOOPS OUTER  |          |       1 |     1 |        1 |
.../...
| 12 |      TABLE ACCESS FULL  | T1       |       1 |    42 |        1 |
.../...
| 33 |   INDEX RANGE SCAN      | IDX-XXX  |       1 |     2 |          |
========================================================================

I didn’t have to search a lot. The query completes in 42 seconds of which 42 are due to the TABLE ACCESS FULL operation at line n°12.

But as interesting as this RTSM report is, it has a key limitation. It doesn’t report the predicate part. And as always, the predicate information is of a crucial importance to check what filters are applied on the above full table scan.

select * from table(dbms_xplan.display_cursor('7bm6m1r2xsj5f'));

Predicate Information (identified by operation id):
---------------------------------------------------
   12 – filter("XXX”.”COL1" = :3 AND "XXX”.”END_DATE" >=:5
               AND "XXX”.”START_DATE" <=:4 )

You don’t like speculation? Neither do I. This is why I immediately checked the following points:

There is no index starting by COL1 in table T1
And I executed the following query to check the pertinence of indexing COL1 column

SQL> select /*+ parallel(4) */ 
         col1, count(1)
     from t1
     group by col1
     order by 2 desc;

      COL1   COUNT(1)
---------- ----------
             60594499
    WTCLZ5         49
     LCOXS         47
    WTCLK1         47
     ../..

As you can see there is manifestly a design issue here as more than 99% of COL1 values are null. After I have got the confirmation from the client that the null value for COL1 is never used, I was going to create a single column index on COL1 when one of my DBA colleagues asked me the following question:

Why don’t you create a composite index on (COL1, END_DATE, START_DATE)?

And that was precisely the question that has motivated this blog post. I provided the following answer:

If we exclude null values, at maximum, we can filter down 49 rows from table T1 using COL1 column
While an inequality is applied on END_DATE and START_DATE columns an equality predicate is used against COL1
If we opt for the composite index (COL1, END_DATE, START_DATE) we will lose the formidable opportunity to have a very small and attractive index on COL1 since null values of COL1 will be added into the composite index.

Having sad that here’s below what happened next:

SQL> create index idx_t1_col1 on t1(COL1) parallel 4;

SQL> alter index idx_t1_col1 noparallel ;

With the new index having only 0,39GB of size out of table of 8,7GB as shown below:

SQL> @sizeBySegName
Enter value for segment_name: T1
Enter value for owner: xxxx

SEGMENT_TYPE       TABLESPACE_NAME   SEGMENT_NAME  PARTITION_NAME   GB
------------------ ----------------- ------------- --------------- -------
TABLE              XXXX               T1                            8.7012
                                                                   -------
Total Segment Size                                                  8.7012


SQL> @sizeBySegName
Enter value for segment_name: idx_t1_col1
Enter value for owner: xxxx

SEGMENT_TYPE       TABLESPACE_NAME   SEGMENT_NAME  PARTITION_NAME   GB
------------------ ----------------- ------------- --------------- ------
INDEX              XXXX               IDX_T1_COL1                  0.387
                                                                   ------
Total Segment Size                                                 0.387

Naturally the new execution plan uses the new index and resulted into a drastically improved query performance as the followings show via the new execution plan and the new elapsed time:

SQL_ID  7bm6m1r2xsj5f, child number 0
-------------------------------------
Plan hash value: 874603108

------------------------------------------------------------------------
| Id  | Operation                               | Name      | Rows     |
------------------------------------------------------------------------
|  0 | SELECT STATEMENT                         |            |         |
|  1 |   NESTED LOOPS OUTER                     |            |       1 |
|  2 |    NESTED LOOPS OUTER                    |            |       1 |
|  3 |     NESTED LOOPS OUTER                   |            |       1 |
.../...
| 12 |      TABLE ACCESS BY INDEX ROWID BATCHED | T1         |       1 |
| 13 |       INDEX RANGE SCAN                   | IDX_T1_COL1|       1 | 
|.../...                                                               |
------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   12 – filter( "XXX”.”END_DATE" >=:6 AND "XXX”.”START_DATE" <=:5 )
   13 - access("XXX”.”COL1" = :4)

SQL> @sqlstats
Enter value for sql_id: 7bm6m1r2xsj5f

     CHILD  PLAN_HASH_VALUE   GETS      ELAPS   EXECUTIONS
----------  --------------- ---------- ------  -----------
         0       874603108        11      0      897

Thanks to the new index the execution time of the client critical query dropped from 42 to 0 second and the logical I/O consumption from 1 million to only 11 buffer gets per execution.

Bottom Line

Through this simple real life example I wanted to show how crucial is the first column of an index when used in an equality predicate. I wanted also to emphasize that, sometimes, and all things being equal, it might be better to prefer a single column index to cover a multi-column where clause provided this single column is very selective and contains a bunch of null values making the single index very small and thereby very attractive to the CBO.

Comments (4)

May 19, 2018

Merge join in modern Relational Database Management Systems

Filed under: Oracle — hourim @ 2:07 pm

There are many posts out there explaining what a MERGE JOIN is, how it works and why it is less popular than NESTED LOOPS and HASH JOIN physical operations. In a nutshell, MERGE JOIN compares two sets of sorted data on the merge column and outputs matched rows. It reads both data sets only once. This is why it is known as an unrelated combined operation as explained by Christian Antognini in his last book.

1. Oracle merge join

Here’s a simple Oracle example illustrating the different subtleties of the MERGE JOIN using Oracle 12cR2:

SQL> create table t1 as 
        select rownum*2 n1, rownum*5 n2, rownum n3
     from dual
     connect by level <=2e1; 

SQL> create table t2 as 
        select rownum*3 n1, rownum*5 n2, rownum n3
     from dual
     connect by level <=1e2; 

SQL> select
        /*+ use_merge(t1,t2) */
        t1.n1 t1n1
       ,t2.n1 t2n1
     from t1
        join t2
     on t1.n1 = t2.n1;

      T1N1       T2N1
---------- ----------
         6          6
        12         12
        18         18
        24         24
        30         30
        36         36

6 rows selected.

---------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |
---------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      6 |
|   1 |  MERGE JOIN         |      |      1 |     20 |      6 |
|   2 |   SORT JOIN         |      |      1 |     20 |     20 |
|   3 |    TABLE ACCESS FULL| T1   |      1 |     20 |     20 |
|*  4 |   SORT JOIN         |      |     20 |    100 |      6 | 
|   5 |    TABLE ACCESS FULL| T2   |      1 |    100 |    100 |
---------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T1"."N1"="T2"."N1")
       filter("T1"."N1"="T2"."N1")

The number of rows (A-Rows=20) generated by the first child operation(Line Id n°2) of the MERGE JOIN operation at line id n° 1 and the number of times its second child operation (line id n°4) has been executed ( Starts=20) suggest that the MERGE JOIN is somehow using a NESTED LOOPS kind of algorithm. But we see clearly that, in contrast to NESTED LOOPS, the two merge join inputs (T1 and T2) are scanned only once (Starts = 1 at lines id n°3 and 5). So what does this Starts=20 of operation n°4 mean exactly? Clearly we haven’t made 20 separate sorts as the following proves:

set autotrace on stat

select
        /*+ use_merge(t1,t2) */
        t1.n1 t1n1
       ,t2.n1 t2n1
     from t1
        join t2
     on t1.n1 = t2.n1;

Statistics
---------------------------------------------------
          0  recursive calls
          4  db block gets
         14  consistent gets
          0  physical reads
          0  redo size
        708  bytes sent via SQL*Net to client
        608  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          2  sorts (memory)          --> only two sorts corresponding to operations n°2 and n°4 
          0  sorts (disk)                
          6  rows processed

In effect, the MERGE JOIN parent operation gets a row from each sorted input and compares them. Typically it takes 20 rows from T1 and 100 rows from T2. It then gets the first row of each input and compare them using predicate n°4; if they join their corresponding rows are returned. If not, the MERGE JOIN will then discard the lower value and gets the next row from the lower input data set and continue the comparison process until there is no anymore rows to process. This algorithm can be simplified as follows:

Get 20  sorted rows from T1
Get 100 sorted rows from T2
LOOP until no rows to compare
  if  join value of T1 =  join value of T2
  then
      output the joined rows 
      discard  join value of T2
      get next join value of T1
      get next join value of T2      
  elsif join value of T1 < join value of T2 
      discard join value of T1 
      get next join value of T1 
  elsif join value of T1 >  join value of T2
      discard  join value of T2
      get next join value of T2
  end if;
END LOOP;

So, we can infer that the Starts = 20 of operation at line Id n° 4 represents Oracle comparing each of the 20 join column values of T1 with their equivalent ordered join column of T2 (first rows from T1 with first row from T2 and so on until there is no more rows in T1 to compare).

But let’s now change the order of the join so that table T2 will be the first data set input of the merge join operation:

select
  /*+ leading (t2, t1) use_merge(t2,t1) */
   t1.n1 t1n1
  ,t2.n1 t2n1
from t1
join t2
on t1.n1 = t2.n1;

      T1N1       T2N1
---------- ----------
         6          6
        12         12
        18         18
        24         24
        30         30
        36         36

6 rows selected.
---------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |
---------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      6 |
|   1 |  MERGE JOIN         |      |      1 |     20 |      6 |
|   2 |   SORT JOIN         |      |      1 |    100 |     14 | -- why 14 rows?
|   3 |    TABLE ACCESS FULL| T2   |      1 |    100 |    100 |
|*  4 |   SORT JOIN         |      |     14 |     20 |      6 |
|   5 |    TABLE ACCESS FULL| T1   |      1 |     20 |     20 |
---------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T1"."N1"="T2"."N1")
       filter("T1"."N1"="T2"."N1")

We are still generating the same number of rows, 6, but this time the operation at line id n°4 is started 14 times. Why 14 and not 100 the totality of the T2 rows?

The answer to this question is : the merge join doesn’t necessarily need to scan every row from both inputs. It stops scanning as soon as:

it reaches the end of either input
or it reaches a join value from the first input that is greater than the highest join value from the second input.

When I made T2 the driving table of the merge join, Oracle declared the end of the join process as soon as it reached the 14th row of T2 (A-Rows=14). This is simply because the 14th ordered row of T2 is greater than any join value from T1 as the following proves:

-- get the minimum value of t2.n1 that is greater than the max value of t1.n1
SQL> select min(n1) from t2 where n1 > (select max(n1) from t1);

   MIN(N1)
----------
        42

This means that starting from t2.n1= 42 there will be no join possibility since 42 is greater than all join column values from the lesser input (T1). And, as such, remaining join values from T1 should be discarded according to the join algorithm(T2 being the first input of the merge join)

elsif merge value of T2 >  merge value of T1
      discard merge value of T1
      get next merge value of T1
end if;

And how many rows the merge join shoud have already processed when it reaches this merge stop point? 14 naturally:

SQL> select 
         count(1) 
     from t2 
     where 
         n1 <= (select min(n1) from t2 where n1 > (select max(n1) from t1) 
                );
 COUNT(1)
---------
       14

If we redo the same demonstration for the case where T1 will be the driving table we will obviously find that the merge join has to go through the entire number of rows in T1 table (A-Rows = 20) because there is no join value in T1 that is greater than the largest join value in T2.

SQL> select n1 from t1 where n1 > (select max(n1) from t2);
no rows selected

2. One-to-many and many-to-many merge join

In the above setup we unofficially examined a one-to-many join version of the merge join. This type of join occurs when the optimizer knows that there are no duplicate join column values in the first input data set. Although I know t1.n1 is unique I didn’t supply any such extra information like a unique index for the optimizer to opt for a one-to-many join option.This is why officially we have been using a many-to-many merge join in the preceding examples.

2.1. MS-SQL Server
In contrast to Oracle, MS SQL server execution plan makes a clear distinction between these two types of merge join as the following shows:

create table t1(n1 int, n2 int);

insert into t1(n1,n2)
       select top 20
	     2* row_number() over (order by a.name) 
		,abs(row_number() over (order by a.name) -1) 
from sys.all_objects a cross join sys.all_objects;

create table t2(n1 int, n2 int) ;

insert into t2(n1,n2)
       select top 100
	     3* row_number() over (order by a.name) 
		,abs(row_number() over (order by a.name) -1) 
from sys.all_objects a cross join sys.all_objects;

-- many to many
select
   t1.n1 t1n1
  ,t2.n1 t2n1
from
   t1
join t2
on t1.n1 = t2.n1
option (merge join);

But if I create a unique index in T1 indicating to the optimizer the absence of duplicate rows in the join column I will obviously obtain a one-to-many merge join type as the following execution plan shows:

create unique index t1_uk on t1(n1);

-- one-to-many join
select
   t1.n1 t1n1
  ,t2.n1 t2n1
from
   t1
join t2
on t1.n1 = t2.n1
option (merge join);

In a one-to-many join, when two rows join, the optimizer outputs them, discards the join value from the second input (T2), gets the next join value from the first input (T1) and continue the merge process. The optimizer can safely discard the joined value from T2 because it knows that there will be no duplicate rows in T1 that will ever join with the T2 discarded row.

In a many-to-many join, the merge join algorithm, very probably, keeps track of the discarded T2 row somewhere in a memory structure. If the next iteration finds that the current row is duplicated it will then compare it with the saved inmemory row. If, instead, the next row from T1 reveals to be a new one, the optimizer can then safely delete the inmemory T2 saved row. This approach can be backed up by the merge join algorithm displayed above which shows that the merge process goes always forward. It never needs to step backward in the data set. In the complex many-to-many join case this “always walk down” can be ensured by looking for a previous compared join row stored in memory and probably not by stepping backward. If an extra filter is present in the query it will be replayed back to ensure that the saved joined row satisfy the filter predicate or not.

2.2. PostgreSQL
Using PostgreSQL we can have both textual and graphical execution plan. But instead of a many-to-many or a one-to-many merge join, PostgreSQL uses a different terminology which is Inner Unique (True and False) respectively as illustrated below:
First the data model:

create table t1 (n1 int, n2 int, n3 int);
create table t2 (n1 int, n2 int, n3 int);	

with got_my_data (j)
as
 (select generate_series(1, 20)
 )
 insert into t1(n1, n2, n3)
 select 
    j*2
   ,j*5
   ,j
  from 
   got_my_data;
   
with got_my_data (j)
as
 (select generate_series(1, 100)
 )
 insert into t2(n1, n2, n3)
 select 
    j*3
   ,j*5
   ,j
  from 
   got_my_data;

Since there is no hint in PostgreSQL with which I can force a merge join operation, I will cancel the hash join possibility, run the query and get the graphical execution plan using pgAdmin4

postgres=# set enable_hashjoin=false;
SET

 explain analyze
 select
        t1.n1 t1n1
       ,t2.n1 t2n1
     from t1
        join t2
     on t1.n1 = t2.n1;

As you can see this is a many-to-many join as indicated by the Inner Unique set to false. If I create a unique index on T2, re-query and get the corresponding execution plan this is what I will observe:

postgres=# create unique index t2_uk on t2(n1);
CREATE INDEX

explain analyze verbose select
       t1.n1 t1n1
      ,t2.n1 t2n1
    from t1
       join t2
    on t1.n1 = t2.n1;

I don’t have enough experience in reading PostgreSQL execution plans but according to the actual rows generated by the second sort in the plan (rows = 14) it seems that, very probably, the query planner has used T2 table as the first input of the merge join. This is why the unique index on this table has triggered a one-to-many join while a unique index on T1 hasn’t(not show here but tested). For a one-to-many merge join to occur uniquensess is required for the join column of the first input.

2.3. Oracle
Let’s add a duplicate row in T1 and re-execute the same merge join query using Oracle database

SQL> insert into t1 values (6, -1, -1);
1 row created.

SQL> commit;

select
    /*+ use_merge(t1,t2) */
  t1.n1 t1n1
 ,t2.n1 t2n1
from t1
join t2
on t1.n1 = t2.n1;

      T1N1       T2N1
---------- ----------
         6          6
         6          6
        12         12
        18         18
        24         24
        30         30
        36         36

7 rows selected.
---------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |
---------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      7 |
|   1 |  MERGE JOIN         |      |      1 |     20 |      7 |
|   2 |   SORT JOIN         |      |      1 |     20 |     21 |
|   3 |    TABLE ACCESS FULL| T1   |      1 |     20 |     21 |
|*  4 |   SORT JOIN         |      |     21 |    100 |      7 |
|   5 |    TABLE ACCESS FULL| T2   |      1 |    100 |    100 |
---------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T1"."N1"="T2"."N1")
       filter("T1"."N1"="T2"."N1")

As expected, the duplicated join value from T1 has been returned by the merge join. But there is no clue in the execution plan about whether this is a many-to-many or a a one-to-many join. Even if I delete the inserted duplicate row and create a unique index on t1.n1, I will still find nothing related to the type of merge join in the corresponding execution plan as shown below:

SQL> delete from t1 where n1=6 and n2 =-1;

SQL> create unique index t1_uk on t1(n1);

select
    /*+ use_merge(t1,t2) */
  t1.n1 t1n1
 ,t2.n1 t2n1
from t1
join t2
on t1.n1  = t2.n1;

      T1N1       T2N1
---------- ----------
         6          6
        12         12
        18         18
        24         24
        30         30
        36         36

6 rows selected.

----------------------------------------------------------------
| Id  | Operation           | Name  | Starts | E-Rows | A-Rows |
----------------------------------------------------------------
|   0 | SELECT STATEMENT    |       |      1 |        |      6 |
|   1 |  MERGE JOIN         |       |      1 |     20 |      6 |
|   2 |   INDEX FULL SCAN   | T1_UK |      1 |     20 |     20 |
|*  3 |   SORT JOIN         |       |     20 |    100 |      6 |
|   4 |    TABLE ACCESS FULL| T2    |      1 |    100 |    100 |
----------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"="T2"."N1")
       filter("T1"."N1"="T2"."N1")

There is no clue about the type of merge join after I have made unique the join column of the first input. However, we can observe that, thanks to the INDEX FULL SCAN operation, rows from the first input are acquired pre-sorted and don’t need the usual extra SORT JOIN operation.

Finally, we are not going to finish this merge join investigation without creating a unique index on the second data set input and see what this will change in the execution plan:

SQL> create unique index t2_uk on t2(n1);

SQL> select
        /*+ use_merge(t1,t2) */
        t1.n1 t1n1
       ,t2.n1 t2n1
    from t1
    join t2
    on t1.n1 = t2.n1;

      T1N1       T2N1
---------- ----------
         6          6
        12         12
        18         18
        24         24
        30         30
        36         36

6 rows selected.
--------------------------------------------------------------
| Id  | Operation         | Name  | Starts | E-Rows | A-Rows |
--------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |      1 |        |      6 |
|   1 |  MERGE JOIN       |       |      1 |     20 |      6 |
|   2 |   INDEX FULL SCAN | T2_UK |      1 |    100 |     14 |
|*  3 |   SORT JOIN       |       |     14 |     20 |      6 |
|   4 |    INDEX FULL SCAN| T1_UK |      1 |     20 |     20 |
--------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"="T2"."N1")
       filter("T1"."N1"="T2"."N1")

If you look closely to the above execution plan you will immediately spot out two important points:

Oracle has inverted the join order as T2 becomes the first input and T1 the second one

The Cost Based optimizer is very clever since, by switching the join order, it can declare the end of the query much earlier by stop scanning T2 as soon as it reaches the 14th row of the first input (T2) as explained above.

The second point is related to the extra SORT JOIN operation at line n°3.

This operation receives an already pre-sorted data set via the INDEX FULL SCAN operation at line n°4. So why Oracle needs to apply an extra SORT on an ordered data?

In fact, as already explained in section 1 above, the SORT JOIN operation at line n°3, as its name doesn’t suggest, is responsible for applying the join condition (access and filter predicate n°3) on the right input for each row produced by the left input (14 rows in the current case). This is why, regardless of how the rows are acquired, the SORT JOIN operation is always needed to be applied on the right data set input. The same doesn’t apply for the first input data set where the SORT JOIN operation can be skipped whenever this data is retrieved already sorted.

3. Summary

In this article I tried to explain how the merge join algorithm has been implemented in modern relational database systems. I demonstrated that the merge join doesn’t necessarily need to scan every row from both inputs. It stops scanning as soon as it reaches the end of either input or it reaches a join value from the first input that is greater than the highest join value from the second input. I have outlined, using MS-SQL Sever and PostgreSQL the concept of one-to-many and many-to-many join and how a unique index on the first input data set of the join can switch from a costly many-to-many to a less aggressive one-to-many form of the join.Although I have shown it here, a merge join can work with inequality join predicate and it supports outer (MERGE JOIN OUTER), semi(MERGE JOIN SEMI) and anti (MERGE JOIN ANTI) logical join operations.

Mohamed Houri’s Oracle Notes

May 25, 2018

Indexing Intelligibly

May 19, 2018

Merge join in modern Relational Database Management Systems

Follow Blog via Email

Categories

Blog Stats

Blogs I Follow

Archives