Mohamed Houri’s Oracle Notes

December 25, 2014

Subquery Coalescing

Filed under: Oracle — hourim @ 9:01 am

It’s fairly clear that to improve a performance of a SQL query we sometimes need to rewrite it so that we end up by either reducing the number of scanned tables or by giving the CBO an alternative path the original query was pre-empting it to take. In a previous article, I’ve looked at ways of turning a disjunctive subquery into a conjunctive subquery so that it can be unnested and merged with its parent query block. In this article I aim to explain how to improve performance by coalescing two distinct subqueries. This tuning strategy will be preceded by the conjunction and disjunction subquery concept definition. Finally, simple examples of subquery coalescing will be presented and explained. I aim also to show that sometime de-coalescing can reveal to be a good tuning strategy as far as it might open a new path for an optimal execution plan. This article is based on the Oracle published paper entitled Enhanced Subquery Optimization in Oracle

Conjunction, Disjunction and Containment property

Two subqueries are eligible to be coalesced provided they verify the containment property. A subquery subq1 is said to contain a second subq2 if the result of subq2 is a subset of the subq1 result. In this case subq1 is called the container query block while subq2 is called the contained query block. The same two subqueries will verify the contained property when subq2 contains a conjunctive predicate which, if suppressed, makes subq1 and subq2 equivalent.

Let’s picture the containment property to make it crystal clear. Below I have the following three distinct query blocks

select 
     t1.*
from 
     t1,t2
where t2.id1 = t1.id1;

9000 rows selected.

select 
     t1.*
from 
     t1,t2
where t2.id1 = t1.id1
and   t2.status = 'COM';

8900 rows selected.

select 
     t1.*
from 
     t1,t2
where t2.id1 = t1.id1
and   t2.id1 = t2.id;

4 rows selected.

If we get rid of the predicate part on the status column from the second query block, the first and the second query become equivalent. They will then verify the containment property and therefore can be coalesced. If we look at the second and the third query blocks they are also verifying the containment property as far as the result set of the third query is a subset of the result set of the second one. That is the containment property definition.

A conjunction is the action of linking two subqueries with an AND operator. The conjunction of the two subqueries is true if the two subqueries are simultaneously true; otherwise it is false.

A disjunction is the action of liking two subqueries with an OR predicate. The disjunction of two subqueries is true when one of the two subqueries is true and is false when both are simultaneously true.

Coalescing two subqueries of the same type (exists)

Two conjunctive subqueries satisfying the containment property can be coalesced. Consider the following reproducible model (11.2.0.3)

create table t1
   as select 
    rownum                id1, 
    trunc((rownum-1/3))   n1, 
    date '2012-06-07' + mod((level-1)*2,5) start_date,
    lpad(rownum,10,'0')   small_vc, 
    rpad('x',1000)        padding 
from dual
connect by level <= 1e4;   

create table t2
as select 
    rownum id
    ,mod(rownum,5) + mod(rownum,10)* 10  as id1
    ,case
       when mod(rownum, 1000) = 7 then 'ERR'
       when rownum <= 9900 then 'COM'
       when mod(rownum,10) between 1 and 5 then 'PRP'
     else
       'UNK'
     end status
     ,lpad(rownum,10,'0')    as small_vc  
     ,rpad('x',70)           as padding
from dual
connect by level <= 1e4;

And the following query with two exists subqueries of the same type (exists)

select 
     start_date
    ,count(1) 
from t1 
where 
     start_date >= to_date('10062012','ddmmyyyy')
and (exists (select null
             from t2
             where t2.id1 = t1.id1
             and t2.status = 'COM'
             )
  or exists (select null
             from t2
             where t2.id1 = t1.id1
             and   t2.id1 = t2.id
             )
     )
group by start_date;     

START_DATE          COUNT(1)
----------------- ----------
20120611 00:00:00          2
20120610 00:00:00          1

--------------------------------------------------------------------------------------
| Id  | Operation             | Name    | Starts | E-Rows | A-Rows | Buffers | Reads  |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |         |      1 |        |      2 |    1950 |   1665 |
|   1 |  HASH GROUP BY        |         |      1 |      1 |      2 |    1950 |   1665 |
|*  2 |   HASH JOIN SEMI      |         |      1 |      1 |      3 |    1950 |   1665 |
|*  3 |    TABLE ACCESS FULL  | T1      |      1 |   4908 |   4000 |    1670 |   1665 |
|   4 |    VIEW               | VW_SQ_1 |      1 |  11843 |   9894 |     280 |      0 |
|   5 |     UNION-ALL         |         |      1 |        |   9894 |     280 |      0 |
|*  6 |      TABLE ACCESS FULL| T2      |      1 |      1 |      4 |     140 |      0 |
|*  7 |      TABLE ACCESS FULL| T2      |      1 |  11842 |   9890 |     140 |      0 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("VW_COL_1"="T1"."ID1")
   3 - filter("START_DATE">=TO_DATE(' 2012-06-10 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   6 - filter("T2"."ID1"="T2"."ID")
   7 - filter("T2"."STATUS"='COM')

Note
-----
   - dynamic sampling used for this statement (level=2)

Since the above two exists subqueries verify the containment property, let’s then coalesce them and note the performance consequences we will reach through this transformation

select 
     start_date
    ,count(1) 
from t1 
where 
     start_date >= to_date('10062012','ddmmyyyy')
and (exists (select null
            from t2
            where t2.id1 = t1.id1
            and(t2.status = 'COM'
                or t2.id1 = t2.id)
            )                 
          )
group by start_date;     

----------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows | Buffers | Reads  |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      2 |    1808 |   1665 |
|   1 |  HASH GROUP BY      |      |      1 |      1 |      2 |    1808 |   1665 |
|*  2 |   HASH JOIN SEMI    |      |      1 |     10 |      3 |    1808 |   1665 |
|*  3 |    TABLE ACCESS FULL| T1   |      1 |   4000 |   4000 |    1669 |   1665 |
|*  4 |    TABLE ACCESS FULL| T2   |      1 |   9890 |   9890 |     139 |      0 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T2"."ID1"="T1"."ID1")
   3 - filter("START_DATE">=TO_DATE(' 2012-06-10 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   4 - filter(("T2"."STATUS"='COM' OR "T2"."ID1"="T2"."ID"))

Thanks to this coalescing operation we are now doing only a single access full t2 table instead of two initial full table scan of the same table. As such, we saved 140 logical I/O dropping the total number of buffers from 1950 to 1808.

Coalescing two subqueries of different type (exists and not exists)
Real life examples can show cases of subqueries verifying the containment property but having a different type. An example of such situation is presented in the below query:

select 
     start_date,
     id1,
     n1
from t1 
where 
     start_date >= to_date('10062012','ddmmyyyy')
and exists     (select null
                from t2 a
                where a.id1 = t1.id1                
                )
and not exists (select null
                from t2 b
                where b.id1 = t1.id1
                and   b.id1 != 83
                );

START_DATE               ID1         N1
----------------- ---------- ----------
20120611 00:00:00         83         82

----------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows | Buffers | Reads  |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      1 |    1947 |   1665 |
|*  1 |  HASH JOIN ANTI     |      |      1 |      7 |      1 |    1947 |   1665 |
|*  2 |   HASH JOIN SEMI    |      |      1 |     10 |      3 |    1808 |   1665 |
|*  3 |    TABLE ACCESS FULL| T1   |      1 |   4000 |   4000 |    1669 |   1665 |
|   4 |    TABLE ACCESS FULL| T2   |      1 |  10000 |  10000 |     139 |      0 |
|*  5 |   TABLE ACCESS FULL | T2   |      1 |   9000 |   9000 |     139 |      0 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("B"."ID1"="T1"."ID1")
   2 - access("A"."ID1"="T1"."ID1")
   3 - filter("START_DATE">=TO_DATE(' 2012-06-10 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   5 - filter("B"."ID1"<>83)

The above query contains two distinct subqueries:

exists (select null
        from t2 a
        where a.id1 = t1.id1                
        )

not exists (select null
            from t2 b
            where b.id1 = t1.id1
            and   b.id1 != 83
            );

They verify the containment property because if we get rid of the predicate (b.id1 != 83) from the second one it becomes equivalent to the first one. The sole difference is that they are of different types: the first one is an exists subquery while the second one is a not exists subquery. Coalescing such a kind of subquery yields to the following query

select 
     start_date,
     id1,
     n1
from t1 
where 
     start_date >= to_date('10062012','ddmmyyyy')
and exists     (select null
                from t2 a
                where a.id1 = t1.id1                
                having sum (case when  a.id1 != 83 
                            then 1 else 0 end) = 0
               );
START_DATE               ID1         N1
----------------- ---------- ----------
20120611 00:00:00         83         82

-----------------------------------------------------------------------------------
| Id  | Operation            | Name | Starts | E-Rows | A-Rows | Buffers | Reads  |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |      1 |        |      1 |     557K|   1665 |
|*  1 |  FILTER              |      |      1 |        |      1 |     557K|   1665 |
|*  2 |   TABLE ACCESS FULL  | T1   |      1 |   4000 |   4000 |    1670 |   1665 |
|*  3 |   FILTER             |      |   4000 |        |      1 |     556K|      0 |
|   4 |    SORT AGGREGATE    |      |   4000 |      1 |   4000 |     556K|      0 |
|*  5 |     TABLE ACCESS FULL| T2   |   4000 |   1000 |   3000 |     556K|      0 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter( IS NOT NULL)
   2 - filter("START_DATE">=TO_DATE(' 2012-06-10 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   3 - filter(SUM(CASE  WHEN "A"."ID1"<>83 THEN 1 ELSE 0 END )=0)
   5 - filter("A"."ID1"=:B1)

One of the immediate and obvious observations is that the coalescing process produces a dramatic performance alteration as far as we went from a clean 1947 logical I/O to a terrific 557 thousands of Buffers. This is a clear lesson that it is not because we’ve eliminated a full table access from the picture that we will have a performance improvement. Sometimes visiting twice a table might make the CBO generating an optimal execution plan. The collateral effect of the above coalescing process is that the CBO has been unable to unnest the new exist subquery and therefore it execute it as filter operation even though that the most resource expensive operation is operation 5. Important information in the new suboptimal plan is the terrific (4000) number of executed operations as shown by the Starts column in the execution plan. The operation at line 2 produces 4000 rows and it dictates the number of times the operation at line 3 (and its children operations at lines 4 and 5) it will be executed.

The CBO has been unable to unnest the new coalesced exist subquery because of the presence of the aggregate function (sum) as proofed by the corresponding 10053 trace file

*****************************
Cost-Based Subquery Unnesting
*****************************
SU: Unnesting query blocks in query block SEL$1 (#1) that are valid to unnest.
Subquery Unnesting on query block SEL$1 (#1)SU: Performing unnesting that 
does not require costing.
SU: Considering subquery unnest on query block SEL$1 (#1).
SU:   Checking validity of unnesting subquery SEL$2 (#2)
SU:     SU bypassed: Failed aggregate validity checks.
SU:     SU bypassed: Failed basic validity checks.
SU:   Validity checks failed.

As far as unnesting is impossible de-coalescing might represent a good strategy in this case. And as far as we can go from the coalesced query to the original one thanks to the containment property, we can therefore open the unnesting path leading to original optimal plan we thought that we are going to improve by coalescing the two subqueries.

I was thinking in this particular case of subquery unnesting impossibility to take advantage of virtual column and hide the aggregation function in the coalesced exists subquery. Unfortunately it is impossible to create a virtual column using an aggregation function as shown below:

alter table t2 add aggr_virt number generated always 
                    as (sum(case when id1 != 83 then 1 else 0 end)) virtual;
alter table t2 add aggr_virt number generated always as (sum(case when id1 != 83 then 1 else 0 end)) virtual
ERROR at line 1:
ORA-00934: group function is not allowed here

The Oracle documentation says the following about creating virtual columns

When you define a virtual column, the defining column_expr must refer only to columns of the subject table that have already been defined, in the current statement or in a prior statement.

Conclusion

Coalescing two subqueries of the same type might drastically reduce the number of logical I/O as far as it can eliminate an entire table access. Coalescing two subqueries of different types might pre-empt the CBO from taking advantage of the unnesting transformation. Fortunately if you know how to coalesce two different subqueries you will know how to de-coalesce them to allow the CBO taking advantage of unnesting the subqueries with their main query blocks.

Keep always an open eye on the concept of conjunction and disjunction with subqueries. The CBO is unable to unnest a subquery when it appears into a disjunction operation leading therefore to several executing of the subquery as a filter operation applied on the result set of the main query block. The CBO is however able to transform a conjunction of two subqueries of the same time into a single conjunction subquery and unnest it with its parent query block.

2 Comments »

  1. “If we look at the second and the third query blocks they are also verifying the containment property as far as the result set of the third query is a subset of the result set of the second one.”
    You are talking about the 1st and 3rd query here, aren’t you?

    Best regards,
    Salek

    Comment by Salek Talangi — December 25, 2014 @ 8:31 pm | Reply

  2. Hi Salek

    Sorry for this delay

    Suppose that Q2 represents the second query and Q3 the third query.

    Q3 gives the following set of records

    SQL> select
             t1.id1
        from
             t1,t2
        where t2.id1 = t1.id1
        and   t2.id1 = t2.id;
    
           ID1
      ----------
            11
            22
            33
            44
    

    When Q2 is intersected with Q3 it gives the following result set

    SQL> select
             t1.id1
        from
             t1,t2
        where t2.id1 = t1.id1
        and   t2.id1 = t2.id
        intersect
        select
             t1.id1
       from
            t1,t2
       where t2.id1 = t1.id1
       and   t2.status = 'COM';
    
           ID1
    ----------
            11
            22
            33
            44		
    

    Where we can notice that : Q3 ∩ Q2 = Q3.

    This means that Q2 and Q3 are not disjoint because they intersect at some elements. In fact Q3 is completely contained in Q2 because they intersect at the entire elements of Q3.

    That is what I was meaning by the result set of the third query(Q3) is a subset of the result set of the second one(Q2)

    If you see any discrepancy then don’t hesitate to inform me

    Best regards

    Comment by hourim — December 31, 2014 @ 5:25 pm | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Create a free website or blog at WordPress.com.

Tony's Oracle Tips

Tony Hasler's light hearted approach to learning about Oracle

Richard Foote's Oracle Blog

Focusing Specifically On Oracle Indexes, Database Administration and Some Great Music

Hatem Mahmoud Oracle's blog

Just another Oracle blog : Database topics and techniques

Mohamed Houri’s Oracle Notes

Qui se conçoit bien s’énonce clairement

Oracle Diagnostician

Performance troubleshooting as exact science

Raheel's Blog

Things I have learnt as Oracle DBA

Coskan's Approach to Oracle

What I learned about Oracle

So Many Oracle Manuals, So Little Time

“Books to the ceiling, Books to the sky, My pile of books is a mile high. How I love them! How I need them! I'll have a long beard by the time I read them”—Lobel, Arnold. Whiskers and Rhymes. William Morrow & Co, 1988.

EU Careers info

Your career in the European Union

Carlos Sierra's Tools and Tips

Tools and Tips for Oracle Performance and SQL Tuning

Oracle Scratchpad

Just another Oracle weblog

OraStory

Dominic Brooks on Oracle Performance, Tuning, Data Quality & Sensible Design ... (Now with added Sets Appeal)

%d bloggers like this: