It’s fairly clear that to improve a performance of a SQL query we sometimes need to rewrite it so that we end up by either reducing the number of scanned tables or by giving the CBO an alternative path the original query was pre-empting it to take. In a previous article, I’ve looked at ways of turning a disjunctive subquery into a conjunctive subquery so that it can be unnested and merged with its parent query block. In this article I aim to explain how to improve performance by coalescing two distinct subqueries. This tuning strategy will be preceded by the conjunction and disjunction subquery concept definition. Finally, simple examples of subquery coalescing will be presented and explained. I aim also to show that sometime de-coalescing can reveal to be a good tuning strategy as far as it might open a new path for an optimal execution plan. This article is based on the Oracle published paper entitled Enhanced Subquery Optimization in Oracle
Conjunction, Disjunction and Containment property
Two subqueries are eligible to be coalesced provided they verify the containment property. A subquery subq1 is said to contain a second subq2 if the result of subq2 is a subset of the subq1 result. In this case subq1 is called the container query block while subq2 is called the contained query block. The same two subqueries will verify the contained property when subq2 contains a conjunctive predicate which, if suppressed, makes subq1 and subq2 equivalent.
Let’s picture the containment property to make it crystal clear. Below I have the following three distinct query blocks
select
t1.*
from
t1,t2
where t2.id1 = t1.id1;
9000 rows selected.
select
t1.*
from
t1,t2
where t2.id1 = t1.id1
and t2.status = 'COM';
8900 rows selected.
select
t1.*
from
t1,t2
where t2.id1 = t1.id1
and t2.id1 = t2.id;
4 rows selected.
If we get rid of the predicate part on the status column from the second query block, the first and the second query become equivalent. They will then verify the containment property and therefore can be coalesced. If we look at the second and the third query blocks they are also verifying the containment property as far as the result set of the third query is a subset of the result set of the second one. That is the containment property definition.
A conjunction is the action of linking two subqueries with an AND operator. The conjunction of the two subqueries is true if the two subqueries are simultaneously true; otherwise it is false.
A disjunction is the action of liking two subqueries with an OR predicate. The disjunction of two subqueries is true when one of the two subqueries is true and is false when both are simultaneously true.
Coalescing two subqueries of the same type (exists)
Two conjunctive subqueries satisfying the containment property can be coalesced. Consider the following reproducible model (11.2.0.3)
create table t1
as select
rownum id1,
trunc((rownum-1/3)) n1,
date '2012-06-07' + mod((level-1)*2,5) start_date,
lpad(rownum,10,'0') small_vc,
rpad('x',1000) padding
from dual
connect by level <= 1e4;
create table t2
as select
rownum id
,mod(rownum,5) + mod(rownum,10)* 10 as id1
,case
when mod(rownum, 1000) = 7 then 'ERR'
when rownum <= 9900 then 'COM'
when mod(rownum,10) between 1 and 5 then 'PRP'
else
'UNK'
end status
,lpad(rownum,10,'0') as small_vc
,rpad('x',70) as padding
from dual
connect by level <= 1e4;
And the following query with two exists subqueries of the same type (exists)
select
start_date
,count(1)
from t1
where
start_date >= to_date('10062012','ddmmyyyy')
and (exists (select null
from t2
where t2.id1 = t1.id1
and t2.status = 'COM'
)
or exists (select null
from t2
where t2.id1 = t1.id1
and t2.id1 = t2.id
)
)
group by start_date;
START_DATE COUNT(1)
----------------- ----------
20120611 00:00:00 2
20120610 00:00:00 1
--------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | Buffers | Reads |
---------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 2 | 1950 | 1665 |
| 1 | HASH GROUP BY | | 1 | 1 | 2 | 1950 | 1665 |
|* 2 | HASH JOIN SEMI | | 1 | 1 | 3 | 1950 | 1665 |
|* 3 | TABLE ACCESS FULL | T1 | 1 | 4908 | 4000 | 1670 | 1665 |
| 4 | VIEW | VW_SQ_1 | 1 | 11843 | 9894 | 280 | 0 |
| 5 | UNION-ALL | | 1 | | 9894 | 280 | 0 |
|* 6 | TABLE ACCESS FULL| T2 | 1 | 1 | 4 | 140 | 0 |
|* 7 | TABLE ACCESS FULL| T2 | 1 | 11842 | 9890 | 140 | 0 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("VW_COL_1"="T1"."ID1")
3 - filter("START_DATE">=TO_DATE(' 2012-06-10 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
6 - filter("T2"."ID1"="T2"."ID")
7 - filter("T2"."STATUS"='COM')
Note
-----
- dynamic sampling used for this statement (level=2)
Since the above two exists subqueries verify the containment property, let’s then coalesce them and note the performance consequences we will reach through this transformation
select
start_date
,count(1)
from t1
where
start_date >= to_date('10062012','ddmmyyyy')
and (exists (select null
from t2
where t2.id1 = t1.id1
and(t2.status = 'COM'
or t2.id1 = t2.id)
)
)
group by start_date;
----------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | Buffers | Reads |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 2 | 1808 | 1665 |
| 1 | HASH GROUP BY | | 1 | 1 | 2 | 1808 | 1665 |
|* 2 | HASH JOIN SEMI | | 1 | 10 | 3 | 1808 | 1665 |
|* 3 | TABLE ACCESS FULL| T1 | 1 | 4000 | 4000 | 1669 | 1665 |
|* 4 | TABLE ACCESS FULL| T2 | 1 | 9890 | 9890 | 139 | 0 |
----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("T2"."ID1"="T1"."ID1")
3 - filter("START_DATE">=TO_DATE(' 2012-06-10 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
4 - filter(("T2"."STATUS"='COM' OR "T2"."ID1"="T2"."ID"))
Thanks to this coalescing operation we are now doing only a single access full t2 table instead of two initial full table scan of the same table. As such, we saved 140 logical I/O dropping the total number of buffers from 1950 to 1808.
Coalescing two subqueries of different type (exists and not exists)
Real life examples can show cases of subqueries verifying the containment property but having a different type. An example of such situation is presented in the below query:
select
start_date,
id1,
n1
from t1
where
start_date >= to_date('10062012','ddmmyyyy')
and exists (select null
from t2 a
where a.id1 = t1.id1
)
and not exists (select null
from t2 b
where b.id1 = t1.id1
and b.id1 != 83
);
START_DATE ID1 N1
----------------- ---------- ----------
20120611 00:00:00 83 82
----------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | Buffers | Reads |
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 | 1947 | 1665 |
|* 1 | HASH JOIN ANTI | | 1 | 7 | 1 | 1947 | 1665 |
|* 2 | HASH JOIN SEMI | | 1 | 10 | 3 | 1808 | 1665 |
|* 3 | TABLE ACCESS FULL| T1 | 1 | 4000 | 4000 | 1669 | 1665 |
| 4 | TABLE ACCESS FULL| T2 | 1 | 10000 | 10000 | 139 | 0 |
|* 5 | TABLE ACCESS FULL | T2 | 1 | 9000 | 9000 | 139 | 0 |
----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("B"."ID1"="T1"."ID1")
2 - access("A"."ID1"="T1"."ID1")
3 - filter("START_DATE">=TO_DATE(' 2012-06-10 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
5 - filter("B"."ID1"<>83)
The above query contains two distinct subqueries:
exists (select null
from t2 a
where a.id1 = t1.id1
)
not exists (select null
from t2 b
where b.id1 = t1.id1
and b.id1 != 83
);
They verify the containment property because if we get rid of the predicate (b.id1 != 83) from the second one it becomes equivalent to the first one. The sole difference is that they are of different types: the first one is an exists subquery while the second one is a not exists subquery. Coalescing such a kind of subquery yields to the following query
select
start_date,
id1,
n1
from t1
where
start_date >= to_date('10062012','ddmmyyyy')
and exists (select null
from t2 a
where a.id1 = t1.id1
having sum (case when a.id1 != 83
then 1 else 0 end) = 0
);
START_DATE ID1 N1
----------------- ---------- ----------
20120611 00:00:00 83 82
-----------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | Buffers | Reads |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 | 557K| 1665 |
|* 1 | FILTER | | 1 | | 1 | 557K| 1665 |
|* 2 | TABLE ACCESS FULL | T1 | 1 | 4000 | 4000 | 1670 | 1665 |
|* 3 | FILTER | | 4000 | | 1 | 556K| 0 |
| 4 | SORT AGGREGATE | | 4000 | 1 | 4000 | 556K| 0 |
|* 5 | TABLE ACCESS FULL| T2 | 4000 | 1000 | 3000 | 556K| 0 |
-----------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter( IS NOT NULL)
2 - filter("START_DATE">=TO_DATE(' 2012-06-10 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
3 - filter(SUM(CASE WHEN "A"."ID1"<>83 THEN 1 ELSE 0 END )=0)
5 - filter("A"."ID1"=:B1)
One of the immediate and obvious observations is that the coalescing process produces a dramatic performance alteration as far as we went from a clean 1947 logical I/O to a terrific 557 thousands of Buffers. This is a clear lesson that it is not because we’ve eliminated a full table access from the picture that we will have a performance improvement. Sometimes visiting twice a table might make the CBO generating an optimal execution plan. The collateral effect of the above coalescing process is that the CBO has been unable to unnest the new exist subquery and therefore it execute it as filter operation even though that the most resource expensive operation is operation 5. Important information in the new suboptimal plan is the terrific (4000) number of executed operations as shown by the Starts column in the execution plan. The operation at line 2 produces 4000 rows and it dictates the number of times the operation at line 3 (and its children operations at lines 4 and 5) it will be executed.
The CBO has been unable to unnest the new coalesced exist subquery because of the presence of the aggregate function (sum) as proofed by the corresponding 10053 trace file
*****************************
Cost-Based Subquery Unnesting
*****************************
SU: Unnesting query blocks in query block SEL$1 (#1) that are valid to unnest.
Subquery Unnesting on query block SEL$1 (#1)SU: Performing unnesting that
does not require costing.
SU: Considering subquery unnest on query block SEL$1 (#1).
SU: Checking validity of unnesting subquery SEL$2 (#2)
SU: SU bypassed: Failed aggregate validity checks.
SU: SU bypassed: Failed basic validity checks.
SU: Validity checks failed.
As far as unnesting is impossible de-coalescing might represent a good strategy in this case. And as far as we can go from the coalesced query to the original one thanks to the containment property, we can therefore open the unnesting path leading to original optimal plan we thought that we are going to improve by coalescing the two subqueries.
I was thinking in this particular case of subquery unnesting impossibility to take advantage of virtual column and hide the aggregation function in the coalesced exists subquery. Unfortunately it is impossible to create a virtual column using an aggregation function as shown below:
alter table t2 add aggr_virt number generated always
as (sum(case when id1 != 83 then 1 else 0 end)) virtual;
alter table t2 add aggr_virt number generated always as (sum(case when id1 != 83 then 1 else 0 end)) virtual
ERROR at line 1:
ORA-00934: group function is not allowed here
The Oracle documentation says the following about creating virtual columns
When you define a virtual column, the defining column_expr must refer only to columns of the subject table that have already been defined, in the current statement or in a prior statement.
Conclusion
Coalescing two subqueries of the same type might drastically reduce the number of logical I/O as far as it can eliminate an entire table access. Coalescing two subqueries of different types might pre-empt the CBO from taking advantage of the unnesting transformation. Fortunately if you know how to coalesce two different subqueries you will know how to de-coalesce them to allow the CBO taking advantage of unnesting the subqueries with their main query blocks.
Keep always an open eye on the concept of conjunction and disjunction with subqueries. The CBO is unable to unnest a subquery when it appears into a disjunction operation leading therefore to several executing of the subquery as a filter operation applied on the result set of the main query block. The CBO is however able to transform a conjunction of two subqueries of the same time into a single conjunction subquery and unnest it with its parent query block.