The damage that can be inflicted by a filter predicate applied on a big table is difficult to calculate. But it is clear that it has been almost always a tragic mistake to filter a huge amount of rows from a very large table. This article outlines one of those practical cases where the creation of a chirurgical index overcame the waste of time and energy spent at the table level in a very critical third party software query.
To begin with here’s a SQL monitoring report of this critical query taken from a 12cR2 release:
SELECT conRef h, vamisin h FROM TABLEHIST h WHERE h.typeisin <> 4 AND h.optionv NOT IN (SELECT vwx.ident FROM AGGREGATE_VW vwx WHERE (vwx.I1 = 2298535) ) AND h.backclient NOT IN (106, 28387) AND h.vamisin IN (91071113) Global Information ------------------------------ Status : DONE (ALL ROWS) Instance ID : 1 Session : XXXXXX (2198:9848) SQL ID : 7v8109fkuusw3 SQL Execution ID : 17529954 Execution Started : 08/01/2018 04:58:31 First Refresh Time : 08/01/2018 04:58:35 Last Refresh Time : 08/01/2018 04:59:16 Duration : 45s Module/Action : JDBC Thin Client/- Service : sphpre1 Program : JDBC Thin Client Fetch Calls : 6 Global Stats ================================================================= | Elapsed | Cpu | IO | Fetch | Buffer | Read | Read | | Time(s) | Time(s) | Waits(s) | Calls | Gets | Reqs | Bytes | ================================================================= | 44 | 2.35 | 42 | 6 | 81930 | 39157 | 311MB | ================================================================= SQL Plan Monitoring Details (Plan Hash Value=4122899908) ======================================================================================================== | Id | Operation | Name | Time | Start | Rows | | | | | Active(s) | Active | (Actual) | ======================================================================================================== | 0 | SELECT STATEMENT | | 42 | +4 | 50549 | | 1 | HASH JOIN ANTI | | 42 | +4 | 50549 | | 2 | TABLE ACCESS BY INDEX ROWID BATCHED |TABLEHIST | 42 | +4 | 50549 | | 3 | INDEX RANGE SCAN |TABLEHIST_IND4 | 42 | +4 | 281K | | 4 | VIEW | | 1 | +45 | 12883 | | 5 | CONNECT BY WITH FILTERING (UNIQUE) | | 1 | +45 | 21028 | | 6 | INDEX RANGE SCAN |IDX_AGGREGAT_VIEW_IDENT | 1 | +45 | 40 | | 7 | HASH JOIN | | 1 | +45 | 0 | | 8 | NESTED LOOPS | | 1 | +45 | 20988 | | 9 | STATISTICS COLLECTOR | | 1 | +45 | 9 | | 10 | CONNECT BY PUMP | | 1 | +45 | 21028 | | 11 | INDEX RANGE SCAN |IDX_AGGREGAT_VIEW_IDENT | 1 | +45 | 20988 | | 12 | INDEX FAST FULL SCAN |IDX_AGGREGAT_VIEW_IDENT | | | | ======================================================================================================== Predicate Information (identified by operation id): --------------------------------------------------- 1 - access("H"."OPTIONV"="IDENT") 2 - filter(("H"."BACKCLIENT"<>:SYS_B_3 AND "H"."BACKCLIENT"<>:SYS_B_2 AND "H"."TYPEISIN"<>:SYS_B_0)) 3 - access("H"."VAMISIN"=:1) 4 - filter((TO_NUMBER(REPLACE(REGEXP_SUBSTR ("IDENT_PATH",'@@(.+?)(@@|$)',1,1,HEXTORAW('5824EBC2000')),'@@',NULL))=:SYS_B_1 AND "IDENT">0)) 5 - access("F"."MGR"=PRIOR NULL) 6 - access("F"."MGR"=1) 9 - access("connect$_by$_pump$_005"."prior f.ident "="F"."MGR") Note ----- - this is an adaptive plan
In their attempt to fix this performance pain, DBAs and developers were focusing their efforts in reviewing the part of the execution plan that involves the hierarchical query (operation n°4 to 12).
But whoever knows how to interpret correctly the multitude information provided by the SQL monitor report would have immediately pointed out where most of the query time is spent. Look at the Start Active column of operations n°0 to 3: they all shows +4 seconds. This means that those operations started to be active 4 seconds after the beginning of the query. And whoever knows how to interpret an execution plan would have pointed out that the parent-child operation n°2-3 is the first to be started by Oracle. Look now to the Time Active(s) column of these two operations. They both indicate 42 seconds. This means that the table/index operation 2-3 took 42 seconds. Since the total duration of this query is 45 seconds we know then clearly what part of the query we have to address.
The index at operations n°3 supplied its parent table at line n°2 with 281K of rowids which has thrown away all but only 50549 using the filter predicate n°2
2 - filter(("H"."BACKCLIENT"<>:SYS_B_3 AND "H"."BACKCLIENT"<>:SYS_B_2 AND "H"."TYPEISIN"<>:SYS_B_0))
Most of the 81930 buffer gets come from the TABLEHIST table at operation line n°2. We need then to engineer an index starting by the column in the predicate on which an equality is applied (VAMISIN) followed by the other three columns from the other predicates on which an inequality is applied (OPTIONV, BACKCLIENT, TYPEISIN).
SQL> create index TABLEHIST_IND_MHO on TABLEHIST (VAMISIN, OPTIONV, BACKCLIENT, TYPEISIN)
Since the select part of this critical query involves only two columns of which one (VAMISIN) is already included in the new index then why not add the second selected column to this index and avoid visiting the table at all?
SQL> create index TABLEHIST_IND_MHO on TABLEHIST (VAMISIN, OPTIONV, BACKCLIENT, TYPEISIN, CONREF)
Although this will increase the size of the index it has, however, the potentiality to eliminate 96% of I/O.
Indeed, following the creation of the new index, Oracle has used the following execution plan:
------------------------------------------------------------------------------- | Id | Operation | Name | Rows | ------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | | |* 1 | HASH JOIN ANTI | | 1 | |* 2 | INDEX RANGE SCAN |TABLEHIST_IND_MHO | 49 | |* 3 | VIEW | | 89 | |* 4 | CONNECT BY WITH FILTERING (UNIQUE)| | | |* 5 | INDEX RANGE SCAN |IDX_AGGREGAT_VIEW_IDENT | 5 | | 6 | NESTED LOOPS | | 84 | | 7 | CONNECT BY PUMP | | | |* 8 | INDEX RANGE SCAN |IDX_AGGREGAT_VIEW_IDENT | 17 | ------------------------------------------------------------------------------- SQL> @sqlstats Enter value for sql_id: 7v8109fkuusw3 CHILD SQL_PROFILE PLAN_HASH_VALUE AVG_GETS EXECS ---------- -------------------- --------------- ---------- -------- 1 860019752 3651 204
As you can see Oracle is not visiting anymore the TABLEHIST table and the reduction of the logical I/O consumption is very remarkable as it went from 81930 to 3651.