Mohamed Houri’s Oracle Notes

December 9, 2019

AWR bulk collect

Filed under: Oracle — hourim @ 5:08 pm

There are many posts out there explaining how to collect a bunch of AWR reports which I could have just referenced and leave it there, but as much for my own documentation as anything else, I decided to write this simple note as an answer to my client request of generating and analyzing 49 one-hour AWR reports.

First, the SQL script to generate a bulk of AWR reports between two dates:

-- +----------------------------------------------------------------------------+
-- | Author : Mohamed Houri                                                     |
-- |----------------------------------------------------------------------------|
-- | DATABASE : 12cR1                                                           |
-- | Name     : AwrBulkcollect.sql                                              |
-- | PURPOSE  : Dynamically create a SQL script to generate a list of AWR       |
-- |            reports between two snapshots dates.                            |
-- | NOTE     : As with any code, ensure to test this script in a development   |
-- |            environment before attempting to run it in production.          |
-- | Remarks : CHR(10) adds a new line                                          |
-- |           SET termout off so that AWR content will not be displayed        |
-- |           For RAC configuration consider the instance number               |       
-- |          Enter input date in this format :'mmddyyyy hh24:mi:ss'            |
-- |                                                                            |
-- ------------------------------------------------------------------------------
set termout off
set head off
set define off
spool collectAWRs.sql

SELECT
    'spool awr_XYZ_inst_1_'
    || t.si
    || '_'
    || t.se
    || '.text '
    || CHR(10)
    || 'SELECT * FROM TABLE(dbms_workload_repository.awr_report_text('
    || t.dbid
    || ','
    || t.instance_number
    || ','
    || t.si
    || ','
    || t.se
    || '));'
    || CHR(10)
    || ' spool off;'
FROM
    (SELECT
            dbid,
            snap_id si,
            snap_id + 1 se,
            instance_number
     FROM
            dba_hist_snapshot
     WHERE
            begin_interval_time >   TO_DATE('27102019 18:00:00', 'ddmmyyyy hh24:mi:ss')
      AND end_interval_time   <=  TO_DATE('27102019 19:30:00', 'ddmmyyyy hh24:mi:ss')
      AND  instance_number = 1
      ) t;
	
spool off;
-- +-------------------------------------+	
  prompt running collectAWRs.sql script |
-- +-------------------------------------+
@collectAWRs.sql

Depending on the number of snapshots found between the two entered dates the output of the script might look like the following:

spool awr_XYZ_inst_1_11923_11924.text                                                                                   
SELECT * FROM TABLE(dbms_workload_repository.awr_report_text(4209020461,1,11923,11924));                                
spool off;                                                                                                             
                                                                                                                        
spool awr_XYZ_inst_1_11922_11923.text                                                                                   
SELECT * FROM TABLE(dbms_workload_repository.awr_report_text(4209020461,1,11922,11923));                                
spool off;     

etc.…

When the script ends you can see such a list of AWR files

Mode                LastWriteTime         Length Name
----                -------------         ------ ----
-a----       10/28/2019   2:45 PM         451751 awr_XYZ_inst_1_11946_11947.text
-a----       10/28/2019   2:46 PM         464089 awr_XYZ_inst_1_11952_11953.text
-a----       10/28/2019   2:47 PM         446377 awr_XYZ_inst_1_11945_11946.text
-a----       10/28/2019   2:48 PM         494261 awr_XYZ_inst_1_11949_11950.text
-a----       10/28/2019   2:49 PM         447109 awr_XYZ_inst_1_11954_11955.text
-a----       10/28/2019   2:50 PM         440269 awr_XYZ_inst_1_11956_11957.text
-a----       10/28/2019   2:51 PM         448697 awr_XYZ_inst_1_11955_11956.text
-a----       10/28/2019   2:52 PM         441979 awr_XYZ_inst_1_11942_11943.text
-a----       10/28/2019   2:53 PM         444789 awr_XYZ_inst_1_11943_11944.text
-a----       10/28/2019   2:54 PM         424755 awr_XYZ_inst_1_11947_11948.text
-a----       10/28/2019   2:55 PM         441857 awr_XYZ_inst_1_11951_11952.text
-a----       10/28/2019   2:55 PM         445155 awr_XYZ_inst_1_11944_11945.text
-a----       10/28/2019   2:56 PM         468731 awr_XYZ_inst_1_11948_11949.text
-a----       10/28/2019   2:57 PM         449185 awr_XYZ_inst_1_11950_11951.text
-a----       10/28/2019   2:58 PM         428541 awr_XYZ_inst_1_11953_11954.text
-a----       10/28/2019   2:59 PM         442711 awr_XYZ_inst_1_11957_11958.text

Top 10 Foreground Events

This list of AWR reports has been requested by the Oracle support so that tuning suggestions can be given to my client. One of the suggestions given by Oracle was that reducing the number of log file sync wait event will help improving the performance of the application. I was curious to know what in the AWR reports list makes Oracle support suggesting this log file sync issue. So, I decided to examine the 45 Top 10 Foreground Events to see how much the log file sync wait event participate to the % of the total DB time.

For that, I used the following simple PowerShell script to generate a single file containing only the Top 10 Foreground Events of the totality of the 45 AWR reports:

$SearchString = 'Top 10 Foreground Events by Total Wait Time'
$AllAwrs=Get-ChildItem -Recurse -Include *.text	
for ($i=0; $i -lt $AllAwrs.Count; $i++)
 {
	$result = Select-String $SearchString $AllAwrs[$i].FullName -Context 0, 14 | % {$_.Context.PostContext} 
    $result |  Out-File Top10ForegroundWaitevents.txt   -Append	
 }     	


And, finally, all what I have to do was to analyze the Top10ForegroundWaitevents.txt file

 Event                                 Waits Time Avg(ms)   time Wait Class 
------------------------------ ------------ ---- ------- ------ ---------- 
DB CPU                                      1820           55.2            
db file scattered read               44,406 1032      23   31.3 User I/O   
db file sequential read           2,346,969 735.       0   22.3 User I/O   
direct path write temp                4,085 50.1      12    1.5 User I/O   
log file sync                         3,688 18.3       5     .6 Commit     --> only 0.6%
direct path read temp                13,173 14.2       1     .4 User I/O   
enq: TX - row lock contention           133    1       7     .0 Applicatio 
control file sequential read         11,846   .6       0     .0 System I/O 
db file parallel read                    13   .6      43     .0 User I/O   
direct path read                      2,556   .5       0     .0 User I/O   
                                                                           
                                            Tota    Wait   % DB            
Event                                 Waits Time Avg(ms)   time Wait Class 
------------------------------ ------------ ---- ------- ------ ---------- 
db file scattered read              212,869 3439      16   94.1 User I/O   
DB CPU                                      422.           11.5            
control file sequential read         11,846   .5       0     .0 System I/O 
OJVM: Generic                           139   .5       3     .0 Other      
db file sequential read               5,270   .3       0     .0 User I/O   
log file sync                            67   .3       4     .0 Commit     --> almost 0.0% 
Disk file operations I/O                373    0       0     .0 User I/O   
direct path read                        108    0       0     .0 User I/O   
SQL*Net message to client             1,706    0       0     .0 Network    
direct path sync                          1    0       2     .0 User I/O   
                                                                           
                                            Tota    Wait   % DB            
Event                                 Waits Time Avg(ms)   time Wait Class 
------------------------------ ------------ ---- ------- ------ ---------- 
DB CPU                                      3918           91.8            
db file scattered read               11,553  235      20    5.5 User I/O   
db file sequential read              28,353 87.4       3    2.0 User I/O   
log file sync                         9,537   78       8    1.8 Commit     --> only 1.8% 
resmgr:become active                      4   10    2512     .2 Scheduler  
enq: TX - row lock contention           344  4.3      13     .1 Applicatio 
direct path read temp                 1,241    1       1     .0 User I/O   
direct path read                      6,324    1       0     .0 User I/O   
direct path write temp                  130   .8       6     .0 User I/O   
control file sequential read         11,871   .7       0     .0 System I/O

Etc.….

Bottom-line

In this very simple note I gave a SQL script to generate multiple AWR reports between two dates. I have also examined, via a PowerShell script, how to loop over the content of those AWR reports and extract only an AWR section into a single file for a better and clean analysis.

November 28, 2019

DOP Downgrade

Filed under: Oracle — hourim @ 5:53 pm

Suppose you are confronted to a performance issue due to an initially optimized parallel execution plan that has been 100% DOP downgraded. Suppose also that this DOP downgrade is due to an Oracle Resource Manager max DOP limit directive.
Would you be able to point out this DOP downgrade root cause using gv$sql_monitor view? Would you be able as well to say, a posteriori (using AWR tables), that your issue was due to ORM max DOP limit?

This blog post aims to answer the above two questions.

1.Replaying the issue using a reproducible model

While it is not so straightforward to link a real-life performance issue to a DOP downgrade due to a resource manager maxim allowed DOP, it is nevertheless, very easy to model and reproduce this case at will.

RDEV>  create table t1 as select a.* from dba_objects a,dba_objects where rownum<= 1e6;

RDEV2> alter table t1 parallel 4;

RDEV2> select count(1) from t1 where owner >= 'A';

Parallel Execution Details (DOP=4 , Servers Allocated=4)
===================================================================
|      Name      | Type  | Server# | Elapsed |   Cpu   |    IO    |
|                |       |         | Time(s) | Time(s) | Waits(s) |
===================================================================
| PX Coordinator | QC    |         |    0.12 |    0.01 |          |
| p000           | Set 1 |       1 |    0.20 |    0.03 |     0.15 |
| p001           | Set 1 |       2 |    0.20 |    0.03 |     0.15 |
| p002           | Set 1 |       3 |    0.20 |    0.04 |     0.15 |
| p003           | Set 1 |       4 |    0.20 |    0.04 |     0.12 |
===================================================================
-------------------------------------------------------------------
| Id  | Operation              | Name     | Rows  |    TQ  |IN-OUT|
-------------------------------------------------------------------
|   0 | SELECT STATEMENT       |          |       |        |      |
|   1 |  SORT AGGREGATE        |          |     1 |        |      |
|   2 |   PX COORDINATOR       |          |       |        |      |
|   3 |    PX SEND QC (RANDOM) | :TQ10000 |     1 |  Q1,00 | P->S |
|   4 |     SORT AGGREGATE     |          |     1 |  Q1,00 | PCWP |
|   5 |      PX BLOCK ITERATOR |          |  1000K|  Q1,00 | PCWC |
|*  6 |       TABLE ACCESS FULL| T1       |  1000K|  Q1,00 | PCWP |
-------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   6 - access(:Z>=:Z AND :Z<=:Z)
       filter("OWNER">='A')

Note
-----
   - Degree of Parallelism is 4 because of table property
 

As you can see, the Degree Of Parallelism 4 has been honored and the SQL query has been executed using Parallelism.
However, if under the hood, I create an Oracle Resource Manager which will limit the maximum DOP allowed to be used by the current user (RDEV) then the same query will be DOP downgraded causing performance pain like
in one of my real-life case (a blog post will follow):

BEGIN
-- clear pending area
  DBMS_RESOURCE_MANAGER.clear_pending_area();
  DBMS_RESOURCE_MANAGER.create_pending_area();
  
-- create resource plan
  DBMS_RESOURCE_MANAGER.create_plan(
    plan    => 'DOPDOWNGRAD_RPL',
    comment => 'Resource Plan limiting DOP');
    
-- create consumer group
  DBMS_RESOURCE_MANAGER.create_consumer_group(
    consumer_group => 'DOPDONWGRAD_RSG',
    comment        => 'LOW Parallel Priority group');

-- assign the DOP limitation to the consumer group
  DBMS_RESOURCE_MANAGER.create_plan_directive (
    plan                     => 'DOPDOWNGRAD_RPL',
    group_or_subplan         => 'DOPDONWGRAD_RSG',
    comment                  => 'LOW PDOP Priority',    
    parallel_degree_limit_p1 => 1);

-- This is mandatory to avoid ORA-29377: consumer group OTHER_GROUPS is not part of top-plan DOPDOWNGRAD_RPL
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
    PLAN             => 'DOPDOWNGRAD_RPL',
    GROUP_OR_SUBPLAN => 'OTHER_GROUPS',
    MGMT_P1          => 10,
    COMMENT          => 'Directive for OTHER_GROUPS (mandatory)');
    
 -- validate the resource plan
  DBMS_RESOURCE_MANAGER.validate_pending_area;
  DBMS_RESOURCE_MANAGER.submit_pending_area();
END;
/

The preceding anonymous PL/SLQ block creates the DOPDOWNGRAD_RPL resource plan, attaches to it the DOPDONWGRAD_RSG consumer group and assigns a directive to this consumer group so that maximum DOP allowed by this consumer group would be 1.

To finish the setup, I will assign the user RDEV to this consumer group and set the DOPDOWNGRAD_RPL plan as the default resource plan:

-- Affect user RDEV to resource group
BEGIN
    DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA();
    DBMS_RESOURCE_MANAGER.SET_CONSUMER_GROUP_MAPPING(
       ATTRIBUTE      => DBMS_RESOURCE_MANAGER.ORACLE_USER,
       VALUE          => 'RDEV',
       CONSUMER_GROUP => 'DOPDONWGRAD_RSG');
	  
   DBMS_RESOURCE_MANAGER.VALIDATE_PENDING_AREA();
   DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA();
END;
/

We are now ready to run again the same query and check whether it has been executed parallelly or it has been pre-empted by the DOPDOWNGRAD_RPL resource plan to use parallelism:

RDEV> alter system set resource_manager_plan='DOPDOWNGRAD_RPL';

System altered.

RDEV> show parameter resource_manager_plan

NAME                    TYPE        VALUE
----------------------- ----------- ----------------
resource_manager_plan   string      DOPDOWNGRAD_RPL

Let’s execute the same query using RDEV user. Remember that this user belongs to an oracle resource manager group with a parallel degree limit directive as shown above:

SQL> SELECT /*+ monitor */ count(1) from t1 where owner >= 'A';

  COUNT(1)
----------
   1000000

RDEV> @xpsimp

SQL_ID  b60wdaqys7wf9, child number 0
-------------------------------------
SELECT /*+ monitor */ count(1) from t1 where owner >= 'A'

Plan hash value: 3110199320
---------------------------------------------------------------------------------------
| Id  | Operation              | Name     | Rows  | Bytes |   TQ  |IN-OUT| PQ Distrib |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |          |       |       |       |      |            |
|   1 |  SORT AGGREGATE        |          |     1 |     4 |       |      |            |
|   2 |   PX COORDINATOR       |          |       |       |       |      |            |
|   3 |    PX SEND QC (RANDOM) | :TQ10000 |     1 |     4 | Q1,00 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE     |          |     1 |     4 | Q1,00 | PCWP |            |
|   5 |      PX BLOCK ITERATOR |          |  1000K|  3906K| Q1,00 | PCWC |            |
|*  6 |       TABLE ACCESS FULL| T1       |  1000K|  3906K| Q1,00 | PCWP |            |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   6 - access(:Z>=:Z AND :Z<=:Z)
       filter("OWNER">='A')

The absence of the Note line at the bottom of the execution plan represents, already, a first indication that the query has not been executed using parallelism. Indeed, this is confirmed via the corresponding SQL Monitoring report:

Global Information
------------------------------
 Status              :  DONE (ALL ROWS)     
 Instance ID         :  1                   
 Session             :  RDEV (273:24189)    
 SQL ID              :  b60wdaqys7wf9       
 SQL Execution ID    :  16777216            
 Execution Started   :  11/07/2019 09:02:53 
 First Refresh Time  :  11/07/2019 09:02:53 
 Last Refresh Time   :  11/07/2019 09:02:53 
 Duration            :  .414398s            
 Module/Action       :  SQL*Plus/-          
 Service             :  diiqazbx            
 Program             :  sqlplus.exe         
 DOP Downgrade       :  100%        --> here            
 Fetch Calls         :  1   

The ACTIVE type of the same SQL Monitoring report shows the reason:

Other Plan Line Statistics

   Min DOP after downgrade 1
         downgrade reason  352

Thanks to Yasin Baskan, I knew that the description of the downgrade reason codes differs from release to release. We need to use the following query in the corresponding database (12cR2 in the current case) to get the actual significance of each code:

-- +-----------------------------------|
-- query supplied by Yasin Baskan
-- SQL> desc x$qksxa_reason
--		 Name           Type
--		 -------------- ---------------|
--		 ADDR           RAW(8)
--		 INDX           NUMBER
--		 INST_ID        NUMBER
--		 CON_ID         NUMBER
--		 QKSXAREASONS   VARCHAR2(255)
-- +-----------------------------------|
col qksxareasons format a100
col indx         format 999
SELECT
    indx
   ,qksxareasons
FROM
    x$qksxa_reason
WHERE
    qksxareasons like '%DOP downgrade%';   	


INDX QKSXAREASONS
---- -----------------------------------------------------
 351 DOP downgrade due to adaptive DOP
 352 DOP downgrade due to resource manager max DOP
 353 DOP downgrade due to insufficient number of processes
 354 DOP downgrade because slaves failed to join

We finally, have found the real root cause of the simulated performance issue: 352 DOP downgrade due to resource manager max DOP

2. Back to the aim of the article

The first questions that came to my mind when I have started troubleshooting the DOP Downgrade real-life case were:

• Can I find any Oracle Resource manager related information in the gv$sql_monitor view ?
• Can I find historical (AWR) execution statistics of any active Oracle Resource manager actions ?

2.1. v$sql_monitor

As per regards to the first question, hopefully the gv$sql_monitor view contains 4 columns which seem to be strictly related to the Oracle resource manager as shown below with column name starting by RM_:

SQL> @desc gv$sql_monitor

 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 INST_ID                                            NUMBER
 KEY                                                NUMBER
 REPORT_ID                                          NUMBER
 STATUS                                             VARCHAR2(19)
 USER#                                              NUMBER
 USERNAME                                           VARCHAR2(128)
../..
 RM_LAST_ACTION                                     VARCHAR2(48)
 RM_LAST_ACTION_REASON                              VARCHAR2(128)
 RM_LAST_ACTION_TIME                                DATE
 RM_CONSUMER_GROUP                                  VARCHAR2(128)

Let’s see then what Oracle has stored in those four RM_ columns:

RDEV> SELECT
 decode(process_name, 'ora', 'QC', process_name) as process_name
 ,rm_consumer_group
 ,rm_last_action
 ,rm_last_action_reason
 ,rm_last_action_time
     FROM
        gv$sql_monitor
    WHERE 
        sql_id = 'b60wdaqys7wf9';

PROCES RM_CONSUMER_GROUP RM_LAST_ACTION RM_LAST_ACTION_REASO RM_LAST_ACTION_TIME
------ ----------------- -------------- -------------------- --------------------
QC     DOPDONWGRAD_RSG   {null}               {null}               {null}

Unfortunately, it looks like only the column RM_CONSUMER_GROUP is filled up by Oracle. The remaining RM_ columns are always null.

However, it looks like that whenever the column RM_CONSUMER_GROUP contains a value which is not equal to OTHER_GROUPS (like in the current example where RM_CONSUMER_GROUP= DOPDONWGRAD_RSG), we can infer that the current monitored query
is subject to a resource manager directive which might influence its execution.

For example, let’s execute the same query when connected with a different user (RDEV2) and see what information will be stored in the gv$sql_monitor view:

RDEV2> SELECT /*+ monitor */ count(1) from t1 where owner >= 'A'

COUNT(1)
----------
   1000000

SQL_ID  b60wdaqys7wf9, child number 2
-------------------------------------
SELECT /*+ monitor */ count(1) from t1 where owner >= 'A'
Plan hash value: 3110199320
--------------------------------------------------------------------------------
| Id  | Operation              | Name     | Rows  |    TQ  |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |          |       |        |      |            |
|   1 |  SORT AGGREGATE        |          |     1 |        |      |            |
|   2 |   PX COORDINATOR       |          |       |        |      |            |
|   3 |    PX SEND QC (RANDOM) | :TQ10000 |     1 |  Q1,00 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE     |          |     1 |  Q1,00 | PCWP |            |
|   5 |      PX BLOCK ITERATOR |          |  1000K|  Q1,00 | PCWC |            |
|*  6 |       TABLE ACCESS FULL| T1       |  1000K|  Q1,00 | PCWP |            |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   6 - access(:Z>=:Z AND :Z<=:Z)
       filter("OWNER">='A')

Note
-----
   - Degree of Parallelism is 4 because of table property

RDEV2> SELECT  
   username
  ,decode(process_name, 'ora', 'QC', process_name) as process_name
  ,rm_consumer_group
  ,rm_last_action
  ,rm_last_action_reason
  ,rm_last_action_time
      FROM
        gv$sql_monitor
      WHERE 
        sql_id = 'b60wdaqys7wf9'
order by last_refresh_time;

USERNAME  PROCES  RM_CONSUMER_GROUP  RM_LAST_ACTION  RM_LAST_ACTION_REASO  RM_LAST_ACTION_TIME
-------- -------- ----------------- ---------------- -------------------- --------------------
RDEV      QC     DOPDONWGRAD_RSG    {null}           {null}                {null}
RDEV2     QC     OTHER_GROUPS       {null}           {null}                {null}                 
{null}    p002   OTHER_GROUPS       {null}           {null}                {null}                 
{null}    p000   OTHER_GROUPS       {null}           {null}                {null}                 
{null}    p003   OTHER_GROUPS       {null}           {null}                {null}                 
{null}    p001   OTHER_GROUPS       {null}           {null}                {null}     

When the same query is executed by user RDEV2 there is no DOP downgrade because this user doesn’t belong to any user created resource manager group. There are two information backing this claim:

• User RDEV2 has OTHER_GROUPS as a default consumer group
• There is 4 PX server associated with the QC of user RDEV2

3.2. AWR and Oracle Resource manager
The second investigation was to see whether I can, a posteriori, diagnose that my query underwent a resource manager directive; particularly those concerning parallel degree limit. For that reason, I wrote the following script

SQL> @getConsGrpHist
SELECT
   begin_time
  ,execs 
  ,avg_etime
  ,avg_px
 -- ,trunc(avg_etime/decode(avg_px,0,1,avg_px) ,2) avg_px_time
  ,plan_name
  ,cpu_managed
  ,parallel   
FROM
   (SELECT 	   
        sn.begin_interval_time      begin_time
       ,st.executions_delta         execs 
       ,st.end_of_fetch_count_delta end_of_fetch 
       ,trunc(st.elapsed_time_delta/1e6/decode(st.executions_delta, 0, 1,st.executions_delta))   avg_etime
       ,round(st.px_servers_execs_delta/decode(st.executions_delta,0,1, st.executions_delta), 1) avg_px	  
       ,sr.plan_name
       ,sr.cpu_managed
       ,sr.parallel_execution_managed parallel    
    FROM 
         dba_hist_rsrc_plan sr
        ,dba_hist_snapshot  sn
        ,dba_hist_sqlstat   st
    WHERE
	sr.snap_id = sn.snap_id
    AND sn.snap_id = st.snap_id
    AND sn.begin_interval_time between to_date('&from_date', 'ddmmyyyy hh24:mi:ss')
				      and    to_date('&to_date'  , 'ddmmyyyy hh24:mi:ss')
    AND st.sql_id ='&sql_id'
    AND plan_name != 'DEFAULT_MAINTENANCE_PLAN'	
	) 
order by begin_time asc;      

SQL> @GetConsGrpHist
Enter value for from_date: 08112019 15:07:00
Enter value for to_date: 08112019 15:09:00
Enter value for sql_id: b60wdaqys7wf9

BEGIN_TIME                 EXECS  AVG_ETIME  AVG_PX PLAN_NAME       CPU_MANAGED PARALLEL
------------------------- ------ ---------- ------- --------------- ----------- ----------
08-NOV-19 03.07.08.369 PM      1          0       0 DOPDOWNGRAD_RPL ON          FULL
08-NOV-19 03.07.08.369 PM      1          0       0 DOPDOWNGRAD_RPL ON          FULL

Indeed, I can see that my query has used a non-default resource plan (DOPDOWNGRAD_RPL) in which a parallel statement queuing is enabled as indicated by the FULL value of the PARALLEL column.

Bottom-line

In this simple Note I outlined two main points:

• Show how to identify a DOP downgrade and its reason
• Show how to link a non-default Oracle resource manager to a SQL_ID by checking the value of the column RM_CONSUMER_GROUP from the gv$sql_monitor view
• Show that the remaining 3 resource manager columns (RM_LAST_ACTION, RM_LAST_ACTION_REASON, RM_LAST_ACTION_TIME) in the gv$sql_monitor view are, unfortunately, not maintained by Oracle
• Show how to look in AWR historical execution for queries subject to user defined Oracle Resource Manager directives

November 14, 2019

Incremental and ruin

Filed under: Oracle — hourim @ 8:02 pm

It’s hardly a secret that many people are not fan of histograms. The main problem of them is that they are not very well documented, and they drive, under the hood, many other features like Adaptive Cursor Sharing. They also interact with gathering statistics for partitioned tables when the incremental preference is set to TRUE.

As you might know, if you have read my two preceding articles here and here, I have been struggling with recurrent PeopleSoft performance issues because of a wrong cardinality estimation when visiting a partitioned table. I couldn’t figure out exactly the root cause of this cardinality mis-estimation so that I opened an Oracle Service Request (SR). As an answer to my request, Oracle support send me a reference to my two articles mentioned above😊. While I was still waiting for a viable answer from the Oracle support, the same performance issue kicked in and I was asked to solve it as quickly as possible.

After a couple of thinking minutes, I decided to change the incremental preference of that partitioned table and regather its statistics as in the following:

SQL> exec dbms_stats.set_table_prefs('SYSADM', 'PS_XYZ', 'incremental', 'false');
SQL> exec dbms_stats.gather_table_stats('SYSADM', 'PS_XYZ',degree => 4);

This made the culprit query, identified by the SQL_ID 147gz7h1x7jkq, going from a non-ending 42,886 seconds

Global Information
------------------------------
 Status              :  EXECUTING                         
 Instance ID         :  1                                 
 Session             :  SYSADM (23:47269)                 
 SQL ID              :  147gz7h1x7jkq                     
 SQL Execution ID    :  16777219                          
 Execution Started   :  08/28/2019 21:19:19               
 First Refresh Time  :  08/28/2019 21:19:25               
 Last Refresh Time   :  08/29/2019 09:14:05               
 Duration            :  42886s                            
 

to about only 5 seconds for the worst case

SQL> @sqlstats
Enter value for sql_id: 147gz7h1x7jkq

     CHILD    INST_ID PLAN_HASH_VALUE   AVG_GETS LAST_ACTI            AVG_ETIME      EXECS
---------- ---------- --------------- ---------- ------------------- ---------- ----------
         0          1      3376574399      23452 29/08/2019 17:09:08   5,357016          1
         0          2       430435019       2961 29/08/2019 17:09:36   1,366556          1

The goal of this article is to explain the negative effect the incremental mode can have on the effective use of histogram during cardinality estimation.

By the way, if you are wondering why the same query has got two different execution plans (3376574399,430435019) at almost the same time, then bear in mind that this is the effect of a GTT table having a SESSION private statistics.

1.Few words about the Setup

This is a PeopleSoft software using Oracle 12cR1 under Exadata machine. The cardinality misestimation occurs when Oracle visits a range (or a range/list) partitioned table of 49 partitions. The partition key is of varchar2(11) data type having a uniform distribution for certain partitions, and a skewed distribution at others. The incremental preference of the partitioned table is set to TRUE.

Here’s an image of the partition key column (EMPLID) statistics when the performance issue occurred (incremental = TRUE) and after it has been fixed (incremental =FALSE) respectively (at global and partition level):

-- incremental = TRUE
SELECT
   table_name
  ,column_name
  ,notes
FROM
  all_tab_col_statistics
WHERE 
    owner       = 'SYSADM'
AND table_name  = 'PS_XYZ'
AND column_name = 'EMPLID';    
  
TABLE_NAME  COLUMN_NAME   HISTOGRAM       NOTES   
----------- ------------- --------------- ----------------------------------
PS_XYZ      EMPLID        HYBRID          INCREMENTAL HIST_FOR_INCREM_STATS        
SQL> select
         partition_name
        ,num_distinct
        ,density
        ,histogram
        ,notes
    from
        all_part_col_statistics
    where owner = 'SYSADM'
    and table_name = 'PS_XYZ'
    and column_name = 'EMPLID';


PARTITION_NAME  NUM_DISTINCT    DENSITY HISTOGRAM       NOTES                 
--------------- ------------ ---------- --------------- ----------------------
PS_XYZ_T_1             19888     ,00005 HYBRID                                
PS_XYZ_T_10             1191    ,000831 HYBRID          HIST_FOR_INCREM_STATS 
PS_XYZ_T_11             1153    ,000861 HYBRID          HIST_FOR_INCREM_STATS 
PS_XYZ_T_12             1187    ,000835 HYBRID          HIST_FOR_INCREM_STATS 
PS_XYZ_T_13             1390    ,000707 HYBRID          HIST_FOR_INCREM_STATS 
PS_XYZ_T_14             1733    ,000552 HYBRID          HIST_FOR_INCREM_STATS 
PS_XYZ_T_15             1950    ,000479 HYBRID          HIST_FOR_INCREM_STATS 
PS_XYZ_T_16             1842    ,000516 HYBRID          HIST_FOR_INCREM_STATS 
PS_XYZ_T_17             1584    ,000611 HYBRID          HIST_FOR_INCREM_STATS 
PS_XYZ_T_18             1714     ,00056 HYBRID          HIST_FOR_INCREM_STATS 
PS_XYZ_T_19             1410    ,000697 HYBRID          HIST_FOR_INCREM_STATS 
PS_XYZ_T_2              1059    ,000941 HYBRID          HIST_FOR_INCREM_STATS 
PS_XYZ_T_20             1460    ,000672 HYBRID          HIST_FOR_INCREM_STATS 
PS_XYZ_T_21             1899    ,000526 HYBRID          HIST_FOR_INCREM_STATS 
PS_XYZ_T_22             1916     ,00049 HYBRID          HIST_FOR_INCREM_STATS 
PS_XYZ_T_23             1568    ,000621 HYBRID          HIST_FOR_INCREM_STATS 
PS_XYZ_T_24             1550    ,000625 HYBRID          HIST_FOR_INCREM_STATS 
PS_XYZ_T_25             1697    ,000568 HYBRID          HIST_FOR_INCREM_STATS 
PS_XYZ_T_26             1637     ,00059 HYBRID          HIST_FOR_INCREM_STATS 
PS_XYZ_T_27             1528    ,000639 HYBRID          HIST_FOR_INCREM_STATS 
PS_XYZ_T_28             1629    ,000596 HYBRID          HIST_FOR_INCREM_STATS 
PS_XYZ_T_29             1479    ,000663 HYBRID          HIST_FOR_INCREM_STATS 
PS_XYZ_T_3               960     ,00104 HYBRID          HIST_FOR_INCREM_STATS 
PS_XYZ_T_30             1480    ,000662 HYBRID          HIST_FOR_INCREM_STATS 
PS_XYZ_T_31             1839     ,00052 HYBRID          HIST_FOR_INCREM_STATS 
PS_XYZ_T_32             3007    ,000284 HYBRID          HIST_FOR_INCREM_STATS 
PS_XYZ_T_33             7693     ,00013 HYBRID          HIST_FOR_INCREM_STATS 
PS_XYZ_T_34             2776     ,00036 HYBRID          HIST_FOR_INCREM_STATS 
PS_XYZ_T_35             3653    ,000215 HYBRID          HIST_FOR_INCREM_STATS 
PS_XYZ_T_36             7940    ,000126 HYBRID          HIST_FOR_INCREM_STATS 
PS_XYZ_T_37             3277    ,000305 HYBRID          HIST_FOR_INCREM_STATS 
PS_XYZ_T_38             8461    ,000118 HYBRID          HIST_FOR_INCREM_STATS 
PS_XYZ_T_39             7855    ,000127 HYBRID          HIST_FOR_INCREM_STATS 
PS_XYZ_T_4              1764    ,000567 HYBRID          HIST_FOR_INCREM_STATS 
PS_XYZ_T_40             4878    ,000205 HYBRID          HIST_FOR_INCREM_STATS 
PS_XYZ_T_41             6361    ,000157 HYBRID          HIST_FOR_INCREM_STATS 
PS_XYZ_T_42             6064    ,000165 HYBRID          HIST_FOR_INCREM_STATS 
PS_XYZ_T_43             9641    ,000104 HYBRID          HIST_FOR_INCREM_STATS 
PS_XYZ_T_44             9142    ,000109 HYBRID          HIST_FOR_INCREM_STATS 
PS_XYZ_T_45             8043    ,000124 HYBRID          HIST_FOR_INCREM_STATS 
PS_XYZ_T_46             5664    ,000078 HYBRID          HIST_FOR_INCREM_STATS 
PS_XYZ_T_47            21980    ,000045 HYBRID          HIST_FOR_INCREM_STATS 
PS_XYZ_T_48             9550    ,000105 HYBRID          HIST_FOR_INCREM_STATS 
PS_XYZ_T_49           270752    ,000004 HYBRID          HIST_FOR_INCREM_STATS 
PS_XYZ_T_5              1379    ,000725 HYBRID          HIST_FOR_INCREM_STATS 
PS_XYZ_T_6              1050    ,000952 HYBRID          HIST_FOR_INCREM_STATS 
PS_XYZ_T_7              1016    ,000984 HYBRID          HIST_FOR_INCREM_STATS 
PS_XYZ_T_8              1032    ,000969 HYBRID          HIST_FOR_INCREM_STATS 
PS_XYZ_T_9              1232    ,000811 HYBRID          HIST_FOR_INCREM_STATS 

49 rows selected.
-- incremental = FALSE

TABLE_NAME  COLUMN_NAME  HISTOGRAM       NOTES                                
----------- ------------ --------------- ------
PS_XYZ      EMPLID       HYBRID

PARTITION_NAME   NUM_DISTINCT    DENSITY HISTOGRAM       NOTES   
--------------- ------------ ---------- --------------- ---------
PS_XYZ_T_1             19888     ,00005 HYBRID                   
PS_XYZ_T_10             1191 ,000839631 NONE                     
PS_XYZ_T_11             1153    ,000861 HYBRID                   
PS_XYZ_T_12             1187  ,00084246 NONE                     
PS_XYZ_T_13             1390 ,000719424 NONE                     
PS_XYZ_T_14             1733 ,000577034 NONE                     
PS_XYZ_T_15             1950    ,000476 HYBRID                   
PS_XYZ_T_16             1842    ,000515 HYBRID                   
PS_XYZ_T_17             1584 ,000631313 NONE                     
PS_XYZ_T_18             1714     ,00056 HYBRID                   
PS_XYZ_T_19             1410  ,00070922 NONE                     
PS_XYZ_T_2              1059 ,000944287 NONE                     
PS_XYZ_T_20             1460    ,000671 HYBRID                   
PS_XYZ_T_21             1899    ,000526 HYBRID                   
PS_XYZ_T_22             1916    ,000491 HYBRID                   
PS_XYZ_T_23             1568     ,00062 HYBRID                   
PS_XYZ_T_24             1550    ,000626 HYBRID                   
PS_XYZ_T_25             1697    ,000562 HYBRID                   
PS_XYZ_T_26             1637    ,000593 HYBRID                   
PS_XYZ_T_27             1528     ,00064 HYBRID                   
PS_XYZ_T_28             1629 ,000613874 NONE                     
PS_XYZ_T_29             1479    ,000663 HYBRID                   
PS_XYZ_T_3               960 ,001041667 NONE                     
PS_XYZ_T_30             1480    ,000661 HYBRID                   
PS_XYZ_T_31             1839    ,000518 HYBRID                   
PS_XYZ_T_32             3007    ,000288 HYBRID                   
PS_XYZ_T_33             7693     ,00013 HYBRID                   
PS_XYZ_T_34             2776 ,000360231 NONE                     
PS_XYZ_T_35             3653    ,000215 HYBRID                   
PS_XYZ_T_36             7940    ,000126 HYBRID                   
PS_XYZ_T_37             3277 ,000305157 NONE                     
PS_XYZ_T_38             8461    ,000118 HYBRID                   
PS_XYZ_T_39             7855    ,000127 HYBRID                   
PS_XYZ_T_4              1764    ,000567 HYBRID                   
PS_XYZ_T_40             4878 ,000205002 NONE                     
PS_XYZ_T_41             6361    ,000157 HYBRID                   
PS_XYZ_T_42             6064    ,000165 HYBRID                   
PS_XYZ_T_43             9641    ,000104 HYBRID                   
PS_XYZ_T_44             9142    ,000109 HYBRID                   
PS_XYZ_T_45             8043 ,000124332 NONE                     
PS_XYZ_T_46             5664    ,000076 HYBRID                   
PS_XYZ_T_47            21980    ,000045 HYBRID                   
PS_XYZ_T_48             9550    ,000105 HYBRID                   
PS_XYZ_T_49           270944    ,000004 HYBRID                   
PS_XYZ_T_5              1379 ,000725163 NONE                     
PS_XYZ_T_6              1050 ,000952381 NONE                     
PS_XYZ_T_7              1016    ,000984 HYBRID                   
PS_XYZ_T_8              1032 ,000968992 NONE                     
PS_XYZ_T_9              1232 ,000811688 NONE                     

49 rows selected.

As you can see, this is a weird non-common situation. There is a clear contradiction here. How could it be differently when we see that, switching the incremental mode from true to false, switches the same data distribution for the same column at the same partition (PS_JOB_T_10 for example) from skewed (HYBIRD) to non-skewed (NONE)?

In fact, if you look closer you will realize that, as far as query optimization is concerned, there is no contradiction in the following situation:

HYBRID  +  HIST_FOR_INCREM_STATS = NONE

Because, when the CBO sees that the HYBRID histogram is accompanied by the Notes HIST_FOR_INCREM_STATS, it knows exactly that this histogram will be ignored at query optimization time, which is equivalent to not having histogram at all. This is for example correct for partitions like PS_JOB_T_10, PS_JOB_T_12 and PS_JOB_T_13 where column values are evenly distributed. But the real threat resides in partitions like PS_JOB_T_11, PS_JOB_T_15 and PS_JOB_T_16 for example where we have the following situation:

HYBRID  +  HIST_FOR_INCREM_STATS != HYBRID

In this case, the column values are skewed but their histogram are ignored by the CBO during query optimization because of the HIST_FOR_INCREM_STATS Notes. This is exactly the root cause of my performance issue.

  • Regardless of whether the partition key is skewed or not it will have a histogram
  • But this histogram will be ignored both at partition (except partition n°1 in this case) and at global level during query optimization.

Therefore, it becomes clear for me that fixing my performance issue resides in making the CBO not ignoring the presence of histogram at both partition level and global level. And, disabling the incremental mode, is what I did to achieve this goal.

2. Why histograms are ignored?

In the SR I opened for the Oracle support I have supplied the trace file of a dbms_stats call under incremental mode. Something resembling to this (partition n°11 in TEST database contains non-skewed value for the EMPLID column):

SQL> exec dbms_stats.set_global_prefs ('TRACE', to_char (1+16));

SQL> begin
        dbms_stats.gather_table_stats('sysadm'
                                      ,'PS_XYZ'
                                      ,partname  =>'PS_XYZ_T_11'
                                      ,granularity =>'partition');
    end;
   /
SQL> exec dbms_stats.set_global_prefs('TRACE', null);
DBMS_STATS: Building Histogram for EMPLID                                                                          
DBMS_STATS:  bktnum=-1, nnv=50819, snnv=5511,604748383025713834989746016721880424, sndv=1191, est_ndv=1191, mnb=254
DBMS_STATS:  Trying hybrid histogram                                                                               
DBMS_STATS:  > cdn 5592, popFreq 0, popCnt 0, bktSize 22,0750988142292490118577075098814229249
             , bktSzFrc ,0750988142292490118577075098814229249     
                                                
DBMS_STATS:  Evaluating hybrid histogram:  cht.count 254, mnb 254, ssize 5592, min_ssize 2500, appr_ndv  TRUE
             ,ndv 1191, selNdv 0, selFreq 0, 
             pct 10,84555923647262975232686543618867329232, avg_bktsize 22, csr.hreq FALSE, normalize TRUE   
DBMS_STATS:   Checking range skew:                                                                                 
DBMS_STATS:      >> NOT SKEWED                           ----------------->                                                   
DBMS_STATS:  Not Discarding histogram (Incremental mode) ----------------->                                        
DBMS_STATS:   Histogram gathering flags: 4623     

As you can see, Oracle is aware that the EMPLID column has an even distribution in partition n°11 but it decides to do not discard the creation of histogram because of the incremental mode. Oracle, nevertheless, updates the corresponding Notes column to indicate that this histogram will not be used during cardinality estimation.

PARTITION_NAME   NUM_DISTINCT    DENSITY HISTOGRAM       NOTES
---------------- ------------ ---------- --------------- ----------------------
PS_XYZ_T_11              1191    ,000833 HYBRID          HIST_FOR_INCREM_STATS

If, I regather the same statistics for partition n°12 where the EMPLID is skewed (in TEST) I will have the following trace snippet:

DBMS_STATS: Building Histogram for EMPLID                                                                          
DBMS_STATS:  bktnum=-1, nnv=47961, snnv=5501,261730969760166840458811261730969758, sndv=1153, est_ndv=1153, mnb=254
DBMS_STATS:  Trying hybrid histogram                                                                               
DBMS_STATS:  > cdn 5443, popFreq 0, popCnt 0, bktSize 21,49407114624505928853754940711462450593, 
              bktSzFrc ,49407114624505928853754940711462450593                                    
DBMS_STATS:  Evaluating hybrid histogram:  cht.count 254, mnb 254, ssize 5443, min_ssize 2500, appr_ndv  TRUE
            ,ndv 1153, selNdv 0, selFreq 0
            ,pct 11,47028154327424400417101147028154327424, avg_bktsize 21, csr.hreq FALSE, normalize TRUE   
DBMS_STATS:   Checking range skew:                                                                                 
DBMS_STATS:      >> SKEWED                   ------------------------>                                                       
DBMS_STATS:   Histogram gathering flags: 527                                                                       
DBMS_STATS:  Accepting histogram             ------------------------>             

DBMS_STATS:          Need Actual Values (DSC_EAVS)                                                                 
DBMS_STATS:          Partition: PS_XYZ_T_12                                                                        
DBMS_STATS:          Histogram Type: HYBRID Data Type: 2                                                           
DBMS_STATS:          Histogram Flags: 4 Histogram Gathering Flags: 10                                              
DBMS_STATS:          Incremental: TRUE Fix Control 13583722: 1      

This time, Oracle has correctly created the histogram and has left the Notes column empty indicating that the histogram will be used by the CBO when computing cardinality estimation for partition n°12:

PARTITION_NAME    NUM_DISTINCT DENSITY HISTOGRAM          NOTES
---------------- ------------ -------- ----------------- ----------------------
PS_XYZ_T_11       1191        ,000833  HYBRID             HIST_FOR_INCREM_STATS
PS_XYZ_T_12       1153        ,000861  HYBRID

However, at global level, Oracle will still ignore the usage of histogram as the following proves:

SQL> SELECT
       table_name
       ,column_name
       ,histogram
       ,notes
    FROM
      all_tab_col_statistics
    WHERE
     owner = 'SYSADM'
     and table_name ='PS_XYZ'
     and column_name = 'EMPLID';

TABLE_NAME  COLUMN_NAME   HISTOGRAM       NOTES   
----------- ------------- --------------- ----------------------------------
PS_XYZ      EMPLID        HYBRID          INCREMENTAL HIST_FOR_INCREM_STATS        

I think that it suffices to have a single partition with that Note value (HIST_FOR_INCREM_STATS ) to see Oracle ignoring the usage of histogram at global level as well.

In passing, if you have exercised eyes you should have already remarked the control fix in the dbms_stats trace file

DBMS_STATS: Incremental: TRUE Fix Control 13583722: 1

SQL> select description from V$SYSTEM_FIX_CONTROL where bugno = '13583722';

DESCRIPTION
----------------------------------------------------------------
preserve partition level histograms &amp; actual values for increment

For the sake of simplicity, I will postpone publishing my investigations on this fix_control to a separate article.

3. Possible reasons

The following references might help understanding the situation explained in this article and the possible reasons pushing Oracle to keep creating histograms but ignoring their usage during query optimisation process:

4. Bottom-line

As we have seen above, partition level histogram will not be discarded for tables with incremental mode set to TRUE. This happens regardless of whether the column values are skewed or evenly distributed. I think that, the reason behind this is that, under incremental mode, when global histogram aggregations are calculated, missing partition level histograms must be regathered at partition level with even distribution. Probably Oracle decided to keep these histograms at partition level to avoid doing the same thing two times under incremental mode. But it decides, at the same time, to pre-empt the CBO from using the histograms both at global and partition level during query optimization.

All in all, I think that disabling incremental mode is something one should envisage if confronted to a similar performance issue.

September 12, 2018

Troubleshooting via indexing

Filed under: Oracle — hourim @ 6:27 pm

The damage that can be inflicted by a filter predicate applied on a big table is difficult to calculate. But it is clear that it has been almost always a tragic mistake to filter a huge amount of rows from a very large table. This article outlines one of those practical cases where the creation of a chirurgical index overcame the waste of time and energy spent at the table level in a very critical third party software query.

To begin with here’s a SQL monitoring report of this critical query taken from a 12cR2 release:

SELECT
    conRef h,
    vamisin h
FROM 
  TABLEHIST h
WHERE h.typeisin <> 4
AND h.optionv NOT IN
  (SELECT  vwx.ident 
   FROM AGGREGATE_VW vwx 
   WHERE (vwx.I1 = 2298535)
  )
AND h.backclient NOT IN (106, 28387)
AND h.vamisin        IN (91071113)

Global Information
------------------------------
 Status              :  DONE (ALL ROWS)       
 Instance ID         :  1                     
 Session             :  XXXXXX (2198:9848) 
 SQL ID              :  7v8109fkuusw3         
 SQL Execution ID    :  17529954              
 Execution Started   :  08/01/2018 04:58:31   
 First Refresh Time  :  08/01/2018 04:58:35   
 Last Refresh Time   :  08/01/2018 04:59:16   
 Duration            :  45s                   
 Module/Action       :  JDBC Thin Client/-    
 Service             :  sphpre1               
 Program             :  JDBC Thin Client      
 Fetch Calls         :  6                     

Global Stats
=================================================================
| Elapsed |   Cpu   |    IO    | Fetch | Buffer | Read  | Read  |
| Time(s) | Time(s) | Waits(s) | Calls |  Gets  | Reqs  | Bytes |
=================================================================
|      44 |    2.35 |       42 |     6 |  81930 | 39157 | 311MB |
=================================================================

SQL Plan Monitoring Details (Plan Hash Value=4122899908)
========================================================================================================
| Id |               Operation                |          Name          |   Time    | Start  |   Rows   |
|    |                                        |                        | Active(s) | Active | (Actual) |
========================================================================================================
|  0 | SELECT STATEMENT                       |                        |        42 |     +4 |    50549 |
|  1 |   HASH JOIN ANTI                       |                        |        42 |     +4 |    50549 |
|  2 |    TABLE ACCESS BY INDEX ROWID BATCHED |TABLEHIST               |        42 |     +4 |    50549 |
|  3 |     INDEX RANGE SCAN                   |TABLEHIST_IND4          |        42 |     +4 |     281K |
|  4 |    VIEW                                |                        |         1 |    +45 |    12883 |
|  5 |     CONNECT BY WITH FILTERING (UNIQUE) |                        |         1 |    +45 |    21028 |
|  6 |      INDEX RANGE SCAN                  |IDX_AGGREGAT_VIEW_IDENT |         1 |    +45 |       40 |
|  7 |      HASH JOIN                         |                        |         1 |    +45 |        0 |
|  8 |       NESTED LOOPS                     |                        |         1 |    +45 |    20988 |
|  9 |        STATISTICS COLLECTOR            |                        |         1 |    +45 |        9 |
| 10 |         CONNECT BY PUMP                |                        |         1 |    +45 |    21028 |
| 11 |        INDEX RANGE SCAN                |IDX_AGGREGAT_VIEW_IDENT |         1 |    +45 |    20988 |
| 12 |       INDEX FAST FULL SCAN             |IDX_AGGREGAT_VIEW_IDENT |           |        |          |
========================================================================================================

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("H"."OPTIONV"="IDENT")
   2 - filter(("H"."BACKCLIENT"<>:SYS_B_3 AND "H"."BACKCLIENT"<>:SYS_B_2 
           AND "H"."TYPEISIN"<>:SYS_B_0))
   3 - access("H"."VAMISIN"=:1)
   4 - filter((TO_NUMBER(REPLACE(REGEXP_SUBSTR 
                  ("IDENT_PATH",'@@(.+?)(@@|$)',1,1,HEXTORAW('5824EBC2000')),'@@',NULL))=:SYS_B_1 
              AND "IDENT">0))
   5 - access("F"."MGR"=PRIOR NULL)
   6 - access("F"."MGR"=1)
   9 - access("connect$_by$_pump$_005"."prior f.ident "="F"."MGR")
 
Note
-----
   - this is an adaptive plan 

In their attempt to fix this performance pain, DBAs and developers were focusing their efforts in reviewing the part of the execution plan that involves the hierarchical query (operation n°4 to 12).

But whoever knows how to interpret correctly the multitude information provided by the SQL monitor report would have immediately pointed out where most of the query time is spent. Look at the Start Active column of operations n°0 to 3: they all shows +4 seconds. This means that those operations started to be active 4 seconds after the beginning of the query. And whoever knows how to interpret an execution plan would have pointed out that the parent-child operation n°2-3 is the first to be started by Oracle. Look now to the Time Active(s) column of these two operations. They both indicate 42 seconds. This means that the table/index operation 2-3 took 42 seconds. Since the total duration of this query is 45 seconds we know then clearly what part of the query we have to address.

The index at operations n°3 supplied its parent table at line n°2 with 281K of rowids which has thrown away all but only 50549 using the filter predicate n°2

   2 - filter(("H"."BACKCLIENT"<>:SYS_B_3 AND "H"."BACKCLIENT"<>:SYS_B_2 
           AND "H"."TYPEISIN"<>:SYS_B_0))

Most of the 81930 buffer gets come from the TABLEHIST table at operation line n°2. We need then to engineer an index starting by the column in the predicate on which an equality is applied (VAMISIN) followed by the other three columns from the other predicates on which an inequality is applied (OPTIONV, BACKCLIENT, TYPEISIN).

SQL> create index TABLEHIST_IND_MHO on TABLEHIST 
       (VAMISIN, OPTIONV, BACKCLIENT, TYPEISIN)

Since the select part of this critical query involves only two columns of which one (VAMISIN) is already included in the new index then why not add the second selected column to this index and avoid visiting the table at all?

SQL> create index TABLEHIST_IND_MHO on TABLEHIST 
       (VAMISIN, OPTIONV, BACKCLIENT, TYPEISIN, CONREF)

Although this will increase the size of the index it has, however, the potentiality to eliminate 96% of I/O.
Indeed, following the creation of the new index, Oracle has used the following execution plan:

-------------------------------------------------------------------------------
| Id  | Operation                            | Name                   | Rows  |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                        |       |
|*  1 |  HASH JOIN ANTI                      |                        |     1 |
|*  2 |   INDEX RANGE SCAN                   |TABLEHIST_IND_MHO       |    49 |
|*  3 |   VIEW                               |                        |    89 |
|*  4 |    CONNECT BY WITH FILTERING (UNIQUE)|                        |       |
|*  5 |     INDEX RANGE SCAN                 |IDX_AGGREGAT_VIEW_IDENT |     5 |
|   6 |     NESTED LOOPS                     |                        |    84 |
|   7 |      CONNECT BY PUMP                 |                        |       |
|*  8 |      INDEX RANGE SCAN                |IDX_AGGREGAT_VIEW_IDENT |    17 |
-------------------------------------------------------------------------------


SQL> @sqlstats
Enter value for sql_id: 7v8109fkuusw3

     CHILD SQL_PROFILE          PLAN_HASH_VALUE   AVG_GETS    EXECS
---------- -------------------- --------------- ---------- --------
1	                           860019752       3651      204

As you can see Oracle is not visiting anymore the TABLEHIST table and the reduction of the logical I/O consumption is very remarkable as it went from 81930 to 3651.

September 11, 2018

Exploded index causes a wrong execution plan

Filed under: Oracle — hourim @ 6:57 pm

A couple of days ago I have been asked to look at performance issue which occurred in a very simple delete statement. Below is this delete with its good and bad execution plan respectively:

SQL_ID gq3khfnctny4h
--------------------
DELETE FROM table_t1 WHERE evt_id = :1  AND eng_id = :2
 
Plan hash value: 1362912592 
-----------------------------------------------------------------
| Id  | Operation          | Name          | Rows  | Cost (%CPU)|
-----------------------------------------------------------------
|   0 | DELETE STATEMENT   |               |       |     3 (100)|
|   1 |  DELETE            | TABLE_T1      |       |            |
|   2 |   INDEX UNIQUE SCAN| PK_TABLE_T117 |     1 |     3   (0)| -- cost = 3
-----------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
    2 - access("EVT_ID"=:1 AND "ENG_ID"=:2)
SQL_ID gq3khfnctny4h
--------------------
DELETE FROM table_t1 WHERE evt_id = :1  AND eng_id = :2
 
Plan hash value: 3057265474 
-------------------------------------------------------------------
| Id  | Operation             | Name          | Rows  |Cost (%CPU)|
-------------------------------------------------------------------
|   0 | DELETE STATEMENT      |               |       |    2 (100)|
|   1 |  DELETE               | TABLE_T1      |       |           |
|   2 |   INDEX FAST FULL SCAN| PK_TABLE_T117 |     1 |    2   (0)| -- cost = 2
-------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
    2 - access("EVT_ID"=:1 AND "ENG_ID"=:2)

The statistics of the different executions of this delete over the last couple of weeks are:

SQL> @HistStats
Enter value for sql_id: gq3khfnctny4h
Enter value for from_date: 01092018

SNAP_BEGIN                PLAN_HASH_VALUE    EXECS END_OF_FETCH  AVG_ETIME  AVG_LIO   AVG_ROWS
------------------------- --------------- -------- ------------ ---------- -------- ----------
04-SEP-18 01.00.23.666 PM      1362912592     5109         5109          0     19.3          1 -- good execution plan
04-SEP-18 02.00.32.471 PM      1362912592      574          574          0       20          1
04-SEP-18 03.00.48.319 PM      1362912592     1472         1472          0       20          1
05-SEP-18 08.00.10.847 AM      1362912592     3378         3378          0     19.4          1
05-SEP-18 05.00.04.444 PM      1362912592    17759        17759          0     19.7          1
05-SEP-18 08.00.43.235 PM      1362912592   201040       201040          0     18.6          1
05-SEP-18 09.00.30.871 PM      1362912592    59826        59826          0     19.7          1
05-SEP-18 10.01.28.933 PM      1362912592   140850       140850          0     17.7          1
06-SEP-18 12.01.11.664 AM      1362912592    17951        17951          0     22.9          1
06-SEP-18 09.00.31.336 AM      3057265474    22899        22899         19   2136.5          1 -- bad plan switch 
06-SEP-18 10.00.44.721 AM      3057265474    45736        45736         14     1315          1
06-SEP-18 11.00.48.929 AM      3057265474    56729        56729         11  37105.8          1
06-SEP-18 05.00.43.993 PM      3057265474    48568        48568         12    11275          1

As you can see, when the delete statement switched from an execution plan using a INDEX UNIQUE SCAN (1362912592) to an execution plan using an INDEX FAST FULL SCAN (3057265474) of the same primary key index, the average elapsed time of the delete went from 0 to 19 seconds.

The AVG_ROWS column above clearly shows that we are always deleting a single row. This is because the two columns of the predicate part represent the two primary key columns (evt_id, eng_id). So why on earth would Oracle FAST FULL scan an entire index to end up by locating a single row and delete it?

If you look back to the above execution plan you will realize that Oracle has completely messed up its estimations as it has considered that FAST FULL scanning the PK_TABLE_T117 primary key index will cost only 2 while the cost of unique scanning the same index would be 3. This is why it has switched for the bad execution plan causing a serious performance issue.

The statistics of this unique index are:

SQL> select
        index_name
      , blevel
      , leaf_blocks
      , distinct_keys
      , num_rows
      , clustering_factor
      , sample_size    
    from user_indexes
    where index_name = 'PK_TABLE_T117';

INDEX_NAME     BLEVEL LEAF_BLOCKS DISTINCT_KEYS   NUM_ROWS CLUSTERING_FACTOR SAMPLE_SIZE 
-------------- ------ ----------- ------------- ---------- ----------------- ----------- 
PK_TABLE_T117       3       28860        524234     524234            348959      524234   

Since I have at my disposable a PRE-PRODUCTION copy of this database I decided to get the same index statistics from this database:

SQL> select
        index_name
      , blevel
      , leaf_blocks
      , distinct_keys
      , num_rows
      , clustering_factor
      , sample_size    
    from user_indexes
    where index_name = 'PK_TABLE_T117';

INDEX_NAME      BLEVEL LEAF_BLOCKS DISTINCT_KEYS   NUM_ROWS CLUSTERING_FACTOR SAMPLE_SIZE 
-------------- ------- ----------- ------------- ---------- ----------------- ----------- 
PK_TABLE_T117        2       21470        617240     617240            190479      617240 

How could a BLEVEL of an index be equal to 3 in PRODUCTION and the BLEVEL of the same index with much more rows be equal to 2 in another database?

That’s probably an indication of an exploded index due to the delete and its pattern. Let’s check this using a little bit enhanced Jonathan Lewis script

SQL> @IndexEfficiency2
Enter value for m_schemaname: xxxx
../..
PK_TABLE_T117
      Current Index Size GB: 1.44         Expected Index size GB: .07

This index certainly needs a rebuild

SQL> alter index PK_TABLE_T117 rebuild online;
SQL> select
            index_name
          , blevel
          , leaf_blocks
          , distinct_keys
          , num_rows
          , clustering_factor
          , sample_size      
     from user_indexes
     where index_name = 'PK_TABLE_T117';

INDEX_NAME      BLEVEL LEAF_BLOCKS DISTINCT_KEYS   NUM_ROWS CLUSTERING_FACTOR SAMPLE_SIZE 
-------------- ------- ----------- ------------- ---------- ----------------- ----------- 
PK_TABLE_T117        2        1461        524202     524202            348956      524202

SQL> @sizeBySegName
Enter value for segment_name: PK_TABLE_T117
Enter value for owner: xxxx

SEGMENT_TYPE       TABLESPACE_NAME  SEGMENT_NAME      GB
------------------ ---------------- --------------- -----
INDEX              IDX              PK_TABLE_T117   .0117
                                                    ------
Total Segment Size                                  .0117 

And here’s below the new execution plan:

SQL> explain plan for
     DELETE FROM table_t1 WHERE evt_id = :1  AND eng_id = :2;
  
SQL> select * from table(dbms_xplan.display);

--------------------------------------------------------------------
| Id  | Operation          | Name          | Rows  | Bytes | Cost  |
--------------------------------------------------------------------
|   0 | DELETE STATEMENT   |               |     1 |    31 |     2 |
|   1 |  DELETE            | TABLE_T1|     |       |       |
|   2 |   INDEX UNIQUE SCAN| PK_TABLE_T117 |     1 |    31 |     2 | -- cost = 2
--------------------------------------------------------------------

SQL> explain plan for 
    delete /*+ index_ffs(TABLE_T1, PK_TABLE_T117) */ table_t1 
    WHERE evt_id = :1  AND eng_id = :2;

SQL> select * from table(dbms_xplan.display);

----------------------------------------------------------------------
| Id  | Operation             | Name          | Rows  | Bytes | Cost  |
-----------------------------------------------------------------------
|   0 | DELETE STATEMENT      |               |     1 |    31 |   402 |
|   1 |  DELETE               | TABLE_T1      |       |       |       |
|   2 |   INDEX FAST FULL SCAN| PK_TABLE_T117 |     1 |    31 |   402 | -- cost = 402
-----------------------------------------------------------------------

CONCLUSION

I have never considered rebuilding an index to be a reliable tuning strategy. This is why, I have already ousted this operation from my list of tuning steps. But, as Tom Kyte says “always is never true, never is always false”, it is then interesting to see practical real life cases where rebuilding index reveals to be a fix for a wrong execution plan choice.

May 19, 2018

Merge join in modern Relational Database Management Systems

Filed under: Oracle — hourim @ 2:07 pm

There are many posts out there explaining what a MERGE JOIN is, how it works and why it is less popular than NESTED LOOPS and HASH JOIN physical operations. In a nutshell, MERGE JOIN compares two sets of sorted data on the merge column and outputs matched rows. It reads both data sets only once. This is why it is known as an unrelated combined operation as explained by Christian Antognini in his last book.

1. Oracle merge join

Here’s a simple Oracle example illustrating the different subtleties of the MERGE JOIN using Oracle 12cR2:

SQL> create table t1 as 
        select rownum*2 n1, rownum*5 n2, rownum n3
     from dual
     connect by level <=2e1; 

SQL> create table t2 as 
        select rownum*3 n1, rownum*5 n2, rownum n3
     from dual
     connect by level <=1e2; 

SQL> select
        /*+ use_merge(t1,t2) */
        t1.n1 t1n1
       ,t2.n1 t2n1
     from t1
        join t2
     on t1.n1 = t2.n1;

      T1N1       T2N1
---------- ----------
         6          6
        12         12
        18         18
        24         24
        30         30
        36         36

6 rows selected.

---------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |
---------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      6 |
|   1 |  MERGE JOIN         |      |      1 |     20 |      6 |
|   2 |   SORT JOIN         |      |      1 |     20 |     20 |
|   3 |    TABLE ACCESS FULL| T1   |      1 |     20 |     20 |
|*  4 |   SORT JOIN         |      |     20 |    100 |      6 | 
|   5 |    TABLE ACCESS FULL| T2   |      1 |    100 |    100 |
---------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T1"."N1"="T2"."N1")
       filter("T1"."N1"="T2"."N1")

The number of rows (A-Rows=20) generated by the first child operation(Line Id n°2) of the MERGE JOIN operation at line id n° 1 and the number of times its second child operation (line id n°4) has been executed ( Starts=20) suggest that the MERGE JOIN is somehow using a NESTED LOOPS kind of algorithm. But we see clearly that, in contrast to NESTED LOOPS, the two merge join inputs (T1 and T2) are scanned only once (Starts = 1 at lines id n°3 and 5). So what does this Starts=20 of operation n°4 mean exactly? Clearly we haven’t made 20 separate sorts as the following proves:

set autotrace on stat

select
        /*+ use_merge(t1,t2) */
        t1.n1 t1n1
       ,t2.n1 t2n1
     from t1
        join t2
     on t1.n1 = t2.n1;

Statistics
---------------------------------------------------
          0  recursive calls
          4  db block gets
         14  consistent gets
          0  physical reads
          0  redo size
        708  bytes sent via SQL*Net to client
        608  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          2  sorts (memory)          --> only two sorts corresponding to operations n°2 and n°4 
          0  sorts (disk)                
          6  rows processed

In effect, the MERGE JOIN parent operation gets a row from each sorted input and compares them. Typically it takes 20 rows from T1 and 100 rows from T2. It then gets the first row of each input and compare them using predicate n°4; if they join their corresponding rows are returned. If not, the MERGE JOIN will then discard the lower value and gets the next row from the lower input data set and continue the comparison process until there is no anymore rows to process. This algorithm can be simplified as follows:

Get 20  sorted rows from T1
Get 100 sorted rows from T2
LOOP until no rows to compare
  if  join value of T1 =  join value of T2
  then
      output the joined rows 
      discard  join value of T2
      get next join value of T1
      get next join value of T2      
  elsif join value of T1 < join value of T2 
      discard join value of T1 
      get next join value of T1 
  elsif join value of T1 >  join value of T2
      discard  join value of T2
      get next join value of T2
  end if;
END LOOP;

So, we can infer that the Starts = 20 of operation at line Id n° 4 represents Oracle comparing each of the 20 join column values of T1 with their equivalent ordered join column of T2 (first rows from T1 with first row from T2 and so on until there is no more rows in T1 to compare).

But let’s now change the order of the join so that table T2 will be the first data set input of the merge join operation:

select
  /*+ leading (t2, t1) use_merge(t2,t1) */
   t1.n1 t1n1
  ,t2.n1 t2n1
from t1
join t2
on t1.n1 = t2.n1;

      T1N1       T2N1
---------- ----------
         6          6
        12         12
        18         18
        24         24
        30         30
        36         36

6 rows selected.
---------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |
---------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      6 |
|   1 |  MERGE JOIN         |      |      1 |     20 |      6 |
|   2 |   SORT JOIN         |      |      1 |    100 |     14 | -- why 14 rows?
|   3 |    TABLE ACCESS FULL| T2   |      1 |    100 |    100 |
|*  4 |   SORT JOIN         |      |     14 |     20 |      6 |
|   5 |    TABLE ACCESS FULL| T1   |      1 |     20 |     20 |
---------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T1"."N1"="T2"."N1")
       filter("T1"."N1"="T2"."N1")

We are still generating the same number of rows, 6, but this time the operation at line id n°4 is started 14 times. Why 14 and not 100 the totality of the T2 rows?

The answer to this question is : the merge join doesn’t necessarily need to scan every row from both inputs. It stops scanning as soon as:

  • it reaches the end of either input
  • or it reaches a join value from the first input that is greater than the highest join value from the second input.

When I made T2 the driving table of the merge join, Oracle declared the end of the join process as soon as it reached the 14th row of T2 (A-Rows=14). This is simply because the 14th ordered row of T2 is greater than any join value from T1 as the following proves:

-- get the minimum value of t2.n1 that is greater than the max value of t1.n1
SQL> select min(n1) from t2 where n1 > (select max(n1) from t1);

   MIN(N1)
----------
        42

This means that starting from t2.n1= 42 there will be no join possibility since 42 is greater than all join column values from the lesser input (T1). And, as such, remaining join values from T1 should be discarded according to the join algorithm(T2 being the first input of the merge join)

elsif merge value of T2 >  merge value of T1
      discard merge value of T1
      get next merge value of T1
end if;

And how many rows the merge join shoud have already processed when it reaches this merge stop point? 14 naturally:

SQL> select 
         count(1) 
     from t2 
     where 
         n1 <= (select min(n1) from t2 where n1 > (select max(n1) from t1) 
                );
 COUNT(1)
---------
       14

If we redo the same demonstration for the case where T1 will be the driving table we will obviously find that the merge join has to go through the entire number of rows in T1 table (A-Rows = 20) because there is no join value in T1 that is greater than the largest join value in T2.

SQL> select n1 from t1 where n1 > (select max(n1) from t2);
no rows selected

2. One-to-many and many-to-many merge join

In the above setup we unofficially examined a one-to-many join version of the merge join. This type of join occurs when the optimizer knows that there are no duplicate join column values in the first input data set. Although I know t1.n1 is unique I didn’t supply any such extra information like a unique index for the optimizer to opt for a one-to-many join option.This is why officially we have been using a many-to-many merge join in the preceding examples.

       2.1. MS-SQL Server
In contrast to Oracle, MS SQL server execution plan makes a clear distinction between these two types of merge join as the following shows:

create table t1(n1 int, n2 int);

insert into t1(n1,n2)
       select top 20
	     2* row_number() over (order by a.name) 
		,abs(row_number() over (order by a.name) -1) 
from sys.all_objects a cross join sys.all_objects;

create table t2(n1 int, n2 int) ;

insert into t2(n1,n2)
       select top 100
	     3* row_number() over (order by a.name) 
		,abs(row_number() over (order by a.name) -1) 
from sys.all_objects a cross join sys.all_objects;

-- many to many
select
   t1.n1 t1n1
  ,t2.n1 t2n1
from
   t1
join t2
on t1.n1 = t2.n1
option (merge join);

But if I create a unique index in T1 indicating to the optimizer the absence of duplicate rows in the join column I will obviously obtain a one-to-many merge join type as the following execution plan shows:

create unique index t1_uk on t1(n1);

-- one-to-many join
select
   t1.n1 t1n1
  ,t2.n1 t2n1
from
   t1
join t2
on t1.n1 = t2.n1
option (merge join);


In a one-to-many join, when two rows join, the optimizer outputs them, discards the join value from the second input (T2), gets the next join value from the first input (T1) and continue the merge process. The optimizer can safely discard the joined value from T2 because it knows that there will be no duplicate rows in T1 that will ever join with the T2 discarded row.

In a many-to-many join, the merge join algorithm, very probably, keeps track of the discarded T2 row somewhere in a memory structure. If the next iteration finds that the current row is duplicated it will then compare it with the saved inmemory row. If, instead, the next row from T1 reveals to be a new one, the optimizer can then safely delete the inmemory T2 saved row. This approach can be backed up by the merge join algorithm displayed above which shows that the merge process goes always forward. It never needs to step backward in the data set. In the complex many-to-many join case this “always walk down” can be ensured by looking for a previous compared join row stored in memory and probably not by stepping backward. If an extra filter is present in the query it will be replayed back to ensure that the saved joined row satisfy the filter predicate or not.

     2.2. PostgreSQL
Using PostgreSQL we can have both textual and graphical execution plan. But instead of a many-to-many or a one-to-many merge join, PostgreSQL uses a different terminology which is Inner Unique (True and False) respectively as illustrated below:
First the data model:

create table t1 (n1 int, n2 int, n3 int);
create table t2 (n1 int, n2 int, n3 int);	

with got_my_data (j)
as
 (select generate_series(1, 20)
 )
 insert into t1(n1, n2, n3)
 select 
    j*2
   ,j*5
   ,j
  from 
   got_my_data;
   
with got_my_data (j)
as
 (select generate_series(1, 100)
 )
 insert into t2(n1, n2, n3)
 select 
    j*3
   ,j*5
   ,j
  from 
   got_my_data;

Since there is no hint in PostgreSQL with which I can force a merge join operation, I will cancel the hash join possibility, run the query and get the graphical execution plan using pgAdmin4

postgres=# set enable_hashjoin=false;
SET

 explain analyze
 select
        t1.n1 t1n1
       ,t2.n1 t2n1
     from t1
        join t2
     on t1.n1 = t2.n1;


As you can see this is a many-to-many join as indicated by the Inner Unique set to false. If I create a unique index on T2, re-query and get the corresponding execution plan this is what I will observe:

postgres=# create unique index t2_uk on t2(n1);
CREATE INDEX

explain analyze verbose select
       t1.n1 t1n1
      ,t2.n1 t2n1
    from t1
       join t2
    on t1.n1 = t2.n1;

I don’t have enough experience in reading PostgreSQL execution plans but according to the actual rows generated by the second sort in the plan (rows = 14) it seems that, very probably, the query planner has used T2 table as the first input of the merge join. This is why the unique index on this table has triggered a one-to-many join while a unique index on T1 hasn’t(not show here but tested). For a one-to-many merge join to occur uniquensess is required for the join column of the first input.

2.3. Oracle
Let’s add a duplicate row in T1 and re-execute the same merge join query using Oracle database

SQL> insert into t1 values (6, -1, -1);
1 row created.

SQL> commit;

select
    /*+ use_merge(t1,t2) */
  t1.n1 t1n1
 ,t2.n1 t2n1
from t1
join t2
on t1.n1 = t2.n1;

      T1N1       T2N1
---------- ----------
         6          6
         6          6
        12         12
        18         18
        24         24
        30         30
        36         36

7 rows selected.
---------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |
---------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      7 |
|   1 |  MERGE JOIN         |      |      1 |     20 |      7 |
|   2 |   SORT JOIN         |      |      1 |     20 |     21 |
|   3 |    TABLE ACCESS FULL| T1   |      1 |     20 |     21 |
|*  4 |   SORT JOIN         |      |     21 |    100 |      7 |
|   5 |    TABLE ACCESS FULL| T2   |      1 |    100 |    100 |
---------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T1"."N1"="T2"."N1")
       filter("T1"."N1"="T2"."N1")

As expected, the duplicated join value from T1 has been returned by the merge join. But there is no clue in the execution plan about whether this is a many-to-many or a a one-to-many join. Even if I delete the inserted duplicate row and create a unique index on t1.n1, I will still find nothing related to the type of merge join in the corresponding execution plan as shown below:

SQL> delete from t1 where n1=6 and n2 =-1;

SQL> create unique index t1_uk on t1(n1);

select
    /*+ use_merge(t1,t2) */
  t1.n1 t1n1
 ,t2.n1 t2n1
from t1
join t2
on t1.n1  = t2.n1;

      T1N1       T2N1
---------- ----------
         6          6
        12         12
        18         18
        24         24
        30         30
        36         36

6 rows selected.

----------------------------------------------------------------
| Id  | Operation           | Name  | Starts | E-Rows | A-Rows |
----------------------------------------------------------------
|   0 | SELECT STATEMENT    |       |      1 |        |      6 |
|   1 |  MERGE JOIN         |       |      1 |     20 |      6 |
|   2 |   INDEX FULL SCAN   | T1_UK |      1 |     20 |     20 |
|*  3 |   SORT JOIN         |       |     20 |    100 |      6 |
|   4 |    TABLE ACCESS FULL| T2    |      1 |    100 |    100 |
----------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"="T2"."N1")
       filter("T1"."N1"="T2"."N1")

There is no clue about the type of merge join after I have made unique the join column of the first input. However, we can observe that, thanks to the INDEX FULL SCAN operation, rows from the first input are acquired pre-sorted and don’t need the usual extra SORT JOIN operation.

Finally, we are not going to finish this merge join investigation without creating a unique index on the second data set input and see what this will change in the execution plan:

SQL> create unique index t2_uk on t2(n1);

SQL> select
        /*+ use_merge(t1,t2) */
        t1.n1 t1n1
       ,t2.n1 t2n1
    from t1
    join t2
    on t1.n1 = t2.n1;

      T1N1       T2N1
---------- ----------
         6          6
        12         12
        18         18
        24         24
        30         30
        36         36

6 rows selected.
--------------------------------------------------------------
| Id  | Operation         | Name  | Starts | E-Rows | A-Rows |
--------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |      1 |        |      6 |
|   1 |  MERGE JOIN       |       |      1 |     20 |      6 |
|   2 |   INDEX FULL SCAN | T2_UK |      1 |    100 |     14 |
|*  3 |   SORT JOIN       |       |     14 |     20 |      6 |
|   4 |    INDEX FULL SCAN| T1_UK |      1 |     20 |     20 |
--------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"="T2"."N1")
       filter("T1"."N1"="T2"."N1")

If you look closely to the above execution plan you will immediately spot out two important points:

  • Oracle has inverted the join order as T2 becomes the first input and T1 the second one

The Cost Based optimizer is very clever since, by switching the join order, it can declare the end of the query much earlier by stop scanning T2 as soon as it reaches the 14th row of the first input (T2) as explained above.

  • The second point is related to the extra SORT JOIN operation at line n°3.

This operation receives an already pre-sorted data set via the INDEX FULL SCAN operation at line n°4. So why Oracle needs to apply an extra SORT on an ordered data?

In fact, as already explained in section 1 above, the SORT JOIN operation at line n°3, as its name doesn’t suggest, is responsible for applying the join condition (access and filter predicate n°3) on the right input for each row produced by the left input (14 rows in the current case). This is why, regardless of how the rows are acquired, the SORT JOIN operation is always needed to be applied on the right data set input. The same doesn’t apply for the first input data set where the SORT JOIN operation can be skipped whenever this data is retrieved already sorted.

3. Summary

In this article I tried to explain how the merge join algorithm has been implemented in modern relational database systems. I demonstrated that the merge join doesn’t necessarily need to scan every row from both inputs. It stops scanning as soon as it reaches the end of either input or it reaches a join value from the first input that is greater than the highest join value from the second input. I have outlined, using MS-SQL Sever and PostgreSQL the concept of one-to-many and many-to-many join and how a unique index on the first input data set of the join can switch from a costly many-to-many to a less aggressive one-to-many form of the join.Although I have shown it here, a merge join can work with inequality join predicate and it supports outer (MERGE JOIN OUTER), semi(MERGE JOIN SEMI) and anti (MERGE JOIN ANTI) logical join operations.

March 20, 2018

DDL optimisation is not working in 12cR2: really?

Filed under: Oracle — hourim @ 7:32 pm

If you want to know what DDL optimisation is then you can read this article I wrote a couple of years ago.

As the title already suggests, I have been surprised during the preparation of a database upgrade script from 11gR2 to 12cR2 to see the following alter table taking 5 hours instead of what should have been an instantaneous operation:

SQL> select banner from v$version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 12c Enterprise Edition Release 12.2.0.1.0 - 64bit Production


SQL> alter table t1 add
  	(col1 number  default 0 null
       ,col2 number default 0 null
       ,col3 number default 0 null
       ,col4 number default 0 null
       ,col5 number default 0 null
       ,col6 number default 0 null
       ,col7 number default 0 null
      )
      /
Table altered.

Elapsed: 04:51:24.48

Oracle should have used the DDL optimisation technique which has been extended to nullable columns in 12cR1.

So what the heck is this? From where am I going to start troubleshooting this issue?

The first thing that came to my mind was to check the value of the hidden parameter, _add_col_optim_enabled, driven this feature. It might be possible that the DBA has un-set its default value during the database upgrade:

SQL> select
         n.ksppinm
        ,c.ksppstvl
        ,n.ksppdesc
       from
        sys.x$ksppi n
        ,sys.x$ksppcv c
       where n.indx=c.indx
       and n.ksppinm = '_add_col_optim_enabled';

KSPPINM                KSPPSTVL   KSPPDESC
---------------------- ---------- -----------------------------------
_add_col_optim_enabled TRUE       Allows new add column optimization

The DDL optimisation parameter is correctly set. If this parameter has been changed to FALSE the DDL optimisation feature would have, naturally, been cancelled as the following demonstrates:

create table t as select rownum n1 from dual connect by level <=10;

-- the following alter table will use the DDL optimisation
-- as the predicate of the execution plan confirms

alter table t add col1 number default 0 null;

select count(1) from t where col1=42;

select * from table(dbms_xplan.display_cursor);					   

---------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |
---------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |
|   1 |  SORT AGGREGATE    |      |     1 |     3 |
|*  2 |   TABLE ACCESS FULL| T    |     1 |     3 |
---------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(DECODE(TO_CHAR(SYS_OP_VECBIT("SYS_NC00002$",0)),NULL
             ,NVL("COL1",0),'0',NVL("COL1",0),'1',"COL1")=42)

-- change the hidden parameter driving the DDL optimisation
alter session set "_add_col_optim_enabled"=false;

-- and add a new equivalent column
alter table t add col2 number default 0 null;

select count(1) from t where col2=42;

select * from table(dbms_xplan.display_cursor);	

---------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes |
---------------------------------------------------
|   0 | SELECT STATEMENT   |      |       |       |
|   1 |  SORT AGGREGATE    |      |     1 |     3 |
|*  2 |   TABLE ACCESS FULL| T    |     1 |     3 |
---------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 – filter("COL2"=42) -- DDL optimisation didn't occur.

After having ruled out the hidden parameter hypothesis, I embarked on a bibliography research checking the Oracle official documentation to see whether the DDL optimisation has actually been withdrawn in 12cR2 for nullable added columns having a default value or not. It is useless to say that this effort has been pointless.

Nowadays when all else fails Twitter might be there for a quick help:

Unfortunately not everyone is lucky. No answer 🙂

There was, nevertheless, something in the overnight SQL batch log that should have given me a hint about what was happeing under the hood:

SQL> alter table t1 add (colX numeric default 0 not null);

Table altered.

Elapsed: 00:500:00.06

While adding a nullable column with default value was taking an eternity, adding a not null column with default value to the same table was instantaneous. This database is reacting exactly as if I were running under an 11gR2 version. But I’ve completely neglected this option and continued exploring other possibilities until I decided to close my laptop and leave for the week end.

On Monday of the next week I shared this issue with one of my friends who suggested me to check the compatible parameter. No sooner said than done:

SQL> show parameter compatible

NAME                                 TYPE        VALUE
------------------------------------ ----------- ----------
compatible                           string      11.2.0.4.0

Now that I have this value in front of my eyes, bringing together the pieces of the puzzle becomes easy. I provisionned a new fresh copy of Production database, upgraded it to 12.2.0.1.0, checked the compatible parameter:

SQL> show parameter compatible

NAME                                 TYPE        VALUE
------------------------------------ ----------- --------
compatible                           string      12.2.0.1

and launched again the same alter table which obviously completed in less a second:

SQL> alter table t1 add
  	(col1 number  default 0 null
       ,col2 number default 0 null
       ,col3 number default 0 null
       ,col4 number default 0 null
       ,col5 number default 0 null
       ,col6 number default 0 null
       ,col7 number default 0 null
      )
      /
Table altered.

Elapsed: 00:00:00.10

Bottom Line

It doesn’t matter what Oracle version you are running, the only features that you will be allowed to use are those enabled by the Oracle version you will set in the compatible parameter. Thus don’t be surprised to see new features not kicking in if you set the compatible mode to a prior release value where these intended features were not implemented yet.

March 17, 2018

Library cache lock

Filed under: Oracle — hourim @ 7:55 am

This is a very simple note, as much for my own documentation as anything else, showing how I have extremely rapidly identified what was causing a database application to wait on a library cache lock wait event.

The ASH of my customer case was showing the following list of predominant wait events:

select event, count(1)
from gv$active_session_history
where
    sample_time between to_date('09032018 00:46:00', 'ddmmyyyy hh24:mi:ss')
                and     to_date('09032018 10:44:00', 'ddmmyyyy hh24:mi:ss')
group by event
order by 2 desc;

EVENT                                  COUNT(1)
------------------------------------ ----------
library cache lock                       350174
library cache: mutex X                    54474
cursor: pin S wait on X                   10896
                                           2964

Naturally I wanted to know what sql_id is responsible of these library cache wait events first via ASH

select sql_id, count(1)
from gv$active_session_history
where
    sample_time between to_date('09032018 00:46:00', 'ddmmyyyy hh24:mi:ss')
                and     to_date('09032018 10:44:00', 'ddmmyyyy hh24:mi:ss')
and event in 
 ('library cache lock','library cache: mutex X','cursor: pin S wait on X')
group by sql_id
order by 2 desc;

SQL_ID            COUNT(1)
----------------- --------
6tcs65pchhp71       147902
9mqzzppbpa64j         7373
                         3	

And then via classical dbms_xplan to get the corresponding execution plan and v$sql to get the SQL text respectively:

SQL> select * from table(dbms_xplan.display_cursor('6tcs65pchhp71',null));

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------
SQL_ID: 6tcs65pchhp71 cannot be found
SQL> select sql_fulltext, executions, end_of_fetch_count
    from gv$sql
    where sql_id = '6tcs65pchhp71';

no rows selected

Let’s summarize: there is a sql_id which is responsible of a dramatic library cache wait event that I am monitoring at real time basis and which

  • has no execution plan in memory
  • and is not present in gv$sql

The above two points manifestly are symptoms of a sql query which hasn’t gone beyond the parse phase. In other words it might be a query which Oracle is not able to soft parse and thereby it has never reached the hard parse phase nor the execution step. Hopefully ASH can clearly show this:

select 
     in_parse
    ,in_hard_parse
    ,in_sql_execution
    ,count(1)
from gv$active_session_history
where
    sample_time between to_date('09032018 00:46:00', 'ddmmyyyy hh24:mi:ss')
                and     to_date('09032018 10:44:00', 'ddmmyyyy hh24:mi:ss')
and
   sql_id = '6tcs65pchhp71'
group by 
     in_parse
    ,in_hard_parse
    ,in_sql_execution
order by 4 desc;

I I I   COUNT(1)
- - - ----------
Y N N     162758
Y Y N        385

Indeed this query is almost always in the “parse” phase. So the initial question of what is causing this dramatic library cache lock turned to be : why this query is sticking at the parse phase?

I don’t know why I decided to look at the dba_hist_sqltext but it made my day:

SQL> select sql_text from dba_hist_sqltext where sql_id = '6tcs65pchhp71';

SQL_TEXT
--------------------------------------------------------------------------------
select col1, col2,...,coln from t1;

I was again lucky that this query has no where clause and no bind variable value to use so that I tried to run it as is and here what I found:

select col1, col2,...,coln 
from t1;
     *
ERROR at line 15:
ORA-00600 : internal error code, arguments: [qksvcGetGuardCol:2], [50574], [0], [],
[],[],[],[],[],[],[],[],[],

The multi-user concurrent system was repeating the same query several times and since this query was always failing during the parse phase we observed this library cache lock at the top of the wait events.

March 5, 2018

SEMI JOIN in modern relational databases

Filed under: Oracle — hourim @ 8:17 pm

A semi join returns rows only from the first table provided at least one matching row is found in the second table. The difference between a conventional join and a semi join is that rows from the first table are returned once at most even when the join column of the second table contains duplicate rows. There is no ANSI SQL way to invoke a semi join. The following syntax doesn’t exist:

  SQL>  select 
          t1.* 
        from t1 
          semi join t2 
        on t1.n1= t2.n1

In almost all modern relational databases, the semi join logical operation is triggered via the predicate IN, ANY, SOME, EXISTS and DISTINCT. MS-SQL Server has an extra particularity which makes it using a semi join during an INTERSECT operation as we will see later in this article.

The following queries illustrate these diferent semi-join cases in a 12.2.0.1 Oracle release:

SQL> create table t1
     as
        select
            rownum n1,
            mod(rownum,5) n2,
            trunc( (rownum - 1 / 3) ) n3
        from
            dual
        connect by
            level <= 1e4; SQL> create table t2
     as
        select
            rownum n1,
            mod(rownum,3) n2,
            trunc( (rownum - 1 / 5) ) n3
        from
            dual
        connect by
        level <= 5;

1. Using IN, ANY and SOME

SQL> select 
       t1.n1
     from
       t1
     where
       t1.n1 in (select t2.n1 from t2);

SQL> select 
       t1.n1
     from
       t1
     where
      t1.n1 = any (select t2.n1 from t2);

SQL> select 
       t1.n1
     from
       t1
     where
    t1.n1 = some (select t2.n1 from t2);
   
Plan hash value: 1275841967
----------------------------------------------------------------
| Id  | Operation            | Name | Starts | E-Rows | A-Rows |
----------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |      1 |        |      5 |
|*  1 |  HASH JOIN RIGHT SEMI|      |      1 |      5 |      5 |
|   2 |   TABLE ACCESS FULL  | T2   |      1 |      5 |      5 |
|   3 |   TABLE ACCESS FULL  | T1   |      1 |  10000 |  10000 |
----------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."N1"="T2"."N1")

2. Using EXISTS

select 
   t1.n1
 from
   t1
 where
   exists (select null from t2 where t2.n1 = t1.n1);

Plan hash value: 1275841967
----------------------------------------------------------------
| Id  | Operation            | Name | Starts | E-Rows | A-Rows |
----------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |      1 |        |      5 |
|*  1 |  HASH JOIN RIGHT SEMI|      |      1 |      5 |      5 |
|   2 |   TABLE ACCESS FULL  | T2   |      1 |      5 |      5 |
|   3 |   TABLE ACCESS FULL  | T1   |      1 |  10000 |  10000 |
----------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."N1"="T1"."N1")

3. Using DISTINCT in INNER join and the effect of the 12cR1 Partial join

select 
  distinct
   t1.n1
 from
   t1
 inner join t2
 on (t1.n1 = t2.n1);

Plan hash value: 3703458891
---------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |
---------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      5 |
|   1 |  HASH UNIQUE        |      |      1 |      5 |      5 |
|*  2 |   HASH JOIN SEMI    |      |      1 |      5 |      5 |
|   3 |    TABLE ACCESS FULL| T2   |      1 |      5 |      5 |
|   4 |    TABLE ACCESS FULL| T1   |      1 |  10000 |    256 | –- only 256 rows
---------------------------------------------------------------

Outline Data
-------------

  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('12.2.0.1')
      DB_VERSION('12.2.0.1')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$58A6D7F6")
      MERGE(@"SEL$1" >"SEL$2")
      OUTLINE(@"SEL$2")
      OUTLINE(@"SEL$1")
      FULL(@"SEL$58A6D7F6" "T2"@"SEL$1")
      FULL(@"SEL$58A6D7F6" "T1"@"SEL$1")
      LEADING(@"SEL$58A6D7F6" "T2"@"SEL$1" "T1"@"SEL$1")
      USE_HASH(@"SEL$58A6D7F6" "T1"@"SEL$1")
      USE_HASH_AGGREGATION(@"SEL$58A6D7F6")
      PARTIAL_JOIN(@"SEL$58A6D7F6" "T1"@"SEL$1")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   2 – access("T1"."N1"="T2"."N1")

The apparition of the semi join in the above execution plan is the result of the partial join transformation introduced by Oracle as from 12cR1. Earlier versions were not able to use this partial join transformation scanning, as such, the entire content of t1 table is scanned instead of tiny 256 rows as shown below:

select 
   /*+ NO_PARTIAL_JOIN(@"SEL$58A6D7F6" "T1"@"SEL$1") */
  distinct
   t1.n1
 from
   t1
 inner join t2
 on (t1.n1 = t2.n1);

Plan hash value: 975306333

---------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |
---------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      5 |
|   1 |  HASH UNIQUE        |      |      1 |      5 |      5 |
|*  2 |   HASH JOIN         |      |      1 |      5 |      5 |
|   3 |    TABLE ACCESS FULL| T2   |      1 |      5 |      5 |
|   4 |    TABLE ACCESS FULL| T1   |      1 |  10000 |  10000 | -- 10,000 rows
---------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("T1"."N1"="T2"."N1")

So far all the semi join logical operations have been executed via a HASH JOIN physical operation. This doesn’t mean that a semi joins can’t be used in a NESTED LOOPS physical operation. Here’s below the demonstratation:

SQL> create index t2_idx on t2(n1);
SQL> select 
       t1.n1
     from
       t1
     where
      t1.n1 in (select t2.n1 from t2);

----------------------------------------------------------------
| Id  | Operation          | Name   | Starts | E-Rows | A-Rows |
----------------------------------------------------------------
|   0 | SELECT STATEMENT   |        |      1 |        |      5 |
|   1 |  NESTED LOOPS SEMI |        |      1 |      5 |      5 |
|   2 |   TABLE ACCESS FULL| T1     |      1 |  10000 |  10000 |
|*  3 |   INDEX RANGE SCAN | T2_IDX |  10000 |      1 |      5 |
----------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"="T2"."N1")

4. SEMI join and uniqueness
I started this article by specifying that a semi join guaranties at most one row will be returned (per join column) from the left table even if there are duplicate rows in the join column of the second table. An example being worth a thousand words so here we go:

SQL> -- insert first a duplicate row in the rigth table
SQL> insert into t2 values(5,42,42);
SQL> commit;

-- first we will start using a conventional join
SQL> select 
   t1.n1
 from
   t1
 inner join t2
 on (t1.n1 = t2.n1);
        N1
----------
         1
         2
         3
         4
         5
         5
6 rows selected

----------------------------------------------------------------
| Id  | Operation          | Name   | Starts | E-Rows | A-Rows |
----------------------------------------------------------------
|   0 | SELECT STATEMENT   |        |      1 |        |      6 |
|   1 |  NESTED LOOPS      |        |      1 |      5 |      6 |
|   2 |   TABLE ACCESS FULL| T1     |      1 |  10000 |  10000 |
|*  3 |   INDEX RANGE SCAN | T2_IDX |  10000 |      1 |      6 |
----------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 – access("T1"."N1"="T2"."N1")

-- and now a semi join
SQL> select 
       t1.n1
     from
       t1
     where
      t1.n1 in (select t2.n1 from t2);

        N1
----------
         1
         2
         3
         4
         5

5 rows selected
----------------------------------------------------------------
| Id  | Operation          | Name   | Starts | E-Rows | A-Rows |
----------------------------------------------------------------
|   0 | SELECT STATEMENT   |        |      1 |        |      5 |
|   1 |  NESTED LOOPS SEMI |        |      1 |      5 |      5 |
|   2 |   TABLE ACCESS FULL| T1     |      1 |  10000 |  10000 |
|*  3 |   INDEX RANGE SCAN | T2_IDX |  10000 |      1 |      5 |
----------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"="T2"."N1")

It is for this reason that the semi join has been implemented: do not duplicate rows from the outer table when they multiple join with rows from the inner table.

Since now we know the semi join main goal we can infer that, as long as, Oracle is sure that the right table has no duplicate values in the join column, it will fall back to a simple inner join. Let’s check:

SQL> drop index t2_idx;
SQL> create unique index t2_uq_idx on t2(n1);
SQL> select
       t1.n1
     from
       t1
     where
       exists (select null from t2 where t2.n1 = t1.n1);

        N1
     ------
         1
         2
         3
         4
         5
-------------------------------------------------------------------
| Id  | Operation          | Name      | Starts | E-Rows | A-Rows |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |      1 |        |      5 |
|   1 |  NESTED LOOPS      |           |      1 |      5 |      5 |
|   2 |   TABLE ACCESS FULL| T1        |      1 |  10000 |  10000 |
|*  3 |   INDEX UNIQUE SCAN| T2_UQ_IDX |  10000 |      1 |      5 |
-------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T2"."N1"="T1"."N1")

Notice that we have no semi join anymore. Thanks to the unique index on t2(n1) table, Oracle knows that there are no duplicate rows in the join column of the outer table. It can then reliably use an inner join because rows from the outer table will at most join exactly once with rows from the inner table.

There exists another way to make Oracle falling to an inner join instead of a semi join even when there are duplicate rows in the join column of the outer table. In fact whenever Oracle uses an operation that eliminates duplicate rows from the outer table (for example sort unique or group by) it can confidently use an inner join in place of a semi join. Here’s below one case demontrated:

SQL> drop index t2_uq_idx;
SQL> insert into t2 values(5,42,42);

SQL> create unique index t1_uq_idx on t1(n1);

SQL> select
       t1.n1
     from
       t1
     where
       exists (select null from t2 where t2.n1 = t1.n1);

        N1
     ------
         1
         2
         3
         4
         5
--------------------------------------------------------------------
| Id  | Operation           | Name      | Starts | E-Rows | A-Rows |
--------------------------------------------------------------------
|   0 | SELECT STATEMENT    |           |      1 |        |      5 |
|   1 |  NESTED LOOPS       |           |      1 |      5 |      5 |
|   2 |   SORT UNIQUE       |           |      1 |      5 |      5 |
|   3 |    TABLE ACCESS FULL| T2        |      1 |      5 |      6 |
|*  4 |   INDEX UNIQUE SCAN | T1_UQ_IDX |      5 |      1 |      5 | -- 5 rows
--------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T2"."N1"="T1"."N1")

As you can see Oracle didn’t use a semi join because it has scanned all rows from the outer table, ordered them and eliminated duplicate rows via the SORT UNIQUE operation. As such when joining with the t1 table Oracle knows that it can join t2 rows at most exactly once. Notice in passing that while the semi join has full scanned the t1 table (10,000 rows) the inner join touched only 5 distinct index keys.

5. SEMI join and foreign key integrity

Suppose now that we want to select rows from a child table matching their parent rows in the parent table.

–- drop and recreate table t1 and t2
SQL> create index t1_idx on t1(n1);
SQL> alter table t1 add constraint t1_uq unique (n1);
SQL> alter table t2 add constraint t2_t1_fk foreign key (n1) references t1(n1);

The setup is now done. But let’s disable first the foreign key, the unique constraint and try a semi join query:

SQL> alter table t1 disable constraint t1_uq;
SQL> alter table t2 disable constraint t2_t1_fk;

SQL> select
        t2.n1
     from
        t2
     where
       exists (select null from t1 where t1.n1 = t2.n1);

-----------------------------------------------------------------------
|   Id  | Operation               | Name   | Starts | E-Rows | A-Rows |
-----------------------------------------------------------------------
|     0 | SELECT STATEMENT        |        |      1 |        |      5 |
|- *  1 |  HASH JOIN SEMI         |        |      1 |      5 |      5 |
|     2 |   NESTED LOOPS SEMI     |        |      1 |      5 |      5 |
|-    3 |    STATISTICS COLLECTOR |        |      1 |        |      5 |
|     4 |     TABLE ACCESS FULL   | T2     |      1 |      5 |      5 |
|  *  5 |    INDEX RANGE SCAN     | T1_IDX |      5 |  10000 |      5 |
|-    6 |   INDEX FAST FULL SCAN  | T1_IDX |      0 |  10000 |      0 |
-----------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."N1"="T2"."N1")
   5 - access("T1"."N1"="T2"."N1")

Note
-----
   - this is an adaptive plan (rows marked '-' are inactive)

As expected a classical semi join operation.

Let’s now enable the unique key of the parent table t1;

SQL> alter table t1 enable constraint t1_uq;

SQL> select
        t2.n1
     from
        t2
     where
       exists (select null from t1 where t1.n1 = t2.n1);

----------------------------------------------------------------
| Id  | Operation          | Name   | Starts | E-Rows | A-Rows |
----------------------------------------------------------------
|   0 | SELECT STATEMENT   |        |      1 |        |      5 |
|   1 |  NESTED LOOPS      |        |      1 |      5 |      5 |
|   2 |   TABLE ACCESS FULL| T2     |      1 |      5 |      5 |
|*  3 |   INDEX RANGE SCAN | T1_IDX |      5 |      1 |      5 |
----------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"="T2"."N1")

The semi join has gone because Oracle knows that there is a unique constraint on the join column from the outer table t1, it can then safely use an inner join.

Finally let’s enable the foreign key and re-execute the semi join query:

SQL> alter table t2 enable constraint t2_t1_fk;

SQL> select
        t2.n1
     from
        t2
     where
       exists (select null from t1 where t1.n1 = t2.n1);

-------------------------------------------------------------
| Id  | Operation         | Name | Starts | E-Rows | A-Rows |
-------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |      1 |        |      5 |
|*  1 |  TABLE ACCESS FULL| T2   |      1 |      5 |      5 |
-------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("T2"."N1" IS NOT NULL)

Oracle can use a trusted foreign key relationship to remove a logical semi-join and accesses only the t2 table.

5. SEMI join and INTERSECT

In SQL Server an intersect set operation can also be done via a semi-join as the following proves:

create table t1(n1 int, n2 int,n3 int) ;
create table t2(n1 int, n2 int, n3 int) ;

insert into t1(n1,n2,n3)
select top 10000
  row_number() over (order by a.name)
 ,row_number() over (order by a.name)%5
 ,(row_number() over (order by a.name) -1) /3
from sys.all_objects a cross join sys.all_objects;

insert into t2(n1,n2,n3)
select top 5
  row_number() over (order by a.name )
 ,row_number() over (order by a.name)%3
 ,(row_number() over (order by a.name)-1)/5
from sys.all_objects a cross join sys.all_objects;

select t2.n1 from t2 intersect select t1.n1 from t1;


But neither Oracle nor PostgreSQL use a semi join when evaluating an intersect set operation as the followings prove respectively:

–- Oracle 12cR2
SQL> select t2.n1 from t2 intersect select t1.n1 from t1;

---------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |
---------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      5 |
|   1 |  INTERSECTION       |      |      1 |        |      5 |
|   2 |   SORT UNIQUE       |      |      1 |      6 |      6 |
|   3 |    TABLE ACCESS FULL| T2   |      1 |      6 |      6 |
|   4 |   SORT UNIQUE       |      |      1 |  10000 |  10000 |
|   5 |    TABLE ACCESS FULL| T1   |      1 |  10000 |  10000 |
---------------------------------------------------------------
-- PostgreSQL 10.1
create table t1 as
  select 
     generate_series(1, 1e4) n1
	,generate_series(1, 1e4)%5 n2
	,(generate_series(1, 1e4)-1)/3 n3;
	
create table t2 as
  select 
     generate_series(1, 5) n1
	,generate_series(1, 5)%3 n2
	,(generate_series(1, 5)-1)/5 n3;
	
select t2.n1 from t2 intersect select t1.n1 from t1;


postgres=# explain analyze select t2.n1 from t2 intersect select t1.n1 from t1;
                                                           QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 HashSetOp Intersect  (cost=0.00..347.00 rows=200 width=36) (actual time=13.186..13.189 rows=5 loops=1)
   ->  Append  (cost=0.00..316.90 rows=12040 width=36) (actual time=0.363..9.571 rows=10005 loops=1)
         ->  Subquery Scan on "*SELECT* 1"  (cost=0.00..55.90 rows=2040 width=36) (actual time=0.362..0.369 rows=5 loops=1)
               ->  Seq Scan on t2  (cost=0.00..30.40 rows=2040 width=4) (actual time=0.050..0.053 rows=5 loops=1)
         ->  Subquery Scan on "*SELECT* 2"  (cost=0.00..261.00 rows=10000 width=9) (actual time=0.031..7.743 rows=10000 loops=1)
               ->  Seq Scan on t1  (cost=0.00..161.00 rows=10000 width=5) (actual time=0.029..4.478 rows=10000 loops=1)
 Planning time: 0.555 ms
 Execution time: 13.911 ms

6. SUMMARY

We have seen in this article how a semi join logical operation is triggered when using IN, ANY, SOME, EXISTS and DISTINCT in several modern relational databases. We’ve outlined as well that, in contrast to Oracle and PostgreSQL, SQL-Server can use a semi join during an INTERSECT set operation. We’ve demonstrated that while a regular joins can duplicate rows, semi joins do not produce any duplicates. They are defined to return (or not) rows only from one table of the join. We’ve also shown that, as long as the CBO is sure that the right table has no duplicate values in the join column, it will fall back to a simple inner join where extra transformations become possible and where the left table might not be fully scanned.

March 1, 2018

Diagnosing the past : CPU bound

Filed under: Oracle — hourim @ 6:24 pm

Few weeks ago I have been confronted to a weird real life system performance issue which, I believe is worth a sharing blog post. It started by a customer complaint about a dramatic slow in inserts of commands and products so that the application in its entirety appeared frozen. A very quick look-up over ASH shows no particular wait events or any new TOP SQL that I am not aware of. I know this application to such an extent that I can recognize its top queries via their sql_id. This is why my first impression was that there is no particular unusual wait event to point out except a high CPU consumption.

But the Load Profile of the corresponding one hour AWR report looks very odd:

Load Profile                    Per Second   Per Transaction  Per Exec  Per Call
~~~~~~~~~~~~~~~            ---------------   --------------- --------- ---------
             DB Time(s):              95.2               4.7      0.02     0.02
              DB CPU(s):               7.7               0.1      0.00     0.00
      Background CPU(s):               0.2               0.0      0.00     0.00
      Redo size (bytes):         439,216.3          18,666.4
  Logical read (blocks):       1,774,174.9          75,401,1
          Block changes:           2,465.0             104.8
 Physical read (blocks):           3,609.2             153.4
Physical write (blocks):             135.2               5.8
       Read IO requests:           2,436.8             103.6
      Write IO requests:              53.6               2.3
           Read IO (MB):              28.2               1.2
          Write IO (MB):               1.1               0.0
           IM scan rows:               0.0               0.0
Session Logical Read IM:               
             User calls:           5,423.4             230.5
           Parses (SQL):             244.4              10.4
      Hard parses (SQL):               0.5               0.0
     SQL Work Area (MB):              28.3               1.2
                 Logons:               0.7               0.0
         Executes (SQL):           4,470.6             190.0
              Rollbacks:               0.2               0.0
           Transactions:              23.5

Have you ever seen such a dramatic Load Profile? The foreground sessions spend collectively 95 seconds of time for each wall clock second. All those sessions were either actively working or waiting. They burned 7.7 of CPU per wall clock second. For a machine of 24 cores this seems to be a CPU constrained application. The overall workload is generating 1,77 million of logical blocks read per second which represent a rate of approximatively 6GB of logical blocks read per hour. And only a tiny percentage of those block reads are changed (2,465/1,774,174 = 0,13%).

I was really intrigued by this abnormal situation since I didn’t spot any unusual wait event or SQL query. While I was drilling down the different sections of the AWR report, cross-checking properly the performance key indicators they deliver, something in the TOP 10 Foreground Events captured my attention:

Top 10 Foreground Events by Total Wait Time
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                           Total Wait      Avg   % DB    Wait
Event                                Waits Time (sec)    Wait(ms)  time Class
------------------------------ ----------- ---------- --------- ------ --------
db file sequential read          2,813,888        32K     11.37    9.3 User I/O
DB CPU                                          27,9K              8.1
db file parallel read              278,494      10,8K     38.67    3.1 User I/O
read by other session              444,473       4785     10.77    1.4 User I/O
direct path read                   311,471     4718,6     15.15    1.4 User I/O
gc cr block 2-way                  577,081     2366,4      4.10     .7 Cluster
log file sync                       89,040     2254,7     25.32     .7 Commit
db file scattered read              138,798    2136,8     15.40     .6 User I/O
gc current block 2-way              368,907    1994,1      5.41     .6 Cluster
gc buffer busy acquire              329,900    1776,8      5.39     .5 Cluster
  • The % DB time column doesn’t sum up to 100% (or near this number) totalling only 26,4%
  • The disk is very slow servicing the physical I/O and background events with a dramatic average time wait.

When the %DB time column doesn’t add up to nearly 100% it generally indicates a symptom of a CPU bounded application. Indeed this is confirmed a little bit upward in the same AWR report via the HOST CPU section:

Host CPU
~~~~~~~~                  Load Average
 CPUs Cores Sockets     Begin       End     %User   %System  %WIO  %Idle
----- ----- ------- --------- --------- --------- --------- ------ -----
   48    24       4     55.57   219.13       92.0       2.3    2.3   5.0

The application was already over CPU bounded at the beginning (55) and completely frozen at the end of the load (219 CPU). For a machine with 24 cores, the DB CPU load is indeed very high

 DB CPU Load = DB CPU time/Elapsed Time
            = 27900/3600 = 7.7

And so is the DB time load


              Snap Id      Snap Time      Sessions Curs/Sess  Instance
            --------- ------------------- -------- --------- ---------
Begin Snap:     28812 06-Feb-18 13:00:01       365      12.5         2
  End Snap:     28813 06-Feb-18 14:00:15       421      12.5         2
   Elapsed:               60.23 (mins)
   DB Time:            5,734.18 (mins)

DB time Load = DB time/Elapsed Time
              = 5734.1/60.23 = 95.2

So what else?

I asked the storage team to check the performance of the underlying disk. The confirmation of the dramatic disk latency has been obviously confirmed by the storage team. But the fundamental and legitimate question is what has changed per regards to the yesterday normal functioning of the application that might explain this performance deterioration?

While I was trying to figure out a solution I asked a help from a DBA who is the primary owner of another critical application. To my surprise this DBA told me that one of his applications is also suffering from similar performance deterioration and that this application is hosted by the same server as the one hosting the application I am trying to troubleshoot. What a coincidence.

A couple of minutes later we found that a critical batch has been triggered in the other DBA application. This batch was extensively executing a parallel query using a very high degree of parallelism (DOP) which was consuming the whole SAN bandwidth. Needless to say that when this batch job has been stopped the initial problem I was trying to fix vanishes by itself and the database returned to its normal functioning

Bottom Line

When you are troubleshooting an application which seems to be completely frozen and where the disk is very badly servicing the database and where the DB time % column of the AWR Top 10 foreground Events section is not summing up to 100% then you are probably CPU bounded. In such case, don’t forget to consider, among the panoply of your check list, whether others applications hosted in the same server are not stressing the disk with very aggressive parallel queries.

« Previous PageNext Page »

Create a free website or blog at WordPress.com.

Tony's Oracle Tips

Tony Hasler's light hearted approach to learning about Oracle

Richard Foote's Oracle Blog

Focusing Specifically On Oracle Indexes, Database Administration and Some Great Music

Hatem Mahmoud Oracle's blog

Just another Oracle blog : Database topics and techniques

Mohamed Houri’s Oracle Notes

Qui se conçoit bien s’énonce clairement

Oracle Diagnostician

Performance troubleshooting as exact science

Raheel's Blog

Things I have learnt as Oracle DBA

Coskan's Approach to Oracle

What I learned about Oracle

So Many Oracle Manuals, So Little Time

“Books to the ceiling, Books to the sky, My pile of books is a mile high. How I love them! How I need them! I'll have a long beard by the time I read them”—Lobel, Arnold. Whiskers and Rhymes. William Morrow & Co, 1988.

Carlos Sierra's Tools and Tips

Tools and Tips for Oracle Performance and SQL Tuning

Oracle Scratchpad

Just another Oracle weblog

OraStory

Dominic Brooks on Oracle Performance, Tuning, Data Quality & Sensible Design ... (Now with added Sets Appeal)