Mohamed Houri’s Oracle Notes

June 21, 2018

SQL Monitor and parsing

Filed under: SQL Monitor — hourim @ 6:32 pm

In part I of this short series on SQL Monitoring Duration and Elapsed time information we saw that in the case of a query fetching data to the client, we have better to rely on the former information rather than on the latter. Indeed, the Elapsed time of the SQL monitor Global Stats section doesn’t keep count of the time spent passing the data to the client. Piotr Wrzosek, Tanel Poder and Robert have emphasized that, while Duration represents the wall clock time, Elapsed time in SQL monitor refers to the DB time measured for non-idle sessions that are either waiting or actively working on CPU. This is why time spent outside the database by idle sessions is not measured by the SQL monitor Elapsed time information.

In part 2 we will be trying to demonstrate that, all things being equal, the SQL monitor kicks in only during the query execution phase. Any preliminary steps occurring between the launch of the query and its execution phase will not be monitored. Typically, time spent in the hard parse phase, will appear neither in the Duration nor in the Elapsed time of the SQL monitor report.

Here’s below a quick demonstration

Parsing

I started looking around on how to produce a sufficiently long hard parse situation within a single query. Very quickly few ideas germinated in my mind like extreme Adaptive Cursor Sharing, or several SQL Plan Management baselined execution plans which Oracle has to reproduce and cost before selecting the best costed plan. But I finally ended up by using a simple join between two tables of 3 million rows each but with an imense IN-LIST predicate:

create table t1
as select
    rownum n1
   ,trunc((rownum-1)/3) n2
   ,mod(rownum,10) n3
from dual
connect by level <= 3e6;


create table t2
as select
    2*rownum n1
   ,trunc(rownum,5) n2
   ,lpad('x',5) vc
from dual
connect by level <= 3e6;

alter table t1 add constraint t1_pk primary key (n1);
alter table t2 add constraint t2_pk primary key (n1);

select /*+ monitor */
     t1.n1
    ,t1.n2
    ,t1.n3
from
     t1
where
     t1.n1 in
           (select t2.n2
           from t2
           where
                (
                 (
                  (
                   (
                          (t2.n1 in (33183, 31917, 31405)
                           or t2.n1 in (33383, 33917, 38405)
                           or t2.n1 in (150, 234, 456)
                           or t2.n1 in (309, 33456, 675)
                           or t2.n1 in (10, 2345, 42367)
                           or 
                           or
                           etc..
                    )
                   )
                  )
                 )
		   )
	     );

Just before launching this query I actived the 10046 trace event as shown below:


SQL> set timing on

SQL> @46on 12

SQL>@MyLongParsingQuery

        N1         N2         N3
---------- ---------- ----------
         5          1          5
        75         24          5
       117         38          7
       228         75          8
     16728       5575          8

Elapsed: 00:00:14.39

SQL> @46off

As you can see the query took 14 seconds wall clock time to complete. And you have to believe me when I say that the same query when executed without tracing lasted 14 seconds too.

Since the query execution phase is super-fast (less than 5 seconds) and is run serially, it will not be automatically monitored. This is why I forced it to be monitored using the hint /*+ monitor */. Here is below the corresponding SQL monitor report (reduced to the Global Information and Global Stats sections):

Global Information
------------------------------
 Status              :  DONE ( ALL ROWS)             
 Instance ID         :  1                     
 SQL ID              :  1w1jpyxgk0asj                    
 Execution Started   :  06/08/2018 12:16:31   
 First Refresh Time  :  06/08/2018 12:16:31  
 Last Refresh Time   :  06/08/2018 12:16:32   
 Duration            :  1s                  
                 
Global Stats
=================================================
| Elapsed |   Cpu   |  IO      | Fetch | Buffer |
| Time(s) | Time(s) | Waits(s) | Calls |  Gets  |
=================================================
|  0.08   |  0.02   |   0.06   |   2   |   542  |
=================================================

It is interesting to note that both Duration and Elapsed time are not showing the correct time of 14 seconds taken by the query.

However the TKPROF of the query 10046 trace file generated above is clearly showing where most of those 14 seconds have been spent:

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1     12.89      14.22          0          0          3           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        2      0.02       0.08         40        542          0           4
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        4     12.91      14.30         40        542          3           4

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  PGA memory operation                         1179        0.00          0.01
  acknowledge over PGA limit                    117        0.10          1.30
  SQL*Net message to client                       2        0.00          0.00
  db file sequential read                        40        0.01          0.06
  SQL*Net message from client                     2        0.00          0.01
********************************************************************************

Thanks to the above TKPROF lines we now have a clear image of the query execution time distribution:

 elapsed time    = 14.30 seconds (of which 14.22 for parsing)
 cpu consumption = 12.91 seconds (of which 12.89 for parsing)
 wait events     = 0.01 + 1.30 + 0.00 + 0.06 + 0.01 = 1.38 seconds
 elapsed -cpu    = 14.30 -12.91 = 1.39 seconds

Summary

In this second part of the series about Duration and Elapsed time information displayed by the SQL monitor report, we have demonstrated the following two points:

• SQL monitor monitors only the execution phase of the query
• Any preliminary steps preceding the execution phase will not be considered by the SQL monitor. Typically, time taken by a long hard parsing situation, will not show up neither in the Duration nor in the Elapsed time.

Advertisements

June 9, 2018

ASH, SQL Monitor and Extended trace

Filed under: ASH,SQL Monitor — hourim @ 3:21 pm

Without any analogy with Sergio Leone’s famous movie “The Good, the Bad and the Ugly” which has brought up my teenage years, I would rather define the above three Oracle features as “The Good, the Good and the Good” as they made my consultancy days very often. This article has been inspired by the last Nenad Noveljic blog post about negative unaccounted-for where he mentioned that the SQL Monitor Global Stats section might show a wrong value in the Elapsed time. In Nenad’s case this misleading piece of information was due to unaccounted-for time. In fact, I know three other situations where this time discrepancy might occur: when fetching data, during a long hard parsing query and when using parallel execution. In this article I will start by demonstrating the fetching data case. Two other articles for the parse and parallel cases will follow to complete the series.

Fetching data

I executed a query which takes few minutes to complete but spent a considerable amount of time fetching a large number of columns with quite a large number of rows. While the query was running I took several snapshots of its corresponding 12cR2 SQL monitor which I am presenting below:

Global Information
------------------------------
 Status              :  EXECUTING             
 Instance ID         :  1                     
 SQL ID              :  g6px76dmjv1jy                    
 Execution Started   :  06/05/2018 09:54:19   
 First Refresh Time  :  06/05/2018 09:54:25  
 Last Refresh Time   :  06/05/2018 09:54:29   
 Duration            :  11s                  
                 
Global Stats
=========================================
| Elapsed |   Cpu   |  IO      | Buffer |
| Time(s) | Time(s) | Waits(s) |  Gets  |
=========================================
|    10   |    8.36 |     1.34 |   2M   |
=========================================

SQL Plan Monitoring Details (Plan Hash Value=2278376452)
===========================================================================
| Id    |                 Operation |       Name       | Execs |   Rows   |
|       |                           |                  |       | (Actual) |
===========================================================================
|     0 | SELECT STATEMENT          |                  |     1 |          |
|     1 |   NESTED LOOPS OUTER      |                  |     1 |          |
|     2 |    NESTED LOOPS           |                  |     1 |          |
|     3 |     FILTER                |                  |     1 |          |

As you can see, both Elapsed time and Duration are showing the same time of 10 seconds. But spot as well that at this stage of the query execution, data was not yet sent to the client (Actual Rows column is empty).

At exactly the same moment, ASH is also showing approximately the same elapsed time as shown below:

column  cnt   format 9999
compute sum label 'Total Elapsed' of cnt on report

break   on report

select 
   sql_plan_line_id
  ,count(1) cnt
from
   gv$active_session_history
 where
    sample_time between to_date('&datefrom', 'mm/dd/yyyy hh24:mi:ss')
				and     to_date('&dateto', 'mm/dd/yyyy hh24:mi:ss')
and sql_id = '&sql_id'
group by sql_plan_line_id
order by 2 desc;

SQL> @ashplanlineId
Enter value for datefrom:  06/05/2018 09:54:19
Enter value for dateto:  06/05/2018 09:54:29
Enter value for sql_id: g6px76dmjv1jy


SQL_PLAN_LINE_ID   CNT
---------------- -----
              10    7
              13    1
              11    1
                 -----
Total Elapsed       9

However, as soon as Oracle has started sending data to the client, a discrepancy between Elapsed time in Global Stats section and Duration began to be displayed as shown below:

Global Information
------------------------------
 Status              :  EXECUTING             
 Instance ID         :  1                     
 SQL ID              :  g6px76dmjv1jy                    
 Execution Started   :  06/05/2018 09:54:19   
 First Refresh Time  :  06/05/2018 09:54:25  
 Last Refresh Time   :  06/05/2018 09:56:18   
 Duration            :  120s                  
 Fetch Calls         :  375            
    
Global Stats
=================================================
| Elapsed |   Cpu   |  IO      | Fetch | Buffer |
| Time(s) | Time(s) | Waits(s) | Calls |  Gets  |
=================================================
|    32   |    11   |     21   |   375 |   3M   |
=================================================

SQL Plan Monitoring Details (Plan Hash Value=2278376452)
===========================================================================
| Id    |                 Operation |       Name       | Start |   Rows   |
|       |                           |                  | Active| (Actual) |
===========================================================================
| ->  0 | SELECT STATEMENT          |                  |   +30 |     5611 |
| ->  1 |   NESTED LOOPS OUTER      |                  |   +30 |     5611 |
| ->  2 |    NESTED LOOPS           |                  |   +30 |     5611 |
| ->  3 |     FILTER                |                  |   +30 |          |

ASH, in accordance with SQL monitor Elapsed time, started also to report a wrong elapsed time as shown below:

SQL> @ashplanlineId
Enter value for datefrom:  06/05/2018 09:54:19
Enter value for dateto:  06/05/2018 09:56:18
Enter value for sql_id: g6px76dmjv1jy


SQL_PLAN_LINE_ID   CNT
---------------- -----
              13    25
              10     7
              4      1
              4      1
                 -----
Total Elapsed       34

And so on until the query has finished fetching the entire set of data in about 527 seconds while both ASH and Elapsed time of the SQL monitor section are showing roughly 34 seconds of elapsed time as shown below respectively:

Global Information
------------------------------
 Status              :  DONE (ALL ROWS)             
 Instance ID         :  1                     
 SQL ID              :  g6px76dmjv1jy                    
 Execution Started   :  06/05/2018 09:54:19   
 First Refresh Time  :  06/05/2018 09:54:25  
 Last Refresh Time   :  06/05/2018 10:03:06   
 Duration            :  527s                  
 Fetch Calls         :  2085            
    
Global Stats
=================================================
| Elapsed |   Cpu   |  IO      | Fetch | Buffer |
| Time(s) | Time(s) | Waits(s) | Calls |  Gets  |
=================================================
|    34   |    12   |     22   |  2085 |   3M   |
=================================================

SQL> @ashplanlineId
Enter value for datefrom:  06/05/2018 09:54:19
Enter value for dateto:  06/05/2018 09:56:18
Enter value for sql_id: g6px76dmjv1jy


SQL_PLAN_LINE_ID   CNT
---------------- -----
              13    26
              10     7
               4     1
              11     1
               5     1
                 -----
Total Elapsed       36

Using Nenad query, I can also report the same wrong elapsed time from ASH.

column  event format a30
column  cnt   format 9999
compute sum label 'Total Elapsed' of cnt on report

break   on report
select 
   session_state, event
  ,count(1) cnt
from
   gv$active_session_history
 where
   sql_id = '&sql_id'
 and sql_exec_id = (select max(sql_exec_id) 
                    from gv$active_session_history
					where sql_id = '&sql_id'
					)
group by event, session_state
;
SESSION EVENT                            CNT
------- ------------------------------ -----
WAITING direct path read                  26
ON CPU                                    11
                                       -----
Total Elapsed                             37

TKPROF

In a second attempt to make this time difference crystal clear I traced another execution of the same query using the 10046 trace events. I reproduced below the corresponding SQL monitor report and TKPROF summary respectively:

Global Information
------------------------------
 Status              :  DONE (ALL ROWS)             
 Instance ID         :  1                     
 SQL ID              :  g6px76dmjv1jy                    
 Execution Started   :  06/05/2018 09:27:23   
 First Refresh Time  :  06/05/2018 09:27:30  
 Last Refresh Time   :  06/05/2018 09:39:21   
 Duration            :  718s                  
 Fetch Calls         :  2085            
    
Global Stats
=================================================
| Elapsed |   Cpu   |  IO      | Fetch | Buffer |
| Time(s) | Time(s) | Waits(s) | Calls |  Gets  |
=================================================
|    198  |    15   |     183  |  2085 |   3M   |
=================================================


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.21       0.21          0          0          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch     2085     15.18     198.03     647106    2918455          0       31246
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total     2087     15.39     198.24     647106    2918455          0       31246


Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                    2085        0.00          0.00
  SQL*Net more data to client                     2        0.00          0.00
  Disk file operations I/O                       90        0.00          0.00
  db file sequential read                     30313        0.09        158.42
  reliable message                                1        0.00          0.00  
  enq: KO – fast object checkpoint                1        0.00          0.00
  direct path read                             1220        0.09         24.76
  SQL*Net message from client                  2085        0.93        519.45
********************************************************************************

As you can see, the Duration of the SQL monitor is showing that the query took 718 seconds while the Elapsed time is showing only 198 seconds (15 sec for CPU and 183 sec for IO waits).
However, the TKPROF file comforts the reliability of the SQL monitor Duration information since it shows that the total elapsed time of the query is:

158.42 + 24.76     = 183.18 seconds for I/O waits
158.42 + 519.45    = 702.63 seconds for I/O waits and  SQL*Net message from client
702.63 + 15.39(cpu)= 718.02 seconds 

Naturally, ASH can’t display the correct piece of information since a large part of the query has been spent on the idle SQL*Net message from client wait event:

SQL> @ashplanlineId
Enter value for datefrom:  06/05/2018 09:27:23
Enter value for dateto:  06/05/2018 09:39:21
Enter value for sql_id: g6px76dmjv1jy

SQL_PLAN_LINE_ID   CNT
---------------- -----
              12   153
              13    28
              10     9
              11     6
              14     1
                 -----
Total Elapsed      197

Summary

We have clearly shown that when a query is fetching a large amount of data to the client, this transfer time is not taken into account by the Elapsed time of the SQL monitor Global Stats section. This is why, as mentioned by Nenad in his article, you should always rely on the time reported by the Duration in the Global information of the SQL Monitor instead of the Elapsed time of the Global stats section.

Blog at WordPress.com.

Tony's Oracle Tips

Tony Hasler's light hearted approach to learning about Oracle

Richard Foote's Oracle Blog

Focusing Specifically On Oracle Indexes, Database Administration and Some Great Music

Hatem Mahmoud Oracle's blog

Just another Oracle blog : Database topics and techniques

Mohamed Houri’s Oracle Notes

Qui se conçoit bien s’énonce clairement

Oracle Diagnostician

Performance troubleshooting as exact science

Raheel's Blog

Things I have learnt as Oracle DBA

Coskan's Approach to Oracle

What I learned about Oracle

So Many Oracle Manuals, So Little Time

“Books to the ceiling, Books to the sky, My pile of books is a mile high. How I love them! How I need them! I'll have a long beard by the time I read them”—Lobel, Arnold. Whiskers and Rhymes. William Morrow & Co, 1988.

EU Careers info

Your career in the European Union

Carlos Sierra's Tools and Tips

Tools and Tips for Oracle Performance and SQL Tuning

Oracle Scratchpad

Just another Oracle weblog

OraStory

Dominic Brooks on Oracle Performance, Tuning, Data Quality & Sensible Design ... (Now with added Sets Appeal)