Without drawing any analogy with Sergio Leone's famous movie "The Good, the Bad and the Ugly", which marked my teenage years, I would rather define the above three Oracle features as "The Good, the Good and the Good", as they have very often made my consultancy days. This article was inspired by Nenad Noveljic's latest blog post about negative unaccounted-for time, where he mentioned that the Elapsed time in the SQL Monitor Global Stats section might show a wrong value. In Nenad's case this misleading piece of information was due to unaccounted-for time. In fact, I know of three other situations where this time discrepancy can occur: when fetching data, during a long hard parse, and when using parallel execution. In this article I will start by demonstrating the data-fetching case. Two other articles, covering the parse and parallel cases, will follow to complete the series.
Fetching data
I executed a query which takes a few minutes to complete but spends a considerable amount of time fetching a large number of columns over quite a large number of rows. While the query was running I took several snapshots of its corresponding 12cR2 SQL monitor report, which I present below:
Global Information
------------------------------
 Status              :  EXECUTING
 Instance ID         :  1
 SQL ID              :  g6px76dmjv1jy
 Execution Started   :  06/05/2018 09:54:19
 First Refresh Time  :  06/05/2018 09:54:25
 Last Refresh Time   :  06/05/2018 09:54:29
 Duration            :  11s

Global Stats
=========================================
| Elapsed |   Cpu   |    IO    | Buffer |
| Time(s) | Time(s) | Waits(s) |  Gets  |
=========================================
|      10 |    8.36 |     1.34 |     2M |
=========================================

SQL Plan Monitoring Details (Plan Hash Value=2278376452)
===========================================================================
| Id |      Operation       | Name | Execs |   Rows   |
|    |                      |      |       | (Actual) |
===========================================================================
|  0 | SELECT STATEMENT     |      |     1 |          |
|  1 |  NESTED LOOPS OUTER  |      |     1 |          |
|  2 |   NESTED LOOPS       |      |     1 |          |
|  3 |    FILTER            |      |     1 |          |
As you can see, both the Elapsed time and the Duration show the same value of 10 seconds. Note as well that at this stage of the execution, no data had yet been sent to the client (the Actual Rows column is empty).
At exactly the same moment, ASH was also reporting approximately the same elapsed time:
column cnt format 9999
compute sum label 'Total Elapsed' of cnt on report
break on report

select sql_plan_line_id
      ,count(1) cnt
from gv$active_session_history
where sample_time between to_date('&datefrom', 'mm/dd/yyyy hh24:mi:ss')
                      and to_date('&dateto', 'mm/dd/yyyy hh24:mi:ss')
and sql_id = '&sql_id'
group by sql_plan_line_id
order by 2 desc;

SQL> @ashplanlineId
Enter value for datefrom: 06/05/2018 09:54:19
Enter value for dateto: 06/05/2018 09:54:29
Enter value for sql_id: g6px76dmjv1jy

SQL_PLAN_LINE_ID   CNT
---------------- -----
              10     7
              13     1
              11     1
                 -----
Total Elapsed        9
However, as soon as Oracle started sending data to the client, a discrepancy between the Elapsed time in the Global Stats section and the Duration began to appear, as shown below:
Global Information
------------------------------
 Status              :  EXECUTING
 Instance ID         :  1
 SQL ID              :  g6px76dmjv1jy
 Execution Started   :  06/05/2018 09:54:19
 First Refresh Time  :  06/05/2018 09:54:25
 Last Refresh Time   :  06/05/2018 09:56:18
 Duration            :  120s
 Fetch Calls         :  375

Global Stats
=================================================
| Elapsed |   Cpu   |    IO    | Fetch | Buffer |
| Time(s) | Time(s) | Waits(s) | Calls |  Gets  |
=================================================
|      32 |      11 |       21 |   375 |     3M |
=================================================

SQL Plan Monitoring Details (Plan Hash Value=2278376452)
===========================================================================
|   Id |      Operation       | Name | Start  |   Rows   |
|      |                      |      | Active | (Actual) |
===========================================================================
| -> 0 | SELECT STATEMENT     |      |    +30 |     5611 |
| -> 1 |  NESTED LOOPS OUTER  |      |    +30 |     5611 |
| -> 2 |   NESTED LOOPS       |      |    +30 |     5611 |
| -> 3 |    FILTER            |      |    +30 |          |
ASH, in line with the SQL monitor Elapsed time, also started to report a misleading elapsed time, as shown below:
SQL> @ashplanlineId
Enter value for datefrom: 06/05/2018 09:54:19
Enter value for dateto: 06/05/2018 09:56:18
Enter value for sql_id: g6px76dmjv1jy

SQL_PLAN_LINE_ID   CNT
---------------- -----
              13    25
              10     7
               4     1
               4     1
                 -----
Total Elapsed       34
And so on, until the query finished fetching the entire data set after about 527 seconds, while both ASH and the Elapsed time of the SQL monitor Global Stats section were still showing roughly 34 seconds, as shown below respectively:
Global Information
------------------------------
 Status              :  DONE (ALL ROWS)
 Instance ID         :  1
 SQL ID              :  g6px76dmjv1jy
 Execution Started   :  06/05/2018 09:54:19
 First Refresh Time  :  06/05/2018 09:54:25
 Last Refresh Time   :  06/05/2018 10:03:06
 Duration            :  527s
 Fetch Calls         :  2085

Global Stats
=================================================
| Elapsed |   Cpu   |    IO    | Fetch | Buffer |
| Time(s) | Time(s) | Waits(s) | Calls |  Gets  |
=================================================
|      34 |      12 |       22 |  2085 |     3M |
=================================================

SQL> @ashplanlineId
Enter value for datefrom: 06/05/2018 09:54:19
Enter value for dateto: 06/05/2018 09:56:18
Enter value for sql_id: g6px76dmjv1jy

SQL_PLAN_LINE_ID   CNT
---------------- -----
              13    26
              10     7
               4     1
              11     1
               5     1
                 -----
Total Elapsed       36
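A quick back-of-the-envelope calculation (a Python sketch replaying the figures reported above) shows how much of the run is invisible to the Global Stats section:

```python
# Figures copied from the final SQL monitor snapshot above
duration_s = 527  # Global Information -> Duration (wall-clock time)
elapsed_s = 34    # Global Stats -> Elapsed Time (time spent inside the database)

# The difference is the time spent outside the database,
# essentially shipping rows to the client over SQL*Net
unaccounted_s = duration_s - elapsed_s
print(unaccounted_s)  # 493 seconds, i.e. more than 90% of the run
```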
Using Nenad's query, I can also report the same misleading elapsed time from ASH:
column event format a30
column cnt format 9999
compute sum label 'Total Elapsed' of cnt on report
break on report

select session_state, event
      ,count(1) cnt
from gv$active_session_history
where sql_id = '&sql_id'
and sql_exec_id = (select max(sql_exec_id)
                   from gv$active_session_history
                   where sql_id = '&sql_id')
group by event, session_state;

SESSION EVENT                            CNT
------- ------------------------------ -----
WAITING direct path read                  26
ON CPU                                    11
                                       -----
Total Elapsed                             37
TKPROF
In a second attempt to make this time difference crystal clear, I traced another execution of the same query using the 10046 trace event. The corresponding SQL monitor report and TKPROF summary are reproduced below:
Global Information
------------------------------
 Status              :  DONE (ALL ROWS)
 Instance ID         :  1
 SQL ID              :  g6px76dmjv1jy
 Execution Started   :  06/05/2018 09:27:23
 First Refresh Time  :  06/05/2018 09:27:30
 Last Refresh Time   :  06/05/2018 09:39:21
 Duration            :  718s
 Fetch Calls         :  2085

Global Stats
=================================================
| Elapsed |   Cpu   |    IO    | Fetch | Buffer |
| Time(s) | Time(s) | Waits(s) | Calls |  Gets  |
=================================================
|     198 |      15 |      183 |  2085 |     3M |
=================================================

call     count       cpu    elapsed       disk      query    current       rows
------- ------  -------- ---------- ---------- ---------- ---------- ----------
Parse        1      0.21       0.21          0          0          0          0
Execute      1      0.00       0.00          0          0          0          0
Fetch     2085     15.18     198.03     647106    2918455          0      31246
------- ------  -------- ---------- ---------- ---------- ---------- ----------
total     2087     15.39     198.24     647106    2918455          0      31246

Elapsed times include waiting on following events:
  Event waited on                           Times   Max. Wait  Total Waited
  ----------------------------------------  Waited  ----------  ------------
  SQL*Net message to client                   2085        0.00          0.00
  SQL*Net more data to client                    2        0.00          0.00
  Disk file operations I/O                      90        0.00          0.00
  db file sequential read                    30313        0.09        158.42
  reliable message                               1        0.00          0.00
  enq: KO - fast object checkpoint               1        0.00          0.00
  direct path read                            1220        0.09         24.76
  SQL*Net message from client                 2085        0.93        519.45
********************************************************************************
As you can see, the SQL monitor Duration shows that the query took 718 seconds, while the Elapsed time shows only 198 seconds (15 seconds of CPU and 183 seconds of I/O waits).
The TKPROF file, however, confirms the reliability of the SQL monitor Duration, since it shows that the total elapsed time of the query breaks down as follows:
158.42 + 24.76  = 183.18 seconds of I/O waits
183.18 + 519.45 = 702.63 seconds of I/O waits plus SQL*Net message from client
702.63 + 15.39 (CPU) = 718.02 seconds
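These figures can be cross-checked programmatically; the following Python snippet simply replays the arithmetic with the values copied from the TKPROF summary:

```python
# Wait and CPU times taken from the TKPROF summary
db_file_sequential_read = 158.42
direct_path_read = 24.76
sqlnet_msg_from_client = 519.45  # idle wait, excluded from Elapsed Time
cpu = 15.39

io_waits = db_file_sequential_read + direct_path_read
print(round(io_waits, 2))                                 # 183.18 -> IO Waits(s)
print(round(io_waits + sqlnet_msg_from_client, 2))        # 702.63
print(round(io_waits + sqlnet_msg_from_client + cpu, 2))  # 718.02 -> Duration
```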
Naturally, ASH cannot show the correct total either, since a large part of the query time was spent on the idle SQL*Net message from client wait event:
SQL> @ashplanlineId
Enter value for datefrom: 06/05/2018 09:27:23
Enter value for dateto: 06/05/2018 09:39:21
Enter value for sql_id: g6px76dmjv1jy

SQL_PLAN_LINE_ID   CNT
---------------- -----
              12   153
              13    28
              10     9
              11     6
              14     1
                 -----
Total Elapsed      197
Summary
We have clearly shown that when a query fetches a large amount of data to the client, this transfer time is not taken into account by the Elapsed time of the SQL monitor Global Stats section. This is why, as mentioned by Nenad in his article, you should always rely on the Duration reported in the Global Information section of the SQL monitor rather than on the Elapsed time of the Global Stats section.
Thanks Mohamed for sharing
Comment by Yahya Mezroui — June 9, 2018 @ 8:55 pm |
Hi Mohamed
I met the exact same fetch “issue” 5 days ago.
In SQL Monitor report Duration was around 1500 sec and Elapsed Time was around 500s.
Then I checked DBA_HIST_ACTIVE_SESS_HISTORY (I call it DASH) and saw that the bulk of the processing – the final 1000s – was missing from DASH almost completely. In fact there was only one record from that period, which clearly showed that data fetching was happening. At that very moment the fetching activity was on CPU.
However, there is no problem with the Elapsed Time information. As my very wise colleague Havasi Zoli pointed out, Elapsed Time for an activity is DB Time for the database. DB Time measures the (foreground) database activity. When the database waits for the client to request the next array of records, the database server process is idle, and Elapsed Time is not incremented during this wait. The client may be processing the data it received with the previous fetch. The network round trip between the server sending the data and receiving the next fetch request, plus the client processing in between, is simply not part of the Elapsed Time.
This matches nicely with the fact that no ASH record is recorded for such idle waits.
So the conclusion is: Duration tells us the wall-clock time, Elapsed Time tells us the DB Time. If there is a big gap between the two, investigate the network, the client processing, or anything else that is not part of the DB Time.
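This rule of thumb can be turned into a quick sanity check against the numbers from the article (the variable names below are mine, not Oracle's):

```python
# Wall-clock Duration = DB Time + time spent outside the database
def duration(db_time_s, outside_db_s):
    """Duration as seen in the SQL monitor Global Information section."""
    return db_time_s + outside_db_s

# TKPROF figures from the article: total elapsed (DB Time) = 198.24s,
# SQL*Net message from client (idle, outside the database) = 519.45s
print(round(duration(198.24, 519.45)))  # 718, matching the reported 718s Duration
```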
Note that the same is valid for the AWR reports. SQL ordered by Elapsed Time is not SQL ordered by Duration. (By the way we should have a section like this in AWR).
In your example:
Elapsed Time + SQL*Net message from client = Duration
198 + 519 = 718 (Spot the mistake! :)
Regards,
Robert
Comment by vaurob — June 11, 2018 @ 8:53 pm |
Robert
Thanks a lot for your excellent comment, and sorry for the delay. I tried to answer you via my phone and realized that my comment had not been sent.
Best regards
Mohamed
Comment by hourim — June 18, 2018 @ 3:58 pm |
Hi Mohamed!
I’m happy to be able to contribute to your really great blog, which I read a lot.
I seize the opportunity to “nonprofit advertise:)” my two database ideas, one of which was born from this blogpost.
https://community.oracle.com/ideas/22037
https://community.oracle.com/ideas/21032
Please check them and vote if they make sense to You.
Regards,
Robert
Comment by vaurob — June 18, 2018 @ 9:39 pm |
[…] part I of this short series on SQL Monitoring Duration and Elapsed time information we saw that in the […]
Pingback by SQL Monitor and parsing | Mohamed Houri’s Oracle Notes — June 21, 2018 @ 6:32 pm |
[…] the first article we learned that SQL monitor Elapsed time doesn’t sum up time spent outside the database by idle sessions. In the second one we outlined […]
Pingback by SQL Monitor and parallel processing | Mohamed Houri’s Oracle Notes — June 28, 2018 @ 6:25 pm |
[…] SQL monitor and fetching rows […]
Pingback by SQL Monitor flaw | Mohamed Houri’s Oracle Notes — March 7, 2020 @ 8:13 am |
Hi Mohamed,
Thanks for this useful article.
I have one question: when the fetch operation is taking a long time, is increasing the fetch array size considered a best practice to reduce the overall execution duration?
Comment by montast5 — April 23, 2020 @ 8:54 am |
Hi Montasser
Yes, increasing the fetch size (how depends on the client: Java, SQL*Plus, .NET, etc.) is the obvious way to reduce the number of round trips between the database server and the client. But bear in mind that you may encounter a memory penalty if you use a very big arraysize.
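The effect of the array (fetch) size on the number of round trips is easy to quantify. The Python sketch below uses the row count from the article (31,246 rows) and SQL*Plus's default arraysize of 15, which is roughly consistent with the ~2085 fetch calls reported by the SQL monitor (the exact call count also depends on the client's fetch protocol):

```python
import math

rows = 31246  # rows fetched in the article's example

def fetch_calls(rows, arraysize):
    """Minimum number of fetch round trips needed to ship all rows."""
    return math.ceil(rows / arraysize)

print(fetch_calls(rows, 15))    # 2084: close to the 2085 calls reported
print(fetch_calls(rows, 500))   # 63: far fewer round trips
print(fetch_calls(rows, 5000))  # 7: diminishing returns, growing memory cost
```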
Click to access raising_the_fetch_size_good_or_bad.pdf
Best regards
Comment by hourim — April 24, 2020 @ 7:43 am |
OK. Thanks for sharing this document.
Comment by montast5 — April 28, 2020 @ 2:43 pm |