Without drawing any analogy with Sergio Leone's famous movie "The Good, the Bad and the Ugly", which marked my teenage years, I would rather define the above three Oracle features as "The Good, the Good and the Good", as they have very often made my consultancy days. This article was inspired by Nenad Noveljic's latest blog post about negative unaccounted-for time, where he mentioned that the Elapsed time in the SQL Monitor Global Stats section might show a wrong value. In Nenad's case this misleading piece of information was due to unaccounted-for time. In fact, I know of three other situations where this time discrepancy can occur: when fetching data, during a long hard parse, and when using parallel execution. In this article I will start by demonstrating the data-fetching case. Two other articles, covering the parse and parallel cases, will follow to complete the series.
Fetching data
I executed a query which takes a few minutes to complete but spends a considerable amount of time fetching a large number of columns over quite a large number of rows. While the query was running I took several snapshots of its corresponding 12cR2 SQL monitor report, which I present below:
Global Information
------------------------------
 Status              :  EXECUTING
 Instance ID         :  1
 SQL ID              :  g6px76dmjv1jy
 Execution Started   :  06/05/2018 09:54:19
 First Refresh Time  :  06/05/2018 09:54:25
 Last Refresh Time   :  06/05/2018 09:54:29
 Duration            :  11s

Global Stats
=========================================
| Elapsed |   Cpu   |    IO    | Buffer |
| Time(s) | Time(s) | Waits(s) |  Gets  |
=========================================
|      10 |    8.36 |     1.34 |     2M |
=========================================

SQL Plan Monitoring Details (Plan Hash Value=2278376452)
===========================================================================
| Id |      Operation       | Name | Execs |   Rows   |
|    |                      |      |       | (Actual) |
===========================================================================
|  0 | SELECT STATEMENT     |      |     1 |          |
|  1 |  NESTED LOOPS OUTER  |      |     1 |          |
|  2 |   NESTED LOOPS       |      |     1 |          |
|  3 |    FILTER            |      |     1 |          |
As you can see, both the Elapsed time and the Duration show the same value of 10 seconds. Note as well that at this stage of the execution, no data had yet been sent to the client (the Actual Rows column is empty).
At exactly the same moment, ASH was also reporting approximately the same elapsed time:
column cnt format 9999
compute sum label 'Total Elapsed' of cnt on report
break on report

select sql_plan_line_id
      ,count(1) cnt
from gv$active_session_history
where sample_time between to_date('&datefrom', 'mm/dd/yyyy hh24:mi:ss')
                      and to_date('&dateto', 'mm/dd/yyyy hh24:mi:ss')
and sql_id = '&sql_id'
group by sql_plan_line_id
order by 2 desc;

SQL> @ashplanlineId
Enter value for datefrom: 06/05/2018 09:54:19
Enter value for dateto: 06/05/2018 09:54:29
Enter value for sql_id: g6px76dmjv1jy

SQL_PLAN_LINE_ID   CNT
---------------- -----
              10     7
              13     1
              11     1
                 -----
Total Elapsed        9
However, as soon as Oracle started sending data to the client, a discrepancy between the Elapsed time in the Global Stats section and the Duration began to appear, as shown below:
Global Information
------------------------------
 Status              :  EXECUTING
 Instance ID         :  1
 SQL ID              :  g6px76dmjv1jy
 Execution Started   :  06/05/2018 09:54:19
 First Refresh Time  :  06/05/2018 09:54:25
 Last Refresh Time   :  06/05/2018 09:56:18
 Duration            :  120s
 Fetch Calls         :  375

Global Stats
=================================================
| Elapsed |   Cpu   |    IO    | Fetch | Buffer |
| Time(s) | Time(s) | Waits(s) | Calls |  Gets  |
=================================================
|      32 |      11 |       21 |   375 |     3M |
=================================================

SQL Plan Monitoring Details (Plan Hash Value=2278376452)
===========================================================================
|   Id |      Operation       | Name | Start  |   Rows   |
|      |                      |      | Active | (Actual) |
===========================================================================
| -> 0 | SELECT STATEMENT     |      |    +30 |     5611 |
| -> 1 |  NESTED LOOPS OUTER  |      |    +30 |     5611 |
| -> 2 |   NESTED LOOPS       |      |    +30 |     5611 |
| -> 3 |    FILTER            |      |    +30 |          |
ASH, in line with the SQL monitor Elapsed time, also started to report a misleading elapsed time, as shown below:
SQL> @ashplanlineId
Enter value for datefrom: 06/05/2018 09:54:19
Enter value for dateto: 06/05/2018 09:56:18
Enter value for sql_id: g6px76dmjv1jy

SQL_PLAN_LINE_ID   CNT
---------------- -----
              13    25
              10     7
               4     1
               4     1
                 -----
Total Elapsed       34
And so on, until the query finished fetching the entire data set after about 527 seconds, while both ASH and the Elapsed time of the SQL monitor Global Stats section were still showing roughly 34 seconds, as shown below respectively:
Global Information
------------------------------
 Status              :  DONE (ALL ROWS)
 Instance ID         :  1
 SQL ID              :  g6px76dmjv1jy
 Execution Started   :  06/05/2018 09:54:19
 First Refresh Time  :  06/05/2018 09:54:25
 Last Refresh Time   :  06/05/2018 10:03:06
 Duration            :  527s
 Fetch Calls         :  2085

Global Stats
=================================================
| Elapsed |   Cpu   |    IO    | Fetch | Buffer |
| Time(s) | Time(s) | Waits(s) | Calls |  Gets  |
=================================================
|      34 |      12 |       22 |  2085 |     3M |
=================================================

SQL> @ashplanlineId
Enter value for datefrom: 06/05/2018 09:54:19
Enter value for dateto: 06/05/2018 09:56:18
Enter value for sql_id: g6px76dmjv1jy

SQL_PLAN_LINE_ID   CNT
---------------- -----
              13    26
              10     7
               4     1
              11     1
               5     1
                 -----
Total Elapsed       36
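A quick back-of-the-envelope calculation (a Python sketch replaying the figures reported above) shows how much of the run is invisible to the Global Stats section:

```python
# Figures copied from the final SQL monitor snapshot above
duration_s = 527  # Global Information -> Duration (wall-clock time)
elapsed_s = 34    # Global Stats -> Elapsed Time (time spent inside the database)

# The difference is the time spent outside the database,
# essentially shipping rows to the client over SQL*Net
unaccounted_s = duration_s - elapsed_s
print(unaccounted_s)  # 493 seconds, i.e. more than 90% of the run
```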
Using Nenad's query, I can also report the same misleading elapsed time from ASH:
column event format a30
column cnt format 9999
compute sum label 'Total Elapsed' of cnt on report
break on report

select session_state, event
      ,count(1) cnt
from gv$active_session_history
where sql_id = '&sql_id'
and sql_exec_id = (select max(sql_exec_id)
                   from gv$active_session_history
                   where sql_id = '&sql_id')
group by event, session_state;

SESSION EVENT                            CNT
------- ------------------------------ -----
WAITING direct path read                  26
ON CPU                                    11
                                       -----
Total Elapsed                             37
TKPROF
In a second attempt to make this time difference crystal clear, I traced another execution of the same query using the 10046 trace event. The corresponding SQL monitor report and TKPROF summary are reproduced below:
Global Information
------------------------------
 Status              :  DONE (ALL ROWS)
 Instance ID         :  1
 SQL ID              :  g6px76dmjv1jy
 Execution Started   :  06/05/2018 09:27:23
 First Refresh Time  :  06/05/2018 09:27:30
 Last Refresh Time   :  06/05/2018 09:39:21
 Duration            :  718s
 Fetch Calls         :  2085

Global Stats
=================================================
| Elapsed |   Cpu   |    IO    | Fetch | Buffer |
| Time(s) | Time(s) | Waits(s) | Calls |  Gets  |
=================================================
|     198 |      15 |      183 |  2085 |     3M |
=================================================

call     count       cpu    elapsed       disk      query    current       rows
------- ------  -------- ---------- ---------- ---------- ---------- ----------
Parse        1      0.21       0.21          0          0          0          0
Execute      1      0.00       0.00          0          0          0          0
Fetch     2085     15.18     198.03     647106    2918455          0      31246
------- ------  -------- ---------- ---------- ---------- ---------- ----------
total     2087     15.39     198.24     647106    2918455          0      31246

Elapsed times include waiting on following events:
  Event waited on                           Times   Max. Wait  Total Waited
  ----------------------------------------  Waited  ----------  ------------
  SQL*Net message to client                   2085        0.00          0.00
  SQL*Net more data to client                    2        0.00          0.00
  Disk file operations I/O                      90        0.00          0.00
  db file sequential read                    30313        0.09        158.42
  reliable message                               1        0.00          0.00
  enq: KO - fast object checkpoint               1        0.00          0.00
  direct path read                            1220        0.09         24.76
  SQL*Net message from client                 2085        0.93        519.45
********************************************************************************
As you can see, the SQL monitor Duration shows that the query took 718 seconds, while the Elapsed time shows only 198 seconds (15 seconds of CPU and 183 seconds of I/O waits).
The TKPROF file, however, confirms the reliability of the SQL monitor Duration, since it shows that the total elapsed time of the query breaks down as follows:
158.42 + 24.76  = 183.18 seconds of I/O waits
183.18 + 519.45 = 702.63 seconds of I/O waits plus SQL*Net message from client
702.63 + 15.39 (CPU) = 718.02 seconds
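These figures can be cross-checked programmatically; the following Python snippet simply replays the arithmetic with the values copied from the TKPROF summary:

```python
# Wait and CPU times taken from the TKPROF summary
db_file_sequential_read = 158.42
direct_path_read = 24.76
sqlnet_msg_from_client = 519.45  # idle wait, excluded from Elapsed Time
cpu = 15.39

io_waits = db_file_sequential_read + direct_path_read
print(round(io_waits, 2))                                 # 183.18 -> IO Waits(s)
print(round(io_waits + sqlnet_msg_from_client, 2))        # 702.63
print(round(io_waits + sqlnet_msg_from_client + cpu, 2))  # 718.02 -> Duration
```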
Naturally, ASH cannot show the correct total either, since a large part of the query time was spent on the idle SQL*Net message from client wait event:
SQL> @ashplanlineId
Enter value for datefrom: 06/05/2018 09:27:23
Enter value for dateto: 06/05/2018 09:39:21
Enter value for sql_id: g6px76dmjv1jy

SQL_PLAN_LINE_ID   CNT
---------------- -----
              12   153
              13    28
              10     9
              11     6
              14     1
                 -----
Total Elapsed      197
Summary
We have clearly shown that when a query fetches a large amount of data to the client, this transfer time is not taken into account by the Elapsed time of the SQL monitor Global Stats section. This is why, as mentioned by Nenad in his article, you should always rely on the Duration reported in the Global Information section of the SQL monitor rather than on the Elapsed time of the Global Stats section.
Thanks Mohamed for sharing
Comment by Yahya Mezroui — June 9, 2018 @ 8:55 pm |
Hi Mohamed
I met the exact same fetch “issue” 5 days ago.
In SQL Monitor report Duration was around 1500 sec and Elapsed Time was around 500s.
Then I checked DBA_HIST_ACTIVE_SESS_HISTORY (I call it DASH) and saw that the bulk of the processing – the final 1000s – was missing from DASH almost completely. In fact there was only one record from that period, which clearly showed that data fetching was happening. At that very moment the fetching activity was on CPU.
However, there is no problem with the Elapsed Time information. As my very wise colleague Havasi Zoli pointed out, Elapsed Time for an activity is DB Time for the database. DB Time measures the (foreground) database activity. When the database waits for the client to request the next array of records, the database server process is idle, and Elapsed Time is not incremented during this wait. The client may be processing the data it received with the previous fetch. The network round trip between the server sending the data and receiving the next fetch request, plus the client processing in between, is simply not part of the Elapsed Time.
This matches nicely with the fact that no ASH record is recorded for such idle waits.
So the conclusion is: Duration tells us the wall-clock time, Elapsed Time tells us the DB Time. If there is a big gap between the two, investigate the network, the client processing, or anything else that is not part of the DB Time.
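This rule of thumb can be turned into a quick sanity check against the numbers from the article (the variable names below are mine, not Oracle's):

```python
# Wall-clock Duration = DB Time + time spent outside the database
def duration(db_time_s, outside_db_s):
    """Duration as seen in the SQL monitor Global Information section."""
    return db_time_s + outside_db_s

# TKPROF figures from the article: total elapsed (DB Time) = 198.24s,
# SQL*Net message from client (idle, outside the database) = 519.45s
print(round(duration(198.24, 519.45)))  # 718, matching the reported 718s Duration
```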
Note that the same is valid for the AWR reports. SQL ordered by Elapsed Time is not SQL ordered by Duration. (By the way we should have a section like this in AWR).
In your example:
Elapsed Time + SQL*Net message from client = Duration
198 + 519 = 718 (Spot the mistake! :)
Regards,
Robert
Comment by vaurob — June 11, 2018 @ 8:53 pm |
Robert
Thanks a lot for your excellent comment, and sorry for the delay. I tried to answer you via my phone and realized that my comment had not been sent.
Best regards
Mohamed
Comment by hourim — June 18, 2018 @ 3:58 pm |
Hi Mohamed!
I’m happy to be able to contribute to your really great blog, which I read a lot.
I seize the opportunity to “nonprofit advertise:)” my two database ideas, one of which was born from this blogpost.
https://community.oracle.com/ideas/22037
https://community.oracle.com/ideas/21032
Please check them and vote if they make sense to You.
Regards,
Robert
Comment by vaurob — June 18, 2018 @ 9:39 pm |
[…] part I of this short series on SQL Monitoring Duration and Elapsed time information we saw that in the […]
Pingback by SQL Monitor and parsing | Mohamed Houri’s Oracle Notes — June 21, 2018 @ 6:32 pm |
[…] the first article we learned that SQL monitor Elapsed time doesn’t sum up time spent outside the database by idle sessions. In the second one we outlined […]
Pingback by SQL Monitor and parallel processing | Mohamed Houri’s Oracle Notes — June 28, 2018 @ 6:25 pm |
[…] SQL monitor and fetching rows […]
Pingback by SQL Monitor flaw | Mohamed Houri’s Oracle Notes — March 7, 2020 @ 8:13 am |
Hi Mohamed,
Thanks for this useful article.
I have one question: when the fetch operation is taking a long time, is increasing the fetch array size considered a best practice to reduce the overall execution duration?
Comment by montast5 — April 23, 2020 @ 8:54 am |
Hi Montasser
Yes, increasing the fetch size (how depends on the client: Java, SQL*Plus, .NET, etc.) is the obvious way to reduce the number of round trips between the database server and the client. But bear in mind that you may encounter a memory penalty if you use a very big arraysize.
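The effect of the array (fetch) size on the number of round trips is easy to quantify. The Python sketch below uses the row count from the article (31,246 rows) and SQL*Plus's default arraysize of 15, which is roughly consistent with the ~2085 fetch calls reported by the SQL monitor (the exact call count also depends on the client's fetch protocol):

```python
import math

rows = 31246  # rows fetched in the article's example

def fetch_calls(rows, arraysize):
    """Minimum number of fetch round trips needed to ship all rows."""
    return math.ceil(rows / arraysize)

print(fetch_calls(rows, 15))    # 2084: close to the 2085 calls reported
print(fetch_calls(rows, 500))   # 63: far fewer round trips
print(fetch_calls(rows, 5000))  # 7: diminishing returns, growing memory cost
```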
Click to access raising_the_fetch_size_good_or_bad.pdf
Best regards
Comment by hourim — April 24, 2020 @ 7:43 am |
OK. Thanks for sharing this document.
Comment by montast5 — April 28, 2020 @ 2:43 pm |