Mohamed Houri’s Oracle Notes

June 23, 2015

Real Time SQL Monitoring oddity

Filed under: Oracle,Sql Plan Managment — hourim @ 1:45 pm

This is a small note about a situation I have encountered and which I thought it is worth sharing with you. There was an insert/select executing in parallel DOP 16 on a 11.2.0.3 Oracle database for which the end user was complaining about the exceptional time it was taking without completing. Since the job was still running I tried getting its Real Time SQL monitoring report:


Global Information
------------------------------
 Status              :  DONE (ERROR)        
 Instance ID         :  1                   
 Session             :  XXXXX (392:229)    
 SQL ID              :  bbccngk0nn2z2       
 SQL Execution ID    :  16777216            
 Execution Started   :  06/22/2015 11:57:06 
 First Refresh Time  :  06/22/2015 11:57:06 
 Last Refresh Time   :  06/22/2015 11:57:46 
 Duration            :  40s                 

Global Stats
=================================================================================================
| Elapsed |   Cpu   |    IO    | Concurrency |  Other   | Buffer | Read | Read  | Write | Write |
| Time(s) | Time(s) | Waits(s) |  Waits(s)   | Waits(s) |  Gets  | Reqs | Bytes | Reqs  | Bytes |
=================================================================================================
|   15315 |   15220 |       54 |        0.38 |       40 |     2G | 8601 |   2GB |  5485 |   1GB |

The insert/select according to the above report summary is DONE with (ERROR).
So why the end user is still complaining about the not ending batch job? And why he didn’t receive an error?

After having ruled out the resumable time out hypothesis I came back to the v$sql_monitor and issued the following two selects:

SQL> SELECT
  2    sql_id,
  3    process_name,
  4    status
  5  FROM v$sql_monitor
  6  WHERE sql_id = 'bbccngk0nn2z2'
  7  AND status   ='EXECUTING'
  8  ORDER BY process_name ;

SQL_ID        PROCE STATUS
------------- ----- ------------
bbccngk0nn2z2 p000  EXECUTING
bbccngk0nn2z2 p001  EXECUTING
bbccngk0nn2z2 p002  EXECUTING
bbccngk0nn2z2 p003  EXECUTING
bbccngk0nn2z2 p004  EXECUTING
bbccngk0nn2z2 p005  EXECUTING
bbccngk0nn2z2 p006  EXECUTING
bbccngk0nn2z2 p007  EXECUTING
bbccngk0nn2z2 p008  EXECUTING
bbccngk0nn2z2 p009  EXECUTING
bbccngk0nn2z2 p010  EXECUTING
bbccngk0nn2z2 p011  EXECUTING
bbccngk0nn2z2 p012  EXECUTING
bbccngk0nn2z2 p013  EXECUTING
bbccngk0nn2z2 p014  EXECUTING
bbccngk0nn2z2 p015  EXECUTING
bbccngk0nn2z2 p019  EXECUTING
bbccngk0nn2z2 p031  EXECUTING

SQL> SELECT
  2    sql_id,
  3    process_name,
  4    status
  5  FROM v$sql_monitor
  6  WHERE sql_id = 'bbccngk0nn2z2'
  7  AND status   ='DONE (ERROR)'
  8  ORDER BY process_name ;

SQL_ID        PROCE STATUS
------------- ----- -------------------
bbccngk0nn2z2 ora   DONE (ERROR)
bbccngk0nn2z2 p016  DONE (ERROR)
bbccngk0nn2z2 p017  DONE (ERROR)
bbccngk0nn2z2 p018  DONE (ERROR)
bbccngk0nn2z2 p020  DONE (ERROR)
bbccngk0nn2z2 p021  DONE (ERROR)
bccngk0nn2z2  p022  DONE (ERROR)
bbccngk0nn2z2 p023  DONE (ERROR)
bbccngk0nn2z2 p024  DONE (ERROR)
bbccngk0nn2z2 p025  DONE (ERROR)
bbccngk0nn2z2 p026  DONE (ERROR)
bbccngk0nn2z2 p027  DONE (ERROR)
bbccngk0nn2z2 p028  DONE (ERROR)
bbccngk0nn2z2 p029  DONE (ERROR)
bbccngk0nn2z2 p030  DONE (ERROR)

Among the 32 parallel servers half are executing and half are in error! How could this be possible? I have already been confronted to a parallel process that ends in its entirety when a single parallel server is in error. For example I have encountered several times the following error which is due to a parallel broadcast distribution of a high data row source exploding henceforth the TEMP tablespace:

ERROR at line 1:
ORA-12801: error signaled in parallel query server P013
ORA-01652: unable to extend temp segment by 128 in tablespace TEMP

A simple select against v$active_session_history confirmed that the insert/select is still running and it is consuming CPU

SQL> select sql_id, count(1)
  2  from gv$active_session_history
  3  where sample_time between to_date('22062015 12:30:00', 'ddmmyyyy hh24:mi:ss')
  4                    and     to_date('22062015 13:00:00', 'ddmmyyyy hh24:mi:ss')
  5  group by  sql_id
  6  order by 2 desc;

SQL_ID          COUNT(1)
------------- ----------
bbccngk0nn2z2       2545
                       4
0uuczutvk6jqj          1
8f1sjvfxuup9w          1

SQL> select decode(event,null, 'on cpu', event), count(1)
  2  from gv$active_session_history
  3  where sample_time between to_date('22062015 12:30:00', 'ddmmyyyy hh24:mi:ss')
  4                    and     to_date('22062015 13:00:00', 'ddmmyyyy hh24:mi:ss')
  5  and sql_id = 'bbccngk0nn2z2'
  6  group by  event
  7  order by 2 desc;

DECODE(EVENT,NULL,'ONCPU',EVENT)  COUNT(1)
--------------------------------  ---------
on cpu                            5439
db file sequential read           3

SQL> /

DECODE(EVENT,NULL,'ONCPU',EVENT)  COUNT(1)
--------------------------------- ---------
on cpu                            5460
db file sequential read           3

SQL> /

DECODE(EVENT,NULL,'ONCPU',EVENT)  COUNT(1)
--------------------------------  ---------
on cpu                            5470
db file sequential read           3

And after a while


SQL> /

DECODE(EVENT,NULL,'ONCPU',EVENT)   COUNT(1)
---------------------------------- ---------
on cpu                             15152
db file sequential read            9

While the parallel insert is still running I took several SQL monitoring reports of which the two followings ones:

Parallel Execution Details (DOP=16 , Servers Allocated=32)
============================================================================================
|      Name      | Type  | Server# | Elapsed |Buffer | Read  |         Wait Events         |
|                |       |         | Time(s) | Gets  | Bytes |         (sample #)          |
============================================================================================
| PX Coordinator | QC    |         |    0.48 |  2531 | 16384 |                             |
| p000           | Set 1 |       1 |    1049 |  128M |  63MB | direct path read (1)        |
| p001           | Set 1 |       2 |    1518 |  222M |  61MB |                             |
| p002           | Set 1 |       3 |     893 |  109M |  59MB |                             |
| p003           | Set 1 |       4 |    1411 |  194M |  62MB | direct path read (1)        |
| p004           | Set 1 |       5 |     460 |   64M |  62MB | direct path read (1)        |
| p005           | Set 1 |       6 |     771 |   87M | 322MB | direct path read (1)        |
|                |       |         |         |       |       | direct path read temp (5)   |
| p006           | Set 1 |       7 |     654 |   67M |  62MB | direct path read (1)        |
| p007           | Set 1 |       8 |     179 |   24M |  55MB | direct path read (1)        |
| p008           | Set 1 |       9 |    1638 |  235M |  70MB |                             |
| p009           | Set 1 |      10 |     360 |   46M |  54MB | direct path read (1)        |
| p010           | Set 1 |      11 |    1920 |  294M | 337MB | direct path read temp (6)   | --> 1920s
| p011           | Set 1 |      12 |     289 |   30M |  69MB |                             |
| p012           | Set 1 |      13 |     839 |   98M |  66MB | direct path read (1)        |
| p013           | Set 1 |      14 |     524 |   63M |  55MB |                             |
| p014           | Set 1 |      15 |    1776 |  263M |  69MB |                             |
| p015           | Set 1 |      16 |    1016 |  130M |  61MB | direct path read (1)        |
| p016           | Set 2 |       1 |    0.22 |  1166 |   3MB |                             |
| p017           | Set 2 |       2 |    1.36 |  6867 |  51MB |                             |
| p018           | Set 2 |       3 |    1.02 |  1298 |  36MB |                             |
| p019           | Set 2 |       4 |    6.71 |  2313 | 129MB | direct path read temp (2)   |
| p020           | Set 2 |       5 |    0.40 |   978 |  16MB |                             |
| p021           | Set 2 |       6 |    1.32 |  8639 |  41MB | direct path read temp (1)   |
| p022           | Set 2 |       7 |    0.18 |   896 |   2MB |                             |
| p023           | Set 2 |       8 |    0.23 |   469 |   9MB |                             | --> 0.23s
| p024           | Set 2 |       9 |    0.52 |  3635 |  19MB |                             | --> 0.52s
| p025           | Set 2 |      10 |    0.33 |  1163 |   3MB |                             |
| p026           | Set 2 |      11 |    0.65 |   260 |  31MB | db file sequential read (1) |
| p027           | Set 2 |      12 |    0.21 |  1099 |   6MB |                             |
| p028           | Set 2 |      13 |    0.58 |   497 |  20MB |                             |
| p029           | Set 2 |      14 |    1.43 |  4278 |  54MB |                             |
| p030           | Set 2 |      15 |    0.30 |  3481 |   8MB |                             |
| p031           | Set 2 |      16 |    2.86 |   517 |  91MB |                             |
============================================================================================


Parallel Execution Details (DOP=16 , Servers Allocated=32)
=============================================================================================
|      Name      | Type  | Server# | Elapsed | Buffer | Read  |         Wait Events         |
|                |       |         | Time(s) |  Gets  | Bytes |         (sample #)          |
=============================================================================================
| PX Coordinator | QC    |         |    0.48 |   2531 | 16384 |                             |
| p000           | Set 1 |       1 |    1730 |   202M |  63MB | direct path read (1)        |
| p001           | Set 1 |       2 |    2416 |   351M |  61MB |                             |
| p002           | Set 1 |       3 |    1094 |   133M |  59MB |                             |
| p003           | Set 1 |       4 |    2528 |   348M |  64MB | direct path read (1)        |
| p004           | Set 1 |       5 |     965 |   129M |  63MB | direct path read (1)        |
| p005           | Set 1 |       6 |    1089 |   129M | 322MB | direct path read (1)        |
|                |       |         |         |        |       | direct path read temp (5)   |
| p006           | Set 1 |       7 |    1459 |   165M |  62MB | direct path read (1)        |
| p007           | Set 1 |       8 |     221 |    30M |  55MB | direct path read (1)        |
| p008           | Set 1 |       9 |    2640 |   357M |  70MB |                             |
| p009           | Set 1 |      10 |     952 |   115M |  54MB | direct path read (1)        |
| p010           | Set 1 |      11 |    3117 |   471M | 337MB | direct path read temp (6)   | --> 3117s
| p011           | Set 1 |      12 |     400 |    42M |  69MB |                             |
| p012           | Set 1 |      13 |    1621 |   195M |  66MB | direct path read (1)        |
| p013           | Set 1 |      14 |    1126 |   132M |  55MB |                             |
| p014           | Set 1 |      15 |    2662 |   370M |  72MB |                             |
| p015           | Set 1 |      16 |    1194 |   147M |  61MB | direct path read (1)        |
| p016           | Set 2 |       1 |    0.22 |   1166 |   3MB |                             |
| p017           | Set 2 |       2 |    1.36 |   6867 |  51MB |                             |
| p018           | Set 2 |       3 |    1.02 |   1298 |  36MB |                             |
| p019           | Set 2 |       4 |    6.72 |   2313 | 131MB | direct path read temp (2)   |
| p020           | Set 2 |       5 |    0.40 |    978 |  16MB |                             |
| p021           | Set 2 |       6 |    1.32 |   8639 |  41MB | direct path read temp (1)   |
| p022           | Set 2 |       7 |    0.18 |    896 |   2MB |                             |
| p023           | Set 2 |       8 |    0.23 |    469 |   9MB |                             | --> 0.23s
| p024           | Set 2 |       9 |    0.52 |   3635 |  19MB |                             | --> 0.52s
| p025           | Set 2 |      10 |    0.33 |   1163 |   3MB |                             |
| p026           | Set 2 |      11 |    0.65 |    260 |  31MB | db file sequential read (1) |
| p027           | Set 2 |      12 |    0.21 |   1099 |   6MB |                             |
| p028           | Set 2 |      13 |    0.58 |    497 |  20MB |                             |
| p029           | Set 2 |      14 |    1.43 |   4278 |  54MB |                             |
| p030           | Set 2 |      15 |    0.30 |   3481 |   8MB |                             |
| p031           | Set 2 |      16 |    2.89 |    517 |  92MB |                             |
=============================================================================================

If you look carefully to the above reports you will notice that the elapsed time of the parallel servers mentioned being in ERROR (p16-p30) is not increasing in contrast to the elapsed time of the parallel servers mentioned being in EXECUTION (p0-p15) which is continuously increasing.

Thanks to Randolf Geist (again) I knew that there is a bug in Real time SQL monitoring report which occurs when a parallel server is not working for more than 30 minutes. In such a case the Real time SQL monitoring will starts showing those parallel severs in in ERROR confusing the situation.

As far as I was able to reproduce the issue I started the process again at 16h03 and I kept executing the following select from time to time having no rows for each execution

SELECT 
  sql_id,
  process_name,
  status
FROM v$sql_monitor
WHERE sql_id = '5np4u0m0h69jx' –- changed a little bit the sql_id
AND status   ='DONE (ERROR)'
ORDER BY process_name ;

no rows selected

Until at around 16h37 i.e. after 30 minutes (and a little bit more) of execution the above select started showing processes in error:

SQL> SELECT
  2    sql_id,
  3    process_name,
  4    status
  5  FROM v$sql_monitor
  6  WHERE sql_id = '5np4u0m0h69jx'
  7  AND status   ='DONE (ERROR)'
  8  ORDER BY process_name ;

SQL_ID        PROCE STATUS
------------- ----- ---------------
5np4u0m0h69jx ora   DONE (ERROR)
5np4u0m0h69jx p016  DONE (ERROR)
5np4u0m0h69jx p017  DONE (ERROR)
5np4u0m0h69jx p018  DONE (ERROR)
5np4u0m0h69jx p020  DONE (ERROR)
5np4u0m0h69jx p021  DONE (ERROR)
5np4u0m0h69jx p022  DONE (ERROR)
5np4u0m0h69jx p023  DONE (ERROR)
5np4u0m0h69jx p024  DONE (ERROR)
5np4u0m0h69jx p025  DONE (ERROR)
5np4u0m0h69jx p026  DONE (ERROR)
5np4u0m0h69jx p027  DONE (ERROR)
5np4u0m0h69jx p028  DONE (ERROR)
5np4u0m0h69jx p029  DONE (ERROR)
5np4u0m0h69jx p030  DONE (ERROR)

At the very beginning of the process several parallel servers was not running while several others were busy. And when the first parallel server (p10 in this case) reaches more than 1800 seconds (1861 seconds in this case) the real time Sql monitoring started showing the not working parallel servers in ERROR.

Bottom line: don’t be confused (as I have been) by that DONE (ERROR) status, your SQL statement might still be running consuming time and energy despite this wrong real time SQL monitoring reporting status

2 Comments »

  1. […]  Part 2: where I have explained the oddity shown by the SQL monitoring report when monitoring non active parallel server for more than 30 minutes. […]

    Pingback by Don’t pre-empt the CBO from doing its work | Mohamed Houri’s Oracle Notes — July 2, 2015 @ 2:03 pm | Reply

  2. […] Houri about what happens when a parallel slave is still assigned to an executing statement but isn’t given any work to do for a long time; and an item from Stefan Koehler about _px_trace and tracking down why the […]

    Pingback by PQ Index anomaly | Oracle Scratchpad — July 15, 2015 @ 7:35 pm | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Blog at WordPress.com.

Tony's Oracle Tips

Tony Hasler's light hearted approach to learning about Oracle

Richard Foote's Oracle Blog

Focusing Specifically On Oracle Indexes, Database Administration and Some Great Music

Hatem Mahmoud Oracle's blog

Just another Oracle blog : Database topics and techniques

Mohamed Houri’s Oracle Notes

Qui se conçoit bien s’énonce clairement

Oracle Diagnostician

Performance troubleshooting as exact science

Raheel's Blog

Things I have learnt as Oracle DBA

Coskan's Approach to Oracle

What I learned about Oracle

So Many Oracle Manuals, So Little Time

“Books to the ceiling, Books to the sky, My pile of books is a mile high. How I love them! How I need them! I'll have a long beard by the time I read them”—Lobel, Arnold. Whiskers and Rhymes. William Morrow & Co, 1988.

EU Careers info

Your career in the European Union

Carlos Sierra's Tools and Tips

Tools and Tips for Oracle Performance and SQL Tuning

Oracle Scratchpad

Just another Oracle weblog

OraStory

Dominic Brooks on Oracle Performance, Tuning, Data Quality & Sensible Design ... (Now with added Sets Appeal)

%d bloggers like this: