Mohamed Houri’s Oracle Notes

July 9, 2015

Stressed ASH

Filed under: Oracle — hourim @ 5:29 pm

It is well known that any record found in dba_hist_active_sess_history has inevitably been routed there from v$active_session_history. If so, how should we interpret the following cut & paste from a running production system?
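As a side note, AWR persists only a subset of the in-memory ASH samples. In 11g the candidate samples are flagged in the is_awr_sample column of v$active_session_history, so a quick sanity check of this ASH-to-AWR routing looks like this (a minimal sketch, assuming that column is exposed in your release):

-- samples flagged 'Y' are the only ones eligible to reach
-- dba_hist_active_sess_history when the buffer is flushed
SQL> select is_awr_sample, count(1)
     from v$active_session_history
     group by is_awr_sample;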

ASH first

SQL> select event, count(1)
    from gv$active_session_history
    where sample_time between to_date('06072015 18:30:00', 'ddmmyyyy hh24:mi:ss')
                      and     to_date('06072015 19:30:00', 'ddmmyyyy hh24:mi:ss')
    group by event
    order by 2 desc;

EVENT                                                              COUNT(1)
---------------------------------------------------------------- ----------
                                                                        372
direct path read                                                        185
log file parallel write                                                  94
Disk file Mirror Read                                                    22
control file sequential read                                             20
control file parallel write                                              18
direct path write temp                                                   16
Streams AQ: qmn coordinator waiting for slave to start                   12
db file parallel read                                                    11
gc cr multi block request                                                 6
enq: KO - fast object checkpoint                                          4
db file sequential read                                                   3
ges inquiry response                                                      3
os thread startup                                                         2
PX Deq: Signal ACK RSG                                                    2
enq: CF - contention                                                      1
PX Deq: Slave Session Stats                                               1
Disk file operations I/O                                                  1
IPC send completion sync                                                  1
reliable message                                                          1
null event                                                                1
enq: CO - master slave det                                                1
db file parallel write                                                    1
gc current block 2-way                                                    1

AWR next

SQL> select event, count(1)
    from dba_hist_active_sess_history
    where sample_time between to_date('06072015 18:30:00', 'ddmmyyyy hh24:mi:ss')
                      and     to_date('06072015 19:30:00', 'ddmmyyyy hh24:mi:ss')
    group by event
    order by 2 desc;

EVENT                                                              COUNT(1)
---------------------------------------------------------------- ----------
SQL*Net break/reset to client                                         12950
enq: TM - contention                                                  12712
                                                                        624
db file sequential read                                                 386
enq: TX - row lock contention                                           259
SQL*Net message from dblink                                              74
direct path read                                                         62
SQL*Net more data from dblink                                            27
log file parallel write                                                  26
log file sync                                                            15
SQL*Net more data from client                                             9
control file sequential read                                              7
Disk file Mirror Read                                                     6
gc cr grant 2-way                                                         5
db file parallel write                                                    4
read by other session                                                     3
control file parallel write                                               3
Streams AQ: qmn coordinator waiting for slave to start                    3
log file sequential read                                                  2
direct path read temp                                                     2
enq: KO - fast object checkpoint                                          2
gc cr multi block request                                                 1
CSS initialization                                                        1
gc current block 2-way                                                    1
reliable message                                                          1
db file parallel read                                                     1
gc buffer busy acquire                                                    1
ges inquiry response                                                      1
direct path write temp                                                    1
rdbms ipc message                                                         1
os thread startup                                                         1

12,950 snapshots of the SQL*Net break/reset to client and 12,712 snapshots of the enq: TM - contention wait events appear in AWR but are nowhere to be found in ASH. How can we interpret this situation?

This 11.2.0.4.0 database runs on a RAC infrastructure with 2 instances. Let's look at the ASH of the two instances separately:

Instance 1 first

SQL> select event, count(1)
    from v$active_session_history
    where sample_time between to_date('06072015 18:30:00', 'ddmmyyyy hh24:mi:ss')
                      and     to_date('06072015 19:30:00', 'ddmmyyyy hh24:mi:ss')
    group by event
    order by 2 desc;

 no rows selected

Instance 2 next

SQL> select event, count(1)
    from v$active_session_history
    where sample_time between to_date('06072015 18:30:00', 'ddmmyyyy hh24:mi:ss')
                      and     to_date('06072015 19:30:00', 'ddmmyyyy hh24:mi:ss')
    group by event
    order by 2 desc;

EVENT                                                              COUNT(1)
---------------------------------------------------------------- ----------
                                                                        372
direct path read                                                        185
log file parallel write                                                  94
Disk file Mirror Read                                                    22
control file sequential read                                             20
control file parallel write                                              18
direct path write temp                                                   16
Streams AQ: qmn coordinator waiting for slave to start                   12
db file parallel read                                                    11
gc cr multi block request                                                 6
enq: KO - fast object checkpoint                                          4
db file sequential read                                                   3
ges inquiry response                                                      3
os thread startup                                                         2
PX Deq: Signal ACK RSG                                                    2
enq: CF - contention                                                      1
PX Deq: Slave Session Stats                                               1
Disk file operations I/O                                                  1
IPC send completion sync                                                  1
reliable message                                                          1
null event                                                                1
enq: CO - master slave det                                                1
db file parallel write                                                    1
gc current block 2-way                                                    1

Everything sampled in ASH during that specific time interval comes from the second instance, while the first instance doesn't report any record for the corresponding interval. This inevitably calls into question either the ASH size of instance 1 or an imbalanced workload between the two instances:

ASH size first

SQL> select
  2        inst_id
  3        ,total_size
  4      from gv$ash_info;

   INST_ID TOTAL_SIZE
---------- ----------
         1  100663296
         2  100663296

ASH Activity next

SQL> select
        inst_id
       ,total_size
       ,awr_flush_emergency_count
     from gv$ash_info;

   INST_ID TOTAL_SIZE AWR_FLUSH_EMERGENCY_COUNT
---------- ---------- -------------------------
         1  100663296                       136
         2  100663296                         0

Clearly the activity is mainly oriented towards instance 1, and the abnormal and unusual 12,950 SQL*Net break/reset to client wait events have exacerbated the rate of inserts into the ASH buffer of instance 1, generating the 136 emergency flushes reported by awr_flush_emergency_count and, as such, the discrepancies between ASH and AWR.

This is also confirmed by the difference in the ASH retention period between the two instances:

Instance 1 first where only 3 hours of ASH data are kept

SQL> select min(sample_time), max(sample_time)
  2  from v$active_session_history;

MIN(SAMPLE_TIME)                         MAX(SAMPLE_TIME)
---------------------------------------  -------------------------
08-JUL-15 05.51.20.502 AM                08-JUL-15 08.35.48.233 AM

Instance 2 next where several days' worth of ASH data are still present

SQL> select min(sample_time), max(sample_time)
  2  from v$active_session_history;

MIN(SAMPLE_TIME)                         MAX(SAMPLE_TIME)
---------------------------------------  -------------------------
25-JUN-15 20.01.43                       08-JUL-15 08.37.17.233 AM
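Incidentally, the per-instance retention can be derived directly from gv$ash_info as well (a hedged sketch, assuming the oldest_sample_time and latest_sample_time columns present in 11.2):

SQL> select inst_id
           ,oldest_sample_time
           ,latest_sample_time
           -- hours covered by the in-memory ASH buffer
           ,round((cast(latest_sample_time as date)
                 - cast(oldest_sample_time as date)) * 24, 1) retention_hours
     from gv$ash_info
     order by inst_id;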

The solution would be one of the following points (listed, I think, in order of priority; a sketch of the third one follows the list):

  • Solve the SQL*Net break/reset to client issue which is dramatically filling up the ASH buffer, causing an unexpectedly rapid flush of important and more precise data
  • Balance the workload between the two instances
  • Increase the ASH size of instance 1 by means of alter system set "_ash_size"=25165824;
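For the third option, a minimal sketch would resemble the following ("_ash_size" is a hidden parameter, so check with Oracle Support before touching it on a production system; the sid value 'PROD1' is a hypothetical instance name):

-- resize the ASH buffer of instance 1 only (25165824 bytes = 24MB)
SQL> alter system set "_ash_size" = 25165824 scope=memory sid='PROD1';

-- verify the new size
SQL> select inst_id, total_size from gv$ash_info;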

In the next article I will explain how I identified what was causing these unusual SQL*Net break/reset to client wait events.

July 2, 2015

Don’t pre-empt the CBO from doing its work

Filed under: Oracle — hourim @ 2:03 pm

This is the last part of the parallel insert/select saga. As a reminder, here are the two preceding episodes:

  •  Part 1: where I explained why I was unable to get the corresponding SQL monitoring report because of the _sqlmon_max_planlines parameter.
  •  Part 2: where I explained the oddity shown by the SQL monitoring report when parallel servers stay inactive for more than 30 minutes.

In this Part 3 I will share with you how I finally solved this issue and convinced people not to pre-empt the Oracle optimizer from doing its work.

Thanks to the monitoring of this insert/select I succeeded in isolating the part of the execution plan that absolutely needed to be tuned:

Error: ORA-12805
------------------------------
ORA-12805: parallel query server died unexpectedly

Global Information
------------------------------
 Status                                 :  DONE (ERROR)
 Instance ID                            :  2
 SQL ID                                 :  bg7h7s8sb5mnt
 SQL Execution ID                       :  33554432
 Execution Started                      :  06/24/2015 05:06:14
 First Refresh Time                     :  06/24/2015 05:06:21
 Last Refresh Time                      :  06/24/2015 09:05:10
 Duration                               :  14336s
 DOP Downgrade                          :  50%                 

Global Stats
============================================================================================
| Elapsed |   Cpu   |    IO    | Concurrency | Cluster  |  Other   | Buffer | Read | Read  |
| Time(s) | Time(s) | Waits(s) |  Waits(s)   | Waits(s) | Waits(s) |  Gets  | Reqs | Bytes |
============================================================================================
|   38403 |   35816 |     0.42 |        2581 |     0.16 |     6.09 |     7G |  103 | 824KB |
============================================================================================

SQL Plan Monitoring Details (Plan Hash Value=3668294770)
======================================================================================================
| Id  |                Operation         |             Name  |  Rows   | Execs |   Rows   | Activity |
|     |                                  |                   | (Estim) |       | (Actual) |   (%)    |
======================================================================================================
| 357 |VIEW PUSHED PREDICATE             | NAEHCE            |      59 | 23570 |    23541 |          |
| 358 | NESTED LOOPS                     |                   |      2M | 23570 |    23541 |     0.05 |
| 359 |  INDEX FAST FULL SCAN            | TABLEIND1         |   27077 | 23570 |     667M |     0.19 |
| 360 |  VIEW                            | VW_JF_SET$E6DCA8A3|       1 |  667M |    23541 |     0.10 |
| 361 |   UNION ALL PUSHED PREDICATE     |                   |         |  667M |    23541 |    30.59 |
| 362 |    NESTED LOOPS                  |                   |       1 |  667M |     1140 |     0.12 |
| 363 |     TABLE ACCESS BY INDEX ROWID  | TABLE2            |       1 |  667M |    23566 |     1.25 |
| 364 |      INDEX UNIQUE SCAN           | IDX_TABLE2        |       1 |  667M |     667M |    17.81 |
| 365 |     TABLE ACCESS BY INDEX ROWID  | TABLE3            |       1 | 23566 |     1140 |          |
| 366 |      INDEX RANGE SCAN            | IDX_TABLE3        |      40 | 23566 |     174K |          |
| 367 |    NESTED LOOPS                  |                   |       1 |  667M |    22401 |     0.11 |
| 368 |     TABLE ACCESS BY INDEX ROWID  | TABLE2            |       1 |  667M |    23566 |     1.27 |
| 369 |      INDEX UNIQUE SCAN           | IDX_TABLE2        |       1 |  667M |     667M |    17.72 |
| 370 |     TABLE ACCESS BY INDEX ROWID  | TABLE3            |       1 | 23566 |    22401 |     0.01 |
| 371 |      INDEX RANGE SCAN            | TABLE31           |      36 | 23566 |       4M |          |

The NESTED LOOPS operation at line 358 has an INDEX FAST FULL SCAN (TABLEIND1) as its outer data source, driving an inner row source represented by an internal view (VW_JF_SET$E6DCA8A3) built by Oracle on the fly. Reduced to the bare minimum it resembles this:

SQL Plan Monitoring Details (Plan Hash Value=3668294770)
=====================================================================================
| Id  |                 Operation |             Name   |  Rows   | Execs |   Rows   |
|     |                           |                    | (Estim) |       | (Actual) |
=====================================================================================
| 358 |  NESTED LOOPS             |                    |      2M | 23570 |    23541 |
| 359 |   INDEX FAST FULL SCAN    | TABLEIND1          |   27077 | 23570 |     667M |
| 360 |   VIEW                    | VW_JF_SET$E6DCA8A3 |       1 |  667M |    23541 |

Observe carefully the operation at line 359, which is the operation upon which Oracle bases its join method choice. Very often a NESTED LOOPS operation is wrongly chosen by the optimizer because of inaccurate estimations made for the first operation of the join. Let's check the accuracy of the estimation made by Oracle in this case for the operation at line 359:

   Rows(Estim) * Execs = 27077 * 23570 = 638204890 ~ 638M
   Rows(Actual)        = 667M

The estimation made by the optimizer at this step is good. So why on earth would Oracle opt for a NESTED LOOPS operation when it knows, prior to execution, that the outer row source will produce 667M rows, forcing the inner operations to be executed 667M times? There is no way Oracle would choose this path unless it is instructed to do so. And indeed, looking at the huge insert/select statement I found, among a tremendous number of hints, a use_nl(o h) hint which dictates that the optimizer join the TABLEIND1 index with the rest of the view using a NESTED LOOPS operation. It was then a battle to convince the client to get rid of that hint. What made the client hesitant is that very often the same insert/select statement (including the use_nl hint) completes in an acceptable time. I was therefore obliged to explain why, despite the presence of the use_nl hint (which I was suggesting was the cause of the performance degradation), the insert/select very often completes in an acceptable time. To explain this situation it suffices to get the execution plan of an acceptable execution (reduced to the bare minimum) and spot the obvious:

SQL Plan Monitoring Details (Plan Hash Value=367892000)
====================================================================================
| Id  |                Operation |             Name   |  Rows   | Execs |   Rows   |
|     |                          |                    | (Estim) |       | (Actual) |
====================================================================================
| 168 |VIEW PUSHED PREDICATE     | NAEHCE             |       1 | 35118 |    35105 |
| 169 | NESTED LOOPS             |                    |       2 | 35118 |    35105 |
| 170 |  VIEW                    | VW_JF_SET$86BE946E |       2 | 35118 |    35105 |
| 182 |  INDEX UNIQUE SCAN       | TABLEIND1          |       1 | 35105 |    35105 |

The join order switched from (TABLEIND1, VW_JF_SET$86BE946E) to (VW_JF_SET$86BE946E, TABLEIND1). As long as the use_nl(o h) hint is not complemented by a leading(h o) hint indicating in what order Oracle has to join these two objects, the choice of the all-important outer operation is left to Oracle. When the index is chosen as the outer operation, the insert/select statement performs very poorly; however, when the same index is used as the inner operation of the join, the insert/select statement performs in an acceptable time.
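To make the point concrete, here is a minimal hedged sketch (using hypothetical tables o and h and a hypothetical join column) of the difference between a bare use_nl hint and one completed by a leading hint:

-- join method forced but join order left to the optimizer: either
-- o or h may end up as the driving (outer) row source
SQL> select /*+ use_nl(o h) */ o.col1, h.col2
     from o, h
     where o.join_col = h.join_col;

-- join order and join method both pinned: h drives the nested loops
-- and o is probed once per row coming out of h
SQL> select /*+ leading(h o) use_nl(o) */ o.col1, h.col2
     from o, h
     where o.join_col = h.join_col;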

With that explained, the client was convinced, the hints were disabled and the insert/select was re-launched; it completed within a few seconds thanks to the appropriate HASH JOIN operation used by the optimizer:

Global Information
------------------------------
 Status                                 :  DONE
 Instance ID                            :  2
 SQL ID                                 :  9g2a3gstkr7dv
 SQL Execution ID                       :  33554432
 Execution Started                      :  06/24/2015 12:53:49
 First Refresh Time                     :  06/24/2015 12:53:52
 Last Refresh Time                      :  06/24/2015 12:54:05
 Duration                               :  16s                      

Global Stats
============================================================================================
| Elapsed |   Cpu   |    IO    | Concurrency | Cluster  |  Other   | Buffer | Read | Read  |
| Time(s) | Time(s) | Waits(s) |  Waits(s)   | Waits(s) | Waits(s) |  Gets  | Reqs | Bytes |
============================================================================================
|      23 |      21 |     0.91 |        0.03 |     0.22 |     0.31 |     1M |  187 |   1MB |
============================================================================================

SQL Plan Monitoring Details (Plan Hash Value=3871743977)
=================================================================================================
| Id  |                           Operation   |             Name   |  Rows   | Execs |   Rows   |
|     |                                       |                    | (Estim) |       | (Actual) |
=================================================================================================
| 153 |       VIEW                            | NAEHCE             |      2M |     1 |       2M |
| 154 |        HASH JOIN                      |                    |      2M |     1 |       2M |
| 155 |         INDEX FAST FULL SCAN          | TABLEIND1          |   27077 |     1 |    28320 |
| 156 |         VIEW                          | VW_JF_SET$86BE946E |      2M |     1 |       2M |

Spot as well that when the optimizer opted for a HASH JOIN operation, the VIEW PUSHED PREDICATE operation and the underlying JPPD (JOIN PREDICATE PUSH DOWN) transformation ceased to be used, because this transformation occurs only with NESTED LOOPS joins.

Bottom line: always try to supply Oracle with fresh and representative statistics and let it do its job. Don't pre-empt it from doing its normal work by systematically hinting it whenever you are confronted with a performance issue. And when you do decide to use hints, make sure to hint completely and correctly, particularly the outer and inner tables of a NESTED LOOPS join (or the build and probe tables of a HASH JOIN).

June 23, 2015

Real Time SQL Monitoring oddity

Filed under: Oracle,Sql Plan Managment — hourim @ 1:45 pm

This is a small note about a situation I encountered which I thought worth sharing with you. An insert/select was executing in parallel at DOP 16 on an 11.2.0.3 Oracle database, and the end user was complaining about the exceptional time it was taking without completing. Since the job was still running I tried getting its Real Time SQL Monitoring report:


Global Information
------------------------------
 Status              :  DONE (ERROR)        
 Instance ID         :  1                   
 Session             :  XXXXX (392:229)    
 SQL ID              :  bbccngk0nn2z2       
 SQL Execution ID    :  16777216            
 Execution Started   :  06/22/2015 11:57:06 
 First Refresh Time  :  06/22/2015 11:57:06 
 Last Refresh Time   :  06/22/2015 11:57:46 
 Duration            :  40s                 

Global Stats
=================================================================================================
| Elapsed |   Cpu   |    IO    | Concurrency |  Other   | Buffer | Read | Read  | Write | Write |
| Time(s) | Time(s) | Waits(s) |  Waits(s)   | Waits(s) |  Gets  | Reqs | Bytes | Reqs  | Bytes |
=================================================================================================
|   15315 |   15220 |       54 |        0.38 |       40 |     2G | 8601 |   2GB |  5485 |   1GB |

According to the above report summary the insert/select is DONE (ERROR).
So why is the end user still complaining about a batch job that never ends? And why didn't he receive an error?

After having ruled out the resumable timeout hypothesis I came back to v$sql_monitor and issued the following two selects:

SQL> SELECT
  2    sql_id,
  3    process_name,
  4    status
  5  FROM v$sql_monitor
  6  WHERE sql_id = 'bbccngk0nn2z2'
  7  AND status   ='EXECUTING'
  8  ORDER BY process_name ;

SQL_ID        PROCE STATUS
------------- ----- ------------
bbccngk0nn2z2 p000  EXECUTING
bbccngk0nn2z2 p001  EXECUTING
bbccngk0nn2z2 p002  EXECUTING
bbccngk0nn2z2 p003  EXECUTING
bbccngk0nn2z2 p004  EXECUTING
bbccngk0nn2z2 p005  EXECUTING
bbccngk0nn2z2 p006  EXECUTING
bbccngk0nn2z2 p007  EXECUTING
bbccngk0nn2z2 p008  EXECUTING
bbccngk0nn2z2 p009  EXECUTING
bbccngk0nn2z2 p010  EXECUTING
bbccngk0nn2z2 p011  EXECUTING
bbccngk0nn2z2 p012  EXECUTING
bbccngk0nn2z2 p013  EXECUTING
bbccngk0nn2z2 p014  EXECUTING
bbccngk0nn2z2 p015  EXECUTING
bbccngk0nn2z2 p019  EXECUTING
bbccngk0nn2z2 p031  EXECUTING

SQL> SELECT
  2    sql_id,
  3    process_name,
  4    status
  5  FROM v$sql_monitor
  6  WHERE sql_id = 'bbccngk0nn2z2'
  7  AND status   ='DONE (ERROR)'
  8  ORDER BY process_name ;

SQL_ID        PROCE STATUS
------------- ----- -------------------
bbccngk0nn2z2 ora   DONE (ERROR)
bbccngk0nn2z2 p016  DONE (ERROR)
bbccngk0nn2z2 p017  DONE (ERROR)
bbccngk0nn2z2 p018  DONE (ERROR)
bbccngk0nn2z2 p020  DONE (ERROR)
bbccngk0nn2z2 p021  DONE (ERROR)
bbccngk0nn2z2 p022  DONE (ERROR)
bbccngk0nn2z2 p023  DONE (ERROR)
bbccngk0nn2z2 p024  DONE (ERROR)
bbccngk0nn2z2 p025  DONE (ERROR)
bbccngk0nn2z2 p026  DONE (ERROR)
bbccngk0nn2z2 p027  DONE (ERROR)
bbccngk0nn2z2 p028  DONE (ERROR)
bbccngk0nn2z2 p029  DONE (ERROR)
bbccngk0nn2z2 p030  DONE (ERROR)

Among the 32 parallel servers half are executing and half are in error! How could this be possible? Until then I had only been confronted with parallel executions that end in their entirety as soon as a single parallel server errors out. For example, I have encountered several times the following error, caused by a parallel broadcast distribution of a large row source blowing up the TEMP tablespace:

ERROR at line 1:
ORA-12801: error signaled in parallel query server P013
ORA-01652: unable to extend temp segment by 128 in tablespace TEMP

A simple select against gv$active_session_history confirmed that the insert/select was still running and consuming CPU:

SQL> select sql_id, count(1)
  2  from gv$active_session_history
  3  where sample_time between to_date('22062015 12:30:00', 'ddmmyyyy hh24:mi:ss')
  4                    and     to_date('22062015 13:00:00', 'ddmmyyyy hh24:mi:ss')
  5  group by  sql_id
  6  order by 2 desc;

SQL_ID          COUNT(1)
------------- ----------
bbccngk0nn2z2       2545
                       4
0uuczutvk6jqj          1
8f1sjvfxuup9w          1

SQL> select decode(event,null, 'on cpu', event), count(1)
  2  from gv$active_session_history
  3  where sample_time between to_date('22062015 12:30:00', 'ddmmyyyy hh24:mi:ss')
  4                    and     to_date('22062015 13:00:00', 'ddmmyyyy hh24:mi:ss')
  5  and sql_id = 'bbccngk0nn2z2'
  6  group by  event
  7  order by 2 desc;

DECODE(EVENT,NULL,'ONCPU',EVENT)  COUNT(1)
--------------------------------  ---------
on cpu                            5439
db file sequential read           3

SQL> /

DECODE(EVENT,NULL,'ONCPU',EVENT)  COUNT(1)
--------------------------------- ---------
on cpu                            5460
db file sequential read           3

SQL> /

DECODE(EVENT,NULL,'ONCPU',EVENT)  COUNT(1)
--------------------------------  ---------
on cpu                            5470
db file sequential read           3

And after a while


SQL> /

DECODE(EVENT,NULL,'ONCPU',EVENT)   COUNT(1)
---------------------------------- ---------
on cpu                             15152
db file sequential read            9

While the parallel insert was still running I took several SQL monitoring reports, among them the two following ones:

Parallel Execution Details (DOP=16 , Servers Allocated=32)
============================================================================================
|      Name      | Type  | Server# | Elapsed |Buffer | Read  |         Wait Events         |
|                |       |         | Time(s) | Gets  | Bytes |         (sample #)          |
============================================================================================
| PX Coordinator | QC    |         |    0.48 |  2531 | 16384 |                             |
| p000           | Set 1 |       1 |    1049 |  128M |  63MB | direct path read (1)        |
| p001           | Set 1 |       2 |    1518 |  222M |  61MB |                             |
| p002           | Set 1 |       3 |     893 |  109M |  59MB |                             |
| p003           | Set 1 |       4 |    1411 |  194M |  62MB | direct path read (1)        |
| p004           | Set 1 |       5 |     460 |   64M |  62MB | direct path read (1)        |
| p005           | Set 1 |       6 |     771 |   87M | 322MB | direct path read (1)        |
|                |       |         |         |       |       | direct path read temp (5)   |
| p006           | Set 1 |       7 |     654 |   67M |  62MB | direct path read (1)        |
| p007           | Set 1 |       8 |     179 |   24M |  55MB | direct path read (1)        |
| p008           | Set 1 |       9 |    1638 |  235M |  70MB |                             |
| p009           | Set 1 |      10 |     360 |   46M |  54MB | direct path read (1)        |
| p010           | Set 1 |      11 |    1920 |  294M | 337MB | direct path read temp (6)   | --> 1920s
| p011           | Set 1 |      12 |     289 |   30M |  69MB |                             |
| p012           | Set 1 |      13 |     839 |   98M |  66MB | direct path read (1)        |
| p013           | Set 1 |      14 |     524 |   63M |  55MB |                             |
| p014           | Set 1 |      15 |    1776 |  263M |  69MB |                             |
| p015           | Set 1 |      16 |    1016 |  130M |  61MB | direct path read (1)        |
| p016           | Set 2 |       1 |    0.22 |  1166 |   3MB |                             |
| p017           | Set 2 |       2 |    1.36 |  6867 |  51MB |                             |
| p018           | Set 2 |       3 |    1.02 |  1298 |  36MB |                             |
| p019           | Set 2 |       4 |    6.71 |  2313 | 129MB | direct path read temp (2)   |
| p020           | Set 2 |       5 |    0.40 |   978 |  16MB |                             |
| p021           | Set 2 |       6 |    1.32 |  8639 |  41MB | direct path read temp (1)   |
| p022           | Set 2 |       7 |    0.18 |   896 |   2MB |                             |
| p023           | Set 2 |       8 |    0.23 |   469 |   9MB |                             | --> 0.23s
| p024           | Set 2 |       9 |    0.52 |  3635 |  19MB |                             | --> 0.52s
| p025           | Set 2 |      10 |    0.33 |  1163 |   3MB |                             |
| p026           | Set 2 |      11 |    0.65 |   260 |  31MB | db file sequential read (1) |
| p027           | Set 2 |      12 |    0.21 |  1099 |   6MB |                             |
| p028           | Set 2 |      13 |    0.58 |   497 |  20MB |                             |
| p029           | Set 2 |      14 |    1.43 |  4278 |  54MB |                             |
| p030           | Set 2 |      15 |    0.30 |  3481 |   8MB |                             |
| p031           | Set 2 |      16 |    2.86 |   517 |  91MB |                             |
============================================================================================


Parallel Execution Details (DOP=16 , Servers Allocated=32)
=============================================================================================
|      Name      | Type  | Server# | Elapsed | Buffer | Read  |         Wait Events         |
|                |       |         | Time(s) |  Gets  | Bytes |         (sample #)          |
=============================================================================================
| PX Coordinator | QC    |         |    0.48 |   2531 | 16384 |                             |
| p000           | Set 1 |       1 |    1730 |   202M |  63MB | direct path read (1)        |
| p001           | Set 1 |       2 |    2416 |   351M |  61MB |                             |
| p002           | Set 1 |       3 |    1094 |   133M |  59MB |                             |
| p003           | Set 1 |       4 |    2528 |   348M |  64MB | direct path read (1)        |
| p004           | Set 1 |       5 |     965 |   129M |  63MB | direct path read (1)        |
| p005           | Set 1 |       6 |    1089 |   129M | 322MB | direct path read (1)        |
|                |       |         |         |        |       | direct path read temp (5)   |
| p006           | Set 1 |       7 |    1459 |   165M |  62MB | direct path read (1)        |
| p007           | Set 1 |       8 |     221 |    30M |  55MB | direct path read (1)        |
| p008           | Set 1 |       9 |    2640 |   357M |  70MB |                             |
| p009           | Set 1 |      10 |     952 |   115M |  54MB | direct path read (1)        |
| p010           | Set 1 |      11 |    3117 |   471M | 337MB | direct path read temp (6)   | --> 3117s
| p011           | Set 1 |      12 |     400 |    42M |  69MB |                             |
| p012           | Set 1 |      13 |    1621 |   195M |  66MB | direct path read (1)        |
| p013           | Set 1 |      14 |    1126 |   132M |  55MB |                             |
| p014           | Set 1 |      15 |    2662 |   370M |  72MB |                             |
| p015           | Set 1 |      16 |    1194 |   147M |  61MB | direct path read (1)        |
| p016           | Set 2 |       1 |    0.22 |   1166 |   3MB |                             |
| p017           | Set 2 |       2 |    1.36 |   6867 |  51MB |                             |
| p018           | Set 2 |       3 |    1.02 |   1298 |  36MB |                             |
| p019           | Set 2 |       4 |    6.72 |   2313 | 131MB | direct path read temp (2)   |
| p020           | Set 2 |       5 |    0.40 |    978 |  16MB |                             |
| p021           | Set 2 |       6 |    1.32 |   8639 |  41MB | direct path read temp (1)   |
| p022           | Set 2 |       7 |    0.18 |    896 |   2MB |                             |
| p023           | Set 2 |       8 |    0.23 |    469 |   9MB |                             | --> 0.23s
| p024           | Set 2 |       9 |    0.52 |   3635 |  19MB |                             | --> 0.52s
| p025           | Set 2 |      10 |    0.33 |   1163 |   3MB |                             |
| p026           | Set 2 |      11 |    0.65 |    260 |  31MB | db file sequential read (1) |
| p027           | Set 2 |      12 |    0.21 |   1099 |   6MB |                             |
| p028           | Set 2 |      13 |    0.58 |    497 |  20MB |                             |
| p029           | Set 2 |      14 |    1.43 |   4278 |  54MB |                             |
| p030           | Set 2 |      15 |    0.30 |   3481 |   8MB |                             |
| p031           | Set 2 |      16 |    2.89 |    517 |  92MB |                             |
=============================================================================================

If you look carefully at the above reports you will notice that the elapsed time of the parallel servers reported in ERROR (p016-p030) is not increasing, in contrast to the elapsed time of the parallel servers reported EXECUTING (p000-p015), which keeps growing.

Thanks to Randolf Geist (again) I learned that there is a bug in the Real Time SQL Monitoring report which shows up when a parallel server does no work for more than 30 minutes: from that point on the report starts showing those idle parallel servers in ERROR, confusing the situation.
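A possible cross-check in such a situation is to look at the parallel server processes themselves (a hedged sketch relying on the standard v$px_process and v$session views): if the servers flagged DONE (ERROR) were really dead, their sessions would be gone; idle-but-alive servers still show up.

-- PX servers together with their session state; a server reported
-- DONE (ERROR) by SQL monitoring but still owning a session is
-- merely idle, not dead
SQL> select pp.server_name, s.sid, s.status, s.sql_id
     from v$px_process pp, v$session s
     where s.sid     = pp.sid
     and   s.serial# = pp.serial#
     order by pp.server_name;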

Since I was able to reproduce the issue, I started the process again at 16:03 and kept executing the following select from time to time, getting no rows for each execution:

SELECT 
  sql_id,
  process_name,
  status
FROM v$sql_monitor
WHERE sql_id = '5np4u0m0h69jx' -- changed the sql_id a little bit
AND status   ='DONE (ERROR)'
ORDER BY process_name ;

no rows selected

Until, at around 16:37, i.e. after a little more than 30 minutes of execution, the above select started showing processes in error:

SQL> SELECT
  2    sql_id,
  3    process_name,
  4    status
  5  FROM v$sql_monitor
  6  WHERE sql_id = '5np4u0m0h69jx'
  7  AND status   ='DONE (ERROR)'
  8  ORDER BY process_name ;

SQL_ID        PROCE STATUS
------------- ----- ---------------
5np4u0m0h69jx ora   DONE (ERROR)
5np4u0m0h69jx p016  DONE (ERROR)
5np4u0m0h69jx p017  DONE (ERROR)
5np4u0m0h69jx p018  DONE (ERROR)
5np4u0m0h69jx p020  DONE (ERROR)
5np4u0m0h69jx p021  DONE (ERROR)
5np4u0m0h69jx p022  DONE (ERROR)
5np4u0m0h69jx p023  DONE (ERROR)
5np4u0m0h69jx p024  DONE (ERROR)
5np4u0m0h69jx p025  DONE (ERROR)
5np4u0m0h69jx p026  DONE (ERROR)
5np4u0m0h69jx p027  DONE (ERROR)
5np4u0m0h69jx p028  DONE (ERROR)
5np4u0m0h69jx p029  DONE (ERROR)
5np4u0m0h69jx p030  DONE (ERROR)

At the very beginning of the process several parallel servers were not running while several others were busy. And when the first parallel server (p010 in this case) exceeded 1800 seconds of execution (1861 seconds in this case), Real Time SQL Monitoring started showing the idle parallel servers in ERROR.

Bottom line: don't be confused (as I was) by that DONE (ERROR) status; your SQL statement might still be running, consuming time and energy, despite this wrong Real Time SQL Monitoring status.

June 12, 2015

Why Dynamic Sampling has not been used?

Filed under: Oracle — hourim @ 10:15 am

Experienced tuning specialists are known for their pronounced sense of looking at the details that others very often ignore. This is why I always pay attention to their answers on OTN and the oracle-l list. Last week I was asked to look at a badly performing query, monitored via the following execution plan:

Global Information
------------------------------
 Status              :  EXECUTING
 Instance ID         :  1
 SQL ID              :  8114dqz1k5arj
 SQL Execution ID    :  16777217            

Global Stats
=============================================================================================
| Elapsed |   Cpu   |    IO    | Concurrency | Cluster  |  Other   | Buffer | Read  | Read  |
| Time(s) | Time(s) | Waits(s) |  Waits(s)   | Waits(s) | Waits(s) |  Gets  | Reqs  | Bytes |
=============================================================================================
|  141842 |  140516 |       75 |        5.82 |       69 |     1176 |    21G | 26123 | 204MB |
=============================================================================================

SQL Plan Monitoring Details (Plan Hash Value=3787402507)
===========================================================================================
| Id   |             Operation             |      Name       |  Rows   | Execs |   Rows   |
|      |                                   |                 | (Estim) |       | (Actual) |
===========================================================================================
|    0 | SELECT STATEMENT                  |                 |         |     1 |          |
|    1 |   SORT ORDER BY                   |                 |       1 |     1 |          |
|    2 |    FILTER                         |                 |         |     1 |          |
|    3 |     NESTED LOOPS                  |                 |         |     1 |        0 |
| -> 4 |      NESTED LOOPS                 |                 |       1 |     1 |       4G |
| -> 5 |       TABLE ACCESS BY INDEX ROWID | TABLEXXX        |       1 |     1 |     214K |
| -> 6 |        INDEX RANGE SCAN           | IDX_MESS_RCV_ID |      2M |     1 |     233K |
| -> 7 |       INDEX RANGE SCAN            | VGY_TEST2       |       1 |  214K |       4G |->
|    8 |      TABLE ACCESS BY INDEX ROWID  | T_TABL_YXZ      |       1 |    4G |        0 |->
|      |                                   |                 |         |       |          |
===========================================================================================

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(TO_DATE(:SYS_B_2,:SYS_B_3)<=TO_DATE(:SYS_B_4,:SYS_B_5))
   5 - filter(("TABLEXXX"."T_NAME"=:SYS_B_6 AND
              "TABLEXXX"."M_TYPE"=:SYS_B_0 AND
              "TABLEXXX"."A_METHOD"=:SYS_B_7 AND
              "TABLEXXX"."M_STATUS"<>:SYS_B_8))
   6 - access("TABLEXXX"."R_ID"=:SYS_B_1)
   7 - access("T_TABL_YXZ"."SX_DATE">=TO_DATE(:SYS_B_2,:SYS_B_3) AND
              "T_TABL_YXZ"."SX_DATE"<=TO_DATE(:SYS_B_4,:SYS_B_5))
   8 - filter("T_TABL_YXZ"."T_ID"="TABLEXXX"."T_ID")

Those 214K and 4G executions (Execs) of operations 7 and 8 respectively are the signature of the classical wrong NESTED LOOPS join the CBO decided to go with because of the wrong cardinality estimation at operation n°5 (the double NESTED LOOPS operation is the effect of the NLJ_BATCHING optimisation).
There was no previous historical plan_hash_value for this particular sql_id to compare with the current execution plan, but the report had certainly been executed in the past without any complaint from the end user.
The Outline Data section of the execution plan is where I usually look when trying to understand what the optimizer has done behind the scenes:

Outline Data
-------------
   /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('11.2.0.3')
      DB_VERSION('11.2.0.3')
      OPT_PARAM('_b_tree_bitmap_plans' 'false')
      OPT_PARAM('optimizer_dynamic_sampling' 4) ---------------------------> spot this
      OPT_PARAM('optimizer_index_cost_adj' 20)
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      INDEX_RS_ASC(@"SEL$1" "TABLEXXX"@"SEL$1" ("TABLEXXX"."R_ID"))
      INDEX(@"SEL$1" "T_TABL_YXZ"@"SEL$1" ("T_TABL_YXZ"."SX_DATE"
              "T_TABL_YXZ"."GL_ACCOUNT_ID" "T_TABL_YXZ"."CASH_ACCOUNT_ID"))
      LEADING(@"SEL$1" "TABLEXXX"@"SEL$1" "T_TABL_YXZ"@"SEL$1")
      USE_NL(@"SEL$1" "T_TABL_YXZ"@"SEL$1")
      NLJ_BATCHING(@"SEL$1" "T_TABL_YXZ"@"SEL$1")
      END_OUTLINE_DATA
  */

As you can see, apart from the optimizer_index_cost_adj parameter, whose default value we should never change, there is one thing that caught my attention: optimizer_dynamic_sampling. Since the outline shows that dynamic sampling at level 4 was in force, why then is there no Note about dynamic sampling at the bottom of the corresponding execution plan?
I decided to run the same query in a CLONE database (cloned via RMAN). Below is the corresponding execution plan for the same set of input parameters:

Global Information
------------------------------
 Status              :  DONE (ALL ROWS)
 Instance ID         :  1
 SQL ID              :  8114dqz1k5arj
 SQL Execution ID    :  16777217
 Duration            :  904s           

SQL Plan Monitoring Details (Plan Hash Value=2202725716)
========================================================================================
| Id |            Operation             |      Name       |  Rows   | Execs |   Rows   |
|    |                                  |                 | (Estim) |       | (Actual) |
========================================================================================
|  0 | SELECT STATEMENT                 |                 |         |     1 |      280 |
|  1 |   SORT ORDER BY                  |                 |    230K |     1 |      280 |
|  2 |    FILTER                        |                 |         |     1 |      280 |
|  3 |     HASH JOIN                    |                 |    230K |     1 |      280 |
|  4 |      TABLE ACCESS BY INDEX ROWID | T_TABL_YXZ      |    229K |     1 |     301K |
|  5 |       INDEX RANGE SCAN           | VGY_TEST2       |       1 |     1 |     301K |
|  6 |      TABLE ACCESS BY INDEX ROWID | TABLEXXX        |    263K |     1 |       2M |
|  7 |       INDEX RANGE SCAN           | IDX_MESS_RCV_ID |      2M |     1 |       2M |
========================================================================================

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(TO_DATE(:SYS_B_2,:SYS_B_3)<=TO_DATE(:SYS_B_4,:SYS_B_5))
   3 - access("T_TABL_YXZ"."T_ID"="TABLEXXX"."T_ID")
   5 - access("T_TABL_YXZ"."SX_DATE">=TO_DATE(:SYS_B_2,:SYS_B_3) AND
              "T_TABL_YXZ"."SX_DATE"<=TO_DATE(:SYS_B_4,:SYS_B_5))
   6 - filter(("TABLEXXX"."T_NAME"=:SYS_B_6 AND
              "TABLEXXX"."M_TYPE"=:SYS_B_0 AND
              "TABLEXXX"."A_METHOD"=:SYS_B_7 AND
              "TABLEXXX"."M_STATUS"<>:SYS_B_8))
   7 - access("TABLEXXX"."R_ID"=:SYS_B_1)

Note
-----
   - dynamic sampling used for this statement (level=4)

In this CLONED database, in contrast to the production database, the optimizer used dynamic sampling at level 4 and came up with different estimations when visiting the TABLEXXX (263K instead of 1) and T_TABL_YXZ (229K instead of 1) tables, so that it judiciously opted for a HASH JOIN instead of that dramatic production NESTED LOOPS operation, making the query complete in 904 seconds.

The fundamental question then turns from "why is the report performing badly?" into "why has the optimizer ignored dynamic sampling at level 4?"

There are several ways to answer this question: (a) a 10053 trace file, (b) a 10046 trace file, or (c) tracing dynamic sampling directly, as suggested to me by Stefan Koehler:

SQL> alter session set events 'trace[RDBMS.SQL_DS] disk=high';
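The resulting trace goes to the session trace file which, from 11g onwards, can be located as follows (a small sketch):

SQL> select value
     from v$diag_info
     where name = 'Default Trace File';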

The corresponding 10053 optimizer trace shows the following lines related to dynamic sampling:

10053 of the COPY database

*** 2015-06-03 11:05:43.701
** Executed dynamic sampling query:
    level : 4
    sample pct. : 0.000489
    actual sample size : 837
    filtered sample card. : 1
    orig. card. : 220161278
    block cnt. table stat. : 6272290
    block cnt. for sampling: 6345946
    max. sample block cnt. : 32
sample block cnt. : 31
min. sel. est. : 0.00000000
** Using single table dynamic sel. est. : 0.00119474
  Table: TABLEXXX  Alias: TABLEXXX
    Card: Original: 220161278.000000  Rounded: 263036  Computed: 263036.17  Non Adjusted: 263036.17

In the COPY database, the optimizer used dynamic sampling at level 4 and came up with a cardinality estimation of 263K for TABLEXXX, which obviously led the CBO to opt for a reasonable HASH JOIN operation.

10053 of the PRODUCTION database

*** 2015-06-03 13:39:03.992
** Executed dynamic sampling query:
    level : 4
    sample pct. : 0.000482
    actual sample size : 1151
    filtered sample card. : 0  ------------------>  spot this information
    orig. card. : 220161278
    block cnt. table stat. : 6272290
    block cnt. for sampling: 6435970
    max. sample block cnt. : 32
sample block cnt. : 31
min. sel. est. : 0.00000000
** Not using dynamic sampling for single table sel. or cardinality.
DS Failed for : ----- Current SQL Statement for this session (sql_id=82x3mm8jqn5ah) -----
  Table: TABLEXXX  Alias: TABLEXXX
    Card: Original: 220161278.000000  Rounded: 1  Computed: 0.72  Non Adjusted: 0.72

In the PRODUCTION database, the CBO failed to use dynamic sampling at level 4 as clearly shown by the following line taken from the above 10053 trace file:

** Not using dynamic sampling for single table sel. or cardinality.
DS Failed for : ----- Current SQL Statement for this session (sql_id=82x3mm8jqn5ah)

PS: the 10053 trace has been taken against the important part of the query only,
    which is why the sql_id is not the same as the one mentioned above.

Thanks to Randolf Geist I learned that the internal code of the dynamic sampling algorithm is such that, when the predicate part was applied to a sample of TABLEXXX, it returned 0 rows:

filtered sample card. : 0

which is the reason why the optimizer ignored dynamic sampling at level 4 and fell back to the available object statistics, producing a 1-row cardinality estimation and hence that dramatically wrong NESTED LOOPS operation. By the way, had this been a 12c database, the STATISTICS COLLECTOR placed above the first operation of the NESTED LOOPS join would have reached its inflection point and would, hopefully, have switched to a HASH JOIN operation at execution time.

A quick solution for this very critical report was to raise the dynamic sampling level to a higher value. And since this query belongs to third-party software, I decided to use Kerry Osborne's script to inject a dynamic sampling hint through a SQL profile, as shown below:

SQL>@create_1_hint_sql_profile.sql
Enter value for sql_id: 8114dqz1k5arj
Enter value for profile_name (PROFILE_sqlid_MANUAL):
Enter value for category (DEFAULT):
Enter value for force_matching (false): true
Enter value for hint: dynamic_sampling(6)
Profile PROFILE_8114dqz1k5arj_MANUAL created.
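Under the hood, such scripts typically rely on the undocumented but widely used dbms_sqltune.import_sql_profile procedure. A hedged sketch of the idea (not Kerry Osborne's actual code) would be:

SQL> declare
       l_sql_text clob;
     begin
       -- grab the full text of the offending statement
       select sql_fulltext into l_sql_text
       from v$sqlarea
       where sql_id = '8114dqz1k5arj';
       -- attach a one-hint SQL profile to it
       dbms_sqltune.import_sql_profile(
         sql_text    => l_sql_text,
         profile     => sqlprof_attr('DYNAMIC_SAMPLING(6)'),
         name        => 'PROFILE_8114dqz1k5arj_MANUAL',
         force_match => true);
     end;
     /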

Once this was done, the end user re-launched the report, which completed within 303 seconds instead of those never-ending 141,842 seconds:


Global Information
------------------------------
 Status              :  DONE (ALL ROWS)
 Instance ID         :  1
 SQL ID              :  8114dqz1k5arj
 SQL Execution ID    :  16777216
 Execution Started   :  06/10/2015 11:40:39
 First Refresh Time  :  06/10/2015 11:40:45
 Last Refresh Time   :  06/10/2015 11:45:39
 Duration            :  300s           

SQL Plan Monitoring Details (Plan Hash Value=2202725716)
========================================================================================
| Id |            Operation             |      Name       |  Rows   | Execs |   Rows   |
|    |                                  |                 | (Estim) |       | (Actual) |
========================================================================================
|  0 | SELECT STATEMENT                 |                 |         |     1 |     2989 |
|  1 |   SORT ORDER BY                  |                 |    234K |     1 |     2989 |
|  2 |    FILTER                        |                 |         |     1 |     2989 |
|  3 |     HASH JOIN                    |                 |    234K |     1 |     2989 |
|  4 |      TABLE ACCESS BY INDEX ROWID | T_TABL_YXZ      |    232K |     1 |     501K |
|  5 |       INDEX RANGE SCAN           | VGY_TEST2       |       1 |     1 |     501K |
|  6 |      TABLE ACCESS BY INDEX ROWID | TABLEXXX        |    725K |     1 |       2M |
|  7 |       INDEX RANGE SCAN           | IDX_MESS_RCV_ID |      2M |     1 |       2M |
========================================================================================

Note
-----
   - dynamic sampling used for this statement (level=6)
   - SQL profile PROFILE_8114dqz1k5arj_MANUAL used for this statement

June 6, 2015

SUBQ INTO VIEW FOR COMPLEX UNNEST

Filed under: Oracle — hourim @ 8:42 am

If you are a regular reader of Jonathan Lewis' blog you will probably have come across this article, in which the author explains why an "OR subquery" pre-empts the optimizer from unnesting the subquery and merging it with its parent query in search of an optimal join path. Because of this unnesting impossibility the "OR subquery" is executed as a FILTER predicate which, when applied to a huge row set, dramatically penalizes the performance of the whole query. In the same article you will hopefully also have learned how, by re-writing the query using a UNION ALL (and taking care of the always threatening NULL, for example via the LNNVL() function), you can open a new path for the CBO, allowing an unnest of the subquery.
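Applied to the model query used later in this article, that manual rewrite looks roughly like this (a hedged sketch; because the de-duplicating predicate involves a subquery, an explicit NOT IN guarded against NULLs plays the role LNNVL() plays for simple predicates):

SQL> select a.id1, a.n1, a.start_date
     from t1 a
     where a.id1 in (select b.id from t2 b where b.status = 'COM')
     union all
     select a.id1, a.n1, a.start_date
     from t1 a
     where a.id1 in (select c.id1 from t2 c where c.status = 'ERR')
     -- exclude the rows already returned by the first branch,
     -- guarding the subquery against NULLs
     and a.id1 not in (select b.id from t2 b
                       where b.status = 'COM' and b.id is not null);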

Unfortunately, nowadays there is a massive expansion of third-party software where changing the SQL code is not possible, so I hoped the optimizer would become capable of automatically refactoring a disjunctive subquery and considering unnesting it via the UNION ALL workaround.

I was under the impression that the optimizer would never fulfil this hope, until last week when I received an e-mail from my friend Ahmed Aangour showing a particular disjunctive subquery that had been unnested by the optimizer without any rewrite of the original query by the developer. I found the case so interesting that I decided to model it and share it with you. Take a look at the query and its execution plan first under optimizer_features_enable 11.2.0.2 (the table creation script is supplied at the end of the article):

SQL> alter session set statistics_level=all;

SQL> alter session set optimizer_features_enable='11.2.0.2';

SQL> select
            a.id1
           ,a.n1
           ,a.start_date
     from   t1 a
     where  (a.id1 in (select b.id
                       from   t2 b
                       where  b.status = 'COM')
             or
             a.id1 in (select c.id1
                       from   t2 c
                       where  c.status = 'ERR')
            );

SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

-------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |   9890 |00:00:02.23 |     742K|<--
|*  1 |  FILTER            |      |      1 |        |   9890 |00:00:02.23 |     742K|
|   2 |   TABLE ACCESS FULL| T1   |      1 |  10000 |  10000 |00:00:00.01 |    1686 |
|*  3 |   TABLE ACCESS FULL| T2   |  10000 |      1 |   9890 |00:00:02.16 |     725K|
|*  4 |   TABLE ACCESS FULL| T2   |    110 |      1 |      0 |00:00:00.05 |   15400 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
 1 - filter(( IS NOT NULL OR IS NOT NULL))
 3 - filter(("B"."ID"=:B1 AND "B"."STATUS"='COM'))
 4 - filter(("C"."ID1"=:B1 AND "C"."STATUS"='ERR'))

The double full access to table t2, plus the FILTER operation, clearly indicates that the OR subqueries have not been combined with the parent query. If you want to know what is behind filter predicate n°1 above, the not-so-famous "explain plan for" command helps in this case:
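That is, something along these lines (a sketch using the same query as above), producing the plan shown next:

SQL> explain plan for
     select a.id1, a.n1, a.start_date
     from t1 a
     where (a.id1 in (select b.id  from t2 b where b.status = 'COM')
         or a.id1 in (select c.id1 from t2 c where c.status = 'ERR'));

SQL> select * from table(dbms_xplan.display);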

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   975 | 15600 |   462   (0)| 00:00:01 |
|*  1 |  FILTER            |      |       |       |            |          |
|   2 |   TABLE ACCESS FULL| T1   | 10000 |   156K|   462   (0)| 00:00:01 |
|*  3 |   TABLE ACCESS FULL| T2   |     1 |     8 |    42   (0)| 00:00:01 |
|*  4 |   TABLE ACCESS FULL| T2   |     1 |     7 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter( EXISTS (SELECT 0 FROM "T2" "B" WHERE "B"."ID"=:B1 AND
              "B"."STATUS"='COM') OR  EXISTS (SELECT 0 FROM "T2" "C" WHERE
              "C"."ID1"=:B2 AND "C"."STATUS"='ERR'))
   3 - filter("B"."ID"=:B1 AND "B"."STATUS"='COM')
   4 - filter("C"."ID1"=:B1 AND "C"."STATUS"='ERR')

Notice how the subquery has been executed as a FILTER operation, which sometimes (if not often) represents a real performance threat.

However, when I executed the same query under optimizer_features_enable 11.2.0.3 I got the following interesting execution plan:

SQL> alter session set optimizer_features_enable='11.2.0.3';

--------------------------------------------------------------------------------------------
| Id  | Operation             | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |          |      1 |        |   9890 |00:00:00.03 |    1953 |<--
|*  1 |  HASH JOIN            |          |      1 |   5000 |   9890 |00:00:00.03 |    1953 |
|   2 |   VIEW                | VW_NSO_1 |      1 |   5000 |   9890 |00:00:00.01 |     282 |
|   3 |    HASH UNIQUE        |          |      1 |   5000 |   9890 |00:00:00.01 |     282 |
|   4 |     UNION-ALL         |          |      1 |        |   9900 |00:00:00.01 |     282 |
|*  5 |      TABLE ACCESS FULL| T2       |      1 |   2500 |     10 |00:00:00.01 |     141 |
|*  6 |      TABLE ACCESS FULL| T2       |      1 |   2500 |   9890 |00:00:00.01 |     141 |
|   7 |   TABLE ACCESS FULL   | T1       |      1 |  10000 |  10000 |00:00:00.01 |    1671 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("A"."ID1"="ID1")
   5 - filter("C"."STATUS"='ERR')
   6 - filter("B"."STATUS"='COM')

Notice now how the new plan shows a HASH JOIN operation between an internal view (VW_NSO_1) and table t1 coming from the parent query block. Notice as well the HASH JOIN condition (access("A"."ID1"="ID1")) that appears in predicate n°1. The optimizer has performed a double transformation; it has:

  • created an internal view VW_NSO_1 representing a UNION-ALL of the two subqueries present in the where clause
  • joined this newly created view with table t1 present in the parent query block

Looking at the corresponding 10053 trace file I found how the CBO transformed the initial query:

select a.id1 id1,
  a.n1 n1,
  a.start_date start_date
from (
  (select c.id1 id1 from c##mhouri.t2 c where c.status='ERR')
union
  (select b.id id from c##mhouri.t2 b where b.status='COM')
     ) vw_nso_1,
  c##mhouri.t1 a
where a.id1= vw_nso_1.id1;

In fact the optimizer first combined the two subqueries into a view and finished by unnesting it into the parent query. This is a transformation the Oracle optimizer seems to name SUBQ INTO VIEW FOR COMPLEX UNNEST.

In the same 10053 trace file we can spot the following lines:

*****************************
Cost-Based Subquery Unnesting
*****************************
Query after disj subq unnesting:******* UNPARSED QUERY IS *******

SU:   Transform an ANY subquery to semi-join or distinct.
Registered qb: SET$7FD77EFD 0x15b5d4d0 (SUBQ INTO VIEW FOR COMPLEX UNNEST SET$E74BECDC)

SU: Will unnest subquery SEL$3 (#2)
SU: Will unnest subquery SEL$2 (#3)
SU: Reconstructing original query from best state.
SU: Considering subquery unnest on query block SEL$1 (#1).
SU:   Checking validity of unnesting subquery SEL$2 (#3)
SU:   Checking validity of unnesting subquery SEL$3 (#2)
Query after disj subq unnesting:******* UNPARSED QUERY IS *******

SU:   Checking validity of unnesting subquery SET$E74BECDC (#6)
SU:   Passed validity checks.

This is a clear enhancement of the optimizer query transformations which helps improve the performance of disjunctive subqueries automatically, without any external intervention.

I was about to end this article when I realized that, although I was testing this case under a 12.1.0.1 database release, I still had not executed the same query with optimizer_features_enable set to 12.1.0.1.1:

SQL> alter session set optimizer_features_enable='12.1.0.1.1';
SQL> -- execute the same query here
-------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |   9890 |00:00:03.84 |     716K|
|*  1 |  FILTER            |      |      1 |        |   9890 |00:00:03.84 |     716K|
|   2 |   TABLE ACCESS FULL| T1   |      1 |  10000 |  10000 |00:00:00.01 |    1686 |
|*  3 |   TABLE ACCESS FULL| T2   |  10000 |      2 |   9890 |00:00:03.81 |     715K|
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter( IS NOT NULL)
   3 - filter((("B"."ID1"=:B1 AND "B"."STATUS"='ERR') OR ("B"."ID"=:B2 AND
              "B"."STATUS"='COM')))

The automatic unnesting of the disjunctive subquery has been removed in the 12.1.0.1.1 optimizer model.

If you want to reproduce and test this case, here below is the model (I would be interested to see whether the disjunctive subquery is unnested or not in the 12.1.0.1.2 release):

create table t1
   as select
    rownum                id1,
    trunc((rownum-1)/3)   n1,
    date '2012-06-07' + mod((level-1)*2,5) start_date,
    lpad(rownum,10,'0')   small_vc,
    rpad('x',1000)        padding
from dual
connect by level <= 1e4;   

create table t2
as select
    rownum id
    ,mod(rownum,5) + mod(rownum,10)* 10  as id1
    ,case
       when mod(rownum, 1000) = 7 then 'ERR'
       when rownum <= 9900 then 'COM'
       when mod(rownum,10) between 1 and 5 then 'PRP'
     else
       'UNK'
     end status
     ,lpad(rownum,10,'0')    as small_vc
     ,rpad('x',70)           as padding
from dual
connect by level <= 1e4;

alter table t1 add constraint t1_pk primary key (id1);
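
The disjunctive query itself is not repeated here; based on the predicate sections shown above it should look like the following (a reconstruction from the plans, not a verbatim copy of the original script):

select a.id1, a.n1, a.start_date
from   t1 a
where  exists (select null from t2 b
               where  b.id = a.id1
               and    b.status = 'COM')
or     exists (select null from t2 c
               where  c.id1 = a.id1
               and    c.status = 'ERR');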

May 25, 2015

Extended Statistics Part I: histogram effect

Filed under: Statistics — hourim @ 3:37 pm

Extended statistics, also known as column group extensions, are one of the important statistics improvements introduced with Oracle 11g. While the Oracle Cost Based Optimizer is able to get a correct single-column selectivity estimation, it is unable to figure out the cardinality of a conjunction of two or more correlated columns present in a query predicate. A column group extension created on this conjunction of columns aims to help the CBO figure out this column correlation in order to get an accurate estimation. But there are cases where the CBO refuses to use a column group extension. This article aims to show one of those cases via a concrete example.

The scene

Below is the table and its unique index on which I am going to show you when the CBO will not use the column group extension:

create table t_ext_stat
  ( dvpk_id    number(10) not null
  , vpk_id     number(10) not null
  , layer_code varchar2(1 char) not null
  , dvpk_day   date not null
  , cre_date   date not null
  , cre_usr    varchar2(40 char) not null
  , mod_date   date not null
  , mod_usr    varchar2(40 char) not null
 );

create unique index t_ext_uk_i on t_ext_stat(vpk_id, layer_code, dvpk_day);

And this is the query I will be using throughout the article:

select
  count(1)
from
  t_ext_stat
where
  vpk_id = 63148
and
  layer_code = 'R';

 COUNT(1)
----------
 338

The two columns in the predicate part, layer_code and vpk_id, are compared against equality, which makes them candidates for a column group extension; but let's first see how skewed these two columns are, starting with layer_code.

The layer_code column has 4 distinct values with two popular ones, R (400,087 occurrences) and S (380,069 occurrences), a skew which can be captured via a frequency histogram.
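
These counts come from a simple aggregate on the column, similar to the one used below for vpk_id:

SQL> select layer_code, count(1)
     from   t_ext_stat
     group by layer_code
     order by 2 desc;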

The vpk_id column does not present such a noticeable skewness in its data scattering, as its distribution chart showed. It has 4,947 distinct values, ranging from vpk_id 62866 with 1,456 occurrences down to vpk_id 62972 with a single occurrence:

SQL> select
       vpk_id
      ,count(1)
    from
      t_ext_stat
    group by
      vpk_id
    order by 2 desc;

    VPK_ID   COUNT(1)
---------- ----------
     62866       1456
     62953       1456
     63528       1456
     63526       1456
     63518       1456
     62947       1456
     62850       1456
     62849       1456
     62851       1456
     62954       1456
     64362       1452
     64538       1424
     64483       1358
….
     63207          1
     63021          1
     62972          1

4947 rows selected.

Extended Statistics and histogram

In order to create a column group extension we need to call the following piece of code:

SQL> SELECT
         dbms_stats.create_extended_stats
         (ownname   => user
         ,tabname   => 't_ext_stat'
         ,extension =>'(vpk_id,layer_code)'
         )
    FROM dual;

which will create a virtual column (SYS_STUMVIRBZA6_$QWEX6DE2NGQA1) supporting the two predicate column correlation.
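
The system-generated name of this virtual column can be checked at any time via the user_stat_extensions view:

SQL> select extension_name, extension
     from   user_stat_extensions
     where  table_name = 'T_EXT_STAT';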

Next, I will gather statistics with histogram for all t_ext_stat table columns including the above newly created virtual one:

BEGIN

dbms_stats.gather_table_stats
 (user
 ,'t_ext_stat'
 ,method_opt => 'for all columns size auto'
 ,cascade => true
 ,no_invalidate => false
 );
END;
/

And let’s check the collected columns statistics

SQL> SELECT
       column_name
      ,num_distinct
      ,density
      ,histogram
    FROM
       user_tab_col_statistics
    WHERE
       table_name = 'T_EXT_STAT'
    AND
      column_name in ('VPK_ID','LAYER_CODE','SYS_STUMVIRBZA6_$QWEX6DE2NGQA1');

COLUMN_NAME                    NUM_DISTINCT    DENSITY HISTOGRAM
------------------------------ ------------ ---------- ---------------
SYS_STUMVIRBZA6_$QWEX6DE2NGQA1         4967 .000201329  NONE
LAYER_CODE                                4  6.2471E-07 FREQUENCY
VPK_ID                                 2862 .000349406  NONE

As expected, a skew has been identified on the layer_code column and a frequency histogram has therefore been gathered on it to indicate this skewness. There are nevertheless two remarks worth mentioning:

  • Since one of the columns forming the extension has a histogram, why has the extension itself not been identified as a skewed column as well?
  • What happens in this particular case, where there is no histogram on the extension but a histogram on one of the columns forming the extension?

It is easy to answer the first question by looking directly at the scattering of the column group data, where we can notice that the extension does not present any skewness. In fact the extension has 10,078 distinct values, where the most popular value appears 728 times while the least popular appears only once:

SQL> select
        to_char(SYS_STUMVIRBZA6_$QWEX6DE2NGQA1) extension
       ,count(1)
     from
       t_ext_stat
     group by
       SYS_STUMVIRBZA6_$QWEX6DE2NGQA1
      order by 2 desc;

EXTENSION               COUNT(1)
--------------------- ----------
10113707817839868275         728
6437420856234749785          728
6264201076174478674          728
7804673458963442057          728
2433504440213765306          728
6976215179539283979          728
493591537539092624           728

6710977030485345437            1
18158393637293365880           1
5275318825200713603            1
13895660777899711317           1

This is a clear demonstration that a massive skew in one of the columns forming the extension does not necessarily imply that the resulting column group combination will present a skew. This is particularly true when the other column has a large number of distinct values (> 254).

But you might wonder why one should care about this absence of a histogram on the extension. Christian Antognini has already answered this question in this article where he wrote: "be careful of extensions without histograms. They might be bypassed by the query optimizer". In fact, if one of the columns forming the extension has a histogram while the extension itself has no histogram, then the optimizer will not use the extension.

Here below is a demonstration of this claim taken from this current model:

select
   count(1)
from
   t_ext_stat
where vpk_id = 63148
and layer_code = 'R';

COUNT(1)
----------
338

SQL_ID  d26ra17afbfyh, child number 0
-------------------------------------
-------------------------------------------------------------------
| Id  | Operation         | Name       | Starts | E-Rows | A-Rows |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT  |            |      1 |        |      1 |
|   1 |  SORT AGGREGATE   |            |      1 |      1 |      1 |
|*  2 |   INDEX RANGE SCAN| T_EXT_UK_I |      1 |    142 |    338 |
-------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("VPK_ID"=63148 AND "LAYER_CODE"='R')

How can we prove that Oracle didn't use the extension to compute the 142 estimated rows when accessing the underlying index? By looking at the corresponding 10053 trace file:

Access path analysis for T_EXT_STAT
***************************************
SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for T_EXT_STAT[T_EXT_STAT]
SPD: Directive valid: dirid = 17542990197075222359, state = 5, flags = 1, loc = 1 {EC(98564)[2, 3]}
SPD: Return code in qosdDSDirSetup: EXISTS, estType = TABLE
Column (#2): VPK_ID(NUMBER)
  AvgLen: 5 NDV: 2862 Nulls: 0 Density: 0.000000 Min: 0.000000 Max: 62849.000000
Column (#3):
   NewDensity:0.002043, OldDensity:0.000001 BktCnt:5873.000000, PopBktCnt:5873.000000, PopValCnt:4, NDV:4

Column (#3): LAYER_CODE(VARCHAR2)
    AvgLen: 2 NDV: 4 Nulls: 0 Density: 0.000000
    Histogram: Freq  #Bkts: 4  UncompBkts: 5873  EndPtVals: 4  ActualVal: no 

Column (#9): SYS_STUMVIRBZA6_$QWEX6DE2NGQA1(NUMBER)
    AvgLen: 12 NDV: 4967 Nulls: 0 Density: 0.000000 Min: 0.000000 Max: 1980066.000000
ColGroup (#2, Index) T_EXT_UK_I
    Col#: 2 3 4    CorStregth: -1.00
ColGroup (#1, VC) SYS_STUMVIRBZA6_$QWEX6DE2NGQA1
    Col#: 2 3    CorStregth: 2.30

ColGroup Usage:: PredCnt: 2  Matches Full:  Partial:
Table: T_EXT_STAT  Alias: T_EXT_STAT
Card: Original: 803809.000000  Rounded: 142  Computed: 141.74  Non Adjusted: 141.74

Had Oracle used the extension to compute its estimation, it would have applied the following formula:

E-rows = num_rows(t_ext_stat) * 1/(NDV(SYS_STUMVIRBZA6_$QWEX6DE2NGQA1))
E-rows = 803809 * 1/(4967) = 161.83 --> rounded to 162, and not the 142 shown in the plan above

Another clue showing that the optimizer didn't use the extension is visible in the above 10053 trace file as well, via the following line:

ColGroup Usage:: PredCnt: 2  Matches Full:  Partial:

where the Matches Full and Partial entries are empty.
As mentioned in Christian's article, there is a fix control, related to what seems to be identified as a bug, which we can set to make Oracle use the extension:

SQL> alter session set "_fix_control"="6972291:ON";

SQL> alter session set events '10053 trace name context forever, level 1';

SQL> select
      count(1)
    from
     t_ext_stat
    where
      vpk_id = 63148
    and
      layer_code = 'R';

  COUNT(1)
----------
       338

SQL> alter session set events '10053 trace name context off';
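
A quick way to double check that the fix control has effectively been changed at session level is to query v$session_fix_control (shown here for the current session):

SQL> select bugno, value, description
     from   v$session_fix_control
     where  bugno = 6972291
     and    session_id = sys_context('userenv', 'sid');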

Below is the corresponding execution plan (with a new estimation of 162) and the part of the 10053 trace file related to the extension:

============
Plan Table
============
---------------------------------------+-----------------------------------+
| Id  | Operation          | Name      | Rows  | Bytes | Cost  | Time      |
---------------------------------------+-----------------------------------+
| 0   | SELECT STATEMENT   |           |       |       |     3 |           |
| 1   |  SORT AGGREGATE    |           |     1 |     7 |       |           |
| 2   |   INDEX RANGE SCAN | T_EXT_UK_I|   162 |  1134 |     3 |  00:00:01 |
---------------------------------------+-----------------------------------+
Predicate Information:
----------------------
2 - access("VPK_ID"=63148 AND "LAYER_CODE"='R')

=====================================
Access path analysis for T_EXT_STAT
***************************************
SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for T_EXT_STAT[T_EXT_STAT]
SPD: Directive valid: dirid = 17542990197075222359, state = 5, flags = 1, loc = 1 {EC(98564)[2, 3]}
SPD: Return code in qosdDSDirSetup: EXISTS, estType = TABLE
  Column (#2): VPK_ID(NUMBER)
    AvgLen: 5 NDV: 2899 Nulls: 0 Density: 0.000000 Min: 0.000000 Max: 62849.000000
  Column (#3):
    NewDensity:0.001753, OldDensity:0.000001 BktCnt:6275.000000, PopBktCnt:6275.000000, PopValCnt:4, NDV:4
  Column (#3): LAYER_CODE(VARCHAR2)
    AvgLen: 2 NDV: 4 Nulls: 0 Density: 0.000000
    Histogram: Freq  #Bkts: 4  UncompBkts: 6275  EndPtVals: 4  ActualVal: no
  Column (#9): SYS_STUMVIRBZA6_$QWEX6DE2NGQA1(NUMBER)
    AvgLen: 12 NDV: 4985 Nulls: 0 Density: 0.000000 Min: 0.000000 Max: 1980066.000000
  ColGroup (#2, Index) T_EXT_UK_I
    Col#: 2 3 4    CorStregth: -1.00
  ColGroup (#1, VC) SYS_STUMVIRBZA6_$QWEX6DE2NGQA1
    Col#: 2 3    CorStregth: 2.33
  ColGroup Usage:: PredCnt: 2  Matches Full: #1  Partial:  Sel: 0.0002
  Table: T_EXT_STAT  Alias: T_EXT_STAT
    Card: Original: 806857.000000  Rounded: 162  Computed: 161.86  Non Adjusted: 161.86

where we can notice that, this time, the CBO has used the extension to compute its row estimation, since 162 comes from the following formula:

E-rows = num_rows(t_ext_stat) * 1/(NDV(SYS_STUMVIRBZA6_$QWEX6DE2NGQA1))
E-rows = 806857* 1/(4985) = 161.856971 --> rounded to 162

But instead of setting the fix control I would have preferred to delete the histogram from the layer_code column, so that neither the extension nor the columns it is built on have a histogram:

SQL> exec dbms_stats.gather_table_stats(user ,'t_ext_stat', method_opt => 'for all columns size 1');
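
Note that, from 12c onwards (and, if I am not mistaken, from 11.2.0.4), dbms_stats can also delete only the histogram of a single column instead of regathering everything without histograms; a hedged sketch using the col_stat_type parameter:

SQL> -- drops only the histogram of layer_code, keeping its base column statistics
SQL> exec dbms_stats.delete_column_stats(user, 't_ext_stat', 'layer_code', col_stat_type => 'HISTOGRAM');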

SQL> SELECT
       column_name
      ,num_distinct
      ,density
      ,histogram
    FROM
       user_tab_col_statistics
    WHERE
       table_name = 'T_EXT_STAT'
    AND
      column_name in ('VPK_ID','LAYER_CODE','SYS_STUMVIRBZA6_$QWEX6DE2NGQA1');

COLUMN_NAME                    NUM_DISTINCT    DENSITY HISTOGRAM
------------------------------ ------------ ---------- ----------
SYS_STUMVIRBZA6_$QWEX6DE2NGQA1         5238 .000190913 NONE
LAYER_CODE                                4        .25 NONE
VPK_ID                                 2982 .000335345 NONE

In which case the extension would be used as shown below:

-------------------------------------------------------------------
| Id  | Operation         | Name       | Starts | E-Rows | A-Rows |
-------------------------------------------------------------------
|   0 | SELECT STATEMENT  |            |      1 |        |      1 |
|   1 |  SORT AGGREGATE   |            |      1 |      1 |      1 |
|*  2 |   INDEX RANGE SCAN| T_EXT_UK_I |      1 |    154 |    338 |
-------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("VPK_ID"=63148 AND "LAYER_CODE"='R')

 Column (#2): VPK_ID(NUMBER)
    AvgLen: 5 NDV: 2982 Nulls: 0 Density: 0.000000 Min: 0.000000 Max: 62849.000000
  Column (#3): LAYER_CODE(VARCHAR2)
    AvgLen: 2 NDV: 4 Nulls: 0 Density: 0.000000
  Column (#9): SYS_STUMVIRBZA6_$QWEX6DE2NGQA1(NUMBER)
    AvgLen: 12 NDV: 5238 Nulls: 0 Density: 0.000000
  ColGroup (#2, Index) T_EXT_UK_I
    Col#: 2 3 4    CorStregth: -1.00
  ColGroup (#1, VC) SYS_STUMVIRBZA6_$QWEX6DE2NGQA1
    Col#: 2 3    CorStregth: 2.28
  ColGroup Usage:: PredCnt: 2  Matches Full: #1  Partial:  Sel: 0.0002
  Table: T_EXT_STAT  Alias: T_EXT_STAT
    Card: Original: 807515.000000  Rounded: 154  Computed: 154.16  Non Adjusted: 154.16

Where it is clearly shown that the extension has been used:

E-rows = num_rows(t_ext_stat) * 1/(NDV(SYS_STUMVIRBZA6_$QWEX6DE2NGQA1))
E-rows = 807515 * 1/(5238) = 154.164758 --> rounded to 154

Notice, by the way, that despite the extension having been used, the estimation is not as accurate as expected (154 instead of 338). An explanation of this discrepancy might come from the very weak correlation strength that exists between layer_code and vpk_id (CorStregth: 2.30), which will be considered in a separate article.

The bottom line of this article is: be careful about collecting histograms when you intend to use extended statistics. A skew in one of the columns of the combination does not imply that the extension itself presents a skew; you may therefore end up with a histogram on the column but none on the extension, in which case Oracle will bypass the extension.

May 12, 2015

Index Efficiency

Filed under: Index — hourim @ 7:03 am

I used Jonathan Lewis' script to locate degenerated indexes (indexes that occupy more space than they should). Among those indexes I isolated this one:

16:20:33:TABLE1 - PK_TAB1
Current Leaf blocks: 2,846,555 Target size:1,585,492

According to this SQL script, the above index possesses 2.8 million leaf blocks while it should normally occupy roughly half that number of blocks.
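
As a side note, another (more intrusive) way of sizing an index is to validate its structure and read index_stats; a sketch, to be used with caution outside production since validate structure locks the underlying table while it runs, and index_stats holds a single row visible only in the validating session:

SQL> analyze index PK_TAB1 validate structure;
SQL> select lf_rows, lf_blks, pct_used, del_lf_rows
     from   index_stats;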

The sys_op_lbid function, when applied to this index, gives the following picture of the number of index entries per leaf block:

ROWS_PER_BLOCK     BLOCKS
-------------- ----------
             2          1
             7          1
            27          1
            32          1
            92          1
            94          1
           103          1
           107          1
           108          1
           111          1
           112        800
           113  1,627,529
……
           422    980,894
           423         40
           432          1
           434          1
           448      5,496
           449     32,803
           450          7
           456          3
           458          1
           466          1
           478         54
           479        200
           487          1
-------------- ----------
sum             2,979,747

Spot that odd value of 1.6 million leaf blocks (out of a total of 2.9 million) each containing only 113 index entries, roughly a quarter of what the best-packed leaf blocks of this index hold. Add to this the other 980,894 leaf blocks containing 422 entries each and you end up with an index occupying far more leaf blocks than its content justifies. That is a completely degenerated index.

Let's then rebuild it and check whether we get back the predicted space:

SQL> alter index PK_TAB1 rebuild parallel 8;

SQL> alter index PK_TAB1 noparallel;

SQL> break on report skip 1

SQL> compute sum of blocks on report

SQL> select
        rows_per_block,
        count(*) blocks
     from
       (
        select
            /*+
              cursor_sharing_exact
              dynamic_sampling(0)
              no_monitoring
              no_expand
              index_ffs(t1,t1_i1)
              noparallel_index(t,t1_i1)
             */
        sys_op_lbid( &m_ind_id ,'L',t1.rowid) as block_id,
        count(*) as rows_per_block
      from
        TABLE1 t1
      where
        tab_id is not null
      group by
       sys_op_lbid( &m_ind_id ,'L',t1.rowid)
     )
   group by rows_per_block
   order by rows_per_block
   ;
Enter value for m_ind_id: 53213
Enter value for m_ind_id: 53213

ROWS_PER_BLOCK     BLOCKS
-------------- ----------
            26          1
           206          1
           208          1
           243          1
           249          1
           272          1
           316          1
           339          1
           422  1,558,800
           423         53
           432          1
           448      5,496
           449     32,803
           458          1
           478         54
           479        200
           487          1
-------------- ----------
sum             1,597,417

Notice the new number of index leaf blocks obtained after rebuilding the index (1,597,417) and compare it with the number predicted by Jonathan Lewis' script (1,585,492): the estimation is almost 100% accurate. In passing, the index size has been reduced by about 46%.

While rebuilding the index has drastically reduced the number of leaf blocks and the disk space they occupy, that value of 1,558,800 leaf blocks containing 422 index keys each is still present. This prompted me to try coalescing the index, even though I was not very confident that such a high number of leaf blocks could be merged with adjacent leaf blocks to make the index less smashed.

SQL> alter index PK_TAB1 coalesce;

ROWS_PER_BLOCK    BLOCKS
-------------- ----------
           26          1
          206          1
          208          1
          243          1
          249          1
          272          1
          316          1
          339          1
          422          1,558,800
          423          53
          432          1
          448          5496
          449          32803
          458          1
          478          54
          479          200
          487          1
              -------------
sum              1,597,417

Definitely this primary key index has a strange way of being filled up which I have to figure out with the Java developers.

The bottom line of this article is that Jonathan Lewis' script for locating degenerated indexes is amazingly precise.

April 29, 2015

Real time SQL monitoring limitation

Filed under: Oracle — hourim @ 3:56 pm

I was trying to explain a performance deterioration of a very complex query honoured via an execution plan with 386 operations (386 lines). Where would someone start deciphering such a complex and big execution plan without the help of a Real Time SQL Monitoring report? Since this query took 2 hours to complete, it was fairly likely that Oracle had monitored it. Unfortunately a select against the v$sql_monitor view didn't return any rows for this particular sql_id. What came to my mind in front of this situation was that the report had been flushed from memory due to stress on the library cache. Fortunately, I was able to get the bind variables and re-execute the same query. While the query was running I opened a sqlplus window and ran this:

SQL> select sql_id from v$sql_monitor where status = 'EXECUTING';
no rows selected

The query was still running after a couple of minutes but was still not monitored. I suspected the number of operations in the execution plan but had no way to prove the correlation between this number of plan lines and the absence of monitoring:

Plan hash value: 1504525856
----------------------------------------------------------------------
| Id  | Operation                                                    |
----------------------------------------------------------------------
|   0 | SELECT STATEMENT                                             |
|   1 |  UNION-ALL                                                   |
|   2 |   SORT UNIQUE                                                |
|   3 |    MERGE JOIN CARTESIAN                                      |
|   4 |     MERGE JOIN CARTESIAN                                     |
|   5 |      NESTED LOOPS                                            |
|*  6 |       HASH JOIN OUTER                                        |
|   7 |        MERGE JOIN CARTESIAN                                  |
|   8 |         NESTED LOOPS OUTER                                   |
|*  9 |          HASH JOIN OUTER                                     |
|* 10 |           HASH JOIN OUTER                                    |
|* 11 |            HASH JOIN OUTER                                   |
|* 12 |             HASH JOIN OUTER                                  |
|* 13 |              HASH JOIN OUTER                                 |
.../...
|*202 |                                               HASH JOIN      |
| 203 | OUTER                                          NESTED LOOPS  |
| 204 |  OUTER                                          NESTED LOOPS |
|*205 |                                                  HASH JOIN   |
|*206 | OUTER                                             HASH JOIN  |
.../...
| 383 |          BUFFER SORT                                         |
| 384 |           PX RECEIVE                                         |
| 385 |            PX SEND BROADCAST                                 |
| 386 |             TABLE ACCESS FULL                                |
----------------------------------------------------------------------

Spot in passing where the word OUTER has been placed at operations 203, 204 and 206.

Google being a good friend, I asked it and it directed me to this article where Doug Burns pointed out that there is a hidden parameter (_sqlmon_max_planlines) which fixes the maximum number of lines an execution plan must not exceed in order to be, all other things being equal, monitored.
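
For the record, the value of this hidden parameter (300 by default, if I remember correctly) can be checked, when connected as SYS, with the classical x$ query below:

SQL> select a.ksppinm  name, b.ksppstvl value
     from   x$ksppi a, x$ksppcv b
     where  a.indx = b.indx
     and    a.ksppinm = '_sqlmon_max_planlines';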

I then decided to give it a try and altered this parameter so that my 386-operation plan could be monitored:

SQL> alter session set "_sqlmon_max_planlines" = 400;

Session altered.

And to my pleasant surprise I found that my query started being monitored:


SQL Monitoring Report

Global Information
------------------------------
 Status              :  EXECUTING
 Instance ID         :  1
 Session             :  xxxx (626:3043)
 SQL ID              :  315sc2w0cy05w
 SQL Execution ID    :  16777216
 Execution Started   :  04/29/2015 11:29:39
 First Refresh Time  :  04/29/2015 11:29:46
 Last Refresh Time   :  04/29/2015 11:35:26
 Duration            :  348s
 Module/Action       :  sqldeveloper64W.exe/-
 Service             :  SYS$USERS
 Program             :  sqldeveloper64W.exe   

 SQL Plan Monitoring Details (Plan Hash Value=1504525856)
==============================================================================================
| Id    |                                     Operation                                      |
|       |                                                                                    |
==============================================================================================
|     0 | SELECT STATEMENT                                                                   |
|     1 |   UNION-ALL                                                                        |
|     2 |    SORT UNIQUE                                                                     |
|     3 |     MERGE JOIN CARTESIAN                                                           |
|     4 |      MERGE JOIN CARTESIAN                                                          |
|     5 |       NESTED LOOPS                                                                 |
|     6 |        HASH JOIN OUTER                                                             |
|     7 |         MERGE JOIN CARTESIAN                                                       |

|   383 |           BUFFER SORT                                                              |
|   384 |            PX RECEIVE                                                              |
|   385 |             PX SEND BROADCAST                                                      |
|   386 |              TABLE ACCESS FULL                                                     |
==============================================================================================

And now the serious stuff can start :-)

April 18, 2015

Parallel refreshing a materialized view

Filed under: Materialized view — hourim @ 4:58 pm

I have been asked to troubleshoot a monthly on-demand materialized view refresh job which had got the bad idea to crash with the ORA-01555 error after 25,833 seconds (more than 7 hours) of execution. Despite my several years of professional experience, this was the first time I had been asked to look at a materialized view refresh. The issue came up on a Friday afternoon, so I was given a week-end to familiarize myself with materialized views. Coincidentally, a couple of days before, there was an Oracle webcast on materialized view basics, architecture and internal workings, which I replayed on Saturday, practicing its demo. Christian Antognini's book contains a chapter on this topic which I have also gone through, as Christian's book is where I always like to start when trying to learn an Oracle concept.

Materialized view capabilities

The following Monday morning, armed with this week-end accelerated self-training, I opened again the e-mail I had been sent about the failing refresh job and started re-reading it. The first thing that caught my attention this time, in contrast to my quick pass-through reading of the previous Friday, was a suggestion made by the DBA to try fast refreshing the materialized view instead of completely refreshing it. I learnt from the Oracle webcast that Oracle is able to let us know whether a materialized view can be fast (also known as incrementally) refreshed or not. Here below are the steps to follow if you want to get this information:

You first need to create the mv_capabilities_table table (in the schema from which you are going to run the dbms_mview package) using the following script:

SQL> $ORACLE_HOME/rdbms/admin/utlxmv.sql

SQL> select * from mv_capabilities_table;
no rows selected

Once this table is created you can execute the dbms_mview.explain_mview procedure as shown below:

SQL> exec dbms_mview.explain_mview ('my_materialied_mv');

PL/SQL procedure successfully completed.

SQL> select
  2     mvname
  3    ,capability_name
  4    ,possible
  5  from
  6    mv_capabilities_table
  7  where
  8     mvname = 'MY_MATERIALIED_MV'
  9  and
 10    capability_name  like '%REFRESH%';

MVNAME                         CAPABILITY_NAME                P
------------------------------ ------------------------------ -
MY_MATERIALIED_MV              REFRESH_COMPLETE               Y  
MY_MATERIALIED_MV              REFRESH_FAST                   N --> spot this
MY_MATERIALIED_MV              REFRESH_FAST_AFTER_INSERT      N
MY_MATERIALIED_MV              REFRESH_FAST_AFTER_INSERT      N
MY_MATERIALIED_MV              REFRESH_FAST_AFTER_INSERT      N
MY_MATERIALIED_MV              REFRESH_FAST_AFTER_INSERT      N
MY_MATERIALIED_MV              REFRESH_FAST_AFTER_ONETAB_DML  N
MY_MATERIALIED_MV              REFRESH_FAST_AFTER_ANY_DML     N
MY_MATERIALIED_MV              REFRESH_FAST_PCT               N

As spotted above, fast refreshing this materialized view is impossible.

The first learned lesson: instead of trying to create a materialized view log and fast refresh a complex materialized view which might be impossible to refresh incrementally, first get the capabilities of the view using the explain_mview procedure. You will certainly save time and resources.

SQL> SELECT
         refresh_method
       , refresh_mode
       , staleness
       , last_refresh_type
       , last_refresh_date
    FROM
          user_mviews
    WHERE mview_name = 'MY_MATERIALIED_MV';

REFRESH_ REFRES STALENESS           LAST_REF LAST_REFRES
-------- ------ ------------------- -------- --------------------
COMPLETE DEMAND NEEDS_COMPILE       COMPLETE 02-APR-2015 16:16:35

Parallel clause in the SQL create statement : any effect on the mview creation?

Since I have ruled out an incremental refresh, I decided to get the materialized view definition so that I could investigate its content:

SQL> SELECT
       replace (dbms_metadata.get_ddl(replace(
                                      OBJECT_TYPE, ' ', '_'),    
                                      OBJECT_NAME,OWNER)
                    ,'q#"#'
                    ,'q#''#'
                    )
     FROM DBA_OBJECTS
     WHERE OBJECT_TYPE = 'MATERIALIZED VIEW'
     AND object_name = 'MY_MATERIALIED_MV';

------------------------------------------------------------------
CREATE MATERIALIZED VIEW MY_MATERIALIED_MV
   ({list of columns}) 
  TABLESPACE xxxx
  PARALLEL 16 ----------------------------------> spot this
  BUILD IMMEDIATE
  USING INDEX
  REFRESH COMPLETE ON DEMAND
  USING DEFAULT LOCAL ROLLBACK SEGMENT
  USING ENFORCED CONSTRAINTS DISABLE QUERY REWRITE
AS
-- select n°1
 SELECT
    {list of columns}
 FROM
  {list of tables}
 WHERE
  {list of predicates}
 GROUP BY
  {list of columns}
.../...
UNION ALL
-- select n°5
SELECT
    {list of columns}
 FROM
  {list of tables}
 WHERE
  {list of predicates}
GROUP BY
  {list of columns} ;

Have you noticed that parallel 16 clause in the materialized view create script? The developer's intention was to create the materialized view using parallel processes. Having a production-equivalent database at hand, I was happy enough to try re-creating this materialized view:

SQL> set timing on

SQL> start ddl_mv1.sql

Materialized view created.

Elapsed: 00:22:33.52

Global Information
------------------------------
 Status              :  DONE               
 Instance ID         :  1                  
 Session             :  XZYY (901:25027)  
 SQL ID              :  f9s6kdyysz84m      
 SQL Execution ID    :  16777216           
 Execution Started   :  04/16/2015 09:49:22
 First Refresh Time  :  04/16/2015 09:49:23
 Last Refresh Time   :  04/16/2015 10:11:48
 Duration            :  1346s              
 Module/Action       :  SQL*Plus/-         
 Service             :  XZYY
 Program             :  sqlplus.exe         

Global Stats
========================================================================
| Elapsed |   Cpu   |    IO    | Buffer | Read | Read  | Write | Write |
| Time(s) | Time(s) | Waits(s) |  Gets  | Reqs | Bytes | Reqs  | Bytes |
========================================================================
|   20338 |    5462 |    14205 |    63M |   3M | 716GB |    2M | 279GB |
========================================================================

Parallel Execution Details (DOP=16 , Servers Allocated=32)

SQL Plan Monitoring Details (Plan Hash Value=853136481)
==================================================================================================
| Id  |                       Operation            | Name    |  Rows   | Execs |   Rows   |Temp  |
|     |                                            |         | (Estim) |       | (Actual) |(Max) |
==================================================================================================
|   0 | CREATE TABLE STATEMENT                     |         |         |    33 |       16 |      |
|   1 |   PX COORDINATOR                           |         |         |    33 |       16 |      |
|   2 |    PX SEND QC (RANDOM)                     | :TQ10036|         |    16 |       16 |      |
|   3 |     LOAD AS SELECT                         |         |         |    16 |       16 |      |
|   4 |      UNION-ALL                             |         |         |    16 |     117M |      |
|   5 |       HASH GROUP BY                        |         |    259M |    16 |      58M |  36G |
|   6 |        PX RECEIVE                          |         |    259M |    16 |     264M |      |
|   7 |         PX SEND HASH                       | :TQ10031|    259M |    16 |     264M |      |
|   8 |          HASH JOIN RIGHT OUTER BUFFERED    |         |    259M |    16 |     264M |  61G |
|   9 |           PX RECEIVE                       |         |      4M |    16 |       4M |      |
|  10 |            PX SEND HASH                    | :TQ10013|      4M |    16 |       4M |      |
|  11 |             PX BLOCK ITERATOR              |         |      4M |    16 |       4M |      |
|     |                                            |         |         |       |          |      |
| 180 |                PX RECEIVE                  |         |     19M |    16 |      20M |      |
| 181 |                 PX SEND HASH               | :TQ10012|     19M |    16 |      20M |      |
| 182 |                  PX BLOCK ITERATOR         |         |     19M |    16 |      20M |      |
| 183 |                   TABLE ACCESS FULL        | TABLE_M |     19M |   268 |      20M |      |
==================================================================================================

Surprisingly, the materialized view has been created in less than 23 minutes, and this creation has been parallelised with a DOP of 16 as shown by the corresponding Real Time SQL Monitoring (RTSM) report. The master table has consequently been created with a DOP of 16 as shown below:

SQL> select
  2    table_name
  3   ,degree
  4  from
  5    user_tables
  6  where table_name = 'MY_MATERIALIED_MV';

TABLE_NAME                     DEGREE
------------------------------ ----------
MY_MATERIALIED_MV               16

A simple select against the created materialized view will go parallel as well

SQL> select count(1) from MY_MATERIALIED_MV;               

SQL Plan Monitoring Details (Plan Hash Value=3672954679)
============================================================================================
| Id |          Operation          |           Name           |  Rows   | Execs |   Rows   |
|    |                             |                          | (Estim) |       | (Actual) |
============================================================================================
|  0 | SELECT STATEMENT            |                          |         |     1 |        1 |
|  1 |   SORT AGGREGATE            |                          |       1 |     1 |        1 |
|  2 |    PX COORDINATOR           |                          |         |    17 |       16 |
|  3 |     PX SEND QC (RANDOM)     | :TQ10000                 |       1 |    16 |       16 |
|  4 |      SORT AGGREGATE         |                          |       1 |    16 |       16 |
|  5 |       PX BLOCK ITERATOR     |                          |    104M |    16 |     117M |
|  6 |        MAT_VIEW ACCESS FULL | MY_MATERIALIED_MV        |    104M |   191 |     117M |
============================================================================================

You might have already noticed in the above RTSM report that the select part of the "create as select" statement has been parallelised as well. It is as if the parallel 16 clause of the "create" part of the materialized view SQL script implicitly induced its "select" part to be done in parallel with a DOP of 16.

Parallel clause in the SQL create statement : any effect on the mview refresh ?

As far as I am concerned, the problem I have been asked to troubleshoot resides in refreshing the materialized view, not in creating it. Since the materialized view has been created in 23 minutes, I should be optimistic about its refresh time, shouldn't I?

SQL> exec dbms_mview.refresh ('MY_MATERIALIED_MV','C',atomic_refresh=>FALSE);

After more than 4,200 seconds of execution time I finally gave up and decided to stop this refresh. Below is an overview of its corresponding Real Time Sql Monitoring (RTSM) report:

Global Information
------------------------------
 Status              :  DONE (ERROR) --> I have cancelled it after more than 1 hour   
 Instance ID         :  1                  
 Session             :  XZYY (901:25027)  
 SQL ID              :  d5n03tuht2cg8      
 SQL Execution ID    :  16777216           
 Execution Started   :  04/16/2015 10:55:46
 First Refresh Time  :  04/16/2015 10:55:52
 Last Refresh Time   :  04/16/2015 12:06:39
 Duration            :  4253s               
 Module/Action       :  SQL*Plus/-         
 Service             :  XZYY
 Program             :  sqlplus.exe         

Global Stats
===================================================================================
| Elapsed |   Cpu   |    IO    |  Other   | Buffer | Read | Read  | Write | Write |
| Time(s) | Time(s) | Waits(s) | Waits(s) |  Gets  | Reqs | Bytes | Reqs  | Bytes |
===================================================================================
|    4253 |    1640 |     2563 |       50 |    53M | 824K | 227GB |  570K | 120GB |
===================================================================================

SQL Plan Monitoring Details (Plan Hash Value=998958099)
=============================================================================
| Id |                  Operation                   |Name |  Rows   | Cost  |
|    |                                              |     | (Estim) |       |
=============================================================================
|  0 | INSERT STATEMENT                             |     |         |       |
|  1 |   LOAD AS SELECT                             |     |         |       |
|  2 |    UNION-ALL                                 |     |         |       |
|  3 |     HASH GROUP BY                            |     |    259M |       |
|  4 |      CONCATENATION                           |     |         |       |
|  5 |       NESTED LOOPS OUTER                     |     |       7 |  4523 |
|  6 |        NESTED LOOPS OUTER                    |     |       7 |  4495 |
|  7 |         NESTED LOOPS                         |     |       7 |  4474 |
|  8 |          NESTED LOOPS                        |     |       7 |  4460 |
|  9 |           PARTITION REFERENCE ALL            |     |       7 |  4439 |
…/…

In contrast to the creation process, the materialized view refresh has been done serially. This confirms that the above parallel 16 clause in the create DDL script concerns only the parallel materialized view creation and not its refresh process.

The second learned lesson: a parallel clause specified in the create statement of a materialized view seems not to be used during the refresh of that same materialized view. The parallel run is considered, in this kind of situation, only at materialized view creation time.

dbms_mview.refresh and its parallelism parameter : any effect on the mview refresh ?

The tables on which the materialized view is based all have a degree of 1:

 SQL> select
  2      table_name
  3    , degree
  4  from user_tables
  5  where trim(degree) <> '1';

TABLE_NAME            DEGREE
--------------------- -------
MY_MATERIALIED_MV     16

Having said that, what if I try refreshing this materialized view using the parallelism parameter of the dbms_mview.refresh procedure as shown below:

SQL> exec dbms_mview.refresh ('MY_MATERIALIED_MV','C', atomic_refresh=>FALSE, parallelism =>16);

SQL Plan Monitoring Details (Plan Hash Value=998958099)
==========================================================================================
| Id |                  Operation                   |           Name           |  Rows   |
|    |                                              |                          | (Estim) |
==========================================================================================
|  0 | INSERT STATEMENT                             |                          |         |
|  1 |   LOAD AS SELECT                             |                          |         |
|  2 |    UNION-ALL                                 |                          |         |
|  3 |     HASH GROUP BY                            |                          |    259M |
|  4 |      CONCATENATION                           |                          |         |
|  5 |       NESTED LOOPS OUTER                     |                          |       7 |
|  6 |        NESTED LOOPS OUTER                    |                          |       7 |
|  7 |         NESTED LOOPS                         |                          |       7 |
|  8 |          NESTED LOOPS                        |                          |       7 |
|  9 |           PARTITION REFERENCE ALL            |                          |       7 |
| 10 |            TABLE ACCESS BY LOCAL INDEX ROWID | TABLE_XX_ZZ              |       7 |
../..
| 94 |           PARTITION RANGE ALL                |                          |    369M |
| 95 |            PARTITION LIST ALL                |                          |    369M |
| 96 |             TABLE ACCESS FULL                | TABLE_AA_BB_123          |    369M |
==========================================================================================

As confirmed by the above corresponding RTSM report, the parallelism parameter has not been obeyed and the refresh has been done serially in this case as well.

The third learned lesson: the parallelism parameter of the dbms_mview.refresh procedure has no effect on the parallel refresh of the underlying materialized view.

Adding a parallel hint in the select part of the mview : any effect on the mview refresh ?

At this stage of the troubleshooting process I had established the following points:

  • The parallel clause used in the create statement of a materialized view is considered only during the materialized view creation. This parallel clause is ignored during the refresh process
  • The parallelism parameter of the dbms_mview.refresh procedure will not refresh the materialized view in parallel

Now that I had ruled out all the above options, I was almost convinced that, to expedite the refresh process, I needed to add a parallel hint directly in the materialized view definition (ddl_mv2.sql):

CREATE MATERIALIZED VIEW MY_MATERIALIED_MV
   ({list of columns}) 
  TABLESPACE xxxx
  PARALLEL 16
  BUILD IMMEDIATE
  USING INDEX
  REFRESH COMPLETE ON DEMAND
  USING DEFAULT LOCAL ROLLBACK SEGMENT
  USING ENFORCED CONSTRAINTS DISABLE QUERY REWRITE
AS
 SELECT /*+ parallel(8) pq_distribute(tab1 hash hash)*/
    {list of columns}
 FROM
  {list of tables}
 WHERE
  {list of predicates}
 GROUP BY
  {list of columns}
UNION ALL
 SELECT /*+ parallel(8) pq_distribute(tab1 hash hash)*/
    {list of columns}
 FROM
  {list of tables}
 WHERE
  {list of predicates}
 GROUP BY
    {list of columns}
;

Having changed the select part of the materialized view DDL script, I launched its creation again, which completed in 25 minutes as shown below:

SQL> start ddl_mv2.sql
Materialized view created.
Elapsed: 00:25:05.37

And immediately after the creation I launched the refresh process:

SQL> exec dbms_mview.refresh ('MY_MATERIALIED_MV','C',atomic_refresh=>FALSE);

PL/SQL procedure successfully completed.
Elapsed: 00:26:11.12

And happily this time the refresh completed in 26 minutes thanks to the parallel run exposed below in the corresponding RTSM report:

Global Information
------------------------------
 Status              :  DONE               
 Instance ID         :  1                  
 Session             :  XZYY
 SQL ID              :  1w1v742mr35g3      
 SQL Execution ID    :  16777216           
 Execution Started   :  04/16/2015 13:38:13
 First Refresh Time  :  04/16/2015 13:38:13
 Last Refresh Time   :  04/16/2015 14:04:24
 Duration            :  1571s              
 Module/Action       :  SQL*Plus/-         
 Service             :  XZYY            
 Program             :  sqlplus.exe         

Parallel Execution Details (DOP=8, Servers Allocated=80)

SQL Plan Monitoring Details (Plan Hash Value=758751629)
===============================================================================
| Id  |                       Operation          |           Name   |  Rows   |
|     |                                          |                  | (Estim) |
===============================================================================
|   0 | INSERT STATEMENT                         |                  |         |
|   1 |   LOAD AS SELECT                         |                  |         |
|   2 |    UNION-ALL                             |                  |         |
|   3 |     PX COORDINATOR                       |                  |         |
|   4 |      PX SEND QC (RANDOM)                 | :TQ10005         |    259M |
|   5 |       HASH GROUP BY                      |                  |    259M |
| 177 |                PX RECEIVE                |                  |     19M |
| 178 |                 PX SEND HASH             | :TQ50004         |     19M |
| 179 |                  PX BLOCK ITERATOR       |                  |     19M |
| 180 |                   TABLE ACCESS FULL      | TABLE_KZ_YX      |     19M |
===============================================================================

I've added the pq_distribute(tab1 hash hash) hint above because several refreshes had crashed with a broadcast distribution that ended up over-consuming TEMP space, raising the now classical error:

ERROR at line 484:
ORA-12801: error signaled in parallel query server P012
ORA-01652: unable to extend temp segment by 128 in tablespace TEMP

The fourth learned lesson: if you want to parallelise your materialized view refresh process you had better include the parallel hint in the select part of the materialized view. This is better than changing the parallel degree of the tables on which the materialized view is based.
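
Another option, which I have not tested in this particular case and therefore mention only as a possible alternative, would be to force parallelism at session level just before launching the refresh; a sketch:

SQL> alter session enable parallel dml;
SQL> alter session force parallel dml parallel 16;
SQL> alter session force parallel query parallel 16;
SQL> -- atomic_refresh=>FALSE allows a direct-path (and hence potentially parallel) insert
SQL> exec dbms_mview.refresh('MY_MATERIALIED_MV','C', atomic_refresh=>FALSE);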

April 8, 2015

The dark side of using bind variables: sharing everything

Filed under: Tuning — hourim @ 5:48 pm

An interesting academic situation happened last week which, I honestly believe, is worth a blog article, as experienced DBAs had spent time trying to solve it without success. An overnight job had been running for hours in the night from 01/04 to 02/04. The on-call DBA spent the whole night killing and re-launching the job (sql_id) several times without any success. When I arrived at work the next day I was asked to help. As the job was still running, I generated the Real Time SQL Monitoring (RTSM) report for the corresponding sql_id, which showed the classical NESTED LOOP with a huge outer row set driving an inner data set in which at least 50 different operations had been started 519K times while one operation had been executed 2M times. The corresponding execution plan contains 213 operations. The underlying query uses 628 user bind variables and 48 system-generated bind variables (thanks to cursor_sharing set to FORCE).

SQL Plan Monitoring Details (Plan Hash Value=1511784243)

Global Information
------------------------------
 Status              :  EXECUTING               
 Instance ID         :  2                       
 Session             :  xxxxx (350:9211)   
 SQL ID              :  dmh5vhkcm877v           
 SQL Execution ID    :  33554436                
 Execution Started   :  04/02/2015 07:52:03     
 First Refresh Time  :  04/02/2015 07:52:47     
 Last Refresh Time   :  04/02/2015 10:04:28     
 Duration            :  7947s                   
 Module/Action       :  wwwww
 Service             :  zzzzz               
 Program             :  wwwww  
 DOP Downgrade       :  100%    

Global Stats
====================================================================================
| Elapsed |   Cpu   |    IO    | Application | Concurrency | Buffer | Read | Read  |
| Time(s) | Time(s) | Waits(s) |  Waits(s)   |  Waits(s)   |  Gets  | Reqs | Bytes |
====================================================================================
|    7900 |    7839 |       20 |        0.00 |        0.82 |   243M | 5946 | 659MB |
====================================================================================

The above 7,839 seconds spent consuming CPU, with almost no user wait time, represent the classical symptom of a wrong NESTED LOOP operation starting several inner operations a huge number of times, as mentioned above.

The job was running without any sign of improvement, the client was waiting for its critical report, and I had a query with almost 700 bind variables, honoured via an execution plan of 213 operations, for which I had to figure out how to make the report finish smoothly as soon as possible.

I was dissecting the execution plan when the end user sent me an e-mail saying that the same job had run successfully the day before within 6 minutes. With that information in mind I managed to get the RTSM report of that successful run. The first crucial piece of information was that yesterday's query and today's never-ending one used the same plan_hash_value (same execution plan). Comparing the 628 input bind variable values of both runs, I found that yesterday's job ran for a one-month period (monthly job) while the current job was running for a one-day interval (daily job). Of course the end user had not supplied any information about the kind of job they were currently running compared to the previous one. All I had been told was that yesterday's job completed in 6 minutes. It was only when I found the difference in the input bind variable values that the end user said "the current run is for the daily job while the previous one was for the monthly job".

And the sun started rising: I was able to figure out that the two sets of bind variables were not doing the same amount of work, and sharing the same execution plan was probably not a good idea. This is why I suggested that the DBA do the following:

  • Kill the never-ending session
  • Purge the sql_id from the shared pool (see the sketch after this list)
  • Ask the end user to re-launch the job
  • Cross fingers :-)
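
For the purge step, the dbms_shared_pool package can be used once the address and hash value of the parent cursor have been retrieved from v$sqlarea; a sketch (the sql_id being the one of the problematic query, and execute privilege on dbms_shared_pool being required):

SQL> declare
  2    l_name varchar2(64);
  3  begin
  4    -- the purge procedure expects 'address,hash_value' for a cursor ('C' flag)
  5    select address || ',' || hash_value
  6    into   l_name
  7    from   v$sqlarea
  8    where  sql_id = 'dmh5vhkcm877v';
  9    dbms_shared_pool.purge(l_name, 'C');
 10  end;
 11  /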

And you know what? The job completed within a couple of hundred seconds:

SQL Plan Monitoring Details (Plan Hash Value=2729107228)

Global Information
------------------------------
 Status              :  DONE (ALL ROWS)         
 Instance ID         :  2                       
 Session             :  xxxxx (1063:62091) 
 SQL ID              :  dmh5vhkcm877v           
 SQL Execution ID    :  33554437                
 Execution Started   :  04/02/2015 10:43:17     
 First Refresh Time  :  04/02/2015 10:43:20     
 Last Refresh Time   :  04/02/2015 10:47:38     
 Duration            :  261s                    
 Module/Action       :  wwwww
 Service             :  zzzzz
 Program             :  wwwww
 Fetch Calls         :  57790     

Global Stats
==============================================================================
| Elapsed |   Cpu   |    IO    | Concurrency | Fetch | Buffer | Read | Read  |
| Time(s) | Time(s) | Waits(s) |  Waits(s)   | Calls |  Gets  | Reqs | Bytes |
==============================================================================
|     134 |     107 |     7.10 |          17 | 57790 |    18M | 7857 | 402MB |
==============================================================================

This is the dark side of using bind variables: when sharing resources we also share execution plans. The current daily job was running with the plan optimized for the monthly job. The solution was to force the CBO to compile a new execution plan for the new set of input bind variables. The new plan (2729107228) still shows 200 operations and several operations started 578K times. I intend to study both execution plans to find out exactly where the improvement comes from. The clue here might be that the first, shared, monthly execution plan ran serially for a reason I am unable to figure out:

 DOP Downgrade       :  100%   

While the new hard parsed execution has been executed in parallel:

 Parallel Execution Details (DOP=4 , Servers Allocated=20)

Bottom line: when you intend to run a critical report once per day (or once per month), it is worth letting the CBO compile a new execution plan for each execution. All you will pay is one hard parse per execution, which will never hurt from a memory and CPU point of view.
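
One trivial way of getting this new hard parse for each execution, provided the job's SQL text can be touched, is to inject a run-specific literal comment into the statement so that it gets a new sql_id (and hence a fresh optimization) every time; a sketch using the same placeholder convention as above:

SQL> select /* daily_run_20150402 */
       {list of columns}
     from
       {list of tables}
     where
       {list of predicates};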
