Suppose you are confronted to a performance issue due to an initially optimized parallel execution plan that has been 100% DOP downgraded. Suppose also that this DOP downgrade is due to an Oracle Resource Manager max DOP limit directive.
Would you be able to point out this DOP downgrade root cause using gv$sql_monitor view? Would you be able as well to say, a posteriori (using AWR tables), that your issue was due to ORM max DOP limit?
This blog post aims to answer the above two questions.
1.Replaying the issue using a reproducible model
While it is not so straightforward to link a real-life performance issue to a DOP downgrade due to a resource manager maxim allowed DOP, it is nevertheless, very easy to model and reproduce this case at will.
RDEV> create table t1 as select a.* from dba_objects a,dba_objects where rownum<= 1e6; RDEV2> alter table t1 parallel 4; RDEV2> select count(1) from t1 where owner >= 'A'; Parallel Execution Details (DOP=4 , Servers Allocated=4) =================================================================== | Name | Type | Server# | Elapsed | Cpu | IO | | | | | Time(s) | Time(s) | Waits(s) | =================================================================== | PX Coordinator | QC | | 0.12 | 0.01 | | | p000 | Set 1 | 1 | 0.20 | 0.03 | 0.15 | | p001 | Set 1 | 2 | 0.20 | 0.03 | 0.15 | | p002 | Set 1 | 3 | 0.20 | 0.04 | 0.15 | | p003 | Set 1 | 4 | 0.20 | 0.04 | 0.12 | =================================================================== ------------------------------------------------------------------- | Id | Operation | Name | Rows | TQ |IN-OUT| ------------------------------------------------------------------- | 0 | SELECT STATEMENT | | | | | | 1 | SORT AGGREGATE | | 1 | | | | 2 | PX COORDINATOR | | | | | | 3 | PX SEND QC (RANDOM) | :TQ10000 | 1 | Q1,00 | P->S | | 4 | SORT AGGREGATE | | 1 | Q1,00 | PCWP | | 5 | PX BLOCK ITERATOR | | 1000K| Q1,00 | PCWC | |* 6 | TABLE ACCESS FULL| T1 | 1000K| Q1,00 | PCWP | ------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 6 - access(:Z>=:Z AND :Z<=:Z) filter("OWNER">='A') Note ----- - Degree of Parallelism is 4 because of table property
As you can see, the Degree Of Parallelism 4 has been honored and the SQL query has been executed using Parallelism.
However, if under the hood, I create an Oracle Resource Manager which will limit the maximum DOP allowed to be used by the current user (RDEV) then the same query will be DOP downgraded causing performance pain like
in one of my real-life case (a blog post will follow):
BEGIN -- clear pending area DBMS_RESOURCE_MANAGER.clear_pending_area(); DBMS_RESOURCE_MANAGER.create_pending_area(); -- create resource plan DBMS_RESOURCE_MANAGER.create_plan( plan => 'DOPDOWNGRAD_RPL', comment => 'Resource Plan limiting DOP'); -- create consumer group DBMS_RESOURCE_MANAGER.create_consumer_group( consumer_group => 'DOPDONWGRAD_RSG', comment => 'LOW Parallel Priority group'); -- assign the DOP limitation to the consumer group DBMS_RESOURCE_MANAGER.create_plan_directive ( plan => 'DOPDOWNGRAD_RPL', group_or_subplan => 'DOPDONWGRAD_RSG', comment => 'LOW PDOP Priority', parallel_degree_limit_p1 => 1); -- This is mandatory to avoid ORA-29377: consumer group OTHER_GROUPS is not part of top-plan DOPDOWNGRAD_RPL DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE( PLAN => 'DOPDOWNGRAD_RPL', GROUP_OR_SUBPLAN => 'OTHER_GROUPS', MGMT_P1 => 10, COMMENT => 'Directive for OTHER_GROUPS (mandatory)'); -- validate the resource plan DBMS_RESOURCE_MANAGER.validate_pending_area; DBMS_RESOURCE_MANAGER.submit_pending_area(); END; /
The preceding anonymous PL/SLQ block creates the DOPDOWNGRAD_RPL resource plan, attaches to it the DOPDONWGRAD_RSG consumer group and assigns a directive to this consumer group so that maximum DOP allowed by this consumer group would be 1.
To finish the setup, I will assign the user RDEV to this consumer group and set the DOPDOWNGRAD_RPL plan as the default resource plan:
-- Affect user RDEV to resource group BEGIN DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA(); DBMS_RESOURCE_MANAGER.SET_CONSUMER_GROUP_MAPPING( ATTRIBUTE => DBMS_RESOURCE_MANAGER.ORACLE_USER, VALUE => 'RDEV', CONSUMER_GROUP => 'DOPDONWGRAD_RSG'); DBMS_RESOURCE_MANAGER.VALIDATE_PENDING_AREA(); DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA(); END; /
We are now ready to run again the same query and check whether it has been executed parallelly or it has been pre-empted by the DOPDOWNGRAD_RPL resource plan to use parallelism:
RDEV> alter system set resource_manager_plan='DOPDOWNGRAD_RPL'; System altered. RDEV> show parameter resource_manager_plan NAME TYPE VALUE ----------------------- ----------- ---------------- resource_manager_plan string DOPDOWNGRAD_RPL
Let’s execute the same query using RDEV user. Remember that this user belongs to an oracle resource manager group with a parallel degree limit directive as shown above:
SQL> SELECT /*+ monitor */ count(1) from t1 where owner >= 'A'; COUNT(1) ---------- 1000000 RDEV> @xpsimp SQL_ID b60wdaqys7wf9, child number 0 ------------------------------------- SELECT /*+ monitor */ count(1) from t1 where owner >= 'A' Plan hash value: 3110199320 --------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | TQ |IN-OUT| PQ Distrib | --------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | | | | | | | 1 | SORT AGGREGATE | | 1 | 4 | | | | | 2 | PX COORDINATOR | | | | | | | | 3 | PX SEND QC (RANDOM) | :TQ10000 | 1 | 4 | Q1,00 | P->S | QC (RAND) | | 4 | SORT AGGREGATE | | 1 | 4 | Q1,00 | PCWP | | | 5 | PX BLOCK ITERATOR | | 1000K| 3906K| Q1,00 | PCWC | | |* 6 | TABLE ACCESS FULL| T1 | 1000K| 3906K| Q1,00 | PCWP | | --------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 6 - access(:Z>=:Z AND :Z<=:Z) filter("OWNER">='A')
The absence of the Note line at the bottom of the execution plan represents, already, a first indication that the query has not been executed using parallelism. Indeed, this is confirmed via the corresponding SQL Monitoring report:
Global Information ------------------------------ Status : DONE (ALL ROWS) Instance ID : 1 Session : RDEV (273:24189) SQL ID : b60wdaqys7wf9 SQL Execution ID : 16777216 Execution Started : 11/07/2019 09:02:53 First Refresh Time : 11/07/2019 09:02:53 Last Refresh Time : 11/07/2019 09:02:53 Duration : .414398s Module/Action : SQL*Plus/- Service : diiqazbx Program : sqlplus.exe DOP Downgrade : 100% --> here Fetch Calls : 1
The ACTIVE type of the same SQL Monitoring report shows the reason:
Other Plan Line Statistics Min DOP after downgrade 1 downgrade reason 352
Thanks to Yasin Baskan, I knew that the description of the downgrade reason codes differs from release to release. We need to use the following query in the corresponding database (12cR2 in the current case) to get the actual significance of each code:
-- +-----------------------------------| -- query supplied by Yasin Baskan -- SQL> desc x$qksxa_reason -- Name Type -- -------------- ---------------| -- ADDR RAW(8) -- INDX NUMBER -- INST_ID NUMBER -- CON_ID NUMBER -- QKSXAREASONS VARCHAR2(255) -- +-----------------------------------| col qksxareasons format a100 col indx format 999 SELECT indx ,qksxareasons FROM x$qksxa_reason WHERE qksxareasons like '%DOP downgrade%'; INDX QKSXAREASONS ---- ----------------------------------------------------- 351 DOP downgrade due to adaptive DOP 352 DOP downgrade due to resource manager max DOP 353 DOP downgrade due to insufficient number of processes 354 DOP downgrade because slaves failed to join
We finally, have found the real root cause of the simulated performance issue: 352 DOP downgrade due to resource manager max DOP
2. Back to the aim of the article
The first questions that came to my mind when I have started troubleshooting the DOP Downgrade real-life case were:
• Can I find any Oracle Resource manager related information in the gv$sql_monitor view ?
• Can I find historical (AWR) execution statistics of any active Oracle Resource manager actions ?
2.1. v$sql_monitor
As per regards to the first question, hopefully the gv$sql_monitor view contains 4 columns which seem to be strictly related to the Oracle resource manager as shown below with column name starting by RM_:
SQL> @desc gv$sql_monitor Name Null? Type ----------------------------------------- -------- ---------------------------- INST_ID NUMBER KEY NUMBER REPORT_ID NUMBER STATUS VARCHAR2(19) USER# NUMBER USERNAME VARCHAR2(128) ../.. RM_LAST_ACTION VARCHAR2(48) RM_LAST_ACTION_REASON VARCHAR2(128) RM_LAST_ACTION_TIME DATE RM_CONSUMER_GROUP VARCHAR2(128)
Let’s see then what Oracle has stored in those four RM_ columns:
RDEV> SELECT decode(process_name, 'ora', 'QC', process_name) as process_name ,rm_consumer_group ,rm_last_action ,rm_last_action_reason ,rm_last_action_time FROM gv$sql_monitor WHERE sql_id = 'b60wdaqys7wf9'; PROCES RM_CONSUMER_GROUP RM_LAST_ACTION RM_LAST_ACTION_REASO RM_LAST_ACTION_TIME ------ ----------------- -------------- -------------------- -------------------- QC DOPDONWGRAD_RSG {null} {null} {null}
Unfortunately, it looks like only the column RM_CONSUMER_GROUP is filled up by Oracle. The remaining RM_ columns are always null.
However, it looks like that whenever the column RM_CONSUMER_GROUP contains a value which is not equal to OTHER_GROUPS (like in the current example where RM_CONSUMER_GROUP= DOPDONWGRAD_RSG), we can infer that the current monitored query
is subject to a resource manager directive which might influence its execution.
For example, let’s execute the same query when connected with a different user (RDEV2) and see what information will be stored in the gv$sql_monitor view:
RDEV2> SELECT /*+ monitor */ count(1) from t1 where owner >= 'A' COUNT(1) ---------- 1000000 SQL_ID b60wdaqys7wf9, child number 2 ------------------------------------- SELECT /*+ monitor */ count(1) from t1 where owner >= 'A' Plan hash value: 3110199320 -------------------------------------------------------------------------------- | Id | Operation | Name | Rows | TQ |IN-OUT| PQ Distrib | -------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | | | | | | 1 | SORT AGGREGATE | | 1 | | | | | 2 | PX COORDINATOR | | | | | | | 3 | PX SEND QC (RANDOM) | :TQ10000 | 1 | Q1,00 | P->S | QC (RAND) | | 4 | SORT AGGREGATE | | 1 | Q1,00 | PCWP | | | 5 | PX BLOCK ITERATOR | | 1000K| Q1,00 | PCWC | | |* 6 | TABLE ACCESS FULL| T1 | 1000K| Q1,00 | PCWP | | -------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 6 - access(:Z>=:Z AND :Z<=:Z) filter("OWNER">='A') Note ----- - Degree of Parallelism is 4 because of table property RDEV2> SELECT username ,decode(process_name, 'ora', 'QC', process_name) as process_name ,rm_consumer_group ,rm_last_action ,rm_last_action_reason ,rm_last_action_time FROM gv$sql_monitor WHERE sql_id = 'b60wdaqys7wf9' order by last_refresh_time; USERNAME PROCES RM_CONSUMER_GROUP RM_LAST_ACTION RM_LAST_ACTION_REASO RM_LAST_ACTION_TIME -------- -------- ----------------- ---------------- -------------------- -------------------- RDEV QC DOPDONWGRAD_RSG {null} {null} {null} RDEV2 QC OTHER_GROUPS {null} {null} {null} {null} p002 OTHER_GROUPS {null} {null} {null} {null} p000 OTHER_GROUPS {null} {null} {null} {null} p003 OTHER_GROUPS {null} {null} {null} {null} p001 OTHER_GROUPS {null} {null} {null}
When the same query is executed by user RDEV2 there is no DOP downgrade because this user doesn’t belong to any user created resource manager group. There are two information backing this claim:
• User RDEV2 has OTHER_GROUPS as a default consumer group
• There is 4 PX server associated with the QC of user RDEV2
3.2. AWR and Oracle Resource manager
The second investigation was to see whether I can, a posteriori, diagnose that my query underwent a resource manager directive; particularly those concerning parallel degree limit. For that reason, I wrote the following script
SQL> @getConsGrpHist SELECT begin_time ,execs ,avg_etime ,avg_px -- ,trunc(avg_etime/decode(avg_px,0,1,avg_px) ,2) avg_px_time ,plan_name ,cpu_managed ,parallel FROM (SELECT sn.begin_interval_time begin_time ,st.executions_delta execs ,st.end_of_fetch_count_delta end_of_fetch ,trunc(st.elapsed_time_delta/1e6/decode(st.executions_delta, 0, 1,st.executions_delta)) avg_etime ,round(st.px_servers_execs_delta/decode(st.executions_delta,0,1, st.executions_delta), 1) avg_px ,sr.plan_name ,sr.cpu_managed ,sr.parallel_execution_managed parallel FROM dba_hist_rsrc_plan sr ,dba_hist_snapshot sn ,dba_hist_sqlstat st WHERE sr.snap_id = sn.snap_id AND sn.snap_id = st.snap_id AND sn.begin_interval_time between to_date('&from_date', 'ddmmyyyy hh24:mi:ss') and to_date('&to_date' , 'ddmmyyyy hh24:mi:ss') AND st.sql_id ='&sql_id' AND plan_name != 'DEFAULT_MAINTENANCE_PLAN' ) order by begin_time asc; SQL> @GetConsGrpHist Enter value for from_date: 08112019 15:07:00 Enter value for to_date: 08112019 15:09:00 Enter value for sql_id: b60wdaqys7wf9 BEGIN_TIME EXECS AVG_ETIME AVG_PX PLAN_NAME CPU_MANAGED PARALLEL ------------------------- ------ ---------- ------- --------------- ----------- ---------- 08-NOV-19 03.07.08.369 PM 1 0 0 DOPDOWNGRAD_RPL ON FULL 08-NOV-19 03.07.08.369 PM 1 0 0 DOPDOWNGRAD_RPL ON FULL
Indeed, I can see that my query has used a non-default resource plan (DOPDOWNGRAD_RPL) in which a parallel statement queuing is enabled as indicated by the FULL value of the PARALLEL column.
Bottom-line
In this simple Note I outlined two main points:
• Show how to identify a DOP downgrade and its reason
• Show how to link a non-default Oracle resource manager to a SQL_ID by checking the value of the column RM_CONSUMER_GROUP from the gv$sql_monitor view
• Show that the remaining 3 resource manager columns (RM_LAST_ACTION, RM_LAST_ACTION_REASON, RM_LAST_ACTION_TIME) in the gv$sql_monitor view are, unfortunately, not maintained by Oracle
• Show how to look in AWR historical execution for queries subject to user defined Oracle Resource Manager directives