Mohamed Houri’s Oracle Notes

July 9, 2015

Stressed ASH

Filed under: Oracle — hourim @ 5:29 pm

It is well known that any record found in dba_hist_active_session_history has inevitably been routed there from v$active_session_history. If so, then how could we interpret the following cut & past from a running production system?

ASH first

SQL> select event, count(1)
    from gv$active_session_history
    where sample_time between to_date('06072015 18:30:00', 'ddmmyyyy hh24:mi:ss')
                      and     to_date('06072015 19:30:00', 'ddmmyyyy hh24:mi:ss')
    group by event
    order by 2 desc;

EVENT                                                              COUNT(1)
---------------------------------------------------------------- ----------
                                                                        372
direct path read                                                        185
log file parallel write                                                  94
Disk file Mirror Read                                                    22
control file sequential read                                             20
control file parallel write                                              18
direct path write temp                                                   16
Streams AQ: qmn coordinator waiting for slave to start                   12
db file parallel read                                                    11
gc cr multi block request                                                 6
enq: KO - fast object checkpoint                                          4
db file sequential read                                                   3
ges inquiry response                                                      3
os thread startup                                                         2
PX Deq: Signal ACK RSG                                                    2
enq: CF - contention                                                      1
PX Deq: Slave Session Stats                                               1
Disk file operations I/O                                                  1
IPC send completion sync                                                  1
reliable message                                                          1
null event                                                                1
enq: CO - master slave det                                                1
db file parallel write                                                    1
gc current block 2-way                                                    1

AWR next

SQL> select event, count(1)
    from dba_hist_active_sess_history
    where sample_time between to_date('06072015 18:30:00', 'ddmmyyyy hh24:mi:ss')
                      and     to_date('06072015 19:30:00', 'ddmmyyyy hh24:mi:ss')
    group by event
    order by 2 desc;

EVENT                                                              COUNT(1)
---------------------------------------------------------------- ----------
SQL*Net break/reset to client                                         12950
enq: TM - contention                                                  12712
                                                                        624
db file sequential read                                                 386
enq: TX - row lock contention                                           259
SQL*Net message from dblink                                              74
direct path read                                                         62
SQL*Net more data from dblink                                            27
log file parallel write                                                  26
log file sync                                                            15
SQL*Net more data from client                                             9
control file sequential read                                              7
Disk file Mirror Read                                                     6
gc cr grant 2-way                                                         5
db file parallel write                                                    4
read by other session                                                     3
control file parallel write                                               3
Streams AQ: qmn coordinator waiting for slave to start                    3
log file sequential read                                                  2
direct path read temp                                                     2
enq: KO - fast object checkpoint                                          2
gc cr multi block request                                                 1
CSS initialization                                                        1
gc current block 2-way                                                    1
reliable message                                                          1
db file parallel read                                                     1
gc buffer busy acquire                                                    1
ges inquiry response                                                      1
direct path write temp                                                    1
rdbms ipc message                                                         1
os thread startup                                                         1

12,950 snapshots of SQL*Net break/reset to client and 12,712 snapshots of an enq: TM – contention wait events in AWR not found in ASH. How can we interpret this situation?

This 11.2.0.4.0 database is implemented under a RAC infrastructure with 2 instances. Let’s look at the ASH of the two instances separately

Instance 1 first

SQL> select event, count(1)
    from v$active_session_history
    where sample_time between to_date('06072015 18:30:00', 'ddmmyyyy hh24:mi:ss')
                      and     to_date('06072015 19:30:00', 'ddmmyyyy hh24:mi:ss')
    group by event
    order by 2 desc;

 no rows selected

Instance 2 next

SQL> select event, count(1)
    from v$active_session_history
    where sample_time between to_date('06072015 18:30:00', 'ddmmyyyy hh24:mi:ss')
                      and     to_date('06072015 19:30:00', 'ddmmyyyy hh24:mi:ss')
    group by event
    order by 2 desc;

EVENT                                                              COUNT(1)
---------------------------------------------------------------- ----------
                                                                        372
direct path read                                                        185
log file parallel write                                                  94
Disk file Mirror Read                                                    22
control file sequential read                                             20
control file parallel write                                              18
direct path write temp                                                   16
Streams AQ: qmn coordinator waiting for slave to start                   12
db file parallel read                                                    11
gc cr multi block request                                                 6
enq: KO - fast object checkpoint                                          4
db file sequential read                                                   3
ges inquiry response                                                      3
os thread startup                                                         2
PX Deq: Signal ACK RSG                                                    2
enq: CF - contention                                                      1
PX Deq: Slave Session Stats                                               1
Disk file operations I/O                                                  1
IPC send completion sync                                                  1
reliable message                                                          1
null event                                                                1
enq: CO - master slave det                                                1
db file parallel write                                                    1
gc current block 2-way                                                    1

All what is sampled in ASH in that specific time interval is coming from the second instance while the first instance doesn’t report any record for the corresponding time interval. This inevitably questions either the ash size of instance one or an imbalanced workload between the two instances:

ASH size first

SQL> select
  2        inst_id
  3        ,total_size
  4      from gv$ash_info;

   INST_ID TOTAL_SIZE
---------- ----------
         1  100663296
         2  100663296

ASH Activity next

SQL> select
        inst_id
       ,total_size
       ,awr_flush_emergency_count
     from gv$ash_info;

   INST_ID TOTAL_SIZE AWR_FLUSH_EMERGENCY_COUNT
---------- ---------- -------------------------
         1  100663296                       136
         2  100663296                         0

Typically the activity is mainly oriented towards instance 1 and the abnormal and unusual 12,712 SQL*Net break/reset to client wait events have exacerbated the rate of insert into ASH buffers of instance one generating the 136 awr_flush_emergency_count and, as such, the discrepancies between ASH and AWR.

This is also confirmed by the difference in the ASH retention period between the two instances

Instance 1 first where only 3 hours of ASH data are kept

SQL> select min(sample_time), max(sample_time)
  2  from v$active_session_history;

MIN(SAMPLE_TIME)                         MAX(SAMPLE_TIME)
---------------------------------------  -------------------------
08-JUL-15 05.51.20.502 AM                08-JUL-15 08.35.48.233 AM

Instance 2 next where several days worth of ASH data are still present

SQL> select min(sample_time), max(sample_time)
  2  from v$active_session_history;

MIN(SAMPLE_TIME)                         MAX(SAMPLE_TIME)
---------------------------------------  -------------------------
25-JUN-15 20.01.43                       08-JUL-15 08.37.17.233 AM

The solution would be one of the following points (I think in the order of priority):

  • Solve this SQL*Net break/reset to client issue which is dramatically filling up the ash buffer causing unexpected rapid flush of important and more precise data
  • Balance the work load activity between the two instances
  • Increase the ash size of the instance 1 by means of alter system set “_ash_size”=25165824;

In the next article I will explain how I have identified what is causing this unusual SQL*Net break/reset to client wait events.

4 Comments »

  1. Yes, I’ve noticed that unbalanced workload / activity results in differences for ASH stats in the AWR repository between database instances. I prefer to pull from v$ASH if possible and do a UNION between V$ASH and DBA_HIST_ASH for a “mixed” view to show older data with current data.

    Comment by Hemant K Chitale — July 10, 2015 @ 3:05 am | Reply

  2. Herman,

    And I have another issue I still have not got enough time to investigate it. Notice the following select

    Mohamed HOURI [15:31]: 
    SQL> select event, count(1)
    from gv$active_session_history
    where sample_time between to_date('26062015 15:15:00', 'ddmmyyyy hh24:mi:ss')
                       and    to_date('26062015 15:30:00', 'ddmmyyyy hh24:mi:ss')
    group by event
    order by 2 desc;
    
    no rows selected
    

    There are no rows in ASH at 15:31.

    I kept issuing this select several times say every 5 min or so and there were no rows selected

    Until at around 16:18 where when I have executed again the same query and this time it give this:

    Mohamed HOURI [16:18]: 
    
    SQL> select event, count(1)
    from gv$active_session_history
    where sample_time between to_date('26062015 15:15:00', 'ddmmyyyy hh24:mi:ss')
                       and    to_date('26062015 15:30:00', 'ddmmyyyy hh24:mi:ss')
    group by event
    order by 2 desc;
    
    EVENT                                       COUNT(1)
    ------------------------------------------- -------
    											797
    db file sequential read   					114
    log file parallel write      				12
    log file sync       						11
    LNS wait on SENDREQ   						5
    db file parallel read       				2
    enq: TX - index contention          		2
    db file async I/O submit  					1
    control file sequential read         		1
    os thread startup           				1
    SQL*Net break/reset to client      			1
    

    I can’t give you the exact laps of time between 15:31 and 16:18 in which ASH was not presenting anything but for sure it took several minutes without showing any rows.
    I have already raised this question in oracl-list because while ASH was not showing anything Tanel Poder Snapper was instead presenting several wait events. But when I came back and issued the same query again in order to verify one thing ASH was showing data. I was confused but now I know that there is an issue I need to figure out. May be MMON was busy doing something else and couldn’t insert data into ASH very quickly? This is what I want to figure out when the issue will come back.

    Best Regards

    Comment by hourim — July 10, 2015 @ 7:01 am | Reply

  3. AWR_FLUSH_EMERGENCY occurences are logged in the ALERT log, with a message such as this one:
    “Active Session History (ASH) performed an emergency flush. This may mean that ASH is undersized. If emergency flushes are a recurring issue, you may consider increasing ASH size by setting the value of _ASH_SIZE to a sufficiently large value. Currently, ASH size is 8388608
    bytes. Both ASH size and the total number of emergency flushes since instance startup can be monitored by running the following query:
    select total_size,awr_flush_emergency_count from v$ash_info;”
    So when info seems to be missing from v$active_session_history, it’s good idea to check your alert log.

    Comment by Frank (@fdernoncourt) — July 20, 2015 @ 11:38 am | Reply

  4. Frank

    Yes checking alert log (and trace file) about MMON process (responsible of filling ASH) is a good starting point when troubleshooting ASH issues. I have created an otn thread which I prefere linking it here for me when coming back to this issue🙂

    https://community.oracle.com/thread/3768134?sr=inbox

    Best regards
    Mohamed

    Comment by hourim — July 20, 2015 @ 12:00 pm | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Create a free website or blog at WordPress.com.

Tony's Oracle Tips

Tony Hasler's light hearted approach to learning about Oracle

Richard Foote's Oracle Blog

Focusing Specifically On Oracle Indexes, Database Administration and Some Great Music

Hatem Mahmoud Oracle's blog

Just another Oracle blog : Database topics and techniques

Mohamed Houri’s Oracle Notes

Qui se conçoit bien s’énonce clairement

Oracle Diagnostician

Performance troubleshooting as exact science

Raheel's Blog

Things I have learnt as Oracle DBA

Coskan's Approach to Oracle

What I learned about Oracle

So Many Oracle Manuals, So Little Time

“Books to the ceiling, Books to the sky, My pile of books is a mile high. How I love them! How I need them! I'll have a long beard by the time I read them”—Lobel, Arnold. Whiskers and Rhymes. William Morrow & Co, 1988.

EU Careers info

Your career in the European Union

Carlos Sierra's Tools and Tips

Tools and Tips for Oracle Performance and SQL Tuning

Oracle Scratchpad

Just another Oracle weblog

OraStory

Dominic Brooks on Oracle Performance, Tuning, Data Quality & Sensible Design ... (Now with added Sets Appeal)

%d bloggers like this: