When Disk file operations I/O wait event starts showing at the top of the AWR Top 10 Foreground Events by Total Wait Time it can consume a lot of CPU and freeze your database. Since I have gone through that myself a couple of weeks ago, I thought it might be interesting to write a succinct blog article explaining how to find and fix the root cause of this non-common wait event.
When I started investigating the real-life case all that I have on my hand was the p1 and p3 parameters of this wait event:
- p1=8 corresponds to wait for miscellaneous io (ftp, block dump, passwd file)
- p3=8 corresponds to any other type of file
To say the least I have been unfortunate with this number.
As I was searching online, I came across a forum question where one of the contributors was pointing to an audit file type and operation. And from there I built the following simple model
SQL> show parameter audit
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
audit_file_dest string /u01/app/oracle/product/19c/dbhome_1/rdbms/audit
audit_sys_operations boolean TRUE
audit_syslog_level string
audit_trail string OS
unified_audit_common_systemlog string
unified_audit_sga_queue_size integer 1048576
unified_audit_systemlog string
AUDIT ALL BY c##test BY ACCESS;
AUDIT SELECT TABLE, UPDATE TABLE, INSERT TABLE, DELETE TABLE BY c##test BY ACCESS;
AUDIT EXECUTE PROCEDURE BY c##test BY ACCESS;
create table t1 as select rownum n1, mod(rownum,2) n2, lpad('x',10) vc from dual connect by level <=1e5;
create or replace procedure p1
as
cursor c1 is
select *
from t1;
begin
for r1 in c1
loop
update t1 set vc = 'yz' where n1 = r1.n1;
insert into t1 values (r1.n1, r1.n2, r1.vc);
delete from t1 where n1=r1.n1 and n2=r1.n2 and vc =r1.vc;
end loop;
commit;
end p1;
I’ve set the audit trail parameter at OS level and I managed to audit several DML operations done by the c##test user.
All that I have to do now to reproduce the Disk file operations I/O wait event on the p1 and p3 values set to 8 is to execute the p1 procedure as shown below
SQL> exec p1();
While the session executing the p1 procedure (sid=205) was running I snapped it to see what it was doing behind the scene
SQL> @snapper all 5 1 205
Sampling SID 205 with interval 5 seconds, taking 1 snapshots...
-- Session Snapper v4.30 - by Tanel Poder ( http://blog.tanelpoder.com/snapper )
- Enjoy the Most Advanced Oracle Troubleshooting Script on the Planet!
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SID, USERNAME , TYPE, STATISTIC , DELTA, HDELTA/SEC, %TIME, GRAPH , NUM_WAITS, WAITS/SEC, AVERAGES
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
205, C##TEST , STAT, opened cursors cumulative , 1859, 188.24, , , , , 1.01 per execution
205, C##TEST , STAT, recursive calls , 1866, 188.95, , , , , 1.28ms recursive CPU per recursive call
205, C##TEST , STAT, recursive cpu usage , 238, 24.1, , , , , 2.38s total recursive CPU usage
205, C##TEST , STAT, session logical reads , 413327, 41.85k, , , , , 414.55k total buffer visits
205, C##TEST , STAT, CPU used by this session , 241, 24.4, , , , , 2.41s total CPU used by this session
205, C##TEST , STAT, user I/O wait time , 6, .61, , , , , 0 per execution
205, C##TEST , STAT, non-idle wait time , 6, .61, , , , , 0 per execution
205, C##TEST , STAT, non-idle wait count , 1858, 188.14, , , , , 1.01 per execution
205, C##TEST , STAT, physical read total IO requests , 1, .1, , , , , 16.38k bytes per request
205, C##TEST , STAT, physical read total bytes , 16384, 1.66k, , , , , 8.94 per execution
205, C##TEST , STAT, cell physical IO interconnect bytes , 16384, 1.66k, , , , , 8.94 per execution
205, C##TEST , STAT, db block gets , 1911, 193.5, , , , , 1.04 per execution
205, C##TEST , STAT, db block gets from cache , 1911, 193.5, , , , , 1.04 per execution
205, C##TEST , STAT, db block gets from cache (fastpath) , 644, 65.21, , , , , .35 per execution
205, C##TEST , STAT, consistent gets , 412820, 41.8k, , , , , 225.34 per execution
205, C##TEST , STAT, consistent gets from cache , 412811, 41.8k, , , , , 225.33 per execution
205, C##TEST , STAT, consistent gets pin , 412811, 41.8k, , , , , 225.33 per execution
205, C##TEST , STAT, consistent gets pin (fastpath) , 412811, 41.8k, , , , , 225.33 per execution
205, C##TEST , STAT, consistent gets direct , 9, .91, , , , , 0 per execution
205, C##TEST , STAT, logical read bytes from cache , 3397402624, 344.01M, , , , , 1.85M per execution
205, C##TEST , STAT, physical reads , 2, .2, , , , , 0 per execution
205, C##TEST , STAT, physical reads direct , 2, .2, , , , , 0 per execution
205, C##TEST , STAT, physical read IO requests , 1, .1, , , , , 16.38k bytes per request
205, C##TEST , STAT, physical read bytes , 16384, 1.66k, , , , , 8.94 per execution
205, C##TEST , STAT, db block changes , 3749, 379.61, , , , , 2.05 per execution
205, C##TEST , STAT, change write time , 49, 4.96, , , , , .03 per execution
205, C##TEST , STAT, free buffer requested , 1265, 128.09, , , , , .69 per execution
205, C##TEST , STAT, dirty buffers inspected , 10, 1.01, , , , , .01 per execution
205, C##TEST , STAT, pinned buffers inspected , 1, .1, , , , , .55*10^-3 per execution
205, C##TEST , STAT, hot buffers moved to head of LRU , 65, 6.58, , , , , .04 per execution
205, C##TEST , STAT, free buffer inspected , 123, 12.45, , , , , .07 per execution
205, C##TEST , STAT, switch current to new buffer , 1240, 125.56, , , , , .68 per execution
205, C##TEST , STAT, switch current caused by our pin , 1240, 125.56, , , , , .68 per execution
205, C##TEST , STAT, shared hash latch upgrades - no wait , 726, 73.51, , , , , .4 per execution
205, C##TEST , STAT, calls to kcmgcs , 3714, 376.07, , , , , 2.03 per execution
205, C##TEST , STAT, calls to kcmgas , 4, .41, , , , , 0 per execution
205, C##TEST , STAT, calls to get snapshot scn: kcmgss , 1855, 187.83, , , , , 1.01 per execution
205, C##TEST , STAT, redo entries , 1856, 187.93, , , , , 1.01 per execution
205, C##TEST , STAT, redo size , 521376, 52.79k, , , , , ~ bytes per user commit
205, C##TEST , STAT, redo ordering marks , 4, .41, , , , , 0 per execution
205, C##TEST , STAT, file io wait time , 22467, 2.27k, , , , , 22.47ms bad guess of IO wait time per IO request
205, C##TEST , STAT, Number of read IOs issued , 1, .1, , , , , .55*10^-3 per execution
205, C##TEST , STAT, undo change vector size , 182320, 18.46k, , , , , 99.52 per execution
205, C##TEST , STAT, no work - consistent read gets , 406064, 41.12k, , , , , 221.65 per execution
205, C##TEST , STAT, table scans (long tables) , 1231, 124.65, , , , , .67 per execution
205, C##TEST , STAT, table scan rows gotten , 123963294, 12.55M, , , , , 67.67k per execution
205, C##TEST , STAT, table scan disk non-IMC rows gotten , 123898306, 12.55M, , , , , 67.63k per execution
205, C##TEST , STAT, table scan blocks gotten , 404643, 40.97k, , , , , 220.88 per execution
205, C##TEST , STAT, heap block compress , 6, .61, , , , , 0 per execution
205, C##TEST , STAT, HSC Heap Segment Block Changes , 1853, 187.63, , , , , 1.01 per execution
205, C##TEST , STAT, session cursor cache hits , 1840, 186.31, , , , , 1.84k softparses avoided thanks to cursor cache
205, C##TEST , STAT, buffer is pinned count , 1222, 123.74, , , , , .29 % buffer gets avoided thanks to buffer pin caching
205, C##TEST , STAT, buffer is not pinned count , 2446, 247.68, , , , , 1.34 per execution
205, C##TEST , STAT, execute count , 1832, 185.5, , , , , ~ executions per parse
205, C##TEST , TIME, PL/SQL execution elapsed time , 11791, 1.19ms, .1%, [ ], , ,
205, C##TEST , TIME, DB CPU , 2460784, 249.17ms, 24.9%, [@@@ ], , ,
205, C##TEST , TIME, sql execute elapsed time , 8059797, 816.12ms, 81.6%, [######### ], , ,
205, C##TEST , TIME, DB time , 8059797, 816.12ms, 81.6%, [######### ], , , 18.39 % unaccounted time
205, C##TEST , WAIT, Disk file operations I/O , 46307, 4.69ms, .5%, [ ], 1828, 185.1, 25.33us average wait ----------> here
205, C##TEST , WAIT, ASM IO for non-blocking poll , 8, .81us, .0%, [ ], 1, .1, 8us average wait
-- End of Stats snap 1, end=2020-12-20 09:02:36, seconds=9.9
------------------------------------------------------------------------------------------
ActSes %Thread | INST | SQL_ID | SQL_CHILD | EVENT | WAIT_CLASS
------------------------------------------------------------------------------------------
.48 (48%) | 1 | fnbt9ts7bsqzn | 0 | ON CPU | ON CPU
.48 (48%) | 1 | 7wv0zz7965u54 | 0 | ON CPU | ON CPU
.03 (3%) | 1 | 1nf0v48kt3a30 | 0 | ON CPU | ON CPU
.03 (3%) | 1 | 8myfvr1fb8gju | 0 | ON CPU | ON CPU
-- End of ASH snap 1, end=2020-12-20 09:02:36, seconds=5, samples_taken=40, AAS=1
SQL> @snapper ash 5 1 205
Sampling SID 205 with interval 5 seconds, taking 1 snapshots...
-- Session Snapper v4.30 - by Tanel Poder ( http://blog.tanelpoder.com/snapper )
- Enjoy the Most Advanced Oracle Troubleshooting Script on the Planet!
----------------------------------------------------------------------------------------------------
ActSes %Thread | INST | SQL_ID | SQL_CHILD | EVENT | WAIT_CLASS
----------------------------------------------------------------------------------------------------
.46 (46%) | 1 | fnbt9ts7bsqzn | 0 | ON CPU | ON CPU
.46 (46%) | 1 | 7wv0zz7965u54 | 0 | ON CPU | ON CPU
.03 (3%) | 1 | 1nf0v48kt3a30 | 0 | ON CPU | ON CPU
.03 (3%) | 1 | fnbt9ts7bsqzn | 0 | Disk file operations I/O | User I/O ----------> here
.03 (3%) | 1 | 8myfvr1fb8gju | 0 | ON CPU | ON CPU
As you can see there is a certain proportion of time spent waiting on Disk file operations I/O. And, naturally, the p1 and p3 values, in this case, are respectively 8 and 8 as the followings prove:
SQL> @ashtopSID
Enter value for sid: 205
Enter value for from: 12-20 09:00:00
Enter value for to: 12-20 09:15:00
Total
Seconds AAS %This SQL_ID SQL_OPNAME TOP_LEVEL_SQL EVENT2 P1TEXT P1 P3TEXT P3
--------- ------- ------- ------------- --------------- ------------- -------------------------- --------------- --- ----------- ----
309 .3 45% | 7wv0zz7965u54 DELETE 8myfvr1fb8gju ON CPU FileOperation 8 filetype 8
306 .3 44% | fnbt9ts7bsqzn UPDATE 8myfvr1fb8gju ON CPU FileOperation 8 filetype 8
24 .0 3% | 1nf0v48kt3a30 INSERT 8myfvr1fb8gju ON CPU FileOperation 8 filetype 8
13 .0 2% | 7wv0zz7965u54 DELETE 8myfvr1fb8gju db file sequential read file# 1 blocks 1
8 .0 1% | fnbt9ts7bsqzn UPDATE 8myfvr1fb8gju db file sequential read file# 1 blocks 1
7 .0 1% | 7wv0zz7965u54 DELETE 8myfvr1fb8gju Disk file operations I/O FileOperation 8 filetype 8 ----------> here
6 .0 1% | fnbt9ts7bsqzn UPDATE 8myfvr1fb8gju Disk file operations I/O FileOperation 8 filetype 8 ----------> here
5 .0 1% | 1nf0v48kt3a30 INSERT 8myfvr1fb8gju Disk file operations I/O FileOperation 8 filetype 8 ----------> here
3 .0 0% | 7wv0zz7965u54 DELETE 8myfvr1fb8gju db file scattered read file# 1 blocks 3
3 .0 0% | 8myfvr1fb8gju ON CPU FileOperation 8 filetype 8
2 .0 0% | 0jjc60pmrntdv SELECT 8myfvr1fb8gju ON CPU FileOperation 8 filetype 8
1 .0 0% | 1nf0v48kt3a30 INSERT 8myfvr1fb8gju ON CPU file# 1 blocks 1
1 .0 0% | 1nf0v48kt3a30 INSERT 8myfvr1fb8gju db file sequential read file# 5 blocks 1
1 .0 0% | 7wv0zz7965u54 DELETE 8myfvr1fb8gju resmgr:cpu quantum location 3 0
1 .0 0% | 8myfvr1fb8gju PL/SQL EXECUTE 8myfvr1fb8gju ON CPU FileOperation 8 filetype 8
Interestingly ASH shows that the CPU consumed by Oracle is also related to the manipulation of files of type 8. I don’t know really how to correctly interpret these p1 and p3 values in the context of CPU
select decode(event,null, 'on cpu', event), p1,p3, count(1)
from dba_hist_active_sess_history
where sample_time between to_date('20122020 09:00:00', 'ddmmyyyy hh24:mi:ss')
and to_date('20122020 09:15:00', 'ddmmyyyy hh24:mi:ss')
group by decode(event,null, 'on cpu', event), p1,p3
order by 4 desc;
DECODE(EVENT,NULL,'ONCPU',EVENT) P1 P3 COUNT(1)
-------------------------------- ---------- ---------- ----------
on cpu 8 8 64
db file sequential read 1 1 10
on cpu 100 0 6
Disk file operations I/O 8 8 4
on cpu 1 1 4
db file scattered read 3 3 3
on cpu 0 0 3
control file sequential read 0 1 3
on cpu 3 1 2
ASM file metadata operation 0 0 2
on cpu 2490368 0 1
db file parallel write 1 2147483647 1
db file async I/O submit 106 0 1
on cpu 3 0 1
on cpu 2 1 1
on cpu 131072 0 1
on cpu 5 1 1
on cpu 65536 0 1
oracle thread bootstrap 5910576 0 1
on cpu 262144 0 1
Summary
Whenever you see a Disk file operations I/O wait event consuming a lot of time and CPU then get the p1,p2,p3 values from ASH or AWR
SQL> @sed "Disk file operations"
Show wait event descriptions matching %Disk file operations%..
EVENT_NAME WAIT_CLASS PARAMETER1 PARAMETER2 PARAMETER3
------------------------------ -------------------- -------------------- -------------------- --------------------
Disk file operations I/O User I/O FileOperation fileno filetype
If p1 and p3 values equal to 8 and 8 respectively then check if you are, inadvertently, auditing too many repetitive actions by means of the following select
SQL> select user_name, audit_option from DBA_STMT_AUDIT_OPTS where user_name='C##TEST';
USER_NAME AUDIT_OPTION
------------------------- ----------------------------------------
C##TEST ALTER SYSTEM
C##TEST SYSTEM AUDIT
C##TEST CREATE SESSION
C##TEST TABLE
C##TEST CLUSTER
C##TEST TABLESPACE
C##TEST USER
C##TEST ROLLBACK SEGMENT
C##TEST TYPE
C##TEST SYNONYM
C##TEST PUBLIC SYNONYM
C##TEST VIEW
C##TEST SEQUENCE
C##TEST DATABASE LINK
C##TEST PUBLIC DATABASE LINK
C##TEST ROLE
C##TEST DIMENSION
C##TEST PROCEDURE
C##TEST TRIGGER
C##TEST PROFILE
C##TEST DIRECTORY
C##TEST MATERIALIZED VIEW
C##TEST MINING MODEL
C##TEST INDEX
C##TEST SELECT TABLE
C##TEST INSERT TABLE
C##TEST UPDATE TABLE
C##TEST DELETE TABLE
C##TEST NOT EXISTS
C##TEST SYSTEM GRANT
C##TEST EXECUTE PROCEDURE
C##TEST CONTEXT
C##TEST SQL TRANSLATION PROFILE
C##TEST LOCKDOWN PROFILE
C##TEST ATTRIBUTE DIMENSION
C##TEST HIERARCHY
C##TEST ANALYTIC VIEW
C##TEST READ DIRECTORY
C##TEST WRITE DIRECTORY
C##TEST EXECUTE DIRECTORY
If so then reduce the number of audited actions if you want to reduce the impact of this wait event.
In passing, I think that this wait event would not have shown up at the TOP if I have set the audit trail parameter value to DB. I left this not-demonstrated though.