When Disk file operations I/O wait event starts showing at the top of the AWR Top 10 Foreground Events by Total Wait Time it can consume a lot of CPU and freeze your database. Since I have gone through that myself a couple of weeks ago, I thought it might be interesting to write a succinct blog article explaining how to find and fix the root cause of this non-common wait event.
When I started investigating the real-life case all that I have on my hand was the p1 and p3 parameters of this wait event:
- p1=8 corresponds to wait for miscellaneous io (ftp, block dump, passwd file)
- p3=8 corresponds to any other type of file
To say the least I have been unfortunate with this number.
As I was searching online, I came across a forum question where one of the contributors was pointing to an audit file type and operation. And from there I built the following simple model
SQL> show parameter audit NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ audit_file_dest string /u01/app/oracle/product/19c/dbhome_1/rdbms/audit audit_sys_operations boolean TRUE audit_syslog_level string audit_trail string OS unified_audit_common_systemlog string unified_audit_sga_queue_size integer 1048576 unified_audit_systemlog string
AUDIT ALL BY c##test BY ACCESS; AUDIT SELECT TABLE, UPDATE TABLE, INSERT TABLE, DELETE TABLE BY c##test BY ACCESS; AUDIT EXECUTE PROCEDURE BY c##test BY ACCESS;
create table t1 as select rownum n1, mod(rownum,2) n2, lpad('x',10) vc from dual connect by level <=1e5; create or replace procedure p1 as cursor c1 is select * from t1; begin for r1 in c1 loop update t1 set vc = 'yz' where n1 = r1.n1; insert into t1 values (r1.n1, r1.n2, r1.vc); delete from t1 where n1=r1.n1 and n2=r1.n2 and vc =r1.vc; end loop; commit; end p1;
I’ve set the audit trail parameter at OS level and I managed to audit several DML operations done by the c##test user.
All that I have to do now to reproduce the Disk file operations I/O wait event on the p1 and p3 values set to 8 is to execute the p1 procedure as shown below
SQL> exec p1();
While the session executing the p1 procedure (sid=205) was running I snapped it to see what it was doing behind the scene
SQL> @snapper all 5 1 205 Sampling SID 205 with interval 5 seconds, taking 1 snapshots... -- Session Snapper v4.30 - by Tanel Poder ( http://blog.tanelpoder.com/snapper ) - Enjoy the Most Advanced Oracle Troubleshooting Script on the Planet! ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ SID, USERNAME , TYPE, STATISTIC , DELTA, HDELTA/SEC, %TIME, GRAPH , NUM_WAITS, WAITS/SEC, AVERAGES ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ 205, C##TEST , STAT, opened cursors cumulative , 1859, 188.24, , , , , 1.01 per execution 205, C##TEST , STAT, recursive calls , 1866, 188.95, , , , , 1.28ms recursive CPU per recursive call 205, C##TEST , STAT, recursive cpu usage , 238, 24.1, , , , , 2.38s total recursive CPU usage 205, C##TEST , STAT, session logical reads , 413327, 41.85k, , , , , 414.55k total buffer visits 205, C##TEST , STAT, CPU used by this session , 241, 24.4, , , , , 2.41s total CPU used by this session 205, C##TEST , STAT, user I/O wait time , 6, .61, , , , , 0 per execution 205, C##TEST , STAT, non-idle wait time , 6, .61, , , , , 0 per execution 205, C##TEST , STAT, non-idle wait count , 1858, 188.14, , , , , 1.01 per execution 205, C##TEST , STAT, physical read total IO requests , 1, .1, , , , , 16.38k bytes per request 205, C##TEST , STAT, physical read total bytes , 16384, 1.66k, , , , , 8.94 per execution 205, C##TEST , STAT, cell physical IO interconnect bytes , 16384, 1.66k, , , , , 8.94 per execution 205, C##TEST , STAT, db block gets , 1911, 193.5, , , , , 1.04 per execution 205, C##TEST , STAT, db block gets from cache , 1911, 193.5, , , , , 1.04 per execution 205, C##TEST , STAT, db block gets from cache (fastpath) , 644, 65.21, , , , , .35 per execution 205, C##TEST , STAT, consistent gets , 412820, 41.8k, , , , , 225.34 per execution 205, C##TEST , STAT, consistent gets from cache , 412811, 41.8k, , , , , 225.33 per execution 205, C##TEST , STAT, consistent gets pin , 412811, 41.8k, , , , , 225.33 per execution 205, C##TEST , STAT, consistent gets pin (fastpath) , 412811, 41.8k, , , , , 225.33 per execution 205, C##TEST , STAT, consistent gets direct , 9, .91, , , , , 0 per execution 205, C##TEST , STAT, logical read bytes from cache , 3397402624, 344.01M, , , , , 1.85M per execution 205, C##TEST , STAT, physical reads , 2, .2, , , , , 0 per execution 205, C##TEST , STAT, physical reads direct , 2, .2, , , , , 0 per execution 205, C##TEST , STAT, physical read IO requests , 1, .1, , , , , 16.38k bytes per request 205, C##TEST , STAT, physical read bytes , 16384, 1.66k, , , , , 8.94 per execution 205, C##TEST , STAT, db block changes , 3749, 379.61, , , , , 2.05 per execution 205, C##TEST , STAT, change write time , 49, 4.96, , , , , .03 per execution 205, C##TEST , STAT, free buffer requested , 1265, 128.09, , , , , .69 per execution 205, C##TEST , STAT, dirty buffers inspected , 10, 1.01, , , , , .01 per execution 205, C##TEST , STAT, pinned buffers inspected , 1, .1, , , , , .55*10^-3 per execution 205, C##TEST , STAT, hot buffers moved to head of LRU , 65, 6.58, , , , , .04 per execution 205, C##TEST , STAT, free buffer inspected , 123, 12.45, , , , , .07 per execution 205, C##TEST , STAT, switch current to new buffer , 1240, 125.56, , , , , .68 per execution 205, C##TEST , STAT, switch current caused by our pin , 1240, 125.56, , , , , .68 per execution 205, C##TEST , STAT, shared hash latch upgrades - no wait , 726, 73.51, , , , , .4 per execution 205, C##TEST , STAT, calls to kcmgcs , 3714, 376.07, , , , , 2.03 per execution 205, C##TEST , STAT, calls to kcmgas , 4, .41, , , , , 0 per execution 205, C##TEST , STAT, calls to get snapshot scn: kcmgss , 1855, 187.83, , , , , 1.01 per execution 205, C##TEST , STAT, redo entries , 1856, 187.93, , , , , 1.01 per execution 205, C##TEST , STAT, redo size , 521376, 52.79k, , , , , ~ bytes per user commit 205, C##TEST , STAT, redo ordering marks , 4, .41, , , , , 0 per execution 205, C##TEST , STAT, file io wait time , 22467, 2.27k, , , , , 22.47ms bad guess of IO wait time per IO request 205, C##TEST , STAT, Number of read IOs issued , 1, .1, , , , , .55*10^-3 per execution 205, C##TEST , STAT, undo change vector size , 182320, 18.46k, , , , , 99.52 per execution 205, C##TEST , STAT, no work - consistent read gets , 406064, 41.12k, , , , , 221.65 per execution 205, C##TEST , STAT, table scans (long tables) , 1231, 124.65, , , , , .67 per execution 205, C##TEST , STAT, table scan rows gotten , 123963294, 12.55M, , , , , 67.67k per execution 205, C##TEST , STAT, table scan disk non-IMC rows gotten , 123898306, 12.55M, , , , , 67.63k per execution 205, C##TEST , STAT, table scan blocks gotten , 404643, 40.97k, , , , , 220.88 per execution 205, C##TEST , STAT, heap block compress , 6, .61, , , , , 0 per execution 205, C##TEST , STAT, HSC Heap Segment Block Changes , 1853, 187.63, , , , , 1.01 per execution 205, C##TEST , STAT, session cursor cache hits , 1840, 186.31, , , , , 1.84k softparses avoided thanks to cursor cache 205, C##TEST , STAT, buffer is pinned count , 1222, 123.74, , , , , .29 % buffer gets avoided thanks to buffer pin caching 205, C##TEST , STAT, buffer is not pinned count , 2446, 247.68, , , , , 1.34 per execution 205, C##TEST , STAT, execute count , 1832, 185.5, , , , , ~ executions per parse 205, C##TEST , TIME, PL/SQL execution elapsed time , 11791, 1.19ms, .1%, [ ], , , 205, C##TEST , TIME, DB CPU , 2460784, 249.17ms, 24.9%, [@@@ ], , , 205, C##TEST , TIME, sql execute elapsed time , 8059797, 816.12ms, 81.6%, [######### ], , , 205, C##TEST , TIME, DB time , 8059797, 816.12ms, 81.6%, [######### ], , , 18.39 % unaccounted time 205, C##TEST , WAIT, Disk file operations I/O , 46307, 4.69ms, .5%, [ ], 1828, 185.1, 25.33us average wait ----------> here 205, C##TEST , WAIT, ASM IO for non-blocking poll , 8, .81us, .0%, [ ], 1, .1, 8us average wait -- End of Stats snap 1, end=2020-12-20 09:02:36, seconds=9.9 ------------------------------------------------------------------------------------------ ActSes %Thread | INST | SQL_ID | SQL_CHILD | EVENT | WAIT_CLASS ------------------------------------------------------------------------------------------ .48 (48%) | 1 | fnbt9ts7bsqzn | 0 | ON CPU | ON CPU .48 (48%) | 1 | 7wv0zz7965u54 | 0 | ON CPU | ON CPU .03 (3%) | 1 | 1nf0v48kt3a30 | 0 | ON CPU | ON CPU .03 (3%) | 1 | 8myfvr1fb8gju | 0 | ON CPU | ON CPU -- End of ASH snap 1, end=2020-12-20 09:02:36, seconds=5, samples_taken=40, AAS=1
SQL> @snapper ash 5 1 205 Sampling SID 205 with interval 5 seconds, taking 1 snapshots... -- Session Snapper v4.30 - by Tanel Poder ( http://blog.tanelpoder.com/snapper ) - Enjoy the Most Advanced Oracle Troubleshooting Script on the Planet! ---------------------------------------------------------------------------------------------------- ActSes %Thread | INST | SQL_ID | SQL_CHILD | EVENT | WAIT_CLASS ---------------------------------------------------------------------------------------------------- .46 (46%) | 1 | fnbt9ts7bsqzn | 0 | ON CPU | ON CPU .46 (46%) | 1 | 7wv0zz7965u54 | 0 | ON CPU | ON CPU .03 (3%) | 1 | 1nf0v48kt3a30 | 0 | ON CPU | ON CPU .03 (3%) | 1 | fnbt9ts7bsqzn | 0 | Disk file operations I/O | User I/O ----------> here .03 (3%) | 1 | 8myfvr1fb8gju | 0 | ON CPU | ON CPU
As you can see there is a certain proportion of time spent waiting on Disk file operations I/O. And, naturally, the p1 and p3 values, in this case, are respectively 8 and 8 as the followings prove:
SQL> @ashtopSID Enter value for sid: 205 Enter value for from: 12-20 09:00:00 Enter value for to: 12-20 09:15:00 Total Seconds AAS %This SQL_ID SQL_OPNAME TOP_LEVEL_SQL EVENT2 P1TEXT P1 P3TEXT P3 --------- ------- ------- ------------- --------------- ------------- -------------------------- --------------- --- ----------- ---- 309 .3 45% | 7wv0zz7965u54 DELETE 8myfvr1fb8gju ON CPU FileOperation 8 filetype 8 306 .3 44% | fnbt9ts7bsqzn UPDATE 8myfvr1fb8gju ON CPU FileOperation 8 filetype 8 24 .0 3% | 1nf0v48kt3a30 INSERT 8myfvr1fb8gju ON CPU FileOperation 8 filetype 8 13 .0 2% | 7wv0zz7965u54 DELETE 8myfvr1fb8gju db file sequential read file# 1 blocks 1 8 .0 1% | fnbt9ts7bsqzn UPDATE 8myfvr1fb8gju db file sequential read file# 1 blocks 1 7 .0 1% | 7wv0zz7965u54 DELETE 8myfvr1fb8gju Disk file operations I/O FileOperation 8 filetype 8 ----------> here 6 .0 1% | fnbt9ts7bsqzn UPDATE 8myfvr1fb8gju Disk file operations I/O FileOperation 8 filetype 8 ----------> here 5 .0 1% | 1nf0v48kt3a30 INSERT 8myfvr1fb8gju Disk file operations I/O FileOperation 8 filetype 8 ----------> here 3 .0 0% | 7wv0zz7965u54 DELETE 8myfvr1fb8gju db file scattered read file# 1 blocks 3 3 .0 0% | 8myfvr1fb8gju ON CPU FileOperation 8 filetype 8 2 .0 0% | 0jjc60pmrntdv SELECT 8myfvr1fb8gju ON CPU FileOperation 8 filetype 8 1 .0 0% | 1nf0v48kt3a30 INSERT 8myfvr1fb8gju ON CPU file# 1 blocks 1 1 .0 0% | 1nf0v48kt3a30 INSERT 8myfvr1fb8gju db file sequential read file# 5 blocks 1 1 .0 0% | 7wv0zz7965u54 DELETE 8myfvr1fb8gju resmgr:cpu quantum location 3 0 1 .0 0% | 8myfvr1fb8gju PL/SQL EXECUTE 8myfvr1fb8gju ON CPU FileOperation 8 filetype 8
Interestingly ASH shows that the CPU consumed by Oracle is also related to the manipulation of files of type 8. I don’t know really how to correctly interpret these p1 and p3 values in the context of CPU
select decode(event,null, 'on cpu', event), p1,p3, count(1) from dba_hist_active_sess_history where sample_time between to_date('20122020 09:00:00', 'ddmmyyyy hh24:mi:ss') and to_date('20122020 09:15:00', 'ddmmyyyy hh24:mi:ss') group by decode(event,null, 'on cpu', event), p1,p3 order by 4 desc; DECODE(EVENT,NULL,'ONCPU',EVENT) P1 P3 COUNT(1) -------------------------------- ---------- ---------- ---------- on cpu 8 8 64 db file sequential read 1 1 10 on cpu 100 0 6 Disk file operations I/O 8 8 4 on cpu 1 1 4 db file scattered read 3 3 3 on cpu 0 0 3 control file sequential read 0 1 3 on cpu 3 1 2 ASM file metadata operation 0 0 2 on cpu 2490368 0 1 db file parallel write 1 2147483647 1 db file async I/O submit 106 0 1 on cpu 3 0 1 on cpu 2 1 1 on cpu 131072 0 1 on cpu 5 1 1 on cpu 65536 0 1 oracle thread bootstrap 5910576 0 1 on cpu 262144 0 1
Summary
Whenever you see a Disk file operations I/O wait event consuming a lot of time and CPU then get the p1,p2,p3 values from ASH or AWR
SQL> @sed "Disk file operations" Show wait event descriptions matching %Disk file operations%.. EVENT_NAME WAIT_CLASS PARAMETER1 PARAMETER2 PARAMETER3 ------------------------------ -------------------- -------------------- -------------------- -------------------- Disk file operations I/O User I/O FileOperation fileno filetype
If p1 and p3 values equal to 8 and 8 respectively then check if you are, inadvertently, auditing too many repetitive actions by means of the following select
SQL> select user_name, audit_option from DBA_STMT_AUDIT_OPTS where user_name='C##TEST'; USER_NAME AUDIT_OPTION ------------------------- ---------------------------------------- C##TEST ALTER SYSTEM C##TEST SYSTEM AUDIT C##TEST CREATE SESSION C##TEST TABLE C##TEST CLUSTER C##TEST TABLESPACE C##TEST USER C##TEST ROLLBACK SEGMENT C##TEST TYPE C##TEST SYNONYM C##TEST PUBLIC SYNONYM C##TEST VIEW C##TEST SEQUENCE C##TEST DATABASE LINK C##TEST PUBLIC DATABASE LINK C##TEST ROLE C##TEST DIMENSION C##TEST PROCEDURE C##TEST TRIGGER C##TEST PROFILE C##TEST DIRECTORY C##TEST MATERIALIZED VIEW C##TEST MINING MODEL C##TEST INDEX C##TEST SELECT TABLE C##TEST INSERT TABLE C##TEST UPDATE TABLE C##TEST DELETE TABLE C##TEST NOT EXISTS C##TEST SYSTEM GRANT C##TEST EXECUTE PROCEDURE C##TEST CONTEXT C##TEST SQL TRANSLATION PROFILE C##TEST LOCKDOWN PROFILE C##TEST ATTRIBUTE DIMENSION C##TEST HIERARCHY C##TEST ANALYTIC VIEW C##TEST READ DIRECTORY C##TEST WRITE DIRECTORY C##TEST EXECUTE DIRECTORY
If so then reduce the number of audited actions if you want to reduce the impact of this wait event.
In passing, I think that this wait event would not have shown up at the TOP if I have set the audit trail parameter value to DB. I left this not-demonstrated though.