When Disk file operations I/O wait event starts showing at the top of the AWR Top 10 Foreground Events by Total Wait Time it can consume a lot of CPU and freeze your database. Since I have gone through that myself a couple of weeks ago, I thought it might be interesting to write a succinct blog article explaining how to find and fix the root cause of this non-common wait event.
When I started investigating the real-life case all that I have on my hand was the p1 and p3 parameters of this wait event:
- p1=8 corresponds to wait for miscellaneous io (ftp, block dump, passwd file)
- p3=8 corresponds to any other type of file
To say the least I have been unfortunate with this number.
As I was searching online, I came across a forum question where one of the contributors was pointing to an audit file type and operation. And from there I built the following simple model
SQL> show parameter audit NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ audit_file_dest string /u01/app/oracle/product/19c/dbhome_1/rdbms/audit audit_sys_operations boolean TRUE audit_syslog_level string audit_trail string OS unified_audit_common_systemlog string unified_audit_sga_queue_size integer 1048576 unified_audit_systemlog string
AUDIT ALL BY c##test BY ACCESS; AUDIT SELECT TABLE, UPDATE TABLE, INSERT TABLE, DELETE TABLE BY c##test BY ACCESS; AUDIT EXECUTE PROCEDURE BY c##test BY ACCESS;
create table t1 as select rownum n1, mod(rownum,2) n2, lpad('x',10) vc from dual connect by level <=1e5; create or replace procedure p1 as cursor c1 is select * from t1; begin for r1 in c1 loop update t1 set vc = 'yz' where n1 = r1.n1; insert into t1 values (r1.n1, r1.n2, r1.vc); delete from t1 where n1=r1.n1 and n2=r1.n2 and vc =r1.vc; end loop; commit; end p1;
I’ve set the audit trail parameter at OS level and I managed to audit several DML operations done by the c##test user.
All that I have to do now to reproduce the Disk file operations I/O wait event on the p1 and p3 values set to 8 is to execute the p1 procedure as shown below
SQL> exec p1();
While the session executing the p1 procedure (sid=205) was running I snapped it to see what it was doing behind the scene
SQL> @snapper all 5 1 205 Sampling SID 205 with interval 5 seconds, taking 1 snapshots... -- Session Snapper v4.30 - by Tanel Poder ( http://blog.tanelpoder.com/snapper ) - Enjoy the Most Advanced Oracle Troubleshooting Script on the Planet! ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ SID, USERNAME , TYPE, STATISTIC , DELTA, HDELTA/SEC, %TIME, GRAPH , NUM_WAITS, WAITS/SEC, AVERAGES ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ 205, C##TEST , STAT, opened cursors cumulative , 1859, 188.24, , , , , 1.01 per execution 205, C##TEST , STAT, recursive calls , 1866, 188.95, , , , , 1.28ms recursive CPU per recursive call 205, C##TEST , STAT, recursive cpu usage , 238, 24.1, , , , , 2.38s total recursive CPU usage 205, C##TEST , STAT, session logical reads , 413327, 41.85k, , , , , 414.55k total buffer visits 205, C##TEST , STAT, CPU used by this session , 241, 24.4, , , , , 2.41s total CPU used by this session 205, C##TEST , STAT, user I/O wait time , 6, .61, , , , , 0 per execution 205, C##TEST , STAT, non-idle wait time , 6, .61, , , , , 0 per execution 205, C##TEST , STAT, non-idle wait count , 1858, 188.14, , , , , 1.01 per execution 205, C##TEST , STAT, physical read total IO requests , 1, .1, , , , , 16.38k bytes per request 205, C##TEST , STAT, physical read total bytes , 16384, 1.66k, , , , , 8.94 per execution 205, C##TEST , STAT, cell physical IO interconnect bytes , 16384, 1.66k, , , , , 8.94 per execution 205, C##TEST , STAT, db block gets , 1911, 193.5, , , , , 1.04 per execution 205, C##TEST , STAT, db block gets from cache , 1911, 193.5, , , , , 1.04 per execution 205, C##TEST , STAT, db block gets from cache (fastpath) , 644, 65.21, , , , , .35 per execution 205, C##TEST , STAT, consistent gets , 412820, 41.8k, , , , , 225.34 per execution 205, C##TEST , STAT, consistent gets from cache , 412811, 41.8k, , , , , 225.33 per execution 205, C##TEST , STAT, consistent gets pin , 412811, 41.8k, , , , , 225.33 per execution 205, C##TEST , STAT, consistent gets pin (fastpath) , 412811, 41.8k, , , , , 225.33 per execution 205, C##TEST , STAT, consistent gets direct , 9, .91, , , , , 0 per execution 205, C##TEST , STAT, logical read bytes from cache , 3397402624, 344.01M, , , , , 1.85M per execution 205, C##TEST , STAT, physical reads , 2, .2, , , , , 0 per execution 205, C##TEST , STAT, physical reads direct , 2, .2, , , , , 0 per execution 205, C##TEST , STAT, physical read IO requests , 1, .1, , , , , 16.38k bytes per request 205, C##TEST , STAT, physical read bytes , 16384, 1.66k, , , , , 8.94 per execution 205, C##TEST , STAT, db block changes , 3749, 379.61, , , , , 2.05 per execution 205, C##TEST , STAT, change write time , 49, 4.96, , , , , .03 per execution 205, C##TEST , STAT, free buffer requested , 1265, 128.09, , , , , .69 per execution 205, C##TEST , STAT, dirty buffers inspected , 10, 1.01, , , , , .01 per execution 205, C##TEST , STAT, pinned buffers inspected , 1, .1, , , , , .55*10^-3 per execution 205, C##TEST , STAT, hot buffers moved to head of LRU , 65, 6.58, , , , , .04 per execution 205, C##TEST , STAT, free buffer inspected , 123, 12.45, , , , , .07 per execution 205, C##TEST , STAT, switch current to new buffer , 1240, 125.56, , , , , .68 per execution 205, C##TEST , STAT, switch current caused by our pin , 1240, 125.56, , , , , .68 per execution 205, C##TEST , STAT, shared hash latch upgrades - no wait , 726, 73.51, , , , , .4 per execution 205, C##TEST , STAT, calls to kcmgcs , 3714, 376.07, , , , , 2.03 per execution 205, C##TEST , STAT, calls to kcmgas , 4, .41, , , , , 0 per execution 205, C##TEST , STAT, calls to get snapshot scn: kcmgss , 1855, 187.83, , , , , 1.01 per execution 205, C##TEST , STAT, redo entries , 1856, 187.93, , , , , 1.01 per execution 205, C##TEST , STAT, redo size , 521376, 52.79k, , , , , ~ bytes per user commit 205, C##TEST , STAT, redo ordering marks , 4, .41, , , , , 0 per execution 205, C##TEST , STAT, file io wait time , 22467, 2.27k, , , , , 22.47ms bad guess of IO wait time per IO request 205, C##TEST , STAT, Number of read IOs issued , 1, .1, , , , , .55*10^-3 per execution 205, C##TEST , STAT, undo change vector size , 182320, 18.46k, , , , , 99.52 per execution 205, C##TEST , STAT, no work - consistent read gets , 406064, 41.12k, , , , , 221.65 per execution 205, C##TEST , STAT, table scans (long tables) , 1231, 124.65, , , , , .67 per execution 205, C##TEST , STAT, table scan rows gotten , 123963294, 12.55M, , , , , 67.67k per execution 205, C##TEST , STAT, table scan disk non-IMC rows gotten , 123898306, 12.55M, , , , , 67.63k per execution 205, C##TEST , STAT, table scan blocks gotten , 404643, 40.97k, , , , , 220.88 per execution 205, C##TEST , STAT, heap block compress , 6, .61, , , , , 0 per execution 205, C##TEST , STAT, HSC Heap Segment Block Changes , 1853, 187.63, , , , , 1.01 per execution 205, C##TEST , STAT, session cursor cache hits , 1840, 186.31, , , , , 1.84k softparses avoided thanks to cursor cache 205, C##TEST , STAT, buffer is pinned count , 1222, 123.74, , , , , .29 % buffer gets avoided thanks to buffer pin caching 205, C##TEST , STAT, buffer is not pinned count , 2446, 247.68, , , , , 1.34 per execution 205, C##TEST , STAT, execute count , 1832, 185.5, , , , , ~ executions per parse 205, C##TEST , TIME, PL/SQL execution elapsed time , 11791, 1.19ms, .1%, [ ], , , 205, C##TEST , TIME, DB CPU , 2460784, 249.17ms, 24.9%, [@@@ ], , , 205, C##TEST , TIME, sql execute elapsed time , 8059797, 816.12ms, 81.6%, [######### ], , , 205, C##TEST , TIME, DB time , 8059797, 816.12ms, 81.6%, [######### ], , , 18.39 % unaccounted time 205, C##TEST , WAIT, Disk file operations I/O , 46307, 4.69ms, .5%, [ ], 1828, 185.1, 25.33us average wait ----------> here 205, C##TEST , WAIT, ASM IO for non-blocking poll , 8, .81us, .0%, [ ], 1, .1, 8us average wait -- End of Stats snap 1, end=2020-12-20 09:02:36, seconds=9.9 ------------------------------------------------------------------------------------------ ActSes %Thread | INST | SQL_ID | SQL_CHILD | EVENT | WAIT_CLASS ------------------------------------------------------------------------------------------ .48 (48%) | 1 | fnbt9ts7bsqzn | 0 | ON CPU | ON CPU .48 (48%) | 1 | 7wv0zz7965u54 | 0 | ON CPU | ON CPU .03 (3%) | 1 | 1nf0v48kt3a30 | 0 | ON CPU | ON CPU .03 (3%) | 1 | 8myfvr1fb8gju | 0 | ON CPU | ON CPU -- End of ASH snap 1, end=2020-12-20 09:02:36, seconds=5, samples_taken=40, AAS=1
SQL> @snapper ash 5 1 205 Sampling SID 205 with interval 5 seconds, taking 1 snapshots... -- Session Snapper v4.30 - by Tanel Poder ( http://blog.tanelpoder.com/snapper ) - Enjoy the Most Advanced Oracle Troubleshooting Script on the Planet! ---------------------------------------------------------------------------------------------------- ActSes %Thread | INST | SQL_ID | SQL_CHILD | EVENT | WAIT_CLASS ---------------------------------------------------------------------------------------------------- .46 (46%) | 1 | fnbt9ts7bsqzn | 0 | ON CPU | ON CPU .46 (46%) | 1 | 7wv0zz7965u54 | 0 | ON CPU | ON CPU .03 (3%) | 1 | 1nf0v48kt3a30 | 0 | ON CPU | ON CPU .03 (3%) | 1 | fnbt9ts7bsqzn | 0 | Disk file operations I/O | User I/O ----------> here .03 (3%) | 1 | 8myfvr1fb8gju | 0 | ON CPU | ON CPU
As you can see there is a certain proportion of time spent waiting on Disk file operations I/O. And, naturally, the p1 and p3 values, in this case, are respectively 8 and 8 as the followings prove:
SQL> @ashtopSID Enter value for sid: 205 Enter value for from: 12-20 09:00:00 Enter value for to: 12-20 09:15:00 Total Seconds AAS %This SQL_ID SQL_OPNAME TOP_LEVEL_SQL EVENT2 P1TEXT P1 P3TEXT P3 --------- ------- ------- ------------- --------------- ------------- -------------------------- --------------- --- ----------- ---- 309 .3 45% | 7wv0zz7965u54 DELETE 8myfvr1fb8gju ON CPU FileOperation 8 filetype 8 306 .3 44% | fnbt9ts7bsqzn UPDATE 8myfvr1fb8gju ON CPU FileOperation 8 filetype 8 24 .0 3% | 1nf0v48kt3a30 INSERT 8myfvr1fb8gju ON CPU FileOperation 8 filetype 8 13 .0 2% | 7wv0zz7965u54 DELETE 8myfvr1fb8gju db file sequential read file# 1 blocks 1 8 .0 1% | fnbt9ts7bsqzn UPDATE 8myfvr1fb8gju db file sequential read file# 1 blocks 1 7 .0 1% | 7wv0zz7965u54 DELETE 8myfvr1fb8gju Disk file operations I/O FileOperation 8 filetype 8 ----------> here 6 .0 1% | fnbt9ts7bsqzn UPDATE 8myfvr1fb8gju Disk file operations I/O FileOperation 8 filetype 8 ----------> here 5 .0 1% | 1nf0v48kt3a30 INSERT 8myfvr1fb8gju Disk file operations I/O FileOperation 8 filetype 8 ----------> here 3 .0 0% | 7wv0zz7965u54 DELETE 8myfvr1fb8gju db file scattered read file# 1 blocks 3 3 .0 0% | 8myfvr1fb8gju ON CPU FileOperation 8 filetype 8 2 .0 0% | 0jjc60pmrntdv SELECT 8myfvr1fb8gju ON CPU FileOperation 8 filetype 8 1 .0 0% | 1nf0v48kt3a30 INSERT 8myfvr1fb8gju ON CPU file# 1 blocks 1 1 .0 0% | 1nf0v48kt3a30 INSERT 8myfvr1fb8gju db file sequential read file# 5 blocks 1 1 .0 0% | 7wv0zz7965u54 DELETE 8myfvr1fb8gju resmgr:cpu quantum location 3 0 1 .0 0% | 8myfvr1fb8gju PL/SQL EXECUTE 8myfvr1fb8gju ON CPU FileOperation 8 filetype 8
Interestingly ASH shows that the CPU consumed by Oracle is also related to the manipulation of files of type 8. I don’t know really how to correctly interpret these p1 and p3 values in the context of CPU
select decode(event,null, 'on cpu', event), p1,p3, count(1) from dba_hist_active_sess_history where sample_time between to_date('20122020 09:00:00', 'ddmmyyyy hh24:mi:ss') and to_date('20122020 09:15:00', 'ddmmyyyy hh24:mi:ss') group by decode(event,null, 'on cpu', event), p1,p3 order by 4 desc; DECODE(EVENT,NULL,'ONCPU',EVENT) P1 P3 COUNT(1) -------------------------------- ---------- ---------- ---------- on cpu 8 8 64 db file sequential read 1 1 10 on cpu 100 0 6 Disk file operations I/O 8 8 4 on cpu 1 1 4 db file scattered read 3 3 3 on cpu 0 0 3 control file sequential read 0 1 3 on cpu 3 1 2 ASM file metadata operation 0 0 2 on cpu 2490368 0 1 db file parallel write 1 2147483647 1 db file async I/O submit 106 0 1 on cpu 3 0 1 on cpu 2 1 1 on cpu 131072 0 1 on cpu 5 1 1 on cpu 65536 0 1 oracle thread bootstrap 5910576 0 1 on cpu 262144 0 1
Summary
Whenever you see a Disk file operations I/O wait event consuming a lot of time and CPU then get the p1,p2,p3 values from ASH or AWR
SQL> @sed "Disk file operations" Show wait event descriptions matching %Disk file operations%.. EVENT_NAME WAIT_CLASS PARAMETER1 PARAMETER2 PARAMETER3 ------------------------------ -------------------- -------------------- -------------------- -------------------- Disk file operations I/O User I/O FileOperation fileno filetype
If p1 and p3 values equal to 8 and 8 respectively then check if you are, inadvertently, auditing too many repetitive actions by means of the following select
SQL> select user_name, audit_option from DBA_STMT_AUDIT_OPTS where user_name='C##TEST'; USER_NAME AUDIT_OPTION ------------------------- ---------------------------------------- C##TEST ALTER SYSTEM C##TEST SYSTEM AUDIT C##TEST CREATE SESSION C##TEST TABLE C##TEST CLUSTER C##TEST TABLESPACE C##TEST USER C##TEST ROLLBACK SEGMENT C##TEST TYPE C##TEST SYNONYM C##TEST PUBLIC SYNONYM C##TEST VIEW C##TEST SEQUENCE C##TEST DATABASE LINK C##TEST PUBLIC DATABASE LINK C##TEST ROLE C##TEST DIMENSION C##TEST PROCEDURE C##TEST TRIGGER C##TEST PROFILE C##TEST DIRECTORY C##TEST MATERIALIZED VIEW C##TEST MINING MODEL C##TEST INDEX C##TEST SELECT TABLE C##TEST INSERT TABLE C##TEST UPDATE TABLE C##TEST DELETE TABLE C##TEST NOT EXISTS C##TEST SYSTEM GRANT C##TEST EXECUTE PROCEDURE C##TEST CONTEXT C##TEST SQL TRANSLATION PROFILE C##TEST LOCKDOWN PROFILE C##TEST ATTRIBUTE DIMENSION C##TEST HIERARCHY C##TEST ANALYTIC VIEW C##TEST READ DIRECTORY C##TEST WRITE DIRECTORY C##TEST EXECUTE DIRECTORY
If so then reduce the number of audited actions if you want to reduce the impact of this wait event.
In passing, I think that this wait event would not have shown up at the TOP if I have set the audit trail parameter value to DB. I left this not-demonstrated though.
Hi Mohamed,
In the case of ambiguous p1 and p3 values we can avoid guesswork by tracing read and write system calls. Their first argument is the file descriptor. Then we can map the file descriptors to file names with the lsof or pfiles utility.
Best regards,
Nenad
Comment by Nenad Noveljic — December 21, 2020 @ 10:22 am |
Hi Nenad,
… or just use pSnapper by Tanel Poder (with parameter syscall,filename) – no need to trace … /proc is enough 🙂
Best Regards
Stefan
Comment by Stefan Koehler — December 21, 2020 @ 11:30 am |
Hi Stefan,
Exactly. Sampling with profiling is another option. On Solaris, it would be DTrace.
Best regards,
Nenad
Comment by Nenad Noveljic — December 21, 2020 @ 11:38 am
Hi Nenad
Thanks a lot for this precision. Always very interesting to have comments from you.
Best regards
Comment by hourim — December 21, 2020 @ 11:05 am |
Hi Mohamed,
You’re welcome. Thank you for sharing. I always enjoy reading your posts.
Comment by Nenad Noveljic — December 21, 2020 @ 12:22 pm |
p1 and p3 values in the context of CPU: meaningless, they are just not cleared up.
Comment by xkglob — December 26, 2020 @ 7:29 am |
Yes, you are right. This is in the same vein as for current_obj# which is not cleared properly
Best regards
Comment by hourim — December 26, 2020 @ 8:16 am |
Hello Houri, thank you for sharing.
About “p1” and “p3″ : they are meaningless when event=”on cpu”.
About summary/conclusion : more sure to re-test with audit disabled and compare, to exclude any other possibilities
Comment by mlaiti — December 26, 2020 @ 5:04 pm |
Hello Majda,
I did the experiment for a user without any audit and I didn’t observe any Disk file operations i/o as shown below:
Bien Cordialement
Comment by hourim — December 27, 2020 @ 10:28 am |