Here is a simple but very practical example showing how to identify the root cause of a sudden, intermittent lock that severely disrupts the proper functioning of a database:
SQL> @dashtopmho
@dashtop2 instance_number,session_id,sql_id,event2,blocking_inst_id,blocking_session,blocking_session_status 1=1 "timestamp'2020-&from'" "timestamp'2020-&to'"
Enter value for from: 11-19 06:00:00
Enter value for to: 11-19 09:00:00
Total
Seconds %This SESSION_ID SQL_ID EVENT2 BLOCKING_INST_ID BLOCKING_SESSION BLOCKING_SE
--------- ----------------- ------------- ------------------------------ ---------------- ---------------- -----------
860 8% 1368 bnygjh7hdf9v9 enq: TM - contention [mode=3] 1 1363 VALID
840 8% 976 bnygjh7hdf9v9 enq: TM - contention [mode=3] 1 1363 VALID
610 6% 574 ON CPU NOT IN WAIT
570 5% 1363 1suthd9j77v21 enq: TM - contention [mode=5] 1 1396 VALID
430 4% 1363 1suthd9j77v21 enq: TM - contention [mode=5] 1 802 VALID
410 4% 1170 bnygjh7hdf9v9 enq: TM - contention [mode=3] 1 1363 VALID
370 3% 1171 bnygjh7hdf9v9 enq: TM - contention [mode=3] 1 1363 VALID
350 3% 244 bd9anzv38cdsh enq: TM - contention [mode=3] 1 1363 VALID
340 3% 426 fgtt9x4dd6ahn enq: TM - contention [mode=3] 1 1363 VALID
130 1% 1 ON CPU NOT IN WAIT
120 1% 766 ON CPU NOT IN WAIT
110 1% 1205 8msv09nr0rcx0 ON CPU NOT IN WAIT
100 1% 195 2b064ybzkwf1y ON CPU NOT IN WAIT
100 1% 1170 8ju64nrz3r1b9 ON CPU NOT IN WAIT
100 1% 1205 8ayncy95bk88w ON CPU NOT IN WAIT
100 1% 1205 8ayncy95bk88w direct path read UNKNOWN
100 1% 1205 ay8kmp9188rvw ON CPU NOT IN WAIT
90 1% 786 8msv09nr0rcx0 ON CPU NOT IN WAIT
80 1% 226 bd9anzv38cdsh enq: TM - contention [mode=3] 1 1363 VALID
80 1% 408 8msv09nr0rcx0 ON CPU NOT IN WAIT
SQL> @dashtopmho
Enter value for from: 11-18 19:10:00
Enter value for to: 11-18 19:30:00
Total
Seconds %This SESSION_ID SQL_ID EVENT2 BLOCKING_INST_ID BLOCKING_SESSION BLOCKING_SE
--------- ------- ---------- ------------- ------------------------------ ---------------- ---------------- -----------
80 6% 786 8ayncy95bk88w direct path read UNKNOWN
80 6% 1368 bnygjh7hdf9v9 enq: TM - contention [mode=3] 1 1374 VALID
60 4% 1004 ay8kmp9188rvw direct path read UNKNOWN
60 4% 1374 19fan32ayuy6w enq: TM - contention [mode=5] 1 792 VALID
50 3% 574 ON CPU NOT IN WAIT
40 3% 213 2qpjahhyzy7mm direct path read UNKNOWN
40 3% 792 254vjyr1z0svd direct path read UNKNOWN
40 3% 1004 8ayncy95bk88w direct path read UNKNOWN
30 2% 1 ON CPU NOT IN WAIT
30 2% 408 ay8kmp9188rvw direct path read UNKNOWN
30 2% 599 ay8kmp9188rvw direct path read UNKNOWN
20 1% 213 2qpjahhyzy7mm ON CPU NOT IN WAIT
20 1% 213 ab5v870f5xa82 ON CPU NOT IN WAIT
20 1% 408 8msv09nr0rcx0 direct path read UNKNOWN
20 1% 599 ay8kmp9188rvw ON CPU NOT IN WAIT
20 1% 617 8msv09nr0rcx0 direct path read UNKNOWN
20 1% 792 6p188706cjbdc direct path read UNKNOWN
20 1% 1004 8msv09nr0rcx0 ON CPU NOT IN WAIT
20 1% 1170 fvvtz709a9pb4 direct path read UNKNOWN
20 1% 1374 19fan32ayuy6w enq: TM - contention [mode=5] 1 1170 VALID
Above is the output of Tanel Poder's dashtop script, run over two periods during which the application was suffering a clear slowdown.
As you can see, dashtop not only shows the lock type the above sessions were waiting on, enq: TM - contention, but also the mode in which those enqueues were requested: [mode=3] and [mode=5]. Had the script not already done this work for us, we would have had to decode the name and the mode from the value of the P1 parameter (name|mode) associated with this enqueue wait event:
SQL> @sed "enq: TM - contention"
Show wait event descriptions matching %enq: TM - contention%..
EVENT# EVENT_NAME WAIT_CLASS PARAMETER1 PARAMETER2 PARAMETER3 ENQUEUE_NAME
------ --------------------- ------------ ----------- ----------- ---------------- -------------
301 enq: TM - contention Application name|mode object # table/partition DML
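For reference, here is a rough sketch of what that manual decoding looks like. This is an assumption-laden illustration, not part of the original diagnosis: it uses the classic bit-masking idiom on P1 (the enqueue name sits in the top two bytes, the mode in the low-order bits) and maps P2 (object #) to the locked segment, for sessions currently waiting on the TM enqueue:

```sql
-- Hypothetical sketch: decode the enqueue name and requested mode from P1
-- by hand, and resolve P2 (object #) to the locked table.
select s.sid,
       chr(bitand(s.p1, -16777216)/16777215) ||
       chr(bitand(s.p1,  16711680)/65535)     as enq_name,  -- e.g. TM
       bitand(s.p1, 65535)                    as enq_mode,  -- e.g. 3 or 5
       o.owner,
       o.object_name
from   v$session   s
join   dba_objects o on o.object_id = s.p2
where  s.event = 'enq: TM - contention';
```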
So let me start by pointing out that this decoding is already a welcome simplification built into the ashtop/dashtop scripts, making the diagnosis easier and faster.
Now that the diagnosis is well established, we can move on to the interpretation stage before concluding with the appropriate remedy for this situation. Two pieces of information stand out:
- The top session wait event shown by dashtop is a table lock contention: enq: TM - contention
- Some of these table locks are held in mode 3 while others are requested in mode 5
Jonathan Lewis wrote a very nice summary of the different Oracle lock modes:
- mode 3: Sub-exclusive (SX) / Row-exclusive (RX)
- Update (also "select for update" from 9.2.0.5/6)
- Lock table in row exclusive mode
- Since 11.1 used at the opposite end of RI during DML
- mode 5: Share sub-exclusive (SSX) / Share row-exclusive (SRX)
- Lock table in share row exclusive mode
- Less common symptom of "foreign key locking" but likely to be more frequent if the FK constraint is defined with "on delete cascade"
To simplify: when you see several concurrent sessions waiting on an enq: TM - contention held in mode 3 and/or requested in mode 5, this can almost always be traced back to an unindexed foreign key in an existing parent-child relationship.
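To make that mechanism concrete, here is a minimal two-session repro sketch. The table names (PARENT, CHILD) and the "on delete cascade" clause are my assumptions for illustration; they are not part of the case described above:

```sql
-- Hypothetical repro of TM mode 3 vs mode 5 contention
create table parent (id number primary key);
create table child  (id   number,
                     p_id number references parent on delete cascade);
-- Deliberately NO index on child.p_id

insert into parent values (1);
insert into parent values (2);
commit;

-- Session 1: open a transaction touching CHILD
-- (takes a TM mode-3 lock on CHILD and keeps it until commit/rollback)
insert into child values (10, 1);

-- Session 2: the cascading delete needs a TM mode-5 lock on CHILD
-- because child.p_id is not indexed; mode 5 is incompatible with the
-- mode 3 held by session 1, so this statement hangs:
delete from parent where id = 2;   -- waits on enq: TM - contention
```

Creating an index on child.p_id removes the need for the table-level share lock, which is exactly the role of the FK indexes discussed next.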
Since this lock situation appeared immediately after a maintenance operation on the database, my first reflex was to check the status of all the application's indexes. Indeed, a lock storm right after an intervention on the database very often means that some indexes have been left in an UNUSABLE state.
SQL> select table_name, index_name,status
from all_indexes
where owner='CTEST'
and status ='UNUSABLE';
TABLE_NAME INDEX_NAME STATUS
--------------- ------------------ ---------
ABC_M ABC_M_I_FK_ME_IDX1 UNUSABLE
ABC_M ABC_M_I_FK_ME_IDX2 UNUSABLE
ABC_M ABC_M_I_FK_ME_IDX3 UNUSABLE
.../...
DEF_M DEF_M_I_FK_DEF_IDX1 UNUSABLE
DEF_M DEF_M_I_FK_DEF_IDX2 UNUSABLE
DEF_M DEF_M_I_FK_DEF_IDX3 UNUSABLE
../..
GHI_M GHI_M_I_FK_SIT_IDX1 UNUSABLE
GHI_M GHI_M_I_FK_SIT_IDX2 UNUSABLE
GHI_M GHI_M_I_FK_SIT_IDX3 UNUSABLE
GHI_M GHI_M_I_FK_SIT_IDX4 UNUSABLE
Sure enough, not only are there a bunch of UNUSABLE indexes, but all of them carry the FK suffix in their names. This clearly indicates that they were created to cover the lock and deadlock threat an unindexed foreign key can introduce. All I had to do then was rebuild these indexes:
SQL> select 'alter index '||owner ||'.'|| index_name ||' rebuild;'
from all_indexes
where owner='CTEST'
and status ='UNUSABLE';
alter index CTEST.ABC_M_I_FK_ME_IDX1 rebuild;
alter index CTEST.ABC_M_I_FK_ME_IDX2 rebuild;
alter index CTEST.ABC_M_I_FK_ME_IDX3 rebuild;
../..
alter index CTEST.DEF_M_I_FK_DEF_IDX1 rebuild;
alter index CTEST.DEF_M_I_FK_DEF_IDX2 rebuild;
alter index CTEST.DEF_M_I_FK_DEF_IDX3 rebuild;
../..
alter index CTEST.GHI_M_I_FK_SIT_IDX1 rebuild;
alter index CTEST.GHI_M_I_FK_SIT_IDX2 rebuild;
alter index CTEST.GHI_M_I_FK_SIT_IDX3 rebuild;
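As a complementary check once the rebuilds are done, it can be worth verifying that no foreign key is left without a supporting index at all. The sketch below is a simplified, assumption-laden version of the classic "unindexed foreign keys" query (it only handles single-column foreign keys and only checks the leading index column):

```sql
-- Hypothetical sketch: single-column FK constraints in CTEST whose
-- column is not the leading column of any index
select c.table_name, cc.column_name, c.constraint_name
from   all_constraints  c
join   all_cons_columns cc
       on  cc.owner           = c.owner
       and cc.constraint_name = c.constraint_name
where  c.owner           = 'CTEST'
and    c.constraint_type = 'R'          -- referential (foreign key)
and    not exists (
         select null
         from   all_ind_columns ic
         where  ic.table_owner     = cc.owner
         and    ic.table_name      = cc.table_name
         and    ic.column_name     = cc.column_name
         and    ic.column_position = 1  -- leading column only
       );
```

A multi-column version would also need to compare the full FK column list against the leading edge of each candidate index.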
Bottom line
Sometimes it is enough to follow a simple line of investigative reasoning: if an event occurred right after a particular action, then that action almost certainly paved the way for the event by removing whatever had been preventing it. Applied to the current situation: a maintenance operation invalidated several indexes. Those indexes had been created precisely to prevent DML locks on tables linked by a parent-child relationship, so once they stopped working the application slowed down. Finally, the consistent naming convention used for these indexes (the FK suffix) made the diagnosis much easier, which is in itself another proof of the usefulness of a good naming convention.