A couple of days ago I was asked to look at a performance issue that occurred in a very simple delete statement. Below is this delete with its good and bad execution plans, respectively:
SQL_ID gq3khfnctny4h
--------------------
DELETE FROM table_t1 WHERE evt_id = :1 AND eng_id = :2

Plan hash value: 1362912592

-----------------------------------------------------------------
| Id  | Operation          | Name          | Rows  | Cost (%CPU)|
-----------------------------------------------------------------
|   0 | DELETE STATEMENT   |               |       |     3 (100)|
|   1 |  DELETE            | TABLE_T1      |       |            |
|   2 |   INDEX UNIQUE SCAN| PK_TABLE_T117 |     1 |     3   (0)| -- cost = 3
-----------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("EVT_ID"=:1 AND "ENG_ID"=:2)
SQL_ID gq3khfnctny4h
--------------------
DELETE FROM table_t1 WHERE evt_id = :1 AND eng_id = :2

Plan hash value: 3057265474

--------------------------------------------------------------------
| Id  | Operation             | Name          | Rows  | Cost (%CPU)|
--------------------------------------------------------------------
|   0 | DELETE STATEMENT      |               |       |     2 (100)|
|   1 |  DELETE               | TABLE_T1      |       |            |
|   2 |   INDEX FAST FULL SCAN| PK_TABLE_T117 |     1 |     2   (0)| -- cost = 2
--------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("EVT_ID"=:1 AND "ENG_ID"=:2)
The statistics of the different executions of this delete over the last couple of weeks are:
SQL> @HistStats
Enter value for sql_id: gq3khfnctny4h
Enter value for from_date: 01092018

SNAP_BEGIN                PLAN_HASH_VALUE    EXECS END_OF_FETCH  AVG_ETIME  AVG_LIO AVG_ROWS
------------------------- --------------- -------- ------------ ---------- -------- --------
04-SEP-18 01.00.23.666 PM      1362912592     5109         5109          0     19.3        1 -- good execution plan
04-SEP-18 02.00.32.471 PM      1362912592      574          574          0       20        1
04-SEP-18 03.00.48.319 PM      1362912592     1472         1472          0       20        1
05-SEP-18 08.00.10.847 AM      1362912592     3378         3378          0     19.4        1
05-SEP-18 05.00.04.444 PM      1362912592    17759        17759          0     19.7        1
05-SEP-18 08.00.43.235 PM      1362912592   201040       201040          0     18.6        1
05-SEP-18 09.00.30.871 PM      1362912592    59826        59826          0     19.7        1
05-SEP-18 10.01.28.933 PM      1362912592   140850       140850          0     17.7        1
06-SEP-18 12.01.11.664 AM      1362912592    17951        17951          0     22.9        1
06-SEP-18 09.00.31.336 AM      3057265474    22899        22899         19   2136.5        1 -- bad plan switch
06-SEP-18 10.00.44.721 AM      3057265474    45736        45736         14     1315        1
06-SEP-18 11.00.48.929 AM      3057265474    56729        56729         11  37105.8        1
06-SEP-18 05.00.43.993 PM      3057265474    48568        48568         12    11275        1
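The @HistStats script itself is not listed in the post; a minimal stand-in producing similar columns from AWR (assumed names and rounding are mine, and querying DBA_HIST_* requires the Diagnostics Pack licence) might look like:

```sql
-- Hypothetical stand-in for @HistStats: per-snapshot execution statistics
-- for one sql_id, taken from the AWR repository.
SELECT sn.end_interval_time                          AS snap_begin,
       st.plan_hash_value,
       st.executions_delta                           AS execs,
       ROUND(st.elapsed_time_delta / 1e6
             / NULLIF(st.executions_delta, 0), 1)    AS avg_etime,   -- seconds
       ROUND(st.buffer_gets_delta
             / NULLIF(st.executions_delta, 0), 1)    AS avg_lio,
       ROUND(st.rows_processed_delta
             / NULLIF(st.executions_delta, 0))       AS avg_rows
FROM   dba_hist_sqlstat   st
       JOIN dba_hist_snapshot sn
         ON  sn.snap_id         = st.snap_id
         AND sn.dbid            = st.dbid
         AND sn.instance_number = st.instance_number
WHERE  st.sql_id = 'gq3khfnctny4h'
ORDER  BY sn.snap_id;
```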
As you can see, when the delete statement switched from an execution plan using an INDEX UNIQUE SCAN (1362912592) to one using an INDEX FAST FULL SCAN (3057265474) of the same primary key index, the average elapsed time of the delete went from practically 0 to 19 seconds.
The AVG_ROWS column above clearly shows that we are always deleting a single row. This is because the two columns in the predicate are the two primary key columns (evt_id, eng_id). So why on earth would Oracle fast full scan an entire index only to locate and delete a single row?
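For context, here is a hypothetical sketch of the underlying structure; the real DDL is not shown in the post, so the column types and the extra payload column are assumptions, with only the composite key inferred from the predicate:

```sql
-- Hypothetical reconstruction: only the PK columns are known from the plan.
CREATE TABLE table_t1 (
  evt_id  NUMBER       NOT NULL,
  eng_id  NUMBER       NOT NULL,
  payload VARCHAR2(100),
  CONSTRAINT pk_table_t117 PRIMARY KEY (evt_id, eng_id)
);

-- Both PK columns are bound, so a unique scan can return at most one rowid:
DELETE FROM table_t1 WHERE evt_id = :1 AND eng_id = :2;
```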
If you look back at the above execution plans you will realize that Oracle has completely messed up its estimations: it considered that fast full scanning the PK_TABLE_T117 primary key index would cost only 2, while a unique scan of the same index would cost 3. This is why it switched to the bad execution plan, causing a serious performance issue.
The statistics of this unique index are:
SQL> select index_name
          , blevel
          , leaf_blocks
          , distinct_keys
          , num_rows
          , clustering_factor
          , sample_size
     from user_indexes
     where index_name = 'PK_TABLE_T117';

INDEX_NAME     BLEVEL LEAF_BLOCKS DISTINCT_KEYS   NUM_ROWS CLUSTERING_FACTOR SAMPLE_SIZE
-------------- ------ ----------- ------------- ---------- ----------------- -----------
PK_TABLE_T117       3       28860        524234     524234            348959      524234
Since I had a PRE-PRODUCTION copy of this database at my disposal, I decided to get the same index statistics from that database:
SQL> select index_name
          , blevel
          , leaf_blocks
          , distinct_keys
          , num_rows
          , clustering_factor
          , sample_size
     from user_indexes
     where index_name = 'PK_TABLE_T117';

INDEX_NAME     BLEVEL LEAF_BLOCKS DISTINCT_KEYS   NUM_ROWS CLUSTERING_FACTOR SAMPLE_SIZE
-------------- ------ ----------- ------------- ---------- ----------------- -----------
PK_TABLE_T117       2       21470        617240     617240            190479      617240
How could the BLEVEL of this index be 3 in PRODUCTION while the BLEVEL of the same index, holding many more rows, is 2 in another database?
That is probably an indication of an exploded index, due to the delete and its pattern. Let's check this using a slightly enhanced version of a Jonathan Lewis script:
SQL> @IndexEfficiency2
Enter value for m_schemaname: xxxx
../..
PK_TABLE_T117
   Current Index Size GB: 1.44     Expected Index size GB: .07
This index certainly needs a rebuild:
SQL> alter index PK_TABLE_T117 rebuild online;

SQL> select index_name
          , blevel
          , leaf_blocks
          , distinct_keys
          , num_rows
          , clustering_factor
          , sample_size
     from user_indexes
     where index_name = 'PK_TABLE_T117';

INDEX_NAME     BLEVEL LEAF_BLOCKS DISTINCT_KEYS   NUM_ROWS CLUSTERING_FACTOR SAMPLE_SIZE
-------------- ------ ----------- ------------- ---------- ----------------- -----------
PK_TABLE_T117       2        1461        524202     524202            348956      524202

SQL> @sizeBySegName
Enter value for segment_name: PK_TABLE_T117
Enter value for owner: xxxx

SEGMENT_TYPE       TABLESPACE_NAME  SEGMENT_NAME        GB
------------------ ---------------- --------------- ------
INDEX              IDX              PK_TABLE_T117    .0117
                                                    ------
Total Segment Size                                   .0117
And here’s below the new execution plan:
SQL> explain plan for
     DELETE FROM table_t1 WHERE evt_id = :1 AND eng_id = :2;

SQL> select * from table(dbms_xplan.display);

---------------------------------------------------------------------
| Id  | Operation          | Name          | Rows  | Bytes | Cost |
---------------------------------------------------------------------
|   0 | DELETE STATEMENT   |               |     1 |    31 |    2 |
|   1 |  DELETE            | TABLE_T1      |       |       |      |
|   2 |   INDEX UNIQUE SCAN| PK_TABLE_T117 |     1 |    31 |    2 | -- cost = 2
---------------------------------------------------------------------

SQL> explain plan for
     delete /*+ index_ffs(TABLE_T1, PK_TABLE_T117) */ table_t1
     WHERE evt_id = :1 AND eng_id = :2;

SQL> select * from table(dbms_xplan.display);

------------------------------------------------------------------------
| Id  | Operation             | Name          | Rows  | Bytes | Cost |
------------------------------------------------------------------------
|   0 | DELETE STATEMENT      |               |     1 |    31 |  402 |
|   1 |  DELETE               | TABLE_T1      |       |       |      |
|   2 |   INDEX FAST FULL SCAN| PK_TABLE_T117 |     1 |    31 |  402 | -- cost = 402
------------------------------------------------------------------------
CONCLUSION
I have never considered rebuilding an index to be a reliable tuning strategy, and I had long since ousted this operation from my list of tuning steps. But, as Tom Kyte says, "always is never true, never is always false", so it is interesting to see a practical real-life case where rebuilding an index turns out to be the fix for a wrong execution plan choice.
Mohammed,
Nice example. But there’s something a little odd about the costing.
Why would a fast full scan of an index that reported 28,800 leaf blocks produce a cost of 2? That would suggest you've got a throughput figure (MBPS) that's so large that Oracle thinks it can scan 14,400 blocks just as fast as it can read one block!
The figure of 1.44GB as current size also looks a little odd as I think my code takes the leaf-block count and multiplies up from there to get an approximate current size – and that would suggest your block size is in excess of 32KB.
Which version of Oracle was this? 11g used to ignore the fast full scan for deletes, but 12.2 has a patch to consider them — I'm wondering if the patch has managed to introduce a silly arithmetic error at the same time. (See https://jonathanlewis.wordpress.com/2014/06/19/delete-costs/ )
Comment by Jonathan Lewis — September 11, 2018 @ 8:02 pm |
Jonathan,
Thanks a lot for your comment. I always forget to mention the version of the database: this is 12.2.0.1.0 with an 8K block size. However, your remark about the index size left me terribly confused, so I went back to the particular database where the issue happened, re-executed my size script, and got this:
But this was after the index rebuild.
Fortunately, I have another copy of the same database where the size of the index is 1.44GB, the number of leaf blocks is 17,108, and the number of rows (and the sample size) is 177,244.
But this database copy does not show the same IFFS oddity, because it is running 11.2.0.4.
I also forgot to mention that the issue in this blog post occurred in a virtual Delphix database (maybe this has nothing to do with the oddity).
I will continue investigating this cost oddity, as it looks genuinely strange to me.
Instead of rebuilding the index, I adopted a second strategy: fixing the good plan using an SPM baseline. This had the double effect of cancelling ACS (this cursor was bind sensitive and bind aware). Indeed, starting from 12cR2, fixing a single SPM plan on a bind-sensitive cursor will automatically cancel ACS for that cursor.
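For reference, a sketch of how such a baseline can be captured from the cursor cache; the sql_id and plan hash value are taken from the article, while the choice of the FIXED attribute reflects the "fixed plan" approach described above:

```sql
-- Load the good plan (1362912592) for the delete into an SPM baseline
-- and fix it, so the optimizer will not pick the IFFS plan again.
DECLARE
  l_plans PLS_INTEGER;
BEGIN
  l_plans := DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE(
               sql_id          => 'gq3khfnctny4h',
               plan_hash_value => 1362912592,
               fixed           => 'YES');
  DBMS_OUTPUT.PUT_LINE(l_plans || ' plan(s) loaded');
END;
/
```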
Best regards
Mohamed
Comment by hourim — September 12, 2018 @ 12:10 pm |
Hi Mohamed,
Again another good example. Can you share the index script which gives the approximate size? Are you just multiplying the leaf block count by the DB block size to get the size in GB?
Regards,
Rakesh RA
Comment by Rakesh RA — September 13, 2018 @ 5:01 am |
Hi Rakesh
I am getting the SUM(bytes) of the index from dba_segments and multiplying it by (m_leaf_estimate/leaf_blocks). You can find how the values of these two parameters are obtained in Jonathan's script mentioned in the article.
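Put together, that computation amounts to something like the following sketch; :m_leaf_estimate would come from Jonathan's script, and the owner/index names are the ones from the article:

```sql
-- Approximate index size check: the segment's actual bytes, and those bytes
-- scaled by the ratio of estimated-needed leaf blocks to actual leaf blocks.
SELECT SUM(s.bytes) / POWER(1024, 3)                        AS current_gb,
       SUM(s.bytes) * (:m_leaf_estimate / MAX(i.leaf_blocks))
                    / POWER(1024, 3)                        AS expected_gb
FROM   dba_segments s
       JOIN dba_indexes i
         ON  i.owner      = s.owner
         AND i.index_name = s.segment_name
WHERE  s.owner        = 'XXXX'
AND    s.segment_name = 'PK_TABLE_T117';
```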
Best regards
Comment by hourim — September 14, 2018 @ 7:43 am |
Thanks Mohammed.
Comment by Rakesh RA — September 15, 2018 @ 4:02 pm