Recently, I received an e-mail about an Oracle SQL query performing very badly. The corresponding explain plan is presented here below:
----------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | ---------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 2 | 506 | 611 (1)| 00:00:08 | | | | 1 | SORT GROUP BY | | 2 | 506 | 611 (1)| 00:00:08 | | | | 2 | NESTED LOOPS OUTER | | 2 | 506 | 610 (1)| 00:00:08 | | | | 3 | MERGE JOIN SEMI | | 2 | 278 | 574 (1)| 00:00:07 | | | | 4 | SORT JOIN | | 766 | 75834 | 568 (1)| 00:00:07 | | | | 5 | PARTITION RANGE SINGLE | | 766 | 75834 | 567 (1)| 00:00:07 | 7 | 7| | 6 | PARTITION LIST SINGLE | | 766 | 75834 | 567 (1)| 00:00:07 | KEY | KEY| | 7 | INLIST ITERATOR | | | | | | | | |* 8 | TABLE ACCESS BY LOCAL INDEX ROWID| MH_LEHOUR | 766 | 75834 | 567 (1)| 00:00:07 | 68 | 68| |* 9 | INDEX RANGE SCAN | MHLEHORIN | 242 | | 112 (0)| 00:00:02 | 68 | 68| |* 10 | FILTER | | | | | | | | |* 11 | SORT JOIN | | 364 | 14560 | 6 (17)| 00:00:01 | | | |* 12 | INDEX RANGE SCAN | MH_PSSELMAINTRT10 | 364 | 14560 | 5 (0)| 00:00:01 | | | |* 13 | VIEW PUSHED PREDICATE | MH_PS_SEL_YASM_VW | 1 | 114 | 18 (0)| 00:00:01 | | | |* 14 | FILTER | | | | | | | | | 15 | NESTED LOOPS | | 1 | 141 | 18 (0)| 00:00:01 | | | | 16 | TABLE ACCESS BY INDEX ROWID | MH_YASI_HEADER | 1 | 44 | 13 (0)| 00:00:01 | | | |* 17 | INDEX RANGE SCAN | MHEYASI_HEADER | 1 | | 12 (0)| 00:00:01 | | | |* 18 | TABLE ACCESS BY INDEX ROWID | MH_SELM_LN | 1 | 97 | 5 (0)| 00:00:01 | | | |* 19 | INDEX RANGE SCAN | MH_SELM_LN | 1 | | 4 (0)| 00:00:01 | | | -----------------------------------------------------------------------------------------------------------------------------
I deliberately omitted to show the predicate part of this explain plan because my first reaction to this e-mail was to ask for the real explain plan showing the estimations (E-Rows) done by the Oracle Optimizer against the actual rows (A-Rows) generated by the SQL engine. This is my method of localized SQL query tuning: the first thing to start with is to have the explain plan taken from memory. As such, you can have an idea about Estimations done by the CBO versus the Actual rows generated after the query has been executed. Generally, when the CBO has fresh, adequate and suitable statistics, it can do correct estimations and hence generate (unless a bug in the model) a performant explain plan.
Unfortunately, this query has never completed. They have always been forced to kill it before it finishes. It is then impossible to wait several hours for the query to complete in order to get its real explain plan using dbms_xplan package.
So what else?
I’ve asked then to trace the execution of this query using 10046 events for a couple of minutes. The corresponding “short time” trace file when “tkrpofed” presents the following picture:
call count cpu elapsed disk query current rows ------- ------ -------- ---------- ---------- ---------- ---------- ---------- Parse 1 0.04 0.04 0 6 0 0 Execute 1 0.00 0.00 0 0 0 0 Fetch 1 631.01 634.21 2204 45185062 0 0 ------- ------ -------- ---------- ---------- ---------- ---------- ---------- total 3 631.05 634.26 2204 45185068 0 0 Misses in library cache during parse: 1 Optimizer mode: ALL_ROWS Parsing user id: SYS Rows Row Source Operation ------- --------------------------------------------------- 0 SORT GROUP BY (cr=0 pr=0 pw=0 time=49 us cost=611 size=506 card=2) 6981 NESTED LOOPS OUTER (cr=45180782 pr=2204 pw=0 time=562867438 us cost=610 size=506 card=2) 2194 MERGE JOIN SEMI (cr=7823 pr=0 pw=0 time=608993 us cost=574 size=278 card=2) 2194 SORT JOIN (cr=7819 pr=0 pw=0 time=580184 us cost=568 size=75834 card=766) 140975 PARTITION RANGE SINGLE PARTITION: 7 7 (cr=7819 pr=0 pw=0 time=396102 us cost=567 size=75834 card=766) 140975 PARTITION LIST SINGLE PARTITION: KEY KEY (cr=7819 pr=0 pw=0 time=358613 us cost=567 size=75834 card=766) 140975 INLIST ITERATOR (cr=7819 pr=0 pw=0 time=315626 us) 140975 TABLE ACCESS BY LOCAL INDEX ROWID MH_LEHOURPARTITION: 68 68 (cr=7819 pr=0 pw=0 time=275165 us cost=567 size=75834 card=766) 287258 INDEX RANGE SCAN MHLEHORIN PARTITION: 68 68 (cr=1841 pr=0 pw=0 time=125845 us cost=112 size=0 card=242)(object id 2203013) 2194 FILTER (cr=4 pr=0 pw=0 time=23271 us) 2194 SORT JOIN (cr=4 pr=0 pw=0 time=16250 us cost=6 size=14560 card=364) 364 INDEX RANGE SCAN MH_PSSELMAINTRT10 (cr=4 pr=0 pw=0 time=296 us cost=5 size=14560 card=364)(object id 670917) 6981 VIEW PUSHED PREDICATE MH_PS_SEL_YASM_VW (cr=45172959 pr=2204 pw=0 time=485213108 us cost=18 size=114 card=1) 374473 FILTER (cr=45175528 pr=2204 pw=0 time=1971379980 us) 374473 NESTED LOOPS (cr=45175528 pr=2204 pw=0 time=1971240476 us cost=18 size=141 card=1) 672438 TABLE ACCESS BY INDEX ROWID PS_JRNL_HEADER (cr=763923 pr=0 pw=0 time=9257466 us cost=13 size=44 card=1) 672438 INDEX RANGE SCAN PSEJRNL_HEADER (cr=229345 pr=0 pw=0 time=6810329 us cost=12 size=0 card=1)(object id 773857) 374473 TABLE ACCESS BY INDEX ROWID MH_SELM_LN (cr=44411987 pr=2204 pw=0 time=622519272 us cost=5 size=97 card=1) 269483276 INDEX RANGE SCAN MH_SELM_LN (cr=30389896 pr=2204 pw=0 time=562432822 us cost=4 size=0 card=1)(object id 773933) Elapsed times include waiting on following events: Event waited on Times Max. Wait Total Waited ---------------------------------------- Waited ---------- ------------ SQL*Net message to client 1 0.00 0.00 latch: cache buffers chains 3 0.00 0.00 db file sequential read 2204 0.02 3.21 SQL*Net break/reset to client 1 0.00 0.00 SQL*Net message from client 1 2.66 2.66 ********************************************************************************
Hmmm!!! Something interesting to deal with
Do you know what to spot from this? Any idea?
Well, looking at the two last line of the row source explain plan I spotted this
374473 TABLE ACCESS BY INDEX ROWID MH_SELM_LN (cr=44411987 pr=2204 pw=0 time=622519272 us cost=5 size=97 card=1) 269483276 INDEX RANGE SCAN MH_SELM_LN (cr=30389896 pr=2204 pw=0 time=562432822 us cost=4 size=0 card=1)(object id 773933)
What is this INDEX RANGE SCAN MH_SELM_LN feeding it parent operation TABLE ACCESS BY INDEX ROWID MH_SELM_LN with 269 millions of rowids which are almost totally discarded letting only 374,473 records to survive. That is 269,108,803 of rowids thrown away between the index and table access!!! This is an enormous waste of time and energy which, if avoided, will certainly make the query run faster.
How, if confronted to this situation, would you explain this enormous waste of time between the index range scan and its table access?
Would you have pointed out, as I did, the same flaw in this explain plan?
I will go for:
v$sql_monitor or plan_line_id column in ASH
Run this SQL for some time then cancel it and pull the information from v$sql_plan_statistics
Use os_explain script which Tanel Poder wrote for identifying in which exec plan line time is spent from OS side.
About pointing the flaw here, I will go for reading raw 10046 trace file to account total tim value for db file sequential read (This is just approximation).
Comment by Yasser — June 8, 2012 @ 12:51 pm |
Yasser,
That is an excellent suggestion.
As per regards to Tanel Poder script, I am very often using snapper tool. And using it in such condition (long runing query) will also be a good idea. Insn’t it?
Best Regards
Comment by Houri Mohamed — June 8, 2012 @ 12:59 pm |
Obviously Tanle Poder’s snapper script is awesome. But i was referring to os_explain script as it will be useful to find the time consuming exec plan line for this never ending sql.
Just curious…what was the final solution provided for this sql ?
-Yasser
Comment by Yasser — June 11, 2012 @ 5:29 pm |
Hi Yasser,
That was a remote question with the disadvantage of not having all stuff in my hand to follow the issue from start to end. Anyway, I suggested, given the information I have been sent, to hint the query so that a FULL TABLE SCAN is used instead of scanning a millions of index rowids.
I was immediately told that the query was instantaneously executed when they hinted it as indicated. That’s all what I can say about this issue
Best regards
Comment by hourim — June 12, 2012 @ 6:32 am |