Mohamed Houri’s Oracle Notes

June 7, 2012

Tuning a SQL query using TKPROF

Filed under: Oracle — hourim @ 8:34 am

Recently, I received an e-mail about an Oracle SQL query performing very badly. The corresponding explain plan is presented here below:

-----------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name              | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |                   |     2 |   506 |   611   (1)| 00:00:08 |       |      |
|   1 |  SORT GROUP BY                           |                   |     2 |   506 |   611   (1)| 00:00:08 |       |      |
|   2 |   NESTED LOOPS OUTER                     |                   |     2 |   506 |   610   (1)| 00:00:08 |       |      |
|   3 |    MERGE JOIN SEMI                       |                   |     2 |   278 |   574   (1)| 00:00:07 |       |      |
|   4 |     SORT JOIN                            |                   |   766 | 75834 |   568   (1)| 00:00:07 |       |      |
|   5 |      PARTITION RANGE SINGLE              |                   |   766 | 75834 |   567   (1)| 00:00:07 |     7 |     7|
|   6 |       PARTITION LIST SINGLE              |                   |   766 | 75834 |   567   (1)| 00:00:07 |   KEY |   KEY|
|   7 |        INLIST ITERATOR                   |                   |       |       |            |          |       |      |
|*  8 |         TABLE ACCESS BY LOCAL INDEX ROWID| MH_LEHOUR         |   766 | 75834 |   567   (1)| 00:00:07 |    68 |    68|
|*  9 |          INDEX RANGE SCAN                | MHLEHORIN         |   242 |       |   112   (0)| 00:00:02 |    68 |    68|
|* 10 |     FILTER                               |                   |       |       |            |          |       |      |
|* 11 |      SORT JOIN                           |                   |   364 | 14560 |     6  (17)| 00:00:01 |       |      |
|* 12 |       INDEX RANGE SCAN                   | MH_PSSELMAINTRT10 |   364 | 14560 |     5   (0)| 00:00:01 |       |      |
|* 13 |    VIEW PUSHED PREDICATE                 | MH_PS_SEL_YASM_VW |     1 |   114 |    18   (0)| 00:00:01 |       |      |
|* 14 |     FILTER                               |                   |       |       |            |          |       |      |
|  15 |      NESTED LOOPS                        |                   |     1 |   141 |    18   (0)| 00:00:01 |       |      |
|  16 |       TABLE ACCESS BY INDEX ROWID        | MH_YASI_HEADER    |     1 |    44 |    13   (0)| 00:00:01 |       |      |
|* 17 |        INDEX RANGE SCAN                  | MHEYASI_HEADER    |     1 |       |    12   (0)| 00:00:01 |       |      |
|* 18 |       TABLE ACCESS BY INDEX ROWID        | MH_SELM_LN        |     1 |    97 |     5   (0)| 00:00:01 |       |      |
|* 19 |        INDEX RANGE SCAN                  | MH_SELM_LN        |     1 |       |     4   (0)| 00:00:01 |       |      |
-----------------------------------------------------------------------------------------------------------------------------
 

I deliberately omitted to show the predicate part of this explain plan because my first reaction to this e-mail was to ask for  the real explain plan showing the estimations (E-Rows) done by the Oracle Optimizer against the actual rows (A-Rows) generated by the SQL engine. This is my method of localized SQL query tuning: the first thing to start with is to have the explain plan taken from memory. As such, you can have an idea about Estimations done by the CBO versus the Actual rows generated after the query has been executed.  Generally, when the CBO has fresh, adequate and suitable statistics, it can do correct estimations and hence generate (unless a bug in the model) a performant explain plan.

Unfortunately, this query has never completed. They have always been forced to kill it before it finishes. It is then impossible to wait several hours for the query to complete in order to get its real explain plan using dbms_xplan package.

So what else?

I’ve asked then to trace the execution of this query using 10046 events for a couple of minutes. The corresponding “short time” trace file when “tkrpofed” presents the following picture:

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.04       0.04          0          6          0           0
Execute      1      0.00       0.00          0          0          0           0
Fetch        1    631.01     634.21       2204   45185062          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total        3    631.05     634.26       2204   45185068          0           0

Misses in library cache during parse: 1
Optimizer mode: ALL_ROWS
Parsing user id: SYS
Rows     Row Source Operation
-------  ---------------------------------------------------
      0  SORT GROUP BY (cr=0 pr=0 pw=0 time=49 us cost=611 size=506 card=2)
   6981   NESTED LOOPS OUTER (cr=45180782 pr=2204 pw=0 time=562867438 us cost=610 size=506 card=2)
   2194    MERGE JOIN SEMI (cr=7823 pr=0 pw=0 time=608993 us cost=574 size=278 card=2)
   2194     SORT JOIN (cr=7819 pr=0 pw=0 time=580184 us cost=568 size=75834 card=766)
 140975      PARTITION RANGE SINGLE PARTITION: 7 7 (cr=7819 pr=0 pw=0 time=396102 us cost=567 size=75834 card=766)
 140975       PARTITION LIST SINGLE PARTITION: KEY KEY (cr=7819 pr=0 pw=0 time=358613 us cost=567 size=75834 card=766)
 140975        INLIST ITERATOR  (cr=7819 pr=0 pw=0 time=315626 us)
 140975         TABLE ACCESS BY LOCAL INDEX ROWID MH_LEHOURPARTITION: 68 68 (cr=7819 pr=0 pw=0 time=275165 us cost=567 size=75834 card=766)
 287258          INDEX RANGE SCAN MHLEHORIN PARTITION: 68 68 (cr=1841 pr=0 pw=0 time=125845 us cost=112 size=0 card=242)(object id 2203013)
   2194     FILTER  (cr=4 pr=0 pw=0 time=23271 us)
   2194      SORT JOIN (cr=4 pr=0 pw=0 time=16250 us cost=6 size=14560 card=364)
    364       INDEX RANGE SCAN MH_PSSELMAINTRT10 (cr=4 pr=0 pw=0 time=296 us cost=5 size=14560 card=364)(object id 670917)
   6981    VIEW PUSHED PREDICATE  MH_PS_SEL_YASM_VW (cr=45172959 pr=2204 pw=0 time=485213108 us cost=18 size=114 card=1)
 374473     FILTER  (cr=45175528 pr=2204 pw=0 time=1971379980 us)
 374473      NESTED LOOPS  (cr=45175528 pr=2204 pw=0 time=1971240476 us cost=18 size=141 card=1)
 672438       TABLE ACCESS BY INDEX ROWID PS_JRNL_HEADER (cr=763923 pr=0 pw=0 time=9257466 us cost=13 size=44 card=1)
 672438        INDEX RANGE SCAN PSEJRNL_HEADER (cr=229345 pr=0 pw=0 time=6810329 us cost=12 size=0 card=1)(object id 773857)
 374473       TABLE ACCESS BY INDEX ROWID MH_SELM_LN (cr=44411987 pr=2204 pw=0 time=622519272 us cost=5 size=97 card=1)
269483276        INDEX RANGE SCAN MH_SELM_LN (cr=30389896 pr=2204 pw=0 time=562432822 us cost=4 size=0 card=1)(object id 773933)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
 SQL*Net message to client                       1        0.00          0.00
  latch: cache buffers chains                     3        0.00          0.00
  db file sequential read                      2204        0.02          3.21
  SQL*Net break/reset to client                   1        0.00          0.00
  SQL*Net message from client                     1        2.66          2.66
********************************************************************************
 

Hmmm!!! Something interesting to deal with

Do you know what to spot from this? Any idea?

Well, looking at the two last line of the row source explain plan I spotted this

374473       TABLE ACCESS BY INDEX ROWID MH_SELM_LN (cr=44411987 pr=2204 pw=0 time=622519272 us cost=5 size=97 card=1)
269483276        INDEX RANGE SCAN MH_SELM_LN (cr=30389896 pr=2204 pw=0 time=562432822 us cost=4 size=0 card=1)(object id 773933)

What is this INDEX RANGE SCAN MH_SELM_LN feeding it parent operation TABLE ACCESS BY INDEX ROWID MH_SELM_LN with 269 millions of rowids which are almost totally discarded letting only 374,473 records to survive. That is 269,108,803 of rowids thrown away between the index and table access!!! This is an enormous waste of time and energy which, if avoided, will certainly make the query run faster.

How, if confronted to this situation, would you explain this enormous waste of time between the index range scan and its table access?

Would you have pointed out, as I did, the same flaw in this explain plan?

About these ads

4 Comments »

  1. I will go for:
    v$sql_monitor or plan_line_id column in ASH
    Run this SQL for some time then cancel it and pull the information from v$sql_plan_statistics
    Use os_explain script which Tanel Poder wrote for identifying in which exec plan line time is spent from OS side.

    About pointing the flaw here, I will go for reading raw 10046 trace file to account total tim value for db file sequential read (This is just approximation).

    Comment by Yasser — June 8, 2012 @ 12:51 pm | Reply

  2. Yasser,

    That is an excellent suggestion.

    As per regards to Tanel Poder script, I am very often using snapper tool. And using it in such condition (long runing query) will also be a good idea. Insn’t it?

    Best Regards

    Comment by Houri Mohamed — June 8, 2012 @ 12:59 pm | Reply

    • Obviously Tanle Poder’s snapper script is awesome. But i was referring to os_explain script as it will be useful to find the time consuming exec plan line for this never ending sql.

      Just curious…what was the final solution provided for this sql ?

      -Yasser

      Comment by Yasser — June 11, 2012 @ 5:29 pm | Reply

  3. Hi Yasser,

    That was a remote question with the disadvantage of not having all stuff in my hand to follow the issue from start to end. Anyway, I suggested, given the information I have been sent, to hint the query so that a FULL TABLE SCAN is used instead of scanning a millions of index rowids.

    I was immediately told that the query was instantaneously executed when they hinted it as indicated. That’s all what I can say about this issue

    Best regards

    Comment by hourim — June 12, 2012 @ 6:32 am | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

The Rubric Theme Blog at WordPress.com.

Oracle Database 11g

Oracle Database

Raheel's Blog

Things I have learnt as Oracle DBA

Coskan's Approach to Oracle

What I learned about Oracle

So Many Oracle Manuals, So Little Time

“Books to the ceiling, Books to the sky, My pile of books is a mile high. How I love them! How I need them! I'll have a long beard by the time I read them”—Lobel, Arnold. Whiskers and Rhymes. William Morrow & Co, 1988.

EU Careers info

Your career in the European Union

Oracle SQL Tuning Tools and Tips

SQLTXPLAIN (SQLT), TRCANLZR (TRCA), SQL Health-Check (SQLHC) and SQL Tuning Topics

Oracle Scratchpad

Just another Oracle weblog

Tanel Poder's blog: Responsible data management

Linux, Oracle, Exadata and Hadoop.

OraStory

Dominic Brooks on Oracle Performance, Tuning, Data Quality & Sensible Design ... (Now with added Sets Appeal)

Follow

Get every new post delivered to your Inbox.

Join 64 other followers

%d bloggers like this: