As from Oralce 12cR1, the Optimizer can automatically collect statistics on empty tables provided they are, notably, direct path loaded. This is recognisable in execution plans through the new row source operation named OPTIMIZER STATISTICS GATHERING
----------------------------------------------------------- | Id | Operation | Name | Rows | ----------------------------------------------------------- | 0 | INSERT STATEMENT | | 1000K| | 1 | LOAD AS SELECT | T1_TAR | | | 2 | OPTIMIZER STATISTICS GATHERING | | 1000K| | 3 | TABLE ACCESS FULL | T1 | 1000K| -----------------------------------------------------------
This is, in principle, a good initiative. But look at what I have been asked to trouble shoot a couple of days ago:
Global Information ------------------------------ Status : EXECUTING Instance ID : 1 Session : xxxx (15:32901) SQL ID : am6923m45s203 SQL Execution ID : 16777216 Execution Started : 12/02/2017 08:01:17 First Refresh Time : 12/02/2017 08:01:28 Last Refresh Time : 12/05/2017 11:55:51 Duration : 272276s Module/Action : SQL*Plus/- Service : xxxxx Program : sqlplus@xxxx =================================================================================================== | Id | Operation | Name | Activity | Activity Detail | | | | | (%) | (# samples) | =================================================================================================== | -> 0 | INSERT STATEMENT | | 0.00 | Data file init write | | -> 1 | LOAD AS SELECT | T1_TAR | 0.06 | Cpu(28) | | | | | direct path write(2) | | -> 2 | OPTIMIZER STATISTICS GATHERING | | 99.93 | Cpu(52956) | | -> 3 | TABLE ACCESS FULL | T1 | | SQL*Net more data from dblink(1)| ==================================================================================================
You have to believe me when I say that this insert had been running for 3 days when the above SQL report was taken. As you can see 99% of the direct path load execution time was spent gathering statistics at operation in line n°2. This is how a new feature designed to help starts disturbing you.
The insert/select statement selects a CLOB from a remote database. I was curious to see whether getting rid of this CLOB from the insert operation would make things better:
SQL>set timing on INSERT /*+ append */ INTO T1_TAR (col1 -- ,clob_col2 ,col3 ,col4 ,col5 ,col6 ,col7 ,col8 ) SELECT t1.col1 --,t1.clob_col2 ,t1.col3 ,t1.col4 ,t1.col5 ,t1.col6 ,t1.col7 ,t1.col8 FROM t1@dblink t1; 338481182 rows created. Elapsed: 00:11:38.85 ------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes |IN-OUT| ----------------------------------------------------------- | 0 | INSERT STATEMENT | | 318M| 28GB| | | 1 | LOAD AS SELECT | T1_TAR | | | | | 2 | REMOTE | T1 | 318M| 28GB|R->S | ------------------------------------------------------------
That’s really interesting.
Why would the presence or the absence of a CLOB column in the insert/select statement allow or prevent the CBO from gathering statistics online?
In fact the CLOB column has nothing to do with the online statistics gathering decision. Should I have commented any other column I would have had the same behaviour as the following prooves:
INSERT /*+ append */ INTO T1_TAR (col1 ,clob_col2 ,col3 ,col4 ,col5 ,col6 --,col7 ,col8 ) SELECT t1.col1 ,t1.clob_col2 ,t1.col3 ,t1.col4 ,t1.col5 ,t1.col6 --,t1.col7 ,t1.col8 FROM t1@dblink t1; ------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes |IN-OUT| ----------------------------------------------------------- | 0 | INSERT STATEMENT | | 318M| 28GB| | | 1 | LOAD AS SELECT | T1_TAR | | | | | 2 | REMOTE | T1 | 318M| 28GB|R->S | ------------------------------------------------------------
The OPTIMIZER STATISTICS GATHERING operation requires the presence of the totality of the table columns in order to kick in during direct path load operations. This is probably because Oracle doesn’t want to collect statistics for a bunch of columns and keep others without statistics. This being said, this is another non-really documented restriction that comes with this new 12c feature.
Since I was not going to challenge my client getting rid of one column from the insert/select I finally opted for locally cancelling the online statistics gathering using the corresponding hint as shown below:
INSERT /*+ append no_gather_optimizer_statistics */ INTO T1_TAR (col1 ,clob_col2 ,col3 ,col4 ,col5 ,col6 ,col7 ,col8 ) SELECT t1.col1 ,t1.clob_col2 ,t1.col3 ,t1.col4 ,t1.col5 ,t1.col6 ,t1.col7 ,t1.col8 FROM t1@dblink t1; Global Information ------------------------------ Status : DONE Instance ID : 1 Session : xxxx (132:63271) SQL ID : Od221zf0srs6q SQL Execution ID : 16777216 Execution Started : 12/02/2017 15:10:16 First Refresh Time : 12/02/2017 15:10:22 Last Refresh Time : 12/05/2017 15:32:56 Duration : 1360s Module/Action : SQL*Plus/- Service : xxxxx Program : sqlplus@xxxx SQL Plan Monitoring Details (Plan Hash Value=2098243032) ================================================================ | Id | Operation | Name | Time | Rows | | | | | Active(s) | (Actual) | ================================================================ | 0 | INSERT STATEMENT | | 1355 | 2 | | 1 | LOAD AS SELECT | T1_TAR | 1355 | 2 | | 2 | REMOTE | T1 | 1355 | 338M | ================================================================
That’s how we fixed this issue.
Watch out your direct path load over empty tables selecting from either local or distant databases; the new 12c online table and columns statics gathering might considerably delay your treatement.
Leave a comment