Recently an interesting question came up on an Oracle French Forum about a performance problem during an insert/select operation via a dblink. The poster was suffering from the very long time (several hours without any response) the insert was taking. This is why he came to that forum and posted a question which looks like “what do you think if I commit every 1000 rows? Will my insert speed up”? That’s not the right way to follow in the process of troubleshooting an Oracle performance problem. I always remember Cary Millsap mantra “Why guess when you can know”. Fortunately Oracle gives us a lot fo tools and trace events to instrument what going on inside the Oracle database. Here below are the steps of discussion we’ve had (the original poster, others and I) in that forum.
I first have asked the poster to trace his insert select using events 10046
alter session set events '10046 trace name context forever, level 12'; INSERT /*+ append */ INTO my_table ( champs_1, champs_2, champs_3, champs_4, champs_5, champs_6, champs_7 ) SELECT * FROM la_table_cible@mon_dblink ; commit ; alter session set events '10046 trace name context off';
I am not going to stress that I have asked the poster to ensure that his direct path load will not be silently ignored by Oracle in case the table “my_table” has trigger or foreign key integrity constraints implemented on it. I have also asked if there will be subsequent deletes (after the insert) on this table making future direct path load not reusing any freed space made by that delete.
The generated trace file profiled via orasrp profiler shows that 98% of the insert/select response time is spent waiting in the event SQL*Net message from dblink.
Up to this step, we persuaded the poster that his initial guess of committing every 1000 records was not the ultimate solution; but instead he should focus on reducing the volume of data transferred from the distant data base to the local one. The poster sends us the explain plan and Statistics of the select part when this one is executed in the distant data base
Statistics ---------------------------------------------- 212 recursive calls 3 db block gets 678993 consistent gets 483744 physical reads 592 redo size 124692665 bytes sent via SQL*Net TO client 1391305 bytes received via SQL*Net FROM client 126437 SQL*Net roundtrips TO/FROM client 17 sorts (memory) 0 sorts (disk) 1896535 rows processed
He is selecting 1,896,535 records via 126,437 SQL*Net roundtrips meaning that he is using the default arraysize which is 15 (1,896,535/126,437). For this, he did 0.358 (678,993/1,896,535) logical read for each selected row which seems to be quite acceptable. But, he is sending a very large amount of data 124MB over a data base link and making a hundred of thousands SQL*Net roundtrips. Note, that he is doing also a fairly large amount of physical read (483,744) when compared to the total number of logical reads (678,993).
I’ve then asked the poster to stop selecting all the rows from the distant data base and to start selecting only necessary rows. I asked also to post the explain plan of this query in attempt to help him tuning the select part of his problematic insert. When he said that he can’t intervene into the distant data base, I have then asked him to manage his insert with a PL/SQL stored procedure as shown below:
CREATE type distant_typ AS object (champs_1 number,champs_2 number,champs_3 number, ... champs_7 date); 2 / CREATE type distant_typ_tab AS TABLE of distant_typ; 2 / DECLARE CURSOR get_data_from_db IS SELECT champs_1, champs_2, champs_3, champs_4, champs_5, champs_6, champs_7 FROM la_table_cible@mon_dblink ; BEGIN OPEN get_data_from_db; LOOP FETCH get_data_from_db BULK collect INTO distant_typ_tab LIMIT 100; FOR i in 1..distant_typ_tab.COUNT LOOP INSERT INTO INSERT /*+ append */ INTO ma_table (champs_1, champs_2, champs_3, champs_4, champs_5, champs_6, champs_7 ) SELECT champs_1, champs_2, champs_3, champs_4, champs_5, champs_6, champs_7 FROM table (distant_typ_tab); EXIXT when get_data_from_db%notfound; END LOOP; END LOOP; COMMIT; END;
And magically the poster comes out and said “fantastic my insert/select now is completing in a very acceptable time”.
Hi, can you explain more about why the PL/SQL store procedure provide a better performance
Comment by Jason Zhong — December 19, 2012 @ 5:58 am |
Jason,
There is a crucial issue to tackle when dealing with insert/select via db-link. The select part might involve both the local and the distant databases. By default, Oracle will consider the local database as the driving site and will (1) go the distant database (2) pull as many data as allowed to do by the SQL*Net and the arraysize and (3) join/eliminate data in the local database. When the volume of data to be exchanged between the distant and the local database is very huge it alters considerably the response time of the entire select process. In this case you can simply decide to make the distant database as the default driving site by using the strategic hint /*+ driving_site (..) */.
However, using this hint in an insert/select via db-link is of no effect. The local database will be always considered as the driving site. There are additional details in the following blog article:
So, in response to your question I could say that when troubleshooting an insert/select via db-link where time is found to be consumed in the data exchange between the local and the distant database and due to the above restrictions on the choice of the driving site, we could walk around this issue by dividing the insert/select into small pieces of Pl/SQL inserts instead of one and unique large insert.
Comment by Houri Mohamed — December 20, 2012 @ 10:57 am |
Using cardinality hint in subselect seems to work to get work of select done on remote db first
Driving hint and other hints are ignored for dml insert
Comment by Lavallee Alain — February 13, 2013 @ 5:48 pm |
Alain,
When you use the cardinality hint you are changing the arithmetic of the Optimizer. You are telling the CBO that he has to trust you and consider the cardinality (number of rows of the corresponding table) you are supplying it thanks to this hint.
You don’t really have to do that if the statistics reflect the exact reality of the volume of the data and how this data is scattered (distributed) .Based on an adequate statistics the CBO is able to get the best execution plan possible. However, the reality is such sometimes you know your data better than the CBO does and hence you can hint your query as you seem to have succeeded to do in this insert/select via a data base link
But keep an open eye on this. Things might change when data change!!!
Comment by Houri — February 14, 2013 @ 1:31 pm |
the followings error occurred…
* @local : loopback link
Comment by Yeo — February 6, 2014 @ 8:44 am |
Yeo,
I am sorry for that. I haven’t managed to give a reproducible example. So thanks for prompting me to do it
And here below is the code : t is a table I have created in the distant database while type distant_typ are types I have created in the local database (sorry for the confusion)
Comment by Mohamed — February 6, 2014 @ 10:31 am |
[…] https://hourim.wordpress.com/2012/02/25/tuning-an-insertselect-via-dblink/ […]
Pingback by How to Speed up Deletes | abhayus — November 4, 2014 @ 8:51 pm |
Hi Mohamed,
interesting thoughts. But for me this is shifting the problem from a plain SQL Solution to a mixture of PL/SQL and SQL.
I guess this also implies that different memorie areas are used.
Maybe You should spend some words on context switches as well?
Best regards
Jonas
Comment by Jonas Gassenmeyer — December 22, 2016 @ 1:59 pm |
Jonas
This was a real life situation where most of the insert/select time was spent transferring data from the distant to the local database. Since this is an insert there is no way to force a driving site. The driving site is by default the site where the inserted table resides. Of course that whenever possible a plain SQL insert/select is better than a PL/SQL block. But depending on the amount of data that transits between the two database it might reveal better to opt for the PL/SQL option. There is, however, sometimes a pure SQL way to solve performance issue in distributed queries that are due to Oracle going back and forth between the two databases. My comment in Jonathan Lewis article below illustrates this way
Best regards
Mohamed Houri
Comment by hourim — December 22, 2016 @ 2:49 pm |
Hello!
For me it is not clear why the PL/SQL solution is faster.
Does it do less roundtrips?
Does it send less data to remote DB?
Or else?
Please clarify the reason why it is a better solution.
Thanks
RobK
Comment by RobK — April 5, 2018 @ 9:53 am |
We should always prefer a SQL insert/select instead of a PL/SQL block with a loop insert end loop. But when it comes to an insert/select via db link sometimes there is performance problem that kicks in because of for example the driving site is always the site where the inserted table is situated. In such a situation a plain SQL insert/select will suffer from this restriction which we can eventually overcome using a PL/SQL block. And the blog example was real life case of such a situation
Best regards
Mohamed
Comment by Mohamed — April 12, 2018 @ 7:22 pm |
I get PL/SQL: ORA-22800: invalid user-defined type distant_typ for this line
SELECT
distant_typ (a.n1)
FROM t@xxx.world a;
Comment by Raja — July 11, 2018 @ 10:34 pm |