Left Join vs. not in SubQuery | SQL Server Performance Forums

SQL Server Performance Forum – Threads Archive

Left Join vs. not in SubQuery

Hello I have to tables which have the same structure Table1 and Table2 (#Product_ID varchar(10), #Num_Year char(4), #Num_Month char(2), Qty_Forecast numeric). I would like to insert into Table1 all the rows from Table2 that it does not already has. Which of the following statements is better and why? First statement:
insert into Table1
select
Producto_ID,
Num_Year,
Num_Month,
Qty_Forecast
from Table2
where
Product_ID + Num_Year + Num_Month not in
(
select
Product_ID + Num_Year + Num_Month
from Table1
) Second statement:
insert into Table1
select
Producto_ID,
Num_Year,
Num_Month,
Qty_Forecast
from Table2
left join Table1 on
Table1.Product_ID=Table2.Product_ID and
Table1.Num_Year=Table2.Num_Year and
Table1.Num_Month=Table2.Num_Month
where
Table1.Product_ID is null and
Table1.Num_Year is null and
Table1.Num_Month is null
Thanks a lot for you help.
I suggest running both statements with Show Execution Plan, compare what the server is doing and the query plan cost. My strong hunch is that the second one is better, because it can use indexes where the first cannot.
I would second this. RDBMS will mostly try to flatten queries into JOIN operations anyway. So if you can express express a query with in a JOIN way, it is (in most cases) the "better" choice. —
Frank Kalis
Moderator
Microsoft SQL Server MVP
Webmaster:http://www.insidesql.de
A small change is required in your second statement.
Use the Outer join for your query to succeed. The current query will insert zero records. insert into Table1
selectProducto_ID, Num_Year, Num_Month, Qty_Forecast
from Table2
left Outer join Table1 on — Changed…
Table1.Product_ID=Table2.Product_ID and
Table1.Num_Year=Table2.Num_Year and
Table1.Num_Month=Table2.Num_Month
where
Table1.Product_ID is null and
Table1.Num_Year is null and
Table1.Num_Month is null I executed both queries in three iterations (ofcourse resetting to original state before next iteration) and found same performance results in terms of CPU, reads, durations.
Any thoughts? Thanks and Regards,
Bhushan
http://www.metacafe.com/watch/365213/wonder_what_you_are
Better late than never, but one thing you should never do is concatenating columns for criteria … where Product_ID + Num_Year + Num_Month not in (select Product_ID + Num_Year + Num_Month from Table1) … because this causes the server to ignore all appropriate indexes. The syntax you’re looking for is a correlated subquery, which is pretty much the same subquery, but with fields from the main query in the WHERE clause of the subquery: insert into Table1
select Product_ID, Num_Year, Num_Month, Qty_Forecast
from Table2
where Table2.Product_ID not in
(select x.ProductID FROM table1 x WHERE x.Num_Year = Table2.Num_Year AND x.Num_Month = Table2.Num_Month) … or a NOT EXISTS correlated subquery: where NOT EXISTS
(select x.ProductID FROM table1 x WHERE x.ProductID = table2.ProductId x.Num_Year = Table2.Num_Year AND x.Num_Month = Table2.Num_Month) Performance can vary depending on data distribution, but the LEFT JOIN syntax is probably a safe bet.
Where do the stars under the User name come from?
For e.g. Adriaan has 5 stars. Thanks and Regards,
Bhushan
http://www.metacafe.com/watch/365213/wonder_what_you_are

That’s the forum indication for how much a user has posted. —
Frank Kalis
Microsoft SQL Server MVP
Contributing Editor, Writer & Forum Moderatorhttp://www.sql-server-performance.com
Webmaster:http://www.insidesql.de
Bhushan, I was not so much responding to your posting, as to the original poster. Sorry if I seemed to jump all over you.
No probs! Thanks and Regards,
Bhushan
http://www.metacafe.com/watch/365213/wonder_what_you_are

]]>