SQL Server T-SQL Tuning – NOT IN and NOT Exists
Following on from the previous TSQL performance tuning article in this series, we now focus on the important topic of T-SQL “Not IN” And “Not Exists “
Not in and Not Exists commands :In some circumstances we will need to select/update/delete orphan data – that is data does not exists in the major table but still exists in other interrelated tables
The response to such a circumstance will often be to utilize the “Not in” and “Not Exists” commands. This can however have a major impact on performance as the Not IN command requires a check to be made on each record individually resulting in a strain on resources and can also lead to locks particularly when updating or deleting a large volume of data .
Alternatives to NOT IN and NOT ExistsSQL Server 2008 introduced the Merge command for bulk insert / update / delete operations which can be used with the Except clause to minimize its performance hit.
Examples: First we will create two tables :
use [Workshops] Create table Mytable1 (companyname varchar (100) ) Create table Mytable2 (companyname varchar (100) )
Select/update/delete orphaned data using Not in command :
SELECT companyname FROM Mytable1 where companyname not in (select companyname from Mytable2) UPDATE Mytable1 SET companyname =N'Company_Name' where companyname not in (select companyname from Mytable2) DELETE Mytable1 FROM Mytable1 where companyname not in (select companyname from Mytable2)Improved performance using Merge and Except :
merge Mytable1 T using (select companyname from mytable1 except select companyname from mytable2 )S on t.companyname=s.companyname when matched then update SET companyname=N'Company_Name' ; merge Mytable1 T using (select companyname from mytable1 except select companyname from mytable2 )S on t.companyname=s.companyname when matched then delete ; SELECT * FROM MYTable1 S where not exists (select 1 from MYTable1 inner join MYTable2 on MYTable1.companyname=MYTable2.companyname and MYTable1.companyname=s.companyname)
Note that this also applies Not Exists :
SELECT * FROM MYTable1 S where not exists (select 1 from MYTable1 inner join MYTable2 on MYTable1.companyname=MYTable2.companyname and MYTable1.companyname=s.companyname)Healthy T-SQL practices to Select orphaned data using Except Command
select companyname from mytable1 except select companyname from mytable2In the next article of this series we will look at optimizing the performance of Views.



I Like the way you write your articles.
thanks for writing your blog.
I have used Merge succefully with one execption, delete. There are a few cases when Merge does not do exactly what you think it will.
Merge is one of the new rich 2008 T-SQL Codes that can apply for most of cases but let me know about some samples of these cases that don’t apply for Merge to be able to help in them
valuable information…
I really enjoy your blog. Much informative!!
It would be nice to see some test results which back up your assertions, using sample data with hundreds of thousands of rows. You have really only provided some alternative syntax. Here’s one of mine …
delete t1
from Mytable1 t1
where not exists (
select 1
from MyTable2 t2
where t2.CompanyName = t1.CompanyName
)
I’d be very interested to find out whether MERGE beats this.
Thx for your comment …I didn’t add concession details about a certain realistic case to support the efficiency of Merge and Except commands rather than NOT IN or NOT exists commands but they actually differed much on both aspect resources Consumption and execution time once I tried them within heavy DWH and data migrations solutions running over HUGE TB DBs …..You can try them and let me know your feedback
In addition, I consider largely DB stress powers so using Merge Command with Except can be much powerful if replacing Not IN or Not Exists and you can justify that using any DB stress tool like SQLQueryStress
Yes, but why don’t you just use:
SELECT Mytable1.companyname FROM Mytable1
left join Mytable2 on Mytable1.companyname=Mytable2.companyname
where Mytable2.companyname is null
It will not satisfy the same business logic precisely coz Mytable2.companyname is null is just seeking the data of MYtable2 having companyname is null which isn’t our case here actually but our case here seeks the data of MYtable2 having companyname values mismatching with companyname values of Mytable1
Please let me know if any further help is needed
I would caution on the use of EXCEPT. On large sets of data, it can overflow your log file.
As for LEFT JOIN, it is perfectly acceptable. I don;t see where it doesn’t meet the need.
Using your NOT EXISTS example:
– give me everything in table 1 that is not in table 2.
SELECT
*
FROM
MYTable1 S
WHERE
NOT EXISTS ( SELECT
1
FROM
MYTable1
INNER JOIN MYTable2
ON MYTable1.companyname = MYTable2.companyname
AND MYTable1.companyname = s.companyname )
is equivilent to
SELECT *
FROM MyTable1 t1
LEFT JOIN MyTable2 t2
ON t1.CompanyName=t2.CompanyName
OR — handle NULL on both sides.
(t1.CompanyName IS NULL
AND t2.CompanyName IS NULL)
WHERE t2.CompanyName is null — no matching record for table1 in table2
The left join is also cleaner to read.
Your not exists example is overly complex, the equivalent to the except is
SELECT * FROM MYTable1 as MT1 where not exists (select * from MYTable2 as MT2 where MT2.companyname=MT1.companyname)
Which produces the same execution plan and performance as the example for except.
You have not posted the improvement. I doubt that merge will improve the queries you have posted.
Thx for your comment …You are right , I didn’t post adequate performance benchmarks about a certain realistic case but it actually differed much on both aspect resources Consumption and execution time once I tried Merge & Except commands within heavy DWH and data migrations solutions running over HUGE TB DBs …..You can try them and let me know your feedback
In addition, I consider largely DB stress powers so using Merge Command with Except can be much powerful if replacing Not IN or Not Exists and you can justify that using any DB stress tool like SQLQueryStress
Once again …Thx for your follow and comments
I am sorry
*my clients still use SQL Server 2000/2005*
If so , you can use the below DML query instead of Merge ;
Delete t from Mytable1 t inner join (select companyname from mytable1 except select companyname from mytable2 )S on t.companyname=s.companyname
But it does worth much thinking about the upgrade to 2008 R2 to enjoy with the rich powers added by Merge commands that can afford efficiently bulk DML operations
Let me know if any further help is needed
I see claims of performance and would like to caution people that, unless there is a test harness for a good number of rows that proves the performance, “It Depends”.
Extremely usefull. Thanks!!
Please beware nulls are not handled the same in not exists and except
select 1 as a,null b into #t1
select 1 as a,null as b into #t2
select * from #t1
except
select * from #t2
select * from #t1 a where not exists
(select 1 from #t2 b where a.a=b.a
and a.b=b.b)
drop table #t1
drop table #t2
This justifies more the accuracy and simplicity of Except commands but even not exists can reach the same accuracy if using isnull or coalesce functions
select 1 as a,null b into #t1
select 1 as a,null as b into #t2
select * from #t1
except
select * from #t2
select * from #t1 a where not exists
(select 1 from #t2 b where a.a like b.a
and isnull (a.b,0) = isnull (b.b,0))
select * from #t1 a where not exists
(select 1 from #t2 b where a.a like b.a
and coalesce (a.b,0) = coalesce (b.b,0))
drop table #t1
drop table #t2
That depends on how much of a purist you are but normally null is not equal to null as you don’t know its value and as this tends to be the ANSII standard then this is a bug
.
Very helpful article !!!!!!!! THX
I really Interested in that!!! Thx
just to make you all aware with the MERGE statement on SQL 2008 + 2008 R2
http://www.sqlperformance.com/2013/02/t-sql-queries/another-merge-bug
That is definitely right
Very helpful and useful! THX
very informative article !
Very helpful!
Much interesting article….
very interesting and momentum blog .Thanks
your blog is very interesting and momentum .Thanks
Very good tip! Especially in dealing with very large amounts of data in a DW schema.