Efficient T-SQL Code – A Case of Incorrect Use of a UNION Operator

A couple of days back, I had a developer complaining me the
slow performance of some T-SQL which was causing the application to throw a Timeout
error. During Investigation, I noticed that the
developer made an incorrect use of a UNION operator which was the culprit. In this article, I will demonstrate the issue using a demo student database.

First, create the database :

Create database student

Create a table named student inside the student
database :

create table student


student_id int identity(1,1),

student_name varchar(50),

city varchar(50)


Insert a few rows of dummy data into the student table :

insert student(student_name,city)

select 'Narayan','Mumbai'


select 'Nikhil','Goa'


select 'Rahul','Gurgaon'


select 'Rajat','Kanpur'

Create indexes on the student table :




Now, assume a user executes the below T-SQL:

select * from student where student_name like '%N%'


select * from student where city='Kanpur'

have a look at the query execution plan as shown below:

As seen from the above screen capture, there is a SORT
operator which costs around 63% of the overall plan which is very high.

if you have a careful look at the T-SQL; both the select statements are the
except for the conditional clause in it. Therefore the same query could have
been written :

select * from student where ((student_name like '%N%') OR (city='Kanpur'))

Now the query execution plan looks perfect. There is an
Index Scan which means the query optimizer gets an appropriate index
to execute the desired query.

Therefore, it is essential that we carefully
analyze the logic to be used while writing T-SQL. Writing untidy code
unnecessary puts a burden on the query optimizer which in turn affects the
performance of the server as well as the application.

Please let us know in the comments if you have any feedback or suggestions, alternatively you can contact me at singhsatnam84@yahoo.com.


4 Responses to “Efficient T-SQL Code – A Case of Incorrect Use of a UNION Operator”

  1. Problem here is this:

    student_name like ‘%N%’

    It requires an index scan for the entire leaf-level of the IDX_STUDENT_NAME_CITY index. It’s expensive either before the collapse of the union or after. This use of a leading % negates the use of an index “well” but can be done with an index scan rather than a table scan. For a 1GB Customer table and 100MB index – it’s better to scan the index, right? But still poor query processing as a CSR doing a query with %N% for name is a drag on the system if they do it because they are too lazy to type in a name. Also, some developers pre-load the leading % in some systems (I contract at one of these places). That is fine with today’s large caches and fast processors – but 10 years ago, this code wouldn’t have been allowed.

    We do see the improper use of UNION at customer sites a lot when they use:


    I continuously have to tell developers not to use DISTINCT when paired with UNION since the UNION itself is a distinct-resulting operator.

  2. This UNION vs OR problem has been in SQL for along time, not just T-SQL. The one-table VIEW with the OR is updatable; the UNION version counts as two tables, so the VIEW is not updateable, even tho it is logically equivalent.

    When we wrote the Standards, we knew that VIEW updatign is an NP-Complete problem, so we went with the easiest definition for the Standards.

  3. An NP-Complete problem! Thanks Joe, that lead some interesting research.

  4. Apologies as this is now an old post – but I feel like I am missing something here.

    I can appreciate the difference between the UNION and the OR in the above example. In the first case, the UNION must compare two results set, discarding any duplicates. If setA has ‘n’ elements and setB has ‘m’ elements, then it seems intuitive that the UNION must make (n x m) comparisons.

    Compare this to the OR operator, which must query each element of setA and see if it matches either of 2 conditions. This seems to be ‘n’ operations, or at worst ’2n’ operations.

    I can see how there can be an underlying debate about UNION vs. OR. But in this case, the cost to me seems to be in the implied DISTINCT of the UNION. Therefore, the reference to a debate over UNION / OR doesn’t seem well-placed in this example.

    Again, I apologise – but I am very interested and it has sparked my interest. I am new to this, so would really like your thoughts – am I missing something here?

Software Reviews | Book Reviews | FAQs | Tips | Articles | Performance Tuning | Audit | BI | Clustering | Developer | Reporting | DBA | ASP.NET Ado | Views tips | | Developer FAQs | Replication Tips | OS Tips | Misc Tips | Index Tuning Tips | Hints Tips | High Availability Tips | Hardware Tips | ETL Tips | Components Tips | Configuration Tips | App Dev Tips | OLAP Tips | Admin Tips | Software Reviews | Error | Clustering FAQs | Performance Tuning FAQs | DBA FAQs |