Efficient T-SQL Code – A Case of Incorrect Use of a UNION Operator

A couple of days back, I had a developer complaining me the slow performance of some T-SQL which was causing the application to throw a Timeout Expired error. During Investigation, I noticed that the developer made an incorrect use of a UNION operator which was the culprit. In this article, I will demonstrate the issue using a demo student database.

First, create the database :

Create database student

Create a table named student inside the student database :

create table student
(
student_id int identity(1,1),
student_name varchar(50),
city varchar(50)
)

Insert a few rows of dummy data into the student table :

insert student(student_name,city)
select 'Narayan','Mumbai'
union
select 'Nikhil','Goa'
union
select 'Rahul','Gurgaon'
union
select 'Rajat','Kanpur'

Create indexes on the student table :

CREATE CLUSTERED INDEX IDX_STUDENT_ID ON STUDENT(STUDENT_ID)
CREATE NONCLUSTERED INDEX IDX_STUDENT_NAME_CITY ON STUDENT(STUDENT_NAME,CITY)

 

Now, assume a user executes the below T-SQL:

select * from student where student_name like '%N%'
UNION
select * from student where city='Kanpur'

Let’s have a look at the query execution plan as shown below:

As seen from the above screen capture, there is a SORT operator which costs around 63% of the overall plan which is very high.

Now if you have a careful look at the T-SQL; both the select statements are the except for the conditional clause in it. Therefore the same query could have been written :

select * from student where ((student_name like '%N%') OR (city='Kanpur'))

Now the query execution plan looks perfect. There is an Index Scan which means the query optimizer gets an appropriate index to execute the desired query.

Therefore, it is essential that we carefully analyze the logic to be used while writing T-SQL. Writing untidy code unnecessary puts a burden on the query optimizer which in turn affects the performance of the server as well as the application.

Please let us know in the comments if you have any feedback or suggestions, alternatively you can contact me at singhsatnam84@yahoo.com.




Related Articles :

3 Responses to “Efficient T-SQL Code – A Case of Incorrect Use of a UNION Operator”

  1. Problem here is this:

    student_name like ‘%N%’

    It requires an index scan for the entire leaf-level of the IDX_STUDENT_NAME_CITY index. It’s expensive either before the collapse of the union or after. This use of a leading % negates the use of an index “well” but can be done with an index scan rather than a table scan. For a 1GB Customer table and 100MB index – it’s better to scan the index, right? But still poor query processing as a CSR doing a query with %N% for name is a drag on the system if they do it because they are too lazy to type in a name. Also, some developers pre-load the leading % in some systems (I contract at one of these places). That is fine with today’s large caches and fast processors – but 10 years ago, this code wouldn’t have been allowed.

    We do see the improper use of UNION at customer sites a lot when they use:

    SELECT DISTINCT ….
    UNION
    SELECT DISTINCT ….
    UNION
    SELECT DISTINCT ….

    I continuously have to tell developers not to use DISTINCT when paired with UNION since the UNION itself is a distinct-resulting operator.

  2. This UNION vs OR problem has been in SQL for along time, not just T-SQL. The one-table VIEW with the OR is updatable; the UNION version counts as two tables, so the VIEW is not updateable, even tho it is logically equivalent.

    When we wrote the Standards, we knew that VIEW updatign is an NP-Complete problem, so we went with the easiest definition for the Standards.

  3. An NP-Complete problem! Thanks Joe, that lead some interesting research.

Software Reviews | Book Reviews | FAQs | Tips | Articles | Performance Tuning | Audit | BI | Clustering | Developer | Reporting | DBA | ASP.NET Ado | Views tips | | Developer FAQs | Replication Tips | OS Tips | Misc Tips | Index Tuning Tips | Hints Tips | High Availability Tips | Hardware Tips | ETL Tips | Components Tips | Configuration Tips | App Dev Tips | OLAP Tips | Admin Tips | Software Reviews | Error | Clustering FAQs | Performance Tuning FAQs | DBA FAQs |