How to Optimize SQL Server Query Plans – The Devil is in the Detail

When dealing with the
performance database any detail should can never be neglected, because even the
smallest thing can turn out to be a great opportunity for query optimization.
Generally, one needs a significant amount of experience to detect that kind of
details easily. But what shall the rest do till they gain the required

Inthis case, the
salvation are specialized products,that enable usto carry out a
deep analysis of query execution plans and find bottlenecks in them. Query
Profiler, present indbForge Studio for SQLServer,is one of the
tools that will help you with those tasks.

Now, let’s take a look
at how we can use the profiler for trivial, but withal commonly-used
task — record duplicates deletion in specified table.

Let’s assume that we
have a table with login history and we need to delete duplicates herewith
keeping the latest login for every user.

        --Initial table



         login_id INT IDENTITY(1,1) PRIMARY KEY

       , [user_id] INT

       , login_date DATETIME


INSERT INTO @logins ([user_id], login_date)


       (1, '20130911 14:33:29'), (1, '20130912 15:36:39'),

       (2, '20130914 14:13:03'), (2, '20130917 05:56:18'),

       (3, '20130910 16:30:29'), (1, '20130908 18:25:39')


A standard solution for
this task is to get a list with the maximal ID for each user and delete the
records that were not included in the list:

        --Example #1


WHERE login_id NOT IN (

       SELECT MAX(login_id)

       FROM @logins

       GROUP BY [user_id]


Let’shave a look
at the execution plan:

Actually we refer to
the table two times. Before the deletion, to avoid several
scans of the same data  from the initial table, SQL Server applies Table Spool
operation ( the most resource-consuming in our case).

This operator scans the
input table and copies each row into a hidden spool table. The table is stored
in the tempdb database and exists only for the lifetime of the query. This
operator allows the next ones to read data from the buffer instead of the
original table.

To avoid the table
spool reading, let’s try to rewrite this query by using ROW_NUMBER:

        --Example #2


       SELECT rn = ROW_NUMBER() OVER (

                                 PARTITION BY [user_id]

                                 ORDER BY login_date DESC


       FROM @logins

) t

WHERE t.rn > 1 

The execution plan got
simpler and that means that the query will be executed faster. However now we
have another problem – the most resource consuming operation is sorting, which
is used for grouping data by users.

The probability that
the same user simultaneously logs in in the system several times is very low,
so we can drop IDENTITY column in favor of a composite primary key:

        --Modified table



         [user_id] INT

       , login_date DATETIME

       , PRIMARY KEY CLUSTERED ([user_id], login_date DESC)


If we turn to the
theory, we remember that a cluster index, which sorts table rows, is based on
primary key columns. We have also simplified the
initial table by removing the unnecessary column.

Execution of query #2
with the modified table has given us the following
execution plan:

It’s easy to notice
that the primary key modification allows us to omit sorting operator and makes
the query execution faster:

Now let’s check which
one of the examples performs the formulated task the most efficiently:

To sum it up, you only
need few things to see whatmight not be obvious from the first sight and
as a result get a chance to significantly accelerate query execution and
simplify database structure:

– some theoretical

– analyze query plan
with profiler (for example, the one which is implemented in the dbForge
Studio for SQL Server

– use a combo of different


No comments yet... Be the first to leave a reply!