When dealing with the performance database any detail should can never be neglected, because even the smallest thing can turn out to be a great opportunity for query optimization. Generally, one needs a significant amount of experience to detect that kind of details easily. But what shall the rest do till they gain the required knowledge?
Inthis case, the salvation are specialized products,that enable usto carry out a deep analysis of query execution plans and find bottlenecks in them. Query Profiler, present indbForge Studio for SQLServer,is one of the tools that will help you with those tasks.
Let’s assume that we have a table with login history and we need to delete duplicates herewith keeping the latest login for every user.
--Initial table DECLARE @logins TABLE ( login_id INT IDENTITY(1,1) PRIMARY KEY , [user_id] INT , login_date DATETIME ) INSERT INTO @logins ([user_id], login_date) VALUES (1, '20130911 14:33:29'), (1, '20130912 15:36:39'), (2, '20130914 14:13:03'), (2, '20130917 05:56:18'), (3, '20130910 16:30:29'), (1, '20130908 18:25:39')
A standard solution for this task is to get a list with the maximal ID for each user and delete the records that were not included in the list:
--Example #1 DELETE FROM @logins WHERE login_id NOT IN ( SELECT MAX(login_id) FROM @logins GROUP BY [user_id] )
Let’shave a look at the execution plan:
Actually we refer to the table two times. Before the deletion, to avoid several scans of the same data from the initial table, SQL Server applies Table Spool operation ( the most resource-consuming in our case).
This operator scans the input table and copies each row into a hidden spool table. The table is stored in the tempdb database and exists only for the lifetime of the query. This operator allows the next ones to read data from the buffer instead of the original table.
To avoid the table spool reading, let’s try to rewrite this query by using ROW_NUMBER:
--Example #2 DELETE t FROM ( SELECT rn = ROW_NUMBER() OVER ( PARTITION BY [user_id] ORDER BY login_date DESC ) FROM @logins ) t WHERE t.rn > 1
The execution plan got simpler and that means that the query will be executed faster. However now we have another problem – the most resource consuming operation is sorting, which is used for grouping data by users.
The probability that the same user simultaneously logs in in the system several times is very low, so we can drop IDENTITY column in favor of a composite primary key:
--Modified table DECLARE @logins TABLE ( [user_id] INT , login_date DATETIME , PRIMARY KEY CLUSTERED ([user_id], login_date DESC) )
It’s easy to notice that the primary key modification allows us to omit sorting operator and makes the query execution faster:
Now let’s check which one of the examples performs the formulated task the most efficiently:
To sum it up, you only need few things to see whatmight not be obvious from the first sight and as a result get a chance to significantly accelerate query execution and simplify database structure:
– some theoretical knowledge
– analyze query plan with profiler (for example, the one which is implemented in the dbForge Studio for SQL Server)
– use a combo of different techniques.]]>