SQL Server 2008 - Worth the Wait
Consider the Pubs sample database. Our task is to determine the third, but last date when one employee joined the company. Several approaches are possible here. Let's first have a look at the different methods available to us, before getting into a basic performance analysis.
1) Using TOP. This is probably the most intuitive one.
SELECT TOP 1 hire_dateFROMemployeeWHEREhire_date NOT IN(SELECT TOP 2 hire_dateFROMemployeeORDER BYhire_date DESC)ORDER BYhire_date DESChire_date ------------------------------------------------------ 1994-01-19 00:00:00.000(1 row(s) affected)
Not much explanation needed here. The NOT IN rules out the TOP 2 hire dates. And from the remaining row we take the TOP 1 hire date. This is a straightforward approach.
2) Here we us the SQL Server feature to assign the value of a variable to the last row processed.
DECLARE @dt DATETIMESELECT TOP 3@dt = hire_dateFROMemployeeORDER BYhire_date DESCSELECT @dt hire_datehire_date ------------------------------------------------------ 1994-01-19 00:00:00.000(1 row(s) affected)
The variable @dt is assigned to every row in the resultset. But since we force an ORDER BY, the last row processed contains the date we are interested in and @dt is assigned this value. We now only need to SELECT the variable to get the desired result.
3) Use a temporary table. Below, we show a generic approach that uses a stored procedure that accepts the desired row as an input parameter and returns the corresponding hire date:USE PUBSGOCREATE PROC dbo.GetNthLatestEntry (@NthLatest INT)AS SET NOCOUNT ONBEGINCREATE TABLE #Entry (ID INT PRIMARY KEY NOT NULL IDENTITY(1,1), Entry DATETIME NOT NULL)INSERT INTO #Entry (Entry) SELECT hire_date FROM employee ORDER BY hire_date DESCSELECT Entry hire_dateFROM #Entry WHERE ID = @NthLatestDROP TABLE #EntryENDSET NOCOUNT OFFGOEXEC dbo.GetNthLatestEntry 3DROP PROCEDURE dbo.GetNthLatestEntryhire_date ------------------------------------------------------ 1994-01-19 00:00:00.000
4) Until now we always used either this or that proprietary feature of SQL Server. Either TOP or the IDENTITY property. Now we try to make this portable and use ANSI SQL.
SELECT e1.hire_dateFROM employee AS e1INNER JOIN employee AS e2ON e1.hire_date <= e2.hire_dateGROUP BY e1.hire_dateHAVING COUNT(DISTINCT e2.hire_date) = 3hire_date ------------------------------------------------------ 1994-01-19 00:00:00.000(1 row(s) affected)
If you are interested in how this statement works, we suggest you have a look at the books by Joe Celko.
So, we now have four different methods to get the same result. Which should we choose? Well, the classical answer here: It depends! If your goal is to make your SQL as portable as possible, you will surely choose the ANSI SQL method. If you, however, do not bother about portability, you still have three different methods to choose from. Let's now have a look at the output of SET STATISTICS IO ON. The results below correspond to the four methods described above.
Table 'employee'. Scan count 4, logical reads 8, physical reads 0, read-ahead reads 0.
Table 'employee'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0.
Table 'employee'. Scan count 44, logical reads 88, physical reads 0, read-ahead reads 0.
As you can see, one method clearly differs from all the others. This is the ANSI SQL method. Portability has its price. The first method was the TOP method. It creates 4 times the IO of the other 2 methods. Though it is logical IO, it still is IO. So, the choice now is between the temp table approach and the variable assignment approach. A choice here might be dependent on how busy your whole system is. The use of temp tables might cause issues in tempdb. So, for such simple questions, using the variable assignment method seems to be a fairly reasonable choice. Running Profiler to measure the duration here, is not very meaningful, since the employee table all in all has just 43 rows. So every method is executed very fast. On larger table it is a good practice to build up a test scenario to see how the different methods perform in your specific environment.