How do you determine the Nth row in a SQL Server database?

Consider the Pubs sample database. Our task is to determine the third, but last date when one employee joined the company. Several approaches are possible here. Let’s first have a look at the different methods available to us, before getting into a basic performance analysis.

 

1) Using TOP. This is probably the most intuitive one.

SELECT TOP 1
hire_date
FROM
employee
WHERE
hire_date
NOT IN(
SELECT TOP 2
hire_date
FROM
employee
ORDER BY
hire_date DESC)
ORDER BY
hire_date DESC hire_date
——————————————————
1994-01-19 00:00:00.000 (1 row(s) affected)

Not much explanation needed here. The NOT IN rules out the TOP 2 hire dates. And from the remaining row we take the TOP 1 hire date. This is a straightforward approach.

2) Here we us the SQL Server feature to assign the value of a variable to the last row processed.

DECLARE @dt DATETIME
SELECT TOP 3
@dt = hire_date
FROM
employee
ORDER BY
hire_date DESC
SELECT @dt hire_date hire_date
——————————————————
1994-01-19 00:00:00.000 (1 row(s) affected)

The variable @dt is assigned to every row in the resultset. But since we force an ORDER BY, the last row processed contains the date we are interested in and @dt is assigned this value. We now only need to SELECT the variable to get the desired result.

3) Use a temporary table. Below, we show a generic approach that uses a stored procedure that accepts the desired row as an input parameter and returns the corresponding hire date: USE PUBS
GO
CREATE PROC dbo.GetNthLatestEntry (@NthLatest INT)
AS
SET NOCOUNT ON
BEGIN
CREATE TABLE #Entry
(
ID INT PRIMARY KEY NOT NULL IDENTITY(1,1)
, Entry DATETIME NOT NULL
)
INSERT INTO #Entry (Entry) SELECT hire_date FROM employee ORDER BY hire_date DESC
SELECT
Entry hire_date
FROM
#Entry
WHERE
ID = @NthLatest
DROP TABLE #Entry
END
SET NOCOUNT OFF
GO
EXEC dbo.GetNthLatestEntry 3
DROP PROCEDURE dbo.GetNthLatestEntry hire_date
——————————————————
1994-01-19 00:00:00.000

 

4) Until now we always used either this or that proprietary feature of SQL Server. Either TOP or the IDENTITY property. Now we try to make this portable and use ANSI SQL.

SELECT
e1.hire_date
FROM
employee AS e1
INNER JOIN
employee AS e2
ON
e1.hire_date <= e2.hire_date
GROUP BY
e1.hire_date
HAVING COUNT(DISTINCT e2.hire_date) = 3 hire_date
——————————————————
1994-01-19 00:00:00.000 (1 row(s) affected)

If you are interested in how this statement works, we suggest you have a look at the books by Joe Celko.

So, we now have four different methods to get the same result. Which should we choose? Well, the classical answer here: It depends! If your goal is to make your SQL as portable as possible, you will surely choose the ANSI SQL method. If you, however, do not bother about portability, you still have three different methods to choose from. Let’s now have a look at the output of SET STATISTICS IO ON. The results below correspond to the four methods described above.

  1. Table ’employee’. Scan count 4, logical reads 8, physical reads 0, read-ahead reads 0.

  2. Table ’employee’. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0.

  3. Table ’employee’. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0.

  4. Table ’employee’. Scan count 44, logical reads 88, physical reads 0, read-ahead reads 0.

As you can see, one method clearly differs from all the others. This is the ANSI SQL method. Portability has its price. The first method was the TOP method. It creates 4 times the IO of the other 2 methods. Though it is logical IO, it still is IO. So, the choice now is between the temp table approach and the variable assignment approach. A choice here might be dependent on how busy your whole system is. The use of temp tables might cause issues in tempdb. So, for such simple questions, using the variable assignment method seems to be a fairly reasonable choice. Running Profiler to measure the duration here, is not very meaningful, since the employee table all in all has just 43 rows. So every method is executed very fast. On larger table it is a good practice to build up a test scenario to see how the different methods perform in your specific environment.

]]>

Leave a comment

Your email address will not be published.