Ranking Functions and Performance in SQL Server 2005

Then I tested the INSERT and DELETE commands, using ranking functions with and without sorting (Listing 7 and Listing 8).

Listing 7. Performance tests 1 (Inserts, using SELECT …INTO).
— Query 1: Using ORDER BY orderID.
IF EXISTS (SELECT * FROM sys.objects
WHERE object_id = OBJECT_ID(N'[dbo].RankingFunctionsInserts’) AND type in (N’U’))
DROP TABLE RankingFunctionsInserts;
GO
SELECT ROW_NUMBER () OVER (ORDER BY OrderID) AS rowNum, OrderID
     INTO RankingFunctionsInserts
     FROM RankingFunctions;

— Drop table RankingFunctionsInserts and run Query 2.
— Query 2: Without sorting.
SELECT ROW_NUMBER () OVER (ORDER BY (SELECT 1)) AS rowNum, OrderID
     INTO RankingFunctionsInserts
     FROM RankingFunctions;

— Drop table RankingFunctionsInserts and run Query 3.
— Query 3: Using a pre-2005 solution.
SELECT IDENTITY(int,1,1) AS rowNum, orderID
     INTO RankingFunctionsInserts
     FROM RankingFunctions;

Each of the three queries in Listing 7 inserts the generated row number and orderID into the RankingFunctionsInserts table, using the SELECT…INTO statement. (This technique is very helpful when you trying to create pseudo-arrays in SQL.)

For the sake of curiosity, I tested a solution with an IDENTITY column (Query 3). That solution is very common in pre-2005 versions of SQL Server.

Listing 8. Performance tests 2 (Delete every fifth row in the RankingFunctions table).
— Query 1: Without sorting.
— Run the script from Listing 6 to insert 2,621,440 rows into RankingFunctions.
WITH originalOrder AS
(SELECT ROW_NUMBER ( ) OVER (ORDER BY (SELECT 1)) AS rowNum, OrderID
     FROM RankingFunctions)
DELETE originalOrder WHERE rowNum%5 = 0;

— Query 2: With ORDER BY OrderID.
— Run the script from Listing 6 to insert 2,621,440 rows into RankingFunctions.
WITH originalOrder AS
(SELECT ROW_NUMBER ( ) OVER (ORDER BY OrderID) AS rowNum, OrderID
     FROM RankingFunctions)
DELETE originalOrder WHERE rowNum%5 = 0;

Deleting every Nth row or duplicates in the table are common tasks for a DBA or database programmer. In Listing 8, I used CTE to delete every fifth row in the RankingFunctions table.

Test Results

Here are the results that I got on a regular Pentium 4 desktop computer with 512 MB RAM running Windows 2000 Server and Microsoft SQL Server 2005 Developer Edition:

As you can see, the ROW_NUMBER() function works much faster without sorting. It also performs better than the IDENTITY solution, which is unsorted as well.

The RANK() and DENSE_RANK() functions, as we found earlier, don’t work properly without sorting. NTILE() shows a very small improvement, about 10 percent. This is can be explained.

As I mentioned earlier, the optimizer is using Nested Loops to implement the NTILE() function. For large data sets, without the indexes (as in our case), Nested Loops can be very inefficient. However, you will find that they are inexpensive in the execution plan (see Figure 6), because sorting helps to make Nested Loops lighter.

When sorting is missing (see Figure 7), the Nested Loops become much heavier and almost “eat” the performance gains that you achieve by avoiding sorting.

How Indexes Can Help

As you know, all the pages of non-clustered indexes, and the intermediate-level pages of clustered indexes, are linked together and sorted in key sequence order. The leaf-level of a clustered index consists of data pages that are physically sorted in the same key sequence order as the clustered index key. All that means is that you already store some part(s) of your table’s data in a particular order. If your query can use that sorted data — and this is what happens when you have a covering index — you will increase the performance of your query dramatically.

Take any table with many columns and rows (or create and populate one using the technique from Listing 6). Then create different indexes and test the ranking functions. You will find that for covered queries the optimizer won’t use a Sort operator. This is what makes the ranking function as fast as, or even faster than, the functions with an expression in an “order by” clause.

Conclusion

This article explains ranking functions and helps you understand how they work. The techniques shown here, in some situations, can increase the performance of ranking functions 3-5 times. In addition, this article discusses some common trends in the behavior of an ORDER BY clause with expressions.

]]>

Leave a comment

Your email address will not be published.

ROW_NUMBER()

RANK()

DENSE_RANK()

NTILE(3)

INSERT 2,621,440 rows

without sorting

5 sec.

N/A

N/A

35 sec.

with sorting

14 sec.

14 sec.

14 sec.

40 sec.

with IDENTITY

8 sec.

N/A

N/A

N/A

DELETE each 5th row

without sorting

5 sec.

with sorting

24 sec.