Ranking Functions and Performance in SQL Server 2005

Ranking functions, introduced in SQL Server 2005, are a great enhancement to Transact-SQL. Many tasks, like creating arrays, generating sequential numbers, finding ranks, and so on, which in pre-2005 versions requires many lines of code, now can be implemented much easier and faster.

Let’s look at the syntax of ranking functions:

ROW_NUMBER () OVER ([<partition_by_clause>] <order_by_clause>)
RANK () OVER ([<partition_by_clause>] <order_by_clause>)
DENSE_RANK () OVER ([<partition_by_clause>] <order_by_clause>)
NTILE (integer_expression) OVER ([<partition_by_clause>] <order_by_clause>)

All four functions have “partition by” and “order by” clauses and that makes these functions very flexible and useful. However, there is one nuance in syntax that deserves your attention: the “order by” clause is not an option.

Why should you worry about the “order by” clause?

Well, as a DBA or database programmer you know that sorting is a fairly expensive operation in terms of time and resources. And if you were forced to use it always, even in a situation where you didn’t need it, you could expect degradation of performance, especially in large databases.

Is it possible to avoid sorting in ranking functions? If possible, how would it improve performance?

Let’s try to answer these questions.

How to Avoid Sorting in Ranking Functions

Create a sample table (Listing 1):

— Listing 1. Create a sample table.
CREATE TABLE RankingFunctions(orderID int NOT NULL);
INSERT INTO RankingFunctions VALUES(7);
INSERT INTO RankingFunctions VALUES(11);
INSERT INTO RankingFunctions VALUES(4);
INSERT INTO RankingFunctions VALUES(21);
INSERT INTO RankingFunctions VALUES(15);

Run the next query with the ROW_NUMBER() function:

SELECT ROW_NUMBER () OVER (ORDER BY orderID) AS rowNum, orderID
FROM RankingFunctions;

If you check the execution plan for that query (see Figure 1), you will find that the Sort operator is very expensive and costs 78 percent.

Run the same query, leaving the OVER() clause blank:

SELECT ROW_NUMBER () OVER () AS rowNum, orderID
FROM RankingFunctions;

You will get an error:

Msg 4112, Level 15, State 1, Line 1
The ranking function “row_number” must have an ORDER BY clause.

Since the parser doesn’t allow you to avoid the “order by” clause, maybe you can force the query optimizer to stop using the Sort operator. For example, you could create a computed column that consists of a simple integer, 1, and then use that virtual column in the “order by” clause (Listing 2):

— Listing 2. ORDER BY computed column.
— Query 1: Using derived table.
SELECT ROW_NUMBER () OVER (ORDER BY const) AS rowNum, orderID
     FROM (SELECT orderID, 1 as const
          FROM RankingFunctions) t1
GO
— Query 2: Using common table expression (CTE).
WITH OriginalOrder AS
(SELECT orderID, 1 as const
     FROM RankingFunctions)
SELECT ROW_NUMBER () OVER (ORDER BY const) AS rowNum, orderID
     FROM OriginalOrder;

If you check the execution plans now (see Figure 2), you will find that query optimizer doesn’t use the Sort operator anymore. Both queries will generate the row numbers and return the orderID values in the original order.

RowNum	orderID
1	7
2	11
3	4
4	21
5	15

There is a small problem with the queries in Listing 2 — they need time (resources) to create and populate the virtual column. As a result, the performance gains that you achieve by avoiding the sort operation may disappear when you populate the computed column. Is there any other way to skip the sort operation?

Continues…

How to Avoid Sorting in Ranking Functions

Leave a comment Cancel reply