However, when you run the queries, the second result will be wrong:
Query 1 retrieves the correct result:
RankNum |
denseRankNum |
orderID |
1 |
1 |
4 |
1 |
1 |
4 |
1 |
1 |
4 |
4 |
2 |
7 |
5 |
3 |
11 |
5 |
3 |
11 |
7 |
4 |
15 |
8 |
5 |
21 |
Query 2 retrieves the wrong result:
rankNum |
denseRankNum |
orderID |
1 |
1 |
7 |
1 |
1 |
11 |
1 |
1 |
4 |
1 |
1 |
21 |
1 |
1 |
15 |
1 |
1 |
11 |
1 |
1 |
4 |
1 |
1 |
4 |
Even though the expressions in the “order by” clause help to skip sorting, they can’t be applied to the RANK() and DENSE_RANK() functions. Apparently, these ranking functions must have a sorted input to produce the correct result.
Now let’s look at the NTILE() function:
— Listing 5. NTILE() function with expressions in an ORDER BY clause.
— Query 1: ORDER BY orderID.
SELECT NTILE(3) OVER (ORDER BY orderID) AS NTileNum, orderID
FROM RankingFunctions;
GO
— Query 2: ORDER BY expression.
SELECT NTILE(3) OVER (ORDER BY (SELECT 1)) AS NTileNum, orderID
FROM RankingFunctions;
Analyzing the execution plans for both queries (see Figure 5), you will find that:
- The second query skips sorting, meaning the solution is working.
- The results of both queries are correct.
- The optimizer is using Nested Loops, which in some situations can be heavy.
Performance of Ranking Functions
Now, when you know how to avoid sorting in ranking functions you can test their performance.
Let’s insert more rows into the RankingFunctions table (Listing 6):
— Listing 6. Insert more rows into the RankingFunctions table.
IF EXISTS (SELECT * FROM sys.objects
WHERE object_id = OBJECT_ID(N'[dbo].[RankingFunctions]’) AND type in (N’U’))
DROP TABLE RankingFunctions
SET NOCOUNT ON
CREATE TABLE RankingFunctions(orderID int NOT NULL);
INSERT INTO RankingFunctions VALUES(7);
INSERT INTO RankingFunctions VALUES(11);
INSERT INTO RankingFunctions VALUES(4);
INSERT INTO RankingFunctions VALUES(21);
INSERT INTO RankingFunctions VALUES(15);
DECLARE @i as int, @LoopMax int, @orderIDMax int;
SELECT @i = 1, @LoopMax = 19;
WHILE (@i <= @LoopMax)
BEGIN
SELECT @orderIDMax = MAX(orderID) FROM RankingFunctions;
INSERT INTO RankingFunctions(OrderID)
SELECT OrderID + @orderIDMax FROM RankingFunctions;
SELECT @i = @i + 1;
END
SELECT COUNT(*) FROM RankingFunctions;
— 2,621,440.
UPDATE RankingFunctions
SET orderID = orderID/5
WHERE orderID%5 = 0;
The INSERT and SELECT parts of the INSERT…SELECT statement are using the same RankingFunctions table.
The number of generated rows can be calculated as:
generated rows number = initial rows number * power(2, number of loop iterations)
Since RankingFunctions initially has 5 rows and @LoopMax = 19, the number of generated rows will be:
5 * POWER(2,19) = 2,621,440
To increase the entropy in the row order, I changed (updated) the orderID values in the rows where orderID can be divided by 5 without the remainder.