Training Videos
Write for Us
As always, there is more than one way to get a correct result. But which way gives you the best performance? Let's first build a simplified test scenario to illustrate what we're after.
CREATE TABLE #t (ArticleGroup CHAR(2), Price DECIMAL(8,2)) INSERT INTO #t VALUES('SP', 1.2) INSERT INTO #t VALUES('SP', 1.2) INSERT INTO #t VALUES('SP', 2.1) INSERT INTO #t VALUES('GR', 2.2) INSERT INTO #t VALUES('GR', 2.3) INSERT INTO #t VALUES('GR', 2.3) INSERT INTO #t VALUES('GR', 2.3)
According to our task the resultset should look like this:
ArticleGroup Price Number ------------ ---------- ----------- SP 1.20 2 GR 2.30 3 (2 row(s) affected)
Now how to achieve that result?
1. Approach
SELECT ArticleGroup, Price, COUNT(*) AS Number FROM #t a GROUP BY ArticleGroup, Price HAVING COUNT(*) >= (SELECT MAX(Number) FROM (SELECT ArticleGroup, Price, COUNT(*) AS Number FROM #t b WHERE b.ArticleGroup = a.ArticleGroup GROUP BY ArticleGroup, Price) c)
2. Approach
SELECT ArticleGroup,Price, COUNT(*) Number FROM #t a GROUP BY ArticleGroup,Price HAVING NOT EXISTS (SELECT * FROM #t b WHERE b.ArticleGroup = a.ArticleGroup GROUP BY b.Price HAVING COUNT(b.ArticleGroup) > COUNT(a.ArticleGroup))
When you compare the execution plans, you'll find a "Lazy Spool" in the second approach. Usually this is a bad sign and you should try to restate the query to work around that logical operator. The way to do this is by restating the first approach:
3. Approach
SELECT ArticleGroup, Price, COUNT(*) AS Number FROM #t t GROUP BY ArticleGroup, Price HAVING COUNT(*) = (SELECT MAX(Number) FROM (SELECT ArticleGroup, Price, COUNT(*) AS Number FROM #t GROUP BY ArticleGroup, Price) x WHERE ArticleGroup = t.ArticleGroup)
So, now that we have our contestants, let's see how they perform on a representative table. The table contains 250,000 rows, has 8 ArticleGroups with 358 different prices. Without any indexing on the table, you get the following runtimes:
1. Approach: 5.390 Milliseconds 2. Approach: 19.033 Milliseconds 3. Approach: 436 Milliseconds
If you now place an index on (ArticleGroup, Price) we can observe the following runtimes:
1. Approach: 153 Milliseconds 2. Approach: 500 Milliseconds 3. Approach: 110 Milliseconds
You'll see that the third approach is in both cases the fastest one. But what is even more impressive is how much the "non-optimal" first and second approaches can gain from the index.
Just for completeness: you can replace MAX() with various TOP methods:
SELECT ArticleGroup, Price, COUNT(*) AS Number FROM #t a GROUP BY ArticleGroup, Price HAVING COUNT(*) = (SELECT TOP 1 COUNT(*) FROM #t WHERE a.ArticleGroup = ArticleGroup GROUP BY ArticleGroup, Price ORDER BY COUNT(*) DESC)
Or:
SELECT ArticleGroup, Price, COUNT(*) AS Number FROM #t t GROUP BY ArticleGroup, Price HAVING COUNT(*) = (SELECT TOP 1 Number FROM (SELECT TOP 100 PERCENT ArticleGroup, Price, COUNT(*) AS Number FROM #t GROUP BY ArticleGroup, Price ORDER BY Number DESC) x WHERE ArticleGroup = t.ArticleGroup)
These will also yield the correct results but they don't give any significant performance benefit.
When you're on SQL Server 2005, you can take advantage of its new ranking functions:
SELECT ArticleGroup, Price, Cnt FROM (SELECT ArticleGroup, Price, COUNT(*) as Cnt, RANK() OVER (PARTITION BY ArticleGroup ORDER BY COUNT(*) DESC) AS Rnk FROM #t GROUP BY ArticleGroup, Price ) x WHERE Rnk = 1