SQL Server Performance

  • Home
  • Articles
  • Forums
  • Tips
  • Training
  • FAQ's
  • Blogs
  • Software
  • Books
  • About Us
RSS Feeds
Sign in | Join


FAQ Topics

All FAQ's
General DBA
General Developer
DBA Performance Tuning
Developer Performance Tuning
Clustering
Error Messages

Training Videos

Check out our new SQL Server Training Videos section More...

Write for Us

Share your SQL Server knowledge with others and raise your profile in the community More...
Latest Articles

A High Level Comparison Between Oracle and SQL Server - Part ...
A High Level Comparison Between Oracle and SQL Server
SQL Server 2008 R2 Multi-server Administration - A First Look ...
An overview of Master Data Services - MDS in SQL Server ...

More     
 
Latest FAQ's

Add Node to A SQL Server failover Cluster failed with invalid ...
SQL Server Destination remote server error
Setting Up Data And Log Files For SQL Server
Will Check Constraints Improve Database Performance?

More     
   
Latest Software Reviews

dbForge Review
Spotlight on ApexSQL Diff - Server-based database comparison tool ...
Spotlight on ApexSQL Data Diff - Server-based database comparison tool ...
Spotlight on ApexSQL Doc 2008

More     

How do I find the most frequently occurring value for all data in a table? How do I find that value per group?



As always, there is more than one way to get a correct result. But which way gives you the best performance? Let's first build a simplified test scenario to illustrate what we're after.

CREATE TABLE #t (ArticleGroup CHAR(2), Price DECIMAL(8,2))
INSERT INTO #t VALUES('SP', 1.2)
INSERT INTO #t VALUES('SP', 1.2)
INSERT INTO #t VALUES('SP', 2.1)
INSERT INTO #t VALUES('GR', 2.2)
INSERT INTO #t VALUES('GR', 2.3)
INSERT INTO #t VALUES('GR', 2.3)
INSERT INTO #t VALUES('GR', 2.3)

According to our task the resultset should look like this:

ArticleGroup Price      Number
------------ ---------- -----------
SP           1.20       2
GR           2.30       3

(2 row(s) affected)

Now how to achieve that result?

1. Approach

SELECT ArticleGroup, Price, COUNT(*) AS Number
  FROM #t a
 GROUP BY ArticleGroup, Price
  HAVING COUNT(*) >=
   (SELECT MAX(Number)
      FROM
      (SELECT ArticleGroup, Price, COUNT(*) AS Number
         FROM  #t b
        WHERE b.ArticleGroup = a.ArticleGroup
         GROUP BY ArticleGroup, Price) c)

ArticleGroup Price      Number
------------ ---------- -----------
SP           1.20       2
GR           2.30       3

(2 row(s) affected)

2. Approach

SELECT ArticleGroup,Price, COUNT(*) Number
  FROM #t a
 GROUP BY ArticleGroup,Price
  HAVING NOT EXISTS
   (SELECT *
      FROM #t b
     WHERE b.ArticleGroup = a.ArticleGroup
     GROUP BY b.Price
     HAVING COUNT(b.ArticleGroup) > COUNT(a.ArticleGroup))

ArticleGroup Price      Number
------------ ---------- -----------
SP           1.20       2
GR           2.30       3

(2 row(s) affected)

When you compare the execution plans, you'll find a "Lazy Spool" in the second approach. Usually this is a bad sign and you should try to restate the query to work around that logical operator. The way to do this is by restating the first approach:

3. Approach

SELECT ArticleGroup, Price, COUNT(*) AS Number
  FROM #t t
 GROUP BY ArticleGroup, Price
  HAVING COUNT(*) =
  (SELECT MAX(Number)
     FROM
      (SELECT ArticleGroup, Price, COUNT(*) AS Number
         FROM #t
        GROUP BY ArticleGroup, Price) x
         WHERE ArticleGroup = t.ArticleGroup)

ArticleGroup Price      Number
------------ ---------- -----------
SP           1.20       2
GR           2.30       3

(2 row(s) affected)

So, now that we have our contestants, let's see how they perform on a representative table. The table contains 250,000 rows, has 8 ArticleGroups with 358 different prices. Without any indexing on the table, you get the following runtimes:

1. Approach: 5.390 Milliseconds
2. Approach: 19.033 Milliseconds
3. Approach: 436 Milliseconds

If you now place an index on (ArticleGroup, Price) we can observe the following runtimes:

1. Approach: 153 Milliseconds
2. Approach: 500 Milliseconds
3. Approach: 110 Milliseconds

You'll see that the third approach is in both cases the fastest one. But what is even more impressive is how much the "non-optimal" first and second approaches can gain from the index.

Just for completeness: you can replace MAX() with various TOP methods:

SELECT ArticleGroup, Price, COUNT(*) AS Number
  FROM #t a
 GROUP BY ArticleGroup, Price
  HAVING COUNT(*) =
   (SELECT TOP 1 COUNT(*)
      FROM #t
     WHERE a.ArticleGroup = ArticleGroup
     GROUP BY ArticleGroup, Price
     ORDER BY COUNT(*) DESC)

Or:

SELECT ArticleGroup, Price, COUNT(*) AS Number
  FROM #t t
 GROUP BY ArticleGroup, Price
  HAVING COUNT(*) =
  (SELECT TOP 1 Number
     FROM
      (SELECT TOP 100 PERCENT ArticleGroup, Price, COUNT(*) AS Number
         FROM #t
        GROUP BY ArticleGroup, Price
         ORDER BY Number DESC) x
        WHERE ArticleGroup = t.ArticleGroup)

These will also yield the correct results but they don't give any significant performance benefit.

When you're on SQL Server 2005, you can take advantage of its new ranking functions:

SELECT ArticleGroup, Price, Cnt
  FROM
  (SELECT ArticleGroup, Price, COUNT(*) as Cnt,
   RANK() OVER (PARTITION BY ArticleGroup
   ORDER BY COUNT(*) DESC) AS Rnk
  FROM #t
  GROUP BY ArticleGroup, Price
  ) x
 WHERE Rnk = 1








C# Help and Tutorials | PHP MySQL Tutorial | Sharepoint Tutorial | Azure Tutorial | Cloud Hosting Magazine | ASP.NET Tutorials | Windows Server Help | Windows Phone Pro | Silverlight Ace | Visual Studio Tutorials | Home | Peformance Articles | Audit Articles | Business Intelligence Articles | Clustering Articles | Developer Articles | Reporting Services Articles | DBA Articles | ASP.NET / ADO.NET Articles | SQL Server Training Videos | DBA FAQ's | Developer Peformance FAQ's | DBA Peformance FAQ's | Developer FAQ's | Clustering FAQ's | Error Messages | Audit Tool Reviews | Backup Tool Reviews | Coding Tool Reviews | Compare Tool Reviews | Documentation Tool Reviews | Design Tool Reviews | Monitoring Tool Reviews | Log Tool Reviews | Reporting Tool Reviews | Clustering Tool Reviews | Security Tool Reviews | Change Management Tool Reviews | Remote Access Tool Reviews | Book Reviews | Security Tool Reviews | ADO.NET / ASP.NET | Administration | Analysis/OLAP Services | Application Development | Configuration | Components | ETL | Hardware | High Availability | Hints | Index | Misc | Operating Systems | Performance Tuning | Replication | T-SQL | Views


              © 2010 Jude O'Kelly. All rights reserved