Performance Impact of Using SELECT COUNT(*) In Queries

I am not sure how this statement came into common usage, however, it seems to be regularly used by developers. Count(*) will use a table scan or Clustered index scan and so will eventually impair performance.

In the case of a large table, a table scan or clustered index scan will impair your query performance, but let’s examine whether count(*) will always scan through the entire table.

Take for example the Sales.Customer table in the AdventureWorks2012 database.

It has four indexes :

sp_helpindex '[Sales].[Customer]' will give you the index properties of the table.

From running, DBCC IND, the following are the page counts for each index.

DBCC IND ('Adventureworks2012', '[Sales].[Customer]', 1);

Index Id

Index Name

Column

Data Type

Page Count

1

PK_Customer_CustomerID

CustomerID

Int

123

2

AK_Customer_rowguid

rowguid

uniqueidentifier

66

3

AK_Customer_AccountNumber

AccountNumber

Varchar(10)

61

5

IX_Customer_TerritoryID

TerritoryID

int

37

Since PK_Customer_CustomerID is the clustered index, it contains the all the pages for the table which has the highest count. Of the other three indexes, IX_Customer_TerritoryID is the smallest index in terms of the number of pages.

Therefore, it should be much faster to use the IX_Customer_TerritoryID index.  Let us see the query plan for COUNT(*) .

So as expected, Count(*) is using least costly index which is the IX_Customer_TerritoryID not the Clustered Index Scan.

There is another important issue to bear in mind. Let us say, by the time you query COUNT(*), all the pages of the table are loaded into the cache. So rather than using the non-clustered index, it may be better to use the Clustered Index since they are already in the cache. However, this is not the case for SQL Server. No matter what is in the cache,  SQL Server will still evaluate using the least costly mechanism for COUNT(*).

DBCC DROPCLEANBUFFERS -- Clean the Cache
SET STATISTICS IO ON -- to display query stats
SELECT * FROM [Sales].[Customer]
SELECT Count(*) FROM [Sales].[Customer]

Let us verify the query plan.

This has not changed despite the all pages are in the cache.Note the query stats :

You can see , it has used physical reads which is the non-clustered index. Thus, when using COUNT(*) verify whether correct indexes are being used and do not believe that COUNT(*) will use Clustered index.




Related Articles :

7 Responses to “Performance Impact of Using SELECT COUNT(*) In Queries”

  1. Okay, good info. So what’s the solution? Would COUNT(Customer_ID) be better?

  2. Sebastiaan Halsema Reply January 23, 2013 at 7:29 pm

    Whenever I need to know the number of records in a table or query I always use

    Select Count(1) From …

    Not sure though if it really makes any difference in the used plan, never tested it either. But you won’t have to know the name of any column that way and the result will be the same of course (each row will be counted)

    • I use COUNT(1) as a holdover from my Oracle days, where it made a significant difference. I just tried SELECT(1), SELECT(*), and SELECT() on a table with 1.3M rows, and the explain plan showed 33% for each, so it may not make a difference to SQL Server.

  3. Ed, I think that if you’re just counting all the rows in a table, you could just look it up at sys.sysindexes. Join it with sys.tables. Take a look here: http://geekswithblogs.net/TakeNote/archive/2007/09/22/115537.aspx

    But in real world, i usually want to count customers by territory, or some other group. I don’t know of an alternative to that. Or if I’m actually interested in the total row count of a table, I can suffer the one-off impact of an index scan (as it would take more time to write a better query).

  4. You can use sp_spaceused to get a row count, too. It can report incorrect values for rows and space used, but a pretty lightweight way to find a row count.

    USE AdventureWorks2008R2;
    GO
    EXEC sp_spaceused N’Purchasing.Vendor’;
    GO

    More info: http://msdn.microsoft.com/en-us/library/ms188776(SQL.105).aspx

  5. This is the most efficient way that I have found, although it is obviously a little more time consuming to type. :-)

    SELECT i.rows
    FROM sys.tables t
    INNER JOIN sys.sysindexes i ON t.object_id = i.id AND i.indid < 2
    WHERE t.schema_id = SCHEMA_ID('’) AND t.name = ”;

    If you compare it to any flavor of SELECT COUNT() FROM ., it us the clear winner regardless of what is in the cache or not.

    • Hi Mark,
      Is the database statistics frequently updated (indexes are rebuiltO on your server so that the information returned in sys.sysindexes is reliable? I applied your suggestion in a system storing multiple terabytes and it’s working great.

      Thanks,
      Grant

Software Reviews | Book Reviews | FAQs | Tips | Articles | Performance Tuning | Audit | BI | Clustering | Developer | Reporting | DBA | ASP.NET Ado | Views tips | | Developer FAQs | Replication Tips | OS Tips | Misc Tips | Index Tuning Tips | Hints Tips | High Availability Tips | Hardware Tips | ETL Tips | Components Tips | Configuration Tips | App Dev Tips | OLAP Tips | Admin Tips | Software Reviews | Error | Clustering FAQs | Performance Tuning FAQs | DBA FAQs |