SQL Server Performance

Count grouped rows

Discussion in 'General Developer Questions' started by jilia, Mar 16, 2007.

  1. jilia New Member

    Hello All,
    I have a question:

    Lets say you have a table with 10.000 rows. You can count them with :
    select count(*) from TableName.

    OK..

    Lets say you group these rows by a column (lets say column_name), and the grouped rows are 200.

    Is there a way to get this number with a query?

    Of course,
    select count(column_name) from TableName group by column_name
    returns 200 rows, each rows containing the number or same column_name...

    Is there a way to somehow return the number 200 ???

    Thanx a lot.
  2. ranjitjain New Member

    are you after @@rowcount?
  3. jilia New Member

    No, can you explain me?
  4. FrankKalis Moderator

    Ahem, you are aware that you can easily check this out in the SQL Server Books Online?

    --
    Frank Kalis
    Moderator
    Microsoft SQL Server MVP
    Webmaster:http://www.insidesql.de
  5. Adriaan New Member

    Easier than that: COUNT(DISTINCT &lt;column&gt<img src='/community/emoticons/emotion-5.gif' alt=';)' /><br /><br />select count(<b>distinct</b> column_name) from TableName group by column_name<br /><br />... at least if you are grouping by a single column. Also note that Null does not get counted if you use the DISTINCT keyword - in that case, @@ROWCOUNT is a better option.<br /><br />Another option:<br /><br />SELECT COUNT(*)<br />FROM (SELECT column_name FROM TableName GROUP BY column_name) X
  6. jilia New Member

    Thank you very much Adriaan! <br />I will use that 'Another option' style... <img src='/community/emoticons/emotion-1.gif' alt=':)' />
  7. FrankKalis Moderator

    May I say, that this "Another option" requires a second trip to the data, while @@ROWCOUNT is readily available?

    --
    Frank Kalis
    Moderator
    Microsoft SQL Server MVP
    Webmaster:http://www.insidesql.de
  8. Adriaan New Member

    Frank,

    Made a script of the two versions of the query together, and asked for the execution plan.

    The difference in relative query cost is just .03% higher for the SELECT COUNT(*) version, compared to @@ROWCOUNT.

    The execution plan for both versions starts with a Table scan and a Hash Match/Aggregate. The COUNT(*) version does add a Stream Aggregate and a Compute Scalar, but the cost for these two is estimated at 0%.

    So I'm not sure it requires a second trip. And I assume there is always a Stream Aggregate being calculated, where else would the @@ROWCOUNT value come from?
  9. FrankKalis Moderator

    Have you compared something like:


    SELECT COUNT(OrderID) FROM Northwind..Orders GROUP BY CustomerID
    SELECT @@ROWCOUNT

    and this


    SELECT COUNT(OrderID) FROM Northwind..Orders GROUP BY CustomerID
    SELECT COUNT(*) FROM (SELECT COUNT(OrderID) FROM Northwind..Orders GROUP BY CustomerID) AS x(a)

    The second batch definitely consists of two queries and requires 2x executing the SELECT COUNT(OrderID) FROM Northwind..Orders GROUP BY CustomerID part. As the data is already loaded into cache from the 1st execution it should be pretty fast, but nonetheless it needs to be executed, while the 1st batch automatically changes the value of the @@ROWCOUNT variable upon finishing.

    --
    Frank Kalis
    Moderator
    Microsoft SQL Server MVP
    Webmaster:http://www.insidesql.de
  10. Adriaan New Member

    No, that is not the syntax I was discussing.<br /><br />I mentioned COUNT(DISTINCT &lt;column_name&gt<img src='/community/emoticons/emotion-5.gif' alt=';)' />, and said that unlike COUNT(*) this did not include null, so the rowcount can be 1 less than what you get with a GROUP BY query.<br /><br />Check the results for the ShippedDate column, which contains nulls:<br /><br />SELECT COUNT(*) FROM Northwind..Orders GROUP BY ShippedDate<br />SELECT @@ROWCOUNT<br /><font color="blue">388</font id="blue"><br /><br />SELECT COUNT(DISTINCT ShippedDate) FROM Northwind..Orders<br /><font color="red">387</font id="red"><br /><br />SELECT COUNT(*) FROM<br />(SELECT ShippedDate FROM Northwind..Orders GROUP BY ShippedDate) AS x<br /><font color="blue">388</font id="blue"><br /><br /><br />
  11. FrankKalis Moderator

    Aah, now I see. I was just referring to your "Another option" remark. [<img src='/community/emoticons/emotion-1.gif' alt=':)' />]<br /><br />--<br />Frank Kalis<br />Moderator<br />Microsoft SQL Server MVP<br />Webmaster:<a target="_blank" href=http://www.insidesql.de>http://www.insidesql.de</a>
  12. Adriaan New Member

    Yes, it is the same syntax as the third query in my previous post, and reads from the table just once - caching or no caching, same difference!
  13. jilia New Member

    Great help guys! This small discussion about execution times...<br />Thanx a lot again <img src='/community/emoticons/emotion-1.gif' alt=':)' />
  14. Adriaan New Member

    Thanks also to Frank for showing something I didn't know - you can add the column aliases for a derived table outside of the derived table itself:<br /><br />FROM (&lt;derived_table&gt<img src='/community/emoticons/emotion-5.gif' alt=';)' /> AS &lt;table_alias&gt; (&lt;column_aliases&gt<img src='/community/emoticons/emotion-5.gif' alt=';)' /><br /><br />As I am something of a maniac when it comes to derived tables, this will come in handy from time to time.
  15. FrankKalis Moderator

    Not bad for a Friday. Time to go home now. [<img src='/community/emoticons/emotion-1.gif' alt=':)' />]<br /><br />--<br />Frank Kalis<br />Moderator<br />Microsoft SQL Server MVP<br />Webmaster:<a target="_blank" href=http://www.insidesql.de>http://www.insidesql.de</a>
  16. Adriaan New Member

    Already?

Share This Page