select count (distinct email) from table | SQL Server Performance Forums

SQL Server Performance Forum – Threads Archive

select count (distinct email) from table

I have a table with 20 million rows and this table grows by abt 2 million records/month. Periodically, I’m asked to count the number of distinct email addresses we have in this big table. How can I do it so that I don’t bring down the server? there’s a non-clustered index on the email field. when I do, select count(distinct email) from table CPU shoots up to 100%.
Try to see execution plan, may be there is a loss statistics
Luis Martin
Moderator
SQL-Server-Performance.com All postings are provided “AS IS” with no warranties for accuracy.
Also, if there number of distinct emails is significantly less than the number of records, you may want to consider normalizing the data and putting the distinct emails in another table with a link from the big table.
Yo what’s up tusanto! When running this query how are the disk reads? How fast are the processor(s)? Yes, normilizing you data would reduce overhead.
Also WITH (NOLOCK) should help a bit and I don’t think it would change much your results.
I think the issue is lot of data and I/O. So I would second what Cheris has suggested. When inserting or updating the main table, check from the distict table if this is a new email id and if it is, store it in the distinct table. Preprocessing Vs Postprocessing – COST REMAINS THE SAME. SOMEBODY HAS TO PAY IT. Gaurav
Moderator
Man thrives, oddly enough, only in the presence of a challenging environment- L. Ron Hubbard

The views expressed here are those of the author and no one else. There are no warranties as to the reliability or accuracy of anything presented here.
yo yo yo..lazy dBA..it’s friday..!
OK..thanks everyone for your suggestions… yes, I used with (nolock). I’ll think about storing the distinct emails in a separate table. They are always asking me for count of this and that grouped by this and that. I have a feeling all these requests coming from upper management suggests it’s time for us to get a real data warehouse.

<blockquote id="quote"><font size="1" face="Verdana, Arial, Helvetica" id="quote">quote:<hr height="1" noshade id="quote"><i>Originally posted by tsusanto</i><br /><br />yo yo yo..lazy dBA..it’s friday..!<br /><hr height="1" noshade id="quote"></blockquote id="quote"></font id="quote"><br /><br />Yes, it’s party night here with the [<img src=’/community/emoticons/emotion-2.gif’ alt=’:D‘ />]MAMACITAS!!!!!![<img src=’/community/emoticons/emotion-2.gif’ alt=’:D‘ />] Have a great weekend everyone!!!
How about archiving of rows based on a period?
This way you can avoid running resourceful queries against live database.
quote:Originally posted by tsusanto I have a table with 20 million rows and this table grows by abt 2 million records/month.
when I do, select count(distinct email) from table CPU shoots up to 100%.

Satya SKJ
Moderator
http://www.SQL-Server-Performance.Com/forum
This posting is provided “AS IS” with no rights for the sake of knowledge sharing.
]]>