Site sponsored by: Idera Try Idera’s new SQL admin toolset
SQL Server Performance

  • Home
  • Articles
  • Forums
  • Tips
  • Quiz
  • FAQ's
  • Blogs
  • Software
  • Books
  • About Us
RSS Feeds
Sign in | Join


Article Topics

All Articles
Performance Tuning
Audit
Business Intelligence
Clustering
Reporting Services
Developer
General DBA
ASP.NET / ADO.NET

Write for Us

Share you SQL Server knowledge with others and raise your profile in the community More...
Latest Articles

Policy Based Management in SQL Server 2008
Inside SQL Server Cluster Setup and Troubleshooting Techniques - Part I ...
Configure and Manage Policy Based Management in SQL Server 2008 ...
Using Column Sets with Sparse Columns

More     
 
Latest FAQ's

Cannot Start SQL Server Service
Users are able to connect to report manager but not able ...
Errors when SQL Server Snapshot Replication is Running
How to Display Server Name or IP Address in a Reporting ...

More     
   
Latest Software Reviews

Spotlight on ApexSQL Doc 2008
ApexSQL Enforce
Embarcadero Change Manager
SQL Server DBA Dashboard

More     

articles >> performance tuning >> Techniques for Indexing Low-Selectivity Columns in SQL ...

Techniques for Indexing Low-Selectivity Columns in SQL Server

By : Merrill Aldrich
Jun 09, 2006

Page 2 / 4

This method will probably produce some duplicates, but in the context of what we are testing, it makes no difference. (The technique for random selection is derived from one posted on Pete Freitag's Web site and elsewhere.)

We should end up with a distribution where about 25% of the rows in the table have both "flags" set to 1. It's possible to verify that with a query like this:

Select

     ( select count(*) from basetable ) as Total,

     ( select ( select count(*) from baseTable where validAddr = 1 )* 100.0

      / ( select count(*) from basetable ) ) as PercentValidAddr,

     ( select ( select count(*) from baseTable where getsMail = 1 )* 100.0

      / ( select count(*) from basetable ) ) as PercentGetsMail,

     ( select ( select count(*) from baseTable where getsMail = 1 AND validAddr = 1 )* 100.0

      / ( select count(*) from basetable ) ) as PercentBoth

We have enough rows, at 5 million, to hit the 50% / 50% / 25% almost exactly, just through probability.



Measure Baseline Query Performance

The first step in optimizing this is to get a baseline for the query. For this, I use the Display Execution Plan option in Management Studio and

set statistics io on

GO


select fname, lname from baseTable where validAddr = 1 and getsMail = 1

As expected, this results in a Clustered Index Scan, execution plan cost = 25.4, reads = 21561. The database is examining the entire table to fish out the 25% that match our criteria. It seems like there is some extra work going on here that ought not to be required, but it's tricky to avoid that extra work.



Indexing

So what about a conventional index? We suspect it will not help, because it would not be selective enough, but it's important to really test that assumption:

create nonclustered index IX_Flags on baseTable( validAddr, getsMail )

GO


select fname, lname from baseTable where validAddr = 1 and getsMail = 1

There's no advantage here. SQL Server still scans the table, at the same cost; as we suspected, the index is not selective enough to be of any use, and we are asking for too many rows for it to be practical to look them up using an index.

drop index baseTable.IX_Flags

Next, it may be possible to create a covering index instead, ordered such that all the data for the records we are interested in is located together in one range. There is a challenge that makes this a less attractive option: the index will be essentially as large as the table itself, so we've doubled, or nearly doubled, our storage requirement. Ordered correctly, such an index would group the records we care about together, but we would be storing all of them when we are actually interested in only 25%. If we use a covering index, ideally we'd want to index only a subset of the rows rather than all of them. But that's not possible with a conventional index.



<< Prev Page     Next Page>>    








Home | Peformance Articles | Audit Articles | Business Intelligence Articles | Clustering Articles | Developer Articles | Reporting Services Articles | DBA Articles | ASP.NET / ADO.NET Articles | DBA FAQ's | Developer Peformance FAQ's | DBA Peformance FAQ's | Developer FAQ's | Clustering FAQ's | Error Messages | Audit Tool Reviews | Backup Tool Reviews | Coding Tool Reviews | Compare Tool Reviews | Documentation Tool Reviews | Design Tool Reviews | Monitoring Tool Reviews | Log Tool Reviews | Reporting Tool Reviews | Clustering Tool Reviews | Security Tool Reviews | Change Management Tool Reviews | Remote Access Tool Reviews | Book Reviews | Security Tool Reviews | QDPMA Performance Tuning | ADO.NET / ASP.NET | Administration | Analysis/OLAP Services | Application Development | Configuration | Components | ETL | Hardware | High Availability | Hints | Index | Misc | Operating Systems | Performance Tuning | Replication | T-SQL | Views


              © 1999-2008 by T10 Media. All rights reserved