Site sponsored by: Idera The gold standard of SQL Server performance monitoring & diagnostics.
SQL Server Performance

  • Home
  • Articles
  • Forums
  • Tips
  • Quiz
  • FAQ's
  • Blogs
  • Software
  • Books
  • About Us
RSS Feeds
Sign in | Join


Article Topics

All Articles
Performance Tuning
Audit
Business Intelligence
Clustering
Reporting Services
Developer
General DBA
ASP.NET / ADO.NET

Write for Us

Share your SQL Server knowledge with others and raise your profile in the community More...
Latest Articles

System Data Collection Reports
Recover Data Using Database Snapshots
Analyze and Fix Index Fragmentation in SQL Server 2008
Powerful Geographical Visualisations made easy with SQL 2008 Spatial (Part 2) ...

More     
 
Latest FAQ's

How to alter a User Defined Data Type?
How to unzip a File in SSIS?
How to view previous query plans?
ALTER TABLE SWITCH statement failed because the object '%.*ls' is not ...

More     
   
Latest Software Reviews

Spotlight on ApexSQL Doc 2008
ApexSQL Enforce
Embarcadero Change Manager
SQL Server DBA Dashboard

More     

articles >> performance tuning >> Techniques for Indexing Low-Selectivity Columns in SQL ...

Techniques for Indexing Low-Selectivity Columns in SQL Server

By : Merrill Aldrich
Jun 09, 2006

Page 4 / 4



Impact on Updates and Inserts

This technique will have some cost in terms of changes to the base table, so it's important to estimate how much, and whether that impact will be detrimental. The sample data I created happened to include about 150,000 rows having state = "MT", so to at least get some idea of the update performance hit, I performed some updates on those rows and compared the execution plans. This is not a comprehensive test of inserts and updates, but just a brief check.

First, with the indexed view, a change to the base table has to also be affected in the view's data. For a statement like this one:

update baseTable set getsMail = 0 where state = 'MT'

An update has to be performed against both the base table and the view content, and each will show up in the execution plan as a node labeled as a "clustered index update." It might seem odd, but that is what is happening: the view data is persisted, and has its own clustered index, so that clustered index has to be updated along with the table itself. The cost in this case is 28.79.

Next, I'll remove the view, and perform a similar update:

drop view completeSubset

GO


update baseTable set getsMail = 1 where state = 'MT'

The cost to perform this update without the indexed view is 24.70, which is slightly less than the cost when the view is present, as one would expect. But this difference does not seem like a significant impact compared to performance gain we achieved on select queries. If the workload on this hypothetical system favors selects over changes — and many systems do by even a factor of ten — then the benefits are probably worth it. Let me roughly generalize the costs above into a guesstimate, understanding that this is an oversimplification: Suppose that the select statements run five times as often as updates on our system. Our test provided a four-fold improvement in selection, multiplied by five is a 20x weighted gain in selection performance, and a relatively small 0.86x loss in update performance.



Conclusion

Conventional indexes are not much help when the data they index is not selective. However, if you have adequate storage space and data cache available, and the hit in insert/update performance is not an overriding concern, then it is possible to get significant gains in select performance on this type of data by denormalizing into separate indexed views. This type of denormalization is relatively safe, because — unlike denormalized data in tables — the database engine itself maintains the consistency of the data. So there is little or no worry that the data in the indexed views will get out of sync with the base tables, and there is no additional effort or coding required to keep the view data up to date.

Good candidates for this technique include tables with large numbers of rows, where it's common to select from those rows based on columns that have low selectivity, but where the result sets are expected to be smaller than the base table itself. For example, if the selected set comprises 80% of the base table, then the benefit of this technique is not likely to be great, but if the selected set is only 25% of the table, then it will be significant. There is a trade-off in terms of the storage space required, but the space requirement is mitigated by storing only the matching rows from the base table, rather than all of them, as with a covering index. There is also some price in insert/update performance, but on many systems, the demands made by select queries far outweigh those made by inserts and updates.



Join a discussion about this article in our forum.


<< Prev Page         








Home | Peformance Articles | Audit Articles | Business Intelligence Articles | Clustering Articles | Developer Articles | Reporting Services Articles | DBA Articles | ASP.NET / ADO.NET Articles | DBA FAQ's | Developer Peformance FAQ's | DBA Peformance FAQ's | Developer FAQ's | Clustering FAQ's | Error Messages | Audit Tool Reviews | Backup Tool Reviews | Coding Tool Reviews | Compare Tool Reviews | Documentation Tool Reviews | Design Tool Reviews | Monitoring Tool Reviews | Log Tool Reviews | Reporting Tool Reviews | Clustering Tool Reviews | Security Tool Reviews | Change Management Tool Reviews | Remote Access Tool Reviews | Book Reviews | Security Tool Reviews | QDPMA Performance Tuning | ADO.NET / ASP.NET | Administration | Analysis/OLAP Services | Application Development | Configuration | Components | ETL | Hardware | High Availability | Hints | Index | Misc | Operating Systems | Performance Tuning | Replication | T-SQL | Views


              © 1999-2008 by T10 Media. All rights reserved