SQL Server Performance

Index Selectivity

Discussion in 'Performance Tuning for DBAs' started by HarryArchibald, Jan 3, 2003.

  1. HarryArchibald New Member

    Here is a good one for a Friday evening.

    I began analysing a database, not designed by myself, for superfluous indexes. As I understood it, indexes with a selectivity of less than 85-95% would not be used.

    The first table I looked at confirmed this: an index on a column with only 317 distinct values out of 6616 rows was ignored by a select on that column.
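    (For reference, a selectivity figure like the one above can be computed with a query along these lines; the table and column names here are placeholders, not the actual schema:)

```sql
-- Selectivity = distinct values / total rows (names are hypothetical)
SELECT COUNT(DISTINCT SomeColumn)                           AS DistinctValues,
       COUNT(*)                                             AS TotalRows,
       CAST(COUNT(DISTINCT SomeColumn) AS float) / COUNT(*) AS Selectivity
FROM dbo.SomeTable;
```

    With 317 distinct values in 6616 rows, that works out to a selectivity of roughly 4.8%, well below the 85-95% threshold mentioned above.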

    Further investigation showed that this did not apply to other tables.

    I then began using DBCC SHOW_STATISTICS to examine the density of the index. I assumed an inverse relation between selectivity and density, i.e. the higher the density, the lower the selectivity.
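    (The command looks like this; the table and index names are placeholders:)

```sql
-- Display statistics for an index, including the density values
DBCC SHOW_STATISTICS ('dbo.SomeTable', IX_SomeTable_SomeColumn);
```

    Density is roughly 1 divided by the number of distinct values, so a low density does correspond to high selectivity.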

    I found a correlation between density and whether the index was used, until I reached the densest index of all: I expected the optimiser to ignore it, but it used the index.

    Has anyone else investigated selectivity and/or density as a measure of the worth of an index?

    Or have I failed to understand something?

    Cheers,
    Harry
  2. bradmcgehee New Member

    This is an area that I have not studied in depth, and I have always assumed that what I have been told about selectivity is true. On occasion, I have received letters telling me that this is not always necessarily true, as you have indicated.

    I wish I had the time to test this, as it is an important issue with little good information available. Hopefully others will have some good info to offer.


    ------------------
    Brad M. McGehee
    Webmaster
    SQL-Server-Performance.Com
  3. Chappy New Member

    I couldn't really reproduce this with the data profile you'd given. Perhaps not surprising; I'm sure the optimiser takes a lot of factors into account. I tried with a simple one-column (int) table, and then with a two integer column table (one being an identity primary key).

    When you say it ignores the index, is this in favour of an (in your opinion) less suitable one, or does it simply do a table scan?

    Are your statistics fully up to date?
    And was the select you were doing simply a "select * from table"?

  4. bradmcgehee New Member

    From all the literature I have read, if an index is not selective enough, it is ignored and a scan is used instead, as the scan is supposed to be less work. If I have time this week, I'll try to conduct a few experiments myself.


    ------------------
    Brad M. McGehee
    Webmaster
    SQL-Server-Performance.Com
  5. HarryArchibald New Member

    Thanks for the replies.
    It seems that this is an area of interest.

    In reply to Chappy: I was reviewing the indexes to see whether they are required or not. I have been running a query with a SARG that contains the column(s) of the index, and then studying the query plan using showplan. In some cases, the index is not used and a clustered index scan is performed instead. The statistics are updated daily.
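    (For anyone following along, the plan can be inspected like this; the table and column names are placeholders:)

```sql
-- Show the estimated plan without actually running the query
SET SHOWPLAN_TEXT ON;
GO
SELECT * FROM dbo.SomeTable WHERE SomeColumn = 'SomeValue';
GO
SET SHOWPLAN_TEXT OFF;
GO
```

    The output shows whether the optimiser chose an Index Seek on the nonclustered index, a Clustered Index Scan, or a Table Scan.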

    If the index is always ignored, then there is only a performance loss in having the index, and it could be dropped.
  6. alzdba Member

    Also keep in mind that indexes may support foreign keys and thus DRI (declarative referential integrity). Don't just drop those indexes unless you can prove they hurt performance.
    I recall someone telling me that SQL Server does not use density and cardinality in a straightforward way, but keeps a histogram of the index itself, breaking it up into a number of steps that each contain roughly equal numbers of index entries. So even when you search for an almost unique value, if it happens to fall in a step that also contains a value covering more than 15% of the table (which would make the index look useless), the index may not be used because of that 15% value.
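    (Those steps are the histogram that DBCC SHOW_STATISTICS reports. Assuming the WITH HISTOGRAM option is available on your build, you can view just that part; the names below are placeholders:)

```sql
-- Show only the histogram steps for the statistics on this index
DBCC SHOW_STATISTICS ('dbo.SomeTable', IX_SomeTable_SomeColumn) WITH HISTOGRAM;
```

    Each row of the output is one step, showing the step's upper bound key value and how many rows fall on and below it.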
  7. bradmcgehee New Member

    I have been doing some research on this topic of selectivity, and here are some more comments.

    Selectivity within indexes is mostly an issue with non-clustered, non-covering indexes, not with clustered indexes or covering nonclustered indexes.

    As you probably know, when data is retrieved using a nonclustered, non-covering index, B-tree entries need to be read in order to locate the data page that is pointed to, and then the data page has to be retrieved, then the B-tree has to be read again, and the next data page, and so on, until all the required index and data pages are read.

    As you might imagine, if the indexed column is not very selective, it is less disk I/O intensive to perform a simple table scan than to read both the B-tree entries and the data pages over and over again.

    According to Kalen Delaney in "Inside Microsoft SQL Server 2000," a nonclustered index is most useful when it is at least 95 percent selective. Any less selective than this, and the Query Optimizer may not use it. Again, this is in reference to nonclustered, non-covering indexes, where the index pages and the data pages are both accessed over and over.

    To prove this (unscientifically), I did some testing with a table that had 30,425 rows. First, I created a clustered index on a particular column that was very unselective (it had only 4 distinct values). Then I did a SELECT * on this table, filtering on the column that had the clustered index on it. This returned 25,930 rows using 686 logical reads and performed a clustered index seek (in other words, the clustered index was used to look up the values).

    Then I dropped the clustered index, added a nonclustered index on the same column, and ran the same exact query. This time, it took 803 logical reads to return the 25,930 rows, and a table scan was performed. The nonclustered index was not used.

    Next, I kept the same nonclustered index, but added a hint to the query to force it to use the index. This time, it took 25,986 (that's not a typo) logical reads when the index was forced to be used. The execution plan showed that a Bookmark Lookup was performed when the index was forced.
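    (A sketch of that comparison; the table, column, and index names here are hypothetical, not the ones from my test:)

```sql
SET STATISTICS IO ON;  -- report logical reads per statement

-- Optimizer's choice: a table scan, since the column is unselective
SELECT * FROM dbo.BigTable WHERE Category = 'A';

-- Force the nonclustered index: expect a Bookmark Lookup per row,
-- and far more logical reads than the scan above
SELECT * FROM dbo.BigTable WITH (INDEX (IX_BigTable_Category))
WHERE Category = 'A';

SET STATISTICS IO OFF;
```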

    ------------------
    Brad M. McGehee
    Webmaster
    SQL-Server-Performance.Com
  8. HarryArchibald New Member

    Thanks for that Brad. I had not considered the difference between clustered and non-clustered indexes.
    This seems to emphasise the importance of having a clustered index on every table, because if a query has to scan the whole table, the scan will be quicker when there is a clustered index.

    I've noticed the poor performance before when using a hint to force the index, but never twigged why.
  9. bradmcgehee New Member

    Yes, I agree: there should be a clustered index on every table, for the reason you mention, plus many more I have listed on my clustered index tips webpage.


    ------------------
    Brad M. McGehee
    Webmaster
    SQL-Server-Performance.Com
