SQL Server Performance

INDEX SELECTIVITY

Discussion in 'Performance Tuning for DBAs' started by sqlnewdba, Sep 6, 2006.

  1. sqlnewdba New Member

    I am confused on "Index Selectivity". Two scenarios are below. Being new to SQL Server, I am trying to gain additional knowledge on "Index Selectivity". Please help.

    1)dbcc show_statistics (table name, index name)


    Updated Rows Rows Sampled Steps

    Density Average row length
    Sep 5 2006 10:19PM 94755577106451 200 6.4302716E-5 36.0
    36.0

    What is meant by - 6.4302716e-5? Not a Math major.
    What does it compute to?

    Using "dbcc show-statistics", is the index a highly
    selected index?

    2) Copied and pasted script from article on your web site "Tips on
    Optimizing SQL Server Non-Clustered Indexes" that uses the Northwind
    database. and substituted for my use with same table and index above..

    total unique rows - 6209


    total rows - 9.47556e+006


    selectivity ratio - 0

    Confused. The column is not 95% unique. Since it is not, the Query
    Optimizer might not even use it. However, it will return less than 5% of
    the rows. Reading the article "Not All SQL Server Indexes Are Created
    Equal", talks about the 5% rule. If it is less than 5%, the Query Optimizer
    will use the index. Please advise at your convenience.

    Most grateful,

    sqlnewdba
  2. Adriaan New Member

    E is short for 'times ten to the power of', so 1E-6 is 0.000001 and 1E6 is 1,000,000.
  3. sqlnewdba New Member

    Thank you Adriaan.

    Do you know if either scenario qualifies the index as a "highly selective index"?

    Does anyone know? Appreciate reply.

    Thanx
  4. satya Moderator

    The most effective indexes are the indexes with a small percentage of duplicated values. An index with a high percentage of unique values is a selective index. Obviously, a unique index is highly selective since there are no duplicate entries. The database uses these statistics when generating an execution plan for a query.
    quote:

    The selectivity ratio (Sr) of a potential key value in a table is a good guideline to its usefulness as an index. If the number of rows, which are uniquely identified by the key, is Rk, and the total number of rows on the table is Rt, then

    Sr = 100*(Rk/Rt)

    The lower the selectivity ratio, the more useful the key would be as an index to the table. SQL Server stores this ratio for each index. Note that if the selectivity ratio of a key is more than 15%, the optimizer will ignore the index and chose a full table scan.

    Satya SKJ
    Microsoft SQL Server MVP
    Contributing Editor & Forums Moderator
    http://www.SQL-Server-Performance.Com
    This posting is provided AS IS with no rights for the sake of knowledge sharing.
  5. sqlnewdba New Member

    Thank you satya for responding. I did find the formula while "googling" the day before and substituted the values. Since (100*(6209/9475557) = 0, (unless my math is wrong),the index would be used because of the low,(extreme low), selectivity ratio. I gather this formula is a "good rule of thumb".

    Could you please shed some light on "density" in the "dbcc show_statistics" with my results posted earlier? Still searching ("googling") for some light (knowledge) on this area.

    Thank you
  6. gurucb New Member

    density = 1/selectivity
  7. satya Moderator

    Density measures how selective an index is. The more selective an index is, the more useful it is, and because higher selectivity means that a query can eliminate more rows from consideration.
    quote:The statblob (in sysindexes table) field contains two types of density information: density values and all_density values. The optimizer uses the density and all_density values to determine whether it is better to conduct a table scan or to use an index to access the data.

    Satya SKJ
    Microsoft SQL Server MVP
    Contributing Editor & Forums Moderator
    http://www.SQL-Server-Performance.Com
    This posting is provided AS IS with no rights for the sake of knowledge sharing.
  8. Mark Brady New Member

    Satya,

    I'm not sure where this quote comes from,

    The term "uniquely" seems to be problematic. The sentence is just as true and more accurate without that word there. The inclusion of the word unique then causes people to believe this.

    Clearly the notion that 95% of the values need to be unique is false. It's also clear that this misconception is created by using "unique" in your definitions.
  9. FrankKalis Moderator

    Welcome to the forums!
    Thanks for your contribution, but I guess you have noticed that this thread is 6 years old?
  10. Mark Brady New Member

    If you're saying the information is stale, then deleting this would be useful as it appears to be fueling contemporary misconceptions...
  11. FrankKalis Moderator

    No, I guess that is not what I'm saying. Usually there is no point in replying to such an old thread, because neither the questioner nor the answerer might be still monitoring it, in fact, they might not even still participate in the community. So, there might be no chance to get into discussion with them about their statements.

    In the case of misleading or plain wrong advise given, it might be beneficial to comment to such an old thread in order to keep others from blindly adopting an advise given in an online community, but I think one should question any advise given by someone in a community anyway, regardless of the name of the person giving that advise. Some names of community members though imply greater knowledge than others obviously, but still that shouldn't keep one from using common sense when applying an advise to ones own IT environment. So, is the advise given here wrong or misleading? I can't tell, since to me it is a 6 years-old thread and SQL Server engine components like the optimizer have come a long way since then and I don't bother reading through such an old one anyway.
  12. admin Administrator

    Just to weigh in on the issue of old threads being left live on the site. Old threads are typically left on forums and Q&A sites as it is nearly impossible to decide if a question is obsolete or not - for example, SQL Server 2000 is still used in production so an archive of questions on issues specific to it is still relevant. Even in the main site, in 2011 we published an article specifically on SQL Server 2000 (http://www.sql-server-performance.com/2011/configuring-email-sql-server-2000/) since it was an issue that was still being encountered.

Share This Page