I am confused on "Index Selectivity". Two scenarios are below. Being new to SQL Server, I am trying to gain additional knowledge on "Index Selectivity". Please help. 1)dbcc show_statistics (table name, index name) Updated Rows Rows Sampled Steps Density Average row length Sep 5 2006 10:19PM 94755577106451 200 6.4302716E-5 36.0 36.0 What is meant by - 6.4302716e-5? Not a Math major. What does it compute to? Using "dbcc show-statistics", is the index a highly selected index? 2) Copied and pasted script from article on your web site "Tips on Optimizing SQL Server Non-Clustered Indexes" that uses the Northwind database. and substituted for my use with same table and index above.. total unique rows - 6209 total rows - 9.47556e+006 selectivity ratio - 0 Confused. The column is not 95% unique. Since it is not, the Query Optimizer might not even use it. However, it will return less than 5% of the rows. Reading the article "Not All SQL Server Indexes Are Created Equal", talks about the 5% rule. If it is less than 5%, the Query Optimizer will use the index. Please advise at your convenience. Most grateful, sqlnewdba
Thank you Adriaan. Do you know if either scenario qualifies the index as a "highly selective index"? Does anyone know? Appreciate reply. Thanx
The most effective indexes are the indexes with a small percentage of duplicated values. An index with a high percentage of unique values is a selective index. Obviously, a unique index is highly selective since there are no duplicate entries. The database uses these statistics when generating an execution plan for a query. quote: The selectivity ratio (Sr) of a potential key value in a table is a good guideline to its usefulness as an index. If the number of rows, which are uniquely identified by the key, is Rk, and the total number of rows on the table is Rt, then Sr = 100*(Rk/Rt) The lower the selectivity ratio, the more useful the key would be as an index to the table. SQL Server stores this ratio for each index. Note that if the selectivity ratio of a key is more than 15%, the optimizer will ignore the index and chose a full table scan. Satya SKJ Microsoft SQL Server MVP Contributing Editor & Forums Moderator http://www.SQL-Server-Performance.Com This posting is provided AS IS with no rights for the sake of knowledge sharing.
Thank you satya for responding. I did find the formula while "googling" the day before and substituted the values. Since (100*(6209/9475557) = 0, (unless my math is wrong),the index would be used because of the low,(extreme low), selectivity ratio. I gather this formula is a "good rule of thumb". Could you please shed some light on "density" in the "dbcc show_statistics" with my results posted earlier? Still searching ("googling") for some light (knowledge) on this area. Thank you
Density measures how selective an index is. The more selective an index is, the more useful it is, and because higher selectivity means that a query can eliminate more rows from consideration. quote:The statblob (in sysindexes table) field contains two types of density information: density values and all_density values. The optimizer uses the density and all_density values to determine whether it is better to conduct a table scan or to use an index to access the data. Satya SKJ Microsoft SQL Server MVP Contributing Editor & Forums Moderator http://www.SQL-Server-Performance.Com This posting is provided AS IS with no rights for the sake of knowledge sharing.
Satya, I'm not sure where this quote comes from, The term "uniquely" seems to be problematic. The sentence is just as true and more accurate without that word there. The inclusion of the word unique then causes people to believe this. Clearly the notion that 95% of the values need to be unique is false. It's also clear that this misconception is created by using "unique" in your definitions.
Welcome to the forums! Thanks for your contribution, but I guess you have noticed that this thread is 6 years old?
If you're saying the information is stale, then deleting this would be useful as it appears to be fueling contemporary misconceptions...
No, I guess that is not what I'm saying. Usually there is no point in replying to such an old thread, because neither the questioner nor the answerer might be still monitoring it, in fact, they might not even still participate in the community. So, there might be no chance to get into discussion with them about their statements. In the case of misleading or plain wrong advise given, it might be beneficial to comment to such an old thread in order to keep others from blindly adopting an advise given in an online community, but I think one should question any advise given by someone in a community anyway, regardless of the name of the person giving that advise. Some names of community members though imply greater knowledge than others obviously, but still that shouldn't keep one from using common sense when applying an advise to ones own IT environment. So, is the advise given here wrong or misleading? I can't tell, since to me it is a 6 years-old thread and SQL Server engine components like the optimizer have come a long way since then and I don't bother reading through such an old one anyway.
Just to weigh in on the issue of old threads being left live on the site. Old threads are typically left on forums and Q&A sites as it is nearly impossible to decide if a question is obsolete or not - for example, SQL Server 2000 is still used in production so an archive of questions on issues specific to it is still relevant. Even in the main site, in 2011 we published an article specifically on SQL Server 2000 (http://www.sql-server-performance.com/2011/configuring-email-sql-server-2000/) since it was an issue that was still being encountered.