I have a table with no primary key and a non-clustered composite index of two columns. When I look at the number of distinct values (selectivity ratio), the table has a ratio of 52% unique values. I thought an index should be at least 90% unique in order for it to be used. Yet when I run the query with the index, it only takes 2 seconds to run. When I drop the index, it takes 1.25 minutes to run. Why is it using the index when the index is only 52% unique? One of the examples always given is to never have an index on a column that is a flag because it only has 50% distinct values (YES, NO). So, what am I misunderstanding here. Thank you!
Hi, It also depends on the number of rows you have, kind of query you are executing and data types on which predicate in query is based. Probably you would see the theory apply when you have huge table. Cheers Satish
The table has 232,441 rows. The number of distinct values for the index is 123,476. It is SQL Server 2005.
"No primary key" - rule #1, always add a PK to any table. Add an identity column if there is no apparent unique key, make that the PK with a clustered index.
Yes, I agree. I just started working here on Friday as a consultant and asked the very same question. The reason explained to me was the table is a many to many resolution table / cross reference table. The integrity is kept through the foreign keys and, therefore, they did not find it necessary to have a primary key. It would slow down updates and inserts. But since they are thinking of adding a non-clustered index which will slow down inserts and updates, why not create a primary key. I will suggest that. I still am not clear why it is using the index when it has 52% selectivity ratio though. It is SQL Server 2005 with a 232,441 rows and 123,476 unique values. And I am not sure why I don't see a difference in the execution plan. The table that has the new index is being called within a function that is being called within a stored procedure. Do you think this has anything to with this? Thanks!
"The table that has the new index is being called within a function that is being called within a stored procedure. Do you think this has anything to with this?" Is the UDF being called from a query statement (SELECT, UPDATE or DELETE) inside that stored procedure? If so, is it in the WHERE clause of that query? This is the most common cause for indexes getting ignored: incorrect application of a UDF.
Do you have any hints such as index hints to use within that code? Or are you forcing any plan guides to use?
If it's a covering index -- that is, if all columns in the query result can come from the index -- then SQL will use the index because it's less I/O to read the index than the full table. Rather than add an arbitrary PK, which will slow down your lookup query, you could consider changing the current index to a clustered one. This helps by allowing the table to be reorganized (a table w/o a clustered index cannot be reorganized / compacted). But don't rely too heavily on a slogan like "Always have a PK"; that is too broad to apply to every situation.