How SQL Server Determines an Execution Plan Using Available Indexes and Statistics
In the case of last names, there is a very large difference in the number of occurrences per distinct value. In our database, we host approximately 18 million records with about 500.000 different last names. This gives an average of 36 records per name. In fact, half of these names only occur once. Another 100.000 names occur less than 10 times. The frequently used names occur up to 95,000 times. The variance is extremely high. In cases of such extremely high variance, the use of statistics in this manner is not very accurate.
If a second indexed search parameter is provided, SQL Server might decide not to use this index, but to use a filter to extract the second parameter. When in practice the actual row count is much higher, this plan is less ideal and might lead to extreme query times.
The entry for Smit is found directly in the statistics. Therefore the estimated row count equals the actual row count of 48,760. The same applies to the street name ‘Kerkstraat’ with a rowcount of 42,722. If a search is made for last name ‘Smit’ and street name ‘Kerkstraat’ the following plan is used:
Figure 4: Execution Plan – Search for Smit and Kerkstraat
As can be seen, optimal use is made of the two available indexes. If the same query is made for ‘Smits’, where the estimated row count is 144 (as calculated above) and an actual row count of 25,718 (a given) and ‘Kerkstraat’, the following plan is used:
Figure 5: Execution Plan – Search for Smits and Kerkstraat