SQL Server Performance Forum – Threads Archive

Optimizer not using our indexes

Description of problem. We have in the Database table ConsumerListing with a number of searchable columns. We can not predict the combination of search conditions. Because of that we chose to create a number of one-column non-clustered indexes. Two types of searches are being performed on the table. Type 1: we need to access all records which satisfy certain conditions. In our example people who Dogs and interested in Gardening in the household. Type 2 we need to access records which satisfy certain conditions and have certain phone numbers. To achieve that we decided in addition to one-column non-clustered indexes to create indexes with keys AreaCode, Phone and add all the other search conditions on the leaf level of non clustered index (included columns new for SQL Server 2005). We decided to test query select count(*) from ConsumerListing where Dogs=’Y’ AND Gardner=’Y' It took long. Query optimizer issued following execution plan: StmtText ———————————————————————— —- ———————————————————————— —- ———————————————– |–Compute Scalar(DEFINE<img src=’/community/emoticons/emotion-6.gif’ alt=’

‘ />[Expr1004]=CONVERT_IMPLICIT(int,[globalagg1008],0))) |–Stream Aggregate(DEFINE<img src=’/community/emoticons/emotion-6.gif’ alt=’

‘ />[globalagg1008]=SUM([partialagg1007]))) |–Parallelism(Gather Streams) |–Stream Aggregate(DEFINE<img src=’/community/emoticons/emotion-6.gif’ alt=’

‘ />[partialagg1007]=Count(*))) |–Index Scan(OBJECT<img src=’/community/emoticons/emotion-6.gif’ alt=’

‘ />[DD_Data].[dbo].[ConsumerListing].[Ind_Phone]), WHERE<img src=’/community/emoticons/emotion-6.gif’ alt=’

‘ />[DD_Data].[dbo].[ConsumerListing].[Dogs]=’Y’ AND [DD_Data].[dbo].[ConsumerListing].[Gardner]=’Y’)) Meaning instead of seeking indexes on Dogs and Cats and merging results (Behavior we expected) optimaizer choose to scan Ind_Phone. We decided to overwrite this with index hints select count(*) from ConsumerListing with (index (IND_Dogs, Ind_Gardner)) where Dogs=’Y’ AND Gardner=’Y' We get expected Behavior StmtText ———————————————————————— —- ———————————————————————— —- ———————————————————————— —- ———————————————————————— —- ———————————————————————— —- ———————————————————————— —- ———————————————– |–Compute Scalar(DEFINE<img src=’/community/emoticons/emotion-6.gif’ alt=’

‘ />[Expr1004]=CONVERT_IMPLICIT(int,[Expr1007],0))) |–Stream Aggregate(DEFINE<img src=’/community/emoticons/emotion-6.gif’ alt=’

‘ />[Expr1007]=Count(*))) |–Merge Join(Inner Join, MERGE<img src=’/community/emoticons/emotion-6.gif’ alt=’

‘ />[DD_Data].[dbo].[ConsumerListing].[AddressID], [DD_Data].[dbo].[ConsumerListing].[IndividualId], [Uniq1002])=([DD_Data].[dbo].[ConsumerListing].[AddressID], [DD_Data].[dbo].[ConsumerListing].[IndividualId], [Uniq1002]), RESIDUAL<img src=’/community/emoticons/emotion-6.gif’ alt=’

‘ />[DD_Data].[dbo].[ConsumerListing].[AddressID] = [DD_Data].[dbo].[ConsumerListing].[AddressID] AND [DD_Data].[dbo].[ConsumerListing].[IndividualId] = [DD_Data].[dbo].[ConsumerListing].[IndividualId] AND [Uniq1002] = [Uniq1002])) |–Index Seek(OBJECT<img src=’/community/emoticons/emotion-6.gif’ alt=’

‘ />[DD_Data].[dbo].[ConsumerListing].[Ind_Dogs]), SEEK<img src=’/community/emoticons/emotion-6.gif’ alt=’

‘ />[DD_Data].[dbo].[ConsumerListing].[Dogs]=’Y’) ORDERED FORWARD) |–Index Seek(OBJECT<img src=’/community/emoticons/emotion-6.gif’ alt=’

‘ />[DD_Data].[dbo].[ConsumerListing].[Ind_Gardner]), SEEK<img src=’/community/emoticons/emotion-6.gif’ alt=’

‘ />[DD_Data].[dbo].[ConsumerListing].[Gardner]=’Y’) ORDERED FORWARD) We decided to test both situations with DBCC DROPCLEANBUFFERS DBCC FREEPROCCACHE SET STATISTICS IO ON SET STATISTICS TIME ON Resulsts for SET STATISTICS TIME ON SQL Server Execution Times: CPU time = 36266 ms, elapsed time = 396475 ms.–default CPU time = 19032 ms, elapsed time = 19245 ms.–with index hints In my oppinion wiht index hints times match better Results from STATISTICS TIME ON Table ‘ConsumerListing’. Scan count 5, logical reads 3948831, physical reads 7, read-ahead reads 3939681, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.–default Table ‘ConsumerListing’. Scan count 2, logical reads 88589, physical reads 6, read-ahead reads 88581, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.—-with index hints Again all results are better when we’re using two one column indexes. All our one column indexes has low selectivity Partial formatted results from DBCC SHOW_STATISTICS bellow: Dogs RANGE_HI_KEYRANGE_ROWSEQ_ROWSDISTINCT_RANGE_ROWSAVG_RANGE_ROWS U0169,693,20001 Y034,562,52001 Gardener RANGE_HI_KEYRANGE_ROWSEQ_ROWSDISTINCT_RANGE_ROWSAVG_RANGE_ROWS U0159,358,90001 Y044,896,82001 Table has 204,255,720 Please note that regarding of low selectivity if you will run select count(*) from ConsumerListing where Dogs=’Y' (Or anything with only one condition) optimizer will use ind_dogs (seek)(or any other appropriate index. plan? What can we do to avoid index hints( will be difficult to program)? Help will be greatly appreciated.

How many possible values can you have in the "Dogs" column (or any such column)> Two, I guess, since you say they have low selectivity? At first glance, I would imagine that is the problem. The optimizer is not likely to choose an index over a scan when the index is not selective.

I know most likely it is the reason, but
1.Why it is using index in case of search of just one condition. Index has low selectivity as well.
2.Besides index hints for each query is it any global setting we can use to push server to desired behavior
Thank you.

– just a little remark regarding your tests :
Like you stated (correctly) you performed :
DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE
SET STATISTICS IO ON
SET STATISTICS TIME ONResulsts for SET STATISTICS TIME ON SQL Server Execution Times:
CPU time = 36266 ms, elapsed time = 396475 ms.–default
Did you also perform
the DBCC DROPCLEANBUFFERS and DBCC FREEPROCCACHE
before you executed your test with the hints ?
resulting to CPU time = 19032 ms, elapsed time = 19245 ms.–with index hints
[?] – dogs=’y’ … meaning you have a Y-N index ? cardinality plays a huge roll with indexes. You can force it using a hint, but there may be better ways to implement it not using just a two-valued index …. thinking out of the box.

Did you also perform
the DBCC DROPCLEANBUFFERS and DBCC FREEPROCCACHE
before you executed your test with the hints ?
Yes I run DBCC DROPCLEANBUFFERS and DBCC FREEPROCCACHE before test with hints.
dogs=’y’ … meaning you have a Y-N index ? cardinality plays a huge roll with indexes. You can force it using a hint, but there may be better ways to implement it not using just a two-valued index …. thinking out of the box.
Y and U(unknown) to be exact.
Of course two column index will be more efficient. The challenge is that we have around 170 search conditions and combinations are not predictable. That#%92s why we choose to use bunch of one-valued indexes.
Thank you.

"select count(*)" probably is not a good indicator for this reason: Count(*) simply has to count qualifying rows, not return any data; as such, the optimizer is likely to choose to scan the smallest index available that can produce the "where" clause results (even when the selectivity is so low), which is indeed your one-column index. In usage where data needs to be returned from any other column, or where any join is needed, it probably won’t behave that way. I know of no way around the cardinality problem except to break your data out in some other way than binary columns, or perhaps partitioning the table.

… around 170 search conditions …
Should they all be dynamic ?
At first … maybe yes
Maybe you may want to examine this predicate-load and then determine there wich are the most used and maybe even come to a conclusion that there a pattern use-case so you can develop a couple of scenarios with tuned fixed combinations …

well whatever was performance, the drop clean buffers did not take effect before the second query because you can clearly see that both indexes were in memory the net is the default plan was 36 cpu-sec, compared with 19 for the hinted plan,
just to be fair, run the default with OPTION (MAXDOP 1) to get a non-parallel plan comparison what you are seeing is a discrepancy between the optimizer cost formulas and the true cost in any case, the hinted plan is not that efficient either at 19 cpu-sec,
is a throughput of 3 searches/min per proc acceptable, with no other activity i think you need consider some mandatory search conditions, ie, no single bit field searches
and build compound indexes to match requirements

]]>