SQL Server Performance

Optimizer not using index consistently

Discussion in 'SQL Server 2005 T-SQL Performance Tuning' started by gah, Jul 16, 2009.

  1. gah New Member

    Hi,
    I have a table, call it T, with primary key:
    id int, date_ smalldatetime, f1 decimal(5,2), f2 smallint
    T is clustered on: date_, id, f1, f2
    T has a non-clustered index on: id, date_, f1, f2
    The following query (in a stored procedure) pulls a time series of data. It inconsistently uses the non-clustered index. Sometimes it performs a clustered index seek (incorrect), but if I update statistics on T, SOMETIMES I can get it to do an Index Seek on the non-clustered index (correct).
    CREATE PROCEDURE [dbo].[getValueFromId]
    @id int,
    @f1 decimal(5,2),
    @f2 smallint,
    @dtStart smalldatetime = NULL,
    @dtEnd smalldatetime = NULL

    AS
    BEGIN
    if @dtStart is null set @dtStart = '19000101'
    if @dtEnd is null set @dtEnd = '20781231'

    select date_, value
    from T
    --with (INDEX(IX_nonclustered))
    where
    id = @id and
    date_ between @dtStart and @dtEnd and
    f1 = @f1 and
    f2 = @f2
    order by date_ asc
    END
    If I include the index hint, it (correctly) uses the non-clustered index.
    (Replacing "between" with >= and <= does not fix the problem.)
    This appears to be a bug in the optimizer. It's a simple query, no joins, perfectly SARGable, using all fields in the non-clustered index key.
    Aside from using an index hint, does anyone have any suggestions or experience with this kind of problem?
  2. moh_hassan20 New Member

    Welcome to the Forum
    [quote user="gah"]but if I update statistics on T, SOMETIMES I can get it to do an Index Seek on the non-clustered index (correct).[/quote]
    First
    It's not a bug; it's by design. Index selection is driven by statistics, and the optimizer bases its decision on them.
    Second
    You reassign the passed parameters within your stored procedure. Because of parameter sniffing, the plan is compiled for the values that were actually passed in (here, NULL), not for the values you assign afterwards, and that can mislead the optimizer:
    if @dtStart is null set @dtStart = '19000101'
    if @dtEnd is null set @dtEnd = '20781231'
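    A common workaround (a sketch based on the procedure in the original post; the local variable names @start and @end are illustrative) is to copy the parameters into local variables before the query, or to request a fresh per-call plan with OPTION (RECOMPILE), which SQL Server 2005 supports at the statement level:

    CREATE PROCEDURE [dbo].[getValueFromId]
    @id int,
    @f1 decimal(5,2),
    @f2 smallint,
    @dtStart smalldatetime = NULL,
    @dtEnd smalldatetime = NULL
    AS
    BEGIN
    -- work on local copies so the cached plan is not built around
    -- the sniffed NULL defaults (variable names are illustrative)
    declare @start smalldatetime, @end smalldatetime
    set @start = coalesce(@dtStart, '19000101')
    set @end = coalesce(@dtEnd, '20781231')

    select date_, value
    from T
    where
    id = @id and
    date_ between @start and @end and
    f1 = @f1 and
    f2 = @f2
    order by date_ asc
    option (recompile) -- optional: trades a recompile per call for a plan fit to the actual dates
    END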
    third
    your indexes:
    T is clustered on: date_, id, f1, f2
    T has a non-clustered index on: id, date_, f1, f2
    they are redundant indexes (only the order of the fields is different).
    It should be enough to keep the clustered index and delete the non-clustered one,
    and make the more selective columns the leading columns of the index.
    (I don't see whether date_ is nullable in the table structure; if it can be NULL, don't make it a leading column.)
    what is the size of that table?
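    (For reference, row count and space used can be checked with the standard sp_spaceused procedure:)

    -- report rows, reserved, data, and index size for table T
    EXEC sp_spaceused 'T'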

  3. gah New Member

    Thanks. A few comments.
    First: it's inconsistent. If I execute "update statistics", SOMETIMES it uses the correct index; sometimes it does not. To me, that's a bug.
    Second: Why would changing the parameters of the stored proc within the stored proc matter? After all, they are variables -- they are meant to vary!
    Third: No, the indices are NOT redundant! The clustered index orders first on date_, then on id. This is intentional, since new data is added to the table each day across many ids. If I cluster on <id, date_>, then each day's new data causes excessive fragmentation of the clustered index, and is very slow.
    Using only the clustered index to run the query takes 20 seconds. Using the non-clustered index is nearly instantaneous.
    date_ is a smalldatetime, non null.
    These are BIG tables: 300 million rows at least.
  4. ndinakar Member

    If you have two indexes with the same columns, SQL Server tends to choose the clustered index because it's less expensive: the data is directly on the index pages. Because you have both a CI and an NCI, SQL Server is thrown off each time you run the query. As data gets modified, the statistics change and the cached query plan is deemed out of date. At that point, depending on the data, the optimizer builds a new plan, and it could use either the CI or the NCI depending on the statistics available.
    You have already found the workaround -- using the index hint. That's the best alternative.
    I understand that the two indexes have different key column orders. The index is chosen based on the statistics, not on the order of columns in the WHERE clause. As far as SQL Server is concerned, you have four columns in your WHERE clause and it found two indexes it can use; depending on the uniqueness of the data in those columns it will choose either the CI or the NCI.
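    Another option, sketched under the assumption that value is the only queried column outside the non-clustered key (the index name here is illustrative, not from the original post): since a non-clustered index's leaf rows already carry the clustered key columns, only value forces a key lookup, and SQL Server 2005 lets you rebuild the index with value as an included column so it fully covers the query:

    -- illustrative index name; DROP_EXISTING = ON assumes it matches
    -- the actual name of the existing non-clustered index
    create nonclustered index IX_T_id_date_f1_f2
    on T (id, date_, f1, f2)
    include (value)
    with (drop_existing = on)

    With the index covering, the optimizer no longer has to weigh key lookups against a clustered scan, which may remove the need for the hint.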
  5. gah New Member

    "if you have 2 indexes with same columns, SQL Server tries to choose the clustered index because its less expensive..."
    Thanks! Well, that looks like a bug to me. This is a very simple query, and the indexing scheme has a very good reason to exist as it is.
    SQL Server documentation specifically states that clustered indices and non-clustered indices can be used for the very purpose described in my original post. I'd hope the optimizer looks at more than just column names, and I think it does: otherwise, why would (sometimes) the optimizer choose the non-clustered index (correct behavior), and other times choose the clustered index scan (20x more expensive)?
    I'm not convinced -- still think it's a bug in SQL Server. Any Microsoft experts out there who care to chime in?

  6. ndinakar Member

    The optimizer looks at the column names, the indexes that can be used, and how up to date the statistics for those indexes are.
    What are the total numbers of records with unique "date_" and "id" values?
    Select count(*) from (Select [date_] from T group by [date_]) a
    Select count(*) from (Select [Id] from T group by [Id]) a
  7. gah New Member

    There are about 300 million rows with unique keys.
    There are about 750,000 with unique <date_, id> fields.
    The optimizer definitely looks at more than column names; it also takes ordering of the fields into account.
