SQL Server Performance

need query plan explination

Discussion in 'SQL Server 2005 T-SQL Performance Tuning' started by rklimes, Dec 21, 2007.

  1. rklimes New Member

    I have a fairly basic query and am confused as to why an index is not being used. The query does an index scan the primary key (clustered) index but I expect it to do an index seek on non-clustered index of criteria in where clause. Can someone please clarify why I am not see behavior I expect. I have provided query and execution plan. Thanks.
    select vendorID, count(leadid) as total, sum(accepted) as accept
    from logReports WITH (nolock)
    where received between '2007-12-12' and '2007-12-21'
    group by vendorID
    order by vendorID
    |--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[globalagg1006],0), [Expr1004]=CASE WHEN [globalagg1008]=(0) THEN NULL ELSE [globalagg1010] END))
    |--Stream Aggregate(GROUP BY:([whitebox].[dbo].[logReports].[vendorID]) DEFINE:([globalagg1006]=SUM([partialagg1005]), [globalagg1008]=SUM([partialagg1007]), [globalagg1010]=SUM([partialagg1009])))
    |--Sort(ORDER BY:([whitebox].[dbo].[logReports].[vendorID] ASC))
    |--Parallelism(Gather Streams)
    |--Hash Match(Partial Aggregate, HASH:([whitebox].[dbo].[logReports].[vendorID]), RESIDUAL:([whitebox].[dbo].[logReports].[vendorID] = [whitebox].[dbo].[logReports].[vendorID]) DEFINE:([partialagg1005]=COUNT(*), [partialagg1007]=COUNT_BIG([whitebox].[dbo].[logReports].[accepted]), [partialagg1009]=SUM([whitebox].[dbo].[logReports].[accepted])))
    |--Clustered Index Scan(OBJECT:([whitebox].[dbo].[logReports].[PK_logReports]), WHERE:([whitebox].[dbo].[logReports].[received]>='2007-12-12 00:00:00.000' AND [whitebox].[dbo].[logReports].[received]<='2007-12-21 00:00:00.000'))



  2. ndinakar Member

    Is there any index on Received column in the table?
  3. rklimes New Member

    yes there is an index on received, there is also index on vendorID and the primary key is leadid.
  4. martins New Member

    Hi,
    I have seen 2 things that can cause this:
    1. Your statistics on this table is out of date. Update statistics on this table and see if it makes a difference to your execution plan.
    2. Fragmentation. Your index on the "received" field might be very fragmented, and therefore forces the primary index scan. Rebuild the index and see if it makes a difference. Also, if you find that you will use recent dates in most of your queries then it might also be a good idea to sort it desc.
    Hope this helps.
  5. satya Moderator

  6. rklimes New Member

    The statistics and indexes are up to date ( I had updated both before posting). I have tested this a bit more myself and have found by changing a few things with the query i can get it to use the right index although it changes the results.
    In both examples , I get an execution plan that I would expect. Does the fact that I am using the sum and count function change the execution plan?
    1) if I take out 'between' and replace it with ' = ' it will do index seek on the 'received' index.

    select vendorID, count(leadid) as total, sum(accepted) as accept
    from logReports WITH (nolock)
    where received = '2007-12-12'
    group by vendorID
    order by vendorID
    |--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[Expr1012],0), [Expr1004]=CASE WHEN [Expr1013]=(0) THEN NULL ELSE [Expr1014] END))
    |--Stream Aggregate(GROUP BY:([whitebox].[dbo].[logReports].[vendorID]) DEFINE:([Expr1012]=Count(*), [Expr1013]=COUNT_BIG([whitebox].[dbo].[logReports].[accepted]), [Expr1014]=SUM([whitebox].[dbo].[logReports].[accepted])))
    |--Sort(ORDER BY:([whitebox].[dbo].[logReports].[vendorID] ASC))
    |--Nested Loops(Inner Join, OUTER REFERENCES:([whitebox].[dbo].[logReports].[leadID]) OPTIMIZED)
    |--Index Seek(OBJECT:([whitebox].[dbo].[logReports].[logReports_A]), SEEK:([whitebox].[dbo].[logReports].[received]='2007-12-12 00:00:00.000') ORDERED FORWARD)
    |--Clustered Index Seek(OBJECT:([whitebox].[dbo].[logReports].[PK_logReports]), SEEK:([whitebox].[dbo].[logReports].[leadID]=[whitebox].[dbo].[logReports].[leadID]) LOOKUP ORDERED FORWARD)
    2) when I take out the leadid and accepted columns from the results it will again do index seek on received column

    select vendorid from logreports with (nolock)
    where received between '2007-12-12' and '2007-12-17'
    group by vendorID
    order by vendorID

    |--Sort(DISTINCT ORDER BY:([whitebox].[dbo].[logReports].[vendorID] ASC))
    |--Parallelism(Gather Streams)
    |--Hash Match(Partial Aggregate, HASH:([whitebox].[dbo].[logReports].[vendorID]), RESIDUAL:([whitebox].[dbo].[logReports].[vendorID] = [whitebox].[dbo].[logReports].[vendorID]))
    |--Hash Match(Inner Join, HASH:([whitebox].[dbo].[logReports].[leadID])=([whitebox].[dbo].[logReports].[leadID]))
    |--Bitmap(HASH:([whitebox].[dbo].[logReports].[leadID]), DEFINE:([Bitmap1003]))
    | |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([whitebox].[dbo].[logReports].[leadID]))
    | |--Index Seek(OBJECT:([whitebox].[dbo].[logReports].[logReports_A]), SEEK:([whitebox].[dbo].[logReports].[received] >= '2007-12-12 00:00:00.000' AND [whitebox].[dbo].[logReports].[received] <= '2007-12-17 00:00:00.000') ORDERED FORWARD)
    |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([whitebox].[dbo].[logReports].[leadID]), WHERE:(PROBE([Bitmap1003])=TRUE) [IN ROW])
    |--Index Scan(OBJECT:([whitebox].[dbo].[logReports].[logReports_B]))

    note: PK_logreports is index on leadid, logreports_A is index on received, and logreports_B is index on vendorID
  7. martins New Member

    Ok, you have done a good job so far in testing different scenarios. From the above, it is clear that when there is a count and/or a sum combined with the between filter, it is less costly for the query engine to do the clustered index scan. All of this is decided by the optimizer and purely so because it will be less costly.
    In my opinion, you can do one of two things here:
    1. Force the query to use your index by using an index hint. This is not advisable.
    2. Change the non-clustered index on "vendorID" to include "leadid" and "accepted" as well. I think this will be sufficient and the optimizer should then use that index.
    Counts and sums are usually very costly, and in the event (as in your first example) where you wanted to retrieve those two columns and aggregate them as well the optimizer chose the best way to do that.
  8. satya Moderator

  9. rklimes New Member

    Thanks for your replies. I have changed the VendorID index to include both leadID and accepted which has changed the execution plan closer to what I was expecting. It now does an Index seek on the "recieved" index and also an index scan on the newly changed "leadid, vendorid, accepted" index. By doing this it has reduced the execution time of the query by about 2/3 (20 sec). Thanks again as this has given me a little more insight as to how optimizer works, which was what I was looking for in the first place.
    have included new execution plan
    |--Compute Scalar(DEFINE:([Expr1003]=CONVERT_IMPLICIT(int,[globalagg1006],0), [Expr1004]=CASE WHEN [globalagg1008]=(0) THEN NULL ELSE [globalagg1010] END))
    |--Stream Aggregate(GROUP BY:([whitebox].[dbo].[logReports].[vendorID]) DEFINE:([globalagg1006]=SUM([partialagg1005]), [globalagg1008]=SUM([partialagg1007]), [globalagg1010]=SUM([partialagg1009])))
    |--Sort(ORDER BY:([whitebox].[dbo].[logReports].[vendorID] ASC))
    |--Parallelism(Gather Streams)
    |--Hash Match(Partial Aggregate, HASH:([whitebox].[dbo].[logReports].[vendorID]), RESIDUAL:([whitebox].[dbo].[logReports].[vendorID] = [whitebox].[dbo].[logReports].[vendorID]) DEFINE:([partialagg1005]=COUNT(*), [partialagg1007]=COUNT_BIG([whitebox].[dbo].[logReports].[accepted]), [partialagg1009]=SUM([whitebox].[dbo].[logReports].[accepted])))
    |--Hash Match(Inner Join, HASH:([whitebox].[dbo].[logReports].[leadID])=([whitebox].[dbo].[logReports].[leadID]))
    |--Bitmap(HASH:([whitebox].[dbo].[logReports].[leadID]), DEFINE:([Bitmap1012]))
    | |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([whitebox].[dbo].[logReports].[leadID]))
    | |--Index Seek(OBJECT:([whitebox].[dbo].[logReports].[logReports_A]), SEEK:([whitebox].[dbo].[logReports].[received] >= '2007-12-12 00:00:00.000' AND [whitebox].[dbo].[logReports].[received] <= '2007-12-21 00:00:00.000') ORDERED FORWARD)
    |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([whitebox].[dbo].[logReports].[leadID]), WHERE:(PROBE([Bitmap1012])=TRUE) [IN ROW])
    |--Index Scan(OBJECT:([whitebox].[dbo].[logReports].[logReports_F]))

Share This Page