SQL Server Performance

Newbie,clustered index scan vs index seek

Discussion in 'Performance Tuning for DBAs' started by renwickyu, May 21, 2005.

  1. renwickyu New Member

    Hi everyone, I'm a newbie in SQL server and have something to ask your opinion.
    I have a table consist of 600.000 - 800.000 data, put an index on a column (say: CloseDate) and have this query

    select * from Table1 where CloseDate = 'somedate'

    and from execution plan I could see that query optimizer didn't use the index, it uses only clustered index scan. so I try to change it to

    select * from Table1 with (index(ix_table1_closedate)) where CloseDate = 'somedate'

    and it turns out that the query optimizer uses index seek + bookmark lookup.

    so can anybody suggest which one is better, I'm quite hintless which one to pick, or is it almost the same since the data is small? and anyway the table is a monthly data table, so it's quite small and the row is quite fix in number (around 600.000 - 800.000). If in this case Index seek won't do any better significantly, then should I avoid it? since using index might slow down insert/update/delete operation.

    thx a lot guys, sorry for my english.
  2. mmarovic Active Member

    I believe second plan produces faster execution when you have at least a few days data entered. Why don't you run both queries after cleaning cash several times and compare execution times?
  3. renwickyu New Member

    Ups, forget to mention that I did try to run both queries several times and found out that two of them didn't show a significant difference. Each query took around 3-4 seconds. so the difference is below 1 second. I do have all 30 days in related month entered into this table, so you think I should use second procedure? thx
  4. FrankKalis Moderator

    Generally Seeks are preferable to Scans.
    However, as you say this table stores monthly data, the selectivity of the index on dates is likely to be very low. I believe this to be the reason, why the optimiser opts for a Scan in your case and ignores the index at all. Furthermore I wouldn't force SQL Server here to use the index with the index hint. Read BOL for the architecture of indexes. If you force the use of a nonclustered index, SQL Server must make every time an additional step (the bookmark lookup) to find the data, while the clustered index scan is already at the data. What is the output when you run both queries with SET STATISTICS IO ON?

    --
    Frank Kalis
    SQL Server MVP
    http://www.insidesql.de
  5. mmarovic Active Member

    quote:However, as you say this table stores monthly data, the selectivity of the index on dates is likely to be very low.
    Selectivity number of days stored varies from 1 to 31, so if you have data for a few days using non-clustered index is effective enough to be used. Algorithms used by query optimizer tend to understimate effectivness of non-clustered index so forcing non-clustered index is optimization I use the most.


    quote:Ups, forget to mention that I did try to run both queries several times and found out that two of them didn't show a significant difference. Each query took around 3-4 seconds. so the difference is below 1 second. I do have all 30 days in related month entered into this table, so you think I should use second procedure? thx
    If you want to force exec plan then use second proc, otherwise don't force anything and allow query optimizer to choose the plan.
  6. renwickyu New Member

    hello guys, I've done what you ask, this is the result<br />before each query I run<br /><br /><br />DBCC DROPCLEANBUFFERS<br />DBCC FREEPROCCACHE<br /><br />to clear the cache, and then set <br /><br />SET STATISTICS IO ON<br />SET STATISTICS TIME ON<br /><br /><br />and here is for the ordinary query (clustered index scan) result:<br /><br />-----------------------------------<br /><br /><br />select * from CIFAcc200411WIthDate where OpenDateDay=15<br /><br />SQL Server parse and compile time: <br /> CPU time = 0 ms, elapsed time = 0 ms.<br /><br />SQL Server Execution Times:<br /> CPU time = 0 ms, elapsed time = 0 ms.<br />SQL Server parse and compile time: <br /> CPU time = 0 ms, elapsed time = 236 ms.<br />SQL Server parse and compile time: <br /> CPU time = 0 ms, elapsed time = 0 ms.<br /><br />(22030 row(s) affected)<br /><br />Table 'CIFAcc200411WithDate'. Scan count 1, logical reads 17071, physical reads 0, read-ahead reads 15406.<br /><br />SQL Server Execution Times:<br /> CPU time = 872 ms, elapsed time = 13241 ms.<br /><br />SQL Server Execution Times:<br /> CPU time = 872 ms, elapsed time = 13247 ms.<br /><br />SQL Server Execution Times:<br /> CPU time = 872 ms, elapsed time = 13263 ms.<br />SQL Server parse and compile time: <br /> CPU time = 0 ms, elapsed time = 0 ms.<br /><br />SQL Server Execution Times:<br /> CPU time = 0 ms, elapsed time = 0 ms.<br /><br /><br /><br />and here is for the force with index query (2nd plan) result:<br /><br />-----------------------------------<br />select * from CIFAcc200411WIthDate with (index (ix_cifacc200411Withdate))where OpenDateDay=15<br /><br />SQL Server parse and compile time: <br /> CPU time = 0 ms, elapsed time = 0 ms.<br /><br />SQL Server Execution Times:<br /> CPU time = 0 ms, elapsed time = 0 ms.<br />SQL Server parse and compile time: <br /> CPU time = 20 ms, elapsed time = 188 ms.<br /><br />(22030 row(s) affected)<br /><br />Table 'CIFAcc200411WithDate'. Scan count 1, logical reads 113476, physical reads 0, read-ahead reads 34.<br /><br />SQL Server Execution Times:<br /> CPU time = 6469 ms, elapsed time = 11597 ms.<br /><br />SQL Server Execution Times:<br /> CPU time = 6469 ms, elapsed time = 11613 ms.<br /><br />SQL Server Execution Times:<br /> CPU time = 6469 ms, elapsed time = 11638 ms.<br />SQL Server parse and compile time: <br /> CPU time = 0 ms, elapsed time = 0 ms.<br /><br />SQL Server Execution Times:<br /> CPU time = 0 ms, elapsed time = 0 ms.<br /><br /><br />so it said that the 2nd procedure need more logical reads (113476 to 17071), is it true that the 1st query is better? according to article "Use SET STATISTICS IO and SET STATISTICS TIME to Help Tune Your SQL Server Queries" in sql-server-performance.com<br />thx a bunch <img src='/community/emoticons/emotion-1.gif' alt=':)' />
  7. FrankKalis Moderator

    I would say yes. [<img src='/community/emoticons/emotion-1.gif' alt=':)' />]<br /><br />--<br />Frank Kalis<br />SQL Server MVP<br /<a target="_blank" href=http://www.insidesql.de>http://www.insidesql.de</a><br />
  8. mmarovic Active Member

    Second query reads more and uses cpu more, but is faster based on results you posted.

    I prefer the second one because of concurrency issues. Clustered index scan will read all rows in the table and updates may need to wait for sp to release shared locks.

    What I do like about the first one is that you leave decision to query optimizer. So you can do the same and go with second one if you see that increased locking slows down updates.
  9. mmarovic Active Member

    You can also use noLock hint if it is acceptable.
  10. renwickyu New Member

    I see, and since mmarovic mention about locking, there is another thing about this table. In my case, this table will only acts as "cache" table for my application. so after it has been processed (daily - after office) the next morning it won't be processed anymore (update/delete/insert) and it'll only be queried (select) by my application, so concurrency won't be an issue. <br /><br />so the conclusion is 1st plan: less disk read but slower, 2nd plan: more disk read but faster, any hint which element to sacrifice? <img src='/community/emoticons/emotion-1.gif' alt=':)' /> Is "elapsed time" a reliable (and absolute) unit to be used as a query measurement?<br /><br />And one more thing, is using noLock-hint faster than using no noLock-hint at all? thx <img src='/community/emoticons/emotion-1.gif' alt=':)' />
  11. tjjfv New Member

    quote:Originally posted by renwickyu

    I see, and since mmarovic mention about locking, there is another thing about this table. In my case, this table will only acts as "cache" table for my application. so after it has been processed (daily - after office) the next morning it won't be processed anymore (update/delete/insert) and it'll only be queried (select) by my application, so concurrency won't be an issue.

    Is this the primary query related to this table? If so, have you considered modifying the clustered index or creating an indexed view?


    quote:
    so the conclusion is 1st plan: less disk read but slower, 2nd plan: more disk read but faster, any hint which element to sacrifice?

    One factor to consider is that time related to CPU will not decrease due to the availability of pages in the cache, while time related to physical disk reads will.


    quote:
    Is "elapsed time" a reliable (and absolute) unit to be used as a query measurement?

    Ultimately, measurement should be in the context of testing actual workload scenarios. However significant optimization can be based off of measurement of a query's isolated performance and consideration of a query's overall influence upon other potenially concurrent queries in a workload through consideration of the utilization of shared resources such as CPU, disk, and locks.


    quote:And one more thing, is using noLock-hint faster than using no noLock-hint at all?

    Using NOLOCK will result in a minor benefit in decreased CPU from allowing the execution of the query to not involve the checking/setting/releasing of locks and possibly a major benefit in decreased latency by allowing greater concurrency though the trade off is no transaction isolation. However, if you do not expect any modifications when you are reading then you would not see the major benefit as no locking delay would be expected. Thus, since there would not necessarily be a significant benefit to using NOLOCK, it may be better to simply not use the NOLOCK hint.
  12. tjjfv New Member

    quote:I prefer the second one because of concurrency issues. Clustered index scan will read all rows in the table and updates may need to wait for sp to release shared locks.
    Even in the index seek, the default isolation level requires serializability (no phantom reads) and repeatable reads and thus should hold the same locks on the table necessary to avoid these.
  13. mmarovic Active Member

    Do you want to say that default isolation level is repeatable reads?
  14. tjjfv New Member

    quote:
    and here is for the ordinary query (clustered index scan) result:

    select * from CIFAcc200411WIthDate where OpenDateDay=15

    Table 'CIFAcc200411WithDate'. Scan count 1, logical reads 17071, physical reads 0, read-ahead reads 15406.

    ...

    and here is for the force with index query (2nd plan) result:

    select * from CIFAcc200411WIthDate with (index (ix_cifacc200411Withdate))where OpenDateDay=15

    Table 'CIFAcc200411WithDate'. Scan count 1, logical reads 113476, physical reads 0, read-ahead reads 34.

    I do not understand why the statistics for the execution of the second query indicate so few physical/read-ahead reads. This suggests that the logical reads referenced pages that were in the cache, which would decrease the overall execution time. Is there another explanation for this, or was the testing in error?
  15. mmarovic Active Member

    To clarify locking impact: Clusterd index scan will read all rows in the table, so it would put shared lock on all of them. Index seek will read just rows for the specified date so just rows for that date will be accessed and locked. It means if there is update at the same time it will have to wait for shared locks set by clustered index scan to be released. In case of index seek update will have to wait only if it updates rows for the date queried.
  16. tjjfv New Member

    quote:Originally posted by tjjfv_uphs
    ... the default isolation level requires serializability (no phantom reads) and repeatable reads ...
    This portion of my statement was completely incorrect, as the default isolation level is 'read committed' (I've been thinking about transaction/concurrency theory too much, and forgot about the practical considerations that justify a less isolated default level). Additionally, the overall statement was not as relevant as a consequence.
  17. mmarovic Active Member

    You were wrong about consequence as well as I explained in the previous post. And yes, noLock hint will have significant impact only if there are many updates at the same time as selects mentioned (which is the situation i refered to).
  18. tjjfv New Member

    quote:Originally posted by mmarovic

    You were wrong about consequence as well as I explained in the previous post.
    The first part is true, the second is not. An explanation of why I was wrong would have stated that repeatable read requires only that locks be held until the end of the transaction, thus there are no additional table/index locks being held, and serializable requires only key-range locks on the relevant indices, which would thus only impact updates related to those key-ranges.

    Furthermore, as the default isolation level of read committed does not require read locks to be held until the end of the transaction, the shared locks on additional records/pages accessed should be released immediately after being read, so it is not likely that any updates would be significantly delayed by the query in question. I think it would be more likely that the query in question would be delayed by an update holding an exclusive write lock until the transaction ends (committed/aborted).



    quote:Originally posted by mmarovic

    And yes, noLock hint will have significant impact only if there are many updates at the same time as selects mentioned (which is the situation i refered to).(which is the situation i refered to).
    I realized that is what you were referring to; I was just trying to clarify that further for the individual who started the thread who had asked further questions regarding the details of NOLOCK.


    Any thoughts on the lack of physical/read-ahead reads in the execution of the query using the index seek?
  19. mmarovic Active Member

    That is pretty good explanation. I have no idea about physical reads. I didn't want to comment that. It looks strange and I can't test it myself and even I don't have time right now, but I would like to hear explanation or at least theory <img src='/community/emoticons/emotion-1.gif' alt=':)' />
  20. mmarovic Active Member

    Is it possible that read-ahead reads (in case of clustered index scan) are not included in the statistic?
  21. tjjfv New Member

    quote:Originally posted by mmarovic

    Is it possible that read-ahead reads (in case of clustered index scan) are not included in the statistic?
    The statistics posed show read-ahed reads for the first query which was the one using the cluster index scan, but not for the second which was using the index seek. I think the original poster needs to confirm the second query statistics are accurate to the performance without making use of the data cache; running it immediately after a server restart would help ensure there are no other influencing factors.
  22. mmarovic Active Member

    I believe it is enough to execute checkpoint then to clear both data and proc buffers.
  23. tjjfv New Member

    quote:Originally posted by mmarovic

    I believe it is enough to execute checkpoint then to clear both data and proc buffers.
    It should be, but that is what the poster stated that they did, and the statistic do not seem to make sense. I would be interested to know what the results are after a restart, as well as after forcing the clear of the data buffer.
  24. gurucb New Member

    SET STATISTICS IO ON
    SET STATISTICS TIME ON


    and here is for the ordinary query (clustered index scan) result:

    -----------------------------------


    select * from CIFAcc200411WIthDate where OpenDateDay=15

    SQL Server parse and compile time:
    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:
    CPU time = 0 ms, elapsed time = 0 ms.
    SQL Server parse and compile time:
    CPU time = 0 ms, elapsed time = 236 ms.
    SQL Server parse and compile time:
    CPU time = 0 ms, elapsed time = 0 ms.

    (22030 row(s) affected)

    Table 'CIFAcc200411WithDate'. Scan count 1, logical reads 17071, physical reads 0, read-ahead reads 15406.

    SQL Server Execution Times:
    CPU time = 872 ms, elapsed time = 13241 ms.

    SQL Server Execution Times:
    CPU time = 872 ms, elapsed time = 13247 ms.

    SQL Server Execution Times:
    CPU time = 872 ms, elapsed time = 13263 ms.
    SQL Server parse and compile time:
    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:
    CPU time = 0 ms, elapsed time = 0 ms.



    and here is for the force with index query (2nd plan) result:

    -----------------------------------
    select * from CIFAcc200411WIthDate with (index (ix_cifacc200411Withdate))where OpenDateDay=15

    SQL Server parse and compile time:
    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:
    CPU time = 0 ms, elapsed time = 0 ms.
    SQL Server parse and compile time:
    CPU time = 20 ms, elapsed time = 188 ms.

    (22030 row(s) affected)

    Table 'CIFAcc200411WithDate'. Scan count 1, logical reads 113476, physical reads 0, read-ahead reads 34.

    SQL Server Execution Times:
    CPU time = 6469 ms, elapsed time = 11597 ms.

    SQL Server Execution Times:
    CPU time = 6469 ms, elapsed time = 11613 ms.

    SQL Server Execution Times:
    CPU time = 6469 ms, elapsed time = 11638 ms.
    SQL Server parse and compile time:
    CPU time = 0 ms, elapsed time = 0 ms.

    SQL Server Execution Times:
    CPU time = 0 ms, elapsed time = 0 ms.


    The second plan when using non clustered index is using more logical reads as it has book mark and then it reads from Clustered index key.

    i think if the number of rows returned by the query, then book mark lookup has to increase and second query will be as good as the first one.

    What about covered index if it is possible?


  25. mmarovic Active Member

    quote:i think if the number of rows returned by the query, then book mark lookup has to increase and second query will be as good as the first one.
    I think when number of rows in the table increases clustered index scan will produce more and more logical reads while non-clustered index range scan will have the same number of logical reads as now.
    quote:What about covered index if it is possible?
    It is possible but not recommended as far as you use select *.

Share This Page