adding TOP 500 to query makes it slower? | SQL Server Performance Forums

SQL Server Performance Forum – Threads Archive


Hey folks, I have a query that is behaving oddly. I'm hoping someone here can explain why.
The query is: Select top 500 *
from table
where state = 'wa' And (ID like 'a%e%joanne%') — Runs 22 sec. Returns 96 rows
— Execution plan:
— Total cost: .0334
— Execution plan step 1: Select
— Execution plan step 2: Top (cost: .004, 13%)
— Execution plan step 3: Clustered Index Seek (Cost: .029, 87%)
— Execution plan shows it using CL_ID_City_State (clustered) index.
The weird part is, if I remove the "top 500" and rerun:
— runs 2 sec. Returns 96 rows
— Execution plan:
— Total subtree cost: 18.7
— Execution plan step 1: Select
— Execution plan step 2: Parallelism/Gather Streams (cost: 7.5/41%)
— Execution plan step 3: Clustered Index Seek (cost: 11/59%)
I ran this query multiple times to make sure that caching is not affecting the performance. The one with the "TOP" runs 22 seconds consistently. The one without the TOP runs 2 seconds consistently. The table is 70M rows. The server is a 4-way, 900MHz PIII-Xeon with 8GB of RAM. The table has these indexes:
CL_ID_City_State (Clustered, primary filegroup)
NC_ID (NonClustered, secondary filegroup)
NC_City_State_ID (NonClustered, secondary filegroup)
NC_State_ID (NonClustered, secondary filegroup)
Primary and secondary filegroups are on the same file system. (Disclaimer: I did not design this database or the indexes.) For troubleshooting, I tried both queries again, forcing the system to use the NC_State_ID index. The results are:
select top 500 * from table with (INDEX(NC_State_ID)) where state = 'wa' And (ID like 'a%e%joanne%')
— Runs 3 sec. Returns 96 rows.
— Execution plan: Total subtree cost: 1.56
— Select – Top (cost: .0001, 0%) – Bookmark Lookup (1.55, 100%) – Index Seek (.00572, 0%)
— returns 96 rows in 3 sec
select * from table with (INDEX(NC_State_ID)) where state = 'wa' And (ID like 'a%e%joanne%')
— Runs 3 sec. Returns 96 rows.
— Execution plan: Total subtree cost: 1.56
— Select – Bookmark Lookup (2651, 100%) – Index Seek (4.17, 0%)
— returns 96 rows in 3 sec
Anyhow, can anybody shed light on why it runs slower with the "top 500" limiter? It seems to me that should only make it faster. Thanks, Steve

Sorry, I should add that the execution plan for the first query (with or without the top 500) says it uses the clustered index. Thanks! Steve
Maybe, just maybe, if you run UPDATE STATISTICS on this table the timings will change.
Luis Martin …Thus mathematics may be defined as the subject in which we never know what we are talking about, nor whether what we are saying is true.
Bertrand Russell
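Luis's suggestion, as a minimal T-SQL sketch (assuming the table is the `combined` table Steve queries later in the thread):

```sql
-- Rebuild the optimizer's statistics for the table. FULLSCAN reads every
-- row instead of sampling: slower, but gives the most accurate estimates.
UPDATE STATISTICS combined WITH FULLSCAN;
```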
Luis, thanks for the suggestion. I ran UPDATE STATISTICS against that table last night, then reran the query. Same results (22 sec runtime when cached; if I remove the TOP, 2 second runtime).
I understand why the query runs slow: the where clause limits the index's effectiveness to narrowing it down to 'a%', so it must scan everything in that range. But what I don't get is why the query is slower when I try to limit how many rows it returns. Any thoughts would be appreciated. Steve Sadler
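Steve's explanation can be sketched as an equivalent rewrite (illustrative only; `combined` is the table name he uses later in the thread, and this is not the optimizer's literal transformation):

```sql
-- Only the leading literal prefix of a LIKE pattern is sargable. The
-- embedded wildcards become a residual predicate checked row by row,
-- so the seek on 'a%' still touches every 'a...' row in the range:
select top 500 *
from combined
where state = 'wa'
  and ID >= 'a' and ID < 'b'      -- seek range derived from the 'a%' prefix
  and ID like 'a%e%joanne%'       -- residual filter on each row in the range
```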

How about DBREINDEX? _________
Satya SKJ
Moderator
SQL-Server-Performance.Com

Have you tried using SET ROWCOUNT instead of a TOP clause? I would be interested to see what happens then.
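A minimal sketch of the SET ROWCOUNT suggestion (table name borrowed from later in the thread):

```sql
-- SET ROWCOUNT caps the rows returned by subsequent statements. Unlike
-- TOP, it is a session setting rather than part of the query text, so
-- it can steer the optimizer differently. Always reset it to 0 after.
set rowcount 500;
select * from combined where state = 'wa' and (ID like 'a%e%joanne%');
set rowcount 0;
```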
Hey folks, this is a read-only database. The data was generated and then indexes were created, and from then on only selects have been issued against it. I suspect that this means I didn't need to reindex or do update statistics. Is this correct?
BTW, good idea about the set rowcount. I gave that a try with various rowcounts. The first four duplicate entries were to ensure caching would not skew my results. Note: there are only 96 rows that match the where clause.
Rowcount 0: 9 sec runtime (first run – data not cached)
Rowcount 0: 2 sec runtime (2nd run – data cached)
Rowcount 0: 2 sec runtime (3rd run – seems consistent)
Rowcount 0: 2 sec runtime (4th run – seems consistent)
Rowcount 500: 23 sec runtime
Rowcount 1: 1 sec runtime
Rowcount 50: 3 sec runtime
Rowcount 100: 27 sec runtime
Rowcount 200: 31 sec runtime
Rowcount 300: 25 sec runtime
Rowcount 400: 27 sec runtime
Rowcount 500: 25 sec runtime
Rowcount 500000: 2 sec runtime
I'm starting to conclude that SQL Server is optimizing the query: for a rowcount/top of 1-1000 it does something different, but by the time you hit 500000 it decides that it's not worthwhile and goes back to the non-rowcount plan. Next, I re-ran to get execution plans for rowcount=200 and rowcount=500000. The results were: set rowcount 200
select * from combined where state = 'wa' And (ID like 'a%e%joanne%') — Ran 31 sec
— Execution plan step 1: Select. Step cost: .0017 (11%)
— Execution plan step 2: Clustered Index Seek. Step cost: .0136 (89%)
set rowcount 0
set rowcount 500000
select * from combined where state = 'wa' And (ID like 'a%e%joanne%') — Ran 2 sec
— Execution plan step 1: Parallelism/Gather Streams. Step cost: 4.46/41%
— Execution plan step 2: Clustered Index Seek. Step cost: 6.43/59%
set rowcount 0
As always, any thoughts would be appreciated.
Have you tried the top 500 with the query hint MAXDOP 1? I'm guessing that SQL is assuming that you expect a large returned record set (since you specified top 500 records) and it's running parallel threads in hopes of making it more efficient. Without the top 500, it may choose a different optimization path. Chris
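Chris's MAXDOP suggestion as a sketch (again assuming the `combined` table from earlier in the thread):

```sql
-- OPTION (MAXDOP 1) forces a serial plan, removing parallelism from the
-- comparison between the TOP and non-TOP versions of the query.
select top 500 *
from combined
where state = 'wa' and (ID like 'a%e%joanne%')
option (maxdop 1);
```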