SQL Server Performance

Very Slow Query - Very Strange

Discussion in 'Performance Tuning for DBAs' started by gmichetti, Nov 24, 2003.

  1. gmichetti New Member

    There is a query performance problem some developers came to me with.


    The query is for 4 different products, lets call them books. (they are not all books however.)


    Here is the query:


    THE QUERY:

    SELECT isnull(b.ean, N.ean) as EAN, isnull(n.onHandQty,0) as onHand,
    isnull( n.onOrderQty, 0) as OnOrder,
    isnull(P.ListPrice, b.Price) as ListPrice, P.SalePrice,
    b.Price as Price
    FROM books b (nolock)
    left join inventory N (nolock) on b.ean=n.ean and n.StoreNo= convert(int,substring(@@servername, 4,4))
    left join ( plist P (nolock) join pdiscounts M (NOLOCK)
    on ( p.Market=M.market and convert(char(11), P.StartDate, 102) <= convert(char(11), getdate(), 102)
    and convert(char(11), P.EndDate, 102) >= convert( char(11), getdate(), 102 ) )
    ) on N.Ean=P.ean
    WHERE b.ean in ( '9780375826689','9780385757959','9780385758093','9780375926686' )
    order by SalePrice, ListPrice

    This query takes almost 1 minute and a half on our production servers. Of course it runs faster the second time the query runs but that's not my point.
    If I choose any and I mean "ANY" other bookid, the results come back in seconds, less the 3 or four seconds. There are approx. 4 millions rows in the books table.

    The indexes have been rebuilt, there is no data corruption. The query works perfectly on any other bookid. What could the problem be? I deleted and reinserted the records and rebuilt the indexes. Same performance: BAD.




    Here's are some details. I'll provide more if anyone needs it.

    WINDOWS NT 4
    SQL SERVER 6.5 sp5a
    2 Pentium 2 proccesors
    512 MB of memory
    RAID 5



    Here's the showplan and stats I/O output for the FULL query:


    THE QUERY:



    showplan and stats I/O output

    STEP 1
    The type of query is INSERT
    The update mode is direct
    Worktable created for ORDER BY
    FROM TABLE
    books b
    Nested iteration
    Index : pk_books_ean
    LEFT OUTER JOIN : nested iteration
    FROM TABLE
    inventory N
    Nested iteration
    Using Clustered Index
    LEFT OUTER JOIN : nested iteration
    FROM TABLE
    plist P
    Nested iteration
    Using Clustered Index
    FROM TABLE
    pdiscounts M
    Nested iteration
    Table Scan
    TO TABLE
    Worktable 1
    STEP 2
    The type of query is SELECT
    This step involves sorting
    FROM TABLE
    Worktable 1
    Using GETSORTED Table Scan






    Table: books scan count 1, logical reads: 16170, physical reads: 9079, read ahead reads: 293
    Table: Inventory scan count 4, logical reads: 16, physical reads: 2, read ahead reads: 0
    Table: pList scan count 1, logical reads: 3, physical reads: 0, read ahead reads: 0
    Table: pDiscounts scan count 0, logical reads: 0, physical reads: 0, read ahead reads: 0
    Table: Worktable scan count 0, logical reads: 0, physical reads: 0, read ahead reads: 0


    The bookid column is the primary key but is a non-clustered index. The devleopers wanted it that way to use the clutstered index as a search string. (not my idea.)

    The question is really this: Everything being equal, why does this query run perfectly for any other bookid, except for these 4?



    gmichetti








  2. Luis Martin Moderator

    Try to run Index Tuning using SQL Analyzer and this Query to find out what are the recommendations.


    Luis Martin

    ...Thus mathematics may be defined as the subject in which we never know what we are talking about, nor whether what we are saying is true.
    Bertrand Russell
  3. bradmcgehee New Member

    Is the execution plan the same for the bad "book" as it is for the good "books"?

    -----------------------------
    Brad M. McGehee, MVP
    Webmaster
    SQL-Server-Performance.Com
  4. gmichetti New Member

    The execution plan is the same for each query.

    The index tuning is not a part of SQL SERVER 6.5. Or have I missed something?

    I did run the query through a couple of trace flags recommended in the book "Inside Sql Server 6.5" on page 728. Here are the trace flags used: 3604, 302, 310. There is a lot of output here and most of it fairly cryptic. I am looking it over now to see if there is anything useful.





    gmichetti
  5. satya Moderator

    True, Index tuning wizard is introduced from ver 7 onwards.
    For that trace information on 3604n no action is necessary unless you want to insert that row into the table. If so, you can drop and re-create the index without the unique option, or you can change the data causing the uniqueness violation.

    How often you reindex the index on this table(s)?

    _________
    Satya SKJ
    Moderator
    SQL-Server-Performance.Com
  6. ChrisFretwell New Member

    Very strange indeed. If you keep these 4 and add another one, is the query plan still the same? Is it still the same if you use only 1 of these ones and 3 other book ids? Of is it only when its these three ids?

    Also, likely not related to your concern, why do you convert your dates to character before comparing them in your join clause? Not that its a huge deal, but if your table fields are date formats, there is no reason to convert before you compare, and converting to compare can have a negative performance hit in some cases.

    Chris
  7. bradmcgehee New Member

    If the execution plans are the same, then the results, in regards to performance should be the same. Since they are not, this would imply to me some sort of bug or data corruption somewhere. Where, I don't know.

    I know you won't like this recommendation, but have you considered upgrading to SQL Server 2000? If money has been a problem, maybe this is a good reason to convince your boss to upgrade now.

    -----------------------------
    Brad M. McGehee, MVP
    Webmaster
    SQL-Server-Performance.Com
  8. gmichetti New Member

    satya,

    I don't think I follow your point. I have dropped and recreated the index. No difference. The example from the book is a select query.

    My point here , is one that I will need to reiterate again and again, is why does the one query perform slowly, while the other one does not? The execution plan is the same for both queries. However, the physical and read ahead reads seem to be execessive, even for the first time the query is run.


    Chris,
    Haven't tried that exactly, but I did break up the query with every combination of the 4. Like I said two of these book_ids are not books. Two of these book_ids are floor displays. They are not part of inventory, nor do they ever get reorders. It is bad data, not corrupt data. The best performance is when I only uses the two book_ids that are actually books. Usually comes back in 3 to 5 seconds. The problem we were having is the client app was timing out since the query was taking almost 90 seconds the very first time it was executed. And I should mention that this query is in a store procedured that executes based on user input. Essentialy, return all book_ids where the tile is 'LIKE' 'sometitle%'. And again, we don't have this type of problem with 'ANY' other query we know of.


    Brad,
    I thought of that. And we did check the data on our lab servers. Checked out ok. Now this is happening on ALL our production servers. And we have a lot of them. It could be that the data is actually corrupt, but we didn't find any problems on our lab servers. That's not to say that there isn't a problem on the production servers, but we have not checked yet. My gut tells me there is not any 'corrupt data'. It is in fact, bad data, we just should not have these two recrods in the system at all.



    We have upgraded to SQL SERVER 2000 on most of our production servers last year. But we still have quite a few SQL SEVER 6.5 servers out there. (You don't want to know how many!) I am hoping they will go away in the next year or so, but I have serious doubts about it. And I know there is going to be almost no support from MS on this. (Told everyone that last year.) I can forsee us still having 6.5 in production into 2005 and maybe beyound. It's a sad state of affairs, but that's what I have to deal with. Tried this query on the SQL Server 2000 servers and there is no problem whatsoever.

    There are many theories that have come up. The setup for the data devices is 14 devices over two disks with the log on another drive. It's all raid 5 so.... We don't hae the a mirrored set for the log devices. Is the data spread across two devices on different drives? Don't think so, and that realy shouldn't produce this type of really bad performance. There are four records here, and we are traversing on average over 9000 data pages and a few hundred read aheads? That does not make sense and if that is the way it is supposed to work, then it is a very, very bad design.


    A quick fix(suggested by me) is to delete those two (useless)records from the prodcution servers to improve the performance at once, and then deal with what the problem is. We have already called in Microsoft. This is one of those incidents where the quick fix will be implemented and I am afraid we may never find out why it happened. However, I am determinted to find out why, which is the reason I am posting here. So if I have missed something, or someone can point me in the right direction please let me know and I will post my findings and the answer to this here. In the end, it will help all of us if we all keep pounding away at these odd little problems that come up.


    PS

    Chris I agree with you about the comparision of the date fields. Again, I am just the DBA is who isn't always shown the code before it is in production. I have to figure out why there is a problem only after the fact. I try to keep track of the development servers and see what new code or objects they are coming up with. But there are more developers then there are DBAs in my group and more production servers than you can shake a stick at. You simply end up learning how to thread the needle on the fly while you're forever sliding down a vortex of spindle.


    gmichetti





  9. ChrisFretwell New Member

    gmichetti, try changing the date compares to not convert and see what happens. It cant hurt.

    I had a similar issue in a 6.5 installation and cant remember if it was ever resolved (we upgraded the server). This one was involved a date field in the query. It was a vender supplied application and we couldnt change that part at all. But it was a similar thing, if we put in a date value earlier than jan 1, 1998 (dont ask me why that date) the query would time out. Any other date value and it was fine.
  10. Twan New Member

    SQL 6.5 has a kown problem with trying to query with where clauses that have data off the end of an index. SQL will actually resort to a table scan

    Cheers
    Twan
  11. richmondata New Member

    First, stats i/o seems to show an index SCAN of the books table pk_books_ean (at best). The tipoff is the scan count = 1. If the optimizer was using its OR STRATEGY to process the IN-list (which you would expect), you should see 4 scans (one each per key value). When I worked with 6.5, I would try to turn my in-lists into temp tables and then join from the temp table instead of having the WHERE/IN clause. Alternately, you could convert the in-list to an explicit set of OR clauses (thereby attempting to lure the OR STRATEGY out of hiding). Another suggested remedy "back in the day" was to turn the in-list into n-queries UNION'd (or UNION ALL'd) together.

    Speculation as to why this is happening for only these values: it's probably due to some estimates based on the statistics for the index. All four key values share their first 5 high-order digits with the 6th digits being either '7' or '8'. I also remember that the SQL Server optimizer for 6.5 got strange (i usually say 'lobotomized' with one of the last service packs, with something tantamount to an admission of that in the 6.5 KB once upon a time).
  12. satya Moderator

    But SP5a is the last and latest on SQL 6.5 and instead you may need to follow the tips listed above on this website.

    _________
    Satya SKJ
    Moderator
    SQL-Server-Performance.Com
  13. gmichetti New Member

    Thanks again to everyone for your replies.

    Ok so here's what I did.

    Since I suspected a problem of some kind with the indexes, I thought I would rebuild them.

    Now what we normally do is use DBCC REINDEX to rebuild them, but I have seen at times that it doesn't really help. Not sure why. Even on SQL 2000. It might be that if there is some problem with the data the DBCC exits gracefully but I'm not sure if that is true. Anyone know the difference if any, between using DBCC REINDEX as oppposed to dropping and then recreating the indexes?

    Now we did try to simply drop and recreate the indexes but this did not help either. SO...

    I took a drastic measure, I bulk copied all of the data out. Over 4 million rows. dropped all of the indexes and truncated the table. I then loaded the data back and put on the indexes. This took approx. 2 1/2 hours.

    I then ran the same query and this is this is the result:

    Table: title scan count 4, logical reads: 25, physical reads: 0, read ahead reads: 0
    Table: Worktable scan count 4, logical reads: 24, physical reads: 4, read ahead reads:0

    So now that this seems to fix this problem I have to run this on all of our servers. There are a lot of them!!! I'm sure that doing the 'refresh' of the data and reallocating the data and index pages would probably help, but I didn't want to be forced to do that.

    Now I don't have a very big maintence window during the overnight processing and doing what I did here can be problematic. What do you guys think? And if you could also comment on the pros and cons of dbccc dbreindex and simply dropping and recreating them.



    thanks
    guy






  14. richmondata New Member

    The new scan count of 4 shows that after the reorg, the optimizer did use the OR STRATEGY (essentially breaking an in-list into n separate queries, then unioning the results). This is also demonstrated by the 4 worktables. Why the optimizer only did this after the reorg and not before is puzzling. I think the strategy of loading the in-list values into a temp table, then joining from the temp table should prompt the optimizer to do the right thing. If this approach helps, then you may be able to live without the full-blown reorg.

    Another approach to the reorg (if needed) would be to create another table with the same definition as the original table (minus the constraints, triggers, and indexes at first). You may want to put this table on a different device/segment to be sure you're putting it into non-fragmented disk space. Load the new table using an insert/select from the old table with an order by on the values from the clustered index. After that, create the clustered index on the new table using the WITH SORTED_DATA (or SORTED_DATA_REORG) option. Next, add the other indexes, triggers, and constraints (using WITH NOCHECK for any FK's). Some care may need to be taken when naming constraints on the new table. When you're done, switch table names using sp_rename. This should be faster than the bcp option, but I'm not sure by how much.

Share This Page