SQL Server Performance

a better loop

Discussion in 'T-SQL Performance Tuning for Developers' started by bolo, Feb 14, 2006.

  1. bolo New Member

    Hi there,

    Here is my table:

    ----------------------------------------------------
    CREATE TABLE [dbo].[MyTable] (
    [ColumnA] [int] NOT NULL ,
    [ColumnB] [datetime] NOT NULL ,
    [ColumnC] [decimal](5, 4) NOT NULL
    ) ON [PRIMARY]
    GO

    ALTER TABLE [dbo].[MyTable] WITH NOCHECK ADD
    CONSTRAINT [PK_MyTable] PRIMARY KEY CLUSTERED
    (
    [ColumnA]
    ) ON [PRIMARY]
    GO

    ----------------------------------------------------

    This table has about 20 mil records, and
    a loop over this table is required.
    Here is my method:
    ...code from inside a stored procedure
    ----------------------------------------------------
    DECLARE @VarB datetime
    DECLARE @VarC decimal(5,4)

    DECLARE MyCursor CURSOR FAST_FORWARD FOR
    SELECT ColumnB, ColumnC FROM MyTable

    OPEN MyCursor
    FETCH NEXT FROM MyCursor INTO @VarB, @VarC
    WHILE @@FETCH_STATUS = 0
    BEGIN
    --some processing
    --it has to read row by row and do something

    FETCH NEXT FROM MyCursor INTO @VarB, @VarC
    END
    CLOSE MyCursor
    DEALLOCATE MyCursor
    ----------------------------------------------------

    For a simple processing block it takes about 18 min to run on my machine
    (Athlon XP 2200, 760 MB RAM and 35 GB free space). My computer is very slow
    after executing this loop.

    Do you see any better solution to loop through these records?
    I made MyTable smaller (about 1 mil records) and the run time was 1 min 17 sec,
    which is more than 1/20 of 18 min. Does it really help to split my table?

    Thank you,

    Bolo
  2. Madhivanan Moderator

    What type of processing are you doing using cursors? Give more details

    Madhivanan

    Failing to plan is Planning to fail
  3. mmarovic Active Member

    Here is the alternative:

    select <some processing>
    from myTable
  4. FrankKalis Moderator

  5. ranjitjain New Member

  6. mmarovic Active Member

    I've seen the proposed method in many articles. Almost every one made the same claim that this method significantly improves performance over the cursor solution. However, my tests always showed comparable performance; sometimes cursors were even slightly faster. The slowness of the cursor comes mostly from the slowness of row-by-row processing, so replacing it with another row-by-row processing method doesn't help performance much, if at all.
  7. Madhivanan Moderator

    Replacing cursors with a WHILE loop is the only alternative to cursors, but it doesn't improve performance much. Only a set-based approach will play a major role. :)

    Madhivanan

    Failing to plan is Planning to fail
  8. Adriaan New Member

    Which takes us back to the original question ...

    Bolo,

    You have to give more details about the processing that you need to do. 9 times out of 10, processing can be done in a single UPDATE or INSERT statement with no looping through cursors.

    People who first start programming in T-SQL usually think in step-by-step processing like in procedural languages, but in most cases you can issue a single T-SQL statement to do the whole thing in a single step.

    You may well have that 1 in 10 example where you can't avoid a cursor, but without the details we can't tell. :)
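    Just to illustrate the point with a made-up example (MySecondTable and the filter are invented for illustration only, not Bolo's actual processing): a loop that reads each row and then writes qualifying rows elsewhere can usually be collapsed into one statement:

    -- hypothetical: copy qualifying rows to a second table in one set-based step
    INSERT INTO dbo.MySecondTable (ColumnB, ColumnC)
    SELECT ColumnB, ColumnC
    FROM dbo.MyTable
    WHERE ColumnC > 5.0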
  9. mmarovic Active Member

    quote: Originally posted by Madhivanan

    Replacing cursors with a WHILE loop is the only alternative to cursors, but it doesn't improve performance much. Only a set-based approach will play a major role. :)

    Madhivanan

    Failing to plan is Planning to fail

    I would express what you had in mind differently:

    Besides the data set approach, the WHILE loop is the only alternative to cursors. However, while the data set approach is much more efficient, row-by-row loops without a cursor are equally inefficient, sometimes marginally faster, sometimes marginally slower.
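    For reference, here is a minimal sketch (not from the original posts) of the kind of cursor-free row-by-row loop being compared here, keyed on the primary key of Bolo's table:

    DECLARE @ID int
    DECLARE @VarB datetime
    DECLARE @VarC decimal(5,4)

    -- start with the lowest key value
    SELECT @ID = MIN(ColumnA) FROM dbo.MyTable

    WHILE @ID IS NOT NULL
    BEGIN
        SELECT @VarB = ColumnB, @VarC = ColumnC
        FROM dbo.MyTable
        WHERE ColumnA = @ID

        --some processing, exactly as in the cursor version

        -- move to the next key value
        SELECT @ID = MIN(ColumnA) FROM dbo.MyTable WHERE ColumnA > @ID
    END

    It still touches one row per iteration, which is why it performs about the same as the cursor.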
  10. Madhivanan Moderator

    >>sometimes marginally faster, sometimes marginally slower

    Do you mean that it depends on the number of rows in the table?

    Madhivanan

    Failing to plan is Planning to fail
  11. mmarovic Active Member

    I don't know. Since there was no significant performance difference, I didn't investigate the reason. Even for some cursor-friendly problems I've found a data set method that performs better. However, the best solution is to resolve such problems (where one column value depends on values from previous rows) on the client. SQL code should return just enough information; the client (middle tier or presentation layer) should produce all derived information based on the raw data.
  12. mmarovic Active Member

    By "sometimes faster, sometimes slower" I mean that in one test the cursor was marginally faster almost all the time, and the next time (with a slightly different loop solution or test table structure) it was marginally slower.
  13. Adriaan New Member

    Let's not drown out Bolo on his first posting ...

    ... but anyway ...

    If you use cursors where you don't need them, in my experience you are also likely to use lookup queries to retrieve data related to the values you've fetched, etc. etc. That might not be too bad for a handful of rows being processed - then it quite possibly performs better than set-based - but most definitely not if you need to process bigger numbers of rows.

    Then again this might be one of those processes that can only be done in a cursor.
  14. FrankKalis Moderator

    quote: Let's not drown out Bolo on his first posting ...

    How nice! :)

    But indeed we should let Bolo explain what his

    quote: --some processing
    --it has to read row by row and do something

    is all about. Anything else is like a shot in the dark. :)

    --
    Frank Kalis
    Microsoft SQL Server MVP
    http://www.insidesql.de
    Blogged yet today? http://www.insidesql.de/blogs
  15. bolo New Member

    First of all I want to thank you for your feedback.

    “some processing” is something that has to be developed and will change from time to time. Here is a small example:

    MyTable is a historical table; ColumnB is a datetime and ColumnC is a decimal. Let's say ColumnC is the outside temperature for a certain date/time recorded in ColumnB. Temperature was measured every minute and that's why MyTable ended up having 20 mil records. ColumnA is an EntryID primary key and was incremented by 1 as new observations were added to MyTable. ColumnB is sorted ASC as data was added, so there is no need to sort on ColumnB.

    As MyCursor runs and @RecordDate and @Temperature (the @VarB and @VarC above) get values, there should be a process to write @RecordDate and @Temperature to a second table if, for example, @Temperature is greater than the average of the previous readings. If @RecordDate is a Sunday, then the average should look at previous Sundays only, and so on.
    This is what I call “some processing”. Requirements for “some processing” may change from time to time and the above example is just a small scenario.

    I found cursors interesting and more like a VB approach (see Adriaan's remark).
    As I said, my computer is slow after 20 mil loops even if statements like these
    CLOSE MyCursor
    DEALLOCATE MyCursor
    are used.
    Any feedback on this?

    And also what about having many historical tables instead of one? Like dividing MyTable into 20 MySmallTable having 1 mil records each?

    Thanks again and have a great day,

    Bolo
  16. Adriaan New Member

    So you want to find all rows where the recorded temperature in ColumnC is higher than the average temperature for all previously recorded temperatures for the same weekday?

    Agreed: this is a complex query. But it can be done in a single statement, using a so-called correlated subquery. Whether it performs better or worse than doing it one row at a time in a cursor is rather hard to predict, though I'm pretty sure it will do better.

    Perhaps this is what you're looking for ...

    SELECT T1.* FROM MyTable T1
    WHERE T1.ColumnC >
    (SELECT AVG(T2.ColumnC) FROM MyTable T2
    WHERE DATEPART(dw, T2.ColumnB) = DATEPART(dw, T1.ColumnB)
    AND T2.ColumnA < T1.ColumnA)

    ... or to be really tidy:
    AND T2.ColumnB < T1.ColumnB)

    ... as you should never assume that the values on the identity column are 100% dependable. Not unless you use a temp table with its own identity column and you're inserting the data ordered by the datetime.

    The first row for each weekday will not be included in the resultset, as the subquery should return NULL for them.

    As time goes by, the average will level out somewhat, which has an effect on how far out a record must be to top the average, so I'm not sure if the simple average is really what you're looking for.
  17. Twan New Member

    Hi ya,

    to get any performance out of the query, you'd possibly need to create:
    - an indexed view with the average temperatures for each weekday (since that would be 7 records...) - see the sketch after this list
    - appropriate indexes, possibly on computed columns
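    For illustration only, a rough sketch of such an indexed view (untested against this schema; the view and column names are made up). An indexed view cannot use AVG directly, so it stores SUM and COUNT_BIG and you divide at query time; DATEPART(dw, ...) is not allowed either, because it depends on SET DATEFIRST, so a deterministic weekday expression is used instead:

    CREATE VIEW dbo.WeekdayTemperature
    WITH SCHEMABINDING
    AS
    SELECT
        -- deterministic "day of week" (0..6), independent of DATEFIRST
        DATEDIFF(day, CONVERT(datetime, '19000101', 112), ColumnB) % 7 AS WeekDayNo,
        SUM(ColumnC)  AS TempSum,
        COUNT_BIG(*)  AS ReadingCount
    FROM dbo.MyTable
    GROUP BY DATEDIFF(day, CONVERT(datetime, '19000101', 112), ColumnB) % 7
    GO

    CREATE UNIQUE CLUSTERED INDEX IX_WeekdayTemperature
        ON dbo.WeekdayTemperature (WeekDayNo)
    GO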

    BUT what to do will depend on exactly what you are trying to achieve, so it's difficult to give exact answers and possible approaches. In general, as others have said, it is best to stay away from cursors if possible.

    Cheers
    Twan

  18. mmarovic Active Member

    The problem is similar to the running totals problem I wrote an article about. It should be published soon. I'll post the link here when it happens.

    Basically, there is a data set solution for the "running average problem" using an "ordered update". The first step is to insert the data into another table (you can use a temporary table) with the same structure except for an additional running average column. The next step is an update statement with the order of updates forced and local variables keeping the current aggregate value.

    This solution is faster than the cursor method, but 20 million rows may still be too much. It is probably a good idea to do the average calculations in chunks, but that doesn't mean you have to split the table. You can make 10000 calculations in one batch and repeat it until the complete work is done.

    If the datetime uniquely identifies rows in that historical table, there is no need to have an identity column as the primary key; you can use the datetime column instead. I assume that table is not referenced by other tables in the database.
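    As a minimal sketch of that ordered update for a plain running average (the work table and column names are made up, and in real code the update order has to be forced, e.g. with the index hint shown in the code a few posts further down):

    CREATE TABLE #Work (
        ColumnA    int          NOT NULL PRIMARY KEY CLUSTERED, -- drives the update order
        ColumnB    datetime     NOT NULL,
        ColumnC    decimal(5,4) NOT NULL,
        RunningAvg decimal(9,4) NULL
    )

    INSERT INTO #Work (ColumnA, ColumnB, ColumnC)
    SELECT ColumnA, ColumnB, ColumnC
    FROM dbo.MyTable

    DECLARE @sum decimal(18,4), @cnt int
    SELECT @sum = 0, @cnt = 0

    -- variables accumulate row by row as the update walks the clustered index
    UPDATE #Work
    SET @cnt = @cnt + 1,
        @sum = @sum + ColumnC,
        RunningAvg = @sum / @cnt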
  19. FrankKalis Moderator

    I'm looking forward to your article, Mirko! :)

    Being in the financial business, I've tried to implement some technical stock analysis in SQL Server. One thing was also the running average. I only had some 50 different stocks and a history going back 3 years, I think. Here's a bit of code I used:

    select
        TradeDate, TradePrice,
        (select avg(TradePrice*1.0)
         from Stocks t2
         where t2.TradeDate between dateadd(dd, -10, t1.TradeDate) and t1.TradeDate
        ) as moving_average
    from Stocks t1

    Performance was not really overwhelming when computing the 38-day and the 250-day running averages, even with a covering index. I ended up adding an indexed computed column, like Twan mentioned - so, breaking the theoretical rules and keeping redundant data. Another idea I never explored any further was to have another column with a UDF as DEFAULT, and in the UDF calculate the average at INSERT time.

    Just my $0.02 anyway. :)

    --
    Frank Kalis
    Microsoft SQL Server MVP
    http://www.insidesql.de
    Blogged yet today? http://www.insidesql.de/blogs
  20. mmarovic Active Member

    Frank, your solution is quadratic: the number of rows accessed is N * (N+1)/2, where N is the number of rows returned. A row-by-row solution is better since it has to access each row only once; that means even a cursor is better. The complete discussion and a faster data set solution are in the article. The main piece of code is:

    update t
        set @total = t.total = @total + t.amount
    from table t (index = <index that forces proper order>)

    or:

    update t
        set @total = t.total = @total + t.amount
    from (select top 100 percent a.id
          from table a
          order by <whatever order is needed>) as o
        inner loop join table t on o.id = t.id
    option (force order)
  21. FrankKalis Moderator

    Hey, I never claimed that I think my code is optimal. :)
    Probably a case for Analysis Services, but I'm not into that anyway. I finally ended up downloading the raw data into Excel and doing the stuff there. A whole lot easier, especially when doing comparisons between several stocks, drawing nicely coloured graphs for management and stuff like that. Running averages is just basic technical analysis. There are many very sophisticated methods which simply aren't made for implementing in a database.

    --
    Frank Kalis
    Microsoft SQL Server MVP
    http://www.insidesql.de
    Blogged yet today? http://www.insidesql.de/blogs
  22. mmarovic Active Member

    I agree, client-side processing of such a calculation is the best solution for sure.
  23. mmarovic Active Member

    For the problem at hand, where the running average is not a presentational issue but further processing is going to be done based on the historical data, I would add real columns with the calculated aggregations. In that case it doesn't matter how long it takes to calculate the aggregations initially; later processing based on them will be fast and not resource intensive.
  24. jamoldover New Member

    Here's a possible alternative that I use for some similar processing involving transaction activity at about 15K businesses/day: instead of building the average/max/min/count/etc. completely from scratch each time, I have a separate summary table which holds the previous day's summary totals. I run a summary view on the newest data only, and then compare/add it to the previous summary values. This gives me a constant running summary which I can then use in other processes. FWIW, there are about 7 million records in the live data table; updating the summary table with data for 1-week, 1-month, 6-month, 1-year, and lifetime numbers takes about 30-40 seconds total. No cursors involved at all.
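    A rough sketch of that incremental pattern, adapted to the temperature table in this thread (the summary table and column names are made up, and the sketch ignores weekdays that don't exist in the summary table yet):

    -- one row per weekday, maintained incrementally
    CREATE TABLE dbo.TempSummary (
        WeekDayNo      int           NOT NULL PRIMARY KEY,
        TempSum        decimal(18,4) NOT NULL,
        ReadingCount   int           NOT NULL,
        LastSummarised datetime      NOT NULL
    )

    DECLARE @cutoff datetime
    SELECT @cutoff = MAX(LastSummarised) FROM dbo.TempSummary

    -- fold only the readings added since the last run into the running totals
    UPDATE s
    SET TempSum        = s.TempSum + n.NewSum,
        ReadingCount   = s.ReadingCount + n.NewCount,
        LastSummarised = n.MaxDate
    FROM dbo.TempSummary s
    INNER JOIN (
        SELECT DATEPART(dw, ColumnB) AS WeekDayNo,
               SUM(ColumnC)          AS NewSum,
               COUNT(*)              AS NewCount,
               MAX(ColumnB)          AS MaxDate
        FROM dbo.MyTable
        WHERE ColumnB > @cutoff
        GROUP BY DATEPART(dw, ColumnB)
    ) n ON n.WeekDayNo = s.WeekDayNo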

  25. Informative New Member

    Perhaps just as valuable as knowing how to plow through 21k records that report the same temperature for six hours might be to consider that, as with Zip and other breakthrough technologies, we sometimes may wish to analyze our use of database space.

    Simply changing the database to only record and insert a new record every time the temperature 'changes' would effectively track all of the same information just as completely as blindly throwing down a new record every second, minute or whatever.

    Full historical data could be reconstructed or otherwise extrapolated by just recording the changes in temperature when they occurred.
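    For illustration, the insert side of that could be as simple as the sketch below (assuming readings arrive one at a time and ColumnA is an identity column, which the thread only implies; @RecordDate and @Temperature are the variables mentioned earlier):

    DECLARE @RecordDate datetime, @Temperature decimal(5,4)
    -- @RecordDate and @Temperature are set by whatever captures the reading

    -- store the new reading only if it differs from the most recent stored value
    IF NOT EXISTS (
        SELECT 1
        FROM dbo.MyTable t
        WHERE t.ColumnB = (SELECT MAX(ColumnB) FROM dbo.MyTable)
          AND t.ColumnC = @Temperature
    )
        INSERT INTO dbo.MyTable (ColumnB, ColumnC)
        VALUES (@RecordDate, @Temperature)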

    Just my 2 cents from the "less of the trees, more of the forest" perspective.
  26. Adriaan New Member

    Good point, Informative.

    In this case, such "change only" records should also record the datetime of the previous change, otherwise you'll have to do some heavy processing to determine the number of intervals, which you need to establish the average - if I'm not mistaken.
  27. mmarovic Active Member

    That solution may introduce its own complexity, depending on the statistics required and the exact table design. It may be feasible, but it is hard to judge without more details about the requirements and the solution.

    Anyway, the problem fits better in the analytical processing area; maybe it is not a bad idea to start a discussion there. I'm just reading Kimball's "The Data Warehouse Toolkit", and one of his first recommendations is: do not populate the fact table with zero measure values meaning "nothing happened".
  28. mmarovic Active Member

    quote: The problem is similar to the running totals problem I wrote an article about. It should be published soon. I'll post the link here when it happens.
    http://www.sql-server-performance.com/mm_cursor_friendly_problem.asp
