SQL Server Performance

Handling Cursor-Friendly Problems in T-SQL

Discussion in 'Performance-Related Article Discussions' started by shanetasker, Mar 24, 2006.

  1. shanetasker New Member

  2. Madhivanan Moderator

  3. mmarovic Active Member

    Yes, the scope is the same, but the solutions are different. Although this article was published a month after the one you mentioned, it was submitted in January. Btw, there have been a few articles about the running totals problem published in the past, some of them favoring O(N*N) solutions over the cursor, which is completely wrong.
  4. FrankKalis Moderator

    Nice one! :) It would be really nice if there were a link for downloading the whole script, to test it out on one's own machine.

    However, one thing worth mentioning is that the order of execution in an UPDATE statement is not guaranteed. It definitely isn't enough that there is a clustered index in place. Since we are not able to tie an ORDER BY to the UPDATE, you never know for certain that the rows are updated in the order you expect. I can't prove that with an example right now; however, this is the quintessence of several answers from members of the SQL dev team in the private MVP newsgroup.

    By the way, you haven't mentioned that this might very well be a case for front-end report generation.

    ...last, but not least, I wish MS had implemented this OLAP SUM() OVER... PARTITION stuff.

    --
    Frank Kalis
    Microsoft SQL Server MVP
    http://www.insidesql.de
    Blogged today? http://www.insidesql.de/blogs
    I support PASS Deutschland e.V. http://www.sqlpass.de
  5. mmarovic Active Member

    quote: Nice one!
    Thanks. :)

    quote: It would be really nice if there were a link for downloading the whole script, to test it out on one's own machine.
    You are right, I will do that next time.

    quote: However, one thing worth mentioning is that the order of execution in an UPDATE statement is not guaranteed. It definitely isn't enough that there is a clustered index in place. Since we are not able to tie an ORDER BY to the UPDATE, you never know for certain that the rows are updated in the order you expect.
    Well, I used a hack that is really not in line with how the SQL language is supposed to be used. My recommendation stays that for presentational issues the client application should handle such functionality. For one-time scripts populating new aggregate columns that didn't exist in the table initially, the techniques from the article can be used. An example is the recent thread about weather condition history, where a running average column had to be populated.

    quote: ...last, but not least, I wish MS had implemented this OLAP SUM() OVER... PARTITION stuff.
    Yes, I learned about it just after I made the last article correction and approved it to be published. :( It was really worth mentioning.
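    (For readers on a later version: the windowed aggregate wished for above did eventually arrive; SQL Server 2012 added ORDER BY and framing to the OVER clause of aggregates. A minimal sketch of what the running total then looks like, on a hypothetical dbo.Transactions table with illustrative column names:)

    SELECT pkAutoId,
           amount,
           -- running total up to and including the current row
           SUM(amount) OVER (ORDER BY pkAutoId
                             ROWS UNBOUNDED PRECEDING) AS runTotal
    FROM dbo.Transactions

    (No update order, cursor, or query hint is involved; the engine computes the total per row as part of the SELECT.)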
  6. FrankKalis Moderator

    quote:
    More about update order: Forcing the join order with a loop join makes possible just one execution plan, one that doesn't involve parallelism. The order of the rows selected is forced. I can't prove either that this order will be used for the update, but there is no sense in selecting all the rows to be updated in a specific order and then re-arranging them. Besides, I really have one application where I used the technique mentioned, and I always had correct results. I believe it is similar to the law of gravity: you can't prove it (besides it being internally consistent with other physical laws), but there is a good chance that it is correct.
    Yes, but what now seems to work might break in the future for a variety of reasons.
    I believe you're right in that the execution plan generated plays a very important role here. Execution plans containing operations like spools are viewed as "highly" risky, while others might be viewed as "less" risky. But, as I mentioned, the dev team recommends staying away from that, and those are the guys who should know.






    --
    Frank Kalis
    Microsoft SQL Server MVP
    http://www.insidesql.de
    Blogged today? http://www.insidesql.de/blogs
    I support PASS Deutschland e.V. http://www.sqlpass.de
  7. FrankKalis Moderator

  8. mmarovic Active Member

    I didn't pay attention, so I only realized later that you were talking about the solution with the clustered index only, so I deleted that part from my answer. However, the doubts you have, I had too. I came to the conclusion that unless the query optimizer is completely reworked, forcing the row order and data order, plus the loop join, should be enough to guarantee the results.

    I would really like to attract attention from Microsoft people, even if they prove I am wrong. I prefer knowing the truth (if there is one) over being right. :)
  9. mmarovic Active Member

    I have found this link: http://blogs.msdn.com/queryoptteam/archive/2006/03/24/560396.aspx
    It invalidates one of the "safe" solutions from the article. However, the solution with forcing the index works, because forcing the index forces the order, and the loop join algorithm makes sure that order is kept in the resulting data set. I'll contact Brad to make the appropriate changes in the article.
  10. FrankKalis Moderator

  11. ja928 New Member

    Hi gang, great article! I am trying to apply the technique described here, but I want to update TWO columns. I want to do a beginning and ending balance, but both fields are updating to the same value.

    UPDATE tmp
    SET EndBal = @RunBal,
        @RunBal = tmp.BegBal = CASE
            WHEN @AccountIsDebit = 1 THEN @RunBal - DrAmount + CrAmount
            WHEN @AccountIsDebit = 0 THEN @RunBal + DrAmount - CrAmount
        END
    FROM (SELECT TOP 100 PERCENT pkAutoId FROM @tbl ORDER BY pkAutoId DESC) as o
    INNER LOOP JOIN @tbl tmp ON o.pkAutoId = tmp.pkAutoId
    OPTION (FORCE ORDER) -- keep tables in order to force the update to run in order


    I start with a known value for the ending balance and I want to read backward to derive the beginning balance, and thus the ending balance for the previous transaction. I get the beginning balance, but my EndBal column is the same as my BegBal.

    Do I need to do this in two passes, or am I missing something in the syntax of the UPDATE? I didn't think it would work, but I tried moving the EndBal = @RunBal to the end of the update; I saw no difference. I would really appreciate a clarification or any advice.

    Jason Akin
  12. mmarovic Active Member

    Hi Jason.

    Before looking further into your problem, I must say that I'll try to change the article soon, to emphasize a few points more.

    The first point I want to make is that the optimal solution is to use SQL to return the raw data and use the client for the running totals calculation. That way you put the least pressure on the db server, receive all the information you need as quickly as possible, and then use client code to do the simple (simple for the client programming language, not for SQL) calculation and display the data.

    I mentioned the SQL solutions because I wanted to emphasize that cursor-friendly problems are better solved on the client, both because of the reasoning above and because of the drawbacks of each SQL solution mentioned in the article.

    To recapitulate:

    1. Cursor solution: slow and CPU hungry.
    2. Correlated query: even slower; the number of reads grows quadratically with the number of rows (see the sketch after this list).
    3. The first solution using the ordered update: the fastest SQL solution, but still less efficient than the client-side approach mentioned above. On top of that, it relies on behaviour that is not guaranteed.
    4. The solution you used, with "select top ... order by": it relies on the assumption that data will be updated in the order specified by the ORDER BY clause. While that is true on SQL Server 2000, it is not guaranteed on 2005 (see the link I already posted). The solution still works on 2005, but it is no more reliable than the first one. It will work on 2005 for exactly the same reason (the primary clustered index on the table variable), but it will be slower.
    5. The solution using an index on the source table is still reliable. It may become unreliable if Microsoft decides the query optimizer is close enough to perfect that there is no need for query hints any more, and decides to ignore them while still allowing the syntax so as not to break code from previous versions. I don't know if that will ever happen, but it is certainly a possibility.
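
    (As an illustration of point 2, here is a self-contained sketch of the correlated-query approach; the table and column names are illustrative, in the same shape as Jason's simplified repro later in the thread:)

    declare @t table(
        pkAutoId int identity(1,1) primary key clustered,
        crAmount money not null
    )
    insert into @t(crAmount) values(1)
    insert into @t(crAmount) values(3)
    insert into @t(crAmount) values(5)

    -- each output row re-reads every preceding row, hence O(N*N) reads in total
    SELECT t.pkAutoId,
           t.crAmount,
           (SELECT SUM(t2.crAmount)
            FROM @t t2
            WHERE t2.pkAutoId <= t.pkAutoId) AS runTotal
    FROM @t t
    ORDER BY t.pkAutoId

    (It is reliable, since nothing depends on update order; it is just expensive.)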

    So before looking further at your query, let me know whether you considered the points I made and decided that, for some reason, you still have to apply a pure SQL solution to the running total problem.

    Cheers,
    Mirko.
  13. ja928 New Member

    Wow, thank you for the quick response! I have thought about the points you raised. The SQL 2005 compatibility will not be an issue. I am considering the article's techniques as an intermediate step to my long-term solution.

    We are completing a conversion from another data processor. I have an existing procedure in place that uses a cursor to derive the Beginning and Ending balances. This has an acceptable response time currently, but I am about to bring a large amount of history online. Suddenly, my original solution falls flat.

    Regarding the suggestion to use the client for processing: the procedure is called in several places (some WinForms, some WebForms and many Crystal Reports). It involves a bit more business logic than shown in the example. I was hoping to maintain this logic in one place (the stored procedure) and deliver the data to clients already finished.

    My ideal scenario is to denormalize the data and write a function like this to update it once and serve it up without any need for computation. I had not seen the
    SET @variable = column = expression
    syntax before and was really intrigued! It will take a bit of time to update and verify the data for this scheme, and I hope to get some short-term performance gains in the meantime.

    I appreciate your help and advice.

    Jason Akin
  14. ranjitjain New Member

  15. Madhivanan Moderator

    quote: Originally posted by ranjitjain

    quote: Originally posted by Madhivanan

    Recently I read this, which has somewhat the same scope:
    http://sqlservercentral.com/cs/blogs/amachanic/archive/2006/02/28/508.aspx

    Madhivanan

    Failing to plan is Planning to fail

    Hi Madhivanan,
    The link is not working for me, is it the right one?
    A server error page comes up when you click on it.
    Yes, there seems to be a problem. Try tomorrow.

    Madhivanan

    Failing to plan is Planning to fail
  16. mmarovic Active Member

    Here is the "one-update" solution that works on the server I used for testing:


    declare @tbl table(
        pkAutoID int identity(1,1) primary key clustered,
        drAmount money not null,
        crAmount money not null,
        begBal money null,
        endBal money null
    )
    declare @runBal money
    declare @accountIsDebit bit
    set @accountIsDebit = 1
    set @runBal = 0

    insert into @tbl(drAmount, crAmount, begBal, endBal) values(1, 2, 0, 0)
    insert into @tbl(drAmount, crAmount, begBal, endBal) values(2, 4, 0, 0)
    insert into @tbl(drAmount, crAmount, begBal, endBal) values(4, 8, 0, 0)

    UPDATE tmp
    SET
        -- intended to read @runBal before it is reassigned below;
        -- this assignment ordering is the undocumented part
        EndBal = @runBal + case
            WHEN @AccountIsDebit = 1 THEN DrAmount - CrAmount
            WHEN @AccountIsDebit = 0 THEN -DrAmount + CrAmount
        END,
        -- SET @variable = column = expression assigns the expression
        -- to both the column and the variable
        @RunBal = tmp.BegBal = CASE
            WHEN @AccountIsDebit = 1 THEN @RunBal - DrAmount + CrAmount
            WHEN @AccountIsDebit = 0 THEN @RunBal + DrAmount - CrAmount
        END
    FROM (SELECT TOP 100 PERCENT pkAutoId
          FROM @tbl
          ORDER BY pkAutoId DESC) as o
    INNER LOOP JOIN @tbl tmp ON o.pkAutoId = tmp.pkAutoId
    OPTION (FORCE ORDER)

    select *
    from @tbl

    However, I recommend using two updates instead. The reason is that the dependency between the two column assignments is not documented anywhere (AFAIK), so it may be just a coincidence that the solution works on my server.
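
    (A minimal sketch of the two-update variant, using a simplified single-amount table in the spirit of Jason's repro below; the names @tbl2 and @runBal2 are illustrative. Only the first update depends on the update order, and it carries the same ordering caveats discussed above; the second pass is an ordinary set-based update, since each row's ending balance is just its beginning balance plus the credit amount:)

    declare @tbl2 table(
        pkAutoId int identity(1,1) primary key clustered,
        crAmount money not null,
        begBal money null,
        endBal money null
    )
    declare @runBal2 money
    set @runBal2 = 9

    insert into @tbl2(crAmount, begBal, endBal) values(1, 0, 0)
    insert into @tbl2(crAmount, begBal, endBal) values(3, 0, 0)

    -- pass 1: ordered ("quirky") update fills only begBal
    UPDATE tmp
    SET @runBal2 = tmp.begBal = @runBal2 - crAmount
    FROM (SELECT TOP 100 PERCENT pkAutoId
          FROM @tbl2
          ORDER BY pkAutoId DESC) as o
    INNER LOOP JOIN @tbl2 tmp ON o.pkAutoId = tmp.pkAutoId
    OPTION (FORCE ORDER)

    -- pass 2: order does not matter here
    UPDATE @tbl2 SET endBal = begBal + crAmount

    select * from @tbl2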

    quote:
    Regarding the suggestion to use the client for processing: the procedure is called in several places (some WinForms, some WebForms and many Crystal Reports). It involves a bit more business logic than shown in the example. I was hoping to maintain this logic in one place (the stored procedure) and deliver the data to clients already finished.

    My ideal scenario is to denormalize the data and write a function like this to update it once and serve it up without any need for computation. I had not seen the
    SET @variable = column = expression
    syntax before and was really intrigued! It will take a bit of time to update and verify the data for this scheme, and I hope to get some short-term performance gains in the meantime.
    In that case, I would rather create a DLL with a method that executes the procedure returning the raw data, then calculates the running totals, and returns a recordset (or whatever structure you prefer) to all the clients.
  17. ja928 New Member

    Thanks for the reply. I think we are proving the statements about the ORDER BY being unreliable. I made a simplified example to get rid of the account type and the drAmount field. What I'm trying to achieve is to start with the ENDING balance and read backward to derive the balances in reverse order. Notice that even though the ORDER BY is set to DESC, it updates the recordset in ascending order!

    I added a commented-out SELECT to test how the joined table appears when ordered.

    declare @tbl table(
        pkAutoID int identity(1,1) primary key clustered,
        crAmount money not null,
        begBal money null,
        endBal money null
    )
    declare @runBal money
    set @runBal = 9

    insert into @tbl(crAmount, begBal, endBal) values(1, 0, 0)
    insert into @tbl(crAmount, begBal, endBal) values(3, 0, 0)
    insert into @tbl(crAmount, begBal, endBal) values(5, 0, 0)
    insert into @tbl(crAmount, begBal, endBal) values(1.5, 0, 0)

    UPDATE tmp SET EndBal = @runBal, @RunBal = tmp.BegBal = @RunBal - CrAmount
    --SELECT * , @RunBal
    FROM (SELECT TOP 100 PERCENT pkAutoId
    FROM @tbl
    ORDER BY pkAutoId DESC) as o
    INNER LOOP JOIN @tbl tmp ON o.pkAutoId = tmp.pkAutoId
    OPTION (FORCE ORDER)

    select *
    from @tbl


    Here is my result set:

    pkAutoID  crAmount  begBal   endBal
    1         1.0000    8.0000   8.0000
    2         3.0000    5.0000   5.0000
    3         5.0000    .0000    .0000
    4         1.5000    -1.5000  -1.5000


    For this and other reasons, I will go for a denormalized approach to gather this data.

    Jason Akin
    CUInterface, LLC
  18. mmarovic Active Member

    Thanks for the good example, I may use it when I update the article.

    Btw, your problem can be solved by using a temp table and an appropriate index (see the sketch below), but as I said before, it is really better to solve it outside the db. Storing the aggregated value in the table is also a good solution when possible.
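
    (A sketch of what the temp-table-plus-index variant could look like, with illustrative names and Jason's sample data; the descending clustered index makes the desired update order coincide with the clustered index order. The MAXDOP and TABLOCKX safeguards are the ones commonly cited for this technique; as discussed throughout this thread, the update order still isn't documented or guaranteed:)

    CREATE TABLE #bal(
        pkAutoId int not null,
        crAmount money not null,
        begBal money null,
        endBal money null
    )
    -- descending clustered index: physical order matches the order
    -- in which we want the running balance computed
    CREATE UNIQUE CLUSTERED INDEX cix_bal ON #bal(pkAutoId DESC)

    INSERT INTO #bal(pkAutoId, crAmount, begBal, endBal) VALUES(1, 1, 0, 0)
    INSERT INTO #bal(pkAutoId, crAmount, begBal, endBal) VALUES(2, 3, 0, 0)
    INSERT INTO #bal(pkAutoId, crAmount, begBal, endBal) VALUES(3, 5, 0, 0)
    INSERT INTO #bal(pkAutoId, crAmount, begBal, endBal) VALUES(4, 1.5, 0, 0)

    DECLARE @runBal money
    SET @runBal = 9

    -- quirky update driven by the descending clustered index
    UPDATE tmp
    SET @runBal = tmp.begBal = @runBal - crAmount
    FROM #bal tmp WITH (INDEX(cix_bal), TABLOCKX)
    OPTION (MAXDOP 1)

    -- second, order-independent pass derives endBal
    UPDATE #bal SET endBal = begBal + crAmount

    SELECT * FROM #bal ORDER BY pkAutoId
    DROP TABLE #bal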
  19. mmarovic Active Member

    The article is updated now. Jason, thanks again for pushing me to do it sooner and for providing a good example.
  20. mmarovic Active Member

    Today I found an article on the same topic that covers the problem in much more detail. Also, the discussion there contains some more solutions, and insight into why the "quirky update" is not quite reliable. In my opinion, the only advantage of my article is that its conclusion is still correct: this problem is better solved on the client side. Here is the link: http://www.sqlservercentral.com/articles/T-SQL/68467/
  21. Madhivanan Moderator

    quote: Originally posted by mmarovic

    Today I found an article on the same topic that covers the problem in much more detail. Also, the discussion there contains some more solutions, and insight into why the "quirky update" is not quite reliable. In my opinion, the only advantage of my article is that its conclusion is still correct: this problem is better solved on the client side. Here is the link: http://www.sqlservercentral.com/articles/T-SQL/68467/
    Welcome back mmarovic :)

    In the article it was also mentioned how the quirky update will give the correct solution in specific cases.
  22. mmarovic Active Member

    Hi Madhi. I am glad to find you are still active here. :)

    About the correct solution in specific cases: please read the discussion about that article, especially Hugo's posts. The post where he mentions what happens when the data grows, and a date interval of a week or a month suddenly becomes tiny compared to the entire history stored in the table, made me embarrassed that I hadn't thought about that scenario myself. Also, since the time I actually used that method (it was a couple of years before I wrote the article), the query optimizer has improved a lot and started using many more techniques besides nested loop joins and clustered/non-clustered index scans/seeks.

    So again, calculate running totals on the client whenever possible. Even without the reliability issue, that approach will always be faster, more scalable and less resource intensive.
  23. Madhivanan Moderator

    As you know, "calculate the running total on the client" is my first answer whenever this type of question is asked :)
  24. mmarovic Active Member

    Yes, I remember. :) I just wanted to be as explicit as possible, because I still occasionally receive questions about how to solve a specific running totals problem in SQL.
