Handling Cursor-Friendly Problems in T-SQL: Running Totals Example

Table Variable Solution

The next solution is the most efficient pure Transact-SQL one I have figured out so far. To reach that solution let’s look at Books Online (BOL), update statement topic, local variables in set clause:

@variable

Is a declared variable that is set to the value returned by expression.

SET @variable = column = expression sets the variable to the same value as the column. This differs from SET @variable = column, column = expression, which sets the variable to the pre-update value of the column.

That means we can use a stored procedure local variable to save the cumulative total and update the total column of the table variable we created to return the result set. Here is the code:

create procedure dbo.Sales_sel_by_StoreID_ProductID
               @StoreID int,
               @ProductID int
as begin

     set noCount on
     declare @report table(
               transactionID int primary key clustered,
               transactionTime dateTime not null,
               amount money not null,
               total money not null
          )

     declare @total money

     set @total = 0

     insert into @report(transactionID, transactionTime, amount, total)
     select a.transactionID,
          a.transactionTime,
          a.amount,
          0.0 as total
     from
          dbo.Sales a
     where
          a.StoreID = @StoreID and
          a.ProductID = @ProductID

     update r
          set @total = r.total = @total + amount
     from @report r

     select r.transactionID, r.transactionTime, r.amount, r.total
          from @report r
          order by r.transactionID

     set noCount off

end

This is the most efficient data set based solution! However, there is a small problem with this approach. How do we know that the update will be done exactly in the order of increasing transactionID? Is it enough to have a clustered primary key on that column to be safe? I think so, but I can’t prove it. A future service pack could change the way the execution plan for this procedure is built. Even though I don’t expect the way critical update executes to change any time soon, to be absolutely safe we have to force the update order more explicitly. One way to do that would be to use a derived table with an order by clause. An order by clause is not allowed in derived tables unless the “top” option is used too. So one possible “safe” solution is below:

create procedure dbo.Sales_sel_by_StoreID_ProductID
               @StoreID int,
               @ProductID int
as begin

     set noCount on
     declare @report table(
               transactionID int primary key clustered,
               amount money not null,
               total money not null
          )

     declare @total money

     set @total = 0

     insert into @report(transactionID, amount, total)
     select a.transactionID,
          a.amount,
          0.0 s total
     from
          dbo.Sales a (index=ix_sales)
     where
          a.StoreID = @StoreID and
          a.ProductID = @ProductID

     update r
          set @total = r.total = @total + amount
     from (select top 100 percent
               transactionID
           from @report
           order by transactionID) as s
     inner loop join @report r on r.transactionID = s.transactionID
     option (force order)

     select s.transactionID, s.transactionTime, s.amount, r.total
          from @report r
          join dbo.sales s on s.transactionID = r.transactionID
          order by r.transactionID

     set noCount off

end

However, if you try to change the order to descending you may be surprised by the result. The problem is that “select top N … order by” guarantees that the first N rows based on order will be returned, but the order they are returned is not guaranteed.

The only reliable way to force the update order is to force an index on the sales table:

create procedure dbo.Sales_sel_by_StoreID_ProductID
               @StoreID int,
               @ProductID int

as begin

     set noCount on
     declare @report table(
               transactionID int primary key clustered,
               total money not null
          )

     declare @total money

     set @total = 0

     insert into @report(transactionID, total)
     select a.transactionID,
          0.0 as total
     from
          dbo.Sales a (index=ix_sales)
     where
          a.StoreID = @StoreID and
          a.ProductID = @ProductID

     update r
          set @total = r.total = @total + amount
     from
          dbo.Sales s (index=ix_sales)
     join @report r on r.transactionID = s.transactionID
     where
          s.StoreID = @StoreID and
          s.ProductID = @ProductID
     option (force order)

     select s.transactionID, s.transactionTime, s.amount, r.total
          from @report r
          join dbo.sales s on s.transactionID = r.transactionID
          order by r.transactionID

     set noCount off

end

It was not enough to have the sales table rows ordered using the index and joined with the table variable. We had to make sure that the inner table used is a derived one. For that reason, we added option (force order). The force order hint doesn’t force the data order; instead, it actually forces the join order.

Quote from BOL:

FORCE ORDER

Specifies that the join order indicated by the query syntax is preserved during query optimization.

On top of that, we didn’t want any fancy join algorithm to change the order in which rows are updated, so we used a loop join hint.

This solution is slower than the first table variable based solution but it is still more efficient than a solution using a cursor. The solution will work as long as Microsoft supports the query hints used.

Conclusion

None of the solutions mentioned is the one I prefer. The solution I recommend is to return a “raw” data set to the client and let the client loop through rows and calculate the running total. The query returning “raw” data is the fastest and the least resource intensive. Implementation of running total calculation is fast and straight forward for programming languages used on the client side.

]]>

Leave a comment

Your email address will not be published.