Select Based on List of Random Values — Dynamic SQL or CSV Parsing UDF?

Dynamic SQL Solution

The simplest, and very nearly the most efficient, solution uses dynamic SQL:

create procedure orders_sel_by_userID_orderStatusList
     @userId     int,
     @orderStatusList     varchar(100)
as begin
     set nocount on
     declare @query     nvarchar(1000)
     declare @parameters     nvarchar(100)
     -- the status list is concatenated into the statement text; the user ID remains a real parameter
     set @query = N'Select count(*) from orders where userID = @userID and OrderStatusID in (' + @orderStatusList + ')'
     set @parameters = N'@userID int'
     exec sp_executeSQL @query, @parameters, @userID = @userId
     set nocount off
end
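
A call like the one below (the parameter values are only examples) makes sp_executeSQL run the statement Select count(*) from orders where userID = @userID and OrderStatusID in (1,4,7):

-- example call; 42 and '1,4,7' are illustrative values
exec orders_sel_by_userID_orderStatusList @userId = 42, @orderStatusList = '1,4,7'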

Many Microsoft SQL Server specialists discourage the use of dynamic SQL because of the possibility of SQL injection attacks. In this case, however, the list of values would not be typed into a free text box; it would be generated by a UI control that offers a fixed set of choices, such as a group of check boxes, so the injection risk is limited.

The advantage of this solution is that each query contains concrete values, so the Query Optimizer has maximum information for its cost calculation.

One drawback of the dynamic SQL solution is that a separate execution plan is compiled for each distinct list. If the procedure is executed frequently with different lists of statuses, the resulting plan cache misses and inserts can cause high CPU usage.
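
On SQL Server 2005 or later you can observe this effect by querying the plan cache; the query below is a rough sketch (it assumes VIEW SERVER STATE permission, and the LIKE filter is only illustrative):

-- rough sketch: list cached plans produced by different status lists
-- requires VIEW SERVER STATE; the text filter is illustrative only
select st.text, cp.usecounts, cp.size_in_bytes
from sys.dm_exec_cached_plans cp
cross apply sys.dm_exec_sql_text(cp.plan_handle) st
where st.text like N'%OrderStatusID in (%'
order by cp.usecounts desc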

Another problem is that the caller needs SELECT rights on the tables involved to run such a stored procedure successfully, because ownership chaining does not apply to statements executed through dynamic SQL.
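
In other words, EXECUTE permission on the procedure is not enough; the caller (or its role) also needs something like the grant below (the role name is just an example):

-- example only: direct SELECT rights on the underlying table are required
grant select on dbo.orders to orders_app_role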

The optimal execution plan should use an index seek for each combination of the userID value and an OrderStatusID value. That way we would have the least number of reads. For example, if "1,4,7" is passed as the @orderStatusList value, the following three index seeks would be used:

  1. index seek on userID = @userID and orderStatusID = 1
  2. index seek on userID = @userID and orderStatusID = 4
  3. index seek on userID = @userID and orderStatusID = 7

This means that only index entries that satisfy the condition would be accessed.
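
This assumes a composite index with userID as the leading column followed by orderStatusID, for example something like:

-- assumed supporting index; the name and exact definition are illustrative
create index IX_orders_userID_orderStatusID on orders (userID, orderStatusID)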

When I tested the dynamic SQL implementation on different servers, the execution plans were not consistent. In most cases index scans were used. Comparing the number of reads with the other solutions, I concluded that most, if not all, index rows for the given userID value were read, and rows whose orderStatusID fell outside the list were then filtered out. In other words, all index entries with userID = @userID are read and their orderStatusID values are compared with the values from the list, so entries with orderStatusID values 2, 3, 5 and 6 are accessed too.
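
If you want to reproduce the comparison, one simple way to capture the number of reads is to wrap each call in SET STATISTICS IO (the parameter values below are only examples):

set statistics io on
exec orders_sel_by_userID_orderStatusList @userId = 42, @orderStatusList = '1,4,7'
set statistics io off

The messages output then reports the logical reads against the orders table, which can be compared across the different implementations.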

In order to avoid both the caching issue and a range scan based on the userID value alone, we need to put the values from the list into a table and join it with the main (orders) table. The next two solutions use that approach.

Solution Based on Table Function

The most popular solution for this kind of problem is joining the orders table with a table-valued function that parses the list of values. The code for the procedure is straightforward:

create procedure orders_sel_by_userID_orderStatusList
     @userId     int,
     @orderStatusList     varchar(100)
as begin
     Select count(*)
     from orders o
     join dbo.listOfValues(@orderStatusList) l on o.orderStatusID = l.value
     where o.userID = @userID
end

One possible implementation of the listOfValues table function is below:

CREATE FUNCTION ListOfValues (@List varchar(100))
RETURNS @ListOfValues TABLE (value int)
AS
BEGIN
     DECLARE @off int
     DECLARE @value varchar(4000)
     DECLARE @prevOff int

     SELECT @prevOff = 1
     SELECT @off = CHARINDEX(',', @List, 0)

     -- extract every value that is followed by a comma
     WHILE (@off > 0)
     BEGIN
          SELECT @value = RTRIM(SUBSTRING(@List, @prevOff, @off - @prevOff))
          SELECT @prevOff = @off + 1
          INSERT INTO @ListOfValues (value) VALUES (@value)
          SELECT @off = CHARINDEX(',', @List, @prevOff)
     END

     -- handle the last (or only) value, which has no trailing comma
     IF (LEN(@List) - @prevOff + 1 > 0)
     BEGIN
          SET @value = RTRIM(SUBSTRING(@List, @prevOff, LEN(@List) - @prevOff + 1))
          INSERT INTO @ListOfValues (value) VALUES (@value)
     END

     RETURN
END
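
A quick way to verify the function is to call it on its own; for the example list used earlier it should return three rows:

select value from dbo.ListOfValues('1,4,7')    -- returns 1, 4 and 7 as separate rows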

This solution had the worst performance of all the solutions I tested: it was both the most CPU intensive and the slowest. It appears that the overhead of parsing the list of values outweighs the benefit of having a single execution plan in the cache. A few other implementations of the function brought no significant performance improvement.

Continues…
