Dynamic SQL Solution
The simplest solution, and often among the most efficient, involves the use of dynamic SQL:
create procedure orders_sel_by_userID_orderStatusList
    @userId int,
    @orderStatusList varchar(100)
as begin
    set nocount on

    declare @query nvarchar(1000)
    declare @parameters nvarchar(100)

    set @query = N'select count(*) from orders where userID = @userID and orderStatusID in (' + @orderStatusList + N')'
    set @parameters = N'@userID int'

    exec sp_executesql @query, @parameters, @userID = @userId

    set nocount off
end
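For illustration, a call might look like the following sketch (the parameter values are assumptions, not taken from the article):

-- Hypothetical invocation: count orders for user 42 with status 1, 4 or 7.
exec orders_sel_by_userID_orderStatusList
    @userId = 42,
    @orderStatusList = '1,4,7'

The procedure then executes a statement whose IN list contains the literal values 1, 4 and 7.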
Many Microsoft SQL Server specialists discourage the use of dynamic SQL because of the possibility of SQL injection attacks. In this case, however, the list of values would not be typed into a text box; it would be generated by a control that offers a fixed set of choices.
The advantage of this solution is that each query contains concrete values, so the Query Optimizer has maximum information for its cost calculations.
One drawback of the dynamic SQL solution is that a separate execution plan is compiled for each distinct list. That may cause high CPU usage due to frequent plan cache misses and inserts if the query pattern involves many executions with different lists of statuses.
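One way to observe this effect is to inspect the plan cache through the standard dynamic management views sys.dm_exec_cached_plans and sys.dm_exec_sql_text; a sketch (the LIKE pattern is an assumption matching the query text built above):

-- Each distinct @orderStatusList produces its own cached plan for the
-- concatenated query text; list those plans with their reuse counts.
select st.text, cp.usecounts
from sys.dm_exec_cached_plans cp
cross apply sys.dm_exec_sql_text(cp.plan_handle) st
where st.text like N'%orderStatusID in (%'

Frequently executed lists will show up as separate rows, each with its own usecounts value.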
Another problem is that dynamic SQL breaks ownership chaining, so callers need direct read rights on the tables involved in order to run such a stored procedure successfully.
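Because the dynamically built statement runs in the caller's security context, granting execute rights on the procedure alone is not enough; a minimal sketch (the role name app_role is a placeholder):

-- Callers of the dynamic SQL procedure also need direct SELECT rights.
grant select on dbo.orders to app_role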
The optimal execution plan should use an index seek for each userID and orderStatusID value combination. That way we would have the least number of reads. For example, if "1,4,7" is passed as the @orderStatusList value, the following three index seeks would be used:
- index seek on userID = @userID and orderStatusID = 1
- index seek on userID = @userID and orderStatusID = 4
- index seek on userID = @userID and orderStatusID = 7
This means that only index entries that satisfy the condition would be accessed.
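Such a plan presupposes an index keyed on both columns; a minimal sketch, assuming the orders table from the article (the index name is a placeholder):

-- Composite index that allows one seek per (userID, orderStatusID) pair.
create index IX_orders_userID_orderStatusID
on orders (userID, orderStatusID)

With this key order, each value from the list translates into a narrow seek range under the given userID.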
When testing the dynamic SQL implementation on different servers, I did not get consistent execution plans. In most cases, index scans were used. Comparing the number of reads against the other solutions, I came to the conclusion that most, if not all, index rows for a specific userID value had been read, and then rows with orderStatusID values outside of the list had been filtered out. In other words, all index entries with userID = @userID are read, and each orderStatusID is compared with the values from the list; entries with orderStatusID values 2, 3, 5 and 6 are accessed too.
In order to avoid both the plan caching issue and the range scan based on the userID value alone, we need to put the values from the list into a table and then join it with the main (orders) table. The next two solutions use that approach.
Solution Based on Table Function
The most popular solution for this kind of problem is joining the orders table with the table function that parses a list of values. The code for the procedure is straightforward:
create procedure orders_sel_by_userID_orderStatusList
@userId int,
@orderStatusList varchar(100)
as begin
Select count(*)
from orders o
join dbo.listOfValues(@orderStatusList) l on o.orderStatusID = l.value
where o.userID = @userID
end
One possible implementation of the listOfValues table function is below:
CREATE FUNCTION ListOfValues (@List varchar(100))
RETURNS @ListOfValues TABLE (value int)
AS
BEGIN
    DECLARE @off int
    DECLARE @value varchar(100)
    DECLARE @prevOff int

    SET @prevOff = 1
    SET @off = CHARINDEX(',', @List, 0)
    WHILE (@off > 0)
    BEGIN
        -- Extract the value between the previous delimiter and the current one.
        SET @value = RTRIM(SUBSTRING(@List, @prevOff, @off - @prevOff))
        SET @prevOff = @off + 1
        INSERT INTO @ListOfValues (value) VALUES (@value)
        SET @off = CHARINDEX(',', @List, @prevOff)
    END
    -- Handle the last (or only) value, which has no trailing comma.
    IF (LEN(@List) - @prevOff + 1 > 0)
    BEGIN
        SET @value = RTRIM(SUBSTRING(@List, @prevOff, LEN(@List) - @prevOff + 1))
        INSERT INTO @ListOfValues (value) VALUES (@value)
    END
    RETURN
END
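Called on its own, the function simply splits the string into rows:

-- Splits the comma-separated string into one int row per element.
select value from dbo.ListOfValues('1,4,7')
-- returns the rows 1, 4 and 7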
This solution had the worst performance of all the solutions I tested: it was the most CPU intensive and the slowest. It looks like the overhead of parsing the list of values outweighs the gain of having a single execution plan in the cache. A few other implementations of the function brought no significant performance improvement.