Cursor tuning, parsing comma-separated list | SQL Server Performance Forums

SQL Server Performance Forum – Threads Archive

Cursor tuning, parsing comma-separated list

I have a table with three varchar fields containing comma-separated lists, and an integer key. Each of the lists contains 1 to about 10 values; in a single row, each list contains the same number of items; and the lists are ordered such that the 1st item of the 1st list is intended to be paired with the 1st item of the second list, and so on. The table contains about a 300,000 rows.<br /><br />A sample row would look something like:<br /><br />key list1 list2 list3<br />1 a,b,c 1,2,3 x,y,z<br /><br />The goal is to "unpack" the table into a new table so that we get something like the following:<br /><br />key value1 value2 value2<br />1 a 1 x<br />1 b 2 y<br />1 c 3 z<br /><br />and so on.<br /><br />In Access, I wrote a VB function to accomplish this using ADO Recordsets in basically a two-level nested loop structure. The first level looped through the rows of the original table; the second parsed the lists within the row, appending one row to a new table for each set of items parsed.<br /><br />In SQL Server, I attempted to write a stored procedure using the same logic. In the sp, I loop through the rows via a cursor, then use a 2nd While loop to parse the row, inserting a new row for each set of items parsed. Much to my suprise, the Access version works over ten times faster, finishing within about 8 minutes, writing an average of 161,613 rows per minute to the new table, whereas the SQL Server version writes an average about 13,551 rows per minute to the destination table! [xx(]<br /><br />Since both the number of items within the lists as well as the number of characters in each item are variable, I can’t think of a way to do this without a cursor.<br /><br />Is the performance difference I’m seeing typical of SQL Server cursors vs looping through ADO recordsets in Access VB? If not, would anyone have a thought as to what I might be doing wrong?<br /><br />Any advice would be greatly appreciated. <br /><br />Thank you! [<img src=’/community/emoticons/emotion-1.gif’ alt=’:)‘ />]<br /><br />Jay<br /><br /><br /><br /><br />
SQL Server cursors, if possible, should generally be avoided, as they are never good performers. If you have looked at the tips I have on cursors, you might find some ideas on how to improve the performance of your cursor. Ideally, you should avoid using a cursor, and instead use a derived query (there is an article on them on this website, or if that won’t work, try using temp tables. I don’t have enough information to determine which method is best for you, but I am confident that you can do what you want to do without using a cursor, using one of the other methods instead.
——————
Brad M. McGehee
Webmaster
SQL-Server-Performance.Com
Without giving this too much thought, you could use a query similar to the one below to split your comma-seperated fields. The performance of this is not too bad, and is much better than using a cursor or a while loop to split a string. This code will only split the first list (list1) – you could insert the results of this into a temp table, then repeat for the other lists and join the three temp tables… Shouldn’t be too hard to figure out.
— This piece of nastiness splits a comma seperated string and returns a table of values
SELECT ID,
NULLIF(SUBSTRING(‘,’ + list1 + ‘,’ , COMMA.Pos , CHARINDEX(‘,’ , ‘,’ + list1 + ‘,’ , COMMA.Pos) – COMMA.Pos) , ”) AS AppNum
FROM (
SELECT (A0.ID + A1.ID + A2.ID) AS Pos
FROM
(SELECT 0 ID UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION
SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) A0,
(SELECT 0 ID UNION SELECT 10 UNION SELECT 20 UNION SELECT 30 UNION SELECT 40 UNION
SELECT 50 UNION SELECT 60 UNION SELECT 70 UNION SELECT 80 UNION SELECT 90) A1,
(SELECT 0 ID UNION SELECT 100 UNION SELECT 200 UNION SELECT 300 UNION SELECT 400 UNION
SELECT 500 UNION SELECT 600 UNION SELECT 700 UNION SELECT 800 UNION SELECT 900) A2
) AS COMMA, tblLists
WHERE COMMA.Pos <= LEN(‘,’ + list1 + ‘,’)
AND SUBSTRING(‘,’ + list1 + ‘,’ , COMMA.Pos – 1, 1) = ‘,’
AND CHARINDEX(‘,’ , ‘,’ + list1 + ‘,’ , COMMA.Pos) – COMMA.Pos > 0
ORDER BY ID, COMMA.Pos

quote:Originally posted by bradmcgehee SQL Server cursors, if possible, should generally be avoided, as they are never good performers. If you have looked at the tips I have on cursors, you might find some ideas on how to improve the performance of your cursor. Ideally, you should avoid using a cursor, and instead use a derived query (there is an article on them on this website, or if that won’t work, try using temp tables. I don’t have enough information to determine which method is best for you, but I am confident that you can do what you want to do without using a cursor, using one of the other methods instead.

I have tried most of the cursor tuning recommendations posted on this site; but since the consensus seems to be that there is a way to do this without cursors, I’ll focus my attention there for now. Thank you very much for your advice. Jay

quote:Originally posted by arb Without giving this too much thought, you could use a query similar to the one below to split your comma-seperated fields. The performance of this is not too bad, and is much better than using a cursor or a while loop to split a string. This code will only split the first list (list1) – you could insert the results of this into a temp table, then repeat for the other lists and join the three temp tables… Shouldn’t be too hard to figure out.
— This piece of nastiness splits a comma seperated string and returns a table of values
SELECT ID,
NULLIF(SUBSTRING(‘,’ + list1 + ‘,’ , COMMA.Pos , CHARINDEX(‘,’ , ‘,’ + list1 + ‘,’ , COMMA.Pos) – COMMA.Pos) , ”) AS AppNum
FROM (
SELECT (A0.ID + A1.ID + A2.ID) AS Pos
FROM
(SELECT 0 ID UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 4 UNION
SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9) A0,
(SELECT 0 ID UNION SELECT 10 UNION SELECT 20 UNION SELECT 30 UNION SELECT 40 UNION
SELECT 50 UNION SELECT 60 UNION SELECT 70 UNION SELECT 80 UNION SELECT 90) A1,
(SELECT 0 ID UNION SELECT 100 UNION SELECT 200 UNION SELECT 300 UNION SELECT 400 UNION
SELECT 500 UNION SELECT 600 UNION SELECT 700 UNION SELECT 800 UNION SELECT 900) A2
) AS COMMA, tblLists
WHERE COMMA.Pos <= LEN(‘,’ + list1 + ‘,’)
AND SUBSTRING(‘,’ + list1 + ‘,’ , COMMA.Pos – 1, 1) = ‘,’
AND CHARINDEX(‘,’ , ‘,’ + list1 + ‘,’ , COMMA.Pos) – COMMA.Pos > 0
ORDER BY ID, COMMA.Pos

That is one dense piece of SQL! I will definitely give it a try. Thanks very much! Jay
]]>