SQL Server Performance Forum – Threads Archive

String parsing

Hi, can someone please tell me a fast way to do string parsing in T-SQL? Here's my problem: one of my tables contains a rule_id column that looks like attribute1_val:attribute2_val:attribute3_val:… The number of colon-separated (inter) attributes is always the same. An attribute can also hold several comma-separated (intra) values, like: attribute1_val,attribute1_val11,attribute1_val12:attribute2_val:attribute3_val:… These attributes are essentially user attributes, and another table holds the attributes for each user. So in the rules table I can have a column value like 2:DA:1,2:#: .. etc., which means: show this content to users whose attribute_1 is 2, attribute_2 is DA, attribute_3 is in (1,2), attribute_4 is # (which means ignore this attribute), and so on. Can someone help me with a parsing query to match this up? I was also thinking of splitting the rules table into another table that has all these attributes in columns (which can be indexed). Is that a better option than running the parsing script? Please let me know if I haven't explained this fully, since it is a little tricky. Thanks, Nitin

You would definitely need to explain a bit more, but from what I understand now, your table violates First Normal Form. A column should contain just one scalar value, not an array of values. T-SQL isn't the language of choice for string parsing. You should really consider having another table for that data.
--
Frank Kalis
Microsoft SQL Server MVP
http://www.insidesql.de
(I support PASS Deutschland e.V. http://www.sqlpass.de)
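As a minimal sketch of the "another table" idea, assuming each rule can be broken apart once when it is loaded: one row per rule, attribute position and allowed value, so every column holds a single scalar that can be indexed. The table dbo.rule_attribute, its columns and the integer surrogate rule_key are all hypothetical.

```sql
-- Hypothetical normalized layout: one row per (rule, attribute position, allowed value).
CREATE TABLE dbo.rule_attribute (
    rule_key int         NOT NULL,  -- hypothetical surrogate for the delimited rule_id
    attr_pos int         NOT NULL,  -- 1 = attribute_1, 2 = attribute_2, ...
    attr_val varchar(50) NOT NULL,  -- '#' = ignore this attribute
    CONSTRAINT PK_rule_attribute PRIMARY KEY (rule_key, attr_pos, attr_val)
);

-- One rule whose first four attributes are 2, DA, 1 or 2, and "any value".
INSERT INTO dbo.rule_attribute (rule_key, attr_pos, attr_val) VALUES (1, 1, '2');
INSERT INTO dbo.rule_attribute (rule_key, attr_pos, attr_val) VALUES (1, 2, 'DA');
INSERT INTO dbo.rule_attribute (rule_key, attr_pos, attr_val) VALUES (1, 3, '1');
INSERT INTO dbo.rule_attribute (rule_key, attr_pos, attr_val) VALUES (1, 3, '2');
INSERT INTO dbo.rule_attribute (rule_key, attr_pos, attr_val) VALUES (1, 4, '#');

-- The intra values of attribute_3 are now separate rows: returns '1' and '2'.
SELECT attr_val
FROM dbo.rule_attribute
WHERE rule_key = 1 AND attr_pos = 3
ORDER BY attr_val;
```

With this shape, "attribute_3 is in (1,2)" becomes an ordinary join or EXISTS against indexed rows rather than string parsing at run time.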

Hi Frank, thanks for the response. Here's the thing: I cannot change the base structure of the table because of a third-party limitation; the vendor's system holds these values in this colon-separated form. Currently we go through a lot of looping so that, at run time, the user's attributes are matched against the parsed values for each attribute. I was thinking of not doing this at run time, and instead creating a master table that holds rule_id and user_id pairs; at run time I would join rule_id in the master table with rule_id in the original table and get the results faster. Your thoughts? Nitin
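A minimal sketch of that master-table idea, using temp tables as hypothetical stand-ins for the vendor's rules table and the new mapping table (names, columns and sample data are assumptions). How the mapping gets filled is deliberately left out: the existing parsing and matching logic would run once, offline, whenever rules or user attributes change, rather than on every request.

```sql
-- Hypothetical stand-in for the vendor's rules table (name and columns assumed).
CREATE TABLE #rules (
    rule_id varchar(800) NOT NULL PRIMARY KEY,  -- delimited rule string, unchanged
    content varchar(100) NOT NULL
);

-- Precomputed mapping: one row per (user, matching rule).
CREATE TABLE #rule_user (
    user_id int          NOT NULL,
    rule_id varchar(800) NOT NULL,
    PRIMARY KEY (user_id, rule_id)
);

INSERT INTO #rules VALUES ('100;4;SE;#;#;#;#;#;#;#', 'Content A');
INSERT INTO #rules VALUES ('100;5;SE;#;#;#;#;#;#;#', 'Content B');

-- Filled offline by the existing matching logic (not shown).
INSERT INTO #rule_user VALUES (42, '100;4;SE;#;#;#;#;#;#;#');

-- Run time: a plain indexed join, no string parsing.
SELECT r.rule_id, r.content
FROM #rule_user AS ru
JOIN #rules AS r ON r.rule_id = ru.rule_id
WHERE ru.user_id = 42;
```

The trade-off is freshness: the mapping must be rebuilt when a rule or a user's attributes change, in exchange for a run-time query that touches only the index.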

I once built a system that had a fixed-length alphanumeric code with some business rules built in (don't ask why, that's what the client wanted). For basic 'data integrity checking' I had an auxiliary table that listed each valid entry for each position. The checking was done by reconstructing the original code like this:

DECLARE @Code varchar(20), @Check varchar(20)
SET @Code = 'ABC123'
SET @Check = ''

-- Append each position's character only if it is a valid entry for that position.
-- Note: SQL Server does not formally guarantee that ORDER BY is honoured in this kind of variable concatenation.
SELECT @Check = @Check + aux.Entry
FROM AuxiliaryTable AS aux
WHERE SUBSTRING(@Code, aux.Position, 1) = aux.Entry
ORDER BY aux.Position

-- Any invalid character is missing from @Check, so the two strings differ.
IF @Code <> @Check
BEGIN
    PRINT 'check failed'
END

Another table listed valid combinations of characters as wildcard strings in exactly the format you would use in a LIKE clause (use ! for solving the NOT issue, and lots of square brackets), plus the starting point and number of positions of the matching string in the code. We verified that the entered code matched all the wildcard strings. I seem to remember lots of double-negative syntax in EXISTS clauses, because there would be a wildcard string for one set of possible values at position X, then another for a different set of possible values at position X, and so on. It took a lot of to-and-fro with the client to get the business rules right, and it turned out their system needed a bit of cleaning up, but we got there in the end.
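The second, pattern-based check could look roughly like the sketch below. It is a reconstruction from the description, not the original code: the table #PatternTable, its columns and the sample patterns are hypothetical, and it uses SQL Server's own [^...] syntax for a negated character class inside LIKE. The double negative is the NOT EXISTS over rows whose slice is NOT LIKE its pattern.

```sql
-- Hypothetical pattern table: each row is a LIKE pattern for one slice of the code.
CREATE TABLE #PatternTable (
    StartPos int         NOT NULL,  -- first position of the slice
    SliceLen int         NOT NULL,  -- number of positions in the slice
    Pattern  varchar(50) NOT NULL   -- LIKE pattern the slice must match
);
INSERT INTO #PatternTable VALUES (1, 3, '[A-Z][A-Z][A-Z]');  -- positions 1-3: letters
INSERT INTO #PatternTable VALUES (4, 3, '[0-9][0-9][0-9]');  -- positions 4-6: digits
INSERT INTO #PatternTable VALUES (4, 1, '[^0]');             -- position 4: not zero

DECLARE @Code varchar(20)
SET @Code = 'ABC123'

-- Valid when no pattern row fails to match its slice of the code.
IF NOT EXISTS (
    SELECT *
    FROM #PatternTable AS p
    WHERE SUBSTRING(@Code, p.StartPos, p.SliceLen) NOT LIKE p.Pattern
)
    PRINT 'code is valid'
ELSE
    PRINT 'code is invalid'
```

Several rows for the same position (here position 4) simply add constraints, since every row must match.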

nattynatty, your vendor has a very unfortunate design there. The data from that column should be split across one column per value. If it were me, I guess I would look at using the CHARINDEX() function to locate the delimiters, then SUBSTRING() to pull the value from between them.
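For a single field, that CHARINDEX()/SUBSTRING() approach might look like this minimal sketch (the variable names are illustrative): find the delimiters before and after the wanted value, then take what lies between them.

```sql
-- Pull the 2nd ';'-delimited field out of a rule string.
DECLARE @rule varchar(100), @p1 int, @p2 int
SET @rule = '100;4;SE;#'
SET @p1 = CHARINDEX(';', @rule)           -- delimiter that ends field 1
SET @p2 = CHARINDEX(';', @rule, @p1 + 1)  -- delimiter that ends field 2 (search starts after @p1)
SELECT SUBSTRING(@rule, @p1 + 1, @p2 - @p1 - 1) AS field2  -- '4'
```

The last field has no closing delimiter, so @p2 would be 0 there; appending one delimiter to the string before searching avoids that special case.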

I have created a split function:

create function dbo.pfn_split (@list nvarchar(2000), @spliton nvarchar(5), @whichsplit int)
returns nvarchar(1000)
as
begin
    declare @value nvarchar(1000)
    declare @num int
    set @num = 0
    -- consume one field per iteration until the requested one is reached
    while charindex(@spliton, @list) > 0
    begin
        set @value = ltrim(rtrim(substring(@list, 1, charindex(@spliton, @list) - 1)))
        set @num = @num + 1
        if @num = @whichsplit
            return @value
        -- drop the field just read, plus its delimiter
        set @list = substring(@list, charindex(@spliton, @list) + len(@spliton), len(@list))
    end
    -- no delimiters left: what remains is the last field
    if @num + 1 = @whichsplit
        return ltrim(rtrim(@list))
    -- @whichsplit is past the end of the list
    return null
end

It returns values like this:

select dbo.pfn_split('100;4;SE;#;#;#;#;#;#;#', ';', 2)   => 4
select dbo.pfn_split('100;4:5;SE;#;#;#;#;#;#;#', ';', 2) => 4:5

This is how I am calling it in my T-SQL code:

declare @rules_string varchar(1000), @rules_ex_string varchar(1000), @role_string varchar(1000),
        @cluster_string varchar(1000), @division_string varchar(1000), @region_string varchar(1000),
        @district_string varchar(1000), @dce_string varchar(1000), @masters_string varchar(1000),
        @state_string varchar(1000)
begin
    set @rules_string = '100;3;SE;#;#;#;#;#;#;#'
    set @role_string = dbo.pfn_split(@rules_string, ';', 1)
    set @cluster_string = dbo.pfn_split(@rules_string, ';', 2)
    set @division_string = dbo.pfn_split(@rules_string, ';', 3)
    set @region_string = dbo.pfn_split(@rules_string, ';', 4)
    set @district_string = dbo.pfn_split(@rules_string, ';', 5)
    set @dce_string = dbo.pfn_split(@rules_string, ';', 7)
    set @masters_string = dbo.pfn_split(@rules_string, ';', 8)
    set @state_string = dbo.pfn_split(@rules_string, ';', 9)

    -- '#' means "ignore this attribute": replacing it with the column's own value
    -- makes that test true for any non-NULL value
    select *
    from pfn_app_prd.VW_PFN_FIELD_REP a
    where a.role_cd in (replace(@role_string, '#', a.role_cd))
      and a.cluster_cd in (replace(@cluster_string, '#', a.cluster_cd))
      and a.division_cd in (replace(@division_string, '#', a.division_cd))
      and a.region_cd in (replace(@region_string, '#', a.region_cd))
      and a.district_cd in (replace(@district_string, '#', a.district_cd))
      and a.is_dce in (replace(@dce_string, '#', a.is_dce))
      and a.masters_fg in (replace(@masters_string, '#', a.masters_fg))
      and a.mailing_state in (replace(@state_string, '#', a.mailing_state))
end

This works fine when there are no intra-value separators, but when there are, matching against the user attributes does not return the right results: IN with a single expression is just an equality test against one string, not a list of quoted values. In other words, if the rule_id is '100;4:5;SE;#;#;#;#;#;#;#', the cluster string will be '4:5', which will always fail, because the condition on a.cluster_cd would need to be something like in ('4','5'). I could call pfn_split again on the intra values, but that needs a loop and I don't really like that idea. Suggestions/input? Thanks, Nitin
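One way to avoid the second loop, sketched under the assumption that the attribute columns are varchar without padding and never contain ':' themselves: instead of IN, test whether the row's value occurs in the intra list, wrapping both sides in the intra delimiter so that '4' matches inside ':4:5:' but not inside ':45:'. The temp table #field_rep is a hypothetical stand-in for pfn_app_prd.VW_PFN_FIELD_REP with just two of its columns.

```sql
-- Hypothetical stand-in for pfn_app_prd.VW_PFN_FIELD_REP (two columns only).
CREATE TABLE #field_rep (
    rep_id     int         NOT NULL,
    role_cd    varchar(10) NOT NULL,
    cluster_cd varchar(10) NOT NULL
);
INSERT INTO #field_rep VALUES (1, '100', '4');
INSERT INTO #field_rep VALUES (2, '100', '5');
INSERT INTO #field_rep VALUES (3, '100', '6');
INSERT INTO #field_rep VALUES (4, '200', '4');

DECLARE @role_string varchar(1000), @cluster_string varchar(1000)
SET @role_string = '100'     -- field 1 of '100;4:5;SE;...'
SET @cluster_string = '4:5'  -- field 2: two intra values

-- '#' still means "any value"; otherwise look for ':value:' inside ':list:'.
SELECT a.rep_id
FROM #field_rep AS a
WHERE (@role_string = '#'
       OR CHARINDEX(':' + a.role_cd + ':', ':' + @role_string + ':') > 0)
  AND (@cluster_string = '#'
       OR CHARINDEX(':' + a.cluster_cd + ':', ':' + @cluster_string + ':') > 0)
ORDER BY a.rep_id;  -- returns 1 and 2
```

The remaining attributes follow the same pattern. Like the original REPLACE() trick, this predicate cannot use an index on the attribute columns, so it suits evaluating a few rules rather than large scans; for char columns, wrap the column in RTRIM() so trailing padding does not break the match.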

Hey, you didn't say in your first post that you are stuck with the design. ;) Probably this one: http://www.sommarskog.se/arrays-in-sql.html will help you determine the "best" approach to splitting a string.
--
Frank Kalis
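The article compares many splitting techniques. As one example of a set-based alternative to the WHILE loop in pfn_split, here is a sketch of an inline table-valued function built on a recursive CTE (SQL Server 2005 or later); the name dbo.fn_split_table is hypothetical, and this is not necessarily the method the article recommends. It returns each field with its position, and CROSS APPLY can split the intra values a second time without any loop.

```sql
-- Hypothetical set-based splitter: one row per field, with its 1-based position.
CREATE FUNCTION dbo.fn_split_table (@list nvarchar(2000), @delim nchar(1))
RETURNS TABLE
AS
RETURN
    WITH pieces (pos, start_at, stop_at) AS (
        -- first field: a delimiter is appended so the last field is also closed
        SELECT 1, 1, CHARINDEX(@delim, @list + @delim)
        UNION ALL
        -- each next field starts just after the previous delimiter
        SELECT pos + 1, stop_at + 1, CHARINDEX(@delim, @list + @delim, stop_at + 1)
        FROM pieces
        WHERE stop_at < LEN(@list + @delim)
    )
    SELECT pos, LTRIM(RTRIM(SUBSTRING(@list, start_at, stop_at - start_at))) AS value
    FROM pieces;
GO

-- Every (field position, intra value) pair of one rule, with no WHILE loop.
SELECT f.pos AS attr_pos, v.value AS attr_val
FROM dbo.fn_split_table(N'100;4:5;SE;#', N';') AS f
CROSS APPLY dbo.fn_split_table(f.value, N':') AS v
ORDER BY f.pos, v.pos;
```

The usage query returns (1, 100), (2, 4), (2, 5), (3, SE), (4, #). Two caveats: the delimiter must not be a space, because LEN() ignores trailing spaces, and recursion stops at the default limit of 100 levels, so a list with more than about 100 fields needs OPTION (MAXRECURSION n) on the calling query.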
