SQL Server Performance

Count number of times substring occurs in a column

Discussion in 'Contribute Your Performance and Clustering Tips' started by bradmcgehee, Feb 25, 2004.

  1. bradmcgehee New Member

    Here is a tip to show how to count number of times a substring occurs in a field with simple Transact SQL.

    This functionality is pretty common in languages like VB, but in SQL there#%92s no obvious solution. CharIndex will show that a substring exists, but not how many times.

    This functionality can be very valuable when trying to deal with de-normalized data, such as when you have comma-delimited values inside a field.

    Here#%92s what I came up with: use the replace function to return a string that has an extra space in it for every occurrence of the substring, then compare the length of the new string to the original. It should be one character longer for each occurrence.

    In the example below I include a count of the substring (a delimiting comma):, as well as the values. Your query might be cleaner if nulls aren#%92t allowed.

    CREATE TABLE #StringTest (ID int, StringField varchar(50))

    INSERT INTO #StringTest VALUES(1, null)
    INSERT INTO #StringTest VALUES(2, '')
    INSERT INTO #StringTest VALUES(3, '1')
    INSERT INTO #StringTest VALUES(4, '1,2')
    INSERT INTO #StringTest VALUES(5, '1, 2, 3')
    INSERT INTO #StringTest VALUES(6, '1, 2, 3, 4')

    @SearchString varchar(50)

    SET @SearchString = ','

    len(replace(isnull(StringField, ''), @SearchString, @SearchString + ' ')) - len(isnull(StringField, '')) as DelimeterCount,
    len(replace(isnull(StringField, ''), @SearchString, @SearchString + ' ')) - len(isnull(StringField, '')) + case when ltrim(isnull(StringField, '')) <> '' then 1 else 0 end as ValueCount
    FROM #StringTest

    DROP TABLE #StringTest

    With Dynamic SQL, this could easily be changed into a stored procedure to deal with any table, any fieldname.

    Tip provided by Scott Thompson, who is certified in SQL Server and Delphi, and has worked as a software developer and DBA in the insurance, financial and market research industries.
  2. Madhivanan Moderator

    Here is other method

    Create PROCEDURE FindChar
    (@String nvarchar(200),
    @Sear nvarchar(200))

    Declare @C numeric
    Declare @j int
    Declare @i numeric
    Declare @S nvarchar(100)
    Declare @Sql nvarchar(100)
    Declare @Search nvarchar(100)
    Declare @SearchCount int
    set @Search=@Sear
    set @S=@String
    set @i=len(@S)
    set @SearchCount=len(@Search)
    set @j=1
    set @c=0
    set @sql=substring(@s,@j,@SearchCount)
    if @Sql=@Search
    set @c=@c+1
    set @j=@j+1
    if @j<=@i goto Con

    print 'The number of occurrences of "' + @Search + '" in "' +@S + '" is ' + ltrim(str(@c))

  3. snewton New Member

    Here is a function you can use like the CharIndex() function but it instead it counts the occurances of the first string in the second one.

    -- This function will looks for the number of occurances of @Find in @SourceText
    -- If either parameter is null or blank, returns 0

    CREATE FUNCTION CharCount(@Find varchar(8000) , @SourceText varchar(8000) )
    RETURNS int AS

    if (@Find is null) or (@Find = '') or (@SourceText is null) or (@SourceText = '')
    return 0

    declare @len int
    declare @pos int
    declare @cnt int

    --Break the string into gramSize chunks
    set @len = len(@SourceText)
    set @pos = 1
    set @cnt = 0

    --Add all the grams for this record
    while charindex(@Find, @SourceText, @pos) > 0
    set @cnt = @cnt + 1
    set @pos = charindex(@Find, @SourceText, @pos) + len(@Find)
    return @cnt

  4. FrankKalis Moderator

    While this LEN() method will usually work, be careful when you have such stupid data like

    INSERT INTO #StringTest VALUES(6, '1, 2, 3, 4,')

    Note, the additional comma at the end.
    The original

    len(replace(isnull(StringField, ''), @SearchString, @SearchString + ' ')) - len(isnull(StringField, '')) as DelimeterCount,

    won't catch this and therefore returns.

    ID StringField DelimeterCount ValueCount
    ----------- -------------------------------------------------- -------------- -----------
    6 1, 2, 3, 4, 3 4

    which is wrong. The only workaround I've found so far (though I didn't try very hard to find) is to add a dummy character at the end like this

    len(replace(isnull(StringField, ''), @SearchString, @SearchString + ' ')+'X') - len(isnull(StringField, '')+'X') as DelimeterCount,

    The additional 'X' has the effect, that the query now correctly reports

    ID StringField DelimeterCount ValueCount
    ----------- -------------------------------------------------- -------------- -----------
    6 1, 2, 3, 4, 4 4

    Frank Kalis
    SQL Server MVP

Share This Page