SQL Server Performance

Wrong Spelling

Discussion in 'General Developer Questions' started by dineshasanka, Apr 26, 2005.

  1. dineshasanka Moderator

    Hi,
    I would like to give an option to users to return nessery records even though they have submit wrong parameters

    For Example,
    My database it is store as 'Dinesh' even if Users submit 'Diensh' I need to give the above record.
    is there any similairity mathcing function in SQL Server.
    I couldn't find it
    Thankx for the Help
  2. satya Moderator

  3. dineshasanka Moderator

    Satya,
    Thank you for th epormpt reply.
    But I have done a mistake here
    Actually it not the speeling that I need. Because I need to search the name column
    Therefore, I need to find out the some pattern matching way


    quote:Originally posted by satya

    http://www.microsoft.com/india/msdn/articles/116.aspx
    http://www.wwwcoder.com/main/parentid/207/site/2148/68/default.aspx
    .. relevant articles.

    Satya SKJ
    Moderator
    http://www.SQL-Server-Performance.Com/forum
    This posting is provided “AS IS” with no rights for the sake of knowledge sharing.
  4. Madhivanan Moderator

    Will soundex function do this?


    Madhivanan

    Failing to plan is Planning to fail
  5. dineshasanka Moderator

    Yes it will,
    That is the function that I was looking for Tahnkx Madhivan
  6. FrankKalis Moderator

    I haven't dealt with SOUNDEX(), but I remember from reading Ken Henderson's "Guru Guide to T-SQL" that he mentioned the algorithm used is not bug-free and efficient. You might want to search the web for a better one.

    --
    Frank Kalis
    SQL Server MVP
    http://www.insidesql.de
  7. benwilson New Member

    I have tried using SOUNDEX and DIFFERENCE and found they can give some very misleading results- eg 'Dl' and 'paoeul' return a difference of 3 (4 being the best match) while obviously they are very difference. Soundex ignores all vowels (unless they are the 1st letter i think) and all double letters, so 'dl' and 'paoeulllllll' also returns 3. We have found that we need to accept a DIFFERENCE() of 4 only to return reasonably similar results.

    Double Metaphone sounds like a better way of allowing for spelling mistakes http://www.codeproject.com/database/dmetaphone4.asp but i was unable to get it to work (probably just me doing something stupid).

    I also wrote a function to compare 2 strings, and the number of each letter in them that returns how many characters are different regardless of their order (so 'Ben' and 'Bne' would return 0:


    CREATE FUNCTION [dbo].[CompareStrings] (@FirstString nvarchar(4000), @SecondString nvarchar(4000))
    RETURNS INT
    AS

    BEGIN

    DECLARE @Difference AS INT,
    @MaxLength AS INT,
    @Position AS INT

    --Create Temporary table to hold the count of letters in words
    DECLARE @LettersInString TABLE
    (
    Letter CHAR(1) NOT NULL,
    FirstStringCount INT NOT NULL DEFAULT 0,
    SecondStringCount INT NOT NULL DEFAULT 0
    )

    --Difference is the counter of characters that are different in the 2 strings
    SET @Difference = 0
    --Position is the position of the character in the string that is being compared
    SET @Position = 1

    --determine the length of the longest string
    IF len(@FirstString) < len(@SecondString)
    SET @MaxLength = len(@SecondString)
    ELSE
    SET @MaxLength = len(@FirstString)

    --loop through each string until the maximum length is reached
    WHILE @Position <= @MaxLength
    BEGIN
    --check to see if the character in the first string is in the temp table. If not add
    IF NOT EXISTS (SELECT Letter FROM @LettersInString WHERE Letter = SUBSTRING(@FirstString, @Position, 1))
    INSERT INTO @LettersInString SELECT SUBSTRING(@FirstString, @Position, 1), 0, 0

    --add 1 to the count of the character in the temp table
    UPDATE @LettersInString
    SET FirstStringCount = FirstStringCount + 1
    WHERE Letter = SUBSTRING(@FirstString, @Position, 1)

    --check to see if the character in the second string is in the temp table. If not add
    IF NOT EXISTS (SELECT Letter FROM @LettersInString WHERE Letter = SUBSTRING(@SecondString, @Position, 1))
    INSERT INTO @LettersInString SELECT SUBSTRING(@SecondString, @Position, 1), 0, 0

    --add 1 to the count of the character in the temp table
    UPDATE @LettersInString
    SET SecondStringCount = SecondStringCount + 1
    WHERE Letter = SUBSTRING(@SecondString, @Position, 1)

    --check to see if the same position in each string holds the same character
    IF SUBSTRING(@FirstString, @Position, 1) <> SUBSTRING(@SecondString, @Position, 1)
    SET @Difference = @Difference + 1

    --move to the next position
    SET @Position = @Position + 1
    END

    --If each string contains the same number of each letters and there are 2 differences, this will be counted as 1 difference as it is likely a typo: EG 'String' and 'Strnig'
    IF (SELECT COUNT(Letter) FROM @LettersInString WHERE FirstStringCount <> SecondStringCount) = 0 AND @Difference = 2
    SET @Difference = 1

    RETURN @Difference
    END

    Its probably not perfect, but we have found it quite useful in combination with SOUNDEX/DIFFERENCE

    Hope this helps!
  8. dineshasanka Moderator

    Thankx ben,
    Yes I also found that Soundex is not the perfect funtion I need. AS you said It can mislead. I will try your fucntion.
    Thankx Again

    quote:Originally posted by benwilson

    I have tried using SOUNDEX and DIFFERENCE and found they can give some very misleading results- eg 'Dl' and 'paoeul' return a difference of 3 (4 being the best match) while obviously they are very difference. Soundex ignores all vowels (unless they are the 1st letter i think) and all double letters, so 'dl' and 'paoeulllllll' also returns 3. We have found that we need to accept a DIFFERENCE() of 4 only to return reasonably similar results.

    Double Metaphone sounds like a better way of allowing for spelling mistakes http://www.codeproject.com/database/dmetaphone4.asp but i was unable to get it to work (probably just me doing something stupid).

    I also wrote a function to compare 2 strings, and the number of each letter in them that returns how many characters are different regardless of their order (so 'Ben' and 'Bne' would return 0:


    CREATE FUNCTION [dbo].[CompareStrings] (@FirstString nvarchar(4000), @SecondString nvarchar(4000))
    RETURNS INT
    AS

    BEGIN

    DECLARE @Difference AS INT,
    @MaxLength AS INT,
    @Position AS INT

    --Create Temporary table to hold the count of letters in words
    DECLARE @LettersInString TABLE
    (
    Letter CHAR(1) NOT NULL,
    FirstStringCount INT NOT NULL DEFAULT 0,
    SecondStringCount INT NOT NULL DEFAULT 0
    )

    --Difference is the counter of characters that are different in the 2 strings
    SET @Difference = 0
    --Position is the position of the character in the string that is being compared
    SET @Position = 1

    --determine the length of the longest string
    IF len(@FirstString) < len(@SecondString)
    SET @MaxLength = len(@SecondString)
    ELSE
    SET @MaxLength = len(@FirstString)

    --loop through each string until the maximum length is reached
    WHILE @Position <= @MaxLength
    BEGIN
    --check to see if the character in the first string is in the temp table. If not add
    IF NOT EXISTS (SELECT Letter FROM @LettersInString WHERE Letter = SUBSTRING(@FirstString, @Position, 1))
    INSERT INTO @LettersInString SELECT SUBSTRING(@FirstString, @Position, 1), 0, 0

    --add 1 to the count of the character in the temp table
    UPDATE @LettersInString
    SET FirstStringCount = FirstStringCount + 1
    WHERE Letter = SUBSTRING(@FirstString, @Position, 1)

    --check to see if the character in the second string is in the temp table. If not add
    IF NOT EXISTS (SELECT Letter FROM @LettersInString WHERE Letter = SUBSTRING(@SecondString, @Position, 1))
    INSERT INTO @LettersInString SELECT SUBSTRING(@SecondString, @Position, 1), 0, 0

    --add 1 to the count of the character in the temp table
    UPDATE @LettersInString
    SET SecondStringCount = SecondStringCount + 1
    WHERE Letter = SUBSTRING(@SecondString, @Position, 1)

    --check to see if the same position in each string holds the same character
    IF SUBSTRING(@FirstString, @Position, 1) <> SUBSTRING(@SecondString, @Position, 1)
    SET @Difference = @Difference + 1

    --move to the next position
    SET @Position = @Position + 1
    END

    --If each string contains the same number of each letters and there are 2 differences, this will be counted as 1 difference as it is likely a typo: EG 'String' and 'Strnig'
    IF (SELECT COUNT(Letter) FROM @LettersInString WHERE FirstStringCount <> SecondStringCount) = 0 AND @Difference = 2
    SET @Difference = 1

    RETURN @Difference
    END

    Its probably not perfect, but we have found it quite useful in combination with SOUNDEX/DIFFERENCE

    Hope this helps!

Share This Page