SQL Server Performance

Remove ASCII Char's

Discussion in 'General DBA Questions' started by surendrakalekar, Sep 26, 2005.

  1. surendrakalekar New Member

    The below function is used to remove the unwanted ASCII characters from the given string.
    This function works very slow because it checks character by character. Is there any alternative to do this.

    Create FUNCTION dbo.Remove_ASCII_Char (@CharData varchar(8000))
    RETURNS varchar(8000)
    AS
    BEGIN
    DECLARE @Index INT,
    @strASCII varchar(8000),
    @strASCIIChar INT
    -- Initialization code
    SET @Index = 1
    SET @strASCII = ''

    WHILE @Index < LEN(@CharData)+1
    BEGIN
    SET @strASCIIChar = ASCII(SUBSTRING(@CharData, @Index, 1))
    IF @strASCIIChar in (9,10,13) or @strASCIIChar BETWEEN 32 and 126
    BEGIN
    -- Construct the output characters
    SET @strASCII = @strASCII + CHAR(@strASCIIChar)
    END
    SET @Index = @Index + 1
    END
    RETURN @strASCII
    END



    Surendra Kalekar

  2. Madhivanan Moderator

    Cant you handle this in your Presentation Layer?

    Madhivanan

    Failing to plan is Planning to fail
  3. surendrakalekar New Member

    Our development team tried that but in C# webservice it is not possible.


    quote:Originally posted by Madhivanan

    Cant you handle this in your Presentation Layer?

    Madhivanan

    Failing to plan is Planning to fail


    Surendra Kalekar

  4. surendrakalekar New Member

    SELECT DISTINCT b.rfpid, b.locstate, b.soltype, dbo.ASCIICharOnly(b.title)
    FROM rfp b with (NOLOCK) WHERE .............

    This query fetches around 15000 rows in 25 seconds
    Without using the function dbo.ASCIICharOnly it shows result in less than 1 seconds.


    Surendra Kalekar

  5. Adriaan New Member

    If the number of characters in the given string is significantly larger than the number of different characters that you're replacing, then simply do a REPLACE for each of those characters. You should probably precede each REPLACE call with a PATINDEX call to improve response time, as the REPLACE function can be time consuming.

    But if the given string is relatively short, your current code will probably perform better than the series of REPLACE operations.
  6. Adriaan New Member

    ... and then I notice that the number of different characters to be removed is probably bigger than the number of different characters to remain. You know the ASCII number ranges to be removed, so you do a loop through those values and do a REPLACE on them ...

    The rule about the relative length still applies, but anyway here's the version for long strings:

    Create FUNCTION dbo.Remove_ASCII_Char (@CharData varchar(8000))
    RETURNS varchar(8000)
    AS
    BEGIN
    DECLARE @Index INT,
    @strASCII varchar(8000),
    @strASCIIChar INT
    -- Initialization code
    SET @strASCII = ''

    SET @strASCIIChar = 1
    WHILE @strASCIIChar < 9
    BEGIN
    IF CHARINDEX(CHAR(@strASCIIChar), @CharData) > 0 SET @strASCIIChar = REPLACE(@CharData, CHAR(@strASCIIChar), '')
    SET @strASCIIChar = @strASCIIChar + 1
    END

    IF CHARINDEX(CHAR(@11), @CharData) > 0 SET @strASCIIChar = REPLACE(@CharData, CHAR(11), '')
    IF CHARINDEX(CHAR(@12), @CharData) > 0 SET @strASCIIChar = REPLACE(@CharData, CHAR(12), '')

    SET @strASCIIChar = 14
    WHILE @strASCIIChar < 32
    BEGIN
    IF CHARINDEX(CHAR(@strASCIIChar), @CharData) > 0 SET @strASCIIChar = REPLACE(@CharData, CHAR(@strASCIIChar), '')
    SET @strASCIIChar = @strASCIIChar + 1
    END

    SET @strASCIIChar = 127
    WHILE @strASCIIChar < 255
    BEGIN
    IF CHARINDEX(CHAR(@strASCIIChar), @CharData) > 0 SET @strASCIIChar = REPLACE(@CharData, CHAR(@strASCIIChar), '')
    SET @strASCIIChar = @strASCIIChar + 1
    END

    RETURN @strASCII

    END
  7. surendrakalekar New Member

    Average length of the string is around 300 characters and max it will be 8000 character long.
    Can you please let me know how I can use REPLACE with PATINDEX fucntion in this case.


    quote:Originally posted by Adriaan

    If the number of characters in the given string is significantly larger than the number of different characters that you're replacing, then simply do a REPLACE for each of those characters. You should probably precede each REPLACE call with a PATINDEX call to improve response time, as the REPLACE function can be time consuming.

    But if the given string is relatively short, your current code will probably perform better than the series of REPLACE operations.


    Surendra Kalekar

  8. Adriaan New Member

    You'll notice that I've posted a script using the CHARINDEX function - which in this case is a bit easier to use than PATINDEX.

    Your average of 300 characters probably means that most of the time the string will be shorter. Create my UDF under a different name, then run one query that calls your UDF for all records, and one query that calls my UDF for all records, and compare the execution times.
  9. RGKN New Member

    Adriaan,

    Out of interest why can't you just use REPLACE? Why do you need to use CHARINDEX at all?

    Surely you could just do the following? As all he wants to do is remove the characters. Or does CHARINDEX work much quicker than REPLACE?

    Create FUNCTION dbo.Remove_ASCII_Char (@CharData varchar(8000))
    RETURNS varchar(8000)
    AS
    BEGIN
    DECLARE @Index INT,
    @strASCII varchar(8000),
    @strASCIIChar INT
    -- Initialization code
    SET @strASCII = ''

    SET @strASCIIChar = 1
    WHILE @strASCIIChar < 9
    BEGIN
    SET @CharData = REPLACE(@CharData, CHAR(@strASCIIChar), '')
    SET @strASCIIChar = @strASCIIChar + 1
    END

    SET @CharData = REPLACE(@CharData, CHAR(11), '')
    SET @CharData = REPLACE(@CharData, CHAR(12), '')

    SET @strASCIIChar = 14
    WHILE @strASCIIChar < 32
    BEGIN
    SET @CharData = REPLACE(@CharData, CHAR(@strASCIIChar), '')
    SET @strASCIIChar = @strASCIIChar + 1
    END

    SET @strASCIIChar = 127
    WHILE @strASCIIChar < 255
    BEGIN
    SET @CharData = REPLACE(@CharData, CHAR(@strASCIIChar), '')
    SET @strASCIIChar = @strASCIIChar + 1
    END

    RETURN @CharData

    END

    Regards,

    Robert.
  10. surendrakalekar New Member

    1. select top 10000 n.rfpid, dbo.ASCIICharOnly(n.title), dbo.ASCIICharOnly(n.titlepublic) from rfp n (Retrieves 10000 rows in 9 seconds)
    2. select top 10000 n.rfpid, dbo.Remove_ASCII_Char(n.title), dbo.Remove_ASCII_Char(n.titlepublic) from rfp n (Retrieves 10000 rows in 14 seconds)

    Adriaan, Second query is your solution. It is working but, taking 5 seconds more than the existing one.


    quote:Originally posted by Adriaan

    ... and then I notice that the number of different characters to be removed is probably bigger than the number of different characters to remain. You know the ASCII number ranges to be removed, so you do a loop through those values and do a REPLACE on them ...

    The rule about the relative length still applies, but anyway here's the version for long strings:

    Create FUNCTION dbo.Remove_ASCII_Char (@CharData varchar(8000))
    RETURNS varchar(8000)
    AS
    ---
    ---
    END


    Surendra Kalekar

  11. surendrakalekar New Member

    select top 10000 n.rfpid, dbo.Remove_ASCII_Char(n.title), dbo.Remove_ASCII_Char(n.titlepublic) from rfp n
    Hi Robert,
    I checked without charindex also but it's taking around 28 seconds to display 10000 rows.


    Adriaan,

    Out of interest why can't you just use REPLACE? Why do you need to use CHARINDEX at all?

    Surely you could just do the following? As all he wants to do is remove the characters. Or does CHARINDEX work much quicker than REPLACE?

    Regards,

    Robert.




    Surendra Kalekar

  12. Adriaan New Member

    As expected! My function does about 200 iterations for each string, regardless of the actual string length. Your function does as many iterations as the number of characters in the string, which is probably lower.

    Only when your average string length grows significantly over 300 characters, then you might see better performance - and you have to consider that the REPLACE function is just not very quick in and by itself.

    The result of this query ...
    SELECT SUM(LEN(n.title)) FROM (SELECT TOP 1000 title FROM rfp) AS n
    ... is probably lower than 200,000.

    By the way, if you want to test with TOP 1000, for reliability you should add an ORDER BY clause, otherwise there is no guarantee that the same rows are being used in both queries.
  13. Adriaan New Member

    Surendra & Robert,

    My function was more a "proof of concept" than an optimized piece of code. The original question was whether there were other ways of doing this, and my code was just to show one method - and it looks as if it's a very poor performer!

    The CHARINDEX function only functions as a check whether the character exists in the string, but you can of course get the response time without the test.

    You can also test if IF @CharData LIKE '%' + ASCII(@strASCIICHar) + '%' instead of the CHARINDEX call performs any better.
  14. surendrakalekar New Member

    Performance of IF @Chardata like ... is very slow. Even compare to CHARINDEX.
    Result of SELECT SUM(LEN(n.title)) FROM (SELECT TOP 1000 title FROM rfp) AS n is less than 100,000 characters.


    quote:Originally posted by Adriaan

    Surendra & Robert,

    My function was more a "proof of concept" than an optimized piece of code. The original question was whether there were other ways of doing this, and my code was just to show one method - and it looks as if it's a very poor performer!

    The CHARINDEX function only functions as a check whether the character exists in the string, but you can of course get the response time without the test.

    You can also test if IF @CharData LIKE '%' + ASCII(@strASCIICHar) + '%' instead of the CHARINDEX call performs any better.



    Surendra Kalekar

  15. Adriaan New Member

    <blockquote id="quote"><font size="1" face="Verdana, Arial, Helvetica" id="quote">quote:<hr height="1" noshade id="quote"><i>Originally posted by surendrakalekar</i><br />Result of SELECT SUM(LEN(n.title)) FROM (SELECT TOP 1000 title FROM rfp) AS n is less than 100,000 characters.<hr height="1" noshade id="quote"></font id="quote"></blockquote id="quote">Again, this was expected.<br /><br />Now compare the results for all rows where LEN(title) &gt; 1000 ... that might give you a different result - but then again it might not.<br /><br />I never said my function would perform well.[<img src='/community/emoticons/emotion-2.gif' alt=':D' />]

Share This Page