Remove ASCII Char's | SQL Server Performance Forums

SQL Server Performance Forum – Threads Archive

Remove ASCII Char’s

The below function is used to remove the unwanted ASCII characters from the given string.
This function works very slow because it checks character by character. Is there any alternative to do this. Create FUNCTION dbo.Remove_ASCII_Char (@CharData varchar(8000))
RETURNS varchar(8000)
AS
BEGIN
DECLARE @Index INT,
@strASCII varchar(8000),
@strASCIIChar INT
— Initialization code
SET @Index = 1
SET @strASCII = ” WHILE @Index < LEN(@CharData)+1
BEGIN
SET @strASCIIChar = ASCII(SUBSTRING(@CharData, @Index, 1))
IF @strASCIIChar in (9,10,13) or @strASCIIChar BETWEEN 32 and 126
BEGIN
— Construct the output characters
SET @strASCII = @strASCII + CHAR(@strASCIIChar)
END
SET @Index = @Index + 1
END
RETURN @strASCII
END Surendra Kalekar
Cant you handle this in your Presentation Layer? Madhivanan Failing to plan is Planning to fail
Our development team tried that but in C# webservice it is not possible.
quote:Originally posted by Madhivanan Cant you handle this in your Presentation Layer? Madhivanan Failing to plan is Planning to fail
Surendra Kalekar
SELECT DISTINCT b.rfpid, b.locstate, b.soltype, dbo.ASCIICharOnly(b.title)
FROM rfp b with (NOLOCK) WHERE …………. This query fetches around 15000 rows in 25 seconds
Without using the function dbo.ASCIICharOnly it shows result in less than 1 seconds.
Surendra Kalekar
If the number of characters in the given string is significantly larger than the number of different characters that you’re replacing, then simply do a REPLACE for each of those characters. You should probably precede each REPLACE call with a PATINDEX call to improve response time, as the REPLACE function can be time consuming. But if the given string is relatively short, your current code will probably perform better than the series of REPLACE operations.
… and then I notice that the number of different characters to be removed is probably bigger than the number of different characters to remain. You know the ASCII number ranges to be removed, so you do a loop through those values and do a REPLACE on them … The rule about the relative length still applies, but anyway here’s the version for long strings: Create FUNCTION dbo.Remove_ASCII_Char (@CharData varchar(8000))
RETURNS varchar(8000)
AS
BEGIN
DECLARE @Index INT,
@strASCII varchar(8000),
@strASCIIChar INT
— Initialization code
SET @strASCII = ” SET @strASCIIChar = 1
WHILE @strASCIIChar < 9
BEGIN
IF CHARINDEX(CHAR(@strASCIIChar), @CharData) > 0 SET @strASCIIChar = REPLACE(@CharData, CHAR(@strASCIIChar), ”)
SET @strASCIIChar = @strASCIIChar + 1
END IF CHARINDEX(CHAR(@11), @CharData) > 0 SET @strASCIIChar = REPLACE(@CharData, CHAR(11), ”)
IF CHARINDEX(CHAR(@12), @CharData) > 0 SET @strASCIIChar = REPLACE(@CharData, CHAR(12), ”) SET @strASCIIChar = 14
WHILE @strASCIIChar < 32
BEGIN
IF CHARINDEX(CHAR(@strASCIIChar), @CharData) > 0 SET @strASCIIChar = REPLACE(@CharData, CHAR(@strASCIIChar), ”)
SET @strASCIIChar = @strASCIIChar + 1
END SET @strASCIIChar = 127
WHILE @strASCIIChar < 255
BEGIN
IF CHARINDEX(CHAR(@strASCIIChar), @CharData) > 0 SET @strASCIIChar = REPLACE(@CharData, CHAR(@strASCIIChar), ”)
SET @strASCIIChar = @strASCIIChar + 1
END RETURN @strASCII END

Average length of the string is around 300 characters and max it will be 8000 character long.
Can you please let me know how I can use REPLACE with PATINDEX fucntion in this case.
quote:Originally posted by Adriaan If the number of characters in the given string is significantly larger than the number of different characters that you’re replacing, then simply do a REPLACE for each of those characters. You should probably precede each REPLACE call with a PATINDEX call to improve response time, as the REPLACE function can be time consuming. But if the given string is relatively short, your current code will probably perform better than the series of REPLACE operations.
Surendra Kalekar
You’ll notice that I’ve posted a script using the CHARINDEX function – which in this case is a bit easier to use than PATINDEX. Your average of 300 characters probably means that most of the time the string will be shorter. Create my UDF under a different name, then run one query that calls your UDF for all records, and one query that calls my UDF for all records, and compare the execution times.
Adriaan, Out of interest why can’t you just use REPLACE? Why do you need to use CHARINDEX at all? Surely you could just do the following? As all he wants to do is remove the characters. Or does CHARINDEX work much quicker than REPLACE? Create FUNCTION dbo.Remove_ASCII_Char (@CharData varchar(8000))
RETURNS varchar(8000)
AS
BEGIN
DECLARE @Index INT,
@strASCII varchar(8000),
@strASCIIChar INT
— Initialization code
SET @strASCII = ” SET @strASCIIChar = 1
WHILE @strASCIIChar < 9
BEGIN
SET @CharData = REPLACE(@CharData, CHAR(@strASCIIChar), ”)
SET @strASCIIChar = @strASCIIChar + 1
END SET @CharData = REPLACE(@CharData, CHAR(11), ”)
SET @CharData = REPLACE(@CharData, CHAR(12), ”) SET @strASCIIChar = 14
WHILE @strASCIIChar < 32
BEGIN
SET @CharData = REPLACE(@CharData, CHAR(@strASCIIChar), ”)
SET @strASCIIChar = @strASCIIChar + 1
END SET @strASCIIChar = 127
WHILE @strASCIIChar < 255
BEGIN
SET @CharData = REPLACE(@CharData, CHAR(@strASCIIChar), ”)
SET @strASCIIChar = @strASCIIChar + 1
END RETURN @CharData END Regards, Robert.

1. select top 10000 n.rfpid, dbo.ASCIICharOnly(n.title), dbo.ASCIICharOnly(n.titlepublic) from rfp n (Retrieves 10000 rows in 9 seconds)
2. select top 10000 n.rfpid, dbo.Remove_ASCII_Char(n.title), dbo.Remove_ASCII_Char(n.titlepublic) from rfp n (Retrieves 10000 rows in 14 seconds) Adriaan, Second query is your solution. It is working but, taking 5 seconds more than the existing one.
quote:Originally posted by Adriaan … and then I notice that the number of different characters to be removed is probably bigger than the number of different characters to remain. You know the ASCII number ranges to be removed, so you do a loop through those values and do a REPLACE on them … The rule about the relative length still applies, but anyway here’s the version for long strings: Create FUNCTION dbo.Remove_ASCII_Char (@CharData varchar(8000))
RETURNS varchar(8000)
AS


END
Surendra Kalekar
select top 10000 n.rfpid, dbo.Remove_ASCII_Char(n.title), dbo.Remove_ASCII_Char(n.titlepublic) from rfp n
Hi Robert,
I checked without charindex also but it’s taking around 28 seconds to display 10000 rows.
Adriaan, Out of interest why can’t you just use REPLACE? Why do you need to use CHARINDEX at all? Surely you could just do the following? As all he wants to do is remove the characters. Or does CHARINDEX work much quicker than REPLACE? Regards, Robert.
Surendra Kalekar
As expected! My function does about 200 iterations for each string, regardless of the actual string length. Your function does as many iterations as the number of characters in the string, which is probably lower. Only when your average string length grows significantly over 300 characters, then you might see better performance – and you have to consider that the REPLACE function is just not very quick in and by itself. The result of this query …
SELECT SUM(LEN(n.title)) FROM (SELECT TOP 1000 title FROM rfp) AS n
… is probably lower than 200,000. By the way, if you want to test with TOP 1000, for reliability you should add an ORDER BY clause, otherwise there is no guarantee that the same rows are being used in both queries.

Surendra & Robert, My function was more a "proof of concept" than an optimized piece of code. The original question was whether there were other ways of doing this, and my code was just to show one method – and it looks as if it’s a very poor performer! The CHARINDEX function only functions as a check whether the character exists in the string, but you can of course get the response time without the test. You can also test if IF @CharData LIKE ‘%’ + ASCII(@strASCIICHar) + ‘%’ instead of the CHARINDEX call performs any better.

Performance of IF @Chardata like … is very slow. Even compare to CHARINDEX.
Result of SELECT SUM(LEN(n.title)) FROM (SELECT TOP 1000 title FROM rfp) AS n is less than 100,000 characters.
quote:Originally posted by Adriaan Surendra & Robert, My function was more a "proof of concept" than an optimized piece of code. The original question was whether there were other ways of doing this, and my code was just to show one method – and it looks as if it’s a very poor performer! The CHARINDEX function only functions as a check whether the character exists in the string, but you can of course get the response time without the test. You can also test if IF @CharData LIKE ‘%’ + ASCII(@strASCIICHar) + ‘%’ instead of the CHARINDEX call performs any better.
Surendra Kalekar
<blockquote id="quote"><font size="1" face="Verdana, Arial, Helvetica" id="quote">quote:<hr height="1" noshade id="quote"><i>Originally posted by surendrakalekar</i><br />Result of SELECT SUM(LEN(n.title)) FROM (SELECT TOP 1000 title FROM rfp) AS n is less than 100,000 characters.<hr height="1" noshade id="quote"></font id="quote"></blockquote id="quote">Again, this was expected.<br /><br />Now compare the results for all rows where LEN(title) &gt; 1000 … that might give you a different result – but then again it might not.<br /><br />I never said my function would perform well.[<img src=’/community/emoticons/emotion-2.gif’ alt=’:D‘ />]
]]>