Data Cleansing | SQL Server Performance Forums

SQL Server Performance Forum – Threads Archive

Data Cleansing

We are in a process of cleansing task of one critical business table which consists of millions of records.This table has 30 columns.Recently we found that when we export data from this table to a text file,some rows get break into 2 lines. Doing a through observation I found that some of the columns have data with ‘carriage return’ i.e char(13) data. I am wondering is there is a way in which I can identify all such carriage returns and replace them (update them) with a single ‘space’
I neva worked with this function before,I tried writing some adhoc scripts but they seems to not work in my scenerio. I will appreciate any kind of solution to get rid of this issue.
You can take help of BULK INSERT or BCP in this case by defining the FORMATFILE option where you can specify the columns if they are not changed. Otherwise you have to import the rows to temptable and find for blank column rows and update the previous record column completely, not a good one to deal. Satya SKJ
Microsoft SQL Server MVP
Writer, Contributing Editor & Moderator
http://www.SQL-Server-Performance.Com
This posting is provided AS IS with no rights for the sake of knowledge sharing. The greatest discovery of my generation is that a human being can alter his life by altering his attitudes of mind.
Thanks Satya for your response….but I am looking for some different solution.I am presenting sample script so that you can get my situation. I got a technique to display all those records which have ‘carriage return’ in their column Select * from table
where CharIndex(Char(13), col2) > 0 And then updating those columns with a space update table
set col2 = replace(col2,char(13),’ ‘)
where CharIndex(Char(13), col2) > 0
go Now it is updating the column with ‘space’ but still the ‘Alt+Enter’ i.e new line is still there . Folks can some one suggest me any kinda of function which can bring data of this column in a single line
Test for CHAR(13) + CHAR(10). The end of a line in text is usually two characters, CRLF (Carriage Return + Line Feed). Yes, CHAR(13) comes before CHAR(10) – that’s just the way it is.
Hi Folks, Thanks for all your help.Got stucked at a point When I execute
Select * from business
where CharIndex(Char(13)+ Char(10),’col3′) > 0 –for individual columns say col3 ,I get desired output But when I pass the column name as a variable i am getting ‘0’ rows effected
I am wondering what I am doing wrong.Here is the script I hv written CREATE PROCEDURE usp_rows_with_CarriageReturn
As
declare @column_name varchar(200) declare column_read_cursor cursor for
select name from sys.syscolumns where id=’1445580188′–Table id open column_read_cursor
fetch next from column_read_cursor into @column_name while @@fetch_status=0
begin
print @column_name
Select * from business
where CharIndex(Char(13)+ Char(10),’@column_name’) > 0 fetch next from column_read_cursor into @column_name
end
close column_read_cursor
deallocate column_read_cursor
The second parameter for the CHARINDEX can be a string, but you can’t use a string to represent a column name. What you’re doing here is checking whether this string of characters – which just happens to be the name of a column – contains a CRLF. Instead of the string, just put in the column name without the string delimiters: SELECT *
FROM business B
WHERE CharIndex(Char(13)+ Char(10), B.col3) > 0 If you want to clean up a number of columns, I would do it one column at a time, and perhaps the easiest syntax would be this: UPDATE B
SET col3 = REPLACE(col3, CHAR(13) + CHAR(10), ‘ ‘)
FROM business B
WHERE CharIndex(Char(13)+ Char(10), B.col3) > 0 Not entirely sure whether this would work for TEXT columns too.
Yes Adriaan..If I execute the select script one by one for each column ,its working fine but when ,i pass the column name as a variable ..its creating problem.
My table has 30 + columns so its a time consuming task to run this for every column.n the biggest problem is that ,this table is subject to frequent change in column name and no of columns this is why ,my manager wants to have this in a script.
Use dynamic SQL … DECLARE @TblName SYSNAME, @ColName SYSNAME, @SQL VARCHAR(500) …….set your @TblName and @ColName variables……. SET @SQL = ‘UPDATE T SET [‘ + @ColName + ‘] = REPLACE([‘ + @ColName + ‘], CHAR(13) + CHAR(10), ”’ ”’) FROM [‘
+ @TblName + ‘] T WHERE CHARINDEX(CHAR(13) + CHAR(10), T.[‘ + @ColName + ‘]) > 0’ EXEC (@SQL)

]]>