SQL Server Performance Forum – Threads Archive

Randomising an Entire Table

Does anyone have any scripts for randomising data within a table? I have a table of client names and I want to randomly swap them about to prevent data protection issues. I could do this easily by opening a cursor of the whole table and then for each row in the cursor randomly updating it to another name from the table by using the NEWID() function to ensure randomness. However this would be slow and ideally I would like to do it with one SQL statement. I tried this one but obviously it will update all rows in the table to the same value. update test
set name = (select top 1 name from test order by newid()) anybody have anything else?

create table #firstname
(fname varchar(30)) create table #lastname
(lname varchar(30)) insert into #firstname values (‘John’)
insert into #firstname values (‘Tom’)
… etc insert into #lastname values (‘Smith’)
insert into #lastname values (‘Doe’)
… etc select identity (int, 1, 1) as ident, fname + ‘ ‘ + lname fullname
into #allnames
from #firstname cross join #lastname select * from #allnames update test
set [name] = fullname
from #allnames
where test.<id column> = #allnames.ident

How big is the table? how often will you need to "randomize" it? Do you need them to be different every time you read the data? or is it a one time thing you need to do.
If it’s a one time thing that will remain fixed, you could do the following SELECT *
INTO new_table
FROM old_table
ORDER BY NEWID() Drop old_table and rename new_table to old_table. Otherwise a simple select with ORDER BY NEWID() should do. Bambola.

alternatively, although not speedy, you can join the table to itself twice, like this, and use the names that way (this is the select, you can figure out the update) select t3.firstname, t2.lastname
from nametable t1
join nametable t2 on t1.tableid = t2.tablid – 1
join nametable t3 on t2.tableid = t3.tablid + 3 or some other +/- combination of ids. this works best if you dont have gaps in your ids, and you will have at least one record not updated (t1.tableid = 1), but it would work. We have quest central for databases, and they have a tool that does exactly what you want to randomize fields (you can chose the fields). Havent tried it yet, but it looks like it will save us some time.
chris

Thanks for the feedback. The solutions mentioned rely on a consistent and incremental primary key value which I do not have. The table will have been built over time and had periods of migration work on it so the keys will be very scattered. Therefore using any king of identity value to join to it or -1/+3 type approach would not work. The second idea could possibly work. What I would have to do is create a temporary table that is randomised on the surname field. Then I would have to step through the entire original table and the temp table and swap surnames for every row. This is because randomising would not change the primary key, only the order of the rows. I’d need to be able to break the link with the primary key in some way.

]]>