SQL Server Performance Forum – Threads Archive

INSERT INTO to create add. rows-Perf. question

I started a new job where they have stored procedures that use an INSERT INTO … SELECT with the same table in both parts (sort of a recursive insert). The SELECT part changes some of the data, so there aren’t actually duplicate rows. It seems that this is a performance hit, and it might be better to create a temp table to use in the SELECT part. It seems a NOLOCK hint wouldn’t be appropriate. I’ve researched this issue, but can’t find anything. Has anyone seen anything like this? The SQL is something like this: INSERT INTO table a(
…
)
SELECT
… <changes values of some fields here>
FROM table a
WHERE …

Not really a recursive insert: the data that is being inserted will not be re-inserted – that would cause an infinite loop. To avoid that, the SELECT part can only consider rows that exist in the table before the INSERT action. So not to worry! The performance hit might be to do with the database growth settings, with the fill factor of your indexes, with the data on the column(s) selected for your clustered index, with the autoupdate setting for statistics, etc. etc. See if you can limit the insert to perhaps 1000 rows, and loop until all rows have been done – add a NOT EXISTS subquery to your WHERE statement to avoid creating duplicates.

I think no need to write NOT EXISTS to WHERE clause to avoid duplicates. Simply write â€œSELECT DISTINCTâ€ Instead of SELECT. And I couldn#%92t understand why you want to do this in loop by limiting the insert to 1000 rows. Ashish Johri

You need NOT EXISTS if you do the insert in batches, otherwise you will create duplicates. You could also split the batches by ging for an range of identity values – whatever. The NOT EXISTS option means you don’t have to program the loop with variables and stuff – you just need a loop: WHILE EXISTS (SELECT … WHERE NOT EXISTS …)
BEGIN
INSERT INTO … (…) SELECT … WHERE NOT EXISTS …
END This will exit when the last rows have been inserted. The reason to do it in batches of 1000 is because of the (presumed) file growth and index update issues.

Fine Adriaan, Instead of using Whie Exists logic recommended by you, can we use the code as below: SELECT DISTINCT
COL1
,COL2….
INSERT INTO TABLE1
COL1 = some logic
,COL2 = some logic….
FROM <SOME JOIN CONDITION>
WHERE <SOME LOGIC> I feel we can use DISTINCT keyword to avoid complexity in the code. Also Adriaan, I feel when I go as per your logic, I repeat the WHERE clause. If there are many joining conditions then this is the extra effort we are asking the SQL Server to do for us. Regards,
Ashish Ashish Johri

The extra effort for the insertion, provided you have covering indexes for the NOT EXISTS to be resolved quickly, is negligeable.

Thanks everyone — BUT I’m not concerned with how to do the insert (that I know) BUT!! if it’s a good idea, or if there’s a performance hit, when doing an INSERT INTO a table while doing a SELECT … FROM the same table. It sounds like it’s not necesarily something to avoid.

Depends if you have a lot of other inserts/updates on the same table from different users/processes.

…and if it is an ongoing process, that must be concerned. Satya SKJ
Microsoft SQL Server MVP
Writer, Contributing Editor & Moderator
http://www.SQL-Server-Performance.Com
This posting is provided AS IS with no rights for the sake of knowledge sharing. The greatest discovery of my generation is that a human being can alter his life by altering his attitudes of mind.

]]>