Updating Million Records Efficiently | SQL Server Performance Forums

SQL Server Performance Forum – Threads Archive

Updating Million Records Efficiently

I have two tables in MS SQL SERVER, as following 1. Table1 with 1000 Records with three fields a, b and c 2. Table2 with 13+ Million Records with three fields x, a, b, c, d, e and f Now i want to update the fields Table2.b and Table2.c with Table1.b and Table1.c respectively by field "a" of Table1 and Table2. Currently if i am using a join condition in update statement as given below:
"update Table2 set b = Table1.b, c = Table1.c from Table1, Table2
where Table1.a = Table2.a" This update is taking many hours to do update. Can u please suggest if i have any other options to update the table much faster than this? Thanks in Advance.
Ashish.

Could you provide what indexs do you have in both tables?
Luis Martin
Moderator
SQL-Server-Performance.com All postings are provided “AS IS” with no warranties for accuracy.
Also, are you doing this on a regular basis or is it a one-time action? ———————–
–Frank
http://www.insidesql.de
———————–

I have indexes on some other field in table2 lets say x field of table2 and the index is non-clustered. And yes, this is being done on regular basis, once in a month. And Sorry for posting it in different forums too, as i didnt knew the perfect place. Would take care next time. -Ashish
Sorry forgot to say that there is no index on the table1. -Ashish
I think you need some index in table2.
See execution plan to find out query cost.
Luis Martin
Moderator
SQL-Server-Performance.com All postings are provided “AS IS” with no warranties for accuracy.
Mr. Luis, I am very new to SQL Server, and i dont know its sql basics, as i come from Oracle, but can u please say me what would u suggest if u have to update this kind of table structure for the quick update in table2. I mean, what indexes would u keep and on what tables, and what would be the best updating procedure or process or statement u would create. This is the second time in my life i am comming to MS SQL Server. So, please if u can suggest me the best option which u would choose to update such kind of table. Thanks
Ashish.
No problem. Open SQL Analyzer (SQL tools) copy your query into SQL Analyzer. Be shure to choose correct database (default is master).
In Query menu you will find Index Tuning Wizard. Run it and see what indexs suggest to add.
Also you can run execution Plan to see query cost in grafic way. May be there is a table scan on large table with high cost.
Luis Martin
Moderator
SQL-Server-Performance.com All postings are provided “AS IS” with no warranties for accuracy.
From what I can see, just for this operation, your optimal indexes would be: Unique Clustered index on table 1 – a,b,c in that order.
Unique Clustered index on table 2 – a only – do not index column b and c as you are updating table 2 columns b and c so you could cause the data to shift around. This should give you your fastest possible update. You are joining on column a, so having that at the beginning of your clustered index is important. a,b,c on table 1 also helps as it then becomes a covering index for the update meaning that all the data is presented to the update in the exact order it required it. It wouldn’t actually matter if it was a,c,b instead though really. Let me know how you get on. If you can’t make the clustered indexes unique, then non unique clustered index is your second best choice. Dave Hilditch.
Thanks Mr. Dave Hilditch for ur reply. I am able to make unique cluster indexes on table 1. Let me try doing this, and i would definately get back to u. Thanks a lot once again.
-Ashish
Thanks Mr. Luis, as ur reply also made a very important contribution to increase my knowledge in SQL Server, and hope combining ur’s and Mr. Daves answer would give me good results. -Ashish
I agree with Dave, but I also suggest to follow my post. The idea is to get experience in SQL tools.
Luis Martin
Moderator
SQL-Server-Performance.com All postings are provided “AS IS” with no warranties for accuracy.
Definately Sir, I have started using the SQL index tuning and it also recomended me the same approach which Mr. Dave has derived above. I appriciate ur help and its been great experience using this site and its forum. This is the first time i am comming across to such site and i am very excited comming up here again. -Ashish

Ashish, My suggestion is not to create any index on table 1. This table is very small and probably SQL Server will scan it first and search for the corresponding record in table 2. So even if you create an index on it it will not be used.
As for table 2, because you update this table, it is better not to create an index that includes the updated column, that is b and c, because this index will be updated as well – which processing time. Create a clustered/non-clustered index only on the join column a and thus the search will be very quick. Because only about 1000 records are to be updated (unless you have duplicates in table 2) this might go very quick and you probably don’t need a clustered index on this column. The decision wether to create clustered or non-clustered on column a in table 2 depends on the other queries on that tables. But non-clustered is the minimum you need for this query to run efficiently. Also change you query for the ANSI-92 syntax:
update Table2
set b = Table1.b,
c = Table1.c
from Table1 join Table2 on
Table1.a = Table2.a
In addition, check the plan in Query analyzer to see if the indexes you created are used and what type of join is being executed.
I would like to know how it goes.
Aviel Iluz
Database Administrator
Pacific Micromarketing
Melbourne Australia
www.pacmicro.com.au

SELECT t2.x,t2.a,t1.b,t1.c,t2.d,t2.e,t2.f
into t3
FROM table2 t2
LEFT JOIN table1 t1 on t2.a=t1.a This worked very excellently for me for the above question. Thanks,
Ashish

Also, in my experience with updating large tables on a regular basis, you would want to add a WHERE condition so that you are not updating rows that are already up-to-date, such as the following: update Table2
set b = Table1.b,
c = Table1.c
from Table1 join Table2
on Table1.a = Table2.a
where Table2.b <> Table1.b
or Table2.c <> Table1.c
You would need to add additional conditions if nulls are allowed. Any performance increase would depend on the number of records that would be ruled out by this check.
-Hartmut5
Your main problem is that you are updating too many rows at a time. SQL Server creates an implicit transaction for each separate query (if transaction is not already started). Therefore, all changes made by the query must be persisted in the transaction log which makes your log HUGE and takes A LOT OF TIME to complete.
IMHO splitting the update to, say 1000 rows at a time, will boost the performance tremendously. You can use a condition on the Table1.a column (e.g. Table1.a between Value1 and Value2) to split the query and execute several hundred/thousand queries to update the whole table.
update Table2 set b = Table1.b, c = Table1.c from Table1, Table2
where Table1.a = Table2.a and Table1.a between @Value1 and @Value2
You must have an index on Table1.a and Table2.a to achieve better results.
Regards,
Muncho

]]>