SQL Server Performance

updating a table with huge data source

Discussion in 'General Developer Questions' started by v1rtu0s1ty, Jun 10, 2004.

  1. v1rtu0s1ty New Member

    Hey guys,

    I'm 3 months old now with doing some select, insert, and update ad hoc queries. I am still learning, but a little confident now with some queries. At the end of every month, business sends me a .csv file that I import from Enterprise Manager. The average total is 110,000 rows. I update our target table using an inner join. It updates about 45,000 rows in 1 minute. Some are not updated because of the criteria we use, and that is fine.

    Btw, I always use Query Analyzer to do my update in production.

    This is the syntax of what I use:

    begin transaction
    update targettable
    set targettable.timeconfirmcolumn = sourcetable.timeconfirmcolumn
    from sourcetable inner join targettable
    on sourcetable.key=targettable.key
    where targettable.site='los angeles' and targettable.timeconfirmcolumn is NULL
    commit transaction

    I have no problem with this query and it works great. However, this month they gave me 900,000 rows. I am worried that if I modify targettable using Query Analyzer, my connection will time out. I have never done anything this huge. What can you recommend? And also, my query above is working fine, but is it the proper or correct way?

    Any recommendation will be appreciated.

    Thanks. :)

    V1rt.
  2. derrickleggett New Member

    --There's no need for a transaction because it's only one statement.
    --You should put this in a stored procedure since it's the same thing every time. It will run better.
    --You shouldn't use key or id as the name of a primary key column. The name should actually identify the table (targettable_id or something).

    CREATE PROCEDURE targettable_mthly_update
        @site VARCHAR(55)
    AS
    UPDATE tt
    SET tt.timeconfirmcolumn = st.timeconfirmcolumn
    FROM sourcetable st
    INNER JOIN targettable tt ON st.key = tt.key
    WHERE tt.site = @site
        AND tt.timeconfirmcolumn IS NULL
    GO



    MeanOldDBA
    derrickleggett@hotmail.com

    When life gives you a lemon, fire the DBA.
  3. v1rtu0s1ty New Member

    I got confused with

    tt.site=@site

    in the where clause

    What does it mean? How do I use the sp now?

    And what about my 900,000 rows? Will SQL Server continue to update to the very end without timing out?

    Thanks.


  4. v1rtu0s1ty New Member

    1.) You know what, I just noticed the variable. I thought that the variable there was used in the "on st.key=@site". :)

    Now I remember, it's going to be:

    exec targettable_mthly_update 'los angeles'

    2.) Regarding the commit transaction: I was taught that if we don't execute commit transaction, data will not be written to the disk. Any comments here?

    3.) Another question on sp. Can we specify a table name as the parameter to a stored procedure?

    Thanks again.
  5. derrickleggett New Member

    1. You don't have to worry about the query timing out in Query Analyzer. It doesn't time out.

    2. What you were taught about commit transaction is true. However, SQL Server automatically begins and commits any statements that don't have a BEGIN TRANSACTION and COMMIT TRANSACTION around them. It commits one statement at a time until it reaches the end of the batch (see the sketch below).

    3. No, unless you use dynamic SQL, which you should avoid if at all possible.
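
    A minimal sketch of point 2, reusing the table and column names from this thread (assumed, so adjust to your schema):

    -- Autocommit mode: each statement is its own transaction.
    -- If it succeeds, it is committed immediately; there is nothing to roll back.
    UPDATE targettable SET timeconfirmcolumn = GETDATE() WHERE site = 'los angeles'

    -- Explicit transaction: nothing is permanent until COMMIT TRANSACTION,
    -- and ROLLBACK TRANSACTION undoes everything back to the BEGIN TRANSACTION.
    BEGIN TRANSACTION
    UPDATE targettable SET timeconfirmcolumn = GETDATE() WHERE site = 'los angeles'
    ROLLBACK TRANSACTION -- or COMMIT TRANSACTION to keep the changes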

    MeanOldDBA
    derrickleggett@hotmail.com

    When life gives you a lemon, fire the DBA.
  6. v1rtu0s1ty New Member

    and also, I see some sp executed this way:

    exec sp_test N'something', N'anotherthing'

    what is the N there?

    Thanks.

    V1rt
  7. satya Moderator

    The N prefix marks a string literal as Unicode (nchar/nvarchar data). So when you search for a Unicode string in a WHERE clause, you place the N character before the search string.
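
    For example, reusing the column names from earlier in this thread (assumed):

    SELECT * FROM targettable WHERE site = N'los angeles'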

    Satya SKJ
    Moderator
    http://www.SQL-Server-Performance.Com/forum
    This posting is provided “AS IS” with no rights for the sake of knowledge sharing.
  8. x002548 New Member

    quote:Originally posted by v1rtu0s1ty

    Hey guys,

    I'm 3 months old now with doing some select, insert, update adhoc queries.


    What does your mommy feed you?

    That's amazing!

    Do your update in batch ranges, say like 10 batches of updates....

    Brett

    :cool:
  9. v1rtu0s1ty New Member

    If that is your recommendation, how can I break down my 900,000 rows into 10 tables?
  10. Raulie New Member

    If you are going to update that many rows, you should take precautionary measures.

    First of all, how many records does the actual target table contain?

    Perform this update when there is minimal to zero user activity, if possible.

    If you must perform the update during user activity, create a stored procedure and issue COMMITs every n rows (not 10); see if you can use the source timeconfirmcolumn to break the batch down.

    If you haven't done so already, really consider placing tempdb and the log on different disks.

    Temporarily drop excessive indexes, then recreate them after the update; only keep indexes that will benefit the update (like those on the WHERE clause columns).

    Good luck [:D]
  11. v1rtu0s1ty New Member

    Hi Lazy_DBA,

    Thanks for the response. I really appreciate it.

    I'm fairly new to this but definitely would like to take your route. I am confused, though.
    So how would I do it?

    The target table is in the millions, but with my criteria I only need to compare against 1 million. I did find out, however, that there are only 300,000 records that match between my source table and target table.

    What I am planning to do is put those matched records into a temporary table. So now I'm down from 900,000 to 300,000. I would like to create a stored procedure following your recommendation about commits. Actually, an example with commits would really help me and give me an idea of how I should write mine.

    Thanks again.

    V1rt
  12. Raulie New Member

    I use the code in this sp whenever I want to perform large-scale updates. Here is a script where you can practice this. Basically it will create a test table and populate it with 900,000 rows, or whatever number you want. It will place a clustered index on the NUMS column. Test it out and run a Profiler trace on it using SP:StmtCompleted to view the action. Modify and apply the sp to your liking. Let me know if you need more help.

    USE TESTDB

    CREATE TABLE NUM
    (
    NUMS INT NOT NULL PRIMARY KEY,
    DESCR VARCHAR(20) DEFAULT 'HELLO'
    )

    SET NOCOUNT ON

    DECLARE
    @RC AS INT,
    @MAX AS INT

    SET @RC = 1
    SET @MAX = 900000

    INSERT INTO NUM (NUMS) VALUES(1)

    -- doubles the row count on each pass until @MAX rows exist
    WHILE @RC < @MAX
    BEGIN
        INSERT INTO NUM (NUMS)
        SELECT NUMS + @RC FROM NUM
        WHERE NUMS + @RC <= @MAX

        SET @RC = @RC + @@ROWCOUNT
    END
    GO

    CREATE PROCEDURE usp_update_everynrows
    AS

    DECLARE @MAXID INT

    SELECT @MAXID = MAX(NUMS) FROM NUM

    DECLARE
    @RANGOMIN INT,
    @RANGOMAX INT

    DECLARE @SEGMENTO INT

    -- update in 100 segments instead of one huge statement
    SET @SEGMENTO = @MAXID / 100

    SET @RANGOMIN = 1
    SET @RANGOMAX = @SEGMENTO

    WHILE @RANGOMIN < @MAXID
    BEGIN
        UPDATE NUM
        SET DESCR = 'BYE'
        WHERE NUMS BETWEEN @RANGOMIN AND @RANGOMAX

        SET @RANGOMIN = @RANGOMAX + 1
        SET @RANGOMAX = @RANGOMAX + @SEGMENTO
    END
    GO

    --sp_recompile usp_update_everynrows
    --exec usp_update_everynrows
  13. v1rtu0s1ty New Member

    This is great. But where is the 'commit' command?

    Thanks.
  14. Raulie New Member

    quote:Originally posted by v1rtu0s1ty

    This is great. But where is the 'commit' command?

    Thanks.

    Ah, my fellow SQL newbie. SQL Server automatically commits unless you code a BEGIN TRANSACTION, which leaves the operation open until you run a ROLLBACK TRANSACTION or a COMMIT (don't try this at home). :)

    Lazy_DBA
  15. v1rtu0s1ty New Member

    Ok. That means, if I execute an UPDATE or INSERT statement without begin transaction, data gets committed to disk at once, but I will have no room for ROLLBACK.

    But if I do start my query with begin transaction, then do an INSERT or UPDATE and have some failures, I can issue ROLLBACK TRANSACTION.

    Am I correct?

    Thanks again. :)
  16. v1rtu0s1ty New Member

    Hey, I tried it and this is what I got: :)

    Server: Msg 3903, Level 16, State 1, Line 1
    The ROLLBACK TRANSACTION request has no corresponding BEGIN TRANSACTION.

    I can say that it is better to get used to writing queries with BEGIN TRANSACTION. This way, if I get errors, I can roll back.

    Thanks guys.
  17. Raulie New Member

    Correct, something like this:

    CREATE PROCEDURE usp_update_studentgrade
    AS
    SET NOCOUNT ON
    DECLARE @ErrorSave int

    BEGIN TRANSACTION

    UPDATE statement....

    SELECT @ErrorSave = @@ERROR

    IF @ErrorSave <> 0 -- if a problem occurs, roll back
        ROLLBACK TRANSACTION
    ELSE
        COMMIT TRANSACTION
    RETURN @ErrorSave

    It all depends, though; rolling back a few hundred thousand records would be a pain.
    I highly recommend you do some reading on transactions and that sort of stuff (we all must do it). :) Good luck
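
    Putting this together with the batching loop from my earlier post, here is a rough sketch (using the same practice NUM table; the segment size and names are just for illustration) of committing and error-checking every n rows:

    DECLARE @ErrorSave INT, @MAXID INT, @RANGOMIN INT, @RANGOMAX INT, @SEGMENTO INT

    SELECT @MAXID = MAX(NUMS) FROM NUM
    SET @SEGMENTO = @MAXID / 100
    SET @RANGOMIN = 1
    SET @RANGOMAX = @SEGMENTO

    WHILE @RANGOMIN < @MAXID
    BEGIN
        BEGIN TRANSACTION

        UPDATE NUM
        SET DESCR = 'BYE'
        WHERE NUMS BETWEEN @RANGOMIN AND @RANGOMAX

        SELECT @ErrorSave = @@ERROR

        IF @ErrorSave <> 0
        BEGIN
            ROLLBACK TRANSACTION -- only this batch is lost
            BREAK
        END
        ELSE
            COMMIT TRANSACTION -- this batch is now permanent

        SET @RANGOMIN = @RANGOMAX + 1
        SET @RANGOMAX = @RANGOMAX + @SEGMENTO
    END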
  18. v1rtu0s1ty New Member

    That's a very nice example. :)

    What I can do now is, let's say I've got 2,000,000 rows to update: I will write a WHILE loop, then at every 500,000 rows I will track the @ErrorSave variable and either execute ROLLBACK if there was a failure or COMMIT if there were no failures.

    What do you think about that?

    Thanks.
  19. Raulie New Member

    I recommend you brake the while loop down; this will reduce the chances of an error occurring.

    I would test my strategy on smaller datasets, with and without indexes; an index on the WHERE clause columns will give a performance boost. Monitor how much time the different approaches take. Remember to recompile the stored procedure to get accurate results; once you have a good plan, run it on the production system. Remember, your goal is to have the best performance while holding locks to a minimum.
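
    For the recompile, a one-liner (using the procedure name from my earlier post):

    EXEC sp_recompile 'usp_update_everynrows'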
  20. v1rtu0s1ty New Member

    quote:Originally posted by Lazy_DBA

    I recommend you brake the while loop down; this will reduce the chances of an error occurring.

    Hi Lazy_DBA,

    What do you mean exactly by the word "brake"? Sorry. Did you mean that I should stop writing loops?
  21. Raulie New Member

    I meant commit your updates on smaller datasets, not 500,000. I know the use of the word "brake" when referring to a while loop threw you off. [:D]
  22. gaurav_bindlish New Member

    Doing a transaction log backup between batches will also help, as it will reduce the log space used on the disk.
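
    Something like this inside the batch loop (the database name and backup path here are just placeholders):

    BACKUP LOG TESTDB TO DISK = 'E:\Backups\TESTDB_log.bak'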

    Gaurav
    Moderator
    Man thrives, oddly enough, only in the presence of a challenging environment- L. Ron Hubbard

    The views expressed here are those of the author and no one else. There are no warranties as to the reliability or accuracy of anything presented here.
  23. v1rtu0s1ty New Member

    Gotcha! Assuming we are doing an update and one of the records we were trying to update between row 100 and 150 failed (say the field only accepts 10 characters and I was inserting 20 characters), what should I do with that error? Should I put that row in a temp table and then continue with the loop? Or would you just rollback instead? What's the best practice?

    Thanks again.
  24. gaurav_bindlish New Member

    Depends on the needed consistency of the data. If it is a financial transaction, I would rollback the whole thing.

    Hey, BTW, if it is a banking system, let me know, I'll open an account there. Hope somebody accidentally credits 1,000,000 dollars into the same account and, while rolling back, an error happens on my record.... ;) See what I am saying?

    Gaurav
    Moderator
    Man thrives, oddly enough, only in the presence of a challenging environment- L. Ron Hubbard
    The views expressed here are those of the author and no one else. There are no warranties as to the reliability or accuracy of anything presented here.
