SQL Server Performance

INSERT INTO vs SELECT ... INTO

Discussion in 'T-SQL Performance Tuning for Developers' started by arb, Jun 4, 2003.

  1. arb New Member

    While doing some performance testing with a stored procedure I am working on, I decided to try using a temp table in an effort to speed up part of the stored proc.

    My first step was to write a query that extracted the data I wanted to put in the temp table, and ran it a few times to make sure a) it got the right data and b) it was pretty fast. FWIW, the timing code I used is:



    declare @_starttime datetime, @_sectionstart datetime, @_sectionend datetime
    set @_starttime = getdate()

    set @_sectionstart = getdate()

    -- Code to time goes here...

    set @_sectionend = getdate()
    print '~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~'
    print 'Section: ' + convert(varchar, datediff(ms, @_sectionstart, @_sectionend))
    print 'Elapsed: ' + convert(varchar, datediff(ms, @_starttime, @_sectionend))
    print '~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~'
    set @_sectionstart = getdate()

    This showed the query as taking about 30 ms to execute. Now I thought it would be a simple matter of sticking an INSERT INTO #tmp in front of the query and everything would be fine...

    Naturally I created my temp table first, and ran the code with the INSERT INTO in place. I ran the code a couple of times to allow for caching/compiling/other system performance/etc. I expected the INSERT to take a bit longer than the SELECT, but was amazed to see it now took 1500ms to execute this block! I duplicated the block of code, so now I had the straight SELECT and the INSERT INTO - the SELECT took ~30ms and the INSERT took ~1500ms.

    As a test, I tried removing the INSERT INTO and CREATE TABLE and turning the query into a SELECT ... INTO query. This is where things started to get strange - SELECT ... INTO only took 160ms?!

    Now I was confused, so I tried a few more tests, this time creating the temp table, doing an INSERT INTO with a field list (INSERT INTO #tmp (field1, field2...) ), truncating the table and doing a second INSERT INTO without a field list and finally doing the SELECT ... INTO (into a different table of course).

    The first INSERT INTO again took around 1500ms, while the second INSERT INTO took only 60ms. The SELECT ... INTO was consistently 160ms. I then tried swapping the order of the INSERT INTOs so the one without the field list was first, yet I got the same result - the first time an INSERT INTO was executed, it took about 1500ms.

    Trying to get clever, I tried doing a SELECT ... INTO #tmp, truncating the table then an INSERT INTO #tmp - yet the INSRT INTO took about 1500ms again. There goes the theory I was working on that the huge time difference was caused by the table being created only on the first access.

    Time and again I tried different combinations with the same result - the first INSERT INTO always took around 1500ms, while the second (and subsequent) INSERT INTO took only 60ms. About the only reason I can come up with is that INSERT INTO causes the query processor to worry about datatypes, whereas SELECT ... INTO doesn't care - the datatypes will "fall out" of the underlying SELECT query. After the first INSERT INTO, the datatypes have been determined and maybe cached which would explain the dramatic speed up.

    Can anyone help me understand what is going on here? I try to avoid temp tables like the plague, but sometimes they are necessary and can be very useful. Where I do use temp tables, I invariably opt for INSERT INTO to avoid potential locking problems with tempdb, yet if the performance difference is so substantial, sometimes SELECT ... INTO might be more appropriate.

    Why the big difference between INSERT INTO and SELECT ... INTO?

    As it turned out, I revisited the original query and was able to use derived tables to avoid the temp tables altogether.

    NOTE: Execution plans and IO statistics were identical for each query. These results were all obtained by running the queries in Query Analyser, not as a stored procedure.
  2. kate_for_u New Member

    Hi

    the select into statement
    is actually a mix of 3 stmts
    create , select and into
    Hi hope this answers ur question
  3. kate_for_u New Member

    quote:Originally posted by kate_for_u

    Hi

    the select into statement
    is actually a mix of 3 stmts
    create , select and INSERT
    Hi hope this answers ur question
  4. bambola New Member

    The difference is in logging. As far as I know SELECT INTO is less logged then INSERT INTO and therefore quicker.

    Bambola.
  5. satya Moderator

  6. arb New Member

    quote:Originally posted by satya

    Refer to this link http://www.sql-server-performance.com/nb_avoid_bottlenecks.asp] also.

    This article (and other information similar to this) is why I was trying to avoid using SELECT ... INTO, although in this case the server is running 7.0, so the tempdb locking should not be a problem - it is just a habit I got into a while back to avoid SELECT ... INTO.

    BTW, I tried doing the SELECT ... INTO ... WHERE 1=0 to create the table, then INSERT INTO to populate it, yet the results are still the same - the first time I try an INSERT INTO, it takes a LOT longer.

    As far as logging goes - I am fairly certain we have the SELECT INTO/BULK COPY flag is cleared, so the SELECT INTO AND INSERT INTO statements should both be logged the same, but I will double check that no-one has changed this option on our servers. Even if SELECT INTO/BULK COPY is set, that wouldn't explain why subsequent INSERT INTO statements (even inserting different data) are so much (ie, orders of magnitude) faster.
  7. rushmada New Member

    This may be because Select into is a minimal logged operation, where as
    Insert into is a fully logged in the transaction log.

    Thanks

    Rushendra
  8. bambola New Member

    Try this on you DEV server.

    Restrict log growth on a test database to 1MB.

    Run an INSERT INTO with a table big enought to generate an error ("The log file for database '....' is full")

    Now do the SELECT INTO on the same table and it will run with no problems.
    both in full and in bulk recovery.

    So it is a logging issue.

    Bambola.

Share This Page