SQL Server Performance

INSERT (Col & Sort Order)

Discussion in 'General Developer Questions' started by amitm79, Aug 23, 2004.

  1. amitm79 New Member

    Hi All,
    I have a doubt regarding inserting records in a table. I have read this in many OLAP softwares (Essbase is one) wherein the order of the columns and the there respective sort orders does make a difference while loading into a table. Does that hold for OLTP/SQL Server also? I mean if i have a table with a primary key on ID COlumn, will it make any difference if i sort my records (to be inserted) in ascending order?

    Regards,
    amitm79
  2. FrankKalis Moderator

    First of all, ordering in a relational database is somewhat meaningless, unless you're about to retrieve the data with an ORDER BY. Next, you must make a split between SQL Server's relational engine and its storage engine. For the storage engine it does make sense to have a certain kind of ordering within the columns of a table, however, for the relational engine its almost all the same.
    If the value of your ID column is of any meaning in for app, you'll have to insert the rows sorted in the way you need or, might even be better, insert with IDENTITY_INSERT ON.

    Hope I understand you right?!?


    --Frank
    http://www.insidesql.de
  3. amitm79 New Member

    Hi Frank,
    I couldn't understand the point you want to say about Relational engine and storage engine. I am not getting the role of IDENTITY_INSERT in this context. Let me clarify the example more. My ID is a unique number which may not be a running number (as in the case of IDENTITY). If i have index on my ID column, newer rows may not necessarily be added at the end, it can be in the middle also. That is where i was thinking that sorting can help, i believe if i send the rows sorted on ID it should be fast.

    Regards,
    amitm79
  4. FrankKalis Moderator

    If you have an index on your ID column and insert new rows, it depends on what kind of index this is.

    Case A Clustered Index:
    When you create a clustered index, the physical ordering matches the clustered index keys. New data is always appended at the end of the file. No matter if the key values might be somewhere in between. To ensure consistency, SQL Server maintains the page chain, a doubly linked list.
    In an ideal world, SQL Server can read the data in the order of the index just by scanning of certain range of the disk. No unnecessary roundtrips are done.

    Case B Nonclustered Index:
    That's what you have described above. Data is inserted where there is enough space. Not necessarily at the end. However, SQL Server does also keep track in form of a doubly linked list. The point is that, consider this an extreme example: One record in at the beginning of the file, the second at the end, the rest somewhere in between. Now, to retrieve the rows in the order of the index, the disk heads must rotate from beginning to end and back and need to do maybe unnecessary roundtrips.


    So, even if you insert in a sorted manner, you can't be sure that it's physically also sorted in that way. The only point in time, I know of, when the physical storage matches the logical is when you (re)create an clustered index.

    --Frank
    http://www.insidesql.de
  5. amitm79 New Member

    Hi Frank,
    Are we trying to say that the column/sort order won't make any difference in SQL Server?

    Regards,
    amitm79
  6. FrankKalis Moderator

    The column actually makes a difference. Not much and mostly not noticed, but it does. The storage engine rearranges the column if necessary you specify by your CREATE TABLE state,emt. First comes the fixed length, then variable lengths NOT NULL, last variable lengths NULLable. Unless you haven't defined it that way in your CREATE TABLE statement, SQL Server's storage engine need to keep track of its rearrangement. IIRC, that's a cost of some bits or bytes per row. So, not much at all, but still there. But that's the storage engine. The relational engine is something completely different, and the one you should be concerned about.

    As for sorting:
    Any basic book on relational databases will tell you that such systems work with set of data at a time. And that within these sets there is no meaningful sorting unless you retrieve data with an ORDER BY clause, which is the only reliable way to produce the resultset in the way you expect it.

    But I suspect we're talking about two different things, right?


    --Frank
    http://www.insidesql.de
  7. amitm79 New Member

    Hi Frank,
    Speaking frankly i am not getting your point. May be i am not upto that level that i can understand it. Still your replies have made me think about the internals, i will surely try to read about them. Can i put my question in another way "How to ensure best insert performance?"

    Regards,
    amitm79
  8. FrankKalis Moderator

    Okay, then I missed your point somewhat completely.
    Speaking for myself, I use bcp to insert large amounts of data and build the indexes thereafter. You might also want to have a look at DTS to do this, but i can't help you with DTS, since I don't use it.

    Is this getting more in your direction?


    --Frank
    http://www.insidesql.de
  9. derrickleggett New Member

    Hi All,<br />I have a doubt regarding inserting records in a table. I have read this in many OLAP softwares (Essbase is one) wherein the order of the columns and the there respective sort orders does make a difference while loading into a table. Does that hold for OLTP/SQL Server also? I mean if i have a table with a primary key on ID COlumn, will it make any difference if i sort my records (to be inserted) in ascending order?<br /><br />Regards,<br />amitm79<br /><br /><br />There are some instances where it can make a difference. The one you pointed out is a good case. The ID column index will try to keep the numbers in order inside the index itself. If your existing table has rows 1-5000 minus 200-250 and you insert 200-250, you will have a fragmented index when the operation is completed. If you insert rows 5001-6000 though in order, there is no fragmentation at all. You've just extended the existing index. <br /><br />If you really want optimal performance though, you do what Frank said. You drop the indexes and bcp in the data, then reapply the indexes. If you're inserting millions of rows, this is faster because you're not writing the data 2 or more times for each index. Make sense?<br /><br />Feel free to correct me on this everyone. The index fragmentation/page storage and building stuff is always a little foggy to me. I normally look it up each time. <img src='/community/emoticons/emotion-1.gif' alt=':)' /> I just don't have time right now.<br /><br />MeanOldDBA<br />derrickleggett@hotmail.com<br /><br />When life gives you a lemon, fire the DBA.
  10. FrankKalis Moderator

    Since I found this article<a target="_blank" href=http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/ss2kidbp.mspx>http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/ss2kidbp.mspx</a> I have read it many times, as I also can't remember this fragmentations thingy. Anyway, one doesn't need to know everything, you just need to know where to look at when you need to.<br /><br />[<img src='/community/emoticons/emotion-1.gif' alt=':)' />]<br /><br /><br />--Frank<br /<a target="_blank" href=http://www.insidesql.de>http://www.insidesql.de</a><br />
  11. amitm79 New Member

    Hi Derrick & Frank,
    The information provided by you has been quite useful. I agree that bcp followed by recreation of indexes gives best performance. I will try to put it whereever i can. Thanks to you guys for your cooperation.

    Regards,
    amitm79

Share This Page