SQL Server Performance

INT Vs NUMERIC(9, 0) Vs BIGINT

Discussion in 'General Developer Questions' started by amitm79, Feb 24, 2005.

  1. amitm79 New Member

    Hi,
    I just thought of comparing the statistics of one of my tables if i change the primary key column from INT to NUMERIC or BIGINT. I am getting best response time with BIGINT followed by NUMERIC(9, 0) AND INT. My perception was that INT will be the fastest. Can anybody explain the reason behind this performance improvement? The percentile based on INT as 100 is below
    INT NUMERIC BIGINT
    100 60 47

    I see the change as tremendous and its important for me to find why so?

    Regards,
    amitm79
  2. mmarovic Active Member

    What exactly have you tested? And how? Have you cleaned the cache before each test started?
  3. FrankKalis Moderator

    Hm, considering the raw size of each datatype the result isn't suprising at all.
    Why are you comparing this anyway? The only reason for a change like yours I can imagine is when you run out of the scope on an INT value or when you know in advance that you need to store more than 2 billion rows in that table. Since the table is already existing, the latter doesn't apply here.

    ???

    --
    Frank
    http://www.insidesql.de
  4. mmarovic Active Member

    Frank, int was reported to be the slowest one, that is surprising.
  5. amitm79 New Member

    Hi All,
    Please find a detailed reply for all your questions and the justification of why the results were like that.

    mmarovic
    i did DBCC FREEPROCCACHE & DBCC DROPCLEANBUFFERS before each query. I even executed a CHECKPOINT. Lastly, i restarted the server to get confident about the numbers.

    Frank
    I am doing this because i want to keep IP address as a number. The logic behind it is
    Max IP Possible: 255.255.255.255
    i.e. i can use four TINYINT's to store each octet. Better off each byte can be one of the four bytes of an INT. The bottleneck occurred when i couldn't store unsigned int values in database INT type. one workaround was to subtract the offset(2^31) so that unsigned values can be stored as signed. I thought let me see if NUMERIC(9, 0) or BIGINT can help me and what is the overhead in using them. NUMERIC doesn't support bitwise operations so is ruled out. BIGINT has about 10-20% overhead in query performance leaving aside the storage hit. I have around 800GB of data so for me saving every byte is crucial.

    Solution
    When i looked at details from sysindexes for the three tables. I found the number of pages for INT almost double then other two. When i defraged the table and ran my queries again, following was the order in terma of performance.
    INT->BIGINT->NUMERIC(9, 0)

    I hope the discussion was helpfult to all of us.

    Regards,
    Amit
  6. mmarovic Active Member

    Whatever solution you choose make sure you have 80% or lower fill factor for index on ip address.
  7. amitm79 New Member

    Hi mmarovic,
    Could you please elaborate on why you want to keep fill factor of 80% or less? My table will receive data inserts every 15 minutes for 24 hours and then it will be read only for 30 days.

    Regards,
    amitm79
  8. FrankKalis Moderator

    quote:Originally posted by mmarovic

    Frank, int was reported to be the slowest one, that is surprising.
    Oops, guess I misread this question [:I]

    --
    Frank
    http://www.insidesql.de
  9. FrankKalis Moderator

    Amit, <br />didn't we have this issue before? [<img src='/community/emoticons/emotion-5.gif' alt=';)' />]<br /><br /><br />--<br />Frank<br /<a target="_blank" href=http://www.insidesql.de>http://www.insidesql.de</a><br />
  10. mmarovic Active Member

    quote:Hi mmarovic,
    Could you please elaborate on why you want to keep fill factor of 80% or less? My table will receive data inserts every 15 minutes for 24 hours and then it will be read only for 30 days.
    Hi amitm79,
    I assumed that your inserts are not done in IP number order. If so you may expect a lot of page splits if you don't have enough free space in index on ip address column. I would rebuild index before inserts to be 80% or lower (down to 50%) and after they are done you can rebuild it again without fill factor (you can set it even to 100 with pad index).
  11. amitm79 New Member

    Hi mmarovic,
    You are right. I faced this exact situation wherein because of IP's, many page splits are happening. This can be eliminated by sorting the data file before bcping into the database. The other option is the one you suggested above that i reindex after data insert.

    Frank,

    We had it before between int and bigint but didn't reach much conclusions Moreover this time it was not a quetion but an observed behaviour. If you read the whole thread then we have reached a definitive decision/conclusion. I apologize if you think its a repeat.

    Regards,
    amitm79
  12. FrankKalis Moderator

    No no, that's okay with me. I actually thought you've solved your problem by now. <br />And still I think using a VARBINARY(16) or even a VARCHAR(16) is a good choice. [<img src='/community/emoticons/emotion-1.gif' alt=':)' />]<br /><br />--<br />Frank<br /<a target="_blank" href=http://www.insidesql.de>http://www.insidesql.de</a><br />
  13. mmarovic Active Member

    Do you populate your table from the scratch truncated it before inserts start or you just add new ip addresses?
  14. amitm79 New Member

    Hi mmarovic,
    i add new addresses every 15 minutes. The process continues for one day. After one day the table is read only for an entire month.

    Regards,
    amitm79
  15. mmarovic Active Member

    I understand that you load all data during one single day in a month. What I don't know is:
    Do you truncate previous months data and start inserting in empty table or
    you just insert additional addresses on top of what you have already had there?
  16. amitm79 New Member

    Hi mmarovic,
    i do inserts every 15 minutes for one day. New table is created for each day of month and is dropped and recreated if i reach the same day after one month. I am horizontally partitioning my data.

    Regards,
    amitm79
  17. mmarovic Active Member

    Ok, then setting fillFactor before inserts is irrelevant. In case nobody reads data the day you load them I wouldn't create any index before load is complete. After that I would create all indexes with fill factor 100 and pad index.
  18. amitm79 New Member

    Hi mmarovic,
    I can't disable indexes as it might result into invalid entries getting into the table. One solution of it can be having a staging area for purifying the data but its not possible in my implementation.

    Only thing i wish to do is reindex after each day with a fillfactor of 100. Yeah adding pad index on top of it makes sense to me. I am not getting maintainence window for even that task. Moreover even if i get a maintainence window, i have about 50 such daily tables to be reindexed.

    As far as i have noticed, time taken to reindex is directly proportional to fragmentation. It will be good if i can forecast the size of the maintainence window but i am not sure how to calculate the time required.

    Regards,
    Amit

Share This Page