SQL Server Performance

Convert INT to BIGINT on a 900 million row table

Discussion in 'General DBA Questions' started by ndinakar, Apr 18, 2006.

  1. ndinakar Member

    We have a table A which has an Id column of type INT. There are currently about 900 million rows in the table and growing at 100,000,000 rows a month. We will hit the limits of INT pretty soon at this rate. So I was asked to study the different options to convert the INT to BIGINT for the table. Conditions: (1) The downtime should be the least (2) the method should be 100% reliable and no unknown factors going into the rollout. I did my research and came up with a couple of methods:

    (1) ALTER table ALTER COLUMN
    Notes: this has been tried and we had a huge server outage. it was awful)
    (2) create a new column in the same table. repopulate the column with existing data using a job. modify all the stored procs/business objects to update/insert the new column instead of the old one. Drop the old column. rename the new column back to what it was ( or leave it as is). Recreate the indexes/FK constraints with the new column instead of the old column.
    Notes: Requires lot of Space and consumes time. Also, drop column will take lot o ftime.

    (3) create a new table B with BIGINT datatype. BCP the data from A into B. create all the indexes. drop the old tables. rename the new table to the old one.

    I am suggesting method 3 as it requires least number of operations => less changes (and less places to) of screw up and less complicated.

    If anyone else who has done this before can provide some feedback/do's and dont's etc I would apprecaite it. Please keep the table size in mind.

    our PROD server env:
    Windows 2003 Data Center Edition, 32 64 bit processors, SQL 2000 x64 edition, 64 GB RAM and DB size = 4 TB. The 2 tables in question about couple of gigs each.

    Dinakar Nethi
    Life is short. Enjoy it.
  2. Haywood New Member

    I'd do #3 as well, but split the bcp data into multiple files so you could do a parallel-data load. S'bout my only .02¢
  3. ndinakar Member

    Yes. I was thinking of the same thing. Create multiple bcp files. Was wondering if there are any cons and if there is a better way to do it?

    Dinakar Nethi
    SQL Server MVP
  4. ndinakar Member


    Dinakar Nethi
    SQL Server MVP
  5. joechang New Member

    bcp out to 4-6 files, the bcp or bulk insert in parallel is fastest,
    only concern is bcp/bulk insert errors.
    ie, there some symbol you can safely use as column delimiter (usually |) that is not used in any of the columns

    also, if your columns has the ascii char for new line (
    ), then bcp would have problems
  6. Luis Martin Moderator

    I'm with Joe in this case.

    Luis Martin

    Although nature commences with reason and ends in experience it is necessary for us to do the opposite, that is to commence with experience and from this to proceed to investigate the reason.
    Leonardo Da Vinci

    Nunca esperes el reconocimiento de tus hijos, eso ocurrirá luego de tu muerte

    All postings are provided “AS IS” with no warranties for accuracy.

  7. ndinakar Member

    Yes. I was thinking of the same thing. Write a VB tool to create 5-10 threads and bcp out chunks of data. paralelly another 5-10 threads can read the files and bcp in into the table.

    The issue is the Identity column. The column in question is an Identity column. When we bcp into the new table we should retain the ID's of the old table. So the new table should have Identity Insert OFF initially when we do the transfer. After the transfer, we can turn it on. Would that work?

    Thanks for the suggestions.

    Dinakar Nethi
    SQL Server MVP
  8. Adriaan New Member

    Not sure you have the right syntax in mind ... also not sure how this would work in a bcp command, but in a straight T-SQL script it goes something like this:

    /* This allows you to insert existing values into the identity column of table B.*/

    INSERT INTO B (IdentityColumn, DataColumn1, DataColumn2)
    SELECT IdentityColumn, DataColumn1, DataColumn2
    FROM A

    /* Returns the identity column of table B. */
  9. FrankKalis Moderator

  10. ndinakar Member

    Hi Adriaan

    how can you set the identity on and insert?
    It will not let you specify a value for the identity column right?

    Dinakar Nethi
    SQL Server MVP
  11. Adriaan New Member

    My previous posting has the exact syntax you need to insert existing values into an existing identity column, at least within the regular T-SQL query syntax. (Not sure how it would work in BCP.)

    I used fake names for the columns and tables.
  12. ndinakar Member

    Like Frank mentioned the -E is the key. We can have the identity insert ON on the table and use the -E parameter. It allows to keep the existing identity values.

    I will try a couple of runs and see how it works out. I also have timestamp columns in the table and probably have to use a format file too.

    Dinakar Nethi
    SQL Server MVP

Share This Page