SQL Server Performance

Insert a Billion rows fastest

Discussion in 'SQL Server 2005 T-SQL Performance Tuning' started by jamie.downs, May 10, 2007.

  1. jamie.downs Member

    Hopefully the title should wet your appetite.

    I am interested in methods people might use. We have a multithreaded process which on completion inserts about a billion rows into a table.

    The multithreaded process is run on a grid of 16 machines.

    Our best rate is around 250,000 rows/sec. This is done by passing a blob, extracting the data using a table valued function and inserting into our table.

    We have tried bcp and this is slower.

    The table looks like the following:

    /****** Object: Table [dbo].[PV] Script Date: 05/10/2007 13:19:02 ******/
    SET ANSI_NULLS ON
    GO
    SET QUOTED_IDENTIFIER ON
    GO
    CREATE TABLE [dbo].[PV](
    [TradeID] [int] NOT NULL,
    [SimulationID] [int] NOT NULL,
    [PV] [float] NOT NULL,
    CONSTRAINT [PK_PV] PRIMARY KEY CLUSTERED
    (
    [SimulationID] ASC,
    [TradeID] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    ) ON [PRIMARY]
    GO
    One bizarre thing is that performance decreases without the PK. I expected the oppostie.

    Any suggestions would be greatly appreciated.

    Thanks
    Jamie.....
  2. FrankKalis Moderator

    Just a quick question, as I am still amazed at the 1,000,000,000 rows load. [<img src='/community/emoticons/emotion-1.gif' alt=':)' />]<br />Have you tried clustering on an IDENTITY column?<br /><br />--<br />Frank Kalis<br />Moderator<br />Microsoft SQL Server MVP<br />Webmaster:<a target="_blank" href=http://www.insidesql.de>http://www.insidesql.de</a>
  3. Adriaan New Member

    HAv eyou checked what happens on a single machine? Note that this is not the Clustering zone of the forums.[<img src='/community/emoticons/emotion-5.gif' alt=';)' />]
  4. jamie.downs Member

    Hi Frank,

    Thanks for your reply. We have not tried clustering on an IDENTITY column.

    To be clear. Do you mean having something like the following?

    /****** Object: Table [dbo].[PV] Script Date: 05/10/2007 13:19:02 ******/
    SET ANSI_NULLS ON
    GO
    SET QUOTED_IDENTIFIER ON
    GO
    CREATE TABLE [dbo].[PV](
    ID int IDENTITY,
    [TradeID] [int] NOT NULL,
    [SimulationID] [int] NOT NULL,
    [PV] [float] NOT NULL,
    CONSTRAINT [PK_PV] PRIMARY KEY CLUSTERED
    (
    [ID] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    ) ON [PRIMARY]
    GO
  5. jamie.downs Member

    <blockquote id="quote"><font size="1" face="Verdana, Arial, Helvetica" id="quote">quote:<hr height="1" noshade id="quote"><i>Originally posted by Adriaan</i><br /><br />HAv eyou checked what happens on a single machine? Note that this is not the Clustering zone of the forums.[<img src='/community/emoticons/emotion-5.gif' alt=';)' />]<br /><hr height="1" noshade id="quote"></font id="quote"></blockquote id="quote"><br /><br />Hi Adriaan,<br /><br />Thanks for your reply.<br />The perf with a single machine is too slow. The db server is a 4-way dual core so when you do it with a single machine only one cpu is active. We really want to beast it and get the data in.
  6. satya Moderator

    http://blogs.msdn.com/sqlperf/archive/2007/02/16/high-volume-update-performance.aspx fyi on the subject.
    At the same time have you checked the contention on TEMPDB & Transaction log of this database?
    http://www.microsoft.com/technet/prodtechnol/sql/2005/tsprfprb.mspx is the one I always refer as a start.

    Satya SKJ
    Microsoft SQL Server MVP
    Writer, Contributing Editor & Moderator
    http://www.SQL-Server-Performance.Com
    This posting is provided AS IS with no rights for the sake of knowledge sharing. Knowledge is of two kinds. We know a subject ourselves or we know where we can find information on it.
  7. Adriaan New Member

    Note that in your last script (SimulationId, TradeId) no longer has a unique constraint.

    The identity column doesn't have to be the Primary Key: you can create a unique clustered index on the identity column, then add a Primary Key on (SimulationId, TradeId).

    If there is already a clustered index on the table, the primary key is nonclustered.
  8. FrankKalis Moderator

    quote:Originally posted by jamie.downs

    Hi Frank,

    Thanks for your reply. We have not tried clustering on an IDENTITY column.

    To be clear. Do you mean having something like the following?

    /****** Object: Table [dbo].[PV] Script Date: 05/10/2007 13:19:02 ******/
    SET ANSI_NULLS ON
    GO
    SET QUOTED_IDENTIFIER ON
    GO
    CREATE TABLE [dbo].[PV](
    ID int IDENTITY,
    [TradeID] [int] NOT NULL,
    [SimulationID] [int] NOT NULL,
    [PV] [float] NOT NULL,
    CONSTRAINT [PK_PV] PRIMARY KEY CLUSTERED
    (
    [ID] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    ) ON [PRIMARY]
    GO
    Yes.

    --
    Frank Kalis
    Moderator
    Microsoft SQL Server MVP
    Webmaster:http://www.insidesql.de
  9. jamie.downs Member

    Thankyou guys for all replies. I need some time to give them ago. I'll let you know how I get on.

    Cheers.
  10. MohammedU New Member

    To get the good performance with BCP, destination table should not have the index and SELECT INTO/BULK COPY option should be enabled...

    If remember correctly MS demostrated in SQL PASS of 2004 or 2005 with SSIS package inserted 10 million rows in 4-5 seconds.....


    MohammedU.
    Moderator
    SQL-Server-Performance.com

    All postings are provided “AS IS” with no warranties for accuracy.
  11. FrankKalis Moderator

Share This Page