Insert a Billion rows fastest | SQL Server Performance Forums

SQL Server Performance Forum – Threads Archive

Insert a Billion rows fastest

Hopefully the title should wet your appetite. I am interested in methods people might use. We have a multithreaded process which on completion inserts about a billion rows into a table. The multithreaded process is run on a grid of 16 machines. Our best rate is around 250,000 rows/sec. This is done by passing a blob, extracting the data using a table valued function and inserting into our table. We have tried bcp and this is slower. The table looks like the following: /****** Object: Table [dbo].[PV] Script Date: 05/10/2007 13:19:02 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[PV](
[TradeID] [int] NOT NULL,
[SimulationID] [int] NOT NULL,
[PV] [float] NOT NULL,
CONSTRAINT [PK_PV] PRIMARY KEY CLUSTERED
(
[SimulationID] ASC,
[TradeID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
One bizarre thing is that performance decreases without the PK. I expected the oppostie. Any suggestions would be greatly appreciated. Thanks
Jamie…..
Just a quick question, as I am still amazed at the 1,000,000,000 rows load. [<img src=’/community/emoticons/emotion-1.gif’ alt=’:)‘ />]<br />Have you tried clustering on an IDENTITY column?<br /><br />–<br />Frank Kalis<br />Moderator<br />Microsoft SQL Server MVP<br />Webmaster:<a target="_blank" href=http://www.insidesql.de>http://www.insidesql.de</a>
HAv eyou checked what happens on a single machine? Note that this is not the Clustering zone of the forums.[<img src=’/community/emoticons/emotion-5.gif’ alt=’;)‘ />]
Hi Frank, Thanks for your reply. We have not tried clustering on an IDENTITY column. To be clear. Do you mean having something like the following? /****** Object: Table [dbo].[PV] Script Date: 05/10/2007 13:19:02 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[PV](
ID int IDENTITY,
[TradeID] [int] NOT NULL,
[SimulationID] [int] NOT NULL,
[PV] [float] NOT NULL,
CONSTRAINT [PK_PV] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
<blockquote id="quote"><font size="1" face="Verdana, Arial, Helvetica" id="quote">quote:<hr height="1" noshade id="quote"><i>Originally posted by Adriaan</i><br /><br />HAv eyou checked what happens on a single machine? Note that this is not the Clustering zone of the forums.[<img src=’/community/emoticons/emotion-5.gif’ alt=’;)‘ />]<br /><hr height="1" noshade id="quote"></font id="quote"></blockquote id="quote"><br /><br />Hi Adriaan,<br /><br />Thanks for your reply.<br />The perf with a single machine is too slow. The db server is a 4-way dual core so when you do it with a single machine only one cpu is active. We really want to beast it and get the data in.
http://blogs.msdn.com/sqlperf/archive/2007/02/16/high-volume-update-performance.aspx fyi on the subject.
At the same time have you checked the contention on TEMPDB & Transaction log of this database?
http://www.microsoft.com/technet/prodtechnol/sql/2005/tsprfprb.mspx is the one I always refer as a start. Satya SKJ
Microsoft SQL Server MVP
Writer, Contributing Editor & Moderator
http://www.SQL-Server-Performance.Com
This posting is provided AS IS with no rights for the sake of knowledge sharing. Knowledge is of two kinds. We know a subject ourselves or we know where we can find information on it.
Note that in your last script (SimulationId, TradeId) no longer has a unique constraint. The identity column doesn’t have to be the Primary Key: you can create a unique clustered index on the identity column, then add a Primary Key on (SimulationId, TradeId). If there is already a clustered index on the table, the primary key is nonclustered.
quote:Originally posted by jamie.downs Hi Frank, Thanks for your reply. We have not tried clustering on an IDENTITY column. To be clear. Do you mean having something like the following? /****** Object: Table [dbo].[PV] Script Date: 05/10/2007 13:19:02 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[PV](
ID int IDENTITY,
[TradeID] [int] NOT NULL,
[SimulationID] [int] NOT NULL,
[PV] [float] NOT NULL,
CONSTRAINT [PK_PV] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
Yes. —
Frank Kalis
Moderator
Microsoft SQL Server MVP
Webmaster:http://www.insidesql.de
Thankyou guys for all replies. I need some time to give them ago. I’ll let you know how I get on. Cheers.
To get the good performance with BCP, destination table should not have the index and SELECT INTO/BULK COPY option should be enabled… If remember correctly MS demostrated in SQL PASS of 2004 or 2005 with SSIS package inserted 10 million rows in 4-5 seconds…..
MohammedU.
Moderator
SQL-Server-Performance.com All postings are provided “AS IS” with no warranties for accuracy.

FWIW,http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/incbulkload.mspx
Frank Kalis
Moderator
Microsoft SQL Server MVP
Webmaster:http://www.insidesql.de
]]>