SQL Server Performance Forum – Threads Archive
Insert a Billion rows fastest
Hopefully the title should wet your appetite. I am interested in methods people might use. We have a multithreaded process which on completion inserts about a billion rows into a table. The multithreaded process is run on a grid of 16 machines. Our best rate is around 250,000 rows/sec. This is done by passing a blob, extracting the data using a table valued function and inserting into our table. We have tried bcp and this is slower. The table looks like the following: /****** Object: Table [dbo].[PV] Script Date: 05/10/2007 13:19:02 ******/SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[PV](
[TradeID] [int] NOT NULL,
[SimulationID] [int] NOT NULL,
[PV] [float] NOT NULL,
CONSTRAINT [PK_PV] PRIMARY KEY CLUSTERED
(
[SimulationID] ASC,
[TradeID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
One bizarre thing is that performance decreases without the PK. I expected the oppostie. Any suggestions would be greatly appreciated. Thanks
Jamie…..
Just a quick question, as I am still amazed at the 1,000,000,000 rows load. [<img src=’/community/emoticons/emotion-1.gif’ alt=’
![Smile :) :)](styles/default/xenforo/smilies/smile.png)
HAv eyou checked what happens on a single machine? Note that this is not the Clustering zone of the forums.[<img src=’/community/emoticons/emotion-5.gif’ alt=’
![Wink ;) ;)](styles/default/xenforo/smilies/wink.png)
Hi Frank, Thanks for your reply. We have not tried clustering on an IDENTITY column. To be clear. Do you mean having something like the following? /****** Object: Table [dbo].[PV] Script Date: 05/10/2007 13:19:02 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[PV](
ID int IDENTITY,
[TradeID] [int] NOT NULL,
[SimulationID] [int] NOT NULL,
[PV] [float] NOT NULL,
CONSTRAINT [PK_PV] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
<blockquote id="quote"><font size="1" face="Verdana, Arial, Helvetica" id="quote">quote:<hr height="1" noshade id="quote"><i>Originally posted by Adriaan</i><br /><br />HAv eyou checked what happens on a single machine? Note that this is not the Clustering zone of the forums.[<img src=’/community/emoticons/emotion-5.gif’ alt=’
![Wink ;) ;)](styles/default/xenforo/smilies/wink.png)
http://blogs.msdn.com/sqlperf/archive/2007/02/16/high-volume-update-performance.aspx fyi on the subject.
At the same time have you checked the contention on TEMPDB & Transaction log of this database?
http://www.microsoft.com/technet/prodtechnol/sql/2005/tsprfprb.mspx is the one I always refer as a start. Satya SKJ
Microsoft SQL Server MVP
Writer, Contributing Editor & Moderator
http://www.SQL-Server-Performance.Com
This posting is provided AS IS with no rights for the sake of knowledge sharing. Knowledge is of two kinds. We know a subject ourselves or we know where we can find information on it.
Note that in your last script (SimulationId, TradeId) no longer has a unique constraint. The identity column doesn’t have to be the Primary Key: you can create a unique clustered index on the identity column, then add a Primary Key on (SimulationId, TradeId). If there is already a clustered index on the table, the primary key is nonclustered.
quote:Originally posted by jamie.downs
Hi Frank,
Thanks for your reply. We have not tried clustering on an IDENTITY column.
To be clear. Do you mean having something like the following?
/****** Object: Table [dbo].[PV] Script Date: 05/10/2007 13:19:02 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[PV](
ID int IDENTITY,
[TradeID] [int] NOT NULL,
[SimulationID] [int] NOT NULL,
[PV] [float] NOT NULL,
CONSTRAINT [PK_PV] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
Yes.
—SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[PV](
ID int IDENTITY,
[TradeID] [int] NOT NULL,
[SimulationID] [int] NOT NULL,
[PV] [float] NOT NULL,
CONSTRAINT [PK_PV] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
Frank Kalis
Moderator
Microsoft SQL Server MVP
Webmaster:http://www.insidesql.de
Thankyou guys for all replies. I need some time to give them ago. I’ll let you know how I get on. Cheers.
To get the good performance with BCP, destination table should not have the index and SELECT INTO/BULK COPY option should be enabled… If remember correctly MS demostrated in SQL PASS of 2004 or 2005 with SSIS package inserted 10 million rows in 4-5 seconds…..
MohammedU.
Moderator
SQL-Server-Performance.com All postings are provided “AS IS†with no warranties for accuracy.
FWIW,http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/incbulkload.mspx —
Frank Kalis
Moderator
Microsoft SQL Server MVP
Webmaster:http://www.insidesql.de
]]>