Hopefully the title should wet your appetite. I am interested in methods people might use. We have a multithreaded process which on completion inserts about a billion rows into a table. The multithreaded process is run on a grid of 16 machines. Our best rate is around 250,000 rows/sec. This is done by passing a blob, extracting the data using a table valued function and inserting into our table. We have tried bcp and this is slower. The table looks like the following: /****** Object: Table [dbo].[PV] Script Date: 05/10/2007 13:19:02 ******/ SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO CREATE TABLE [dbo].[PV]( [TradeID] [int] NOT NULL, [SimulationID] [int] NOT NULL, [PV] [float] NOT NULL, CONSTRAINT [PK_PV] PRIMARY KEY CLUSTERED ( [SimulationID] ASC, [TradeID] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY] GO One bizarre thing is that performance decreases without the PK. I expected the oppostie. Any suggestions would be greatly appreciated. Thanks Jamie.....
Just a quick question, as I am still amazed at the 1,000,000,000 rows load. [<img src='/community/emoticons/emotion-1.gif' alt='' />]<br />Have you tried clustering on an IDENTITY column?<br /><br />--<br />Frank Kalis<br />Moderator<br />Microsoft SQL Server MVP<br />Webmaster:<a target="_blank" href=http://www.insidesql.de>http://www.insidesql.de</a>
HAv eyou checked what happens on a single machine? Note that this is not the Clustering zone of the forums.[<img src='/community/emoticons/emotion-5.gif' alt='' />]
Hi Frank, Thanks for your reply. We have not tried clustering on an IDENTITY column. To be clear. Do you mean having something like the following? /****** Object: Table [dbo].[PV] Script Date: 05/10/2007 13:19:02 ******/ SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO CREATE TABLE [dbo].[PV]( ID int IDENTITY, [TradeID] [int] NOT NULL, [SimulationID] [int] NOT NULL, [PV] [float] NOT NULL, CONSTRAINT [PK_PV] PRIMARY KEY CLUSTERED ( [ID] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY] GO
<blockquote id="quote"><font size="1" face="Verdana, Arial, Helvetica" id="quote">quote:<hr height="1" noshade id="quote"><i>Originally posted by Adriaan</i><br /><br />HAv eyou checked what happens on a single machine? Note that this is not the Clustering zone of the forums.[<img src='/community/emoticons/emotion-5.gif' alt='' />]<br /><hr height="1" noshade id="quote"></font id="quote"></blockquote id="quote"><br /><br />Hi Adriaan,<br /><br />Thanks for your reply.<br />The perf with a single machine is too slow. The db server is a 4-way dual core so when you do it with a single machine only one cpu is active. We really want to beast it and get the data in.
http://blogs.msdn.com/sqlperf/archive/2007/02/16/high-volume-update-performance.aspx fyi on the subject. At the same time have you checked the contention on TEMPDB & Transaction log of this database? http://www.microsoft.com/technet/prodtechnol/sql/2005/tsprfprb.mspx is the one I always refer as a start. Satya SKJ Microsoft SQL Server MVP Writer, Contributing Editor & Moderator http://www.SQL-Server-Performance.Com This posting is provided AS IS with no rights for the sake of knowledge sharing. Knowledge is of two kinds. We know a subject ourselves or we know where we can find information on it.
Note that in your last script (SimulationId, TradeId) no longer has a unique constraint. The identity column doesn't have to be the Primary Key: you can create a unique clustered index on the identity column, then add a Primary Key on (SimulationId, TradeId). If there is already a clustered index on the table, the primary key is nonclustered.
quote:Originally posted by jamie.downs Hi Frank, Thanks for your reply. We have not tried clustering on an IDENTITY column. To be clear. Do you mean having something like the following? /****** Object: Table [dbo].[PV] Script Date: 05/10/2007 13:19:02 ******/ SET ANSI_NULLS ON GO SET QUOTED_IDENTIFIER ON GO CREATE TABLE [dbo].[PV]( ID int IDENTITY, [TradeID] [int] NOT NULL, [SimulationID] [int] NOT NULL, [PV] [float] NOT NULL, CONSTRAINT [PK_PV] PRIMARY KEY CLUSTERED ( [ID] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY] GO Yes. -- Frank Kalis Moderator Microsoft SQL Server MVP Webmaster:http://www.insidesql.de
Thankyou guys for all replies. I need some time to give them ago. I'll let you know how I get on. Cheers.
To get the good performance with BCP, destination table should not have the index and SELECT INTO/BULK COPY option should be enabled... If remember correctly MS demostrated in SQL PASS of 2004 or 2005 with SSIS package inserted 10 million rows in 4-5 seconds..... MohammedU. Moderator SQL-Server-Performance.com All postings are provided “AS IS†with no warranties for accuracy.
FWIW,http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/incbulkload.mspx -- Frank Kalis Moderator Microsoft SQL Server MVP Webmaster:http://www.insidesql.de