RAID & filegroup placement | SQL Server Performance Forums

SQL Server Performance Forum – Threads Archive


Hello all,

We are currently building a data warehouse on SQL Server 2005 x64 on Windows Server 2003 (x64). The database is loaded nightly via an import routine from an Oracle Rdb database on a VMS server. The VMS server uses RAID 0+1 (striped & mirrored). The structure of both databases is identical. The actual Rdb database contains 180 tables which are split over 250 files (equivalent to filegroups in MS SQL parlance) – we do this to place indexed and non-indexed information on different portions of the disk to optimise spindle movement etc.

In terms of the SQL/Windows side of things, how do we best configure the disks to ensure optimum read access? We obviously want redundancy, but the system needs to be optimised for read access. We have been speaking to various technical people and the general feeling is that we should pretty much employ the same config as we have on the VMS server, i.e. RAID 0+1 with a hardware RAID controller.

What are people's thoughts on splitting table data and index data into separate filegroups, and is this possible or necessary in SQL 2005? For example, a table called Invoices with two indexes (idx1 & idx2) would have the following physical files:

d:\data\invoices_dat.mdf
e:\data\invoices_idx1.ndf
f:\data\invoices_idx2.ndf
g:\data\invoices_log.ldf
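For reference, a minimal T-SQL sketch of how such a layout maps to SQL Server filegroups – drive letters, file sizes, and all object names here are hypothetical, and note that only the primary data file uses the .mdf extension (secondary data files are conventionally .ndf):

```sql
-- Sketch only: paths, sizes, and names are hypothetical.
CREATE DATABASE Sales
ON PRIMARY
    (NAME = Sales_dat,  FILENAME = 'D:\data\invoices_dat.mdf',  SIZE = 10GB),
FILEGROUP FG_INDEX
    (NAME = Sales_idx1, FILENAME = 'E:\data\invoices_idx1.ndf', SIZE = 5GB),
    (NAME = Sales_idx2, FILENAME = 'F:\data\invoices_idx2.ndf', SIZE = 5GB)
LOG ON
    (NAME = Sales_log,  FILENAME = 'G:\data\invoices_log.ldf',  SIZE = 2GB);
GO

-- The table (and its clustered index) live on PRIMARY;
-- non-clustered indexes can be placed on a separate filegroup.
CREATE TABLE dbo.Invoices
    (InvoiceID int NOT NULL, InvoiceDate datetime NOT NULL)
    ON [PRIMARY];
CREATE NONCLUSTERED INDEX idx1 ON dbo.Invoices (InvoiceDate) ON FG_INDEX;
```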
Please help, as we are unsure whether the efficiency gains of the file configuration in Rdb will be lost in the Windows environment. Many thanks
Mike

How large do you expect the DB to be? —
Frank Kalis
Microsoft SQL Server MVP
http://www.insidesql.de
Blogged today? http://www.insidesql.de/blogs
The database contains 65,000,000 rows on the VMS server, across 180 tables with 258 indices. The actual file count I gave in the original post is incorrect; it's more like 500.

We have "budgeted" for 450 GB worth of disk space on the Windows server but expect the database could physically live in 50 GB of space.

Probably getting slightly ahead of ourselves, but we would like to introduce date-range horizontal partitioning on some of the larger (5+ million row) tables. Basically we are looking to do everything we possibly can to make this DB fly.

Also worth mentioning is that the 64-bit server we have is currently on rental whilst we evaluate. We expect the server that we buy would have at least 14 GB RAM and four Opteron CPUs. We would assign three CPUs to SQL and one to Windows.

Thanks
Mike
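For the date-range partitioning mentioned above, SQL Server 2005 introduced native partition functions and schemes. A hedged sketch – the boundary dates, filegroup names, and table are all hypothetical, and three boundary values produce four partitions:

```sql
-- Hypothetical quarterly boundaries; RANGE RIGHT puts each boundary
-- date into the partition to its right.
CREATE PARTITION FUNCTION pfInvoiceDate (datetime)
    AS RANGE RIGHT FOR VALUES ('2006-01-01', '2006-04-01', '2006-07-01');

-- Map the four resulting partitions onto four filegroups.
CREATE PARTITION SCHEME psInvoiceDate
    AS PARTITION pfInvoiceDate TO (FG_2005, FG_2006Q1, FG_2006Q2, FG_2006Q3);

-- A large table partitioned by date:
CREATE TABLE dbo.Invoices
    (InvoiceID int NOT NULL, InvoiceDate datetime NOT NULL)
    ON psInvoiceDate (InvoiceDate);
```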
Typically, when databases are below 1 TB you can live very well with just one filegroup, but I'm sure your subject will get Joe Chang's attention, and then you'll get more detailed information. —
Frank Kalis
Microsoft SQL Server MVP
http://www.insidesql.de
Blogged today? http://www.insidesql.de/blogs

We have SQL 2000 databases here at the moment and they all use just one filegroup. As you can tell, we are not MS SQL experts, so we just don't understand how to tune for performance if you can't place data on a particular spindle. It might be that we are coming at this in a slightly old-fashioned way.

Cheers
Mike

Wait for Joe.
In the meantime you can read about similar problems in this forum (hardware section). Luis Martin
Moderator
SQL-Server-Performance.com All in Love is Fair
Stevie Wonder
All postings are provided “AS IS” with no warranties for accuracy.
i don’t think you should wait for me,
i am currently in Costa Rica with iffy internet access personnally, i see people discussing strategy of multiple filegroups, but not the underlying disk configuration, which i regard as more important i will assume you are on s2k, not s2k5,
the max disk transfer rate you will get on a s2k table scan is 700+MB/sec, much more on s2k5, i have seen as high as 15GB/sec so for s2k, first priority is to get the data portion spread across disks to get 800MB/sec
for this, i suggest 12 36GB 15K SAS drives on 1 PCI-E adapter, or 2 SAS PCI-X adapters
above applies to direct attach storage, if you really want a SAN, figure 10MB/sec per disk or 80 disks if you do not do the above, everything else is a waste of time
If you do, I like the strategy used by HP's TPC-H (a data warehouse test) benchmark team. On the above disk array, create one or more partitions for the main data filegroup using the first partition on each disk array, then create a second partition for the index filegroups, a third partition for the tempdb data files, and a fourth partition for a load filegroup, where the data is first loaded; then create a clustered index to move it to the first filegroup, which is the fastest portion of the disk.

Logs can reside on 1 pair of disks, but I have considered that for DW, the logs can be spread across part of the data disks as well. Read the HP Itanium TPC-H full disclosure for the details.
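The load-then-move step described above relies on the fact that creating a clustered index on a different filegroup physically relocates the table's data. A small sketch, with hypothetical filegroup and table names:

```sql
-- Bulk-load into a heap on the load filegroup (the slow, outer
-- disk partition in the layout described above)...
CREATE TABLE dbo.Invoices_stage
    (InvoiceID int NOT NULL, InvoiceDate datetime NOT NULL)
    ON FG_LOAD;

-- ...then creating the clustered index ON another filegroup moves
-- the data to the fast (first) partition of each disk.
CREATE CLUSTERED INDEX cix_Invoices_stage
    ON dbo.Invoices_stage (InvoiceID)
    ON FG_DATA;
```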
Also, I found this Microsoft paper interesting: http://www.sqlservercentral.com/articles/articlelink.asp?articleid=2470
The general pattern with MS papers is that they are an excellent source of information on how SQL & Windows interact with hardware, considering they have the source code and are reasonably informed on technology. The writers have access to high-end hardware, but they do not live and breathe hardware performance, and cannot provide precise performance information.
Many thanks for your help, chaps. The first thing to say is that we are running SQL 2005 64-bit. Does this mean that this configuration remains the same? Also, we are using the MSDN Enterprise Edition but are looking at buying Standard Edition – I am not sure that partitioning is an option for this? Have you any ideas on this?

Many thanks
Mike
Partitioning is only available in the Enterprise edition, not standard.
Hi there,

I have to say that I am unsure what to do now. A database this size reportedly runs OK using one filegroup. I would just like to restate the situation and reiterate the question. Note we ARE using SQL 2005 x64.

We are currently building a data warehouse on SQL Server 2005 x64 on Windows Server 2003 (x64). The database is loaded nightly via an import routine from an Oracle Rdb database on a VMS server. The VMS server uses RAID 0+1 (striped & mirrored). The structure of both databases is identical. The actual Rdb database contains 180 tables which are split over 500 files (equivalent to filegroups in MS SQL parlance) – we do this to place indexed and non-indexed information on different portions of the disk to optimise spindle movement etc. The database contains 65,000,000 rows.

In terms of the SQL/Windows side of things, how do we best configure the disks to ensure optimum read access? We obviously want redundancy, but the system needs to be optimised for read access. We have been speaking to various technical people and the general feeling is that we should pretty much employ the same config as we have on the VMS server, i.e. RAID 0+1 with a hardware RAID controller.

What are people's thoughts on splitting table data and index data into separate filegroups, and is this possible or necessary in SQL 2005? For example, a table called Invoices with two indexes (idx1 & idx2) would have the following physical files:

d:\data\invoices_dat.mdf
e:\data\invoices_idx1.ndf
f:\data\invoices_idx2.ndf
g:\data\invoices_log.ldf

i. Can this be done on SQL Server 2005, i.e. split tables/indexes?
ii. Will we lose the benefits gained in Rdb by putting it into a Windows environment with one filegroup?
iii. Are we worrying too much about this in light of what Joe says – although he assumed we were using 32-bit SQL 2000.

I hope this makes things clearer.

Thanks
Mike
One of the reasons to make multiple filegroups is to place the very largest table on its own filegroup and its indexes on another, possibly the same for another very large table, then all smaller tables on their own filegroup. All of this is so you do not have to defrag the very big tables, typically 1 TB or more.

Another reason is if you had a DW where you load up 1 month or quarter at a time: then you only need to back up the most recent load, instead of the whole DB.

The Oracle practice probably stemmed from the days before there were hardware RAID controllers, and the Unix 2 GB filesystem limit. Anybody who still believes this is relevant today probably has not learned anything in the last 10 years.
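The incremental-backup point can be sketched with a filegroup backup – the database and filegroup names here are hypothetical:

```sql
-- Back up only the filegroup holding the most recent load,
-- rather than the whole database.
BACKUP DATABASE Sales
    FILEGROUP = 'FG_2006Q3'
    TO DISK = 'H:\backup\Sales_FG_2006Q3.bak';
```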
A suggestion on the RAID – rather than RAID 0+1, use RAID 1+0 – it gives better redundancy and recoverability.