Data Warehouse Partition Strategies

Microsoft put a great deal of effort into SQL Server 2005 and 2008 to ensure that that the platform it is a real Enterprise class product.  I’m not going to write about all the new features in the OLTP Engine, in this article I will focus on Database Partitioning and provide a real world scenario.  This article will cover the following points: I. Benefits and Limitations of Partitioning (HEAPS/CLUSTERED/NONCLUSTERED).
II. Implementation Methodology.
III. Partitioning Management. I. Benefits and Limitations of Partitioning (HEAPS/CLUSTERED/NONCLUSTERED)
Positives 
– Data can be accessed through multiple partitions in parallel, this results in faster DML.
– Different partitions can be managed separately.
– Management of history more efficient (without having to leave the table).

Negatives  
– Limit of 1000 Partitions by Table.
– Inability to use Indexed Views. II. Implementation Methodology
I’m going to focus on the most common scenario.  Typically, in a Data Warehouse environment there is a large Fact Table (a.k.a the Monster). In this article the Fact Table contains all the banking movements of all accounts within the last 4 years in the MAIN database and the remaining years in a HISTORY database.  In this case, because there is a chronology to follow, I will use the extract day date column of the table as the KEY to our Partition Strategy.  In this situation, I recommend the creation of a Partitioned Clustered Index because typically all access to the fact table always has the date in the SARG.   The fact table will be partitioned by month. Technical approach:
Storage Arquitecture – 3 RAID-5 Partition (2TB each), 1 RAID 01 for Transaction Log and 1 more RAID 01 for
Non-Clustered indexes. First off all you will need to develop the Partition Function.  The Partition Function defines the values which the partition scheme will bind with the Filegroups.   Our boundary will be the date, and right range (boundary right aligned): CREATE PARTITION FUNCTION PFFactTableMonthly (SMALLDATETIME)
AS RANGE RIGHT
FOR VALUES
(
  ‘2007-01-01′,’2007-02-01′,’2007-03-01′,’2007-04-01′,’2007-05-01’,
  ‘2007-06-01′,’2007-07-01′,’2007-08-01′,’2007-09-01′,’2007-10-01’,
  ‘2007-11-01′,’2007-12-01′,’2008-01-01′,’2008-02-01′,’2008-03-01’,
  ‘2008-04-01′,’2008-05-01′,’2008-06-01′,’2008-07-01′,’2008-08-01’,
  ‘2008-09-01′,’2008-10-01′,’2008-11-01’, ‘2008-12-01′,’2009-01-01’,
  ‘2009-02-01’
  ) Create the Datafiles and assign each file to a differente Filegroup. In this situation 27 Datafiles/Filegroups will be created.  Now that the Filegroups are created, assign them to a Partition Scheme that acts like a ROUTER, because it will forward the data to a specified path.  The Partition Scheme will use the Partition Function to link the logical layer to the physical layer: CREATE PARTITION SCHEME PSFactTableMonthly
AS PARTITION PFFactTableMonthly
TO (
 FG_FactTable_BASE,FG_FactTable_200701,FG_FactTable_200702,
 FG_FactTable_200703,FG_FactTable_200704,FG_FactTable_200705,
 FG_FactTable_200706,FG_FactTable_200707,FG_FactTable_200708,
 FG_FactTable_200709,FG_FactTable_200710,FG_FactTable_200711,
 FG_FactTable_200712,FG_FactTable_200801,FG_FactTable_200802,
 FG_FactTable_200803,FG_FactTable_200804,FG_FactTable_200805,
 FG_FactTable_200806,FG_FactTable_200807,FG_FactTable_200808,
 FG_FactTable_200809,FG_FactTable_200810,FG_FactTable_200811,
 FG_FactTable_200812,FG_FactTable_200901,FG_FactTable_200902
 ) Now that we have the Partition Scheme, just create the Clustered Index on the Fact table with a FILLFACTOR 90% with PAD_INDEX on. CREATE CLUSTERED INDEX [IDXC_FACTTABLE] ON MIS.FACTTABLE
 (
 datecolumn
 )
 WITH( PAD_INDEX = ON, FILLFACTOR = 90, STATISTICS_NORECOMPUTE = OFF,
 IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
on PSFactTableMonthly (DATECOLUMN)

Continues…

Leave a comment

Your email address will not be published.