SQL Server Performance

choosing the right partition field

Discussion in 'SQL Server 2008 General DBA Questions' started by cbabino71, Apr 5, 2011.

  1. cbabino71 New Member

    I am looking for some suggestions.We have a very large table.. let' call it custorder table .Most of the reporting queries running against it, are for the current year(2011)We want to partition this table against 5 separate physical hard disks using the partition key choosen. One suggestion is to use the 'order date' breaking the data up by calender year and the other is the 'customer id'. The reasoning behind partitioning the table using the 'order date' , so when querying for 2011 data just the latest partition would be searched all other partition would not be (partition elimnation). The other idea is to partition by 'customer id' (keeping in mind as I said before most reporting queries are for the current year) all the separate partition(on separate hard disks) would be invoked to search resulting faster performance. Also which partition key would provide better performance on new record inserts( primary key being customer-id and order-date)I am looking for advice or suggestions.thanks,CB
  2. satya Moderator

    I would recommend ORder Date as it is easy to partition the data, anyhow the Customer-id is primary key & clustered the rows are sorted in that order that can offer better optimization,.
    See what BOL refers:
    Although partitioned indexes can be implemented independently from their base tables, it generally makes sense to design a partitioned table and then create an index on the table. When you do this, SQL Server automatically partitions the index by using the same partition scheme and partitioning column as the table. As a result, the index is partitioned in essentially the same manner as the table. This makes the index aligned with the table.
  3. mmarovic Active Member

    There is no easy answer. Let's consider both scenarios.
    1. Partitioning by order date, putting orders from the same year into the same partition.
    The main advantage is maintenance. You can run maintenance tasks like index defragmentation only on the latest partition. However, you do not take advantage of multiple disks if in most cases just data from the last few months are retrieved.
    2. Partition by customer id.
    It looks like good idea. For one customer data would be selected from single partition, but if there are multiple clients selecting data about different customers at the same time, they might access different partitions. The good thing about selecting data about customer from single partition, using customerId as partition key, is that these data are "physically close", that means they are typically placed in the same segment, so fewer physical reads are necessary to return data. However, it depends on customerId distribution. If customers are "short lived", you still could end up with clients selected data from the same partition in most cases. Also, not all queries are about limited number of customers.
    I suggest the third solution.
    3. Partition by order date, but split data by quarter or even month. That way you still have maintenance advantage. In addition to that, the query for the same customer would often read data from multiple disks at the same time in parallel.
    That was my post 2222 :)

Share This Page