Optimizing Microsoft SQL Server Analysis Services: Optimization Tools: Partition Wizard
Introduction and Scope
We briefly mentioned partitions in the first article of this series, Optimization Tools: The Storage Design Wizard, within which we introduced the Storage Design Wizard. The Storage Design Wizard, as we discovered, enables us to manage aggregations on a partition-by-partition basis when working with a multi-partitioned cube. We noted that, if a cube we are optimizing through the use of the Storage Design Wizard contains multiple partitions, we are forced to select a partition from the outset, as we can only design storage for a single partition at a time.
In this article, we will introduce the MSAS Partition Wizard, whose role is to enable us to create and modify partitions to optimize the query performance of our cubes. We will first discuss partitioning as a concept, within the context of MSAS cubes, and then we will perform a hands-on exercise where we partition a copy of the Budget cube, one of the sample cubes that accompany the installation of MSAS. Our objective in this article is an overview of the wizard itself; later articles will focus on detailed partitioning strategy and techniques.
Within our exploration of the Partition Wizard, we will accomplish the following:
Create a copy of the Budget sample cube for use in our practice exercise.
Prepare the cube further by processing.
Perform a practice exercise, using the Partition Wizard, to create partitions for our practice cube.
Examine the options that are available to us, as we proceed through the guided steps of the Wizard.
Comment upon general optimization concepts as we proceed through our practice example, as a prelude to future articles devoted to specific techniques.
Overview of Partitioning as an Optimization Technique
Partitions allow us to divide a logical cube into separate “storage containers” for our data and its aggregations. All MSAS cubes have at least one partition, which is created along with the cube automatically. Additional partitions are not a requirement, and, indeed, many cubes lead full and rewarding lives with only the partition upon which they were born. But very large cubes can benefit dramatically from a well-conceived partitioning strategy, whose benefits include enhanced query performance, as well as enlarged flexibility in data storage and data source location.
The creation of separate physical partitions mean that we can design cube storage around expected use, particularly expected access frequency, of the data in the partitions we design. An example of a common partitioning strategy that I see on client engagements includes the creation of one partition for each quarter of the current operating year, together with a separate partition for each prior year. Because cube querying, and reporting in general, is typically at its highest frequency in the current year and its constituent quarters, we would likely want to design the “current” partitions with MOLAP storage, along with the aggregations required to deliver rapid query results. Prior years’ partitions would receive perhaps less aggregation, because the further in the past a given year from the current year, the less routine demand we would expect for more than summary information. The need for access to the partitions in general would decrease with age, so the storage design strategy for those years’ data might become more focused on size conservation, and less on query performance.
A cube containing financial information can contain a partition for the data of each past year, and also partitions for each quarter of the current year. At the end of the year, the four quarterly partitions can be merged into a single partition for the year. The “separate container” concept means each partition can have different combinations of options, enabling us to design cube storage strategies appropriate to our unique business requirements. These combinations include:
source data location
aggregation data location
Many variations with regard to partition data sources are possible; we will investigate some of the alternatives in subsequent articles. We can store partition aggregate data on the Analysis server computer where the partition is defined (the default) or on another Analysis server computer, as a remote partition.
Storage mode, which we discussed at length in Optimization Tools: The Storage Design Wizard, determines whether the partition’s aggregate data is stored in a database specified in the partition’s data source (ROLAP), stored completely within a cube on an Analysis server computer (MOLAP) or stored within a custom combination of the two (HOLAP).
Aggregation design, which specifies the number and contents of the aggregations created for the partition, can be different for each partition. As we learned in Optimization Tools: The Storage Design Wizard, we can tailor a partition’s aggregation design, defining the balance between query performance and storage utilization, with the Storage Design Wizard. Moreover, with the Usage-Based Optimization Wizard, we can perform these same actions, as well as optimizing the aggregation design based on queries (which we can select) that the organization’s information consumers have previously executed against the partition’s cube. As we noted in our articles surrounding the Storage Design and Usage-Based Optimization Wizards, the aggregations that we design in either case are actually created as a part of the subsequent cube processing cycle.
Practice Example – Creating a Partition
Let’s get started by creating a clone of a FoodMart sample cube once again, in case we want to keep our originals intact for other uses.
NOTE: If you are performing the steps of our article on a production (or other) cube, instead of the sample cube we will create for the purposes of illustration of the use of the Partition Wizard, the preparation steps in the first couple of sections below can be skipped.