Optimizing Microsoft SQL Server Analysis Services: Optimization Tools: Incremental Update Wizard
The Options for Processing an MSAS Cube
MSAS offers numerous processing options, as we shall see in this section. The best way to synchronize our cubes with the underlying data, and to ensure that they reflect completely the dimensional structure that we have established in Analysis Manager, is to toss the cube entirely, and rebuild from scratch. There are many times when this is not desirable, however, and so other options exist. The time it takes to perform a Full Processing cycle (say, longer than the overnight period that lies between hours that information consumers are expected to need access to the cube we are updating) may be prohibitive, although updates are vital. Moreover, storage space may be a consideration. The accumulation of aggregations in a Full Processing cycle requires the creation of temporary files that, in combination with the independent copies of cube and dimension files that accompany a full build, can become major resource challenges.
One way to manage challenges of this sort within MSAS is to perform “piecemeal processing” of certain components of the MSAS database. Of the four options available for these independent processing approaches, three are mutually exclusive, and include Full Process, Refresh Data, and Incremental Update. The fourth option, Incrementally Update the Dimensions of this Cube, can be performed along with any of the first three options, to incrementally update the cube’s dimensions as part of the cube processing. Our exposure to cube processing within this series has centered, up until now, on the Full Process option. As we have discussed, Full Processing completely rebuilds a cube from scratch, reconstructs it based upon the current MSAS definition, recalculates its data and aggregations, and repopulates the new structure with the data and aggregations.
When we simply need to clear out the data in a cube, then to reload the data with its recalculated aggregations, (as in a case where the underlying source data has changed, but the cube structure has not been altered), we have the Refresh Data option. This would be the case, say, where we want to update a cube for each month’s activity (new data), and to recalculate all aggregations surrounding the data, and where we are certain that no changes have been made in the structure of the cube. This approach is intended for the “steady state” operations of many organizations.
In my opinion, little is gained by choosing Refresh Data over Full Process because both processing options completely rebuild the aggregations tables from scratch. If testing on your local environment shows little difference, you might simply use the Full Process option for similar update needs: at least with Full Process you get an automatic check of the dimension structure to ascertain that no changes have, in reality, been made.
An Incremental Update adds new data and updates aggregations. Structural changes, such as those made to dimensions, measures, and so forth, are not updated with an Incremental Update. Moreover, Incremental Updates do not update changes to the cube’s existing, underlying source data. We will examine the Incremental Update further, at least within the context of walking through the steps of the Incremental Update Wizard, in our next section, but the important characteristic of an Incremental Update to keep in mind is that it merges new data into an existing partition, adding the data to, and updating the aggregations of, the cube for which it is initiated. This management of “what has changed only” in the warehouse / mart, for addition to the cube, has its obvious attractions when the goal is to reduce cube processing time, but only if the way that it works is understood in detail, and if accommodation for the process is made prior to its use. To summarize its operation in simple terms, we can look at the Incremental Update process as being pointed at a different fact table than that in which the “existing” / already processed data is being stored — or, if pointed at the same fact table, pointed only at the “new” subset of the data via a filter we put in place. New files are created by the process – cube files that are identical to those produced in a Full Process build.
The Incremental Update process, in contrast to the Full Process, does not simply swap the new files it creates with the ones that make up the previous cube. Remember, these cube files presumably represent “new” data that is not summarized in the existing cubes. The Incremental Update process creates yet another set of files, composed of a combination of the original cube files and the “new” cube set. Because we have, at least temporarily, three full sets of files, Incremental Updates on larger cubes may not act to relieve the disk space issues that cause problems with a Full Process approach. In addition, other potential dangers can arise within the process by which the Incremental Update creates a temporary partition to accomplish the merge with the existing data, which is housed within its own partition(s). We will touch upon this further at the appropriate point in our practice exercise within the next section.
NOTE: Once the “combination cube” is born, the original and “complementary” (or “delta”) cube files are removed completely from the drive. The combination cube, now alone, is then named to identify it as the cube it replaces.
We will discuss the Incremental Update process in more detail, when it is relevant to our introduction to the Incremental Update Wizard, in the next section. As we have stated, we will address detailed incremental processing strategies and approaches in subsequent articles. Our focus in this article is the use of the Incremental Update Wizard, and, therefore, our efforts rely upon the background assumption of an Incremental Update. Incrementally Update the Dimensions of this Cube is a supplementary action that, as we have stated before, can be performed along with any of the available options to incrementally update the cube’s dimensions. This is done as part of cube processing under the respective option, and is not the type of Incremental Update for which we use the Incremental Update Wizard, but is simply means of adding in new dimension members that have come along. This “add-on” feature within the primary update options exists to enable us to easily handle simple member adds that do not alter the dimension structure enough to drive a forced reprocessing of the cube.
Introducing the Incremental Update Wizard
As we noted in the last section, we use Incremental Updates to append new data to a cube — more precisely to a single partition of a cube — and to update the aggregations involved. Let’s take a look at how we manage the process with an MSAS tool that is provided to manage this operation, the Incremental Update Wizard.
We will use, in our practice example a copy of one of our sample cubes, the Budget Cube. You can skip the section immediately following, where we copy the cube, if you prefer to perform it with the actual sample Budget cube (you can always restore the cube from the .cab archive that comes along with the MSAS installation, or from the original CD itself — see the Books Online for the process), or if you prefer to simply follow along with another cube of your choice (realize that results I picture, etc., will, of course, differ).