Optimizing Microsoft SQL Server Analysis Services: MDX Optimization Techniques: Considering DISTINCT COUNT
In this article, we will introduce the concept of distinct counts, discussing why they are useful (and often required) within the design of any robust analysis effort. Throughout this article, we will describe some of the challenges that are inherent in distinct counts, and then we will undertake practice exercises to illustrate solutions to meet example business requirements. As a part of the practical exercises, built around a hypothetical business need, we’ll provide an approach afforded us by the MSAS user interface, and then we’ll offer an alternative approach we can take using MDX.
We will revisit DISTINCT COUNT in subsequent articles in our series, examining specifics with regard to appropriate use, and details of optimization within the perspective under examination in the article concerned. In this article, we will lay the framework for those specific scenarios, and discuss the basics of DISTINCT COUNT, together with considerations that surround its use.
Managing Distinct Counts
Anyone working within the realm of business intelligence and general analysis realizes, in short order, that we often encounter the need to quantify precisely the members of various sets of data. Those of us who have become familiar with MSAS are aware of its capabilities when it comes to categorizing and aggregating data within the hierarchical contexts of dimensions and levels. We can, for the most part, readily tap these capabilities from the user interface that MSAS provides. Through the exploitation of more advanced approaches, including the use of calculated members / measures and multidimensional expressions (“MDX”), we can extend our analysis even further, and leverage MSAS to reach far more specific objectives.
One of the basic requirements that comes into play, at least in some form, in virtually any analysis scenario, is the need to count the members of a set targeted for analysis. An example might be the need to count the number of products we have shipped from a given warehouse, or group of warehouses, to a given geographical location, or a specific group of stores. This can be accomplished readily enough with the Count() function, as most of us are aware.
Count() does a great job of giving us a total count. Of course, the results we would achieve in using Count() with products, in the scenarios above, would represent total number of products shipped. What we would not get, and what we might find far more useful in some situations, would be a count of the different products that were shipped. Count(), in providing a total number, would also be providing multiple counts of the same products, because products will have been shipped multiple times, in many instances. To reach our objective of counting different products, then, we would need to count each different product shipped, only once. To count them multiple times not only misstates the number of different products, but it also renders averages, and other metrics based upon the count value, meaningless or misleading.
The word “different” here is easily supplanted by “distinct.” And as many of us are aware, the performance of distinct counts has historically presented a challenge in the OLAP world. Let’s discuss an example that illustrates the challenge, and then transform that challenge to an opportunity to meet an illustrative business need, using the distinct count capabilities found within MSAS.
Let’s start with a look at a scenario that illustrates a need for a distinct count, using a hypothetical business need to add practical value. Let’s say that a group of information consumers within the FoodMart organization have approached us with a need that they wish to meet within the Warehouse cube. The consumers want to be able to report upon number of products within various metrics without having to be concerned with an issue they faced with a previous system – a scenario of “double counting” in many inventory reports that concerned product-related transactions between warehouses and stores.
We might initially attempt to meet the needs of the consumers with somewhat advanced MDX, but let’s try to minimize complication, while heading off many of the issues, with a straightforward approach from within the Cube Editor component of the MSAS user interface, Analysis Manager, first. This provides all that we need, in many cases. We will examine an MDX approach in the next section of this article.
Distinct Counts in MSAS
Let’s start Analysis Services and proceed with the following steps:
1. Open Analysis Manager.
2. Expand the Analysis Servers folder by clicking the “+” sign to its immediate left.
Our server(s) appear (my server, MOTHER1, is depicted in some of the illustrations).
3. Expand the desired server.
Our database(s) appear, in much the same manner as shown in Figure 1.
Figure 1: A Sample Set of Databases Displayed within Analysis Manager
4. Expand the FoodMart2000 database.
5. Expand the Cubes folder.
The sample cubes appear, as shown in Figure 2.
Figure 2: The Sample Cubes in the FoodMart2000 Database
NOTE: Your databases / cube tree may differ, depending upon the activities you have performed since the installation of MSAS (and the simultaneous creation of the original set of sample cubes). Should you want or need to restore the cubes to their original state, simply restore the database under consideration. For instructions, see the MSSQL Server 2000 Books Online.