In this article, we continue the examination of distinct counts that we began in our previous article, Considering DISTINCT COUNT. There, having discussed why distinct counts are useful, and often required, in the design of robust analysis and reporting applications, we described some of the challenges inherent in them. We then undertook practice exercises to illustrate general solutions to example business requirements, first with an approach afforded us by the MSAS user interface, and then with an alternative approach that we enacted using MDX. Our purpose, as we stated, was to lay the framework for this and subsequent articles, which focus upon specific scenarios that occur commonly in the business environment, and in which the optimization of distinct counts can become a very real consideration.
In this article, we will examine one approach to optimizing the use of DISTINCT COUNT within our applications: isolating the DISTINCT COUNT attributes in a separate cube. We will show how this represents one of the more efficient approaches to optimizing the related functionality. To accomplish our objectives, we will undertake the following steps:
Set the stage by providing a hypothetical business requirement;
Meet the requirement with an MDX query that contains DISTINCT COUNT;
Comment upon performance of the query in general;
Create a separate cube to house the DISTINCT COUNT attributes of our solution;
Combine the new DISTINCT COUNT cube with the previously existing cube, through the creation of a virtual cube in MSAS;
Create a new query, targeting the virtual cube as its source, to return a dataset identical to that returned by our initial query;
Comment upon performance gains in executing the new query upon the new cube combination.
Managing Distinct Counts
Considerations and Comments
We mentioned in our introductory article, Considering DISTINCT COUNT, that it is common in the business environment to encounter the need to quantify precisely the members of various sets of data. A simple example, and one upon which we will expand in our hypothetical business requirement, involves the number of customers who are purchasing a product, or group of products, sold by an organization. We learned in the previous article that we can exploit settings within MSAS’ Analysis Manager, as well as take more advanced approaches, to extend our analysis even further, and leverage MSAS to reach our specific business objectives.
We discussed why distinct counts differ from simple counts, noting that a distinct count might comprise, for example, a count of the different products that were purchased, or of the individual customers who purchased our products. To review: COUNT(), applied to sales records to total, say, customers, counts the same customer once for every purchase, because most customers will have purchased multiple products, multiple times. To count different customers, then, we need to count each customer only once. As we noted in our previous article, using COUNT() when DISTINCTCOUNT() is required not only overstates the number of different customers, but also likely renders averages, and other metrics based upon the count value, misleading or useless in our analysis efforts.
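The difference between the two counts, and its effect on an average, can be seen in a minimal sketch. It is written in plain Python rather than MDX, purely as an illustration; the customer names, products, and amounts are invented, and nothing here reflects how MSAS computes either count internally.

```python
# Hypothetical sales records: (customer, product, sale amount).
# All names and amounts are invented for illustration only.
sales = [
    ("Alice", "Milk", 4.00),
    ("Alice", "Bread", 3.00),
    ("Bob", "Milk", 4.00),
    ("Alice", "Milk", 4.00),
    ("Carol", "Eggs", 6.00),
    ("Bob", "Bread", 3.00),
]

total_sales = sum(amount for _, _, amount in sales)

# A simple count tallies every sales record, so repeat buyers are counted repeatedly.
simple_count = len(sales)

# A distinct count tallies each customer only once, however often they purchased.
distinct_count = len({customer for customer, _, _ in sales})

# The same sales total yields very different "averages per customer".
avg_using_simple_count = total_sales / simple_count
avg_using_distinct_count = total_sales / distinct_count

print(simple_count, distinct_count)
print(avg_using_simple_count, avg_using_distinct_count)
```

With three customers behind six sales records, the simple count doubles the apparent customer base and halves the apparent sales per customer.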
Let’s work through an example that meets an illustrative need, expanding upon the customer example to which we have alluded. The example will also highlight the performance challenges that can arise when we address such requirements in the most intuitive manner. We will then reshape our solution to take advantage of another approach that meets the need while improving the performance of the overall solution.
We will begin with a scenario that illustrates a requirement for a distinct count, using a hypothetical business need to add practical value. Let’s say that a group of information consumers within the FoodMart organization have approached us with an information request they wish to meet using the Sales cube. The consumers want to be able to analyze the performance of products, by category, both in terms of dollar sales, and number of different customers contributing to those sales, for the third quarter (Q3) of 1997. In addition, they wish to see an “average sales per (distinct) customer” within the same dataset.
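To make the shape of the requested dataset concrete, the following is a rough sketch in plain Python, not MDX. The rows, category names, and customer names are hypothetical stand-ins rather than FoodMart data, and the sketch assumes that each row carries a quarter, a product category, a customer, and a sales amount.

```python
from collections import defaultdict

# Hypothetical fact rows: (quarter, product category, customer, sales amount).
# These values are invented; they are not taken from the FoodMart Sales cube.
rows = [
    ("Q3", "Dairy", "Alice", 10.0),
    ("Q3", "Dairy", "Alice", 6.0),
    ("Q3", "Dairy", "Bob", 8.0),
    ("Q3", "Snacks", "Alice", 5.0),
    ("Q3", "Snacks", "Carol", 7.0),
    ("Q2", "Dairy", "Dave", 9.0),  # outside Q3, so excluded below
]

sales = defaultdict(float)    # category -> total sales
customers = defaultdict(set)  # category -> set of different customers

for quarter, category, customer, amount in rows:
    if quarter != "Q3":
        continue  # the consumers asked for the third quarter only
    sales[category] += amount
    customers[category].add(customer)

# One entry per category: sales, distinct customers, and sales per distinct customer.
report = {
    category: {
        "sales": sales[category],
        "distinct_customers": len(customers[category]),
        "avg_per_customer": sales[category] / len(customers[category]),
    }
    for category in sales
}

for category, measures in report.items():
    print(category, measures)
```

Each category in the result carries the three values the consumers requested, with the average computed over different customers rather than over sales records.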
We will initially attempt to meet the needs of the consumers with relatively simple MDX, having introduced both the MSAS and MDX approaches in Considering DISTINCT COUNT. (If you have joined the series with this article and find the initial query we present to be less than intuitive, see the steps provided there.)