Optimizing Microsoft SQL Server Analysis Services: MDX Optimization Techniques: Optimizing the CROSSJOIN() Function

Overview

In this article, we will return to our intermittent examination of various MDX optimization techniques which we have intermingled with other MSAS optimization topics within this series. The function upon which we will focus will be CROSSJOIN(), whose mission it is to combine two sets of dimension members into one. We will discuss how CROSSJOIN() works in general, and then discuss how performance can become an issue in medium- to large-sized data sets. We will then undertake a multiple-step practice example to gain an understanding of the issues, before examining approaches to minimizing the performance overhead that we can encounter in using CROSSJOIN() to reach our reporting and analysis needs.

To accomplish this objective, we will undertake the following steps in this article:

  • Create a copy of the Warehouse sample cube for use in our practice exercise;

  • Prepare the cube further by processing;

  • Discuss the operation of the CROSSJOIN() function, and factors that may make its use suboptimal in our queries;

  • Enhance a suboptimal query employing a CROSSJOIN() function by substituting NONEMPTYCROSSJOIN();

  • Provide further enhancement by leveraging the set count parameter in the NONEMPTYCROSSJOIN() function;

  • Explain the results we obtain from the steps we take to accomplish the solution.

Optimizing the CROSSJOIN() Function Considerations and Comments

As many of us know, the CROSSJOIN() function is highly useful anytime we wish to generate a cross-product of members in two different sets. The ability to specify “all possible combinations” is convenient – indeed, the most straightforward way to perform such a combination of two sets. Unfortunately, the indiscriminate use of the CROSSJOIN() function, like many other MDX functions, can slow reporting and analysis dramatically. This is often due to a failure to understand how the function performs set combinations, and how its action can lead to huge results datasets when applied to large cubes.


NOTE: For a discussion of the interaction between the CROSSJOIN() function and the Large Level Threshold , see my article  MDX Optimization Techniques:  Further Control of Processing Location and Expression Arrangement here at Sql-Server-Performance.com


In combining two sets, CROSSJOIN() combines every member of the first set (all from a single dimension) with every member of the second, creating a “Cartesian” effect as a result. Combining two sets, for example, with the following query will illustrate a scenario, on the scale of a decidedly small sample cube (the WAREHOUSE sample that installs with MSAS), where we experience a taste of the consequences when the number of members in set 1, times the number of members in set 2, times the member population in set 3, results in many combinations.
SELECT
{[Measures].[Warehouse Profit]} ON COLUMNS,
{CROSSJOIN([Warehouse].[Warehouse Name].Members,
CROSSJOIN([Store].[Store Name].Members,
[Product].[Product Name].Members))} ON ROWS
FROM
[SSP_OPT_CROSSJOIN]
This query would generate 898,560 combinations (24 individual Warehouses times 24 Stores times 1,560 distinct Products). A similar scenario, with the cube sizes we see these days, could be crippling to performance. This is aggravated by the fact that sparsity is common enough in large cubes, and CROSSJOIN() results on those cubes would likely have a much higher sparsity factor. (The results dataset produced by the above query yields only a tiny fraction of combinations / tuples with non-empty measures.)

The need to generate all possible combinations, empty or not, lies behind the performance drag, and becomes even more pronounced when we go a step further and attempt to perform query operations, as we shall see in the practice example, that must wait for the combinations to be assembled, and then be applied to the resulting dataset. The time consumed in assembling a large number of empty combinations has been wasted when they are “tossed” in a subsequent step in the march toward the ultimate results.

For purposes of our practice procedure, we will assume that we have been asked by management of a hypothetical client to investigate denigration in performance of a query. The query was originally constructed at the request of a group of information consumers in the Corporate Planning department, shortly after the implementation of MSAS at the FoodMart organization. The creator of the query, who initially wrote the MDX in a way that seemed intuitive, intended to optimize it later. Unfortunately, he was laid off in the wake of sudden moves by management to move all developer functions offshore.

Attempts to communicate with the offshore support team were abandoned when it was learned that the building housing the group had been destroyed in a tsunami that had swept the region. (The inability to obtain assistance with this issue paled, we are told, with the loss of documentation, code, and other collateral for a host of enterprise projects in process at the time of the disaster.)

We listen closely to the requirement. We then develop a plan to examine the query under consideration, before offering options for improving its performance. As is typically the case, we decide to work with a copy of the affected cube (in this case, the Warehouse cube) to allow the original to remain isolated.

Hands-On Procedure
Preparation

Create a Clone Cube

Let’s get started by creating a clone of the Warehouse sample cube, which, along wit the FoodMart database that contains it, accompanies an MSAS installation. This will allow us to keep the original sample cube intact for other uses.

1. Open Analysis Manager, beginning at the Start menu. 2. Expand the Analysis Servers folder by clicking the “+” sign to its immediate left. Our server(s) appear. 3. Expand the desired server.

Continues…

Leave a comment

Your email address will not be published.