Analysis Services in High Data Volume Business Scenarios
Overview of Parallel Processing in Analysis Services 2005
Analysis Services 2005 embraces the XMLA standard and an enhanced object model that supports parallel processing without the use of external utilities. In AS 2000, a basic drawback of the DSO object model was that parallel processing was unavailable from the Analysis Services GUI; DSO objects had to be invoked in a defined manner to simulate parallelism. In AS 2005, finer-grained control over processing options provides increased flexibility to perform processing in parallel using XMLA scripts, GUI-based processing, or SSIS-based processing.
Parallel Processing – In Detail
When an execution request is submitted to AS 2005, an execution plan is generated that provides the blueprint for processing the given task. The execution plan is prepared based on dependencies existing between the objects to be processed.
The following parameters configure the parallel processing capabilities of AS 2005:
- CoordinatorExecutionMode: This is a server-wide configuration value (specified in the msmdsrv.ini file) that specifies the maximum number of concurrent jobs in a single processing request. The server administrator can use this to throttle parallelism. The default is zero, i.e., unlimited parallelism.
- MaxParallel: This is an attribute of the <Parallel> command. It specifies the maximum number of concurrent jobs in the request. It does not override CoordinatorExecutionMode; it can only throttle parallelism further.
- CPUs: Depending on the number of CPUs, the Analysis server can artificially throttle the parallelism to avoid unnecessary context switching.
- Memory Quotas: Every job requests a memory quota specifying the minimum and maximum amount of memory it needs. If the memory management system cannot grant the minimum requested memory, the job is blocked until the memory becomes available.
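To illustrate, the MaxParallel attribute described above can be applied in an XMLA processing request such as the following sketch. It asks the server to process partitions with at most two concurrent jobs; the database, cube, measure group, and partition IDs are hypothetical.

```xml
<!-- Sketch of an XMLA processing request; all object IDs are hypothetical. -->
<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <!-- MaxParallel="2" further throttles the server-wide
       CoordinatorExecutionMode limit for this request only. -->
  <Parallel MaxParallel="2">
    <Process>
      <Object>
        <DatabaseID>SalesDB</DatabaseID>
        <CubeID>SalesCube</CubeID>
        <MeasureGroupID>SalesFacts</MeasureGroupID>
        <PartitionID>Sales_2005_Q1</PartitionID>
      </Object>
      <Type>ProcessFull</Type>
    </Process>
    <!-- additional <Process> elements, one per partition to be
         processed within the same parallel batch -->
  </Parallel>
</Batch>
```

The server generates a single execution plan for the batch, so dependent objects are still processed in the correct order even though independent partitions run concurrently.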
The first half of this paper discussed the business problem and its implications for the relational database and cube design. The business requirements were explained, and the impact of these requirements on the design was illustrated. The second half described the problems faced with sequential processing of partitions in large-volume cubes and how processing times were reduced by 25-28% through the use of the Parallel Processing Utility (PPU). The final part covered parallel processing in Analysis Services 2005: the benefits obtained from the PPU with Analysis Services 2000 can be matched by the built-in parallel processing of Analysis Services 2005, which also offers much better server-side control over the process.

Some general observations about optimum cube processing performance in large data warehouse/data mart implementations:
- In large data mart/data warehouse implementations, a multiprocessor server is recommended to reduce processing time. Multiprocessor servers enable cubes to be processed in parallel and thereby reduce the time needed to process them.
- Use the Parallel Processing Utility to parallelize the processing of cubes, partitions, and dimensions as applicable. This utilizes the existing hardware and reduces processing time compared to sequential processing.
- Note, however, that simply increasing the number of processors will not increase performance linearly; memory also plays a significant role during parallel processing. For instance, if the Process Buffer Size is set to 600 MB and 3 partitions are to be processed in parallel, the server needs 1800 MB (600 * 3) of RAM for the process buffers alone.
- It is recommended that large data mart implementations use 64-bit SQL Server. The benefits are manifold: the memory allocation for Analysis Services can be increased beyond 3 GB, and the benefits are especially pronounced where large cubes need to be processed in short time windows. With more memory available, 64-bit Analysis Services minimizes the use of temporary files while processing partitions, thereby reducing processing times. Temporary file usage can be monitored through the performance counters Temp file bytes written/sec and Temp file rows written/sec under the Analysis Services:Proc Aggs object.
- It is recommended that aggregation levels do not exceed 20% for large data volume implementations. Increasing the aggregation percentage increases both the storage needs of the cubes and the processing time.
- MOLAP storage mode is recommended for optimum cube processing and query performance. However, for scenarios with very large dimensions, ROLAP is the more suitable storage mode.
- Migrating to SQL Server 2005 ensures that parallel processing of cubes is built into Analysis Services, so there is no need to process cubes in parallel using external utilities such as the Parallel Processing Utility.