SQL Server 2008 - Worth the Wait
Executive Summary
The realization of the importance of data warehouses coupled with ever decreasing price of hardware and advanced computing options has resulted in datawarehouses of thousands of gigabytes over the last few years. This in turn has paved way for OLAP solutions that store detail and aggregate data for optimally answering complex business queries related to multi-dimensional data. However, with increasing data volumes and complexity of business requirements, OLAP solutions are also challenged to meet the processing and reporting SLA’s defined by business. Parallel Processing of OLAP cubes is a very powerful option for meeting such performance and timeline demands. This white paper is an initiative to understand the benefits Parallel Processing Utility provides in processing large Analysis Services 2000 cubes in high volume business scenarios. The paper also explains how improvements in Analysis Services 2005 ensure that Parallel Processing is inbuilt in the upgraded version and there is no requirement for Parallel Processing Utility in 2005.
Introduction
This white paper explains the benefits derived by use of Parallel Processing utility particularly where large volumes of data need to be processed as part of the overnight batch process. It explains the underlying data processing and storage needs in large businesses across domains with the implications of impact on overnight batch loads. The complex data processing requirements in terms of data volumes and business logic have implications on the OLAP and relational database design and in the use of optimal processing methods whereby over night batch processes can meet the SLA(service level agreement) and also address the business needs for large complex data sets. This white paper addresses how Parallel Processing Utility in Analysis Services plays a stellar role in addressing such requirements.This white paper applies to SQL Server 2000 SP4 and assumes that the reader has a basic understanding of online analytical processing (OLAP) and Analysis Services 2000 in particular. This white paper is divided into 2 parts – The first part talks of the business problem, the data storage needs and batch process requirements, the implications of these requirements on the OLAP design.The second half of the paper deals with the performance problems faced with sequential processing and the performance benefits derived by the use of Parallel Processing Utility(PPU) and the optimal processing of partitions in parallel with a given set of hardware and some recommendations on the use of Parallel Processing Utility.The available hardware environment is also explained to give the readers an understanding of the test environment and the metrics mentioned in the paper are based on the given test environment. The metrics collected are based on a common set of business requirements across domains like Retail and Manufacturing and provide real life scenarios that can be addressed using the given technology suite.The last part of the paper explains how these problems are overcome in Analysis Services 2005 and how parallel processing is in built in the software.
Business Requirements and their Implications on storage and processing
A common business requirement across domains is to store historic information to aid managers to analyze business trends based on historic data. For instance sales volume information over the years was used by managers to understand the seasonal fluctuation of sales and analyze the impact on sales of promotions. Also waste related information is key to understand vendor performance trends over time and in ranking vendors. Similarly raw material spends over time for a manufacturer helps in understanding cost price trends over time and the impact of finished goods costs.
In case of backdated updates the entire partition would need to be processed thereby increasing the batch load time.
The obvious benefits of partitioning being the following
Considering the data volume of 900+ million rows in our case, the transaction related facts need to be partitioned by month. The idea behind monthly partitions being not to have very large number of partitions as well ensure that number of rows in a single partition were not too large.
Cube Design Considerations for handling large data volumes:
Dimension Storage and Processing – The dimension design has a direct impact on processing performance. The design guidelines to be followed are as follows:
Cube Design – Cubes are designed based on shared dimensions and MOLAP storage. Large cubes related to sales data in manufacturing and retail or call information in telecomm industry is usually partitioned by month to ensure optimal batch load processing and query performance.
Some basic design guidelines that have to be followed while designing the cubes in Analysis Services are: