SQL Server Analysis Services ( SSAS ) – Introduction
Microsoft SQL Server 2005 Analysis Services (SSAS) delivers both online analytical processing (OLAP) and data mining functionality for business intelligence applications. Analysis Services supports OLAP by letting the user design, create, and manage multidimensional structures that contain data aggregated from other data sources, such as relational databases. For data mining applications, Analysis Services lets the user design, create, and visualize data mining models, constructed from other data sources by using a variety of industry-standard data mining algorithms.
Online analytical processing (OLAP) allows the user to access aggregated and organized data from business data sources, such as data warehouses, in a multidimensional structure called a cube. Microsoft provides tools and features for OLAP that user can use to design, deploy, and maintain cubes and other supporting objects.
Cubes in a data warehouse are stored in three different modes. A relational storage model is called Relational Online Analytical Processing mode or ROLAP, while a Multidimensional Online Analytical processing mode is called MOLAP. When dimensions are stored in a combination of the two modes then it is known as Hybrid Online Analytical Processing mode or HOLAP.
MOLAP (multidimensional online analytical processing) is online analytical processing (OLAP) that indexes directly into a multidimensional database. In general, an OLAP application treats data multidimensionally; the user is able to view different aspects or facets of data aggregates such as sales by time, geography, and product model. If the data is stored in a relational data base, it can be viewed multidimensionally, but only by successively accessing and processing a table for each dimension or aspect of a data aggregate. MOLAP processes data that is already stored in a multidimensional array in which all possible combinations of data are reflected, each in a cell that can be accessed directly. For this reason, MOLAP is, for most uses, faster and more user-responsive than relational online analytical processing (ROLAP), the main alternative to MOLAP.
Relational online analytical processing (ROLAP) is a form of online analytical processing (OLAP) that performs dynamic multidimensional analysis of data stored in a relational database rather than in a multidimensional database (which is usually considered the OLAP standard).
Data processing may take place within the database system, a mid-tier server, or the client. In a two-tiered architecture, the user submits a Structure Query Language (SQL) query to the database and receives back the requested data. In a three-tiered architecture, the user submits a request for multidimensional analysis and the ROLAP engine converts the request to SQL for submission to the database. Then the operation is performed in reverse: the engine converts the resulting data from SQL to a multidimensional format before it is returned to the client for viewing. Since ROLAP uses a relational database, it requires more processing time and/or disk space to perform some of the tasks that multidimensional databases are designed for. However, ROLAP supports larger user groups and greater amounts of data and is often used when these capacities are crucial, such as in a large and complex department of an enterprise.
Hybrid online analytical processing (HOLAP) is a combination of relational OLAP (ROLAP) and multidimensional OLAP (usually referred to simply as OLAP). HOLAP was developed to combine the greater data capacity of ROLAP with the superior processing capability of OLAP.
HOLAP can use varying combinations of ROLAP and OLAP technology. Typically it stores data in a both a relational database (RDB) and a multidimensional database (MDDB) and uses whichever one is best suited to the type of processing desired. The databases are used to store data in the most functional way. For data-heavy processing, the data is more efficiently stored in a RDB.For speculative processing, the data is more effectively stored in an MDDB.
HOLAP users can choose to store the results of queries to the MDDB to save the effort of looking for the same data over and over which saves time. Although this technique – called “materializing cells” – improves performance, it takes a toll on storage. The user has to strike a balance between performance and storage demand to get the most out of HOLAP. Nevertheless, because it offers the best features of both OLAP and ROLAP, HOLAP is increasingly preferred.
Data mining gives the user access to the information that is needed to make intelligent decisions about difficult business problems. Microsoft provides tools for data mining with which user can identify rules and patterns in the data, so that the user can determine why things happen and predict what will happen in the future. While creating a data mining solution in Analysis Services, user first creates a model that describes the business problem, and then user runs the data through an algorithm that generates a mathematical model of the data.User can then either visually explore the mining model or create prediction queries against it. Analysis Services can use datasets from both relational and OLAP databases, and includes a variety of algorithms that can be used to investigate that data.
Analysis Services Architecture
SSAS use server and client components to supply OLAP and data mining.
The server component of SSAS is the msmdsrv.exe application, which runs as a Windows service. SSAS supports multiple instances on the same computer, with each instance of Analysis Services implemented as a separate instance of the Windows service. The msmdsrv.exe application consists of security components, an XML for Analysis (XMLA) listener component, a query processor component and numerous other internal components that perform the following functions:
- Parsing statements received from clients
- Managing metadata
- Handling transactions
- Processing calculations
- Storing dimension and cell data
- Creating aggregations
- Scheduling queries
- Caching objects
- Managing server resources
The XMLA listener component handles all XMLA communications between Analysis Services and its clients. The Analysis Services Port configuration setting in the msmdsrv.ini file can be used to specify a port on which an Analysis Services instance listens. A value of 0 in this file indicates that Analysis Services listen on the default port. The default instance of Analysis Services listens on TCP/IP port 2383, but named instances of Analysis Services do not use a default port. Each named instance listens either on the port that an Administrator specifies, or on the port that is dynamically assigned at startup. This variability among ports means that clients do not automatically know which port a particular named instance of Analysis Services is using, and therefore do not automatically know where to send their requests. To make it easy for clients to send requests to named instances of Analysis Services, SQL Server has a service called SQL Server Browser. SQL Server Browser keeps track of the ports on which each named instance listens. Client connection requests for a named instance that do not specify a port number are directed to port 2382, the port on which SQL Server Browser listens. SQL Server Browser then redirects the request to the port that the named instance uses.
Client components communicate with Analysis Services using the public standard XML for Analysis (XMLA), a SOAP-based protocol for issuing commands and receiving responses, exposed as a Web service. Client object models are also provided over XMLA, and can be accessed either by using a managed provider, such as ADOMD.NET, or a native OLE DB provider.
Several different providers are provided with Analysis Services to support different programming languages. A provider communicates with an Analysis Services server by sending and receiving XML for Analysis in SOAP packets over TCP/IP or over HTTP through Internet Information Services (IIS). An HTTP connection uses a COM object instantiated by IIS, called a data pump, which acts as a conduit for Analysis Services data.
Win32 client applications can connect to an Analysis Services server using OLE DB for OLAP interfaces or the ActiveX Data Objects (ADO) object model for Component Object Model (COM) automation languages, such as Visual Basic. Applications coded with .NET languages can connect to an Analysis Services server using ADO MD.NET.
Query commands can be issued using the following languages:
2) Multidimensional Expressions (MDX), an industry standard query language for analysis
3) Data Mining Extensions (DMX), an industry standard query language for data mining.
4) Analysis Services Scripting Language (ASSL) can also be used to manage Analysis Services database objects.
Unified Dimension Model
Analysis Services combines the best aspects of traditional OLAP-based analysis and relational-based reporting by enabling developers to define a single data model, called a Unified Dimensional Model (UDM) over one or more physical data sources. All end user queries from OLAP, reporting, and custom BI applications access the data in the underlying data sources through the UDM, which provides a single business view of this relational data. The UDM is a central place that serves as the single version of truth for all reports, spreadsheets, OLAP browsers, KPIs, and analytical applications.
If the user wants to retrieve information directly from a data source, faces a lot of problems. It can be possible that the content of data sources is hard to understand. Information might be distributed across multiple heterogeneous databases .This makes the access to data more difficult because the user has to understand the SQL used. There are queries that support business decision-making. These queries involve increased data volumes. The time that is required to retrieve such summary values for interactive end-user analysis can be prohibitive. Here’s where the role of UDM comes into play.UDM acts as a bridge between the user and data sources. A UDM is constructed over one or more physical data sources. The user issues queries against the UDM using a variety of client tools, such as Microsoft Excel.
UDM offers the following advantages:
- Simpler Data model
- Client is isolated from the heterogeneous backend data sources.
- Provides high performance in case of queries that support interactive analysis, even over large data volumes. This is achieved by caching.
- UDM helps to capture the business rules in the model
The UDM is defined over one or mode data sources. These data sources can be of a variety of types, including:
- Relational databases;
- Web Services;