In my last article we saw the basics of the data warehousing in which we saw looked at how OLAP and OLTP technologies differ in their functionalities, what were the models being used to develop the data warehouse and what were the schemas under consideration for designing the data warehouse.
In this article I will be focusing on implementing those models and principles for designing the data warehouse.
Implementing a Data Warehouse
I am going to use the Kimball approach as it is simple and widely used. Breaking a data warehouse into smaller data marts and implementing them one at a time gives a systematic development approach to implement the data warehouse. Also I am going to implement the star schema for designing the data warehouse as it is the simplest design model available.
Scenario
I have the Adventure works OLTP database which needs to be remodelled in order to create a data warehouse from it. I will use the sales tables to create our data warehouse. The tables under consideration are as follows
- Contact
- Salesperson
- SalesTerritory
- SalesOrderHeader
- SalesOrderDetail
- Product
- ProductCategory
- ProductSubcategory
These tables are the master and transaction tables which can be used as dimensions and facts tables directly for this example but a lot more consideration should be given when building the data warehouse to meet customer requirements.
I have identified the dimensions and have prefixed them with Dim and the Facts are prefixed with Facts as per the standard Microsoft naming convention :
The short description about the above tables is as follows:
DimContact
This table holds all the records for the names of the sales person and other demographic details about the sales persons.
DimSalesTerritory
This table consists of the territory wise sales related targets of the particular region
DimProduct
This is Product master table with listing of all the products.
DimProductCategory
This table consists of all the product category.
DimProductSubcategory
This table consists of all the product Subcategory.
FactSalesPerson
This table consists of the sales person id and sales related individual targets of the sales person.
FactSalesOrderHeader
This table consists of the details of the sales order transactions such as the salesperson id,subtotal,etc.
FactSalesOrderDetails
This table consists of the shipment details with the tracking id and order id along with the per unit price.
As discussed in the previous article we can clearly see the dimension for our data warehouse which can be identified as the entities which are used to analyze the measures. Therefore looking at the above listed tables we can easily understand that to analyze the measures productwise ,territorywise and sales personwise we would need the product, territory and contact tables as dimensions.
The bus matrix can be used to defined the relationship between the dimensions and facts
Dimensions /Facts |
Sub Total |
Tax Amt |
Freight |
Total Due |
Sales YTD |
Sales Last Year |
Unit Price |
Unit Price Discount |
Line Total |
Product |
X |
X |
X |
||||||
Territory |
X |
X |
X |
X |
X |
X |
|||
Sales Person |
X |
X |
X |
X |
X |
X |
As you can see from the above bus matrix, the application of various dimensions across various facts and how they can be analyzed. Although the product dimension is applicable for only last three measures it can still be applicable to the other facts by creating the views and denormalising the data. Hence all the required fields can be brought under one roof .
The following diagram illustrates this :.
Thus my fact table has been created using the query as there are multiple relationships between the tables and the Adventure Works table schema is highly normalized for the faster insert, updates and delete operations.
Therefore I have taken only the product id in my fact table view as there is already an foreign key relationship for the product subcategory defined in the product and Product category relationship defined in the product subcategory. The SSAS engine will take care of this automatically when the hierarchy is created for the Products and all the facts will be calculated accordingly when the slicing and dicing the data.
Now a revised structure of the bus matrix after the creation of the view would be something like this :
Dimensions /Facts |
Sub Total |
Tax Amt |
Freight |
Total Due |
Sales YTD |
Sales Last Year |
Unit Price |
Unit Price Discount |
Line Total |
Product |
X |
X |
X |
X |
X |
X |
X |
X |
X |
Territory |
X |
X |
X |
X |
X |
X |
X |
X |
X |
Sales Person |
X |
X |
X |
X |
X |
X |
X |
X |
X |
In this case the creation of the view sufficed the creation of the fact table but in the reality one would definitely have to consider the cleaning and processing of the data for invalid entries or for the entries which no longer exist in the system and are discontinued.
Creation of Cube
Use the following steps to create the cube out of the basic DW which we just created in the above.
- Open Visual Studio and click on the Analysis
Services Project and provide a name for the same and click OK.
- Connect to the data source by right clicking on
the data source and specify the server name and the catalog name. Click
Test Connection and then click OK
- Add a data source view by clicking all the
relevant dimension tables and the view which we are going to treat as the fact
tables
-
Provide a relevant name for the DSV and click
Finish
- To format the DSV layout Select the Format
menu->Auto Layout->Diagram
- Similar to the views in a SQL Server database we can create the Named queries in the SSAS DSV’s along with the named calculations. You can also set the logical primary keys to a field or give a friendly name to the table in the DSV
- The final formatted outcome of the DSV after
creating the relationship between the tables is as below:
- Right Click on the cube and select new cube.
A wizard opens which guides the developer in selecting the measure group tables
first and then the dimensions. But that’s not all you can always modify the
cube as per your wish anytime
-
The final cube will look as below:
You can do a lot of things with the cube ranging from adding the custom calculations to defining the perspective and actions on the measures and dimensions which will open and URL for additional information pertaining to the data. Also define the KPI’s, partition the cube and define the aggregation for the better performance of the cube.
Note that the action tab and KPI’s can be browsed in the cube browser by the developers but they are also accessible to the users through the third party tools and Excel.
For adding attributes to the dimensions double click on the dimension and either you can drag the attribute to the dimension or you can right and say new attribute which will add new attribute to the dimension.
For creation of the hierarchy you can just drag and drop the attributes
to the hierarchy pane. Be careful while adding the attributes as the order of
the attributes is also important.