A Journey from OLTP to OLAP- Part 2

In my last  article we saw the basics of the data warehousing in which we saw looked at how OLAP and OLTP technologies differ in their functionalities, what were the models being used to develop the data warehouse and what were the schemas under consideration for designing the data warehouse.

In this article I will be focusing  on implementing those models and principles for designing the data warehouse.

Implementing a Data Warehouse

I am going to use the Kimball approach as it is simple and widely used. Breaking a data warehouse into smaller data marts and implementing them one at a time gives a systematic development approach to implement the data warehouse. Also I am going to implement the star schema for  designing the data warehouse as it is the simplest design model available.

Scenario

I have the Adventure works OLTP database  which needs to be remodelled in order to create a data warehouse from it. I will use the sales tables to create our data warehouse. The tables under consideration are as follows

  • Contact
  • Salesperson
  • SalesTerritory
  • SalesOrderHeader
  • SalesOrderDetail
  • Product
  • ProductCategory
  • ProductSubcategory

These tables are the master and transaction tables which can be used as dimensions and facts tables directly for this example but a lot more consideration should be given  when building the data warehouse to meet customer requirements.

I have identified the dimensions and have prefixed them with Dim and the Facts are prefixed with Facts as per the standard Microsoft naming convention :

The short description about the above tables is as follows:

DimContact

This table holds all the records for the names of the sales person and other demographic details about the sales persons.

DimSalesTerritory

This table consists of the territory wise  sales related  targets of the particular region

DimProduct

This is Product master table with listing of all the products.

DimProductCategory

This table consists of all the product category.

DimProductSubcategory

 This table consists of all the product Subcategory.

FactSalesPerson

This table consists of the sales person id and sales related individual targets of the sales person.

FactSalesOrderHeader

This table  consists of the details of the sales order transactions such as the salesperson id,subtotal,etc.

FactSalesOrderDetails

This table consists of the shipment details with the tracking id and order id along with the per unit price.

As discussed in the previous article we can clearly see the dimension for our data warehouse which  can be identified  as the entities which are used to analyze the measures. Therefore looking at the above listed tables we can easily understand that to analyze the measures productwise ,territorywise and sales personwise we would need the product, territory and contact tables as dimensions.

The bus matrix can be used to defined the relationship between the dimensions and facts

Dimensions

/Facts

Sub

Total

Tax

Amt

Freight

Total

Due

Sales

YTD

Sales

Last

Year

Unit

Price

Unit

Price

Discount

Line Total

Product

X

X

X

Territory

X

X

X

X

X

X

Sales Person

X

X

X

X

X

X

As you can see from the above bus matrix, the application of various dimensions across various facts and how they can be analyzed. Although the product dimension is applicable for only last three  measures it can still be applicable to the other  facts by creating the views and denormalising the data. Hence all the required fields can be brought under one roof .

The following  diagram illustrates this :.

Description: s1.jpg

Thus my fact table has been created using the query as there are multiple relationships between the tables and the Adventure Works table schema is highly normalized for the faster insert, updates and delete operations.

Therefore I have taken only the product id in my fact table view as there is already an foreign key relationship for the product subcategory defined in the product and Product category relationship defined in the product subcategory. The SSAS engine will take care of this automatically when the hierarchy is created for the Products and all the facts will be calculated accordingly when the slicing and dicing the data.

Now a revised structure of the bus matrix after the creation of the view would be something like this :

Dimensions

/Facts

Sub

Total

Tax

Amt

Freight

Total

Due

Sales

YTD

Sales

Last

Year

Unit

Price

Unit

Price

Discount

Line Total

Product

X

X

X

X

X

X

X

X

X

Territory

X

X

X

X

X

X

X

X

X

Sales Person

X

X

X

X

X

X

X

X

X

In this case the creation of the view sufficed the creation of the fact table but in the reality one would definitely have to consider  the cleaning and processing of the data for invalid entries or for the entries which no longer exist in the system and are discontinued.

Creation of Cube

Use the following steps to create the  cube out of the basic DW which we just created in the above.

  1. Open Visual Studio and click on the Analysis Services Project and provide a name for the same and click OK.

  2. Connect to the data source by right clicking on the data source and specify the server name and the catalog name. Click Test Connection and then click OK

  3. Add a data source view by clicking all the relevant dimension tables and the view which we are going to treat as the fact tables

  4. Provide a relevant name for the DSV and click Finish

  5. To format the DSV layout Select the Format menu->Auto Layout->Diagram

  6. Similar to the views in a SQL Server database we can create the Named queries in the SSAS DSV’s along with the named calculations. You can also set the logical primary keys to a field or give a friendly name to the table in the DSV
  7. The final formatted outcome of the DSV after creating the relationship between the tables is as below:

     

  8. Right Click on the cube and select new cube. A wizard opens which guides the developer in selecting the measure group tables first and then the dimensions. But that’s not all you can always modify the cube as per your wish anytime

  9. The final cube will look as below:

You can do a lot of things with the cube ranging from adding the custom calculations to defining the perspective and actions on the measures and dimensions which will open and URL for additional information pertaining to the data. Also define the KPI’s, partition the cube and define the aggregation for the better performance of the cube.

Note that the action tab and KPI’s can be browsed in the cube browser by the developers but they are also accessible to the users through the third party tools and Excel.

For adding attributes to the dimensions double click on the dimension and either you can drag the attribute to the dimension or you can right and say new attribute which will add new attribute to the dimension.

For creation of the hierarchy you can just drag and drop the attributes to the hierarchy pane. Be careful while adding the attributes as the order of the attributes is also important.

]]>

Leave a comment

Your email address will not be published.