A Journey from OLTP to OLAP- Part 2

In my last  article
we saw the basics of the data warehousing in which we saw looked at how OLAP and OLTP
technologies differ in their functionalities, what were the models
being used to develop the data warehouse and what were the schemas under
consideration for designing the data warehouse.

In this article
I will be focusing  on implementing those models and principles for designing
the data warehouse.

Implementing a Data
Warehouse

I am going to
use the Kimball approach as it is simple and widely used. Breaking a data
warehouse into smaller data marts and implementing them one at a time gives a
systematic development approach to implement the data warehouse. Also I am
going to implement the star schema for  designing the data warehouse as it is
the simplest design model available.

Scenario

I have the Adventure
works OLTP database  which needs to be remodelled in order to create a data
warehouse from it. I will use the sales tables to create our data warehouse.
The tables under consideration are as follows

  • Contact
  • Salesperson
  • SalesTerritory
  • SalesOrderHeader
  • SalesOrderDetail
  • Product
  • ProductCategory
  • ProductSubcategory

These tables are
the master and transaction tables which can be used as dimensions and facts
tables directly for this example but a lot more consideration should be given
 when building the data warehouse to meet customer requirements.

I have
identified the dimensions and have prefixed them with Dim and the Facts are prefixed
with Facts as per the standard Microsoft naming convention :

The short
description about the above tables is as follows:

DimContact

This table holds all the records for the
names of the sales person and other demographic details about the sales
persons.

DimSalesTerritory

This table consists of the territory wise 
sales related  targets of the particular region

DimProduct

This is Product master table with listing
of all the products.

DimProductCategory

This table consists of all the product
category.

DimProductSubcategory

 This table consists of all the product
Subcategory.

FactSalesPerson

This table consists of the sales person id
and sales related individual targets of the sales person.

FactSalesOrderHeader

This table  consists of the details of the
sales order transactions such as the salesperson id,subtotal,etc.

FactSalesOrderDetails

This table consists of the shipment details
with the tracking id and order id along with the per unit price.

As discussed in the previous article we can
clearly see the dimension for our data warehouse which  can be identified  as
the entities which are used to analyze the measures. Therefore looking at
the above listed tables we can easily understand that to analyze the measures
productwise ,territorywise and sales personwise we would need the product,
territory and contact tables as dimensions.

The bus matrix can be used to defined the
relationship between the dimensions and facts

Dimensions

/Facts

Sub

Total

Tax

Amt

Freight

Total

Due

Sales

YTD

Sales

Last

Year

Unit

Price

Unit

Price

Discount

Line Total

Product

X

X

X

Territory

X

X

X

X

X

X

Sales Person

X

X

X

X

X

X

As you can see from the above bus matrix,
the application of various dimensions across various facts and how they can be
analyzed. Although the product dimension is applicable for only last
three  measures it can still be applicable to the other  facts by creating the
views and denormalising the data. Hence all the required fields can be brought
under one roof .

The following  diagram illustrates this :.

Description: s1.jpg

Thus my fact table has been created using
the query as there are multiple relationships between the tables and the Adventure
Works table schema is highly normalized for the faster
insert, updates and delete operations.

Therefore I have taken only the
product id in my fact table view as there is already an foreign key
relationship for the product subcategory defined in the product and Product
category relationship defined in the product subcategory. The SSAS engine will
take care of this automatically when the hierarchy is created for the Products
and all the facts will be calculated accordingly when the slicing and dicing
the data.

Now a revised structure of the bus matrix
after the creation of the view would be something like this :

Dimensions

/Facts

Sub

Total

Tax

Amt

Freight

Total

Due

Sales

YTD

Sales

Last

Year

Unit

Price

Unit

Price

Discount

Line Total

Product

X

X

X

X

X

X

X

X

X

Territory

X

X

X

X

X

X

X

X

X

Sales Person

X

X

X

X

X

X

X

X

X

In this case the creation of the view
sufficed the creation of the fact table but in the reality one would definitely
have to consider  the cleaning and processing of the data for invalid entries or for the entries which no longer exist in the system and
are discontinued.

Creation of Cube

Use the following steps to create the  cube
out of the basic DW which we just created in the above.

  1. Open Visual Studio and click on the Analysis
    Services Project and provide a name for the same and click OK.

  2. Connect to the data source by right clicking on
    the data source and specify the server name and the catalog name. Click
    Test Connection and then click OK

  3. Add a data source view by clicking all the
    relevant dimension tables and the view which we are going to treat as the fact
    tables

  4. Provide a relevant name for the DSV and click
    Finish

  5. To format the DSV layout Select the Format
    menu->Auto Layout->Diagram

  6. Similar to the views in a SQL Server database we can
    create the Named queries in the SSAS DSV’s along with the named calculations.
    You can also set the logical primary keys to a field or give a friendly name to
    the table in the DSV
  7. The final formatted outcome of the DSV after
    creating the relationship between the tables is as below:

     

  8. Right Click on the cube and select new cube.
    A wizard opens which guides the developer in selecting the measure group tables
    first and then the dimensions. But that’s not all you can always modify the
    cube as per your wish anytime

  9. The final cube will look as below:

You can do a lot of things with the cube
ranging from adding the custom calculations to defining the perspective and
actions on the measures and dimensions which will open and URL for additional
information pertaining to the data. Also define the KPI’s, partition the cube
and define the aggregation for the better performance of the cube.

Note that the
action tab and KPI’s can be browsed in the cube browser by the developers but
they are also accessible to the users through the third party tools and Excel.

For adding attributes to the dimensions double click on the
dimension and either you can drag the attribute to the dimension or you can
right and say new attribute which will add new attribute to the dimension.

For creation of the hierarchy you can just drag and drop the attributes
to the hierarchy pane. Be careful while adding the attributes as the order of
the attributes is also important.




Array

No comments yet... Be the first to leave a reply!

Software Reviews | Book Reviews | FAQs | Tips | Articles | Performance Tuning | Audit | BI | Clustering | Developer | Reporting | DBA | ASP.NET Ado | Views tips | | Developer FAQs | Replication Tips | OS Tips | Misc Tips | Index Tuning Tips | Hints Tips | High Availability Tips | Hardware Tips | ETL Tips | Components Tips | Configuration Tips | App Dev Tips | OLAP Tips | Admin Tips | Software Reviews | Error | Clustering FAQs | Performance Tuning FAQs | DBA FAQs |