Slowly Changing Dimension in SQL Server 2005 – Part 2

Part 1 of this article discussed Type 1 slowly changing dimensions (SCDs). This part looks further at SCDs and how they can be implemented in SQL Server 2005.

What is a Type 2 SCD?
A Type 2 SCD writes a new record with the new attribute information and preserves the record that holds the old dimensional data. Implementing Type 2 changes within a data warehouse environment might require significant analysis and development time. Because each version of a dimension member is kept as its own record, these changes partition history across time more accurately than the other types. However, they also add new records to the data warehouse, which can significantly increase the database size. Let’s take a look at an example:

Staging

Customer Code  Customer Name  Region
53724          Melody         Raleigh
6705           Dan            Lubbock

Dimension

CustomerSK  Customer Code  Customer Name  Region
1           53724          Melody         Raleigh
2           6705           Dan            Lubbock

Fact

CustomerSK  SalesAmount  Date
1           1500         01/01/2008
2           2500         01/01/2008
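The load that produces these three tables can be sketched roughly as follows. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for SQL Server 2005, and the table names (staging_customer, dim_customer, fact_sales) and column names are assumptions rather than anything defined in this article. The auto-incrementing key plays the role an IDENTITY column would typically play in SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE staging_customer (customer_code INTEGER, customer_name TEXT, region TEXT);
CREATE TABLE dim_customer (
    customer_sk   INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
    customer_code INTEGER,                            -- business key from staging
    customer_name TEXT,
    region        TEXT
);
CREATE TABLE fact_sales (customer_sk INTEGER, sales_amount INTEGER, sale_date TEXT);
""")

conn.executemany(
    "INSERT INTO staging_customer VALUES (?, ?, ?)",
    [(53724, "Melody", "Raleigh"), (6705, "Dan", "Lubbock")],
)

# Load the dimension from staging; the surrogate key is generated on insert.
conn.execute("""
INSERT INTO dim_customer (customer_code, customer_name, region)
SELECT customer_code, customer_name, region FROM staging_customer ORDER BY rowid
""")

# Load the facts, swapping each customer code for its surrogate key.
sales = [(53724, 1500, "2008-01-01"), (6705, 2500, "2008-01-01")]
for code, amount, day in sales:
    conn.execute("""
    INSERT INTO fact_sales (customer_sk, sales_amount, sale_date)
    SELECT customer_sk, ?, ? FROM dim_customer WHERE customer_code = ?
    """, (amount, day, code))

dim_rows = conn.execute(
    "SELECT customer_sk, customer_code, region FROM dim_customer ORDER BY customer_sk"
).fetchall()
fact_rows = conn.execute(
    "SELECT customer_sk, sales_amount FROM fact_sales ORDER BY customer_sk"
).fetchall()
print(dim_rows)   # [(1, 53724, 'Raleigh'), (2, 6705, 'Lubbock')]
print(fact_rows)  # [(1, 1500), (2, 2500)]
```

The fact table never stores the customer code itself, only the surrogate key, which is what makes the Type 1 versus Type 2 choice matter for historical reporting.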

In the above example, the dimension table is loaded from the staging table, and the surrogate key (CustomerSK) is generated in the dimension table. The fact table stores the surrogate key taken from the customer dimension. Let us assume that, after the 1st of January 2008, the region for the customer Dan changes from Lubbock to Fresno. As you may remember from Part 1, with a Type 1 SCD the existing dimension record is updated to Fresno, and after that there is no way to get the previous information back. From a business perspective this is a problem: once the dimension record is updated, sales made while Dan was in Lubbock are reported under Fresno, which may lead to wrong business or management decisions.

Let’s see how this problem can be overcome with a Type 2 SCD. Below is an example:

Staging

Customer Code  Customer Name  Region
53724          Melody         Raleigh
6705           Dan            Lubbock

Dimension

CustomerSK  Customer Code  Customer Name  Region
1           53724          Melody         Raleigh
2           6705           Dan            Lubbock
3           6705           Dan            Fresno

Fact

CustomerSK  SalesAmount  Date
1           1500         01/01/2008
2           2500         01/01/2008
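A rough sketch of how the third dimension row might come about is given below, again using Python's sqlite3 module as a stand-in for SQL Server 2005. The helper apply_type2_change, the table and column names, and the extra sale dated 2008-02-01 are all hypothetical and exist only for illustration. Since this version of the dimension has no flag or date columns, the sketch assumes the newest version of a customer is the row with the highest surrogate key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (
    customer_sk   INTEGER PRIMARY KEY AUTOINCREMENT,
    customer_code INTEGER,
    customer_name TEXT,
    region        TEXT
);
CREATE TABLE fact_sales (customer_sk INTEGER, sales_amount INTEGER, sale_date TEXT);
INSERT INTO dim_customer (customer_code, customer_name, region)
VALUES (53724, 'Melody', 'Raleigh'), (6705, 'Dan', 'Lubbock');
INSERT INTO fact_sales VALUES (1, 1500, '2008-01-01'), (2, 2500, '2008-01-01');
""")

def apply_type2_change(conn, code, name, region):
    """Insert a new dimension row when the region differs from the newest row."""
    latest = conn.execute(
        "SELECT region FROM dim_customer WHERE customer_code = ? "
        "ORDER BY customer_sk DESC LIMIT 1",
        (code,),
    ).fetchone()
    if latest is None or latest[0] != region:
        # Type 2: never overwrite; add a new version with a new surrogate key.
        conn.execute(
            "INSERT INTO dim_customer (customer_code, customer_name, region) "
            "VALUES (?, ?, ?)",
            (code, name, region),
        )

apply_type2_change(conn, 6705, "Dan", "Fresno")       # Dan moves: new row, key 3
apply_type2_change(conn, 53724, "Melody", "Raleigh")  # unchanged: no new row

# Hypothetical later sale for Dan, recorded against the new surrogate key.
conn.execute("INSERT INTO fact_sales VALUES (3, 1000, '2008-02-01')")

sales_by_region = conn.execute("""
SELECT d.region, SUM(f.sales_amount)
FROM fact_sales f JOIN dim_customer d ON d.customer_sk = f.customer_sk
GROUP BY d.region ORDER BY d.region
""").fetchall()
print(sales_by_region)  # [('Fresno', 1000), ('Lubbock', 2500), ('Raleigh', 1500)]
```

The older sale stays under Lubbock because its fact row still points at surrogate key 2, while the later sale is reported under Fresno.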

In the above example, you can see that a new record is inserted for the change while the previous record is left unchanged. New fact records for Dan will use the new customer surrogate key (3). Because of this, the historical data stays under the old region and new data is reported under the new region.

In this type of design, you have to include an additional column to identify which dimension record is the active or current one. Most of the time, this is a flag that says whether the record is current. However, the more robust approach is to include a start date column and an end date column; for the current record, the end date is null. The date columns are helpful when you need to reload the entire fact data: with only a flag and no dates, you cannot tell which version of a customer dimension record a given fact row belongs to.
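The start date and end date approach can be sketched as below, once more with Python's sqlite3 module standing in for SQL Server 2005. The column names start_date and end_date, the helpers change_region and surrogate_key_on, the initial start date of 2007-01-01 and the change date of 2008-01-15 are all assumptions made for illustration. The sketch treats each row as valid from its start date up to, but not including, its end date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (
    customer_sk   INTEGER PRIMARY KEY AUTOINCREMENT,
    customer_code INTEGER,
    customer_name TEXT,
    region        TEXT,
    start_date    TEXT,  -- ISO dates, so string comparison orders correctly
    end_date      TEXT   -- NULL marks the current row
);
INSERT INTO dim_customer (customer_code, customer_name, region, start_date, end_date)
VALUES (53724, 'Melody', 'Raleigh', '2007-01-01', NULL),
       (6705,  'Dan',    'Lubbock', '2007-01-01', NULL);
""")

def change_region(conn, code, name, region, change_date):
    """Close the current row for this customer and open a new current row."""
    conn.execute(
        "UPDATE dim_customer SET end_date = ? "
        "WHERE customer_code = ? AND end_date IS NULL",
        (change_date, code),
    )
    conn.execute(
        "INSERT INTO dim_customer "
        "(customer_code, customer_name, region, start_date, end_date) "
        "VALUES (?, ?, ?, ?, NULL)",
        (code, name, region, change_date),
    )

def surrogate_key_on(conn, code, day):
    """Find the dimension version that was valid for this customer on a date."""
    row = conn.execute(
        "SELECT customer_sk FROM dim_customer "
        "WHERE customer_code = ? AND start_date <= ? "
        "AND (end_date IS NULL OR ? < end_date)",
        (code, day, day),
    ).fetchone()
    return row[0] if row else None

change_region(conn, 6705, "Dan", "Fresno", "2008-01-15")

# When reloading facts, each sale date selects the matching dimension version.
old_key = surrogate_key_on(conn, 6705, "2008-01-01")
new_key = surrogate_key_on(conn, 6705, "2008-02-01")
print(old_key, new_key)  # 2 3
```

With only a current-record flag, the lookup for the 2008-01-01 sale would have no way to pick key 2 over key 3 during a full reload, which is the limitation described above.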
Continues…
