Slowly Changing Dimensions in SQL Server 2005
The fixed attributes option is to set fail in case errors are detected in a fixed attribute. If you set this option to true, the entire SCD transformation will fail.
The changing attributes option is used in Type 2 SCD’s hence we will discuss about this in next article.
After doing all this and clicking the finish button you will see the final package that looks like:
You can see that two paths are created, one for the insert (New Output path) and the other for the update (Changing Attribute Updates Output path). Apart from these default paths we can use the Unchanged output path if we need to monitor unchanged records.
Not only can you add new paths, you can also change the existing path. For example, in case you need to add new columns, you can introduce Derive Column component.
The problem with the SCD wizard is after configuring everything, if you redo the wizard again, all your modifications will be removed and you will need to add them again.
Let us look at these data flows tasks which are created by the SCD wizard.
First we will look at the Insert Destination control.
If you double click the control you will be taken to the following screen.
You can see that the insert data flow task is a standard OLE DB Destination.
If you double click the OLE DB command data flow task, you will see that it is an OLE DB command data flow task with the following update statement:
UPDATE [datawarehouse].[DimCustomer] SET [CategoryName] = ?,[MaterialStatus] = ?,[RegionName] = ? WHERE [CustomerID] = ?
In the case of a large dimension, it will take more resources for SCD’s. Hence you need to ensure that a SCD is the correct solution for your problem. I have seen many places where SCD’s have been used but there are only inserts performed. In this situation, rather than using a SCD, you can use a Lookup data flow task and include the OLE DB Destination at the failure path of the lookup.
Another important improvement you can do is by introducing an Index to the business key. The business key is used to identify whether the record needs to be updated or to be inserted. Also, in the case of an update it uses the WHERE clause to update the table.
I have come across three issues with the SCD data flow task in SSIS.
The first issue I have already specified. That is when you modify the SCD configuration, after performing the modification you will lose the changes you have done to the output paths of the SCD. However, there is small chance of doing modification to your mappings. You can do this by right clicking the SCD data flow task and selecting the Show Advanced Editor option. However, this change will not reflect in the insert and update path. Hence you have to do those changes manually.
Another requirement is for missing data. For example, if a customer is deleted from the staging (Meaning, we have a customer record in the data warehouse area but not in staging area), there is no way of tracking it from the SCD. Theoretically, we are not deleting data warehouse data, but at least we may need this information to update a flag saying that specific customer is deleted.
Another issue is the fixed attribute change type. If you select the option failed the transformation if changes are detected in a fixed attributes, the entire package will fail. Instead there should be a way to get those changes so that they can be logged to different table for further analysing and not affecting the other records.
We have discussed Type1 SCD in this article and hopefully this has given you an introduction to the SCD. If you need the sample package and the sample table schema and data queries, you can email me at email@example.com.
In the next article I will introduce Type 2 SCD’s and inferred data updates.