SQL Server Performance


CDC and Data Warehouse

By : Dinesh Asanka
Oct 27, 2008

Introduction
In a data warehouse, the most challenging, and therefore the most interesting, part is the ETL (Extract, Transform and Load) process. The challenge arises because you have to work with different databases that you did not design.
Most of the time, you need to update your OLAP system with the data changes made in the OLTP environment. You cannot simply truncate and reload the OLAP table, because that causes two problems:

  1. Truncating and reloading takes a long time if the table is large.
  2. Most OLAP designs use surrogate keys (SKs) that are identity columns. Truncating and reloading the table can change the surrogate keys. If the surrogate keys change, you must update the SKs in every related fact table, which is a tedious task.
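
The second problem can be seen in a small simulation. The sketch below is a toy model, not SQL Server's actual identity behavior: it assumes surrogate keys are handed out 1, 2, 3, ... in load order, and the business keys (C100 and so on) and the function name load_dimension are made up for illustration.

```python
def load_dimension(business_keys):
    """Assign surrogate keys 1, 2, 3, ... in load order (toy identity column)."""
    return {bk: sk for sk, bk in enumerate(business_keys, start=1)}


# First load: three hypothetical customers from the OLTP source.
first_load = load_dimension(["C100", "C200", "C300"])

# Fact rows store the surrogate key, not the business key.
fact_rows = [{"customer_sk": first_load["C300"], "amount": 50}]

# Later, C200 disappears from the source and the dimension is
# truncated and reloaded from scratch.
second_load = load_dimension(["C100", "C300"])

old_sk = first_load["C300"]   # key the fact table still points at
new_sk = second_load["C300"]  # key the same customer has after the reload

# Fact rows whose surrogate key no longer exists in the dimension.
orphaned = [r for r in fact_rows if r["customer_sk"] not in second_load.values()]

print(old_sk, new_sk, len(orphaned))
```

In this model the same customer ends up with a different surrogate key after the reload, so the existing fact row no longer matches any dimension row until it is corrected.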

Need for CDC
Change Data Capture (CDC) is a set of software design patterns for identifying and tracking data that has changed in a database, so that action can be taken on the changed data. In versions of SQL Server before 2008, there was no straightforward way to capture changed data.

To work around this, developers had to use triggers to capture the changes, and triggers add overhead to the database system.

Alternatively, you could use third-party tools that read the SQL Server transaction log. SQL Server 2008 introduces a new feature called Change Data Capture, which makes it easy to capture incremental data changes. CDC records every data change made to a tracked table. It also offers a feature called net changes, which is tailor-made for data warehousing implementations. We will look at the net changes feature shortly.

Net Change Feature
Let me explain this with an example for a single record:

The table above shows how the record with ID = 1 has changed over time. Rec # 1 shows the inserted values, while Rec # 2 shows an update to the Date of Join field. Similarly, Rec # 3 shows another update, this time to the location. Taking all three operations together, the net change is the following record:



For a Type 1 slowly changing dimension (SCD), you need the net change record shown in the second table. If you do not have this record, you must derive it from the three records shown in the first table, and you must apply them in the exact order in which they occurred.
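
Deriving a net change by hand amounts to replaying the change history in order. The following is a minimal sketch of that idea, not how CDC itself computes net changes. The column values, the seq field and the function name net_change are assumptions made for illustration, since the original tables are not reproduced here.

```python
def net_change(changes):
    """Fold the ordered change log of one key into a single net row.

    Returns None if the row ends up deleted.
    """
    row = None
    for change in sorted(changes, key=lambda c: c["seq"]):
        if change["op"] == "insert":
            row = dict(change["values"])
        elif change["op"] == "update" and row is not None:
            row.update(change["values"])
        elif change["op"] == "delete":
            row = None
    return row


# Hypothetical history for record ID = 1: one insert, then two updates.
history = [
    {"seq": 1, "op": "insert",
     "values": {"id": 1, "date_of_join": "2008-01-01", "location": "Colombo"}},
    {"seq": 2, "op": "update", "values": {"date_of_join": "2008-02-01"}},
    {"seq": 3, "op": "update", "values": {"location": "Kandy"}},
]

net = net_change(history)
print(net)
```

The result is one row that carries the latest Date of Join and location, which is all a Type 1 dimension needs to store.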

With SSIS, this is a bit difficult. When you split the data flow into separate paths for insert, update and delete operations, the paths run in different threads. This means that an update may run before the corresponding insert, in which case the update either fails or has no effect, because the record does not yet exist at the time of the update.
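
The ordering problem can be reproduced without SSIS. The sketch below is a plain simulation under stated assumptions: the target table is a dictionary, the function apply_change and the sample values are hypothetical, and the reordering is forced deliberately rather than caused by real threads.

```python
def apply_change(target, change):
    """Apply one change to the target table (a dict keyed by id).

    Returns False when an update finds no row to modify.
    """
    key = change["id"]
    if change["op"] == "insert":
        target[key] = dict(change["values"])
        return True
    if change["op"] == "update" and key in target:
        target[key].update(change["values"])
        return True
    return False


# Hypothetical change log for one record: an insert followed by an update.
changes = [
    {"id": 1, "op": "insert", "values": {"location": "Colombo"}},
    {"id": 1, "op": "update", "values": {"location": "Kandy"}},
]

# Applied in the order the changes occurred, the update lands on the new row.
in_order = {}
in_order_results = [apply_change(in_order, c) for c in changes]

# If the update path runs before the insert path, the update finds no row
# and is lost; the later insert then writes the stale value.
updates_first = {}
reordered = sorted(changes, key=lambda c: c["op"] != "update")
reordered_results = [apply_change(updates_first, c) for c in reordered]

print(in_order, updates_first, reordered_results)
```

With the changes applied in order the target holds the updated location, while in the reordered run it is left with the originally inserted value. A single net change row per key avoids the problem because there is only one operation to apply.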

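CDC provides an option to return net change records. However, you must enable this option when you configure CDC for the table (in SQL Server 2008, through the @supports_net_changes parameter of sys.sp_cdc_enable_table). The table must also have a primary key, or a unique index on its business key (in data warehousing terms), for net changes to be enabled.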




              © 2010 Jude O'Kelly. All rights reserved