Understanding SQL Server Change Data Capture

DBA’s often encounter a requirement to audit DML
activity such as INSERT/UPDATE/DELETE operations executed against a particular
table or a group of tables in a particular database. Change Data Capture (first introduced in SQL Server
2008 R2) can greatly assist this function.

Only members of sysadmin fixed server role can enable the
Change Data Capture feature for the entire database. Before a Change Data
Capture feature is enabled for an individual table, it must be enabled for the
entire database using the sys.sp_cdc_enable_db
system stored procedure.

Before you actually enable a particular database for
CDC, you can examine whether the database is already enabled for CDC by
executing the below T-SQL against the master database of the
instance.

select * from sys.databases

If in the output the value of is_cdc_enables is 1 then the database is enabled for Change Data Capture.

Now I have a database named college and I would like to
enable Change Data Capture feature
execute the below T-SQL against the master database :

use college

exec sys.sp_cdc_enable_db

As soon as CDC is enabled for the database named
College, we can see that a CDC schema is created for the particular
database :

Now let’s enable CDC for the student table which
is present in the college database and belongs to the dbo schema.
This is achieved by executing the below T-SQL against the database named college.

EXEC sys.sp_cdc_enable_table

@source_schema = N'dbo',

@source_name   = N'student',

@role_name     = N'NULL'

GO

As soon as CDC is enabled for a database,
In the system tables of the particular database a new table is created as
shown below:

One important point to note is that
the table named student has just two columns : student_id and student_name
both of which are captured through CDC. This also creates the below jobs on the server:

INSERT OPERATION:

Let’s fire the below INSERT statement against the database
named college.

insert student(student_name)

select 'Satnam'

As soon as the INSERT statement is execxuted, CDC will
automatically capture the Change details and the changed data will be stored in
the cdc.dbo_student_CT table and can be obtained by querying the below table as
follows:

select * from cdc.dbo_student_CT

What is the start_lsn and end_lsn ? Well LSN stands for
Last Sequence Number. Any data that gets stored in the Transactional
Log of the particular database is identified by the LSN Number.  As soon as
CDC was enabled for the table named student in the college database a system
table named cdc.lsn_time_mapping is created in the
database.  Now let us query this using the T-SQL below:

select * from cdc.lsn_time_mapping

If we execute the below queries against the college
database, I get the output below:

select * from cdc.dbo_student_CT

select * from cdc.lsn_time_mapping

Here we can see that the INSERT statement was executed against the database on
25-December-2012 at 0706 hrs.

UPDATE OPERATION

Now let’s execute the below UPDATE Statement against the
student table in the college database.

update student

set student_name='Satnam Singh'

where student_name='Satnam' 

Now let us execute the below 3 T-SQL statements against the
college database.

select * from student

select * from cdc.dbo_student_CT

select * from cdc.lsn_time_mapping

As seen from the above screen capture from the
system table named cdc.lsn_time_mapping we can determine what
exactly the time was when the UPDATE query was executed. Now if you clearly see
the above screen capture, In the lower screen capture which is the
output from the table named cdc.dbo_student_CT there are 2 columns named _$operation and
_$update_mask.

Whenever data is updated the CDC table will hold two
entries – the original value and the updated one as you can see in
the above screen capture. _$operation with a value of 3 indicates the original
value before the update whereas _$operation with a value of 4 refers to the updated
value. Therefore if a particular row is updated, then for the
original record in the CDC table, the value of _$operation will be 3 and for
the updated value the same will be 4. If the new data is inserted into the
table then in the CDC table, the value of _$operation will be 2. Whenever there
is an UPDATE operation against a particular table then the value of _$update_mask
will be 2. If it was an INSERT operation then the value of
_$update_mask will be 3.

DELETE OPERATION:

Finally to perform a DELETE operation :

delete from student  

where student_name='satnam singh'

As you can see above, for a DELETE
operation the value of _$operation will be 1 and _$update_mask will be 3. Thus if the operation is INSERT or DELETE then the corresponding
value in _$update_mask will be 3. If an UPDATE operation is performed
then the corresponding value of _$update_mask for both the original record as well
as the updated record will be 2.

Now let us query the Change Tracker table as follows:

 

select * from cdc.dbo_student_CT

this results in the below output:

To disable the CDC feature for the particular
database, execute the below T-SQL.

use college

exec sys.sp_cdc_disable_db


The result of executing this TSQL is that the Change Tracker
table named cdc.dbo_student_CT , will be removed from the database and the Capture and Delete
jobs will also be removed.

Please note that whenever CDC captures the data, we are storing data on the production server.   Whenever
CDC is enabled for a table, a cleanup job will be
created automatically, in our case it would be cdc.college_cleanup. This job
would be completely responsible for the cleanup activity. This job
internally has a stored procedure executing which is as shown below:

sys.sp_MScdc_cleanup_job

Please let us know in the comments if you have any feedback or suggestions, alternatively you can contact me on singhsatnam84@yahoo.com.




Array

No comments yet... Be the first to leave a reply!