Understanding SQL Server Change Data Capture
DBA’s often encounter a requirement to audit DML
activity such as INSERT/UPDATE/DELETE operations executed against a particular
table or a group of tables in a particular database. Change Data Capture (first introduced in SQL Server
2008 R2) can greatly assist this function.
Only members of sysadmin fixed server role can enable the
Change Data Capture feature for the entire database. Before a Change Data
Capture feature is enabled for an individual table, it must be enabled for the
entire database using the sys.sp_cdc_enable_db
system stored procedure.
Before you actually enable a particular database for
CDC, you can examine whether the database is already enabled for CDC by
executing the below T-SQL against the master database of the
select * from sys.databases
If in the output the value of is_cdc_enables is 1 then the database is enabled for Change Data Capture.
Now I have a database named college and I would like to
enable Change Data Capture feature
execute the below T-SQL against the master database :
use college exec sys.sp_cdc_enable_db
As soon as CDC is enabled for the database named
College, we can see that a CDC schema is created for the particular
Now let’s enable CDC for the student table which
is present in the college database and belongs to the dbo schema.
This is achieved by executing the below T-SQL against the database named college.
EXEC sys.sp_cdc_enable_table @source_schema = N'dbo', @source_name = N'student', @role_name = N'NULL' GO
As soon as CDC is enabled for a database,
In the system tables of the particular database a new table is created as
One important point to note is that
the table named student has just two columns : student_id and student_name
both of which are captured through CDC. This also creates the below jobs on the server:
Let’s fire the below INSERT statement against the database
insert student(student_name) select 'Satnam'
As soon as the INSERT statement is execxuted, CDC will
automatically capture the Change details and the changed data will be stored in
the cdc.dbo_student_CT table and can be obtained by querying the below table as
select * from cdc.dbo_student_CT
What is the start_lsn and end_lsn ? Well LSN stands for
Last Sequence Number. Any data that gets stored in the Transactional
Log of the particular database is identified by the LSN Number. As soon as
CDC was enabled for the table named student in the college database a system
table named cdc.lsn_time_mapping is created in the
database. Now let us query this using the T-SQL below:
select * from cdc.lsn_time_mapping
If we execute the below queries against the college
database, I get the output below:
select * from cdc.dbo_student_CT select * from cdc.lsn_time_mapping
Here we can see that the INSERT statement was executed against the database on
25-December-2012 at 0706 hrs.
Now let’s execute the below UPDATE Statement against the
student table in the college database.
update student set student_name='Satnam Singh' where student_name='Satnam'
Now let us execute the below 3 T-SQL statements against the
select * from student select * from cdc.dbo_student_CT select * from cdc.lsn_time_mapping
As seen from the above screen capture from the
system table named cdc.lsn_time_mapping we can determine what
exactly the time was when the UPDATE query was executed. Now if you clearly see
the above screen capture, In the lower screen capture which is the
output from the table named cdc.dbo_student_CT there are 2 columns named _$operation and
Whenever data is updated the CDC table will hold two
entries – the original value and the updated one as you can see in
the above screen capture. _$operation with a value of 3 indicates the original
value before the update whereas _$operation with a value of 4 refers to the updated
value. Therefore if a particular row is updated, then for the
original record in the CDC table, the value of _$operation will be 3 and for
the updated value the same will be 4. If the new data is inserted into the
table then in the CDC table, the value of _$operation will be 2. Whenever there
is an UPDATE operation against a particular table then the value of _$update_mask
will be 2. If it was an INSERT operation then the value of
_$update_mask will be 3.
Finally to perform a DELETE operation :
delete from student
where student_name='satnam singh'
As you can see above, for a DELETE
operation the value of _$operation will be 1 and _$update_mask will be 3. Thus if the operation is INSERT or DELETE then the corresponding
value in _$update_mask will be 3. If an UPDATE operation is performed
then the corresponding value of _$update_mask for both the original record as well
as the updated record will be 2.
Now let us query the Change Tracker table as follows:
select * from cdc.dbo_student_CT
this results in the below output:
To disable the CDC feature for the particular
database, execute the below T-SQL.
use college exec sys.sp_cdc_disable_db
The result of executing this TSQL is that the Change Tracker
table named cdc.dbo_student_CT , will be removed from the database and the Capture and Delete
jobs will also be removed.
Please note that whenever CDC captures the data, we are storing data on the production server. Whenever
CDC is enabled for a table, a cleanup job will be
created automatically, in our case it would be cdc.college_cleanup. This job
would be completely responsible for the cleanup activity. This job
internally has a stored procedure executing which is as shown below:
Please let us know in the comments if you have any feedback or suggestions, alternatively you can contact me on firstname.lastname@example.org.