Understanding SQL Server Change Data Capture
DBA’s often encounter a requirement to audit DML activity such as INSERT/UPDATE/DELETE operations executed against a particular table or a group of tables in a particular database. Change Data Capture (first introduced in SQL Server 2008 R2) can greatly assist this function.
Only members of sysadmin fixed server role can enable the Change Data Capture feature for the entire database. Before a Change Data Capture feature is enabled for an individual table, it must be enabled for the entire database using the sys.sp_cdc_enable_db system stored procedure.
Before you actually enable a particular database for CDC, you can examine whether the database is already enabled for CDC by executing the below T-SQL against the master database of the instance.
select * from sys.databases
If in the output the value of is_cdc_enables is 1 then the database is enabled for Change Data Capture.
Now I have a database named college and I would like to enable Change Data Capture feature execute the below T-SQL against the master database :
use college exec sys.sp_cdc_enable_db
As soon as CDC is enabled for the database named College, we can see that a CDC schema is created for the particular database :
Now let’s enable CDC for the student table which is present in the college database and belongs to the dbo schema. This is achieved by executing the below T-SQL against the database named college.
EXEC sys.sp_cdc_enable_table @source_schema = N'dbo', @source_name = N'student', @role_name = N'NULL' GO
As soon as CDC is enabled for a database, In the system tables of the particular database a new table is created as shown below:
One important point to note is that the table named student has just two columns : student_id and student_name both of which are captured through CDC. This also creates the below jobs on the server:
Let’s fire the below INSERT statement against the database named college.
insert student(student_name) select 'Satnam'
As soon as the INSERT statement is execxuted, CDC will automatically capture the Change details and the changed data will be stored in the cdc.dbo_student_CT table and can be obtained by querying the below table as follows:
select * from cdc.dbo_student_CT
What is the start_lsn and end_lsn ? Well LSN stands for Last Sequence Number. Any data that gets stored in the Transactional Log of the particular database is identified by the LSN Number. As soon as CDC was enabled for the table named student in the college database a system table named cdc.lsn_time_mapping is created in the database. Now let us query this using the T-SQL below:
select * from cdc.lsn_time_mapping
If we execute the below queries against the college database, I get the output below:
select * from cdc.dbo_student_CT select * from cdc.lsn_time_mapping
Here we can see that the INSERT statement was executed against the database on 25-December-2012 at 0706 hrs.
Now let’s execute the below UPDATE Statement against the student table in the college database.
update student set student_name='Satnam Singh' where student_name='Satnam'
Now let us execute the below 3 T-SQL statements against the college database.
select * from student select * from cdc.dbo_student_CT select * from cdc.lsn_time_mapping
As seen from the above screen capture from the system table named cdc.lsn_time_mapping we can determine what exactly the time was when the UPDATE query was executed. Now if you clearly see the above screen capture, In the lower screen capture which is the output from the table named cdc.dbo_student_CT there are 2 columns named _$operation and _$update_mask.
Whenever data is updated the CDC table will hold two entries – the original value and the updated one as you can see in the above screen capture. _$operation with a value of 3 indicates the original value before the update whereas _$operation with a value of 4 refers to the updated value. Therefore if a particular row is updated, then for the original record in the CDC table, the value of _$operation will be 3 and for the updated value the same will be 4. If the new data is inserted into the table then in the CDC table, the value of _$operation will be 2. Whenever there is an UPDATE operation against a particular table then the value of _$update_mask will be 2. If it was an INSERT operation then the value of _$update_mask will be 3.
Finally to perform a DELETE operation :
delete from student
where student_name='satnam singh'
As you can see above, for a DELETE operation the value of _$operation will be 1 and _$update_mask will be 3. Thus if the operation is INSERT or DELETE then the corresponding value in _$update_mask will be 3. If an UPDATE operation is performed then the corresponding value of _$update_mask for both the original record as well as the updated record will be 2.
Now let us query the Change Tracker table as follows:
select * from cdc.dbo_student_CT
this results in the below output:
To disable the CDC feature for the particular database, execute the below T-SQL.
use college exec sys.sp_cdc_disable_db
The result of executing this TSQL is that the Change Tracker table named cdc.dbo_student_CT , will be removed from the database and the Capture and Delete jobs will also be removed.
Please note that whenever CDC captures the data, we are storing data on the production server. Whenever CDC is enabled for a table, a cleanup job will be created automatically, in our case it would be cdc.college_cleanup. This job would be completely responsible for the cleanup activity. This job internally has a stored procedure executing which is as shown below:
Please let us know in the comments if you have any feedback or suggestions, alternatively you can contact me on email@example.com.