SQL Server 2008 CDC function to realize data change capture script

Time:2022-5-2

CDC:Change Data Capture

Copy codeThe code is as follows:
–Step: This article takes gposdb as an example

–Step 1: explicitly enable CDC for the target library:
–Use sys. In the current library sp_ cdc_ enable_ db。 Returns 0 (success) or 1 (failure).
–Note that this feature cannot be enabled for system and distribution databases. And the performer needs to use sysadmin role permission.
–The scope of the stored procedure is the entire target library. It includes metadata, DDL triggers, CDC architecture and CDC users.
–Enable with the following code:
Use gposdb — database to enable CDC
GO
EXECUTE sys.sp_cdc_enable_db;
GO
–At the beginning of direct execution, an error message appears:
–Message 22830, level 16, status 1, process sp_ cdc_ enable_ db_ Internal, line 193
–The metadata cannot be updated to indicate that change data capture is enabled for the database AdventureWorks.
–Failed to execute command ‘setcdcracked (value = 1)’.
–The error returned is 15517: ‘cannot execute as database principal because principal’ dbo ‘
–The principal of this type does not exist, cannot be impersonated, or you do not have the required permissions. ‘. Use this action and error to determine the cause of the failure and resubmit the request.
–Here comes another knowledge point: the error of error number 15517
–This error can occur in many places, such as restoring the database.
–The common point is that some stored procedures use the option with execute as.
–Make it have a certain architecture in the current library, but when it is executed in other places, because there is no such architecture, it will report an error. Solution:
ALTER AUTHORIZATION ON DATABASE::[GPOSDB] TO [sa]

–After checking, the stored procedure uspupdateemployeehireinfo does have: with execute as caller
–The reason for using SA is that even if SA is disabled, SA still exists. So no error will be reported.
–Now re execute:
USE GPOSDB
GO
EXECUTE sys.sp_cdc_enable_db;
GO
–Enable successfully, and then check whether it is successful through the following statement:
SELECT
is_cdc_enabled,
CASE WHEN is_ cdc_ Enabled = 0 then ‘CDC function disabled’ else ‘CDC function enabled’ end description
FROM    sys.databases
WHERE   NAME = ‘GPOSDB’

–After the creation is successful, the CDC user and CDC architecture will be automatically added.
–You can see CDC users and CDC architecture under users and architecture

–The reason for creating these two user schemas is that CDC requires exclusive use of these two schemas, so they need to be created separately.
–If there are CDC users and schemas created by non CDC functions, you need to delete the schema named by the CDC before opening it.

–Step 2: enable CDC for the target table:
–Use DB_ Members of the owner role execute sys sp_ cdc_ enable_ Table creates a capture instance for each table that needs to be tracked.
–Then through sys Is in tables catalog view_ tracked_ by_ CDC column to determine whether the creation is successful.
–By default, all columns of the table are captured. If you only need to capture some columns,
–You can use @ captured_ column_ The list parameter specifies these columns.
–If you want to put the change table into a filegroup, it’s best to create a separate filegroup (at least independent of the source table).

–If you do not want to control the access role, @ role_ Name must be explicitly set to null.
sys.sp_cdc_enable_table
    [ @source_schema = ] ‘source_schema’,
    [ @source_name = ] ‘source_name’ ,
    [ @role_name = ] ‘role_name’
    [,[ @capture_instance = ] ‘capture_instance’ ]
    [,[ @supports_net_changes = ] supports_net_changes ]
    [,[ @index_name = ] ‘index_name’ ]
    [,[ @captured_column_list = ] ‘captured_column_list’ ]
    [,[ @filegroup_name = ] ‘filegroup_name’ ]
  [,[ @partition_switch = ] ‘partition_switch’ ]

–Example:
–Turn the systempara table on change capture.
USE GPOSDB
GO
EXEC sys.sp_cdc_enable_table @source_schema = ‘DBO’,
    @source_name = ‘SYSTEMPARA’,@role_name = NULL

–Then check whether the query is successful:
SELECT  name ,
        is_tracked_by_cdc ,
        CASE WHEN is_ tracked_ by_ CDC = 0 then ‘CDC function disabled’
Else ‘CDC function enabled’
End description
FROM    sys.tables
WHERE   OBJECT_ID = OBJECT_ID(‘dbo.systempara’)
–After opening the table, you can see many more tables at the beginning of the CDC architecture in the following figure:
–Refresh the gposdb database. You can see the following tables under the system table
[cdc].[DBO_SYSTEMPARA_CT]
[cdc].[change_tables]
[cdc].[captured_columns]
[cdc].[ddl_history]
[cdc].[index_columns]
[cdc].[lsn_time_mapping]
[dbo].[systranschemas]
[dbo].[dtproperties]

–After startup, you can see the jobs in SQL server agent, and these two jobs also appear:
[cdc.GPOSDB_capture]
[cdc.GPOSDB_cleanup]

–In programmability – “function -” table valued function “, there are also two more functions
[cdc].[fn_cdc_get_all_changes_DBO_SYSTEMPARA]
[cdc].[fn_cdc_get_net_changes_DBO_SYSTEMPARA]

–The related stored procedures are listed below:
–Sys.sp_cdc_add_job
–Description and examples
–Sys.sp_cdc_generate_wrapper_function
–Description and examples
–Sys.sp_cdc_change_job
–Description and examples
–Sys.sp_cdc_get_captured_columns
–Description and examples
–Sys.sp_cdc_cleanup_change_table
–Description and examples
–Sys.sp_cdc_get_ddl_history
–Description and examples
–Sys.sp_cdc_disable_db
–Description and examples} it is recommended to disable the table first and then the library
–Sys.sp_cdc_help_change_data_capture
–Description and examples
–Sys.sp_cdc_disable_table
–Description and examples
–Sys.sp_cdc_help_jobs
–Description and examples
–Sys.sp_cdc_drop_job
–Description and examples
–Sys.sp_cdc_scan
–Description and examples
–Sys.sp_cdc_enable_db
–Description and examples
–Sys.sp_cdc_start_job
–Description and examples
–Sys.sp_cdc_enable_table
–Description and examples
–Sys.sp_cdc_stop_job
–Description and examples

–Function:
–Cdc.fn_cdc_get_all_changes_<capture_instance>
–Description and examples
–Sys.fn_cdc_has_column_changed
–Description and examples
–Cdc.fn_cdc_get_net_changes_<capture_instance>
–Description and examples
–Sys.fn_cdc_increment_lsn
–Description and examples
–Sys.fn_cdc_decrement_lsn
–Description and examples
–Sys.fn_cdc_is_bit_set
–Description and examples
–Sys.fn_cdc_get_column_ordinal
–Description and examples
–Sys.fn_cdc_map_lsn_to_time
–Description and examples
–Sys.fn_cdc_get_max_lsn
–Description and examples
–Sys.fn_cdc_map_time_to_lsn
–Description and examples
–Sys.fn_cdc_get_min_lsn
–Description and examples

——————–Let’s start with a practical case from beginning to end————————-
–Start with an actual case from beginning to end

–Step 1: explicitly enable CDC for the target library
Use gposdb — database to enable CDC
GO
EXECUTE sys.sp_cdc_enable_db;
GO

–There may be some stored procedures in some databases, including execute as and other statements. An error will be reported at this time:

–Text description:
–Message 22830, level 16, status 1, process sp_ cdc_ enable_ db_ Internal, line 186
–The metadata cannot be updated to indicate that change data capture is enabled for the database AdventureWorks. Failed to execute command ‘setcdcracked (value = 1)’. The error returned was 15517: ‘cannot execute as a database principal because the principal’ dbo ‘does not exist, cannot impersonate this type of principal, or you do not have the required permissions.’. Use this action and error to determine the cause of the failure and resubmit the request.
–Message 266, level 16, status 2, process sp_ cdc_ enable_ db_ Internal, line 0
–The transaction count after execute indicates that the number of begin and commit statements does not match. Previous count = 0, current count = 1.
–Message 266, level 16, status 2, process sp_ cdc_ enable_ DB, line 0
–The transaction count after execute indicates that the number of begin and commit statements does not match. Previous count = 0, current count = 1.
–Message 3998, level 16, status 1, line 1
–Uncommitted transactions were detected at the end of the batch. The transaction will be rolled back.
–If this error occurs, the current solution is to execute the following statement. The reason has been explained at the beginning. Generally, there will be no such problem for libraries that do not use execute as:

ALTER AUTHORIZATION ON DATABASE::[GPOSDB] TO [sa]

–Now re execute:
USE GPOSDB
GO
EXECUTE sys.sp_cdc_enable_db;
GO
–Check for success with the following statement:
SELECT
is_cdc_enabled,
CASE WHEN is_ cdc_ Enabled = 0 then ‘CDC function disabled’ else ‘CDC function enabled’ end description
FROM    sys.databases
WHERE   NAME = ‘GPOSDB’

–Step 2: enable CDC for the table
USE GPOSDB
GO
EXEC sys.sp_cdc_enable_table
@source_schema = ‘DBO’,
@source_name = ‘SYSTEMPARA’,
@role_name = NULL,
@capture_instance=DEFAULT
GO

–Then check whether the query is successful:
SELECT  name ,
        is_tracked_by_cdc ,
        CASE WHEN is_ tracked_ by_ CDC = 0 then ‘CDC function disabled’
Else ‘CDC function enabled’
End description
FROM    sys.tables
WHERE   OBJECT_ID = OBJECT_ID(‘dbo.systempara’)

–You can see that [CDC] is added to the system table in the gposdb database [dbo_systempara_ct] table

 

 

–Step 3: check. Let’s change the data
–Check the dbo first_ SYSTEMPARA_ CT table
SELECT * FROM cdc.[DBO_SYSTEMPARA_CT]

–You can see that there is no record, because it has just been created and no addition, deletion or modification has been made to the original table systempara

–Insert a record into the [systempara] table
INSERT INTO [dbo].[SystemPara]
        ( [ParaValue] ,
          [Name] ,
          [Description]
        )
Values (‘China ‘, — paravalue – varchar (50)
‘China’, — name – varchar (50)
‘China’ — Description – varchar (50)
        )

–Check the dbo_ SYSTEMPARA_ CT table, you can see one more record
SELECT * FROM cdc.[DBO_SYSTEMPARA_CT]

–Update a record in the [systempara] table
UPDATE [dbo]. [systempara] set [paravalue] = ‘Germany’ where [description] = ‘China’

–Check the dbo_ SYSTEMPARA_ CT table, you can see two more records
SELECT * FROM cdc.[DBO_SYSTEMPARA_CT]

–Delete a record in the [systempara] table
DELETE FROM  [dbo]. [systempara] where [description] = ‘China’

–Check the dbo_ SYSTEMPARA_ CT table, you can see one more record
SELECT * FROM cdc.[DBO_SYSTEMPARA_CT]

–Now let’s analyze dbo_ SYSTEMPARA_ CT table
–Available in Books Online:
–cdc.<capture_instance>_CT 
–You can see that the table named like this is used to record the changes made to the source table.
–For the insert / delete operation, there will be a corresponding row of records, while for the update operation, there will be two rows of records.
–For__$ Operation column: 1 = delete, 2 = insert, 3 = update (old value), 4 = update (new value)
–Of UPDATE statement__$ The values of the operation column are 3 and 4, so an update statement corresponds to two records

–For__$ start_ LSN column: since the change is the transaction log of the source and database, the start sequence number (LSN) of the transaction log will be saved here
–Microsoft does not recommend using this type of query table directly
–cdc. fn_ cdc_ get_ all_ changes_< Capture instance >
–cdc.fn_cdc_get_net_changes_<capture_instance>
–To query

———————————————————-

–Let’s get familiar with the use of various functions and stored procedures and try some abnormal operations.

 

–Daily use scenario:
–1. Query the opened capture instances:
–Returns change capture configuration information for all tables
USE [GPOSDB]
GO
EXECUTE sys.sp_cdc_help_change_data_capture;
GO

 

–View which columns of an instance (i.e. table) have been captured and monitored
USE [GPOSDB]
GO
EXEC sys.sp_cdc_get_captured_columns
@capture_instance = ‘systempara’ — sysname

 

–You can also find configuration information below
SELECT * FROM msdb.dbo.cdc_jobs

 

 

–2. View the current configuration using SP_ cdc_ help_ jobs:
–As can be seen from the above, two jobs will be created automatically after enabling CDC. You can use the following statement to view them first:
sp_cdc_help_jobs

 

–For a large OLTP system, there will be a lot of data in the change table due to frequent data changes,
–If it is stored for too long (up to 100 years), it is a great challenge to the database space.
–At this time, you can adjust the CDC in the figure above AdventureWorks_ Retention in cleanup (unit: minutes).

–3. Modify configuration: SP_ cdc_ change_ job
–Display original configuration
EXEC sp_cdc_help_jobs
GO
–Change the data retention time to 100 minutes
EXECUTE sys.sp_cdc_change_job
    @job_type = N’cleanup’,
    @retention=100
GO

–Restart the job for the settings to take effect
–Deactivate job
EXEC sys.sp_cdc_stop_job N’cleanup’
GO
–Enable job
EXEC sys.sp_cdc_start_job N’cleanup’
GO
–View again
EXEC sp_cdc_help_jobs
GO
–You can see that the value of retention (in minutes) has changed to 100

–4. Stop / enable, delete / create jobs

–Deactivate job
EXEC sys.sp_cdc_stop_job N’cleanup’
GO
–Enable job
EXEC sys.sp_cdc_start_job N’cleanup’
GO

–Delete job
EXEC sys.sp_cdc_drop_job @job_type = N’cleanup’ — nvarchar(20)
GO
–View job
EXEC sys.sp_cdc_help_jobs
GO
–You can see that there is only one job left: CDC GPOSDB_ capture

–Create job
EXEC sys.sp_cdc_add_job
    @job_type = N’cleanup’,
    @start_job = 0,
    @retention = 5760

–View job
EXEC sys.sp_cdc_help_jobs
GO

 

–5. DDL change capture:
–In addition to capturing data changes, CDC can also capture changes in DDL operations.
–The premise is to ensure that the SQL server agent is enabled. In fact, the CDC function needs to ensure the normal operation of the SQL agent
–Because all operations are implemented through two jobs in the agent.
–Now let’s modify the systempara table to lengthen the length of paravalue
USE [GPOSDB]
GO
ALTER TABLE  [dbo].[SystemPara] ALTER COLUMN PARAVALUE VARCHAR(120) ;
GO

–Then query the DDL record table
SELECT  * FROM    cdc.ddl_history

–6. Use CDC’s functions to get changes
–A. Use [CDC] [fn_cdc_get_all_changes_DBO_SYSTEMPARA]
–The function report captures all currently available changes to the instance
DECLARE @from_lsn BINARY(10) ,
    @to_lsn BINARY(10)
SET @from_lsn = sys.fn_cdc_get_min_lsn(‘SYSTEMPARA’)
SET @to_lsn = sys.fn_cdc_get_max_lsn()
SELECT  *
FROM    cdc.fn_cdc_get_all_changes_DBO_SYSTEMPARA(@from_lsn, @to_lsn,N’all update old’);
GO

 

–B. Get change information for a time period:
–First, get the tracking change data according to the log sequence number (LSN)
–Sys. fn_ cdc_ map_ time_ to_ LSN obtains the maximum and minimum LSN values within the change range. have access to
Smallest greater than;
smallest greater than orequal;
largest less than;
largest less than or equal;

–For example, query the data inserted in a certain time period
INSERT INTO [dbo].[SystemPara]
        ( [ParaValue] ,
          [Name] ,
          [Description]
        )
Values (‘China ‘, — paravalue – varchar (50)
‘China’, — name – varchar (50)
‘China’ — Description – varchar (50)
        )

GO

–Check data
–1 delete
–2 insert
–3. 4 change
–Records that have been inserted can be queried even if they are deleted
DECLARE @bglsn VARBINARY(10)= sys.fn_cdc_map_time_to_lsn(‘smallest greater than or equal’,
                                                         ‘2013-10-21 12:00:00.997’)
DECLARE @edlsn VARBINARY(10)= sys.fn_cdc_map_time_to_lsn(‘largest less than or equal’,
                                                         GETDATE())
SELECT  *
FROM    [cdc].[DBO_SYSTEMPARA_CT]
WHERE   [__$operation] = 2
        AND [__$start_lsn] BETWEEN @bglsn AND @edlsn

 

–C、sys. fn_ cdc_ map_ lsn_ to_ Time query change time:
SELECT  [__$operation] ,
        CASE [__$operation]
When 1 then ‘Delete’
When 2 then ‘Insert’
When 3 then ‘update (the captured column value is the value before the update operation)
When 4 then ‘update (the captured column value is the value after the update operation)
End [type],
        sys. fn_ cdc_ map_ lsn_ to_ Time ([_ $start_lsn]),
        *
FROM    [cdc].[DBO_SYSTEMPARA_CT]

 

–D. Get LSN boundary
SELECT  sys. fn_ cdc_ get_ max_ LSN [maximum level of LSN],
        sys. fn_ cdc_ get_ min_ LSN (‘cdc. Dbo_systempara_ct ‘) [LSN of capture instance]

These two values can be used in the functions mentioned above for filtering data.

———————————————————-
–1. What is the purpose of CDC?

–CDC is a mechanism that greatly facilitates us to obtain the data update of a table. Through an independent process,
–Asynchronous reading of log files rather than triggers works. And its data will be persisted and saved to a system table.
–2. Is CDC a unique function of SQL Server 2008? Does it work for other databases or earlier versions?

–CDC is a unique feature of SQL Server 2008 and enterprise edition. The development version also has this function, but it is only used for testing.

–3. CDC reads logs. What happens if the logs are truncated?

–If a certain part of the log has not been read by the CDC process, this part will be ignored when truncating the log and cannot be truncated!!

–The capture process is a stand-alone process that starts as the agent service starts. The interval between two scans is 5 minutes.

–4. Will the data in the system table exist permanently– No, it will be kept for three days. There will be a cleaning job, which will be scanned at 2 pm every day.
–Finally, it is added that the CDC function depends on the agent service, because two operations are started through jobs.