A Lightweight and Reliable Method for
Managing Incremental Loads to your Data Warehouse
Available in all versions of SQL Server starting with 2008R2, SQL Server Change Tracking (CT) is an excellent option for tracking changes in your source SQL Server tables. CT is easy to implement, has a low impact on server performance, and is highly reliable.
What is so great about incremental loads? Why not just reload all the rows each time?
Incremental loads to your data warehouse are a necessity for two main reasons:
- If you have huge tables in your SQL Server environment, ETL performance improves by processing only those rows that have changed.
- Allows a level of historical tracking for slowly changing dimensions, etc
Why not use a date column or hashing to identify changes?
Using a modified date is unreliable when either the application or a DBA updates the data without updating the modified date. We’d like to say that would never happen, but we all know it does. Hashing provides a maintenance headache when adding columns from the source table or removing, and we must remember to update the HASH to include/remove those columns.
So how is SQL Server Change Tracking implemented?
You must first use the ALTER DATABASE command to enable CT on the source database:
ALTER DATABASE CT_DEMO SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 2 Days, AUTO_CLEANUP = ON)
Then use ALTER TABLE command to enable CT on each source table:
ALTER TABLE dbo.SourceTable1 ENABLE CHANGE_TRACKING WITH (TRACK_COLUMNS_UPDATED = OFF)
ALTER TABLE dbo.SourceTable2 ENABLE CHANGE_TRACKING WITH (TRACK_COLUMNS_UPDATED = OFF)
Enabling CT on a table creates another hidden internal table that stores:
- the primary key of each modified row (NOTE: CT only works on tables with a primary key)
- the operation performed (I)nsert, (U)pdate, (D)elete
- the version number for the change
That sounds easy enough, but how do you access the changes from those built-in tables?
Microsoft provides built-in SQL functions CHANGETABLE and CHANGE_TRACKING_CURRENT_VERSION to help us access the internal CT tables and identify the rows that have changed. These functions are used in T-SQL SELECT queries to extract the data from your source tables.
Simple t-SQL MERGE query can be used to load the changes from the landing table into the destination table. MERGE allows you to handle Inserts, Updates, and Deletes against the destination table in whatever manner you require.
You’re convinced but would still like help to implement SQL Server Change Tracking in your data warehouse environment?
We’d be more than happy to set up a meeting or have a conversation to discuss your needs and explain how we might be able to help. Please send an email to firstname.lastname@example.org and we’ll get in touch.