Reconciling Your Differences - Part 1 - Data Reconciliation Framework

The Problem

Some BI teams are looking for ways to schedule data tests, track success rates, and stay flexible about source data (specifically, multiple SQL Server instances). I've used tSQLt with success, but I've now come to a scenario where I need to test across multiple servers, queries, and result sets. I also need a central place to store test results and metadata as the data moves and changes over time.

Test Driven Development is another area where this type of need comes into play. Tests developed during the business analysis phase can be applied to multiple environments throughout the development lifecycle, and teams can ensure that code changes do not have unexpected results.

I started this project with the following requirements:

  1. Run one or many queries that should produce the same result across different databases and/or servers
  2. Easily add/update tests
  3. Not be limited by the number of data sets; for example, running the same query against DEV, QA, UAT, and PROD environments
  4. Simple validation of “do results match”
  5. Schedule test executions
  6. Provide an audit trail of what has happened
  7. Report on results

Data Reconciliation Framework

Based on the requirements above, I created the following Data Reconciliation Framework.  The diagram below shows the overall conceptual model for the framework.

Data Model

  1. Test Definition - The overall goal of the test
  2. Query Definition - The tSQL to be executed during a test
  3. Connection - The connection string for each environment being tested
  4. Mapping - The relationship between a query definition and the connections it should be executed on during a test
  5. Execution Results - The values returned for a specific query and connection
  6. Reconciliation - Aggregate analysis of the results to determine test pass/fail
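
To make the model concrete, here is a minimal T-SQL sketch of how these entities could land as tables. Every table and column name below is illustrative rather than the final schema (Part 2 will go deeper on the actual data model):

```sql
-- Minimal sketch of the framework tables (illustrative names)
CREATE TABLE dbo.TestDefinition (
    TestDefinitionId  INT IDENTITY(1,1) PRIMARY KEY,
    TestName          NVARCHAR(100) NOT NULL,
    TestDescription   NVARCHAR(500) NULL
);

CREATE TABLE dbo.QueryDefinition (
    QueryDefinitionId INT IDENTITY(1,1) PRIMARY KEY,
    TestDefinitionId  INT NOT NULL
        REFERENCES dbo.TestDefinition (TestDefinitionId),
    QueryText         NVARCHAR(MAX) NOT NULL   -- the tSQL run during a test
);

CREATE TABLE dbo.Connection (
    ConnectionId      INT IDENTITY(1,1) PRIMARY KEY,
    ConnectionName    NVARCHAR(100) NOT NULL,  -- e.g. DEV, QA, UAT, PROD
    ConnectionString  NVARCHAR(500) NOT NULL
);

-- M:M: which connections each query definition runs against
CREATE TABLE dbo.Mapping (
    MappingId         INT IDENTITY(1,1) PRIMARY KEY,
    QueryDefinitionId INT NOT NULL
        REFERENCES dbo.QueryDefinition (QueryDefinitionId),
    ConnectionId      INT NOT NULL
        REFERENCES dbo.Connection (ConnectionId)
);

CREATE TABLE dbo.ExecutionResult (
    ExecutionResultId INT IDENTITY(1,1) PRIMARY KEY,
    MappingId         INT NOT NULL
        REFERENCES dbo.Mapping (MappingId),
    ExecutedAt        DATETIME2 NOT NULL DEFAULT SYSDATETIME(),
    ResultValue       NVARCHAR(255) NULL  -- a single scalar, e.g. a row count or checksum
);
```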

Execution Engine - Dynamic SSIS Package

The solution will use SSIS to dynamically execute multiple query definitions based on the metadata configuration. The package will then record the execution results for further analysis.
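
Conceptually, the package can be fed by a metadata query like the one below (a sketch against the illustrative tables above). A ForEach loop would iterate these rows, point a connection manager at each ConnectionString through an expression, run QueryText with an Execute SQL Task, and land the returned value in the execution results table.

```sql
-- Sketch: metadata query feeding the package's ForEach loop.
-- One row per query/connection pair to be executed.
SELECT m.MappingId,
       q.QueryText,
       c.ConnectionString
FROM dbo.Mapping AS m
JOIN dbo.QueryDefinition AS q
    ON q.QueryDefinitionId = m.QueryDefinitionId
JOIN dbo.Connection AS c
    ON c.ConnectionId = m.ConnectionId;
```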

Reporting

The solution will use SQL Server Reporting Services (SSRS) to support both on-demand reporting and user subscriptions.  This will allow for standard analysis and the ability to alert users of issues when they are found.
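
As a sketch of what such a report might sit on, the reconciliation step boils down to a grouping query over the execution results: a test passes when every connection returned the same value. (Names are the illustrative ones from above; a real report would also filter to the most recent execution of each test.)

```sql
-- Sketch: pass/fail reconciliation across connections.
-- A test passes when all of its connections returned one distinct value.
SELECT t.TestName,
       CASE WHEN COUNT(DISTINCT r.ResultValue) = 1
            THEN 'PASS'
            ELSE 'FAIL'
       END AS TestOutcome
FROM dbo.ExecutionResult AS r
JOIN dbo.Mapping AS m
    ON m.MappingId = r.MappingId
JOIN dbo.QueryDefinition AS q
    ON q.QueryDefinitionId = m.QueryDefinitionId
JOIN dbo.TestDefinition AS t
    ON t.TestDefinitionId = q.TestDefinitionId
GROUP BY t.TestName;
```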

Thoughts & Concerns

  1. Storing and executing queries against environments like this may be concerning.  Yep! It's certainly something to be addressed: controls need to be in place to ensure the queries themselves are safe and don't cause unnecessary locking or blocking.
  2. M:M Mapping table - loading identity-based columns manually isn't friendly.  Stored procedures as setup scripts can help (a sketch follows below), but that's about as good as it will get.  Let's face it, I don't see myself building a GUI for this; it's been a long time since I've written any kind of application code.
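
A setup procedure along these lines (again using the hypothetical names from the data model sketch) sidesteps the identity problem by resolving rows through their natural keys instead of hardcoded surrogate values:

```sql
-- Illustrative setup script: map every query in a test to a
-- connection by name, so no identity values are hardcoded.
CREATE PROCEDURE dbo.AddMapping
    @TestName       NVARCHAR(100),
    @ConnectionName NVARCHAR(100)
AS
BEGIN
    INSERT INTO dbo.Mapping (QueryDefinitionId, ConnectionId)
    SELECT q.QueryDefinitionId, c.ConnectionId
    FROM dbo.QueryDefinition AS q
    JOIN dbo.TestDefinition AS t
        ON t.TestDefinitionId = q.TestDefinitionId
    CROSS JOIN dbo.Connection AS c
    WHERE t.TestName = @TestName
      AND c.ConnectionName = @ConnectionName;
END;
GO

-- Usage: EXEC dbo.AddMapping @TestName = N'Row count check', @ConnectionName = N'PROD';
```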

More to Come

Part 2 of this series will go into detail on how the data model is loaded and consumed by the framework.