Carrying Governance Through a Platform Migration (Legacy SSIS and SQL to Snowflake)

Share
Carrying Governance Through a Platform Migration (Legacy SSIS and SQL to Snowflake)

As we transform our data ecosystem from on-premise SSIS and SQL Server stack to scalable cloud services , moving to Snowflake has become a key step in achieving the level of scalability, flexibility, and analytical power we need. This migration is not just about moving data or rewriting pipelines it represents a complete shift in how we manage, process, and govern our data.

Why Governance Matters During Migration

Many migration programs prioritize technical delivery—rebuilding SSIS packages, recreating SQL logic, loading data into Snowflake and validating reports. Governance activities are often deferred until after go-live. Unfortunately, this approach can lead to gaps in lineage, unclear accountability, and a platform that works technically but lacks trust.

Over the years, SSIS packages and SQL stored procedures have accumulated a significant amount of business logic, data quality rules, lineage, and operational workflows. Much of this logic is deeply embedded within legacy ETL processes, making it difficult to maintain, scale, or fully understand. For instance, while the source of budget rates is well understood, supplementary budget rates may originate from a separate SSIS process that loads data into a staging database. This creates a hidden dependency that is not readily visible to developers or support teams and,   in many cases, remains undocumented.

Data Governance Principles and Standards


The migration should be guided by key data governance principles, including data ownership, data quality, metadata management, business glossary standardization, lineage and traceability, security and privacy, regulatory compliance, stewardship, and continuous monitoring. These principles help ensure that data remains trusted, secure, and fit for business use throughout the transformation.
Governance requirements should align with industry-specific regulations and standards. For example, financial services may require compliance with IFRS and GDPR, while retail organizations focus on customer data protection, privacy, product master data, and omnichannel consistency.

A successful migration treats governance as a parallel workstream rather than a post-migration activity. For SSIS to Snowflake transformation, we need a maturity model—a structured framework where each phase is validated before we move forward, with data governance embedded throughout the entire migration lifecycle. Collibra provides the governance backbone that delivers visibility, accountability, and trust at every stage. The steps below outline how data governance can be leveraged across the full SSIS and SQL to Snowflake migration journey.

Step 1: Discovery & Assessment

Discovery means building a complete inventory—not just servers and instances, but every database, schema, stored procedure, function, view, SSIS package, SQL Agent job and dependencies between them. Mapping source to target systems(high level). Check on data /process ownership, historical loads and schedule analysis.  Assess the SSIS and SQL Server environment to identify where sensitive data lives, how access is managed, and which regulatory obligations apply, exposing gaps like hardcoded credentials, excessive access, missing encryption, and weak audit trails.

Stakeholder interviews are a critical part of this process. Data owners, stewards, business users, data engineers, and reporting teams are engaged to understand how data is used, which datasets are considered critical, where manual workarounds exist, and which reports or processes are highly dependent on specific data assets. These discussions often reveal undocumented business rules, hidden dependencies, and data quality issues that are not visible through technical analysis alone.

We use Collibra to establish a comprehensive inventory of our data assets, asset owners, business terms and system relationships, ensuring we have a clear view of what exists before any migration work begins. This level of visibility sets the foundation for an accurate and governed migration.

Step 2:  Lineage and Dependency Analysis

We need to understand how data actually moves. Which downstream system breaks if the table changes, which stored procedure calls a linked server, shared staging tables etc.

We need to check if there are any changes in data granularity occur as part of the migration from SSIS to Snowflake/PySpark.For example, in the legacy SSIS environment, a business attribute may have been calculated for each individual transaction using a For Each processing approach. In the target Snowflake/PySpark environment, the same metric may be calculated through batch processing and stored at a monthly aggregation level. While the business outcome may remain the same, the level of detail available to data consumers will change.In such cases we need to check if existing data quality rules, business glossary definitions, and responsibilities remain valid under the new granularity and update the same.

We use Collibra to capture and visualize end to end lineage across all relevant assets. This delivers key governance outcomes, including complete source to target lineage documentation, dependency mapping, identification of undocumented or hidden processes, and the ability to perform meaningful impact analysis.

Step 3: Business Rule and Data Quality Extraction

Rowlevel transformations, filtering logic, and data quality controls embedded within SSIS packages  need to be identified and documented during this phase. These governance artifacts are then carried forward and implemented in PySpark during the Design and Migration phase, ensuring that even as the technology stack changes, the underlying business rules, lineage, and compliance requirements remain intact, consistent, and fully traceable through Collibra.

Not all SSIS logic will be migrated directly to Snowflake or PySpark. During the business rule assessment phase, we need to evaluate whether existing transformations remain relevant to current business requirements. For example, a currency conversion rule introduced to accommodate a country currency change in 2021 may no longer be required if the target platform only retains two years of historical data , changes in the reclassification of financial asset definitions may require certain legacy rules to be updated, replaced, or retired rather than migrated as-is.

Step 4 : Establish Snowflake Governance Standards

In this phase, we use the findings from the previous stage to define how governance will operate in the target Snowflake environment. We establish the foundational standards and controls that Snowflake must adhere to, including naming conventions, data classifications, ownership and stewardship assignments, security and access patterns, and the lineage requirements that must be captured and maintained. By designing these governance elements upfront, we ensure that Snowflake is built with consistent structure, clear accountability, and traceable data flows from day one.

Step 5 : Design and Migration

Snowflake objects such as databases, schemas, tables, views, and materialized views are created according to the governance standards defined in the previous phase. Business rules, data quality checks, and security policies identified earlier are implemented within these objects and in PySpark pipelines. Ownership and stewardship are assigned to each Snowflake object. Create DQ metrics table for data validation.At the same time, Collibra is used to register each Snowflake object, link it to the corresponding business terms, owners, and stewards, and capture lineage back to the source systems. This ensures that the target platform inherits not just the data, but also the full governance context, including metadata, accountability, and compliance requirements.

Step 5 : Validation and Continuous Governance

We compare Snowflake results to ensure row counts match, business rules are correctly implemented, no data is missing or duplicated .Once migration is complete and Snowflake becomes the system of record, the focus shifts from one‑time validation to continuous governance. The DQ dashboard evolves into an operational scorecard.This phase ensures that we validate the Snowflake environment for accuracy, compliance, and full traceability, while embedding continuous governance into operations so that data quality, lineage, ownership, and security controls remain actively enforced and managed through Collibra.

Conclusion

By embedding governance throughout the migration, we ensure that Snowflake carries forward rationalized business rules, preserved data quality, clear ownership, and fully traceable lineage, with Collibra centralizing metadata, lineage, and accountability across the platform.

Data Governance is not a choice ;its a required discipline for accuracy , compliance and traceability