Published on

Using Views for Backwards Compatible Data Migrations

Authors
  • avatar
    Name
    Danny Mican
    Twitter

Views are an important tool in legacy warehouse environments. Views act as an interface which decouples clients (like tableau, looker, applications, etc) from underlying data sources. Enabling backwards compatible changes is especially important in legacy environments, where the full scope of clients may be unknown. This article explains how views can be used to enable backwards compatible data migrations, which avoid the need to update any client queries.

Problem

Legacy warehouse environments for mid or large size companies may have 10's or even 100's of distinct clients. Clients can be:

  • Direct User SQL Access
  • Programatic - Automated Scripts, applications, or ETL jobs
  • Applications - BI Tools such as Tableau or Looker

These clients are often spread across the organization, with multiple teams each maintaining their own access patterns. This leads to an unknown number of clients. In my current role we don't have an inventory, or even the bandwidth to inventory all the database clients.

table clients

In this environment backwards compatible changes are essential, because table structure changes could break multiple different consumers.

How can changes be made to data pipelines without requiring clients to update their SQL queries?

Solution

Views enable swapping individual tables without updating clients. Views provide an interface over concrete tables. This means that the underlying tables can change but clients only communicate with the view. Since the view mirrors the current table, clients remain unaware of any underlying changes. In this case the view functions as an interface, which decouples the client from the underlying data source.

table clients using view

Approach

The following illustrates the view based migration approach using a fictional "widgets" pipeline. This pipeline includes an ETL job and a widgets table. The goal is to migrate the pipeline to a new ETL mechanism while maintaining backwards compatibility on the table structure.

  1. The legacy pipeline is functioning and materializing data to a table.
single pipeline
  1. The view is created, which mirrors the current tables structure.
single pipeline view
  1. The new Pipeline is setup to write to its own table.
multi pipeline view
  1. The view is updated to point at the new pipeline.
multi pipeline view switch
  1. The legacy pipeline is removed. The switchover is fully complete and the end clients did not need to be updated.
v1 removed

Conclusion

View based migrations are a safe and simple pattern. When views are used to decouple clients and underlying tables clients do not need to be updated when underlying table updates occur. This makes view-based migrations a powerful tool for legacy data environments where an unknown number of clients exist.