- Danny Mican
Views are an important tool in legacy warehouse environments. Views act as an interface which decouples clients (like tableau, looker, applications, etc) from underlying data sources. Enabling backwards compatible changes is especially important in legacy environments, where the full scope of clients may be unknown. This article explains how views can be used to enable backwards compatible data migrations, which avoid the need to update any client queries.
Legacy warehouse environments for mid or large size companies may have 10's or even 100's of distinct clients. Clients can be:
- Direct User SQL Access
- Programatic - Automated Scripts, applications, or ETL jobs
- Applications - BI Tools such as Tableau or Looker
These clients are often spread across the organization, with multiple teams each maintaining their own access patterns. This leads to an unknown number of clients. In my current role we don't have an inventory, or even the bandwidth to inventory all the database clients.
In this environment backwards compatible changes are essential, because table structure changes could break multiple different consumers.
How can changes be made to data pipelines without requiring clients to update their SQL queries?
Views enable swapping individual tables without updating clients. Views provide an interface over concrete tables. This means that the underlying tables can change but clients only communicate with the view. Since the view mirrors the current table, clients remain unaware of any underlying changes. In this case the view functions as an interface, which decouples the client from the underlying data source.
The following illustrates the view based migration approach using a fictional "widgets" pipeline. This pipeline includes an ETL job and a widgets table. The goal is to migrate the pipeline to a new ETL mechanism while maintaining backwards compatibility on the table structure.
- The legacy pipeline is functioning and materializing data to a table.
- The view is created, which mirrors the current tables structure.
- The new Pipeline is setup to write to its own table.
- The view is updated to point at the new pipeline.
- The legacy pipeline is removed. The switchover is fully complete and the end clients did not need to be updated.
View based migrations are a safe and simple pattern. When views are used to decouple clients and underlying tables clients do not need to be updated when underlying table updates occur. This makes view-based migrations a powerful tool for legacy data environments where an unknown number of clients exist.