Published on

Stay on Top of Your ETL Pipelines With Table Freshness Checks

Authors
  • avatar
    Name
    Danny Mican
    Twitter

Freshness checks are an easy way to see if your data is out of date. A freshness check queries a table for the MAX created time, and emits the difference between the current time and the record time. Freshness checks are extremely easy to implement and provide a high confidence metric for alerts. For example, if a table that should be refreshed hourly goes a day without being refreshed that there is a high likely hood of issues in your ETL pipeline. The rest of the post briefly illustrates what freshness checks look like and how they indicate issues.

The following image shows an hourly ETL job. The upper left hand graph shows the freshness probe for the Table populated by the ETL job. The Y axis units are "hours". During normal operations the table averages ~30 minutes out of date. A problem with the ETL job occurs causing the table freshness to grow out of date:

issue_occurs

Once the issue is resolved, and the job resumes successful execution, the freshness of the table returns to the expected ~30 minutes:

issue_resolved

I find it super helpful to have a "catch-all" alert that triggers when any table goes a day or two out of date. I created db-insights to make freshness checks easy. db-insights can perform freshness checks against snowflake and emits datadog metrics and logs.