Most cross-team data fights are caused by an undocumented assumption upstream.

Backend ships a schema change. Analytics dashboard breaks. Marketing’s attribution numbers shift by 12% overnight. Three Slack threads, two postmortems, and one all-hands later, nothing structural changes. The next deploy will do it again.

What the next deploy needs is a data contract, because more code review won’t catch an undocumented assumption.

A data contract is a YAML file (or JSON, your choice) sitting in the producing repo. It defines:

  • The schema (columns, types, nullability)
  • Quality expectations (uniqueness, freshness, distribution bounds)
  • The SLA (how often it ships, when it can break)
  • The owner (one person)
  • The consumers (who’ll be affected when it changes)
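Concretely, a contract covering those five sections might look like this. This is a sketch, not a standard — the field names and layout are illustrative assumptions, and real tools (or your own CI script) will dictate the exact shape:

```yaml
# contracts/orders.yaml — illustrative layout, not a standard schema
version: 1.2.0
owner: jane.doe@example.com          # one person, not a team alias
schema:
  columns:
    order_id:   {type: string, nullable: false, unique: true}
    user_id:    {type: string, nullable: false}
    amount_usd: {type: "decimal(12,2)", nullable: false}
    created_at: {type: timestamp, nullable: false}
quality:
  freshness: "created_at within 2h of load time"
  bounds: "amount_usd between 0 and 100000"
sla:
  cadence: hourly
  breaking_change_notice: 14d        # minimum migration window
consumers:
  - analytics-dashboard
  - marketing-attribution
```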

When backend wants to change the schema, the contract requires a deprecation notice and a migration window. The downstream team gets notified. The breaking change becomes a coordinated rollout instead of a surprise.

Most teams skip contracts because “it’s overhead.” Then they spend three hours every week reverse-engineering what upstream changed.

The math: a contract takes 30 minutes to write and 5 minutes to update. The arguments it prevents cost hours per incident. Break-even is one incident.

Start with your highest-traffic table. Write the contract. Add a CI check that fails the PR if the schema drifts without a version bump. Iterate.
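The CI check itself can be small. Below is a minimal sketch in Python, assuming the contract is stored as JSON (the document’s “or JSON” option — swap in a YAML parser if you store YAML) and that CI can hand you the contract from the base branch and from the PR head. The function and field names are illustrative, not a standard:

```python
# Sketch of a CI gate: fail the PR if the contract's schema section
# changed but the contract version did not. Field names ("schema",
# "version") match the illustrative contract layout, not a standard.

def schema_drifted(base: dict, head: dict) -> bool:
    # Any change to columns, types, or nullability counts as drift.
    return base.get("schema") != head.get("schema")


def check_contract(base: dict, head: dict) -> list[str]:
    """Return a list of violations; empty list means the PR may merge."""
    errors = []
    if schema_drifted(base, head) and base.get("version") == head.get("version"):
        errors.append(
            "schema changed but contract version was not bumped "
            f"(still {base.get('version')!r})"
        )
    return errors


# Example: a type change without a version bump is caught; the same
# change with a bump passes.
base = {"version": "1.2.0", "schema": {"user_id": "string"}}
drift = {"version": "1.2.0", "schema": {"user_id": "int"}}
bumped = {"version": "1.3.0", "schema": {"user_id": "int"}}

print(check_contract(base, drift))   # one violation
print(check_contract(base, bumped))  # []
```

In a real pipeline you would load the two contract files (base branch vs. PR head), call `check_contract`, print the violations, and exit non-zero if the list is non-empty.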

Does your most critical dataset have a written contract, or a tribal one?