Hot, warm, cold. Three tiers, one policy, and suddenly your storage bill makes sense.
Data has a shelf life. Last week’s transactions are referenced daily. Last quarter’s reports get pulled monthly. Last year’s logs sit untouched until an audit. Treating all three the same (same performance tier, same replication, same cost) ignores how data actually gets used.
Hot: accessed daily or weekly. Active dashboards, recent transactions, real-time feeds. This is usually 10-20% of your total data.
Warm: accessed occasionally. Monthly reports, historical comparisons, audit trails. Needs to be reachable, not instant.
Cold: accessed rarely or never. Compliance archives, old logs, historical snapshots. You keep it because you have to.
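To make the three definitions concrete, here is a minimal sketch that assigns a tier from a last-access timestamp. The 7-day and 90-day cutoffs and the names `HOT_WINDOW`, `WARM_WINDOW`, and `classify_tier` are my assumptions for illustration, not fixed rules; set the windows from your own access patterns.

```python
from datetime import datetime, timedelta

# Assumed cutoffs for illustration only; tune them to your own access patterns.
HOT_WINDOW = timedelta(days=7)    # "accessed daily or weekly"
WARM_WINDOW = timedelta(days=90)  # "accessed occasionally"


def classify_tier(last_accessed, now):
    """Return 'hot', 'warm', or 'cold' based on time since last access.

    last_accessed may be None for data that has never been read.
    """
    if last_accessed is None:
        return "cold"
    age = now - last_accessed
    if age <= HOT_WINDOW:
        return "hot"
    if age <= WARM_WINDOW:
        return "warm"
    return "cold"


if __name__ == "__main__":
    now = datetime(2024, 6, 1)
    print(classify_tier(now - timedelta(days=2), now))
    print(classify_tier(now - timedelta(days=30), now))
    print(classify_tier(now - timedelta(days=400), now))
```

Passing `now` in explicitly, rather than calling `datetime.now()` inside the function, keeps the rule easy to test and to replay against historical snapshots.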
I’ve changed my thinking on this. I used to treat tiering as a cost exercise. It’s more useful as a lifecycle question: what stage is this data in, and what does that stage actually require? Cost savings follow naturally (30-50% in most cases), but the bigger win is clarity. You stop maintaining hot-tier performance for data nobody’s touched in six months.
Start with one query: when was each table last accessed?
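How you get that answer depends on your platform, so treat the following as a rough sketch rather than a recipe. It assumes you can export access events as (table name, timestamp) pairs from whatever query or audit log you have; the function names and the sample table names are hypothetical.

```python
from datetime import datetime


def last_access_by_table(events):
    """Map each table name to the most recent access timestamp seen.

    events: iterable of (table_name, accessed_at) pairs, in any order.
    """
    latest = {}
    for table, accessed_at in events:
        if table not in latest or accessed_at > latest[table]:
            latest[table] = accessed_at
    return latest


def stalest_first(latest):
    """Return (table, last_access) pairs, least recently accessed first."""
    return sorted(latest.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical sample data; replace with rows from your own access log.
    events = [
        ("transactions", datetime(2024, 5, 30)),
        ("quarterly_reports", datetime(2024, 4, 2)),
        ("transactions", datetime(2024, 5, 31)),
        ("audit_logs_2022", datetime(2023, 1, 15)),
    ]
    for table, seen in stalest_first(last_access_by_table(events)):
        print(table, seen.date())
```

Tables at the top of that list are the first candidates to move out of the hot tier. Tables that never appear in the log at all need a separate check against your catalog.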
Does your platform have a data lifecycle policy, or is everything kept forever by default?
