The Short Version
MLOps (Machine Learning Operations) is the set of practices for deploying and maintaining machine learning models in production reliably.
The problem it solves: Models that work in notebooks don’t automatically work in production.
Data scientists build models. MLOps gets those models running reliably, at scale, with monitoring and maintenance. Without MLOps, you have impressive demos that never deliver business value.
Think of it like DevOps for machine learning - but harder, because ML systems have additional complexity that traditional software doesn’t.
Why MLOps Matters
The Deployment Gap
Most ML projects fail to reach production. Common estimates suggest only 10-20% of models ever get deployed.
The reasons aren’t usually about model quality. They’re about:
- Can’t reproduce the training environment
- Can’t serve predictions at required latency
- No monitoring when model degrades
- Can’t update models without breaking things
- No governance or approval process
Building a model is maybe 20% of the work. Operating it is the other 80%.
ML Systems Are Different
Traditional software:
- Deterministic behavior
- Code defines functionality
- Testing is relatively straightforward
- Debugging follows clear paths
ML systems:
- Probabilistic behavior
- Data + code + model define functionality
- Testing is genuinely hard
- Debugging involves data, features, and model interactions
These differences mean DevOps practices aren’t sufficient. You need ML-specific operational practices.
Core MLOps Concepts
Model Versioning
Models change over time. You need to track:
- Which model version is deployed
- What data it was trained on
- What parameters were used
- What performance it achieved
This enables:
- Rollback if new models underperform
- Reproduction of results
- Audit trails for compliance
- Comparison across versions
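To make this concrete, here is a minimal sketch of version tracking with MLflow (one common choice). The experiment name, dataset, and model are illustrative; the point is that every training run records its parameters, a fingerprint of the data, its metrics, and a registered model version you can later compare against or roll back to.

```python
# Minimal sketch of model versioning with MLflow (names and paths are illustrative).
import hashlib

import mlflow
import mlflow.sklearn
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

mlflow.set_experiment("churn-model")  # hypothetical experiment name

data = pd.read_csv("training_data.csv")  # hypothetical training snapshot
X, y = data.drop(columns=["churned"]), data["churned"]
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=42)

params = {"n_estimators": 200, "max_depth": 8}

with mlflow.start_run():
    # Record which data the model was trained on (here, a hash of the snapshot).
    mlflow.log_param("data_hash", hashlib.sha256(data.to_csv().encode()).hexdigest()[:12])
    mlflow.log_params(params)

    model = RandomForestClassifier(**params, random_state=42).fit(X_train, y_train)
    mlflow.log_metric("val_auc", roc_auc_score(y_val, model.predict_proba(X_val)[:, 1]))

    # Registering the model creates a new version you can deploy, compare, or roll back to.
    mlflow.sklearn.log_model(model, "model", registered_model_name="churn-model")
```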
Feature Engineering
Features are the inputs to models - transformed, aggregated, derived data.
Challenges:
- Features developed in notebooks don’t translate to production
- Training and serving features can diverge (training-serving skew)
- Feature computation is often duplicated across teams
- Historical features are hard to reproduce
Feature stores address this by:
- Centralizing feature definitions
- Ensuring consistency between training and serving
- Enabling feature reuse across projects
- Maintaining point-in-time correctness
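Point-in-time correctness is the least obvious of these, so here is a minimal sketch of the idea using a pandas as-of join: for each training label, use only feature values that were known at or before the label's timestamp. Column names are illustrative; a real feature store does this bookkeeping for you.

```python
# Point-in-time correct join: each label row gets the latest feature value
# available at its event_time, never a future value. Columns are illustrative.
import pandas as pd

labels = pd.DataFrame({
    "user_id": [1, 1, 2],
    "event_time": pd.to_datetime(["2024-03-01", "2024-03-15", "2024-03-10"]),
    "churned": [0, 1, 0],
})

features = pd.DataFrame({
    "user_id": [1, 1, 2],
    "feature_time": pd.to_datetime(["2024-02-20", "2024-03-10", "2024-03-05"]),
    "orders_30d": [4, 1, 7],
})

# merge_asof picks the most recent feature row at or before each event_time,
# which prevents leaking future information into the training set.
training_set = pd.merge_asof(
    labels.sort_values("event_time"),
    features.sort_values("feature_time"),
    left_on="event_time",
    right_on="feature_time",
    by="user_id",
    direction="backward",
)
print(training_set[["user_id", "event_time", "orders_30d", "churned"]])
```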
Model Serving
Getting predictions from models. Options include:
Batch inference:
- Run model on dataset periodically
- Store predictions for lookup
- Good for: Recommendations, scoring, reports
Online inference:
- Predictions on individual requests in real-time
- Low latency requirements
- Good for: Search ranking, fraud detection, personalization
Embedded:
- Model runs in application code
- No separate serving infrastructure
- Good for: Edge devices, latency-critical applications
Each pattern has different infrastructure requirements.
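As an illustration of the online pattern, here is a minimal sketch of a prediction endpoint using FastAPI and a pre-trained scikit-learn model. The model path and feature names are assumptions; a production setup would add input validation, health checks, and autoscaling.

```python
# Minimal online-inference sketch with FastAPI; the model path and feature
# names are illustrative, not a production-ready serving setup.
import joblib
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
model = joblib.load("model.joblib")  # hypothetical pre-trained model artifact


class PredictionRequest(BaseModel):
    orders_30d: float
    days_since_last_login: float


@app.post("/predict")
def predict(request: PredictionRequest) -> dict:
    # Feature order must match the order used at training time.
    features = [[request.orders_30d, request.days_since_last_login]]
    score = float(model.predict_proba(features)[0][1])
    return {"churn_probability": score}

# Run with: uvicorn serve:app   (assuming this file is named serve.py)
```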
Model Monitoring
Models degrade. Monitoring catches problems:
Data drift:
- Input data distribution changes from training
- Example: User behavior shifts after COVID
Concept drift:
- Relationship between inputs and outputs changes
- Example: Economic conditions change what predicts loan default
Model performance:
- Accuracy, precision, recall over time
- Business metrics tied to model predictions
Infrastructure:
- Latency, throughput, errors
- Resource utilization
Without monitoring, you won’t know your model is failing until business impact becomes obvious.
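A drift check doesn't have to be elaborate to be useful. Here is a minimal sketch that compares a feature's recent production values against its training distribution using a two-sample Kolmogorov-Smirnov test; the data and threshold are illustrative, and dedicated monitoring tools wrap this kind of check with dashboards and alerting.

```python
# Sketch of a simple data-drift check: compare a feature's recent production
# distribution against its training distribution. Data and threshold are illustrative.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
training_values = rng.normal(loc=50, scale=10, size=5000)    # stand-in for training data
production_values = rng.normal(loc=58, scale=10, size=5000)  # stand-in for recent requests

result = ks_2samp(training_values, production_values)

if result.pvalue < 0.01:
    # In a real pipeline this would raise an alert or trigger investigation/retraining.
    print(f"Drift detected: KS statistic={result.statistic:.3f}, p-value={result.pvalue:.2e}")
else:
    print("No significant drift detected")
```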
Model Retraining
Models need updates:
- Scheduled: Retrain on a fixed cadence (weekly, monthly) regardless of drift
- Triggered: Retrain when drift exceeds threshold
- Continuous: Ongoing learning from new data
Retraining pipelines need to be:
- Automated (not manual notebook runs)
- Tested (new model validated before deployment)
- Governed (approval before production)
- Reversible (rollback if problems occur)
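Here is a tool-agnostic sketch of the promotion gate that makes a retraining pipeline governed and reversible: a candidate model only replaces the current one if it clears an absolute quality bar and does not regress. The thresholds and metric are illustrative.

```python
# Sketch of the promotion gate in a triggered retraining pipeline.
# Thresholds and the choice of AUC as the metric are illustrative.
MIN_AUC = 0.80          # minimum quality bar for any deployed model
MIN_IMPROVEMENT = 0.0   # candidate must be at least as good as the current model


def should_promote(candidate_auc: float, current_auc: float) -> bool:
    """Decide whether a freshly retrained model should replace the current one."""
    if candidate_auc < MIN_AUC:
        return False                    # fails the absolute quality bar
    if candidate_auc - current_auc < MIN_IMPROVEMENT:
        return False                    # no better than what's already deployed
    return True


# Example: drift triggered a retrain; the candidate scored 0.84, the current model 0.81.
if should_promote(candidate_auc=0.84, current_auc=0.81):
    print("Promote candidate (register a new version, keep the old one for rollback)")
else:
    print("Reject candidate; current model stays in place")
```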
MLOps Maturity Levels
Level 0: Manual
- Data scientists develop models in notebooks
- Deployment is manual, ad-hoc
- No automation, no monitoring
- Works for: Exploration, prototyping
Level 1: ML Pipeline Automation
- Automated training pipelines
- Consistent, reproducible training
- Some monitoring
- Works for: Stable models with infrequent updates
Level 2: CI/CD for ML
- Automated testing of data, models, and code
- Continuous training with new data
- Automated deployment with validation
- Full monitoring and alerting
- Works for: Production-critical models at scale
Most organizations are at Level 0 or early Level 1. Level 2 requires significant investment.
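To make Level 2's "automated testing of data, models, and code" concrete: the tests can be ordinary pytest checks that CI runs before a candidate model is allowed to deploy. The paths, schema, and threshold below are illustrative.

```python
# Illustrative pytest checks a CI pipeline might run before deploying a model.
# The artifact paths, columns, and threshold are assumptions for the sketch.
import joblib
import pandas as pd
from sklearn.metrics import roc_auc_score

MODEL_PATH = "artifacts/candidate_model.joblib"
VALIDATION_DATA = "artifacts/validation.csv"


def test_model_meets_quality_bar():
    model = joblib.load(MODEL_PATH)
    data = pd.read_csv(VALIDATION_DATA)
    X, y = data.drop(columns=["churned"]), data["churned"]
    auc = roc_auc_score(y, model.predict_proba(X)[:, 1])
    assert auc >= 0.80, f"Candidate AUC {auc:.3f} below deployment threshold"


def test_model_handles_expected_schema():
    model = joblib.load(MODEL_PATH)
    row = pd.DataFrame([{"orders_30d": 3.0, "days_since_last_login": 12.0}])
    assert model.predict(row).shape == (1,)
```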
Common MLOps Challenges
Training-Serving Skew
The model behaves differently in production than in training.
Causes:
- Different feature computation code
- Different data preprocessing
- Missing features in production
- Timing differences in feature availability
Solutions:
- Feature stores that serve training and production
- Shared feature computation code
- Monitoring for skew detection
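The most direct fix is to have exactly one implementation of each feature. A minimal sketch: both the training pipeline and the serving endpoint import the same function, so there is nothing to diverge. Field names here are illustrative.

```python
# features.py - single source of truth for feature computation, imported by
# both the training pipeline and the serving code. Field names are illustrative.
from datetime import datetime


def compute_features(user_record: dict, as_of: datetime) -> dict:
    """Derive model inputs from a raw user record, identically in training and serving."""
    return {
        "days_since_last_login": (as_of - user_record["last_login"]).days,
        "orders_30d": user_record["orders_30d"],
        "is_new_user": int((as_of - user_record["signup_date"]).days < 30),
    }


# The training pipeline and the serving endpoint both call the same function:
#   X_train = [compute_features(r, r["label_time"]) for r in historical_records]
#   features = compute_features(live_record, datetime.utcnow())
```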
Reproducibility
Can’t recreate the model that’s in production.
Causes:
- Notebooks don’t capture environment
- Random seeds not fixed
- Data changed since training
- Dependencies not pinned
Solutions:
- Version control for code, data, and models
- Containerized training environments
- Data versioning or snapshots
- Experiment tracking tools (MLflow, Weights & Biases)
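Here is a minimal sketch of the mechanical side of reproducibility: fix the seeds, fingerprint the training data, and record the environment. Experiment tracking tools capture most of this automatically; the file names are illustrative.

```python
# Sketch of capturing the minimum needed to reproduce a training run:
# fixed seeds, a data fingerprint, and the exact package versions.
import hashlib
import json
import random
import sys
from importlib.metadata import version

import numpy as np

SEED = 42
random.seed(SEED)
np.random.seed(SEED)

with open("training_data.csv", "rb") as f:          # hypothetical training snapshot
    data_hash = hashlib.sha256(f.read()).hexdigest()

run_record = {
    "seed": SEED,
    "data_sha256": data_hash,
    "python": sys.version,
    "packages": {pkg: version(pkg) for pkg in ["numpy", "pandas", "scikit-learn"]},
}

# In practice this record would go to an experiment tracker alongside the model.
with open("run_record.json", "w") as f:
    json.dump(run_record, f, indent=2)
```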
Data Dependencies
Data problems break ML systems:
- Upstream data changes without notice
- Data quality degrades
- Data arrives late or not at all
- Schema changes break feature computation
MLOps requires tight integration with data architecture and data quality practices.
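A lightweight validation step at the start of the pipeline catches many of these problems before they reach feature computation. Here is a minimal sketch; the expected columns, dtypes, and thresholds are illustrative.

```python
# Lightweight checks on incoming data before feature computation runs.
# Expected columns, dtypes, and thresholds are illustrative.
import pandas as pd

EXPECTED_COLUMNS = {"user_id": "int64", "orders_30d": "int64", "last_login": "object"}
MAX_NULL_FRACTION = 0.05


def validate_input(df: pd.DataFrame) -> list[str]:
    problems = []
    for column, dtype in EXPECTED_COLUMNS.items():
        if column not in df.columns:
            problems.append(f"missing column: {column}")
        elif str(df[column].dtype) != dtype:
            problems.append(f"unexpected dtype for {column}: {df[column].dtype}")
        elif df[column].isna().mean() > MAX_NULL_FRACTION:
            problems.append(f"too many nulls in {column}")
    if len(df) == 0:
        problems.append("no rows received")
    return problems


df = pd.read_csv("daily_extract.csv")  # hypothetical upstream extract
issues = validate_input(df)
if issues:
    raise ValueError(f"Upstream data failed validation: {issues}")
```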
Organizational Challenges
- Data scientists who don’t want to do operations
- Engineers who don’t understand ML
- Nobody owning end-to-end model lifecycle
- Unclear handoffs between teams
Structure matters. Models need owners who care about production performance, not just training accuracy.
MLOps Infrastructure
Essential Components
Experiment tracking: Track parameters, metrics, and artifacts from training runs. Tools: MLflow, Weights & Biases, Neptune
Model registry: Store, version, and manage models. Tools: MLflow, cloud-native registries
Feature store: Centralized feature management. Tools: Feast, Tecton, cloud-native options
Orchestration: Coordinate training and deployment pipelines. Tools: Airflow, Kubeflow Pipelines, Prefect
Serving: Deploy and run models in production. Tools: Seldon, KServe, cloud-native endpoints
Monitoring: Track model and data drift, performance. Tools: Evidently, WhyLabs, custom solutions
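These components connect. For example, serving code can pull whatever model version the registry currently marks for production instead of hard-coding a file path, so promotion and rollback don't require redeploying the service. A minimal MLflow sketch, where the model name and stage label are illustrative:

```python
# Sketch of serving code loading the current production model from a registry
# (MLflow here; the model name and stage label are illustrative).
import mlflow.pyfunc
import pandas as pd

# Loads whichever version the registry currently marks as Production.
model = mlflow.pyfunc.load_model("models:/churn-model/Production")

# Illustrative input matching the hypothetical model's feature schema.
batch = pd.DataFrame([{"orders_30d": 3, "days_since_last_login": 12}])
predictions = model.predict(batch)
```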
Build vs Buy
Most teams shouldn’t build MLOps infrastructure from scratch.
Use managed services when:
- Speed matters
- Team is small
- Use cases are standard
- Budget available
Build custom when:
- Specific requirements not met by tools
- Scale justifies investment
- In-house expertise available
- Vendor lock-in concerns
The ecosystem is maturing rapidly. What required custom builds two years ago may have managed options now.
Getting Started with MLOps
If You Have No MLOps
- Start with experiment tracking (it’s the foundation)
- Add model versioning and registry
- Automate training pipelines
- Add basic monitoring
- Expand incrementally
Don’t try to build everything at once.
If You Have Basic MLOps
- Identify pain points (what breaks most often?)
- Add feature store if feature engineering is a bottleneck
- Improve monitoring and alerting
- Automate more of the deployment process
- Build governance processes
If ML Is Business Critical
- Audit your current state against best practices
- Invest in reliability and governance
- Build organizational capability, not just tools
- Plan for scale
Related Reading
AI and Data Architecture
AI Governance
Data Foundations
Related Topics
- Building Data Teams - Hiring for ML capabilities
- Data Platform Scaling - Infrastructure for ML at scale
- What Is Technical Debt? - ML technical debt patterns
Get Help
MLOps sits at the intersection of data engineering, software engineering, and data science. Getting it right requires expertise across all three.
If you’re trying to get ML models to production reliably, or struggling with models that degrade in production, book a call to discuss your challenges.