Quick Reference · ML Experiment Tracking & Model Lifecycle

MLflow cheat sheet

MLflow organises ML work around four components: Tracking (log params, metrics, artifacts every run), Projects (package code so any machine reproduces it), Models (a framework-agnostic format), and the Model Registry (version, alias, and promote to production). Learn the map once and the API stops being a list to memorise.


Synthesised from: mlflow.org/docs/latest/ml · grafdavid.com/cheatsheets/mlflow · github.com/aishwaryaprabhat/MLflow

From training code to a served model — the full MLflow pipeline
[Diagram: on your machine, your code (mlflow.set_experiment(), mlflow.start_run(), log_param / log_metric) sends log calls to the Tracking Server (REST API with Python/REST/R/Java clients; local mlruns/ or a remote server selected with mlflow.set_tracking_uri()). On the shared server, the Backend Store (file/SQL) holds params, metrics, and tags; the Artifact Store (S3/GCS/Azure) holds models, plots, and files. mlflow.register_model() registers a model in the Model Registry (versions + aliases such as @champion), and mlflow models serve deploys it for REST or batch serving.]
Registry stage lifecycle: None → Staging → Production → Archived, set with transition_model_version_stage(name, version, stage="Production").
MLflow 2.9+: aliases (@champion) are preferred over stage strings: client.set_registered_model_alias("MyModel", "champion", version=3) → load via models:/MyModel@champion
01 Install & Start: pip + first UI
02 Experiments & Runs: the lifecycle
03 Log Params, Metrics & Tags: what gets compared
04 Artifacts & Models: files & the model itself
05 Autologging: zero-config logging
06 Query & Compare Runs: search history
07 MLflow Projects: reproducible packaging
08 Model Registry: version & promote
09 Serving: local · Docker · cloud
10 CLI Quick Commands: no Python needed
11 Configuration: where things live
Model URI Schemes: anywhere you load or serve

The four MLflow components

Same project, four jobs. Tracking and Models are used on every run; Projects kick in for reproducibility; the Registry manages production promotion.

① Tracking

Logs every run's params, metrics, tags, and artifact files so experiments stay comparable and reproducible.

[Diagram: Run → params, metrics, artifacts → UI + search_runs()]

② Projects

An MLproject file + env spec packages your code so anyone — or any machine — can reproduce a run exactly.

[Diagram: MLproject + conda.yaml + train.py (entry_points) → mlflow run → any machine, any git URL]

③ Models

A standard packaging format: one saved model, loadable either as its native flavor or as a generic pyfunc.

[Diagram: model/ directory → native flavor (sklearn / pytorch) or generic loader (pyfunc.predict())]

④ Model Registry

Centrally versions a named model and tracks which version carries each alias/stage on the road to production.

[Diagram: v1 (None) → v2 (Staging) → v3 (@champion), loaded via models:/Iris@champion]
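The registry URI above (models:/Iris@champion) is one of a few model URI forms. As a minimal sketch, the helpers below build the three most common forms as plain strings. The function names are hypothetical (they are not part of MLflow), and the run ID in the usage is a made-up placeholder.

```python
def run_model_uri(run_id: str, artifact_path: str = "model") -> str:
    """URI for a model logged inside one specific training run."""
    return f"runs:/{run_id}/{artifact_path}"


def registry_version_uri(name: str, version: int) -> str:
    """URI pinned to one fixed version of a registered model."""
    return f"models:/{name}/{version}"


def registry_alias_uri(name: str, alias: str) -> str:
    """URI that follows whichever version currently carries the alias."""
    return f"models:/{name}@{alias}"


if __name__ == "__main__":
    # "abc123" is a placeholder run ID, not a real run.
    print(run_model_uri("abc123"))
    print(registry_version_uri("Iris", 3))
    print(registry_alias_uri("Iris", "champion"))
```

Any of these strings can be passed to mlflow.pyfunc.load_model(). The alias form is usually the most convenient in serving code, because reassigning @champion to a new version requires no code change.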

Two ways to log a run

Manual logging gives full control over every captured value; autologging is one line and handles the common case for most popular frameworks with no extra code.

Manual — full control

import mlflow

mlflow.set_experiment("text-clf")

with mlflow.start_run():
    # 1. Log hyperparameters once
    mlflow.log_params({"lr": 0.01, "epochs": 10})

    # 2. Log metrics per step (train() stands in for your own training function)
    for epoch in range(10):
        loss = train(epoch)
        mlflow.log_metric("loss", loss, step=epoch)

    # 3. Save the trained model (model is your fitted sklearn estimator)
    mlflow.sklearn.log_model(model, "model")

    # 4. Optional: tag for filtering
    mlflow.set_tag("framework", "sklearn")
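Step 1 above logs a flat dict of hyperparameters. When the configuration is nested, flattening it first keeps each value as its own comparable param rather than one stringified dict. The helper below is a minimal sketch of that idea; flatten_params is a hypothetical name, not an MLflow function, and the dotted-key convention is an assumption.

```python
from typing import Any, Dict, Mapping


def flatten_params(config: Mapping[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten a nested config into dotted keys, e.g. {"optimizer.lr": 0.01}."""
    flat: Dict[str, Any] = {}
    for key, value in config.items():
        name = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, Mapping):
            # Recurse into sub-configs, carrying the dotted prefix along.
            flat.update(flatten_params(value, name))
        else:
            flat[name] = value
    return flat


if __name__ == "__main__":
    config = {"epochs": 10, "optimizer": {"name": "adam", "lr": 0.01}}
    print(flatten_params(config))
```

The resulting dict can be passed to mlflow.log_params() inside an active run.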

Autolog — one line

import mlflow.sklearn
from sklearn.ensemble import RandomForestClassifier

# Patches sklearn BEFORE training starts
mlflow.sklearn.autolog()

with mlflow.start_run():
    model = RandomForestClassifier(n_estimators=100)
    model.fit(X_train, y_train)  # X_train, y_train: your own training data
    # ↳ params, metrics, model all captured automatically

# ── Nested runs for sweeps ──────────────────────
with mlflow.start_run(run_name="sweep"):
    for lr in [0.01, 0.001, 0.0001]:
        with mlflow.start_run(nested=True):
            mlflow.log_param("lr", lr)
            # … train child trial …

Common issues & fixes

The handful of errors that account for most "MLflow isn't working" moments.

Symptom | Fix
No such file or directory: mlruns | Run mlflow ui from the directory containing mlruns/, or set MLFLOW_TRACKING_URI explicitly.
Authentication failures (Azure / Databricks) | Re-check tokens, service-principal permissions, and that the tracking URI matches the workspace endpoint exactly.
Logging is very slow in training loops | Batch metrics with log_metrics({...}) and log every N steps rather than every step to reduce API round-trips.
Large files fail in log_param | Use log_artifact() for files; log_param is for short scalar/string values only.
RestException: Run not found | Double-check the experiment ID and run ID; runs are scoped to one experiment and one tracking server.
Permission errors on mlruns/ | Check filesystem ownership and permissions on the backend store directory, or the S3 bucket policy.
autolog captures nothing | Call mlflow.autolog() or the flavor-specific version before the training call, not inside the loop.
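The slow-logging fix in the table combines two ideas: send all metrics for a step in one call, and skip most steps. The wrapper below is a minimal sketch of that pattern. ThrottledMetricLogger is a hypothetical helper, and the demo uses a stand-in function instead of mlflow.log_metrics so it runs without a tracking server.

```python
from typing import Callable, Dict, List, Tuple


class ThrottledMetricLogger:
    """Forward metrics to `log_fn` only on every `every`-th step."""

    def __init__(self, log_fn: Callable[..., None], every: int = 50) -> None:
        if every < 1:
            raise ValueError("every must be >= 1")
        self.log_fn = log_fn
        self.every = every

    def log(self, metrics: Dict[str, float], step: int) -> bool:
        """Log `metrics` if `step` is a multiple of `every`; return True if logged."""
        if step % self.every != 0:
            return False
        self.log_fn(metrics, step=step)  # one batched call for all metrics
        return True


if __name__ == "__main__":
    calls: List[Tuple[Dict[str, float], int]] = []

    def fake_log_metrics(metrics: Dict[str, float], step: int) -> None:
        # Stand-in for mlflow.log_metrics; records calls instead of sending them.
        calls.append((metrics, step))

    logger = ThrottledMetricLogger(fake_log_metrics, every=50)
    for step in range(200):
        logger.log({"loss": 1.0 / (step + 1), "acc": step / 200}, step=step)
    print(len(calls))  # 4 calls: steps 0, 50, 100, 150
```

In real training code, pass mlflow.log_metrics as log_fn inside an active run. The trade-off is a coarser curve in exchange for fewer round-trips.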

Worth memorising

param once · log_param errors if you log the same key again with a different value in the same run
step= for curves · omit it and every value is logged at step 0, so the step axis shows no curve
alias > stage · the @champion alias is the preferred replacement for the Production stage string (stages are deprecated from v2.9)
runs:/ vs models:/ · runs:/ ties to one training run; models:/ ties to the registry name
autolog() first · call it before training starts (typically before start_run()) so it can patch the library
nested=True · group all sweep trials under one parent run for a clean UI
mlruns/ · default local backend + artifact store; delete it to start fresh
search_runs() → DataFrame · the result is a pandas DataFrame, so you can rank/filter runs programmatically
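Ranking and filtering can also be pushed into the query itself through the filter_string, order_by, and max_results arguments of mlflow.search_runs(). The helper below is a minimal sketch that assembles those arguments. build_search_kwargs is a hypothetical name, and it assumes tag keys are simple identifiers that need no backtick quoting.

```python
from typing import Any, Dict, List, Optional


def build_search_kwargs(
    experiment_name: str,
    metric: str,
    *,
    lower_is_better: bool = True,
    tags: Optional[Dict[str, str]] = None,
    max_results: int = 5,
) -> Dict[str, Any]:
    """Assemble keyword arguments for mlflow.search_runs()."""
    direction = "ASC" if lower_is_better else "DESC"
    # Tag values are compared as quoted strings in the filter syntax.
    clauses: List[str] = [
        f"tags.{key} = '{value}'" for key, value in (tags or {}).items()
    ]
    return {
        "experiment_names": [experiment_name],
        "filter_string": " AND ".join(clauses),
        "order_by": [f"metrics.{metric} {direction}"],
        "max_results": max_results,
    }


if __name__ == "__main__":
    kwargs = build_search_kwargs("text-clf", "loss", tags={"framework": "sklearn"})
    print(kwargs)
```

Calling mlflow.search_runs(**kwargs) then returns the best runs first, with columns such as run_id and metrics.loss.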