MLflow Cheat Sheet

01 Install & Start: pip + first UI

pip install mlflow★
Core package — tracking, projects, models, registry, serving.
pip install mlflow[extras]
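Adds optional dependencies for extra integrations (for example cloud artifact stores); ML frameworks such as PyTorch or TensorFlow are installed separately.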
mlflow ui★
Launch local web UI at localhost:5000; reads local mlruns/.
mlflow ui --port 5001 --host 0.0.0.0
Custom port or expose UI to LAN.
mlflow doctor
Diagnose a broken install — prints env & version info.

02 Experiments & Runs: the lifecycle

mlflow.set_experiment("my-exp")★
Create or select an experiment that groups related runs.
with mlflow.start_run():★
Open a run context manager — auto-finalises on exit.
mlflow.start_run(run_name="exp-1")
Attach a human-readable label to the run.
mlflow.start_run(run_id=id)
Re-open an existing run to append data later.
mlflow.start_run(nested=True)★
Create a child run under the active parent — ideal for sweeps.
mlflow.active_run()
Get the currently open run object (returns None if none).
mlflow.end_run()
Manually close a run that was opened without a with block.
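
The run lifecycle above can be sketched as one parent run with a nested child per hyperparameter value. This is a minimal illustration, not the only way to structure a sweep; `run_sweep` and `child_run_name` are hypothetical helper names, and MLflow is imported inside the function so the naming helper stays usable on its own.

```python
def child_run_name(lr):
    """Hypothetical naming scheme for child runs, e.g. 'lr=0.01'."""
    return f"lr={lr:g}"


def run_sweep(learning_rates, experiment="my-exp"):
    """Open one parent run and one nested child run per learning rate."""
    import mlflow  # imported lazily; requires `pip install mlflow`

    mlflow.set_experiment(experiment)
    child_ids = []
    with mlflow.start_run(run_name="sweep") as parent:
        for lr in learning_rates:
            # nested=True attaches the child to the currently active parent run
            with mlflow.start_run(run_name=child_run_name(lr), nested=True) as child:
                mlflow.log_param("lr", lr)
                child_ids.append(child.info.run_id)
    # Both context managers have exited here, so no run is left open
    return parent.info.run_id, child_ids
```

Calling `run_sweep([0.1, 0.01])` should leave one "sweep" run with two children in the UI, assuming the default local tracking store.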

03 Log Params, Metrics & Tags: what gets compared

mlflow.log_param("lr", 0.01)★
Log one hyperparameter — immutable once set per run.
mlflow.log_params({"lr": 0.01, "bs": 32})★
Bulk-log a dict of params in a single call.
mlflow.log_metric("acc", 0.95)★
Log a numeric scalar; can be called multiple times.
mlflow.log_metric("loss", v, step=epoch)★
Add step to produce a time-series curve in the UI.
mlflow.log_metrics({...}, step=e)
Bulk-log several metrics at once — fewer API round-trips.
mlflow.set_tag("team", "nlp")
Free-form string metadata; filterable in UI and search_runs.
mlflow.set_tags({...})
Bulk tags in one call.
mlflow.log_input(dataset, context="train")
Record dataset provenance for a run (MLflow Dataset API).
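
A rough sketch of how these calls combine in a training loop: params and tags are logged once, and the metric is logged with `step` only every N iterations to limit round-trips. The helper names `should_log` and `log_training_curve`, the run name, and the interval of 10 are assumptions made for illustration.

```python
def should_log(step, every):
    """Return True on every `every`-th step, counting steps from 0."""
    return step % every == 0


def log_training_curve(losses, every=10):
    """Log a loss curve, sending the metric only every `every` steps."""
    import mlflow  # imported lazily; requires `pip install mlflow`

    with mlflow.start_run(run_name="curve-demo"):
        # Params and tags describe the run as a whole, so log them once
        mlflow.log_params({"lr": 0.01, "bs": 32})
        mlflow.set_tag("team", "nlp")
        for step, loss in enumerate(losses):
            if should_log(step, every):
                # step gives the metric an x-axis position in the UI chart
                mlflow.log_metrics({"loss": loss}, step=step)
```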

04 Artifacts & Models: files & the model itself

mlflow.log_artifact("plot.png")★
Upload any file — plot, CSV, config, checkpoint.
mlflow.log_artifacts("./out/")
Upload a whole directory tree to the artifact store.
mlflow.<flavor>.log_model(model, "model")★
Save model using a flavor — sklearn, pytorch, transformers, tensorflow, xgboost…
mlflow.pyfunc.log_model("model", python_model=…)
Log a custom Python model class implementing predict().
mlflow.pyfunc.load_model(uri)★
Load any flavor back as a generic Python function.
mlflow.artifacts.download_artifacts(uri)
Download artifacts from a run or model to local disk.
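
To illustrate the artifact round trip, the sketch below writes a small JSON config, logs it with `log_artifact`, and fetches it back through a `runs:/` URI. The file name `config.json` and both helper names are hypothetical; the example assumes the default local tracking store.

```python
import json
import tempfile
from pathlib import Path


def write_config(directory, config):
    """Write `config` as JSON into `directory` and return the file path."""
    path = Path(directory) / "config.json"
    path.write_text(json.dumps(config, indent=2))
    return path


def log_and_fetch_config(config):
    """Log a config file as a run artifact, then download and parse it."""
    import mlflow  # imported lazily; requires `pip install mlflow`

    with tempfile.TemporaryDirectory() as tmp:
        with mlflow.start_run() as run:
            mlflow.log_artifact(str(write_config(tmp, config)))
    # runs:/<run_id>/<path> addresses a file stored inside that run
    local_path = mlflow.artifacts.download_artifacts(
        f"runs:/{run.info.run_id}/config.json"
    )
    return json.loads(Path(local_path).read_text())
```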

05 Autologging: zero-config logging

mlflow.<flavor>.autolog()★
Auto-capture params, metrics, model; call before the training code runs.
mlflow.autolog()★
Global autolog — covers all supported frameworks at once.
mlflow.sklearn.autolog()
Scikit-learn: fits, scores, feature importance, model.
mlflow.pytorch.autolog()
PyTorch Lightning (plain PyTorch training loops are not autologged): loss, optimizer, checkpoints per epoch.
mlflow.tensorflow.autolog()
Keras/TF: all callback metrics, model architecture.
mlflow.autolog(log_models=False)
Skip model logging to cut artifact storage costs.
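
As a small illustration of the ordering that matters here, the sketch below enables scikit-learn autologging before `fit()` is called. It assumes scikit-learn is installed; the function name, run name, and choice of the Iris dataset are illustrative only.

```python
def train_with_autolog(log_models=False):
    """Fit a small classifier with autologging enabled; return the run ID."""
    import mlflow  # imported lazily; requires mlflow and scikit-learn
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression

    # Enable autologging first: calls made before this line are not captured
    mlflow.sklearn.autolog(log_models=log_models)

    X, y = load_iris(return_X_y=True)
    with mlflow.start_run(run_name="autolog-demo") as run:
        # fit() is the call autolog hooks into; params and training
        # metrics are recorded on the active run
        LogisticRegression(max_iter=200).fit(X, y)
    return run.info.run_id
```

Passing `log_models=True` would also store the fitted model as an artifact, at the cost of extra storage.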

06 Query & Compare Runs: search history

mlflow.get_run(run_id)
Fetch one run object — access .data.params, .data.metrics.
mlflow.search_runs(experiment_ids=["1"])★
Returns a pandas DataFrame; combine with filter / order.
filter_string="metrics.acc > 0.9"★
SQL-like syntax: prefix metrics., params., tags., attributes.
order_by=["metrics.acc DESC"]
Sort results — combine with max_results=1 to find best run.
mlflow.search_logged_models(...) (v3)
Find model checkpoints across experiments by metric/param (MLflow 3+).
mlflow.search_experiments()
List all experiments on the tracking server.
mlflow.delete_run(run_id) (soft)
Marks deleted; run is recoverable until mlflow gc runs.
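
The filter and ordering options above can be combined to pick the best run. This is a sketch under assumptions: the metric is named `acc`, runs carry a `team` tag, and `build_filter` and `best_run` are hypothetical helper names.

```python
def build_filter(min_acc, team=None):
    """Compose a search_runs filter string from simple conditions."""
    clauses = [f"metrics.acc > {min_acc}"]
    if team is not None:
        clauses.append(f"tags.team = '{team}'")
    return " and ".join(clauses)


def best_run(experiment_name, min_acc=0.9, team=None):
    """Return the run_id with the highest accuracy, or None if no run matches."""
    import mlflow  # imported lazily; requires `pip install mlflow`

    runs = mlflow.search_runs(
        experiment_names=[experiment_name],
        filter_string=build_filter(min_acc, team),
        order_by=["metrics.acc DESC"],
        max_results=1,  # only the top row is needed
    )
    # search_runs returns a pandas DataFrame by default
    return None if runs.empty else runs.iloc[0]["run_id"]
```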

07 MLflow Projects: reproducible packaging

MLproject★
YAML entry-point file: name, env spec, params, run command.
conda.yaml / python_env.yaml
Pin the exact environment the project runs in.
mlflow run .★
Execute the project locally from current directory.
mlflow run <git-url>
Run directly from a remote Git repository; add --version <commit> to pin the exact code.
mlflow run . -P lr=0.01
Pass entry-point parameters from the CLI.
mlflow run . --env-manager=local
Skip env creation; use the active interpreter.
mlflow.run(uri, parameters={...})
Programmatic equivalent of the CLI run command.
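
For orientation, here is a sketch of what a minimal MLproject file might contain and how it could be launched from Python. The project name, the `train.py` script, and the `lr` parameter are assumptions; the project directory is expected to already hold `train.py` and `python_env.yaml`.

```python
from pathlib import Path

# Hypothetical MLproject content: one entry point with one float parameter
MLPROJECT = """\
name: demo-project
python_env: python_env.yaml
entry_points:
  main:
    parameters:
      lr: {type: float, default: 0.01}
    command: "python train.py --lr {lr}"
"""


def write_mlproject(project_dir):
    """Write the MLproject file into `project_dir` and return its path."""
    path = Path(project_dir) / "MLproject"
    path.write_text(MLPROJECT)
    return path


def run_project(project_dir, lr):
    """Run the project's main entry point with the active interpreter."""
    import mlflow  # imported lazily; requires `pip install mlflow`

    # env_manager="local" skips environment creation, like --env-manager=local
    submitted = mlflow.run(
        str(project_dir), parameters={"lr": lr}, env_manager="local"
    )
    return submitted.run_id
```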

08 Model Registry: version & promote

mlflow.<flav>.log_model(m, "model", registered_model_name="Iris")★
Log + register in one step. Version auto-increments.
mlflow.register_model("runs:/id/model", "Iris")★
Register a previously-logged model by its runs:/ URI.
client.set_registered_model_alias("Iris", "champion", version=3)★
Point a named alias at a version — preferred over stages.
client.transition_model_version_stage(name="Iris", version=2, stage="Production")
Stage-based promotion; deprecated since MLflow 2.9, so prefer aliases in new code.
client.set_model_version_tag("Iris", 2, "status", "ok")
Attach a key-value tag to a specific version.
for m in client.search_registered_models():
Iterate over all registered models and their versions.
client.delete_model_version("Iris", 1) (permanent)
Permanently removes a version — cannot be undone.
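
A possible promotion helper built from the alias and tag calls above. The `client` argument is expected to be an `mlflow.MlflowClient` instance; the function name `promote` and the default alias `champion` are illustrative choices.

```python
def promote(client, name, version, alias="champion"):
    """Point `alias` at `version`, tag the version, and return its model URI."""
    # Moving an alias is how a new version takes over without changing callers
    client.set_registered_model_alias(name, alias, version)
    client.set_model_version_tag(name, version, "status", "ok")
    # Consumers can load this URI and always get the aliased version
    return f"models:/{name}@{alias}"
```

With a reachable registry, `promote(MlflowClient(), "Iris", 3)` returns `models:/Iris@champion`, which can then be passed to `mlflow.pyfunc.load_model`.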

09 Serving: local · Docker · cloud

mlflow models serve -m models:/Iris/Production★
Spin up a local REST endpoint — default port 5000.
mlflow models serve -m runs:/<id>/model -p 1234
Serve directly from a run; custom port.
POST /invocations Body: {"instances": [[5.1, 3.5, 1.4, 0.2]]}★
Score via the REST endpoint after serving.
mlflow models build-docker -m x
Package the model as a self-contained Docker image.
mlflow deployments create -t sagemaker
Deploy to SageMaker (or Azure ML, Kubernetes) via plugin.
mlflow.pyfunc.spark_udf(spark, uri)
Wrap a model as a Spark UDF for scalable batch scoring.
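
Scoring against a served model can be sketched with the standard library alone. The example assumes a model is already being served on the given host and port; `build_invocation` and `score` are hypothetical helper names, and the response format is left unparsed beyond JSON decoding because it can differ between MLflow versions.

```python
import json
import urllib.request


def build_invocation(rows, host="127.0.0.1", port=5000):
    """Build a POST request for the /invocations endpoint."""
    body = json.dumps({"instances": rows}).encode("utf-8")
    return urllib.request.Request(
        f"http://{host}:{port}/invocations",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def score(rows, **kwargs):
    """Send rows to the served model and return the decoded JSON response."""
    with urllib.request.urlopen(build_invocation(rows, **kwargs)) as resp:
        return json.loads(resp.read())
```

For a model served with `-p 1234`, the call would be `score([[5.1, 3.5, 1.4, 0.2]], port=1234)`.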

10 CLI Quick Commands: no Python needed

mlflow server --backend-store-uri sqlite:///mlflow.db★
Full tracking server (UI + API) backed by a database.
mlflow experiments search
List all experiments on the tracking server.
mlflow runs list --experiment-id 1
Show runs for one experiment from the CLI.
mlflow artifacts download -u runs:/<id>/model
Pull artifacts to a local directory.
mlflow gc
Permanently purge soft-deleted runs from the store.

11 Configuration: where things live

mlflow.set_tracking_uri("file:./mlruns")★
Default local store — zero setup needed.
mlflow.set_tracking_uri("http://host:5000")★
Point at a shared remote tracking server.
MLFLOW_TRACKING_URI=http://…
Env-var alternative to set_tracking_uri() — any language.
MLFLOW_EXPERIMENT_NAME=my-exp
Pin the default experiment without code changes.
--backend-store-uri postgresql://…
Production: persist runs in a SQL database.
--default-artifact-root s3://bucket/path
Redirect artifacts to S3, GCS, Azure Blob, or HDFS.
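
The two server flags above are usually passed together. The sketch below assembles the `mlflow server` command line and launches it as a subprocess; the helper names and the default host and port are assumptions, and the URIs are whatever your deployment uses.

```python
import subprocess


def server_command(backend_store_uri, artifact_root, host="127.0.0.1", port=5000):
    """Return the argv list for a tracking server with a SQL backend store."""
    return [
        "mlflow", "server",
        "--backend-store-uri", backend_store_uri,
        "--default-artifact-root", artifact_root,
        "--host", host,
        "--port", str(port),
    ]


def start_server(backend_store_uri, artifact_root, **kwargs):
    """Launch the server in the background and return the process handle."""
    return subprocess.Popen(server_command(backend_store_uri, artifact_root, **kwargs))
```

Clients would then point at it with `mlflow.set_tracking_uri("http://host:5000")` or the `MLFLOW_TRACKING_URI` environment variable.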

★ Model URI Schemes: anywhere you load or serve

runs:/<run_id>/model
Artifact logged inside one specific training run.
models:/<name>/<version>
Exact registered version number.
models:/<name>/Production
Latest version in a stage (legacy, still works).
models:/<name>@<alias>★
Version pointed to by alias — e.g. @champion. Preferred.
models:/<model_id> (v3)
MLflow 3 direct model ID — no run path needed.

Common issues & fixes

| Symptom | Fix |
|---|---|
| `No such file or directory: mlruns` | Run `mlflow ui` from the directory containing `mlruns/`, or set `MLFLOW_TRACKING_URI` explicitly. |
| Authentication failures (Azure / Databricks) | Re-check tokens, service-principal permissions, and that the tracking URI matches the workspace endpoint exactly. |
| Logging is very slow in training loops | Batch metrics with `log_metrics({...})` and log every N steps rather than every step to reduce API round-trips. |
| Large files silently fail in `log_param` | Use `log_artifact()` for files; `log_param` is for scalar/string values only. |
| `RestException: Run not found` | Double-check experiment ID and run ID; runs are scoped to one experiment and one tracking server. |
| Permission errors on `mlruns/` | Check filesystem ownership and permissions on the backend store directory or S3 bucket policy. |
| `autolog` captures nothing | Call `mlflow.autolog()` or the flavor-specific version before the training call, not inside the loop. |

The four MLflow components: ① Tracking, ② Projects, ③ Models, ④ Model Registry.

Two ways to log a run: manual (full control) or autolog (one line).

Worth memorising