Quick Reference · gradient-boosted trees · single source of truth

xgboost cheat sheet

Every workflow moves a model through four stages: your data, the DMatrix + parameters you configure, the Booster you train, and the predictions/artifacts you deploy. Learn the pipeline once and the parameters stop being a list to memorize. Covers the native Booster API, the scikit-learn wrapper, every booster type, constraints, custom objectives, distributed training, and best practices — current through XGBoost 3.2 (Feb 2026).

Legend: data · DMatrix & params · training · predict / deploy · gotcha · most common

Distilled & cross-checked across: xgboost.readthedocs.io (parameter reference · Python API · tutorials · release notes 3.0–3.2) · xgboost.ai · arXiv:1603.02754 · github.com/dmlc/xgboost · geeksforgeeks.org · wikipedia.org

The pipeline & the calls that move a model through it
[Figure: pipeline diagram, four stages from left to right]
1. Data: NumPy / pandas / Arrow / Polars / SciPy / cuDF (X, y, categories, NaN)
2. DMatrix + params: binned & wrapped, "the index" (max_depth, eta, objective…)
3. Booster: trained tree ensemble (bst.best_iteration)
4. Predictions: scores · SHAP · saved model (bst.predict(dtest))
Arrows: DMatrix(X,y) → xgb.train() → .predict(); feature_importances_; feedback loop xgb.cv() / early_stopping_rounds (tunes params from held-out score)
Regions: BEFORE TRAINING | AFTER TRAINING

Underneath every round: minimize Σ l(yᵢ,ŷᵢ) + Σ Ω(fₖ) — a loss term plus a regularization term Ω(f) = γT + ½λ‖w‖² + α‖w‖₁, solved via 2nd-order (gradient + Hessian) Taylor expansion each boosting round.
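Written out, the second-order step looks as follows. This follows the derivation in the cited paper (arXiv:1603.02754) for the case α = 0 and drops terms that are constant within a round. The symbols g_i and h_i (first and second derivatives of the loss with respect to the previous round's prediction) and I_j (the set of rows landing in leaf j) are notation introduced here.

```latex
\begin{aligned}
\mathcal{L}^{(t)} &\approx \sum_{i} \Big[ g_i f_t(x_i) + \tfrac{1}{2} h_i f_t(x_i)^2 \Big]
                     + \gamma T + \tfrac{1}{2}\lambda \sum_{j=1}^{T} w_j^2 \\
G_j &= \sum_{i \in I_j} g_i, \qquad H_j = \sum_{i \in I_j} h_i \\
w_j^{*} &= -\frac{G_j}{H_j + \lambda} \\
\mathcal{L}^{(t)}_{\min} &= -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T \\
\text{gain} &= \frac{1}{2} \left[ \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda}
              - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda} \right] - \gamma
\end{aligned}
```

A split is worth making only when its gain is positive, which is how γ acts as a per-split price and λ damps every leaf weight.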

Data in — containers, formats, categories, missing values

Everything that happens before a single tree is grown.

01 Install & Import: get set up
02 Global Configuration: library-wide switches
03 DMatrix: the native container
04 QuantileDMatrix & External Memory: big-data containers
05 Accepted Input Formats: what X can be
06 Categorical Data: no manual encoding
07 Missing Values & Sparsity: sparsity-aware splits

Configure — booster types, tree shape, sampling, constraints

The parameter dict: what each knob does and its default.

08 Parameter Dict & Booster Type: what you configure
09 Tree Growth Control: shape of each tree
10 Sampling — Bagging Rows & Cols: fights overfitting
11 Learning Rate, Rounds & Forests: step size × count
12 tree_method, Device & Updaters: how splits are found
13 Regularization: penalize complexity
14 Monotone & Interaction Constraints: domain knowledge in
15 DART Booster: dropout for trees
16 gblinear Booster: linear base learner
17 Multi-Target & Vector Leaf: many outputs, one model

Objectives, metrics & custom losses

What the trees minimize, and how progress is scored.

18 Objectives — Classification & Regression: the everyday set
19 Objectives — Specialized: rank · survival · counts
20 eval_metric: how to score rounds
21 Custom Objective & Metric: bring your own loss

Train — native loop, sklearn wrappers, stopping, CV, continuation

Both APIs, and everything that controls the boosting loop.

22 xgb.train() — Native Loop: the core API
23 Early Stopping: stop at the right round
24 Cross-Validation: robust round count
25 Callbacks: hook into training
26 Training Continuation: warm starts
27 XGBClassifier: sklearn API
28 XGBRegressor: sklearn API
29 Ranker & Random-Forest Variants: specialized estimators
30 Scikit-learn Interop: plugs into the ecosystem

Predict, explain, inspect & persist

Everything you do with a trained Booster.

31 predict() Options: shape the output
32 inplace_predict(): serving-path inference
33 Model Inspection: open the box
34 Booster Slicing & Attributes: the model as an object
35 Feature Importance: what mattered
36 SHAP Values: explain a prediction
37 Plotting: see the trees
38 Save / Load Models: persistence

Scale out & best practices

GPU, Dask, Spark — and the tuning playbook that ties it together.

39 GPU Training: device='cuda'
40 Dask — xgboost.dask: multi-node Python
41 Spark — xgboost.spark: PySpark pipelines
42 Ecosystem & Beyond: who plays well with it
43 Tuning Playbook: official guidance, condensed
44 Performance Best Practices: speed & memory
Most-Used Defaults: at a glance
objective → default eval_metric: quick lookup
Version Milestones: what changed when

Four levers, one goal: generalize

Different mechanisms, same purpose — stop the ensemble from memorizing the training set. Complementary, not mutually exclusive.

eta (shrinkage)

Small steps + many rounds beats big steps + few rounds — each tree corrects only a fraction of the remaining error.

[Figure: eta=0.05, 400 rounds vs. eta=0.3, 30 rounds]
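The link between step size and round count can be made concrete with an idealized model, assumed here purely for illustration: if every tree fit the remaining error perfectly and eta scaled its contribution, the fraction of error left after n rounds would be (1 - eta)^n. Real trees are weak learners, so actual training curves are slower, and the function names below are hypothetical.

```python
import math


def remaining_error(eta: float, rounds: int) -> float:
    """Fraction of the initial error left after `rounds` in the idealized model."""
    return (1.0 - eta) ** rounds


def rounds_needed(eta: float, target: float) -> int:
    """Smallest round count that brings the remaining fraction to `target` or below."""
    return math.ceil(math.log(target) / math.log(1.0 - eta))


for eta in (0.3, 0.1, 0.05):
    n = rounds_needed(eta, 0.01)
    print(f"eta={eta}: {n} rounds to reach 1% of the initial error")
```

In this toy model, halving eta from 0.1 to 0.05 roughly doubles the rounds required, which is why a lower eta is normally paired with a larger num_boost_round and early stopping.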

subsample / colsample

Each tree only sees a random slice of rows and columns, so no single tree can over-fit one quirky sample or feature.

[Figure: full data (rows × cols) → one tree's sampled subset; subsample=0.8, colsample_bytree=0.8]
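The idea of giving each tree its own slice of the data can be sketched in plain Python. This illustrates the concept only and is not XGBoost's internal sampler, which may differ in detail; the helper name and the fixed-count sampling are assumptions made for the example.

```python
import random


def sample_tree_view(n_rows, n_cols, subsample, colsample_bytree, rng):
    """Pick the row and column indices that one tree is allowed to see."""
    n_keep_rows = max(1, round(subsample * n_rows))
    n_keep_cols = max(1, round(colsample_bytree * n_cols))
    rows = sorted(rng.sample(range(n_rows), n_keep_rows))
    cols = sorted(rng.sample(range(n_cols), n_keep_cols))
    return rows, cols


rng = random.Random(0)  # fixed seed so the draw is repeatable
views = [sample_tree_view(10, 5, 0.8, 0.8, rng) for _ in range(3)]
for i, (rows, cols) in enumerate(views):
    print(f"tree {i}: rows={rows} cols={cols}")
```

Because every tree draws a different subset, a quirky row or a dominant column is missing from some trees, and the ensemble average depends on it less.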

max_depth / min_child_weight

Shallow trees with a higher child-weight floor generalize better than deep trees chasing every training point.

[Figure: max_depth=3 generalizes better vs. max_depth=12 memorizes noise]
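A rough capacity count shows why these two knobs matter. A binary tree of depth d has at most 2^d leaves, and for squared-error loss each row contributes a Hessian of 1, so with unweighted rows min_child_weight behaves like a minimum row count per leaf. The helper below is a hypothetical sketch under those assumptions, not a library function.

```python
import math


def max_leaves(max_depth: int, n_rows: int, min_child_weight: float = 1.0) -> int:
    """Upper bound on leaf count for squared-error loss with unit row weights."""
    by_depth = 2 ** max_depth                      # limit from tree shape
    rows_per_leaf = max(1, math.ceil(min_child_weight))
    by_rows = n_rows // rows_per_leaf              # limit from the child-weight floor
    return max(1, min(by_depth, by_rows))


n_rows = 1000
print(max_leaves(3, n_rows))                        # shallow tree: few, broad leaves
print(max_leaves(12, n_rows))                       # deep tree: can isolate every row
print(max_leaves(12, n_rows, min_child_weight=20))  # the floor restores broad leaves
```

With 1000 rows, a depth-12 tree has enough leaves to give every row its own, which is the memorization risk; raising min_child_weight caps that even when max_depth stays large.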

gamma / lambda / alpha

Structural penalty (γ) prunes weak splits outright; L2/L1 (λ/α) shrink surviving leaf weights toward zero.

[Figure: raw leaf weight before λ, α vs. shrunk weight after λ, α (one pruned by γ)]
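The effect of each penalty can be checked numerically. The sketch below uses the leaf-weight and split-gain formulas from the cited paper (arXiv:1603.02754), with the L1 term handled by soft-thresholding the gradient sum; G and H are the summed gradients and Hessians in a leaf. The function names are hypothetical, and the library's internal bookkeeping (for example, constant factors on the gain) may differ.

```python
def soft_threshold(g: float, alpha: float) -> float:
    """Move g toward zero by alpha; the L1 (alpha) step."""
    if g > alpha:
        return g - alpha
    if g < -alpha:
        return g + alpha
    return 0.0


def leaf_weight(G: float, H: float, lam: float = 1.0, alpha: float = 0.0) -> float:
    """Optimal leaf weight under L2 (lam) and L1 (alpha) penalties."""
    return -soft_threshold(G, alpha) / (H + lam)


def leaf_score(G: float, H: float, lam: float = 1.0, alpha: float = 0.0) -> float:
    """Loss reduction contributed by one leaf, up to a factor of 1/2."""
    return soft_threshold(G, alpha) ** 2 / (H + lam)


def split_gain(GL, HL, GR, HR, lam=1.0, alpha=0.0, gamma=0.0):
    """Gain from splitting a node into left and right children, minus the gamma price."""
    parent = leaf_score(GL + GR, HL + HR, lam, alpha)
    children = leaf_score(GL, HL, lam, alpha) + leaf_score(GR, HR, lam, alpha)
    return 0.5 * (children - parent) - gamma


G, H = -8.0, 4.0
print(leaf_weight(G, H, lam=0.0))             # raw weight, no penalty
print(leaf_weight(G, H, lam=1.0))             # shrunk by L2
print(leaf_weight(G, H, lam=1.0, alpha=2.0))  # shrunk further by L1

# A weak split survives with gamma=0 but is pruned once gamma exceeds its gain.
print(split_gain(-8.0, 4.0, -2.0, 4.0, gamma=0.0) > 0)
print(split_gain(-8.0, 4.0, -2.0, 4.0, gamma=2.0) > 0)
```

λ and α only rescale a leaf that already exists, while γ decides whether the split that creates it happens at all.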

Worth memorizing

2 APIs, 2 defaults: native num_boost_round=10 vs. sklearn n_estimators=100
hist is default: since 2.0; categorical & GPU training need hist or approx (exact supports neither)
device replaced gpu_id: gpu_id/gpu_hist/use_gpu are gone; use device=
last eval_metric wins: only the final metric in the list drives early stopping
train() keeps last iter: not the best — slice with iteration_range or save_best=True
sklearn ES in constructor: early_stopping_rounds / eval_metric moved off fit() in 2.0
trees skip scaling: threshold splits mean no need to standardize numeric features
NaN handled natively: missing values learn a default split direction — no imputation
categorical re-coding: since 3.1, the Booster stores & auto re-codes training categories
QDM val needs ref=: QuantileDMatrix validation sets must reference the train matrix
lower eta ⇒ more rounds: shrinking eta needs a proportionally larger num_boost_round
save as JSON/UBJSON: legacy binary .model is gone; pickle isn't version-safe
weight ≠ gain ≠ cover: three importance types can rank features differently
inplace_predict serves: thread-safe, no DMatrix — predict() holds a per-Booster lock