Quick Reference · gradient-boosted trees

xgboost cheat sheet

Every workflow moves a model through four stages: your data, the DMatrix + parameters you configure, the Booster you train, and the predictions/artifacts you deploy. Learn the pipeline once and the parameters stop being a list to memorize.


Distilled & cross-checked across: xgboost.readthedocs.io · xgboost.ai · arXiv:1603.02754 · github.com/dmlc/xgboost · geeksforgeeks.org · wikipedia.org

The pipeline & the calls that move a model through it
[Pipeline diagram]
Data (NumPy / pandas / SciPy: X, y, categorical dtypes, missing values as NaN)
  → DMatrix(X, y) →
DMatrix + params (binned & wrapped, "the index": max_depth, eta, objective…)
  → xgb.train() →
Booster (trained tree ensemble: bst.best_iteration)
  → .predict() / feature_importances_ →
Predictions (scores · SHAP · saved model: bst.predict(dtest))
Feedback loop: xgb.cv() / early_stopping_rounds (tunes params from held-out score)
Before training: Data, DMatrix + params · After training: Booster, Predictions

Underneath every round: minimize Σ l(yᵢ,ŷᵢ) + Σ Ω(fₖ) — a loss term plus a regularization term Ω(f) = γT + ½λ‖w‖² + α‖w‖₁, solved via 2nd-order (gradient + Hessian) Taylor expansion each boosting round.
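To make the second-order step concrete, here is a small pure-Python sketch for a single leaf under log loss. It assumes α = 0 and uses the closed form w* = −G / (H + λ) from the XGBoost paper (arXiv:1603.02754), where G and H are the summed gradients and Hessians of the rows in the leaf. The function names and the four-row toy data are hypothetical.

```python
import math

def grad_hess_logistic(y, margin):
    """First and second derivative of log loss with respect to the raw margin."""
    p = 1.0 / (1.0 + math.exp(-margin))
    return p - y, p * (1.0 - p)

def leaf_weight(G, H, lam):
    """Minimizer of G*w + 0.5*(H + lam)*w**2, the quadratic model of one leaf."""
    return -G / (H + lam)

labels = [1, 1, 0, 1]
margins = [0.0, 0.0, 0.0, 0.0]  # current raw predictions before this round

pairs = [grad_hess_logistic(y, m) for y, m in zip(labels, margins)]
G = sum(g for g, _ in pairs)
H = sum(h for _, h in pairs)
w = leaf_weight(G, H, lam=1.0)
print(G, H, w)  # -1.0 1.0 0.5
```

The leaf moves the margin toward the majority label, and λ in the denominator keeps the step smaller than the unregularized −G / H.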

01 Install & Import · get set up
02 DMatrix · the native container
03 Categorical & Missing Data · no manual encoding
04 Parameter Dict & Booster Type · what you configure
05 Tree Growth Control · shape of each tree
06 Sampling — Bagging Rows & Cols · fights overfitting
07 Learning Rate & Rounds · step size × count
08 tree_method & Device · how splits are found
09 Regularization · penalize complexity
10 Objective Functions · what to minimize
11 eval_metric · how to score rounds
12 xgb.train() — Native Loop · the core API
13 Booster Methods · the trained object
14 XGBClassifier · sklearn API
15 XGBRegressor · sklearn API
16 Ranker & Random-Forest Variants · specialized estimators
17 Scikit-learn Interop · plugs into the ecosystem
18 Early Stopping · stop at the right round
19 Cross-Validation · robust round count
20 Callbacks · hook into training
21 predict() Options · shape the output
22 Feature Importance · what mattered
23 SHAP Values · explain a prediction
24 Plotting · see the trees
25 Save / Load Models · persistence
26 GPU & Distributed · scale out
Most-Used Defaults · at a glance
objective → default eval_metric · quick lookup

Four levers, one goal: generalize

Different mechanisms, same purpose — stop the ensemble from memorizing the training set. Complementary, not mutually exclusive.

eta (shrinkage)

Small steps + many rounds usually beats big steps + few rounds: each tree corrects only a fraction of the remaining error, so no single tree dominates the fit.

eta=0.05, 400 rounds vs. eta=0.3, 30 rounds
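The "fraction of the remaining error" idea can be put into numbers with an idealized model. The sketch assumes every tree fits the current residual perfectly, so each round removes exactly a fraction eta of what is left. Real trees do not do this, and the model describes training error only, not generalization. Both helper functions are hypothetical.

```python
import math

def remaining_error(eta, rounds):
    """Fraction of the initial residual left after `rounds` idealized rounds."""
    return (1.0 - eta) ** rounds

def rounds_needed(eta, tolerance):
    """Smallest round count that brings the idealized residual below `tolerance`."""
    return math.ceil(math.log(tolerance) / math.log(1.0 - eta))

for eta in (0.3, 0.05):
    # Rounds needed to shrink the residual to 1% of its starting value.
    print(eta, rounds_needed(eta, 0.01))
```

Under this assumption, cutting eta from 0.3 to 0.05 (6× smaller) raises the required rounds from 13 to 90, which is why a lower eta is normally paired with a much larger round budget.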

subsample / colsample

Each tree only sees a random slice of rows and columns, so no single tree can over-fit one quirky sample or feature.

full data (rows × cols) → one tree's sampled subset (subsample=0.8, colsample_bytree=0.8)

max_depth / min_child_weight

Shallow trees with a higher child-weight floor usually generalize better than deep trees chasing every training point.

max_depth=3: generalizes better · max_depth=12: memorizes noise

gamma / lambda / alpha

Structural penalty (γ) prunes weak splits outright; L2/L1 (λ/α) shrink surviving leaf weights toward zero.

raw leaf weight before λ, α → shrunk weight after λ, α (one pruned by γ)
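The three penalties can be traced through the closed-form leaf weight and split gain. The λ and γ terms follow the XGBoost paper. Treating α as soft-thresholding of the gradient sum G is an assumption about how the L1 term is applied, and the library's internal constant factors may differ. All function names and numbers are hypothetical.

```python
def soft_threshold(G, alpha):
    """Pull G toward zero by alpha; anything within [-alpha, alpha] becomes 0."""
    if G > alpha:
        return G - alpha
    if G < -alpha:
        return G + alpha
    return 0.0

def leaf_weight(G, H, lam, alpha):
    """Leaf weight after L1 (alpha) and L2 (lam) shrinkage."""
    return -soft_threshold(G, alpha) / (H + lam)

def leaf_score(G, H, lam, alpha):
    """Quality term of one leaf in the split-gain formula."""
    return soft_threshold(G, alpha) ** 2 / (H + lam)

def split_gain(GL, HL, GR, HR, lam, alpha, gamma):
    """Gain of splitting a node into left/right children, minus the gamma penalty."""
    children = leaf_score(GL, HL, lam, alpha) + leaf_score(GR, HR, lam, alpha)
    parent = leaf_score(GL + GR, HL + HR, lam, alpha)
    return 0.5 * (children - parent) - gamma

raw = leaf_weight(G=-4.0, H=3.0, lam=0.0, alpha=0.0)     # about 1.33
shrunk = leaf_weight(G=-4.0, H=3.0, lam=1.0, alpha=1.0)  # 0.75
kept = split_gain(-3.0, 2.0, 1.0, 2.0, lam=1.0, alpha=0.0, gamma=0.0)
pruned = split_gain(-3.0, 2.0, 1.0, 2.0, lam=1.0, alpha=0.0, gamma=2.0)
print(raw, shrunk, kept > 0, pruned > 0)
```

The same split is kept with gamma=0 and rejected with gamma=2, while λ and α only change how large the surviving weights are.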

Worth memorizing

2 APIs, 2 defaults: native num_boost_round=10 vs. sklearn n_estimators=100
hist is default: since 2.0; categorical & GPU training need hist or approx (exact supports neither)
device replaced gpu_id: gpu_id/gpu_hist/use_gpu are gone; use device=
last eval_metric wins: only the final metric in the list drives early stopping
trees skip scaling: threshold splits mean no need to standardize numeric features
NaN handled natively: missing values learn a default split direction, so no imputation is required
categorical re-coding: since 3.1, the Booster stores & auto re-codes training categories
lower eta ⇒ more rounds: shrinking eta needs a roughly proportionally larger num_boost_round
save as JSON/UBJSON: legacy binary .model format is gone; UBJSON is now default
weight ≠ gain ≠ cover: three importance types can rank features differently