XGBoost (Python) Cheat Sheet

01 Install & Import (get set up)

pip install xgboost★
Full package, incl. GPU (CUDA) support.
pip install xgboost-cpu
Smaller wheel — no GPU / federated learning.
import xgboost as xgb★
Conventional alias.
xgb.__version__
Check installed version (current: 3.2).
conda install -c conda-forge py-xgboost
Conda auto-detects the GPU variant.

02 DMatrix (the native container)

xgb.DMatrix(X, label=y)★
Core input; wraps NumPy / pandas / SciPy sparse.
DMatrix(X, weight=w, base_margin=m)
Per-sample weights, custom initial score.
xgb.QuantileDMatrix(X, y, ref=None)
Pre-binned; faster + lower-memory init for hist.
ExtMemQuantileDMatrix(iterator)
Out-of-core / external-memory, TB-scale (3.0+).
dtrain.save_binary('train.dmatrix')
Cache the preprocessed matrix to disk.

03 Categorical & Missing Data (no manual encoding)

DMatrix(X, enable_categorical=True)★
Auto-detects pandas category dtype.
tree_method must be 'hist' or 'approx' [gotcha]
Native categorical splits need hist or approx; 'exact' does not support them.
missing=np.nan
Default; NaN handled natively — no imputation needed.
Auto-recoding (3.1+)
Booster stores training categories, re-codes new/unseen values at inference automatically.

04 Parameter Dict & Booster Type (what you configure)

params = {'max_depth':6, 'eta':0.3, 'objective':'binary:logistic'}★
Plain dict, or a list of (key, value) pairs.
booster: 'gbtree'
Default — additive regression trees.
booster: 'dart'
gbtree + dropout of trees each round.
booster: 'gblinear'
Linear base learner (deprecated since 3.3).
device: 'cpu' | 'cuda' | 'cuda:0'
Replaces the removed gpu_id/gpu_hist.

05 Tree Growth Control (shape of each tree)

max_depth = 6★
Default; deeper → more complex, more overfit risk.
min_child_weight = 1★
Min sum-Hessian in a child; ↑ = more conservative.
gamma / min_split_loss = 0
Min loss reduction required to split a leaf.
max_leaves = 0
Cap on leaves; used with grow_policy='lossguide'.
grow_policy: 'depthwise' | 'lossguide'
Level-wise (default) vs. best-gain-first.

06 Sampling: Bagging Rows & Cols (fights overfitting)

subsample = 1★
Row-sample ratio per tree; try 0.6–0.9.
colsample_bytree = 1★
Column-sample once per tree.
colsample_bylevel / colsample_bynode
Re-sample columns per depth level / per split.
sampling_method: 'uniform' | 'gradient_based'
gradient_based requires GPU (hist).

07 Learning Rate & Rounds (step size × count)

eta / learning_rate = 0.3★
Shrinkage per round; typical range 0.01–0.3.
num_boost_round / n_estimators
Tree count — pair a lower eta with a higher count.
max_delta_step = 0
Helps logistic regression on imbalanced classes.

08 tree_method & Device (how splits are found)

tree_method: 'hist'★
Default since 2.0; fast histogram binning.
tree_method: 'exact' | 'approx'
Exact = greedy, no binning; approx = global sketch.
device: 'cuda' / 'cuda:0'
One GPU per process, selected by ordinal; pair with tree_method='hist'. Multi-GPU runs go through Dask/Spark (card 26).
gpu_id, gpu_hist, use_gpu [removed]
Deprecated params removed in 3.1 — use device=.

09 Regularization (penalize complexity)

reg_lambda / lambda = 1★
L2 penalty on leaf weights (the Ω term).
reg_alpha / alpha = 0★
L1 penalty; pushes weak leaves toward zero.
scale_pos_weight
Imbalanced binary classes ≈ #neg / #pos.
base_score
Initial prediction; auto-estimated per objective since 3.1.
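The #neg / #pos rule for scale_pos_weight is plain arithmetic; a sketch with a made-up 90/10 label vector:

```python
import numpy as np

# Hypothetical imbalanced labels: 90 negatives, 10 positives.
y = np.array([0] * 90 + [1] * 10)

n_neg = int((y == 0).sum())
n_pos = int((y == 1).sum())

params = {
    "objective": "binary:logistic",
    "scale_pos_weight": n_neg / n_pos,  # 90 / 10
}
print(params["scale_pos_weight"])  # 9.0
```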

10 Objective Functions (what to minimize)

binary:logistic★
Binary classification → probability.
multi:softmax / multi:softprob★
Multiclass; needs num_class.
reg:squarederror / reg:absoluteerror★
MSE / MAE regression.
rank:ndcg / rank:pairwise / rank:map
Learning-to-rank objectives.
survival:cox / survival:aft
Survival analysis.
count:poisson · reg:tweedie
Count data / zero-inflated continuous targets.

11 eval_metric (how to score rounds)

rmse · mae · logloss · error · mlogloss
Standard regression / classification metrics.
auc · aucpr · ndcg@k · map
Ranking-oriented metrics.
eval_metric=['auc','logloss'] [gotcha]
Multiple metrics OK — but the last one drives early stopping.
custom_metric=fn(y_pred, dtrain)
Returns (name, value); pair with maximize=.

12 xgb.train(): Native Loop (the core API)

bst = xgb.train(params, dtrain, num_boost_round=100)★
Returns a fitted Booster.
evals=[(dtrain,'train'), (dval,'eval')]★
Watchlist, printed / logged each round.
early_stopping_rounds=20★
Stop when the last metric on the last evals entry has not improved for 20 rounds.
evals_result={}
Dict populated with per-round metric history.
verbose_eval=10
Print every N rounds (False to silence).

13 Booster Methods (the trained object)

bst.num_boosted_rounds()
Trees actually built.
bst.best_iteration / bst.best_score★
Set whenever early stopping is enabled, even if training runs to the final round.
bst.get_score(importance_type='gain')
Per-feature importance dict.
bst.save_model('model.json')
JSON/UBJSON — see card 25.

14 XGBClassifier (sklearn API)

clf = XGBClassifier(n_estimators=300, max_depth=6, learning_rate=0.05, tree_method='hist')★
Sklearn-compatible estimator.
clf.fit(X_tr, y_tr, eval_set=[(X_val,y_val)])★
early_stopping_rounds passed in the constructor.
clf.predict(X_test)
Class labels.
clf.predict_proba(X_test)
Class probabilities.

15 XGBRegressor (sklearn API)

reg = XGBRegressor(objective='reg:squarederror')★
Default squared-error regression.
reg.fit(X, y)
Standard sklearn fit/predict.
objective='reg:absoluteerror'
MAE — robust to outliers.
objective='reg:quantileerror'
With quantile_alpha=[0.1,0.5,0.9].
objective='reg:pseudohubererror'
Smooth Huber-style loss.

16 Ranker & Random-Forest Variants (specialized estimators)

XGBRanker(objective='rank:ndcg')
Needs qid or group per query.
XGBRFClassifier() / XGBRFRegressor()
Single-round bagged forest, not boosting.
subsample=0.8, colsample_bynode=0.8
RF defaults — bootstrap-style sampling.
n_estimators
Here = trees in the forest (no shrinkage).

17 Scikit-learn Interop (plugs into the ecosystem)

Pipeline([('sc',StandardScaler()),('xgb',clf)])
Drop-in step (trees don't need scaling though).
GridSearchCV(clf, param_grid, cv=5)
Standard hyperparameter search.
clf.get_booster()
Underlying native Booster.
clf.feature_importances_
Array in feature order; gain-based by default for tree boosters.

18 Early Stopping (stop at the right round)

early_stopping_rounds=20★
Stop after N rounds with no improvement.
bst.best_iteration / clf.best_iteration
Round to use at inference time.
requires evals / eval_set [gotcha]
Raises an error if no validation set is supplied to monitor.
maximize=True
Needed for custom metrics where higher is better; built-ins like auc/ndcg are detected automatically.

19 Cross-Validation (robust round count)

xgb.cv(params, dtrain, num_boost_round=500, nfold=5, early_stopping_rounds=20)★
Returns a DataFrame — train/test mean ± std per round.
stratified=True★
Preserve class ratios across folds.
cross_val_score(clf, X, y, cv=5)
Via the sklearn wrapper instead.

20 Callbacks (hook into training)

xgb.callback.EarlyStopping(rounds=20, save_best=True)
Object form of early stopping.
xgb.callback.LearningRateScheduler(fn)
Vary eta across rounds.
xgb.callback.TrainingCheckPoint(directory=...)
Periodic model snapshots.
callbacks=[...]
Pass the list to train(), or to the sklearn estimator's constructor.

21 predict() Options (shape the output)

output_margin=True
Raw score, before the link function.
pred_contribs=True★
SHAP values, one per feature + bias.
pred_interactions=True
Pairwise SHAP interaction values.
iteration_range=(0, bst.best_iteration+1)
Use only the first N trees.

22 Feature Importance (what mattered)

bst.get_score(importance_type='gain')★
Types: weight, gain, cover, total_gain, total_cover.
clf.feature_importances_
Sklearn API array, default gain.
xgb.plot_importance(bst, max_num_features=15)
Quick bar-chart view.

23 SHAP Values (explain a prediction)

bst.predict(dtest, pred_contribs=True)★
Native, exact SHAP per feature + bias column.
shap.TreeExplainer(bst).shap_values(X)
Via the shap library — richer plots.
pred_interactions=True
Pairwise SHAP interaction values.

24 Plotting (see the trees)

xgb.plot_tree(bst, tree_idx=0)
Renders one tree; needs graphviz. (num_trees= is the pre-3.0 spelling.)
xgb.to_graphviz(bst, tree_idx=0)
Returns a Graphviz Source object.
xgb.plot_importance(bst)
Matplotlib importance bar chart.

25 Save / Load Models (persistence)

bst.save_model('model.json')★
JSON, or UBJSON (.ubj, the binary default); both are portable across bindings.
bst.load_model('model.json')
Re-hydrate a Booster.
pickle.dump(clf, f)
Sklearn wrapper — Python-only, less portable.
legacy .model binary format [removed]
Use JSON/UBJSON going forward.

26 GPU & Distributed (scale out)

device='cuda', tree_method='hist'
Single-GPU training.
xgboost.dask · xgboost.spark
Distributed multi-node / multi-GPU training.
DataIter + ExtMemQuantileDMatrix
Stream TB-scale data via external memory (3.0+).

★ Most-Used Defaults (at a glance)

max_depth=6 · eta=0.3 · min_child_weight=1
Usually the first three to tune.
subsample=1 · colsample_bytree=1
Lower for regularization.
gamma=0 · reg_lambda=1 · reg_alpha=0
Raise to fight overfitting.
n_estimators=100 (sklearn) vs num_boost_round=10 (native)
Watch out — the two APIs default differently.

★ objective → default eval_metric (quick lookup)

binary:logistic → logloss
reg:squarederror → rmse
multi:softmax / softprob → mlogloss
rank:ndcg → ndcg
survival:cox → cox-nloglik

Four levers, one goal: generalize (worth memorizing)

eta (shrinkage) · subsample / colsample · max_depth / min_child_weight · gamma / lambda / alpha