XGBoost (Python) Cheat Sheet

01

Install & importsetup

pip install xgboostfull package incl. GPU (CUDA) support
pip install xgboost-cpusmaller wheel, no GPU / federated learning
import xgboost as xgbconventional alias
xgb.__version__check installed version (current: 3.2)
conda install -c conda-forge py-xgboostconda auto-detects GPU variant

02

DMatrix: native data container ★ [setup]

xgb.DMatrix(X, label=y)  # core input; wraps NumPy / pandas / SciPy sparse
xgb.DMatrix(X, weight=w, base_margin=m)  # per-sample weights, custom initial score
xgb.QuantileDMatrix(X, y, ref=None)  # pre-binned; faster, lower-memory init for hist; pass ref=dtrain for validation data
xgb.ExtMemQuantileDMatrix(iterator)  # out-of-core / external memory, TB-scale (3.0+)
dtrain.save_binary('train.dmatrix')  # cache preprocessed matrix to disk

03

Categorical & missing values [setup]

xgb.DMatrix(X, enable_categorical=True)  # auto-detects pandas category dtype
tree_method='hist'  # default; native categorical splits need 'hist' (or 'approx'), not 'exact'
missing=np.nan  # default; NaN handled natively (learned default split direction)
Auto-recoding (3.1+)  # Booster stores training categories and re-codes new/unseen values at inference automatically

04

Parameters dict & booster type ★ [core]

params = {'max_depth': 6, 'eta': 0.3, 'objective': 'binary:logistic'}  # plain dict or list of (key, value) pairs
booster: 'gbtree'  # default: additive regression trees
booster: 'dart'  # gbtree + dropout of trees each round
booster: 'gblinear'  # linear base learner (deprecated since 3.3)
device: 'cpu' | 'cuda' | 'cuda:0'  # replaces removed gpu_id / gpu_hist

05

xgb.train(): training loop ★ [core]

bst = xgb.train(params, dtrain, num_boost_round=100)  # returns a fitted Booster
evals=[(dtrain, 'train'), (dval, 'eval')]  # watchlist; metrics printed / logged each round
early_stopping_rounds=20  # stop when the last metric on the last evals pair hasn't improved for 20 rounds
evals_result={}  # dict populated with per-round metric history
verbose_eval=10  # print every N rounds (False to silence)

06

Booster methods [core]

bst.predict(dtest)  # inference on a DMatrix
bst.save_model('model.json')  # JSON / UBJSON; see card 25
bst.get_score(importance_type='gain')  # per-feature importance dict
bst.num_boosted_rounds()  # boosting rounds actually run (multiclass builds one tree per class per round)
bst.best_iteration / bst.best_score  # set when early stopping is enabled

07

predict() options [core]

output_margin=True  # raw score, before the link function
pred_contribs=True  # SHAP values, one per feature + bias
pred_interactions=True  # pairwise SHAP interaction values
iteration_range=(0, bst.best_iteration + 1)  # use only the first N boosting rounds (half-open range)

08

XGBClassifier ★ [sklearn]

clf = XGBClassifier(n_estimators=300, max_depth=6, learning_rate=0.05, tree_method='hist')  # sklearn-compatible estimator
clf.fit(X_tr, y_tr, eval_set=[(X_val, y_val)])  # early_stopping_rounds goes in the constructor, not fit()
clf.predict(X_test)  # class labels
clf.predict_proba(X_test)  # class probabilities

09

XGBRegressor ★ [sklearn]

reg = XGBRegressor(objective='reg:squarederror')  # default squared-error regression
reg.fit(X, y)  # standard sklearn fit / predict
objective='reg:absoluteerror'  # MAE; robust to outliers
objective='reg:quantileerror'  # with quantile_alpha=[0.1, 0.5, 0.9]
objective='reg:pseudohubererror'  # smooth Huber-style loss

10

Ranker & random-forest variants [sklearn]

XGBRanker(objective='rank:ndcg')  # needs qid or group per query
XGBRFClassifier() / XGBRFRegressor()  # single-round bagged forest, not boosting
subsample=0.8, colsample_bynode=0.8  # RF defaults; rows are sampled without replacement (not a true bootstrap)
n_estimators  # here = trees in the forest (learning_rate=1, no shrinkage)

11

Scikit-learn interop [sklearn]

Pipeline([('sc', StandardScaler()), ('xgb', clf)])  # drop-in step (trees don't need scaling, though)
GridSearchCV(clf, param_grid, cv=5)  # standard hyperparameter search
clf.get_booster()  # underlying native Booster
clf.feature_importances_  # array with one value per feature; gain-based by default for tree boosters

12

Tree growth control ★ [tree]

max_depth = 6  # default; deeper → more complex, more overfitting risk
min_child_weight = 1  # min sum of Hessian in a child; ↑ = more conservative
gamma / min_split_loss = 0  # min loss reduction required to split a leaf
max_leaves = 0  # cap on leaves (0 = no limit), used with grow_policy='lossguide'
grow_policy: 'depthwise' | 'lossguide'  # level-wise (default) vs best-gain-first
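The way gamma, reg_lambda and min_child_weight enter a split decision can be sketched in plain Python. It follows the gain formula in the XGBoost docs' "Introduction to Boosted Trees". GL/HL and GR/HR are the gradient and Hessian sums of the left and right child. This is a simplified illustration, not the library's split finder, and a split whose value is not positive would be pruned.

```python
def split_gain(GL, HL, GR, HR, reg_lambda=1.0, gamma=0.0, min_child_weight=1.0):
    """Loss reduction of a candidate split (simplified sketch)."""
    # Children whose Hessian sum is below min_child_weight are not allowed
    if HL < min_child_weight or HR < min_child_weight:
        return float("-inf")

    def score(G, H):
        return G * G / (H + reg_lambda)

    parent = score(GL + GR, HL + HR)
    return 0.5 * (score(GL, HL) + score(GR, HR) - parent) - gamma


print(split_gain(-4.0, 2.0, 4.0, 2.0))             # ≈ 5.333 (= 16/3)
print(split_gain(-4.0, 2.0, 4.0, 2.0, gamma=6.0))  # negative: split rejected
```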

13

Sampling (prevents overfitting) ★ [tree]

subsample = 1  # row-sample ratio per tree; try 0.6–0.9
colsample_bytree = 1  # column sample drawn once per tree
colsample_bylevel / colsample_bynode  # re-sample columns per depth level / per split (the colsample_by* ratios multiply)
sampling_method: 'uniform' | 'gradient_based'  # gradient_based needs GPU (device='cuda', hist)

14

Learning-rate control [tree]

eta / learning_rate = 0.3  # shrinkage per round; typical range 0.01–0.3
num_boost_round / n_estimators  # number of rounds; pair a lower eta with a higher count
max_delta_step = 0  # caps each leaf's output; values of 1–10 can help logistic regression on highly imbalanced classes
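Why does a lower eta need more rounds? A toy model shows it. Assume each "tree" predicts the current residual exactly, and eta shrinks that prediction before it is added. Real trees are far less accurate, so this only illustrates the trade-off.

```python
def rounds_to_fit(eta, target=1.0, tol=0.01):
    """Toy boosting: each round adds eta * (exact residual)."""
    pred, rounds = 0.0, 0
    while abs(target - pred) > tol * abs(target):
        residual = target - pred
        pred += eta * residual  # shrunken update
        rounds += 1
    return rounds


for eta in (0.3, 0.1, 0.05):
    print(eta, rounds_to_fit(eta))
```

The remaining residual after n rounds is (1 − eta)^n. Getting within 1% of the target therefore takes 13 rounds at eta=0.3, 44 at eta=0.1 and 90 at eta=0.05.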

15

tree_method & device [tree]

tree_method: 'hist'  # default since 2.0; fast histogram binning; the usual choice for categorical / GPU
tree_method: 'exact' | 'approx'  # exact = greedy over all candidate splits, no binning; approx = quantile sketch rebuilt each iteration
device: 'cuda' / 'cuda:0'  # run on one (specific) GPU; combine with tree_method='hist'; multi-GPU goes through Dask / Spark
multi_strategy: 'one_output_per_tree' | 'multi_output_tree'  # 'multi_output_tree' = vector-leaf multi-target trees

16

Regularization ★ [reg]

reg_lambda / lambda = 1  # L2 penalty on leaf weights (Ω term)
reg_alpha / alpha = 0  # L1 penalty; pushes weak leaves toward 0
scale_pos_weight  # imbalanced binary classes ≈ #neg / #pos
base_score  # initial prediction; auto-estimated per objective since 3.1
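The sketch shows how reg_lambda and reg_alpha change the optimal leaf value −G / (H + λ). It assumes the soft-thresholding treatment of alpha used in XGBoost's source, and it ignores max_delta_step and min_child_weight. It also includes the common #neg / #pos heuristic for scale_pos_weight. Both helpers are hypothetical illustrations, not library functions.

```python
import numpy as np


def scale_pos_weight(y):
    """Common heuristic: (# negative) / (# positive) for 0/1 labels."""
    y = np.asarray(y)
    return float(np.sum(y == 0)) / float(np.sum(y == 1))


def leaf_weight(G, H, reg_lambda=1.0, reg_alpha=0.0):
    """Leaf value with L2 (lambda) and L1 (alpha) penalties (sketch)."""
    # L1 soft-thresholds the gradient sum: a small |G| collapses to 0
    if G > reg_alpha:
        g = G - reg_alpha
    elif G < -reg_alpha:
        g = G + reg_alpha
    else:
        return 0.0
    return -g / (H + reg_lambda)


print(leaf_weight(-2.0, 3.0))                  # 0.5
print(leaf_weight(-2.0, 3.0, reg_lambda=5.0))  # 0.25: larger lambda shrinks the leaf
print(leaf_weight(-2.0, 3.0, reg_alpha=3.0))   # 0.0: weak leaf zeroed by L1
print(scale_pos_weight([0, 0, 0, 1]))          # 3.0
```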

17

Objective functions ★ [reg]

binary:logistic  # binary classification → probability
multi:softmax / multi:softprob  # multiclass (class label / per-class probabilities); needs num_class
reg:squarederror / reg:absoluteerror  # MSE / MAE regression
rank:ndcg / rank:pairwise / rank:map  # learning-to-rank objectives
survival:cox / survival:aft  # survival analysis
count:poisson · reg:tweedie  # count data / zero-inflated continuous

18

eval_metric [reg]

rmse · mae · logloss · error · mlogloss  # standard regression / classification metrics
auc · aucpr · ndcg@k · map  # ranking-quality metrics (AUC / AUC-PR for classifiers, NDCG / MAP for learning-to-rank)
eval_metric=['auc', 'logloss']  # multiple metrics; the last one drives early stopping
custom_metric=fn(predt, dtrain)  # returns (name, value); use with maximize=

19

Early stopping ★ [train]

early_stopping_rounds=20  # stop after N rounds with no improvement
bst.best_iteration / clf.best_iteration  # round to use at inference time
requires evals / eval_set  # at least one validation watch pair
maximize=True  # flip direction for custom metrics; built-in auc / aucpr / ndcg / map are maximized automatically

20

Cross-validation ★ [train]

xgb.cv(params, dtrain, num_boost_round=500, nfold=5, early_stopping_rounds=20, seed=42)  # returns a DataFrame: train/test mean ± std per round
stratified=True  # preserve class ratios across folds
cross_val_score(clf, X, y, cv=5)  # via the sklearn wrapper instead

21

Callbacks [train]

xgb.callback.EarlyStopping(rounds=20, save_best=True)  # object form of early stopping
xgb.callback.LearningRateScheduler(fn)  # vary eta across rounds
xgb.callback.TrainingCheckPoint(directory=...)  # periodic model snapshots
callbacks=[...]  # pass a list to train() or to the sklearn constructor

22

Feature importance ★ [post]

bst.get_score(importance_type='gain')  # types: weight, gain, cover, total_gain, total_cover
clf.feature_importances_  # sklearn API array, default gain
xgb.plot_importance(bst, max_num_features=15)  # quick bar-chart view

23

SHAP values ★ [post]

bst.predict(dtest, pred_contribs=True)  # native, exact SHAP per feature + bias column
shap.TreeExplainer(bst).shap_values(X)  # via the shap library; richer plots
bst.predict(dtest, pred_interactions=True)  # pairwise SHAP interaction values

24

Plotting [post]

xgb.plot_tree(bst, num_trees=0)  # renders one tree; needs graphviz
xgb.to_graphviz(bst, num_trees=0)  # returns a Graphviz Source object
xgb.plot_importance(bst)  # matplotlib importance bar chart; defaults to importance_type='weight'

25

Save / load models [post]

bst.save_model('model.json')  # '.json' → text JSON; '.ubj' → binary UBJSON (the default format); portable across bindings
bst.load_model('model.json')  # re-hydrate a Booster
pickle.dump(clf, f)  # sklearn wrapper; Python-only, less portable
legacy .model binary format  # removed; use JSON / UBJSON going forward

26

GPU & distributed [post]

device='cuda', tree_method='hist'  # single-GPU training
xgboost.dask · xgboost.spark  # distributed multi-node / multi-GPU training
DataIter + ExtMemQuantileDMatrix  # stream TB-scale data via external memory (3.0+)

★

Most-used defaults, at a glance [quick read]

max_depth=6 · eta=0.3 · min_child_weight=1  # defaults; usually the first three to tune
subsample=1 · colsample_bytree=1  # defaults; lower for regularization
gamma=0 · reg_lambda=1 · reg_alpha=0  # defaults; raise to fight overfitting
n_estimators=100 (sklearn) · num_boost_round=10 (native)  # watch out: the two APIs default differently

★

objective → default eval_metric [quick read]

binary:logistic → logloss
reg:squarederror → rmse
multi:softmax / softprob → mlogloss
rank:ndcg → ndcg
survival:cox → cox-nloglik


[Figure: "Signature model: boosting is additive, not voting". Panels: "Why the defaults are shaped the way they are" (1 · split gain, 2 · shrinkage vs. λ, 3 · early stopping, 4 · importance types disagree); "Worth memorizing"]