- `pip install xgboost` ★: Full package, incl. GPU (CUDA) support.
- `pip install xgboost-cpu`: Smaller wheel; no GPU or federated learning.
- `import xgboost as xgb` ★: Conventional alias.
- `xgb.__version__`: Check installed version (current: 3.2).
- `conda install -c conda-forge py-xgboost`: Conda auto-detects the GPU variant.

- `xgb.DMatrix(X, label=y)` ★: Core input; wraps NumPy / pandas / SciPy sparse.
- `DMatrix(X, weight=w, base_margin=m)`: Per-sample weights, custom initial score.
- `xgb.QuantileDMatrix(X, y, ref=None)`: Pre-binned; faster, lower-memory init for `hist`.
- `ExtMemQuantileDMatrix(iterator)`: Out-of-core / external-memory, TB-scale (3.0+).
- `dtrain.save_binary('train.dmatrix')`: Cache the preprocessed matrix to disk.

- `DMatrix(X, enable_categorical=True)` ★: Auto-detects pandas `category` dtype.
- `tree_method` must be `'hist'` or `'approx'` [gotcha]: Native categorical splits are not available with `exact`.
- `missing=np.nan`: Default; NaN handled natively, no imputation needed.
- Auto-recoding (3.1+): Booster stores training categories and re-codes new/unseen values at inference automatically.

- `params = {'max_depth': 6, 'eta': 0.3, 'objective': 'binary:logistic'}` ★: Plain dict, or a list of (key, value) pairs.
- `booster: 'gbtree'`: Default; additive regression trees.
- `booster: 'dart'`: `gbtree` plus dropout of trees each round.
- `booster: 'gblinear'`: Linear base learner (deprecated since 3.3).
- `device: 'cpu' | 'cuda' | 'cuda:0'`: Replaces the removed `gpu_id` / `gpu_hist`.

- `max_depth = 6` ★: Default; deeper → more complex, more overfit risk.
- `min_child_weight = 1` ★: Min sum-Hessian in a child; higher = more conservative.
- `gamma / min_split_loss = 0`: Min loss reduction required to split a leaf.
- `max_leaves = 0`: Cap on leaves; used with `grow_policy='lossguide'`.
- `grow_policy: 'depthwise' | 'lossguide'`: Level-wise (default) vs. best-gain-first.

- `subsample = 1` ★: Row-sample ratio per tree; try 0.6–0.9.
- `colsample_bytree = 1` ★: Column-sample once per tree.
- `colsample_bylevel / colsample_bynode`: Re-sample columns per depth level / per split.
- `sampling_method: 'uniform' | 'gradient_based'`: `gradient_based` requires GPU (`hist`).

- `eta / learning_rate = 0.3` ★: Shrinkage per round; typical range 0.01–0.3.
- `num_boost_round / n_estimators`: Tree count; pair a lower eta with a higher count.
- `max_delta_step = 0`: Helps logistic regression on imbalanced classes.

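The eta-versus-rounds trade-off can be illustrated with an idealized toy model. It is not XGBoost code: it assumes squared error and a base learner that fits the remaining residual perfectly each round, so only the shrinkage factor matters. The function name is hypothetical.

```python
def residual_after(rounds: int, eta: float, initial_residual: float = 1.0) -> float:
    """Residual left when every round removes a fraction eta of what remains."""
    residual = initial_residual
    for _ in range(rounds):
        residual -= eta * residual  # perfect learner, scaled down by eta
    return residual

fast = residual_after(rounds=10, eta=0.3)   # equals 0.7 ** 10
slow = residual_after(rounds=30, eta=0.1)   # equals 0.9 ** 30
```

In this toy setting a third of the learning rate needs roughly three times the rounds to reach a similar residual, which is why a lower `eta` is usually paired with a higher tree count.
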
- `tree_method: 'hist'` ★: Default since 2.0; fast histogram binning with a sketch computed once before training.
- `tree_method: 'exact' | 'approx'`: Exact = greedy, no binning; approx = quantile sketch recomputed for each tree.
- `device: 'cuda' / 'cuda:0'`: Default GPU / a specific GPU ordinal; pair with `tree_method='hist'`.
- `gpu_id`, `gpu_hist`, `use_gpu` [removed]: Deprecated params removed in 3.1; use `device=`.

- `reg_lambda / lambda = 1` ★: L2 penalty on leaf weights (the Ω term).
- `reg_alpha / alpha = 0` ★: L1 penalty; pushes weak leaves toward zero.
- `scale_pos_weight`: Imbalanced binary classes ≈ #neg / #pos.
- `base_score`: Initial prediction; auto-estimated per objective since 3.1.

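The `#neg / #pos` rule of thumb for `scale_pos_weight` can be computed directly from the labels. The label list below is a hypothetical 90/10 split.

```python
from collections import Counter

# Hypothetical imbalanced labels: 90 negatives, 10 positives.
y = [0] * 90 + [1] * 10

counts = Counter(y)
scale_pos_weight = counts[0] / counts[1]  # 90 / 10 = 9.0

params = {"objective": "binary:logistic", "scale_pos_weight": scale_pos_weight}
```
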
- `binary:logistic` ★: Binary classification → probability.
- `multi:softmax / multi:softprob` ★: Multiclass; needs `num_class`.
- `reg:squarederror / reg:absoluteerror` ★: MSE / MAE regression.
- `rank:ndcg / rank:pairwise / rank:map`: Learning-to-rank objectives.
- `survival:cox / survival:aft`: Survival analysis.
- `count:poisson · reg:tweedie`: Count data / zero-inflated continuous targets.

- `rmse · mae · logloss · error · mlogloss`: Standard regression / classification metrics.
- `auc · aucpr · ndcg@k · map`: Ranking-oriented metrics.
- `eval_metric=['auc','logloss']` [gotcha]: Multiple metrics OK, but the last one drives early stopping.
- `custom_metric=fn(y_pred, dtrain)`: Returns `(name, value)`; pair with `maximize=`.

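A sketch of a custom metric with the `(y_pred, dtrain)` signature described above. It assumes `y_pred` holds probabilities, as with the built-in `binary:logistic` objective. `accuracy_at_half` and `StubMatrix` are hypothetical names; the stub only mimics `get_label()` so the function can be tried without training.

```python
import numpy as np

def accuracy_at_half(y_pred, dtrain):
    """Share of rows where thresholding the prediction at 0.5 matches the label."""
    labels = dtrain.get_label()
    hits = (np.asarray(y_pred) > 0.5) == (np.asarray(labels) > 0.5)
    return "accuracy", float(np.mean(hits))

class StubMatrix:
    """Hypothetical stand-in for xgb.DMatrix, exposing only get_label()."""
    def __init__(self, labels):
        self._labels = np.asarray(labels, dtype=float)

    def get_label(self):
        return self._labels

name, value = accuracy_at_half(np.array([0.9, 0.2, 0.6, 0.4]), StubMatrix([1, 0, 0, 0]))
```

Because higher accuracy is better, this metric would be passed as `custom_metric=accuracy_at_half` together with `maximize=True`.
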
- `bst = xgb.train(params, dtrain, num_boost_round=100)` ★: Returns a fitted `Booster`.
- `evals=[(dtrain,'train'), (dval,'eval')]` ★: Watchlist, printed / logged each round.
- `early_stopping_rounds=20` ★: Stop if last eval metric stalls.
- `evals_result={}`: Dict populated with per-round metric history.
- `verbose_eval=10`: Print every N rounds (`False` to silence).

- `bst.num_boosted_rounds()`: Trees actually built.
- `bst.best_iteration / bst.best_score` ★: Set when early stopping triggers.
- `bst.get_score(importance_type='gain')`: Per-feature importance dict.
- `bst.save_model('model.json')`: JSON/UBJSON; see card 25.

- `clf = XGBClassifier(n_estimators=300, max_depth=6, learning_rate=0.05, tree_method='hist')` ★: Sklearn-compatible estimator.
- `clf.fit(X_tr, y_tr, eval_set=[(X_val,y_val)])` ★: `early_stopping_rounds` passed in the constructor.
- `clf.predict(X_test)`: Class labels.
- `clf.predict_proba(X_test)`: Class probabilities.

- `reg = XGBRegressor(objective='reg:squarederror')` ★: Default squared-error regression.
- `reg.fit(X, y)`: Standard sklearn `fit` / `predict`.
- `objective='reg:absoluteerror'`: MAE; robust to outliers.
- `objective='reg:quantileerror'`: With `quantile_alpha=[0.1,0.5,0.9]`.
- `objective='reg:pseudohubererror'`: Smooth Huber-style loss.

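The quantile objective minimizes the pinball (quantile) loss. A plain-Python sketch of that loss shows why a high `quantile_alpha` pushes predictions upward; `pinball_loss` is a hypothetical helper, not part of XGBoost.

```python
def pinball_loss(y_true, y_pred, alpha):
    """Mean pinball loss: under-prediction costs alpha, over-prediction 1 - alpha."""
    total = 0.0
    for target, pred in zip(y_true, y_pred):
        diff = target - pred
        total += alpha * diff if diff >= 0 else (alpha - 1.0) * diff
    return total / len(y_true)

under = pinball_loss([10.0], [8.0], alpha=0.9)   # predicted too low
over = pinball_loss([10.0], [12.0], alpha=0.9)   # predicted too high
```

With `alpha=0.9`, missing low by 2 costs about nine times as much as missing high by 2, so the fitted value drifts toward the 90th percentile.
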
- `XGBRanker(objective='rank:ndcg')`: Needs `qid` or `group` per query.
- `XGBRFClassifier() / XGBRFRegressor()`: Single-round bagged forest, not boosting.
- `subsample=0.8, colsample_bynode=0.8`: RF defaults; bootstrap-style sampling.
- `n_estimators`: Here = trees in the forest (no shrinkage).

- `Pipeline([('sc',StandardScaler()),('xgb',clf)])`: Drop-in step (trees don't need scaling though).
- `GridSearchCV(clf, param_grid, cv=5)`: Standard hyperparameter search.
- `clf.get_booster()`: Underlying native `Booster`.
- `clf.feature_importances_`: Array aligned to `gain` by default.

- `early_stopping_rounds=20` ★: Stop after N rounds with no improvement.
- `bst.best_iteration / clf.best_iteration`: Round to use at inference time.
- Requires `evals` / `eval_set` [gotcha]: Early stopping needs at least one validation set; without one, training raises an error instead of stopping early.
- `maximize=True`: Flip direction for metrics like AUC/NDCG.

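The patience rule behind `early_stopping_rounds` can be sketched in plain Python. This illustrates the logic only and is not XGBoost's implementation; `early_stop_round` and the metric histories are hypothetical.

```python
def early_stop_round(history, patience, maximize=False):
    """Return (best_round, last_round_run) for a per-round metric history."""
    best_round, best_value = 0, history[0]
    for i, value in enumerate(history):
        improved = value > best_value if maximize else value < best_value
        if i == 0 or improved:
            best_round, best_value = i, value
        elif i - best_round >= patience:
            return best_round, i  # patience exhausted, stop here
    return best_round, len(history) - 1

# Validation logloss bottoms out at round 2, then worsens for 3 rounds.
logloss_stop = early_stop_round([0.60, 0.50, 0.45, 0.46, 0.47, 0.48], patience=3)

# For a metric such as AUC, higher is better, so maximize=True.
auc_stop = early_stop_round([0.70, 0.80, 0.79, 0.78], patience=2, maximize=True)
```
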
- `xgb.cv(params, dtrain, num_boost_round=500, nfold=5, early_stopping_rounds=20)` ★: Returns a DataFrame; train/test mean ± std per round.
- `stratified=True` ★: Preserve class ratios across folds.
- `cross_val_score(clf, X, y, cv=5)`: Via the sklearn wrapper instead.

- `xgb.callback.EarlyStopping(rounds=20, save_best=True)`: Object form of early stopping.
- `xgb.callback.LearningRateScheduler(fn)`: Vary `eta` across rounds.
- `xgb.callback.TrainingCheckPoint(directory=...)`: Periodic model snapshots.
- `callbacks=[...]`: Pass list to `train()` or `.fit()`.

- `output_margin=True`: Raw score, before the link function.
- `pred_contribs=True` ★: SHAP values, one per feature + bias.
- `pred_interactions=True`: Pairwise SHAP interaction values.
- `iteration_range=(0, bst.best_iteration+1)`: Use only the first N trees.

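For `binary:logistic`, the link function that turns a raw margin into a probability is the logistic sigmoid. A plain-Python sketch, with `margin_to_probability` as a hypothetical helper name:

```python
import math

def margin_to_probability(margin: float) -> float:
    """Logistic sigmoid: maps a raw score (log-odds) to a probability."""
    return 1.0 / (1.0 + math.exp(-margin))

neutral = margin_to_probability(0.0)    # log-odds of 0 means even odds
confident = margin_to_probability(3.0)  # large positive margin, close to 1
```
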
- `bst.get_score(importance_type='gain')` ★: Types: `weight`, `gain`, `cover`, `total_gain`, `total_cover`.
- `clf.feature_importances_`: Sklearn API array, default `gain`.
- `xgb.plot_importance(bst, max_num_features=15)`: Quick bar-chart view.

- `bst.predict(dtest, pred_contribs=True)` ★: Native, exact SHAP per feature + bias column.
- `shap.TreeExplainer(bst).shap_values(X)`: Via the `shap` library; richer plots.
- `pred_interactions=True`: Pairwise SHAP interaction values.

- `xgb.plot_tree(bst, tree_idx=0)`: Renders one tree; needs `graphviz` (the argument was named `num_trees` before 3.0).
- `xgb.to_graphviz(bst, tree_idx=0)`: Returns a Graphviz `Source` object.
- `xgb.plot_importance(bst)`: Matplotlib importance bar chart.

- `bst.save_model('model.json')` ★: JSON (or `.ubj`, the default binary format); portable across bindings.
- `bst.load_model('model.json')`: Re-hydrate a Booster.
- `pickle.dump(clf, f)`: Sklearn wrapper; Python-only, less portable.
- Legacy `.model` binary format [removed]: Use JSON/UBJSON going forward.

- `device='cuda', tree_method='hist'`: Single-GPU training.
- `xgboost.dask · xgboost.spark`: Distributed multi-node / multi-GPU training.
- `DataIter + ExtMemQuantileDMatrix`: Stream TB-scale data via external memory (3.0+).

- `max_depth=6 · eta=0.3 · min_child_weight=1`: Usually the first three to tune.
- `subsample=1 · colsample_bytree=1`: Lower for regularization.
- `gamma=0 · reg_lambda=1 · reg_alpha=0`: Raise to fight overfitting.
- `n_estimators=100` (sklearn) vs `num_boost_round=10` (native): Watch out; the two APIs default differently.

- `binary:logistic` → `logloss`
- `reg:squarederror` → `rmse`
- `multi:softmax / softprob` → `mlogloss`
- `rank:ndcg` → `ndcg`
- `survival:cox` → `cox-nloglik`