pip install xgboost  # full package incl. GPU (CUDA) support
pip install xgboost-cpu  # smaller wheel, no GPU / federated learning
import xgboost as xgb  # conventional alias
xgb.__version__  # check installed version (current: 3.2)
conda install -c conda-forge py-xgboost  # conda auto-detects GPU variant

xgb.DMatrix(X, label=y)  # core input; wraps NumPy / pandas / SciPy sparse
DMatrix(X, weight=w, base_margin=m)  # per-sample weights, custom initial score
xgb.QuantileDMatrix(X, y, ref=None)  # pre-binned; faster + lower memory init for hist
ExtMemQuantileDMatrix(iterator)  # out-of-core / external-memory, TB-scale (3.0+)
dtrain.save_binary('train.dmatrix')  # cache preprocessed matrix to disk

DMatrix(X, enable_categorical=True)  # auto-detects pandas category dtype
tree_method='hist'  # required for native categorical splits
missing=np.nan  # default; NaN handled natively (learned split direction)
Auto-recoding (3.1+)  # Booster stores training categories and re-codes new/unseen values at inference automatically

params = {'max_depth': 6, 'eta': 0.3, 'objective': 'binary:logistic'}  # plain dict or list of (key, value) pairs
booster: 'gbtree'  # default; additive regression trees
booster: 'dart'  # gbtree + dropout of trees each round
booster: 'gblinear'  # linear base learner (deprecated since 3.3)
device: 'cpu' | 'cuda' | 'cuda:0'  # replaces removed gpu_id / gpu_hist

bst = xgb.train(params, dtrain, num_boost_round=100)  # returns a fitted Booster
evals=[(dtrain, 'train'), (dval, 'eval')]  # watchlist, printed / logged each round
early_stopping_rounds=20  # stop if the last eval metric stalls
evals_result={}  # dict populated with per-round metric history
verbose_eval=10  # print every N rounds (False to silence)

bst.predict(dtest)  # inference on a DMatrix
bst.save_model('model.json')  # JSON/UBJSON; see the save/load card
bst.get_score(importance_type='gain')  # per-feature importance dict
bst.num_boosted_rounds()  # number of boosting rounds in the model
bst.best_iteration / bst.best_score  # set when early stopping is used

output_margin=True  # raw score, before the link function
pred_contribs=True  # SHAP values, one per feature + bias
pred_interactions=True  # pairwise SHAP interaction values
iteration_range=(0, bst.best_iteration + 1)  # use only the first N boosting rounds

clf = XGBClassifier(n_estimators=300, max_depth=6, learning_rate=0.05, tree_method='hist')  # sklearn-compatible estimator
clf.fit(X_tr, y_tr, eval_set=[(X_val, y_val)])  # early_stopping_rounds is passed in the constructor
clf.predict(X_test)  # class labels
clf.predict_proba(X_test)  # class probabilities

reg = XGBRegressor(objective='reg:squarederror')  # default squared-error regression
reg.fit(X, y)  # standard sklearn fit / predict
objective='reg:absoluteerror'  # MAE; robust to outliers
objective='reg:quantileerror'  # with quantile_alpha=[0.1, 0.5, 0.9]
objective='reg:pseudohubererror'  # smooth Huber-style loss

XGBRanker(objective='rank:ndcg')  # needs qid or group per query
XGBRFClassifier() / XGBRFRegressor()  # single-round bagged forest, not boosting
subsample=0.8, colsample_bynode=0.8  # RF defaults, bootstrap-style sampling
n_estimators  # here = trees in the forest (no shrinkage)

Pipeline([('sc', StandardScaler()), ('xgb', clf)])  # drop-in pipeline step (tree models do not need feature scaling)
GridSearchCV(clf, param_grid, cv=5)  # standard hyperparameter search
clf.get_booster()  # underlying native Booster
clf.feature_importances_  # array, gain-based by default

max_depth = 6  # default; deeper → more complex, more overfit risk
min_child_weight = 1  # min sum of Hessian in a child; ↑ = more conservative
gamma / min_split_loss = 0  # min loss reduction required to split a leaf
max_leaves = 0  # cap on leaves, used with grow_policy='lossguide'
grow_policy: 'depthwise' | 'lossguide'  # level-wise (default) vs best-gain-first

subsample = 1  # row-sample ratio per tree; try 0.6–0.9
colsample_bytree = 1  # column-sample once per tree
colsample_bylevel / colsample_bynode  # re-sample columns per depth level / per split
sampling_method: 'uniform' | 'gradient_based'  # gradient-based needs GPU (hist)

eta / learning_rate = 0.3  # shrinkage per round; typical range 0.01–0.3
num_boost_round / n_estimators  # number of boosting rounds; pair a lower eta with a higher count
max_delta_step = 0  # helps logistic regression on imbalanced classes

tree_method: 'hist'  # default since 2.0; fast histogram binning, needed for categorical / GPU
tree_method: 'exact' | 'approx'  # exact = greedy, no binning; approx = quantile sketch recomputed every iteration
device: 'cuda' / 'cuda:0'  # GPU training; 'cuda:<ordinal>' picks a specific device; combine with tree_method='hist'
multi_strategy: 'one_output_per_tree' | 'multi_output_tree'  # vector-leaf multi-target trees

reg_lambda / lambda = 1  # L2 penalty on leaf weights (Ω term)
reg_alpha / alpha = 0  # L1 penalty; pushes weak leaves toward 0
scale_pos_weight  # imbalanced binary classes ≈ #neg / #pos
base_score  # initial prediction; auto-estimated per objective since 3.1

binary:logistic  # binary classification → probability
multi:softmax / multi:softprob  # multiclass; needs num_class
reg:squarederror / reg:absoluteerror  # MSE / MAE regression
rank:ndcg / rank:pairwise / rank:map  # learning-to-rank objectives
survival:cox / survival:aft  # survival analysis
count:poisson · reg:tweedie  # count data / zero-inflated continuous

rmse · mae · logloss · error · mlogloss  # standard regression / classification metrics
auc · aucpr · ndcg@k · map  # ranking-oriented metrics
eval_metric=['auc', 'logloss']  # multiple metrics; the last one drives early stopping
custom_metric=fn(y_pred, dtrain)  # returns (name, value); use with maximize=

early_stopping_rounds=20  # stop after N rounds with no improvement
bst.best_iteration / clf.best_iteration  # round to use at inference time
requires evals / eval_set  # at least one validation watch pair
maximize=True  # flip direction for metrics like AUC / NDCG

xgb.cv(params, dtrain, num_boost_round=500, nfold=5, early_stopping_rounds=20, seed=42)  # returns a DataFrame: train/test mean ± std per round
stratified=True  # preserve class ratios across folds
cross_val_score(clf, X, y, cv=5)  # via the sklearn wrapper instead

xgb.callback.EarlyStopping(rounds=20, save_best=True)  # object form of early stopping
xgb.callback.LearningRateScheduler(fn)  # vary eta across rounds
xgb.callback.TrainingCheckPoint(directory=...)  # periodic model snapshots
callbacks=[...]  # pass the list to train() or to the sklearn estimator constructor

bst.get_score(importance_type='gain')  # types: weight, gain, cover, total_gain, total_cover
clf.feature_importances_  # sklearn API array, default gain
xgb.plot_importance(bst, max_num_features=15)  # quick bar-chart view

bst.predict(dtest, pred_contribs=True)  # native, exact SHAP per feature + bias column
shap.TreeExplainer(bst).shap_values(X)  # via the shap library; richer plots
pred_interactions=True  # pairwise SHAP interaction values

xgb.plot_tree(bst, num_trees=0)  # renders one tree; needs graphviz
xgb.to_graphviz(bst, num_trees=0)  # returns a Graphviz Source object
xgb.plot_importance(bst)  # matplotlib importance bar chart

bst.save_model('model.json')  # JSON, or .ubj (UBJSON, the default binary format); portable across bindings
bst.load_model('model.json')  # re-hydrate a Booster
pickle.dump(clf, f)  # sklearn wrapper; Python-only, less portable
legacy .model binary format  # removed; use JSON / UBJSON going forward

device='cuda', tree_method='hist'  # single-GPU training
xgboost.dask · xgboost.spark  # distributed multi-node / multi-GPU training
DataIter + ExtMemQuantileDMatrix  # stream TB-scale data via external memory (3.0+)

max_depth=6 · eta=0.3 · min_child_weight=1  # defaults; usually the first three to tune
subsample=1 · colsample_bytree=1  # defaults; lower for regularization
gamma=0 · reg_lambda=1 · reg_alpha=0  # defaults; raise to fight overfitting
n_estimators=100 (sklearn) · num_boost_round=10 (native)  # watch out: the two APIs default differently

binary:logistic → logloss
reg:squarederror → rmse
multi:softmax / softprob → mlogloss
rank:ndcg → ndcg
survival:cox → cox-nloglik