pip install xgboost ★ | Full package, incl. GPU (CUDA) support.
pip install xgboost-cpu | Smaller wheel; no GPU or federated learning.
import xgboost as xgb ★ | Conventional alias.
xgb.__version__ | Check installed version (current: 3.2).
conda install -c conda-forge py-xgboost | Conda auto-detects the GPU variant.
with xgb.config_context(verbosity=0): ... | Scoped global config (silence warnings, etc.).
xgb.set_config(verbosity=2) / xgb.get_config() | Set / inspect global configuration.
verbosity: 0–3 | Per-booster: silent, warning, info, debug.
validate_parameters=True | Warn on unknown params (default in Python).
seed / random_state · seed_per_iteration | Reproducibility; re-seed sampling each round.
nthread / n_jobs | Thread count; default = all available cores.
xgb.DMatrix(X, label=y) ★ | Core input; wraps NumPy / pandas / SciPy sparse.
DMatrix(X, weight=w, base_margin=m) | Per-sample weights, custom initial score.
DMatrix(X, feature_names=[...], feature_types=[...]) | Explicit column labels/types (auto from pandas).
DMatrix(X, feature_weights=fw) | Per-feature selection probability for colsample.
dtrain.num_row() · dtrain.num_col() · dtrain.slice(idx) | Inspect dimensions; row-subset a DMatrix.
dtrain.save_binary('train.dmatrix') | Cache the preprocessed matrix to disk.
DMatrix('train.dmatrix') | Reload the cached binary; skips re-parsing.
xgb.QuantileDMatrix(X, y) ★ | Pre-binned for hist; faster, with far lower memory use.
QuantileDMatrix(X_val, y_val, ref=dtrain) [gotcha] | Validation sets MUST pass ref=; 3.0+ errors without it.
ExtMemQuantileDMatrix(data_iter) [3.0+] | Out-of-core streaming; TB-scale on one machine.
class MyIter(xgb.DataIter): def next(self, input_data)... | Custom batch iterator feeding external memory.
max_quantile_batches · min_cache_page_bytes | Sketching / cache-page tuning knobs.
cache_host_ratio [3.1+] | Split GPU external-memory cache between host/device RAM.
np.ndarray · pd.DataFrame ★ | The everyday inputs; pandas dtypes are honored.
scipy.sparse.csr_matrix / csc_matrix | Sparse input. Absent (unstored) entries are treated as missing; in dense input, zeros are ordinary values, NOT missing.
pyarrow.Table | Zero-copy Arrow ingestion.
polars.DataFrame / LazyFrame [3.0+] | Native Polars support (categoricals from 3.1).
cudf.DataFrame · cupy.ndarray | GPU-resident inputs; no host round-trip.
text files (libsvm/csv URIs) [warns] | Text input is discouraged since 3.1; load via a DataFrame library instead.
DMatrix(X, enable_categorical=True) ★ | Auto-detects pandas/Polars category dtype.
XGBClassifier(enable_categorical=True) | Same switch on the sklearn wrapper.
tree_method must be 'hist'/'approx' [gotcha] | Native categorical splits need histogram methods.
max_cat_to_onehot | Below this cardinality, use one-hot style splits.
max_cat_threshold | Cap categories considered per partition split.
Auto-recoding [3.1+] | Booster stores training categories (incl. strings), re-codes new/unseen values at inference automatically.
bst.get_categories() / dtrain.get_categories() [3.1+] | Export the stored category index (Arrow-friendly).
missing=np.nan ★ | Default; NaN handled natively, no imputation needed.
DMatrix(X, missing=-999) | Treat a sentinel value as missing instead.
learned default direction | Each split learns which branch missing values take (the paper's "sparsity-aware" algorithm).
imputing to 0 or mean first [anti-pattern] | Usually hurts: it hides the missingness signal XGBoost would otherwise exploit.