Comprehensive Quick Reference · tabular data analysis in Python

pandas cheat sheet v2 · all modules

A single source of truth across the whole pandas surface: the Series/DataFrame core, the .str / .dt / .cat accessors, groupby, window & resample, MultiIndex, reshaping, plotting, the Styler, nullable & Arrow dtypes, top-level functions, and options. It all rests on one idea: every DataFrame carries an index, and .loc (labels) vs .iloc (positions) is how you reach into it.

create / load inspect select & filter shape & combine clean & transform aggregate / stats index plot / style gotcha most common

Validated against the official pandas 3.x API reference (pandas.pydata.org) — I/O, General functions, Series/DataFrame, Accessors, GroupBy, Window, Resampling, Style, Plotting, Extensions & Options; cross-checked with the official PDF cheat sheet, Dataquest, DataCamp, Coursera & GeeksforGeeks. v2 gap-analysis edition.

The pandas surface — two objects, many accessors & modules
Series (1D) one labeled column DataFrame (2D) table of Series + Index / MultiIndex column accessors .str .dt .cat .plot operation modules (returned by method calls) .groupby()GroupBy .rolling / .ewmWindow .resample()Resampler .styleStyler pd.* funcsmerge/cut/melt dtypes: NumPy + nullableInt64 · string · category · Arrow pd.set_option()display / compute settings pd.api / testingtype checks · extensions everything is built on the dtype system, tuned via options, and extendable via the public api

I · Core: Load, Create & Inspect I/O · constructors · the first look

01Setup & Optionscreate · inspect
02Reading Datacreate · pd.read_*
03Writing Datacreate · df.to_*
04Creating Series & DataFramecreate
05Inspect & Exploreinspect

II · Select, Filter & Sort loc/iloc · boolean masks · query · eval

06Selecting Columns & Rowsselect & filter
07Boolean Indexing, Query & Evalselect & filter
08Sorting & Top-Nselect & filter

III · Clean & Transform missing data · dtypes · apply/map · accessors

09Missing Dataclean & transform
10Cleaning & Type Conversionclean & transform
11Function Application & Iterationclean & transform
12String Methods.str accessor
13Dates & Times.dt accessor
14Categorical Data.cat accessor

IV · Reshape, Combine & Index add/drop · pivot · concat · merge · MultiIndex

15Adding & Droppingshape & combine
16GroupBy & Aggregationaggregate / stats
17Pivoting & Reshapingshape & combine
18Combining: Concatenateshape & combine
19Combining: Merge & Joinshape & combine
20MultiIndex & Index Objectsidx

V · Statistics, Window & Time Series describe · rolling/ewm · resample · binning

21Statistics & Descriptiveaggregate / stats
22Window Operationsrolling · expanding · ewm
23Time Series & Resamplingaggregate / stats
24Binning & Top-Level Helperspd.* general functions

VI · Visualize, Style, Extend & Tune plotting · Styler · Arrow dtypes · api · gotchas

25Plottingviz · .plot
26Styling Outputviz · df.style
27Nullable & Arrow Dtypesextension types
28Type Checks, Testing & APIpd.api · pd.testing
29Performance & Gotchashandle with care
Common dtypesanywhere you see dtype:
Merge how= Quick-Readpd.merge(..., how=)
.loc vs .iloc Quick-Readthe #1 confusion

Split-apply-combine & the reshape family

The two mental models that unlock most of pandas: how groupby actually works, and how the reshape verbs move data between long and wide. Based on the official pandas user-guide diagrams.

groupby: split → apply → combine ★

Rows are split by key, a function runs per group, and results recombine into one output — the model behind agg, transform & filter.

A 1 / A 3 / B 2 / B 4 inputdf group A1, 3 group B2, 4 mean=2 mean=3 A 2B 3 split apply combine

The reshape verbs ★

melt/stack go wide→long; pivot/unstack go long→wide; explode unpacks list cells; pivot_table aggregates on the way.

WIDE idJanFeb a12 b34 LONG idmonval aJan1 aFeb2 bJan3 bFeb4 melt / stack → ← pivot / unstack pivot_table also aggregates duplicate keys

Worth memorizing

.loc vs .ilocloc = labels & inclusive; iloc = positions & exclusive
chained indexingdf[mask]['c']=x may not write — use df.loc[mask,'c']=x
named aggregationagg(new=('col','func')) gives clean, flat output columns
apply vs map vs transformmap=elementwise; apply=row/col func; transform=keeps shape
melt vs pivotmelt goes wide→long; pivot/pivot_table go long→wide
merge vs joinmerge joins on columns; join defaults to the index
category & nullable dtypeshuge memory savings on repeated strings / integer columns
NaN != NaNuse isna()/notna(), never ==, for missing checks
apply(axis=1) is slowvectorized column math beats row-wise apply 10×+
pandas 3.0 defaultsPyArrow-backed string dtype & Copy-on-Write are now on
accessors.str .dt .cat unlock vectorized text/date/category ops
resample = time groupbyneeds a datetime index; ME/QE/YE are period-end aliases