Pandas Cheat Sheet — Comprehensive Edition

01 Setup & Options (create · inspect)

import pandas as pd★
Universal alias — always pd.
import numpy as np
Almost always imported alongside.
pd.set_option('display.max_columns', None)★
Tune display / compute settings (accepts a dict in 3.0).
with pd.option_context('display.max_rows', 100):
Temporarily override options in a block.
pd.__version__ pd.show_versions()
Version / full environment report.
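
A minimal sketch of the temporary-override pattern above, assuming nothing beyond a stock pandas install. The variable names are illustrative only.

```python
import pandas as pd

# Remember the current global setting so we can show it is restored.
before = pd.get_option("display.max_rows")

with pd.option_context("display.max_rows", 100):
    # Inside the block the override is active.
    inside = pd.get_option("display.max_rows")

# On exit the previous value comes back automatically.
after = pd.get_option("display.max_rows")
print(before, inside, after)
```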

02 Reading Data (create · pd.read_*)

pd.read_csv('f.csv')★
CSV → DataFrame; args: usecols, dtype, parse_dates, chunksize.
pd.read_excel('f.xlsx', sheet_name='S1')★
Read one Excel sheet (or a list / None for all).
pd.read_parquet('f.parquet')★
Columnar Parquet — fast & typed.
pd.read_json / read_html / read_xml
JSON / every <table> on a page / XML.
pd.read_sql(query, conn)
SQL query result (also read_sql_table).
pd.read_csv(..., dtype_backend='pyarrow')★
Read straight into Arrow-backed dtypes.
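
The read_csv arguments listed above can be sketched as follows. The in-memory CSV and its column names are hypothetical stand-ins for 'f.csv'.

```python
import io
import pandas as pd

# Hypothetical in-memory CSV standing in for 'f.csv'.
raw = io.StringIO(
    "id,when,price,note\n"
    "1,2024-01-05,9.5,a\n"
    "2,2024-01-06,12.0,b\n"
)

df = pd.read_csv(
    raw,
    usecols=["id", "when", "price"],  # skip the 'note' column entirely
    dtype={"id": "int32"},            # fix a dtype up front
    parse_dates=["when"],             # parse this column as datetimes
)
print(df.dtypes)
```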

03 Writing Data (create · df.to_*)

df.to_csv('f.csv', index=False)★
index=False skips writing row labels.
df.to_parquet('f.parquet')★
Columnar, compressed, preserves dtypes.
df.to_excel('f.xlsx', sheet_name='S1')
Write to an Excel sheet.
df.to_sql('table', conn, if_exists='replace')
Write to a SQL table.
df.to_dict() df.to_numpy() df.to_markdown()
Convert to other in-memory forms.
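
A small sketch of what index=False changes, using to_csv without a path so the text comes back as a string. The frame is made up for illustration.

```python
import io
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

with_index = df.to_csv()                # first column holds the row labels
without_index = df.to_csv(index=False)  # data columns only

print(with_index.splitlines()[0])       # ,a,b
print(without_index.splitlines()[0])    # a,b

# Round trip: reading the index-free text gives back the same frame.
back = pd.read_csv(io.StringIO(without_index))
```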

04 Creating Series & DataFrame (create)

pd.Series([1,2,3], index=['a','b','c'])★
One labeled 1D array.
pd.DataFrame({'a':[1,2], 'b':[3,4]})★
Dict keys → column names, values → column data.
pd.DataFrame(data, columns=[...], index=[...])
From records with explicit labels.
pd.date_range('2024-01-01', periods=5, freq='D')★
Build a DatetimeIndex.
pd.DataFrame(data).convert_dtypes()
Build a frame, then infer the best nullable dtype for each column.

05 Inspect & Explore (inspect)

df.head(n) df.tail(n)★
First / last n rows (default 5).
df.shape df.info() df.describe()★
Dimensions / dtypes+nulls+memory / summary stats.
df.dtypes df.columns df.index
Per-column types / column & row labels.
df.memory_usage(deep=True)
Per-column memory (deep = true object cost).
df.sample(n) df.nunique() df['c'].unique()
Random rows / distinct counts / distinct values.

06 Selecting Columns & Rows (select & filter)

df['col'] / df[['a','b']]★
One column (Series) / many columns (DataFrame).
df.loc['row', 'col']★
Label-based — inclusive of both slice endpoints.
df.iloc[2, 0]★
Integer-position — exclusive stop, like Python.
df.at['r','c'] df.iat[1,2]
Fast scalar access — by label / by position.
df.loc[:, 'a':'c']
Label slice across columns.
df.filter(like='2024', axis=1)
Select labels by name pattern / regex.

07 Boolean Indexing, Query & Eval (select & filter)

df[df['c'] > 50]★
Rows where a condition is True.
df[(df.a>1) & (df.b<9)]★
Combine masks with & | ~ (not and/or).
df.query('a > 1 and b < 9')★
Same filter, readable string; @var injects locals.
df[df.c.isin([...])] df.c.between(a, b)★
Membership / inclusive range.
df.eval('d = a + b')
Fast in-engine column expression (numexpr).
df.where(cond) df.mask(cond)
where keeps cells where cond is True (others → NaN); mask blanks the cells where cond is True.
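
A sketch of the filters in this section side by side. The frame and the local variable lo are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [9, 8, 7, 6]})

mask = (df["a"] > 1) & (df["b"] < 9)   # parentheses are required around each test
via_mask = df[mask]

lo = 1                                  # hypothetical local, injected with @
via_query = df.query("a > @lo and b < 9")

kept = df["a"].where(df["a"] > 2)       # values failing the test become NaN
blanked = df["a"].mask(df["a"] > 2)     # values passing the test become NaN
```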

08 Sorting & Top-N (select & filter)

df.sort_values('c', ascending=False)★
Sort by one or more columns.
df.sort_values(['a','b'], ascending=[True, False])
Mixed sort direction, per column.
df.sort_index()
Sort by the row index/labels.
df.nlargest(n, 'c') df.nsmallest(n, 'c')★
Top / bottom n by a column — faster than sort+head.
df.rank(method='dense')
Rank values within a column.

09 Missing Data (clean & transform)

df.isna() df.notna() df.isna().sum()★
Mask of missing/present; per-column null count.
df.dropna(subset=['a'], how='any')★
Drop rows/cols with nulls (any/all, thresh, subset).
df.fillna(value) df.fillna({'a': 0})★
Replace nulls — scalar, dict, or per-column stat.
df.ffill() df.bfill()★
Forward / backward fill from neighbors.
df.interpolate()
Fill gaps by interpolating between values.
pd.NA vs np.nan
Nullable dtypes use pd.NA; float NaN is distinct.
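
The missing-data calls above, sketched on a small made-up frame. The "per-column stat" case is shown by passing df.mean() to fillna.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, np.nan, 6.0]})

nulls_per_column = df.isna().sum()      # a: 1, b: 2
rows_with_a = df.dropna(subset=["a"])   # only column 'a' is checked
filled_const = df.fillna({"a": 0})      # per-column constant
filled_mean = df.fillna(df.mean())      # per-column statistic
carried = df.ffill()                    # leading NaNs have nothing to copy from
```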

10 Cleaning & Type Conversion (clean & transform)

df.astype({'a': 'float', 'b': 'Int64'})★
Cast columns (capital Int64 = nullable).
pd.to_numeric(df.c, errors='coerce')★
Parse to numbers; bad values → NaN.
df.rename(columns={'old':'new'})★
Rename columns (or index via index=).
df.replace({1: 'one'})
Replace matching values (dict / regex).
df.duplicated() df.drop_duplicates(subset=['a'])★
Flag / remove duplicate rows.
df.clip(lower, upper) df.round(2)
Clamp outliers / round numerics.

11 Function Application & Iteration (clean & transform)

df['c'].map(func)★
Elementwise transform of a Series.
df.apply(func, axis=0)★
Apply func to each column (axis=0) or each row (axis=1).
df.map(func)
Elementwise over the whole frame (was applymap).
df.pipe(func)★
Chain a whole-DataFrame function fluently.
df.assign(x=lambda d: d.a * d.b)★
Add columns via chained expressions.
for i, row in df.iterrows(): ...  # slow
Row iteration — vectorize first; itertuples is faster.
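
One way the assign / pipe / map / apply calls above fit together. add_total is a hypothetical helper written for this sketch.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

def add_total(frame):
    # Hypothetical helper: returns a new frame with a row-wise total.
    return frame.assign(total=frame["a"] + frame["b"])

result = (
    df.assign(x=lambda d: d.a * d.b)   # new column from existing ones
      .pipe(add_total)                  # whole-frame function inside the chain
)
doubled = df["a"].map(lambda v: v * 2)              # elementwise on one column
col_max = df.apply(lambda col: col.max(), axis=0)   # one value per column
```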

12 String Methods (.str accessor)

df.c.str.lower() / .upper() / .strip()★
Case / whitespace cleanup.
df.c.str.contains('pat') / .startswith(...)★
Boolean substring / prefix / regex match.
df.c.str.replace('a','b', regex=True)★
Find-and-replace (regex opt-in).
df.c.str.split(',', expand=True)★
Split into a list — or new columns.
df.c.str.extract(r'(\d+)')
Pull regex capture groups into columns.
df.c.str.get_dummies('|')
One-hot from delimited strings.
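
A sketch of the .str methods above on made-up labels:

```python
import pandas as pd

s = pd.Series(["  Item-12 ", "item-7", "ITEM-305"])

clean = s.str.strip().str.lower()           # 'item-12', 'item-7', 'item-305'
numbers = clean.str.extract(r"(\d+)")[0]    # first capture group, still strings
parts = clean.str.split("-", expand=True)   # two columns: 0 and 1
has_item = clean.str.startswith("item")
```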

13 Dates & Times (.dt accessor)

pd.to_datetime(df.c, format='%Y-%m-%d')★
Parse strings/ints into datetime64.
df.c.dt.year / .month / .day / .dayofweek★
Extract date components via .dt.
df.c.dt.tz_localize('UTC').dt.tz_convert('Asia/Kolkata')
Attach then shift a timezone.
df.c.dt.to_period('M') df.c.dt.strftime('%b %Y')
Period bucket / format as string.
df.c + pd.Timedelta(days=1)
Date arithmetic with offsets/timedeltas.
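
The .dt calls above, sketched on two hand-picked dates. The timezone step assumes the usual tz database is available to pandas.

```python
import pandas as pd

s = pd.Series(["2024-01-31", "2024-02-29"])
dates = pd.to_datetime(s, format="%Y-%m-%d")

weekdays = dates.dt.dayofweek             # Monday=0 ... Sunday=6
next_day = dates + pd.Timedelta(days=1)   # rolls over the month boundary
months = dates.dt.to_period("M")          # monthly buckets

# Naive timestamps must be localized before they can be converted.
utc = dates.dt.tz_localize("UTC")
kolkata = utc.dt.tz_convert("Asia/Kolkata")   # UTC+05:30
```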

14 Categorical Data (.cat accessor)

df.c = df.c.astype('category')★
Encode repeated strings — big memory win.
df.c.cat.categories df.c.cat.codes★
The category labels / their integer codes.
pd.Categorical(vals, categories=[...], ordered=True)★
Ordered category — enables < > comparisons.
df.c.cat.add_categories / .remove_unused_categories()
Manage the category set.
df.c.cat.reorder_categories([...])
Change ordering (e.g. for sorting/plots).
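
A sketch of what an ordered categorical buys you. The size labels are illustrative.

```python
import pandas as pd

sizes = pd.Series(
    pd.Categorical(["M", "S", "L", "M"], categories=["S", "M", "L"], ordered=True)
)

codes = sizes.cat.codes         # position of each value in the category list
at_least_m = sizes >= "M"       # comparison follows the declared order
ordered = sizes.sort_values()   # S, M, M, L rather than alphabetical
```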

15 Adding & Dropping (shape & combine)

df['new'] = values★
Add a column (length must match rows).
df.insert(1, 'c', values)
Insert a column at a specific position.
df.drop(columns=['c'])★
Drop column(s); rows via index=.
df.pop('c')
Remove & return a column in one step.
df.reset_index(drop=True)★
Fresh 0..n index; drop the old one.

16 GroupBy & Aggregation (aggregate / stats)

df.groupby('c')['x'].mean()★
Split-apply-combine: per-group aggregate.
df.groupby('c').agg(m=('x','mean'), s=('y','sum'))★
Named aggregation — clean output columns.
df.groupby('c').transform('mean')★
Group stat broadcast back to row shape.
df.groupby('c').filter(lambda g: len(g) > 5)
Keep only groups meeting a condition.
df.groupby('c').apply(func)
Arbitrary per-group function (flexible, slower).
gb.size() gb.nunique() gb.cumcount() gb.get_group(k)
Counts / distinct / within-group index / one group.
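
Split-apply-combine in a sketch: named aggregation, transform, and filter on a made-up frame.

```python
import pandas as pd

df = pd.DataFrame(
    {"c": ["u", "u", "v"], "x": [1.0, 3.0, 10.0], "y": [2, 4, 6]}
)

# Named aggregation: output column = (input column, function).
summary = df.groupby("c").agg(m=("x", "mean"), s=("y", "sum"))

# transform returns one value per original row, so it can be assigned back.
df["x_group_mean"] = df.groupby("c")["x"].transform("mean")

# filter keeps whole groups; here only groups with more than one row.
big_groups = df.groupby("c").filter(lambda g: len(g) > 1)
```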

17 Pivoting & Reshaping (shape & combine)

df.pivot_table(index='a', columns='b', values='c', aggfunc='mean')★
Long → wide, aggregating duplicate keys.
df.pivot(index='a', columns='b', values='c')
Long → wide, no aggregation (keys must be unique).
pd.melt(df, id_vars='a', value_vars=[...])★
Wide → long — unpivot columns into rows.
df.explode('list_col')★
One row per element of a list-valued cell.
pd.crosstab(df.a, df.b)
Frequency table of two columns.
df.stack() df.unstack() df.T
Columns ↔ index level / transpose.
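
A sketch of the long → wide → long round trip, plus explode. All data here is invented; note the duplicate (r2, k2) key that forces pivot_table over pivot.

```python
import pandas as pd

long = pd.DataFrame(
    {
        "a": ["r1", "r1", "r2", "r2", "r2"],
        "b": ["k1", "k2", "k1", "k2", "k2"],
        "c": [1.0, 2.0, 3.0, 4.0, 6.0],
    }
)

# (r2, k2) appears twice, so pivot() would raise; pivot_table averages them.
wide = long.pivot_table(index="a", columns="b", values="c", aggfunc="mean")

# melt goes back to long form, one row per (a, b) cell.
back = wide.reset_index().melt(id_vars="a", value_vars=["k1", "k2"], value_name="c")

# explode: one row per list element, other columns repeated.
nested = pd.DataFrame({"id": [1, 2], "list_col": [["x", "y"], ["z"]]})
exploded = nested.explode("list_col")
```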

18 Combining: Concatenate (shape & combine)

pd.concat([df1, df2], ignore_index=True)★
Stack rows (axis=0), renumber the index.
pd.concat([df1, df2], axis=1)
Stack columns side by side, aligned on index.
pd.concat([...], keys=['a','b'])
Tag each source with an outer index level.
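
The three concat variants above, sketched with two tiny made-up frames:

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 2]})
df2 = pd.DataFrame({"a": [3]})

rows = pd.concat([df1, df2], ignore_index=True)           # index renumbered 0..2
tagged = pd.concat([df1, df2], keys=["first", "second"])  # outer index level names the source
cols = pd.concat([df1, df2.rename(columns={"a": "b"})], axis=1)  # aligned on index; gaps become NaN
```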

19 Combining: Merge & Join (shape & combine)

pd.merge(df1, df2, on='key', how='left')★
SQL-style join on a shared column.
pd.merge(df1, df2, left_on='a', right_on='b')
Join on differently-named keys.
df1.join(df2, how='inner')
Merge on the index instead of a column.
pd.merge(df1, df2, on='k', validate='1:m')★
Assert the join cardinality (catches bad keys).
pd.merge_asof(l, r, on='ts', by='id')★
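As-of join: each left row takes the last right row whose key is at or before it (direction='nearest' for the closest key); both frames must be sorted on the key. Classic for time series.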
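
A sketch of the cardinality check and the as-of join. The frames (left, right, trades, quotes) are hypothetical; merge_asof is shown with its default backward direction.

```python
import pandas as pd

left = pd.DataFrame({"key": [1, 2, 3], "l": ["a", "b", "c"]})
right = pd.DataFrame({"key": [2, 3, 3], "r": ["x", "y", "z"]})

# validate='1:m' asserts keys are unique on the left; indicator adds '_merge'.
merged = pd.merge(left, right, on="key", how="left", validate="1:m", indicator=True)

# Swapping the frames makes the left side non-unique, so the check fails.
try:
    pd.merge(right, left, on="key", validate="1:m")
    check_failed = False
except pd.errors.MergeError:
    check_failed = True

trades = pd.DataFrame(
    {"ts": pd.to_datetime(["2024-01-01 10:00:05", "2024-01-01 10:00:12"]),
     "id": ["A", "A"]}
)
quotes = pd.DataFrame(
    {"ts": pd.to_datetime(["2024-01-01 10:00:00", "2024-01-01 10:00:10",
                           "2024-01-01 10:00:11"]),
     "id": ["A", "A", "B"],
     "bid": [1.0, 2.0, 9.0]}
)

# Each trade picks the latest quote at or before it, matched within the same id.
asof = pd.merge_asof(trades, quotes, on="ts", by="id")
```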

20 MultiIndex & Index Objects (idx)

df.set_index(['a', 'b'])★
Promote columns to a hierarchical index.
df.loc[('x', 'y')]★
Select by a tuple of index levels.
df.xs('y', level='b')
Cross-section at one level.
df.loc[pd.IndexSlice[:, 'y'], :]
Slice inner levels with IndexSlice.
df.swaplevel() df.droplevel(0) df.reorder_levels([...])
Rearrange index levels.
df.reindex([...]) df.rename_axis('idx')
Conform to new labels / name the axis.

21 Statistics & Descriptive (aggregate / stats)

df.sum() df.mean() df.median() df.std()★
Per-column aggregates (axis=0 default).
df['c'].value_counts(normalize=True)★
Frequency (or proportion) of each value.
df.corr() df.cov()★
Correlation / covariance matrix (pass numeric_only=True if the frame has non-numeric columns).
df['c'].quantile([.25, .5, .75])
Percentile values.
df.cumsum() df.cummax() df['c'].agg(['min','max'])
Cumulative / multi-stat aggregation.
df['c'].describe()
One-column summary (count, mean, std, quartiles).

22 Window Operations (rolling · expanding · ewm)

df['c'].rolling(3).mean()★
Moving average over a fixed window.
df['c'].rolling('7D').sum()
Time-based window on a datetime index.
df['c'].rolling(3).agg(['mean','std'])
Multiple window stats at once.
df['c'].expanding().sum()
Cumulative aggregate from the start.
df['c'].ewm(span=5).mean()★
Exponentially weighted moving average.
df['c'].rolling(3).apply(func, raw=True)
Custom window function (raw=NumPy, faster).
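
A sketch of the window types above on a short made-up series. The time-based example uses a '2D' window rather than '7D' to keep the data small.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

moving = s.rolling(3).mean()    # first two entries are NaN (window not full)
running = s.expanding().sum()   # 1, 3, 6, 10, 15
smooth = s.ewm(span=5).mean()   # recent values weigh more

# Time-based window: needs a datetime index; '2D' covers the current and previous day.
ts = pd.Series(
    [1.0, 2.0, 3.0],
    index=pd.date_range("2024-01-01", periods=3, freq="D"),
)
two_day = ts.rolling("2D").sum()   # 1, 3, 5
```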

23 Time Series & Resampling (aggregate / stats)

df.set_index('date').resample('ME').mean()★
Regroup a time series by frequency (ME=month-end).
df['c'].shift(1) df['c'].diff()★
Lag/lead / period-over-period change.
df['c'].pct_change()★
Fractional change vs. the previous row.
df.asfreq('D', method='ffill')
Conform to a regular frequency.
pd.date_range / period_range / bdate_range
Build datetime / period / business-day ranges.
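
A sketch of resample followed by the lag and change helpers. It resamples to 'D' instead of 'ME' because the 'ME' alias only exists in recent pandas releases; the timestamps and values are invented.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2024-01-01 06:00", "2024-01-01 18:00",
             "2024-01-02 06:00", "2024-01-03 06:00"]
        ),
        "c": [10.0, 20.0, 30.0, 60.0],
    }
)

daily = df.set_index("date")["c"].resample("D").mean()   # 15, 30, 60
lagged = daily.shift(1)       # previous day's value; first is NaN
change = daily.diff()         # NaN, 15, 30
pct = daily.pct_change()      # NaN, 1.0, 1.0
```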

24 Binning & Top-Level Helpers (pd.* general functions)

pd.cut(df.c, bins=[0,10,20])★
Bin values into fixed intervals.
pd.qcut(df.c, q=4)★
Bin into equal-frequency quantiles.
pd.get_dummies(df, columns=['cat'])★
One-hot encode categorical columns.
codes, uniques = pd.factorize(df.c)
Encode values as integer codes + labels.
pd.wide_to_long(df, stubnames, i, j)
Reshape repeated-column-group data to long.
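
Fixed-width vs. equal-frequency binning, plus factorize, as a sketch. The labels 'low' and 'high' are illustrative.

```python
import pandas as pd

s = pd.Series([1, 5, 10, 15, 20])

# Intervals are right-closed by default: (0, 10] and (10, 20].
fixed = pd.cut(s, bins=[0, 10, 20], labels=["low", "high"])

# qcut picks the edges so each bin holds roughly the same number of rows.
quartile = pd.qcut(pd.Series(range(8)), q=4, labels=False)

# Codes follow order of first appearance.
codes, uniques = pd.factorize(pd.Series(["b", "a", "b"]))
```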

25 Plotting (viz · .plot)

df.plot()★
Line plot of every numeric column (matplotlib).
df.plot(kind='bar', x='a', y='b')★
Bar chart; also barh, area, pie.
df.plot.scatter(x='a', y='b', c='g')★
Scatter plot with an optional color column.
df['c'].plot.hist(bins=30) df.plot.box()
Distribution: histogram / box plot.
df.plot(subplots=True, figsize=(10,8))
One panel per column.
pd.plotting.scatter_matrix(df)
Pairwise scatter grid from pd.plotting.

26 Styling Output (viz · df.style)

df.style.format('{:.2f}')★
Format displayed values (dict for per-column).
df.style.background_gradient(cmap='Blues')★
Heatmap-shade cells by value.
df.style.highlight_max(axis=0)
Highlight max/min/null cells.
df.style.bar(subset=['c'], color='#5fba7d')
In-cell bar charts.
df.style.to_html() .to_excel()
Export the styled table.

27 Nullable & Arrow Dtypes (extension types)

df.astype('Int64') / 'Float64' / 'boolean'★
Nullable (capitalized) types — hold pd.NA.
df.convert_dtypes()★
Auto-upgrade a frame to best nullable dtypes.
df.astype('string')★
Dedicated string dtype (PyArrow-backed in 3.0 when PyArrow is installed).
df.astype('int64[pyarrow]')
Explicit Arrow-backed column dtype.
pandas 3.0: string dtype by default
String columns are inferred as a dedicated str dtype instead of object.

28 Type Checks, Testing & API (pd.api · pd.testing)

pd.api.types.is_numeric_dtype(df.c)★
Programmatic dtype checks.
df.select_dtypes(include='number')★
Pick columns by dtype family.
pd.testing.assert_frame_equal(a, b)
Equality assert for tests (also assert_series_equal).
pd.api.extensions.register_dataframe_accessor('x')
Attach a custom df.x.* accessor.

29 Performance & Gotchas (handle with care)

df[df.a>0]['b'] = 1  # chained
Chained indexing: the write lands on a temporary copy, so df can silently stay unchanged (it never updates under Copy-on-Write in 3.0).
df.loc[df.a>0, 'b'] = 1  # safe
The single-step fix for the pattern above.
df.apply(..., axis=1)  # slow
Row-wise apply is slow; prefer vectorized ops.
category & nullable dtypes (lean)
Cut memory on repeated strings / integer columns.
df.copy()
Copy before mutating a slice you keep (CoW in 3.0).
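
The chained-indexing gotcha and its .loc fix, sketched on a made-up frame. The warning filter is only there to keep the output quiet; pandas normally warns about the chained write.

```python
import warnings
import pandas as pd

df = pd.DataFrame({"a": [-1, 2, 3], "b": [0, 0, 0]})

with warnings.catch_warnings():
    warnings.simplefilter("ignore")   # pandas warns about this pattern
    df[df["a"] > 0]["b"] = 1          # writes to a temporary copy

unchanged = df["b"].tolist()          # still [0, 0, 0]

df.loc[df["a"] > 0, "b"] = 1          # one indexing step on df itself
fixed = df["b"].tolist()              # [0, 1, 1]
```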

★ Common dtypes (anywhere you see dtype:)

int64 / float64★
Default NumPy-backed numerics.
Int64 / Float64 / boolean★
Nullable (capitalized) — support pd.NA.
object
Mixed / legacy text — the slow catch-all.
string★
Dedicated strings (Arrow-backed in 3.0 when PyArrow is installed).
category
Repeated values encoded once — big memory win.
datetime64[ns] / timedelta64[ns] / period[M]★
Timestamps / durations / periods (enable .dt).
int64[pyarrow]
Arrow-backed column — fast, interop-friendly.

★ Merge how= Quick-Read (pd.merge(..., how=))

'inner'★
Only keys present in both frames.
'left' / 'right'★
All keys of the left / right frame.
'outer'
Every key from both; NaN where unmatched.
'cross'
Cartesian product of both frames.
indicator=True
Add a _merge column showing each row's source.

★ .loc vs .iloc Quick-Read (the #1 confusion)

df.loc['b', 'Price']  # labels
By label — slice end is inclusive.
df.iloc[1, 2]  # positions
By integer position — slice end exclusive.
boolean masks work in .loc
df.loc[mask, 'c'] both filters and assigns safely.

pandas cheat sheet v2 · all modules

I · Core: Load, Create & Inspect (I/O · constructors · the first look)
II · Select, Filter & Sort (loc/iloc · boolean masks · query · eval)
III · Clean & Transform (missing data · dtypes · apply/map · accessors)
IV · Reshape, Combine & Index (add/drop · pivot · concat · merge · MultiIndex)
V · Statistics, Window & Time Series (describe · rolling/ewm · resample · binning)
VI · Visualize, Style, Extend & Tune (plotting · Styler · Arrow dtypes · api · gotchas)

[Figure: Split-apply-combine & the reshape family. Panels: "groupby: split → apply → combine" ★ and "The reshape verbs" ★. Legend: ★ Worth memorizing.]