Pandas Cheat Sheet

01Setup & Importcreate

import pandas as pd★
Universal alias — always pd.
import numpy as np
Almost always imported alongside pandas.
pip install pandas
Install (or conda install pandas).
pd.__version__
Check the installed pandas version.

02Reading Datacreate

pd.read_csv('f.csv')★
Read a CSV file into a DataFrame.
pd.read_excel('f.xlsx', sheet_name='S1')★
Read one sheet of an Excel workbook.
pd.read_json('f.json')
Read a JSON string, URL, or file.
pd.read_sql(query, conn)
Read the result of a SQL query.
pd.read_html(url)
Extract every <table> on a page as a list.
pd.read_parquet('f.parquet')★
Read a columnar Parquet file — fast & typed.
pd.read_clipboard()
Read whatever's on the system clipboard.

03Writing Datacreate

df.to_csv('f.csv', index=False)★
Write to CSV; index=False skips row labels.
df.to_excel('f.xlsx', sheet_name='S1')
Write to an Excel sheet.
df.to_json('f.json')
Write to a JSON file.
df.to_sql('table', conn)
Write to a SQL table.
df.to_parquet('f.parquet')
Write to columnar Parquet format.

04Creating Series & DataFramecreate

pd.Series([1,2,3])★
One labeled 1D array.
pd.DataFrame(dict)★
Dict keys become columns, values become rows.
pd.DataFrame(np.random.rand(20,5))
Random 20-row, 5-column frame.
pd.date_range('2024-01-01', periods=5)
Build a DatetimeIndex.

05Inspect & Exploreinspect

df.head(n) df.tail(n)★
First / last n rows (default 5).
df.shape★
(rows, columns) tuple.
df.info()★
Index, dtypes, non-null counts, memory use.
df.describe()★
Summary stats for numeric columns.
df.dtypes
Data type of every column.
df.columns df.index
Column labels / row labels.
df.values
Underlying data as a NumPy array.
df.sample(n)
n randomly selected rows.

06Selecting Columns & Rowsselect & filter

df['col']★
Select one column — returns a Series.
df[['a','b']]★
Select multiple columns — returns a DataFrame.
df.loc['row', 'col']★
Select by label — rows first, then columns.
df.iloc[2, 0]★
Select by integer position.
df.at['row', 'col']
Fast scalar access by label.
df.iat[1, 2]
Fast scalar access by position.
df.loc[:, 'a':'c']
Label slicing — inclusive of both endpoints.

07Boolean Indexing, Query & Filterselect & filter

df[df['col'] > 50]★
Rows where a condition is True.
df[(df.a>1)&(df.b<9)]
Combine masks with & | ~ (not and/or).
df.query('col > 50')★
Same filter, readable string syntax.
df.filter(items=['a','b'])
Select by column/row name list.
df['col'].isin([...])
Membership test against a list of values.
df['col'].between(a, b)
Inclusive range check.

08Sorting & Top-Nselect & filter

df.sort_values('col')★
Ascending sort by one or more columns.
df.sort_values('c', ascending=False)
Descending sort.
df.sort_index()
Sort by the row index/labels.
df.nlargest(n, 'col')★
Top n rows by a column's value.
df.nsmallest(n, 'col')
Bottom n rows by a column's value.

09Missing Dataclean & transform

df.isnull() df.notnull()★
Boolean mask of missing / present values.
df.isnull().sum()★
Missing-value count per column.
df.dropna()★
Drop rows containing any NaN.
df.dropna(axis=1, thresh=n)
Drop columns with fewer than n non-nulls.
df.fillna(value)★
Replace NaN with a constant, dict, or stat.
df.interpolate()
Fill gaps by interpolating neighboring values.

10Cleaning & Transformingclean & transform

df['c'].astype('float')★
Cast a column to a new dtype.
df.rename(columns={'old':'new'})★
Selectively rename columns.
df['c'].replace(1, 'one')
Replace matching values.
df.apply(func)★
Apply a function along an axis (rows/cols).
df['c'].map(func)
Elementwise transform of a Series.
df.duplicated() df.drop_duplicates()
Flag / remove duplicate rows.

11String Methods.str accessor

df['c'].str.lower() / .upper()★
Case conversion, elementwise.
df['c'].str.strip()
Remove leading/trailing whitespace.
df['c'].str.contains('x')★
Boolean mask — substring/regex match.
df['c'].str.replace('a','b')
Elementwise find-and-replace.
df['c'].str.split(',')
Split each string into a list.
df['c'].str.startswith('M')
Boolean prefix match.

12Dates & Times.dt accessor

pd.to_datetime(df['c'])★
Parse strings/ints into datetime64.
df['c'].dt.year / .month / .day★
Extract date components via .dt.
df.set_index('date').resample('M')
Regroup a time series by frequency.
pd.date_range(start, end, freq='D')
Build a range of dates.
df['c'].dt.dayofweek
Day of week as an integer (Mon = 0).

13Adding & Dropping Columns/Rowsshape & combine

df['new'] = values★
Add a column (length must match rows).
df.assign(x=lambda d: d.a*d.b)★
Add columns via chained expressions.
df.insert(1, 'c', values)
Insert a column at a specific position.
df.drop(columns=['c'])★
Drop column(s) — axis=1.
df.drop([1,3], axis=0)
Drop row(s) by index label — axis=0.
df.reset_index(drop=True)★
Replace the index with a fresh 0..n range.

14GroupBy & Aggregationaggregate / stats

df.groupby('col')★
Split into groups by one or more columns.
df.groupby('c')['x'].mean()★
Per-group mean of one column.
df.groupby('c').agg(['sum','mean'])★
Multiple stats per group at once.
df.groupby('c').transform(func)
Group stat, broadcast back to row shape.
df.groupby(['c1','c2']).size()
Row count per combination of groups.

15Pivoting & Reshapingshape & combine

df.pivot(index='a', columns='b', values='c')★
Long → wide, no aggregation.
df.pivot_table(index='a', values='c', aggfunc='mean')★
Like pivot, but aggregates duplicate keys.
pd.melt(df, id_vars='a')★
Wide → long — unpivot columns into rows.
df.stack() / df.unstack()
Pivot column labels ↔ innermost index level.
df.T
Transpose rows and columns.

16Combining: Concatenateshape & combine

pd.concat([df1, df2])★
Stack rows (axis=0, default).
pd.concat([df1, df2], axis=1)
Stack columns side by side.
pd.concat([...], ignore_index=True)★
Renumber the index 0..n after stacking.
pd.concat([...], keys=['a','b'])
Tag each source with an extra index level.

17Combining: Merge & Joinshape & combine

pd.merge(df1, df2, on='key')★
SQL-style join on a shared column.
pd.merge(df1, df2, how='left', on='k')★
Keep every row of df1.
pd.merge(df1, df2, how='outer', on='k')
Keep every row from both sides.
df1.join(df2, how='inner')
Merge on the index instead of a column.
pd.merge(df1, df2, left_on='a', right_on='b')
Join on differently-named key columns.

18Statistics & Descriptive Analysisaggregate / stats

df.sum() df.mean() df.median()★
Per-column aggregate (axis=0 default).
df.std() df.var()
Standard deviation / variance.
df.corr()★
Pearson correlation matrix between columns.
df.cov()
Covariance matrix between columns.
df['c'].value_counts()★
Frequency of each unique value.
df['c'].cumsum()
Running cumulative sum.
df['c'].quantile([.25,.75])
Values at the given percentiles.

19Window & Time Series Opsaggregate / stats

df['c'].rolling(3).mean()★
Moving average over a 3-row window.
df['c'].expanding().sum()
Cumulative aggregate from the start.
df['c'].ewm(span=5).mean()
Exponentially weighted moving average.
df['c'].shift(1)★
Lag/lead values by n rows.
df.resample('M').mean()
Downsample a datetime-indexed series.

20Performance Tips & Gotchashandle with care

df[df.a>0]['b'] = 1chained
Chained indexing — triggers SettingWithCopyWarning.
df.loc[df.a>0, 'b'] = 1safe
The single-step fix for the pattern above.
inplace=True
Often no faster, and breaks method chaining.
df.apply(..., axis=1)
Row-wise apply is slow — vectorize when possible.
df.copy()
Explicitly copy before mutating a slice you keep.

★Common dtypesanywhere you see dtype:

int64 / float64★
Default numeric types.
object
Text/mixed columns — pandas' catch-all (slow).
category
Encode repeated strings efficiently — big memory win.
bool
True / False values.
datetime64[ns]★
Timestamps — enables the .dt accessor.
Int64 / boolean
Nullable dtypes (capitalized) — support pd.NA.

★Merge how= Quick-Readanywhere you see how=

how='inner'★
Only keys present in both frames.
how='left'★
All of the left frame's keys, matched where possible.
how='right'
All of the right frame's keys, matched where possible.
how='outer'
Every key from both frames; NaN where unmatched.

pandas cheat sheet

pd.merge() join types, visually

how='inner' ★

how='left' ★

how='right'

how='outer'

Worth memorizing