Quick Reference · tabular data analysis in Python

pandas cheat sheet

Pandas is built around two objects — the Series (1D) and the DataFrame (2D table) — and every DataFrame carries a row index alongside its data. Master .loc vs .iloc, missing-data handling, and groupby/merge, and the rest of the API falls into place.

create / load inspect select & filter shape & combine clean & transform aggregate / stats gotcha most common

Distilled & cross-checked across: pandas.pydata.org/docs · official Pandas PDF cheat sheet · Dataquest · DataCamp · Coursera · GeeksforGeeks

Anatomy of a DataFrame — .loc vs .iloc
Fruits Quantity Price df.columns → Index(['Fruits','Quantity','Price']) a b c d Mango 40 80 Apple 20 100 Banana 25 50 Orange 10 70 df.index df.loc['b','Price'] df.iloc[1, 2] both reach the same cell → 100 df.shape →(4, 3) dtypes →object, int64, int64
01Setup & Importcreate
02Reading Datacreate
03Writing Datacreate
04Creating Series & DataFramecreate
05Inspect & Exploreinspect
06Selecting Columns & Rowsselect & filter
07Boolean Indexing, Query & Filterselect & filter
08Sorting & Top-Nselect & filter
09Missing Dataclean & transform
10Cleaning & Transformingclean & transform
11String Methods.str accessor
12Dates & Times.dt accessor
13Adding & Dropping Columns/Rowsshape & combine
14GroupBy & Aggregationaggregate / stats
15Pivoting & Reshapingshape & combine
16Combining: Concatenateshape & combine
17Combining: Merge & Joinshape & combine
18Statistics & Descriptive Analysisaggregate / stats
19Window & Time Series Opsaggregate / stats
20Performance Tips & Gotchashandle with care
Common dtypesanywhere you see dtype:
Merge how= Quick-Readanywhere you see how=

pd.merge() join types, visually

Same two tables, four different results — depending on how=. Based on the classic SQL-join Venn diagrams used across the official docs and cheat sheets.

how='inner' ★

Keeps only rows whose key exists in both df1 and df2. Smallest possible result.

df1 df2 matched keys only

how='left' ★

Keeps every row of df1; unmatched df2 columns become NaN. Shape of the result follows df1.

df1 df2 all of df1 + matches

how='right'

Keeps every row of df2; unmatched df1 columns become NaN. Mirror image of a left join.

df1 df2 all of df2 + matches

how='outer'

Keeps every row from both sides; unmatched cells on either side become NaN. Largest possible result.

df1 df2 everything, NaN where unmatched

Worth memorizing

.loc vs .ilocloc is label-based & inclusive; iloc is position-based & exclusive
chained indexingdf[mask]['c']=x may silently fail — use df.loc[mask,'c']=x
copy vs viewa slice may or may not be a view — call .copy(), don't guess
merge vs joinmerge joins on columns; join defaults to joining on the index
apply(axis=1)row-wise apply is slow — vectorized column math beats it 10×+
groupby is lazynothing is computed until you call an aggregation like .mean()
NaN != NaNuse isnull()/notnull(), never ==, for missing checks
inplace=Truerarely faster, breaks chaining — prefer df = df.method()