import pandas as pd★Universal alias — alwayspd.import numpy as npAlmost always imported alongside pandas.pip install pandasInstall (orconda install pandas).pd.__version__Check the installed pandas version.
pd.read_csv('f.csv')★Read a CSV file into a DataFrame.pd.read_excel('f.xlsx', sheet_name='S1')★Read one sheet of an Excel workbook.pd.read_json('f.json')Read a JSON string, URL, or file.pd.read_sql(query, conn)Read the result of a SQL query.pd.read_html(url)Extract every <table> on a page as a list.pd.read_parquet('f.parquet')★Read a columnar Parquet file — fast & typed.pd.read_clipboard()Read whatever's on the system clipboard.
df.to_csv('f.csv', index=False)★Write to CSV;index=Falseskips row labels.df.to_excel('f.xlsx', sheet_name='S1')Write to an Excel sheet.df.to_json('f.json')Write to a JSON file.df.to_sql('table', conn)Write to a SQL table.df.to_parquet('f.parquet')Write to columnar Parquet format.
pd.Series([1,2,3])★One labeled 1D array.pd.DataFrame(dict)★Dict keys become columns, values become rows.pd.DataFrame(np.random.rand(20,5))Random 20-row, 5-column frame.pd.date_range('2024-01-01', periods=5)Build a DatetimeIndex.
df.head(n) df.tail(n)★First / last n rows (default 5).df.shape★(rows, columns) tuple.df.info()★Index, dtypes, non-null counts, memory use.df.describe()★Summary stats for numeric columns.df.dtypesData type of every column.df.columns df.indexColumn labels / row labels.df.valuesUnderlying data as a NumPy array.df.sample(n)n randomly selected rows.
df['col']★Select one column — returns a Series.df[['a','b']]★Select multiple columns — returns a DataFrame.df.loc['row', 'col']★Select by label — rows first, then columns.df.iloc[2, 0]★Select by integer position.df.at['row', 'col']Fast scalar access by label.df.iat[1, 2]Fast scalar access by position.df.loc[:, 'a':'c']Label slicing — inclusive of both endpoints.
df[df['col'] > 50]★Rows where a condition is True.df[(df.a>1)&(df.b<9)]Combine masks with& | ~(notand/or).df.query('col > 50')★Same filter, readable string syntax.df.filter(items=['a','b'])Select by column/row name list.df['col'].isin([...])Membership test against a list of values.df['col'].between(a, b)Inclusive range check.
df.sort_values('col')★Ascending sort by one or more columns.df.sort_values('c', ascending=False)Descending sort.df.sort_index()Sort by the row index/labels.df.nlargest(n, 'col')★Top n rows by a column's value.df.nsmallest(n, 'col')Bottom n rows by a column's value.
df.isnull() df.notnull()★Boolean mask of missing / present values.df.isnull().sum()★Missing-value count per column.df.dropna()★Drop rows containing any NaN.df.dropna(axis=1, thresh=n)Drop columns with fewer than n non-nulls.df.fillna(value)★Replace NaN with a constant, dict, or stat.df.interpolate()Fill gaps by interpolating neighboring values.
df['c'].astype('float')★Cast a column to a new dtype.df.rename(columns={'old':'new'})★Selectively rename columns.df['c'].replace(1, 'one')Replace matching values.df.apply(func)★Apply a function along an axis (rows/cols).df['c'].map(func)Elementwise transform of a Series.df.duplicated() df.drop_duplicates()Flag / remove duplicate rows.
df['c'].str.lower() / .upper()★Case conversion, elementwise.df['c'].str.strip()Remove leading/trailing whitespace.df['c'].str.contains('x')★Boolean mask — substring/regex match.df['c'].str.replace('a','b')Elementwise find-and-replace.df['c'].str.split(',')Split each string into a list.df['c'].str.startswith('M')Boolean prefix match.
pd.to_datetime(df['c'])★Parse strings/ints into datetime64.df['c'].dt.year / .month / .day★Extract date components via.dt.df.set_index('date').resample('M')Regroup a time series by frequency.pd.date_range(start, end, freq='D')Build a range of dates.df['c'].dt.dayofweekDay of week as an integer (Mon = 0).
df['new'] = values★Add a column (length must match rows).df.assign(x=lambda d: d.a*d.b)★Add columns via chained expressions.df.insert(1, 'c', values)Insert a column at a specific position.df.drop(columns=['c'])★Drop column(s) —axis=1.df.drop([1,3], axis=0)Drop row(s) by index label —axis=0.df.reset_index(drop=True)★Replace the index with a fresh 0..n range.
df.groupby('col')★Split into groups by one or more columns.df.groupby('c')['x'].mean()★Per-group mean of one column.df.groupby('c').agg(['sum','mean'])★Multiple stats per group at once.df.groupby('c').transform(func)Group stat, broadcast back to row shape.df.groupby(['c1','c2']).size()Row count per combination of groups.
df.pivot(index='a', columns='b', values='c')★Long → wide, no aggregation.df.pivot_table(index='a', values='c', aggfunc='mean')★Like pivot, but aggregates duplicate keys.pd.melt(df, id_vars='a')★Wide → long — unpivot columns into rows.df.stack() / df.unstack()Pivot column labels ↔ innermost index level.df.TTranspose rows and columns.
pd.concat([df1, df2])★Stack rows (axis=0, default).pd.concat([df1, df2], axis=1)Stack columns side by side.pd.concat([...], ignore_index=True)★Renumber the index 0..n after stacking.pd.concat([...], keys=['a','b'])Tag each source with an extra index level.
pd.merge(df1, df2, on='key')★SQL-style join on a shared column.pd.merge(df1, df2, how='left', on='k')★Keep every row ofdf1.pd.merge(df1, df2, how='outer', on='k')Keep every row from both sides.df1.join(df2, how='inner')Merge on the index instead of a column.pd.merge(df1, df2, left_on='a', right_on='b')Join on differently-named key columns.
df.sum() df.mean() df.median()★Per-column aggregate (axis=0default).df.std() df.var()Standard deviation / variance.df.corr()★Pearson correlation matrix between columns.df.cov()Covariance matrix between columns.df['c'].value_counts()★Frequency of each unique value.df['c'].cumsum()Running cumulative sum.df['c'].quantile([.25,.75])Values at the given percentiles.
df['c'].rolling(3).mean()★Moving average over a 3-row window.df['c'].expanding().sum()Cumulative aggregate from the start.df['c'].ewm(span=5).mean()Exponentially weighted moving average.df['c'].shift(1)★Lag/lead values by n rows.df.resample('M').mean()Downsample a datetime-indexed series.
df[df.a>0]['b'] = 1chainedChained indexing — triggers SettingWithCopyWarning.df.loc[df.a>0, 'b'] = 1safeThe single-step fix for the pattern above.inplace=TrueOften no faster, and breaks method chaining.df.apply(..., axis=1)Row-wise apply is slow — vectorize when possible.df.copy()Explicitly copy before mutating a slice you keep.
int64 / float64★Default numeric types.objectText/mixed columns — pandas' catch-all (slow).categoryEncode repeated strings efficiently — big memory win.boolTrue / False values.datetime64[ns]★Timestamps — enables the.dtaccessor.Int64 / booleanNullable dtypes (capitalized) — supportpd.NA.
how='inner'★Only keys present in both frames.how='left'★All of the left frame's keys, matched where possible.how='right'All of the right frame's keys, matched where possible.how='outer'Every key from both frames; NaN where unmatched.