Polars Cheat Sheet

01 Setup & Import (create)

import polars as pl★
Universal alias — always pl.
import polars.selectors as cs
Dtype- and name-based column selectors.
pip install polars
Add extras, e.g. polars[numpy,pandas,pyarrow].
pl.__version__ / pl.show_versions()
Check the installed Polars version.
pl.Config.set_tbl_rows(n)
Control how many rows print to the console.

02 Reading Data (create)

pl.read_csv('f.csv')★
Read CSV eagerly into memory.
pl.read_parquet('f.parquet')★
Read columnar Parquet; typed, compressed, and typically much faster than CSV.
pl.read_json('f.json')
Read a JSON file.
pl.scan_csv('f.csv')★
Lazy CSV read — no data loaded until .collect().
pl.read_database(query, connection) / pl.read_database_uri(query, uri)
Read the result of a SQL query via a connection object / a connection URI.
pl.from_pandas(pandas_df)
Convert from a pandas DataFrame.

03 Writing Data (create)

df.write_csv('f.csv')★
Write to CSV.
df.write_parquet('f.parquet')★
Write columnar Parquet — fast, typed, compressed.
df.write_json('f.json')
Write to a JSON file.
df.write_database('table', connection_uri)
Write to a relational database.

04 Creating DataFrames & Series (create)

pl.DataFrame({'a': [1,2,3]})★
Dict keys become column names, values become the column data.
pl.Series('a', [1,2,3])★
One named, typed 1D array.
pl.DataFrame({...}, schema={'a': pl.Int64})
Pass an explicit schema.
df.lazy()★
Convert an eager DataFrame to a LazyFrame.

05 Inspect & Explore (inspect)

df.head(n) / df.tail(n)★
First / last n rows (default 5).
df.shape★
(rows, columns) tuple.
df.schema / df.dtypes★
Column name→type mapping / just the type list.
df.columns
List of column names.
df.describe()★
Summary stats for every column.
df.glimpse()
Compact, transposed preview — great for wide frames.
df.null_count()★
Missing-value count per column.
df.estimated_size()
Approximate memory footprint in bytes.

06 Selecting Columns & Rows (select & filter)

df.select('a', 'b')★
Project a subset of columns.
df.select(pl.col('a', 'b'))★
Same, via the expression API.
df[1, 1] df[1:3] df[:, 1:]
Positional [] indexing — like pandas' .iloc.
df.filter(pl.col('age') > 24)★
Keep rows matching a boolean expression.
df.filter(a=1, b=2)
Keyword shortcut — equality filters only.
df.get_column('a')
Pull one column out as a Series.

07 Filtering & Boolean Logic (select & filter)

df.filter((pl.col('a')>1)&(pl.col('b')<9))★
Combine conditions with & | ~, in parens.
df.filter(pl.col('a')>1, pl.col('b')<9)★
Comma-separated = implicit AND, no parens needed.
pl.col('a').is_in([...])★
Membership test against a list of values.
pl.col('a').is_between(2, 4)★
Inclusive range check.
pl.col('a').is_null() / .is_not_null()
Boolean null checks.

08 Column Expressions & Casting (shape & combine)

df.with_columns(pl.col('a') * 2)★
Add or replace a column (returns a new DataFrame).
df.with_columns(new = expr)★
Keyword form — name the new column inline.
df.with_columns(pl.col('a').cast(pl.Float64))★
Cast a column to a new dtype.
df.rename({'old': 'new'})★
Rename one or more columns.
df.drop('a', 'b')★
Remove one or more columns.

09 Conditional Logic (select & filter)

pl.when(cond).then(val).otherwise(default)★
Vectorized if/elif/else — chain more .when() for elif.
pl.coalesce('a', 'b', pl.lit(0))
First non-null value across columns.

10 Missing Values & Duplicates (clean & transform)

df.null_count()★
Missing-value count per column.
df.drop_nulls()★
Drop rows containing any null.
df.drop_nulls(subset=['a'])
Drop rows null in specific columns only.
df.fill_null(value) / df.fill_null(strategy='forward')★
Replace nulls — constant or fill strategy.
df.fill_nan(value)
Separate from fill_null — NaN ≠ null in Polars.
df.unique()★
Remove duplicate rows.
df.unique(subset=['a', 'b'])
De-duplicate on a subset of columns.

11 String Expressions (.str accessor)

pl.col('c').str.to_uppercase() / to_lowercase()★
Case conversion.
pl.col('c').str.strip_chars()
Strip whitespace (or given characters).
pl.col('c').str.contains('pat')★
Boolean mask; the pattern is a regex by default, pass literal=True for a plain substring.
pl.col('c').str.replace('a','b') / replace_all(...)
Replace first match / all matches.
pl.col('c').str.split('-')
Split into a list column.
pl.col('c').str.len_chars()
Character count of each string.

12 Date & Time Expressions (.dt accessor)

pl.col('d').str.to_date('%Y-%m-%d')★
Parse a string column into a Date.
pl.col('d').dt.year() / .month() / .day()★
Extract date components.
pl.col('d').dt.weekday()
ISO weekday — 1 = Monday.
pl.col('d') + pl.duration(days=1)
Date arithmetic with pl.duration().
pl.date_range(start, end, interval='1d')
Build a range of dates.

13 Numeric & Math Expressions (clean & transform)

pl.col('a').round(2) / .floor() / .ceil()
Rounding functions.
pl.col('a').abs()
Absolute value.
pl.col('a').clip(0, 100)
Clamp values into a range.
pl.col('a') + pl.col('b')★
Elementwise arithmetic between expressions.

14 List & Struct Operations (shape & combine)

df.explode('list_col')★
One output row per list element.
pl.col('list_col').list.len()★
Length of each list.
pl.col('list_col').list.get(0)
Element at index 0 of each list.
pl.struct(['a', 'b']).alias('s')
Bundle columns into a struct column.
df.unnest('s')★
Expand a struct column back into flat columns.

15 Joins (shape & combine)

df1.join(df2, on='key', how='inner')★
SQL-style join on a shared column.
df1.join(df2, on='key', how='left')★
Keep every row of df1.
df1.join(df2, left_on='a', right_on='b')
Join on differently named key columns.
df1.join(df2, on='key', how='anti')★
Rows in df1 with NO match in df2.
df1.join_asof(df2, on='date', by='id')
As-of join: takes the last df2 row at or before each key (strategy='backward' by default); sort both frames on the key first.
df1.join(df2, how='cross')
Cartesian product.

16 Combining: Concatenate (shape & combine)

pl.concat([df1, df2])★
Stack rows (default: vertical).
pl.concat([df1, df2], how='diagonal')
Stack rows, aligning by name, filling gaps with null.
df1.hstack(df2)
Stack columns side by side, no key needed.
df1.vstack(df2)
Append rows; returns a new DataFrame unless in_place=True (df1.extend(df2) appends in place).

17 GroupBy & Aggregation (aggregate / stats)

df.group_by('g').agg(pl.col('x').mean())★
Per-group mean of one column.
df.group_by('g').agg(pl.col('x').sum().alias('x_sum'))★
A named aggregate; pass several expressions to .agg() for multiple per group.
df.group_by('g').agg(pl.all().sum())
Aggregate every remaining column at once.
df.group_by(['g1', 'g2']).len()★
Row count per group; the counterpart of pandas' .size().
df.group_by_dynamic('date', every='1mo').agg(...)
Time-bucketed grouping — like pandas' resample.

18 Window Expressions: .over() (aggregate / stats)

pl.col('x').sum().over('g')★
Group aggregate broadcast back to every row.
pl.col('x').rank().over('g')★
Per-group ranking.
pl.col('x').cum_sum().over('g')★
Per-group running total.
pl.col('x').shift(1).over('g')
Per-group lag.
pl.col('x').rolling_mean(window_size=3)
Moving average over a fixed window.

19 Sorting & Top-N (select & filter)

df.sort('a')★
Ascending sort by one or more columns.
df.sort('a', descending=True)★
Descending sort.
df.sort(['a','b'], descending=[False,True])
Mixed sort direction, per column.
df.top_k(n, by='a')★
Top n rows by a column's value.
df.bottom_k(n, by='a')
Bottom n rows by a column's value.

20 Reshaping: Pivot & Unpivot (shape & combine)

df.pivot(index='g', on='k', values='v')★
Long → wide (on replaces pandas' columns=).
df.unpivot(index='g')★
Wide → long — Polars' modern name for melt.
df.transpose()
Flip rows and columns.

21 Lazy Execution & Performance (handle with care)

df.lazy()★
Start a query plan instead of running immediately.
lazy_df.collect()★
Optimize the whole plan, then execute it once.
lazy_df.explain()
Print the (optimized) query plan.
lazy_df.collect(streaming=True)
Process larger-than-RAM data in batches; recent releases deprecate this spelling in favor of collect(engine='streaming').
pl.scan_parquet('f.parquet')★
Lazy scan — enables predicate/projection pushdown.

★ Common Data Types (anywhere you see dtype:)

pl.Int8/16/32/64 pl.UInt8..64★
Signed / unsigned integers.
pl.Float32 / pl.Float64
Floating-point numeric types.
pl.String (alias pl.Utf8)★
Text / string columns.
pl.Boolean
True / False values.
pl.Date pl.Datetime pl.Duration★
Calendar date, date+time, and time delta.
pl.List(inner) / pl.Struct
Nested column types — list / struct.
pl.Categorical
Efficient encoding for repeated strings.

★ Join how= Quick-Read (df.join(..., how=))

'inner'★
Only keys present in both DataFrames.
'left'★
All rows from the left frame, matched where possible.
'full'
Every key from both sides — Polars' name for outer.
'semi'
Left rows that have a match — right columns dropped.
'anti'★
Left rows with no match in the right — great for exclusions.
'cross'
Cartesian product, no key needed.


Why Polars is fast, visually
[Figure panels: Expressions run in parallel; Eager vs. lazy execution; Predicate & projection pushdown; Zero-copy Arrow interop]

★ Worth memorizing