Polars Cheat Sheet — Comprehensive Edition

01 Setup & Config | create · inspect

import polars as pl★
Universal alias — always pl.
import polars.selectors as cs★
Dtype- and name-based column selectors.
pl.Config.set_tbl_rows(50)★
Show more rows; also set_tbl_cols, set_fmt_str_lengths.
with pl.Config(tbl_rows=100): ...
Scoped config as a context manager.
pl.__version__ pl.show_versions()
Version / full environment report.

02 Reading Data (eager) | create · pl.read_*

pl.read_csv('f.csv')★
Read CSV eagerly into memory.
pl.read_parquet('f.parquet')★
Read columnar Parquet eagerly; the preferred on-disk format for Polars workflows.
pl.read_json pl.read_ndjson pl.read_ipc
JSON / line-delimited JSON / Arrow IPC.
pl.read_excel('f.xlsx', sheet_name='S1')
Read an Excel sheet (use pl.read_ods for ODS files).
pl.read_database(query, connection)
Read a SQL query result.

03 Scanning Data (lazy) | lazy · pl.scan_*

pl.scan_csv('f.csv')★
Lazy CSV — nothing read until .collect().
pl.scan_parquet('f.parquet')★
Enables predicate & projection pushdown.
pl.scan_ndjson pl.scan_ipc pl.scan_delta
Lazy readers for other formats.
pl.scan_parquet('s3://bucket/*.parquet')★
Glob & cloud paths supported.
scan_* → LazyFrame; read_* → DataFrame
Prefer scan for big data.

04 Writing & Streaming Sinks | create · lazy

df.write_parquet('f.parquet')★
Eager write (also write_csv, write_ipc).
lf.sink_parquet('f.parquet')★
Stream a lazy query to disk — never fully materialized.
lf.sink_csv / sink_ipc / sink_ndjson
Streaming writers for other formats.
df.write_database('table', connection)
Write to a relational database.
df.write_excel('f.xlsx')
Write a formatted Excel sheet.

05 Creating & Interop | create

pl.DataFrame({'a': [1,2,3]})★
Dict keys → column names, values → column data.
pl.DataFrame(data, schema={'a': pl.Int64})
Explicit schema (name → dtype).
pl.Series('a', [1,2,3])★
One named, typed 1D array.
pl.from_pandas(pdf) pl.from_arrow(tbl)★
Bridges in; zero-copy where the data layout allows.
df.to_pandas() df.to_numpy() df.to_arrow()★
Bridges out; also to_dicts().
df.lazy() lf.collect()★
Switch between eager & lazy.

06 Inspect & Explore | inspect

df.head(n) df.tail(n) df.glimpse()★
Preview; glimpse is transposed, great for wide frames.
df.shape df.schema df.dtypes df.columns★
Dimensions / name→type map / types / names.
df.describe()★
Summary stats for every column.
df.null_count() df.estimated_size()
Nulls per column / memory footprint.
df.get_column('a') df.to_series(0)
Pull a column out as a Series.
df.row(0) df.rows() df.iter_rows()
Materialize rows as tuples (use sparingly).

07 Selecting Columns & Rows | select & filter

df.select('a', 'b')★
Project a subset of columns.
df.select(pl.col('a'), pl.col('b') * 2)★
Select with expressions.
df[1, 1] df[1:3] df[:, 1:]
Positional [] — like pandas' .iloc.
df.select(pl.col('^col_.*$'))
Regex column selection.
df.select(pl.exclude('a')) pl.all()★
All columns except / every column.
df.with_row_index('idx')★
Add a row-index column running from 0 to n-1.

08 Selectors | select · cs.*

df.select(cs.numeric())★
All numeric columns by dtype.
cs.string() cs.temporal() cs.boolean()★
Select whole dtype families.
cs.starts_with('x') cs.ends_with('_id')★
Name-pattern selectors.
cs.numeric() - cs.first()
Set-ops on selectors: | & ~ -.
df.with_columns(cs.numeric().fill_null(0))★
Broadcast one expression over selected columns.
cs.by_dtype(pl.Float64) cs.matches('regex')
By explicit dtype / regex.

09 Filtering & Boolean Logic | select & filter

df.filter(pl.col('age') > 24)★
Keep rows matching a boolean expression.
df.filter(pl.col('a')>1, pl.col('b')<9)★
Comma-separated = implicit AND.
df.filter((pl.col('a')>1) | (pl.col('b')<9))
Combine predicates with & | ~; wrap each comparison in parentheses.
pl.col('a').is_in([...]) .is_between(2,4)★
Membership / inclusive range.
pl.col('a').is_null() .is_not_null() .is_nan()
Null / NaN masks (they're distinct).

10 Column Expressions & Casting | shape & combine

df.with_columns(pl.col('a') * 2)★
Add / replace a column (returns new frame).
df.with_columns(new = pl.col('a') + pl.col('b'))★
Keyword form names the output column.
pl.col('a').cast(pl.Float64)★
Cast a column to a new dtype.
pl.col('a').alias('b') .name.suffix('_x')★
Rename one column / bulk-rename via .name.
df.rename({'old':'new'}) df.drop('a','b')★
Rename / drop columns.

11 Conditionals & Horizontal Ops | select & combine

pl.when(c1).then(v1).when(c2).then(v2).otherwise(d)★
Vectorized if/elif/else — chain more .when().
pl.coalesce('a', 'b', pl.lit(0))★
First non-null value across columns.
pl.sum_horizontal('a','b') pl.mean_horizontal(...)★
Row-wise aggregate across columns.
pl.min_horizontal(...) pl.max_horizontal(...)
Row-wise min / max.
pl.concat_list(['a','b']) pl.concat_str([...], separator='-')
Combine columns into a list / joined string.
pl.fold(acc, fn, exprs)
General horizontal reduction.

12 Missing Values & Duplicates | clean

df.fill_null(0) .fill_null(strategy='forward')★
Replace nulls — value or strategy.
df.drop_nulls(subset=['a'])★
Drop rows with nulls.
df.fill_nan(0)
Fill NaN — separate from null in Polars.
pl.col('a').interpolate()
Fill gaps by interpolation.
df.unique(subset=['a','b'], keep='first')★
De-duplicate rows.
df.is_duplicated() df.is_unique()
Row-level duplicate masks.

13 UDFs & Custom Logic | handle with care

pl.col('a').map_elements(fn, return_dtype=pl.Int64) | slow
Per-element Python UDF — last resort.
pl.col('a').map_batches(fn)★
Whole-Series UDF — faster than per-element.
df.map_rows(fn)
Frame-level row UDF (loses column names).
prefer native expressions | faster
Native exprs run in Rust, parallelize & optimize; UDFs don't.

14 String Expressions | .str namespace

pl.col('c').str.to_uppercase() / to_lowercase()★
Case conversion.
.str.contains('pat') .str.starts_with('x')★
Regex match by default (literal=True for a plain substring) / prefix test.
.str.replace('a','b') .str.replace_all(...)★
Replace first / all matches.
.str.extract(r'(\d+)', 1) .str.extract_all(...)
One capture group from the first match / all matches as a list.
.str.split('-') .str.slice(0, 3) .str.strip_chars()★
Split / substring / trim.
.str.to_datetime('%Y-%m-%d') .str.json_decode()
Parse strings into dates / structs.

15 Date & Time Expressions | .dt namespace

pl.col('d').dt.year() / .month() / .day()★
Extract date components.
.dt.weekday() .dt.hour() .dt.ordinal_day()
More calendar parts.
.dt.truncate('1mo') .dt.round('1h')★
Snap timestamps to a grid.
.dt.offset_by('1d') pl.col('d') + pl.duration(days=1)
Date arithmetic.
.dt.convert_time_zone('Asia/Kolkata')
Convert to another time zone, same instant (replace_time_zone relabels the zone and keeps the wall-clock time).
.dt.strftime('%b %Y') .dt.to_string()
Format as string.

16 List & Array Expressions | .list / .arr namespace

pl.col('l').list.len() .list.get(0)★
Length / element by index.
.list.sum() .list.mean() .list.max()★
Aggregate within each list.
.list.eval(pl.element() * 2)★
Run an expression over each list's elements.
.list.contains(x) .list.unique() .list.sort()
Membership / dedup / sort per list.
.list.gather([0,2]) .list.slice(0,2)
Pick / slice elements.
.arr.* (fixed-size Array dtype)
Parallel namespace for the fixed-width Array type.

17 Struct, Categorical & Name | .struct / .cat / .name

pl.struct(['a','b']).alias('s')★
Bundle columns into a struct.
pl.col('s').struct.field('a')★
Pull one field out of a struct.
df.unnest('s') .struct.rename_fields([...])★
Unnest a struct into flat columns / rename its fields.
pl.col('c').cast(pl.Categorical)★
Encode repeated strings — big memory win.
.cat.get_categories() .cat.set_ordering(...)
Inspect / order categories.
.name.prefix('x_') .name.to_lowercase()
Programmatic column renaming.

18 Sorting, Rank & Top-N | select & filter

df.sort('a', descending=True)★
Sort by one or more columns.
df.sort(['a','b'], descending=[False,True])
Mixed sort direction per column.
df.top_k(n, by='a') df.bottom_k(n, by='a')★
Top / bottom n rows by a column.
pl.col('a').rank() .arg_sort() .sort_by('b')★
Rank / sorting-index / sort by another column.
pl.col('a').search_sorted(x)
Insertion point in a sorted column.

19 Joins | shape & combine

df1.join(df2, on='key', how='inner')★
Join on a shared column.
df1.join(df2, on='key', how='left')★
Keep every row of df1.
df1.join(df2, left_on='a', right_on='b')
Join on differently-named keys.
df1.join(df2, on='k', how='anti')★
Rows of df1 with NO match — great for exclusions.
df1.join_asof(df2, on='date', by='id')★
Join on the nearest key (default strategy='backward'); both frames should be sorted by it. Built for time series.
df1.join_where(df2, pl.col('a') > pl.col('b'))
Inequality / non-equi join.

20 Combining: Concatenate | shape & combine

pl.concat([df1, df2])★
Stack rows (default: vertical).
pl.concat([df1, df2], how='diagonal')★
Align by name, fill missing with null.
pl.concat([df1, df2], how='horizontal')
Stack columns side by side.
df1.vstack(df2) df1.hstack(df2)
Append rows / columns; returns a new frame unless in_place=True.
df1.extend(df2)
Append rows to df1 in place, copying into its existing memory (may reallocate).

21 GroupBy & Aggregation | aggregate / stats

df.group_by('g').agg(pl.col('x').mean())★
Per-group aggregate.
.agg(m=pl.col('x').mean(), s=pl.col('y').sum())★
Multiple named aggregates.
.agg(pl.col('x').sum(), pl.len(), pl.first('y'))★
pl.len / first / n_unique / quantile in agg.
.agg(pl.all().sum()) .agg(cs.numeric().mean())
Aggregate every / selected column at once.
df.group_by('g', maintain_order=True)★
Preserve group order (else unordered & faster).

22 Window & Time-Grouped Ops | aggregate / stats

pl.col('x').sum().over('g')★
Group aggregate broadcast back to every row.
pl.col('x').rank().over('g') .cum_sum().over('g')★
Per-group rank / running total.
pl.col('x').shift(1).over('g')
Per-group lag / lead.
pl.col('x').rolling_mean(window_size=3)★
Moving-window aggregate.
df.group_by_dynamic('t', every='1mo').agg(...)★
Time-bucketed grouping (like resample).
df.rolling(index_column='t', period='7d').agg(...)
Rolling time-window groups.

23 Reshaping: Pivot & Unpivot | shape & combine

df.pivot(index='g', on='k', values='v', aggregate_function='sum')★
Long → wide (on replaces pandas' columns=).
df.unpivot(index='g')★
Wide → long (Polars' name for melt).
df.explode('list_col')★
One row per list element.
df.transpose(include_header=True)
Flip rows and columns.
df.partition_by('g')
Split into a list of frames, one per group.

24 Lazy Execution & Optimization | lazy

df.lazy() ... .collect()★
Build a plan, optimize, then run once.
lf.explain() lf.show_graph()★
Print / visualize the (optimized) query plan.
lf.collect(engine='streaming')★
Batch execution for larger-than-RAM data.
pl.collect_all([q1, q2])★
Run several plans together — shares common subplans (CSE).
lf.profile()
Collect + per-node timing breakdown.

25 SQL Interface | sql

df.sql("SELECT a, b*2 FROM self WHERE a > 1")★
Query a frame directly — refer to it as self.
pl.sql("SELECT * FROM df1 JOIN df2 USING(a)")★
Query frames in the global namespace.
ctx = pl.SQLContext(frame=lf)
Managed context — register & query named tables.
ctx.execute("SELECT ...").collect()
Run a query inside a context.
pl.sql_expr("a * 2 AS d")
Turn a SQL fragment into a native expression.

26 Extension API & Testing | sql · pl.api / testing

@pl.api.register_expr_namespace('my')★
Attach custom pl.col(...).my.* methods.
@pl.api.register_dataframe_namespace('my')
Attach custom df.my.* methods.
from polars.testing import assert_frame_equal★
Equality assert for tests (also assert_series_equal).
pl.col('a').meta.output_name()
.meta — introspect an expression itself.
with pl.StringCache(): ...
Share categorical encodings across frames.

27 Descriptive Statistics | aggregate / stats

df.mean() df.sum() df.median() df.std()★
Whole-frame column aggregates.
pl.col('a').quantile(0.9) .mode()
Percentile / most-frequent value.
pl.col('a').n_unique() .value_counts()★
Distinct count / frequency table.
df.select(pl.corr('a','b')) pl.cov('a','b')
Correlation / covariance.
pl.col('a').cum_sum() .diff() .pct_change()★
Cumulative / period-over-period.

28 Ranges, Literals & Helpers | top-level pl.*

pl.int_range(0, 10) pl.arange(0, 10, 2)★
Integer ranges as expressions.
pl.date_range(start, end, interval='1d')★
Range of dates.
pl.lit(5) pl.lit([1,2,3])★
Wrap a Python value as an expression.
pl.select(pl.int_range(0, 5))
Run expressions with no source frame.
pl.element() pl.first() pl.last()
Context tokens used inside .list.eval / agg.

29 Performance & Gotchas | handle with care

map_elements Python UDFs | slow
Break parallelism & optimization — avoid in hot paths.
scan_* + collect over read_* | lean
Pushdown means fewer bytes read from disk.
almost no in-place methods
Methods return a new frame, so reassign the result; exceptions include df.extend() and the in_place=True flag on vstack / hstack.
null ≠ NaN
Two distinct concepts — fill_null vs fill_nan.
Categorical + StringCache | lean
Cut memory on repeated strings across frames.

★ Common Data Types | anywhere you see dtype:

pl.Int8/16/32/64 pl.UInt8..64★
Signed / unsigned integers.
pl.Float32 pl.Float64
Floating-point numerics.
pl.String pl.Boolean★
Text (alias Utf8) / boolean.
pl.Date pl.Datetime pl.Duration pl.Time★
Temporal types.
pl.List(inner) pl.Array(inner, n) pl.Struct
Nested — variable list / fixed array / record.
pl.Categorical pl.Enum([...]) pl.Decimal
Encoded strings / fixed set / exact decimal.

★ Join how= Quick-Read | df.join(..., how=)

'inner'★
Only keys present in both frames.
'left'★
All rows from the left frame.
'full'
Every key from both — Polars' name for outer.
'semi'
Left rows that have a match — right cols dropped.
'anti'★
Left rows with no match — great for exclusions.
'cross'
Cartesian product, no key needed.

★ Expression Contexts | where expressions run

df.select(expr)★
Keep only the produced columns.
df.with_columns(expr)★
Add produced columns to the existing frame.
df.filter(expr)★
Keep rows where a boolean expression is true.
df.group_by('g').agg(expr)★
Reduce each group with the expression.
same expr syntax in all four
Learn the expression once, reuse everywhere.

polars cheat sheet v2 · all namespaces

I · Core: Load, Create & Inspect | I/O · constructors · config · interop

II · Select, Filter & Selectors | [] · expressions · cs.* selectors

III · Expressions & Transformation | with_columns · when/then · missing · UDFs

IV · Expression Namespaces | .str · .dt · .list · .struct · .cat

V · Reshape, Combine & Aggregate | joins · concat · group_by · window · pivot

VI · Lazy, Streaming, SQL & Extend | optimizer · engine · SQLContext · api

[Diagrams: Expressions into contexts & the lazy optimizer. Panels: "One expression, four contexts" ★ and "Lazy pushdown optimization" ★]

★ = Worth memorizing