Comprehensive Quick Reference · fast, multi-threaded DataFrames in Python

polars cheat sheet v2 · all namespaces

A single source of truth across the whole Polars surface: the DataFrame/LazyFrame/Series objects, the full expression API with its .str / .dt / .list / .struct / .cat / .name namespaces, selectors, horizontal & window functions, the SQL interface, streaming sinks, config, interop & the extension API. It all rests on one idea: expressions plug into contexts, and you choose between eager execution and a lazy, query-optimized run.

Legend: create / load · inspect / config · select & filter · shape & combine · expr namespaces · aggregate / stats · SQL / extend · lazy / streaming · handle with care · most common

Validated against the official Polars Python API reference (docs.pola.rs) — DataFrame, LazyFrame, Series, Expressions & their namespaces, Selectors, SQL, Config, I/O, the extension API & testing; cross-checked with the franzdiebold PDF sheet, rhosignal (tested pandas→polars), Real Python & dailydoseofds. v2 gap-analysis edition.

The Polars surface — expressions, their namespaces & the wider API
DataFrame (eager) · LazyFrame (optimized plan) · Series (one column)
Expression (pl.col / pl.lit / pl.when): runs inside select / with_columns / filter / group_by.agg
Expression namespaces (typed sub-APIs): .str · .dt · .list · .struct · .cat · .name
The wider API: selectors (cs.*): cs.numeric() · cs.string() | SQL: pl.sql · SQLContext | lazy / streaming: scan_* · sink_* · collect | Config · api · testing: register_*_namespace
Apache Arrow columnar memory: interop with pandas / numpy / pyarrow, zero-copy where possible

I · Core: Load, Create & Inspect I/O · constructors · config · interop

01 Setup & Config (create · inspect)
02 Reading Data (eager) (create · pl.read_*)
03 Scanning Data (lazy) (lazy · pl.scan_*)
04 Writing & Streaming Sinks (create · lazy)
05 Creating & Interop (create)
06 Inspect & Explore (inspect)

II · Select, Filter & Selectors [] · expressions · cs.* selectors

07 Selecting Columns & Rows (select & filter)
08 Selectors (select · cs.*)
09 Filtering & Boolean Logic (select & filter)

III · Expressions & Transformation with_columns · when/then · missing · UDFs

10 Column Expressions & Casting (shape & combine)
11 Conditionals & Horizontal Ops (select & combine)
12 Missing Values & Duplicates (clean)
13 UDFs & Custom Logic (handle with care)

IV · Expression Namespaces .str · .dt · .list · .struct · .cat

14 String Expressions (.str namespace)
15 Date & Time Expressions (.dt namespace)
16 List & Array Expressions (.list / .arr namespace)
17 Struct, Categorical & Name (.struct / .cat / .name)

V · Reshape, Combine & Aggregate joins · concat · group_by · window · pivot

18 Sorting, Rank & Top-N (select & filter)
19 Joins (shape & combine)
20 Combining: Concatenate (shape & combine)
21 GroupBy & Aggregation (aggregate / stats)
22 Window & Time-Grouped Ops (aggregate / stats)
23 Reshaping: Pivot & Unpivot (shape & combine)

VI · Lazy, Streaming, SQL & Extend optimizer · engine · SQLContext · api

24 Lazy Execution & Optimization (lazy)
25 SQL Interface (sql)
26 Extension API & Testing (sql · pl.api / testing)
27 Descriptive Statistics (aggregate / stats)
28 Ranges, Literals & Helpers (top-level pl.*)
29 Performance & Gotchas (handle with care)
Common Data Types (anywhere you see dtype:)
Join how= Quick-Read (df.join(..., how=))
Expression Contexts (where expressions run)

Expressions into contexts & the lazy optimizer

Two ideas carry most of Polars: one expression works in every context, and the lazy engine rewrites your whole plan before running it. Based on the official Polars user-guide diagrams.

One expression, four contexts ★

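An expression built from pl.col('x'), such as pl.col('x').sum(), works unchanged in select, with_columns and group_by().agg(). filter takes the same kind of expression as long as it evaluates to a boolean predicate, for example pl.col('x') > pl.col('x').mean(). There is no per-operation API to relearn.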

pl.col('x')... → .select(expr) · .with_columns(expr) · .filter(expr) · .group_by().agg(expr)

Lazy pushdown optimization ★

The optimizer pushes filter and column selection down into the scan, so unneeded rows & columns are never read from disk.

Naive plan: scan ALL rows/cols → filter → select 2 cols
↓ optimizer (predicate & projection pushdown)
Optimized plan: scan only 2 cols + matched rows → result
Far fewer bytes read from disk / Parquet

Worth memorizing

expressions are reusable: the same pl.col(...) works in select/with_columns/filter/agg
eager vs lazy: scan_*...collect() lets the optimizer rewrite the plan first
selectors: cs.numeric() etc. select whole column families & support set-ops
no inplace: every method returns a new object, so always reassign
null vs NaN: distinct; fill_null handles missing, fill_nan handles float NaN
sink_* streams to disk: lazy write that never fully materializes the frame
map_elements is the slow path: native expressions run in Rust, parallelize & optimize; UDFs don't
group_by is unordered: pass maintain_order=True if you need stable group order
outer → full, melt → unpivot: Polars renamed several pandas concepts
SQL on frames: df.sql("... FROM self") or pl.sql() over the whole engine
.list.eval + pl.element(): run an expression across the elements inside each list cell
collect_all shares work: common-subplan elimination (CSE) runs common subplans of multiple queries only once