Quick Reference · fast, multi-threaded DataFrames in Python

polars cheat sheet

Polars is built on two ideas: expressions (composable, reusable computations like pl.col('x').mean()) that plug into contexts (select, filter, with_columns, group_by), and a choice between eager execution and a lazy, query-optimized one via .lazy()...collect().

create / load · inspect · select & filter · shape & combine · clean & transform · aggregate / stats · handle with care · most common

Distilled & cross-checked across: docs.pola.rs (official Python API reference) · the franzdiebold PDF cheat sheet · rhosignal.com (pandas→polars, tested) · a community gist cheat sheet · dailydoseofds.com pandas/polars/SQL/PySpark translations

Expressions plug into Contexts — the core Polars mental model
Diagram: expressions (reusable building blocks) such as pl.col('price') * 1.1, pl.col('qty').sum() and pl.col('name').str.to_uppercase() plug into contexts (where they run): df.select(expr), df.with_columns(expr), df.filter(expr) and df.group_by('g').agg(expr). The same expression syntax works in every context, with no separate API to learn per operation.
01 Setup & Import · create
02 Reading Data · create
03 Writing Data · create
04 Creating DataFrames & Series · create
05 Inspect & Explore · inspect
06 Selecting Columns & Rows · select & filter
07 Filtering & Boolean Logic · select & filter
08 Column Expressions & Casting · shape & combine
09 Conditional Logic · select & filter
10 Missing Values & Duplicates · clean & transform
11 String Expressions · .str accessor
12 Date & Time Expressions · .dt accessor
13 Numeric & Math Expressions · clean & transform
14 List & Struct Operations · shape & combine
15 Joins · shape & combine
16 Combining: Concatenate · shape & combine
17 GroupBy & Aggregation · aggregate / stats
18 Window Expressions: .over() · aggregate / stats
19 Sorting & Top-N · select & filter
20 Reshaping: Pivot & Unpivot · shape & combine
21 Lazy Execution & Performance · handle with care
Common Data Types · anywhere you see dtype:
Join how= Quick-Read · df.join(..., how=)

Why Polars is fast, visually

Three ideas do most of the work: automatic multi-threading, a lazy query optimizer, and zero-copy Apache Arrow memory. Based on the official Polars user-guide diagrams.

Expressions run in parallel ★

Independent expressions inside one select/with_columns call are automatically spread across CPU cores.

Diagram: col('a').sum(), col('b').mean() and col('c').max() run on core 1, core 2 and core 3 respectively, and their outputs are combined into one result.

Eager vs. lazy execution

Eager mode runs and materializes every step as soon as it is called. Lazy mode (.lazy()...collect()) builds a single query plan that the optimizer can reorder, then runs it once at .collect().

Diagram: eager, df → .filter() → .select(), where each step runs and materializes immediately; lazy, scan → filter → select → .collect(), where the optimizer reorders steps and everything runs in one pass at .collect().

Predicate & projection pushdown

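The lazy optimizer pushes filters (predicates) and column selection (projections) down into the scan itself, so unneeded columns are never read and non-matching rows are dropped while scanning instead of being materialized first. With Parquet, whole row groups can even be skipped based on their statistics.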

Diagram: naive, read all columns and rows → filter → select 2 columns; optimized, scan only the 2 columns and the matching rows → result. Far fewer bytes are read from disk/Parquet.

Zero-copy Arrow interop

Polars uses Apache Arrow's columnar memory layout, so converting to or from pandas or NumPy can often avoid copying data (for example, a numeric column without nulls), which makes the conversion near-instant in those cases.

Diagram: Apache Arrow memory at the center, connecting Polars, pandas and NumPy.

Worth memorizing

eager vs lazy: df.lazy()...collect() lets the optimizer rewrite your plan first
expressions are reusable: the same pl.col(...) expr works in select/filter/agg/with_columns
no inplace: every method returns a new object, so always reassign: df = df.with_columns(...)
null vs NaN: null is the universal missing marker; NaN is a distinct float value
group_by row order: not guaranteed by default; pass maintain_order=True if needed
melt → unpivot: pandas' melt is Polars' unpivot (index=/on=), and pandas' pivot columns= is Polars' pivot on=
outer → full: pandas' how='outer' is Polars' how='full'
multi-threaded by default: expressions inside one context run in parallel automatically