import polars as pl                  ★ Universal alias: always pl.
import polars.selectors as cs          Dtype- and name-based column selectors.
pip install polars                     Add extras, e.g. polars[numpy,pandas,pyarrow].
pl.__version__ / pl.show_versions()    Check the installed Polars version.
pl.Config.set_tbl_rows(n)              Control how many rows print to the console.

pl.read_csv('f.csv')              ★ Read CSV eagerly into memory.
pl.read_parquet('f.parquet')      ★ Read columnar Parquet: the Polars-native default.
pl.read_json('f.json')              Read a JSON file.
pl.scan_csv('f.csv')              ★ Lazy CSV read: no data loaded until .collect().
pl.read_database_uri(query, uri)    Read the result of a SQL query via a connection URI; pl.read_database(query, connection) takes a connection object.
pl.from_pandas(pandas_df)           Convert from a pandas DataFrame.

df.write_csv('f.csv')                       ★ Write to CSV.
df.write_parquet('f.parquet')               ★ Write columnar Parquet: fast, typed, compressed.
df.write_json('f.json')                       Write to a JSON file.
df.write_database('table', connection_uri)    Write to a relational database.

pl.DataFrame({'a': [1,2,3]})                 ★ Dict keys become column names; each value holds that column's data.
pl.Series('a', [1,2,3])                      ★ One named, typed 1D array.
pl.DataFrame({...}, schema={'a': pl.Int64})    Pass an explicit schema.
df.lazy()                                    ★ Convert an eager DataFrame to a LazyFrame.

df.head(n) / df.tail(n)  ★ First / last n rows (default 5).
df.shape                 ★ (rows, columns) tuple.
df.schema / df.dtypes    ★ Column name→type mapping / just the type list.
df.columns                 List of column names.
df.describe()            ★ Summary stats for every column.
df.glimpse()               Compact, transposed preview; great for wide frames.
df.null_count()          ★ Missing-value count per column.
df.estimated_size()        Approximate memory footprint in bytes.

df.select('a', 'b')            ★ Project a subset of columns.
df.select(pl.col('a', 'b'))    ★ Same, via the expression API.
df[1, 1]  df[1:3]  df[:, 1:]     Positional [] indexing, like pandas' .iloc.
df.filter(pl.col('age') > 24)  ★ Keep rows matching a boolean expression.
df.filter(a=1, b=2)              Keyword shortcut: equality filters only.
df.get_column('a')               Pull one column out as a Series.

df.filter((pl.col('a')>1)&(pl.col('b')<9))  ★ Combine conditions with & | ~, each wrapped in parentheses.
df.filter(pl.col('a')>1, pl.col('b')<9)     ★ Comma-separated predicates are ANDed; no parentheses needed.
pl.col('a').is_in([...])                    ★ Membership test against a list of values.
pl.col('a').is_between(2, 4)                ★ Inclusive range check.
pl.col('a').is_null() / .is_not_null()        Boolean null checks.

df.with_columns(pl.col('a') * 2)               ★ Add or replace a column (returns a new DataFrame).
df.with_columns(new=expr)                      ★ Keyword form: name the new column inline.
df.with_columns(pl.col('a').cast(pl.Float64))  ★ Cast a column to a new dtype.
df.rename({'old': 'new'})                      ★ Rename one or more columns.
df.drop('a', 'b')                              ★ Remove one or more columns.

pl.when(cond).then(val).otherwise(default)  ★ Vectorized if/elif/else; chain more .when() calls for elif.
pl.coalesce('a', 'b', pl.lit(0))              First non-null value across columns.

df.null_count()                                         ★ Missing-value count per column.
df.drop_nulls()                                         ★ Drop rows containing any null.
df.drop_nulls(subset=['a'])                               Drop rows null in specific columns only.
df.fill_null(value) / df.fill_null(strategy='forward')  ★ Replace nulls with a constant or a fill strategy.
df.fill_nan(value)                                        Separate from fill_null: NaN is not null in Polars.
df.unique()                                             ★ Remove duplicate rows.
df.unique(subset=['a', 'b'])                              De-duplicate on a subset of columns.

pl.col('c').str.to_uppercase() / .to_lowercase()      ★ Case conversion.
pl.col('c').str.strip_chars()                           Strip whitespace (or given characters).
pl.col('c').str.contains('pat')                       ★ Boolean mask; the pattern is a regex by default (pass literal=True for a plain substring).
pl.col('c').str.replace('a','b') / .replace_all(...)    Replace first match / all matches.
pl.col('c').str.split('-')                              Split into a list column.
pl.col('c').str.len_chars()                             Character count of each string.

pl.col('d').str.to_date('%Y-%m-%d')        ★ Parse a string column into a Date.
pl.col('d').dt.year() / .month() / .day()  ★ Extract date components.
pl.col('d').dt.weekday()                     ISO weekday: 1 = Monday.
pl.col('d') + pl.duration(days=1)            Date arithmetic with pl.duration().
pl.date_range(start, end, interval='1d')     Build a range of dates.

pl.col('a').round(2) / .floor() / .ceil()    Rounding functions.
pl.col('a').abs()                            Absolute value.
pl.col('a').clip(0, 100)                     Clamp values into a range.
pl.col('a') + pl.col('b')                  ★ Elementwise arithmetic between expressions.

df.explode('list_col')            ★ One output row per list element.
pl.col('list_col').list.len()     ★ Length of each list.
pl.col('list_col').list.get(0)      Element at index 0 of each list.
pl.struct(['a', 'b']).alias('s')    Bundle columns into a struct column.
df.unnest('s')                    ★ Expand a struct column back into flat columns.

df1.join(df2, on='key', how='inner')      ★ SQL-style join on a shared column.
df1.join(df2, on='key', how='left')       ★ Keep every row of df1.
df1.join(df2, left_on='a', right_on='b')    Join on differently named key columns.
df1.join(df2, on='key', how='anti')       ★ Rows in df1 with NO match in df2.
df1.join_asof(df2, on='date', by='id')      Nearest-key join (by default the latest key at or before each left key; both frames sorted on the key); common for time series.
df1.join(df2, how='cross')                  Cartesian product.

pl.concat([df1, df2])                  ★ Stack rows (default: vertical).
pl.concat([df1, df2], how='diagonal')    Stack rows, aligning by name, filling gaps with null.
df1.hstack(df2)                          Stack columns side by side, no key needed.
df1.vstack(df2)                          Append rows; returns a new DataFrame (use df1.extend(df2) to modify df1 in place).

df.group_by('g').agg(pl.col('x').mean())                ★ Per-group mean of one column.
df.group_by('g').agg(pl.col('x').sum().alias('x_sum'))  ★ Named aggregate; pass several expressions to .agg() for multiple aggregates per group.
df.group_by('g').agg(pl.all().sum())                      Aggregate every remaining column at once.
df.group_by(['g1', 'g2']).len()                         ★ Row count per group; the counterpart of pandas' .size().
df.group_by_dynamic('date', every='1mo').agg(...)         Time-bucketed grouping, like pandas' resample.

pl.col('x').sum().over('g')              ★ Group aggregate broadcast back to every row.
pl.col('x').rank().over('g')             ★ Per-group ranking.
pl.col('x').cum_sum().over('g')          ★ Per-group running total.
pl.col('x').shift(1).over('g')             Per-group lag.
pl.col('x').rolling_mean(window_size=3)    Moving average over a fixed window.

df.sort('a')                                 ★ Ascending sort by one or more columns.
df.sort('a', descending=True)                ★ Descending sort.
df.sort(['a','b'], descending=[False,True])    Mixed sort direction, per column.
df.top_k(n, by='a')                          ★ Top n rows by a column's value.
df.bottom_k(n, by='a')                         Bottom n rows by a column's value.

df.pivot(index='g', on='k', values='v')  ★ Long → wide (on replaces pandas' columns=).
df.unpivot(index='g')                    ★ Wide → long; Polars' current name for melt.
df.transpose()                             Flip rows and columns.

df.lazy()                        ★ Start a query plan instead of running immediately.
lazy_df.collect()                ★ Optimize the whole plan, then execute it once.
lazy_df.explain()                  Print the (optimized) query plan.
lazy_df.collect(streaming=True)    Process larger-than-RAM data in batches (newer releases spell this collect(engine='streaming')).
pl.scan_parquet('f.parquet')     ★ Lazy scan: enables predicate/projection pushdown.

pl.Int8/16/32/64  pl.UInt8..64     ★ Signed / unsigned integers.
pl.Float32 / pl.Float64              Floating-point numeric types.
pl.String (alias pl.Utf8)          ★ Text / string columns.
pl.Boolean                           True / False values.
pl.Date  pl.Datetime  pl.Duration  ★ Calendar date, date+time, and time delta.
pl.List(inner) / pl.Struct           Nested column types: list / struct.
pl.Categorical                       Efficient encoding for repeated strings.

'inner'  ★ Only keys present in both DataFrames.
'left'   ★ All rows from the left frame, matched where possible.
'full'     Every key from both sides; Polars' name for outer.
'semi'     Left rows that have a match; right columns dropped.
'anti'   ★ Left rows with no match in the right; great for exclusions.
'cross'    Cartesian product, no key needed.